Regex to split names containing special characters (hyphen, apostrophe)

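I have a column of names that are all concatenated (that is, there is no space between the first name and the last name). I'm trying to split the first and last names, which has already been asked on this site. Here, however, some of the names contain a hyphen (-) or an apostrophe ('):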

Speed-WagonMario
CruiserPetey
SthesiaAnna
De’wayneJohn

I want to make sure these cases are captured by my regex query:

clean_names = re.split(r'([A-Z][a-z\']+\-[A-Z][a-z\']+|[A-Z][a-z\']+)', names)

It works for the hyphen, which only ever appears before an uppercase letter, but not for the apostrophe.

Does anyone have a suggestion for how to fix my query? Thanks in advance.


饮歌长啸
1 answer

收到一只叮咚

You can combine a positive lookbehind (lowercase letter) with a positive lookahead (uppercase letter). Both lookarounds are zero-width, so the characters they match are kept when splitting.

/             // BEGIN EXPRESSION
(?<=[a-z])    // POSITIVE LOOKBEHIND [a-z]
(?=[A-Z])     // POSITIVE LOOKAHEAD  [A-Z]
/             // END EXPRESSION

Python example

#!/usr/bin/env python3
import re

def pair_to_person(pair):
  # Each pair is [lastName, firstName], in the order the column stores them
  person = {}
  person['firstName'] = pair[1]
  person['lastName'] = pair[0]
  return person

def parse_name_column(column_text):
  # Splitting on a zero-width match requires Python 3.7 or later
  return map(pair_to_person,
    map(lambda name: re.split(r'(?<=[a-z])(?=[A-Z])', name),
      map(str.strip, column_text.strip().split('\n'))))

print_list = lambda items: print('\n'.join(map(str, items)))

if __name__ == '__main__':
  column_text = '''
Speed-WagonMario
CruiserPetey
SthesiaAnna
De’wayneJohn
'''
  names = parse_name_column(column_text)
  print_list(names)

Output

{'firstName': 'Mario', 'lastName': 'Speed-Wagon'}
{'firstName': 'Petey', 'lastName': 'Cruiser'}
{'firstName': 'Anna', 'lastName': 'Sthesia'}
{'firstName': 'John', 'lastName': 'De’wayne'}

JS example

const data = `Speed-WagonMario
CruiserPetey
SthesiaAnna
De’wayneJohn`;

const names = data.trim().split('\n')
  .map(name => name.trim().split(/(?<=[a-z])(?=[A-Z])/))
  .map(pair => ({ firstName: pair[1], lastName: pair[0] }));

console.log(names);
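As a more compact variant of the lookaround split above, here is a minimal sketch. The `split_name` helper and `BOUNDARY` constant are hypothetical names, not part of the answer. It assumes each entry has exactly one lowercase-to-uppercase boundary and raises otherwise, since a name such as McDonaldRonald would produce three pieces.

```python
import re

# Zero-width boundary: lowercase letter on the left, uppercase letter on the right.
# Hyphens and apostrophes (straight or curly) never satisfy the lookbehind,
# so no split happens next to them.
BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")

def split_name(joined):
    """Return (last_name, first_name) for a concatenated name (hypothetical helper)."""
    parts = BOUNDARY.split(joined.strip())  # empty-match splitting needs Python 3.7+
    if len(parts) != 2:
        raise ValueError(f"expected exactly one boundary in {joined!r}, got {parts}")
    return parts[0], parts[1]

names = ["Speed-WagonMario", "CruiserPetey", "SthesiaAnna", "De’wayneJohn"]
pairs = [split_name(name) for name in names]

for last, first in pairs:
    print(first, last)
```

Raising on anything other than two parts is one possible design choice. It surfaces names with internal capitals instead of silently assigning the wrong piece to the first or last name.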