Python regex: match a word only if it is preceded by a comma and whitespace, or is the first word in the string

Given a string like the following:

'Rob and Amber Mariano, Heather Robinson, Jane and John Smith, Kiwan and Nichols Brady John, Jimmy Nichols, Melanie Carbone, Jim Green and Nancy Brown, Todd and Sana Clegg with Tatiana Perkin'

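I want to identify couples or other family members written in the form "John and Jane Doe", and exclude cases such as "Jim Green and Nancy Brown".

I only want to identify the following: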

Rob and Amber Mariano, Jane and John Smith, Kiwan and Nichols Brady John, Todd and Sana Clegg

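The groups in the regular expression below seem to capture most of the cases I want, but I am having trouble excluding "Jim Green".

The condition I want to impose is that the first word is a name, and that it is either at the start of the string or preceded only by a comma and whitespace.

For some reason my expression does not work. I expected ([^|,\s']?) to capture this condition, but it does not seem to.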

([^|\,\s]?)([A-Z][a-zA-Z]+)(\s*and\s*)([A-Z][a-zA-Z]+)(\s[A-Z][a-zA-Z]+)(\s[A-Z][a-zA-Z]+)?



慕妹3242003
3 answers

慕尼黑5688855

Let's break the answer down into two simple steps:

1. Convert the whole string into a list of couple names.
2. Keep every entry that matches the requested pattern.

We are interested in couple names that follow the pattern

    <Name1> and <Name2> <Last-name> <May-or-may-not-be-words-separated-by-spaces>

but we only want the <Name1> and <Name2> <Last-name> part of each matching entry.

Now that we have defined what we want to do, here is the code for it.

    import re

    testStr = """Rob and Amber Mariano, Heather Robinson,
    Jane and John Smith, Kiwan and Nichols Brady John,
    Jimmy Nichols, Melanie Carbone, Jim Green and Nancy Brown,
    Todd and Sana Clegg with Tatiana Perkin"""

    # Pattern definition for the match (raw string, so the backslashes reach the regex engine)
    regExpr = re.compile(r"^(\w+\sand\s\w+\s\w+)(\s\w+)*")

    # Remove whitespace introduced at the beginning of each entry by splitting
    coupleList = [s.strip() for s in testStr.split(',')]

    # Try every entry; match() returns None for entries that do not match
    matchedList = [regExpr.match(s) for s in coupleList]

    # Group 1 holds the part we want from every matched entry
    result = [s.group(1) for s in matchedList if s is not None]
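To see why splitting first helps, here is a minimal sketch. The helper name couple_prefix is made up for illustration and is not part of the answer. Once an entry has been stripped, re.match only tries the very start of that entry, so "Jim Green and Nancy Brown" is rejected without any explicit comma handling.

```python
import re

# Same capturing group as in the answer; match() already anchors at the start.
PATTERN = re.compile(r"(\w+\sand\s\w+\s\w+)")

def couple_prefix(piece):
    """Return '<Name1> and <Name2> <Last-name>' if the entry starts with it, else None."""
    m = PATTERN.match(piece.strip())
    return m.group(1) if m else None

print(couple_prefix(" Jane and John Smith"))                     # Jane and John Smith
print(couple_prefix("Jim Green and Nancy Brown"))                # None
print(couple_prefix("Todd and Sana Clegg with Tatiana Perkin"))  # Todd and Sana Clegg
```

Note that this approach keeps only one word after the second first name, so "Kiwan and Nichols Brady John" comes back as "Kiwan and Nichols Brady".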

慕婉清6462132

A bit late, but this is possibly the simplest regex:

    import re

    regex = r"(?:, |^)(\w+\sand\s\w+\s\w+)"

    test_str = "Rob and Amber Mariano, Heather Robinson, Jane and John Smith, Kiwan and Nichols Brady, John, Jimmy Nichols, Melanie Carbone, Jim Green and Nancy Brown, Todd and Sana Clegg with Tatiana Perkin"

    matches = re.finditer(regex, test_str, re.MULTILINE)

    for matchNum, match in enumerate(matches, start=1):
        for groupNum in range(0, len(match.groups())):
            groupNum = groupNum + 1
            print(match.group(groupNum))

Output:

    Rob and Amber Mariano
    Jane and John Smith
    Kiwan and Nichols Brady
    Todd and Sana Clegg
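For contrast with the prefix group in the question: inside square brackets, ^ negates the class and | is a literal pipe, so ([^|\,\s]?) optionally matches one character that is not a pipe, comma, or whitespace. It does not mean "start of string or comma". The sketch below is an illustration under that reading, not the asker's or answerer's code. It keeps the rest of the question's pattern and swaps only the prefix; the names BODY, loose, anchored, and hits are chosen here for the example.

```python
import re

text = "Heather Robinson, Jim Green and Nancy Brown, Todd and Sana Clegg"

# Tail of the pattern, copied from the question.
BODY = r"([A-Z][a-zA-Z]+)(\s*and\s*)([A-Z][a-zA-Z]+)(\s[A-Z][a-zA-Z]+)(\s[A-Z][a-zA-Z]+)?"

# Question's prefix: at most one character that is not '|', ',' or whitespace.
loose = re.compile(r"([^|\,\s]?)" + BODY)
# Alternation outside brackets: start of string, or a comma plus one whitespace.
anchored = re.compile(r"(?:^|,\s)" + BODY)

def hits(pattern):
    # Join the captured groups; groups that did not participate become "".
    return ["".join(m.groups(default="")) for m in pattern.finditer(text)]

print(hits(loose))     # ['Green and Nancy Brown', 'Todd and Sana Clegg']
print(hits(anchored))  # ['Todd and Sana Clegg']
```

Because the optional prefix can match nothing, the first pattern is free to start at "Green"; the alternation forces every match to begin at the start of the string or right after a comma.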

皈依舞

Try this; it works as expected on the sample string:

    (,\s|^)([A-Z][a-z]+\sand\s[A-Z][a-z]+(\s[A-Z][a-z]+)+)

Test script:

    import re

    a = re.findall(r"(,\s|^)([A-Z][a-z]+\sand\s[A-Z][a-z]+(\s[A-Z][a-z]+)+)", "Rob and Amber Mariano, Heather Robinson, Jane and John Smith, Kiwan and Nichols Brady John, Jimmy Nichols, Melanie Carbone, Jim Green and Nancy Brown, Todd and Sana Clegg with Tatiana Perkin")
    print(a)

Output:

    [('', 'Rob and Amber Mariano', ' Mariano'), (', ', 'Jane and John Smith', ' Smith'), (', ', 'Kiwan and Nichols Brady John', ' John'), (', ', 'Todd and Sana Clegg', ' Clegg')]

The name you want is the second element of each tuple.
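re.findall returns tuples here because the pattern has three capturing groups. A possible variation, offered as a sketch rather than as the answerer's code, turns the prefix and the repeated surname group into non-capturing groups so that findall returns plain strings. The variable names text, pattern, and names are chosen for this example.

```python
import re

text = ("Rob and Amber Mariano, Heather Robinson, Jane and John Smith, "
        "Kiwan and Nichols Brady John, Jimmy Nichols, Melanie Carbone, "
        "Jim Green and Nancy Brown, Todd and Sana Clegg with Tatiana Perkin")

# (?:...) groups without capturing, so only the outer (...) is returned.
pattern = re.compile(
    r"(?:^|,\s*)"                   # start of string, or a comma plus optional whitespace
    r"([A-Z][a-z]+\sand\s[A-Z][a-z]+"  # <Name1> and <Name2>
    r"(?:\s[A-Z][a-z]+)+)"          # one or more capitalized surname words
)

names = pattern.findall(text)
print(names)
# ['Rob and Amber Mariano', 'Jane and John Smith',
#  'Kiwan and Nichols Brady John', 'Todd and Sana Clegg']
```

The surname part stops at "with" because it requires each following word to start with a capital letter.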