
Regular expression to match possible names in a string

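I want to match possible names in a string. A name should consist of 2-4 words, each word containing 3 or more letters, with every word capitalized. For example, given the following list of strings: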


Her name is Emily.

I work for Surya Soft.

I sent an email for Ery Wulandari.

Welcome to the Link Building Partner program!

I want a regular expression that returns:


None

Surya Soft

Ery Wulandari

Link Building Partner

This is my code so far:


import re

data = [
    'Her name is Emily.',
    'I work for Surya Soft.',
    'I sent an email for Ery Wulandari.',
    'Welcome to the Link Building Partner program!'
]

for line in data:
    # Two capitalized words (capital + 2 or more lowercase letters/digits) separated by whitespace
    print(re.findall(r'(?:[A-Z][a-z0-9]{2,}\s+[A-Z][a-z0-9]{2,})', line))

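It works for the first three lines but not for the last one: there it returns only 'Link Building' instead of 'Link Building Partner'.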
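A quick way to see the problem, as a minimal sketch in Python 3 (the variable name `TWO_WORDS` is illustrative, not from the question): the pattern spells out exactly two capitalized words, so on the last sentence the match stops after `Link Building`, and the remaining `Partner` is followed by the lowercase `program`, so it cannot start a new two-word match.

```python
import re

# The pattern from the question: exactly one capitalized word,
# then whitespace, then exactly one more capitalized word.
TWO_WORDS = r'(?:[A-Z][a-z0-9]{2,}\s+[A-Z][a-z0-9]{2,})'

line = 'Welcome to the Link Building Partner program!'
matches = re.findall(TWO_WORDS, line)
print(matches)  # ['Link Building']
```

Making the second word repeatable, for example `(?:\s+[A-Z][a-z0-9]{2,}){1,3}`, is what allows a single match to grow to four words.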


慕斯709654

森林海

You can use:

re.findall(r'((?:[A-Z]\w{2,}\s*){2,4})', line)

It may capture a trailing space, which you can remove with .strip().
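A minimal Python 3 sketch of this answer next to a stricter variant. The helper names `names_from_answer` and `names_strict` are hypothetical, and returning `None` when nothing matches is an assumption made to mirror the expected output in the question. The stricter pattern is not part of the answer: it requires whitespace only between words, so there is no trailing space to strip, and it uses `[a-z]` instead of `\w`, so digits and underscores are not counted as letters (at the cost of rejecting non-ASCII names). Because `\s*` also accepts zero spaces, the answer's pattern can read a single CamelCase token such as `FooBarBaz` as two words; the stricter variant does not.

```python
import re

# Pattern from the answer: 2-4 repetitions of
# "uppercase letter + 2 or more word characters + optional whitespace".
ANSWER_PATTERN = re.compile(r'((?:[A-Z]\w{2,}\s*){2,4})')

# Hypothetical stricter variant: one capitalized ASCII word followed by 1-3 more,
# with whitespace required only *between* words and word boundaries at both ends.
STRICT_PATTERN = re.compile(r'\b[A-Z][a-z]{2,}(?:\s+[A-Z][a-z]{2,}){1,3}\b')


def names_from_answer(line):
    """Apply the answer's regex; .strip() removes the trailing space it can capture."""
    names = [m.strip() for m in ANSWER_PATTERN.findall(line)]
    return names or None  # None mirrors the expected output in the question


def names_strict(line):
    """Apply the stricter variant; nothing needs stripping."""
    return STRICT_PATTERN.findall(line) or None


data = [
    'Her name is Emily.',
    'I work for Surya Soft.',
    'I sent an email for Ery Wulandari.',
    'Welcome to the Link Building Partner program!',
]

for line in data:
    print(names_from_answer(line), names_strict(line))
```

On the four sample lines both functions agree (None, Surya Soft, Ery Wulandari, Link Building Partner); they differ on inputs like 'see FooBarBaz now', where only the answer's pattern reports a name.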

Qyouu

Non-regex solution:

from string import punctuation as punc

def solve(strs):
    # Each group is a list of (token_index, word) pairs for consecutive capitalized words.
    words = [[]]
    for i, x in enumerate(strs.split()):
        x = x.strip(punc)  # drop surrounding punctuation such as '.' or '!'
        # Candidate word: non-empty, starts with an uppercase letter, 3+ characters.
        if x and x[0].isupper() and len(x) > 2:
            if words[-1] and words[-1][-1][0] == i - 1:
                # Directly follows the previous candidate: extend the current group.
                words[-1].append((i, x))
            else:
                # Not adjacent: start a new group.
                words.append([(i, x)])
    # Keep only groups of 2-4 words and join each group into a name.
    names = [" ".join(y[1] for y in x) for x in words if 2 <= len(x) <= 4]
    return ", ".join(names) if names else None

data = [
    'Her name is Emily.',
    'I work for Surya Soft.',
    'I sent an email for Ery Wulandari.',
    'Welcome to the Link Building Partner abc Fooo Foo program!',
]
for x in data:
    print(solve(x))

Output:

None
Surya Soft
Ery Wulandari
Link Building Partner, Fooo Foo