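Problem extending a list with another list

Problem definition

Split each line into sentences. Assume the following characters delimit sentences: the period ('.'), the question mark ('?'), and the exclamation mark ('!'). These delimiters should also be omitted from the returned sentences. Remove any leading or trailing whitespace from each sentence. If, after that, a sentence is blank (the empty string, ''), it should be omitted. Return the list of sentences. The sentences must be in the same order in which they appear in the file.

This is my current code: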

    import re

    def get_sentences(doc):
        assert isinstance(doc, list)
        result = []
        for line in doc:
            result.extend(
                [sentence.strip() for sentence in re.split(r'\.|\?|\!', line) if sentence]
            )
        return result

    # Demo:
    get_sentences(demo_input)

Input

    demo_input = [" This is a phrase; this, too, is a phrase. But this is another sentence.",
                  "Hark!",
                  " ",
                  "Come what may <-- save those spaces, but not these --> ",
                  "What did you say?Split into 3 (even without a space)? Okie dokie."]

Expected output

    ["This is a phrase; this, too, is a phrase",
     "But this is another sentence",
     "Hark",
     "Come what may <-- save those spaces, but not these -->",
     "What did you say",
     "Split into 3 (even without a space)",
     "Okie dokie"]

However, my code produces this:

    ['This is a phrase; this, too, is a phrase',
     'But this is another sentence',
     'Hark',
     '',
     'Come what may <-- save those spaces, but not these -->',
     'What did you say',
     'Split into 3 (even without a space)',
     'Okie dokie']

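Question: why do I get that empty sentence '' in the result even though my code filters out empty strings?

I can fix the problem with the following code, but it has to go through the list a second time, which I would rather avoid. I want to do this in a single pass.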

    import re

    def get_sentences(doc):
        assert isinstance(doc, list)
        result = []
        for line in doc:
            result.extend([sentence.strip() for sentence in re.split(r'\.|\?|\!', line)])
        result = [s for s in result if s]
        return result

    # Demo:
    get_sentences(demo_input)

喵喔喔


1 Answer

HUX布斯

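Try using if sentence.strip() as the filter. Your condition, if sentence, tests each piece before it is stripped: the line " " splits into the single piece " ", which is a non-empty (truthy) string, so it passes the filter and only then is stripped down to ''. Testing the stripped value avoids that:

    for line in doc:
        result.extend([sentence.strip() for sentence in re.split(r'\.|\?|\!', line) if sentence.strip()])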
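If calling strip() twice per piece is a concern, one possible variant (a sketch, not part of the answer above) strips each piece once in a lazy generator and filters the stripped values in the same pass. The character class [.?!] is assumed here as an equivalent shorthand for \.|\?|\!, and the short input list in the demo is made up for illustration.

```python
import re

def get_sentences(doc):
    """Split each line on '.', '?' or '!' and return the stripped, non-empty sentences."""
    assert isinstance(doc, list)
    result = []
    for line in doc:
        # Strip every piece first (lazily), so the emptiness test sees the stripped value.
        stripped = (piece.strip() for piece in re.split(r'[.?!]', line))
        # Keep only the pieces that are still non-empty after stripping.
        result.extend(s for s in stripped if s)
    return result

# A whitespace-only string is truthy until it is stripped,
# which is why the original "if sentence" filter let it through.
print(bool(" "), bool(" ".strip()))

# Hypothetical demo input, shorter than the one in the question.
print(get_sentences([" ", "Hark!", "What did you say?Okie dokie."]))
```

Because both the stripping and the filtering are generator expressions, no intermediate list is built and each line is still processed only once.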
