Problem extending a list with another list

Problem definition


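Split each line into sentences. Assume that sentences are separated by the following characters: period ('.'), question mark ('?'), and exclamation mark ('!'). These delimiters should be omitted from the returned sentences. Remove any leading or trailing whitespace from each sentence. If, after the above, a sentence is blank (the empty string, ''), omit it. Return the list of sentences. The sentences must appear in the same order as they do in the file.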


Here is my current code:


import re


def get_sentences(doc):
    assert isinstance(doc, list)
    result = []
    for line in doc:
        result.extend(
            [sentence.strip() for sentence in re.split(r'\.|\?|\!', line) if sentence]
        )
    return result


# Demo:
get_sentences(demo_input)

Input:


demo_input = ["  This is a phrase; this, too, is a phrase. But this is another sentence.",
              "Hark!",
              "    ",
              "Come what may    <-- save those spaces, but not these -->    ",
              "What did you say?Split into 3 (even without a space)? Okie dokie."]

Expected output:


["This is a phrase; this, too, is a phrase",
 "But this is another sentence",
 "Hark",
 "Come what may    <-- save those spaces, but not these -->",
 "What did you say",
 "Split into 3 (even without a space)",
 "Okie dokie"]

However, my code produces this:


['This is a phrase; this, too, is a phrase',
 'But this is another sentence',
 'Hark',
 '',
 'Come what may    <-- save those spaces, but not these -->',
 'What did you say',
 'Split into 3 (even without a space)',
 'Okie dokie']

Question: why do I still get that empty sentence '' in the result, even though my code is supposed to filter it out?
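A quick way to see where the empty string comes from is to look at what `re.split` returns for the problem lines and what the original filter `if sentence` keeps. The test runs on the raw piece, before `.strip()`, so a whitespace-only piece such as `'    '` is truthy, passes the filter, and is only then stripped down to `''`. The helper name `raw_pieces` below is hypothetical, used only for this illustration.

```python
import re

# Same delimiter pattern as in the question: '.', '?' or '!'
DELIMS = r'\.|\?|\!'


def raw_pieces(line):
    """Hypothetical helper: the pieces re.split produces for one line."""
    return re.split(DELIMS, line)


# The blank line "    " contains no delimiter, so it comes back as one
# whitespace-only piece.
print(raw_pieces("    "))   # ['    ']

# "Hark!" ends with a delimiter, so re.split adds a trailing empty string.
print(raw_pieces("Hark!"))  # ['Hark', '']

# The original comprehension tests each piece *before* stripping it:
# '' is falsy and dropped, but '    ' is truthy, kept, and then stripped to ''.
kept = [p.strip() for p in raw_pieces("    ") if p]
print(kept)  # ['']
```

So the trailing `''` after "Hark!" is filtered correctly; the stray `''` in the output comes from the all-spaces line.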


I can fix the problem with the following code, but it has to go over the list again, which I want to avoid. I'd like to do it in the same pass.


import re


def get_sentences(doc):
    assert isinstance(doc, list)
    result = []
    for line in doc:
        result.extend([sentence.strip() for sentence in re.split(r'\.|\?|\!', line)])
        result = [s for s in result if s]
    return result


# Demo:
get_sentences(demo_input)


喵喔喔
1 answer

HUX布斯

Try using if sentence.strip() instead, i.e.:

for line in doc:
    result.extend([sentence.strip() for sentence in re.split(r'\.|\?|\!', line) if sentence.strip()])
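The answer's fix tests the stripped piece instead of the raw one, so whitespace-only pieces are dropped. It calls `.strip()` twice per piece, though. A minimal sketch of a variant that strips each piece once and still makes a single pass over the input is to strip inside a generator expression and filter the stripped values. It also swaps in the character class `[.?!]`, which matches the same three delimiters as `\.|\?|\!`. This is one possible variant, not the answerer's code.

```python
import re


def get_sentences(doc):
    assert isinstance(doc, list)
    result = []
    for line in doc:
        # Strip every piece exactly once (lazily, no intermediate list) ...
        stripped = (piece.strip() for piece in re.split(r'[.?!]', line))
        # ... then keep only the pieces that are non-empty after stripping.
        result.extend(s for s in stripped if s)
    return result


demo_input = ["  This is a phrase; this, too, is a phrase. But this is another sentence.",
              "Hark!",
              "    ",
              "Come what may    <-- save those spaces, but not these -->    ",
              "What did you say?Split into 3 (even without a space)? Okie dokie."]

print(get_sentences(demo_input))
```

Because the filter only looks at the pieces of the current line, earlier sentences are never re-scanned, unlike the workaround in the question, which rebuilds the whole `result` list on every iteration.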
