Using regular expressions to match names, dialogue, and actions in a transcript

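Regular expressions are one way to approach this problem, but you can also think of it as iterating over each token in the text and applying some logic to form groups. For example, we can first find the groups of names and text:

    from itertools import groupby

    def isName(word):
        # Names end with ':'
        return word.endswith(":")

    # `text` is the transcript string from the question.
    # Padding "]" with a space keeps a name that directly follows a bracket as its own token.
    text_split = [
        " ".join(g).rstrip(":")
        for _, g in groupby(text.replace("]", "] ").split(), isName)
    ]
    print(text_split)
    #['CHRIS',
    # 'Hello, how are you...',
    # 'PETER',
    # 'Great, you?',
    # 'PAM',
    # 'He is resting. [PAM SHOWS THE COUCH] [PETER IS NODDING HIS HEAD]',
    # 'CHRIS',
    # 'Are you ok?']

Next, you can collect pairs of consecutive elements of text_split into tuples:

    print([(text_split[i*2], text_split[i*2+1]) for i in range(len(text_split)//2)])
    #[('CHRIS', 'Hello, how are you...'),
    # ('PETER', 'Great, you?'),
    # ('PAM', 'He is resting. [PAM SHOWS THE COUCH] [PETER IS NODDING HIS HEAD]'),
    # ('CHRIS', 'Are you ok?')]

We are almost at the desired output. We just need to deal with the text in square brackets. You can write a simple function for that. (Regular expressions are admittedly an option here, but I'm purposely avoiding them in this answer.) Here is something quick that I came up with:

    def isClosingBracket(word):
        return word.endswith("]")

    def processWords(words):
        # No bracketed action: return the dialogue and None for the action.
        if "[" not in words:
            return [words, None]
        else:
            # Split on "[" so every action chunk ends with "]", then group
            # the dialogue chunk and the run of action chunks separately.
            return [
                " ".join(g).replace("]", ".")
                for _, g in groupby(map(str.strip, words.split("[")), isClosingBracket)
            ]

    print(
        [(text_split[i*2], *processWords(text_split[i*2+1])) for i in range(len(text_split)//2)]
    )
    #[('CHRIS', 'Hello, how are you...', None),
    # ('PETER', 'Great, you?', None),
    # ('PAM', 'He is resting.', 'PAM SHOWS THE COUCH. PETER IS NODDING HIS HEAD.'),
    # ('CHRIS', 'Are you ok?', None)]

Note that using * to unpack the result of processWords into the tuple is strictly a Python 3 feature (it requires Python 3.5 or later).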
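To try the steps above end to end, here is a minimal self-contained sketch of the same groupby approach. The sample `text` is an assumption reconstructed from the output shown in this answer, not the exact string from the question, and the names `is_name`, `is_closing_bracket`, `process_words`, and `parse_transcript` are illustrative only.

```python
from itertools import groupby

# Assumed sample input, reconstructed from the output shown in the answer;
# the question's exact string may differ.
text = (
    "CHRIS: Hello, how are you... PETER: Great, you? "
    "PAM: He is resting. [PAM SHOWS THE COUCH][PETER IS NODDING HIS HEAD]"
    "CHRIS: Are you ok?"
)


def is_name(word):
    # Speaker names are the tokens that end with ':'
    return word.endswith(":")


def is_closing_bracket(chunk):
    # After splitting on "[", every action chunk ends with "]"
    return chunk.endswith("]")


def process_words(words):
    """Split one speech string into [dialogue, actions]; actions is None if absent."""
    if "[" not in words:
        return [words, None]
    chunks = map(str.strip, words.split("["))
    return [
        " ".join(g).replace("]", ".")
        for _, g in groupby(chunks, is_closing_bracket)
    ]


def parse_transcript(text):
    # Pad "]" with a space so a name directly after a bracket is its own token.
    tokens = text.replace("]", "] ").split()
    parts = [" ".join(g).rstrip(":") for _, g in groupby(tokens, is_name)]
    # parts alternates name, speech, name, speech, ...
    return [
        (name, *process_words(speech))
        for name, speech in zip(parts[::2], parts[1::2])
    ]


result = parse_transcript(text)
for row in result:
    print(row)
```

This sketch inherits the limits of the token-based approach: any dialogue word that happens to end with ":" is treated as a speaker name, and dialogue that continues after a bracketed action is not separated from that action.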
