Organizing equal regex matches on different lines in a dictionary


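I'm trying to extract metadata about regex matches. In particular, I'm stuck on how (best) to get the line of text on which each match occurs. The problem shows up when there are several identical matches.

So I wrote a small script that extracts the desired pattern and loops over the results of re.finditer. However, I'm stuck on how best to "pause" my loop so that it returns the correct line for each match_index. I feel generators might be worth a look, or perhaps I'm overlooking an out-of-the-box approach.

What is the most "pythonic" (and actually working) way to do the following?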

import re

string = """a zero line
we can write pattern_1 here
let's buffer here, just chilling, everything's ok
I think it's time for a second pattern_2
let's a do another pattern_1
ciao
"""

pattern = re.compile(r"\w{7}_\d")
found = re.finditer(pattern, string)

matches_list = []
for match_index, match in enumerate(list(found)):
    for index, line in enumerate(string.splitlines()):
        if match.group() in line:
            match_meta_dict = {
                'match_index': match_index,
                'line': index
            }
            matches_list.append(match_meta_dict)
            break

print(matches_list)

I would like to get a list of dictionaries in which each line corresponds to the respective match, like this:

[{'match_index': 0, 'line': 1}, {'match_index': 1, 'line': 3}, {'match_index': 2, 'line': 4}]

Instead, I (obviously) get:

[{'match_index': 0, 'line': 1}, {'match_index': 1, 'line': 3}, {'match_index': 2, 'line': 1}]

陪伴而非守候


2 Answers

慕娘9325324

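Are you sure an array of dictionaries is the best data structure for storing this? I think an array of integers is enough: match_index always starts at 0 and increases by 1, so you really only need to store the line numbers. The index of a line number in that array is the match index. If you insist on an array of dictionaries, you can easily convert the line-number array into one.

line_numbers = []
for index, line in enumerate(string.splitlines()):
    for match in re.finditer(pattern, line):
        line_numbers.append(index)

Converting to an array of dictionaries:

matches_list = []
for index, line_number in enumerate(line_numbers):
    matches_list.append({"match_index": index, "line": line_number})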
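For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea applied to the sample text from the question. It assumes the sample has no blank lines between its sentences (which is what the expected line numbers imply), and the name `text` is my own stand-in for the question's `string`. The two loops are written as comprehensions, which is only a stylistic choice.

```python
import re

# Sample text from the question, assumed to contain no blank lines.
text = (
    "a zero line\n"
    "we can write pattern_1 here\n"
    "let's buffer here, just chilling, everything's ok\n"
    "I think it's time for a second pattern_2\n"
    "let's a do another pattern_1\n"
    "ciao\n"
)
pattern = re.compile(r"\w{7}_\d")

# One entry per match, in order of appearance.
# The position in this list is the match index.
line_numbers = [
    index
    for index, line in enumerate(text.splitlines())
    for _ in pattern.finditer(line)
]

# Optional conversion to the list of dictionaries asked for in the question.
matches_list = [
    {"match_index": i, "line": n} for i, n in enumerate(line_numbers)
]
print(matches_list)
```

Because the search runs line by line, a second occurrence of pattern_1 is reported on its own line instead of being mapped back to the first line that contains the same text.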


守候你守候我

Just iterate over the lines, and increment a counter variable every time you find a match.
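Since the question mentions generators, the counter idea could be sketched roughly like this. The function name `iter_match_lines` and the `sample` text are hypothetical, made up for illustration, and this is one possible reading of the suggestion, not the only way to do it.

```python
import re
from typing import Iterator


def iter_match_lines(pattern: re.Pattern, text: str) -> Iterator[dict]:
    """Yield one dict per match, numbering matches across the whole text."""
    match_index = 0  # running counter shared by all lines
    for line_number, line in enumerate(text.splitlines()):
        for _ in pattern.finditer(line):
            yield {"match_index": match_index, "line": line_number}
            match_index += 1


# Hypothetical sample: one match on line 0, two matches on line 2.
sample = "x pattern_1\nnothing\npattern_2 and pattern_1\n"
result = list(iter_match_lines(re.compile(r"\w{7}_\d"), sample))
print(result)
```

Note that searching each line separately means a pattern that spans a line break would not be found; for the single-line pattern in the question that should not matter.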

