How do I iterate over a list of strings in Python and join the strings that belong to a tag?

When iterating over a list of elements in Python 3, how can I "isolate" the content that sits between the elements of interest?


I have a list:


list = ["<h1> question 1", "question 1 content", "question 1 more content", "<h1> answer 1", "answer 1 content", "answer 1 more content", "<h1> question 2", "question 2 content", "<h> answer 2", "answer 2 content"]

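In this list, some elements carry an <h> tag and others have no tag. The idea is that the elements with this tag are "headers", and the elements that follow, up to the next tag, are their content.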


How can I join the list elements that belong to each header so that I end up with two lists of equal size:


headers = ["<h1> question 1", "<h1> answer 1", "<h1> question 2", "<h> answer 2"]

content = ["question 1 content question 1 more content", "answer 1 content answer 1 more content", "question 2 content", "answer 2 content"]

Both lists have the same length, in this case 4 elements each.


I was able to separate these parts, but I could use some help finishing:


list = ["<h1> question 1", "question 1 content", "question 1 more content", "<h1> answer 1", "answer 1 content", "answer 1 more content", "<h1> question 2", "question 2 content", "<h> answer 2", "answer 2 content"]

headers = []
content = []

for i in list:
    if "<h1>" in i:
        headers.append(i)

    if "<h1>" not in i:
        tempContent = []
        tempContent.append(i)
        content.append(tempContent)

Any ideas on how to combine these texts so that the two lists correspond one to one?


Thanks!


SMILET
2 answers

catspeake

Assuming that every element after a header is that header's content, and that the first element is always a header, you can use itertools.groupby. The key can be whether the element contains a header tag, so that each header's content is grouped right after it:

from itertools import groupby

lst = ["<h1> question 1", "question 1 content", "question 1 more content", "<h1> answer 1", "answer 1 content", "answer 1 more content", "<h1> question 2", "question 2 content", "<h> answer 2", "answer 2 content"]

headers = []
content = []

for key, values in groupby(lst, key=lambda x: "<h" in x):
    if key:
        # Unpacking assumes exactly one header per group (no two headers in a row).
        headers.append(*values)
    else:
        content.append(" ".join(values))

print(headers)
print(content)

This gives:

['<h1> question 1', '<h1> answer 1', '<h1> question 2', '<h> answer 2']
['question 1 content question 1 more content', 'answer 1 content answer 1 more content', 'question 2 content', 'answer 2 content']

The problem with your current approach is that you always add just one item to content. What you want to do is accumulate a temp_content list until you reach the next header, and only then add it and reset:

headers = []
content = []
temp_content = None

for i in lst:
    if "<h" in i:
        if temp_content is not None:
            content.append(" ".join(temp_content))
        # Start a fresh list for every header, including the first one.
        temp_content = []
        headers.append(i)
    else:
        temp_content.append(i)

# Add the content collected for the last header.
if temp_content is not None:
    content.append(" ".join(temp_content))
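The accumulate-and-reset idea from this answer can also be wrapped in a small function. The following is only a sketch: the names is_header and split_headers_and_content are made up for illustration, and it assumes that a header is any string starting with "<h". Unlike the snippets above, it tolerates two headers in a row (the first one gets an empty content string) and skips any items that appear before the first header.

```python
def is_header(item):
    # Assumption: a header is any string that starts with "<h"
    # (this covers both "<h1>" and "<h>" in the sample data).
    return item.startswith("<h")


def split_headers_and_content(items):
    headers = []
    content = []
    current = None  # content strings collected for the most recent header

    for item in items:
        if is_header(item):
            if current is not None:
                # Flush what was collected for the previous header.
                content.append(" ".join(current))
            headers.append(item)
            current = []
        elif current is not None:
            current.append(item)
        # Items that appear before the first header are skipped.

    if current is not None:
        # Flush the content of the last header.
        content.append(" ".join(current))

    return headers, content


lst = ["<h1> question 1", "question 1 content", "question 1 more content",
       "<h1> answer 1", "answer 1 content", "answer 1 more content",
       "<h1> question 2", "question 2 content", "<h> answer 2", "answer 2 content"]

headers, content = split_headers_and_content(lst)
print(headers)
print(content)
```

Because content is flushed once per header, the two returned lists always have the same length, which is the property the question asks for.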

慕勒3428872

You can collect the headers and their content into a collections.defaultdict while iterating over the list, then split the keys and values into the headers and content lists at the end. A header can be detected by simply checking whether a string starts with "<h", using str.startswith. I also used a continue statement to move straight to the next iteration once a header is found; a plain else branch would work here as well.

from collections import defaultdict

lst = [
    "<h1> question 1",
    "question 1 content",
    "question 1 more content",
    "<h1> answer 1",
    "answer 1 content",
    "answer 1 more content",
    "<h1> question 2",
    "question 2 content",
    "<h> answer 2",
    "answer 2 content",
]

# Maps each header to the list of content strings that follow it.
header_map = defaultdict(list)
header = None

for item in lst:
    if item.startswith("<h"):
        # Remember the current header and move on to the next item.
        header = item
        continue
    header_map[header].append(item)

headers = list(header_map)
print(headers)

content = [" ".join(v) for v in header_map.values()]
print(content)

Output:

['<h1> question 1', '<h1> answer 1', '<h1> question 2', '<h> answer 2']
['question 1 content question 1 more content', 'answer 1 content answer 1 more content', 'question 2 content', 'answer 2 content']
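One thing to keep in mind with a dict keyed by header text: two sections that share the exact same header string end up in a single entry, and a header with no content lines never becomes a key at all. If that can happen in your data, a possible variation (a sketch, not part of the answer above; the sections name and the sample list are made up for illustration) is to keep an ordered list of (header, parts) pairs instead:

```python
# Hypothetical sample data in which the same header text appears twice.
lst = [
    "<h1> note",
    "first",
    "<h1> note",
    "second",
]

sections = []  # (header, list of content strings) pairs, in order of appearance

for item in lst:
    if item.startswith("<h"):
        # Every header starts its own section, even if the text repeats.
        sections.append((item, []))
    elif sections:
        # Attach the item to the most recent header.
        sections[-1][1].append(item)

headers = [header for header, _ in sections]
content = [" ".join(parts) for _, parts in sections]

print(headers)  # ['<h1> note', '<h1> note']
print(content)  # ['first', 'second']
```

Since both lists are derived from the same sections list, they stay aligned one to one.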