How do I iterate over a list of strings in Python and join the strings that belong to a tag?


When iterating over a list of elements in Python 3, how can I "isolate" the content that sits between the elements of interest?

I have a list:

list = ["<h1> question 1", "question 1 content", "question 1 more content", "<h1> answer 1", "answer 1 content", "answer 1 more content", "<h1> question 2", "question 2 content", "<h> answer 2", "answer 2 content"]

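In this list, some elements carry an <h...> tag and the others have no tag. The idea is that a tagged element is a "header", and the elements that follow it, up to the next tagged element, are its content.

How can I join the list elements that belong to each header so that I end up with two lists of equal size: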

headers = ["<h1> question 1", "<h1> answer 1", "<h1> question 2", "<h> answer 2"]

content = ["question 1 content question 1 more content", "answer 1 content answer 1 more content", "question 2 content", "answer 2 content"]

Both lists have the same length; in this case, 4 elements each.

I managed to separate the parts, but I could use some help finishing:

    list = ["<h1> question 1", "question 1 content", "question 1 more content", "<h1> answer 1", "answer 1 content", "answer 1 more content", "<h1> question 2", "question 2 content", "<h> answer 2", "answer 2 content"]

    headers = []
    content = []

    for i in list:
        if "<h1>" in i:
            headers.append(i)
        if "<h1>" not in i:
            tempContent = []
            tempContent.append(i)
            content.append(tempContent)

Any ideas on how to combine these strings so that the two lists correspond one to one?

Thanks!

SMILET

2 Answers

catspeake

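Assuming that every element after a header is that header's content, and that the first element is always a header, you can use itertools.groupby. The key can be whether the element contains a header tag, so each header's content is grouped right after it:

    from itertools import groupby

    lst = ["<h1> question 1", "question 1 content", "question 1 more content", "<h1> answer 1", "answer 1 content", "answer 1 more content", "<h1> question 2", "question 2 content", "<h> answer 2", "answer 2 content"]

    headers = []
    content = []

    for key, values in groupby(lst, key=lambda x: "<h" in x):
        if key:
            # Works only while each group holds exactly one header,
            # i.e. no two headers are adjacent.
            headers.append(*values)
        else:
            content.append(" ".join(values))

    print(headers)
    print(content)

This gives:

    ['<h1> question 1', '<h1> answer 1', '<h1> question 2', '<h> answer 2']
    ['question 1 content question 1 more content', 'answer 1 content answer 1 more content', 'question 2 content', 'answer 2 content']

The problem with your current approach is that you only ever add a single item to content at a time. What you want is to accumulate a temp_content list until you reach the next header, and only then append it and reset:

    headers = []
    content = []
    temp_content = None

    for i in lst:
        if "<h" in i:
            # Flush the content collected for the previous header, if any.
            if temp_content is not None:
                content.append(" ".join(temp_content))
            # Start a fresh accumulator for the new header.
            temp_content = []
            headers.append(i)
        else:
            temp_content.append(i)

    # Flush the content of the last header after the loop ends.
    if temp_content is not None:
        content.append(" ".join(temp_content))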
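The two assumptions in this answer (the list starts with a header, and no two headers are adjacent) can be relaxed with a small variation. The following is only a sketch: the function name `split_sections` and the `is_header` parameter are hypothetical, and skipping any content that appears before the first header is an assumption made here, not something the question specifies.

```python
def split_sections(items, is_header=lambda s: s.startswith("<h")):
    """Return (headers, content) as two lists of equal length."""
    headers = []
    parts = []  # parts[k] collects the content strings for headers[k]
    for item in items:
        if is_header(item):
            headers.append(item)
            parts.append([])  # a header with no content yields ""
        elif parts:
            parts[-1].append(item)
        # Assumption: content seen before the first header is skipped.
    content = [" ".join(p) for p in parts]
    return headers, content


lst = ["<h1> question 1", "question 1 content", "question 1 more content",
       "<h1> answer 1", "answer 1 content", "answer 1 more content",
       "<h1> question 2", "question 2 content", "<h> answer 2", "answer 2 content"]

headers, content = split_sections(lst)
for h, c in zip(headers, content):
    print(h, "->", c)
```

Because every header gets its own slot in `parts` at the moment it is seen, the two lists stay the same length even when a header has no content.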


慕勒3428872

You can collect the headers and their content into a collections.defaultdict while iterating over the list, then split the keys and values into the headers and content lists at the end. A header can be detected simply by checking whether a string starts with "<h", using str.startswith. I also use a continue statement to move straight on to the next iteration once a header is found; a plain else clause would work here as well.

    from collections import defaultdict

    lst = [
        "<h1> question 1",
        "question 1 content",
        "question 1 more content",
        "<h1> answer 1",
        "answer 1 content",
        "answer 1 more content",
        "<h1> question 2",
        "question 2 content",
        "<h> answer 2",
        "answer 2 content",
    ]

    header_map = defaultdict(list)
    header = None

    for item in lst:
        if item.startswith("<h"):
            # Remember the current header and move on to its content.
            header = item
            continue
        header_map[header].append(item)

    headers = list(header_map)
    print(headers)

    content = [" ".join(v) for v in header_map.values()]
    print(content)

Output:

    ['<h1> question 1', '<h1> answer 1', '<h1> question 2', '<h> answer 2']
    ['question 1 content question 1 more content', 'answer 1 content answer 1 more content', 'question 2 content', 'answer 2 content']
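One property of the dictionary approach is worth knowing: keys are unique, so a header string that occurs twice has its content merged under one key, and a header followed by no content never becomes a key at all. The sketch below illustrates this on made-up data and registers each header as soon as it is seen; the function name `group_by_header`, the `tag_prefix` parameter, and the sample list are hypothetical.

```python
from collections import defaultdict


def group_by_header(items, tag_prefix="<h"):
    """Map each header string to the list of content strings under it."""
    header_map = defaultdict(list)
    header = None
    for item in items:
        if item.startswith(tag_prefix):
            header = item
            # Looking the key up creates an empty list for it, so a
            # header with no content still appears in the result.
            header_map[header]
            continue
        header_map[header].append(item)
    return header_map


# Made-up sample: "<h1> a" occurs twice and "<h1> b" has no content.
sample = ["<h1> a", "x", "<h1> b", "<h1> a", "y"]
grouped = group_by_header(sample)

headers = list(grouped)
content = [" ".join(v) for v in grouped.values()]
print(headers)  # ['<h1> a', '<h1> b']
print(content)  # ['x y', '']
```

If repeated headers must stay separate entries, a list-based accumulation may be a better fit than a dictionary.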
