Split an HTML string into a list

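Use re.findall for this. It returns all non-overlapping matches of the pattern in the string as a list of strings. The string is scanned left to right, and matches are returned in the order found. If the pattern contains one or more groups, it returns a list of groups instead; with more than one group, this is a list of tuples. Empty matches are included in the result.

In [1]: a='<.tag> xxxxx<./tag> <.tag>'

In [2]: import re

In [4]: re.findall(r'<[^>]+>|\w+',a)
Out[4]: ['<.tag>', 'xxxxx', '<./tag>', '<.tag>']

The second pattern matches any run of text between tags, including surrounding whitespace:

In [5]: re.findall(r'<[^>]+>|[^<]+',a)
Out[5]: ['<.tag>', ' xxxxx', '<./tag>', ' ', '<.tag>']

To clean that up, drop the whitespace-only items and strip the rest:

In [17]: [i.strip() for i in re.findall(r'<[^>]+>|[^<]+',a) if not i.isspace()]
Out[17]: ['<.tag>', 'xxxxx', '<./tag>', '<.tag>']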
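A minimal sketch wrapping the `[^<]+` approach into a reusable helper. The names `TOKEN_RE` and `split_html` are hypothetical, introduced here only for illustration. It assumes tags never contain a literal `>` (for example inside attribute values) and that text contains no stray `<`; it is a simple regex tokenizer, not an HTML parser.

```python
import re
from typing import List

# Hypothetical helper names. Alternation: a whole tag "<...>" OR a run of
# text up to the next "<". The pattern has no capturing groups, so
# re.findall returns the whole matches.
TOKEN_RE = re.compile(r"<[^>]+>|[^<]+")


def split_html(s: str) -> List[str]:
    """Split s into tag tokens and stripped, non-blank text tokens."""
    return [tok.strip() for tok in TOKEN_RE.findall(s) if not tok.isspace()]


if __name__ == "__main__":
    a = "<.tag> xxxxx<./tag> <.tag>"
    print(split_html(a))
    # ['<.tag>', 'xxxxx', '<./tag>', '<.tag>']

    # Unlike \w+, [^<]+ keeps multi-word text in a single token.
    print(split_html("<p>hello world</p>"))
    # ['<p>', 'hello world', '</p>']

    # With one capturing group, findall returns only the group's contents.
    print(re.findall(r"<([^>]+)>", a))
    # ['.tag', './tag', '.tag']
```

Using `[^<]+` rather than `\w+` matters once the text between tags contains spaces or punctuation: `\w+` would split `hello world` into two items. The last call illustrates the group rule from the answer: adding a capturing group changes what `re.findall` returns.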
