Python Beautiful Soup - removed tags still affect output


Hello,

from bs4 import BeautifulSoup

html = 'This <i>is</i> a test.'
soup = BeautifulSoup(html, 'lxml')
print(soup)

# Replace every <i> tag with the plain string 'is'
for tag in soup.find_all('i'):
    tag.replace_with('is')

print(soup)
print("\n")
print(soup.prettify())
print("\n")

# Print each text node with surrounding whitespace stripped
for string in soup.stripped_strings:
    print(string)

The program outputs the following:

<html>
 <body>
  <p>
   This
   is
   a test.
  </p>
 </body>
</html>

This
is
a test.

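Why is that? Why is the string still split into three parts, as if the removed tag were still there? If I use 'This is a test.' (which is what I get after replacing the tag) as my starting html, everything works fine.

What exactly am I doing wrong?

Thanks in advance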

侃侃尔雅

1 Answer

守着星空守着你

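It looks like replace_with() does swap the <i>is</i> tag for the string is, but it does not merge that string with the text nodes around it, so is remains a separate item in the tree. That is why prettify() and stripped_strings still show three parts. You have to convert the tree to a string and parse it again to get the text as a single node in the tree.

html = str(soup)
#print(html)
soup = BeautifulSoup(html, 'lxml')

If you just want the text as one string, you can try get_text(strip=True, separator=" ")

from bs4 import BeautifulSoup

html = 'This <i>is</i> a test.'
soup = BeautifulSoup(html, 'lxml')
print(soup.get_text(strip=True, separator=" "))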
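As a possible alternative to reparsing, more recent Beautiful Soup releases (4.8.0 and later, as far as I know) provide smooth(), which merges adjacent text nodes in place. The following is only a minimal sketch, assuming the markup in the question was 'This <i>is</i> a test.'. The helper name strings_after_replace is made up for illustration, and the built-in 'html.parser' is used instead of 'lxml' purely so the snippet has no extra dependency.

```python
from bs4 import BeautifulSoup


def strings_after_replace(html, smooth=False):
    """Replace each <i> tag with its text and return the stripped strings.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    # Assumption: 'html.parser' stands in for 'lxml' to avoid an extra install.
    soup = BeautifulSoup(html, 'html.parser')

    # find_all() returns a list, so replacing tags while looping is safe.
    for tag in soup.find_all('i'):
        tag.replace_with(tag.get_text())

    if smooth:
        # Merge neighbouring text nodes into a single node.
        soup.smooth()

    return list(soup.stripped_strings)


html = 'This <i>is</i> a test.'
print(strings_after_replace(html))               # ['This', 'is', 'a test.']
print(strings_after_replace(html, smooth=True))  # ['This is a test.']
```

Without smooth() the three text nodes stay separate, which matches the behaviour described in the question. With it, the tree holds one string, so there is no need to serialize and parse again.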
