>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<script>a</script>baba<script>b</script>', 'lxml')
>>> [s.extract() for s in soup('script')]
>>> soup
baba
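
Updated answer for anyone who needs this for future reference: the right tool is decompose(). There are several ways to do this, but decompose() works in place. Usage example:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup('<p>This is a slimy text and <i> I am slimer</i></p>', 'html.parser')
    soup.i.decompose()   # removes the <i> tag and its contents from the tree
    print(str(soup))
    # prints: <p>This is a slimy text and </p>

This is very handy for clearing out clutter such as "script" and "img" tags.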
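
To make the difference between the two methods mentioned in this thread concrete, here is a minimal sketch. The sample HTML and the variable names are made up for illustration, and it assumes bs4 with the built-in html.parser backend. extract() detaches a tag and hands it back to you, while decompose() destroys the tag and returns nothing.

```python
from bs4 import BeautifulSoup

# Illustrative input, not taken from the original question.
html = "<div><script>track()</script><p>Keep me</p><img src='x.png'/></div>"
soup = BeautifulSoup(html, "html.parser")

# extract() removes the tag from the tree and returns it, so it can be reused.
removed = soup.script.extract()

# decompose() removes the tag and destroys it; the return value is None.
result = soup.img.decompose()

remaining = str(soup)
print(remaining)
```

If you only want the tags gone, decompose() is probably the simpler choice; reach for extract() when you still need the removed element afterwards.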
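
As described in the (official documentation), you can use the extract method to remove every subtree that matches a search.

    import BeautifulSoup

    a = BeautifulSoup.BeautifulSoup("<html><body><script>aaa</script></body></html>")
    [x.extract() for x in a.findAll('script')]

Note that this snippet uses the old BeautifulSoup 3 import. With bs4, the equivalent is "from bs4 import BeautifulSoup" and find_all('script').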
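
If you need to do this for more than one kind of tag, the same idea can be wrapped in a small helper. This is only a sketch: strip_tags is a hypothetical name, not part of Beautiful Soup, and it assumes bs4 with html.parser and tags that do not nest inside one another (such as script and style).

```python
from bs4 import BeautifulSoup

def strip_tags(html, names):
    """Return html with every tag whose name is in `names` removed (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all accepts a list of tag names and returns a list-like result,
    # so it is safe to modify the tree while looping over it.
    for tag in soup.find_all(names):
        tag.decompose()
    return str(soup)

cleaned = strip_tags(
    "<html><body><script>aaa</script><p>text</p><style>p{}</style></body></html>",
    ["script", "style"],
)
print(cleaned)
```

A plain for loop is used here instead of a list comprehension because the removal is a side effect and the resulting list would be thrown away anyway.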