Decode HTML entities in a Python string?


I'm parsing some HTML with Beautiful Soup 3, but it contains HTML entities that Beautiful Soup 3 doesn't automatically decode for me:


>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup("<p>&pound;682m</p>")
>>> text = soup.find("p").string
>>> print text
&pound;682m

How can I decode the HTML entities in text to get "£682m" instead of "&pound;682m"?


智慧大石
1257 views, 4 answers
4 Answers

慕村9548890

Beautiful Soup handles entity conversion. In Beautiful Soup 3, you need to pass the convertEntities argument to the BeautifulSoup constructor (see the "Entity Conversion" section of the archived docs). In Beautiful Soup 4, entities are decoded automatically.

Beautiful Soup 3

>>> from BeautifulSoup import BeautifulSoup
>>> BeautifulSoup("<p>&pound;682m</p>",
...               convertEntities=BeautifulSoup.HTML_ENTITIES)
<p>£682m</p>

Beautiful Soup 4

>>> from bs4 import BeautifulSoup
>>> BeautifulSoup("<p>&pound;682m</p>")
<html><body><p>£682m</p></body></html>
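
For anyone on Python 3, here is a minimal sketch of the Beautiful Soup 4 case applied to the snippet from the question. It assumes the bs4 package is installed. Passing "html.parser" explicitly is a choice made for this sketch, not something the answer above does; it keeps the result independent of which parser happens to be installed.

```python
from bs4 import BeautifulSoup

# The explicit parser name is an assumption of this sketch; the answer above omits it.
soup = BeautifulSoup("<p>&pound;682m</p>", "html.parser")

# Beautiful Soup 4 decodes entities while parsing, so .string is already plain text.
text = soup.find("p").string
print(text)
```

With the entity decoded at parse time, no separate unescaping step is needed before using text.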

千万里不及你

You can use replace_entities from the w3lib.html library.

In [202]: from w3lib.html import replace_entities
In [203]: replace_entities("&pound;682m")
Out[203]: u'\xa3682m'
In [204]: print replace_entities("&pound;682m")
£682m
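
If adding a third-party package is not an option, the standard library can probably cover the same case on Python 3.4 and later. This is not one of the approaches from the answers above, just a sketch for comparison; the variable names raw and decoded are illustrative.

```python
import html

raw = "&pound;682m"

# html.unescape converts named and numeric character references to text.
decoded = html.unescape(raw)
print(decoded)

# Numeric references for the same character decode to the same string.
assert html.unescape("&#163;682m") == decoded
assert html.unescape("&#xa3;682m") == decoded
```

This works on a bare string, so it is handy when the text did not come out of a parser in the first place.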