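I'm very familiar with Beautiful Soup in Python, and I've always used it to scrape live websites.
Now I'm scraping a local HTML file (link, if you want to test the code), and the only problem is that accented characters are not displayed correctly (this never happened to me when scraping live websites).
Here is a simplified version of the code: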
import requests, urllib.request, time, unicodedata, csv
from bs4 import BeautifulSoup
soup = BeautifulSoup(open('AH.html'), "html.parser")
tables = soup.find_all('table')
titles = tables[0].find_all('tr')
print(titles[55].text)
prints the following output
2:22 - Il Destino È Già Scritto (2017 ITA/ENG) [1080p] [BLUWORLD]
whereas the correct output should be
2:22 - Il Destino È Già Scritto (2017 ITA/ENG) [1080p] [BLUWORLD]
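A likely cause, assuming AH.html is saved as UTF-8: when `open()` is called without an `encoding` argument, Python decodes the file with the locale's preferred encoding, which on Windows is often cp1252. Each accented letter is two bytes in UTF-8, so decoding it with cp1252 turns it into two unrelated characters. The sketch below reproduces this with a temporary file; the sample title and the choice of cp1252 are assumptions for illustration, not taken from the actual file.

```python
import os
import tempfile

# Hypothetical sample text; assumes the real AH.html is stored as UTF-8.
TITLE = "2:22 - Il Destino È Già Scritto"

# Write the sample to a temporary file as UTF-8.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".html", delete=False) as f:
    f.write(TITLE)
    path = f.name

# Reading with the wrong codec (cp1252, a common Windows default) splits each
# two-byte UTF-8 sequence into two unrelated characters ("mojibake").
with open(path, encoding="cp1252") as f:
    wrong = f.read()

# Reading with the codec the file was written in returns the text intact.
with open(path, encoding="utf-8") as f:
    right = f.read()

os.remove(path)
print(repr(wrong))
print(repr(right))
```

Live pages rarely show this because `requests` and browsers take the encoding from the HTTP headers or the page itself, while a plain `open()` call knows nothing about the file's encoding.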
I searched for a solution, read many questions and answers, and found this answer, which I implemented as follows:
import requests, urllib.request, time, unicodedata, csv
from bs4 import BeautifulSoup
import codecs
response = open('AH.html')
content = response.read()
html = codecs.decode(content, 'utf-8')
soup = BeautifulSoup(html, "html.parser")
However, running it raises the following error:
Traceback (most recent call last):
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\encodings\utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
TypeError: a bytes-like object is required, not 'str'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\user\Desktop\score.py", line 8, in <module>
html = codecs.decode(content, 'utf-8')
TypeError: decoding with 'utf-8' codec failed (TypeError: a bytes-like object is required, not 'str')
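The traceback says `codecs.decode()` received a `str`. A file opened in text mode (the default `'r'`) already returns decoded `str` from `read()`, so there is nothing left to decode; only `bytes` (for example from a file opened with `'rb'`) can be decoded. A minimal, self-contained reproduction of the difference, using a hypothetical sample string:

```python
import codecs

raw = "Già".encode("utf-8")  # bytes, like open(path, "rb").read() returns
text = raw.decode("utf-8")   # str, like open(path).read() returns

# Decoding bytes works: bytes -> str.
decoded = codecs.decode(raw, "utf-8")

# Decoding a str fails: it is already text, so the codec rejects it.
try:
    codecs.decode(text, "utf-8")
    error = None
except TypeError as exc:
    error = exc

print(decoded)
print(error)
```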
I suspect the fix is easy, but how do I do it?
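One way to read the file correctly, as a sketch that assumes the file really is UTF-8: either tell `open()` the encoding, or open it in binary mode so that `read()` returns bytes and the `codecs.decode` call from the question works. The helper names `read_text` and `read_bytes_then_decode` are hypothetical, and the temporary file stands in for AH.html.

```python
import codecs
import os
import tempfile

def read_text(path, encoding="utf-8"):
    """Hypothetical helper: open in text mode with an explicit encoding."""
    with open(path, encoding=encoding) as f:
        return f.read()

def read_bytes_then_decode(path, encoding="utf-8"):
    """Hypothetical helper: keep codecs.decode, but feed it bytes."""
    with open(path, "rb") as f:  # "rb" makes read() return bytes
        content = f.read()
    return codecs.decode(content, encoding)

# Demo with a temporary UTF-8 file standing in for AH.html.
SAMPLE = "<table><tr><td>Il Destino È Già Scritto</td></tr></table>"
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".html", delete=False) as f:
    f.write(SAMPLE)
    path = f.name

via_text = read_text(path)
via_bytes = read_bytes_then_decode(path)
os.remove(path)
print(via_text)
```

Either resulting string can then go to `BeautifulSoup(html, "html.parser")` as before. Alternatively, Beautiful Soup accepts bytes or a binary file object, for example `BeautifulSoup(open('AH.html', 'rb'), "html.parser")`, and tries to detect the encoding itself; its `from_encoding="utf-8"` argument states the encoding explicitly when detection guesses wrong.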