Why does Python insist on using ASCII?


When parsing an HTML file with Requests and Beautiful Soup, the following line raises an exception on some web pages:

if 'var' in str(tag.string):

Here is the context:

import bs4
import requests

response = requests.get(url)
soup = bs4.BeautifulSoup(response.text.encode('utf-8'))
for tag in soup.findAll('script'):
    if 'var' in str(tag.string):  # This is the line throwing the exception
        print(tag.string)

This is the exception:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 15: ordinal not in range(128)
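For readers unfamiliar with this message, here is a minimal sketch (Python 3, standard library only) of what it means. It does not reproduce the original page; the sample bytes and the helper name try_ascii_decode are assumptions made up for illustration. It only shows that decoding UTF-8 bytes with the ascii codec fails at the first byte above 127, and that 0xc3 is the lead byte of many two-byte UTF-8 sequences.

```python
# Hypothetical sample data: 15 ASCII bytes followed by the UTF-8 encoding
# of "Ã" (U+00C3), which is the two bytes 0xC3 0x83.
data = "// comment 1234".encode("ascii") + "Ã".encode("utf-8")


def try_ascii_decode(raw):
    """Decode bytes as ASCII; return the error message if that fails."""
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)


# The ascii codec only accepts bytes 0..127, so it stops at index 15.
message = try_ascii_decode(data)
print(message)

# Decoding with the codec the bytes were actually written in succeeds.
text = data.decode("utf-8")
print(text)
```

The same bytes are harmless once they are decoded with the right codec; the error comes from an ascii decode being applied somewhere, not from the character itself.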

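I have tried the BeautifulSoup line both with and without the encode('utf-8') call, and it makes no difference. I did notice that the pages that raise the exception have an Ã character in a comment in the JavaScript, even though the encoding reported by response.encoding is ISO-8859-1. I realize I could remove the offending characters with unicodedata.normalize, but I would prefer to convert the tag variable to utf-8 and keep the characters. None of the following changes the variable to utf-8: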

tag.encode('utf-8')

tag.decode('ISO-8859-1').encode('utf-8')

tag.decode(response.encoding).encode('utf-8')

What do I have to do to convert it to a usable utf-8 string?
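One possible reading of the symptom, offered as a sketch rather than a confirmed diagnosis: an Ã showing up in the text often means UTF-8 bytes were decoded as ISO-8859-1. Under that assumption, the text can be repaired by reversing the wrong decode and applying the right one. The sample string and the helper name redecode below are hypothetical, and the code is Python 3 with no third-party imports.

```python
def redecode(text, wrong="iso-8859-1", right="utf-8"):
    """Undo a decode done with the wrong codec, then decode correctly.

    Assumes the text really was UTF-8 bytes decoded as `wrong`; if that
    assumption is false, this raises or produces different garbage.
    """
    return text.encode(wrong).decode(right)


# Hypothetical script body, as raw bytes sent by a server in UTF-8.
raw = "var name = 'café';".encode("utf-8")

# Decoding as ISO-8859-1 never fails, but turns "é" into "Ã©".
garbled = raw.decode("iso-8859-1")

# Reverse the wrong decode to recover the intended text.
fixed = redecode(garbled)

# Text stays text for searching; encode to UTF-8 bytes only for output.
if "var" in fixed:
    utf8_bytes = fixed.encode("utf-8")
```

The design choice here is to keep the value as a text string for tests such as the 'var' check and to produce UTF-8 bytes only at the point where bytes are actually needed.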

POPMUISE

199 views, 2 answers

