
Avoiding 'character argument not in range' when decoding in Python 3

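I am trying to decode the content returned by a requests.get() call to a particular URL. The URL that causes the problem is not always the same across runs of the code, but the part of the response content that triggers it contains three backslashes, which raises an error when decoded with unicode-escape.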


As a simplified version of the code, running in Python 3.6.1:


r=b'\xf0\\\xebI'

r.decode('unicode-escape').strip().replace('{','\n')

produces the following error:


OverflowError: character argument not in range(0x110000)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: decoding with 'unicode-escape' codec failed (OverflowError: character argument not in range(0x110000))

I would like to skip the part that produces the error. I am a novice Python programmer, so any help is much appreciated.
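For reference, here is a small sketch that inspects what the bytes literal from the question actually holds. It only looks at the simplified literal as written in Python source; the real response body may differ. The variable names are illustrative.

```python
# The simplified bytes literal from the question.
r = b'\xf0\\\xebI'

# In a bytes literal, \xf0 and \xeb are single bytes and \\ is one backslash.
byte_values = list(r)            # integer value of each byte
hex_view = r.hex()               # the same bytes as a hex string
backslash_count = r.count(b'\\')  # number of backslash bytes (0x5c)

print(byte_values)      # [240, 92, 235, 73]
print(hex_view)         # f05ceb49
print(backslash_count)  # 1
```

So the literal is four bytes long: 0xf0, a single backslash, 0xeb, and the letter I. The backslash sits directly in front of a non-ASCII byte, which is the position where unicode-escape fails in the traceback above.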


婷婷同学_
223 views, 2 answers
2 Answers

慕姐8265434

These steps should work for your case.

In [1]: r=b'\xf0\\\xebI'

#Decode to utf-8 using backslashreplace
In [2]: x=r.decode('utf-8', errors='backslashreplace')

In [3]: x
Out[3]: '\\xf0\\\\xebI'

#Replace the extra backslash
In [4]: y = x.replace('\\\\','\\')

In [5]: y
Out[5]: '\\xf0\\xebI'

#Encode to ascii and decode to unicode-escape
In [6]: z = y.encode('ascii').decode('unicode-escape')

In [7]: z
Out[7]: 'ðëI'

Note that this also works for a double backslash, which is your normal case:

r=b'\xf0\\xebI'
x=r.decode('utf-8', errors='backslashreplace')
y = x.replace('\\\\','\\')
z = y.encode('ascii').decode('unicode-escape')
print(z)
#ðëI
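The three steps in this answer can be wrapped in one helper so they can be applied to each response body. This is a minimal sketch: the function name decode_with_escapes is made up for illustration, and it assumes the body holds no valid non-ASCII UTF-8 text, since such text would make the encode('ascii') step raise UnicodeEncodeError.

```python
def decode_with_escapes(raw: bytes) -> str:
    """Decode bytes, turning bytes that are invalid as UTF-8 into characters."""
    # Step 1: bytes that are not valid UTF-8 become literal \xNN text
    # instead of raising an error.
    text = raw.decode('utf-8', errors='backslashreplace')
    # Step 2: collapse a doubled backslash into a single one.
    text = text.replace('\\\\', '\\')
    # Step 3: assumption: the text is pure ASCII at this point, so it can be
    # encoded and then interpreted by the unicode-escape codec.
    return text.encode('ascii').decode('unicode-escape')


# The two inputs used in the answer.
triple = decode_with_escapes(b'\xf0\\\xebI')
double = decode_with_escapes(b'\xf0\\xebI')
```

Both calls return the same three-character string 'ðëI' shown in the session above.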

炎炎设计

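The data appears to be encoded as latin-1 (*), so the simplest solution is to decode it and then remove the backslashes.

>>> r=b'\xf0\\\xebI'
>>> r.decode('latin-1').replace('\\', '')
'ðëI'

(*) I am guessing latin-1 (also known as ISO-8859-1). The response's Content-Type header should specify the encoding used, and it may be one of the other ISO-8859-* encodings.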
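Rather than guessing, the charset can be read from the Content-Type header, as this answer suggests. The sketch below is one way to do that using only the standard library. The helper names charset_from_content_type and decode_body are hypothetical, and falling back to latin-1 is an assumption carried over from the answer. With requests, the header value would typically come from response.headers.get('Content-Type').

```python
from email.message import Message


def charset_from_content_type(header_value, default='latin-1'):
    """Return the charset named in a Content-Type header, or the default."""
    msg = Message()
    msg['Content-Type'] = header_value
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset() or default


def decode_body(raw: bytes, content_type: str) -> str:
    """Decode with the declared charset, then drop backslashes as in the answer."""
    charset = charset_from_content_type(content_type)
    return raw.decode(charset).replace('\\', '')


# Example header values; a real response may declare a different charset.
declared = decode_body(b'\xf0\\\xebI', 'text/html; charset=ISO-8859-1')
fallback = decode_body(b'\xf0\\\xebI', 'text/html')
```

Note that removing every backslash also strips any backslash that legitimately belongs to the content, so this only suits data where backslashes carry no meaning.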