Requests library crashes on Python 2 and Python 3

I am trying to parse arbitrary web pages with the requests and BeautifulSoup libraries, using the following code:


try:
    response = requests.get(url)
except Exception as error:
    return False

if response.encoding is None:
    soup = bs4.BeautifulSoup(response.text)  # This is the line that fails (line 155 in the traceback below)
else:
    soup = bs4.BeautifulSoup(response.text, from_encoding=response.encoding)

This works fine on most web pages. However, on a small fraction of arbitrary pages (<1%), I get this crash:


Traceback (most recent call last):
  File "/home/dotancohen/code/parser.py", line 155, in has_css
    soup = bs4.BeautifulSoup(response.text)
  File "/usr/lib/python3/dist-packages/requests/models.py", line 809, in text
    content = str(self.content, encoding, errors='replace')
TypeError: str() argument 2 must be str, not None

For reference, this is the relevant method of the requests library:


@property
def text(self):
    """Content of the response, in unicode.

    if Response.encoding is None and chardet module is available, encoding
    will be guessed.
    """

    # Try charset from content-type
    content = None
    encoding = self.encoding

    # Fallback to auto-detected encoding.
    if self.encoding is None:
        if chardet is not None:
            encoding = chardet.detect(self.content)['encoding']

    # Decode unicode from given encoding.
    try:
        content = str(self.content, encoding, errors='replace') # This is line 809
    except LookupError:
        # A LookupError is raised if the encoding was not found which could
        # indicate a misspelling or similar mistake.
        #
        # So we try blindly encoding.
        content = str(self.content, errors='replace')

    return content

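As can be seen, I am not passing in an encoding when this error is thrown. How am I using the library incorrectly, and what can I do to prevent this error? This is on Python 3.2.3, but I get the same result on Python 2.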
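The quoted method only guards against LookupError, so it may help to see which exception each kind of bad encoding value produces. The sketch below is a standalone illustration on Python 3 and is not part of requests; the helper name try_decode and the sample bytes are made up for the example.

```python
# Sample bytes: "café" encoded as UTF-8 (illustrative data only).
raw = b"caf\xc3\xa9"


def try_decode(data, encoding):
    """Decode like the quoted line does and report which exception, if any, occurs.

    Hypothetical helper for illustration; it is not part of requests.
    """
    try:
        return str(data, encoding, errors="replace")
    except LookupError:
        # Raised when the encoding name is a string that no codec matches.
        return "LookupError"
    except TypeError:
        # Raised when the encoding is not a string at all, for example None.
        return "TypeError"


decoded = try_decode(raw, "utf-8")                # decodes normally
unknown_name = try_decode(raw, "no-such-codec")   # unknown codec name
missing_name = try_decode(raw, None)              # no encoding at all
```

An unknown codec name lands in the LookupError branch, while None lands in the TypeError branch, which the `except LookupError` clause in the quoted method does not cover.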


临摹微笑
1 Answer

天涯尽头无女友

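This means that the server did not send an encoding for the content in its headers, and the chardet library could not determine an encoding for the content either. You are, in fact, deliberately testing for a missing encoding; why try to get decoded text when no encoding is available?

You can leave the decoding to the BeautifulSoup parser instead:

if response.encoding is None:
    soup = bs4.BeautifulSoup(response.content)

There is also no need to pass the encoding to BeautifulSoup: if .text does not fail, you are already working with Unicode, and BeautifulSoup will ignore the encoding parameter anyway:

else:
    soup = bs4.BeautifulSoup(response.text)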
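The branching described in this answer can be sketched offline as follows. This is a minimal illustration under stated assumptions, not the definitive fix: FakeResponse is a hypothetical stand-in for requests.Response so the example runs without a network, and choose_markup is a made-up helper name.

```python
class FakeResponse:
    """Hypothetical stand-in for requests.Response, for offline illustration only."""

    def __init__(self, content, encoding):
        self.content = content    # raw bytes, like Response.content
        self.encoding = encoding  # str or None, like Response.encoding

    @property
    def text(self):
        # Mirrors the failing line quoted in the question: with no encoding
        # available, str() raises TypeError on Python 3.
        return str(self.content, self.encoding, errors="replace")


def choose_markup(response):
    """Return raw bytes when no encoding is known, otherwise the decoded text."""
    if response.encoding is None:
        # Hand the bytes to the HTML parser and let it detect the encoding.
        return response.content
    return response.text


no_charset = FakeResponse(b"<p>caf\xc3\xa9</p>", None)
with_charset = FakeResponse(b"<p>caf\xc3\xa9</p>", "utf-8")

markup_bytes = choose_markup(no_charset)    # bytes; .text is never accessed
markup_text = choose_markup(with_charset)   # str, decoded as UTF-8
```

With the real libraries, the result would presumably be passed straight to the parser, for example `soup = bs4.BeautifulSoup(choose_markup(response), "html.parser")`. The explicit parser name is an addition here and does not come from the answer.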