UnicodeEncodeError: 'ascii' codec can't encode character '\xe9'

I'm writing a script that goes through a list of links and parses the information from each one.


It works for most sites, but it chokes on some of them with: "UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 13: ordinal not in range(128)"


It stops in client.py, which is part of urllib in Python 3.


The exact link is: http://finance.yahoo.com/news/cafés-growing-faster-than-fast-food-peers-144512056.html
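For reference, position 13 in the error message lines up with the é in the request line built from this link. The following is a minimal sketch, assuming the HTTP client encodes a request line of the form `GET <path> HTTP/1.1` as ASCII before sending it. The `path` and `request_line` variables are illustrative and are not taken from the library.

```python
# Illustrative request line; the real one is built inside the HTTP client.
path = "/news/cafés-growing-faster-than-fast-food-peers-144512056.html"
request_line = "GET %s HTTP/1.1" % path

try:
    request_line.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc

# The failing index and character can be compared with the original message.
print(error.start, repr(request_line[error.start]))
```

If that assumption holds, the problem is the unencoded é in the URL path rather than anything in the decoding of the response body.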


There are quite a few similar posts here, but none of the answers seems to work for me.


My code is:


from urllib import request
from urllib.error import HTTPError, URLError
import socket


def __request(link, debug=0):
    try:
        # Long timeout because many requests were timing out.
        html = request.urlopen(link, timeout=35).read()
        unicode_html = html.decode('utf-8', 'ignore')
    # NOTE: except HTTPError must come first, otherwise except URLError
    # will also catch an HTTPError.
    except HTTPError as e:
        if debug:
            print("The server couldn't fulfill the request for " + link)
            print('Error code: ', e.code)
        return ''
    except URLError as e:
        if isinstance(e.reason, socket.timeout):
            print('timeout')
        # Return an empty string for every URLError, not only timeouts,
        # so the caller never receives None.
        return ''
    else:
        return unicode_html

This is how the request function gets called:

link = 'http://finance.yahoo.com/news/cafés-growing-faster-than-fast-food-peers-144512056.html'
page = __request(link)


The traceback is:


Traceback (most recent call last):
  File "<string>", line 250, in run_nodebug
  File "C:\reader\get_news.py", line 276, in <module>
    main()
  File "C:\reader\get_news.py", line 255, in main
    body = get_article_body(item['link'],debug=0)
  File "C:\reader\get_news.py", line 155, in get_article_body
    page = __request('na',url)
  File "C:\reader\get_news.py", line 50, in __request
    html = request.urlopen(link, timeout=35).read()
  File "C:\Python33\Lib\urllib\request.py", line 156, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python33\Lib\urllib\request.py", line 469, in open
    response = self._open(req, data)
  File "C:\Python33\Lib\urllib\request.py", line 487, in _open

Any help is appreciated; this is driving me crazy. I think I've tried every combination of x.decode and similar calls.



白衣染霜花
3 answers

MM们

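I wasn't sure whether the problem could also show up in other parts of the URL, so I split the link apart, percent-encoded the path and fragment, and rebuilt it:

from urllib import parse

url_tuple = parse.urlsplit(link)
# Indexes: 0 = scheme, 1 = netloc, 2 = path, 3 = query, 4 = fragment.
# urlunsplit is used instead of a "%s://%s%s?%s%s" format string because it
# only adds "?" and "#" when the query or fragment is non-empty.
encode_link = parse.urlunsplit((url_tuple[0], url_tuple[1], parse.quote(url_tuple[2]), url_tuple[3], parse.quote(url_tuple[4])))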
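The split-and-rebuild idea can also be wrapped in a small helper. This is a minimal sketch, not a definitive fix: `encode_url` is a hypothetical name, the host is assumed to be ASCII already and is left untouched, and existing `%` escapes are treated as safe so that an already-encoded link is not encoded twice.

```python
from urllib import parse


def encode_url(link):
    """Percent-encode non-ASCII characters in the path, query, and fragment.

    Hypothetical helper: the scheme and host are passed through unchanged.
    """
    parts = parse.urlsplit(link)
    return parse.urlunsplit((
        parts.scheme,
        parts.netloc,
        # Keep "/" and existing "%" escapes as they are.
        parse.quote(parts.path, safe="/%"),
        # Keep the usual query delimiters as they are.
        parse.quote(parts.query, safe="=&%+"),
        parse.quote(parts.fragment, safe="%"),
    ))


link = "http://finance.yahoo.com/news/cafés-growing-faster-than-fast-food-peers-144512056.html"
encoded = encode_url(link)
print(encoded)
```

The encoded link contains only ASCII characters, so it can be passed to `request.urlopen` without hitting the ASCII encoding step that fails on the é.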