u'\ufeff' in a Python string

I'm getting an error that follows this pattern:

UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 155: ordinal not in range(128)

I don't know what u'\ufeff' is; it shows up when I'm web scraping. How can I fix this? The string's .replace() method doesn't work on it.


撒科打诨
Views 1289 · Answers 3
3 Answers

富国沪深

The Unicode character U+FEFF is the byte order mark, or BOM, and is used to tell big-endian and little-endian UTF-16 encoding apart. If you decode the web page using the right codec, Python will remove it for you. Example:

    #!python2
    #coding: utf8
    u = u'ABC'
    e8 = u.encode('utf-8')        # encode without BOM
    e8s = u.encode('utf-8-sig')   # encode with BOM
    e16 = u.encode('utf-16')      # encode with BOM
    e16le = u.encode('utf-16le')  # encode without BOM
    e16be = u.encode('utf-16be')  # encode without BOM
    print 'utf-8     %r' % e8
    print 'utf-8-sig %r' % e8s
    print 'utf-16    %r' % e16
    print 'utf-16le  %r' % e16le
    print 'utf-16be  %r' % e16be
    print
    print 'utf-8  w/ BOM decoded with utf-8     %r' % e8s.decode('utf-8')
    print 'utf-8  w/ BOM decoded with utf-8-sig %r' % e8s.decode('utf-8-sig')
    print 'utf-16 w/ BOM decoded with utf-16    %r' % e16.decode('utf-16')
    print 'utf-16 w/ BOM decoded with utf-16le  %r' % e16.decode('utf-16le')

Note that EF BB BF is the UTF-8-encoded BOM. UTF-8 does not require it; it serves only as a signature (usually on Windows).

Output:

    utf-8     'ABC'
    utf-8-sig '\xef\xbb\xbfABC'
    utf-16    '\xff\xfeA\x00B\x00C\x00'    # Adds BOM and encodes using native processor endian-ness.
    utf-16le  'A\x00B\x00C\x00'
    utf-16be  '\x00A\x00B\x00C'

    utf-8  w/ BOM decoded with utf-8     u'\ufeffABC'    # doesn't remove BOM if present.
    utf-8  w/ BOM decoded with utf-8-sig u'ABC'          # removes BOM if present.
    utf-16 w/ BOM decoded with utf-16    u'ABC'          # *requires* BOM to be present.
    utf-16 w/ BOM decoded with utf-16le  u'\ufeffABC'    # doesn't remove BOM if present.

Note that the utf-16 codec relies on the BOM being present; without it, Python has no way to know whether the data is big-endian or little-endian and has to assume a byte order.
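For the scraping case in the question, the same idea can be sketched as follows. This is only an illustration under assumptions: it is written for Python 3 (the answer above targets Python 2), `raw` is a stand-in for the bytes a scraper might receive, and `decode_page` is a hypothetical helper name, not part of any library.

```python
import codecs

# Stand-in for bytes fetched from a web page that starts with a UTF-8 BOM.
raw = codecs.BOM_UTF8 + "ABC".encode("utf-8")  # b'\xef\xbb\xbfABC'


def decode_page(data: bytes) -> str:
    """Hypothetical helper: decode UTF-8 bytes, dropping a leading BOM if present."""
    return data.decode("utf-8-sig")


with_plain_utf8 = raw.decode("utf-8")  # plain utf-8 keeps the BOM as '\ufeff'
with_sig = decode_page(raw)            # utf-8-sig strips the BOM
no_bom = decode_page(b"ABC")           # utf-8-sig also accepts input without a BOM
```

Because `utf-8-sig` accepts input with or without the signature, it is probably the safer default when you do not know whether the server adds a BOM.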

缥缈止盈

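That character is the BOM, or "byte order mark". It usually arrives as the first few bytes of a file and tells you how to interpret the encoding of the rest of the data. You can simply remove the character and carry on. That said, since the error says you were trying to convert to 'ascii', you should probably pick a different encoding for whatever you are trying to do.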
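Both suggestions can be sketched roughly like this, assuming Python 3 and a string that has already been decoded with the BOM still attached. `strip_bom` and `scraped` are hypothetical names used only for illustration.

```python
BOM = "\ufeff"


def strip_bom(text: str) -> str:
    """Hypothetical helper: drop a single leading U+FEFF if the text starts with one."""
    return text[1:] if text.startswith(BOM) else text


scraped = "\ufeffABC"         # stand-in for text returned by a scraper
cleaned = strip_bom(scraped)  # BOM removed, rest of the text untouched

# Encoding to ASCII fails while the BOM is still there ...
try:
    scraped.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# ... whereas UTF-8 can represent every Unicode character, including U+FEFF.
as_utf8 = scraped.encode("utf-8")
```

Removing the BOM only cures this one character; if the page contains any other non-ASCII text, encoding to 'ascii' will still fail, which is why choosing an encoding such as UTF-8 is the more general fix.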