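I am trying to read multiple text files located in different folders into a single file, and to correct some odd formatting problems along the way, especially special characters picked up while reading.
My input files look like this: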
cat date col1 col2 col3
x 3/1/2010 " 823,312,356 "" 145,019,711 "" "" 666,666 "" "
x 3/8/2010 " 3,423,115,838 "" 111,422,457 "" "" 311,512 "" "
x 3/15/2010 " 4,117,664,854 ""115,115,141 "" "" 213,550 """
x 3/22/2010 527,337,127 " "" 153,423,891 "" "" 216,365 "" "
x 3/29/2010 "459,227,151" " "" 57,213,333 "" 454,718
x 4/6/2010 "367,221,146" " "" 72,458,231 """ "264,130"
x 4/13/2010 - - $0
There are a lot of odd formatting problems I need to fix.
This is what I am trying:
import glob
import re

read_files = glob.glob(data_path + "*.txt")
with open(data_path + "final.txt", "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            infile = re.sub(r"[-()\"#@;:<>{}`+=~|.!?,]", "", infile)
            outfile.write(infile.read())
But I get an error message that reads:
TypeError: expected string or bytes-like object
Has anyone run into the same problem?
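The error most likely comes from the `re.sub` line: it is handed the file object `infile` rather than the file's contents, and `re.sub` only accepts `str` or bytes-like input. Note also that a file opened with `"rb"` yields `bytes`, which a `str` pattern cannot be applied to. Below is a minimal sketch of one way around both points, reading each file in text mode and substituting on the resulting string. The function names `clean_text` and `merge_cleaned`, the UTF-8 encoding, and the use of `os.path.join` are assumptions made for illustration; the character class is the one from the question.

```python
import glob
import os
import re

# Character class copied from the question; it also strips "-", "." and ",".
PATTERN = re.compile(r"[-()\"#@;:<>{}`+=~|.!?,]")


def clean_text(text):
    """Remove every character matched by PATTERN from a string."""
    return PATTERN.sub("", text)


def merge_cleaned(data_path, out_name="final.txt"):
    """Concatenate the cleaned contents of all *.txt files in data_path.

    Hypothetical helper: assumes the files are UTF-8 text.
    """
    out_path = os.path.join(data_path, out_name)
    # Collect inputs first and skip the output file left by an earlier run.
    read_files = [
        f for f in sorted(glob.glob(os.path.join(data_path, "*.txt")))
        if os.path.abspath(f) != os.path.abspath(out_path)
    ]
    with open(out_path, "w", encoding="utf-8") as outfile:
        for f in read_files:
            with open(f, "r", encoding="utf-8") as infile:
                text = infile.read()  # read the contents first
            outfile.write(clean_text(text))  # then substitute on the string
    return out_path
```

Since the files sit in different folders, a pattern such as `os.path.join(data_path, "**", "*.txt")` passed to `glob.glob` with `recursive=True` may be closer to what is needed. If the files must stay in binary mode, the pattern would have to be a bytes pattern (`rb"..."`) applied to `infile.read()`.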
弑天下