Loading multiple text files from different folders into one file while handling special characters


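I am trying to read multiple text files located in different folders into a single file, and to adjust for some odd formatting problems along the way, especially special characters that show up when reading.

My input files look like this: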

cat date col1 col2 col3

x 3/1/2010 " 823,312,356 "" 145,019,711 "" "" 666,666 "" "

x 3/8/2010 " 3,423,115,838 "" 111,422,457 "" "" 311,512 "" "

x 3/15/2010 " 4,117,664,854 ""115,115,141 "" "" 213,550 """

x 3/22/2010 527,337,127 " "" 153,423,891 "" "" 216,365 "" "

x 3/29/2010 "459,227,151" " "" 57,213,333 "" 454,718

x 4/6/2010 "367,221,146" " "" 72,458,231 """ "264,130"

x 4/13/2010 - - $0

There are a lot of odd formatting problems I need to fix.

This is what I am trying:

import glob
import re

read_files = glob.glob(data_path + "*.txt")

with open(data_path + "final.txt", "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            infile = re.sub(r"[-()\"#@;:<>{}`+=~|.!?,]", "", infile)
            outfile.write(infile.read())

But I get an error message that reads:

TypeError: expected string or bytes-like object

Has anyone run into the same problem?

潇湘沐

1 Answer

弑天下

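with open(f, "rb") as infile:
    infile = re.sub(r"[-()\"#@;:<>{}`+=~|.!?,]", "", infile)
    outfile.write(infile.read())

First, opening a file with 'b' makes its contents a bytes object rather than a string (type str). That does not work with a regular expression pattern given as a string, so you should drop the 'b'. Since the remaining 'r' is the default mode, you can omit the second argument to open() entirely.

Next, you are confusing the file object with the file's contents, and the operations are in the wrong order. re.sub is being handed the file object itself, which is neither a string nor bytes, and that is what raises the TypeError. infile.read() reads the file's contents and returns them as a string (when 'b' is omitted). That string can then be passed to re.sub. So the correct order is:

with open(f) as infile:
    text = infile.read()
    replaced_text = re.sub(r"[-()\"#@;:<>{}`+=~|.!?,]", "", text)
    outfile.write(replaced_text)

For the same reason, the output file should be opened with "w" rather than "wb", because replaced_text is a str and a file opened in binary mode only accepts bytes.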
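Putting the points above together, a minimal sketch of the whole merge might look like the following. The function name merge_text_files and its parameters out_name and encoding are hypothetical, introduced only for illustration. The sketch also assumes the input files are UTF-8 and that "different folders" means subfolders of data_path, which it searches recursively; neither is stated in the question, so adjust both to the real layout.

```python
import glob
import os
import re

# Same character class as in the question, compiled once for reuse.
SPECIAL_CHARS = re.compile(r"[-()\"#@;:<>{}`+=~|.!?,]")


def merge_text_files(data_path, out_name="final.txt", encoding="utf-8"):
    """Concatenate every .txt file under data_path into data_path/out_name,
    removing the characters matched by SPECIAL_CHARS.

    Hypothetical helper: the recursive search and the UTF-8 default are
    assumptions, not something the question specifies.
    """
    out_path = os.path.join(data_path, out_name)
    # "**" with recursive=True matches data_path itself and all subfolders.
    pattern = os.path.join(data_path, "**", "*.txt")
    sources = sorted(
        p for p in glob.glob(pattern, recursive=True)
        # Skip an output file left over from a previous run.
        if os.path.abspath(p) != os.path.abspath(out_path)
    )
    # Text mode on both sides, so read() returns str and write() accepts str.
    with open(out_path, "w", encoding=encoding) as outfile:
        for path in sources:
            with open(path, encoding=encoding) as infile:
                outfile.write(SPECIAL_CHARS.sub("", infile.read()))
    return sources
```

Calling merge_text_files(data_path) writes final.txt and returns the list of files that were merged. Note that this character class also strips the thousands separators and the "-" placeholders in the sample data, while leaving "/" and "$" in place, so the dates survive but the columns may still need further cleanup.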
