Summarize the contents of a text file

I have a text file like this example:

chrX 7970000 8670000 3 2 7 7 RPS6KA6 4
chrX 7970000 8670000 3 2 7 7 SATL1 3
chrX 7970000 8670000 3 2 7 7 SH3BGRL 4
chrX 7970000 8670000 3 2 7 7 VCX2 1
chrX 86580000 86980000 1 1 1 5 KLHL4 2
chrX 87370000 88620000 4 4 11 11 CPXCR1 2
chrX 87370000 88620000 4 4 11 11 FAM9A 2
chrX 89050000 91020000 11 6 10 13 FAM9B 3
chrX 89050000 91020000 11 6 10 13 PABPC5 2

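I want to count how many times each line is repeated, considering only the 1st, 2nd and 3rd columns. The output will have 5 columns. The first 3 columns will be the same as in the input (each combination appearing only once), but the 4th column will hold several strings in the same column of the same line (these strings come from the 8th column of the original file). The 5th column is the number of times the first 3 columns are repeated in the original file.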

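In short: columns 4, 5, 6, 7 and 9 of the input file are not needed for the output file. We should count the lines in which the first 3 columns are the same, so the first 3 columns of the output file will be the same as in the input file (but appear only once). The 5th column is the number of times the line is repeated. The 4th column of the output contains all the strings from the 8th column of the repeated lines. For example, the line chrX 7970000 8670000 is repeated 4 times, so in the expected output its 5th column is 4 and its 4th column is RPS6KA6,SATL1,SH3BGRL,VCX2. As you can see, the values in the 4th column are comma-separated.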
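The column numbers in the question are 1-based, while Python list indices start at 0, so columns 1 to 3 become indices 0 to 2 and column 8 becomes index 7. A minimal sketch of that mapping for a single line, assuming the fields are separated by whitespace as in the sample:

```python
# Split one input line on whitespace and pick out the fields the output needs.
line = "chrX 7970000 8670000 3 2 7 7 RPS6KA6 4"
parts = line.split()

key = tuple(parts[0:3])  # columns 1-3 (indices 0, 1, 2) identify the region
name = parts[7]          # column 8 (index 7) holds the gene name

print(key, name)  # ('chrX', '7970000', '8670000') RPS6KA6
```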

Here is the expected output:

chrX 7970000 8670000 RPS6KA6,SATL1,SH3BGRL,VCX2 4
chrX 86580000 86980000 KLHL4 1
chrX 87370000 88620000 CPXCR1,FAM9A 2
chrX 89050000 91020000 FAM9B,PABPC5 2

I tried to do this in Python and wrote the following code:

file = open("myfile.txt", 'rb')
infile = []
for line in file:
    infile.append(line)
count = 0
final = []
for i in range(len(infile)):
    count += 1
    if infile[i-1] == infile[i]:
        final.append(infile[0,1,2,7, count])

This code doesn't return what I want. Do you know how to fix it?
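The attempt above compares each whole line with the previous one, so lines that differ only in columns 4 to 9 never match. Its underlying idea (merging neighbouring rows) can still work, because in the sample all rows sharing the first three columns are adjacent. A minimal sketch of that idea using itertools.groupby, which groups consecutive items with equal keys; the function name summarize_adjacent is hypothetical, and the sketch assumes whitespace-separated text read in text mode rather than 'rb':

```python
from itertools import groupby

def summarize_adjacent(lines):
    """Collapse runs of lines that share columns 1-3 into one output line.

    Assumes rows with the same first three columns are adjacent, as in the
    sample; groupby only merges *consecutive* rows with equal keys.
    """
    rows = [line.split() for line in lines if line.strip()]  # skip blank lines
    result = []
    for key, group in groupby(rows, key=lambda parts: tuple(parts[:3])):
        names = [parts[7] for parts in group]  # column 8 of each grouped row
        result.append("{} {} {}".format(" ".join(key), ",".join(names), len(names)))
    return result

# The sample rows from the question.
SAMPLE = [
    "chrX 7970000 8670000 3 2 7 7 RPS6KA6 4",
    "chrX 7970000 8670000 3 2 7 7 SATL1 3",
    "chrX 7970000 8670000 3 2 7 7 SH3BGRL 4",
    "chrX 7970000 8670000 3 2 7 7 VCX2 1",
    "chrX 86580000 86980000 1 1 1 5 KLHL4 2",
    "chrX 87370000 88620000 4 4 11 11 CPXCR1 2",
    "chrX 87370000 88620000 4 4 11 11 FAM9A 2",
    "chrX 89050000 91020000 11 6 10 13 FAM9B 3",
    "chrX 89050000 91020000 11 6 10 13 PABPC5 2",
]

for out in summarize_adjacent(SAMPLE):
    print(out)
```

With the real file, pass the open file object instead, e.g. `with open("myfile.txt") as f: summarize_adjacent(f)`. If rows with the same region can be scattered through the file, sort them by the key first, or use the dictionary-based approach from the answer.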

jeck猫


3 Answers

喵喔喔

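This should do what you want:

from collections import defaultdict  # 1

with open('file.txt') as f:  # 2
    lines = [line.split() for line in f if line.strip()]
counter = defaultdict(list)  # 3
for line in lines:
    counter[(line[0], line[1], line[2])].append(line[7])  # 4
for key, value in counter.items():  # 5
    print('{} {} {}'.format(' '.join(key), ','.join(value), len(value)))  # 6

Explanation:

1. We use a handy class from the standard library, defaultdict, which gives us a dictionary with default values.
2. Read the whole input file, skip blank lines, and split each line into fields on whitespace (which also drops the trailing newline).
3. Create a dictionary whose default value for any missing key is an empty list.
4. Iterate over the lines and fill the dictionary: columns 1-3 form the key, and each string from column 8 is appended to that key's list (without defaultdict(list) this would be more complicated).
5. Iterate over the key-value pairs of the dictionary.
6. Print the output, joining the data structures into the required format.

Hope this helps 🙂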
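The question asks for an output file, while the answer prints to the screen. A minimal sketch of the same defaultdict approach that writes one line per group to any writable file object; the function name summarize and the file names in the comment are hypothetical, and the sketch assumes Python 3.7+, where dicts keep insertion order so groups come out in the order they first appear:

```python
from collections import defaultdict
import io

def summarize(infile, outfile):
    """Group rows by columns 1-3, collect column 8, write one line per group."""
    counter = defaultdict(list)
    for line in infile:
        parts = line.split()
        if not parts:  # skip blank lines
            continue
        counter[tuple(parts[:3])].append(parts[7])
    # Insertion order is preserved, so groups appear in input order.
    for key, names in counter.items():
        outfile.write("{} {} {}\n".format(" ".join(key), ",".join(names), len(names)))

# Demo with in-memory files; with real files something like:
#   with open("myfile.txt") as src, open("output.txt", "w") as dst:
#       summarize(src, dst)
src = io.StringIO("chrX 7970000 8670000 3 2 7 7 RPS6KA6 4\n"
                  "chrX 7970000 8670000 3 2 7 7 SATL1 3\n"
                  "chrX 86580000 86980000 1 1 1 5 KLHL4 2\n")
dst = io.StringIO()
summarize(src, dst)
print(dst.getvalue(), end="")
```

Taking file objects rather than paths keeps the grouping logic independent of where the data comes from, which also makes it easy to check against small in-memory inputs like the one above.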
