I am trying to find similarities between multiple .txt files. I have put all of these files into a dictionary, with the file name as the key.
Current code:
import pandas as pd
from os import listdir, chdir

path = r'C:\...path'
chdir(path)  # read_csv below uses bare file names, so work from inside the folder

# Map each .txt file name to its contents as a 'split' dict (index, columns, data)
files_dict = {}
for filename in listdir(path):
    if filename.lower().endswith('.txt'):
        files_dict[filename] = pd.read_csv(filename).to_dict('split')

for key, value in files_dict.items():
    print(key + str(value) + '\n')
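Here the key is the file name, and the value holds the header and the data. I want to find out whether any values are duplicated across multiple files so that I can join them in SQL. I don't know how to do this.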
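One possible starting point, as a minimal sketch rather than a definitive answer: compare every column of one file with every column of another and keep the pairs that share at least one value, since those pairs are candidate join keys. The helper name `shared_column_values` and the file `other.txt` with its columns are hypothetical, made up for illustration. The sketch works on DataFrames directly instead of the 'split' dicts, and it compares values as strings.

```python
from itertools import combinations

import pandas as pd


def shared_column_values(frames):
    """Return {(file_a, col_a, file_b, col_b): shared values} for overlapping column pairs."""
    overlaps = {}
    # Look at every unordered pair of files once
    for (name_a, df_a), (name_b, df_b) in combinations(frames.items(), 2):
        for col_a in df_a.columns:
            # Compare as strings and ignore missing values (NaN)
            values_a = set(df_a[col_a].dropna().astype(str))
            for col_b in df_b.columns:
                common = values_a & set(df_b[col_b].dropna().astype(str))
                if common:
                    overlaps[(name_a, col_a, name_b, col_b)] = common
    return overlaps


# Illustrative in-memory data; "other.txt" and its columns are invented for this sketch
frames = {
    "Acc_ Schedule Name.txt": pd.DataFrame({
        "timestamp": ["00000000B42852FA", "000000005880959E"],
        "Name": ["ADM_EIG", "OPZ"],
    }),
    "other.txt": pd.DataFrame({
        "Schedule Name": ["OPZ", "XYZ"],
        "Line No_": [10000, 20000],
    }),
}

for key, common in shared_column_values(frames).items():
    print(key, sorted(common))
```

With real files, `frames` could be built as `{filename: pd.read_csv(filename)}` in the same loop that fills `files_dict`. Comparing every column pair is quadratic in the number of columns, so for many wide files it may be worth restricting the comparison to columns that look like identifiers.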
Edit: sample file:
timestamp,Name,Description,Default Column Layout,Analysis View Name
00000000B42852FA,ADM_EIG,Administratief eigenaar,ADM_EIG,ADM_EIG
000000005880959E,OPZ,Opzeggingen,STANDAARD,
And the output from the code:
Acc_ Schedule Name.txt{'index': [0, 1], 'columns': ['timestamp', 'Name', 'Description', 'Default Column Layout', 'Analysis View Name'], 'data': [['00000000B42852FA', 'ADM_EIG', 'Administratief eigenaar', 'ADM_EIG', 'ADM_EIG'], ['000000005880959E', 'OPZ', 'Opzeggingen', 'STANDAARD', nan]]}
Edit 2: suggested code
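from collections import Counter

for key, value in files_dict.items():
    data = value['data']
    # Flatten the rows and count how often each value occurs in this file
    counter = Counter([item for sublist in data for item in sublist])
    # Prints every distinct value in the file, without filtering by count
    print([item for item, count in counter.items()])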
Output: ['00000000B99BD831', 5050, 'CK102', '0,00000000000000000000', 'Thuiswonend', 0, '00000000B99BD832', ........
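The loop above counts values inside one file at a time and prints all of them, so it does not yet show what is shared between files. A hedged sketch of one way to adapt it: count, for each value, the number of files it appears in, and keep only values seen in two or more. The function name `values_in_multiple_files` and the sample dict (including the file names `a.txt` and `b.txt`) are assumptions for illustration; the input is expected to have the same 'split' layout as `files_dict`.

```python
from collections import Counter


def values_in_multiple_files(files_dict):
    """Map each value to the number of files it occurs in, keeping values seen in 2+ files."""
    file_counts = Counter()
    for split_dict in files_dict.values():
        # Use a set so a value repeated inside one file counts once for that file;
        # `item == item` is False only for NaN, which drops missing values
        distinct = {item for row in split_dict["data"] for item in row if item == item}
        file_counts.update(distinct)
    return {value: n for value, n in file_counts.items() if n > 1}


# Illustrative data in the 'split' layout; file names and the second file's rows are invented
sample_files = {
    "a.txt": {
        "index": [0, 1],
        "columns": ["timestamp", "Name"],
        "data": [["00000000B42852FA", "ADM_EIG"], ["000000005880959E", "OPZ"]],
    },
    "b.txt": {
        "index": [0, 1],
        "columns": ["Schedule Name", "Line No_"],
        "data": [["OPZ", 10000], ["OPZ", 20000]],
    },
}

print(values_in_multiple_files(sample_files))
```

This reports which values recur but not which columns they come from, so it is only a first filter before deciding on join columns.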