Python: loop over the files in a directory, count word frequencies, and write the results to a txt file

I turned your code snippet into a function that takes the path of the folder containing the input files as its argument. The code below picks up every file in that folder and, for each one, writes a cleaned_output file and a test file to a newly created output directory. Each output file has the name of the input file it was generated from appended to its name, so they are easier to tell apart, but you can change that to suit your needs.

    from collections import Counter
    import os

    # Requires the NLTK stop word corpus: nltk.download('stopwords')
    from nltk.corpus import stopwords

    path = 'input/'

    def clean_text(path):
        # Create the output directory if it does not exist yet.
        out_path = 'output'
        os.makedirs(out_path, exist_ok=True)

        # Load the stop words once instead of once per file.
        stop_words = set(stopwords.words('english'))

        for name in os.listdir(path):
            in_file = os.path.join(path, name)
            if not os.path.isfile(in_file):
                continue
            # splitext drops only the extension; str.strip('.txt') would also
            # eat leading/trailing 't', 'x' and '.' characters from the name.
            base = os.path.splitext(name)[0]

            with open(in_file) as f:
                words = [w.lower() for w in f.read().split()]
            kept = [w for w in words if w not in stop_words]

            # Write the cleaned text in one go ('w' so reruns do not append).
            cleaned = os.path.join(out_path, 'cleaned_output_{}.txt'.format(base))
            with open(cleaned, 'w') as f:
                f.write(' '.join(kept))

            # Count the cleaned words and save the ten most common.
            count = Counter(kept)
            report = os.path.join(out_path, 'test_{}.txt'.format(base))
            with open(report, 'w') as f:
                print(count.most_common(10), file=f)

    clean_text(path)

Is this what you were looking for?
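If installing NLTK is not an option, the same loop can be sketched with the standard library only. This is a rough variant, not the code from the question: the names `STOP_WORDS`, `count_words`, `count_folder` and the `freq_` output prefix are made up for illustration, the stop word list is a tiny placeholder, and words are picked out with a simple regex rather than `split()`, so punctuation is dropped.

```python
from collections import Counter
from pathlib import Path
import re

# Hypothetical, deliberately tiny stop word list; swap in a real one as needed.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it"}


def count_words(text, stop_words=STOP_WORDS):
    """Return a Counter of the lowercase words in text, skipping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)


def count_folder(in_dir, out_dir, top_n=10):
    """Write the top_n words of every .txt file in in_dir to out_dir.

    Returns a dict mapping each input file's base name to its Counter.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    results = {}
    for src in sorted(Path(in_dir).glob("*.txt")):
        counts = count_words(src.read_text(encoding="utf-8"))
        results[src.stem] = counts
        # One "word<TAB>count" pair per line, most frequent first.
        lines = ["{}\t{}".format(w, n) for w, n in counts.most_common(top_n)]
        report = out / "freq_{}.txt".format(src.stem)
        report.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return results
```

Calling `count_folder('input', 'output')` would then produce one `freq_<name>.txt` per input file. Returning the counters as well as writing them makes the function easy to check from a script or a test.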
