Converting a list to a string to remove characters

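I have a file that I am trying to build a word frequency list from, but I am running into trouble with lists versus strings. I converted the file to a string to remove the digits from it, but that ends up breaking the tokenization. The expected output is a word count for the file I opened, excluding numbers, but what I get is the following: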


Counter({'<_io.TextIOWrapper': 1, "name='german/test/polarity/negative/neg_word_list.txt'": 1, "mode='r'": 1, "encoding='cp'>": 1})

done

Here is the code:


import re
from collections import Counter

def word_freq(file_tokens):
    global count
    for word in file_tokens:
        count = Counter(file_tokens)
    return count

f = open("german/test/polarity/negative/neg_word_list.txt")

clean = re.sub(r'[0-9]', '', str(f))

file_tokens = clean.split()

print(word_freq(file_tokens))
print("done")
f.close()


守着星空守着你
2 answers

慕村225694

This ended up working, thanks to Rakesh:

import re
from collections import Counter

def word_freq(file_tokens):
    global count
    for word in file_tokens:
        count = Counter(file_tokens)
    return count

f = open("german/test/polarity/negative/neg_word_list.txt")
# f.read() returns the file's text; str(f) only returned the file object's description
clean = re.sub(r'[0-9]', '', f.read())
file_tokens = clean.split()
print(word_freq(file_tokens))
print("done")
f.close()
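To make the difference concrete, here is a minimal sketch that runs the same digit-stripping and counting on both `str(f)` and `f.read()`. The helper name `count_words`, the temporary file, and its sample contents are assumptions made for illustration; they are not part of the original code.

```python
import os
import re
import tempfile
from collections import Counter


def count_words(text):
    """Remove ASCII digits from text, split on whitespace, and count the tokens."""
    cleaned = re.sub(r"[0-9]", "", text)
    return Counter(cleaned.split())


# Hypothetical sample file standing in for neg_word_list.txt.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample_word_list.txt")
    with open(sample_path, "w", encoding="utf-8") as out:
        out.write("schlecht 1 furchtbar 2 schlecht\n")

    with open(sample_path, encoding="utf-8") as f:
        # str(f) is the description of the file object, not its contents.
        from_repr = count_words(str(f))
        # f.read() is the text stored in the file.
        from_text = count_words(f.read())

print(from_repr)  # tokens such as '<_io.TextIOWrapper', as in the question
print(from_text)  # counts of the actual words
```

The first counter contains pieces of the object description, which matches the output shown in the question; the second contains the real word counts.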

蝴蝶刀刀

Reading further, I noticed that you never "read" the file; you only opened it. If you just print the opened file:

f = open("german/test/polarity/negative/neg_word_list.txt")
print(f)

you will notice that it tells you what the object is, an "io.TextIOWrapper". So you need to read it:

f_path = open("german/test/polarity/negative/neg_word_list.txt")
f = f_path.read()
f_path.close()  # don't forget to do this to clear stuff
print(f)
# >>> what's really inside the file

Or another way, without "close()":

# adjust your encoding
with open("german/test/polarity/negative/neg_word_list.txt", encoding="utf-8") as r:
    f = r.read()

Read this way, the content will probably not be a list but plain text, so you can iterate over each line instead:

list_of_lines = []
# adjust your encoding
with open("german/test/polarity/negative/neg_word_list.txt", encoding="utf-8") as r:
    # read each line and append to list
    for line in r:
        list_of_lines.append(line)
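Building on the line-by-line idea, the counting could also be done while iterating, without first collecting the lines in a list. This is only a sketch: the sample lines are made up, and this `word_freq` takes an iterable of lines rather than a token list, which is an assumption that differs from the function in the question.

```python
import re
from collections import Counter


def word_freq(lines):
    """Count whitespace-separated tokens across lines, ignoring ASCII digits."""
    counts = Counter()
    for line in lines:
        # Strip digits from the line, then add its tokens to the running totals.
        counts.update(re.sub(r"[0-9]", "", line).split())
    return counts


# Hypothetical sample lines. An open text file can be passed the same way,
# because iterating over it yields one line at a time.
sample_lines = ["schlecht 12\n", "furchtbar schlecht\n", "3 übel\n"]
print(word_freq(sample_lines))
```

With a real file, the call would sit inside the `with open(...) as r:` block, for example `word_freq(r)`.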
