Finding the number of words in a file in Python

I'm new to Python and am trying an exercise where I open a txt file and read its contents (probably straightforward for most people, but I admit I'm struggling a bit).

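I open my file and use .read() to read it. Then I remove any punctuation from the text. Next I create a for loop. Inside the loop I call .split() and add to a counter with an expression like words = words + len(characters), where words was set to 0 outside the loop and characters is the result of the split at the start of the loop. Long story short, the problem I have now is that instead of adding whole words to my counter, it adds every individual character. What can I do to fix this in my for loop?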

my_document = open("book.txt")
readTheDocument = my_document.read
comma = readTheDocument.replace(",", "")
period = comma.replace(".", "")
stripDocument = period.strip()

numberOfWords = 0

for line in my_document:
    splitDocument = line.split()
    numberOfWords = numberOfWords + len(splitDocument)

print(numberOfWords)

至尊宝的传说

203 views, 2 answers

2 Answers

沧海一幻觉

A more Pythonic approach is to use with:

with open("book.txt") as infile:
    count = len(infile.read().split())

Keep in mind that .split() does not give you true grammatical words. It gives you word-like fragments separated by whitespace. If you want proper words, use the nltk module:

import nltk
with open("book.txt") as infile:
    count = len(nltk.word_tokenize(infile.read()))
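Since the question also strips punctuation before counting, here is a minimal sketch of that step combined with the split-based count. The helper names count_words and count_words_in_file are hypothetical, and the sketch assumes that only ASCII punctuation needs removing and that the file is UTF-8 encoded.

```python
import string


def count_words(text):
    """Count whitespace-separated words after deleting ASCII punctuation."""
    # Translation table that deletes every character in string.punctuation.
    table = str.maketrans("", "", string.punctuation)
    return len(text.translate(table).split())


def count_words_in_file(path):
    """Read the whole file at `path` and count its words (assumes UTF-8)."""
    with open(path, encoding="utf-8") as infile:
        return count_words(infile.read())
```

Calling count_words_in_file("book.txt") would give the count for the file in the question. Note that deleting punctuation merges contractions, so "don't" is counted as the single word "dont".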


慕标5832272

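Just open the file and split its contents to get the word count.

file = open("path/to/file/name.txt", "r")  # read-only mode is enough for counting
count = 0
for word in file.read().split():
    count = count + 1
file.close()
print(count)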
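As for why the counter in the question grew per character: one likely cause, which is an assumption because it depends on which object the loop actually iterated over, is looping directly over a string. The sketch below contrasts the two cases and adds a hypothetical helper, count_words_by_line, that counts line by line without reading the whole file at once.

```python
text = "the quick brown fox"

# Iterating over a string yields one character at a time (spaces included).
chars = [piece for piece in text]

# Iterating over text.split() yields whole words.
words = [piece for piece in text.split()]


def count_words_by_line(lines):
    """Sum the word counts of an iterable of lines, such as an open file."""
    return sum(len(line.split()) for line in lines)
```

An open file object can be passed in directly, for example inside a with open("book.txt") as infile: block, because iterating over a file yields its lines.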

