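Is there a way to analyze a text file against these criteria?

I need to create a program that analyzes a passage of text in a file and counts:

  • the number of words

  • the average word length

  • how many times each word occurs

  • how many words start with each letter of the alphabet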
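As a rough sketch (not the asker's or the answerer's code), all four counts can be computed from a string using plain dictionaries. The function name `text_stats` is hypothetical. The sketch makes three assumptions: words are whitespace-separated tokens, so punctuation stays attached; word counts are case-sensitive; and the first-letter count lowercases the letter and skips words that start with a non-letter.

```python
def text_stats(text):
    """Return (word count, average word length, per-word counts, per-letter counts).

    Hypothetical helper: words are whitespace-separated tokens, so punctuation
    stays attached (e.g. "end." is one word), and word counts are case-sensitive.
    """
    words = text.split()
    total = len(words)
    # Guard against an empty text to avoid dividing by zero.
    avg = sum(len(w) for w in words) / total if total else 0.0

    # How many times each word occurs.
    word_counts = {}
    for w in words:
        word_counts[w] = word_counts.get(w, 0) + 1

    # How many words start with each letter (case-insensitive).
    letter_counts = {}
    for w in words:
        first = w[0].lower()
        if first.isalpha():  # skip words starting with digits or punctuation
            letter_counts[first] = letter_counts.get(first, 0) + 1

    return total, avg, word_counts, letter_counts


total, avg, word_counts, letter_counts = text_stats("the cat saw the dog")
print(total)          # 5
print(avg)            # 3.0
print(word_counts)    # {'the': 2, 'cat': 1, 'saw': 1, 'dog': 1}
print(letter_counts)  # {'t': 2, 'c': 1, 's': 1, 'd': 1}
```

To get the per-file numbers, pass in the text returned by reading the file. `collections.Counter` can replace the manual `dict.get` counting.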

So far I have managed the first two points (shown below):

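    file_name = input('Please enter the full name of the file: ')
    with open(file_name, 'r') as f:
        # length of every whitespace-separated word; split() with no argument
        # ignores repeated spaces and blank lines
        w = [len(word) for line in f for word in line.split()]

    total_w = len(w)
    avg_w = sum(w) / total_w if total_w else 0  # avoid dividing by zero on an empty file

    print('The total number of words in this file is:', total_w)
    print('The average length of the words in this file is:', avg_w)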
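For counting words, `str.split()` with no argument is usually better than `split(" ")`. Splitting on a single space produces empty strings wherever words are separated by more than one space, and a blank line becomes `['']`. Both would be counted as zero-length words, which inflates the word count and drags the average down. The sample strings below are made up for illustration:

```python
line = "Hello  world\n"  # two spaces between the words

# Splitting on a single space keeps an empty string between the two spaces.
naive = line.rstrip().split(" ")
print(naive)         # ['Hello', '', 'world']

# With no argument, split() treats any run of whitespace (spaces, tabs,
# newlines) as one separator and never returns empty strings.
better = line.split()
print(better)        # ['Hello', 'world']

# A blank line is one "word" of length 0 with split(" "), but none with split().
blank_naive = "\n".rstrip().split(" ")
blank_better = "\n".split()
print(blank_naive)   # ['']
print(blank_better)  # []
```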


当年话下

幕布斯6054654

collections.Counter makes this relatively simple. I use re.findall(r'[\w]+', data) to find words (a "word" here is any run of letters, digits, and underscores). Adjust as needed.

    import re
    from collections import Counter

    fn = input('Please enter the full name of the file: ')
    with open(fn, 'r') as f:
        words = Counter(re.findall(r'[\w]+', f.read()))
        # use words = Counter(f.read().split()) if words are separated only by whitespace
        # adjust the regular expression depending on whether or not you want
        # things like numbers to be counted as "words"

    print('Total number of words:', sum(words.values()))
    # weighted by occurrence (every occurrence of a word counts), which is the same
    # definition as in the question: sum of all word lengths / number of words
    print('Average length of words:',
          sum(len(w) * o for w, o in words.items()) / sum(words.values()))
    print('Word occurrence:', words)
    # this only shows letters that actually occur; if you need all letters of
    # the alphabet, you have to add the rest
    print('Start letter occurrence:', Counter(w[0] for w in words.elements()))
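The last comment in the answer notes that letters which never occur are missing from the output. The following is a minimal sketch that reports every letter a-z. It assumes case-insensitive matching and ASCII letters only, and it reuses the answer's idea of treating runs of word characters as words. The function name `start_letter_counts` and the sample sentence are hypothetical.

```python
import re
import string
from collections import Counter


def start_letter_counts(text):
    """Count word occurrences by first letter, listing every letter a-z.

    Assumptions: matching is case-insensitive, words are runs of word
    characters (letters, digits, underscore), and only a-z are reported.
    """
    words = re.findall(r'\w+', text.lower())
    counts = Counter(w[0] for w in words)
    # A Counter returns 0 for missing keys, so letters that never occur still appear.
    return {letter: counts[letter] for letter in string.ascii_lowercase}


result = start_letter_counts("Apples and bananas, and an apple.")
print(result['a'], result['b'], result['z'])  # 5 1 0
```

`r'[\w]+'` and `r'\w+'` are equivalent. In Python 3, `\w` in a `str` pattern also matches non-ASCII letters and digits. Words starting with those characters (or with `_`) are still tallied in `counts`, but this sketch does not report them.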