
How do I save the results of an ngrams generator to a text file?

I am using nltk and Python to extract n-grams from a corpus, and I need to save the generated n-grams to a text file.


I tried this code, but got no result:


import nltk, re, string, collections
from nltk.util import ngrams

with open("titles.txt", "r", encoding='utf-8') as file:
    text = file.read()

tokenized = text.split()
Monograms = ngrams(tokenized, 1)
MonogramFreq = collections.Counter(Monograms)

with open('output.txt', 'w') as f:
    f.write(str(MonogramFreq))

Here is a sample of titles.txt:


Joli appartement s3 aux jardins de carthage mz823

Villa 600m2 haut standing à hammamet

Hammem lif

S2 manzah 7

Terrain constructible de 252m2 clôturé

Terrain nu a gammarth

Terrain agrecole al fahes

Bureau 17 pièces

Usine 5000m2 mannouba

A simple print of MonogramFreq should give something like this:


('atelier',): 17, ('430',): 17, ('jabli',): 17, ('mall',): 17, ('palmeraies',): 17, ('r4',): 17, ('dégagée',): 17, ('fatha',): 17

But the output.txt file is not even created.


I corrected my code as follows:


import nltk, re, string, collections
from nltk.util import ngrams

with open("titles.txt", "r", encoding='utf-8') as file:
    text = file.read()

tokenized = text.split()
Threegrams = ngrams(tokenized, 3)
ThreegramFreq = collections.Counter(Threegrams)

for i in ThreegramFreq.elements():
    with open('output.txt', 'a') as w:
        w.write(str(i))
        w.close()

But I need output.txt to include the frequency of each 3-gram. How can I do that?


慕码人8056858

慕桂英3389331

Please at least read the comments:

from collections import Counter
from nltk import word_tokenize, ngrams

text='''Joli appartement s3 aux jardins de carthage mz823
Villa 600m2 haut standing à hammamet
Hammem lif
S2 manzah 7
Terrain constructible de 252m2 clôturé
Terrain nu a gammarth
Terrain agrecole al fahes
Bureau 17 pièces
Usine 5000m2 mannouba'''

# Create a counter object to track ngrams and counts.
ngram_counters = Counter()

# Split the text into sentences,
# For now, assume '\n' delimits the sentences.
for line in text.split('\n'):
    # Update the counters with ngrams in each sentence,
    ngram_counters.update(ngrams(word_tokenize(line), n=3))

# Opens a file to print out.
with open('ngram_counts.tsv', 'w') as fout:
    # Iterate through the counter object, like a dictionary.
    for ng, counts in ngram_counters.items():
        # Use space to join the tokens in the ngrams before printing.
        # Print the counts in a separate column.
        print(' '.join(ng) +'\t' + str(counts), end='\n', file=fout)
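As for why the corrected code in the question loses the frequencies: Counter.elements() repeats each key as many times as its count and never yields the count itself, whereas items() and most_common() yield (ngram, count) pairs. The following is only a rough sketch of the same idea, written so it runs without nltk. The helper names line_ngrams and write_ngram_counts, the two-line sample, and the temporary output path are made up for illustration; the zip-based helper is a stand-in for nltk.util.ngrams, and str.split() stands in for word_tokenize. It sorts rows with most_common() and opens the output file with encoding='utf-8', which is an assumption about what is wanted for accented tokens such as "à".

```python
from collections import Counter
import os
import tempfile


def line_ngrams(tokens, n):
    """Yield n-grams as tuples (hypothetical stand-in for nltk.util.ngrams)."""
    return zip(*(tokens[i:] for i in range(n)))


def write_ngram_counts(lines, n, path):
    """Count n-grams line by line and write 'ngram<TAB>count' rows to path."""
    counts = Counter()
    for line in lines:
        # Plain whitespace split; swap in word_tokenize if nltk is installed.
        counts.update(line_ngrams(line.split(), n))
    # Explicit UTF-8 so accented tokens are written the same on every platform.
    with open(path, "w", encoding="utf-8") as fout:
        # most_common() yields (ngram, count) pairs, highest count first.
        for ng, freq in counts.most_common():
            fout.write(" ".join(ng) + "\t" + str(freq) + "\n")
    return counts


# Illustrative input only; the second line is invented to produce a repeat.
sample = [
    "Terrain nu a gammarth",
    "Terrain nu a hammamet",
]
out_path = os.path.join(tempfile.gettempdir(), "ngram_counts_demo.tsv")
counts = write_ngram_counts(sample, 3, out_path)
```

Counting per line rather than over the whole file keeps 3-grams from spanning two different titles, which is the same choice the answer above makes with text.split('\n').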