Classification using the movie review corpus in NLTK / Python

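I want to do classification along the lines of NLTK Chapter 6. The book seems to skip the step of creating the categories, and I'm not sure what I'm doing wrong. My script and its output are below. My problem is mainly with the first part: creating categories based on directory names. Some other questions here rely on filenames (i.e. pos_1.txt and neg_1.txt), but I would prefer to create directories that I can dump files into.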


import string

import nltk
from nltk.corpus import movie_reviews
from nltk.corpus.reader import CategorizedPlaintextCorpusReader

reviews = CategorizedPlaintextCorpusReader('./nltk_data/corpora/movie_reviews', r'(\w+)/*.txt', cat_pattern=r'/(\w+)/.txt')
reviews.categories()
['pos', 'neg']


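import random

documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
# The list is grouped by category, so shuffle it (as the NLTK book does)
# before it is split into training and test sets.
random.shuffle(documents)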


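# Build the stopword set once instead of re-reading the list for every token.
stop_words = set(nltk.corpus.stopwords.words('english'))
all_words = nltk.FreqDist(
    w.lower()
    for w in movie_reviews.words()
    if w.lower() not in stop_words and w.lower() not in string.punctuation)
# Keep the 100 most frequent words; keys() cannot be sliced on Python 3.
word_features = [word for word, count in all_words.most_common(100)]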


def document_features(document):
    # Map a tokenized document to boolean 'contains(word)' features.
    document_words = set(document)
    features = {}
    for word in word_features:
        features['contains(%s)' % word] = (word in document_words)
    return features

print(document_features(movie_reviews.words('pos/11.txt')))


featuresets = [(document_features(d), c) for (d, c) in documents]
# Hold out the first 100 documents for testing and train on the rest.
train_set, test_set = featuresets[100:], featuresets[:100]
classifier = nltk.NaiveBayesClassifier.train(train_set)

print(nltk.classify.accuracy(classifier, test_set))
classifier.show_most_informative_features(5)
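
For anyone who wants to check the feature-extraction half of a script like this without loading the corpus, here is a minimal sketch on toy data. It is not the NLTK pipeline itself: `collections.Counter` stands in for `nltk.FreqDist`, and the toy documents, the `STOP_WORDS` set and the `top_words` helper are all hypothetical names introduced only for illustration.

```python
from collections import Counter

# Hypothetical toy corpus: (tokens, category) pairs standing in for the reviews.
documents = [
    (["a", "great", "great", "film"], "pos"),
    (["a", "dull", "film"], "neg"),
    (["great", "acting"], "pos"),
]
# Hypothetical stopword set; the script above uses NLTK's English list instead.
STOP_WORDS = {"a", "the"}


def top_words(docs, n):
    """Return the n most frequent non-stopword tokens across all documents."""
    counts = Counter(
        w.lower()
        for tokens, _category in docs
        for w in tokens
        if w.lower() not in STOP_WORDS
    )
    return [word for word, _count in counts.most_common(n)]


def document_features(tokens, word_features):
    """Map a tokenized document to boolean 'contains(word)' features."""
    present = set(tokens)
    return {"contains(%s)" % w: (w in present) for w in word_features}


word_features = top_words(documents, 2)
features = document_features(["great", "plot"], word_features)
print(word_features)  # ['great', 'film']
print(features)       # {'contains(great)': True, 'contains(film)': False}
```

Passing `word_features` in as an argument, rather than reading a global, makes the function easy to try on small inputs like this one.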

Running my script returns:


File "test.py", line 38, in <module>
    for w in movie_reviews.words()
File "/usr/local/lib/python2.6/dist-packages/nltk/corpus/reader/plaintext.py", line 184, in words
    self, self._resolve(fileids, categories))
File "/usr/local/lib/python2.6/dist-packages/nltk/corpus/reader/plaintext.py", line 91, in words
    in self.abspaths(fileids, True, True)])
File "/usr/local/lib/python2.6/dist-packages/nltk/corpus/reader/util.py", line 421, in concat
    raise ValueError('concat() expects at least one object!')
ValueError: concat() expects at least one object!
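
One likely reading of this error is that the corpus reader ended up with no files at all, which would happen if the two regular expressions match none of the file ids. The sketch below checks that with the standard `re` module only. It assumes, and this is an assumption about NLTK's behavior rather than something stated in the question, that the patterns are matched against paths relative to the corpus root (such as `pos/11.txt`) and that the file pattern has to match the whole path. The sample file ids, the helper names and the `fixed_*` patterns are hypothetical.

```python
import re

# Hypothetical file ids, written relative to the corpus root directory.
fileids = ["pos/11.txt", "neg/cv000_29416.txt"]


def matched_files(pattern, names):
    """Return the names that the pattern matches in full (assumed reader behavior)."""
    return [n for n in names if re.match(pattern + "$", n)]


def category_of(cat_pattern, name):
    """Return the first captured group of cat_pattern, or None if it does not match."""
    m = re.match(cat_pattern, name)
    return m.group(1) if m else None


# Patterns from the question: "/*" means "zero or more slashes" and "." is a
# single arbitrary character, so neither sample path matches.
original_files = matched_files(r'(\w+)/*.txt', fileids)
original_cats = [category_of(r'/(\w+)/.txt', f) for f in fileids]

# Candidate replacements: directory name, a slash, anything, then a literal ".txt".
fixed_files = matched_files(r'(\w+)/.*\.txt', fileids)
fixed_cats = [category_of(r'(\w+)/.*', f) for f in fileids]

print(original_files)  # []
print(original_cats)   # [None, None]
print(fixed_files)     # ['pos/11.txt', 'neg/cv000_29416.txt']
print(fixed_cats)      # ['pos', 'neg']
```

If that assumption holds, passing `r'(\w+)/.*\.txt'` as the file pattern and `r'(\w+)/.*'` as `cat_pattern` should let the reader take each category from the directory name, which is the "dump files into directories" layout the question asks for. A simpler alternative that avoids regular expressions is to take `fileid.split('/')[0]` as the category.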

慕无忌1623718