
ValueError: Number of features of the model must match the input (sklearn)

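I'm trying to run a classifier on some movie review data. The data has already been split into reviews_train.txt and reviews_test.txt. I load each file, split it into reviews and labels (positive (0) or negative (1)), and then vectorize the reviews. Here is my code: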


from sklearn import tree
from sklearn.metrics import accuracy_score
from sklearn.feature_extraction.text import TfidfVectorizer

#read the reviews and their polarities from a given file
def loadData(fname):
    reviews=[]
    labels=[]
    f=open(fname)
    for line in f:
        review,rating=line.strip().split('\t')
        reviews.append(review.lower())
        labels.append(int(rating))
    f.close()
    return reviews,labels

rev_train,labels_train=loadData('reviews_train.txt')
rev_test,labels_test=loadData('reviews_test.txt')

#vectorizing the input
vectorizer = TfidfVectorizer(ngram_range=(1,2))
vectors_train = vectorizer.fit_transform(rev_train)
vectors_test = vectorizer.fit_transform(rev_test)

clf = tree.DecisionTreeClassifier()
clf = clf.fit(vectors_train, labels_train)

#prediction
pred=clf.predict(vectors_test)

#print accuracy
print (accuracy_score(pred,labels_test))

But I keep getting this error:


ValueError: Number of features of the model must match the input.
Model n_features is 118686 and input n_features is 34169

I'm very new to Python, so apologies in advance if this is a simple fix.


牧羊人nacy
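The error message points at the cause: the tree was trained on 118686 TF-IDF columns, but the test matrix has only 34169 because vectorizer.fit_transform(rev_test) re-fits the vectorizer and builds a new vocabulary from the test reviews alone. The sketch below is a minimal illustration, assuming scikit-learn is installed; the toy reviews and labels are made up and stand in for the output of loadData. It reproduces the mismatch and then shows the usual fix: fit the vectorizer on the training reviews only, and call transform (not fit_transform) on the test reviews.

```python
# Minimal sketch, assuming scikit-learn is installed. The toy reviews and
# labels below are made up; they stand in for what loadData() returns.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

rev_train = ["a great movie", "a terrible movie",
             "great acting and a great plot", "terrible plot"]
labels_train = [0, 1, 0, 1]
rev_test = ["great movie", "a terrible ending"]
labels_test = [0, 1]

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
vectors_train = vectorizer.fit_transform(rev_train)  # learns the vocabulary

clf = DecisionTreeClassifier(random_state=0)
clf.fit(vectors_train, labels_train)

# What the original code effectively does: fitting on the test reviews
# builds a different vocabulary, so the column count no longer matches.
refit_test = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(rev_test)
print(vectors_train.shape[1], refit_test.shape[1])  # two different counts
try:
    clf.predict(refit_test)
except ValueError as err:
    print("ValueError:", err)  # same class of error as in the question

# Fix: transform() reuses the vocabulary learned from the training reviews,
# so the test matrix has exactly as many columns as the training matrix.
vectors_test = vectorizer.transform(rev_test)
pred = clf.predict(vectors_test)
print(accuracy_score(labels_test, pred))
```

In the original script this amounts to replacing vectorizer.fit_transform(rev_test) with vectorizer.transform(rev_test). Another option is to wrap both steps in sklearn.pipeline.make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), DecisionTreeClassifier()): the pipeline's fit fits the vectorizer on the training reviews only, and its predict calls transform on new data, so the vocabulary cannot be re-fitted by accident.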