How to get the most informative features for scikit-learn classifiers?

3 Answers

翻阅古今

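The classifiers themselves do not record feature names; they only see numeric arrays. However, if you extracted your features with a Vectorizer/CountVectorizer/TfidfVectorizer/DictVectorizer, and you are using a linear model (e.g. LinearSVC or Naive Bayes), then you can apply the same trick that the document classification example uses. Example (untested, may contain a bug or two):

```python
import numpy as np


def print_top10(vectorizer, clf, class_labels):
    """Print the 10 features with the highest coefficient values, per class."""
    # get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
    feature_names = vectorizer.get_feature_names_out()
    for i, class_label in enumerate(class_labels):
        # argsort is ascending, so the last 10 indices hold the largest coefficients
        top10 = np.argsort(clf.coef_[i])[-10:]
        print("%s: %s" % (class_label, " ".join(feature_names[j] for j in top10)))
```

This is for multiclass classification; for the binary case, I think you should use clf.coef_[0] only. You may have to sort class_labels.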
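As a minimal sketch of the same idea that returns the names instead of printing them, the function below works on a plain coefficient array, so it can be tried without fitting a model. The name `top_features_per_class` and the toy numbers are assumptions made for illustration. The binary branch relies on scikit-learn linear models storing a single coefficient row whose positive weights point to the second class label.

```python
import numpy as np


def top_features_per_class(coef, feature_names, class_labels, k=10):
    """Return {class_label: [feature, ...]}, largest coefficient first."""
    coef = np.atleast_2d(np.asarray(coef, dtype=float))
    names = np.asarray(feature_names)
    labels = list(class_labels)
    if coef.shape[0] == 1 and len(labels) == 2:
        # Binary case: one row only. Positive weights favor labels[1],
        # negative weights favor labels[0], so build one row per class.
        coef = np.vstack([-coef[0], coef[0]])
    result = {}
    for row, label in zip(coef, labels):
        order = np.argsort(row)[::-1][:k]  # indices of the k largest coefficients
        result[label] = [str(name) for name in names[order]]
    return result


# Toy coefficients (made-up numbers, not taken from a fitted model)
demo_coef = [[0.9, -0.2, 0.1], [-0.5, 0.8, 0.0], [-0.1, -0.3, 0.7]]
demo = top_features_per_class(
    demo_coef, ["goal", "vote", "chip"], ["sport", "politics", "tech"], k=2
)
binary_demo = top_features_per_class(
    [[-1.0, 2.0, 0.5]], ["a", "b", "c"], ["neg", "pos"], k=1
)
print(demo)
print(binary_demo)
```

With a fitted model, the call would presumably look like `top_features_per_class(clf.coef_, vectorizer.get_feature_names_out(), clf.classes_)`. Passing `clf.classes_` keeps the labels in the same order as the rows of `coef_`.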


饮歌长啸

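With the help of larsmans' code, I came up with the following code for the binary case:

```python
def show_most_informative_features(vectorizer, clf, n=20):
    feature_names = vectorizer.get_feature_names_out()
    # Sort (coefficient, feature name) pairs in ascending order of coefficient
    coefs_with_fns = sorted(zip(clf.coef_[0], feature_names))
    # Pair the n most negative coefficients with the n most positive ones
    top = zip(coefs_with_fns[:n], coefs_with_fns[:-(n + 1):-1])
    for (coef_1, fn_1), (coef_2, fn_2) in top:
        print("\t%.4f\t%-15s\t\t%.4f\t%-15s" % (coef_1, fn_1, coef_2, fn_2))
```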


潇潇雨雨

I actually had to find feature importances for my NaiveBayes classifier, and although I used the functions above, I could not get feature importances per class. I went through scikit-learn's documentation and tweaked the functions above a bit, and found that this solves my problem. Hope it helps you too!

```python
def important_features(vectorizer, classifier, n=20):
    class_labels = classifier.classes_
    feature_names = vectorizer.get_feature_names_out()
    # Rank features by how often they were counted in each class during fitting
    topn_class1 = sorted(zip(classifier.feature_count_[0], feature_names), reverse=True)[:n]
    topn_class2 = sorted(zip(classifier.feature_count_[1], feature_names), reverse=True)[:n]
    print("Important words in negative reviews")
    for coef, feat in topn_class1:
        print(class_labels[0], coef, feat)
    print("-----------------------------------------")
    print("Important words in positive reviews")
    for coef, feat in topn_class2:
        print(class_labels[1], coef, feat)
```

Note that your classifier (NaiveBayes in my case) must have the feature_count_ attribute for this to work.
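Raw counts tend to rank very common words high in both classes. One possible variation, which is not part of the answer above, is to rank features by the difference of smoothed log-probabilities between the two classes. The sketch below assumes a two-row count matrix shaped like `feature_count_`. The function name `top_by_log_ratio`, the `alpha` smoothing parameter, and the toy counts are all hypothetical.

```python
import numpy as np


def top_by_log_ratio(feature_count, feature_names, n=20, alpha=1.0):
    """Rank features by log P(feature | class 1) - log P(feature | class 0)."""
    # Additive smoothing avoids log(0) for features unseen in one class
    counts = np.asarray(feature_count, dtype=float) + alpha
    log_prob = np.log(counts) - np.log(counts.sum(axis=1, keepdims=True))
    ratio = log_prob[1] - log_prob[0]
    order = np.argsort(ratio)  # ascending: class-0 words first, class-1 words last
    names = np.asarray(feature_names)
    class0 = [str(x) for x in names[order[:n]]]
    class1 = [str(x) for x in names[order[::-1][:n]]]
    return class0, class1


# Toy counts (made up): "the" is frequent in both classes
toy_counts = [[30, 12, 1], [28, 2, 9]]
neg_words, pos_words = top_by_log_ratio(toy_counts, ["the", "awful", "great"], n=1)
print(neg_words, pos_words)
```

In this toy example "the" has the highest raw count in both rows, yet the log-ratio ranking puts "awful" and "great" at the two ends, because it measures how much more likely a word is in one class than in the other.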
