Decision tree classification problem on the Iris dataset

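I want to write a simple classifier on the Iris dataset and get recall and precision scores. I followed a YouTube video, but when I test the accuracy it gives me 100%. I have some guesses about what is wrong, but I don't know what to do about it. Can you help me extend the code to make it better? And how do I write a recall function for this version of the classifier?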


from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz
import graphviz

iris = load_iris()
x = iris.data    # features
y = iris.target  # class labels

# fit the classifier
tree_clf = DecisionTreeClassifier()
model = tree_clf.fit(x, y)

# export the fitted tree and render it to a file named "iris"
dot_data = export_graphviz(tree_clf, out_file=None,
                           feature_names=iris.feature_names,
                           class_names=iris.target_names,
                           filled=True, rounded=True,
                           special_characters=True)
graph = graphviz.Source(dot_data)
graph.render("iris")

accuracy = tree_clf.score(x, y)
print(accuracy)


jeck猫
3 Answers

牛魔王的故事

To check your results, you can use sklearn.metrics:

from sklearn.metrics import classification_report
print(classification_report(y, model.predict(x)))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        50
           1       1.00      1.00      1.00        50
           2       1.00      1.00      1.00        50

    accuracy                           1.00       150
   macro avg       1.00      1.00      1.00       150
weighted avg       1.00      1.00      1.00       150

If you have doubts about the results, inspect the predictions by eye:

print(model.predict(x))
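Since the question also asks how to write a recall function by hand, here is a minimal sketch in plain Python. The function name `per_class_recall_precision` and the toy labels are made up for illustration; they are not from the original post. For each class, recall is TP / (TP + FN) and precision is TP / (TP + FP).

```python
def per_class_recall_precision(y_true, y_pred):
    """Return {label: (recall, precision)} for every label in y_true.

    Hypothetical helper for illustration; sklearn.metrics offers
    recall_score and precision_score for real use.
    """
    scores = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        scores[label] = (recall, precision)
    return scores


# Toy labels (assumed, not Iris predictions): three classes, two mistakes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
scores = per_class_recall_precision(y_true, y_pred)
for label, (recall, precision) in scores.items():
    print(label, "recall:", recall, "precision:", precision)
```

To use it with the classifier above, pass the true labels and `model.predict(...)` output as lists, ideally computed on data the model was not trained on.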

慕的地10843

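You have made a fundamental mistake in machine learning: evaluating the model on the same data that was used to train it. Instead, you need to split the data into two sets, training and test. Train your model on the training data and evaluate it on the test data. See https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

Try something like this:

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y)
model = tree_clf.fit(x_train, y_train)
accuracy = tree_clf.score(x_test, y_test)

To see why this is a problem, consider the extreme case of a "cheating" model that simply memorizes the input data and outputs whatever it memorized. With your code, it would score 100% accuracy while having learned nothing.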
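The "cheating" model described above can be sketched in a few lines of plain Python. Everything here is hypothetical and only for illustration: the `MemorizingClassifier` class, the `accuracy` helper, and the toy data (which is made up, not the Iris measurements). The model stores every training row verbatim and falls back to a fixed guess for rows it has never seen.

```python
class MemorizingClassifier:
    """Hypothetical 'cheating' model: a lookup table of the training rows."""

    def fit(self, rows, labels):
        self.table = {tuple(row): label for row, label in zip(rows, labels)}
        self.default = labels[0]  # fixed guess for rows never seen in training
        return self

    def predict(self, rows):
        return [self.table.get(tuple(row), self.default) for row in rows]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


# Made-up toy data: the label is 1 when the first feature exceeds 5.
train_x = [(4.0, 1.0), (4.5, 1.2), (5.5, 2.0), (6.0, 2.2)]
train_y = [0, 0, 1, 1]
test_x = [(4.2, 1.1), (5.8, 2.1), (6.3, 2.4), (4.8, 1.3)]
test_y = [0, 1, 1, 0]

model = MemorizingClassifier().fit(train_x, train_y)
train_acc = accuracy(train_y, model.predict(train_x))
test_acc = accuracy(test_y, model.predict(test_x))
print("accuracy on training data:", train_acc)
print("accuracy on unseen data:", test_acc)
```

Scoring on the training rows rewards the lookup table with a perfect result, while the unseen rows expose that it is only guessing. An unpruned decision tree can behave similarly, which is why the score has to come from held-out data.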

aluckdog

So I implemented the suggested changes and wrote some source code for the 150 data points (120 for training and 30 for testing). My question is: is my use of the classification report correct?

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

def accuracy(y_true, y_predict):
    # percentage of predictions that match the true labels
    count = 0
    for i in range(len(y_true)):
        if y_true[i] == y_predict[i]:
            count += 1
    return count * 100.0 / len(y_true)

def pruning_by_max_leaf_nodes(t):
    # refit with max_leaf_nodes = t-1, ..., 2 and print the test accuracy each time
    for i in range(1, t - 1):
        clfnxt1 = DecisionTreeClassifier(criterion='entropy', max_leaf_nodes=t - i)
        clfnxt1.fit(x_train, y_train)
        print('Max_leaf_nodes = ', t - i, 'Test Accuracy = ', accuracy(y_test, clfnxt1.predict(x_test)))

def pruning_by_max_depth(t):
    # refit with max_depth = t-1, ..., 1 and print the test accuracy each time
    for i in range(1, t):
        clfnxt2 = DecisionTreeClassifier(criterion='entropy', max_depth=t - i)
        clfnxt2.fit(x_train, y_train)
        print('Max_depth = ', clfnxt2.tree_.max_depth, 'Test Accuracy = ', accuracy(y_test, clfnxt2.predict(x_test)))

# reading training data
train_data = pd.read_csv("iris_train_data.csv", header=0)
x_train = train_data.values[:, 0:4]
y_train = train_data.values[:, 4]

# training the classifier
clf = DecisionTreeClassifier(criterion='entropy')
clf.fit(x_train, y_train)
print('Depth of learnt tree is ', clf.tree_.max_depth)
t = clf.get_n_leaves()
print('Number of leaf nodes in learnt tree is ', t, '\n')

# reading test data
test_data = pd.read_csv("iris_test_data.csv", header=0)
x_test = test_data.values[:, 0:4]
y_test = test_data.values[:, 4]

# training accuracy and test accuracy without pruning
print('Training accuracy of classifier is ', accuracy(y_train, clf.predict(x_train)))
print('Test accuracy using classifier is ', accuracy(y_test, clf.predict(x_test)), '\n')

# pruning by reducing max_depth
print('Pruning case1:By reducing the max_depth of the tree')
pruning_by_max_depth(clf.tree_.max_depth)
print('')

# pruning by reducing max_leaf_nodes
print('Pruning case2:By reducing the max_leaf_nodes of the tree')
pruning_by_max_leaf_nodes(t)

# per-class precision, recall and F1 on the held-out test set
print(classification_report(y_test, clf.predict(x_test)))
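One way to check whether the hand-written accuracy and the classification report agree is to compare them with scikit-learn's own metrics on the same held-out predictions. The sketch below rests on a few assumptions that are not in the post: it uses `load_iris` with a stratified 120/30 split instead of the `iris_train_data.csv` and `iris_test_data.csv` files (which are not available here), and it fixes `random_state=0` purely for repeatability.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def accuracy(y_true, y_predict):
    """Hand-written accuracy in percent, equivalent to the one in the post."""
    count = sum(1 for t, p in zip(y_true, y_predict) if t == p)
    return count * 100.0 / len(y_true)


iris = load_iris()
# Assumed split: 120 training rows and 30 test rows, 10 test rows per class.
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=30, stratify=iris.target, random_state=0
)

clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)  # predict once, reuse for every metric

manual_acc = accuracy(y_test, y_pred)
library_acc = accuracy_score(y_test, y_pred) * 100.0
report = classification_report(y_test, y_pred, output_dict=True)
per_class_recall = recall_score(y_test, y_pred, average=None)

print("manual accuracy:", manual_acc, "library accuracy:", library_acc)
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

If the two accuracy values match and the report is computed from test-set predictions only, the report is being used as intended. Calling `clf.fit(...)` again inside the `classification_report` call is not needed once the classifier has been trained.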