How to efficiently compare the accuracy of all models

Having already upvoted the other answer, I'll go on to demonstrate that the error is indeed caused by your scores.append() being outside the for loop. We don't actually need to fit any models; we can simulate the situation with the following modification of your code, which does not change the essence of the issue:

    import numpy as np
    import pandas as pd

    models = ['ran', 'knn', 'log', 'xgb', 'gbc', 'svc', 'ext', 'ada', 'gnb', 'gpc', 'bag']
    scores = []
    cv = 10

    # Sequentially fit and cross validate all models
    for mod in models:
        acc = np.array([np.random.rand() for i in range(cv)])  # simulate your accuracy here
    scores.append(acc.mean())  # as in your code, i.e. outside the for loop

    # Create a dataframe of results
    results = pd.DataFrame({
        'Model': ['Random Forest', 'K Nearest Neighbour', 'Logistic Regression', 'XGBoost', 'Gradient Boosting',
                  'SVC', 'Extra Trees', 'AdaBoost', 'Gaussian Naive Bayes', 'Gaussian Process', 'Bagging Classifier'],
        'Score': scores})

Unsurprisingly, this essentially reproduces your error:

    ValueError: arrays must all be same length

because, as already discussed in the other answer, your scores list has only one element, namely the acc.mean() from the last iteration of the loop:

    len(scores)
    # 1
    scores
    # [0.47317491043203785]

So pandas complains, because it cannot populate an 11-row dataframe from an 11-element 'Model' column and a 1-element 'Score' column...

As already suggested in the other answer, moving scores.append() inside the for loop resolves the issue:

    scores = []  # start again from an empty list
    for mod in models:
        acc = np.array([np.random.rand() for i in range(cv)])
        scores.append(acc.mean())  # moved inside the loop

    # Create a dataframe of results
    results = pd.DataFrame({
        'Model': ['Random Forest', 'K Nearest Neighbour', 'Logistic Regression', 'XGBoost', 'Gradient Boosting',
                  'SVC', 'Extra Trees', 'AdaBoost', 'Gaussian Naive Bayes', 'Gaussian Process', 'Bagging Classifier'],
        'Score': scores})

    print(results)
    # output:
                       Model     Score
    0          Random Forest  0.492364
    1    K Nearest Neighbour  0.624068
    2    Logistic Regression  0.613653
    3                XGBoost  0.536488
    4      Gradient Boosting  0.484195
    5                    SVC  0.381556
    6            Extra Trees  0.274922
    7               AdaBoost  0.509297
    8   Gaussian Naive Bayes  0.362866
    9       Gaussian Process  0.606538
    10    Bagging Classifier  0.393950

You may also want to keep in mind that you don't need the model.fit() part in your code: cross_val_score does all the necessary fitting itself...
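To illustrate that last point, here is a minimal sketch of the loop without any explicit model.fit() call. The synthetic dataset from make_classification, the choice of three scikit-learn classifiers, and the names `models`, `rows`, and `results` are assumptions made for illustration, not your actual data or model list. Each display name is stored next to its estimator, and one row is appended per iteration, so the 'Model' and 'Score' columns cannot end up with different lengths.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data standing in for your own X and y (assumption for illustration)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Display name -> unfitted estimator (a hypothetical subset of the models in the question)
models = {
    'K Nearest Neighbour': KNeighborsClassifier(),
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'Gaussian Naive Bayes': GaussianNB(),
}

cv = 10
rows = []
for name, model in models.items():
    # No model.fit() here: cross_val_score fits a clone of the estimator on each fold
    acc = cross_val_score(model, X, y, scoring='accuracy', cv=cv)
    rows.append({'Model': name, 'Score': acc.mean()})  # appended inside the loop

# One row per model, best mean accuracy first
results = (pd.DataFrame(rows)
           .sort_values('Score', ascending=False)
           .reset_index(drop=True))
print(results)
```

Building the dataframe from a list of per-model records, rather than from two separately maintained lists, is a design choice that makes the length mismatch from the question impossible by construction.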
