Random answers from an SVM

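I'm using an SVM to see whether I can take baseball data, classify batted balls, and estimate home runs. When I run the model several times I seem to get different results, so I wrote a simulation that runs the model 100 times, but I don't understand why the results vary or what causes the variation. Can someone explain why this happens? I did set random_state=42.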


import pandas as pd
from mlxtend.plotting import plot_decision_regions
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix
from sklearn import metrics
import statistics
import numpy as np

result_array = []

players = [488768, 517369, 461314, 477165, 506560, 572114, 641319, 592669, 622534, 605486, 602922, 518466, 572362, 519082, 623182, 595978, 543272]

dfSave = pd.DataFrame(columns=['Mean', 'Max', 'Min', 'Std', 'Accuracy', 'Precision', 'f1_score', 'Recall_Score', 'First_Name', 'Last_Name'])

for batter in players:
    df = pd.read_csv('D:baseballData_2016_use.csv')
    df2 = pd.read_csv('D:padres_2016_home.csv')  # Team to test

    dataFilter = df.loc[df['Home_Team'] == 'Orioles']  # Stadium to train the model on
    dataFilter2 = df2.loc[df2['Batter_ID'] == batter]  # Player to test in that stadium

    for run in range(100):  # repeat the fit 100 times
        predict = dataFilter2.iloc[:, [4, 5]]

        X = dataFilter.iloc[:, [4, 5]]
        y = dataFilter.iloc[:, 3]
        y = y.astype(int)  # plain int; np.integer is an abstract type, not a usable dtype

        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)

        # Mostly defaults; C, gamma, decision_function_shape and shrinking are changed
        svclassifier = SVC(C=4, cache_size=200, class_weight=None, coef0=0.0,
                           decision_function_shape='ovo', degree=3, gamma=0.001, kernel='rbf',
                           max_iter=-1, probability=False, random_state=42, shrinking=False,
                           tol=0.001, verbose=False)

        svclassifier.fit(X_train, y_train)

        y_pred = svclassifier.predict(X_test)

        predicted = svclassifier.predict(predict)

        listDf = []

        home_runs = 0  # count predicted home runs
        for label in predicted:
            if label == 1:
                home_runs += 1

        result_array.append(home_runs)
        print(home_runs)


慕虎7371278
1 answer

莫回无

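In your code the randomness comes from train_test_split, which produces a different split on every run. The random_state=42 you set is on SVC, where it only affects the shuffling used for probability estimates and is ignored when probability=False, so it does not fix the split. You can avoid the variation by passing a fixed random_state to train_test_split, but running the model multiple times (as you did) is considered better practice: collect the distribution of output scores, compute a confidence interval for the score, and report that.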
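
Here is a minimal sketch of both options. It is not your exact pipeline: the data is a synthetic stand-in built with make_classification (an assumption, since I don't have your CSV files), the SVC uses default gamma rather than your 0.001, and count_positive_predictions is a hypothetical helper name. The first check shows that a fixed seed in train_test_split gives the same count every time; the loop then varies the seed on purpose and summarizes the spread with a normal-approximation 95% interval for the mean count.

```python
import statistics

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def count_positive_predictions(X, y, X_new, seed):
    """Fit an SVC on one 70/30 split and count how many rows of X_new are predicted as 1."""
    X_train, _, y_train, _ = train_test_split(
        X, y, test_size=0.30, random_state=seed  # the seed pins down the split
    )
    model = SVC(C=4, kernel="rbf")
    model.fit(X_train, y_train)
    return int((model.predict(X_new) == 1).sum())


# Synthetic stand-in for the batted-ball data (assumption: two features, binary label).
X_all, y_all = make_classification(
    n_samples=450, n_features=2, n_informative=2, n_redundant=0, random_state=0
)
X, y = X_all[:400], y_all[:400]   # rows used to train the model
X_new = X_all[400:]               # rows to predict, like the batter's data

# Option 1: same seed, same split, same answer on every run.
first = count_positive_predictions(X, y, X_new, seed=42)
second = count_positive_predictions(X, y, X_new, seed=42)
print("fixed seed:", first, second)

# Option 2: vary the seed deliberately and report the distribution.
counts = [count_positive_predictions(X, y, X_new, seed=s) for s in range(100)]
mean = statistics.mean(counts)
std = statistics.stdev(counts)
half_width = 1.96 * std / len(counts) ** 0.5  # normal-approximation 95% CI for the mean
print(f"mean={mean:.2f} std={std:.2f} 95% CI=({mean - half_width:.2f}, {mean + half_width:.2f})")
```

Using the loop index as the seed keeps every run different from the others while still making the whole experiment reproducible, which is usually what you want when reporting a mean and an interval.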