Notes

Model Evaluation 2
Data splitting (train/test split)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.4)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
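The split above assumes `X` and `y` already exist. A minimal runnable sketch, using the iris dataset as an assumed stand-in (and a fixed `random_state` so the split is reproducible):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# iris: 150 samples, 4 features
X, y = load_iris(return_X_y=True)

# 40% held out for testing, as in the note above
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
# → (90, 4) (60, 4) (90,) (60,)
```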
Data splitting:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)
Define the range of k values: k_range = list(range(1, 26))
Define a list to collect the results: scores_train = []
Append each result to the list: scores_train.append(accuracy_score(y_train, y_train_pred))
Define a loop: for k in k_range:
Plot the comparison as a line chart:
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(k_range, scores_test)
plt.xlabel('K (KNN model)')
plt.ylabel('Test Accuracy')
When k = 1, the model is at its most complex.
# search for the optimal k within a given range
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

k_range = list(range(1, 26))
#print(k_range)
scores_train = []
scores_test = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_train_pred = knn.predict(X_train)
    y_test_pred = knn.predict(X_test)
    scores_train.append(accuracy_score(y_train, y_train_pred))
    scores_test.append(accuracy_score(y_test, y_test_pred))
for k in k_range:
    print(k, scores_train[k-1])
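Once the test accuracies are recorded, the "optimal k within a range" mentioned above can be read off directly. A self-contained sketch, again assuming the iris dataset and a fixed random seed:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

k_range = list(range(1, 26))
scores_test = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    scores_test.append(accuracy_score(y_test, knn.predict(X_test)))

# pick the k whose test accuracy is highest (first one, if tied)
best_k = k_range[scores_test.index(max(scores_test))]
print(best_k, max(scores_test))
```

Note that choosing k on the test set like this is only a rough model-selection heuristic; cross-validation is the more careful approach.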
Data splitting (this time with half the data held out):
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.5)
import matplotlib.pyplot as plt
# display plots inline in the notebook
%matplotlib inline
plt.plot(k_range, scores_train)
plt.xlabel('K (KNN model)')
plt.ylabel('Training Accuracy')
The smaller k is, the higher the model complexity, so when k = 1 the accuracy on the training set is 1.
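This last point can be checked directly: with k = 1, every training point is its own nearest neighbor, so (barring identical points with conflicting labels) the model reproduces the training labels exactly. A sketch, assuming the iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# k = 1: each training sample's nearest neighbor is itself
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
print(accuracy_score(y_train, knn.predict(X_train)))  # → 1.0
```

The high training accuracy says nothing about generalization; that is exactly why the test-accuracy curve above is the one used to choose k.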