Logistic Regression

Preface

dataset
Out[12]: 
      User ID  Gender  Age  EstimatedSalary  Purchased
0    15624510    Male   19            19000          0
1    15810944    Male   35            20000          0
2    15668575  Female   26            43000          0
3    15603246  Female   27            57000          0
4    15804002    Male   19            76000          0
5    15728773    Male   27            58000          0
6    15598044  Female   27            84000          0
7    15694829  Female   32           150000          1
8    15600575    Male   25            33000          0
//...

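This dataset, collected from a social network, records each user's age, gender, and estimated salary, along with whether the user purchased a certain company's SUV. We assume that the log-odds of a purchase depend linearly on the features, and build a model on that basis to make predictions. Note that the code below uses only Age and EstimatedSalary as features (columns 2 and 3); Gender is not used.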
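For reference, the standard form of the logistic regression model with two features can be written as below. The symbols w_1, w_2, and b are notation introduced here for illustration; they do not appear in the original post.

```latex
P(y = 1 \mid x) = \sigma(z) = \frac{1}{1 + e^{-z}},
\qquad z = w_1 x_1 + w_2 x_2 + b
```

Here x_1 and x_2 stand for the two features (age and estimated salary), and a sample is assigned to class 1 when the probability exceeds 0.5, which is the same as z > 0.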

Step 1: Data preprocessing

(1) Import the libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

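PS: When importing matplotlib.pyplot I got an error about a missing python-tk module. (I have moved my experiments to Linux, since I mostly use Linux at work. A stable, easy-to-use distribution I would recommend is Mint, though that comes down to personal preference.) I installed the module with the following command (on Python 3 the package is named python3-tk):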
sudo apt-get install python-tk

(2) Import the dataset

dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
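To make the column indices concrete, here is a minimal sketch that runs the same `iloc` selection on three rows copied from the preview above. The in-memory CSV text and the names `sample`, `X_sample`, and `y_sample` are stand-ins I made up so the snippet runs without the real file.

```python
import io
import pandas as pd

# Three rows copied from the dataset preview, used in place of the CSV file.
csv_text = """User ID,Gender,Age,EstimatedSalary,Purchased
15624510,Male,19,19000,0
15810944,Male,35,20000,0
15694829,Female,32,150000,1
"""
sample = pd.read_csv(io.StringIO(csv_text))

# Positional columns 2 and 3 are Age and EstimatedSalary; column 4 is Purchased.
X_sample = sample.iloc[:, [2, 3]].values
y_sample = sample.iloc[:, 4].values

print(list(sample.columns[[2, 3]]))
print(X_sample.shape, y_sample.shape)
```

So X holds only the two numeric features, and User ID and Gender are left out of the model.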

(3) Split into training and test sets

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
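To show roughly what a shuffled 75/25 split does, here is a small NumPy sketch. The helper `simple_train_test_split` and the demo arrays are hypothetical, and this is not scikit-learn's implementation, so it will not reproduce the same rows as `random_state = 0` above.

```python
import numpy as np

def simple_train_test_split(X, y, test_size=0.25, seed=0):
    """Hypothetical helper: shuffle row indices, then cut off the test share."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    n_test = int(np.ceil(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

X_demo = np.arange(40).reshape(20, 2)  # 20 made-up samples with 2 features each
y_demo = np.arange(20) % 2             # made-up binary labels
X_tr, X_te, y_tr, y_te = simple_train_test_split(X_demo, y_demo)
print(X_tr.shape, X_te.shape)
```

Every row ends up in exactly one of the two sets, and fixing the seed makes the split repeatable.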

(4) Feature scaling

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
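The reason for calling `fit_transform` on the training set but only `transform` on the test set is that the mean and standard deviation should come from the training data alone. Here is a minimal NumPy sketch of that standardization; the small arrays are made-up values in the style of the dataset.

```python
import numpy as np

# Made-up [Age, EstimatedSalary] rows standing in for the real split.
train = np.array([[19.0, 19000.0],
                  [35.0, 20000.0],
                  [26.0, 43000.0],
                  [27.0, 57000.0]])
test = np.array([[32.0, 150000.0]])

# Statistics are computed on the training data only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training statistics

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
```

After scaling, each training column has mean 0 and standard deviation 1, so age and salary contribute on a comparable scale.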

Step 2: Logistic regression model

(1) Fit logistic regression to the training set

from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()
classifier.fit(X_train, y_train)
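To give a feel for what fitting means here, below is a rough sketch that trains a logistic model with plain batch gradient descent in NumPy. This is a swapped-in technique for illustration only: scikit-learn's `LogisticRegression` uses a different solver and applies L2 regularization by default. The helper `fit_logistic`, its parameters, and the synthetic data are all assumptions of mine.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Hypothetical helper: batch gradient descent on the mean log-loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)            # predicted probabilities
        error = p - y                     # gradient of the log-loss w.r.t. z
        w -= lr * (X.T @ error) / len(y)  # update the weights
        b -= lr * error.mean()            # update the intercept
    return w, b

# Synthetic, already-centered data: class 1 when the two features sum to > 0.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(float)

w, b = fit_logistic(X_demo, y_demo)
accuracy = ((sigmoid(X_demo @ w + b) >= 0.5) == y_demo).mean()
print(w, b, accuracy)
```

The learned weights define a straight-line decision boundary in feature space, which is why the regions in the plots of Step 4 are separated by a line.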

Step 3: Prediction

(1) Predict the test set results

y_pred = classifier.predict(X_test)

Step 4: Evaluate the predictions

(1) Build the confusion matrix

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
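The matrix is easier to read once you know its layout: in scikit-learn's convention, rows are the true classes and columns are the predicted classes. Here is a small sketch that builds the same kind of 2x2 matrix by hand and derives a few metrics from it. The label arrays are made up for illustration and are not results from this dataset.

```python
import numpy as np

# Made-up true labels and predictions, purely for illustration.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_hat = np.array([0, 0, 1, 0, 1, 0, 1, 0, 1, 0])

# Rows index the true class, columns index the predicted class.
cm_demo = np.zeros((2, 2), dtype=int)
for t, p in zip(y_true, y_hat):
    cm_demo[t, p] += 1

tn, fp, fn, tp = cm_demo.ravel()
accuracy = (tp + tn) / cm_demo.sum()
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(cm_demo)
print(accuracy, precision, recall)
```

The diagonal holds the correct predictions, so accuracy is the diagonal sum divided by the total number of samples.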

(2) Visualization

Use the matplotlib library to visualize the data.

from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
# Build a dense grid covering the (scaled) feature range, padded by 1 on each side
X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
# Color each grid point by its predicted class to show the decision regions
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# Overlay the actual samples, one color per class
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color=ListedColormap(('red', 'green'))(i), label=j)

plt.title('LOGISTIC (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

Training set
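The `meshgrid` / `ravel` / `reshape` chain in the plotting code is the least obvious part, so here is a tiny NumPy sketch of it. The grid values and the threshold rule standing in for `classifier.predict` are made up for illustration.

```python
import numpy as np

x1_values = np.arange(start=0.0, stop=3.0, step=1.0)    # 3 values along axis 1
x2_values = np.arange(start=10.0, stop=12.0, step=1.0)  # 2 values along axis 2
G1, G2 = np.meshgrid(x1_values, x2_values)              # both have shape (2, 3)

# Flatten both grids and pair them up: one row per grid point, one column per feature.
grid_points = np.array([G1.ravel(), G2.ravel()]).T      # shape (6, 2)

# Stand-in for classifier.predict: a made-up linear rule returning 0 or 1.
labels = (grid_points[:, 0] + grid_points[:, 1] > 11.5).astype(int)

# Reshape the flat predictions back to the grid layout that contourf expects.
label_grid = labels.reshape(G1.shape)
print(grid_points)
print(label_grid)
```

With a step of 0.01 the real grid is much denser, which is what makes the colored regions look continuous.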

X_set, y_set = X_test, y_test
# Build a dense grid covering the (scaled) feature range, padded by 1 on each side
X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))

# Color each grid point by its predicted class to show the decision regions
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# Overlay the actual samples, one color per class
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color=ListedColormap(('red', 'green'))(i), label=j)

plt.title('LOGISTIC (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

Test set

Author: JustMe23
Link: https://www.jianshu.com/p/2b3eee881346

[6%] 100 Hours of Machine Learning: Logistic Regression