【Machine Learning】Linear Regression: A Vectorized Multivariate Gradient Descent Implementation (Python) - Original Notes - 慕课网

【Vectorization】

For linear regression with a single variable, we assume the hypothesis function $h_\theta(x)=\theta_0+\theta_1 x$. With more than one variable, the hypothesis takes the following form, where $n-1$ is the number of features in the dataset:
$h_\theta(x)=\theta_0+\sum_{i=1}^{n-1}\theta_i x_i$
To keep the structure uniform, we introduce $x_0=1$, so the expression becomes:
$h_\theta(x)=\sum_{i=0}^{n-1}\theta_i x_i$
Vectorizing this gives:
$h_\theta(x)=\sum_{i=0}^{n-1}\theta_i x_i=\theta^T x$
where $x=[x_0,x_1,\cdots,x_{n-1}]^T$ and $\theta=[\theta_0,\theta_1,\cdots,\theta_{n-1}]^T$ are column vectors of length $n$. Note again that $x_0$ is an attribute whose value is always 1.
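As a small illustration of the vectorized hypothesis, the sketch below evaluates $\theta^T x$ for several samples at once by stacking them as rows of a matrix. The feature values and the parameter vector are made up for the example and do not come from the dataset used later.

```python
import numpy as np

# Hypothetical toy data: 3 samples, 2 features (values chosen only for illustration).
features = np.array([[0.5, -1.0],
                     [1.5,  0.0],
                     [-2.0, 1.0]])

# Prepend the constant column x_0 = 1 so that theta_0 acts as the intercept.
X = np.hstack([np.ones((features.shape[0], 1)), features])

theta = np.array([1.0, 2.0, -3.0])  # illustrative parameter values

# Vectorized hypothesis: one matrix-vector product gives theta^T x for every row.
predictions = X @ theta

# The same quantity written as the explicit sum over i of theta_i * x_i.
explicit = np.array([sum(theta[i] * row[i] for i in range(len(theta))) for row in X])

print(predictions)
print(np.allclose(predictions, explicit))
```

Each entry of `predictions` is the hypothesis value for one row of `X`, so no Python-level loop over samples is needed.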
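The gradient descent update rule is derived as follows. The cost function (with no regularization term) is:
$J(\theta)=\frac{1}{2m} \sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})^{2}$
Taking the partial derivative of $J(\theta)$ with respect to $\theta_j$:
$\frac{\partial J(\theta)}{\partial \theta_j}=\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}$
For the matrix form of this derivative, please consult a reference on matrix calculus identities.
The gradient descent iteration, applied to every $\theta_j$ simultaneously, is therefore:
$\theta_j :=\theta_j-\alpha \frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}$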
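The per-component update can also be written in matrix form. The notation here is an assumption introduced for this sketch: $X$ is the $m \times n$ matrix whose $i$-th row is $(x^{(i)})^T$ (including the constant $x_0=1$), and $y$ is the column vector of the $m$ target values. The cost function then reads:
$J(\theta)=\frac{1}{2m}(X\theta-y)^T(X\theta-y)$
The $j$-th component of $X^T(X\theta-y)$ equals $\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}$, so the gradient is:
$\nabla_\theta J(\theta)=\frac{1}{m}X^T(X\theta-y)$
and all parameters are updated in one step:
$\theta := \theta-\frac{\alpha}{m}X^T(X\theta-y)$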

【Advantages of Vectorization】

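Compared with a for loop, vectorization computes over the whole dataset in a single operation, which is noticeably more efficient. NumPy also implements matrix operations in optimized compiled code, so it can make good use of the machine's parallel computing capability. The drawback is that vectorized code can be harder to understand than the equivalent for loop.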
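To make the comparison concrete, the sketch below computes the same gradient twice, once with explicit loops and once with matrix products, and checks that the two agree. The function names and the random data are hypothetical and serve only as an illustration.

```python
import numpy as np

def gradient_loop(X, y, theta):
    """Gradient of the cost, computed with explicit Python loops."""
    m, n = X.shape
    grad = np.zeros(n)
    for j in range(n):
        for i in range(m):
            # Hypothesis for sample i: sum over k of theta_k * x_k
            h = 0.0
            for k in range(n):
                h += theta[k] * X[i, k]
            grad[j] += (h - y[i]) * X[i, j]
    return grad / m

def gradient_vectorized(X, y, theta):
    """The same gradient, computed with two matrix products."""
    m = X.shape[0]
    return X.T @ (X @ theta - y) / m

# Random illustrative data: 50 samples, a column of ones plus 2 features.
rng = np.random.default_rng(0)
X = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 2))])
y = rng.normal(size=50)
theta = rng.normal(size=3)

print(np.allclose(gradient_loop(X, y, theta), gradient_vectorized(X, y, theta)))
```

The vectorized version is shorter and leaves the looping to NumPy, at the cost of requiring the reader to track the matrix shapes.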

【Related Topic: Feature Scaling】

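Feature scaling can effectively speed up gradient descent, reducing the number of iterations the algorithm needs to converge. If some features take values over a large range while others take relatively small values, the contour plot of the cost function has an elongated, flattened shape (figure from Andrew Ng's lecture notes).
Gradient descent then follows a zigzag path (the red trajectory in that figure), whereas for a dataset whose features have similar ranges the contours are much closer to circles (figure from Andrew Ng's lecture notes).

Clearly, the rounder the contours, the faster the iteration converges.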
For features with a large range, the usual remedy is to scale all features to roughly the range -1 to 1. A common way to do this is:
$x_n=\dfrac{x_n-\mu_n}{s_n}$
where $\mu_n$ is the mean and $s_n$ is the standard deviation.
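A minimal sketch of this scaling with NumPy follows. The helper name `standardize` and the sample values are assumptions made for illustration. The sketch also keeps the mean and standard deviation so that a new sample can be scaled in the same way before prediction.

```python
import numpy as np

def standardize(features):
    """Scale each column to zero mean and unit standard deviation."""
    mu = features.mean(axis=0)
    # ddof=1 gives the sample standard deviation, matching the pandas default for std()
    sigma = features.std(axis=0, ddof=1)
    return (features - mu) / sigma, mu, sigma

# Made-up rows of (size, bedrooms); not taken from any real dataset.
raw = np.array([[2104.0, 3.0],
                [1600.0, 3.0],
                [2400.0, 4.0],
                [1416.0, 2.0]])

scaled, mu, sigma = standardize(raw)

# A new sample must be scaled with the same mu and sigma as the training data.
new_sample = np.array([1650.0, 3.0])
new_scaled = (new_sample - mu) / sigma

print(scaled)
print(new_scaled)
```

After scaling, both columns have comparable magnitudes even though the raw sizes are hundreds of times larger than the bedroom counts.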

The implementation is as follows:

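# Multivariate gradient descent, using the dataset "ex1data2.txt" from Andrew Ng's machine learning course
# The multivariate linear regression gradient descent is implemented in vectorized form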

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd


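def readData(path, name=None):
    # name: list of column names for the CSV file (None lets pandas infer them)
    data = pd.read_csv(path, names=name)
    # Standardize every column, including the target, to zero mean and unit standard deviation
    data = (data - data.mean()) / data.std()
    # Insert the constant column x_0 = 1 after scaling, so it is not divided by a zero standard deviation
    data.insert(0, 'First', 1)
    return data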


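def costFunction(X,Y,theta):
    # Cost J(theta): sum of squared residuals divided by 2m, with X of shape (m, n), Y (1, m), theta (1, n)
    inner = np.power(((X * theta.T) - Y.T), 2)
    return np.sum(inner) / (2 * len(X))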

def gradientDescent(data, theta, alpha, iterations):
    # Cost after each iteration, kept for plotting the convergence curve
    eachIterationValue = np.zeros((iterations, 1))
    # Every column except the last is a feature; the last column is the target
    X = np.matrix(data.iloc[:, 0:-1].values)   # shape (m, n)
    Y = np.matrix(data.iloc[:, -1].values)     # shape (1, m)
    m = X.shape[0]
    for i in range(iterations):
        error = (X * theta.T) - Y.T            # residuals, shape (m, 1)
        # Update all parameters simultaneously with a single matrix product
        theta = theta - (alpha / m) * (error.T * X)
        eachIterationValue[i, 0] = costFunction(X, Y, theta)
    return theta, eachIterationValue

if __name__ == "__main__":
    data = readData('ex1data2.txt',['Size', 'Bedrooms', 'Price'])
    # readData has already standardized every column, so no further scaling is needed here
    theta = np.matrix(np.zeros((1, 3)))  # one parameter per column of X, stored as floats
    
    iterations=1500
    alpha =0.01
    
    theta,eachIterationValue=gradientDescent(data,theta,alpha,iterations)
    
    print(theta)
    
    plt.plot(np.arange(iterations),eachIterationValue)
    plt.title('CostFunction')
    plt.show()

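The result of running the script is shown in the figure below:
[Figure: plot titled "CostFunction", showing the cost value against the iteration number]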
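For readers without access to "ex1data2.txt", the sketch below runs the same update rule on synthetic data and compares the result with NumPy's least-squares solver as an independent reference. This check is not part of the original script: the function name `gradient_descent`, the synthetic data, and the chosen `alpha` and `iterations` are all assumptions, and plain arrays are used instead of `np.matrix`.

```python
import numpy as np

def gradient_descent(X, y, alpha, iterations):
    """Batch gradient descent for linear regression on plain NumPy arrays."""
    m, n = X.shape
    theta = np.zeros(n)
    costs = np.empty(iterations)
    for it in range(iterations):
        error = X @ theta - y                      # residuals, shape (m,)
        theta = theta - (alpha / m) * (X.T @ error)
        costs[it] = np.sum((X @ theta - y) ** 2) / (2 * m)
    return theta, costs

# Synthetic data: 200 samples, 2 roughly standardized features plus the column of ones.
rng = np.random.default_rng(0)
m = 200
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, 2))])
true_theta = np.array([0.5, 2.0, -1.0])            # illustrative ground truth
y = X @ true_theta + 0.1 * rng.normal(size=m)      # targets with a little noise

theta_gd, costs = gradient_descent(X, y, alpha=0.1, iterations=2000)

# Reference solution from NumPy's least-squares solver.
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta_gd)
print(theta_ls)
```

If the two parameter vectors agree closely and the recorded cost falls over the iterations, the update rule is behaving as expected on this data.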