Gradient descent using Python and NumPy

import numpy as np

def gradient(X_norm,y,theta,alpha,m,n,num_it):
    temp=np.array(np.zeros_like(theta,float))
    for i in range(0,num_it):
        h=np.dot(X_norm,theta)
        #temp[j]=theta[j]-(alpha/m)*(  np.sum( (h-y)*X_norm[:,j][np.newaxis,:] )  )
        temp[0]=theta[0]-(alpha/m)*(np.sum(h-y))
        temp[1]=theta[1]-(alpha/m)*(np.sum((h-y)*X_norm[:,1]))
        theta=temp
    return theta


X_norm,mean,std=featureScale(X)
# length of X (number of rows)
m=len(X)
X_norm=np.array([np.ones(m),X_norm])
n,m=np.shape(X_norm)
num_it=1500
alpha=0.01
theta=np.zeros(n,float)[:,np.newaxis]
X_norm=X_norm.transpose()
theta=gradient(X_norm,y,theta,alpha,m,n,num_it)
print(theta)

The theta produced by my code above is 100.2 100.2, but it should be 100.2 61.09, which is the (correct) result I get in MATLAB.
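(Note: featureScale is not shown in the question. Presumably it standardizes each feature by its mean and standard deviation and returns all three values; the following is only a minimal sketch of that assumption, not the asker's actual helper.)

import numpy as np

def featureScale(X):
    # Hypothetical reconstruction of the helper used above: standardize each
    # feature column by subtracting its mean and dividing by its standard deviation.
    mean = np.mean(X, axis=0)
    std = np.std(X, axis=0)
    return (X - mean) / std, mean, std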


慕妹3242003
Viewed 887 times

3 Answers

catspeake

I think your code is a bit too complicated and needs more structure, because otherwise you will get lost in all the equations and operations. In the end this regression boils down to four operations:

1. Calculate the hypothesis h = X * theta
2. Calculate the loss = h - y, and maybe the squared cost (loss^2)/2m
3. Calculate the gradient = X' * loss / m
4. Update the parameters theta = theta - alpha * gradient

In your case, I guess you have confused m with n. Here m denotes the number of examples in your training set, not the number of features.

Let's have a look at my variation of your code:

import numpy as np
import random

# m denotes the number of examples here, not the number of features
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here,
        # but to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta

def genData(numPoints, bias, variance):
    x = np.zeros(shape=(numPoints, 2))
    y = np.zeros(shape=numPoints)
    # basically a straight line
    for i in range(0, numPoints):
        # bias feature
        x[i][0] = 1
        x[i][1] = i
        # our target variable
        y[i] = (i + bias) + random.uniform(0, 1) * variance
    return x, y

# gen 100 points with a bias of 25 and variance of 10 as a bit of noise
x, y = genData(100, 25, 10)
m, n = np.shape(x)
numIterations = 100000
alpha = 0.0005
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)

First I create a small random dataset, which should look like this:

[figure: linear regression scatter plot of the generated data, with the regression line and formula computed by Excel]

As you can see, I also added the generated regression line and the formula calculated by Excel.

You need to take care of the intuition behind regression with gradient descent. When you do a complete batch pass over your data X, you need to reduce the m losses of all examples to a single weight update. In this case, that is the average of the sum of the gradients, hence the division by m.

The next thing to take care of is tracking convergence and adjusting the learning rate. For that you should always track your cost at every iteration, and maybe even plot it.

If you run my example, the theta returned will look like this:

Iteration 99997 | Cost: 47883.706462
Iteration 99998 | Cost: 47883.706462
Iteration 99999 | Cost: 47883.706462
[ 29.25567368   1.01108458]

This is actually quite close to the equation calculated by Excel (y = x + 30). Note that since we passed the bias into the first column, the first theta value denotes the bias weight.
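As a quick sanity check on a run like the one above, the same fit can be computed in closed form and compared with the theta that gradient descent converges to. A minimal sketch, assuming the x and y arrays produced by genData above are still in scope:

import numpy as np

# Ordinary least squares solves min ||x @ theta - y||^2 directly, so its
# solution should match a well-converged gradient descent run.
theta_ls, *_ = np.linalg.lstsq(x, y, rcond=None)
print(theta_ls)  # should be close to the [ 29.25  1.01 ] reported above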

森林海

I know this question has already been answered, but I have made some updates to the GD function:

import numpy as np

### COST FUNCTION
def cost(theta, X, y):
    ### Evaluate half MSE (mean squared error)
    m = len(y)
    error = np.dot(X, theta) - y
    J = np.sum(error ** 2) / (2 * m)
    return J

def GD(X, y, theta, alpha):
    cost_histo = []
    theta_histo = []
    # an arbitrary gradient, to pass the initial while() check
    delta = [np.repeat(1, len(X))]
    # initial cost
    old_cost = cost(theta, X, y)
    while np.max(np.abs(delta)) > 1e-6:
        error = np.dot(X, theta) - y
        delta = np.dot(np.transpose(X), error) / len(y)
        trial_theta = theta - alpha * delta
        trial_cost = cost(trial_theta, X, y)
        # if the step increased the cost, shrink it until the cost drops
        while trial_cost >= old_cost:
            trial_theta = (theta + trial_theta) / 2
            trial_cost = cost(trial_theta, X, y)
        # record the accepted step
        cost_histo.append(trial_cost)
        theta_histo.append(trial_theta)
        old_cost = trial_cost
        theta = trial_theta
    Intercept = theta[0]
    Slope = theta[1]
    return [Intercept, Slope]

res = GD(X, y, theta, alpha)

This function shrinks the step size during the iterations, which makes it converge faster; see "Estimating linear regression with Gradient Descent (Steepest Descent)" for an example of the same idea in R. I applied the same logic in Python.
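For completeness, here is a hypothetical usage sketch of the GD function above; the data, learning rate, and column layout (bias term in the first column of X) are all made up for illustration:

import numpy as np

# Made-up data: y roughly follows 2*x + 5 with a little noise.
rng = np.random.default_rng(0)
x1 = np.linspace(0, 1, 20)
y = 5 + 2 * x1 + rng.normal(0, 0.1, size=x1.size)

# Design matrix with a bias column, zero initial theta, modest step size.
X = np.column_stack([np.ones_like(x1), x1])
intercept, slope = GD(X, y, np.zeros(2), alpha=0.5)
print(intercept, slope)  # expected to land near 5 and 2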