Putting data into a DataFrame gives different results in scikit-learn algorithms


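I just noticed that scikit-learn's linear regression gives somewhat different results when the data is loaded into a pandas DataFrame than when it is used in its raw state as a NumPy array.

I don't understand why this happens.

Consider the following linear regression example: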

import pandas as pd
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older version
from sklearn.linear_model import LinearRegression

boston = load_boston()

X1 = pd.DataFrame(boston.data)
X1.columns = boston.feature_names
X2 = boston.data

y2 = boston.target
y1 = boston.target

lreg = LinearRegression()

# Standardize both inputs
X1 = (X1 - X1.mean()) / X1.std()
X2 = (X2 - X2.mean()) / X2.std()

The resulting models give the same R^2 value and the same predictions, but very different coefficients and intercepts.

To demonstrate:

intcpt1 = lreg.fit(X1, y1).intercept_

intcpt2 = lreg.fit(X2, y2).intercept_

f"Intercept for model with dataframe: {intcpt1}, model with numpy array: {intcpt2}"

This gives:

'Intercept for model with dataframe: 22.53280632411069, model with numpy array: -941.8009906279219'

The coefficients are very different as well:

coef1 = lreg.fit(X1, y1).coef_[:3]

coef2 = lreg.fit(X2, y2).coef_[:3]

f"First 3 coeffs for model with dataframe: {coef1}, model with numpy array: {coef2}"

This gives:

'First 3 coeffs for model with dataframe: [-0.92906457 1.08263896 0.14103943], model with numpy array: [-15.67844685 6.73818665 2.98419849]'

But the scores and predictions are the same:

score1 = lreg.fit(X1, y1).score(X1, y1)

score2 = lreg.fit(X2, y2).score(X2, y2)

f"Score for model with dataframe: {score1}, model with numpy array: {score2}"

This yields:

'Score for model with dataframe: 0.7406426641094094, model with numpy array: 0.7406426641094073'

Likewise for the predictions:

pred1 = lreg.fit(X1, y1).predict(X1)[:3]

pred2 = lreg.fit(X2, y2).predict(X2)[:3]

f"First 3 predictions with dataframe: {pred1}, with numpy array: {pred2}"

This gives:

'First 3 predictions with dataframe: [30.00384338 25.02556238 30.56759672], with numpy array: [30.00384338 25.02556238 30.56759672]'

哔哔one


1 Answer

动漫人物

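This is because of your transformation:

X1 = (X1 - X1.mean()) / X1.std()
X2 = (X2 - X2.mean()) / X2.std()

pandas computes the mean and standard deviation along the columns, so each feature is standardized separately. Without an axis argument, NumPy's mean() and std() reduce over the whole array and return a single scalar, so every feature is shifted and scaled by the same two numbers. To get the per-column behavior in NumPy, add the axis argument to mean and std:

X2 = (X2 - X2.mean(axis=0)) / X2.std(axis=0)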
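As a rough illustration of the point above, here is a minimal sketch on synthetic data rather than the Boston dataset. The data, the column names `a`, `b`, `c`, and the helpers `fit_ols` and `predict` are all made up for this example, and `np.linalg.lstsq` is swapped in for `LinearRegression` to keep it dependency-light. It also hints at why the scores and predictions matched in the question: both transformations are invertible affine rescalings of the features, so a linear model with an intercept can absorb either one into its coefficients and intercept. One extra detail: pandas' `std()` defaults to `ddof=1` while NumPy's defaults to `ddof=0`, so the sketch passes `ddof=1` to reproduce the pandas result exactly.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (hypothetical): 3 features on very different scales.
rng = np.random.default_rng(0)
X_arr = rng.normal(loc=[0.0, 50.0, 300.0], scale=[1.0, 10.0, 100.0], size=(200, 3))
y = X_arr @ np.array([2.0, -0.5, 0.01]) + rng.normal(scale=0.1, size=200)
X_df = pd.DataFrame(X_arr, columns=["a", "b", "c"])

# pandas reduces per column; NumPy without axis reduces over the whole array.
print(X_df.mean().shape)       # (3,)
print(np.ndim(X_arr.mean()))   # 0

def fit_ols(X, y):
    """Ordinary least squares with an intercept, via np.linalg.lstsq."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]

def predict(X, intercept, coef):
    """Apply a fitted linear model to X."""
    return intercept + X @ coef

# What the question did for the NumPy array: one global mean and std.
X_global = (X_arr - X_arr.mean()) / X_arr.std()
# Per-column standardization; ddof=1 matches the pandas default.
X_cols = (X_arr - X_arr.mean(axis=0)) / X_arr.std(axis=0, ddof=1)
# What the question did for the DataFrame.
X_pd = ((X_df - X_df.mean()) / X_df.std()).to_numpy()

b0_g, coef_g = fit_ols(X_global, y)
b0_c, coef_c = fit_ols(X_cols, y)
b0_p, coef_p = fit_ols(X_pd, y)

print("global scaling coefs:    ", coef_g)
print("per-column scaling coefs:", coef_c)
print("pandas scaling coefs:    ", coef_p)
```

With `ddof=1` the per-column NumPy version should agree with the pandas version, while the globally scaled version produces different coefficients but the same fitted values. If you use `X2.std(axis=0)` with the NumPy default `ddof=0` instead, expect the coefficients to differ from the pandas ones by a small constant factor.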
