Why are the linear regression predictions exactly equal to the true values?


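I am fitting a regression with LinearRegression and getting a mean squared error of 0. I would expect at least some small deviation. Can you explain what is going on?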

## Import packages
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import urllib.request

## Import dataset
urllib.request.urlretrieve('https://raw.githubusercontent.com/Data-Science-FMI/ml-from-scratch-2019/master/data/house_prices_train.csv',
                           'house_prices_train.csv')
df_train = pd.read_csv('house_prices_train.csv')
x = df_train['GrLivArea'].values.reshape(1, -1)
y = df_train['SalePrice'].values.reshape(1, -1)
print('The explanatory variable is', x)
print('The variable to be predicted is', y)

## Regression
reg = LinearRegression().fit(x, y)
mean_squared_error(y, reg.predict(x))
print('The MSE is', mean_squared_error(y, reg.predict(x)))
print('Predicted value is', reg.predict(x))
print('True value is', y)

The output is:

The explanatory variable is [[1710 1262 1786 ... 2340 1078 1256]]
The variable to be predicted is [[208500 181500 223500 ... 266500 142125 147500]]
The MSE is 0.0
Predicted value is [[208500. 181500. 223500. ... 266500. 142125. 147500.]]
True value is [[208500 181500 223500 ... 266500 142125 147500]]

白猪掌柜的


1 Answer

一只甜甜圈

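The comments pointing out that a model's score on its own training set is inflated are certainly correct, but that alone is unlikely to give a perfect fit with linear regression, especially with only one feature. Your problem is that you reshaped the data incorrectly: reshape(1, -1) produces an array of shape (1, n), so the model thinks it has a single sample with n features and n outputs, and a multi-output linear regression fits that one sample perfectly. Use reshape(-1, 1) for x instead, which gives n samples with one feature each, and do not reshape y at all.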
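To make the shape issue concrete, here is a minimal sketch that compares the two reshapes side by side. It assumes synthetic data in place of the house price CSV: the `area` and `price` arrays, the slope, the intercept, and the noise level are all made up for illustration and are not taken from the real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for GrLivArea / SalePrice (hypothetical values, not the real data).
rng = np.random.default_rng(0)
area = rng.uniform(500, 4000, size=200)
price = 100.0 * area + 20000.0 + rng.normal(0, 25000, size=200)

# Wrong: shape (1, n) is read as ONE sample with n features and n outputs.
x_bad = area.reshape(1, -1)
y_bad = price.reshape(1, -1)
reg_bad = LinearRegression().fit(x_bad, y_bad)
mse_bad = mean_squared_error(y_bad, reg_bad.predict(x_bad))

# Right: shape (n, 1) is read as n samples with one feature; y stays 1-D.
x_good = area.reshape(-1, 1)
y_good = price
reg_good = LinearRegression().fit(x_good, y_good)
mse_good = mean_squared_error(y_good, reg_good.predict(x_good))

print('x_bad shape:', x_bad.shape, 'MSE:', mse_bad)
print('x_good shape:', x_good.shape, 'MSE:', mse_good)
```

With the (1, n) layout the model only has to reproduce a single row, so the error collapses to zero. With the (n, 1) layout the noisy points cannot all lie on one line, and the MSE comes out clearly positive. Printing `x.shape` before calling `fit` is a cheap way to catch this kind of mistake.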
