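I'm running a regression with LinearRegression and getting a mean squared error of 0. I'd expect some error (at least a small one). Can you explain why this happens?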
## Import packages
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import urllib.request
## Import dataset
urllib.request.urlretrieve('https://raw.githubusercontent.com/Data-Science-FMI/ml-from-scratch-2019/master/data/house_prices_train.csv',
'house_prices_train.csv')
df_train = pd.read_csv('house_prices_train.csv')
x = df_train['GrLivArea'].values.reshape(1, -1)  # shape (1, n_rows): a single row
y = df_train['SalePrice'].values.reshape(1, -1)  # shape (1, n_rows): a single row
print('The explanatory variable is', x)
print('The variable to be predicted is', y)
## Regression
reg = LinearRegression().fit(x, y)
mean_squared_error(y, reg.predict(x))
print('The MSE is', mean_squared_error(y, reg.predict(x)))
print('Predicted value is', reg.predict(x))
print('True value is', y)
The output is:
The explanatory variable is [[1710 1262 1786 ... 2340 1078 1256]]
The variable to be predicted is [[208500 181500 223500 ... 266500 142125 147500]]
The MSE is 0.0
Predicted value is [[208500. 181500. 223500. ... 266500. 142125. 147500.]]
True value is [[208500 181500 223500 ... 266500 142125 147500]]
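A likely explanation is the shape produced by `reshape(1, -1)`. scikit-learn reads `X` as `(n_samples, n_features)`, so a `(1, n)` array is one sample with `n` features, and a `(1, n)` `y` is one sample with `n` separate targets. With a single sample and an intercept, the model can simply reproduce that sample's targets, so the training MSE is 0. The sketch below is a hedged illustration using synthetic data (the `area`/`price` arrays and their values are hypothetical stand-ins, not the real CSV); it contrasts that layout with `reshape(-1, 1)`, which gives `n` samples with one feature.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Hypothetical synthetic stand-in for GrLivArea / SalePrice (not the real dataset)
rng = np.random.default_rng(0)
area = rng.uniform(500, 4000, size=200)
price = 100 * area + rng.normal(0, 20000, size=200)

# Layout from the question: reshape(1, -1) -> ONE sample, 200 features, 200 targets
x_wrong = area.reshape(1, -1)
y_wrong = price.reshape(1, -1)
print(x_wrong.shape, y_wrong.shape)  # (1, 200) (1, 200)

reg_wrong = LinearRegression().fit(x_wrong, y_wrong)
mse_wrong = mean_squared_error(y_wrong, reg_wrong.predict(x_wrong))
print("MSE with reshape(1, -1):", mse_wrong)  # the single sample is reproduced exactly

# Usual layout: reshape(-1, 1) -> 200 samples, 1 feature; the target can stay 1-D
x_right = area.reshape(-1, 1)
y_right = price
reg_right = LinearRegression().fit(x_right, y_right)
mse_right = mean_squared_error(y_right, reg_right.predict(x_right))
print("MSE with reshape(-1, 1):", mse_right)  # positive: the noise cannot be fitted by a line
print("Estimated slope:", reg_right.coef_[0])
```

Applied to the original code, this would mean `x = df_train['GrLivArea'].values.reshape(-1, 1)` and `y = df_train['SalePrice'].values`; the resulting MSE should then be nonzero.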
一只甜甜圈