Computing yhat for an OLS regression

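I have implemented a method to compute the betas of an OLS regression in Python. Now I want to score my model using R^2. For my assignment I am not allowed to use Python packages to do this, so I have to implement a method from scratch.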

# Load the data.
import numpy as np
import pandas as pd
from numpy.linalg import inv
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older version

boston = load_boston()

# Set the X and y variables.
X = boston.data
y = boston.target

# Prepend a column of ones to X for the intercept.
# (Named `ones` rather than `int` to avoid shadowing the built-in.)
ones = np.ones(shape=y.shape)[..., None]
X = np.concatenate((ones, X), 1)

# Compute the betas via the normal equations.
betas = inv(X.T.dot(X)).dot(X.T).dot(y)

# Extract the feature names of the Boston data set and prepend the intercept.
names = np.insert(boston.feature_names, 0, 'INT')

# Collect the results into a DataFrame for pretty printing.
results = pd.DataFrame({'coeffs': betas}, index=names)

# Print the results.
print(results)

             coeffs
INT       36.491103
CRIM      -0.107171
ZN         0.046395
INDUS      0.020860
CHAS       2.688561
NOX      -17.795759
RM         3.804752
AGE        0.000751
DIS       -1.475759
RAD        0.305655
TAX       -0.012329
PTRATIO   -0.953464
B          0.009393
LSTAT     -0.525467

Now I want to implement R^2 to score my model on this data (or any other data). (See: https://en.wikipedia.org/wiki/Coefficient_of_determination)
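For reference, the definition on the linked Wikipedia page can be written as follows. Here \hat{y}_i is the model's prediction for observation i and \bar{y} is the mean of the observed values. The labels SS_res and SS_tot follow that page; the residual sum of squares is the quantity this post calls SSE.

```latex
R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}, \qquad
SS_{\mathrm{res}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
SS_{\mathrm{tot}} = \sum_{i=1}^{n} (y_i - \bar{y})^2
```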

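My problem is that I'm not entirely sure how to compute the numerator, SSE. In code it would look like this:

# numerator
sse = sum((y - yhat) ** 2)

where y is the Boston house prices and yhat is the predicted prices of those houses. But how do I compute the term yhat?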
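One way to approach this, as a minimal sketch rather than a definitive answer: in OLS the fitted values are the design matrix multiplied by the estimated coefficients, so yhat can be obtained as X.dot(betas), provided X already contains the column of ones. The example below uses synthetic data so that it runs without the Boston data set; the names rng, features, true_betas, sst and r2 are made up for illustration and do not come from the post.

```python
import numpy as np

# Synthetic stand-in data (an assumption for illustration; not the Boston set).
rng = np.random.default_rng(0)
n, p = 200, 3
features = rng.normal(size=(n, p))
true_betas = np.array([2.0, 1.0, -3.0, 0.5])  # intercept first

# Design matrix with a leading column of ones, as in the question.
X = np.concatenate((np.ones((n, 1)), features), axis=1)
y = X.dot(true_betas) + rng.normal(scale=0.1, size=n)

# Same normal-equation estimate as in the question.
betas = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

# Predicted values: each row of X times the coefficient vector.
yhat = X.dot(betas)

# R^2 = 1 - SSE / SST
sse = np.sum((y - yhat) ** 2)      # residual sum of squares
sst = np.sum((y - y.mean()) ** 2)  # total sum of squares around the mean
r2 = 1.0 - sse / sst
print(r2)
```

With the variables from the question, the same two lines (yhat = X.dot(betas), followed by the SSE and SST sums) should apply unchanged, since X there already includes the intercept column.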

慕村9548890
