
Negative accuracy of linear regression

The coefficient of determination R² of my linear regression model is negative.


How can this happen? Any ideas would be helpful.


Here is my dataset:


year,population

1960,22151278.0

1961,22671191.0

1962,23221389.0

1963,23798430.0

1964,24397022.0

1965,25013626.0

1966,25641044.0

1967,26280132.0

1968,26944390.0

1969,27652709.0

1970,28415077.0

1971,29248643.0

1972,30140804.0

1973,31036662.0

1974,31861352.0

1975,32566854.0

1976,33128149.0

1977,33577242.0

1978,33993301.0

1979,34487799.0

1980,35141712.0

1981,35984528.0

1982,36995248.0

1983,38142674.0

1984,39374348.0

1985,40652141.0

1986,41965693.0

1987,43329231.0

1988,44757203.0

1989,46272299.0

1990,47887865.0

1991,49609969.0

1992,51423585.0

1993,53295566.0

1994,55180998.0

1995,57047908.0

1996,58883530.0

1997,60697443.0

1998,62507724.0

1999,64343013.0

2000,66224804.0

2001,68159423.0

2002,70142091.0

2003,72170584.0

2004,74239505.0

2005,76346311.0

2006,78489206.0

2007,80674348.0

2008,82916235.0

2009,85233913.0

2010,87639964.0

2011,90139927.0

2012,92726971.0

2013,95385785.0

2014,98094253.0

2015,100835458.0

2016,103603501.0

2017,106400024.0

2018,109224559.0

Here is the code for my LinearRegression model:


import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

data = pd.read_csv("data.csv", header=None)
data = data.drop(0, axis=0)

X = data[0]
Y = data[1]

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1, shuffle=False)

lm = LinearRegression()
lm.fit(X_train.values.reshape(-1, 1), Y_train.values.reshape(-1, 1))
Y_pred = lm.predict(X_test.values.reshape(-1, 1))

accuracy = lm.score(Y_test.values.reshape(-1, 1), Y_pred)
print(accuracy)

Output:

-3592622948027972.5


Asked by 喵喵时光机 · Viewed 112 · 2 Answers

慕盖茨4494581

Here is the formula for the R² score:

    R² = 1 - SUM((y_i - ŷ_i)²) / SUM((y_i - ȳ)²)

where ŷ_i is the predicted value for the i-th observation y_i, and ȳ is the mean of all observations. A negative R² therefore means that if someone knew the mean of your y_test sample and always used it as the "prediction", that "prediction" would be more accurate than your model.

Moving on to your dataset (thanks to @Prayson W. Daniel for the convenient loading script), let us take a quick look at the data:

df.population.plot()

It looks like a logarithmic transformation could help.

import numpy as np
df_log = df.copy()
df_log.population = np.log(df.population)
df_log.population.plot()

Now let us perform a linear regression using OpenTURNS.

import openturns as ot
sam = ot.Sample(np.array(df_log))  # convert the DataFrame to an openturns Sample
sam.setDescription(['year', 'logarithm of the population'])
linreg = ot.LinearModelAlgorithm(sam[:, 0], sam[:, 1])
linreg.run()
linreg_result = linreg.getResult()
coeffs = linreg_result.getCoefficients()
print("Best fitting line = {} + year * {}".format(coeffs[0], coeffs[1]))
print("R2 score = {}".format(linreg_result.getRSquared()))
ot.VisualTest_DrawLinearModel(sam[:, 0], sam[:, 1], linreg_result)

Output:

Best fitting line = -38.35148311467912 + year * 0.028172928802559845
R2 score = 0.9966261033648469

This is an almost exact fit.

EDIT

As suggested by @Prayson W. Daniel, here is the model fit once transformed back to the original scale.

# Get the original data in openturns Sample format
orig_sam = ot.Sample(np.array(df))
orig_sam.setDescription(df.columns)

# Compute the prediction in the original scale
predicted = ot.Sample(orig_sam)  # start by copying the original data
predicted[:, 1] = np.exp(linreg_result.getMetaModel()(predicted[:, 0]))  # overwrite with the predicted values
error = np.array((predicted - orig_sam)[:, 1])  # compute the error
r2 = 1.0 - (error**2).mean() / df.population.var()  # compute the R2 score in the original scale
print("R2 score in original scale = {}".format(r2))

# Plot the model
graph = ot.Graph("Original scale", "year", "population", True, '')
curve = ot.Curve(predicted)
graph.add(curve)
points = ot.Cloud(orig_sam)
points.setColor('red')
graph.add(points)
graph

Output:

R2 score in original scale = 0.9979032805107133
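The mean-baseline interpretation of R² can be checked directly with scikit-learn's r2_score; a minimal sketch with made-up numbers (not the question's dataset):

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])

# Always predicting the mean of y_true gives R² = 0 by definition.
mean_pred = np.full_like(y_true, y_true.mean())
print(r2_score(y_true, mean_pred))  # 0.0

# Predictions that fit worse than the mean give a negative R².
bad_pred = np.array([4.0, 3.0, 2.0, 1.0])
print(r2_score(y_true, bad_pred))   # -3.0
```

Here SS_tot = 5 and the reversed predictions give SS_res = 20, so R² = 1 - 20/5 = -3: any model worse than the constant-mean baseline scores below zero, with no lower bound.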

繁华开满天机

Scikit-learn's LinearRegression score uses the R² score. A negative R² means that the model fits your data very badly. Since R² compares the fit of the model against that of the null hypothesis (a horizontal straight line), R² is negative whenever the model fits worse than a horizontal line.

    R² = 1 - (SUM((y - ypred)**2) / SUM((y - AVG(y))**2))

So if SUM((y - ypred)**2) is larger than SUM((y - AVG(y))**2), R² will be negative.

Reasons and how to correct them

Problem 1: You are performing a random split of time-series data. A random split ignores the time dimension.
Solution: preserve the flow of time (see the code below).

Problem 2: The target values are too large.
Solution: unless we use tree-based models, you will have to do some target feature engineering to scale the data into a range the model can learn.

Here is a code example. Using the default parameters of LinearRegression and a log|exp transformation of the target values, my attempt yields an R² score of about 87%:

import pandas as pd
import numpy as np
# We need to transform/feature-engineer our target.
# I will use log from numpy: np.log and np.exp make the values learnable.
from sklearn.linear_model import LinearRegression
from sklearn.compose import TransformedTargetRegressor

# your data, df
# transform year to a reference year
df = df.assign(ref_year=lambda x: x.year - 1960)
df.population = df.population.astype(int)

split = int(df.shape[0] * .9)  # split at 90%, 10%-ish
df = df[['ref_year', 'population']]
train_df = df.iloc[:split]
test_df = df.iloc[split:]

X_train = train_df[['ref_year']]
y_train = train_df.population
X_test = test_df[['ref_year']]
y_test = test_df.population

# regressor
regressor = LinearRegression()
lr = TransformedTargetRegressor(
    regressor=regressor,
    func=np.log, inverse_func=np.exp)
lr.fit(X_train, y_train)
print(lr.score(X_test, y_test))

For those interested in improving this, here is a way to read the dataset:

import pandas as pd
import io

df = pd.read_csv(io.StringIO('''year,population
1960,22151278.0
1961,22671191.0
1962,23221389.0
1963,23798430.0
1964,24397022.0
1965,25013626.0
1966,25641044.0
1967,26280132.0
1968,26944390.0
1969,27652709.0
1970,28415077.0
1971,29248643.0
1972,30140804.0
1973,31036662.0
1974,31861352.0
1975,32566854.0
1976,33128149.0
1977,33577242.0
1978,33993301.0
1979,34487799.0
1980,35141712.0
1981,35984528.0
1982,36995248.0
1983,38142674.0
1984,39374348.0
1985,40652141.0
1986,41965693.0
1987,43329231.0
1988,44757203.0
1989,46272299.0
1990,47887865.0
1991,49609969.0
1992,51423585.0
1993,53295566.0
1994,55180998.0
1995,57047908.0
1996,58883530.0
1997,60697443.0
1998,62507724.0
1999,64343013.0
2000,66224804.0
2001,68159423.0
2002,70142091.0
2003,72170584.0
2004,74239505.0
2005,76346311.0
2006,78489206.0
2007,80674348.0
2008,82916235.0
2009,85233913.0
2010,87639964.0
2011,90139927.0
2012,92726971.0
2013,95385785.0
2014,98094253.0
2015,100835458.0
2016,103603501.0
2017,106400024.0
2018,109224559.0'''))
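The two fixes above (a chronological split plus a log/exp target transform) can be seen working together on synthetic, exactly exponential data, where the log transform makes the relationship perfectly linear. The growth constants below are illustrative, not fitted to the question's dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.compose import TransformedTargetRegressor

# Synthetic population-like series with exact exponential growth,
# so log(pop) is an exactly linear function of the year.
years = np.arange(1960, 2019).reshape(-1, 1)
pop = 2.2e7 * np.exp(0.027 * (years.ravel() - 1960))

# Chronological split: train on the first 90%, test on the last 10%.
split = int(len(years) * 0.9)
X_train, X_test = years[:split], years[split:]
y_train, y_test = pop[:split], pop[split:]

# Fit a linear model on log(y), predict with exp to return to the
# original scale.
lr = TransformedTargetRegressor(regressor=LinearRegression(),
                                func=np.log, inverse_func=np.exp)
lr.fit(X_train, y_train)
print(lr.score(X_test, y_test))  # ≈ 1.0 on exactly exponential data
```

On real data the fit is not exact, but the same pipeline lifts the held-out R² from large-and-negative to the ~0.87 reported in this answer.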