斯蒂芬大帝
Let's create a trial dataset that reproduces your problem:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# 30 days of half-hourly readings with one sine cycle per day
t = np.linspace(0, 30*2*np.pi, 30*24*2)
td = pd.date_range("2020-01-01", freq='30min', periods=t.size)  # '30T' is deprecated in recent pandas

# Four correlated "sites": same daily cycle, different amplitude, offset and noise
T0 = np.sin(t)*8 - 15 + np.random.randn(t.size)*0.2
T1 = np.sin(t)*7 - 13 + np.random.randn(t.size)*0.1
T2 = np.sin(t)*9 - 10 + np.random.randn(t.size)*0.3
T3 = np.sin(t)*8.5 - 11 + np.random.randn(t.size)*0.5

T = np.vstack([T0, T1, T2, T3]).T
features = pd.DataFrame(T, columns=["s1", "s2", "s3", "s4"], index=td)
```

It looks like this:

```python
axe = features.loc[:"2020-01-04"].plot()
axe.legend()
axe.grid()
```

Then, if your time series are well linearly correlated, you can simply predict the missing values by means of an ordinary least squares regression. SciKit-Learn provides a convenient interface for this kind of computation:

```python
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Remove target site from features:
target = features.pop("s4")

# Split dataset into train (actual data) and test (missing temperatures):
x_train, x_test, y_train, y_test = train_test_split(features, target, train_size=0.25, random_state=123)

# Create a Linear Regressor and train it:
reg = linear_model.LinearRegression()
reg.fit(x_train, y_train)

# Assess regression score (R^2) with test data:
reg.score(x_test, y_test)  # 0.9926150729585087 (varies slightly, the noise is not seeded)

# Predict missing values (the split shuffles rows, so restore time order):
ypred = reg.predict(x_test)
ypred = pd.DataFrame(ypred, index=x_test.index, columns=["s4p"]).sort_index()
```

The result looks like this:

```python
axe = features.loc[:"2020-01-04"].plot()
target.loc[:"2020-01-04"].plot(ax=axe)
ypred.loc[:"2020-01-04"].plot(ax=axe, linestyle='None', marker='.')
axe.legend()
axe.grid()
```

```python
# Residuals on the held-out samples, sorted so the line plot follows time
error = (y_test - ypred.squeeze()).sort_index()
axe = error.plot()
axe.legend(["Prediction Error"])
axe.grid()
```
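With real data there is no random split to make: the missing temperatures are simply `NaN` in the target column. The sketch below shows one way the same regression might be applied in that situation. It assumes the gaps sit only in the target site and the other sites are complete. The helper `fill_missing`, the fixed seed and the one-day gap on 2020-01-10 are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Synthetic data shaped like the trial dataset (values are illustrative)
t = np.linspace(0, 30 * 2 * np.pi, 30 * 24 * 2)
index = pd.date_range("2020-01-01", freq="30min", periods=t.size)
df = pd.DataFrame(
    {
        "s1": np.sin(t) * 8 - 15 + rng.normal(scale=0.2, size=t.size),
        "s2": np.sin(t) * 7 - 13 + rng.normal(scale=0.1, size=t.size),
        "s3": np.sin(t) * 9 - 10 + rng.normal(scale=0.3, size=t.size),
        "s4": np.sin(t) * 8.5 - 11 + rng.normal(scale=0.5, size=t.size),
    },
    index=index,
)

# Hypothetical gap: drop one full day of readings at site s4
truth = df["s4"].copy()
df.loc["2020-01-10", "s4"] = np.nan


def fill_missing(frame, target, predictors):
    """Fill NaNs in `target` with an OLS fit trained on rows where it is known.

    Hypothetical helper: assumes the predictor columns contain no NaNs.
    """
    known = frame[target].notna()
    filled = frame[target].copy()
    if known.all():
        return filled  # nothing to fill
    model = LinearRegression()
    model.fit(frame.loc[known, predictors], frame.loc[known, target])
    filled[~known] = model.predict(frame.loc[~known, predictors])
    return filled


filled = fill_missing(df, "s4", ["s1", "s2", "s3"])

# Absolute error inside the gap, measured against the values we removed
gap_error = (filled - truth).loc["2020-01-10"].abs()
```

Because the known rows are copied through untouched, only the gap is replaced by model output. If the neighbouring sites also have gaps, those rows would need to be dropped or handled separately before calling `predict`.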