莫回无
For anyone interested, I built a more complex example df to test the efficiency of each solution proposed above.

My original approach (the slowest here, but efficient when there are only a few groups):

    %%timeit
    # assumes `import pandas as pd` was run in an earlier cell (same for the cells below)
    df = pd.DataFrame({"column1": range(600), "column2": range(600),
                       "column3": range(600), "column4": range(600),
                       "column5": range(600), "column6": range(600),
                       "column7": range(600), "column8": range(600),
                       'group': 5*['l'+str(i) for i in range(120)],
                       'date': pd.date_range("20190101", periods=600)})

    # Set every row to the same date
    df.loc[:, 'date'] = df.loc[0, 'date']

    cols = ['column1','column2','column3','column4','column5','column6','column7','column8']
    newcols = ['col1','col2','col3','col4','col5','col6','col7','col8']
    if newcols[0] not in df.columns:
        df = df.reindex(columns=df.columns.tolist()+newcols)

    # Rolling sum per group via apply, then realign on the original row index
    df[newcols] = (df.groupby('group')
                     .apply(lambda x: x.rolling('2D', on='date')[cols].sum())
                     .sort_index(level=1)
                     .drop('date', axis=1)
                     .values)

    # timeit output
    345 ms ± 28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

David Erickson's solution. It is efficient when there are many groups with few observations in each:

    %%timeit
    df = pd.DataFrame({"column1": range(600), "column2": range(600),
                       "column3": range(600), "column4": range(600),
                       "column5": range(600), "column6": range(600),
                       "column7": range(600), "column8": range(600),
                       'group': 5*['l'+str(i) for i in range(120)],
                       'date': pd.date_range("20190101", periods=600)})

    # Set every row to the same date
    df.loc[:, 'date'] = df.loc[0, 'date']

    cols = ['column1','column2','column3','column4','column5','column6','column7','column8']
    newcols = ['col1','col2','col3','col4','col5','col6','col7','col8']
    if newcols[0] not in df.columns:
        df = df.reindex(columns=df.columns.tolist()+newcols)

    # Carry the original row index through the aggregation ("max") so the
    # result can be sorted back into the original row order
    my_dict = {}
    my_dict["index"] = "max"
    my_dict.update(dict.fromkeys(cols, "sum"))
    df[newcols] = (df.reset_index()
                     .groupby('group')
                     .rolling('2D', on='date')
                     .agg(my_dict)
                     .sort_values('index')
                     .drop('index', axis=1)
                     .values)

    # timeit output
    110 ms ± 11.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

The fastest solution, proposed by RichieV:

    %%timeit
    df = pd.DataFrame({"column1": range(600), "column2": range(600),
                       "column3": range(600), "column4": range(600),
                       "column5": range(600), "column6": range(600),
                       "column7": range(600), "column8": range(600),
                       'group': 5*['l'+str(i) for i in range(120)],
                       'date': pd.date_range("20190101", periods=600)})

    # Set every row to the same date
    df.loc[:, 'date'] = df.loc[0, 'date']

    cols = ['column1','column2','column3','column4','column5','column6','column7','column8']
    newcols = ['col1','col2','col3','col4','col5','col6','col7','col8']
    if newcols[0] not in df.columns:
        df = df.reindex(columns=df.columns.tolist()+newcols)

    # Sort by group and date so the rolling output lines up with df row by row
    df = df.sort_values(['group','date'], kind='mergesort').reset_index(drop=True)
    df[newcols] = df.groupby('group').rolling('2D', on='date')[cols].sum().values
    # Restore the original order (column1 is range(600) in this example)
    df = df.sort_values('column1', kind='mergesort').reset_index(drop=True)

    # timeit output
    40 ms ± 6.41 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
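RichieV's approach works because the frame is sorted by group and date first, so the rolling result can be assigned back by position. Here is a minimal sketch of that idea on a toy frame. The `value`, `value_2d` and `_order` names are made up for illustration, and `_order` is a helper column I am assuming in place of sorting on `column1`, for cases where no existing column records the original row order:

```python
import pandas as pd

# Toy data: two interleaved groups with a gap in the dates
df = pd.DataFrame({
    "value": [1, 2, 3, 4, 5, 6],
    "group": ["a", "b", "a", "b", "a", "b"],
    "date": pd.to_datetime([
        "2019-01-01", "2019-01-01",
        "2019-01-02", "2019-01-02",
        "2019-01-05", "2019-01-05",
    ]),
})

# Hypothetical helper column that remembers the original row order
df["_order"] = range(len(df))

# Stable sort by group and date so the grouped rolling output
# comes back in exactly the same row order as df
df = df.sort_values(["group", "date"], kind="mergesort").reset_index(drop=True)

# 2-day rolling sum within each group, assigned back by position
df["value_2d"] = df.groupby("group").rolling("2D", on="date")["value"].sum().values

# Restore the original row order and drop the helper column
df = (df.sort_values("_order", kind="mergesort")
        .drop(columns="_order")
        .reset_index(drop=True))

print(df)
```

The `.values` assignment ignores the index, so it is only safe when the sort order of `df` matches the order in which `groupby` emits the groups, which is why the sort comes first.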