Fill missing years in Pandas when grouping by multiple columns, and display multiple columns horizontally in order

Use `DataFrame.swaplevel` together with `DataFrame.sort_index`. I also added another solution with `reindex`, which inserts the missing years:

```python
import pandas as pd

# every year in the target range: 2015, 2016, 2017
rng = pd.date_range('2015', '2017', freq='YS').year
c = df['city'].unique()
d = df['district'].unique()
# all city x district x year combinations
mux = pd.MultiIndex.from_product([c, d, rng], names=['city','district','year'])
# combinations missing from the data become NaN rows
df = df.set_index(['city','district','year']).reindex(mux)

# percent change per (city, district), in year order
df['pct'] = df.sort_values('year').groupby(['city', 'district']).value.pct_change()
df = df.pivot_table(columns='year',
                    index=['city','district'],
                    values=['value', 'pct'],
                    fill_value='NaN')
# move year to the outer column level and sort by it
df = df.swaplevel(0,1, axis=1).sort_index(axis=1, level=0)
print (df)
# year          2015       2016        2017      
#                pct value  pct value   pct value
# city district                                  
# bj   c         NaN   4.0  0.0   NaN -0.25     3
# sh   a         NaN   2.0  0.5     3  0.00   NaN
#      b         NaN   5.0 -0.4     3  0.00   NaN
```

Edit:

The error `ValueError: cannot handle a non-unique multi-index!` means that some `['city','district','year']` combinations occur more than once (in the data below, `sh`/`a`/`2015` appears twice), so the `MultiIndex` built from these columns is not unique and `reindex` cannot align it. The solution is to create unique values first, e.g. by aggregating with `mean`:

```python
print (df)
#   city district  value  year
# 0   sh        a      2  2015
# 0   sh        a     20  2015
# 1   sh        a      3  2016
# 2   sh        b      5  2015
# 3   sh        b      3  2016
# 4   bj        c      4  2015
# 5   bj        c      3  2017

rng = pd.date_range('2015', '2017', freq='YS').year
c = df['city'].unique()
d = df['district'].unique()
mux = pd.MultiIndex.from_product([c, d, rng], names=['city','district','year'])

# one row per (city, district, year) after averaging the duplicates
print (df.groupby(['city','district','year'])['value'].mean())
# city  district  year
# bj    c         2015     4
#                 2017     3
# sh    a         2015    11
#                 2016     3
#       b         2015     5
#                 2016     3
# Name: value, dtype: int64

df = df.groupby(['city','district','year'])['value'].mean().reindex(mux)
print (df)
# city  district  year
# sh    a         2015    11.0
#                 2016     3.0
#                 2017     NaN
#       b         2015     5.0
#                 2016     3.0
#                 2017     NaN
#       c         2015     NaN
#                 2016     NaN
#                 2017     NaN
# bj    a         2015     NaN
#                 2016     NaN
#                 2017     NaN
#       b         2015     NaN
#                 2016     NaN
#                 2017     NaN
#       c         2015     4.0
#                 2016     NaN
#                 2017     3.0
# Name: value, dtype: float64
```
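The steps of this answer (deduplicate, reindex the years, compute the percent change, reshape horizontally) can be combined into one self-contained sketch. It goes beyond the answer in a few labeled ways: the input is the sample frame from the edit (with the duplicated `sh`/`a`/`2015` row); only the `(city, district)` pairs that actually occur are expanded to every year, instead of the full `from_product` grid; `unstack` is used in place of `pivot_table`; and the forward fill that `pct_change` applied by default is done explicitly with `ffill` and `shift`, because relying on that default fill is deprecated in recent pandas releases. The names `years`, `pairs`, `filled` and `wide` are illustrative only.

```python
import pandas as pd

# Sample data from the edit, including the duplicated (sh, a, 2015) row
df = pd.DataFrame({
    'city': ['sh', 'sh', 'sh', 'sh', 'sh', 'bj', 'bj'],
    'district': ['a', 'a', 'a', 'b', 'b', 'c', 'c'],
    'value': [2, 20, 3, 5, 3, 4, 3],
    'year': [2015, 2015, 2016, 2015, 2016, 2015, 2017],
})
keys = ['city', 'district', 'year']
years = range(2015, 2018)  # assumed target range: 2015..2017

# 1. Collapse duplicate keys with mean so the MultiIndex is unique
s = df.groupby(keys)['value'].mean()

# 2. Expand every observed (city, district) pair to all years; gaps become NaN
pairs = s.index.droplevel('year').unique()
mux = pd.MultiIndex.from_tuples(
    [(c, d, y) for c, d in pairs for y in years], names=keys)
s = s.reindex(mux)

# 3. Percent change per group: forward-fill inside each group,
#    then divide by the previous year's (filled) value
g = ['city', 'district']
filled = s.groupby(level=g).ffill()
pct = filled / filled.groupby(level=g).shift() - 1

# 4. Year as the outer column level, 'pct'/'value' as the inner level
wide = (pd.DataFrame({'pct': pct, 'value': s})
          .unstack('year')
          .swaplevel(0, 1, axis=1)
          .sort_index(axis=1, level=0))
print(wide)
```

With the explicit forward fill, `bj`/`c` gets a `pct` of `0.0` for 2016 and `-0.25` for 2017, the same values as in the first output of the answer; the `value` cells for missing years stay real `NaN` floats rather than the string `'NaN'`.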
