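Python groupby: "nested dictionary is ambiguous in aggregation"

I am currently working on my thesis and have run into a problem with a groupby I want to perform. For each person I want to find the total purchase amount, the average purchase amount, the number of purchases, the total number of products bought, and the average value per product.


The data looks like this: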


        id  purchase_amount  price_products  #_products
    0  123            30              20.00           2
    2  123            NaN             10.00         NaN
    3  124            50.00           25.00           3
    4  124            NaN             15.00         NaN
    5  124            NaN             10.00         NaN

My code looks like this:


    df.groupby('id')[['purchase_amount', 'price_products', '#_products']].agg(
        total_purchase_amount=('purchase_amount', 'sum'),
        average_purchase_amount=('purchase_amount', 'mean'),
        times_purchased=('#_products', 'count'),
        total_amount_products_purchased=('price_products', 'count'),
        average_value_products=('price_products', 'mean'))

But I get the following error:


    SpecificationError: nested dictionary is ambiguous in aggregation


I can't figure out what I am doing wrong. I hope someone can help me!


繁星淼淼
3 Answers

素胚勾勒不出你

You can use a dictionary to organize the aggregations:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame([[123, 30, 20, 2],
                       [123, np.nan, 10, np.nan],
                       [124, 50, 25, 3],
                       [124, np.nan, 15, np.nan],
                       [124, np.nan, 10, np.nan]],
                      columns=['id', 'purchase_amount', 'price_products', 'num_products'])

    # Map each column to the list of functions to apply to it.
    agg_dict = {
        'purchase_amount': [np.sum, np.mean],
        'num_products': [np.count_nonzero],
        'price_products': [np.count_nonzero, np.mean],
    }

    print(df.groupby('id').agg(agg_dict))

Output:

        purchase_amount        num_products price_products
                    sum  mean count_nonzero  count_nonzero       mean
    id
    123            30.0  30.0           2.0              2  15.000000
    124            50.0  50.0           3.0              3  16.666667
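
One caveat about the dictionary above: np.count_nonzero treats NaN as nonzero, so it counts rows rather than non-missing values. The following is a minimal sketch on the same sample frame that swaps in the string aliases 'sum', 'mean' and 'count'. Using 'count' here is my assumption about what the question intends, since 'count' skips NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[123, 30, 20, 2],
     [123, np.nan, 10, np.nan],
     [124, 50, 25, 3],
     [124, np.nan, 15, np.nan],
     [124, np.nan, 10, np.nan]],
    columns=['id', 'purchase_amount', 'price_products', 'num_products'],
)

# Same dictionary layout, but with pandas' string aliases.
agg_dict = {
    'purchase_amount': ['sum', 'mean'],
    'num_products': ['count'],            # 'count' ignores NaN
    'price_products': ['count', 'mean'],
}
result = df.groupby('id').agg(agg_dict)

# NaN compares as "not equal to zero", so count_nonzero includes it.
nan_counted = np.count_nonzero(np.array([2.0, np.nan]))

print(result)
print(nan_counted)
```

The result still has two-level columns, so a single statistic is addressed with a tuple such as result[('num_products', 'count')].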

慕的地6264312

Since you have several variables to aggregate, I suggest the following form of aggregation:

    df.groupby('id')[<variables-list>].agg([<statistics-list>])

For example:

    df_agg = df.groupby('id')[['purchase_amount', 'price_products', '#_products']].agg(["count", "mean", "sum"])

This creates an output DataFrame, df_agg, with multi-level columns that looks like this:

        purchase_amount             price_products                #_products
                  count  mean   sum          count       mean sum      count mean  sum
    id
    123               1  30.0  30.0              2  15.000000  30          1  2.0  2.0
    124               1  50.0  50.0              3  16.666667  50          1  3.0  3.0

You can then refer to specific entries in the output DataFrame through the multi-level index, like this:

    df_agg['purchase_amount']['mean']

    id
    123    30.0
    124    50.0
    Name: mean, dtype: float64

Or, if you want all the means, use the cross-section method xs():

    df_agg.xs('mean', axis=1, level=1)

         purchase_amount  price_products  #_products
    id
    123             30.0       15.000000         2.0
    124             50.0       16.666667         3.0

Note: admittedly, the code above makes Python compute more statistics than needed, as is the case in your example. In some situations that is not a problem, and it has the advantage that the code is shorter and generalizes to any set and number of numeric (integer and float) variables to aggregate.
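
If only some of those statistics are needed under flat names, one option is to pick the relevant (column, statistic) pairs out of the multi-level result and rename them. The sketch below assumes the sample data from the question; the `wanted` mapping is a hypothetical helper of mine, and its target names are taken from the question's code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [123, 123, 124, 124, 124],
    'purchase_amount': [30, np.nan, 50, np.nan, np.nan],
    'price_products': [20, 10, 25, 15, 10],
    '#_products': [2, np.nan, 3, np.nan, np.nan],
})

# Compute every statistic for every column (multi-level columns).
df_agg = df.groupby('id')[['purchase_amount', 'price_products', '#_products']].agg(
    ['count', 'mean', 'sum']
)

# Hypothetical mapping: (column, statistic) -> flat output name.
wanted = {
    ('purchase_amount', 'sum'): 'total_purchase_amount',
    ('purchase_amount', 'mean'): 'average_purchase_amount',
    ('#_products', 'count'): 'times_purchased',
    ('price_products', 'count'): 'total_amount_products_purchased',
    ('price_products', 'mean'): 'average_value_products',
}

# Select the wanted pairs, then replace the tuples with flat names.
summary = df_agg[list(wanted)].copy()
summary.columns = [wanted[col] for col in summary.columns]

print(summary)
```

This keeps the short, general aggregation call and pays for it with some unused statistics, which is the trade-off described in the note above.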

米琪卡哇伊

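Do this for each of the calculations:

    df.groupby('id')['purchase_amount'].agg({'total_purchase_amount': 'sum'})

Note that this dict-based renaming was deprecated and, as far as I know, raises a SpecificationError in pandas 1.0 and later. On those versions the keyword form agg(total_purchase_amount='sum') gives the same result.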
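
For comparison, here is a sketch of the keyword ("named aggregation") form that computes all five values in one call. To my knowledge this syntax needs pandas 0.25 or later, so an older installation is one possible cause of the error in the question. That is an assumption, as the question does not state a version. The data is the sample from the question.

```python
import numpy as np
import pandas as pd

print(pd.__version__)  # named aggregation is assumed to need 0.25 or later

df = pd.DataFrame({
    'id': [123, 123, 124, 124, 124],
    'purchase_amount': [30, np.nan, 50, np.nan, np.nan],
    'price_products': [20.0, 10.0, 25.0, 15.0, 10.0],
    '#_products': [2, np.nan, 3, np.nan, np.nan],
})

# Each keyword is an output column; each value is (input column, function).
summary = df.groupby('id').agg(
    total_purchase_amount=('purchase_amount', 'sum'),
    average_purchase_amount=('purchase_amount', 'mean'),
    times_purchased=('#_products', 'count'),
    total_amount_products_purchased=('price_products', 'count'),
    average_value_products=('price_products', 'mean'),
)

print(summary)
```

The result has one row per id and flat column names, so no renaming step is needed afterwards.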