Aggregate multiple columns with a single entry in the agg function

Suppose I have a pandas DataFrame (data_stores) similar to the following:


store | item1 | item2 | item3
------------------------------
1     | 45    | 50    | 53
1     | 200   | 300   | 250
2     | 20    | 17    | 21
2     | 300   | 350   | 400

Say I want to aggregate column item1 with mean, and columns item2 and item3 with sum.


This can normally be done as follows:


data_stores_total = data_stores.groupby(['store'], as_index=False).agg({'item1': 'mean', 'item2': 'sum', 'item3': 'sum'})
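For reference, here is a self-contained version of this standard approach, rebuilding the sample data from the table above. The expected values follow directly from the table: for store 1, item1 is mean(45, 200) = 122.5, item2 is 50 + 300 = 350 and item3 is 53 + 250 = 303.

```python
import pandas as pd

# Sample data reproduced from the table in the question
data_stores = pd.DataFrame({
    'store': [1, 1, 2, 2],
    'item1': [45, 200, 20, 300],
    'item2': [50, 300, 17, 350],
    'item3': [53, 250, 21, 400],
})

# One dict entry per column: column name -> aggregation function
data_stores_total = data_stores.groupby(['store'], as_index=False).agg(
    {'item1': 'mean', 'item2': 'sum', 'item3': 'sum'}
)
print(data_stores_total)
```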

However, it cannot be done in the following (more compact) way:


data_stores_total = data_stores.groupby(['store'], as_index=False).agg({'item1': 'mean', ['item2', 'item3']: 'sum'})
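This attempt fails before pandas is even involved: a Python dict key must be hashable, and a list is not, so building the dict literal itself raises a TypeError. A minimal check (the variable names are only for illustration):

```python
# A list cannot be used as a dict key because lists are mutable (unhashable)
error_message = None
try:
    spec = {'item1': 'mean', ['item2', 'item3']: 'sum'}
except TypeError as exc:
    # The message mentions the unhashable 'list' type
    error_message = str(exc)
print(error_message)
```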

nor in the following way, which would make more sense in terms of the dict keys:


data_stores_total = data_stores.groupby(['store'], as_index=False).agg({'mean': 'item1', 'sum': ['item2', 'item3']})
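Written as valid Python, this shape still does not work: agg interprets every key of a dict as a column name, so 'mean' and 'sum' are looked up as columns and pandas raises an error. In recent pandas versions this is a KeyError; the exact exception type may differ in older releases, so the sketch below catches any exception and only records its type.

```python
import pandas as pd

data_stores = pd.DataFrame({
    'store': [1, 1, 2, 2],
    'item1': [45, 200, 20, 300],
    'item2': [50, 300, 17, 350],
    'item3': [53, 250, 21, 400],
})

# agg() treats dict keys as column names, so 'mean' and 'sum' are not found
raised = None
try:
    data_stores.groupby(['store'], as_index=False).agg(
        {'mean': 'item1', 'sum': ['item2', 'item3']}
    )
except Exception as exc:  # KeyError in recent pandas; the type may vary by version
    raised = type(exc).__name__
print(raised)
```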

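Is there a way to aggregate several columns of a DataFrame with the same function, without writing a separate dict entry for each column in the agg call?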


ABOUTYOU

慕慕森

This is not possible directly. What you can do is define a dictionary with the functions as keys and column names (or lists of column names) as values, and then swap keys and values in a loop:

import pandas as pd

data_stores = pd.DataFrame({'store': [1, 1, 2, 2],
                            'item1': [45, 200, 20, 300],
                            'item2': [50, 300, 17, 350],
                            'item3': [53, 250, 21, 400]})
print(data_stores)
   store  item1  item2  item3
0      1     45     50     53
1      1    200    300    250
2      2     20     17     21
3      2    300    350    400

d = {'mean': 'item1', 'sum': ['item2', 'item3']}
out = {}
for k, v in d.items():
    # a list value maps several columns to the same function
    if isinstance(v, list):
        for x in v:
            out[x] = k
    else:
        out[v] = k
print(out)
{'item1': 'mean', 'item2': 'sum', 'item3': 'sum'}

data_stores_total = data_stores.groupby('store', as_index=False).agg(out)
print(data_stores_total)
   store  item1  item2  item3
0      1  122.5    350    303
1      2  160.0    367    421

Or, with a dict comprehension (here every value must be a list, even for a single column):

d = {'mean': ['item1'], 'sum': ['item2', 'item3']}
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
print(d1)
{'item1': 'mean', 'item2': 'sum', 'item3': 'sum'}

data_stores_total = data_stores.groupby('store', as_index=False).agg(d1)
print(data_stores_total)
   store  item1  item2  item3
0      1  122.5    350    303
1      2  160.0    367    421

Edit: if you want to aggregate all columns except a few with the same function, you can build the dictionary from all columns, filtered with difference against a list of excluded columns, and then add the missing column: aggregation function pairs:

# every column except the grouping key and item1 gets 'sum'
out = dict.fromkeys(data_stores.columns.difference(['store', 'item1']), 'sum')
out['item1'] = 'mean'
print(out)
{'item2': 'sum', 'item3': 'sum', 'item1': 'mean'}

data_stores_total = data_stores.groupby('store', as_index=False).agg(out)
print(data_stores_total)
   store  item2  item3  item1
0      1    350    303  122.5
1      2    367    421  160.0

You can also pass a custom function for this column:

def func(x):
    return x.sum() / x.mean()

out = dict.fromkeys(data_stores.columns.difference(['store', 'item1']), 'sum')
out['item1'] = func
print(out)
{'item2': 'sum', 'item3': 'sum', 'item1': <function func at 0x000000000F3950D0>}

data_stores_total = data_stores.groupby('store', as_index=False).agg(out)
print(data_stores_total)
   store  item2  item3  item1
0      1    350    303      2
1      2    367    421      2
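The key-swapping idea from the answer can be wrapped in a small reusable helper. The name invert_agg_spec below is hypothetical (it is not part of pandas); this is a sketch assuming each value in the spec is either a single column name or an iterable of column names. It raises if a column is assigned two functions, because the inverted dict could only keep one of them.

```python
import pandas as pd


def invert_agg_spec(spec):
    """Turn {func: column or [columns]} into the {column: func} dict agg() expects.

    Hypothetical helper: values may be a single column name or a list of names.
    """
    out = {}
    for func, cols in spec.items():
        # Normalise a single column name to a one-element list
        if isinstance(cols, str):
            cols = [cols]
        for col in cols:
            if col in out:
                # Silently overwriting would drop one of the requested aggregations
                raise ValueError(f"column {col!r} assigned more than one function")
            out[col] = func
    return out


data_stores = pd.DataFrame({
    'store': [1, 1, 2, 2],
    'item1': [45, 200, 20, 300],
    'item2': [50, 300, 17, 350],
    'item3': [53, 250, 21, 400],
})

spec = invert_agg_spec({'mean': 'item1', 'sum': ['item2', 'item3']})
print(spec)  # {'item1': 'mean', 'item2': 'sum', 'item3': 'sum'}

data_stores_total = data_stores.groupby('store', as_index=False).agg(spec)
print(data_stores_total)
```

Checking isinstance(cols, str) instead of isinstance(v, list) means a single column needs no wrapping list and tuples of column names also work.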