Multiple aggregations of the same column using pandas GroupBy.agg()

Given the following (totally overkill) data frame example:


import datetime as dt

import numpy as np  # needed for np.random.randn and np.repeat below
import pandas as pd

df = pd.DataFrame({
    "date":    [dt.date(2012, x, 1) for x in range(1, 11)],
    "returns": 0.05 * np.random.randn(10),
    "dummy":   np.repeat(1, 10),
})

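Is there an existing built-in way to apply two different aggregating functions to the same column, without having to call agg multiple times?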


The syntactically wrong, but intuitively right, way to do it would be:


# Assume `function1` and `function2` are defined for aggregating.

df.groupby("dummy").agg({"returns":function1, "returns":function2})

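Obviously, Python doesn't allow duplicate keys. Is there any other way of expressing the input to agg? Perhaps a list of tuples [(column, function)] would work better, to allow multiple functions to be applied to the same column? But agg seems to accept only a dictionary.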
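As a side note on why the dictionary form cannot work: a dict literal with a repeated key is accepted by Python, but only the last value for that key survives, so agg would only ever see one function. A minimal sketch, using the string aliases "sum" and "mean" as stand-ins for the question's function1 and function2 (an assumption made only for illustration):

```python
# A dict literal with a duplicated key is valid Python, but the later
# value silently replaces the earlier one.
# "sum" and "mean" are stand-ins for function1 and function2.
spec = {"returns": "sum", "returns": "mean"}

print(spec)       # {'returns': 'mean'}
print(len(spec))  # 1
```

So the first aggregation is lost before pandas ever receives the argument.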


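Is there a workaround for this besides defining an auxiliary function that just applies both of the functions inside of it? (How would that work with aggregation anyway?)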


holdtom
Views: 5544, Answers: 3
3 Answers

慕森王

pandas >= 0.25: Named Aggregation

pandas has changed the behavior of GroupBy.agg in favor of a more intuitive syntax for specifying named aggregations. See the Enhancements section of the 0.25 docs, as well as the relevant GitHub issues GH18366 and GH26512.

From the documentation:

To support column-specific aggregation with control over the output column names, pandas accepts a special syntax in GroupBy.agg(), known as "named aggregation", where the keywords are the output column names and the values are tuples whose first element is the column to select and whose second element is the aggregation to apply to that column. pandas provides the pandas.NamedAgg namedtuple with the fields ['column', 'aggfunc'] to make it clearer what the arguments are. As usual, the aggregation can be a callable or a string alias.

You can now pass a tuple via keyword arguments. The tuples follow the format (<colName>, <aggFunc>).

import pandas as pd

pd.__version__
# '0.25.0.dev0+840.g989f912ee'

# Setup
df = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                   'height': [9.1, 6.0, 9.5, 34.0],
                   'weight': [7.9, 7.5, 9.9, 198.0]})

df.groupby('kind').agg(
    max_height=('height', 'max'), min_weight=('weight', 'min'),)

      max_height  min_weight
kind
cat          9.5         7.9
dog         34.0         7.5

Alternatively, you can use pd.NamedAgg (essentially a namedtuple), which makes things more explicit.

df.groupby('kind').agg(
    max_height=pd.NamedAgg(column='height', aggfunc='max'),
    min_weight=pd.NamedAgg(column='weight', aggfunc='min'))

      max_height  min_weight
kind
cat          9.5         7.9
dog         34.0         7.5

It is even simpler for a Series: just pass the aggfunc as a keyword argument.

df.groupby('kind')['height'].agg(max_height='max', min_height='min')

      max_height  min_height
kind
cat          9.5         9.1
dog         34.0         6.0

Lastly, if your column names aren't valid Python identifiers, use a dictionary with unpacking:

df.groupby('kind')['height'].agg(**{'max height': 'max', ...})

pandas < 0.25

In more recent versions of pandas up to 0.24, if you use a dictionary to specify column names for the aggregation output, you will get a FutureWarning:

df.groupby('dummy').agg({'returns': {'Mean': 'mean', 'Sum': 'sum'}})
# FutureWarning: using a dict with renaming is deprecated and will be removed
# in a future version

Using dictionaries for renaming columns is deprecated as of v0.20. On more recent versions of pandas, this can be specified more simply by passing a list of tuples. If you specify the functions this way, all functions for that column need to be specified as tuples of (name, function) pairs.

df.groupby("dummy").agg({'returns': [('op1', 'sum'), ('op2', 'mean')]})

        returns
            op1       op2
dummy
1      0.328953  0.032895

Or,

df.groupby("dummy")['returns'].agg([('op1', 'sum'), ('op2', 'mean')])

            op1       op2
dummy
1      0.328953  0.032895
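To tie the named-aggregation syntax back to the question's own data frame, here is a minimal sketch that applies several aggregations to the single "returns" column in one agg call. It assumes pandas >= 0.25. The seeded generator, the output column names, and the helper value_range are hypothetical choices made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Seeded generator so the run is reproducible (an assumption; the
# question used unseeded np.random.randn).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "returns": 0.05 * rng.standard_normal(10),
    "dummy": np.repeat(1, 10),
})


def value_range(s):
    """Hypothetical custom aggregation: spread between max and min."""
    return s.max() - s.min()


# Three aggregations of the same column, each with its own output name.
result = df.groupby("dummy").agg(
    returns_sum=("returns", "sum"),
    returns_mean=("returns", "mean"),
    returns_range=("returns", value_range),
)
print(result)
```

Because each keyword is a distinct output name, the duplicate-key problem from the question does not arise, and string aliases and callables can be mixed freely.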

慕运维8079593

Would something like this work:

In [7]: df.groupby('dummy').returns.agg({'func1' : lambda x: x.sum(), 'func2' : lambda x: x.prod()})
Out[7]:
              func2     func1
dummy
1     -4.263768e-16 -0.188565
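A caveat for readers on current pandas: as far as I know, passing a renaming dictionary to agg on a selected column was deprecated in 0.20 and, from pandas 1.0 onward, raises a SpecificationError ("nested renamer is not supported"). A rough equivalent using keyword arguments might look like the sketch below. It assumes pandas >= 0.25, uses a seeded generator for reproducibility, and swaps the lambdas for the string aliases "sum" and "prod", which compute the same quantities.

```python
import numpy as np
import pandas as pd

# Seeded data in the shape of the question's frame (seed is an assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "returns": 0.05 * rng.standard_normal(10),
    "dummy": np.repeat(1, 10),
})

# Keyword names become the output column names; the values are the
# aggregations applied to the selected "returns" column.
result = df.groupby("dummy")["returns"].agg(func1="sum", func2="prod")
print(result)
```

The output columns appear in the order the keywords were given, here func1 then func2.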