慕桂英546537
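In your case, each named aggregation in agg takes exactly one column as its source, so it cannot look at is_main_video while it sums file_size. Instead, create another column before the groupby that holds file_size for main videos and 0 otherwise, and sum that column:

    import numpy as np

    df['New'] = np.where(df['is_main_video'], df['file_size'], 0)
    summary_df = df.groupby(['provider', 'id']).agg(
        title=('title', 'first'),
        file_size=('New', 'sum'),
    ).reset_index()

Update: the same thing without adding a column to df itself, using assign:

    summary_df = (
        df.assign(New=np.where(df['is_main_video'], df['file_size'], 0))
          .groupby(['provider', 'id'])
          .agg(title=('title', 'first'), file_size=('New', 'sum'))
          .reset_index()
    )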
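A minimal, self-contained run of the assign-based version, assuming the same sample data used in the other answer plus a hypothetical `id` column (the question's real `id` values are not shown in this thread), might look like this:

```python
import numpy as np
import pandas as pd

# Sample data; the "id" column is hypothetical, added only so that
# groupby(['provider', 'id']) has something to group on.
df = pd.DataFrame({
    "provider": ["A", "A", "A", "B", "B"],
    "id": [1, 1, 1, 2, 2],
    "title": ["hello", "world", "pandas", "example", "here"],
    "is_main_video": [True, False, True, True, False],
    "file_size": [10, 12, 20, 19, 10],
})

# Keep file_size only for main videos (0 otherwise), then sum per group.
summary_df = (
    df.assign(New=np.where(df["is_main_video"], df["file_size"], 0))
      .groupby(["provider", "id"])
      .agg(title=("title", "first"), file_size=("New", "sum"))
      .reset_index()
)
print(summary_df)
```

Because the non-main sizes are replaced with 0 rather than NaN, file_size stays an integer column (30 for A, 19 for B).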
猛跑小猪
You can use Series.where to temporarily "ignore" the file_size values where is_main_video is False (they become NaN), and then run the groupby to sum what remains. sum skips NaN, which is also why file_size comes out as a float:

    import pandas as pd

    df = pd.DataFrame({
        "provider": ["A", "A", "A", "B", "B"],
        "title": ["hello", "world", "pandas", "example", "here"],
        "is_main_video": [True, False, True, True, False],
        "file_size": [10, 12, 20, 19, 10],
    })
    print(df)

      provider    title  is_main_video  file_size
    0        A    hello           True         10
    1        A    world          False         12
    2        A   pandas           True         20
    3        B  example           True         19
    4        B     here          False         10

    aggregated_df = (
        df.assign(file_size=df["file_size"].where(df["is_main_video"]))
          .groupby("provider", as_index=False)
          .agg(
              title=("title", "first"),
              file_size=("file_size", "sum"),
          )
    )
    print(aggregated_df)

      provider    title  file_size
    0        A    hello       30.0
    1        B  example       19.0
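If integer sizes are preferred over the 30.0/19.0 floats, one option (a sketch, not part of the original answer) is to pass `0` as the `other` argument of `Series.where`, so no NaN is introduced and the column keeps its integer dtype. This makes it effectively equivalent to the np.where approach:

```python
import pandas as pd

df = pd.DataFrame({
    "provider": ["A", "A", "A", "B", "B"],
    "title": ["hello", "world", "pandas", "example", "here"],
    "is_main_video": [True, False, True, True, False],
    "file_size": [10, 12, 20, 19, 10],
})

# other=0 replaces non-main sizes with 0 instead of NaN,
# so file_size stays int64 through the groupby sum.
aggregated_df = (
    df.assign(file_size=df["file_size"].where(df["is_main_video"], 0))
      .groupby("provider", as_index=False)
      .agg(title=("title", "first"), file_size=("file_size", "sum"))
)
print(aggregated_df)
```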