如何合并 pandas 中的行以获得无序的列对?

我的数据框中的行如下-


     Team 1          Team 2         Winner       winner_count

0   Afghanistan     Australia     Australia             1

19    Australia   Afghanistan     Australia             1


我想将这两行合并为一行并对 Winner_count 求和,这样 -


     Team 1          Team 2         Winner       winner_count

0   Afghanistan     Australia     Australia             2


这应该是我的输出。


杨__羊羊
浏览 137回答 3
3回答

萧十郎

如果数据框中有很多行,我会尽量避免使用传递axis=1. s1 = np.where(df['Team 2'] > df['Team 1'], df['Team 1'], df['Team 2'])s2 = np.where(df['Team 2'] < df['Team 1'], df['Team 1'], df['Team 2'])df['Team 1'] = s1df['Team 2'] = s2df = df.groupby(['Team 1', 'Team 2', 'Winner'])['winner_count'].sum().reset_index()dfOut[1]:         Team 1     Team 2     Winner  winner_count0  Afghanistan  Australia  Australia             2

MYYA

试穿这款尺码:import pandas as pd# Initialise test dataframedf = pd.DataFrame({'Team 1': ['A', 'B'], 'Team 2': ['B', 'A'], 'winner_count': [1, 1]})# Get the smallest team name (alphabetically)df['First'] = df[['Team 1', 'Team 2']].min(axis=1)# Get the largest team name (alphabetically)df['Second'] = df[['Team 1', 'Team 2']].max(axis=1)# Groupby the teams to sum winner_countsdf = df[['First', 'Second', 'winner_count']].groupby(['First', 'Second']).sum().reset_index()这似乎是一个效率低下的解决方案,所以看看其他人是否想出更好的方法。

Smart猫小萌

IMO 该解决方案应该处理数据框中有两个以上团队的可能性,例如:df = pd.DataFrame({"Team1": ["Afghanistan", "Australia", "Australia", "Belgium", "Afghanistan"],&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;"Team2": ["Australia", "Afghanistan", "Afghanistan", "Afghanistan","Belgium"],&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;"Winner": ["Afghanistan", "Australia", "Australia", "Afghanistan", "Afghanistan"],&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;"winner_count": [1, 1, 1, 1, 1]&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;})我首先会通过新的“团队”列来识别每个团队组合:df.loc[:,"Teams"] = df.apply(lambda x: ", ".join(sorted([x["Team1"],x["Team2"]])), axis=1)然后可以按团队组合进行分组,并将获胜团队放在每个组的顶部:group = df.groupby(by=["Teams","Winner"]).agg({&nbsp; &nbsp; "Teams":"first",&nbsp; &nbsp; "Team1":"first",&nbsp; &nbsp; "Team2":"first",&nbsp; &nbsp; "winner_count":"sum"}).sort_values(by=["winner_count"], ascending=False)IMO 已经有了“团队”列,两个团队中哪一个是团队 1 或团队 2 并不重要。重复项将被删除:group.drop_duplicates(subset="Teams", inplace=True)还可以通过以下方式删除 MultiIndex ["Teams", "Winner"]:group.reset_index(level=1, drop=True, inplace=True)结果数据框组(没有索引“Teams”):+-------------+------------------------+-----------+-------------+--------------+|&nbsp; &nbsp;Winner&nbsp; &nbsp; |&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Teams&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |&nbsp; &nbsp;Team1&nbsp; &nbsp;|&nbsp; &nbsp; Team2&nbsp; &nbsp; | winner_count |+-------------+------------------------+-----------+-------------+--------------+|&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; || Australia&nbsp; &nbsp;| Afghanistan, Australia | Australia | Afghanistan |&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 2 || Afghanistan | Afghanistan, Belgium&nbsp; &nbsp;| Belgium&nbsp; &nbsp;| Afghanistan |&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 2 |+-------------+------------------------+-----------+-------------+--------------+
打开App,查看更多内容
随时随地看视频慕课网APP

相关分类

Python