Pandas: Merge two rows within a group based on complex conditions

I have a df as follows:

import pandas as pd

df = pd.DataFrame({
    "ID": ['company A', 'company A', 'company A', 'company B','company B', 'company B', 'company C', 'company C','company C','company C', 'company D', 'company D','company D'],
    'Sender': [28, 'delete', 'flag_source', 56, 28, 312, 'delete', 'flag_source', 78, 102, 26, 101, 96],
    'Receiver': [129, 28, 'delete', 172, 56, 28, 61, 'delete', 12, 78, 98, 26, 101],
    'Date': ['2020-04-12', '2020-03-20', '2020-03-20', '2019-02-11', '2019-01-31', '2018-04-02', '2020-06-29', '2020-06-29', '2019-11-29', '2019-10-01', '2020-04-03', '2020-01-30', '2019-10-18'],
    'Sender_type': ['house', 'temp', 'house', 'house', 'house', 'house', 'temp', 'house', 'house','house','house', 'temp', 'house'],
    'Receiver_type': ['house', 'house', 'temp', 'house','house','house','house', 'temp', 'house','house','house','house','temp'],
    'Price': [32, 50, 47, 21, 23, 19, 52, 39, 12, 22, 61, 53, 19]
})

The first six rows look like this:


          ID       Sender  Receiver        Date  Sender_type  Receiver_type  Price
0  company A           28       129  2020-04-12        house          house     32
1  company A       delete        28  2020-03-20         temp          house     50  # combine this row with the one below
2  company A  flag_source    delete  2020-03-20        house           temp     47  # combine this row with the one above
3  company B           56       172  2019-02-11        house          house     21
4  company B           28        56  2019-01-31        house          house     23
5  company B          312        28  2018-04-02        house          house     19

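I would like to combine/merge two rows within each "ID" group (company x) by the following rule: merge the row whose "Sender" contains "flag_source" with the row directly above it into one new row. In this new row: Sender is flag_source, 'Receiver' is the value from the row above (both 'delete' values are dropped), Date is the date from the row above, Sender_type and Receiver_type are 'house', and 'Price' is the value from the row above. The two original rows are then removed. For example, for company A, rows 1 and 2 would be merged into the following new row: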


ID         Sender       Receiver  Date        Sender_type  Receiver_type  Price
company A  flag_source  28        2020-03-20  house        house          50
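The rule above can also be expressed without a loop. A minimal sketch, assuming each `flag_source` row sits directly below its `delete` row within the same ID (the layout shown in the table): mark the row above every `flag_source` row, overwrite its `Sender` and type columns, then drop the `flag_source` rows. The function name `merge_flag_rows` is hypothetical.

```python
import pandas as pd

# The DataFrame from the question
df = pd.DataFrame({
    "ID": ["company A"] * 3 + ["company B"] * 3 + ["company C"] * 4 + ["company D"] * 3,
    "Sender": [28, "delete", "flag_source", 56, 28, 312, "delete", "flag_source", 78, 102, 26, 101, 96],
    "Receiver": [129, 28, "delete", 172, 56, 28, 61, "delete", 12, 78, 98, 26, 101],
    "Date": ["2020-04-12", "2020-03-20", "2020-03-20", "2019-02-11", "2019-01-31", "2018-04-02",
             "2020-06-29", "2020-06-29", "2019-11-29", "2019-10-01", "2020-04-03", "2020-01-30", "2019-10-18"],
    "Sender_type": ["house", "temp", "house", "house", "house", "house", "temp", "house",
                    "house", "house", "house", "temp", "house"],
    "Receiver_type": ["house", "house", "temp", "house", "house", "house", "house", "temp",
                      "house", "house", "house", "house", "temp"],
    "Price": [32, 50, 47, 21, 23, 19, 52, 39, 12, 22, 61, 53, 19],
})


def merge_flag_rows(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fold each 'flag_source' row into the row directly above it.

    Assumes the row above a 'flag_source' row is its 'delete' partner.
    """
    is_flag = frame["Sender"].eq("flag_source")
    # True for the row just above a flag_source row, provided both rows share an ID
    above_flag = is_flag.shift(-1, fill_value=False) & frame["ID"].eq(frame["ID"].shift(-1))
    out = frame.copy()
    out.loc[above_flag, "Sender"] = "flag_source"
    out.loc[above_flag, ["Sender_type", "Receiver_type"]] = "house"
    # Receiver, Date and Price already hold the values from the row above,
    # so dropping the flag_source rows completes the merge.
    return out.loc[~is_flag].reset_index(drop=True)


result = merge_flag_rows(df)
print(result)
```

Because `Receiver`, `Date` and `Price` already come from the upper row, the sketch only needs to overwrite two kinds of values and filter; no row-by-row iteration is involved.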


幕布斯7119047
3 Answers

跃然一笑

import pandas as pd

df = pd.DataFrame({
    "ID": ['company A', 'company A', 'company A', 'company B','company B', 'company B', 'company C', 'company C','company C','company C', 'company D', 'company D','company D'],
    'Sender': [28, 'delete', 'flag_source', 56, 28, 312, 'delete', 'flag_source', 78, 102, 26, 101, 96],
    'Receiver': [129, 28, 'delete', 172, 56, 28, 61, 'delete', 12, 78, 98, 26, 101],
    'Date': ['2020-04-12', '2020-03-20', '2020-03-20', '2019-02-11', '2019-01-31', '2018-04-02', '2020-06-29', '2020-06-29', '2019-11-29', '2019-10-01', '2020-04-03', '2020-01-30', '2019-10-18'],
    'Sender_type': ['house', 'temp', 'house', 'house', 'house', 'house', 'temp', 'house', 'house','house','house', 'temp', 'house'],
    'Receiver_type': ['house', 'house', 'temp', 'house','house','house','house', 'temp', 'house','house','house','house','temp'],
    'Price': [32, 50, 47, 21, 23, 19, 52, 39, 12, 22, 61, 53, 19]
})

flaggedData = df[df["Sender"] == "flag_source"]
for i, row in flaggedData.iterrows():  # row is a row whose Sender is flag_source
    deleteRow = df.loc[i - 1]  # deleteRow is the row above it, whose Sender is delete
    combined = [row["ID"],
                row["Sender"],
                deleteRow["Receiver"],
                deleteRow["Date"],
                row["Sender_type"],
                deleteRow["Receiver_type"],
                deleteRow["Price"]]
    df.loc[i - 1] = combined  # replace the upper row with the merged values
    df = df.drop(index=i)  # drop the old flag_source row
df = df.reset_index(drop=True)  # reset the index for easier access later
print(df.loc[1])

I'm assuming every 'delete' row sits directly above its 'flag_source' row. If anything is still unclear, read the comments in the code and leave a comment with your question.
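The loop above depends on that assumption, and also on a default 0..n-1 index, since it looks up the row above by the label `i - 1`. A small check like the following can confirm the layout before running the loop. This is a sketch: the helper name `flags_follow_deletes` is hypothetical, and it compares each row with the one above it by position.

```python
import pandas as pd


def flags_follow_deletes(frame: pd.DataFrame) -> bool:
    """Hypothetical helper: True if every 'flag_source' row is directly preceded
    (by position) by a 'delete' row with the same ID and Date."""
    prev = frame.shift(1)  # the row above each row; the first row becomes NaN
    is_flag = frame["Sender"].eq("flag_source")
    ok = (
        prev.loc[is_flag, "Sender"].eq("delete")
        & prev.loc[is_flag, "ID"].eq(frame.loc[is_flag, "ID"])
        & prev.loc[is_flag, "Date"].eq(frame.loc[is_flag, "Date"])
    )
    return bool(ok.all())


# The company A rows from the question (only the columns the check needs)
sample = pd.DataFrame({
    "ID": ["company A"] * 3,
    "Sender": [28, "delete", "flag_source"],
    "Date": ["2020-04-12", "2020-03-20", "2020-03-20"],
})
print(flags_follow_deletes(sample))  # True
```

If the check fails, sorting the rows so that each `delete` row precedes its `flag_source` partner (and calling `reset_index(drop=True)`) would be needed before the loop gives the intended result.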

精慕HU

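If the delete/flag_source rows always share the same date, and no other rows exist for that ID + Date, you can use a groupby aggregation on ID and Date and avoid a long loop. If your data is not in the right order, you can always sort_values beforehand.

cols = df.columns
# 'first' takes the value from the delete row, 'last' from the flag_source row
new_df = df.groupby(['ID', 'Date']).aggregate({
    'Sender': 'last',
    'Receiver': 'first',
    'Sender_type': 'last',
    'Receiver_type': 'first',
    'Price': 'first'
}).reset_index()
# Reorder columns and rows as in the original data
new_df = new_df[cols].sort_values(['ID', 'Date'], ascending=[True, False])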
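The groupby approach is only correct under its stated precondition. A sketch of a check, assuming the precondition means each (ID, Date) group holds either a single row or exactly two rows ordered `delete` then `flag_source` (the order matters because `Receiver` and `Price` are taken with `first` and `Sender` with `last`). The helper name `groupby_merge_is_safe` is hypothetical.

```python
import pandas as pd


def groupby_merge_is_safe(frame: pd.DataFrame) -> bool:
    """Hypothetical helper: True if every (ID, Date) group has one row, or two rows
    whose first Sender is 'delete' and last Sender is 'flag_source'."""
    grouped = frame.groupby(["ID", "Date"])["Sender"]
    sizes, first, last = grouped.size(), grouped.first(), grouped.last()
    is_pair = sizes.eq(2)  # all three Series share the same (ID, Date) index
    return bool(
        sizes.le(2).all()
        and first[is_pair].eq("delete").all()
        and last[is_pair].eq("flag_source").all()
    )


# The company A and company B rows from the question (only the columns the check needs)
sample = pd.DataFrame({
    "ID": ["company A"] * 3 + ["company B"] * 3,
    "Sender": [28, "delete", "flag_source", 56, 28, 312],
    "Date": ["2020-04-12", "2020-03-20", "2020-03-20", "2019-02-11", "2019-01-31", "2018-04-02"],
})
print(groupby_merge_is_safe(sample))  # True
```

A `False` here suggests the aggregation would silently combine unrelated rows, so the loop or a shift-based approach would be the safer choice for that data.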

慕容3067478

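It looks like you only need to drop the second row of each pair and replace some values in the remaining row.

df = df[df.Receiver != 'delete'].copy()  # drop the flag_source row of each pair
merged = df.Sender == 'delete'  # the remaining row of each pair
df.loc[merged, 'Sender'] = 'flag_source'
df.loc[merged, 'Sender_type'] = 'house'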
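A note on replacing values in `Sender`: because the column mixes integers and strings (object dtype), the `.str` string methods and `Series.replace` behave differently there. This small sketch, using a few values from the question, shows the difference: `.str.replace` turns the integer senders into NaN, while `Series.replace` swaps only exact matches and leaves the integers alone.

```python
import pandas as pd

# A Sender-like column mixing integers and strings (object dtype), as in the question
sender = pd.Series([28, "delete", "flag_source", 56])

# .str methods operate on strings only; non-string elements come back as NaN
via_str = sender.str.replace("delete", "flag_source")

# Series.replace matches whole values, so the integers are left untouched
via_replace = sender.replace("delete", "flag_source")

print(via_str.tolist())
print(via_replace.tolist())
```

Assigning through a boolean mask with `.loc` avoids the issue as well, and also limits the change to the rows that actually belong to a delete/flag_source pair.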