Remove rows with duplicates in a column, but only for duplicates that occur on subsequent consecutive days

The following approach works for any ordering of the data: sort by name and date, shift the date column, and check the delta.

    import pandas as pd

    data = {'Date': ['2020-07-21', '2020-04-25', '2020-04-24', '2020-04-25',
                     '2020-04-26', '2020-07-20', '2020-04-24'],
            'Name': ['John', 'John', 'John', 'Bob', 'John', 'John', 'Bob'],
            'Points': [7, 5, 3, 0, 8, 2, 0]}
    df = pd.DataFrame(data)
    print(df)

    df['Date'] = pd.to_datetime(df['Date'])
    df.sort_values(['Name', 'Date'], inplace=True)

    # Previous row's date minus this row's date is -1 days when this row
    # falls on the day right after the previous one; drop those rows.
    print(df[df['Date'].shift(1) - df['Date'] != '-1 days'])

Output:

    # print(df) - note: not sorted
             Date  Name  Points
    0  2020-07-21  John       7
    1  2020-04-25  John       5
    2  2020-04-24  John       3
    3  2020-04-25   Bob       0
    4  2020-04-26  John       8
    5  2020-07-20  John       2
    6  2020-04-24   Bob       0

    # print(df) - output
            Date  Name  Points
    6 2020-04-24   Bob       0
    2 2020-04-24  John       3
    5 2020-07-20  John       2

The following approach only works for the data order given in the question. I'll leave it here in case someone finds it helpful in the future; after the clarification, it is obsolete for this particular question.

Use shift to compare each row with the previous name:

    df = df[df['Name'].shift(1) != df['Name']]

Full example:

    import pandas as pd

    data = {'Date': ['2020-04-24', '2020-04-25', '2020-04-26', '2020-04-24',
                     '2020-04-25', '2020-04-20', '2020-04-21'],
            'Name': ['John', 'John', 'John', 'Bob', 'Bob', 'John', 'John'],
            'Points': [3, 5, 8, 0, 0, 2, 7]}
    df = pd.DataFrame(data)
    print(df)

    # Keep only the first row of each run of identical names.
    df = df[df['Name'].shift(1) != df['Name']]
    print(df)
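One caveat with the sort-and-shift approach: the shift runs across the whole sorted frame, so the first row of one name is compared with the last row of the previous name. If those two dates happen to be one day apart, that first row would be dropped too. A minimal sketch of a per-name variant follows, assuming that is not the desired behaviour. The sample data is hypothetical and chosen so that Bob's last date is one day before John's first date; `gap` and `result` are illustrative names.

```python
import pandas as pd

# Hypothetical sample data: Bob's last date (2020-04-23) is exactly one day
# before John's first date (2020-04-24).
data = {
    'Date': ['2020-07-21', '2020-04-25', '2020-04-24', '2020-04-23',
             '2020-04-26', '2020-07-20', '2020-04-22'],
    'Name': ['John', 'John', 'John', 'Bob', 'John', 'John', 'Bob'],
    'Points': [7, 5, 3, 0, 8, 2, 0],
}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values(['Name', 'Date'])

# Gap to the previous row of the same Name; NaT for each Name's first row.
gap = df.groupby('Name')['Date'].diff()

# Keep a row unless it falls exactly one day after the previous row
# for the same Name. NaT != Timedelta evaluates to True, so the first
# row of every Name is always kept.
result = df[gap != pd.Timedelta(days=1)]
print(result)
```

With this data the rows at index 6, 2 and 5 are kept: the first row of each name, plus John's 2020-07-20 row, which starts a new run after a gap.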
