I have a DataFrame containing "log" information about who is working, which task they are doing, and when they started working:
index | Entrance time | Name | Last name | Employee_ID | Task
--------------------------------------------------------------------
0 |2000-01-01 00:00:00 | John | Fischer | 001 | Maintenance
1 |2000-01-01 00:04:30 | John | Fischer | 001 | Development
2 |2000-01-01 00:04:30 | Bob | Conrad | 002 | Maintenance
3 |2000-01-01 00:10:00 | Mary | Smith | 003 | Multitasking
4 |2000-01-01 00:09:30 | John | Fischer | 001 | Maintenance
5 |2000-01-01 00:15:30 | John | Fischer | 001 | Maintenance
6 |2000-01-02 00:04:30 | Bob | Conrad | 002 | Maintenance
7 |2000-01-02 00:10:00 | Mary | Smith | 003 | Multitasking
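I want to remove a row as a duplicate when it has the same name and task as another row and its entrance time is less than 10 minutes away from that row's entrance time. The resulting DataFrame should therefore be: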
index | Entrance time | Name | Last name | Employee_ID | Task
--------------------------------------------------------------------
0 |2000-01-01 00:00:00 | John | Fischer | 001 | Maintenance
1 |2000-01-01 00:04:30 | John | Fischer | 001 | Development
2 |2000-01-01 00:04:30 | Bob | Conrad | 002 | Maintenance
3 |2000-01-01 00:10:00 | Mary | Smith | 003 | Multitasking
5 |2000-01-01 00:15:30 | John | Fischer | 001 | Maintenance
6 |2000-01-02 00:04:30 | Bob | Conrad | 002 | Maintenance
7 |2000-01-02 00:10:00 | Mary | Smith | 003 | Multitasking
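Judging from the expected output, row 5 (00:15:30) is kept even though it is only 6 minutes after row 4 (00:09:30). This suggests that each row should be compared with the last row that was *kept* for the same person and task, not simply with the row before it. A minimal sketch of that reading, assuming the grouping keys are "Name", "Last name" and "Task" (as in the drop_duplicates call below) and that "Entrance time" can be parsed as datetimes. The helper name drop_near_duplicates is hypothetical:

```python
import pandas as pd

# Sample data reconstructed from the input table above
df = pd.DataFrame({
    "Entrance time": pd.to_datetime([
        "2000-01-01 00:00:00", "2000-01-01 00:04:30", "2000-01-01 00:04:30",
        "2000-01-01 00:10:00", "2000-01-01 00:09:30", "2000-01-01 00:15:30",
        "2000-01-02 00:04:30", "2000-01-02 00:10:00",
    ]),
    "Name": ["John", "John", "Bob", "Mary", "John", "John", "Bob", "Mary"],
    "Last name": ["Fischer", "Fischer", "Conrad", "Smith",
                  "Fischer", "Fischer", "Conrad", "Smith"],
    "Employee_ID": ["001", "001", "002", "003", "001", "001", "002", "003"],
    "Task": ["Maintenance", "Development", "Maintenance", "Multitasking",
             "Maintenance", "Maintenance", "Maintenance", "Multitasking"],
})


def drop_near_duplicates(df, keys, time_col, window=pd.Timedelta(minutes=10)):
    """Hypothetical helper: keep a row only if it starts at least `window`
    after the last *kept* row with the same `keys` (greedy, in time order)."""
    keep = pd.Series(False, index=df.index)
    # Walk each (Name, Last name, Task) group in chronological order
    for _, group in df.sort_values(time_col).groupby(keys, sort=False):
        last_kept = None
        for idx, t in group[time_col].items():
            # First entry of a group, or far enough from the last kept entry
            if last_kept is None or t - last_kept >= window:
                keep.loc[idx] = True
                last_kept = t
    # Boolean indexing returns the kept rows in their original order
    return df[keep]


result = drop_near_duplicates(df, ["Name", "Last name", "Task"], "Entrance time")
print(result)
```

With the sample data this keeps index labels 0, 1, 2, 3, 5, 6 and 7, matching the expected table. The explicit loop is used because a vectorized diff() against the previous row would also drop row 5; it is linear in the number of rows, though slower than pure pandas operations on very large logs.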
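I have tried drop_duplicates(subset=["Name", "Last name", "Task"]), but I don't know how to apply the time condition so that each row is compared with the others.
I hope you can help me. Thanks in advance!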
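To illustrate why drop_duplicates alone is not enough, here is a small sketch using a hypothetical three-row subset of the log (John Fischer's "Maintenance" entries, index labels 0, 4 and 5). It also shows that comparing each row only with the previous row of its group, via groupby(...).diff(), does not reproduce the expected result either:

```python
import pandas as pd

# Hypothetical subset of the log: John Fischer's three "Maintenance" entries
df = pd.DataFrame(
    {
        "Entrance time": pd.to_datetime(
            ["2000-01-01 00:00:00", "2000-01-01 00:09:30", "2000-01-01 00:15:30"]
        ),
        "Name": ["John"] * 3,
        "Last name": ["Fischer"] * 3,
        "Task": ["Maintenance"] * 3,
    },
    index=[0, 4, 5],
)
keys = ["Name", "Last name", "Task"]

# drop_duplicates ignores time entirely: only the first row per key survives
plain = df.drop_duplicates(subset=keys)
print(plain.index.tolist())

# Gap to the previous row in the same group (NaT for the first row)
gap = df.sort_values("Entrance time").groupby(keys)["Entrance time"].diff()
# Row 5 is only 6 minutes after row 4, so this filter drops it as well
naive = df[gap.isna() | (gap >= pd.Timedelta(minutes=10))]
print(naive.index.tolist())
```

Both approaches print [0], whereas the desired output keeps rows 0 and 5 for this subset.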
芜湖不芜