I have a CSV like this:
+-----+---------+-----------+------------+
| ID | version | Name | State |
+-----+---------+-----------+------------+
| 101 | 1 | Nut | In-Transit |
| 101 | 1 | Nut | Cancelled |
| 101 | 1 | Nut | Delivered |
| 101 | 2 | Nut 2.0 | In-Transit |
| 102 | 1 | Screw | Shipped |
| 102 | 1 | Screw | In-Transit |
| 102 | 2 | Screw 2.0 | Shipped |
| 102 | 2 | Screw 2.0 | Cancelled |
+-----+---------+-----------+------------+
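For each combination of ID and version, I want to keep only the row with the highest state among all its available states, based on the priority below.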
My custom order (highest priority first):
Delivered
In-Transit
Shipped
Cancelled
Expected output:
+-----+---------+-----------+------------+
| ID | version | Name | State |
+-----+---------+-----------+------------+
| 101 | 1 | Nut | Delivered |
| 101 | 2 | Nut 2.0 | In-Transit |
| 102 | 1 | Screw | In-Transit |
| 102 | 2 | Screw 2.0 | Shipped |
+-----+---------+-----------+------------+
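One way to express this priority rule is to map each state to a numeric rank and, per (ID, version) group, keep the row with the smallest rank. This is a minimal sketch, not the asker's method: it assumes the columns are named exactly `ID`, `version`, `Name`, `State` as in the table, builds the sample rows inline instead of reading the CSV, and uses a hypothetical `PRIORITY` dict for the ranking.

```python
import pandas as pd

# Sample rows mirroring the input table (built inline instead of pd.read_csv)
df = pd.DataFrame({
    "ID": [101, 101, 101, 101, 102, 102, 102, 102],
    "version": [1, 1, 1, 2, 1, 1, 2, 2],
    "Name": ["Nut", "Nut", "Nut", "Nut 2.0", "Screw", "Screw", "Screw 2.0", "Screw 2.0"],
    "State": ["In-Transit", "Cancelled", "Delivered", "In-Transit",
              "Shipped", "In-Transit", "Shipped", "Cancelled"],
})

# Hypothetical priority map: a lower number means a higher priority
PRIORITY = {"Delivered": 0, "In-Transit": 1, "Shipped": 2, "Cancelled": 3}

# Rank every row by its state
rank = df["State"].map(PRIORITY)

# For each (ID, version) group, take the index label of the lowest-ranked row
best_idx = rank.groupby([df["ID"], df["version"]]).idxmin()

# Select those rows; groups come out sorted by (ID, version)
result = df.loc[best_idx].reset_index(drop=True)
print(result)
```

States missing from `PRIORITY` map to NaN, which `idxmin` skips by default, so unexpected values will not be picked as the "best" state.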
I tried the code below, but it did not work. I am new to Python and not sure how to fix it.
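import pandas as pd
mydata = pd.read_csv('C:/Mypython/Newyork', encoding="ISO-8859-1")
# Column names are case-sensitive: use 'State' and 'version' as in the CSV header
mydata['State'] = pd.Categorical(mydata['State'], ["Delivered", "In-Transit", "Shipped", "Cancelled"], ordered=True)
# Highest-priority state first, then keep the first row per (ID, version)
result = mydata.sort_values('State').drop_duplicates(['ID', 'version'], keep='first')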
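A self-contained sketch of the same categorical-sort idea follows, assuming the headers are exactly `ID`, `version`, `Name`, `State` as shown in the table (pandas column names are case-sensitive, so the names in the code must match the CSV header exactly). The sample rows are built inline rather than read from the file, and the `order` list is an illustrative name.

```python
import pandas as pd

# Sample rows mirroring the input table (built inline instead of pd.read_csv)
df = pd.DataFrame({
    "ID": [101, 101, 101, 101, 102, 102, 102, 102],
    "version": [1, 1, 1, 2, 1, 1, 2, 2],
    "Name": ["Nut", "Nut", "Nut", "Nut 2.0", "Screw", "Screw", "Screw 2.0", "Screw 2.0"],
    "State": ["In-Transit", "Cancelled", "Delivered", "In-Transit",
              "Shipped", "In-Transit", "Shipped", "Cancelled"],
})

# Ordered categorical: the category order is the priority (first = highest)
order = ["Delivered", "In-Transit", "Shipped", "Cancelled"]
df["State"] = pd.Categorical(df["State"], categories=order, ordered=True)

# Sort by group, then by priority, so each group's best state comes first;
# drop_duplicates then keeps only that first row per (ID, version)
result = (
    df.sort_values(["ID", "version", "State"])
      .drop_duplicates(subset=["ID", "version"], keep="first")
      .reset_index(drop=True)
)
print(result)
```

Sorting on `ID` and `version` as well as `State` is not required for correctness, but it keeps the output in the same row order as the expected table.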