I have a dataframe with the following data:
InsuranceId InsuranceStatus Date
0 Ins1234 DuePayment 2020-06-07 23:59:43.123456+00:00
1 Ins1234 Successful 2019-06-07 23:59:43.123456+00:00
2 Ins1234 Successful 2018-06-07 23:59:43.123456+00:00
3 Ins5678 DuePayment 2020-07-07 22:59:32.123421+00:00
4 Ins5678 Successful 2019-07-07 22:59:32.123421+00:00
5 Ins5678 Successful 2018-07-07 22:59:32.123421+00:00
I'm trying to create a row number/rank within each InsuranceId group, ordered by Date so that max(Date) comes first, using:
df['RowNum'] = df.groupby('InsuranceId')['InsuranceStatus']['Date'].rank(method="first", ascending=True)
and
df['RowNum'] = df.groupby(by=['InsuranceId'])['InsuranceStatus']['Date'].transform(lambda x: x.rank())
referencing the question "SQL-like window functions in PANDAS: Row Numbering in Python Pandas Dataframe", but I get:
Error: Index Error: Columns status already selected
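The error comes from chaining two column selections after groupby: df.groupby('InsuranceId')['InsuranceStatus'] has already selected one column, so indexing that result again with ['Date'] is rejected. A minimal sketch of a fix, assuming the goal is to rank Date within each InsuranceId group with the newest date numbered 1: select only the Date column and use ascending=False (ascending=True, as in the attempts above, would number the oldest row 1). The sample frame is rebuilt from the table above, and parsing Date with pd.to_datetime is an assumption about how the real column is stored.

```python
import pandas as pd

# Sample data from the question; Date parsed into tz-aware (UTC) datetimes
df = pd.DataFrame({
    "InsuranceId": ["Ins1234"] * 3 + ["Ins5678"] * 3,
    "InsuranceStatus": ["DuePayment", "Successful", "Successful"] * 2,
    "Date": pd.to_datetime([
        "2020-06-07 23:59:43.123456+00:00",
        "2019-06-07 23:59:43.123456+00:00",
        "2018-06-07 23:59:43.123456+00:00",
        "2020-07-07 22:59:32.123421+00:00",
        "2019-07-07 22:59:32.123421+00:00",
        "2018-07-07 22:59:32.123421+00:00",
    ]),
})

# Select a single column after groupby; chaining ['InsuranceStatus']['Date']
# is what raises the "already selected" IndexError.
# ascending=False gives the most recent Date in each group rank 1;
# method="first" breaks ties by order of appearance.
df["RowNum"] = (
    df.groupby("InsuranceId")["Date"]
      .rank(method="first", ascending=False)
      .astype(int)  # rank returns floats; convert to whole row numbers
)
print(df)
```

Because method="first" gives tied rows distinct numbers in order of appearance, it behaves like SQL ROW_NUMBER() rather than RANK().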
I'm trying to get the following output:
InsuranceId InsuranceStatus Date RowNum
0 Ins1234 DuePayment 2020-06-07 23:59:43.123456+00:00 1
1 Ins1234 Successful 2019-06-07 23:59:43.123456+00:00 2
2 Ins1234 Successful 2018-06-07 23:59:43.123456+00:00 3
3 Ins5678 DuePayment 2020-07-07 22:59:32.123421+00:00 1
4 Ins5678 Successful 2019-07-07 22:59:32.123421+00:00 2
5 Ins5678 Successful 2018-07-07 22:59:32.123421+00:00 3
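An alternative that mirrors SQL ROW_NUMBER() OVER (PARTITION BY InsuranceId ORDER BY Date DESC) more literally is to sort first and then number the rows inside each group with cumcount. This is a swapped-in technique, not the one from the referenced question, sketched under the same assumption that Date is parsed into datetimes:

```python
import pandas as pd

# Sample data from the question; Date parsed into tz-aware (UTC) datetimes
df = pd.DataFrame({
    "InsuranceId": ["Ins1234"] * 3 + ["Ins5678"] * 3,
    "InsuranceStatus": ["DuePayment", "Successful", "Successful"] * 2,
    "Date": pd.to_datetime([
        "2020-06-07 23:59:43.123456+00:00",
        "2019-06-07 23:59:43.123456+00:00",
        "2018-06-07 23:59:43.123456+00:00",
        "2020-07-07 22:59:32.123421+00:00",
        "2019-07-07 22:59:32.123421+00:00",
        "2018-07-07 22:59:32.123421+00:00",
    ]),
})

# PARTITION BY InsuranceId ORDER BY Date DESC: sort newest-first within each id
ordered = df.sort_values(["InsuranceId", "Date"], ascending=[True, False])

# cumcount numbers rows 0, 1, 2, ... inside each group; +1 makes it 1-based.
# Assignment aligns on the index, so df keeps its original row order.
df["RowNum"] = ordered.groupby("InsuranceId").cumcount() + 1
print(df)
```

Since the result is assigned back by index, this works even when the original frame is not already sorted by InsuranceId and Date.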
Is there anything I need to add? Any suggestions are welcome.
The final output I need:
InsuranceId InsuranceStatus Date
Ins1234 DuePayment 2020-06-07 23:59:43.123456+00:00
Ins5678 DuePayment 2020-07-07 22:59:32.123421+00:00
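The final output keeps only the most recent row per InsuranceId. A minimal sketch, assuming Date is parsed into datetimes as in the table above: sort newest-first and keep the first row seen for each InsuranceId with drop_duplicates.

```python
import pandas as pd

# Sample data from the question; Date parsed into tz-aware (UTC) datetimes
df = pd.DataFrame({
    "InsuranceId": ["Ins1234"] * 3 + ["Ins5678"] * 3,
    "InsuranceStatus": ["DuePayment", "Successful", "Successful"] * 2,
    "Date": pd.to_datetime([
        "2020-06-07 23:59:43.123456+00:00",
        "2019-06-07 23:59:43.123456+00:00",
        "2018-06-07 23:59:43.123456+00:00",
        "2020-07-07 22:59:32.123421+00:00",
        "2019-07-07 22:59:32.123421+00:00",
        "2018-07-07 22:59:32.123421+00:00",
    ]),
})

latest = (
    df.sort_values("Date", ascending=False)                  # newest rows first
      .drop_duplicates(subset="InsuranceId", keep="first")   # one row per id
      .sort_values("InsuranceId")                            # stable output order
      .reset_index(drop=True)
)
print(latest)
```

If a RowNum column like the one in the desired output has already been computed, df[df["RowNum"] == 1] selects the same rows.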