I have a DataFrame of the following form (it has more columns than these; they are removed for brevity):
import pandas as pd

headers = ['A', 'B', 'C']
data = [['p1', '', 'v1'],
        ['p2', '', 'ba'],
        ['p3', 9, 'fg'],
        ['p1', 1, 'fg'],
        ['p2', 45, 'af'],
        ['p3', 1, 'fg'],
        ['p1', 1, 'hf']]
df = pd.DataFrame(data, columns=headers)
    A   B   C
0  p1      v1
1  p2      ba
2  p3   9  fg
3  p1   1  fg
4  p2  45  af
5  p3   1  fg
6  p1   1  hf
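Each value in A has several rows, so column B holds repeated entries per person; the latest value should be non-NA (but it might not be).
I want to replace the values in column B with the latest non-NA value for the same value of A. Something like this: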
unique_people = df['A'].unique()
for person in unique_people:
    sub_df = df[df['A'] == person]
    val = sub_df['B'].tail(1).values
    df['A'][df['A'] == person] = val  # this also doesn't work because it's not in place
I'm sure there is a better way to do this, but I'm not sure how. Can anyone point me to one?
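One possible approach, as a minimal sketch rather than a definitive answer: treat the empty strings in B as missing, then broadcast each group's last non-null B back onto its rows with `groupby(...).transform('last')`. This rests on two assumptions that are not stated in the question: that `''` stands for NA, and that B is otherwise numeric (so `pd.to_numeric` is safe to use). "Latest" is taken to mean the last row in the current row order, as in the loop above.

```python
import pandas as pd

headers = ['A', 'B', 'C']
data = [['p1', '', 'v1'],
        ['p2', '', 'ba'],
        ['p3', 9, 'fg'],
        ['p1', 1, 'fg'],
        ['p2', 45, 'af'],
        ['p3', 1, 'fg'],
        ['p1', 1, 'hf']]
df = pd.DataFrame(data, columns=headers)

# Assumption: '' means "missing". errors='coerce' turns anything
# non-numeric (including '') into NaN, giving a float column.
b = pd.to_numeric(df['B'], errors='coerce')

# GroupBy 'last' skips NaN, so this picks the last non-null B per value
# of A; transform() repeats that value on every row of the group.
df['B'] = b.groupby(df['A']).transform('last')

print(df)
```

With the sample data, every p1 and p3 row ends up with B equal to 1.0 and every p2 row with 45.0. The values are floats because the intermediate column contained NaN. If a group has no non-null B at all, its rows stay NaN.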