I have a dataframe whose columns contain lists of strings and lists of floats, for example:
Names Prob
[Anne, Mike, Anne] [10.0, 10.0, 80.0]
[Sophie, Andy, Vera, Kate] [30.0, 4.5, 5.5, 60.0]
[Josh, Anne, Sophie] [51, 24, 25]
What I want to do is loop over Names and, if a name belongs to a predefined group, relabel it and then sum the corresponding values from Prob.
For example, if team1 = ['Anne', 'Mike', 'Sophie'], I want to end up with:
Names Prob
[Team_One] [100.0]
[Andy, Kate, Team_One, Vera] [4.5, 60.0, 30.0, 5.5]
[Josh, Team_One] [51, 49]
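Below is what I wrote, but I think it is a bit ridiculous: I create a temporary dataframe inside the loop and then do a groupby on it. That sounds like overkill to me, and too heavy.
Is there a more efficient way? (In case it matters, I am using Python 3.)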
import pandas as pd

def pool(df):
    team1 = ['Anne', 'Mike', 'Sophie']
    names = df['Names']
    prob = df['Prob']
    out_names = []
    out_prob = []
    for key, name in enumerate(names):
        # relabel if in team1 otherwise keep it the same
        name = ['Team_One' if x in team1 else x for x in name]
        # make a temp dataframe and group by name
        temp = pd.DataFrame({'name': name, 'prob': prob[key]})
        temp = temp.groupby('name').sum()
        # make the output
        out_names.append(temp.index.tolist())
        out_prob.append(temp['prob'].tolist())
    df['Names'] = out_names
    df['Prob'] = out_prob
    return df

df = pd.DataFrame({
    'Names': [['Anne', 'Mike', 'Anne'],
              ['Sophie', 'Andy', 'Vera', 'Kate'],
              ['Josh', 'Anne', 'Sophie']
              ],
    'Prob': [[10., 10., 80.],
             [30., 4.5, 5.5, 60.],
             [51, 24, 25]
             ]
})
out = pool(df)
print(out)
Thanks!
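One lighter-weight possibility, offered as a sketch rather than a definitive answer: the relabel-and-sum step can be done per row with a plain dictionary, so no temporary dataframe or groupby is needed inside the loop. The names `TEAM1`, `pool_row`, and the `label` parameter are hypothetical and only used for illustration. The labels are sorted to mimic the ordering that `groupby(...).sum()` gives in the code above. The rows are shown as plain lists here, on the assumption that each row of Names and Prob has the same length.

```python
from collections import defaultdict

# Hypothetical constant: a set gives constant-time membership tests
TEAM1 = {'Anne', 'Mike', 'Sophie'}


def pool_row(names, probs, team=TEAM1, label='Team_One'):
    """Relabel team members and sum the values per label for one row."""
    totals = defaultdict(float)
    for name, p in zip(names, probs):
        totals[label if name in team else name] += p
    # Sort the labels so the order matches what groupby(...).sum() produces
    labels = sorted(totals)
    return labels, [totals[k] for k in labels]


# The three example rows from the question, as plain lists
names_rows = [['Anne', 'Mike', 'Anne'],
              ['Sophie', 'Andy', 'Vera', 'Kate'],
              ['Josh', 'Anne', 'Sophie']]
prob_rows = [[10., 10., 80.],
             [30., 4.5, 5.5, 60.],
             [51, 24, 25]]

pooled = [pool_row(n, p) for n, p in zip(names_rows, prob_rows)]
out_names = [labels for labels, _ in pooled]
out_prob = [probs for _, probs in pooled]

for labels, probs in zip(out_names, out_prob):
    print(labels, probs)
```

To apply this to the dataframe, the same list comprehension could iterate over `zip(df['Names'], df['Prob'])`, with `out_names` and `out_prob` assigned back to the two columns. Note that the sums come out as floats even for the integer row, because the accumulator starts at `0.0`.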