I have a DataFrame that looks like this:
col1 = ['a','b','c','a','c','a','b','c','a']
col2 = [1,1,0,1,1,0,1,1,0]
df2 = pd.DataFrame(zip(col1,col2),columns=['name','count'])
  name  count
0    a      1
1    b      1
2    c      0
3    a      1
4    c      1
5    a      0
6    b      1
7    c      1
8    a      0
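For each element in the "name" column, I am trying to find the ratio of the number of zeros to the total number of zeros and ones. First I aggregated the counts as follows: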
# NOTE: zero_one_frequencies is not defined in this post; it appears to hold
# one row per name, with the counts of zeros and ones in columns 0 and 1.
for j in df2.name.unique():
    print(j)
    row = zero_one_frequencies[zero_one_frequencies['name'] == j]
    zero_ct = row[0]
    full_ct = row[0] + row[1]
    zero_pb = zero_ct / full_ct
    one_pb = 1 - zero_pb
    print(f"ZERO ratios for {j} = {zero_pb}")
    print(f"One ratios for {j} = {one_pb}")
    print("=" * 30)
The output is:
a
ZERO ratios for a = 0    0.5
dtype: float64
One ratios for a = 0    0.5
dtype: float64
==============================
b
ZERO ratios for b = 1    0.0
dtype: float64
One ratios for b = 1    1.0
dtype: float64
==============================
c
ZERO ratios for c = 2    0.333333
dtype: float64
One ratios for c = 2    0.666667
dtype: float64
==============================
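The loop above depends on zero_one_frequencies, which the post does not show. As a minimal sketch, assuming it is a table with one row per name and integer column labels 0 and 1 holding the counts, it could be built with pd.crosstab, and the ratios can then be computed for all names at once instead of one name per iteration. The name `ratios` is hypothetical and used only for illustration.

```python
import pandas as pd

df2 = pd.DataFrame({
    'name': ['a', 'b', 'c', 'a', 'c', 'a', 'b', 'c', 'a'],
    'count': [1, 1, 0, 1, 1, 0, 1, 1, 0],
})

# Assumption: zero_one_frequencies has a 'name' column plus columns 0 and 1
# containing how often each value of 'count' occurs for that name.
zero_one_frequencies = pd.crosstab(df2['name'], df2['count']).reset_index()
zero_one_frequencies.columns.name = None  # drop the leftover 'count' axis label

# Ratios for every name in one vectorized step (no Python loop needed).
totals = zero_one_frequencies[0] + zero_one_frequencies[1]
ratios = pd.DataFrame({
    'name': zero_one_frequencies['name'],
    'zero_pb': zero_one_frequencies[0] / totals,
    'one_pb': zero_one_frequencies[1] / totals,
})
print(ratios)
```

With this layout, filtering by name and selecting column 0 returns a one-element Series rather than a scalar, which would explain the index and dtype lines in the printed output.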
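My goal is to add two new columns to the DataFrame, "name_0" and "name_1", holding the ratio values for each element in the "name" column. I tried the following, but it did not give the expected result: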
for j in df2.name.unique():
    print(j)
    row = zero_one_frequencies[zero_one_frequencies['name'] == j]
    zero_ct = row[0]
    full_ct = row[0] + row[1]
    zero_pb = zero_ct / full_ct
    one_pb = 1 - zero_pb
    print(f"ZERO Probability for {j} = {zero_pb}")
    print(f"One Probability for {j} = {one_pb}")
    print("=" * 30)
    # Conditions and choices are rebuilt for the current name only.
    condition1 = [df2['name'].eq(j) & df2['count'].eq(0)]
    condition2 = [df2['name'].eq(j) & df2['count'].eq(1)]
    choice1 = zero_pb.tolist()
    choice2 = one_pb.tolist()
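The columns end up filled with the values for the name element "c". That is to be expected, since the values computed in the last iteration are the ones used to update every row.
Is there another way to use np.select effectively here?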
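One possible direction, offered as a sketch rather than a definitive answer: it assumes every row should receive both ratios for its own name, which the post does not state outright. Because "count" only contains 0 and 1, the per-name mean is the share of ones, so groupby with transform can fill both columns without a loop. If np.select is preferred, the conditions and choices have to cover all names in a single call rather than being rebuilt per iteration. The names `zero_share` and `name_0_select` are hypothetical.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'name': ['a', 'b', 'c', 'a', 'c', 'a', 'b', 'c', 'a'],
    'count': [1, 1, 0, 1, 1, 0, 1, 1, 0],
})

# Option 1: 'count' is 0/1, so its mean within each name is the share of ones.
# transform('mean') broadcasts that per-name value back onto every row.
one_pb = df2.groupby('name')['count'].transform('mean')
df2['name_0'] = 1 - one_pb
df2['name_1'] = one_pb

# Option 2: np.select with one condition and one choice per name,
# all passed in a single call so no name overwrites another.
zero_share = df2['count'].eq(0).groupby(df2['name']).mean()
names = zero_share.index.tolist()
conditions = [df2['name'].eq(n).to_numpy() for n in names]
choices = [zero_share[n] for n in names]
name_0_select = np.select(conditions, choices, default=np.nan)

print(df2)
```

Both options should yield the same "name_0" values; the transform version avoids building a condition list that grows with the number of distinct names.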