Creating new pandas columns based on conditions on existing columns

I have a dataframe that looks like this:


import pandas as pd

col1 = ['a','b','c','a','c','a','b','c','a']
col2 = [1,1,0,1,1,0,1,1,0]
df2 = pd.DataFrame(zip(col1,col2),columns=['name','count'])


    name    count
0   a       1
1   b       1
2   c       0
3   a       1
4   c       1
5   a       0
6   b       1
7   c       1
8   a       0

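For each element in the "name" column, I am trying to find the ratio of the number of zeros to the total number of zeros and ones. First I aggregated the counts as follows: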


# zero_one_frequencies holds the per-name counts of 0s and 1s (its construction is not shown)
for j in df2.name.unique():
    print(j)
    zero_ct = zero_one_frequencies[zero_one_frequencies['name'] == j][0]
    full_ct = zero_one_frequencies[zero_one_frequencies['name'] == j][0] + zero_one_frequencies[zero_one_frequencies['name'] == j][1]
    zero_pb = zero_ct / full_ct
    one_pb = 1 - zero_pb
    print(f"ZERO ratios for {j} = {zero_pb}")
    print(f"One ratios for {j} = {one_pb}")
    print("="*30)

The output is:


a
ZERO ratios for a = 0    0.5
dtype: float64
One ratios for a = 0    0.5
dtype: float64
==============================
b
ZERO ratios for b = 1    0.0
dtype: float64
One ratios for b = 1    1.0
dtype: float64
==============================
c
ZERO ratios for c = 2    0.333333
dtype: float64
One ratios for c = 2    0.666667
dtype: float64
==============================
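
The loop above reads from zero_one_frequencies, which is not defined in the snippets shown. As a minimal sketch, assuming it is a table with a 'name' column plus one count column per value (0 and 1), pd.crosstab followed by reset_index would give a frame that can be indexed the same way. This is a guess at its shape, not the original code, and the names zero_ratio and ratios are only illustrative:

```python
import pandas as pd

col1 = ['a', 'b', 'c', 'a', 'c', 'a', 'b', 'c', 'a']
col2 = [1, 1, 0, 1, 1, 0, 1, 1, 0]
df2 = pd.DataFrame(zip(col1, col2), columns=['name', 'count'])

# Assumed reconstruction: one row per name, columns 0 and 1 hold the counts
zero_one_frequencies = pd.crosstab(df2['name'], df2['count']).reset_index()

# Ratio of zeros to (zeros + ones) for every name at once, without a Python loop
zero_ratio = zero_one_frequencies[0] / (zero_one_frequencies[0] + zero_one_frequencies[1])
ratios = pd.DataFrame({
    'name': zero_one_frequencies['name'],
    'zero_ratio': zero_ratio,
    'one_ratio': 1 - zero_ratio,
})
print(ratios)
```

Computing the ratios column-wise like this yields plain floats per name rather than the one-element Series printed by the loop.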

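My goal is to add two new columns to the dataframe, "name_0" and "name_1", holding the ratio values for each element in the "name" column. I tried the following, but it does not give the expected result: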


for j in df2.name.unique():
    print(j)
    zero_ct = zero_one_frequencies[zero_one_frequencies['name'] == j][0]
    full_ct = zero_one_frequencies[zero_one_frequencies['name'] == j][0] + zero_one_frequencies[zero_one_frequencies['name'] == j][1]
    zero_pb = zero_ct / full_ct
    one_pb = 1 - zero_pb
    print(f"ZERO Probability for {j} = {zero_pb}")
    print(f"One Probability for {j} = {one_pb}")
    print("="*30)

    # Conditions and choices are rebuilt for each name on every iteration
    condition1 = [ df2['name'].eq(j) & df2['count'].eq(0)]
    condition2 = [ df2['name'].eq(j) & df2['count'].eq(1)]
    choice1 = zero_pb.tolist()
    choice2 = one_pb.tolist()


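The columns end up filled with the values for the name element "c". That is to be expected, since the values computed in the last iteration are used to update every row.


Is there another way to do this that uses np.select effectively?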


互换的青春
2 answers

慕侠2389804

I don't have access to the zero_one_frequencies dataframe, so I took the liberty of solving the problem my own way.

import pandas as pd
import numpy as np

col1 = ['a','b','c','a','c','a','b','c','a']
col2 = [1,1,0,1,1,0,1,1,0]
df2 = pd.DataFrame(zip(col1,col2),columns=['name','count'])

# Start both ratio columns as floats so later assignments keep a float dtype
df2["name_0"] = 0.0
df2["name_1"] = 0.0

for name in df2['name'].unique():
    df_name = df2[df2['name'] == name]
    # Fraction of ones among the rows for this name
    prob_1 = df_name['count'].sum() / df_name.shape[0]
    for count in df2['count'].unique():
        # Rows with this name and this count value
        mask = (df2['name'] == name) & (df2['count'] == count)
        # count == 1 gives prob_1, count == 0 gives 1 - prob_1
        df2.loc[mask, "name_" + str(count)] = np.abs(((count + 1) % 2) - prob_1)

Output:

  name  count    name_0    name_1
0    a      1  0.000000  0.500000
1    b      1  0.000000  1.000000
2    c      0  0.333333  0.000000
3    a      1  0.000000  0.500000
4    c      1  0.000000  0.666667
5    a      0  0.500000  0.000000
6    b      1  0.000000  1.000000
7    c      1  0.000000  0.666667
8    a      0  0.500000  0.000000
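
The same table can probably be produced without the nested loops. A minimal sketch, assuming 'count' only ever holds 0 or 1: groupby('name')['count'].transform('mean') gives every row the fraction of ones within its name group, and Series.where keeps that value only in the rows that match. The variable name p_one is only illustrative.

```python
import pandas as pd

col1 = ['a', 'b', 'c', 'a', 'c', 'a', 'b', 'c', 'a']
col2 = [1, 1, 0, 1, 1, 0, 1, 1, 0]
df2 = pd.DataFrame(zip(col1, col2), columns=['name', 'count'])

# Fraction of ones within each name group, aligned to the original rows
p_one = df2.groupby('name')['count'].transform('mean')

# name_0 is filled only where count == 0, name_1 only where count == 1;
# every other cell falls back to 0.0
df2['name_0'] = (1 - p_one).where(df2['count'].eq(0), 0.0)
df2['name_1'] = p_one.where(df2['count'].eq(1), 0.0)
print(df2)
```

Because transform returns a Series aligned with df2, no explicit index bookkeeping is needed.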

慕容3067478

The following code solves the problem. However, I could not find a way to get the same effect with numpy.select.

df2["name_0"] = 0.0
df2["name_1"] = 0.0

for j in df2.name.unique():
    print(j)
    zero_ct = zero_one_frequencies[zero_one_frequencies['name'] == j][0]
    full_ct = zero_one_frequencies[zero_one_frequencies['name'] == j][0] + zero_one_frequencies[zero_one_frequencies['name'] == j][1]
    zero_pb = zero_ct / full_ct
    one_pb = 1 - zero_pb
    print(f"ZERO Probability for {j} = {zero_pb.tolist()[0]}")
    print(f"One Probability for {j} = {one_pb.tolist()[0]}")
    print("="*30)
    # Walk over the rows for this name and fill the column matching each row's count
    for idx in df2[df2['name'] == j].index:
        print("Index:::", idx)
        if df2.at[idx, 'count'] == 0:
            df2.at[idx, "name_0"] = zero_pb.tolist()[0]
            print(f"Count for {j} at index {idx} is {df2.at[idx, 'count']}")
            print('printing name_0: ', df2.at[idx, "name_0"])
            print("*"*30)
        elif df2.at[idx, 'count'] == 1:
            df2.at[idx, "name_1"] = one_pb.tolist()[0]
            print(f"Count for {j} at index {idx} is {df2.at[idx, 'count']}")
            print('printing name_1: ', df2.at[idx, "name_1"])
            print("*"*30)
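
For what it's worth, numpy.select can express this if all conditions and choices are collected first and np.select is called once per target column, so that no name overwrites another. A sketch under stated assumptions: the per-name zero ratio is computed here with a groupby on df2 as a stand-in for the zero_one_frequencies table (which is not shown), and the names cond_zero, cond_one, choice_zero and choice_one are illustrative.

```python
import numpy as np
import pandas as pd

col1 = ['a', 'b', 'c', 'a', 'c', 'a', 'b', 'c', 'a']
col2 = [1, 1, 0, 1, 1, 0, 1, 1, 0]
df2 = pd.DataFrame(zip(col1, col2), columns=['name', 'count'])

# Per-name fraction of zeros (assumed stand-in for zero_one_frequencies)
zero_pb = df2['count'].eq(0).groupby(df2['name']).mean()
names = list(zero_pb.index)

# One boolean condition per name and per target column
cond_zero = [(df2['name'].eq(n) & df2['count'].eq(0)).to_numpy() for n in names]
cond_one = [(df2['name'].eq(n) & df2['count'].eq(1)).to_numpy() for n in names]

# Matching choices: the zero ratio for name_0, its complement for name_1
choice_zero = [zero_pb[n] for n in names]
choice_one = [1 - zero_pb[n] for n in names]

# A single np.select call per column; rows matching no condition get 0.0
df2['name_0'] = np.select(cond_zero, choice_zero, default=0.0)
df2['name_1'] = np.select(cond_one, choice_one, default=0.0)
print(df2)
```

np.select picks, for each row, the choice belonging to the first condition that is true, which is why the conditions for every name have to be in the list at the same time.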