创建具有多个可变条件的新数据框并提取失败的原因 - pandas

我有下面的代码。


考虑到以下限制,我想创建如下输出:


A > 5,B > 4,C > 3


如果不满足条件,我想读取数据框中的下面一行,存储数据,并创建一个名为“失败原因”的新列,其中列出 A、B 或 C 是否失败。


然后我希望脚本也报告通过的数据帧的行的“X”、“Y”和“Z”的相应值。


此后,脚本应按“组”分组并显示每组的最大“Hs”。


我真的很难在数据框中使用多个变量来完成这项工作...任何帮助将不胜感激


所需输出


   Group   Hs Fail Reason    X    Y     Z

0      1  1.0      [A, B]  0.9  1.9  0.54

1      2  0.5   [A, B, C]  0.8  2.7  0.43

主要代码- 我当前的尝试


import pandas as pd


data = [[1,0.5,8,8,8,0.85,1.64,0.5],

        [1,1,8,8,8,0.9,1.9,0.54],

        [1,1.5,0,0,10,1.1,2.0,0.74],

        [2,0.5,6,5,4,0.8,2.7,0.43],

        [2,1,1,1,1,0.9,2.9,0.45],

        [2,1.5,1,2,1,1.1,3.1,0.47]]


columns = ['Group', 'Hs', 'A', 'B', 'C', 'X', 'Y', 'Z']


df = pd.DataFrame(data=data, columns=columns)


Limit_A = 5

Limit_B = 4

Limit_C = 3


# Opens an empty dataframe for appending

df_new = pd.DataFrame(columns=['Group', 'Hs'])


groups = df['Group'].unique()


# for g in groups

for g in groups:

    # Create new temp dataframe

    df_1 = df[df['Group'] == g]

    # Input conditions, checks the columns one by one are NOT EQUAL TO ZERO. Outputs boolean values.

    pass_criteria = (df_1['A'] > Limit_A) & (df_1['B'] > Limit_B) & (df_1['C'] > Limit_C)


    # PASSES DATAFRAME. Locates rows where the conditions of mask_1 are SATISFIED and creates another temp dataframe.

    df_passes = df_1.loc[pass_criteria]


    # Find the max value in the dataframe e.g. the greatest operational wave height

    max_num = df_passes['Hs'].max()


    # Does the opposite of mask_1

    fail_criteria = (df_1['A'] < Limit_A) & (df_1['B'] < Limit_B) &(df_1['C'] < Limit_C)


    # FAILED DATAFRAME. Locates rows where the conditions of pass_criteria are SATISFIED and creates another temp dataframe.

    df_fails = df_1.loc[fail_criteria]


    # Uses the dataframe with FAIL and mkes the value_vars rows in the melted dataframe

    melted = pd.melt(df_fails, value_vars=['A', 'B', 'C'])

print(df_new)


aluckdog
浏览 60回答 1
1回答

UYOU

IIUC 首先将 A、B、C 列与您的限制进行比较,然后agg返回map结果:res = df[["A","B","C"]]>[5,4,3]s = (pd.concat([df, (~res[~res.all(1)]).agg(lambda x: res.columns[x].tolist(),&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; axis=1).rename("Fail reason")], axis=1)&nbsp; &nbsp; &nbsp; &nbsp;.dropna().drop_duplicates("Group").set_index("Group")["Fail reason"])print (df.assign(failed_reason=df["Group"].map(s))&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;.loc[res.all(1)].sort_values(["Group", "Hs"])&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;.drop_duplicates("Group", keep="last"))&nbsp; &nbsp;Group&nbsp; &nbsp;Hs&nbsp; A&nbsp; B&nbsp; C&nbsp; &nbsp; X&nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp;Z failed_reason1&nbsp; &nbsp; &nbsp; 1&nbsp; 1.0&nbsp; 8&nbsp; 8&nbsp; 8&nbsp; 0.9&nbsp; 1.9&nbsp; 0.54&nbsp; &nbsp; &nbsp; &nbsp; [A, B]3&nbsp; &nbsp; &nbsp; 2&nbsp; 0.5&nbsp; 6&nbsp; 5&nbsp; 4&nbsp; 0.8&nbsp; 2.7&nbsp; 0.43&nbsp; &nbsp; &nbsp;[A, B, C]
打开App,查看更多内容
随时随地看视频慕课网APP