Pandas Logic - 动态规划的空间

按照你的解释："""goal waiting  Column_A Column_B Column_A_B     Status   Qty0    test1      WO1   test1W01  Cancelled    121    test4      WO6   test4WO6     Active  30002    test6      WO6   test6WO6       Open    143    test6      WO6   test6WO6     Active    884    test7      WO7   test7WO7     Active  1500"""import pandas as pdimport numpy as npfrom numpy import NaNdf = pd.DataFrame({'Column_A':['test1', 'test7', 'test7', 'test4', 'test6', 'test6', 'test7'],'Column_B':['WO1','WO7', 'WO7', 'WO6', 'WO6', 'WO6', 'WO7'],                   'Status': ['Cancelled','Cancelled', 'Active', 'Active', 'Open', 'Active', 'Active'],                   'Qty': ['12', '34' , '13', '3000', '14', '88', '1500']})df_deleted = df.copy(deep=True)df_deleted.drop(df.index,inplace=True)#Step1def process(r):    return r['Column_A'] + r['Column_B']df["Column_A_B"] = df.apply(lambda row: process(row), axis = 1)print("step 1");print(df)#Step2df['countAB'] = df.groupby('Column_A_B')['Qty'].transform('count')print("step 2");print(df)#Step3df['True_False']=df['countAB'] == 1print("step 3");print(df)#Step4todelete = df[(df['Status'] == 'Cancelled') & (df['True_False'] == False)]df = df[(df['Status'] != 'Cancelled') | (df['True_False'] == True)]df.drop(['countAB','True_False'], axis=1, inplace=True)todelete.drop(['True_False', 'countAB'], axis=1, inplace=True)df_deleted = df_deleted.append(todelete)print("step 4");print(df);print("step 4 - deleted");print(df_deleted)#5tep5df['Qty'] = df['Qty'].astype(int)df['maxAB'] = df.groupby('Column_A_B')['Qty'].transform('max')  todelete = df[(df['maxAB'] > 99) & (df['Qty'] <= 16)]df= df[(df['maxAB'] <= 99) | (df['Qty'] > 16)]df = df.reset_index(drop=True)todelete.drop(['maxAB'], axis=1, inplace=True)df_deleted = df_deleted.append(todelete)df.drop(['maxAB'], axis=1, inplace=True)print("step 5");print(df);print("step 5 - deleted");print(df_deleted)输出：  Column_A Column_B     Status   Qty Column_A_B0    test1      WO1  Cancelled    12   test1WO11    test4      WO6     Active  3000   test4WO62    test6      WO6       Open    14   test6WO63    test6      WO6     Active    88   test6WO64    test7      WO7     Active  1500   test7WO7step 5 - deleted  Column_A Column_A_B Column_B Qty     Status1    test7   test7WO7      WO7  34  Cancelled2    test7   test7WO7      WO7  13     Active一些解释：对于步骤 1：它只是将 2 列与 lambda 连接起来，当您使用 apply 时，您会在每一行（轴 = 1）上执行某些操作，结果在新列“Column_A_B”中#Step1# definition of lambda function (others ways to do exist)def process(r):    return r['Column_A'] + r['Column_B'] # i concatenate the 2 valuesdf["Column_A_B"] = df.apply(lambda row: process(row), axis = 1)print("step 1");print(df)结果：step 1  Column_A Column_B     Status   Qty Column_A_B0    test1      WO1  Cancelled    12   test1WO11    test7      WO7  Cancelled    34   test7WO72    test7      WO7     Active    13   test7WO73    test4      WO6     Active  3000   test4WO64    test6      WO6       Open    14   test6WO65    test6      WO6     Active    88   test6WO66    test7      WO7     Active  1500   test7WO7对于第 5 步：这个想法是在每个组中创建一个具有最大值 Qty 的新列（这里的组是 Column_A_B），所以在这个命令之后：df['maxAB'] = df.groupby('Column_A_B')['Qty'].transform('max') print("maxAB");print(df)结果：maxAB  Column_A Column_B     Status   Qty Column_A_B  maxAB0    test1      WO1  Cancelled    12   test1WO1     12  *max value of group test1WO12    test7      WO7     Active    13   test7WO7   1500  *max value of group test7WO73    test4      WO6     Active  3000   test4WO6   3000  *max value of group test4WO64    test6      WO6       Open    14   test6WO6     88  *max value of group test6WO65    test6      WO6     Active    88   test6WO6     88  *max value of group test6WO66    test7      WO7     Active  1500   test7WO7   1500  *max value of group test7WO7正如你所看到的，你在自己面前拥有每个组的最大值（对不起我的英语）现在，对于 Qty > 99 和 Qty <=16 的每个组，我只删除 Qty <= 16 的行。所以下一个命令说：我保留所有回答这个过滤器的行并放入 todelete 数据帧todelete = df[(df['maxAB'] > 99) & (df['Qty'] <= 16)]所以在 todelete 我想保留，但在 df 我想删除（并保留所有其他行）。我们必须使用相反的过滤器。在逻辑 => A and b 中，相反的not ( A and B) = (not A) or (not B)所以“不是”的逻辑df[(df['maxAB'] > 99) & (df['Qty'] <= 16)]是：df[(df['maxAB'] <= 99) | (df['Qty'] > 16)]    所以在这个命令之后：# i want to keep rows which have a Qty <= 99 #             or # rows which have a Qty > 16df= df[(df['maxAB'] <= 99) | (df['Qty'] > 16)] 您可以通过使用变量来简化：filter = (df['maxAB'] > 99) & (df['Qty'] <= 16)todelete = df[filter]df= df[~filter]~filter等价于not filter我重建索引（0到4）df = df.reset_index(drop=True)最后，等待最终结果（删除临时列后）希望这有助于理解...

Pandas Logic - 动态规划的空间

1回答