猿问

Pandas Logic - 动态规划的空间

请查看下面的示例数据帧和我的代码。以下是我的逐步目标:步骤 1:将 A 列和 B 列合并到 Column_A_B步骤 2:计算“ColumnA_B”中值的每个实例步骤 3过滤掉“ColumnA_B”中只有 1 个值实例的行第 4 步:删除“状态”列中已取消的每一行,仅删除其中已取消的行 - ColumnA_B 中可能有一些具有相同值但“状态”值不同(请注意,第 3 步过滤器是正在应用)我在第 5 步之前的代码似乎有效,它只是第 5 步,我真的坚持 第 5 步:'Column_A_B' 的过滤器仍然打开(即过滤掉计数为 1)查看冗余值(所以当你计算 'Column_A_B_' 中的值时将是 2 或更大),然后对于所述分组计数查看“数量”列。如果该组的数量少于 16 且超过 99,则仅删除“QTY”为 16 的行。超过 99 不要删除任何东西。

当程序完成时,我希望上面的数据框看起来像什么(上面我命名为 keep.xlsx)应该是这样的:


import pandas as pd


goaldf = pd.DataFrame({'Column_A':['test1', 'test4', 'test6', 'test6', 'test7'],'Column_B':['WO1', 'WO6', 'WO6', 'WO6', 'WO7'],

                   'Column_A_B': ['test1W01','test4WO6','test6WO6','test6WO6', 'test7WO7'], 'Satus': ['Cancelled', 'Active', 'Open', 'Active', 'Active'],

                   'Qty': ['12', '3000', '14', '88', '1500']})

writer = pd.ExcelWriter('goaldf.xlsx', engine='xlsxwriter')

goaldf.to_excel(writer, sheet_name='Sheet1')

writer.save()


婷婷同学_
浏览 184回答 1
1回答

慕雪6442864

按照你的解释:"""goal waiting&nbsp; Column_A Column_B Column_A_B&nbsp; &nbsp; &nbsp;Status&nbsp; &nbsp;Qty0&nbsp; &nbsp; test1&nbsp; &nbsp; &nbsp; WO1&nbsp; &nbsp;test1W01&nbsp; Cancelled&nbsp; &nbsp; 121&nbsp; &nbsp; test4&nbsp; &nbsp; &nbsp; WO6&nbsp; &nbsp;test4WO6&nbsp; &nbsp; &nbsp;Active&nbsp; 30002&nbsp; &nbsp; test6&nbsp; &nbsp; &nbsp; WO6&nbsp; &nbsp;test6WO6&nbsp; &nbsp; &nbsp; &nbsp;Open&nbsp; &nbsp; 143&nbsp; &nbsp; test6&nbsp; &nbsp; &nbsp; WO6&nbsp; &nbsp;test6WO6&nbsp; &nbsp; &nbsp;Active&nbsp; &nbsp; 884&nbsp; &nbsp; test7&nbsp; &nbsp; &nbsp; WO7&nbsp; &nbsp;test7WO7&nbsp; &nbsp; &nbsp;Active&nbsp; 1500"""import pandas as pdimport numpy as npfrom numpy import NaNdf = pd.DataFrame({'Column_A':['test1', 'test7', 'test7', 'test4', 'test6', 'test6', 'test7'],'Column_B':['WO1','WO7', 'WO7', 'WO6', 'WO6', 'WO6', 'WO7'],&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;'Status': ['Cancelled','Cancelled', 'Active', 'Active', 'Open', 'Active', 'Active'],&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;'Qty': ['12', '34' , '13', '3000', '14', '88', '1500']})df_deleted = df.copy(deep=True)df_deleted.drop(df.index,inplace=True)#Step1def process(r):&nbsp; &nbsp; return r['Column_A'] + r['Column_B']df["Column_A_B"] = df.apply(lambda row: process(row), axis = 1)print("step 1");print(df)#Step2df['countAB'] = df.groupby('Column_A_B')['Qty'].transform('count')print("step 2");print(df)#Step3df['True_False']=df['countAB'] == 1print("step 3");print(df)#Step4todelete = df[(df['Status'] == 'Cancelled') & (df['True_False'] == False)]df = df[(df['Status'] != 'Cancelled') | (df['True_False'] == True)]df.drop(['countAB','True_False'], axis=1, inplace=True)todelete.drop(['True_False', 'countAB'], axis=1, inplace=True)df_deleted = df_deleted.append(todelete)print("step 4");print(df);print("step 4 - deleted");print(df_deleted)#5tep5df['Qty'] = df['Qty'].astype(int)df['maxAB'] = df.groupby('Column_A_B')['Qty'].transform('max')&nbsp;&nbsp;todelete = df[(df['maxAB'] > 99) & (df['Qty'] <= 16)]df= df[(df['maxAB'] <= 99) | (df['Qty'] > 16)]df = df.reset_index(drop=True)todelete.drop(['maxAB'], axis=1, inplace=True)df_deleted = df_deleted.append(todelete)df.drop(['maxAB'], axis=1, inplace=True)print("step 5");print(df);print("step 5 - deleted");print(df_deleted)输出:&nbsp; Column_A Column_B&nbsp; &nbsp; &nbsp;Status&nbsp; &nbsp;Qty Column_A_B0&nbsp; &nbsp; test1&nbsp; &nbsp; &nbsp; WO1&nbsp; Cancelled&nbsp; &nbsp; 12&nbsp; &nbsp;test1WO11&nbsp; &nbsp; test4&nbsp; &nbsp; &nbsp; WO6&nbsp; &nbsp; &nbsp;Active&nbsp; 3000&nbsp; &nbsp;test4WO62&nbsp; &nbsp; test6&nbsp; &nbsp; &nbsp; WO6&nbsp; &nbsp; &nbsp; &nbsp;Open&nbsp; &nbsp; 14&nbsp; &nbsp;test6WO63&nbsp; &nbsp; test6&nbsp; &nbsp; &nbsp; WO6&nbsp; &nbsp; &nbsp;Active&nbsp; &nbsp; 88&nbsp; &nbsp;test6WO64&nbsp; &nbsp; test7&nbsp; &nbsp; &nbsp; WO7&nbsp; &nbsp; &nbsp;Active&nbsp; 1500&nbsp; &nbsp;test7WO7step 5 - deleted&nbsp; Column_A Column_A_B Column_B Qty&nbsp; &nbsp; &nbsp;Status1&nbsp; &nbsp; test7&nbsp; &nbsp;test7WO7&nbsp; &nbsp; &nbsp; WO7&nbsp; 34&nbsp; Cancelled2&nbsp; &nbsp; test7&nbsp; &nbsp;test7WO7&nbsp; &nbsp; &nbsp; WO7&nbsp; 13&nbsp; &nbsp; &nbsp;Active一些解释:对于步骤 1:它只是将 2 列与 lambda 连接起来,当您使用 apply 时,您会在每一行(轴 = 1)上执行某些操作,结果在新列“Column_A_B”中#Step1# definition of lambda function (others ways to do exist)def process(r):&nbsp; &nbsp; return r['Column_A'] + r['Column_B'] # i concatenate the 2 valuesdf["Column_A_B"] = df.apply(lambda row: process(row), axis = 1)print("step 1");print(df)结果:step 1&nbsp; Column_A Column_B&nbsp; &nbsp; &nbsp;Status&nbsp; &nbsp;Qty Column_A_B0&nbsp; &nbsp; test1&nbsp; &nbsp; &nbsp; WO1&nbsp; Cancelled&nbsp; &nbsp; 12&nbsp; &nbsp;test1WO11&nbsp; &nbsp; test7&nbsp; &nbsp; &nbsp; WO7&nbsp; Cancelled&nbsp; &nbsp; 34&nbsp; &nbsp;test7WO72&nbsp; &nbsp; test7&nbsp; &nbsp; &nbsp; WO7&nbsp; &nbsp; &nbsp;Active&nbsp; &nbsp; 13&nbsp; &nbsp;test7WO73&nbsp; &nbsp; test4&nbsp; &nbsp; &nbsp; WO6&nbsp; &nbsp; &nbsp;Active&nbsp; 3000&nbsp; &nbsp;test4WO64&nbsp; &nbsp; test6&nbsp; &nbsp; &nbsp; WO6&nbsp; &nbsp; &nbsp; &nbsp;Open&nbsp; &nbsp; 14&nbsp; &nbsp;test6WO65&nbsp; &nbsp; test6&nbsp; &nbsp; &nbsp; WO6&nbsp; &nbsp; &nbsp;Active&nbsp; &nbsp; 88&nbsp; &nbsp;test6WO66&nbsp; &nbsp; test7&nbsp; &nbsp; &nbsp; WO7&nbsp; &nbsp; &nbsp;Active&nbsp; 1500&nbsp; &nbsp;test7WO7对于第 5 步:这个想法是在每个组中创建一个具有最大值 Qty 的新列(这里的组是 Column_A_B),所以在这个命令之后:df['maxAB'] = df.groupby('Column_A_B')['Qty'].transform('max')&nbsp;print("maxAB");print(df)结果:maxAB&nbsp; Column_A Column_B&nbsp; &nbsp; &nbsp;Status&nbsp; &nbsp;Qty Column_A_B&nbsp; maxAB0&nbsp; &nbsp; test1&nbsp; &nbsp; &nbsp; WO1&nbsp; Cancelled&nbsp; &nbsp; 12&nbsp; &nbsp;test1WO1&nbsp; &nbsp; &nbsp;12&nbsp; *max value of group test1WO12&nbsp; &nbsp; test7&nbsp; &nbsp; &nbsp; WO7&nbsp; &nbsp; &nbsp;Active&nbsp; &nbsp; 13&nbsp; &nbsp;test7WO7&nbsp; &nbsp;1500&nbsp; *max value of group test7WO73&nbsp; &nbsp; test4&nbsp; &nbsp; &nbsp; WO6&nbsp; &nbsp; &nbsp;Active&nbsp; 3000&nbsp; &nbsp;test4WO6&nbsp; &nbsp;3000&nbsp; *max value of group test4WO64&nbsp; &nbsp; test6&nbsp; &nbsp; &nbsp; WO6&nbsp; &nbsp; &nbsp; &nbsp;Open&nbsp; &nbsp; 14&nbsp; &nbsp;test6WO6&nbsp; &nbsp; &nbsp;88&nbsp; *max value of group test6WO65&nbsp; &nbsp; test6&nbsp; &nbsp; &nbsp; WO6&nbsp; &nbsp; &nbsp;Active&nbsp; &nbsp; 88&nbsp; &nbsp;test6WO6&nbsp; &nbsp; &nbsp;88&nbsp; *max value of group test6WO66&nbsp; &nbsp; test7&nbsp; &nbsp; &nbsp; WO7&nbsp; &nbsp; &nbsp;Active&nbsp; 1500&nbsp; &nbsp;test7WO7&nbsp; &nbsp;1500&nbsp; *max value of group test7WO7正如你所看到的,你在自己面前拥有每个组的最大值(对不起我的英语)现在,对于 Qty > 99 和 Qty <=16 的每个组,我只删除 Qty <= 16 的行。所以下一个命令说:我保留所有回答这个过滤器的行并放入 todelete 数据帧todelete = df[(df['maxAB'] > 99) & (df['Qty'] <= 16)]所以在 todelete 我想保留,但在 df 我想删除(并保留所有其他行)。我们必须使用相反的过滤器。在逻辑 => A and b 中,相反的not ( A and B) = (not A) or (not B)所以“不是”的逻辑df[(df['maxAB'] > 99) & (df['Qty'] <= 16)]是:df[(df['maxAB'] <= 99) | (df['Qty'] > 16)]&nbsp; &nbsp;&nbsp;所以在这个命令之后:# i want to keep rows which have a Qty <= 99&nbsp;#&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;or&nbsp;# rows which have a Qty > 16df= df[(df['maxAB'] <= 99) | (df['Qty'] > 16)]&nbsp;您可以通过使用变量来简化:filter = (df['maxAB'] > 99) & (df['Qty'] <= 16)todelete = df[filter]df= df[~filter]~filter等价于not filter我重建索引(0到4)df = df.reset_index(drop=True)最后,等待最终结果(删除临时列后)希望这有助于理解...
随时随地看视频慕课网APP

相关分类

Python
我要回答