萧十郎
"I want my Fraud column to have null values when there are duplicates in customerEmail." So in your expected output you forgot to add name_4, because its customerEmail is also duplicated.

    import numpy as np
    import pandas as pd

    df1 = pd.DataFrame({
        'customerEmail': ['name0', 'name1', 'name2', 'name3', 'name4', 'name1'],
        'Fraud': [False, True, True, True, False, False]})
    df2 = pd.DataFrame({
        'customerEmail': ['name0', 'name1', 'name2', 'name3', 'name4', 'name1'],
        'ID': [0, 1, 2, 3, 4, 5]})
    df3 = pd.merge(df1, df2, on='customerEmail', how='left')

    # Here you need to know which customers are duplicated,
    # so that you can fill their rows in the Fraud column.
    # .copy() avoids modifying a view of df3 in the next step.
    df_duplicates = df3.drop_duplicates(subset=['customerEmail'], keep='last').copy()
    print(df_duplicates)

Output:

      customerEmail  Fraud  ID
    0         name0  False   0
    3         name2   True   2
    4         name3   True   3
    5         name4  False   4
    7         name1  False   5

    # Now, for those rows, fill the cells in the Fraud column with np.nan
    df_duplicates['Fraud'] = np.nan
    print(df_duplicates)

Output:

      customerEmail  Fraud  ID
    0         name0    NaN   0
    3         name2    NaN   2
    4         name3    NaN   3
    5         name4    NaN   4
    7         name1    NaN   5

    # So now you have two DataFrames: df_duplicates, with NaN in Fraud (above),
    # and the main df3 with the original merged values.
    # Now stack them with concat (rows appended under the same columns).
    # Rows with the same customerEmail and the same Fraud value are redundant,
    # so you can delete them using drop_duplicates.
    result = pd.concat([df3, df_duplicates]).drop_duplicates(subset=['customerEmail', 'Fraud'])

    # After the concat, Fraud is no longer a plain boolean column
    # (True and False may show up as 1.0 and 0.0), so cast it to the
    # nullable 'boolean' dtype to get True, False and <NA> back.
    result.Fraud = result.Fraud.astype('boolean')
    print(result)

Output:

      customerEmail  Fraud  ID
    0         name0  False   0
    1         name1   True   1
    3         name2   True   2
    4         name3   True   3
    5         name4  False   4
    6         name1  False   1
    0         name0   <NA>   0
    3         name2   <NA>   2
    4         name3   <NA>   3
    5         name4   <NA>   4
    7         name1   <NA>   5
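If the goal is only to blank out Fraud for emails that genuinely occur more than once, a shorter route may be worth considering. The following is a minimal sketch and not the approach used above: it assumes "duplicate" means the same customerEmail appearing in more than one row of df1, and the names `is_repeated` and `flagged` are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'customerEmail': ['name0', 'name1', 'name2', 'name3', 'name4', 'name1'],
    'Fraud': [False, True, True, True, False, False],
})

# True for every row whose customerEmail occurs more than once.
# keep=False marks all occurrences, not only the later ones.
is_repeated = df1['customerEmail'].duplicated(keep=False)

# The nullable 'boolean' dtype can hold True, False and <NA> together,
# so no float conversion happens when values are blanked out.
flagged = df1.copy()
flagged['Fraud'] = flagged['Fraud'].astype('boolean').mask(is_repeated)

print(flagged)
```

With the sample data, only the two name1 rows end up with `<NA>` in Fraud; every other row keeps its original value. The same mask could be computed on the merged frame instead, at the cost of also flagging rows that the merge itself multiplied.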