键列具有重复值。我正在尝试合并数据框

我想DF1与DF2customerEmail 合并,这在两者中都很常见,但具有重复的值。DF1


 customer_Email  Fraud

 name_0          False

 name_1          True

 name_2          True

 name_3          True

 name_4          False

 name_1          False        

DF2



customer_Email  ID

 name_0           0

 name_1           1

 name_2           2

 name_3           3

 name_4           4

 name_1           5

请注意,DF1 和 DF2 仅用于示例目的。


customerEmail在两个数据框中都不是唯一的


因此,当我使用 联接这两个表时,当重复相同操作时,它会用其上方行的随机值pd.merge(DF1, DF2, on='customerEmail', how=left)填充我的目标列。FraudcustomerEmail


当 中存在重复项时,我希望我的Fraud列具有空值customerEmail。


customer_email在两个数据框中都不是唯一的。


电流输出:



customer_Email ID     Fraud

 name_0        0      False

 name_1        1      True

 name_2        2      True

 name_3        3      True   

 name_4        4      False

 name_1        5      True

 name_2        0      True

 name_1        1      True

 name_3        2      True

预期输出:


customer_Email ID     Fraud

 name_0        0      False

 name_1        1      True

 name_2        2      True

 name_3        3      True   

 name_4        4      False

 name_1        5      N/A

 name_2        0      N/A

 name_1        1      N/A

 name_3        2      N/A


缥缈止盈
浏览 153回答 4
4回答

catspeake

import pandas as pddf1 = pd.read_csv('1.csv')df2 = pd.read_csv('2.csv')out = pd.merge(df1, df2, on='customer_Email', how='left')out.loc[~out['customer_Email'].isin(df2.drop_duplicates(subset='customer_Email', keep=False)['customer_Email'].tolist()), 'Fraud'] = Noneout给出:    customer_Email  Fraud   ID0   name_0  0.0 01   name_1  NaN 12   name_1  NaN 53   name_2  1.0 24   name_3  1.0 35   name_4  0.0 46   name_1  NaN 17   name_1  NaN 5

萧十郎

“当 customerEmail 中存在重复项时,希望我的 Fraud 列具有空值。”所以在你的预期输出中你忘记添加,name_4 因为customerEmail它也是重复的&nbsp;df1 = pd.DataFrame({&nbsp; &nbsp; 'customerEmail':['name0','name1','name2','name3','name4','name1'],&nbsp; &nbsp; 'Fraud':[False,True,True,True,False,False]}&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; )df2 = pd.DataFrame({&nbsp; &nbsp; 'customerEmail': ['name0', 'name1', 'name2', 'name3', 'name4', 'name1'],&nbsp; &nbsp; 'ID':[0,1,2,3,4,5]})df3=pd.merge(df1, df2, on='customerEmail', how='left')#here you need to know which customers are duplicated, to fill for them rows in column Frauddf_duplicates = df3.drop_duplicates(subset=['customerEmail'],keep='last')print(df_duplicates)&nbsp; customerEmail&nbsp; Fraud&nbsp; ID0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;name0&nbsp; False&nbsp; &nbsp;03&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;name2&nbsp; &nbsp;True&nbsp; &nbsp;24&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;name3&nbsp; &nbsp;True&nbsp; &nbsp;35&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;name4&nbsp; False&nbsp; &nbsp;47&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;name1&nbsp; False&nbsp; &nbsp;5#now for those duplicates fill cells in column Fraud using iloc and np.nandf_duplicates.loc[:,'Fraud'] = np.nanprint(df_duplicates)&nbsp; customerEmail&nbsp; Fraud&nbsp; ID0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;name0&nbsp; &nbsp; NaN&nbsp; &nbsp;03&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;name2&nbsp; &nbsp; NaN&nbsp; &nbsp;24&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;name3&nbsp; &nbsp; NaN&nbsp; &nbsp;35&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;name4&nbsp; &nbsp; NaN&nbsp; &nbsp;47&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;name1&nbsp; &nbsp; NaN&nbsp; &nbsp;5#so now you have two df's , one df_duplicates with Nans duplicates values above,#and main df3 with original merged values#now you need to add those df's using concat , (add column to column )#but you dont need values with same customerEmail that you used for df_duplicated, so you can delete them using drop_duplicatesresult = pd.concat([df3,df_duplicates]).drop_duplicates(subset=['customerEmail','Fraud'])#after concat True and False values has been coverted to 1.0 and 0 , for we need to change the type to False and True againresult.Fraud = result.Fraud.astype('boolean')print(result)&nbsp; customerEmail&nbsp; Fraud&nbsp; ID0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;name0&nbsp; False&nbsp; &nbsp;01&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;name1&nbsp; &nbsp;True&nbsp; &nbsp;13&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;name2&nbsp; &nbsp;True&nbsp; &nbsp;24&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;name3&nbsp; &nbsp;True&nbsp; &nbsp;35&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;name4&nbsp; False&nbsp; &nbsp;46&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;name1&nbsp; False&nbsp; &nbsp;10&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;name0&nbsp; &nbsp;<NA>&nbsp; &nbsp;03&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;name2&nbsp; &nbsp;<NA>&nbsp; &nbsp;24&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;name3&nbsp; &nbsp;<NA>&nbsp; &nbsp;35&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;name4&nbsp; &nbsp;<NA>&nbsp; &nbsp;47&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;name1&nbsp; &nbsp;<NA>&nbsp; &nbsp;5

慕标5832272

您可以使用重复函数来keep=False获取 df1 和/或 df2 中的重复电子邮件。下面是对 df1 或 df2 中具有重复电子邮件的任何行添加 N/A 的代码。df = pd.merge(DF1, DF2, on='customerEmail', how='left')dups_1 = set(DF1.customerEmail[DF1.customerEmail.duplicated(keep=False)])  # get duplicated emails in df1dups_2 = set(DF2.customerEmail[DF2.customerEmail.duplicated(keep=False)])  # get duplicated emails in df2dups = dups_1.union(dups_2)    # get duplicated emails in df1 or df2 (you can also use only dups_1 or only dups_2)df["Fraud"] = df.apply(lambda row: "N/A" if row.customerEmail in dups else row.Fraud, axis=1)  # put N/A if email in dups

回首忆惘然

那么下面的呢?(假设customer_email在 df2 中是唯一的):df3 = pd.merge(df1, df2, on=['customer_Email'], how="left")df3["count"] = df3.groupby("customer_Email").cumcount()df3.loc[df3["count"]>0,"Fraud"] = "N/A"df3[["customer_Email","Fraud","ID"]]输出:&nbsp; &nbsp; customer_Email&nbsp; Fraud&nbsp; &nbsp;ID0&nbsp; &nbsp;name_0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; False&nbsp; &nbsp;01&nbsp; &nbsp;name_1&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; True&nbsp; &nbsp; 12&nbsp; &nbsp;name_1&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; N/A&nbsp; &nbsp; &nbsp;53&nbsp; &nbsp;name_2&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; True&nbsp; &nbsp; 24&nbsp; &nbsp;name_3&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; True&nbsp; &nbsp; 35&nbsp; &nbsp;name_4&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; False&nbsp; &nbsp;46&nbsp; &nbsp;name_1&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; N/A&nbsp; &nbsp; &nbsp;1&nbsp;&nbsp;7&nbsp; &nbsp;name_1&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; N/A&nbsp; &nbsp; &nbsp;5
打开App,查看更多内容
随时随地看视频慕课网APP

相关分类

Python