-
慕工程0101907
If you only want the first duplicated value carried to the last duplicate, use transform with 'first', then set NaN values through loc with a duplicated mask:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                       'name': list('brslp'),
                       'codeid': [11, 12, 13, 11, 13]})

    df['relation'] = df.groupby('codeid')['name'].transform('first')
    print(df)
       id name  codeid relation
    0   1    b      11        b
    1   2    r      12        r
    2   3    s      13        s
    3   4    l      11        b
    4   5    p      13        s

    # mark every duplicated codeid except its last occurrence
    print(df['codeid'].duplicated(keep='last'))
    0     True
    1    False
    2     True
    3    False
    4    False
    Name: codeid, dtype: bool

    # mark all duplicated codeid values, then invert the boolean mask with ~ to get the unique rows
    print(~df['codeid'].duplicated(keep=False))
    0    False
    1     True
    2    False
    3    False
    4    False
    Name: codeid, dtype: bool

    # chain the boolean masks together
    print(df['codeid'].duplicated(keep='last') | ~df['codeid'].duplicated(keep=False))
    0     True
    1     True
    2     True
    3    False
    4    False
    Name: codeid, dtype: bool

    # replace relation with NaN where the mask is True
    df.loc[df['codeid'].duplicated(keep='last') | ~df['codeid'].duplicated(keep=False), 'relation'] = np.nan
    print(df)
       id name  codeid relation
    0   1    b      11      NaN
    1   2    r      12      NaN
    2   3    s      13      NaN
    3   4    l      11        b
    4   5    p      13        s
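The mask logic above can also be folded into one step. The following is only a sketch: the helper name `relation_for_last_duplicate` and its parameters are hypothetical, not part of pandas. It relies on the observation that the rows that keep a value are exactly the last row of each group with more than one member, which should give the same result as the two-mask version.

```python
import pandas as pd


def relation_for_last_duplicate(df, key='codeid', value='name'):
    """Hypothetical helper: first `value` of each `key` group, kept only on
    the last row of groups with more than one row, NaN everywhere else."""
    first_in_group = df.groupby(key)[value].transform('first')
    # True only for the final row of a group that has at least two rows:
    # it is a duplicate of an earlier row, and no later duplicate follows it.
    is_last_of_dup_group = (df[key].duplicated(keep='first')
                            & ~df[key].duplicated(keep='last'))
    # where() keeps values where the mask is True and puts NaN elsewhere.
    return first_in_group.where(is_last_of_dup_group)


df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'name': list('brslp'),
                   'codeid': [11, 12, 13, 11, 13]})
df['relation'] = relation_for_last_duplicate(df)
print(df)
```

Using `where` avoids assigning the column first and then overwriting part of it with `np.nan`.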
-
万千封印
This is not the best solution, because it uses extra memory, but here is my attempt. df1 is created to hold the rows whose relation column is 'null', since the null values seem to be the first occurrences. After some cleanup, the two dataframes are merged into one.

    import pandas as pd

    df = pd.DataFrame([['bag', 11, 'null'],
                       ['shoes', 12, 'null'],
                       ['shopper', 13, 'null'],
                       ['leather', 11, 'bag'],
                       ['plastic', 13, 'shopper'],
                       ['something', 13, ""]],
                      columns=['name', 'codeid', 'relation'])

    # create a df with only the 'null' values in relation
    df1 = df.loc[df['relation'] == 'null'].copy()
    # drop the duplicates and retain the first entry
    df1.drop_duplicates(subset=['name'], inplace=True)
    # drop the unneeded column
    df1 = df1.drop("relation", axis=1)
    # merge the two dfs on the codeid column
    final_df = pd.merge(df, df1, on='codeid')
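If the extra merged frame is the concern, one possible alternative is to build a small lookup Series and map it onto codeid. This is a sketch under assumptions: the column name `parent_name` is made up for illustration, and rows whose relation is the literal string 'null' are treated as the first occurrences, as in the answer above.

```python
import pandas as pd

df = pd.DataFrame([['bag', 11, 'null'],
                   ['shoes', 12, 'null'],
                   ['shopper', 13, 'null'],
                   ['leather', 11, 'bag'],
                   ['plastic', 13, 'shopper'],
                   ['something', 13, ""]],
                  columns=['name', 'codeid', 'relation'])

# Rows marked 'null' are assumed to be the first occurrence of each codeid.
# Dropping duplicates on codeid keeps the lookup index unique, which map() needs.
parents = df.loc[df['relation'] == 'null'].drop_duplicates(subset='codeid')
lookup = parents.set_index('codeid')['name']

# 'parent_name' is a hypothetical column name chosen for this sketch.
df['parent_name'] = df['codeid'].map(lookup)
print(df)
```

This adds one column to the existing frame instead of producing a second, wider one with name_x and name_y columns.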
-
繁星点点滴滴
I think you want to do something like this:

    import pandas as pd

    df = pd.DataFrame([['bag', 11, 'null'],
                       ['shoes', 12, 'null'],
                       ['shopper', 13, 'null'],
                       ['leather', 11, 'bag'],
                       ['plastic', 13, 'shoes']],
                      columns=['name', 'codeid', 'relation'])

    def codeid_analysis(rows):
        # set relation according to the row's codeid
        if rows['codeid'] == 11:
            rows['relation'] = 'bag'
        elif rows['codeid'] == 12:
            rows['relation'] = 'shirt'  # for example. You should put what you want here
        elif rows['codeid'] == 13:
            rows['relation'] = 'pants'  # for example. You should put what you want here
        return rows

    result = df.apply(codeid_analysis, axis=1)
    print(result)
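The same if/elif chain can probably be expressed as a dictionary lookup, which avoids calling a Python function on every row. This is a sketch: the dictionary name `code_to_relation` is an assumption, and its values are the same placeholder examples used above.

```python
import pandas as pd

df = pd.DataFrame([['bag', 11, 'null'],
                   ['shoes', 12, 'null'],
                   ['shopper', 13, 'null'],
                   ['leather', 11, 'bag'],
                   ['plastic', 13, 'shoes']],
                  columns=['name', 'codeid', 'relation'])

# Placeholder mapping taken from the example above; put what you want here.
code_to_relation = {11: 'bag', 12: 'shirt', 13: 'pants'}

result = df.copy()
# map() yields NaN for any codeid missing from the dict; fillna() then keeps
# the original relation for those rows, matching the apply() version.
result['relation'] = df['codeid'].map(code_to_relation).fillna(df['relation'])
print(result)
```

Adding a new codeid then means adding one dictionary entry rather than another elif branch.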