A df with imperfect data:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A Surname': ['Smith', 'Longshore', 'Jones'],
                   'A Title': ['Mr', 'Miss', np.nan],
                   'B Surname': ['Smith', np.nan, 'Nguyen'],
                   'B Title': ['Mrs', np.nan, np.nan]})
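I want a column containing a string suitable for addressing both A and B where possible. Concatenating the fields returns np.nan if any of them is np.nan, and the result needs to make logical sense (e.g. don't use 'B Title' if 'B Surname' is np.nan), so I need a series of rules to pick the most appropriate combination. My unsuccessful attempt: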
def combined(x):
    full = df['A Title'] + ' ' & df['A Surname'] & ' & ' & df['B Title'] & ' ' & df['B Surname']
    no_title = df['A Surname'] & ' & ' & df['B Surname']
    # more combinations
    if full != np.nan:
        return full
    elif no_title != np.nan:
        return no_title
    # more elifs
    else:
        return df['A Surname']
df['combined string'] = np.nan
df['combined string'] = df['combined string'].apply(combined)
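A small check of the two behaviours this attempt runs into, as a sketch only: joining strings with + propagates np.nan, and comparing against np.nan with != is not a usable missing-value test. The names titles, surnames, joined, nan_is_unequal and missing are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-columns mirroring 'A Title' and 'A Surname' from the df above.
titles = pd.Series(['Mr', 'Miss', np.nan])
surnames = pd.Series(['Smith', 'Longshore', 'Jones'])

# Joining with + propagates missing values: the third element becomes NaN.
joined = titles + ' ' + surnames

# NaN never compares equal to anything, itself included,
# so a test written as `value != np.nan` is always True.
nan_is_unequal = np.nan != np.nan

# isna() / notna() are the reliable missing-value tests.
missing = joined.isna().tolist()
print(missing)         # [False, False, True]
print(nan_is_unequal)  # True
```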
The desired output looks like this:
desired_df = pd.DataFrame({'A Surname': ['Smith', 'Longshore', 'Jones'],
                           'A Title': ['Mr', 'Miss', 'Mr'],
                           'B Surname': ['Smith', np.nan, 'Whatever'],
                           'B Title': ['Mrs', np.nan, np.nan],
                           'combined string': ['Mr Smith & Mrs Smith', 'Miss Longshore', 'Jones & Whatever']})
Is there a practical way to do this?
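One possible approach, sketched under assumptions rather than offered as a definitive answer: apply a function to each row (axis=1) and walk through the rules from most to least complete, testing for missing values with pd.notna. The rule order below is inferred from the desired output (titles are used only when both are present, otherwise surnames alone), and the helper name have is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A Surname': ['Smith', 'Longshore', 'Jones'],
                   'A Title': ['Mr', 'Miss', np.nan],
                   'B Surname': ['Smith', np.nan, 'Nguyen'],
                   'B Title': ['Mrs', np.nan, np.nan]})


def have(*values):
    """True only if none of the given values is missing (hypothetical helper)."""
    return all(pd.notna(v) for v in values)


def combined(row):
    a_title, a_surname = row['A Title'], row['A Surname']
    b_title, b_surname = row['B Title'], row['B Surname']

    # Rule order is an assumption inferred from the desired output.
    if have(a_title, a_surname, b_title, b_surname):
        return f"{a_title} {a_surname} & {b_title} {b_surname}"
    if have(a_surname, b_surname):
        # At least one title is missing, so address both by surname only.
        return f"{a_surname} & {b_surname}"
    if have(a_title, a_surname):
        return f"{a_title} {a_surname}"
    # Last resort, as in the original attempt: A's surname (may itself be NaN).
    return a_surname


# axis=1 passes each row to combined() as a Series indexed by column name.
df['combined string'] = df.apply(combined, axis=1)
print(df['combined string'].tolist())
```

On the sample df this gives 'Mr Smith & Mrs Smith', 'Miss Longshore' and 'Jones & Nguyen'. Further rules (for example, a row where only B is present) can be added as extra if branches in the same style.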