Creating new columns by joining a table to itself multiple times

I have a pandas DataFrame containing a list of the members of a large family.


import pandas as pd

data = {'child': ['Joe','Anna','Anna','Steffani','Bob','Rea','Dani','Dani','Selma','John','Kevin'],
        'parents': ['Steffani','Bob','Steffani','Dani','Selma','Anna','Selma','John','Kevin','-','Robert']}

df = pd.DataFrame(data)

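From this DataFrame, I need to build a new table that shows the relationships in the data by adding several columns on the right. The values in each added column are the elders of the person in the column to its left, so each column represents one level of relationship. If I could draw it as a diagram, it might look like this: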


child --> parents --> grandparents --> parents of grandparents --> grandparents of grandparents --> etc.

So the expected output DataFrame would look like this:


    child       parents     A           B           C           D (etc)
---------------------------------------------------------------------------------
0   Joe         Steffani    Dani        Selma       Kevin       <If still possible>
1   Joe         Steffani    Dani        John        -
2   Anna        Bob         Selma       Kevin       Robert
3   Anna        Steffani    Dani        Selma       Kevin
4   Anna        Steffani    Dani        John        -
5   Steffani    Dani        Selma       Kevin       Robert
6   Steffani    Dani        John        -           -
7   Bob         Selma       Kevin       Robert      -
8   Rea         Anna        Bob         Selma       Kevin
9   Rea         Anna        Steffani    Dani        Selma
10  Rea         Anna        Steffani    Dani        John
11  Dani        Selma       Kevin       Robert      -
12  Dani        John        -           -           -
13  Selma       Kevin       Robert      -           -
14  John        -           -           -           -
15  Kevin       Robert      -           -           -

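Currently, I build the new table manually with pandas.merge. But I have to repeat this many times, until the last column has no further elder relationship with the column to its left. For example: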


Step 1

df2 = pd.merge(df, df, left_on='parents', right_on='child', how='left').fillna('-')
df2 = df2[['child_x','parents_x','parents_y']]
df2.columns = ['child','parents','A']

Step 2

df3 = pd.merge(df2, df, left_on='A', right_on='child', how='left').fillna('-')
df3 = df3[['child_x','parents_x','A','parents_y']]
df3.columns = ['child','parents','A','B']

Step 3

紫衣仙女
2 answers

素胚勾勒不出你

Consider a chained merge using reduce, with the suffixes argument of merge to handle the duplicate column names, and then drop the intermediate child columns:

from functools import reduce

def proc_build(x, y):
    # Give the left-hand columns a distinct suffix at each step; newer pandas
    # versions reject suffixes that would produce duplicate column names.
    level = x.shape[1] // 2
    temp = (pd.merge(x, y, left_on='parents', right_on='child',
                     how='left', suffixes=(f'_{level}', ''))
              .fillna('-'))
    return temp

final_df = (reduce(proc_build, [df, df, df, df])
               .set_axis(['child', 'parents',
                          'child1', 'A',
                          'child2', 'B',
                          'child3', 'C'], axis='columns')
               .reindex(['child', 'parents'] + list('ABC'), axis='columns')
           )
print(final_df)
#        child   parents         A       B       C
# 0        Joe  Steffani      Dani   Selma   Kevin
# 1        Joe  Steffani      Dani    John       -
# 2       Anna       Bob     Selma   Kevin  Robert
# 3       Anna  Steffani      Dani   Selma   Kevin
# 4       Anna  Steffani      Dani    John       -
# 5   Steffani      Dani     Selma   Kevin  Robert
# 6   Steffani      Dani      John       -       -
# 7        Bob     Selma     Kevin  Robert       -
# 8        Rea      Anna       Bob   Selma   Kevin
# 9        Rea      Anna  Steffani    Dani   Selma
# 10       Rea      Anna  Steffani    Dani    John
# 11      Dani     Selma     Kevin  Robert       -
# 12      Dani      John         -       -       -
# 13     Selma     Kevin    Robert       -       -
# 14      John         -         -       -       -
# 15     Kevin    Robert         -       -       -

To extend with another column such as D, add another df to the iterable argument of reduce, along with additional list items in set_axis and reindex, specifically ['child4', 'D'] and list('ABCD'). While there are ways to make these items dynamic, reduce can become expensive, so it should be handled with some declarative emphasis.

final_df = (reduce(proc_build, [df] * 5)
               .set_axis(['child', 'parents',
                          'child1', 'A',
                          'child2', 'B',
                          'child3', 'C',
                          'child4', 'D'], axis='columns')
               .reindex(['child', 'parents'] + list('ABCD'), axis='columns')
           )
print(final_df)
#        child   parents         A       B       C       D
# 0        Joe  Steffani      Dani   Selma   Kevin  Robert
# 1        Joe  Steffani      Dani    John       -       -
# 2       Anna       Bob     Selma   Kevin  Robert       -
# 3       Anna  Steffani      Dani   Selma   Kevin  Robert
# 4       Anna  Steffani      Dani    John       -       -
# 5   Steffani      Dani     Selma   Kevin  Robert       -
# 6   Steffani      Dani      John       -       -       -
# 7        Bob     Selma     Kevin  Robert       -       -
# 8        Rea      Anna       Bob   Selma   Kevin  Robert
# 9        Rea      Anna  Steffani    Dani   Selma   Kevin
# 10       Rea      Anna  Steffani    Dani    John       -
# 11      Dani     Selma     Kevin  Robert       -       -
# 12      Dani      John         -       -       -       -
# 13     Selma     Kevin    Robert       -       -       -
# 14      John         -         -       -       -       -
# 15     Kevin    Robert         -       -       -       -
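
The answer mentions that the number of merges and the column labels could be made dynamic. One possible way to do that, as a rough sketch rather than the answerer's own method, is a plain loop that keeps merging until the newest column contains only '-'. The names build_lineage, max_levels, _key and _next are made up for this illustration, and the letter labels limit it to 26 levels:

```python
import pandas as pd

data = {
    'child': ['Joe', 'Anna', 'Anna', 'Steffani', 'Bob', 'Rea',
              'Dani', 'Dani', 'Selma', 'John', 'Kevin'],
    'parents': ['Steffani', 'Bob', 'Steffani', 'Dani', 'Selma', 'Anna',
                'Selma', 'John', 'Kevin', '-', 'Robert'],
}
df = pd.DataFrame(data)


def build_lineage(edges, max_levels=26):
    """Append ancestor columns 'A', 'B', ... until no row has a further elder.

    build_lineage and max_levels are hypothetical names for this sketch;
    max_levels also acts as a guard if the data ever contained a cycle.
    """
    # Rename the lookup columns so they never clash with the growing result.
    lookup = edges.rename(columns={'child': '_key', 'parents': '_next'})
    result = edges.copy()
    last = 'parents'
    for i in range(max_levels):
        step = result.merge(lookup, left_on=last, right_on='_key', how='left')
        new_col = step['_next'].fillna('-')
        if new_col.eq('-').all():
            break  # nobody has an elder at this level, so stop
        name = chr(ord('A') + i)
        result = step.drop(columns=['_key', '_next'])
        result[name] = new_col
        last = name
    return result


final_df = build_lineage(df)
print(final_df.columns.tolist())
# ['child', 'parents', 'A', 'B', 'C', 'D', 'E']
```

With the sample data the loop stops after column E, because Robert has no recorded parent. Each pass still performs a full merge, so the cost concern raised above applies here as well.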

千巷猫影

Here is a rough solution of mine. You should optimize it.

Load all of your dataframes and keep them in a list (data1 and data2 stand for your own data):

list_data = [data1, data2]
list_df = [pd.DataFrame(data) for data in list_data]

Then create 2 proxy variables:

df_family: this will be the output.
last_df: used to break the loop if every row in the parents column is '-' but there are still dataframes left in the list.

last_df = False
df_family = pd.DataFrame()

This part merges the dataframes together the way you want. I also changed the column names to 0, 1, ..., n so that you can rename them easily.

for df in list_df:
    if last_df:
        break
    # Flag the last useful dataframe: every parent in it is '-'
    if (df['parents'] == '-').all():
        last_df = True
    if df_family.empty:
        # Copy so that renaming the columns below does not alter the input
        df_family = df.copy()
    else:
        # Join the newest column of the result to the first column of df
        df_family = pd.merge(df_family, df, how='left',
                             left_on=df_family.columns[-1], right_on=df.columns[0])
        df_family = df_family.drop(columns=[df.columns[0]])
    # Rename the columns to 0, 1, ..., n
    df_family.columns = list(range(df_family.shape[1]))
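
The stopping rule described above (stop once every parent is '-') can also be expressed without merge, which may be handy as a cross-check on the merged table. The following is only a sketch under stated assumptions: it uses the question's single data dict, assumes the data contains no cycles, and the names parents_of, walk, chains and lineage are invented for the illustration. It follows each person upward through a plain dict and pads the shorter chains with '-'.

```python
from collections import defaultdict

import pandas as pd

data = {
    'child': ['Joe', 'Anna', 'Anna', 'Steffani', 'Bob', 'Rea',
              'Dani', 'Dani', 'Selma', 'John', 'Kevin'],
    'parents': ['Steffani', 'Bob', 'Steffani', 'Dani', 'Selma', 'Anna',
                'Selma', 'John', 'Kevin', '-', 'Robert'],
}
df = pd.DataFrame(data)

# Map each person to the list of their recorded parents, skipping the '-' marker.
parents_of = defaultdict(list)
for child, parent in zip(df['child'], df['parents']):
    if parent != '-':
        parents_of[child].append(parent)


def walk(path):
    """Yield every complete upward path that starts with `path`.

    Assumes the data contains no cycles (nobody is their own ancestor).
    """
    elders = parents_of.get(path[-1], [])
    if not elders:
        yield path  # the last person has no recorded parent
    for elder in elders:
        yield from walk(path + [elder])


# One or more chains per original row, in the original row order.
chains = []
for child, parent in zip(df['child'], df['parents']):
    start = [child] if parent == '-' else [child, parent]
    chains.extend(walk(start))

# Pad every chain with '-' to the length of the longest one.
width = max(len(chain) for chain in chains)
rows = [chain + ['-'] * (width - len(chain)) for chain in chains]
lineage = pd.DataFrame(rows)  # integer column names 0, 1, ..., n
print(lineage.shape)
# (16, 7)
```

The integer column names follow the 0, 1, ..., n convention used in this answer, so they can be renamed afterwards in the same way.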