Keep only direct parent-child ID pairs in a DataFrame

I have the following DataFrame:

   id_parent  id_child
0       1100      1090
1       1100      1080
2       1100      1070
3       1100      1060
4       1090      1080
5       1090      1070
6       1080      1070

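I only want to keep the direct parent-child connections. Example: 1100 is connected to 1090, 1080 and 1070, but only the link to 1090 should be kept, because 1080 and 1070 are already children of 1090. (The link from 1100 to 1060 also stays, since 1060 is not reachable any other way.) This example df contains just one cluster; the real df consists of multiple parent/child clusters.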

So the output should look like this:

   id_parent  id_child
0       1100      1090
1       1090      1080
2       1080      1070
3       1100      1060

Sample code:

import pandas as pd

# create sample input
df_input = pd.DataFrame.from_dict({'id_parent': {0: 1100, 1: 1100, 2: 1100, 3: 1100, 4: 1090, 5: 1090, 6: 1080}, 'id_child': {0: 1090, 1: 1080, 2: 1070, 3: 1060, 4: 1080, 5: 1070, 6: 1070}})

# create sample output
df_output = pd.DataFrame.from_dict({'id_parent': {0: 1100, 1: 1090, 2: 1080, 3: 1100}, 'id_child': {0: 1090, 1: 1080, 2: 1070, 3: 1060}})

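My current approach is based on this question: "Creating dictionary of parent child pairs in pandas dataframe". But maybe there is a simple, clean way to solve this without relying on additional non-standard libraries?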

弑天下

Views: 131 · Answers: 2

2 Answers

凤凰求蛊

This worked for me:

# First: group df by child id
grouped = df_input.groupby(['id_child'], as_index=True).apply(lambda a: a[:])

# Second: create a new output DataFrame
OUTPUT = pd.DataFrame(columns=['id_parent', 'id_child'])

# Third: fill it with the unique child ids and, where a child has
# more than one parent, the minimum of its parent ids
for i, id_ch in enumerate(df_input.id_child.unique()):
    OUTPUT.loc[i] = [min(grouped.loc[id_ch].id_parent), id_ch]
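The same idea can probably be written more compactly with a single groupby. This is only a sketch, and it inherits the assumption hidden in the answer above: the direct parent of a child is taken to be the one with the smallest ID, which holds for the sample data (IDs decrease from parent to child) but is not guaranteed in general. It also keeps exactly one parent per child. The name df_direct is made up for illustration.

```python
import pandas as pd

df_input = pd.DataFrame({
    'id_parent': [1100, 1100, 1100, 1100, 1090, 1090, 1080],
    'id_child':  [1090, 1080, 1070, 1060, 1080, 1070, 1070],
})

# ASSUMPTION: the direct parent of each child is its smallest parent ID.
# sort=False keeps the children in order of first appearance.
df_direct = (
    df_input.groupby('id_child', sort=False, as_index=False)['id_parent']
    .min()[['id_parent', 'id_child']]
)

print(df_direct)
```

If the IDs carry no ordering information, this shortcut will pick the wrong parent, and a reachability check is needed instead.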

慕妹3146593

I was able to get the result using drop_duplicates:

In [6]: df
Out[6]:
   id_parent  id_child
0       1100      1090
1       1100      1080
2       1100      1070
3       1090      1080
4       1090      1070
5       1080      1070

In [9]: df.drop_duplicates(subset=['id_parent']).reset_index(drop=True)
Out[9]:
   id_parent  id_child
0       1100      1090
1       1090      1080
2       1080      1070
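Both answers lean on properties of the sample: the first on the ordering of the IDs, the second on the row order and on each parent having a single direct child (its input has no 1100/1060 row). A more general way to read the question is as a transitive reduction: drop an edge whenever its child is already reachable through another child of the same parent. The following is a minimal sketch using only the standard library, assuming the pairs form a directed graph without cycles; direct_edges and its helper are hypothetical names, not part of pandas.

```python
from collections import defaultdict


def direct_edges(edges):
    """Keep only (parent, child) pairs not implied by a longer path.

    ASSUMPTION: the edges describe an acyclic parent -> child graph.
    """
    children = defaultdict(set)
    for parent, child in edges:
        children[parent].add(child)

    def descendants(node, seen):
        # Depth-first walk collecting every node reachable from `node`.
        for nxt in children.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                descendants(nxt, seen)
        return seen

    kept = []
    for parent, child in edges:
        # Redundant if some other child of `parent` already reaches `child`.
        siblings = children[parent] - {child}
        if not any(child in descendants(s, set()) for s in siblings):
            kept.append((parent, child))
    return kept


edges = [(1100, 1090), (1100, 1080), (1100, 1070), (1100, 1060),
         (1090, 1080), (1090, 1070), (1080, 1070)]
result = direct_edges(edges)
print(result)
```

With the DataFrame from the question, the edge list can be built with list(zip(df_input['id_parent'], df_input['id_child'])) and the result turned back into a frame with pd.DataFrame(result, columns=['id_parent', 'id_child']). The sketch recomputes reachability for every edge, so it favors clarity over speed and uses recursion, which may hit Python's recursion limit on very deep hierarchies.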
