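How do I get a true outer join in pandas?
By "true" I mean one that actually gives you the whole output, rather than combining the columns being merged on. In my opinion the default behavior is a bit silly, because it makes it hard to determine which operation to perform on each row. I do this all the time to detect whether data should be inserted, updated, or deleted, but I always have to create extra copies of the merge columns, which is pure overhead on some datasets (sometimes a lot of it).
Example: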
import pandas as pd
keys = ["A","B"]
df1 = pd.DataFrame({"A":[1,2,3],"B":["one","two","three"],"C":["testThis","testThat", "testThis"],"D":[None,hash("B"),hash("C")]})
df2 = pd.DataFrame({"A":[2,3,4],"B":["two","three","four"],"C":["testThis","testThat", "testThis"], "D":[hash("G"),hash("C"),hash("D")]})
fullJoinDf = df1.merge(df2, how="outer", left_on=keys, right_on=keys, suffixes=["","_r"])
display(fullJoinDf)
   A      B         C              D       C_r            D_r
0  1    one  testThis            NaN       NaN            NaN
1  2    two  testThat  -3.656526e+18  testThis  -9.136326e+18
2  3  three  testThis  -8.571400e+18  testThat  -8.571400e+18
3  4   four       NaN            NaN  testThis  -4.190116e+17
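Notice how A and B are magically combined into a single set of columns in the output. What I want is what I would get from a SQL outer join, for example: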
     A      B         C              D  A_r    B_r       C_r            D_r
0    1    one  testThis            NaN  NaN    NaN       NaN            NaN
1    2    two  testThat  -3.656526e+18    2    two  testThis  -9.136326e+18
2    3  three  testThis  -8.571400e+18    3  three  testThat  -8.571400e+18
3  NaN    NaN       NaN            NaN    4   four  testThis  -4.190116e+17
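For reference, the workaround described at the top (making extra copies of the merge columns) might look like the minimal sketch below. It assumes that renaming every right-hand column with `add_suffix` is acceptable. The names `right`, `right_keys`, and `full` are made up for illustration, and the hash-based D column is left out so the data is reproducible. Because the key names differ between the two sides, pandas keeps both sets of key columns instead of merging them into one.

```python
import pandas as pd

keys = ["A", "B"]
df1 = pd.DataFrame({
    "A": [1, 2, 3],
    "B": ["one", "two", "three"],
    "C": ["testThis", "testThat", "testThis"],
})
df2 = pd.DataFrame({
    "A": [2, 3, 4],
    "B": ["two", "three", "four"],
    "C": ["testThis", "testThat", "testThis"],
})

# Rename every right-hand column, keys included, so no names collide.
# Note: add_suffix returns a new DataFrame, which is the overhead in question.
right = df2.add_suffix("_r")
right_keys = [k + "_r" for k in keys]  # hypothetical helper name

# With different key names on each side, both key sets survive the merge.
full = df1.merge(right, how="outer", left_on=keys, right_on=right_keys)
print(full)
```

Rows that exist on only one side end up with NaN in the other side's key columns, which is the SQL-style shape shown above. The cost is the renamed copy of `df2`.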
Edit for @Felipe Whitaker
Using concat:
df3 = df1.copy().set_index(keys)
df4 = df2.copy().set_index(keys)
t = pd.concat([df3,df4], axis=1)
display(t.reset_index())
   A      B         C              D         C              D
0  1    one  testThis            NaN       NaN            NaN
1  2    two  testThat  -3.656526e+18  testThis  -9.136326e+18
2  3  three  testThis  -8.571400e+18  testThat  -8.571400e+18
3  4   four       NaN            NaN  testThis  -4.190116e+17
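Edit to the example: given the answers, I will post more tests, so that anyone else who stumbles upon this question can see more of the "gotcha" variants I ran into while doing this.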
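For the insert/update/delete detection mentioned at the top, one possible sketch uses the `indicator=True` option of `merge`, which adds a `_merge` column with the values `left_only`, `right_only`, or `both`. It does not reproduce the duplicated key columns asked for above. The mapping from those values to actions is an assumption: it treats `df1` as the existing data and `df2` as the incoming data, and the `action` column and its labels are made up for illustration.

```python
import pandas as pd

keys = ["A", "B"]
df1 = pd.DataFrame({  # assumed to be the existing data
    "A": [1, 2, 3],
    "B": ["one", "two", "three"],
    "C": ["testThis", "testThat", "testThis"],
})
df2 = pd.DataFrame({  # assumed to be the incoming data
    "A": [2, 3, 4],
    "B": ["two", "three", "four"],
    "C": ["testThis", "testThat", "testThis"],
})

# indicator=True records which side(s) each row came from in "_merge".
merged = df1.merge(df2, how="outer", on=keys, suffixes=("", "_r"), indicator=True)

# Hypothetical mapping from row origin to the action to take.
action = merged["_merge"].astype(str).map(
    {"left_only": "delete", "right_only": "insert", "both": "unchanged"}
)

# Rows present on both sides whose payload differs are treated as updates.
differs = (merged["_merge"] == "both") & (merged["C"] != merged["C_r"])
merged["action"] = action.mask(differs, "update")
print(merged[keys + ["_merge", "action"]])
```

This avoids copying the key columns, since the origin of each row is read from `_merge` rather than from NaN checks on duplicated keys.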