Python Dataframes:根据条件合并两个数据帧(Pandas)

假设我有这两个 DataFrame:


DATAFRAME 1

    onset  offset

0       1     200

1     201     400

2     401     600

3     601     800

4     801    1000

5    1001    1200

6    1201    1400

7    1401    1600

8    1601    1800

9    1801    2000

10   2001    2200

11   2201    2400

12   2401    2600

13   2601    2800

14   2801    3000

15   3001    3200

16   3201    3400

17   3401    3600

18   3601    3800

19   3801    4000

20   4001    4200

21   4201    4400

22   4401    4600

23   4601    4800

24   4801    5000

25   5001    5200

26   5201    5400

27   5401    5600

28   5601    5800

29   5801    6000

DATAFRAME 2

   onset rhythm_name  rhythm_code  offset

0      1         NSR          100    2760

1   2761  JUNCTIONAL         4000    3938

2   3939         NSR          100    6000

我的目标是将两个数据帧与起始偏移间隔合并,并添加它们各自的rhythm_name和rhythm_code以获得如下内容:


    onset  offset  rhythm_name  rhythm_code

0       1     200        NSR          100 

1     201     400        NSR          100

2     401     600        NSR          100

3     601     800        NSR          100

4     801    1000        NSR          100

5    1001    1200        NSR          100

6    1201    1400        NSR          100

7    1401    1600        NSR          100

8    1601    1800        NSR          100

9    1801    2000        NSR          100

10   2001    2200        NSR          100

11   2201    2400        NSR          100

12   2401    2600        NSR          100

13   2601    2800        Null         Null


我可以用什么来做到这一点?我找不到解决这个问题的方法。我试过类似的东西:


df1["rhythm_name"] = df2[(df1['onset'] >= df2['onset']) & (df1['offset'] <= df2['offset'])])

我明白了:


ValueError: Can only compare identically-labeled Series objects


有只小跳蛙
浏览 123回答 2
2回答

素胚勾勒不出你

您可以pd.merge_asof, 并屏蔽第二个条件:dfm = pd.merge_asof(df1, df2, on='onset', direction='backward', suffixes=('','_y'))dfm[['rhythm_name', 'rhythm_code']] = (dfm[['rhythm_name', 'rhythm_code']]&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; .where(dfm['offset'] <= dfm['offset_y']))dfm.drop('offset_y', axis=1)输出:&nbsp; &nbsp; onset&nbsp; offset rhythm_name&nbsp; rhythm_code0&nbsp; &nbsp; &nbsp; &nbsp;1&nbsp; &nbsp; &nbsp;200&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.01&nbsp; &nbsp; &nbsp;201&nbsp; &nbsp; &nbsp;400&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.02&nbsp; &nbsp; &nbsp;401&nbsp; &nbsp; &nbsp;600&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.03&nbsp; &nbsp; &nbsp;601&nbsp; &nbsp; &nbsp;800&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.04&nbsp; &nbsp; &nbsp;801&nbsp; &nbsp; 1000&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.05&nbsp; &nbsp; 1001&nbsp; &nbsp; 1200&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.06&nbsp; &nbsp; 1201&nbsp; &nbsp; 1400&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.07&nbsp; &nbsp; 1401&nbsp; &nbsp; 1600&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.08&nbsp; &nbsp; 1601&nbsp; &nbsp; 1800&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.09&nbsp; &nbsp; 1801&nbsp; &nbsp; 2000&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.010&nbsp; &nbsp;2001&nbsp; &nbsp; 2200&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.011&nbsp; &nbsp;2201&nbsp; &nbsp; 2400&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.012&nbsp; &nbsp;2401&nbsp; &nbsp; 2600&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.013&nbsp; &nbsp;2601&nbsp; &nbsp; 2800&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NaN&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; NaN14&nbsp; &nbsp;2801&nbsp; &nbsp; 3000&nbsp; JUNCTIONAL&nbsp; &nbsp; &nbsp; &nbsp;4000.015&nbsp; &nbsp;3001&nbsp; &nbsp; 3200&nbsp; JUNCTIONAL&nbsp; &nbsp; &nbsp; &nbsp;4000.016&nbsp; &nbsp;3201&nbsp; &nbsp; 3400&nbsp; JUNCTIONAL&nbsp; &nbsp; &nbsp; &nbsp;4000.017&nbsp; &nbsp;3401&nbsp; &nbsp; 3600&nbsp; JUNCTIONAL&nbsp; &nbsp; &nbsp; &nbsp;4000.018&nbsp; &nbsp;3601&nbsp; &nbsp; 3800&nbsp; JUNCTIONAL&nbsp; &nbsp; &nbsp; &nbsp;4000.019&nbsp; &nbsp;3801&nbsp; &nbsp; 4000&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NaN&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; NaN20&nbsp; &nbsp;4001&nbsp; &nbsp; 4200&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.021&nbsp; &nbsp;4201&nbsp; &nbsp; 4400&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.022&nbsp; &nbsp;4401&nbsp; &nbsp; 4600&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.023&nbsp; &nbsp;4601&nbsp; &nbsp; 4800&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.024&nbsp; &nbsp;4801&nbsp; &nbsp; 5000&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.025&nbsp; &nbsp;5001&nbsp; &nbsp; 5200&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.026&nbsp; &nbsp;5201&nbsp; &nbsp; 5400&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.027&nbsp; &nbsp;5401&nbsp; &nbsp; 5600&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.028&nbsp; &nbsp;5601&nbsp; &nbsp; 5800&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.029&nbsp; &nbsp;5801&nbsp; &nbsp; 6000&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.0

德玛西亚99

如果你的数据不是太大,你可以使用广播方式:cond1 = df1.onset.values[:,None] >= df2.onset.valuescond2 = df1.offset.values[:,None] <= df2.offset.valuesmask = (cond1&cond2)idx = np.where(mask.any(1), mask.argmax(1), np.nan)for col in ['rhythm_name', 'rhythm_code']:&nbsp; &nbsp; df1[col] = df2[col].reindex(idx).values输出:0&nbsp; &nbsp; &nbsp; &nbsp;1&nbsp; &nbsp; &nbsp;200&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.01&nbsp; &nbsp; &nbsp;201&nbsp; &nbsp; &nbsp;400&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.02&nbsp; &nbsp; &nbsp;401&nbsp; &nbsp; &nbsp;600&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.03&nbsp; &nbsp; &nbsp;601&nbsp; &nbsp; &nbsp;800&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.04&nbsp; &nbsp; &nbsp;801&nbsp; &nbsp; 1000&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.05&nbsp; &nbsp; 1001&nbsp; &nbsp; 1200&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.06&nbsp; &nbsp; 1201&nbsp; &nbsp; 1400&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.07&nbsp; &nbsp; 1401&nbsp; &nbsp; 1600&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.08&nbsp; &nbsp; 1601&nbsp; &nbsp; 1800&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.09&nbsp; &nbsp; 1801&nbsp; &nbsp; 2000&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.010&nbsp; &nbsp;2001&nbsp; &nbsp; 2200&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.011&nbsp; &nbsp;2201&nbsp; &nbsp; 2400&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.012&nbsp; &nbsp;2401&nbsp; &nbsp; 2600&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.013&nbsp; &nbsp;2601&nbsp; &nbsp; 2800&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NaN&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; NaN14&nbsp; &nbsp;2801&nbsp; &nbsp; 3000&nbsp; JUNCTIONAL&nbsp; &nbsp; &nbsp; &nbsp;4000.015&nbsp; &nbsp;3001&nbsp; &nbsp; 3200&nbsp; JUNCTIONAL&nbsp; &nbsp; &nbsp; &nbsp;4000.016&nbsp; &nbsp;3201&nbsp; &nbsp; 3400&nbsp; JUNCTIONAL&nbsp; &nbsp; &nbsp; &nbsp;4000.017&nbsp; &nbsp;3401&nbsp; &nbsp; 3600&nbsp; JUNCTIONAL&nbsp; &nbsp; &nbsp; &nbsp;4000.018&nbsp; &nbsp;3601&nbsp; &nbsp; 3800&nbsp; JUNCTIONAL&nbsp; &nbsp; &nbsp; &nbsp;4000.019&nbsp; &nbsp;3801&nbsp; &nbsp; 4000&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NaN&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; NaN20&nbsp; &nbsp;4001&nbsp; &nbsp; 4200&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.021&nbsp; &nbsp;4201&nbsp; &nbsp; 4400&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.022&nbsp; &nbsp;4401&nbsp; &nbsp; 4600&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.023&nbsp; &nbsp;4601&nbsp; &nbsp; 4800&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.024&nbsp; &nbsp;4801&nbsp; &nbsp; 5000&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.025&nbsp; &nbsp;5001&nbsp; &nbsp; 5200&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.026&nbsp; &nbsp;5201&nbsp; &nbsp; 5400&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.027&nbsp; &nbsp;5401&nbsp; &nbsp; 5600&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.028&nbsp; &nbsp;5601&nbsp; &nbsp; 5800&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.029&nbsp; &nbsp;5801&nbsp; &nbsp; 6000&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NSR&nbsp; &nbsp; &nbsp; &nbsp; 100.0选项 2:另一种(更好的)方法merge_asof:(pd.merge_asof(df1,df2,on='onset',direction='backward',suffixes=['','_y'])&nbsp; &nbsp;.query('offset<=offset_y')&nbsp; &nbsp;.reindex(df1.index)&nbsp; &nbsp;.drop('offset_y', axis=1)&nbsp; &nbsp;.fillna(df1))你得到相同的输出。
打开App,查看更多内容
随时随地看视频慕课网APP

相关分类

Python