Pandas 通过在两个不同的数据框/熊猫中选择多个列来创建条件列

问题:我有 2 个数据框;

  1. df1 具有线圈 ID、样本因子、序列。每个coil_id 有449 条记录(范围1-499)并且有大约1000 个唯一的coil_id。

  2. df2 具有线圈 ID、样本、仪表。每个coil_id 大约有500 条记录(范围10-5000;可以更少)并且具有与df1 中相同的1000 个唯一coil_id。

df1:

+-------+-----------------

|coil_id|sample_factor|SEQ

+-------+-----------------

|E101634|10.4066      |  1

|E101634|20.8132      |  2

|E101634|31.2198      |  3 

|E101634|41.6264      |  4

|E101634|5220.033     |449

df2:


+-------+------+------+--

|coil_id|SAMPLE|GAUGE |

+-------+------+------+--

|E101634|    10|0.0565|

|E101634|    20|0.0569|

|E101634|    30|0.0567|

|E101634|    40|0.0561|

|E101634|  5000| 0.055|

由于记录数不同,我无法加入两个表。如果我这样做,我的样本值和仪表会发生变化。所以我不应该加入。接下来,我需要检查df1.sample_factor是否位于 df2.sample 和 df2.sample+1 之间,然后对gauge进行计算。示例:(如果 10.4 位于 10 和 20 之间,则0.0565+(((0.0569-0.0565)/10)*(10.4-10)))基本上按比例分配仪表。


我想从 df1 中的 Sample_factor 迭代每一行,并检查它是否位于 df2 中的 sample[i] 和 sample[i+1] 之间。然后对仪表执行按比例计算并将结果添加到 df1。


我试过这个:


def new_gauge : for row in df1('sample_factor'):

    if df1['sample_factor'] > df2['sample'] and df1['sample_factor'] < df2['sample'] + 1:

        return df2['gauge']+(((df2['gauge']+1)-df2['gauge'])/10)*(df1['sample_factor']-df2['sample']))

df1['new_gauge'] = df1.apply(new_gauge)

我知道它在语法上绝对错误,只是为了了解我想要什么。


任何帮助表示赞赏。谢谢:)


输出:

http://img2.mukewang.com/612f64d00001dd7b04710138.jpg

侃侃尔雅
浏览 152回答 1
1回答

慕少森

这是与您的预期输出相匹配的起始样本数据df1&nbsp; &nbsp;coil_id&nbsp; sample_factor&nbsp; SEQ0&nbsp; E101634&nbsp; &nbsp; &nbsp; &nbsp; 10.4066&nbsp; &nbsp; 11&nbsp; E101634&nbsp; &nbsp; &nbsp; &nbsp; 20.8132&nbsp; &nbsp; 22&nbsp; E101634&nbsp; &nbsp; &nbsp; &nbsp; 31.2198&nbsp; &nbsp; 33&nbsp; E101634&nbsp; &nbsp; &nbsp; &nbsp; 41.6264&nbsp; &nbsp; 44&nbsp; E101634&nbsp; &nbsp; &nbsp; &nbsp; 52.0330&nbsp; &nbsp; 55&nbsp; E101634&nbsp; &nbsp; &nbsp; &nbsp; 62.4396&nbsp; &nbsp; 66&nbsp; E101634&nbsp; &nbsp; &nbsp; 5220.0330&nbsp; 449df2&nbsp; &nbsp;coil_id&nbsp; SAMPLE&nbsp; &nbsp;GAUGE0&nbsp; E101634&nbsp; &nbsp; &nbsp; 10&nbsp; 0.05501&nbsp; E101634&nbsp; &nbsp; &nbsp; 20&nbsp; 0.05682&nbsp; E101634&nbsp; &nbsp; &nbsp; 30&nbsp; 0.05433&nbsp; E101634&nbsp; &nbsp; &nbsp; 40&nbsp; 0.05314&nbsp; E101634&nbsp; &nbsp; &nbsp; 50&nbsp; 0.05295&nbsp; E101634&nbsp; &nbsp; &nbsp; 60&nbsp; 0.0519第一步是merge_asof将样本因子带到最接近的样本。然后计算new_gauge每一行的列。但是,我们只会在 sample_factor 介于其当前行和下一行的值之间并且线圈 ID 与其和下一行的值相同时才实际分配一个值。import pandas as pdmerged = pd.merge_asof(df2.assign(SAMPLE = df2.SAMPLE.astype('float')).sort_values('SAMPLE'),&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;df1.sort_values('sample_factor'),&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;by='coil_id',&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;left_on='SAMPLE',&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;right_on='sample_factor',&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;direction='forward')print(merged)#&nbsp; &nbsp;coil_id&nbsp; SAMPLE&nbsp; &nbsp;GAUGE&nbsp; sample_factor&nbsp; SEQ#0&nbsp; E101634&nbsp; &nbsp; 10.0&nbsp; 0.0550&nbsp; &nbsp; &nbsp; &nbsp; 10.4066&nbsp; &nbsp; 1#1&nbsp; E101634&nbsp; &nbsp; 20.0&nbsp; 0.0568&nbsp; &nbsp; &nbsp; &nbsp; 20.8132&nbsp; &nbsp; 2#2&nbsp; E101634&nbsp; &nbsp; 30.0&nbsp; 0.0543&nbsp; &nbsp; &nbsp; &nbsp; 31.2198&nbsp; &nbsp; 3#3&nbsp; E101634&nbsp; &nbsp; 40.0&nbsp; 0.0531&nbsp; &nbsp; &nbsp; &nbsp; 41.6264&nbsp; &nbsp; 4#4&nbsp; E101634&nbsp; &nbsp; 50.0&nbsp; 0.0529&nbsp; &nbsp; &nbsp; &nbsp; 52.0330&nbsp; &nbsp; 5#5&nbsp; E101634&nbsp; &nbsp; 60.0&nbsp; 0.0519&nbsp; &nbsp; &nbsp; &nbsp; 62.4396&nbsp; &nbsp; 6# Now perform your calculation:new_gauge = (merged.GAUGE.shift(1)&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;+ ((merged.GAUGE - merged.GAUGE.shift(1))/10&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;* (merged.sample_factor - merged.SAMPLE.shift(1))))# Assign it only where it makes sense# Assumes df2 was sorted on ['coil_id',&nbsp; 'SAMPLE']mask = (merged.sample_factor.between(merged.SAMPLE, merged.SAMPLE.shift(-1))&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; & (merged.coil_id == merged.coil_id.shift(-1)))merged.loc[mask, 'new_gauge'] = new_gauge[mask]&nbsp;输出: merged&nbsp; &nbsp;coil_id&nbsp; SAMPLE&nbsp; &nbsp;GAUGE&nbsp; sample_factor&nbsp; SEQ&nbsp; new_gauge0&nbsp; E101634&nbsp; &nbsp; 10.0&nbsp; 0.0550&nbsp; &nbsp; &nbsp; &nbsp; 10.4066&nbsp; &nbsp; 1&nbsp; &nbsp; &nbsp; &nbsp; NaN1&nbsp; E101634&nbsp; &nbsp; 20.0&nbsp; 0.0568&nbsp; &nbsp; &nbsp; &nbsp; 20.8132&nbsp; &nbsp; 2&nbsp; &nbsp;0.0569462&nbsp; E101634&nbsp; &nbsp; 30.0&nbsp; 0.0543&nbsp; &nbsp; &nbsp; &nbsp; 31.2198&nbsp; &nbsp; 3&nbsp; &nbsp;0.0539953&nbsp; E101634&nbsp; &nbsp; 40.0&nbsp; 0.0531&nbsp; &nbsp; &nbsp; &nbsp; 41.6264&nbsp; &nbsp; 4&nbsp; &nbsp;0.0529054&nbsp; E101634&nbsp; &nbsp; 50.0&nbsp; 0.0529&nbsp; &nbsp; &nbsp; &nbsp; 52.0330&nbsp; &nbsp; 5&nbsp; &nbsp;0.0528595&nbsp; E101634&nbsp; &nbsp; 60.0&nbsp; 0.0519&nbsp; &nbsp; &nbsp; &nbsp; 62.4396&nbsp; &nbsp; 6&nbsp; &nbsp; &nbsp; &nbsp; NaN在这种情况下,我们没有分配最后一行,因为您提供的子集中没有样本 > 60。
打开App,查看更多内容
随时随地看视频慕课网APP

相关分类

Python