Pandas - Forward and backward fill by index

Use DataFrame.groupby on the patient_id column and apply ffill followed by bfill to each group:

    df['inclusion_timestamp'] = (df.groupby('patient_id', group_keys=False)['inclusion_timestamp']
                                   .apply(lambda x: x.ffill().bfill()))

group_keys=False keeps the original row index in pandas 2.0 and later. Without it, apply prepends patient_id to the result's index, and the result no longer aligns with df when assigned back.

Another idea is DataFrame.groupby with Series.combine_first:

    g = df.groupby('patient_id')['inclusion_timestamp']
    df['inclusion_timestamp'] = g.ffill().combine_first(g.bfill())

Yet another idea is two consecutive Series.groupby calls:

    df['inclusion_timestamp'] = (df['inclusion_timestamp'].groupby(df['patient_id'])
                                   .ffill()
                                   .groupby(df['patient_id'])
                                   .bfill())

Result:

       patient_id inclusion_timestamp      pre_event_1     post_event_1     post_event_2
    0           1    28-06-2020 13:05 27-06-2020 12:26              NaN              NaN
    1           1    28-06-2020 13:05              NaN              NaN              NaN
    2           1    28-06-2020 13:05              NaN 29-06-2020 14:00              NaN
    3           1    28-06-2020 13:05              NaN              NaN 29-06-2020 23:57
    4           2    29-06-2020 18:26 29-06-2020 10:11              NaN              NaN
    5           2    29-06-2020 18:26              NaN              NaN              NaN
    6           2    29-06-2020 18:26              NaN 30-06-2020 19:36              NaN
    7           2    29-06-2020 18:26              NaN              NaN 31-06-2020 21:20
    8           3    30-06-2020 09:06 29-06-2020 06:35              NaN              NaN
    9           3    30-06-2020 09:06 29-06-2020 07:28              NaN              NaN
    10          3    30-06-2020 09:06              NaN              NaN              NaN
    11          3    30-06-2020 09:06              NaN              NaN 01-07-2020 12:10

Performance (measured with timeit):

    df.shape
    (1200000, 5)

    %%timeit -n10
    # Method 1 (best)
    263 ms ± 1.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    %%timeit -n10
    # Method 2
    342 ms ± 1.58 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    %%timeit -n10
    # Method 3
    297 ms ± 4.83 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Method 1 was the fastest of the three on this 1,200,000-row frame.
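
Below is a small, self-contained sketch for checking the three approaches side by side. The sample rows are hypothetical: only the column names come from the question, and this is not the original data. The sketch also shows why both passes are needed. A forward fill cannot reach a NaN that sits before a patient's first known timestamp, so the backward fill has to finish those rows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows: each patient has one known inclusion_timestamp,
# placed before, between, or after that patient's NaN rows.
df = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2, 2, 3, 3],
    "inclusion_timestamp": [
        np.nan, "28-06-2020 13:05", np.nan,
        "29-06-2020 18:26", np.nan, np.nan,
        np.nan, "30-06-2020 09:06",
    ],
})
col = "inclusion_timestamp"

# Method 1: ffill then bfill inside each group; group_keys=False keeps the
# original row index so the result can be assigned straight back to df.
m1 = df.groupby("patient_id", group_keys=False)[col].apply(lambda x: x.ffill().bfill())

# Method 2: grouped ffill, with the NaNs it leaves filled from a grouped bfill.
g = df.groupby("patient_id")[col]
m2 = g.ffill().combine_first(g.bfill())

# Method 3: two chained Series.groupby calls on the same key.
m3 = df[col].groupby(df["patient_id"]).ffill().groupby(df["patient_id"]).bfill()

# A forward fill alone misses rows that come before a patient's first known value
# (row 0 for patient 1 and row 6 for patient 3).
ffill_only = g.ffill()
print("left NaN by ffill alone:", int(ffill_only.isna().sum()))

# All three methods produce the same column; assign one of them back.
assert m1.tolist() == m2.tolist() == m3.tolist()
df[col] = m1
print(df)
```

Each group here holds a single known value. If a group had several, ffill().bfill() would carry each value forward to the next one instead of broadcasting a single value.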

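To rerun the comparison on other data, a timeit-based sketch like the one below may help. The frame builder make_frame, its "ts-N" placeholder values, and the sizes are hypothetical, since the original 1,200,000-row benchmark frame is not shown. Each method is timed with the standard-library timeit.repeat instead of the %%timeit magic. Method 1 calls the Python lambda once per patient, while Methods 2 and 3 rely on the built-in grouped ffill/bfill. The ranking may therefore change with the number of patients, so the sketch times a few-large-groups case and a many-small-groups case.

```python
import functools
import timeit

import numpy as np
import pandas as pd

COL = "inclusion_timestamp"


def make_frame(n_patients: int, rows_per_patient: int, seed: int = 0) -> pd.DataFrame:
    """Hypothetical synthetic data: one known timestamp per patient, all other rows NaN."""
    rng = np.random.default_rng(seed)
    patient_id = np.repeat(np.arange(n_patients), rows_per_patient)
    ts = np.full(patient_id.shape, np.nan, dtype=object)
    # Place each patient's single known value at a random row inside its block.
    offsets = rng.integers(0, rows_per_patient, size=n_patients)
    known_rows = np.arange(n_patients) * rows_per_patient + offsets
    ts[known_rows] = [f"ts-{p}" for p in range(n_patients)]
    return pd.DataFrame({"patient_id": patient_id, COL: ts})


def method1(df: pd.DataFrame) -> pd.Series:
    # Per-group Python lambda: ffill, then bfill.
    return df.groupby("patient_id", group_keys=False)[COL].apply(lambda x: x.ffill().bfill())


def method2(df: pd.DataFrame) -> pd.Series:
    # Grouped ffill; remaining leading NaNs taken from a grouped bfill.
    g = df.groupby("patient_id")[COL]
    return g.ffill().combine_first(g.bfill())


def method3(df: pd.DataFrame) -> pd.Series:
    # Two chained Series.groupby calls on the same key.
    key = df["patient_id"]
    return df[COL].groupby(key).ffill().groupby(key).bfill()


METHODS = {"method1": method1, "method2": method2, "method3": method3}


def benchmark(df: pd.DataFrame, number: int = 1, repeat: int = 3) -> dict:
    """Best-of-`repeat` wall time per call, in seconds, for each method."""
    return {
        name: min(timeit.repeat(functools.partial(fn, df), number=number, repeat=repeat)) / number
        for name, fn in METHODS.items()
    }


if __name__ == "__main__":
    # A few large groups vs. many small groups (sizes are arbitrary).
    for n_patients, rows in [(3, 100_000), (2_000, 5)]:
        frame = make_frame(n_patients, rows)
        timings = benchmark(frame)
        print(f"{n_patients} patients x {rows} rows:",
              {k: f"{v * 1000:.1f} ms" for k, v in timings.items()})
```

For numbers closer to the measurement above, raise the sizes toward 1,200,000 total rows. Method 1 is likely to slow down noticeably as the number of patients grows, because its lambda runs once per group.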