猿问

如何比较两个数据框中的日期并更新列中的值

我有两个引用气象站的数据框:


      import pandas as pd

      df_shift = pd.DataFrame({'Date': ['2010-10-05', '2010-10-20', '2011-03-15',

                              '2012-03-22', '2015-01-17', '2015-01-23',

                              '2015-01-30'], 

                    'Sensor_id': [1024, 1024, 1024, 1024, 

                                  2210, 2210, 1010]})


       df_station = pd.DataFrame({'Sensor_id': [1024, 1024, 1024, 2210, 2210],

                       'Sensor_type': ['analog', 'analog', 'analog', 'dig', 'dig'], 

                       'Date': ['2010-10-01', '2010-10-22', '2011-03-14',

                                '2015-01-13', '2015-01-22']})

我想在 df_station 中创建一个新列,这个列称为“new_column”。


我希望此列填充数据框(班次和气象站)的日期字段之间的天数差异较少。


我做了以下代码:


       # Starting with a very large value

       df_station['new_column'] = 90000


       for i in range(0, len(df_station)):

           for j in range(0, len(df_shift)):

   

               var_Difference_Date = abs(pd.to_datetime(df_station['Date'].iloc[i], 

                                         format='%Y/%m/%d') -

                                        pd.to_datetime(df_shift['Date'].iloc[j], format='%Y/%m/%d'))

   

   

                if(df_station['Sensor_id'].iloc[i] == df_shift['Sensor_id'].iloc[j]):

       

                    if(var_Difference_Date.days < df_station['new_column'].iloc[i]):

       

                       df_station['new_column'].loc[i] = var_Difference_Date.days

显示结果,符合预期:


             Sensor_id  Sensor_type     Date       new_column

                1024         analog    2010-10-01   4

                1024         analog    2010-10-22   2

                1024         analog    2011-03-14   1

                2210          dig      2015-01-13   4

                2210          dig      2015-01-22   1

但是,有没有更有效的方法来做到这一点而不必使用两个 For()?谢谢。


慕斯709654
浏览 175回答 4
4回答

阿晨1998

我们这样做merge_asof,使用by和ondf_station['Date'] = pd.to_datetime(df_station['Date'])df_shift['Date'] = pd.to_datetime(df_shift['Date'])df_shift['DIFF'] = df_shift['Date']df = pd.merge_asof(df_station, df_shift[['Date', 'Sensor_id', 'DIFF']],&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;on='Date',&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;by='Sensor_id',&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;direction='nearest')df['DIFF'] = (df.Date - df.DIFF).dt.days.abs()dfOut[377]:&nbsp;&nbsp; &nbsp;Sensor_id Sensor_type&nbsp; &nbsp; &nbsp; &nbsp;Date&nbsp; DIFF0&nbsp; &nbsp; &nbsp; &nbsp;1024&nbsp; &nbsp; &nbsp; analog 2010-10-01&nbsp; &nbsp; &nbsp;41&nbsp; &nbsp; &nbsp; &nbsp;1024&nbsp; &nbsp; &nbsp; analog 2010-10-22&nbsp; &nbsp; &nbsp;22&nbsp; &nbsp; &nbsp; &nbsp;1024&nbsp; &nbsp; &nbsp; analog 2011-03-14&nbsp; &nbsp; &nbsp;13&nbsp; &nbsp; &nbsp; &nbsp;2210&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;dig 2015-01-13&nbsp; &nbsp; &nbsp;44&nbsp; &nbsp; &nbsp; &nbsp;2210&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;dig 2015-01-22&nbsp; &nbsp; &nbsp;1

DIEA

# Converting both dates in pandas datetime formatdf_shift['Date'] = pd.to_datetime(df_shift['Date'])df_station['Date'] = pd.to_datetime(df_station['Date'])# Aggregating for each Sensor_id, all the dates in a lista = df_shift.groupby(['Sensor_id'])['Date'].apply(list).reset_index(name='dates_list')# Merging it with the df_stationdf_station = df_station.merge(a, on='Sensor_id', how='left')# Finding LESS number of daysdef get_diff(x):&nbsp; &nbsp; d1, l = x&nbsp; &nbsp; for i,d2 in enumerate(l):&nbsp; &nbsp; &nbsp; &nbsp; if i==0:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; diff = abs((d2-d1).days)&nbsp; &nbsp; &nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; t = abs((d2-d1).days)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if t<diff:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; diff = t&nbsp; &nbsp; return diffdf_station['new_column'] = df_station[['Date', 'dates_list']].apply(get_diff, axis=1)

小唯快跑啊

构建输入数据帧:import pandas as pddf_shift = pd.DataFrame({'Date': ['2010-10-05', '2010-10-20', '2011-03-15', '2012-03-22', '2015-01-17', '2015-01-23', '2015-01-30'], 'Sensor_id': [1024, 1024, 1024, 1024, 2210, 2210, 1010]})df_station = pd.DataFrame({'Sensor_id': [1024, 1024, 1024, 2210, 2210], 'Sensor_type': ['analog', 'analog', 'analog', 'dig', 'dig'], 'Date': ['2010-10-01', '2010-10-22', '2011-03-14', '2015-01-13', '2015-01-22']})df_shift["Date"] = pd.to_datetime(df_shift["Date"]).dt.datedf_station["Date"] = pd.to_datetime(df_station["Date"]).dt.date合并两个数据帧并计算绝对日期差异:df_merge = pd.merge(df_station, df_shift, how="left", on="Sensor_id", suffixes=["_station","_shift"])df_merge['Date_abs_diff'] = (df_merge.Date_shift - df_merge.Date_station).abs()合并的数据框现在是:>>> df_merge&nbsp; &nbsp;Date_station&nbsp; Sensor_id Sensor_type&nbsp; Date_shift Date_abs_diff0&nbsp; &nbsp; 2010-10-01&nbsp; &nbsp; &nbsp; &nbsp;1024&nbsp; &nbsp; &nbsp; analog&nbsp; 2010-10-05&nbsp; &nbsp; &nbsp; &nbsp; 4 days1&nbsp; &nbsp; 2010-10-01&nbsp; &nbsp; &nbsp; &nbsp;1024&nbsp; &nbsp; &nbsp; analog&nbsp; 2010-10-20&nbsp; &nbsp; &nbsp; &nbsp;19 days2&nbsp; &nbsp; 2010-10-01&nbsp; &nbsp; &nbsp; &nbsp;1024&nbsp; &nbsp; &nbsp; analog&nbsp; 2011-03-15&nbsp; &nbsp; &nbsp; 165 days3&nbsp; &nbsp; 2010-10-01&nbsp; &nbsp; &nbsp; &nbsp;1024&nbsp; &nbsp; &nbsp; analog&nbsp; 2012-03-22&nbsp; &nbsp; &nbsp; 538 days4&nbsp; &nbsp; 2010-10-22&nbsp; &nbsp; &nbsp; &nbsp;1024&nbsp; &nbsp; &nbsp; analog&nbsp; 2010-10-05&nbsp; &nbsp; &nbsp; &nbsp;17 days5&nbsp; &nbsp; 2010-10-22&nbsp; &nbsp; &nbsp; &nbsp;1024&nbsp; &nbsp; &nbsp; analog&nbsp; 2010-10-20&nbsp; &nbsp; &nbsp; &nbsp; 2 days6&nbsp; &nbsp; 2010-10-22&nbsp; &nbsp; &nbsp; &nbsp;1024&nbsp; &nbsp; &nbsp; analog&nbsp; 2011-03-15&nbsp; &nbsp; &nbsp; 144 days7&nbsp; &nbsp; 2010-10-22&nbsp; &nbsp; &nbsp; &nbsp;1024&nbsp; &nbsp; &nbsp; analog&nbsp; 2012-03-22&nbsp; &nbsp; &nbsp; 517 days8&nbsp; &nbsp; 2011-03-14&nbsp; &nbsp; &nbsp; &nbsp;1024&nbsp; &nbsp; &nbsp; analog&nbsp; 2010-10-05&nbsp; &nbsp; &nbsp; 160 days9&nbsp; &nbsp; 2011-03-14&nbsp; &nbsp; &nbsp; &nbsp;1024&nbsp; &nbsp; &nbsp; analog&nbsp; 2010-10-20&nbsp; &nbsp; &nbsp; 145 days10&nbsp; &nbsp;2011-03-14&nbsp; &nbsp; &nbsp; &nbsp;1024&nbsp; &nbsp; &nbsp; analog&nbsp; 2011-03-15&nbsp; &nbsp; &nbsp; &nbsp; 1 days11&nbsp; &nbsp;2011-03-14&nbsp; &nbsp; &nbsp; &nbsp;1024&nbsp; &nbsp; &nbsp; analog&nbsp; 2012-03-22&nbsp; &nbsp; &nbsp; 374 days12&nbsp; &nbsp;2015-01-13&nbsp; &nbsp; &nbsp; &nbsp;2210&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;dig&nbsp; 2015-01-17&nbsp; &nbsp; &nbsp; &nbsp; 4 days13&nbsp; &nbsp;2015-01-13&nbsp; &nbsp; &nbsp; &nbsp;2210&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;dig&nbsp; 2015-01-23&nbsp; &nbsp; &nbsp; &nbsp;10 days14&nbsp; &nbsp;2015-01-22&nbsp; &nbsp; &nbsp; &nbsp;2210&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;dig&nbsp; 2015-01-17&nbsp; &nbsp; &nbsp; &nbsp; 5 days15&nbsp; &nbsp;2015-01-22&nbsp; &nbsp; &nbsp; &nbsp;2210&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;dig&nbsp; 2015-01-23&nbsp; &nbsp; &nbsp; &nbsp; 1 days接下来,执行 groupby 计算,取日期差异的最小值:df_min = df_merge.groupby(by="Date_station")["Date_abs_diff"].agg("min").reset_index()>>> df_min&nbsp; Date_station Date_abs_diff0&nbsp; &nbsp;2010-10-01&nbsp; &nbsp; &nbsp; &nbsp; 4 days1&nbsp; &nbsp;2010-10-22&nbsp; &nbsp; &nbsp; &nbsp; 2 days2&nbsp; &nbsp;2011-03-14&nbsp; &nbsp; &nbsp; &nbsp; 1 days3&nbsp; &nbsp;2015-01-13&nbsp; &nbsp; &nbsp; &nbsp; 4 days4&nbsp; &nbsp;2015-01-22&nbsp; &nbsp; &nbsp; &nbsp; 1 days最后,将其合并回 df_station 并清理以获得最终结果:df_output = pd.merge(df_station, df_min, how="left", left_on="Date", right_on="Date_station")df_output.drop(columns='Date_station', inplace=True)df_output.rename(columns={'Date_abs_diff': 'new_column'}, inplace=True)df_output['new_column'] = df_output['new_column'].dt.days>>> df_output&nbsp; &nbsp;Sensor_id Sensor_type&nbsp; &nbsp; &nbsp; &nbsp; Date&nbsp; new_column0&nbsp; &nbsp; &nbsp; &nbsp;1024&nbsp; &nbsp; &nbsp; analog&nbsp; 2010-10-01&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;41&nbsp; &nbsp; &nbsp; &nbsp;1024&nbsp; &nbsp; &nbsp; analog&nbsp; 2010-10-22&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;22&nbsp; &nbsp; &nbsp; &nbsp;1024&nbsp; &nbsp; &nbsp; analog&nbsp; 2011-03-14&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;13&nbsp; &nbsp; &nbsp; &nbsp;2210&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;dig&nbsp; 2015-01-13&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;44&nbsp; &nbsp; &nbsp; &nbsp;2210&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;dig&nbsp; 2015-01-22&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1

回首忆惘然

df_shift['Date_s'] = pd.to_datetime(df_shift['Date'])df_station['Date'] = pd.to_datetime(df_station['Date'])t = pd.merge_asof(df_station, df_shift[['Date_s','Sensor_id']],&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; left_on='Date',&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; right_on='Date_s',&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; direction='nearest')t = t[t['Sensor_id_x']==t['Sensor_id_y']]t['new column'] = abs((t['Date_s'] - t['Date']).dt.days)t.drop(columns=['Date_s','Sensor_id_x'], inplace=True)t.columns = ['Sensor_type','Date','Sensor_id','new column']输出&nbsp; &nbsp; Sensor_type Date&nbsp; &nbsp; &nbsp; &nbsp; Sensor_id&nbsp; &nbsp;new column0&nbsp; &nbsp;analog&nbsp; &nbsp; &nbsp; 2010-10-01&nbsp; 1024&nbsp; &nbsp; &nbsp; &nbsp; 41&nbsp; &nbsp;analog&nbsp; &nbsp; &nbsp; 2010-10-22&nbsp; 1024&nbsp; &nbsp; &nbsp; &nbsp; 22&nbsp; &nbsp;analog&nbsp; &nbsp; &nbsp; 2011-03-14&nbsp; 1024&nbsp; &nbsp; &nbsp; &nbsp; 13&nbsp; &nbsp;dig&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2015-01-13&nbsp; 2210&nbsp; &nbsp; &nbsp; &nbsp; 44&nbsp; &nbsp;dig&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2015-01-22&nbsp; 2210&nbsp; &nbsp; &nbsp; &nbsp; 1
随时随地看视频慕课网APP

相关分类

Python
我要回答