Merge two DataFrames based on multiple conditions

So you want to do a kind of "soft" match. Here is a solution that attempts to vectorize the date-range matching.

```python
import numpy as np
import pandas as pd

# Note: the dates are handled as strings here, so the inequalities only work
# if they are in y-m-d format. Otherwise it is safer to parse every date
# column, e.g. `df_a.Date = pd.to_datetime(df_a.Date)`

# create a groupby object once so we can efficiently filter df_b inside the loop
# good idea if df_b is considerably large and has many different IDs
gdf_b = df_b.groupby('ID')
b_IDs = gdf_b.indices  # returns a dictionary with grouped rows {ID: arr(integer-indices)}
matched = []  # so we can collect matched rows from df_b

# iterate over rows with `.itertuples()`, more efficient than iterating range(len(df_a))
# (the unpacking assumes df_a has exactly two columns: ID and Date)
for i, ID, date in df_a.itertuples():
    if ID in b_IDs:
        gID = gdf_b.get_group(ID)  # get the filtered df_b
        inrange = gID.Start_Date.le(date) & gID.End_Date.ge(date)
        if inrange.any():
            matched.append(
                gID.loc[inrange.idxmax()]  # get the first row with date inrange
                .values[1:]  # use the array without column indices and slice `ID` out
            )
        else:
            matched.append([np.nan] * (df_b.shape[1] - 1))  # no date inrange, fill with NaNs
    else:
        matched.append([np.nan] * (df_b.shape[1] - 1))  # no ID match, fill with NaNs

# pass df_a's index so the join still aligns when df_a has a non-default index
df_c = df_a.join(pd.DataFrame(matched, columns=df_b.columns[1:], index=df_a.index))
print(df_c)
```

Output

```
      ID        Date  Start_Date    End_Date    A    B    C     D     E
0    cd2  2020-05-12         NaN         NaN  NaN  NaN  NaN   NaN   NaN
1    cd2  2020-04-12         NaN         NaN  NaN  NaN  NaN   NaN   NaN
2    cd2  2020-06-10  2020-06-01  2020-06-24    a    b    c  10.0  20.0
3   cd15  2020-04-28         NaN         NaN  NaN  NaN  NaN   NaN   NaN
4  cd193  2020-04-13         NaN         NaN  NaN  NaN  NaN   NaN   NaN
```
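If the row-by-row loop turns out to be slow, the same ID-plus-date-range match can probably also be written without an explicit loop: merge on `ID` first, then keep only the pairs whose `Date` falls inside the interval. The following is only a sketch of that alternative, not the method used above. The sample frames, the helper name `range_merge`, and the temporary `_row` column are all made up for illustration, and the dates are parsed with `pd.to_datetime` rather than compared as strings.

```python
import pandas as pd

# Hypothetical sample data, shaped like the frames discussed above.
df_a = pd.DataFrame({
    "ID": ["cd2", "cd2", "cd2", "cd15", "cd193"],
    "Date": ["2020-05-12", "2020-04-12", "2020-06-10", "2020-04-28", "2020-04-13"],
})
df_b = pd.DataFrame({
    "ID": ["cd2", "cd2", "cd15"],
    "Start_Date": ["2020-06-01", "2020-01-01", "2020-05-01"],
    "End_Date": ["2020-06-24", "2020-02-01", "2020-05-30"],
    "A": ["a", "x", "y"],
})

# Parse the date columns so comparisons do not depend on string formatting.
df_a["Date"] = pd.to_datetime(df_a["Date"])
for col in ("Start_Date", "End_Date"):
    df_b[col] = pd.to_datetime(df_b[col])


def range_merge(df_a, df_b):
    """Left-join df_b onto df_a where IDs match and Date lies in [Start_Date, End_Date]."""
    # Tag every df_a row with its position so it can be identified after merging.
    left = df_a.assign(_row=range(len(df_a)))
    # Pair each df_a row with every df_b interval that shares its ID.
    pairs = left.merge(df_b, on="ID", how="left")
    # Rows without an ID match have NaT bounds, so the comparison is False for them.
    in_range = pairs["Date"].between(pairs["Start_Date"], pairs["End_Date"])
    # Keep at most one matching interval per df_a row.
    hits = pairs[in_range].drop_duplicates("_row")
    b_cols = [c for c in df_b.columns if c != "ID"]
    out = left.merge(hits[["_row"] + b_cols], on="_row", how="left")
    out.index = df_a.index  # restore the original index
    return out.drop(columns="_row")


result = range_merge(df_a, df_b)
print(result)
```

The trade-off is memory: the intermediate merge holds one row per (df_a row, df_b interval) pair with the same `ID`, so when each ID has many intervals the loop-based version may still be the better choice.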
