-
慕森王
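Use groupby:

    def f(s):
        # Work with positions inside the group, not the original row labels.
        s = s.reset_index(drop=True)
        one = s[s.eq(1)]
        if one.empty:
            # This category never has an event.
            return -1
        # Position of the first event minus the position of each row.
        return -s.index + one.index[0]

    df['time_until'] = df.groupby('categories').event.transform(f)

       categories  dates  event  time_until
    0           a      0      0           3
    1           b      0      0           1
    2           c      0      0          -1
    3           a      1      0           2
    4           b      1      1           0
    5           c      1      0          -1
    6           a      2      0           1
    7           b      2      0          -1
    8           c      2      0          -1
    9           a      3      1           0
    10          b      3      0          -2
    11          c      3      0          -1

Note that this keeps measuring the distance even after the event has occurred, so the values keep decreasing past the event. For the following events you get the following output:

    event = [0, 0, 0, 1, 0, 0]
    until = [3, 2, 1, 0, -1, -2]

If you need every row after the event to stay at -1, just adjust the column at the end:

    df.time_until.where(df.time_until >= -1, -1)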
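For comparison, the same idea can be sketched without a custom function. This is only an illustration under stated assumptions: the frame below is rebuilt from the output table above, rows are assumed to be already ordered by date within each category, and the helper names `pos` and `first_event` are made up for this sketch.

```python
import pandas as pd

# Sample data reconstructed from the table in the answer above.
df = pd.DataFrame({
    "categories": ["a", "b", "c"] * 4,
    "dates": [d for d in range(4) for _ in range(3)],
    "event": [0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
})

# 0-based position of each row inside its category (assumes date order).
pos = df.groupby("categories").cumcount()

# Position of the first event per category; NaN when there is no event.
first_event = (
    pos.where(df["event"].eq(1))
       .groupby(df["categories"])
       .transform("min")
)

# Distance to the first event; categories without an event get -1.
raw = (first_event - pos).fillna(-1).astype(int)

# Clamp everything after the event to -1, as in the final adjustment above.
df["time_until"] = raw.clip(lower=-1)

print(df)
```

Here `clip(lower=-1)` plays the same role as the `where(...)` adjustment: any value below -1 is replaced by -1.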
-
眼眸繁星
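An alternative solution:

    # Walk each category from its latest date backwards.
    df.sort_values(by=['categories', 'dates'], ascending=[True, False], inplace=True)
    # 1 from the event row onward in this reversed order, 0 before it.
    df['tmp'] = df.groupby('categories')['event'].transform('cumsum')
    # A second cumulative sum counts the steps back from the event.
    df['time_until'] = df.groupby('categories')['tmp'].transform('cumsum') - 1
    df.drop(columns='tmp', inplace=True)
    # Restore the original row order.
    df.sort_values(by=['dates', 'categories'], ascending=[True, True], inplace=True)

Output:

       categories  dates  event  time_until
    0           a      0      0           3
    1           b      0      0           1
    2           c      0      0          -1
    3           a      1      0           2
    4           b      1      1           0
    5           c      1      0          -1
    6           a      2      0           1
    7           b      2      0          -1
    8           c      2      0          -1
    9           a      3      1           0
    10          b      3      0          -1
    11          c      3      0          -1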
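To see why two cumulative sums give the countdown, it may help to look at the intermediate values for a single category. The sketch below rebuilds the sample frame from the output table and avoids in-place mutation. The names `rev`, `seen`, and `steps` are made up for this illustration, and it assumes at most one event per category, as in the sample data.

```python
import pandas as pd

# Sample data reconstructed from the output table above.
df = pd.DataFrame({
    "categories": ["a", "b", "c"] * 4,
    "dates": [d for d in range(4) for _ in range(3)],
    "event": [0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
})

# Latest date first within each category.
rev = df.sort_values(["categories", "dates"], ascending=[True, False])

# First pass: 0 for rows after the event, 1 at the event and for all earlier dates.
seen = rev.groupby("categories")["event"].cumsum()

# Second pass: 1 at the event, 2 one step earlier, and so on; subtract 1 to start at 0.
steps = seen.groupby(rev["categories"]).cumsum() - 1

# Assignment aligns on the index, so the original row order is kept.
df["time_until"] = steps

# Intermediate values for category "b", latest date first.
print(pd.DataFrame({"dates": rev["dates"], "event": rev["event"],
                    "seen": seen, "time_until": steps})[rev["categories"] == "b"])
```

Rows after the event never see a 1 in the first pass, so both sums stay at 0 and the final value is -1. With more than one event in a category the second sum would no longer be a simple countdown, so this approach would need adjusting for that case.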
-
三国纷争
Try something like this:

    import pandas as pd
    import numpy as np

    data = {'categories': ['a', 'b', 'c'] * 4,
            'dates': [i for i in range(4) for j in range(3)],
            'event': [0, 1, 0] * 4}
    df = pd.DataFrame(data)
    print(df)

    # One way: assign a label to the rows selected by a boolean mask
    df.loc[df.event == 0, 'Newevents'] = 'Cancelled'
    df.loc[df.event != 0, 'Newevents'] = 'Scheduled'

    # Another way: np.select picks the choice for the first condition that matches
    conditions = [
        (df['categories'] == "a"),
        (df['categories'] == "b"),
        (df['categories'] == "c")]
    choices = ['None', 'Completed', 'Scheduled']
    df['NewCategories'] = np.select(conditions, choices, default='black')
    print(df)

Output:

       categories  dates  event
    0           a      0      0
    1           b      0      1
    2           c      0      0
    3           a      1      0
    4           b      1      1
    5           c      1      0
    6           a      2      0
    7           b      2      1
    8           c      2      0
    9           a      3      0
    10          b      3      1
    11          c      3      0

       categories  dates  event  Newevents NewCategories
    0           a      0      0  Cancelled          None
    1           b      0      1  Scheduled     Completed
    2           c      0      0  Cancelled     Scheduled
    3           a      1      0  Cancelled          None
    4           b      1      1  Scheduled     Completed
    5           c      1      0  Cancelled     Scheduled
    6           a      2      0  Cancelled          None
    7           b      2      1  Scheduled     Completed
    8           c      2      0  Cancelled     Scheduled
    9           a      3      0  Cancelled          None
    10          b      3      1  Scheduled     Completed
    11          c      3      0  Cancelled     Scheduled
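As a side note, both labelling steps can probably be written more compactly. The following is a minimal sketch using the same sample data: `np.where` covers the two-way split, and a plain dictionary with `Series.map` covers the per-category labels. The dictionary name `label_by_category` is made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "categories": ["a", "b", "c"] * 4,
    "dates": [i for i in range(4) for _ in range(3)],
    "event": [0, 1, 0] * 4,
})

# Two-way split in one call: first label where the condition holds, second otherwise.
df["Newevents"] = np.where(df["event"] == 0, "Cancelled", "Scheduled")

# One label per category; categories missing from the dict fall back to 'black'.
label_by_category = {"a": "None", "b": "Completed", "c": "Scheduled"}
df["NewCategories"] = df["categories"].map(label_by_category).fillna("black")

print(df)
```

`np.select` remains the more general tool when the conditions are not simple equality checks on a single column.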