MultiIndex DataFrame: How to create a new column based on values in other columns?

3 Answers

长风秋雁

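We can group by the first level of your index (id) and use eq to mark every row where event equals 1. Then we apply cumsum, which also converts True to 1 and False to 0. Selecting the event column and using transform keeps the result aligned with the original index:

    df['status'] = df.groupby(level=0)['event'].transform(lambda x: x.eq(1).cumsum())

Output:

             event  status
    id year
    1  2013      1       1
       2014      0       1
       2015      0       1
       2016      0       1
       2017      0       1
    2  2014      0       0
       2015      0       0
       2016      1       1
       2017      0       1
    3  2016      1       1
       2017      0       1
    4  2013      0       0
       2014      1       1
       2015      0       1
    5  2014      0       0
       2015      0       0
       2016      0       0
       2017      1       1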
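To make the two steps easier to follow, here is a minimal sketch on a single group. The series below is hypothetical illustration data, not taken from the question.

```python
import pandas as pd

# Hypothetical single-group series, not taken from the question's data.
event = pd.Series([0, 0, 1, 0], name="event")

# Step 1: eq(1) marks the rows where the event occurred (boolean mask).
mask = event.eq(1)

# Step 2: cumsum treats True as 1 and False as 0, so the running total
# becomes 1 at the first event and stays there for the later rows.
status = mask.cumsum()

print(pd.DataFrame({"event": event, "mask": mask, "status": status}))
```

Inside a groupby, the same running total restarts for every id, which is why the status drops back to 0 at the start of each group.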


翻阅古今

The key is to use cumsum under groupby:

    import pandas as pd

    df = pd.DataFrame({'id' : [1,1,1,1,1,2,2,2,2,3,3,4,4,4,5,5,5,5],
                       'year' : [2013,2014,2015,2016,2017,2014,2015,2016,2017,
                                 2016,2017,2013,2014,2015,2014,2015,2016,2017],
                       'event' : [1,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,1]})

    # mul(1) turns the boolean mask into 0/1 integers before the grouped cumsum
    (df.assign(status = lambda x: x.event.eq(1).mul(1).groupby(x['id']).cumsum())
       .set_index(['id','year']))

Output:

             event  status
    id year
    1  2013      1       1
       2014      0       1
       2015      0       1
       2016      0       1
       2017      0       1
    2  2014      0       0
       2015      0       0
       2016      1       1
       2017      0       1
    3  2016      1       1
       2017      0       1
    4  2013      0       0
       2014      1       1
       2015      0       1
    5  2014      0       0
       2015      0       0
       2016      0       0
       2017      1       1
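One caveat: in the sample data every id has exactly one event, so cumsum never exceeds 1. If an id could have several events, cumsum would count them instead of flagging them. A possible variant, sketched here on hypothetical data where id 1 has two events, is to use the grouped cummax instead:

```python
import pandas as pd

# Hypothetical data: id 1 has two events, unlike the question's sample.
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2],
    "year":  [2013, 2014, 2015, 2016, 2014, 2015, 2016],
    "event": [1, 0, 1, 0, 0, 1, 0],
})

# 0/1 integer flag for rows where the event occurred.
flag = df["event"].eq(1).astype(int)

# cumsum counts events, so it reaches 2 after the second event of id 1.
df["count"] = flag.groupby(df["id"]).cumsum()

# cummax keeps a 0/1 flag: 1 from the first event onward within each id.
df["status"] = flag.groupby(df["id"]).cummax()

print(df.set_index(["id", "year"]))
```

On the question's data both variants should give the same status column, so this only matters if repeated events are possible.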


呼唤远方

A basic answer with step-by-step explanations:

    import pandas as pd

    df = pd.DataFrame({'id' : [1,1,1,1,1,2,2,2,2,3,3,4,4,4,5,5,5,5],
                       'year' : [2013,2014,2015,2016,2017,2014,2015,2016,2017,
                                 2016,2017,2013,2014,2015,2014,2015,2016,2017],
                       'event' : [1,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,1]})

    # extract the unique IDs in order of first appearance
    # (a set would not guarantee the order)
    ids = df["id"].unique()

    # initialize a list to keep the results
    list_event_years = []

    # loop over the IDs (assumes the rows of each ID are contiguous in df)
    for current_id in ids:
        # reset the flag to 0 for each ID
        event_happened = 0
        # loop over the rows of df belonging to the current ID
        for index, row in df[df["id"] == current_id].iterrows():
            # once the event has happened, set the flag to 1
            if row["event"] == 1:
                event_happened = 1
            # append the flag to the list of results
            list_event_years.append(event_happened)

    # add the list of results as a DataFrame column
    df["event-year"] = list_event_years

    ### OUTPUT
    >>> df
        id  year  event  event-year
    0    1  2013      1           1
    1    1  2014      0           1
    2    1  2015      0           1
    3    1  2016      0           1
    4    1  2017      0           1
    5    2  2014      0           0
    6    2  2015      0           0
    7    2  2016      1           1
    8    2  2017      0           1
    9    3  2016      1           1
    10   3  2017      0           1
    11   4  2013      0           0
    12   4  2014      1           1
    13   4  2015      0           1
    14   5  2014      0           0
    15   5  2015      0           0
    16   5  2016      0           0
    17   5  2017      1           1

If you need them indexed as in your example, do the following:

    df.set_index(['id', 'year'], inplace = True)
    df.sort_index(inplace = True)

    ### OUTPUT
    >>> df
             event  event-year
    id year
    1  2013      1           1
       2014      0           1
       2015      0           1
       2016      0           1
       2017      0           1
    2  2014      0           0
       2015      0           0
       2016      1           1
       2017      0           1
    3  2016      1           1
       2017      0           1
    4  2013      0           0
       2014      1           1
       2015      0           1
    5  2014      0           0
       2015      0           0
       2016      0           0
       2017      1           1
