如何计算数据框中每一行去年发生的案例数?

问题是,对于同一组中的每一行,计算当前行的去年内完成的案例数。


数据集 (df) 如下所示:


ID         Date    

abc      07/12/16    

abc      02/04/17   

abc      02/13/17    

abc      02/16/19    

xyz      11/03/14  

xyz      11/06/14    

xyz      02/17/16

我的想法:首先创建一个列lastyr:df['date'] - timedelta(days=365); 然后将每一行与整个组进行比较,计算组中有多少个日期 >= lastyr 和 < df['date']


我试图在 python 中定义一个函数,如:


# Create the dataframe

d = {'ID': ['abc', 'abc', 'abc', 'abc', 'xyz', 'xyz', 'xyz'], 

     'Date': ['07/12/16', '02/04/17', '02/13/17', '02/16/19', '11/03/14', '11/06/14', '02/17/16']} 

df = pd.DataFrame(data=d)

df['Date'] = df['Date'].apply(pd.to_datetime)


df_1 = df 

# df_1 is same as df. I tried to compare each row in df to whole column in df_1.


# Define and apply the function

def lastyear(row):

    curr = row['Date']

    lastyr = curr - datetime.timedelta(days=365)

    if df['ID'] == df_1['ID']: # The compare is for same ID.

        return (df_1['Date'] < curr) & (df_1['Date'] >= lastyr)


df.apply(lastyear, axis=1).groupby(['ID']).count()

但是它返回所有错误值。我认为这是因为它仍然比较两个数据框中的每一行,但我不知道如何重写它以将每一行与整列进行比较。


所需的输出是:


Group      Date       Count # of cases happened in last year


abc      07/12/16              0

abc      02/04/17              1

abc      02/13/17              2

abc      02/16/19              0

xyz      11/03/14              0

xyz      11/06/14              1

xyz      02/17/16              0


青春有我
浏览 159回答 2
2回答

至尊宝的传说

IIUC,这是我的回答:df['Date'] = pd.to_datetime(df.Date)df['delta'] = df.groupby('ID')['Date'].diff().dt.daysdf['flag'] = (df.groupby('ID').delta.cumsum()<365).astype(int)group_ids = df.flag.diff().ne(0).cumsum()df['count'] = df['flag'].groupby([df['ID'], group_ids]).cumsum()结果:(删除不相关的列)&nbsp; &nbsp; ID&nbsp; &nbsp; &nbsp; &nbsp;Date&nbsp; count0&nbsp; abc 2016-07-12&nbsp; &nbsp; &nbsp; 01&nbsp; abc 2017-02-04&nbsp; &nbsp; &nbsp; 12&nbsp; abc 2017-02-13&nbsp; &nbsp; &nbsp; 23&nbsp; abc 2019-02-16&nbsp; &nbsp; &nbsp; 04&nbsp; xyz 2014-11-03&nbsp; &nbsp; &nbsp; 05&nbsp; xyz 2014-11-06&nbsp; &nbsp; &nbsp; 16&nbsp; xyz 2016-02-17&nbsp; &nbsp; &nbsp; 0

慕哥9229398

我只是用稍微修改过的代码复制了你的逻辑:....df['Date'] = pd.to_datetime(df.Date)def lastyear(row):&nbsp; &nbsp; curr = row.Date&nbsp; &nbsp; lastyr = curr - pd.Timedelta(days=365)&nbsp; &nbsp; return (df[(df.ID == row.ID) & (df.Date > lastyr) & (df.Date < curr)]).ID.sizedf['Count'] = df.apply(lastyear, axis=1)df#Out[79]:&nbsp;#&nbsp; &nbsp; ID&nbsp; &nbsp; &nbsp; &nbsp;Date&nbsp; Count#0&nbsp; abc 2016-07-12&nbsp; &nbsp; &nbsp; 0#1&nbsp; abc 2017-02-04&nbsp; &nbsp; &nbsp; 1#2&nbsp; abc 2017-02-13&nbsp; &nbsp; &nbsp; 2#3&nbsp; abc 2019-02-16&nbsp; &nbsp; &nbsp; 0#4&nbsp; xyz 2014-11-03&nbsp; &nbsp; &nbsp; 0#5&nbsp; xyz 2014-11-06&nbsp; &nbsp; &nbsp; 1#6&nbsp; xyz 2016-02-17&nbsp; &nbsp; &nbsp; 0
打开App,查看更多内容
随时随地看视频慕课网APP

相关分类

Python