问题是,对于同一组中的每一行,计算当前行的去年内完成的案例数。
数据集 (df) 如下所示:
ID Date
abc 07/12/16
abc 02/04/17
abc 02/13/17
abc 02/16/19
xyz 11/03/14
xyz 11/06/14
xyz 02/17/16
我的想法:首先创建一个列lastyr:df['date'] - timedelta(days=365); 然后将每一行与整个组进行比较,计算组中有多少个日期 >= lastyr 和 < df['date']
我试图在 python 中定义一个函数,如:
# Create the dataframe
d = {'ID': ['abc', 'abc', 'abc', 'abc', 'xyz', 'xyz', 'xyz'],
'Date': ['07/12/16', '02/04/17', '02/13/17', '02/16/19', '11/03/14', '11/06/14', '02/17/16']}
df = pd.DataFrame(data=d)
df['Date'] = df['Date'].apply(pd.to_datetime)
df_1 = df
# df_1 is same as df. I tried to compare each row in df to whole column in df_1.
# Define and apply the function
def lastyear(row):
curr = row['Date']
lastyr = curr - datetime.timedelta(days=365)
if df['ID'] == df_1['ID']: # The compare is for same ID.
return (df_1['Date'] < curr) & (df_1['Date'] >= lastyr)
df.apply(lastyear, axis=1).groupby(['ID']).count()
但是它返回所有错误值。我认为这是因为它仍然比较两个数据框中的每一行,但我不知道如何重写它以将每一行与整列进行比较。
所需的输出是:
Group Date Count # of cases happened in last year
abc 07/12/16 0
abc 02/04/17 1
abc 02/13/17 2
abc 02/16/19 0
xyz 11/03/14 0
xyz 11/06/14 1
xyz 02/17/16 0
至尊宝的传说
慕哥9229398
相关分类