
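Pandas: how to aggregate hourly counts from start and end times

I have a DataFrame that contains a start hour and an end hour for each unique rating ID.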

    import pandas as pd

    d = {'ID': ['01', '02', '03', '04', '05', '06'],
         'Hour Start': [5, 9, 13, 15, 20, 23],
         'Hour End': [6, 9, 15, 19, 0, 2]}
    df = pd.DataFrame(data=d)

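My goal is to count, for the whole dataset, how many ratings are active in each hour. For example, ID 01 starts at 5 AM and ends at 6 AM, so 5 AM and 6 AM should each get a count of 1.

For ID 06, however, the rating starts at 11 PM and ends at 2 AM the next day, so every hour from 11 PM through 2 AM should get a count of 1.

I want to output an hourly summary table like the one below.

I have been struggling to come up with a solution.

Any help would be greatly appreciated. Thanks!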
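To make the counting rule above concrete, here is a minimal plain-Python sketch. It assumes that both the start and end hours are inclusive and that an end hour smaller than the start hour means the rating runs past midnight. The helper name `active_hours` is hypothetical and does not come from the question.

```python
from collections import Counter

# (start_hour, end_hour) pairs copied from the question's sample data.
intervals = [(5, 6), (9, 9), (13, 15), (15, 19), (20, 0), (23, 2)]

def active_hours(start, end):
    """Return every hour from start to end inclusive, wrapping past midnight."""
    span = (end - start) % 24  # number of hours after the start hour
    return [(start + k) % 24 for k in range(span + 1)]

counts = Counter()
for start, end in intervals:
    counts.update(active_hours(start, end))

# One entry per hour of the day, 0 through 23.
hourly = [counts.get(h, 0) for h in range(24)]
print(hourly)
```

Under this rule an interval whose start and end are equal (ID 02) counts as one active hour. An interval that covers a full 24 hours cannot be told apart from a one-hour interval, so that case would need extra information such as a date.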


炎炎设计
2 answers

慕田峪7331174

You can convert both the start and end hour columns to datetimes and then compute the time difference. Finally, convert the time difference to a difference in hours (divide the seconds by 3600):

    df['Hours_s'] = pd.to_datetime(df['Hour Start'], format='%H')
    df['Hours_e'] = pd.to_datetime(df['Hour End'], format='%H')
    df['delta'] = df['Hours_e'] - df['Hours_s']
    df["count"] = df["delta"].apply(lambda x: x.seconds // 3600)

Output:

    ID  Hour_Start  Hour_End  count
    0            5         6      1
    1            9         9      0
    2           13        15      2
    3           15        19      4
    4           20         0      4
    5           23         2      3

Update:

    final_tab = pd.DataFrame({"Hour": range(0, 24), "Count": [0] * 24})

    for i, row in df.iterrows():
        if row["delta"].days != 0:
            # Interval crosses midnight: count start..23, then 0..end (inclusive)
            final_tab.iloc[row["Hour Start"]:24, 1] += 1
            final_tab.iloc[0:row["Hour End"] + 1, 1] += 1
        else:
            # Same-day interval: count every hour from start to end, inclusive
            final_tab.iloc[row["Hour Start"]:row["Hour End"] + 1, 1] += 1

Output:

    print(final_tab)

        Hour  Count
    0      0      2
    1      1      1
    2      2      1
    3      3      0
    4      4      0
    5      5      1
    6      6      1
    7      7      0
    8      8      0
    9      9      1
    10    10      0
    11    11      0
    12    12      0
    13    13      1
    14    14      1
    15    15      2
    16    16      1
    17    17      1
    18    18      1
    19    19      1
    20    20      1
    21    21      1
    22    22      1
    23    23      2
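The `count` values of 4 and 3 for the two overnight rows, and the `delta.days != 0` test used to detect them, both follow from how a negative time difference is normalized. The sketch below illustrates this with the standard library's `datetime.timedelta`, which `pd.Timedelta` subclasses. It is an illustration only, not part of the answer's code.

```python
from datetime import timedelta

# ID 05 runs from 20:00 to 00:00, so end minus start is -20 hours.
delta = timedelta(hours=0) - timedelta(hours=20)

# A negative difference is stored as a negative day count plus a
# non-negative remainder in seconds: -1 day + 4 hours.
print(delta.days)             # -1
print(delta.seconds // 3600)  # 4
```

So `seconds // 3600` gives the wrapped hour difference, while a nonzero `days` signals that the end hour is earlier than the start hour.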

慕尼黑5688855

IIUC, you can do this with pd.to_datetime and pd.date_range:

    import numpy as np
    import pandas as pd

    # Convert hours to datetime
    df['endTime'] = pd.to_datetime(df['Hour End'], format='%H')
    df['startTime'] = pd.to_datetime(df['Hour Start'], format='%H')

    # If 'Hour End' is less than 'Hour Start', assume the end is on the next day
    df['endTime'] = np.where(df['Hour End'] < df['Hour Start'],
                             df['endTime'] + pd.Timedelta(days=1),
                             df['endTime'])

    # Create a series of hours for each range ('Hour Start' to 'Hour End').
    # freq='H' means hourly; pandas 2.2+ prefers the lowercase alias 'h'.
    df_hourly = df.apply(lambda x: pd.Series(pd.date_range(x['startTime'],
                                                           x['endTime'],
                                                           freq='H')),
                         axis=1)\
                  .stack().dt.hour

    # Use value_counts to count the hours, then reindex to a 24-hour day
    # to fill in the missing hours.
    df_hourly.value_counts().reindex(np.arange(0, 24)).fillna(0).astype(int)

Output:

    0     2
    1     1
    2     1
    3     0
    4     0
    5     1
    6     1
    7     0
    8     0
    9     1
    10    0
    11    0
    12    0
    13    1
    14    1
    15    2
    16    1
    17    1
    18    1
    19    1
    20    1
    21    1
    22    1
    23    2

Or, using explode and value_counts:

    df.apply(lambda x: pd.date_range(x['startTime'],
                                     x['endTime'],
                                     freq='H'), axis=1)\
      .explode().dt.hour.value_counts()\
      .reindex(np.arange(0, 24), fill_value=0)