Counting a column by time period in a pandas DataFrame

I want to count the values of a column per time period in a pandas DataFrame.


My table:


 id1       date_time               adress       a_size
 reom      2005-8-20 22:51:10      75157.5413   ceifwekd
 reom      2005-8-20 22:55:25      3571.37946   ceifwekd
 reom      2005-8-20 11:21:01      3571.37946   tnohcve
 reom      2005-8-20 11:29:09      97439.219    tnohcve
 penr      2005-8-20 17:07:16      97439.219    ceifwekd
 penr      2005-8-20 19:10:37      7391.6258    ceifwekd
 ....
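
For anyone who wants to reproduce the problem, here is a minimal sketch that rebuilds the visible rows of the table as a DataFrame. The variable name `sample` is only an illustration, and the rows hidden behind "...." are not included.

```python
import pandas as pd

# Rows copied from the table above; the elided "...." rows are left out.
sample = pd.DataFrame({
    "id1": ["reom", "reom", "reom", "reom", "penr", "penr"],
    "date_time": [
        "2005-8-20 22:51:10", "2005-8-20 22:55:25",
        "2005-8-20 11:21:01", "2005-8-20 11:29:09",
        "2005-8-20 17:07:16", "2005-8-20 19:10:37",
    ],
    "adress": [75157.5413, 3571.37946, 3571.37946,
               97439.219, 97439.219, 7391.6258],
    "a_size": ["ceifwekd", "ceifwekd", "tnohcve",
               "tnohcve", "ceifwekd", "ceifwekd"],
})

# Parse the strings into real timestamps so time-based operations work.
sample["date_time"] = pd.to_datetime(sample["date_time"])
print(sample.dtypes)
```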

What I need:


id1      time_period                        num_of_address
reom     2005-8-20 22:50:00 - 23:00:00      2
reom     2005-8-20 11:20:00 - 11:30:00      2
penr     2005-8-20 17:00:00 - 17:10:00      1

My code: I created a new column holding the hour of date_time.


 df['num_per_10_minutes'] = df['id1'].map(df.groupby('id1', 'hours').apply(lambda x: x['date_time'].count()))

But this is not what I want. I need to count the number of "adress" values in each 10-minute period.


小唯快跑啊
2 answers

慕盖茨4494581

First build an interval column, then use pandas.DataFrame.groupby:

    import pandas as pd

    df['date_time'] = pd.to_datetime(df['date_time'])
    df = df.set_index('date_time', drop=True).sort_index()

    # Label each row with the 10-minute window that contains it.
    # Recent pandas no longer allows Timestamp + 1, so add an explicit Timedelta.
    step = pd.Timedelta(minutes=10)
    df['intervals'] = ["%s - %s" % (i, i + step)
                       for i in pd.date_range('2005-08-20', '2005-08-21', freq='10min')
                       for d in df.index if i <= d < i + step]
    df.groupby(['id1', 'intervals'])['adress'].count().reset_index()

Output:

        id1                                  intervals  adress
    0  penr  2005-08-20 17:00:00 - 2005-08-20 17:10:00       1
    1  penr  2005-08-20 19:10:00 - 2005-08-20 19:20:00       1
    2  reom  2005-08-20 11:20:00 - 2005-08-20 11:30:00       2
    3  reom  2005-08-20 22:50:00 - 2005-08-20 23:00:00       2
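
One detail worth checking in a window-labelling approach like this is whether the upper bound of each window is inclusive. The sketch below uses a made-up timestamp that sits exactly on a boundary (it is not in the question's data) to show that a test closed on both ends matches two windows, while a half-open test matches exactly one.

```python
import pandas as pd

step = pd.Timedelta(minutes=10)
# Window starts: 11:10, 11:20, 11:30, 11:40
starts = pd.date_range("2005-08-20 11:10", "2005-08-20 11:40", freq="10min")

# Hypothetical row that falls exactly on a window boundary.
stamp = pd.Timestamp("2005-08-20 11:30:00")

# Closed on both ends: the 11:20 and the 11:30 windows both match.
closed_both = [s for s in starts if s <= stamp <= s + step]
# Half-open [start, start + step): only the 11:30 window matches.
half_open = [s for s in starts if s <= stamp < s + step]

print(len(closed_both))  # 2
print(len(half_open))    # 1
```

With a list comprehension that must yield exactly one label per row, the double match would make the label list longer than the DataFrame, so the half-open form is the safer choice.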

RISEBY

First aggregate the counts with GroupBy.size and Series.dt.floor:

    import pandas as pd

    df['date_time'] = pd.to_datetime(df['date_time'])
    # Round each timestamp down to the start of its 10-minute window, then count rows per group.
    df = df.groupby(['id1', df['date_time'].dt.floor('10min')]).size().reset_index(name='adress')
    print (df)
        id1           date_time  adress
    0  penr 2005-08-20 17:00:00       1
    1  penr 2005-08-20 19:10:00       1
    2  reom 2005-08-20 11:20:00       2
    3  reom 2005-08-20 22:50:00       2

Then use Series.dt.strftime to change the datetime format, appending the end of the 10-minute window:

    df['date_time'] = (df['date_time'].dt.strftime('%Y-%m-%d %H:%M:%S') +
                       (df['date_time'] + pd.Timedelta(10, unit='min')).dt.strftime(' - %H:%M:%S'))
    print (df)
        id1                       date_time  adress
    0  penr  2005-08-20 17:00:00 - 17:10:00       1
    1  penr  2005-08-20 19:10:00 - 19:20:00       1
    2  reom  2005-08-20 11:20:00 - 11:30:00       2
    3  reom  2005-08-20 22:50:00 - 23:00:00       2

Or, to repeat the full date at the end of the interval (use this instead of the previous step, not after it):

    df['date_time'] = (df['date_time'].dt.strftime('%Y-%m-%d %H:%M:%S') +
                       (df['date_time'] + pd.Timedelta(10, unit='min'))
                       .dt.strftime(' - %Y-%m-%d %H:%M:%S'))
    print (df)
        id1                                  date_time  adress
    0  penr  2005-08-20 17:00:00 - 2005-08-20 17:10:00       1
    1  penr  2005-08-20 19:10:00 - 2005-08-20 19:20:00       1
    2  reom  2005-08-20 11:20:00 - 2005-08-20 11:30:00       2
    3  reom  2005-08-20 22:50:00 - 2005-08-20 23:00:00       2
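
To see what the flooring step does on its own, here is a small sketch using three timestamps taken from the question. The names `stamps` and `starts` are only for illustration.

```python
import pandas as pd

stamps = pd.Series(pd.to_datetime([
    "2005-08-20 22:51:10",
    "2005-08-20 22:55:25",
    "2005-08-20 17:07:16",
]))

# Round each timestamp down to the start of its 10-minute window.
starts = stamps.dt.floor("10min")
print(starts.dt.strftime("%H:%M:%S").tolist())
# ['22:50:00', '22:50:00', '17:00:00']
```

The first two timestamps land on the same window start, which is why they end up in the same group once the floored values are used as a groupby key.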