Create a list of value counts per time interval, based on a timestamp column


Suppose I have a pandas dataframe with two columns, a string and a datetime, like this:

ORDER TIMESTAMP

GO 6/4/2019 09:59:49.497000

STAY 6/4/2019 09:05:27.036000

WAIT 6/4/2019 10:33:05.645000

GO 6/4/2019 10:28:03.649000

STAY 6/4/2019 11:23:11.614000

GO 6/4/2019 11:00:33.574000

WAIT 6/4/2019 11:41:55.744000
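For anyone who wants to reproduce the sample, it can be built roughly as follows. This is a minimal sketch that assumes the dates are written month/day/year (so 6/4/2019 is June 4) and that the columns are named exactly ORDER and TIMESTAMP; the format string is an assumption, not something stated in the question.

```python
import pandas as pd

# Sample rows copied from the table above.
df = pd.DataFrame({
    "ORDER": ["GO", "STAY", "WAIT", "GO", "STAY", "GO", "WAIT"],
    "TIMESTAMP": [
        "6/4/2019 09:59:49.497000",
        "6/4/2019 09:05:27.036000",
        "6/4/2019 10:33:05.645000",
        "6/4/2019 10:28:03.649000",
        "6/4/2019 11:23:11.614000",
        "6/4/2019 11:00:33.574000",
        "6/4/2019 11:41:55.744000",
    ],
})

# Assumed format: month/day/year, 24-hour time, fractional seconds.
df["TIMESTAMP"] = pd.to_datetime(df["TIMESTAMP"], format="%m/%d/%Y %H:%M:%S.%f")
```

Parsing the column to a real datetime dtype up front matters, because both the comparisons in a loop and the `.dt` accessor rely on it.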

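I want to create a list in which each entry is itself a list of three values. For each chosen time interval (for example, one hour), the entry is: [start time, total number of rows, percentage of rows whose ORDER is GO].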

For example, for the dataframe above, my list would be:

[6/4/2019 09:00:00.000000, 2, 50]

[6/4/2019 10:00:00.000000, 2, 50]

[6/4/2019 11:00:00.000000, 3, 33.3]

I wrote a simple while loop:

    from datetime import timedelta

    go = []
    t = df["TIMESTAMP"].min().floor("h")  # start at the first full hour
    while t <= df["TIMESTAMP"].max():
        in_window = (df["TIMESTAMP"] >= t) & (df["TIMESTAMP"] < t + timedelta(hours=1))
        tmp1 = df[in_window]
        tmp2 = df[in_window & (df["ORDER"] == "GO")]
        if tmp1.shape[0] > 0:  # skip empty intervals to avoid dividing by zero
            go.append([t, tmp1.shape[0], 100.0 * tmp2.shape[0] / tmp1.shape[0]])
        # increment the time by the interval
        t = t + timedelta(hours=1)

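However, my real dataframe has millions of rows, and I want the interval to be much shorter than an hour, so this approach is very slow. What is a more Pythonic way to do this?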

BIG阳


1 Answer

小怪兽爱吃肉

Let's try groupby().agg(), using size for the row count and mean for the fraction of rows that are GO:

    (df.ORDER.eq('GO').astype(int)
       .groupby(df.TIMESTAMP.dt.floor('1h'))   # groupby interval of choice
       .agg(['size','mean'])
       .reset_index()              # get timestamp back
       .to_numpy().tolist()        # this is to generate the list
    )

Output:

    [[Timestamp('2019-06-04 09:00:00'), 2, 0.5], [Timestamp('2019-06-04 10:00:00'), 2, 0.5], [Timestamp('2019-06-04 11:00:00'), 3, 0.3333333333333333]]

Note that mean is a fraction between 0 and 1; multiply it by 100 to get the percentage asked for in the question.
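The same idea can be wrapped in a small helper so the interval is a parameter and the third value comes out as a percentage. This is only a sketch: the function name `go_stats`, the `interval` argument, the `start` column label, and the rounding to one decimal are all hypothetical choices, and it assumes TIMESTAMP is already a datetime column.

```python
import pandas as pd

def go_stats(df, interval="1h"):
    """Return [interval start, row count, percent GO] per non-empty interval.

    `go_stats` and `interval` are illustrative names, not part of pandas.
    """
    is_go = df["ORDER"].eq("GO").astype(int)            # 1 for GO rows, else 0
    start = df["TIMESTAMP"].dt.floor(interval).rename("start")
    stats = is_go.groupby(start).agg(["size", "mean"])  # count and GO fraction
    stats["mean"] = (stats["mean"] * 100).round(1)      # fraction -> percent
    return stats.reset_index().to_numpy().tolist()

# Sample data from the question (dates assumed to be month/day/year).
sample = pd.DataFrame({
    "ORDER": ["GO", "STAY", "WAIT", "GO", "STAY", "GO", "WAIT"],
    "TIMESTAMP": pd.to_datetime([
        "2019-06-04 09:59:49.497", "2019-06-04 09:05:27.036",
        "2019-06-04 10:33:05.645", "2019-06-04 10:28:03.649",
        "2019-06-04 11:23:11.614", "2019-06-04 11:00:33.574",
        "2019-06-04 11:41:55.744",
    ]),
})

result = go_stats(sample)             # hourly buckets
half_hour = go_stats(sample, "30min")  # a shorter interval
```

Because groupby only emits groups that actually occur, intervals with no rows do not appear in the result, and the data is scanned once rather than once per interval as in the while loop.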
