I have the following code in Python:
import pandas as pd
import numpy as np
# date_rng only determines the number of rows; its values are overwritten below.
date_rng = pd.date_range(start='5/18/2019', end='7/22/2020', freq='S')
df = pd.DataFrame(date_rng, columns=['start_timestamp'])
df['end_timestamp'] = date_rng
# Random Unix timestamps (seconds) between 2020-05-18 and 2020-07-22 UTC.
df['start_timestamp'] = np.random.randint(1589760000,1595376000,size=(len(date_rng)))
df['end_timestamp'] = np.random.randint(1589760000,1595376000,size=(len(date_rng)))
# Keep only rows whose end is slightly later than its start (ratio in [1.000001, 1.000009]).
df = df[(df.end_timestamp/df.start_timestamp<=1.000009)&(df.end_timestamp/df.start_timestamp>=1.000001)]
df = df.sort_values(by=['start_timestamp','end_timestamp'])
# Convert the Unix seconds to datetimes.
df['start_timestamp'] = pd.to_datetime(df['start_timestamp'],unit='s')
df['end_timestamp'] = pd.to_datetime(df['end_timestamp'],unit='s')
As a result, I get the following data frame:
start_timestamp end_timestamp
2020-05-18 00:00:30 2020-05-18 00:54:07
2020-05-18 00:01:40 2020-05-18 03:50:39
2020-05-18 00:02:08 2020-05-18 02:39:41
2020-05-18 00:04:01 2020-05-18 00:47:25
2020-05-18 00:04:01 2020-05-18 02:26:50
2020-05-18 00:04:44 2020-05-18 02:17:53
.
.
.
What should I do to ensure that every end_timestamp in my dataset is a datetime earlier than the start_timestamp of the next row?
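One way to test whether a frame already meets this condition is to compare each end_timestamp with the start_timestamp of the row that follows it. This is only a minimal sketch, assuming the frame is already sorted by start_timestamp; the helper name is_non_overlapping and the two small sample frames are hypothetical and exist purely for illustration.

```python
import pandas as pd

def is_non_overlapping(df: pd.DataFrame) -> bool:
    """Return True if every end_timestamp is <= the next row's start_timestamp."""
    # shift(-1) aligns each row with the start_timestamp of the following row.
    next_start = df['start_timestamp'].shift(-1)
    # The last row has no successor, so it is left out of the comparison.
    # Use < instead of <= if equal timestamps should count as an overlap.
    ok = df['end_timestamp'].iloc[:-1] <= next_start.iloc[:-1]
    return bool(ok.all())

# Hypothetical sample: the first interval ends after the second one starts.
overlapping = pd.DataFrame({
    'start_timestamp': pd.to_datetime(['2020-05-18 00:00:30', '2020-05-18 00:01:40']),
    'end_timestamp': pd.to_datetime(['2020-05-18 00:54:07', '2020-05-18 03:50:39']),
})
# Hypothetical sample: the second interval starts after the first one ends.
consistent = pd.DataFrame({
    'start_timestamp': pd.to_datetime(['2020-05-18 00:00:30', '2020-05-18 01:00:00']),
    'end_timestamp': pd.to_datetime(['2020-05-18 00:54:07', '2020-05-18 03:50:39']),
})

print(is_non_overlapping(overlapping))  # False
print(is_non_overlapping(consistent))   # True
```

The first sample mirrors the first two rows of the output shown above, which is why the check fails for it.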
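Implemented solution
I basically converted the dataset to an array, sorted all of its values in ascending order, and converted it back to a data frame. It may not be the most elegant solution, but it works and produces consistent data for what I intend to use it for.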
import pandas as pd
import numpy as np
# date_rng only determines the number of rows; its values are overwritten below.
date_rng = pd.date_range(start='7/22/2019', end='7/22/2020', freq='S')
df = pd.DataFrame(date_rng, columns=['start_timestamp'])
df['end_timestamp'] = date_rng
# Random Unix timestamps (seconds) between 2019-07-22 and 2020-07-22 UTC.
df['start_timestamp'] = np.random.randint(1563753600,1595376000,size=(len(date_rng)))
df['end_timestamp'] = np.random.randint(1563753600,1595376000,size=(len(date_rng)))
# Keep only rows whose end is slightly later than its start.
df = df[(df.end_timestamp/df.start_timestamp<=1.0000009)&(df.end_timestamp/df.start_timestamp>=1.0000001)]
# Flatten both columns into a single column, sort every value, then re-pair them two per row.
df = df.to_numpy()
df = df.reshape(df.shape[0]*2,1)
df = np.sort(df,axis=0)
df = df.reshape(df.shape[0]//2,2)
df = pd.DataFrame(df,columns=['start_timestamp','end_timestamp'])
# Convert the Unix seconds to datetimes.
df['start_timestamp'] = pd.to_datetime(df['start_timestamp'],unit='s')
df['end_timestamp'] = pd.to_datetime(df['end_timestamp'],unit='s')
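The flatten, sort, and re-pair step can be seen more easily on a handful of rows. The following is a rough sketch of the same idea rather than a drop-in replacement; the function name sort_into_consecutive_intervals, the seed, and the five-row sample are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def sort_into_consecutive_intervals(df: pd.DataFrame) -> pd.DataFrame:
    """Flatten both columns, sort all values, and re-pair neighbours."""
    values = df[['start_timestamp', 'end_timestamp']].to_numpy()
    # ravel() yields one flat array of 2*n values. Sorting it and reshaping
    # to (n, 2) pairs each value with its successor, so every row's end is
    # <= the next row's start.
    paired = np.sort(values.ravel()).reshape(-1, 2)
    return pd.DataFrame(paired, columns=['start_timestamp', 'end_timestamp'])

# Hypothetical sample: five rows of random Unix timestamps (seconds).
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    'start_timestamp': rng.integers(1563753600, 1595376000, size=5),
    'end_timestamp': rng.integers(1563753600, 1595376000, size=5),
})
result = sort_into_consecutive_intervals(raw)

# Each end_timestamp is no later than the following start_timestamp.
ends = result['end_timestamp'].to_numpy()[:-1]
next_starts = result['start_timestamp'].to_numpy()[1:]
print((ends <= next_starts).all())  # True
```

Note that this approach re-pairs the values: a start_timestamp may end up in the same row as an end_timestamp it was not originally generated with, so the durations selected by the ratio filter are not preserved. If two values happen to be equal, an end_timestamp can also equal the next start_timestamp rather than being strictly earlier.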