Check whether a date in one row is earlier than another date in the next row

I have the following code in Python:


import pandas as pd

import numpy as np

date_rng = pd.date_range(start='5/18/2019', end='7/22/2020', freq='S')


df = pd.DataFrame(date_rng, columns=['start_timestamp'])

df['end_timestamp'] = date_rng

df['start_timestamp'] = np.random.randint(1589760000,1595376000,size=(len(date_rng)))

df['end_timestamp'] = np.random.randint(1589760000,1595376000,size=(len(date_rng)))

df = df[(df.end_timestamp/df.start_timestamp<=1.000009)&(df.end_timestamp/df.start_timestamp>=1.000001)]

df = df.sort_values(by=['start_timestamp','end_timestamp'])

df['start_timestamp'] = pd.to_datetime(df['start_timestamp'],unit='s')

df['end_timestamp'] = pd.to_datetime(df['end_timestamp'],unit='s')

As a result, I have the following dataframe:


  start_timestamp     end_timestamp

2020-05-18 00:00:30 2020-05-18 00:54:07

2020-05-18 00:01:40 2020-05-18 03:50:39

2020-05-18 00:02:08 2020-05-18 02:39:41

2020-05-18 00:04:01 2020-05-18 00:47:25

2020-05-18 00:04:01 2020-05-18 02:26:50

2020-05-18 00:04:44 2020-05-18 02:17:53


                .

                .

                .

What should I do to make sure that every end_timestamp in my dataset is a datetime earlier than the start_timestamp of the next row?


Implemented solution


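I basically converted the dataset to an array, sorted it in ascending order, and converted it back into a dataframe. It may not be the most elegant solution, but it works fine and produces consistent data for what I intend to use it for.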


import pandas as pd

import numpy as np

date_rng = pd.date_range(start='7/22/2019', end='7/22/2020', freq='S')


df = pd.DataFrame(date_rng, columns=['start_timestamp'])

df['end_timestamp'] = date_rng

df['start_timestamp'] = np.random.randint(1563753600,1595376000,size=(len(date_rng)))

df['end_timestamp'] = np.random.randint(1563753600,1595376000,size=(len(date_rng)))

df = df[(df.end_timestamp/df.start_timestamp<=1.0000009)&(df.end_timestamp/df.start_timestamp>=1.0000001)]

df = df.to_numpy()

df = df.reshape(df.shape[0]*2,1)

df = np.sort(df,axis=0)

df = df.reshape(int(df.shape[0]/2),2)

df = pd.DataFrame(df,columns=['start_timestamp','end_timestamp'])

df['start_timestamp'] = pd.to_datetime(df['start_timestamp'],unit='s')

df['end_timestamp'] = pd.to_datetime(df['end_timestamp'],unit='s')
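
The flatten, sort, and reshape steps above can be illustrated on a tiny array. This is only a sketch: the values in `pairs` are made up, and the names `flat`, `ordered`, and `ends_before_next_start` are hypothetical. Note that the approach re-pairs the sorted values, so the original start/end pairings are not preserved.

```python
import numpy as np

# Toy (start, end) pairs in epoch seconds; values are made up for illustration.
pairs = np.array([[50, 70], [10, 40], [30, 35]])

# Flatten to one dimension, sort ascending, then pair consecutive values again.
flat = np.sort(pairs.reshape(-1))
ordered = flat.reshape(-1, 2)

# After sorting, each end is <= the start of the following row.
ends_before_next_start = bool((ordered[:-1, 1] <= ordered[1:, 0]).all())
print(ordered)
print(ends_before_next_start)
```

Because the comparison is `<=`, duplicate values in the data can still leave an end equal to the next start.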


慕斯709654
Views: 132, Answers: 1
1 answer

扬帆大鱼

Your logic is fine as coded.
freq='S' makes no sense here: you generate as many rows as there are seconds between the start and end dates.
After randomising the start times, use the current row and the next row as the bounds of the random function for the end time. I did this as a list comprehension.
I was also a bit smarter about getting the UTC seconds at the start and end of the range.

import pandas as pd
import numpy as np
from datetime import datetime

# date_rng = pd.date_range(start='5/18/2019', end='7/22/2020', freq='S')
date_rng = pd.date_range(start='5/18/2019', end='5/19/2019', freq='min')
# epoch seconds at the start and end of the range, cast to int for randint
sec = [int((date_rng.min() - datetime(1970, 1, 1)).total_seconds()),
       int((date_rng.max() - datetime(1970, 1, 1)).total_seconds())]
df = pd.DataFrame(date_rng, columns=['start_timestamp'])
df['start_timestamp'] = np.random.randint(sec[0], sec[1], size=(len(date_rng)))
df = df.sort_values(by="start_timestamp")
l = df["start_timestamp"].tolist()  # get randomised start times
# append the end of the range as the upper bound for the last row
# (overwriting the last start instead would let the second-to-last end overshoot it)
l.append(sec[1])
# randomise end time between two start times; fall back to the start time if they are equal
df['end_timestamp'] = [np.random.randint(l[i], l[i+1]) if l[i] < l[i+1] else l[i] for i in range(len(l) - 1)]
df['start_timestamp'] = pd.to_datetime(df['start_timestamp'], unit='s')
df['end_timestamp'] = pd.to_datetime(df['end_timestamp'], unit='s')
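
To verify the condition asked about in the question, one option is to compare each end_timestamp with the next row's start_timestamp using `Series.shift(-1)`. The following is a minimal sketch, not part of the original answer: the helper name `ends_before_next_start` and the small sample frame are assumptions made for illustration.

```python
import pandas as pd

def ends_before_next_start(df, start="start_timestamp", end="end_timestamp"):
    """Return a boolean Series: True where a row's end is strictly before the next row's start."""
    next_start = df[start].shift(-1)  # start of the following row, NaT for the last row
    # Rows with no following start (the last row) are treated as passing.
    return (df[end] < next_start) | next_start.isna()

# Hypothetical sample data, sorted by start_timestamp.
sample = pd.DataFrame({
    "start_timestamp": pd.to_datetime(
        ["2020-05-18 00:00:30", "2020-05-18 00:01:40", "2020-05-18 00:02:08"]),
    "end_timestamp": pd.to_datetime(
        ["2020-05-18 00:00:45", "2020-05-18 03:50:39", "2020-05-18 00:02:30"]),
})

check = ends_before_next_start(sample)
print(check.tolist())
print(bool(check.all()))
```

Calling `check.all()` gives a single pass/fail answer for the whole frame, while `sample[~check]` lists the rows that overlap the next one.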
