Elegantly pre-filling with pandas using date_range and various possible frequency settings

I am trying to pre-fill a DataFrame similar to the following:

http://img1.mukewang.com/61b0670b00012d7504210291.jpg

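In the example, I randomly removed some rows to highlight the challenge. I am trying to compute the dti value elegantly. The dti value of the first row would be 0 (even if the first row is removed by the script), but because gaps appear in the dt sequence, the dti values need to skip over the missing rows. One logical approach is to divide dt by delta to create a unique integer that represents the bucket, but nothing I have tried feels or looks elegant.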


Some code to help simulate the problem:


from datetime import datetime, timedelta
import pandas as pd
import numpy as np

start = datetime.now()
nin = 24
delta = '4H'

# build the full, gap-free sequence of timestamps
df = pd.date_range(start, periods=nin, freq=delta, name='dt')

# remove some random data points
frac_points = 8/24                  # Fraction of points to retain
r = np.random.rand(nin)
df = df[r <= frac_points]           # reduce the number of points
df = df.to_frame(index=False)       # reindex

df['dti'] = ...

Thanks in advance,


胡说叔叔
1 Answer

万千封印

One solution is to divide the time difference between consecutive rows by the timedelta:

from datetime import datetime, timedelta
import pandas as pd
import numpy as np

start = datetime.now()
nin = 24
delta = '4H'

df = pd.date_range(start, periods=nin, freq=delta, name='dt')

# Round to nearest ten minutes for better readability
df = df.round('10min')

# Ensure reproducibility
np.random.seed(1)

# remove some random data points
frac_points = 8/24                  # Fraction of points to retain
r = np.random.rand(nin)
df = df[r <= frac_points]           # reduce the number of points
df = df.to_frame(index=False)       # reindex

# gap to the previous row, measured in multiples of delta
df['dti'] = df['dt'].diff() / pd.to_timedelta(delta)
# the first row has no predecessor (NaN), so it starts at 0; the running sum gives the bucket index
df['dti'] = df['dti'].fillna(0).cumsum().astype(int)
df

                   dt  dti
0 2019-03-17 18:10:00    0
1 2019-03-17 22:10:00    1
2 2019-03-18 02:10:00    2
3 2019-03-18 06:10:00    3
4 2019-03-18 10:10:00    4
5 2019-03-19 10:10:00   10
6 2019-03-19 18:10:00   12
7 2019-03-20 10:10:00   16
8 2019-03-20 14:10:00   17
9 2019-03-21 02:10:00   20
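The "divide dt by delta" idea from the question can also be written without diff() and cumsum(), by measuring each timestamp against a reference point and floor-dividing by the step. The following is only a minimal sketch under some assumptions that are not in the original post: the start time is fixed rather than datetime.now(), the retained rows are picked by hand (kept_positions) instead of at random so the result is repeatable, and the step is given as pd.Timedelta(hours=4) rather than the string '4H'.

```python
import pandas as pd

delta = pd.Timedelta(hours=4)               # bucket width (the 4-hour step from the thread)
start = pd.Timestamp("2019-03-17 18:10")    # assumed fixed start, for repeatability

# full, gap-free grid of 24 timestamps
full = pd.date_range(start, periods=24, freq=delta, name="dt")

# hypothetical hand-picked rows to keep, standing in for the random mask
kept_positions = [0, 1, 2, 3, 4, 10, 12, 16, 17, 20]
df = full[kept_positions].to_frame(index=False)

# elapsed time since the first retained row, in whole multiples of delta
df["dti"] = (df["dt"] - df["dt"].iloc[0]) // delta
print(df["dti"].tolist())    # [0, 1, 2, 3, 4, 10, 12, 16, 17, 20]

# variant: measure against the known grid start instead, so a frame whose
# first row was dropped does not begin at 0
late = full[[2, 5]].to_frame(index=False)
late["dti"] = (late["dt"] - start) // delta
print(late["dti"].tolist())  # [2, 5]
```

Which reference point is right depends on how "the first row is 0" should be read: subtracting the first retained timestamp always starts the frame at 0, while subtracting the original start keeps the index tied to the full grid.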