Merging stock rows with pandas when they fall between specific times in the DataFrame

I have minute-level stock data from 2017 to 2019. I only want to keep data from 9:16 onwards each day, so I want to collapse any rows between 9:00 and 9:16 into a single 9:16 row, i.e.:

the 09:16 row should have:

  • open: the open of the first bar between 9:00 and 9:16, here 116.00

  • high: the highest high between 9:00 and 9:16, here 117.80

  • low: the lowest low between 9:00 and 9:16, here 116.00

  • close: the close at 9:16, here 113.00

                       open    high     low   close
date                                               
2017-01-02 09:08:00  116.00  116.00  116.00  116.00
2017-01-02 09:16:00  116.10  117.80  117.00  113.00
2017-01-02 09:17:00  115.50  116.20  115.50  116.20
2017-01-02 09:18:00  116.05  116.35  116.00  116.00
2017-01-02 09:19:00  116.00  116.00  115.60  115.75
...                     ...     ...     ...     ...
2019-12-29 15:56:00  259.35  259.35  259.35  259.35
2019-12-29 15:57:00  260.00  260.00  260.00  260.00
2019-12-29 15:58:00  260.00  260.00  259.35  259.35
2019-12-29 15:59:00  260.00  260.00  260.00  260.00
2019-12-29 16:36:00  259.35  259.35  259.35  259.35

Here is what I tried:


# Get data from 9:00 - 9:16 and create only one data item
convertPreTrade = df.between_time("09:00", "09:16")  # 09:00 - 09:16

# combine modified value to original data
df.loc[df.index.strftime("%H:%M") == "09:16",
       ["open", "high", "low", "close"]] = [convertPreTrade["open"][0],
                                            convertPreTrade["high"].max(),
                                            convertPreTrade["low"].min(),
                                            convertPreTrade["close"][-1]]

But this does not give me accurate data.
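The problem with the snippet above can be shown on a tiny two-day example (hypothetical values): `between_time` selects the 09:00-09:16 rows of every day at once, so `convertPreTrade["open"][0]`, `.max()` and `.min()` mix data across days instead of aggregating per day. A minimal sketch:

```python
import pandas as pd

# Two trading days, one pre-9:16 bar and the 9:16 bar each
idx = pd.to_datetime([
    "2017-01-02 09:08", "2017-01-02 09:16",
    "2017-01-03 09:08", "2017-01-03 09:16",
])
df = pd.DataFrame({"high": [116.0, 117.8, 259.35, 260.0]}, index=idx)

pre = df.between_time("09:00", "09:16")

# The max is taken over BOTH days, not per day:
print(pre["high"].max())  # 260.0 -- day 1's 09:16 bar would wrongly get 260.0

# Grouping by calendar date fixes it:
per_day_max = pre.groupby(pre.index.date)["high"].max()
print(per_day_max)
```

This is why the answers below all group by date before aggregating.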


阿晨1998

Viewed 137 times

3 Answers

杨__羊羊

d = {'date': 'last', 'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last'}

# df.index = pd.to_datetime(df.index)
s1 = df.between_time('09:00:00', '09:16:00')
s2 = s1.reset_index().groupby(s1.index.date).agg(d).set_index('date')

df1 = pd.concat([df.drop(s1.index), s2]).sort_index()

Details: use DataFrame.between_time to filter the rows of df that fall between 09:00 and 09:16:

print(s1)
                      open   high    low  close
date                                           
2017-01-02 09:08:00  116.0  116.0  116.0  116.0
2017-01-02 09:16:00  116.1  117.8  117.0  113.0

Use DataFrame.groupby to group this filtered frame s1 by date and aggregate with the dictionary d:

print(s2)
                      open   high    low  close
date                                           
2017-01-02 09:16:00  116.0  117.8  116.0  113.0

Use DataFrame.drop to remove the 09:00-09:16 rows from the original df, concatenate s2 back with pd.concat, and finally sort the index with DataFrame.sort_index:

print(df1)
                       open    high     low   close
date                                               
2017-01-02 09:16:00  116.00  117.80  116.00  113.00
2017-01-02 09:17:00  115.50  116.20  115.50  116.20
2017-01-02 09:18:00  116.05  116.35  116.00  116.00
2017-01-02 09:19:00  116.00  116.00  115.60  115.75
2019-12-29 15:56:00  259.35  259.35  259.35  259.35
2019-12-29 15:57:00  260.00  260.00  260.00  260.00
2019-12-29 15:58:00  260.00  260.00  259.35  259.35
2019-12-29 15:59:00  260.00  260.00  260.00  260.00
2019-12-29 16:36:00  259.35  259.35  259.35  259.35
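For reference, the steps above assembled into one self-contained script (a sketch: the four sample rows are taken from the question, and the frame is assumed to have a DatetimeIndex named `date`):

```python
import io

import pandas as pd

data = """date,open,high,low,close
2017-01-02 09:08:00,116.00,116.00,116.00,116.00
2017-01-02 09:16:00,116.10,117.80,117.00,113.00
2017-01-02 09:17:00,115.50,116.20,115.50,116.20
2017-01-02 09:18:00,116.05,116.35,116.00,116.00
"""
df = pd.read_csv(io.StringIO(data), parse_dates=["date"], index_col="date")

# Per-day aggregation rules for the 09:00-09:16 window
d = {"date": "last", "open": "first", "high": "max", "low": "min", "close": "last"}

s1 = df.between_time("09:00:00", "09:16:00")          # pre-9:16 rows, all days
s2 = s1.reset_index().groupby(s1.index.date).agg(d).set_index("date")

# Drop the raw pre-9:16 rows and splice in the aggregated 09:16 bar
df1 = pd.concat([df.drop(s1.index), s2]).sort_index()
print(df1)
```

The 09:08 row disappears and the 09:16 row carries open 116.00, high 117.80, low 116.00, close 113.00.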

喵喵时光机

Using @r-beginners' data with a few extra rows added:

import pandas as pd
import numpy as np
import io

data = '''datetime open high low close
"2017-01-02 09:08:00"  116.00  116.00  116.00  116.00
"2017-01-02 09:16:00"  116.10  117.80  117.00  113.00
"2017-01-02 09:17:00"  115.50  116.20  115.50  116.20
"2017-01-02 09:18:00"  116.05  116.35  116.00  116.00
"2017-01-02 09:19:00"  116.00  116.00  115.60  115.75
"2017-01-03 09:08:00"  259.35  259.35  259.35  259.35
"2017-01-03 09:09:00"  260.00  260.00  260.00  260.00
"2017-01-03 09:16:00"  260.00  260.00  260.00  260.00
"2017-01-03 09:17:00"  261.00  261.00  261.00  261.00
"2017-01-03 09:18:00"  262.00  262.00  262.00  262.00
"2017-12-03 09:18:00"  260.00  260.00  259.35  259.35
"2017-12-04 09:05:00"  260.00  260.00  260.00  260.00
"2017-12-04 09:22:00"  259.35  259.35  259.35  259.35'''
df = pd.read_csv(io.StringIO(data), sep='\s+')

The code below does the actual work. Probably not the best approach, but it is quick and dirty:

df['datetime'] = pd.to_datetime(df['datetime'])
df = df.set_index('datetime')
df['date'] = df.index.date
dates = np.unique(df.index.date)

# first bar at or after 09:16 on each day
first_rows = (df.between_time('9:16', '00:00')
                .reset_index().groupby('date').first().set_index('datetime'))
first_rows['date'] = first_rows.index.date

dffs = []
for d in dates:
    df_day = df[df['date'] == d].sort_index()
    first_bar_of_the_day = first_rows[first_rows['date'] == d].copy()
    bars_until_first = df_day.loc[df_day.index <= first_bar_of_the_day.index.values[0]]

    if not first_bar_of_the_day.empty:
        first_bar_of_the_day['open'] = bars_until_first['open'].values[0]
        first_bar_of_the_day['high'] = bars_until_first['high'].max()
        first_bar_of_the_day['low'] = bars_until_first['low'].min()
        first_bar_of_the_day['close'] = bars_until_first['close'].values[-1]

    bars_after_first = df_day.loc[df_day.index > first_bar_of_the_day.index.values[0]]
    if len(bars_after_first) > 1:
        dff = pd.concat([first_bar_of_the_day, bars_after_first])
    else:
        dff = first_bar_of_the_day.copy()

    print(dff)
    dffs.append(dff)

combined_df = pd.concat([x for x in dffs])
print(combined_df)

The printed dff for each date:

                       open    high    low   close        date
datetime                                                      
2017-01-02 09:16:00  116.00  117.80  116.0  113.00  2017-01-02
2017-01-02 09:17:00  115.50  116.20  115.5  116.20  2017-01-02
2017-01-02 09:18:00  116.05  116.35  116.0  116.00  2017-01-02
2017-01-02 09:19:00  116.00  116.00  115.6  115.75  2017-01-02

                       open   high     low  close        date
datetime                                                     
2017-01-03 09:16:00  259.35  260.0  259.35  260.0  2017-01-03
2017-01-03 09:17:00  261.00  261.0  261.00  261.0  2017-01-03
2017-01-03 09:18:00  262.00  262.0  262.00  262.0  2017-01-03

                      open   high     low   close        date
datetime                                                     
2017-12-03 09:18:00  260.0  260.0  259.35  259.35  2017-12-03

                      open   high     low   close        date
datetime                                                     
2017-12-04 09:22:00  260.0  260.0  259.35  259.35  2017-12-04

And combined_df:

                       open    high     low   close        date
datetime                                                       
2017-01-02 09:16:00  116.00  117.80  116.00  113.00  2017-01-02
2017-01-02 09:17:00  115.50  116.20  115.50  116.20  2017-01-02
2017-01-02 09:18:00  116.05  116.35  116.00  116.00  2017-01-02
2017-01-02 09:19:00  116.00  116.00  115.60  115.75  2017-01-02
2017-01-03 09:16:00  259.35  260.00  259.35  260.00  2017-01-03
2017-01-03 09:17:00  261.00  261.00  261.00  261.00  2017-01-03
2017-01-03 09:18:00  262.00  262.00  262.00  262.00  2017-01-03
2017-12-03 09:18:00  260.00  260.00  259.35  259.35  2017-12-03
2017-12-04 09:22:00  260.00  260.00  259.35  259.35  2017-12-04

Side note: I am not sure your way of cleaning the data is the best. Maybe look into whether to ignore everything before 9:16 each day entirely, or even analyze the volatility of the first 15 minutes to decide.
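One subtlety in the answer above: `between_time('9:16', '00:00')` has its start time after its end time, which makes pandas wrap around midnight and keep everything from 09:16 onwards. A small sketch with toy values to illustrate:

```python
import pandas as pd

idx = pd.to_datetime(["2017-01-02 09:05", "2017-01-02 09:16", "2017-01-02 15:59"])
s = pd.Series([1, 2, 3], index=idx)

# start > end => between_time wraps past midnight: it keeps times
# >= 09:16 OR <= 00:00, so only the 09:05 bar is dropped here.
late = s.between_time("9:16", "00:00")
print(late)
```

This is how `first_rows` ends up holding the first bar at or after 09:16 of each day.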

元芳怎么了

Extract the 9:00-9:16 window, group the dataframe by year, month, and day, and compute the OHLC values; the logic follows your own code. Finally, add a date column for 9:16. Since we don't have all the data there may be considerations we've missed, but the basic shape stays the same.

import pandas as pd
import numpy as np
import io

data = '''date open high low close
"2017-01-02 09:08:00"  116.00  116.00  116.00  116.00
"2017-01-02 09:16:00"  116.10  117.80  117.00  113.00
"2017-01-02 09:17:00"  115.50  116.20  115.50  116.20
"2017-01-02 09:18:00"  116.05  116.35  116.00  116.00
"2017-01-02 09:19:00"  116.00  116.00  115.60  115.75
"2017-01-03 09:08:00"  259.35  259.35  259.35  259.35
"2017-01-03 09:09:00"  260.00  260.00  260.00  260.00
"2017-12-03 09:18:00"  260.00  260.00  259.35  259.35
"2017-12-04 09:05:00"  260.00  260.00  260.00  260.00
"2017-12-04 09:22:00"  259.35  259.35  259.35  259.35'''
df = pd.read_csv(io.StringIO(data), sep='\s+')
df.reset_index(drop=True, inplace=True)
df['date'] = pd.to_datetime(df['date'])

# 9:00-9:16
df_start = df[((df['date'].dt.hour == 9) & (df['date'].dt.minute >= 0))
              & ((df['date'].dt.hour == 9) & (df['date'].dt.minute <= 16))]

# calculate
df_new = (df_start.groupby([df['date'].dt.year, df['date'].dt.month, df['date'].dt.day])
                  .agg(open_first=('open', lambda x: x.iloc[0]),
                       high_max=('high', 'max'),
                       low_min=('low', 'min'),
                       close_shift=('close', lambda x: x.iloc[-1])))
df_new.index.names = ['year', 'month', 'day']
df_new.reset_index(inplace=True)
df_new['date'] = (df_new['year'].astype(str) + '-' + df_new['month'].astype(str)
                  + '-' + df_new['day'].astype(str) + ' 09:16:00')

   year  month  day  open_first  high_max  low_min  close_shift               date
0  2017      1    2      116.00     117.8   116.00        113.0  2017-1-2 09:16:00
1  2017      1    3      259.35     260.0   259.35        260.0  2017-1-3 09:16:00
2  2017     12    4      260.00     260.0   260.00        260.0  2017-12-4 09:16:00
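Note that the assembled `date` column above is a plain string. If it needs to become a proper timestamp again (a small follow-on sketch, not part of the original answer, using one hypothetical row shaped like `df_new`):

```python
import pandas as pd

# hypothetical df_new-style frame: year/month/day columns as produced above
df_new = pd.DataFrame({"year": [2017], "month": [1], "day": [2], "open_first": [116.0]})

df_new["date"] = (df_new["year"].astype(str) + "-" + df_new["month"].astype(str)
                  + "-" + df_new["day"].astype(str) + " 09:16:00")

# parse the assembled string back into a real Timestamp
df_new["date"] = pd.to_datetime(df_new["date"])
print(df_new["date"].dt.strftime("%Y-%m-%d %H:%M").iloc[0])  # 2017-01-02 09:16
```

With real timestamps the column can be set as the index and merged back into the original minute data.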
