从多索引数据框中提取数据子集并计算列的差异

我有一个 pandas 数据框,其中第一行有多个条目,但第二行有重复的列。


               A                    B                 C

Date           open    r    close   open    r  close  open    r   close      

2000-07-03     19.7    5    17.1    66.26   4  6.22   23.26.  1   9.9

2000-07-05     49.8    2    8.3     78.81   6  4.34   39.81   5   5.1

2000-07-15     89.5    3    4.1     43.45   7  2.45   29.3    8   1.2

2000-08-13     74.7    6    7.4     34.26   8  6.4    72.26   9   5.4

2000-08-25     39.84   1    8.4     95.43   3  4.3    69.81.  0   5.2

2000-08-28     61.8    4    4.2     43.81   1  2.2    129.81  6   1.3

2000-09-11     82.79   7    7.4     66.26   1  6.5    72.25   6   5.6

2000-09-16     64.8    8    8.7     73.45   5  4.7    69.45   4   5.4

2000-09-22     58.5    9    3.3     13.81   8  2.9    777.8   8   1.4

我想提取 2000 年第 7 个月的数据,并找出 A 或 B 或 C 中哪个最低(开盘价 - 收盘价)?


我的计划:


s=data.stack(level=0)

values = s[s.index.get_level_values(1)]['open', 'close'].reset_index()

values['Date'] = pd.to_datetime(values['Date'])

start_date = 2000-07-01

end_date = 2000-08-01

mask = (data['date'] > start_date) & (data['date'] <= end_date)

df = data.loc[mask]

df['Val_Diff'] = df['open'] - df['close']

print(df['Val_Diff'].max()) 

我得到错误


KeyError: "None of [Index are in the [columns]"

为什么 multiindex 是这段代码的问题?


回首忆惘然
浏览 121回答 1
1回答

慕神8447489

我认为这是由于堆栈垂直变形时索引中未命名的列引起的。工艺流程:展平多索引的列名。使用 wide_to_long 函数从水平转换为垂直将日期序列转换为“日期时间”格式以进行条件提取。import pandas as pdimport numpy as npimport ioimport datetimedata = '''Date open r close open r close open r close&nbsp;&nbsp;2000-07-03 19.7 5 17.1 66.26 4 6.22 23.26 1 9.92000-07-05 49.8 2 8.3 78.81 6 4.34 39.81 5 5.12000-07-15 89.5 3 4.1 43.45 7 2.45 29.3 8 1.22000-08-13 74.7 6 7.4 34.26 8 6.4 72.26 9 5.42000-08-25 39.84 1 8.4 95.43 3 4.3 69.81 0 5.22000-08-28 61.8 4 4.2 43.81 1 2.2 129.81 6 1.32000-09-11 82.79 7 7.4 66.26 1 6.5 72.25 6 5.62000-09-16 64.8 8 8.7 73.45 5 4.7 69.45 4 5.42000-09-22 58.5 9 3.3 13.81 8 2.9 777.8 8 1.4'''data = pd.read_csv(io.StringIO(data), sep='\s+')idx = pd.MultiIndex.from_arrays([['','A','A','A','B','B','B','C','C','C'], ['Date','open','r','close','open','r','close','open','r','close']])data.columns = idxnew_cols = [k[1]+'_'+k[0] for k in data.columns[1:]]new_cols.insert(0, 'Date')data.columns = new_colsdata = pd.wide_to_long(data,['open','r','close'], i='Date', j='item', sep='_', suffix='\\w+')data.reset_index(inplace=True)data['Date'] = pd.to_datetime(data['Date'])start_date = datetime.datetime(2000,7,1)end_date = datetime.datetime(2000,8,1)mask = (data.Date > start_date) & (data.Date <= end_date)data = data.loc[mask]data&nbsp; &nbsp; Date&nbsp; &nbsp; item&nbsp; &nbsp; open&nbsp; &nbsp; r&nbsp; &nbsp;close0&nbsp; &nbsp;2000-07-03&nbsp; A&nbsp; &nbsp;19.70&nbsp; &nbsp;5&nbsp; &nbsp;17.101&nbsp; &nbsp;2000-07-05&nbsp; A&nbsp; &nbsp;49.80&nbsp; &nbsp;2&nbsp; &nbsp;8.302&nbsp; &nbsp;2000-07-15&nbsp; A&nbsp; &nbsp;89.50&nbsp; &nbsp;3&nbsp; &nbsp;4.109&nbsp; &nbsp;2000-07-03&nbsp; B&nbsp; &nbsp;66.26&nbsp; &nbsp;4&nbsp; &nbsp;6.2210&nbsp; 2000-07-05&nbsp; B&nbsp; &nbsp;78.81&nbsp; &nbsp;6&nbsp; &nbsp;4.3411&nbsp; 2000-07-15&nbsp; B&nbsp; &nbsp;43.45&nbsp; &nbsp;7&nbsp; &nbsp;2.4518&nbsp; 2000-07-03&nbsp; C&nbsp; &nbsp;23.26&nbsp; &nbsp;1&nbsp; &nbsp;9.9019&nbsp; 2000-07-05&nbsp; C&nbsp; &nbsp;39.81&nbsp; &nbsp;5&nbsp; &nbsp;5.1020&nbsp; 2000-07-15&nbsp; C&nbsp; &nbsp;29.30&nbsp; &nbsp;8&nbsp; &nbsp;1.20data['Val_Diff'] = data['open'] - data['close']print(data['Val_Diff'].max())&nbsp;85.4
打开App,查看更多内容
随时随地看视频慕课网APP

相关分类

Python