
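After binning a DataFrame, the individual columns of its groupby object cannot be accessed

This question is similar to a linked one, but with a key difference: the solution to the linked question does not work when the dataframe is grouped into bins.

The following code tries to draw a boxplot of the relative distribution of the bins for 2 variables, and raises an error: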


import pandas as pd
import seaborn as sns

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
        'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'],
        'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
        'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
        'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'preTestScore', 'postTestScore'])

df1 = df.groupby(['regiment'])['preTestScore'].value_counts().unstack()
df1.fillna(0, inplace=True)

sns.boxplot(x='regiment', y='preTestScore', data=df1)


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-241-fc8036eb7d0b> in <module>()
----> 1 sns.boxplot(x='regiment', y='preTestScore', data=df1)

If I remove the x and y arguments, it does produce a boxplot, but not the one I want.

How can I fix this? I tried the following:


df1 = df.groupby(['regiment'])['preTestScore'].value_counts().unstack()
df1.fillna(0, inplace=True)
df1 = df1.reset_index()
df1

http://img.mukewang.com/60accdea0001b05103170108.jpg


http://img2.mukewang.com/60accdf700018b6d03920257.jpg

This looks wrong. In fact, this is not a normal dataframe; if we print its columns, it does not show regiment as a column, which is why boxplot raises ValueError: Could not interpret input 'regiment':


df1.columns

>>> Index(['regiment', 2, 3, 4, 24, 31], dtype='object', name='preTestScore')

So if I could somehow make regiment a column of the dataframe, I think I should be able to draw a boxplot of preTestScore vs regiment. Am I wrong?
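One way to check where regiment actually lives is to look at the index and the column labels separately. The following is a minimal sketch on a shortened version of the data above; the names counts and flat are illustrative and do not appear in the original code.

```python
import pandas as pd

# Shortened, illustrative version of the question's data.
df = pd.DataFrame({
    'regiment': ['Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts'],
    'preTestScore': [4, 24, 3, 4, 2, 3],
})

# Same pipeline as in the question: count scores per regiment, one column per score.
counts = df.groupby('regiment')['preTestScore'].value_counts().unstack().fillna(0)

print(counts.index.name)               # regiment  (the group key is the index)
print('regiment' in counts.columns)    # False     (so x='regiment' cannot be resolved)

flat = counts.reset_index()
print('regiment' in flat.columns)      # True      (now an ordinary column)
print(flat.columns.name)               # preTestScore (a label for the column axis)
print('preTestScore' in flat.columns)  # False     (there is no column with that name)
```

Under these assumptions, the name='preTestScore' shown in the Index output is only the label of the column axis. The actual columns are regiment plus one column per score value, so y='preTestScore' still has nothing to refer to after the reset.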


慕田峪4524236
176 views, 1 answer

慕后森

If you call reset_index() on the dataframe df1, you should get the dataframe you want. The problem is that the column you need (regiment) is the index, so you have to reset the index to turn it into a regular column.

Edit: added add_prefix to give the columns of the resulting dataframe proper names.

Sample code:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
        'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'],
        'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
        'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
        'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'preTestScore', 'postTestScore'])

df1 = df.groupby(['regiment'])['preTestScore'].value_counts().unstack()
df1.fillna(0, inplace=True)
df1 = df1.add_prefix('preTestScore ')  # <- add_prefix for proper column names
df2 = df1.reset_index()  # <- Here is reset_index()

cols = df2.columns
fig = plt.figure(figsize=(20,3))
count = 1
for col in cols[1:]:
    plt.subplot(1, len(cols)-1, count)
    sns.boxplot(x='regiment', y=col, data=df2)
    count+=1

This draws one boxplot panel per preTestScore value, side by side, each with regiment on the x-axis.
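If a single plot is preferred over one panel per score, one option is to keep the counts in long format, so that there is exactly one column for the x-axis and one for the y-axis. This is a sketch under that assumption, not part of the answer above; the names long_counts and count are illustrative.

```python
import pandas as pd

# Only the two columns needed for the counts are kept here.
df = pd.DataFrame({
    'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
})

# size() counts the rows in each (regiment, preTestScore) group;
# reset_index(name=...) turns both group keys into ordinary columns
# and names the column holding the counts.
long_counts = (
    df.groupby(['regiment', 'preTestScore'])
      .size()
      .reset_index(name='count')
)

print(long_counts)

# With seaborn installed, this frame could be passed as
# sns.boxplot(x='regiment', y='count', data=long_counts)
```

Unlike the unstack and fillna table, score values that never occur for a regiment are simply absent here rather than recorded as 0, which changes the distribution each box summarizes.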