Plotting a stacked chart from a grouped pandas DataFrame

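I have a dataframe like the one shown below. First, I want the count of each status on each date. For example, the number of COMPLETED entries on 2017-11-02 is 2. I want a stacked chart like this.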


                   status              start_time                end_time  \

0             COMPLETED 2017-11-01 19:58:54.726 2017-11-01 20:01:05.414   

1             COMPLETED 2017-11-02 19:43:04.000 2017-11-02 19:47:54.877   

2     ABANDONED_BY_USER 2017-11-03 23:36:19.059 2017-11-03 23:36:41.045   

3  ABANDONED_BY_TIMEOUT 2017-10-31 17:02:38.689 2017-10-31 17:12:38.844   

4             COMPLETED 2017-11-02 19:35:33.192 2017-11-02 19:42:51.074   

Here is the CSV for the dataframe:


status,start_time,end_time

COMPLETED,2017-11-01 19:58:54.726,2017-11-01 20:01:05.414

COMPLETED,2017-11-02 19:43:04.000,2017-11-02 19:47:54.877

ABANDONED_BY_USER,2017-11-03 23:36:19.059,2017-11-03 23:36:41.045

ABANDONED_BY_TIMEOUT,2017-10-31 17:02:38.689,2017-10-31 17:12:38.844

COMPLETED,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074

ABANDONED_BY_TIMEOUT,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074

To achieve this, I tried:


df_['status'].astype('category')

df_ = df_.set_index('start_time')

grouped = df_.groupby('status')

color = {'COMPLETED':'green','ABANDONED_BY_TIMEOUT':'blue',"MISSED":'red',"ABANDONED_BY_USER":'yellow'}


for key_, group in grouped:

   print(key_)

   df_ = group.groupby(lambda x: x.date).count()

   print(df_)

   df_['status'].plot(label=key_,kind='bar',stacked=True,\

   color=color[key_],rot=90)

plt.show()

This produces the following output:


ABANDONED_BY_TIMEOUT

            status  end_time  

2017-10-31       1         1       

ABANDONED_BY_USER

            status  end_time  

2017-11-03       1         1            

COMPLETED

            status  end_time  

2017-11-01       1         1             

2017-11-02       2         2 

http://img2.mukewang.com/61a58d1800012cbd04720368.jpg

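As we can see, the problem is that only the last two dates, "2017-11-01" and "2017-11-02", are taken into account, rather than all dates across all categories. How can I fix this? A completely different approach to the stacked chart is welcome. Thanks in advance.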


守候你守候我
3 Answers

慕桂英3389331

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df_ = pd.read_csv('sam.csv')
df_['date'] = pd.to_datetime(df_['start_time']).dt.date
df_ = df_.set_index('start_time')

grouped = pd.DataFrame(df_.groupby(['date', 'status']).size().reset_index(name="count")).pivot(columns='status', index='date', values='count')
print(grouped)

sns.set()
grouped.plot(kind='bar', stacked=True)
# g = grouped.plot(x='date', kind='bar', stacked=True)
plt.show()
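The groupby-then-pivot step above leaves NaN wherever a status never occurs on a date. A minimal sketch of a variant that fills those gaps with 0 is shown here, assuming the question's sample data is inlined instead of read from sam.csv; the names CSV, counts, and the output file name are illustrative and not part of the original answer.

```python
import datetime as dt
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
CSV = """status,start_time,end_time
COMPLETED,2017-11-01 19:58:54.726,2017-11-01 20:01:05.414
COMPLETED,2017-11-02 19:43:04.000,2017-11-02 19:47:54.877
ABANDONED_BY_USER,2017-11-03 23:36:19.059,2017-11-03 23:36:41.045
ABANDONED_BY_TIMEOUT,2017-10-31 17:02:38.689,2017-10-31 17:12:38.844
COMPLETED,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
ABANDONED_BY_TIMEOUT,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
"""

df = pd.read_csv(io.StringIO(CSV), parse_dates=["start_time", "end_time"])
df["date"] = df["start_time"].dt.date

# Count rows per (date, status), then move status into the columns.
# fill_value=0 puts 0 (not NaN) where a status never occurs on a date.
counts = df.groupby(["date", "status"]).size().unstack(fill_value=0)
print(counts)

# One bar per date, one stacked segment per status.
ax = counts.plot(kind="bar", stacked=True, rot=90)
ax.set_ylabel("count")
plt.tight_layout()
plt.savefig("stacked_counts.png")  # assumed file name; use plt.show() interactively
```

Because every date becomes one row of a single table before plotting, all four dates appear on the x axis, which is what the per-status loop in the question could not guarantee.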

慕森卡

Try transforming df_ with pandas.crosstab instead:

color = ['blue', 'yellow', 'green', 'red']
df_xtab = pd.crosstab(df_.start_time.dt.date, df_.status)

This DataFrame looks like:

status      ABANDONED_BY_TIMEOUT  ABANDONED_BY_USER  COMPLETED
start_time
2017-10-31                     1                  0          0
2017-11-01                     0                  0          1
2017-11-02                     1                  0          2
2017-11-03                     0                  1          0

and is easier to plot:

df_xtab.plot(kind='bar',stacked=True, color=color, rot=90)
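The positional colour list in the crosstab approach lines up with the columns only because crosstab sorts them alphabetically. A hedged sketch of one way to reuse the colour dictionary from the question instead is shown here; it assumes start_time is a regular datetime column (not the index), and the names CSV, xtab, colors, and the output file name are illustrative.

```python
import datetime as dt
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
CSV = """status,start_time,end_time
COMPLETED,2017-11-01 19:58:54.726,2017-11-01 20:01:05.414
COMPLETED,2017-11-02 19:43:04.000,2017-11-02 19:47:54.877
ABANDONED_BY_USER,2017-11-03 23:36:19.059,2017-11-03 23:36:41.045
ABANDONED_BY_TIMEOUT,2017-10-31 17:02:38.689,2017-10-31 17:12:38.844
COMPLETED,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
ABANDONED_BY_TIMEOUT,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
"""

df = pd.read_csv(io.StringIO(CSV), parse_dates=["start_time"])

# Colour mapping from the question, keyed by status name.
color = {
    "COMPLETED": "green",
    "ABANDONED_BY_TIMEOUT": "blue",
    "MISSED": "red",
    "ABANDONED_BY_USER": "yellow",
}

# Rows are dates, columns are statuses, cells are counts (0 where absent).
xtab = pd.crosstab(df["start_time"].dt.date, df["status"])

# Build the colour list in the order of the columns actually present,
# so a status missing from the data (here MISSED) cannot shift the colours.
colors = [color[status] for status in xtab.columns]

ax = xtab.plot(kind="bar", stacked=True, color=colors, rot=90)
plt.tight_layout()
plt.savefig("stacked_crosstab.png")  # assumed file name; use plt.show() interactively
```

Looking the colours up by column name keeps each status tied to its colour even if the set of statuses in the data changes.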

梦里花落0921

Use barplot from the seaborn library with hue.

Code:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df_ = pd.read_csv('sam.csv')
df_['date'] = pd.to_datetime(df_['start_time']).dt.date
df_ = df_.set_index('start_time')
print(df_)

grouped = pd.DataFrame(df_.groupby(['date', 'status']).size().reset_index(name="count"))
print(grouped)

g = sns.barplot(x='date', y='count', hue='status', data=grouped)
plt.show()

Data:

status,start_time,end_time
COMPLETED,2017-11-01 19:58:54.726,2017-11-01 20:01:05.414
COMPLETED,2017-11-02 19:43:04.000,2017-11-02 19:47:54.877
ABANDONED_BY_USER,2017-11-03 23:36:19.059,2017-11-03 23:36:41.045
ABANDONED_BY_TIMEOUT,2017-10-31 17:02:38.689,2017-10-31 17:12:38.844
COMPLETED,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
ABANDONED_BY_TIMEOUT,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
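Note that barplot with hue draws the statuses side by side for each date rather than stacked. If a stacked chart built with a per-status loop (as in the question) is preferred, one possible sketch with plain matplotlib follows. It assumes each status's counts are aligned to the full list of dates and that a running bottom offset is kept; the names all_dates, labels, bottom, and the output file name are illustrative.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
CSV = """status,start_time,end_time
COMPLETED,2017-11-01 19:58:54.726,2017-11-01 20:01:05.414
COMPLETED,2017-11-02 19:43:04.000,2017-11-02 19:47:54.877
ABANDONED_BY_USER,2017-11-03 23:36:19.059,2017-11-03 23:36:41.045
ABANDONED_BY_TIMEOUT,2017-10-31 17:02:38.689,2017-10-31 17:12:38.844
COMPLETED,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
ABANDONED_BY_TIMEOUT,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
"""

df = pd.read_csv(io.StringIO(CSV), parse_dates=["start_time"])
df["date"] = df["start_time"].dt.date

color = {
    "COMPLETED": "green",
    "ABANDONED_BY_TIMEOUT": "blue",
    "MISSED": "red",
    "ABANDONED_BY_USER": "yellow",
}

# Every date that occurs anywhere in the data, in chronological order.
all_dates = sorted(df["date"].unique())
labels = [str(d) for d in all_dates]

# Running height of the stack at each date; starts at zero.
bottom = pd.Series(0, index=all_dates)

fig, ax = plt.subplots()
for status, group in df.groupby("status"):
    # Count per date, then align to the full date list so that dates where
    # this status is absent get an explicit 0 instead of being dropped.
    heights = group.groupby("date").size().reindex(all_dates, fill_value=0)
    ax.bar(labels, heights.to_numpy(), bottom=bottom.to_numpy(),
           color=color[status], label=status)
    bottom = bottom + heights  # the next status is drawn on top of this one

ax.legend()
ax.tick_params(axis="x", rotation=90)
fig.tight_layout()
fig.savefig("stacked_loop.png")  # assumed file name; use plt.show() interactively
```

The reindex call is the key difference from the loop in the question: without it, each status is plotted only over its own dates, so the bars from different statuses do not share the same x positions.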