How do I plot a KDE of unique devices per hour with seaborn?

I have the following pandas df (datetime is of type datetime64):


  device            datetime
0  846ee 2020-03-22 14:27:29
1  0a26e 2020-03-22 15:33:31
2  8a906 2020-03-27 16:19:06
3  6bf11 2020-03-27 16:05:20
4  d3923 2020-03-23 18:58:51

I want to use the KDE feature of Seaborn's distplot. I got it working, although I don't fully understand why:


df['hour'] = df['datetime'].dt.floor('T').dt.time

df['hour'] = pd.to_timedelta(df['hour'].astype(str)) / pd.Timedelta(hours=1)

and then


sns.distplot(df['hour'], hist=False, bins=arr, label='tef')

The question is: how do I do the same thing, but counting only unique devices? I tried

  1. df.groupby(['hour']).nunique().reset_index()

  2. df.groupby(['hour'])[['device']].size().reset_index()

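But they give me different results (the same order of magnitude, but somewhat higher or lower). I think I don't understand what I'm doing with pd.to_timedelta(df['hour'].astype(str)) / pd.Timedelta(hours=1), and that keeps me from reasoning about uniqueness... maybe.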


跃然一笑
2 answers

30秒到达战场

pd.to_timedelta(df['time'].astype(str)) creates output like 0 days 01:00:00.

pd.to_timedelta(df['time'].astype(str)) / pd.Timedelta(hours=1) creates output like 1.00, which is a float number of hours.

import pandas as pd
import numpy as np  # for test data
import random  # for test data
from datetime import datetime  # needed for datetime(2020, 7, 1) below

# test data
np.random.seed(365)
random.seed(365)
rows = 40
data = {'device': [random.choice(['846ee', '0a26e', '8a906', '6bf11', 'd3923']) for _ in range(rows)],
        'datetime': pd.bdate_range(datetime(2020, 7, 1), freq='15min', periods=rows).tolist()}

# create test dataframe
df = pd.DataFrame(data)

# this date column is already in a datetime format; for the real dataframe, make sure it's converted
# df.datetime = pd.to_datetime(df.datetime)

# this extracts the time component from the datetime and is a datetime.time object
df['time'] = df['datetime'].dt.floor('T').dt.time

# this creates a timedelta column; note its format
df['timedelta'] = pd.to_timedelta(df['time'].astype(str))

# this creates a float representing the hour and its fractional component (minutes)
df['hours'] = pd.to_timedelta(df['time'].astype(str)) / pd.Timedelta(hours=1)

# extracts just the hour
df['hour'] = df['datetime'].dt.hour

display(df.head(15))

This view should clarify the difference between the time extraction methods.

   device            datetime      time       timedelta  hours  hour
0   8a906 2020-07-01 00:00:00  00:00:00 0 days 00:00:00   0.00     0
1   0a26e 2020-07-01 00:15:00  00:15:00 0 days 00:15:00   0.25     0
2   8a906 2020-07-01 00:30:00  00:30:00 0 days 00:30:00   0.50     0
3   d3923 2020-07-01 00:45:00  00:45:00 0 days 00:45:00   0.75     0
4   0a26e 2020-07-01 01:00:00  01:00:00 0 days 01:00:00   1.00     1
5   d3923 2020-07-01 01:15:00  01:15:00 0 days 01:15:00   1.25     1
6   6bf11 2020-07-01 01:30:00  01:30:00 0 days 01:30:00   1.50     1
7   d3923 2020-07-01 01:45:00  01:45:00 0 days 01:45:00   1.75     1
8   6bf11 2020-07-01 02:00:00  02:00:00 0 days 02:00:00   2.00     2
9   d3923 2020-07-01 02:15:00  02:15:00 0 days 02:15:00   2.25     2
10  0a26e 2020-07-01 02:30:00  02:30:00 0 days 02:30:00   2.50     2
11  846ee 2020-07-01 02:45:00  02:45:00 0 days 02:45:00   2.75     2
12  0a26e 2020-07-01 03:00:00  03:00:00 0 days 03:00:00   3.00     3
13  846ee 2020-07-01 03:15:00  03:15:00 0 days 03:15:00   3.25     3
14  846ee 2020-07-01 03:30:00  03:30:00 0 days 03:30:00   3.50     3

Plot the count of devices per hour with seaborn.countplot

import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 6))
sns.countplot(x='hour', hue='device', data=df)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')

Plot a seaborn.distplot for each device

Use seaborn.FacetGrid. This gives the hourly distribution for each device.

g = sns.FacetGrid(df, row='device', height=5)
g.map(sns.distplot, 'hours', bins=24, kde=True)
g.set(xlim=(0, 24), xticks=range(0, 25, 1))
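Neither plot counts a device only once per hour, which is what the question asks for. One possible approach is sketched here, assuming that "unique" means one observation per (device, hour-of-day) pair and that hour is the integer from dt.hour rather than the fractional float. It drops the duplicate pairs before estimating the density. The sample frame and the helper name plot_unique_device_kde are made up for illustration, and sns.kdeplot stands in for distplot, which newer seaborn releases deprecate.

```python
import pandas as pd

# Hypothetical sample data: device 'a' is seen three times within hour 14
df = pd.DataFrame({
    'device': ['a', 'a', 'a', 'b', 'b', 'c'],
    'datetime': pd.to_datetime([
        '2020-03-22 14:05:00', '2020-03-22 14:20:00', '2020-03-22 14:50:00',
        '2020-03-22 14:30:00', '2020-03-22 15:10:00', '2020-03-22 15:45:00',
    ]),
})
df['hour'] = df['datetime'].dt.hour  # integer hour of day (assumption)

# size() counts rows per hour; nunique() counts distinct devices per hour
rows_per_hour = df.groupby('hour')['device'].size()
unique_per_hour = df.groupby('hour')['device'].nunique()

# Keep one row per (device, hour) pair so repeated sightings count once
unique_obs = df.drop_duplicates(subset=['device', 'hour'])


def plot_unique_device_kde(frame):
    """Draw a KDE over the 'hour' column of a de-duplicated frame (illustrative helper)."""
    import seaborn as sns  # imported lazily so the pandas part runs without seaborn
    return sns.kdeplot(frame['hour'], label='unique devices')
```

Calling plot_unique_device_kde(unique_obs) would then draw the curve. This also suggests why the two groupby attempts in the question gave similar numbers: with a fractional hour floored to the minute, almost every row falls into its own group, so size() and nunique() rarely differ. If a device should count once per calendar day and hour instead, the date would need to be part of the drop_duplicates subset.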

婷婷同学_

You can try:

df['hour'] = df['datetime'].dt.strftime('%Y-%m-%d %H')
s = df.groupby('hour')['device'].value_counts()
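To make the shape of that result concrete, here is a small sketch on made-up data (the device names and timestamps are hypothetical). s is indexed by (hour, device) and holds the number of rows for each pair. Counting the entries per hour then gives the number of distinct devices in that date-hour bucket, which should match nunique().

```python
import pandas as pd

# Hypothetical sample data spanning two days
df = pd.DataFrame({
    'device': ['a', 'a', 'b', 'b', 'c'],
    'datetime': pd.to_datetime([
        '2020-03-22 14:05:00', '2020-03-22 14:20:00', '2020-03-22 14:30:00',
        '2020-03-23 14:10:00', '2020-03-23 15:45:00',
    ]),
})

# Bucket by calendar date plus hour, as in the answer above
df['hour'] = df['datetime'].dt.strftime('%Y-%m-%d %H')

# Rows per (hour, device) pair
s = df.groupby('hour')['device'].value_counts()

# Number of distinct devices per bucket: one index entry per device
distinct = s.groupby(level='hour').size()

# Same figures computed directly
direct = df.groupby('hour')['device'].nunique()
```

Note that this key keeps the date, so 14:00 on two different days forms two separate buckets. For a distribution over the hour of day, df['datetime'].dt.hour would be the grouping key instead.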