I have a dataframe like this:
Start_MONTH Bucket Count Complete Partial
10/01/2015 0 57 91 0.66
11/01/2015 0 678 8 0.99
02/01/2016 0 68 12 0.12
10/01/2015 1 78 79 0.22
11/01/2015 1 99 56 0.67
1/01/2016 1 789 67 0.78
10/01/2015 3 678 178 0.780
11/01/2015 3 2880 578 0.678
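I basically need to fill in every start_month (12/01/2015, 01/01/2016, ... are missing) and every missing bucket, such as 2. The remaining columns (Count, Complete, Partial) should be zero for the missing buckets and start months. I think relativedelta(months=+1) would help, but I'm not sure how to use it.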
import pandas as pd

data = [['10/01/2015', 0, 57, 91, 0.66],
        ['11/01/2015', 0, 678, 8, 0.99],
        ['02/01/2016', 0, 68, 12, 0.12],
        ['10/01/2015', 1, 78, 79, 0.22],
        ['11/01/2015', 1, 99, 56, 0.67],
        ['1/01/2016', 1, 789, 67, 0.78],
        ['10/01/2015', 3, 678, 178, 0.780],
        ['11/01/2015', 3, 2880, 578, 0.678]]
df = pd.DataFrame(data, columns=['Start_Month', 'Bucket', 'Count',
                                 'Complete', 'Partial'])
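Basically I want Start_Month and Bucket to repeat together as a group, with the other values set to 0. Every month from 10/01/2015 to 2/1/2016 should be present (12/01/2015 and 01/01/2016 are missing), and every bucket from 0 to 3 should be present (2 is missing).
I tried this, and it does part of what I want: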
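df['Start_Month'] = pd.to_datetime(df['Start_Month'])
# Sum the value columns per bucket and month start
s = (df.groupby(['Bucket', pd.Grouper(key='Start_Month', freq='MS')])
       [['Count', 'Complete', 'Partial']].sum())
# Within each bucket, insert the missing month starts between its first and last month
df1 = (s.reset_index(level=0)
        .groupby('Bucket')[['Count', 'Complete', 'Partial']]
        .apply(lambda x: x.asfreq('MS'))
        .reset_index())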
It adds some of the missing months, but it does not repeat them for every bucket, and it does not add the bucket integers in between.
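
One possible way to get the full grid, offered as a minimal sketch and not as the definitive answer: build every (Bucket, Start_Month) combination with pd.MultiIndex.from_product and reindex onto it with fill_value=0. It assumes the dates are in month/day/year format, that each (Bucket, Start_Month) pair occurs at most once, and that the wanted ranges run from the smallest to the largest value present in the data. The names months, buckets, full_index and filled are illustrative only. It uses pd.date_range with freq='MS' in place of relativedelta.

```python
import pandas as pd

data = [['10/01/2015', 0, 57, 91, 0.66],
        ['11/01/2015', 0, 678, 8, 0.99],
        ['02/01/2016', 0, 68, 12, 0.12],
        ['10/01/2015', 1, 78, 79, 0.22],
        ['11/01/2015', 1, 99, 56, 0.67],
        ['1/01/2016', 1, 789, 67, 0.78],
        ['10/01/2015', 3, 678, 178, 0.780],
        ['11/01/2015', 3, 2880, 578, 0.678]]
df = pd.DataFrame(data, columns=['Start_Month', 'Bucket', 'Count',
                                 'Complete', 'Partial'])

# Assumption: the strings are month/day/year
df['Start_Month'] = pd.to_datetime(df['Start_Month'], format='%m/%d/%Y')

# Every month start between the earliest and latest month in the data
months = pd.date_range(df['Start_Month'].min(), df['Start_Month'].max(), freq='MS')
# Every integer bucket between the smallest and largest bucket in the data
buckets = range(int(df['Bucket'].min()), int(df['Bucket'].max()) + 1)

# Cartesian product of buckets and months
full_index = pd.MultiIndex.from_product([buckets, months],
                                        names=['Bucket', 'Start_Month'])

# Reindex onto the full grid; combinations absent from df get 0 in every column
filled = (df.set_index(['Bucket', 'Start_Month'])
            .reindex(full_index, fill_value=0)
            .reset_index())
print(filled)
```

If a (Bucket, Start_Month) pair can appear more than once, the rows would need to be aggregated first (for example with groupby(...).sum()), because reindex requires a unique index.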