Newbie here...
I have a dataframe named "yes_no" that is structured as follows (the real one has around 50K entries):
         Date Yes/No
0  2020-10-27     No
1  2020-10-27     No
2  2020-10-26    Yes
3  2020-10-26    Yes
4  2020-10-26     No
5  2020-10-25     No
6  2020-10-25    Yes
7  2020-10-25     No
8  2020-10-24    Yes
9  2020-10-24    Yes
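I need to count the number of "Yes" and the number of "No" values for each date and calculate the ratio Yes / (Yes + No), ending up with something like this: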
          Date   Yes   No  Percentage
0   2020-10-27  1142  120    0.904913
1   2020-10-26  4112  388    0.913778
2   2020-10-25  1055   68    0.939448
3   2020-10-24  1012   86    0.921676
4   2020-10-23  1476  163    0.900549
5   2020-10-22  1633  182    0.899725
6   2020-10-21  1773  237    0.882090
7   2020-10-20  2332  246    0.904577
8   2020-10-19  2868  326    0.897934
9   2020-10-18   892  107    0.892893
10  2020-10-17   992  110    0.900181
11  2020-10-16  2106  207    0.910506
12  2020-10-15  5628  632    0.899042
13  2020-10-14  9304  937    0.908505
14  2020-10-13  8129  881    0.902220
I got it working with the following code, which goes through a dictionary, but it is very long:
import pandas as pd

by_date = {}
# Slice out the rows for each date (this visits every row, so dates repeat).
for date in yes_no['Date']:
    by_date[date] = yes_no.loc[yes_no['Date'] == date]
# Replace each slice with its Yes/No counts.
for date in by_date:
    by_date[date] = by_date[date]['Yes/No'].value_counts()
# Fill in a zero where a date has no 'No' or no 'Yes' at all.
for date in by_date:
    if 'No' not in by_date[date]:
        by_date[date]['No'] = 0
for date in by_date:
    if 'Yes' not in by_date[date]:
        by_date[date]['Yes'] = 0
# Turn each entry into [Yes, No, Yes / (Yes + No)].
for date in by_date:
    by_date[date] = [by_date[date]['Yes'], by_date[date]['No'], (by_date[date]['Yes']/(by_date[date]['Yes'] + by_date[date]['No']))]
df_yes = pd.DataFrame(list(by_date.values()), columns=['Yes', 'No', 'Percentage'])
df_yes['Date'] = list(by_date.keys())
df_yes = df_yes[['Date', 'Yes', 'No', 'Percentage']]
It works fine for smaller dataframes (1-2K entries), but with 50K entries this part of the code takes forever to finish:
for date in yes_no['Date']:
    by_date[date] = yes_no.loc[yes_no['Date'] == date]
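My guess at why this is slow: the loop runs once per row rather than once per distinct date, and every pass rescans the whole frame with a boolean mask, so the work grows roughly with the square of the row count. A minimal sketch of the smallest change, assuming `Date` and `Yes/No` are ordinary columns as shown above (the tiny frame here is made-up sample data, not the real one):

```python
import pandas as pd

# Made-up sample data using the same column names as the question.
yes_no = pd.DataFrame({
    "Date": ["2020-10-27", "2020-10-27", "2020-10-26", "2020-10-26", "2020-10-26"],
    "Yes/No": ["No", "No", "Yes", "Yes", "No"],
})

# groupby splits the frame once, so each distinct date is visited a single time.
by_date = {date: group for date, group in yes_no.groupby("Date")}

print(len(by_date))  # number of distinct dates
```

This only replaces the slow loop; the rest of the dictionary-based code would stay as it is.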
There must be a better way to do this!
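One possible direction, offered as a sketch rather than a definitive answer: `pd.crosstab` can count the "Yes" and "No" values per date in a single pass, which would replace the whole dictionary approach. The sample frame below just copies the ten example rows from the top of the post, and the `reindex` step is an assumption added to cover the case where one of the two answers never appears at all.

```python
import pandas as pd

# The ten example rows shown at the top of the post.
yes_no = pd.DataFrame({
    "Date": ["2020-10-27", "2020-10-27", "2020-10-26", "2020-10-26",
             "2020-10-26", "2020-10-25", "2020-10-25", "2020-10-25",
             "2020-10-24", "2020-10-24"],
    "Yes/No": ["No", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes", "Yes"],
})

# Count Yes/No per date in one pass; absent combinations come out as 0.
counts = pd.crosstab(yes_no["Date"], yes_no["Yes/No"])

# Make sure both columns exist even if one answer never occurs in the data.
counts = counts.reindex(columns=["Yes", "No"], fill_value=0)

# Share of "Yes" among all answers for that date.
counts["Percentage"] = counts["Yes"] / (counts["Yes"] + counts["No"])

# Newest date first, with Date turned back into a regular column.
df_yes = counts.sort_index(ascending=False).reset_index()
df_yes.columns.name = None
print(df_yes)
```

Sorting the index as text works here only because the dates are in ISO `YYYY-MM-DD` form; with another date format, converting the column with `pd.to_datetime` first would be the safer choice.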