Groupby with only selected columns



Here I read in a file, "userdata.xlsx":

ID  Debt  Email            Age  User
1   7.5   john@email.com   16   John
2   15    john@email.com   15   John
3   22    john@email.com   15   John
4   30    david@email.com  22   David
5   33    david@email.com  22   David
6   51    fred@email.com   61   Fred
7   11    fred@email.com   25   Fred
8   24    eric@email.com   19   Eric
9   68    terry@email.com  55   Terry
10  335   terry@email.com  55   Terry

I then group by user, build a spreadsheet for each user, and write each one to its own .xlsx file, like this:

ID  Debt  Email            Age  User
1   7.5   john@email.com   16   John
2   15    john@email.com   15   John

Here is the whole code:

#!/usr/bin/env python3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import xlrd

df = pd.read_excel('userdata.xlsx')
grp = df.groupby('User')
for group in grp.groups:
    grouptofile = grp.get_group(group)
    print(grouptofile)
    print(group)
    grouptofile.to_excel('%s.xlsx' % group, sheet_name='sheet1', index=False)

Now I want to save only selected columns for each user. Say I only want the "ID" and "Email" columns. I learned how to select only certain columns, like this:

selected = df[['ID','Email']]

I thought it would make sense to add "ID" and "Email" here:

grp = df.groupby('User')

With "ID" and "Email" added:

grp = df[['ID', 'Email']].groupby('User')

Is it even possible to combine groupby with column selection? This is my attempt:

#!/usr/bin/env python3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import xlrd

df = pd.read_excel('userdata.xlsx')
grp = df[['ID', 'Email']].groupby('User')
for group in grp.groups:
    grouptofile = grp.get_group(group)
    print(grouptofile)
    print(group)
    grouptofile.to_excel('%s.xlsx' % group, sheet_name='sheet1', index=False)

慕村225694


2 Answers

不负相思意

I think you need to select the column subset inside the loop:

cols = ['ID', 'Email']
for i, group in df.groupby('User'):
    group[cols].to_excel('{}.xlsx'.format(i), sheet_name='sheet1', index=False)

If you get KeyError: 'User', it means you are trying to use a column that does not exist. If you select only the ID and Email columns, the chained groupby cannot find the User column and raises the error:

print(df[['ID', 'Email']])
   ID            Email
0   1   john@email.com
1   2   john@email.com
2   3   john@email.com
3   4  david@email.com
4   5  david@email.com
5   6   fred@email.com
6   7   fred@email.com
7   8   eric@email.com
8   9  terry@email.com
9  10  terry@email.com

So the column used in groupby must be part of the selection as well. Note that this version also writes the User column to each file:

for i, group in df[['ID', 'Email', 'User']].groupby('User'):
    group.to_excel('{}.xlsx'.format(i), sheet_name='sheet1', index=False)

Or select the columns just before writing to the file, as in the first solution:

for i, group in df[['ID', 'Email', 'User']].groupby('User'):
    group[cols].to_excel('{}.xlsx'.format(i), sheet_name='sheet1', index=False)
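The failure and the fix described in this answer can be reproduced without the Excel file. The following is a minimal sketch, assuming a small in-memory DataFrame that copies a few rows from the question; the names `failed` and `per_user` are illustrative only and do not appear in the original thread.

```python
import pandas as pd

# In-memory stand-in for userdata.xlsx (a subset of the question's rows).
df = pd.DataFrame({
    "ID": [1, 2, 4, 8],
    "Debt": [7.5, 15.0, 30.0, 24.0],
    "Email": ["john@email.com", "john@email.com",
              "david@email.com", "eric@email.com"],
    "Age": [16, 15, 22, 19],
    "User": ["John", "John", "David", "Eric"],
})

cols = ["ID", "Email"]

# Selecting the columns first drops 'User', so the chained groupby
# has no such key and raises KeyError.
try:
    df[cols].groupby("User")
    failed = False
except KeyError:
    failed = True

# Grouping the full frame and narrowing each group afterwards works.
per_user = {user: group[cols] for user, group in df.groupby("User")}
```

Each value in `per_user` is a DataFrame holding only the ID and Email columns for that user, which is what would then be passed to `to_excel`.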


MMMHUHU

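It is possible... but not the way you are doing it. You are effectively dropping all but two columns and then trying to group by a third column that no longer exists. Instead, you need to group before selecting the columns (grouping is a pandas operation, not a numpy one, and it does not modify the original DataFrame, so no copy is needed). Note that the column selections must be lists, not tuples. A (possibly suboptimal) example:

grp = df[['ID', 'Email', 'User']].groupby('User')[['ID', 'Email']]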
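The "group first, select afterwards" idea can be wrapped in a small helper. This is only a sketch under stated assumptions: `export_per_user` and `out_dir` are hypothetical names, and it writes CSV files with `to_csv` rather than .xlsx so that it runs without an Excel engine installed.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_per_user(df, cols, out_dir):
    """Write one CSV per user containing only `cols`; return the paths."""
    out_dir = Path(out_dir)
    written = []
    # Group on the full frame so 'User' is still available as the key.
    for user, group in df.groupby("User"):
        path = out_dir / f"{user}.csv"
        # Narrow to the requested columns only at write time.
        group[cols].to_csv(path, index=False)
        written.append(path)
    return written


# In-memory stand-in for userdata.xlsx (a subset of the question's rows).
df = pd.DataFrame({
    "ID": [1, 2, 4, 8],
    "Debt": [7.5, 15.0, 30.0, 24.0],
    "Email": ["john@email.com", "john@email.com",
              "david@email.com", "eric@email.com"],
    "Age": [16, 15, 22, 19],
    "User": ["John", "John", "David", "Eric"],
})
columns_before = list(df.columns)

with tempfile.TemporaryDirectory() as tmp:
    paths = export_per_user(df, ["ID", "Email"], tmp)
    names = sorted(p.name for p in paths)
    john = pd.read_csv(Path(tmp) / "John.csv")

# Grouping leaves the original frame untouched.
columns_after = list(df.columns)
```

Replacing `to_csv(path, index=False)` with `to_excel(path, sheet_name='sheet1', index=False)` and a `.xlsx` suffix should give the per-user spreadsheets from the question, provided an Excel writer such as openpyxl is installed.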
