Filtering data when the lookup column contains comma-separated values

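I have the following DataFrame, which lists employee numbers, the departments they belong to, and their role code in the company (either "1" or "2"). The "Department Name" column holds either a single department (the naming convention is "XX:Department Name", where XX is the country code) or, in some cases, a group of departments separated by commas (","), meaning the employee holds that role in each of those departments. It looks like this: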

                 Department Name  Employee Number  Role Code
    0                       AU:Dept1             1000          1
    1             AU:Dept1, AU:Dept3             1000          2
    2                       AU:Dept7             1000          1
    3                       CZ:Dept3             1001          2
    4   CZ:Dept4, CZ:Dept6, CZ:Dept7             1001          2
    5                       CZ:Dept4             1001          1
    6                       PL:Dept1             1002          2
    7             PL:Dept2, PL:Dept1             1002          1
    8                       PL:Dept3             1002          2
    9                       SG:Dept1             1003          1
    10                      SG:Dept1             1003          2
    11                      SG:Dept2             1003          2
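For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is only a sketch: the variable name `df` and the integer dtype of the last two columns are assumptions, since the original post shows only the printed frame.

```python
import pandas as pd

# Sample data copied from the table in the question.
# Assumption: "Employee Number" and "Role Code" are stored as integers.
df = pd.DataFrame({
    "Department Name": [
        "AU:Dept1", "AU:Dept1, AU:Dept3", "AU:Dept7",
        "CZ:Dept3", "CZ:Dept4, CZ:Dept6, CZ:Dept7", "CZ:Dept4",
        "PL:Dept1", "PL:Dept2, PL:Dept1", "PL:Dept3",
        "SG:Dept1", "SG:Dept1", "SG:Dept2",
    ],
    "Employee Number": [1000] * 3 + [1001] * 3 + [1002] * 3 + [1003] * 3,
    "Role Code": [1, 2, 1, 2, 2, 1, 2, 1, 2, 1, 2, 2],
})

print(df)
```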

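An employee may hold only role 1 or role 2 within each unique department name, so I need code that returns every conflicting row, i.e. rows where an employee appears to hold both role 1 and role 2 in the same department. This would be the output: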

                 Department Name  Employee Number  Role Code
    0                       AU:Dept1             1000          1
    1             AU:Dept1, AU:Dept3             1000          2
    4   CZ:Dept4, CZ:Dept6, CZ:Dept7             1001          2
    5                       CZ:Dept4             1001          1
    6                       PL:Dept1             1002          2
    7             PL:Dept2, PL:Dept1             1002          1
    9                       SG:Dept1             1003          1
    10                      SG:Dept1             1003          2

What is the best way to do this filtering?

叮当猫咪


2 Answers

蓝山帝景

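You can do something like this:

    g = df.groupby('Employee Number')['Role Code']
    # 1 on every row of an employee who has the given role code at least once
    has_1 = g.transform(lambda s: s.isin([1]).any()).astype(int)
    has_2 = g.transform(lambda s: s.isin([2]).any()).astype(int)
    df['both_role'] = has_1 * has_2
    df[df.both_role == 1]

Group by employee number and check whether each employee's role codes include both 1 and 2. If they do, you can filter the DataFrame on that flag. A groupby object has no isin method, so the check goes through transform here. Note that this groups by employee only and does not look at the department names.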
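To see what a per-employee check selects on the question's sample data, here is a small sketch. It assumes role codes are only ever 1 or 2 (as the question states), so "has both roles" is the same as "has two distinct role codes". The names `n_roles` and `flagged` are made up for illustration.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Department Name": [
        "AU:Dept1", "AU:Dept1, AU:Dept3", "AU:Dept7",
        "CZ:Dept3", "CZ:Dept4, CZ:Dept6, CZ:Dept7", "CZ:Dept4",
        "PL:Dept1", "PL:Dept2, PL:Dept1", "PL:Dept3",
        "SG:Dept1", "SG:Dept1", "SG:Dept2",
    ],
    "Employee Number": [1000] * 3 + [1001] * 3 + [1002] * 3 + [1003] * 3,
    "Role Code": [1, 2, 1, 2, 2, 1, 2, 1, 2, 1, 2, 2],
})

# Number of distinct role codes per employee, broadcast back to every row.
n_roles = df.groupby("Employee Number")["Role Code"].transform("nunique")

# Rows of employees who hold both roles somewhere, regardless of department.
flagged = df[n_roles == 2]

print(len(flagged), "of", len(df), "rows flagged")
```

Every employee in the sample holds both roles in some department, so grouping by employee alone keeps all 12 rows. Getting the 8 conflicting rows requires the department to be part of the grouping key as well.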

慕的地6264312

Let's try splitting the department names, then use groupby on ['Employee Number', 'Name'] with nunique to find which employees have both roles in the same department:

    (df.assign(Name=df['Department Name'].str.split(', '))
       .explode('Name')
       .loc[lambda x: x.groupby(['Employee Number', 'Name'])
                       ['Role Code'].transform('nunique') == 2]
       .drop('Name', axis=1))

Output:

                     Department Name  Employee Number  Role Code
    0                       AU:Dept1             1000          1
    1             AU:Dept1, AU:Dept3             1000          2
    4   CZ:Dept4, CZ:Dept6, CZ:Dept7             1001          2
    5                       CZ:Dept4             1001          1
    6                       PL:Dept1             1002          2
    7             PL:Dept2, PL:Dept1             1002          1
    9                       SG:Dept1             1003          1
    10                      SG:Dept1             1003          2
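A self-contained variation on the same split-and-explode idea, in case it helps. This is a sketch and not the answer's exact code: the helper name `conflicting_rows` is made up, it strips whitespace around each department so that "A,B" and "A, B" behave the same, and it selects rows from the original frame so that a row with several conflicting departments is returned only once. It assumes the DataFrame index is unique.

```python
import pandas as pd

def conflicting_rows(df):
    """Return rows where an employee has more than one role code in a department."""
    # One row per (original row, department); the original index is repeated.
    exploded = (
        df.assign(Name=df["Department Name"].str.split(","))
          .explode("Name")
          .assign(Name=lambda x: x["Name"].str.strip())
    )
    # Distinct role codes per (employee, department), broadcast to each row.
    n_roles = exploded.groupby(["Employee Number", "Name"])["Role Code"].transform("nunique")
    # An original row conflicts if any of its departments has more than one role.
    mask = (n_roles > 1).groupby(level=0).any()
    return df[mask.reindex(df.index)]

# Sample data from the question.
df = pd.DataFrame({
    "Department Name": [
        "AU:Dept1", "AU:Dept1, AU:Dept3", "AU:Dept7",
        "CZ:Dept3", "CZ:Dept4, CZ:Dept6, CZ:Dept7", "CZ:Dept4",
        "PL:Dept1", "PL:Dept2, PL:Dept1", "PL:Dept3",
        "SG:Dept1", "SG:Dept1", "SG:Dept2",
    ],
    "Employee Number": [1000] * 3 + [1001] * 3 + [1002] * 3 + [1003] * 3,
    "Role Code": [1, 2, 1, 2, 2, 1, 2, 1, 2, 1, 2, 2],
})

result = conflicting_rows(df)
print(result.index.tolist())  # [0, 1, 4, 5, 6, 7, 9, 10]
```

Using `> 1` instead of `== 2` gives the same result when the only role codes are 1 and 2, and would still flag a conflict if more role codes were added later.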
