Changing groupby and value_counts output so it maps onto a DataFrame


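I have a scenario where I am trying to filter a DataFrame by a specific value and count how many times another identifier occurs. I then convert the result to a dictionary and map it back onto the DataFrame. The problem is that the resulting dictionary cannot be mapped back, because I am introducing extra complexity into the dictionary (additional keys?) and I don't know how to avoid it.

I guess the short version of the question is: how do I use value_counts on my CELL_ID column, filtered by another column called Grid_Type, and map the result back to every row for each CELL_ID?

What I am doing so far

This counts the number of rows for each CELL_ID, but it does not let me filter by Grid_Type: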

z = df['CELL_ID'].value_counts()

z1 = z.to_dict()

df['CELL_CNT'] = df['CELL_ID'].map(z1)

The dictionary output for this simple example looks like this:

7015988: 1, 7122961: 1, 6976792: 1
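For reference, the unfiltered pattern above can be sketched end to end as follows. The small frame here is hypothetical sample data that only mirrors the column names in the question. It also shows that map() accepts the value_counts() Series directly, so the to_dict() step is optional.

```python
import pandas as pd

# Hypothetical sample frame; only the column names come from the question.
df = pd.DataFrame({
    "CELL_ID": ["001", "001", "002", "003", "003", "003"],
    "Grid_Type": ["Spot", "Square", "Spot", "Square", "Spot", "Spot"],
})

# value_counts() returns a Series indexed by CELL_ID (a single-level index),
# so each key is a plain CELL_ID value that map() can look up.
cell_counts = df["CELL_ID"].value_counts()

# Broadcast the per-cell row count back onto every row of that cell.
df["CELL_CNT"] = df["CELL_ID"].map(cell_counts)
```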

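My code that doesn't work

This is what I've been working on so far. I want to be able to return counts filtered by Grid_Type. For example, I'd like to count how many times I see "Spot" in each CELL_ID.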

z = df[df.Grid_Type == 'Spot'].groupby('CELL_ID')['Grid_Type'].value_counts()

z1 = z.to_dict()

df['SPOT_CNT'] = df['CELL_ID'].map(z1)

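It seems that in the example where I try to filter, the dictionary returns a more complex result that includes Grid_Type. The problem is that I only want to map the counts to CELL_ID.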

(7133691, 'Spot'): 3, (7133692, 'Spot'): 3, (7133693, 'Spot'): 2
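The tuple keys appear because groupby('CELL_ID') followed by value_counts() produces a two-level index (CELL_ID, Grid_Type). One possible way around it, sketched here on the sample data from this question, is to filter first and then call value_counts() on CELL_ID itself, so the index stays single-level. The fillna(0) step is an assumption about how cells with no "Spot" rows should be treated.

```python
import pandas as pd

# Sample data from the question, with CELL_ID kept as strings.
df = pd.DataFrame({
    "CELL_ID": ["001"] * 5 + ["002"] * 3 + ["003"] * 4,
    "Grid_Type": ["Spot", "Square", "Spot", "Square", "Square",
                  "Spot", "Square", "Square",
                  "Square", "Spot", "Spot", "Spot"],
})

# The original attempt: the result is indexed by (CELL_ID, Grid_Type),
# so to_dict() yields tuple keys that never match a bare CELL_ID in map().
grouped = df[df.Grid_Type == "Spot"].groupby("CELL_ID")["Grid_Type"].value_counts()
tuple_keys = grouped.to_dict()  # {('001', 'Spot'): 2, ('002', 'Spot'): 1, ('003', 'Spot'): 3}

# Filter the rows first, then count CELL_ID directly: keys are plain CELL_ID values.
spot_counts = df.loc[df["Grid_Type"] == "Spot", "CELL_ID"].value_counts()
flat_keys = spot_counts.to_dict()

# Cells without any 'Spot' row would map to NaN; treating that as 0 is an assumption.
df["SPOT_CNT"] = df["CELL_ID"].map(spot_counts).fillna(0).astype(int)
```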

Sample data

+---------+-----------+
| CELL_ID | Grid_Type |
+---------+-----------+
| 001     | Spot      |
| 001     | Square    |
| 001     | Spot      |
| 001     | Square    |
| 001     | Square    |
| 002     | Spot      |
| 002     | Square    |
| 002     | Square    |
| 003     | Square    |
| 003     | Spot      |
| 003     | Spot      |
| 003     | Spot      |
+---------+-----------+

Expected result

+---------+-----------+----------+
| CELL_ID | Grid_Type | SPOT_CNT |
+---------+-----------+----------+
| 001     | Spot      | 2        |
| 001     | Square    | 2        |
| 001     | Spot      | 2        |
| 001     | Square    | 2        |
| 001     | Square    | 2        |
| 002     | Spot      | 1        |
| 002     | Square    | 1        |
| 002     | Square    | 1        |
| 003     | Square    | 3        |
| 003     | Spot      | 3        |
| 003     | Spot      | 3        |
| 003     | Spot      | 3        |
+---------+-----------+----------+

Thanks for any help you can offer.

慕田峪7331174

2 answers

长风秋雁

import pandas as pd

# Read the sample data as strings so that CELL_ID keeps its leading zeros.
df = pd.read_csv('spot.txt', sep=r"[ ]{1,}", engine='python', dtype='object')
print(df)

    CELL_ID Grid_Type
0   001 Spot
1   001 Square
2   001 Spot
3   001 Square
4   001 Square
5   002 Spot
6   002 Square
7   002 Square
8   003 Square
9   003 Spot
10  003 Spot
11  003 Spot

# Count each Grid_Type within each CELL_ID.
df_gb = df['Grid_Type'].groupby([df['CELL_ID']]).value_counts()
print(df_gb)

CELL_ID  Grid_Type
001      Square       3
         Spot         2
002      Square       2
         Spot         1
003      Spot         3
         Square       1
Name: Grid_Type, dtype: int64

# Keys are (CELL_ID, Grid_Type) tuples.
df_gb_dict = df_gb.to_dict()

count_list = []
for idx, row in df.iterrows():
    for k, v in df_gb_dict.items():
        # Spot rows take their own count.
        if k[0] == row['CELL_ID'] and k[1] == row['Grid_Type'] and row['Grid_Type'] == 'Spot':
            count_list.append([k[0], k[1], v])
        # Square rows look up the Spot count of the same CELL_ID.
        if k[0] == row['CELL_ID'] and k[1] == row['Grid_Type'] and row['Grid_Type'] == 'Square':
            count_list.append([k[0], k[1], df_gb_dict[(row['CELL_ID'], 'Spot')]])

new_df = pd.DataFrame(count_list, columns=['CELL_ID', 'Grid_Type', 'SPOT_CNT'])
new_df.sort_values(by='CELL_ID', inplace=True)
# reset_index returns a new frame, so the result has to be assigned back.
new_df = new_df.reset_index(drop=True)
print(new_df)

   CELL_ID Grid_Type  SPOT_CNT
0      001      Spot         2
1      001    Square         2
2      001      Spot         2
3      001    Square         2
4      001    Square         2
5      002      Spot         1
6      002    Square         1
7      002    Square         1
8      003    Square         3
9      003      Spot         3
10     003      Spot         3
11     003      Spot         3
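The nested loop above can probably be avoided. As a rough sketch (not part of the original answer), the same grouped counts can be pivoted with unstack() so that Grid_Type becomes columns, after which the "Spot" column is a plain Series indexed by CELL_ID and can be passed to map(). The frame is rebuilt inline here instead of being read from spot.txt, which is an assumption made to keep the example self-contained.

```python
import pandas as pd

# Sample data from the question, built inline instead of reading spot.txt.
df = pd.DataFrame({
    "CELL_ID": ["001"] * 5 + ["002"] * 3 + ["003"] * 4,
    "Grid_Type": ["Spot", "Square", "Spot", "Square", "Square",
                  "Spot", "Square", "Square",
                  "Square", "Spot", "Spot", "Spot"],
})

# Same grouped counts as in the answer: a Series indexed by (CELL_ID, Grid_Type).
df_gb = df.groupby("CELL_ID")["Grid_Type"].value_counts()

# Move the inner index level (Grid_Type) into columns.
# Combinations that never occur are filled with 0 instead of NaN.
per_cell = df_gb.unstack(fill_value=0)

# per_cell["Spot"] is indexed by CELL_ID alone, so it maps cleanly onto every row.
df["SPOT_CNT"] = df["CELL_ID"].map(per_cell["Spot"])
```

This keeps the original row order and index, so no sort_values or reset_index step is needed afterwards.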


慕姐8265434

It looks like you already have an answer, but I would solve this with transform():

# set it up
df = pd.read_clipboard()
print(df)

    CELL_ID Grid_Type
0         1      Spot
1         1    Square
2         1      Spot
3         1    Square
4         1    Square
5         2      Spot
6         2    Square
7         2    Square
8         3    Square
9         3      Spot
10        3      Spot
11        3      Spot

df['SPOT_CNT'] = df.groupby('CELL_ID')['Grid_Type'].transform(lambda x: sum(x == 'Spot'))
print(df)

    CELL_ID Grid_Type  SPOT_CNT
0         1      Spot         2
1         1    Square         2
2         1      Spot         2
3         1    Square         2
4         1    Square         2
5         2      Spot         1
6         2    Square         1
7         2    Square         1
8         3    Square         3
9         3      Spot         3
10        3      Spot         3
11        3      Spot         3

Inside the lambda function:
- x == 'Spot' returns a bool for each value
- sum() adds up the True bools for each group

Finally, transform, according to the docs, behaves as follows:

DataFrame.transform(self, func, axis=0, *args, **kwargs) → 'DataFrame'
    "Call func on self producing a DataFrame with transformed values."
    "Produced DataFrame will have same axis length as self." <----
...

Hope this helps.
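A variant of the same idea, offered only as a sketch, builds the boolean flag once and passes the built-in "sum" aggregation to transform() instead of a Python lambda. The extra cell "004" with no "Spot" rows is hypothetical and is included only to show that such a cell ends up with 0.

```python
import pandas as pd

# Sample data from the question plus one hypothetical cell ("004") with no Spot rows.
df = pd.DataFrame({
    "CELL_ID": ["001"] * 5 + ["002"] * 3 + ["003"] * 4 + ["004"],
    "Grid_Type": ["Spot", "Square", "Spot", "Square", "Square",
                  "Spot", "Square", "Square",
                  "Square", "Spot", "Spot", "Spot",
                  "Square"],
})

# True where the row is a Spot, False otherwise.
is_spot = df["Grid_Type"].eq("Spot")

# Group the flags by CELL_ID and sum them per group; transform() returns a
# result aligned with the original rows, so it can be assigned as a column.
df["SPOT_CNT"] = is_spot.groupby(df["CELL_ID"]).transform("sum").astype(int)
```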
