Pandas: percentage of NaN for each value of a column


Goal: get the percentage of missing values for each column of df, broken down by client.

My df is about created tickets:

       id      type  ...  priority   Client
0  56 113  Incident  ...       Low  client1
1  56 267   Demande  ...      High  client1
2  56 294  Incident  ...       Nan      NaN
3  56 197   Demande  ...       Low  client3
4  56 143   Demande  ...       Nan  client4

First attempt:

df.notna().sum()/len(agg_global)*100

Out[29]:
id          97.053453
type        76.415869
priority    82.626625
client      84.596443
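For reference, the same per-column figure can be written without dividing by a length explicitly, because the mean of a boolean mask is the fraction of True values. This is a minimal sketch on made-up data (the column values below are assumptions, not the asker's tickets):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the asker's real tickets.
df = pd.DataFrame({
    "id": [1, 2, np.nan, 4],
    "type": ["Incident", None, "Demande", None],
})

# notna() yields booleans; their mean is the fraction of non-missing values.
pct_present = df.notna().mean() * 100

# Equivalent to the sum/len form, as long as the length is that of df itself.
pct_present_alt = df.notna().sum() / len(df) * 100
```

Both expressions give the percentage of non-missing values per column; using `isna()` instead would give the percentage of missing values.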

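This is very useful, but I would like to add more detail to my output by using the "Client" dimension as columns, like this:

The output I would like to create: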

                   Client1     Client2     Client3        NaN
id              100.000000  100.000000  100.000000  66.990424
type             76.415869   66.990424   76.415869  43.761970
status          100.000000  100.000000   66.990424  76.415869
category         66.990424   43.761970   76.415869  43.761970
entity           43.761970  100.000000   76.415869  76.415869
source_demande   84.596443  100.000000   76.415869  43.761970

I tried using "groupby" but could not get the desired output:

               id       type  ...  priority    Client
client                        ...
True    97.053453  76.415869  ...  29.98632  29.98632

Any suggestion would be appreciated. Thank you for your attention!

手掌心


2 Answers

一只斗牛犬

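You can drop the Client column so that it is not itself tested for missing values, flag the missing values with DataFrame.isna, group by Client with its NaNs replaced by the string 'NaN' so that those rows are not lost, aggregate with mean, and finally transpose with DataFrame.T:

print (df)
       id      type priority   Client
0     NaN  Incident      Low  client1
1     NaN       NaN     High  client1
2  56 294  Incident      Nan      NaN
3  56 197       NaN      Low  client3
4     NaN   Demande      NaN  client4

df = (df.drop(columns='Client')
        .isna()
        .groupby(df['Client'].fillna('NaN'))
        .mean()
        .rename_axis(None)
        .T)
print (df)
          NaN  client1  client3  client4
id        0.0      1.0      0.0      1.0
type      0.0      0.5      1.0      0.0
priority  0.0      0.0      0.0      1.0

The values are fractions of missing values per client; multiply by 100 to get percentages.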
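A self-contained sketch of this approach, adapted to the shape of the output requested in the question: it uses notna instead of isna and scales by 100, so each cell is the percentage of non-missing values. The sample frame mirrors the one in this answer; the integer id values are an assumption, since the ids are printed with a space in the original.

```python
import numpy as np
import pandas as pd

# Sample data modelled on the answer's frame; the id values are assumed.
df = pd.DataFrame({
    "id": [np.nan, np.nan, 56294, 56197, np.nan],
    "type": ["Incident", np.nan, "Incident", np.nan, "Demande"],
    "priority": ["Low", "High", "Nan", "Low", np.nan],
    "Client": ["client1", "client1", np.nan, "client3", "client4"],
})

pct_present = (
    df.drop(columns="Client")                 # do not test the grouping column itself
      .notna()                                # True where a value is present
      .groupby(df["Client"].fillna("NaN"))    # keep rows without a client as their own group
      .mean()                                 # fraction of present values per client
      .mul(100)                               # fraction -> percentage
      .rename_axis(None)                      # drop the "Client" axis label
      .T                                      # columns of df become rows, clients become columns
)
print(pct_present)
```

Note that the string 'Nan' in priority is an ordinary value, not a missing one, so it counts as present. Passing dropna=False to groupby is another way to keep the rows with a missing client, in which case the group label stays a real NaN rather than a string.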


撒科打诨

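As far as I know, this can be done by brute force. I would use the isna function together with sum to count the NaN values in each row or column, and then compute the percentage from those counts.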
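A minimal sketch of that counting approach, on made-up data (the frame below is an assumption for illustration only). It covers the whole frame; splitting the result by client would still need a groupby or a loop over clients.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "type": ["Incident", np.nan, "Demande", np.nan],
    "priority": ["Low", "High", "Low", np.nan],
})

# Count missing values per column and per row.
nan_per_column = df.isna().sum()
nan_per_row = df.isna().sum(axis=1)

# Turn the counts into percentages.
pct_nan_per_column = nan_per_column / len(df) * 100    # divide by number of rows
pct_nan_per_row = nan_per_row / df.shape[1] * 100      # divide by number of columns
```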
