Replacing zero values of an attribute with the mean of items that share a similar attribute


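I have height data for probes located in several basins. Height values of zero are spurious, and I want to replace each of them with the mean height of the probes in the same basin.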

import pandas as pd

index = [0, 1, 2, 3, 4, 5]
s = pd.Series([0, 2, 2, 0, 1, 6], index=index)  # height values
t = pd.Series(['A', 'A', 'A', 'B', 'B', 'B'], index=index)  # basins' names
df = pd.concat([s, t], axis=1, keys=['Height', 'Basin'])
print(df)

   Height Basin
0       0     A
1       2     A
2       2     A
3       0     B
4       1     B
5       6     B

I first create a DataFrame to hold the mean height in each basin:

# find the average height within each basin, ignoring the zero values
bound_df = df[df['Height'] > 0]
mean_height_df = bound_df.groupby(['Basin'])['Height'].mean()
print(mean_height_df)

Basin
A    2.0
B    3.5

I then try to replace the zero values with the mean of the corresponding basin:

#substitute zeros w/ the average value

df.loc[df['Height']<=0, 'Height'] = mean_height_df.loc[mean_height_df['Basin'],'Height']

But this raises an error that I don't understand:

File "pandas/_libs/hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Basin'

What does this mean? Is it a slicing problem?

Is there an alternative way to do this?
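One likely reading of the error: `groupby(['Basin'])['Height'].mean()` returns a Series whose index holds the basin names, not a DataFrame with a `Basin` column. Indexing that Series with `'Basin'` therefore looks for a label called `'Basin'` in the index, which would explain the `KeyError`. A minimal sketch of one possible alternative follows. It maps each row's basin to that basin's mean and fills only the non-positive heights. The names `mean_height` and `basin_mean` are illustrative and not from the original post.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Height": [0, 2, 2, 0, 1, 6],
    "Basin": ["A", "A", "A", "B", "B", "B"],
})

# Mean of the positive heights per basin. The result is a Series
# indexed by basin name, so it has no 'Basin' column to select.
mean_height = df[df["Height"] > 0].groupby("Basin")["Height"].mean()

# Look up the basin mean for every row; the result shares df's index.
basin_mean = df["Basin"].map(mean_height)

# Keep heights that are positive and take the basin mean elsewhere.
df["Height"] = df["Height"].where(df["Height"] > 0, basin_mean)

print(df)
```

With the sample data, the zero in basin A becomes 2.0 and the zero in basin B becomes 3.5, and the column is upcast to float. A basin whose heights are all zero has no entry in `mean_height`, so its rows would end up as NaN under this approach.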

桃花长相依

