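I have a dataset with 13 features and 10 million rows, and I want to use k-means to remove outliers. My approach is to fit k-means, add a new column holding the distance between each data point and its cluster centroid, add another column holding the mean distance, and then drop every row whose distance is greater than the mean. However, the code I wrote does not seem to work.
Sample of the dataset: https://drive.google.com/open?id=1iB1qjnWQyvoKuN_Pa8Xk4BySzXVTwtUk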
import pandas as pd
from sklearn.cluster import KMeans

df = pd.read_csv('Final After Simple Filtering.csv',index_col=None,low_memory=True)
# Dropping columns with low feature importance
del df['AmbTemp_DegC']
del df['NacelleOrientation_Deg']
del df['MeasuredYawError']
#applying kmeans
kmeans = KMeans( n_clusters=8)
clusters= kmeans.fit_predict(df)
centroids = kmeans.cluster_centers_
distance1 = kmeans.fit_transform(df)
distance2 = distance1.mean()
df['distances']=distance1-distance2
df = df[df['distances'] >=0]
del df['distances']
df.to_csv('/content//drive/My Drive/K TEST.csv', index=False)
Error:
KeyError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2896 try:
-> 2897 return self._engine.get_loc(key)
2898 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'distances'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
9 frames
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'distances'
During handling of the above exception, another exception occurred:
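
The approach described in the question (distance to the assigned centroid, compared against the mean distance) can be sketched as below. This is only a minimal sketch under some assumptions, not a definitive fix: `KMeans.fit_transform` returns an array of shape `(n_rows, n_clusters)`, one distance per cluster, so it cannot be assigned to a single `distances` column directly, and the row-wise minimum is taken here as the distance to the point's own centroid. The helper name `drop_far_points`, the `random_state` value, and the synthetic `demo` frame are hypothetical and stand in for the real CSV. Note also that the filter keeps rows at or below the mean, since the stated goal is to drop rows that are farther than the mean.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def drop_far_points(df, n_clusters=8, random_state=0):
    """Keep rows whose distance to their nearest centroid is <= the mean distance."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    # fit_transform() returns one column per cluster: shape (n_rows, n_clusters)
    all_distances = kmeans.fit_transform(df)
    # Distance to the assigned (nearest) centroid is the row-wise minimum
    nearest = all_distances.min(axis=1)
    # Boolean mask: True for rows at or below the mean distance
    keep = nearest <= nearest.mean()
    return df.loc[keep]


# Small synthetic stand-in for the real data (hypothetical values)
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(500, 3)), columns=["a", "b", "c"])
cleaned = drop_far_points(demo, n_clusters=4)
print(len(demo), len(cleaned))
```

Because the mask is applied directly, no temporary `distances` column has to be created and deleted. Using the mean as the cutoff removes a large share of the rows, so a looser threshold (for example a high percentile of `nearest`) may be closer to what is wanted for outlier removal.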