Encoding with OneHotEncoder



I am trying to preprocess data with scikit-learn's OneHotEncoder. Obviously, I am doing something wrong. Here is my sample program:

from sklearn.preprocessing import LabelEncoder, OneHotEncoder

from sklearn.compose import ColumnTransformer

cat = ['ok', 'ko', 'maybe', 'maybe']

label_encoder = LabelEncoder()

label_encoder.fit(cat)

cat = label_encoder.transform(cat)

# returns [2 0 1 1], which seems good.

print(cat)

ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')

res = ct.fit_transform([cat])

print(res)

Actual result: [[1.0 0 1 1]]

Expected result, something like:

[

[ 1 0 0 ]

[ 0 0 1 ]

[ 0 1 0 ]

]

Can anyone point out what I am missing?

蓝山帝景

298 views, 1 answer

1 Answer

慕码人2483693

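You could consider using numpy and MultiLabelBinarizer.

import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer
cat = np.array([['ok', 'ko', 'maybe', 'maybe']])
m = MultiLabelBinarizer()
print(m.fit_transform(cat.T))

If you still want to stick with your own solution, you only need to change it as follows. The problem is that [cat] is a single row with four features, not a column of four samples, so the encoder sees one sample and encodes only its first value.

# cat is still a row here, not a column
# res = ct.fit_transform([cat])  => remove this
# this should work
res = ct.fit_transform(np.array([cat]).T)

Out[2]:
array([[0., 0., 1.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 1., 0.]])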
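As a further option, here is a minimal sketch that skips LabelEncoder and ColumnTransformer entirely. It assumes a reasonably recent scikit-learn in which OneHotEncoder accepts string categories directly. The names X and onehot are only illustrative. The key step is the same as above: reshape the values into one column, with one row per sample.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

cat = np.array(['ok', 'ko', 'maybe', 'maybe'])

# One column, four rows: each value is a separate sample.
X = cat.reshape(-1, 1)

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; convert it to a dense array.
onehot = encoder.fit_transform(X).toarray()

# Categories are sorted, so the columns are ordered 'ko', 'maybe', 'ok'.
print(encoder.categories_)
print(onehot)
```

The rows of onehot follow the input order, and the column order comes from encoder.categories_, so it is worth checking that attribute before interpreting the columns.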

