Applying PolynomialFeatures with DataFrameMapper() in sklearn

For the housing dataset, I am trying to apply polynomial features to selected columns using DataFrameMapper() from sklearn_pandas.

My code:

from sklearn.preprocessing import PolynomialFeatures
from sklearn_pandas import DataFrameMapper

mapper = DataFrameMapper([
    ('houseAge_income', PolynomialFeatures(2)),
    ('median_income', PolynomialFeatures(2)),
    (['latitude', 'housing_median_age', 'total_rooms', 'population', 'median_house_value',
      'ocean_proximity'], None)
])

poly_feature = mapper.fit_transform(housing)

When I try to use

houseAge_income.reshape(-1, 1)

as the column selector in DataFrameMapper(), I run into another error:


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2645             try:
-> 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'houseAge_income.reshape(-1, 1)'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
5 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-> 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

Can anyone tell me what I am missing?


慕尼黑5688855
1 answer

白板的微信

From the documentation: the difference between specifying the column selector as 'column' (a simple string) and ['column'] (a list with one element) is the shape of the array that is passed to the transformer. In the first case a one-dimensional array is passed, while in the second case a two-dimensional array with one column, i.e. a column vector, is passed.

PolynomialFeatures, like other scikit-learn transformers, expects two-dimensional input, so the string selector does not work here. All columns must be passed using the same type of column selector. In this case that is a list, because a list is needed to keep several columns untransformed.

import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
from sklearn_pandas import DataFrameMapper

# load data
df = pd.read_csv('https://raw.githubusercontent.com/ageron/handson-ml2/master/datasets/housing/housing.csv')

# create houseAge_income
df['houseAge_income'] = df.housing_median_age.mul(df.median_income)

# configure mapper with all columns passed as lists
mapper = DataFrameMapper([(['houseAge_income'], PolynomialFeatures(2)),
                          (['median_income'], PolynomialFeatures(2)),
                          (['latitude', 'housing_median_age', 'total_rooms', 'population', 'median_house_value', 'ocean_proximity'], None)])

# fit
poly_feature = mapper.fit_transform(df)

# display(pd.DataFrame(poly_feature).head())
   0       1           2  3       4       5      6   7     8     9          10        11
0  1  341.33  1.1651e+05  1  8.3252  69.309  37.88  41   880   322  4.526e+05  NEAR BAY
1  1  174.33       30391  1  8.3014  68.913  37.86  21  7099  2401  3.585e+05  NEAR BAY
2  1  377.38  1.4242e+05  1  7.2574   52.67  37.85  52  1467   496  3.521e+05  NEAR BAY
3  1  293.44       86108  1  5.6431  31.845  37.85  52  1274   558  3.413e+05  NEAR BAY
4  1     200       40001  1  3.8462  14.793  37.85  52  1627   565  3.422e+05  NEAR BAY
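The shape difference described in the answer can be seen without DataFrameMapper at all. The sketch below is only an illustration: it uses plain pandas indexing to imitate a string selector versus a list selector, which is an assumption about what the mapper hands to the transformer rather than a look at its internals, and the `toy` frame and its values are made up. It also shows why a string such as 'houseAge_income.reshape(-1, 1)' fails: it is looked up literally as a column name.

```python
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical toy frame standing in for the housing data (made-up values).
toy = pd.DataFrame({"median_income": [1.0, 2.0, 3.0]})

# Imitate the two selector styles with plain pandas indexing.
one_d = toy["median_income"].to_numpy()     # string selector: shape (3,)
two_d = toy[["median_income"]].to_numpy()   # list selector:   shape (3, 1)

poly = PolynomialFeatures(2)

# The 2-D column vector is accepted; output columns are [1, x, x^2].
expanded = poly.fit_transform(two_d)

# The 1-D array is rejected by scikit-learn's input validation.
try:
    poly.fit_transform(one_d)
    one_d_error = None
except ValueError as exc:
    one_d_error = exc

# A string containing ".reshape(-1, 1)" is just an unknown column name.
try:
    toy["median_income.reshape(-1, 1)"]
    lookup_error = None
except KeyError as exc:
    lookup_error = exc

print(one_d.shape, two_d.shape)
print(expanded)
print(type(one_d_error).__name__, type(lookup_error).__name__)
```

Any reshaping therefore has to come from the selector form itself (a list) and not from code embedded in the column name.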