
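Saving the intermediate results of an sklearn pipeline

I have a code example: an sklearn pipeline with two components (PCA and a random forest), and I would like to use the pipeline's intermediate results to gain some explainability. I know that .get_params() can be used to inspect the intermediate steps, but is it possible to save or extract the intermediate results for further processing? I would like to apply the additional PCA functionality shown in parts 1.1 and 1.2 of the code.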


from sklearn.datasets import load_breast_cancer

import numpy as np

import pandas as pd

from sklearn.decomposition import FastICA, PCA

from sklearn.ensemble import RandomForestClassifier

from sklearn.pipeline import Pipeline

from sklearn.model_selection import train_test_split

from sklearn.metrics import classification_report

from sklearn.metrics import confusion_matrix


#Convert the dataset to data frame

cancer = load_breast_cancer()     

data = np.c_[cancer.data, cancer.target]

columns = np.append(cancer.feature_names, ["target"])

df = pd.DataFrame(data, columns=columns)



#Split data into train and test 

X = df.iloc[:, 0:30].values

Y = df.iloc[:, 30].values

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.25, random_state = 0)



#Create a pipeline 

n_comp = 12

clf = Pipeline([('pca', PCA(n_comp)), ('RandomForest', RandomForestClassifier(n_estimators=100))])

clf.fit(X_train, Y_train)



#Evaluate the pipeline 

Y_pred = clf.predict(X_test)

cr = classification_report(Y_test, Y_pred)

print(cr)



#see the intermediate steps of the pipeline

print(clf.get_params()['pca'])



##1.1 if I create PCA outside of the pipeline 

pca = PCA(n_components=10)

principalComponents = pca.fit_transform(X)


##1.2 some explainability on pca outside of the pipeline 

pca.explained_variance_ratio_


慕姐8265434
1 answer

智慧大石

We can assign get_params()['pca'] to a variable, which should return an object of type sklearn.decomposition.pca.PCA (newer scikit-learn releases report this as sklearn.decomposition._pca.PCA). Because the pipeline has already been fitted, this is the fitted PCA step, so we can access all of the decomposition's methods and attributes through it.

from sklearn.datasets import load_breast_cancer
import numpy as np
import pandas as pd
from sklearn.decomposition import FastICA, PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

#Convert the dataset to data frame
cancer = load_breast_cancer()
data = np.c_[cancer.data, cancer.target]
columns = np.append(cancer.feature_names, ["target"])
df = pd.DataFrame(data, columns=columns)

#Split data into train and test
X = df.iloc[:, 0:30].values
Y = df.iloc[:, 30].values
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.25, random_state = 0)

#Create a pipeline
n_comp = 12
clf = Pipeline([('pca', PCA(n_comp)), ('RandomForest', RandomForestClassifier(n_estimators=100))])
clf.fit(X_train, Y_train)

### --- ###

pca = clf.get_params()['pca']

type(pca)
#sklearn.decomposition.pca.PCA

pca.explained_variance_ratio_
#array([9.81327198e-01, 1.67333696e-02, 1.73934848e-03, 1.05758996e-04,
#       8.29268494e-05, 6.34081771e-06, 3.75309113e-06, 7.08990845e-07,
#       3.16742542e-07, 1.75055859e-07, 7.11274270e-08, 1.43003803e-08])

pca.components_.shape
#(12, 30)

Hope this helps.
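As a possible alternative to get_params(), the fitted step can also be looked up by name through named_steps, and the data that actually reaches the random forest can be recovered by transforming with the fitted PCA step. The sketch below is only an illustration of that idea, not the answerer's code: it rebuilds the same pipeline on the same dataset, and the variable names (fitted_pca, X_test_pca, X_test_pca_sliced, cumulative) as well as the random_state on the forest are my own assumptions. Slicing a pipeline with clf[:-1] assumes scikit-learn 0.21 or later.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Same data and split as in the question, loaded directly as arrays.
X, Y = load_breast_cancer(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

# Same two-step pipeline; random_state on the forest is an added assumption
# to make repeated runs comparable.
clf = Pipeline([
    ("pca", PCA(n_components=12)),
    ("RandomForest", RandomForestClassifier(n_estimators=100, random_state=0)),
])
clf.fit(X_train, Y_train)

# The fitted PCA step, looked up by the name it was given in the pipeline.
fitted_pca = clf.named_steps["pca"]

# Cumulative share of variance explained by the first k components.
cumulative = np.cumsum(fitted_pca.explained_variance_ratio_)

# Intermediate result: the test data as the random forest receives it.
X_test_pca = fitted_pca.transform(X_test)

# Equivalent route: drop the final estimator and transform with the rest.
X_test_pca_sliced = clf[:-1].transform(X_test)

print(cumulative.shape)
print(X_test_pca.shape)
```

Either array can then be kept for later inspection, for example by writing it to disk with np.save. The fitted_pca object is the same instance the pipeline uses, so nothing is refitted here.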