Feature importance using lightgbm

I am trying to run my lightgbm model for feature selection, as shown below.


Initialization


import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split

# Initialize an empty array to hold feature importances
feature_importances = np.zeros(features_sample.shape[1])

# Create the model with several hyperparameters
model = lgb.LGBMClassifier(objective='binary',
                           boosting_type='goss',
                           n_estimators=10000, class_weight='balanced')

Then I fit the model as follows


# Fit the model twice to avoid overfitting
for i in range(2):

    # Split into training and validation set
    train_features, valid_features, train_y, valid_y = train_test_split(
        train_X, train_Y, test_size=0.25, random_state=i)

    # Train using early stopping
    model.fit(train_features, train_y, early_stopping_rounds=100,
              eval_set=[(valid_features, valid_y)],
              eval_metric='auc', verbose=200)

    # Record the feature importances
    feature_importances += model.feature_importances_

But I get the following error


Training until validation scores don't improve for 100 rounds. 

Early stopping, best iteration is: [6]  valid_0's auc: 0.88648

ValueError: operands could not be broadcast together with shapes (87,) (83,) (87,) 


缥缈止盈
2 Answers

哆啦的时光机

Depending on whether the model was trained with the scikit-learn API or with lightgbm's native API, use the feature_importances_ property or the feature_importance() function, respectively, to get the importances, as in this example (where model is the result of lgbm.fit() / lgbm.train(), and train_columns = x_train_df.columns):

import pandas as pd

def get_lgbm_varimp(model, train_columns, max_vars=50):

    if "basic.Booster" in str(model.__class__):
        # lightgbm.basic.Booster was trained directly, so using feature_importance() function
        cv_varimp_df = pd.DataFrame([train_columns, model.feature_importance()]).T
    else:
        # Scikit-learn API LGBMClassifier or LGBMRegressor was fitted,
        # so using feature_importances_ property
        cv_varimp_df = pd.DataFrame([train_columns, model.feature_importances_]).T

    cv_varimp_df.columns = ['feature_name', 'varimp']

    # Keep only the max_vars most important features
    cv_varimp_df.sort_values(by='varimp', ascending=False, inplace=True)
    cv_varimp_df = cv_varimp_df.iloc[0:max_vars]

    return cv_varimp_df

Note that we rely on the assumption that the feature importance values are ordered the same way as the model matrix columns were ordered during training (including one-hot dummy columns), see LightGBM #209.
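As for the ValueError itself: the shapes (87,) (83,) (87,) in the message suggest that the accumulator was sized from features_sample (87 columns) while the model was fit on train_X (83 columns), so the in-place += cannot line the two arrays up. Below is a minimal sketch of one way to guard against that, assuming the accumulator is sized from the same matrix that is passed to fit(). The helper name accumulate_importances and the placeholder importance values are hypothetical and only stand in for model.feature_importances_.

```python
import numpy as np

def accumulate_importances(total, importances):
    """Add one fit's importances to the running total, checking lengths first.

    Hypothetical helper, not part of lightgbm.
    """
    importances = np.asarray(importances, dtype=float)
    if total.shape != importances.shape:
        raise ValueError(
            f"accumulator has {total.shape[0]} entries but the model reports "
            f"{importances.shape[0]} features; size the accumulator from the "
            "matrix that is passed to fit()"
        )
    return total + importances

# Assumption: 83 stands in for train_X.shape[1] from the question.
n_train_columns = 83
feature_importances = np.zeros(n_train_columns)

for i in range(2):
    # Placeholder values standing in for model.feature_importances_ after each fit
    fake_importances = np.arange(n_train_columns) * (i + 1)
    feature_importances = accumulate_importances(feature_importances, fake_importances)
```

In the question's code this would amount to initializing with np.zeros(train_X.shape[1]) instead of np.zeros(features_sample.shape[1]), or making sure features_sample and train_X contain the same columns.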