Logistic regression model with regularization (L1 / L2): Lasso and Ridge

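I am trying to build a model and run a grid search over it; the code is below. The raw data (credit card fraud data) was downloaded from this site: https://www.kaggle.com/mlg-ulb/creditcardfraud

After reading the data, the code starts with standardization.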

import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.preprocessing import PowerTransformer, StandardScaler

# credit_card_fraud_df already holds the downloaded data at this point
standardization = StandardScaler()
credit_card_fraud_df[['Amount']] = standardization.fit_transform(credit_card_fraud_df[['Amount']])

# Assigning feature variable to X
X = credit_card_fraud_df.drop(['Class'], axis=1)

# Assigning response variable to y
y = credit_card_fraud_df['Class']

# Splitting the data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, test_size=0.3, random_state=100)
X_train.head()

power_transformer = PowerTransformer(copy=False)
power_transformer.fit(X_train)                          # fit the power transform on the training data only
X_train_pt_df = power_transformer.transform(X_train)    # then apply it to both splits
X_test_pt_df = power_transformer.transform(X_test)
y_train_pt_df = y_train
y_test_pt_df = y_test
train_pt_df = pd.DataFrame(data=X_train_pt_df, columns=X_train.columns.tolist())

# set up cross validation scheme
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=4)

# specify range of hyperparameters: 5 values of C from 1e-3 to 1e3; l1 = lasso, l2 = ridge
params = {"C": np.logspace(-3, 3, 5), "penalty": ["l1", "l2"]}

Sample results:


field                0                                1
mean_fit_time        0.044332                         0.477965
std_fit_time         0.002040                         0.046651
mean_score_time      0.000000                         0.016745
std_score_time       0.000000                         0.003813
param_C              0.001                            0.001
param_penalty        l1                               l2
params               {'C': 0.001, 'penalty': 'l1'}    {'C': 0.001, 'penalty': 'l2'}
split0_test_score    NaN                              0.485714
split1_test_score    NaN                              0.428571
split2_test_score    NaN                              0.542857
split3_test_score    NaN                              0.485714
split4_test_score    NaN                              0.457143
mean_test_score      NaN                              0.480000
std_test_score       NaN                              0.037904
rank_test_score      6                                5

There are no null values in my input data. I don't understand why I get NaN values in these columns. Can anyone help me?
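For readers who hit the same symptom: GridSearchCV's `error_score` parameter defaults to `np.nan`, so a candidate whose fit raises an exception is recorded as NaN instead of stopping the search. The sketch below is a minimal illustration of that behaviour and of surfacing the hidden exception with `error_score="raise"`. It assumes a small synthetic dataset from `make_classification` in place of the fraud data, and names such as `X_demo`, `params_demo` and `surface_fit_error` are hypothetical.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, imbalanced stand-in for the fraud data (an assumption for illustration).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, weights=[0.9, 0.1], random_state=0
)
folds_demo = StratifiedKFold(n_splits=3, shuffle=True, random_state=4)
params_demo = {"C": [0.1, 1.0], "penalty": ["l1", "l2"]}

# Default behaviour: failed fits are scored as NaN and only a warning is emitted.
default_search = GridSearchCV(
    LogisticRegression(class_weight="balanced"),
    params_demo,
    scoring="roc_auc",
    cv=folds_demo,
    refit=False,  # skip the final refit; only the CV table matters here
)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the fit-failure warnings for this demo
    default_search.fit(X_demo, y_demo)
scores = default_search.cv_results_["mean_test_score"]


def surface_fit_error():
    """Re-run the search with error_score='raise' and return the error text."""
    strict_search = GridSearchCV(
        LogisticRegression(class_weight="balanced"),
        params_demo,
        scoring="roc_auc",
        cv=folds_demo,
        error_score="raise",
    )
    try:
        strict_search.fit(X_demo, y_demo)
    except ValueError as exc:
        return str(exc)
    return None


message = surface_fit_error()
print(scores)
print(message)
```

Running the search once with `error_score="raise"` while debugging is usually enough to see which parameter combination is failing and why.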


森林海
1 Answer

ITMISS

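The problem is the default solver in the model you define here:

model = LogisticRegression(class_weight='balanced')

This follows from the error message:

ValueError: Solver lbfgs supports only 'l2' or 'none' penalties, got l1 penalty.

It is also worth reading the documentation before defining the parameter grid:

penalty: {'l1', 'l2', 'elasticnet', 'none'}, default='l2'
Used to specify the norm used in the penalization. The 'newton-cg', 'sag' and 'lbfgs' solvers support only l2 penalties. 'elasticnet' is only supported by the 'saga' solver. If 'none' (not supported by the liblinear solver), no regularization is applied.

Once you switch to a solver that supports every combination in your grid, you are good to go:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

## using Logistic regression for class imbalance
model = LogisticRegression(class_weight='balanced', solver='saga')
grid_search_cv = GridSearchCV(estimator=model, param_grid=params,
                              scoring='roc_auc',
                              cv=folds,
                              return_train_score=True, verbose=1)
grid_search_cv.fit(X_train_pt_df, y_train_pt_df)

## reviewing the results
cv_results = pd.DataFrame(grid_search_cv.cv_results_)

Also watch for a ConvergenceWarning, which may mean you need to increase the default max_iter or tol, switch to another solver, or reconsider the parameter grid.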
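If you want to compare several solvers rather than commit to one, `param_grid` also accepts a list of dicts, and each dict is expanded on its own. The sketch below is one possible way to use that so only compatible solver/penalty pairs are fitted. It is an illustration under assumptions, not the setup from the question: the data is synthetic (`make_classification`), and `solver_aware_grid`, `C_values` and the `max_iter=5000` setting are choices made here for the example.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, imbalanced stand-in for the fraud data (an assumption for illustration).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=8, weights=[0.9, 0.1], random_state=0
)

C_values = np.logspace(-3, 3, 5)

# Each dict is expanded separately, so lbfgs is never paired with an l1 penalty.
solver_aware_grid = [
    {"solver": ["liblinear", "saga"], "penalty": ["l1", "l2"], "C": C_values},
    {"solver": ["lbfgs"], "penalty": ["l2"], "C": C_values},
]

search = GridSearchCV(
    # max_iter is raised above the default to reduce convergence warnings with saga
    estimator=LogisticRegression(class_weight="balanced", max_iter=5000),
    param_grid=solver_aware_grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=4),
)
search.fit(X_demo, y_demo)

results = pd.DataFrame(search.cv_results_)
print(results[["param_solver", "param_penalty", "param_C", "mean_test_score"]])
```

The grid above yields 25 candidates (5 values of C times 4 liblinear/saga combinations, plus 5 for lbfgs), and none of them should produce a NaN score from an unsupported penalty.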