
Python random forest regressor still raises a NaN error even though NaN values were removed

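I have a clean dataset with zero NaN values, but I keep getting the same error from the regressor. My DataFrame is called new_player_data.


I tried looking for any NaN values with: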


list(new_player_data.where(new_player_data.isna()).count() > 0)

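This returns


[False, False, False, False, False, False]


with about two hundred entries in total. I thought some of the floats might be too large, so I tried this: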


for i in new_player_data.columns[:]:
    if new_player_data[i].dtype == float:
        new_player_data[i] = round(new_player_data[i],2)

No matter what I do, I still get:


regressor.fit(X_train, y_train)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-327-3a664017ddaa> in <module>
----> 1 regressor.fit(X_train, y_train)

/anaconda3/lib/python3.7/site-packages/sklearn/ensemble/forest.py in fit(self, X, y, sample_weight)
    248 
    249         # Validate or convert input data
--> 250         X = check_array(X, accept_sparse="csc", dtype=DTYPE)
    251         y = check_array(y, accept_sparse='csc', ensure_2d=False, dtype=None)
    252         if sample_weight is not None:

/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    571         if force_all_finite:
    572             _assert_all_finite(array,
--> 573                                allow_nan=force_all_finite == 'allow-nan')
    574 
    575     shape_repr = _shape_repr(array.shape)

/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan)
     54                 not allow_nan and not np.isfinite(X).all()):
     55             type_err = 'infinity' if allow_nan else 'NaN, infinity'
---> 56             raise ValueError(msg_err.format(type_err, X.dtype))
     57 
     58 

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

Any ideas on what else I can check here? I'm at a loss.
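
The error message lists three conditions (NaN, infinity, a value too large for float32), and an `isna()` check only covers the first. The `.where(...isna()).count()` check above also reports False for every column regardless of the data, because `where` keeps only the cells that are already NaN and `count` ignores NaN. A minimal sketch of a check closer to what the traceback tests (`np.isfinite`), assuming a small made-up DataFrame; `df` and its columns are hypothetical and are not the asker's data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for new_player_data; all values are made up.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [0.5, np.inf, 1.5],   # contains +inf
    "c": [1.0, 1e39, 2.0],     # finite as float64, too large for float32
    "d": [np.nan, 1.0, 2.0],   # contains a real NaN
})

# The check from the question: all False, even for column "d".
question_check = list(df.where(df.isna()).count() > 0)

# isna() finds NaN but, with default pandas options, not inf.
nan_per_column = df.isna().sum()

# np.isfinite is False for NaN, +inf and -inf.
values = df.to_numpy(dtype=float)
nonfinite_per_column = (~np.isfinite(values)).sum(axis=0)

# The traceback shows the input being converted to float32 before the check;
# values beyond roughly 3.4e38 become inf in that cast.
with np.errstate(over="ignore"):
    values_f32 = values.astype(np.float32)
nonfinite_f32_per_column = (~np.isfinite(values_f32)).sum(axis=0)

print(question_check)
print(nan_per_column.tolist())
print(nonfinite_per_column.tolist())
print(nonfinite_f32_per_column.tolist())
```

Running the float32 variant on the actual feature matrix (here that would be `X_train`) should point at the columns the regressor rejects.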


慕尼黑5688855
1 answer

狐的传说

It turned out to be inf values. I found them with

infs = np.where(np.isinf(new_player_data))
infs

out: (array([ 261, 1162, 1190, 1339, 1365, 1451, 1656, 1736, 1878, 1954, 2189,
    2299, 2741, 3137, 3162, 3799, 3821, 3881, 4305]), array([ 3, 43, 43,  3, 43, 43, 43, 43, 43, 43, 23, 43,  3, 43, 43, 43,  3,
    23, 43]))

Then I replaced them like this:

pd.options.mode.use_inf_as_na = True
infs = np.where(np.isinf(new_player_data))
infs

out: (array([], dtype=int64), array([], dtype=int64))
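
Note that `pd.options.mode.use_inf_as_na` is a global option and was deprecated as of pandas 2.1, so converting the values explicitly may be the safer route on newer versions. A minimal sketch, assuming it is acceptable to drop the affected rows (imputing them is another option); `df` is a hypothetical stand-in with made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for new_player_data; all values are made up.
df = pd.DataFrame({
    "x": [1.0, np.inf, 3.0, 4.0],
    "y": [10.0, 20.0, -np.inf, 40.0],
})

# Locate the offending cells as (row positions, column positions).
rows, cols = np.where(np.isinf(df.to_numpy()))

# Convert +inf and -inf to NaN explicitly, then drop rows containing NaN.
cleaned = df.replace([np.inf, -np.inf], np.nan).dropna()

# Every remaining value is finite, so the regressor's input check should pass.
all_finite = bool(np.isfinite(cleaned.to_numpy()).all())

print(rows.tolist(), cols.tolist())
print(cleaned)
print(all_finite)
```

If rows are dropped from the features, the same rows need to be dropped from the target so that `X_train` and `y_train` stay aligned.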