Python random forest parameters explained - original notes - imooc (慕课网)

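A note before we begin: I have only just started learning sklearn, there are many parameters I don't fully understand, and my English is weak, so I relied on Google Translate. If anything is wrong, please point it out.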

sklearn.ensemble.RandomForestClassifier parameters

sklearn.ensemble.RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, class_weight=None)

This is the signature from scikit-learn 0.19. Several defaults and options have changed in later releases, so check the documentation for the version you have installed.
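A minimal sketch of constructing and fitting the classifier, assuming a recent scikit-learn release. The synthetic data from make_classification and the parameter values are illustrative assumptions, not recommendations; the parameters are passed explicitly so the result does not depend on version-specific defaults.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic toy data (illustration only): 200 samples, 8 features, 2 classes
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Pass parameters explicitly instead of relying on version-specific defaults
clf = RandomForestClassifier(
    n_estimators=50,
    criterion="gini",
    max_features="sqrt",
    random_state=0,
)
clf.fit(X, y)

# Predict the class of the first five samples
pred = clf.predict(X[:5])
print(pred)
```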

n_estimators : integer, optional (default=10)

The number of trees in the forest. Since scikit-learn 0.22 the default is 100.
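After fitting, the individual trees are stored in the estimators_ attribute, so its length equals n_estimators. A small sketch on synthetic data (an assumption for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

sizes = {}
for n in (5, 25):
    clf = RandomForestClassifier(n_estimators=n, random_state=0).fit(X, y)
    # One fitted decision tree per estimator
    sizes[n] = len(clf.estimators_)

print(sizes)
```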

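criterion : string, optional (default="gini")

The function used to measure the quality of a split. Supported criteria are "gini" for Gini impurity and "entropy" for information gain. Note: this parameter is tree-specific.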
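Both criteria can be computed by hand from the class counts at a node. This is a standalone sketch of the standard formulas (entropy measured in bits here), not scikit-learn's internal code; gini and entropy are hypothetical helper names.

```python
import math


def gini(counts):
    """Gini impurity: 1 - sum(p_k ** 2) over the class proportions p_k."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def entropy(counts):
    """Shannon entropy in bits: sum(-p_k * log2(p_k)), skipping empty classes."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


print(gini([5, 5]), entropy([5, 5]))    # evenly mixed node: maximum impurity
print(gini([10, 0]), entropy([10, 0]))  # pure node: zero impurity
```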

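max_features : int, float, string or None, optional (default="auto")

The number of features to consider when looking for the best split:

If int, consider max_features features at each split.

If float, max_features is a fraction, and int(max_features * n_features) features are considered at each split.

If "auto", max_features = sqrt(n_features). (The "auto" option was deprecated in scikit-learn 1.1 and removed in 1.3; "sqrt" is now the default.)

If "sqrt", max_features = sqrt(n_features) (same as "auto").

If "log2", max_features = log2(n_features).

If None, max_features = n_features.

Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if this requires effectively inspecting more than max_features features.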
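The rules above can be written as a small function that turns a max_features setting into a feature count. resolve_max_features is a hypothetical helper that mirrors the listed rules, not scikit-learn's own code; the floor of at least one feature is an added assumption.

```python
import math


def resolve_max_features(max_features, n_features):
    """Hypothetical helper: convert a max_features setting into a count.

    The max(1, ...) floor is an assumption so that at least one feature
    is always considered.
    """
    if max_features is None:
        return n_features
    if isinstance(max_features, str):
        if max_features in ("auto", "sqrt"):
            return max(1, int(math.sqrt(n_features)))
        if max_features == "log2":
            return max(1, int(math.log2(n_features)))
        raise ValueError(f"unknown option: {max_features!r}")
    if isinstance(max_features, int):
        return max_features
    # Otherwise a float: a fraction of all features
    return max(1, int(max_features * n_features))


for setting in (None, "sqrt", "log2", 3, 0.5):
    print(setting, resolve_max_features(setting, 16))
```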

max_depth : integer or None, optional (default=None)

The maximum depth of the tree. If None, nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
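A quick way to see the effect is to compare the depths the fitted trees actually reach. This sketch assumes scikit-learn 0.21 or later (for get_depth()) and uses synthetic data for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

shallow = RandomForestClassifier(n_estimators=20, max_depth=3, random_state=0).fit(X, y)
deep = RandomForestClassifier(n_estimators=20, max_depth=None, random_state=0).fit(X, y)

# get_depth() reports the depth each fitted tree actually reached
shallow_depths = [t.get_depth() for t in shallow.estimators_]
deep_depths = [t.get_depth() for t in deep.estimators_]
print(max(shallow_depths), max(deep_depths))
```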

min_samples_split : int, float, optional (default=2)

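The minimum number of samples required to split an internal node:

If int, min_samples_split is taken as the minimum number.

If float, min_samples_split is a fraction, and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.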
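The float rule is plain arithmetic; the same ceil rule is described for min_samples_leaf below. min_samples_from_fraction is a hypothetical helper, and the fraction 0.25 is chosen only because it is exact in binary floating point.

```python
import math


def min_samples_from_fraction(fraction, n_samples):
    """Hypothetical helper: apply ceil(fraction * n_samples)."""
    return math.ceil(fraction * n_samples)


# With 1000 training samples, min_samples_split=0.25 means a node needs
# at least 250 samples before it may be split
print(min_samples_from_fraction(0.25, 1000))
# 0.25 * 1002 = 250.5, which is rounded up
print(min_samples_from_fraction(0.25, 1002))
```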

min_samples_leaf : int, float, optional (default=1)

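The minimum number of samples required to be at a leaf node:

If int, min_samples_leaf is taken as the minimum number.

If float, min_samples_leaf is a fraction, and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.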
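The constraint can be checked on the fitted trees through the low-level tree_ structure, where a node with no left child (children_left == -1) is a leaf. The sketch sets bootstrap=False (an assumption for simplicity) so every tree is trained on the full dataset and leaf sizes are plain sample counts.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

clf = RandomForestClassifier(
    n_estimators=10, min_samples_leaf=5, bootstrap=False, random_state=0
).fit(X, y)

leaf_sizes = []
for est in clf.estimators_:
    tree = est.tree_
    # Leaves are the nodes without a left child
    is_leaf = tree.children_left == -1
    leaf_sizes.extend(tree.n_node_samples[is_leaf].tolist())

print(min(leaf_sizes))
```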

min_weight_fraction_leaf : float, optional (default=0.)

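The minimum weighted fraction of the sum total of weights (of all input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.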
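Turning the fraction into an absolute leaf weight: without sample_weight every sample counts as weight 1, so the total weight is simply the number of samples. min_leaf_weight is a hypothetical helper illustrating that arithmetic.

```python
def min_leaf_weight(fraction, sample_weight=None, n_samples=None):
    """Hypothetical helper: fraction * (sum of all sample weights).

    When sample_weight is None, every sample has weight 1 and the
    total weight equals n_samples.
    """
    if sample_weight is None:
        if n_samples is None:
            raise ValueError("need either sample_weight or n_samples")
        total = n_samples
    else:
        total = sum(sample_weight)
    return fraction * total


print(min_leaf_weight(0.25, n_samples=200))            # equal weights
print(min_leaf_weight(0.25, sample_weight=[1, 1, 2, 4]))  # total weight 8
```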

max_leaf_nodes : int or None, optional (default=None)

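Grow trees with max_leaf_nodes in best-first fashion. The best nodes are those giving the largest relative reduction in impurity. If None, the number of leaf nodes is unlimited.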
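A sketch checking that no fitted tree exceeds the leaf budget, assuming scikit-learn 0.21 or later (for get_n_leaves()) and synthetic data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

clf = RandomForestClassifier(n_estimators=20, max_leaf_nodes=8, random_state=0).fit(X, y)

# Number of leaves in each fitted tree
leaf_counts = [t.get_n_leaves() for t in clf.estimators_]
print(max(leaf_counts))
```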

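min_impurity_split : float, optional (default=None)

Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold; otherwise it is a leaf. This parameter was deprecated in scikit-learn 0.19 in favor of min_impurity_decrease and removed in 1.0.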

min_impurity_decrease : float, optional (default=0.)

A node will be split if the split induces a decrease in impurity greater than or equal to this value.
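The scikit-learn documentation weights the decrease by the node's share of the data: N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity), where N is the total number of samples, N_t the number at the current node, and N_t_L, N_t_R the numbers in the left and right children. A sketch of that formula using Gini impurity; gini and weighted_impurity_decrease are hypothetical helper names, and sample weights are assumed to be all 1.

```python
def gini(counts):
    """Gini impurity from per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def weighted_impurity_decrease(parent, left, right, n_total):
    """parent/left/right are per-class counts; n_total is N."""
    n_t = sum(parent)
    n_l, n_r = sum(left), sum(right)
    return n_t / n_total * (
        gini(parent) - n_r / n_t * gini(right) - n_l / n_t * gini(left)
    )


# A perfect split of an evenly mixed node that holds 10 of the 20 training samples
d = weighted_impurity_decrease([5, 5], [5, 0], [0, 5], n_total=20)
print(d)
```

Here the node's own impurity drop is 0.5, but because it holds only half of the data the weighted decrease is 0.25, so this split would be made with min_impurity_decrease=0.2 and rejected with 0.3.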

bootstrap : boolean, optional (default=True)

Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.
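A bootstrap sample draws n_samples row indices with replacement, so on average only about 63% of the distinct rows (1 - 1/e) appear in each tree's sample; the rows left out are the "out-of-bag" samples. This sketch shows the idea with the standard library; bootstrap_indices is a hypothetical helper, not scikit-learn's sampler.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Hypothetical helper: draw n_samples row indices uniformly with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


rng = random.Random(0)
n = 10_000
idx = bootstrap_indices(n, rng)

# Fraction of distinct rows that made it into this bootstrap sample
unique_fraction = len(set(idx)) / n
print(len(idx), round(unique_fraction, 3))
```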

oob_score : bool (default=False)

Whether to use out-of-bag samples to estimate the generalization accuracy.
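The out-of-bag estimate requires bootstrap=True, since each sample is scored only by the trees that did not see it. A sketch on synthetic data; after fitting, the score is available as oob_score_.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

clf = RandomForestClassifier(
    n_estimators=100, bootstrap=True, oob_score=True, random_state=0
)
clf.fit(X, y)

# Accuracy estimated from the out-of-bag predictions, without a separate test set
print(round(clf.oob_score_, 3))
```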

n_jobs : integer, optional (default=1)

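The number of jobs to run in parallel for both fit and predict. If -1, the number of jobs is set to the number of cores. (Since scikit-learn 0.20 the default is None, which means 1 unless a joblib parallel_backend context says otherwise.)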
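With a fixed random_state, changing n_jobs should only change how the work is scheduled, not the trees that are built. This sketch compares a serial and a parallel fit on synthetic data; probabilities are compared with np.allclose rather than exact equality to tolerate summation-order differences.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

serial = RandomForestClassifier(n_estimators=50, n_jobs=1, random_state=0).fit(X, y)
parallel = RandomForestClassifier(n_estimators=50, n_jobs=-1, random_state=0).fit(X, y)

# Same seed, same trees: predicted probabilities should agree
same = np.allclose(serial.predict_proba(X), parallel.predict_proba(X))
print(same)
```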

random_state : int, RandomState instance or None, optional (default=None)

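If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.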
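Passing an int makes the forest reproducible: two fits with the same seed on the same data give the same model. A sketch on synthetic data, comparing the fitted feature importances.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

a = RandomForestClassifier(n_estimators=30, random_state=42).fit(X, y)
b = RandomForestClassifier(n_estimators=30, random_state=42).fit(X, y)

# Identical seeds should yield identical forests
same_seed = np.allclose(a.feature_importances_, b.feature_importances_)
print(same_seed)
```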

verbose : int, optional (default=0)

Controls the verbosity of the tree-building process.

warm_start : bool, optional (default=False)

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new forest.
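A typical warm_start pattern is to raise n_estimators and call fit again, so only the additional trees are trained and the existing ones are kept. A sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

clf = RandomForestClassifier(n_estimators=10, warm_start=True, random_state=0)
clf.fit(X, y)
first_tree = clf.estimators_[0]

# Grow the forest to 20 trees: the first 10 are reused, 10 new ones are built
clf.set_params(n_estimators=20)
clf.fit(X, y)
print(len(clf.estimators_), clf.estimators_[0] is first_tree)
```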

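class_weight : dict, list of dicts, "balanced", "balanced_subsample" or None, optional (default=None)

Weights associated with classes in the form {class_label: weight}. If not given, all classes are assumed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.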

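Note that for multi-output (including multilabel) problems, weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification the weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1: 1}, {2: 5}, {3: 1}, {4: 1}].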
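A sketch of the list-of-dicts form for a two-column multilabel target. The data are synthetic and the labels are derived from the features purely for illustration; each dict lists every class of its own column.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.rand(100, 4)
# Two binary label columns built from the features (illustration only)
Y = np.column_stack([
    (X[:, 0] > 0.5).astype(int),
    (X[:, 1] > 0.7).astype(int),
])

# One dict per column of Y; class 1 of the second column is up-weighted
clf = RandomForestClassifier(
    n_estimators=20,
    class_weight=[{0: 1, 1: 1}, {0: 1, 1: 5}],
    random_state=0,
).fit(X, Y)

print(clf.n_outputs_, clf.predict(X[:3]).shape)
```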

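The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data, as n_samples / (n_classes * np.bincount(y)).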
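The "balanced" formula worked out on a small label list. balanced_weights is a hypothetical helper that applies the formula above with collections.Counter instead of np.bincount, so it also works for non-integer labels; scikit-learn exposes the same calculation through sklearn.utils.class_weight.compute_class_weight.

```python
from collections import Counter


def balanced_weights(y):
    """Hypothetical helper: n_samples / (n_classes * count(class)) per class."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in sorted(counts.items())}


# 4 samples of class 0 and 2 of class 1: the rarer class gets twice the weight
y = [0, 0, 0, 0, 1, 1]
print(balanced_weights(y))  # {0: 0.75, 1: 1.5}
```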

The "balanced_subsample" mode is the same as "balanced", except that the weights are computed from the bootstrap sample of each tree grown.

For multi-output, the weights of each column of y are multiplied together.

Note that if sample_weight is specified (passed through the fit method), these weights are multiplied with sample_weight.
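The last two rules amount to simple multiplication: per-column class weights are multiplied together for each sample, and the result is multiplied by that sample's sample_weight if one is given. effective_sample_weights is a hypothetical helper illustrating this arithmetic, not scikit-learn's code (the library's own helper for the class-weight part is sklearn.utils.class_weight.compute_sample_weight).

```python
def effective_sample_weights(Y, class_weights, sample_weight=None):
    """Hypothetical helper.

    Y: list of label tuples, one entry per output column.
    class_weights: one {label: weight} dict per output column.
    """
    weights = []
    for i, labels in enumerate(Y):
        w = 1.0
        # Multiply the class weights of every output column
        for col, label in enumerate(labels):
            w *= class_weights[col][label]
        # Then scale by the per-sample weight, if provided
        if sample_weight is not None:
            w *= sample_weight[i]
        weights.append(w)
    return weights


Y = [(0, 1), (1, 1), (0, 0)]
cw = [{0: 1, 1: 2}, {0: 1, 1: 5}]
print(effective_sample_weights(Y, cw))               # [5.0, 10.0, 1.0]
print(effective_sample_weights(Y, cw, [1, 1, 0.5]))  # [5.0, 10.0, 0.5]
```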

Original source: https://blog.csdn.net/ustbclearwang/article/details/81237516