How to force a Python decision tree to keep splitting only one node at a time (one new node/leaf per step)

One approach I've found is to build a decision tree with max_depth=1, which splits the data into two leaves. I then pick the leaf with the highest impurity, fit a depth-1 tree again on just that subset, and repeat until the remaining subset is smaller than 2*min_leaf_size or can no longer be split. To keep the hierarchy visible, I relabel the leaf_ids so that the ID values decrease as you move up the tree. Here is an example:

    import numpy as np
    import pandas as pd  # X is expected to be a pandas DataFrame (the code uses X.loc)
    from sklearn.tree import DecisionTreeClassifier

    def decision_tree_one_path(X, y=None, min_leaf_size=3):
        nobs = X.shape[0]
        # boolean vector to include observations in the newest split
        include = np.ones((nobs,), dtype=bool)
        # try to get leaves around min_leaf_size
        min_leaf_size = max(min_leaf_size, 1)
        # one-level DT splitter
        dtmodel = DecisionTreeClassifier(splitter="best", criterion="gini", max_depth=1,
                                         min_samples_split=int(np.round(2.05 * min_leaf_size)))
        leaf_id = np.ones((nobs,), dtype='int64')
        step = 0  # renamed from `iter` to avoid shadowing the built-in
        if y is None:
            y = np.random.binomial(n=1, p=0.5, size=nobs)
        y = np.asarray(y)  # allow lists / Series to be indexed with a boolean mask
        while nobs >= 2 * min_leaf_size:
            dtmodel.fit(X=X.loc[include], y=y[include])
            # node ids of the two new leaves (1 = left child, 2 = right child)
            new_leaf_names = dtmodel.apply(X=X.loc[include])
            impurities = dtmodel.tree_.impurity[1:]
            if len(impurities) == 0:
                # was not able to split while maintaining the constraint
                break
            # make sure the node that is not split further gets the lower label 1
            most_impure_node = np.argmax(impurities)
            if most_impure_node == 0:  # i.e., label 1
                # swap labels 1 and 2
                is_label_2 = new_leaf_names == 2
                new_leaf_names[is_label_2] = 1
                new_leaf_names[np.logical_not(is_label_2)] = 2
            # rename leaves
            leaf_id[include] = step + new_leaf_names
            will_be_split = new_leaf_names == 2
            # drop the other leaf from the next fit
            tmp = np.ones((nobs,), dtype=bool)
            tmp[np.logical_not(will_be_split)] = False
            include[include] = tmp
            # number of observations left in the branch that will be split next
            nobs = np.sum(will_be_split)
            step = step + 1
        return leaf_id

leaf_id therefore gives each observation the ID of its leaf, in the order the leaves were split off. For example, leaf_id==1 marks the observations that were split off into a terminal node first. leaf_id==2 marks the next terminal node, split off from the branch that remained after the split that produced leaf_id==1, as shown below. There are therefore k+1 leaves.

    0
    |\
    1 .
      |\
      2 .
         .
          .
           |\
           k (k+1)

However, I'd like to know whether there is a way to do this automatically in Python.
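For comparison, the closest built-in knob in scikit-learn appears to be the max_leaf_nodes parameter of DecisionTreeClassifier, which makes the tree grow best-first: at each step it splits the frontier leaf with the largest impurity reduction, rather than the most impure leaf as in the loop above. Because that leaf can sit on any branch, the result is not guaranteed to be the single chain in the diagram. The sketch below is illustrative only: the helper names best_first_tree and is_single_path, the synthetic data from make_classification, and the choice of min_samples_leaf are assumptions, not part of the original approach. is_single_path checks the shape through tree_.children_left and tree_.children_right.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def is_single_path(fitted_tree):
    """Hypothetical helper: True if every internal node has at most one
    internal child, i.e. the tree is a single chain of splits."""
    t = fitted_tree.tree_
    left, right = t.children_left, t.children_right
    is_internal = left != -1  # leaves have children_left == -1
    for node in np.flatnonzero(is_internal):
        n_internal_children = int(is_internal[left[node]]) + int(is_internal[right[node]])
        if n_internal_children > 1:
            return False
    return True


def best_first_tree(X, y, n_leaves, min_leaf_size=3, random_state=0):
    # Hypothetical wrapper. Setting max_leaf_nodes switches scikit-learn to
    # best-first growth, which may split a leaf on any branch, so a single
    # chain is NOT guaranteed.
    model = DecisionTreeClassifier(
        criterion="gini",
        max_leaf_nodes=n_leaves,
        min_samples_leaf=min_leaf_size,
        random_state=random_state,
    )
    return model.fit(X, y)


# Synthetic data, for illustration only.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
tree = best_first_tree(X, y, n_leaves=6)
print("leaves:", tree.get_n_leaves())
print("single path:", is_single_path(tree))
```

If is_single_path returns False, the built-in tree has branched, and a manual one-leaf-at-a-time loop like decision_tree_one_path is still needed to get the chain structure.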
