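I am trying to extend the Splitter class in sklearn, which is used together with sklearn's decision tree classes. More specifically, I want to add a feature_weights variable to the new class, which will influence how the best split point is determined by scaling the purity calculation in proportion to the feature weights.
The new class is almost an exact copy of sklearn's BestSplitter class ( https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_splitter.pyx ), with only minor changes. Here is what I have so far: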
cdef class WeightedBestSplitter(WeightedBaseDenseSplitter):

    cdef object feature_weights  # new variable - 1D array of feature weights

    def __reduce__(self):
        # same as sklearn BestSplitter (basically)

    # NEW METHOD
    def set_weights(self, object feature_weights):
        feature_weights = np.asfortranarray(feature_weights, dtype=DTYPE)
        self.feature_weights = feature_weights

    cdef int node_split(self, double impurity, SplitRecord* split,
                        SIZE_t* n_constant_features) nogil except -1:
        # .... same as sklearn BestSplitter ....
        current_proxy_improvement = self.criterion.proxy_impurity_improvement()
        current_proxy_improvement *= self.feature_weights[<int>(current.feature)]  # new line
        # .... same as sklearn BestSplitter ....
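A few notes about the above: I am using the object variable type and np.asfortranarray because that is how the variable X is defined and set elsewhere, and X is indexed the same way I am trying to index feature_weights. Also, current.feature has the variable type SIZE_t according to the _splitter.pxd file ( https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_splitter.pxd ).
The problem appears to be caused by self.feature_weights. The code above throws multiple errors, but even trying to reference something like self.feature_weights[0] and assign it to another variable throws this error: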
Indexing Python object not allowed without gil
I would like to know what I need to do to index self.feature_weights and use the resulting scalar value as a multiplier.
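To make the intended effect of the feature weights concrete, here is a minimal plain-Python sketch of the selection rule. It is not Cython and not sklearn code: weighted_best_split and the candidate tuples are hypothetical names made up for illustration, and it assumes the improvement values are non-negative so that a larger weight favors a feature.

```python
# Plain-Python illustration only; the real splitter is Cython and
# evaluates candidates while scanning the samples of a node.

def weighted_best_split(candidates, feature_weights):
    """Return ((feature, threshold), score) for the best weighted candidate.

    candidates: list of (feature_index, threshold, proxy_improvement) tuples
        (hypothetical structure, assumed non-negative improvements).
    feature_weights: sequence with one weight per feature.
    """
    best = None
    best_score = float("-inf")
    for feature, threshold, proxy_improvement in candidates:
        # Scale the improvement by the weight of the feature being split on.
        score = proxy_improvement * feature_weights[feature]
        if score > best_score:
            best_score = score
            best = (feature, threshold)
    return best, best_score


candidates = [
    (0, 1.5, 10.0),  # feature 0 has the largest raw improvement
    (1, 0.3, 8.0),
    (2, 7.0, 4.0),
]

# With equal weights the raw improvement decides the split.
unweighted = weighted_best_split(candidates, [1.0, 1.0, 1.0])

# Down-weighting feature 0 lets feature 1 win instead.
weighted = weighted_best_split(candidates, [0.5, 1.0, 1.0])
```

With equal weights feature 0 is chosen; halving its weight drops its scaled improvement below that of feature 1, so the split moves to feature 1. This is the behavior the extra multiplication in node_split is meant to produce.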