在将点添加到节点之前是否预先制作了二元分区树?

我正在尝试在 Python 中实现这个算法,但是由于我缺乏对树结构的理解,我对分区树的创建过程感到困惑。

简要说明

链接算法,用于将高维特征空间划分为内部节点和叶节点,以便可以快速执行查询。

它使用特定的随机测试划分大空间,超平面将一个大单元格分成两个。

这个答案更准确地解释了一切

http://img4.mukewang.com/614bd3b300012e1102980149.jpg

代码片段:


def random_test(self, main_point):  # Main point is np.ndarray instance

    dimension = main_point.ravel().size

    random_coefficients = self.random_coefficients(dimension)

    scale_values = np.array(sorted([np.inner(random_coefficients, point.ravel())

                                    for point in self.points]))

    percentile = random.choice([np.percentile(scale_values, 100 * self.ratio),  # Just as described on Section 3.1

                                np.percentile(scale_values, 100 * (1 - self.ratio))])

    main_term = np.inner(main_point.ravel(), random_coefficients)

    if self.is_leaf():

        return 0  # Next node is the center leaf child

    else:

        if (main_term - percentile) >= 0:  # Hyper-plane equation defined in the document

            return -1  # Next node is the left child

        else:

            return 1  # Next node is the right child

self.ratio正如上面链接的算法中提到的,正在确定树的平衡和浅层,1/2它应该生成最平衡和浅层的树。


然后我们进入迭代部分,树不断地划分空间,直到它到达叶节点(注意关键字reach),问题是,它永远不会真正到达叶节点。


因为,上面链接的文档中叶节点的定义是这样的:


def is_leaf(self):

    return (self.capacity * self.ratio) <= self.cell_count() <= self.capacity

其中self.cell_count()是单元格中self.capacity的点数,是单元格可以拥有的最大点数,self.ratio是分割比。


我的完整代码应该通过在初始迭代时创建新节点来划分特征空间,直到创建叶节点(但从未创建叶节点)。请参阅包含除法过程的片段。

http://img1.mukewang.com/614bd3c900019d0907550264.jpg

在我们向它们添加任何点之前是否准备好二叉树(填充空节点)?如果是这样,我们不需要定义树的级别(深度)吗?

如果不是,是否在向其中添加点时创建了二元分区树?如果是这样,那么第一个点(来自第一次迭代)是如何添加到树中的?


拉莫斯之舞
浏览 143回答 2
2回答

偶然的你

它们是在您进行时构建的。第一个节点在第 1 行的右侧或左侧。然后下一个级别询问第 2 行的右侧或左侧……您提供的论文中的插图显示了这一点,其中的行编号与用于查找节点的选择相关联。当然,向右或向左是不准确的。一些线被水平切割。但是创建的空间是二进制的。

阿晨1998

我已经能够测试评论中提到的新方法,并且效果很好。上面链接的算法隐含地指出,该点应单独下降到分区树中,通过所有随机测试并在下降时创建新节点。但是这种方法有一个明显的问题,因为为了获得平衡的高效浅层树,左右节点必须均匀分布。因此,为了分裂节点,在树的每一层,节点的每个点都必须传递到左节点或右节点(通过随机测试),直到树到达该层所有节点的深度叶子。在数学术语中,根节点包含一个向量空间,该向量空间被分为左右两个节点,其中包含以分离超平面支撑超平面为边界的凸多面体:等式的负项(我相信我们可以称之为偏差),是分裂比开始发挥作用的地方,它应该是 100*r 到 100*(1-r) 之间所有节点的百分位数,这样树就被分离了更均匀,更浅。基本上它决定了超平面分离应该如何均匀,这就是为什么我们需要包含所有点的节点。我已经能够实现这样的系统:def index_space(self):&nbsp; &nbsp; shuffled_space = self.shuffle_space()&nbsp; &nbsp; current_tree = PartitionTree()&nbsp; &nbsp; level = 0&nbsp; &nbsp; root_node = RootNode(shuffled_space, self.capacity, self.split_ratio, self.indices)&nbsp; &nbsp; current_tree.root_node = root_node&nbsp; &nbsp; current_tree.node_array.append(root_node)&nbsp; &nbsp; current_position = root_node.node_position&nbsp; &nbsp; node_array = {0: [root_node]}&nbsp; &nbsp; while True:&nbsp; &nbsp; &nbsp; &nbsp; current_nodes = node_array[level]&nbsp; &nbsp; &nbsp; &nbsp; if all([node.is_leaf() for node in current_nodes]):&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; break&nbsp; &nbsp; &nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; level += 1&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; node_array[level] = []&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; for current_node in current_nodes:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if not current_node.is_leaf():&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; left_child = InternalNode(self.capacity, self.split_ratio, self.indices,&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; self._append(current_position, [-1]), current_node)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; right_child = InternalNode(self.capacity, self.split_ratio, self.indices,&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;self._append(current_position, [1]), current_node)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; for point in current_node.list_points():&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if current_node.random_test(point) == 1:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; right_child.add_point(point)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; left_child.add_point(point)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; node_array[level].extend([left_child, right_child])其中node_array包含树的所有节点(根、内部和叶)。不幸的是,node.random_test(x)方法:def random_test(self, main_point):&nbsp; &nbsp; random_coefficients = self.random_coefficients()&nbsp; &nbsp; scale_values = [np.inner(self.random_coefficients(), point[:self.indices].ravel())&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; for point in self.points]&nbsp; &nbsp; percentile = np.percentile(scale_values, self.ratio * 100)&nbsp; &nbsp; main_term = np.inner(main_point[:self.indices].ravel(), random_coefficients)&nbsp; &nbsp; if self.is_leaf():&nbsp; &nbsp; &nbsp; &nbsp; return 0&nbsp; # Next node is the center leaf child&nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; if (main_term - percentile) >= 0:&nbsp; # Hyper-plane equation defined in the document&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; return -1&nbsp; # Next node is the left child&nbsp; &nbsp; &nbsp; &nbsp; else:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; return 1&nbsp; # Next node is the right child效率低下,因为计算百分位数需要太多时间。因此,我必须找到另一种计算百分位数的方法(也许通过执行短路二分搜索来优化百分位数)。结论:这只是克林顿·雷·穆里根 (Clinton Ray Mulligan) 答案的一个大扩展——它简要解释了创建此类树的解决方案,因此仍将作为公认的答案。我刚刚添加了更多细节,以防有人对实现随机二叉树感兴趣。
打开App,查看更多内容
随时随地看视频慕课网APP

相关分类

Python