algorithm - KD Tree: meaning of `leafsize` parameter
问题描述
I am looking at SciPy's KD Tree implementation. I am confused by the leafsize
parameter, which is said to be
The number of points at which the algorithm switches over to brute-force. Default: 16.
Is this the number of leaves in the BST? If so, that means by default, the KD tree will contain no more than 32 points. This seems unreasonably small, especially for my use case of k=2. Am I interpreting the parameter incorrectly?
解决方案
The parameter leafsize
does not control the maximum number of leaves in the tree (which isn't bounded), but rather the number of input points associated with a single leaf in the tree. Consider 64 points being added to a tree. If each point gets associated with a single leaf, you will get a full tree that is seven levels deep and any processing requiring a point will descend these seven levels to find it. Alternatively, if the leafsize is 16, you will get a tree that is only 3 levels deep and each leaf in the tree is associated with 16 points. Processing a point involves descending three levels of the tree and then testing each of the 16 points in the leaf (since they are unordered). This value is can be tuned for performance, and empirically, the best value depends on how exactly the tree is being used.
Conceptually, here is an example of the trees that would be built for different leafsize
values.
You can see how the leafsize
parameter short-circuits the tree building process in the scipy source code. This will leave between leafsize/2
and leafsize
points unordered in the leafs of the tree.
推荐阅读
- google-cloud-platform - 使用无服务器 VPC 连接器连接到 Cloud Run 服务时出现 403
- python - 使用 apache 气流将 Spark 作业结果加载到 BigQuery 时出错
- python - tk.Frame() 的背景被 tk.Label() 背景覆盖
- javascript - 是否有可以将属性用作枚举索引的 Javascript 数据结构?
- python - 如何让 continue 语句进入最外层循环?
- python - 在另一个只有 main 方法的模块中创建实例无缘无故地失败
- rust - 使用 Pest.rs,如何将捕获添加到树中?
- python - 为什么数据排序值返回无?
- python - Pycharm Constructor IDE - TypeError: __init__() 采用 1 个位置参数,但给出了 4 个
- javascript - 从 api 提取信息时如何降低网页上的 kb 负载