姚伟峰
[yaoweifeng0301@126.com]
http://www.cnblogs.com/Matrix_Yao/
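Problem
MIP (Maximum Inner Product)
- Input
  - Query vector (query): $q \in \mathbb{R}^d$
  - Database: $X = \{x_1, \dots, x_n\}$, where $x_i \in \mathbb{R}^d$
- Output
  - The $k$ database vectors with the largest inner-product similarity to the query vector: $\operatorname{top}k_{x_i \in X}\, \langle q, x_i \rangle$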
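MCS (Maximum Cosine Similarity)
- Input
  - Query vector (query): $q \in \mathbb{R}^d$
  - Database: $X = \{x_1, \dots, x_n\}$, where $x_i \in \mathbb{R}^d$
- Output
  - The $k$ database vectors with the largest cosine similarity to the query vector: $\operatorname{top}k_{x_i \in X}\, \dfrac{\langle q, x_i \rangle}{\lVert q \rVert \, \lVert x_i \rVert}$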
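As a reference point for the two problem statements, here is a minimal brute-force sketch. The function names `top_k_mip` and `top_k_mcs` and the toy data are made up for illustration; plain Python lists are used only to keep the example dependency-free, and zero vectors are assumed not to occur.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def top_k_mip(q, db, k):
    """Indices of the k database vectors with the largest inner product with q."""
    return sorted(range(len(db)), key=lambda i: -dot(q, db[i]))[:k]

def top_k_mcs(q, db, k):
    """Indices of the k database vectors with the largest cosine similarity to q.

    Assumes q and every database vector are non-zero.
    """
    def cos(i):
        return dot(q, db[i]) / (norm(q) * norm(db[i]))
    return sorted(range(len(db)), key=lambda i: -cos(i))[:k]

# Toy data, invented for this example.
db = [[1.0, 0.0], [3.0, 3.0], [0.0, 0.5]]
q = [1.0, 0.1]

mip_result = top_k_mip(q, db, 1)  # inner product rewards the long vector [3, 3]
mcs_result = top_k_mcs(q, db, 1)  # cosine rewards the best-aligned vector [1, 0]
```

The two results differ on this data, which is why MIP and MCS need different transformations before an L2 index can serve them.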
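Transformation
MIP → L2
This uses an order-preserving transformation:
Let $\phi = \max_i \lVert x_i \rVert$. For each query vector $q$ and each database vector $x_i$, apply the following transformations respectively: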
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827461-228074887.png)
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827278-1556939243.png)
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827239-43594218.png)
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827214-755556537.png)
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827431-1446332725.png)
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827175-2087929021.png)
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827362-174162307.png)
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827362-643710725.png)
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827142-1344546695.png)
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827091-475585545.png)
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827335-206258103.png)
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827278-1556939243.png)
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827239-43594218.png)
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827214-755556537.png)
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827431-1446332725.png)
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827175-2087929021.png)
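The formulas for this step are embedded as images, so the sketch below assumes the commonly used form of the MIP to L2 transformation rather than quoting the original: with $\phi = \max_i \lVert x_i \rVert$, map $q \mapsto q' = [q;\ 0]$ and $x_i \mapsto x_i' = [x_i;\ \sqrt{\phi^2 - \lVert x_i \rVert^2}]$. Then $\lVert q' - x_i' \rVert^2 = \lVert q \rVert^2 + \phi^2 - 2\langle q, x_i \rangle$, so ranking by ascending L2 distance in $d+1$ dimensions matches ranking by descending inner product in $d$ dimensions. The helper names and toy data are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sq_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def augment_database(db):
    """Map each x_i in R^d to [x_i, sqrt(phi^2 - ||x_i||^2)] in R^(d+1)."""
    phi_sq = max(dot(x, x) for x in db)  # phi^2 = max_i ||x_i||^2
    # max(0.0, ...) guards against tiny negative values caused by rounding.
    return [x + [math.sqrt(max(0.0, phi_sq - dot(x, x)))] for x in db]

def augment_query(q):
    """Map q in R^d to [q, 0] in R^(d+1)."""
    return q + [0.0]

# Toy data, invented for this example.
db = [[1.0, 0.0], [3.0, 3.0], [0.0, 0.5], [-2.0, 1.0]]
q = [1.0, 0.1]

# Ranking by descending inner product in the original space.
by_ip = sorted(range(len(db)), key=lambda i: -dot(q, db[i]))

# Ranking by ascending squared L2 distance in the augmented space.
db_aug, q_aug = augment_database(db), augment_query(q)
by_l2 = sorted(range(len(db)), key=lambda i: sq_l2(q_aug, db_aug[i]))
```

On this data `by_ip` and `by_l2` are the same list. Note that $\phi$ depends on the whole database, so inserting a vector with a larger norm would require re-augmenting the existing vectors.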
MCS → L2
Cosine similarity is the inner product (IP) taken after normalization:
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827404-1005852382.png)
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827175-2087929021.png)
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827430-1051346855.png)
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827232-80788411.png)
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827249-412643324.png)
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827488-967169710.png)
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827237-99682464.png)
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827431-1446332725.png)
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827214-755556537.png)
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827431-1446332725.png)
![](https://img2022.cnblogs.com/blog/46419/202202/46419-20220217193827175-2087929021.png)
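Since the formulas here are images, the following is a sketch of the standard identity rather than a transcription: for unit vectors $\hat q = q / \lVert q \rVert$ and $\hat x_i = x_i / \lVert x_i \rVert$, we have $\lVert \hat q - \hat x_i \rVert^2 = 2 - 2\cos(q, x_i)$, so ascending L2 distance on normalized vectors gives the same order as descending cosine similarity. The helper names and toy data are hypothetical, and all vectors are assumed to be non-zero.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sq_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def normalize(v):
    """Scale v to unit L2 norm (assumes v is non-zero)."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

# Toy data, invented for this example.
db = [[1.0, 0.0], [3.0, 3.0], [0.0, 0.5], [-2.0, 1.0]]
q = [1.0, 0.1]

q_hat = normalize(q)
db_hat = [normalize(x) for x in db]

# Cosine similarity equals the inner product of the normalized vectors.
by_cos = sorted(range(len(db)), key=lambda i: -dot(q_hat, db_hat[i]))

# Ascending squared L2 distance between the normalized vectors.
by_l2 = sorted(range(len(db)), key=lambda i: sq_l2(q_hat, db_hat[i]))
```

Unlike the MIP case, no extra dimension is needed and the normalization of each vector is independent of the rest of the database.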
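Practical applicability
IVF Based Indexing, usage modes:
- No transformation in the training stage, transformation in the recall stage
  Supported.
  The training stage still builds the index with IP or cosine similarity; the recall stage searches with L2 distance under the corresponding transformation.
- Transformation in both the training stage and the recall stage
  - MIP: supported, but the training procedure must be modified. Note: during training, each centroid is the mean of the vectors assigned to it, so after every iteration computes the new centroids, all of them must first be passed through the transformation above again in full, from $d$ dimensions to $d+1$ dimensions.
  - MCS: supported, but the training procedure must be modified. Note: during training, each centroid is the mean of the vectors assigned to it, so after every iteration computes the new centroids, all of them must first be re-normalized.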
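The centroid fix-up described in the two bullets above can be sketched as follows. This is one reading of the note, not the author's implementation: it assumes centroids are plain cluster means, that MCS vectors are already unit-normalized, and that MIP vectors are already augmented to $d+1$ dimensions with a known $\phi$. The function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def mcs_centroid(members):
    """MCS update: the mean of unit vectors is usually shorter than 1, so re-normalize."""
    c = mean(members)
    n = math.sqrt(dot(c, c))  # assumes the mean is non-zero
    return [x / n for x in c]

def mip_centroid(members_aug, phi):
    """MIP update: average, keep the first d coordinates, redo the d -> d+1 step."""
    c = mean(members_aug)[:-1]  # drop the averaged extra coordinate
    return c + [math.sqrt(max(0.0, phi * phi - dot(c, c)))]

# MCS toy cluster: two unit vectors.
c_mcs = mcs_centroid([[1.0, 0.0], [0.0, 1.0]])

# MIP toy cluster: [2, 0] and [0, 1] augmented with phi = 2.
phi = 2.0
members_aug = [[2.0, 0.0, 0.0], [0.0, 1.0, math.sqrt(3.0)]]
raw_mean = mean(members_aug)  # squared norm 2.0, no longer phi^2 = 4.0
c_mip = mip_centroid(members_aug, phi)
```

After the fix-up, `c_mcs` has unit norm and `c_mip` has norm $\phi$, the same as every transformed database vector, which is what the plain mean fails to preserve.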