arrays - Non-pairwise distance between two matrices, as an overall matrix-to-matrix proximity measure
Problem description
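I want to compute the overall proximity of two matrices. The matrices are very large. If we do a pairwise computation, aren't we just computing the distance for each number at each position separately? Does it make sense to sum those values? Would something in between, like KNN, make sense here?
I have Euclidean distance and cosine similarity, but both are pairwise. I want to compare all the vectors at once.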
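import numpy as np
import scipy.spatial

def euclidean_distance_of_two_matrices(x, y):
    # Stack the 'topics' vectors and the 'tags' vectors of x into one 2-D matrix.
    A = np.concatenate([np.array(list(x['topics'].values())), np.array(list(x['tags'].values()))])
    print(A)
    # Note: B is built from x, not y, so A and B are identical here.
    B = np.concatenate([np.array(list(x['topics'].values())), np.array(list(x['tags'].values()))])
    print(B)
    print("similarity pairwise here")
    print(scipy.spatial.distance_matrix(A, B))
Output: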
[[-0.66103 0.27502 -0.4007 ... -1.2427 0.2829 -0.79741 ]
[-0.27628 0.13999 0.098519 ... -0.15686 -0.14187 -0.26488 ]
[-0.11585 -0.05561 0.32372 ... -0.22155 -0.30258 -0.26258 ]
[-0.68621 -0.21032 0.30084 ... -0.038338 -0.44363 0.17988 ]
[ 0.067032 -0.10813 0.44981 ... -0.15073 -0.25662 0.08055 ]
[-0.16147 0.040132 0.66291 ... -0.41689 0.0051422 0.6892 ]]
[[-0.66103 0.27502 -0.4007 ... -1.2427 0.2829 -0.79741 ]
[-0.27628 0.13999 0.098519 ... -0.15686 -0.14187 -0.26488 ]
[-0.11585 -0.05561 0.32372 ... -0.22155 -0.30258 -0.26258 ]
[-0.68621 -0.21032 0.30084 ... -0.038338 -0.44363 0.17988 ]
[ 0.067032 -0.10813 0.44981 ... -0.15073 -0.25662 0.08055 ]
[-0.16147 0.040132 0.66291 ... -0.41689 0.0051422 0.6892 ]]
0.0
similarity pairwise here
[[ 0. 9.25525143 9.31467432 10.85744589 8.96540006 10.48973378]
[ 9.25525143 0. 7.67091186 9.85484951 7.510602 7.80745963]
[ 9.31467432 7.67091186 0. 9.47533533 7.48946392 8.78024467]
[10.85744589 9.85484951 9.47533533 0. 9.18870369 9.19037631]
[ 8.96540006 7.510602 7.48946392 9.18870369 0. 7.85130008]
[10.48973378 7.80745963 8.78024467 9.19037631 7.85130008 0. ]]
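The pairwise matrix printed above is a row-versus-row table: entry [i, j] of distance_matrix(A, B) is the Euclidean distance between row i of A and row j of B, not a per-number difference. A minimal sketch that checks this against an explicit NumPy broadcast, using small random matrices as made-up stand-ins for the topic/tag vectors:

```python
import numpy as np
from scipy.spatial import distance_matrix

# Hypothetical stand-in data: 4 row vectors in A, 5 in B, each of dimension 3.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))
B = rng.normal(size=(5, 3))

D = distance_matrix(A, B)  # one entry per (row of A, row of B) pair

# Same result by hand: broadcast to shape (4, 5, 3), then take the norm over the last axis.
D_manual = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

print(D.shape)                   # (4, 5)
print(np.allclose(D, D_manual))  # True

# A matrix compared with itself has zeros on the diagonal.
print(np.allclose(np.diag(distance_matrix(A, A)), 0.0))  # True
```

The zero diagonal in the last check matches the question's output, where A and B were built from the same input.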
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def cosine_similarity_distance_of_two_matrices(x, y):
    A = np.concatenate([np.array(list(x['topics'].values())), np.array(list(x['tags'].values()))])
    print(A)
    # Note: B is built from x, not y, so A and B are identical here.
    B = np.concatenate([np.array(list(x['topics'].values())), np.array(list(x['tags'].values()))])
    print(B)
    # Entry [i, j] is the cosine similarity between row i of A and row j of B.
    similarity = cosine_similarity(A, B)
    print("similarity pairwise here")
    print(similarity)
Output:
[[-0.66103 0.27502 -0.4007 ... -1.2427 0.2829 -0.79741 ]
[-0.27628 0.13999 0.098519 ... -0.15686 -0.14187 -0.26488 ]
[-0.11585 -0.05561 0.32372 ... -0.22155 -0.30258 -0.26258 ]
[-0.68621 -0.21032 0.30084 ... -0.038338 -0.44363 0.17988 ]
[ 0.067032 -0.10813 0.44981 ... -0.15073 -0.25662 0.08055 ]
[-0.16147 0.040132 0.66291 ... -0.41689 0.0051422 0.6892 ]]
[[-0.66103 0.27502 -0.4007 ... -1.2427 0.2829 -0.79741 ]
[-0.27628 0.13999 0.098519 ... -0.15686 -0.14187 -0.26488 ]
[-0.11585 -0.05561 0.32372 ... -0.22155 -0.30258 -0.26258 ]
[-0.68621 -0.21032 0.30084 ... -0.038338 -0.44363 0.17988 ]
[ 0.067032 -0.10813 0.44981 ... -0.15073 -0.25662 0.08055 ]
[-0.16147 0.040132 0.66291 ... -0.41689 0.0051422 0.6892 ]]
similarity pairwise here
[[ 0.99999964 0.2620098 0.20013984 0.0369565 0.16275422 -0.0098459 ]
[ 0.2620098 1.0000001 0.38454026 0.11589497 0.32478386 0.36860135]
[ 0.20013984 0.38454026 0.99999976 0.12082676 0.24987505 0.12946124]
[ 0.0369565 0.11589497 0.12082676 0.9999998 0.0517699 0.18130727]
[ 0.16275422 0.32478386 0.24987505 0.0517699 0.99999976 0.18552886]
[-0.0098459 0.36860135 0.12946124 0.18130727 0.18552886 0.9999999 ]]
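The question asks for one number rather than a pairwise table. One common option, not used in the original post, is to flatten each matrix into a single long vector and take the cosine similarity of those two vectors, i.e. sum(A * B) divided by the product of the Frobenius norms of A and B. A minimal sketch, assuming A and B have the same shape; the helper name `matrix_cosine_similarity` and the example data are hypothetical:

```python
import numpy as np

def matrix_cosine_similarity(A: np.ndarray, B: np.ndarray) -> float:
    """Cosine similarity of two same-shaped matrices, treated as flat vectors."""
    if A.shape != B.shape:
        raise ValueError(f"shape mismatch: {A.shape} vs {B.shape}")
    a = A.ravel()
    b = B.ravel()
    # Frobenius inner product divided by the product of the Frobenius norms.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical example data.
A = np.array([[1.0, 0.0], [0.0, 1.0]])
B = np.array([[1.0, 0.0], [0.0, -1.0]])

print(matrix_cosine_similarity(A, A))  # approximately 1.0
print(matrix_cosine_similarity(A, B))  # 0.0
```

Unlike the pairwise cosine_similarity(A, B) table above, this value depends on row i of A lining up with row i of B, so it only makes sense when the rows of the two matrices correspond.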
Solution
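import numpy as np

def euclidean_distance_of_two_matrices(x, y):
    A = np.concatenate([np.array(list(x['topics'].values())), np.array(list(x['tags'].values()))])
    print("Matrix 1")
    print(A)
    # B is now built from y, so the two inputs are actually compared.
    B = np.concatenate([np.array(list(y['topics'].values())), np.array(list(y['tags'].values()))])
    print("Matrix 2")
    print(B)
    print("The Euclidean Distance Is")
    # Element-wise difference of two same-shaped matrices, squared and summed.
    # Note: this is the *squared* Euclidean (Frobenius) distance; np.sqrt(dist_squared),
    # or equivalently np.linalg.norm(A - B), gives the distance itself.
    dist_squared = np.sum(np.square(A - B))
    print(dist_squared)
Output: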
Matrix 1
[[-0.66103 0.27502 -0.4007 ... -1.2427 0.2829 -0.79741 ]
[-0.27628 0.13999 0.098519 ... -0.15686 -0.14187 -0.26488 ]
[-0.11585 -0.05561 0.32372 ... -0.22155 -0.30258 -0.26258 ]
[-0.68621 -0.21032 0.30084 ... -0.038338 -0.44363 0.17988 ]
[ 0.067032 -0.10813 0.44981 ... -0.15073 -0.25662 0.08055 ]
[-0.16147 0.040132 0.66291 ... -0.41689 0.0051422 0.6892 ]]
Matrix 2
[[-0.68621 -0.21032 0.30084 ... -0.038338 -0.44363 0.17988 ]
[-0.36459 0.11409 0.060372 ... -0.29483 0.11534 -0.25252 ]
[-0.68088 -0.3137 0.27078 ... 0.24844 -0.043204 -0.10115 ]
[-0.68621 -0.21032 0.30084 ... -0.038338 -0.44363 0.17988 ]
[ 0.19509 0.23105 -0.25786 ... -0.19888 -0.060843 -0.081697 ]
[-0.16147 0.040132 0.66291 ... -0.41689 0.0051422 0.6892 ]]
The Euclidean Distance Is
212.88983
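I'm not sure what pairwise differences scipy.spatial.distance_matrix(A, B) was computing, but this returns the correct result for me. (Two things changed: B is now built from y instead of x, and the two matrices are compared position by position instead of every row against every row.)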
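The value printed by the solution, np.sum(np.square(A - B)), is the squared Euclidean (Frobenius) distance; np.linalg.norm(A - B) is its square root. It also relates back to the pairwise approach and to the question of whether summing makes sense: the squared Frobenius distance equals the sum of the squared diagonal entries of distance_matrix(A, B), i.e. the squared distances between matching rows. A minimal sketch with made-up, same-shaped data:

```python
import numpy as np
from scipy.spatial import distance_matrix

# Hypothetical stand-in data; the element-wise comparison requires equal shapes.
rng = np.random.default_rng(1)
A = rng.normal(size=(6, 4))
B = rng.normal(size=(6, 4))

dist_squared = np.sum(np.square(A - B))  # what the solution prints
dist = np.linalg.norm(A - B)             # Frobenius norm of the difference

# Only the diagonal of the pairwise matrix compares row i of A with row i of B.
row_dists = np.diag(distance_matrix(A, B))

print(np.isclose(dist, np.sqrt(dist_squared)))           # True
print(np.isclose(dist_squared, np.sum(row_dists ** 2)))  # True
```

Summing the raw (unsquared) row distances would give a different, also valid, aggregate; the squared sum is what makes the result equal to the Frobenius distance.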