Base R cor() gives different results from similarity() in the recommenderlab package?

Problem description

Can anyone explain why these two correlation matrices return different results?

library(recommenderlab)
data(MovieLense)
cor_mat <- as( similarity(MovieLense, method = "pearson", which = "items"), "matrix" )
cor_mat_base <- suppressWarnings( cor(as(MovieLense, "matrix"), use = "pairwise.complete.obs") )
print( cor_mat[1:5, 1:5] )
print( cor_mat_base[1:5, 1:5] )

Tags: r, recommender-systems, recommenderlab

Solution


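recommenderlab's dissimilarity() corresponds to 1 - pmax(cor(), 0) in base R, so negative correlations are floored at 0. It is also important to make both functions use the same method (cor() defaults to method = "pearson"):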

library("recommenderlab")
data(MovieLense)
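# recommenderlab: item-item Pearson dissimilarity, converted to a plain matrix
cor_mat <- as( dissimilarity(MovieLense, method = "pearson",
                             which = "items"), "matrix" )
# base R: item-item Pearson correlation, each pair computed on the users who rated both items
cor_mat_base <- suppressWarnings( cor(as(MovieLense, "matrix"), method = "pearson",
                                      use = "pairwise.complete.obs") )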
print( cor_mat[1:5, 1:5] )
print(1- cor_mat_base[1:5, 1:5] )

> print( cor_mat[1:5, 1:5] )
                  Toy Story (1995) GoldenEye (1995) Four Rooms (1995) Get Shorty (1995) Copycat (1995)
Toy Story (1995)         0.0000000        0.7782159         0.8242057         0.8968647      0.6135248
GoldenEye (1995)         0.7782159        0.0000000         0.7694644         0.7554443      0.7824406
Four Rooms (1995)        0.8242057        0.7694644         0.0000000         1.0000000      0.8153877
Get Shorty (1995)        0.8968647        0.7554443         1.0000000         0.0000000      1.0000000
Copycat (1995)           0.6135248        0.7824406         0.8153877         1.0000000      0.0000000
> print(1- cor_mat_base[1:5, 1:5] )
                  Toy Story (1995) GoldenEye (1995) Four Rooms (1995) Get Shorty (1995) Copycat (1995)
Toy Story (1995)         0.0000000        0.7782159         0.8242057         0.8968647      0.6135248
GoldenEye (1995)         0.7782159        0.0000000         0.7694644         0.7554443      0.7824406
Four Rooms (1995)        0.8242057        0.7694644         0.0000000         1.2019687      0.8153877
Get Shorty (1995)        0.8968647        0.7554443         1.2019687         0.0000000      1.2373503
Copycat (1995)           0.6135248        0.7824406         0.8153877         1.2373503      0.0000000
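The pattern in the two tables can be reproduced without recommenderlab. The sketch below uses only base R (cor(), pmax()) on a small hypothetical ratings matrix; it illustrates the 1 - pmax(cor(), 0) relationship described above and is not recommenderlab's internal code. It shows that plain 1 - cor() and the clamped version disagree exactly where the correlation is negative, which is where 1 - cor() goes above 1.

```r
# Hypothetical toy ratings matrix (rows = users, columns = items); NA = not rated.
ratings <- matrix(c( 5,  4, NA, 1,
                     4,  5,  2, 2,
                     1,  2,  5, 5,
                    NA,  1,  4, 4,
                     2, NA,  5, 4),
                  nrow = 5, byrow = TRUE,
                  dimnames = list(NULL, paste0("item", 1:4)))

# Base R: item-item Pearson correlation, ignoring missing ratings pairwise.
r <- cor(ratings, method = "pearson", use = "pairwise.complete.obs")

# Plain 1 - r: greater than 1 wherever the correlation is negative.
d_plain <- 1 - r

# Clamped version: negative correlations are floored at 0,
# so the dissimilarity is capped at 1 (pmax keeps the matrix dims/dimnames).
d_clamped <- 1 - pmax(r, 0)

# Cells where the two versions disagree.
differs <- abs(d_plain - d_clamped) > 1e-12

print(round(d_plain, 3))
print(round(d_clamped, 3))
print(differs)
```

In this toy data several item pairs are negatively correlated, so the same kind of 1.x values seen in the 1 - cor() table above appear in d_plain, while d_clamped shows 1 in those cells.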

To understand this properly, check the documentation of both packages :).

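OP/edit: It is important to point out that even between dissimilarity() and 1 - cor() a few values still differ: in the output above, 1 - cor() is greater than 1 in some cells (for example Four Rooms vs. Get Shorty, 1.2019687), which means the correlation there is negative. This is because dissimilarity() floors the correlation at 0, so it never returns a value above 1, whereas cor() can return negative correlations. The cor() documentation (https://www.rdocumentation.org/packages/stats/versions/3.6.0/topics/cor) only states the following guarantee, for use = "all.obs":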

For r <- cor(*, use = "all.obs"), it is now guaranteed that all(abs(r) <= 1).

Whether the same bound is explicitly guaranteed for use = "pairwise.complete.obs" is worth checking.
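One way to check this is to look at how use = "pairwise.complete.obs" builds each entry. The base-R sketch below, on a hypothetical toy matrix, compares one entry of the pairwise correlation matrix with cor() computed by hand on only the rows where both columns are observed (selected with complete.cases()). Because each entry is the correlation of two ordinary vectors, it should stay within [-1, 1] up to floating-point rounding; the check at the end is an assumption-level sanity test, not a statement about how stats::cor is implemented internally.

```r
# Hypothetical toy ratings with missing values (NA = not rated).
ratings <- matrix(c( 5,  4, NA,
                     4,  5,  2,
                     1,  2,  5,
                    NA,  1,  4,
                     2, NA,  5),
                  nrow = 5, byrow = TRUE,
                  dimnames = list(NULL, c("a", "b", "c")))

r <- cor(ratings, use = "pairwise.complete.obs")

# The a-b entry should use only the rows where BOTH a and b are observed,
# so different entries of r can be based on different subsets of users.
ok_ab <- complete.cases(ratings[, c("a", "b")])
r_ab_manual <- cor(ratings[ok_ab, "a"], ratings[ok_ab, "b"])

print(r["a", "b"])
print(r_ab_manual)

# Largest absolute correlation in the matrix (expected to be at most 1,
# apart from possible floating-point rounding).
print(max(abs(r)))
```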

