Problem description

I have a data frame df. I need to find the correlation between ColE and ColF within each group.

   df = structure(list(ColA = c("A", "A", "A", "B", "B"), ColB = c("L", 
   "L", "L", "L", "K"), ColC = c("Sup1", "Sup1", "Sup2", "Sup1", 
   "Sup1"), ColD = c("Jan", "Feb", "Mar", "Apr", "May"), ColE = c(56, 
   59, 68, 45, 45), ColF = c(58, 60, 90, 65, 59)), row.names = c(NA, 
   -5L), class = c("tbl_df", "tbl", "data.frame"))
   ColA    ColB      ColC      ColD      ColE       ColF
    A       L         Sup1      Jan       56         58
    A       L         Sup1      Feb       59         60
    A       L         Sup2      Mar       68         90
    B       L         Sup1      Apr       45         65
    B       K         Sup1      May       45         59

For the groups defined by ColA and ColB, I need the correlation, so the output should look like:

   New ColA     New ColB       Correlation coeff
      A            L                   ---
      B            L                   ---
      B            K                   ---

Similarly, I may need the correlation coefficient for other groupings, for example:

     New ColA     New ColB      New ColC    Correlation coeff
      A            L               Sup1               ---
      A            L               Sup2               ---
      B            L               Sup1               ---   
      B            K               Sup1               --- 

Is there a way to do this?

Tags: r

Solution


data.table

> library(data.table)
> data.table(df)[, j = list(kor = cor(ColE, ColF)), by = list(ColA, ColB)]

   ColA ColB      kor
1:    A    L 0.982613
2:    B    L       NA
3:    B    K       NA
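
The NA values appear because those groups contain a single row, and a correlation is undefined for one observation. The same pattern should extend to the second grouping in the question (ColA, ColB, ColC). The following is a minimal sketch, assuming data.table is installed; the names dt, res and n, and the .N > 1 guard, are illustrative additions and not part of the original answer.

```r
library(data.table)

# Same data as in the question, rebuilt as a plain data.frame
df <- data.frame(
  ColA = c("A", "A", "A", "B", "B"),
  ColB = c("L", "L", "L", "L", "K"),
  ColC = c("Sup1", "Sup1", "Sup2", "Sup1", "Sup1"),
  ColD = c("Jan", "Feb", "Mar", "Apr", "May"),
  ColE = c(56, 59, 68, 45, 45),
  ColF = c(58, 60, 90, 65, 59)
)

dt <- as.data.table(df)  # hypothetical name, for illustration only

# One row per ColA/ColB/ColC group: n is the group size, kor the
# correlation of ColE and ColF. Groups with fewer than two rows get an
# explicit NA (illustrative guard; cor() cannot estimate anything there).
res <- dt[,
  .(n = .N,
    kor = if (.N > 1) cor(ColE, ColF) else NA_real_),
  by = .(ColA, ColB, ColC)
]

print(res)
```

Keeping the group size n next to kor makes it easier to see which coefficients rest on very few rows; a group with exactly two rows, for instance, will always give a correlation of 1 or -1 (or NA if either column is constant). The grouping columns can also be given as a character vector, e.g. by = c("ColA", "ColB", "ColC"), which may be convenient when the grouping changes between runs.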
