Matrix algebra in R: assigning values to an empty matrix from another matrix and a table

Problem description

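I have a sparse matrix of species presence and absence across many sites, i.e. a site-by-species matrix. I also have a second table that assigns each species to a cluster.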

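What I want is a new matrix (also sites x species, but with one column per cluster) that records, for each site, how many species from each cluster are present.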

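I have a for loop that does the job, but my original matrix is very large, with thousands of rows and columns, and the for loop is computationally expensive. Any ideas on how to do this with matrix algebra or a more elegant approach?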

My dataset is arranged as follows:

# 1. Species grouped in clusters

memb <- c(1,2,3,4,2,5)
names(memb) <- c("s1", "s2", "s3", "s4", "s5", "s6")
 

# 2. Number of clusters

z <- length(unique(memb))
z

[1] 5


# 3. Community matrix of species presence-absence across sites
d <- data.frame(grids=c("v1", "v1", "v2", "v2", "v3", "v3", "v3", 
                         "v1", "v3", "v4", "v2", "v1", "v4", "v3"),
                 sp=c("s1", "s3", "s2", "s3", "s1", "s2", "s3", 
                           "s4", "s4", "s4", "s5", "s6", "s5", "s5"))
M <- as.data.frame.matrix(table(d$grids, d$sp))

library(Matrix)
M <- Matrix(as.matrix(M), sparse=TRUE)
M

4 x 6 sparse Matrix of class "dgCMatrix"
   s1 s2 s3 s4 s5 s6
v1  1  .  1  1  .  1
v2  .  1  1  .  1  .
v3  1  1  1  1  1  .
v4  .  .  .  1  1  .

# 4. create empty matrix collapsed proportional to the original diversity.
cluster_comm <- Matrix(0, dim(M)[[1]], z)
tmp_comm <- Matrix(0, dim(M)[[1]], dim(M)[[2]])
rownames(cluster_comm) <- rownames(M)
rownames(tmp_comm) <- rownames(M)
colnames(tmp_comm) <- names(memb)[order(memb, decreasing = FALSE)]
colnames(cluster_comm) <- colnames(M)[1:z]

# 5. For loop to add species to reduced empty matrix based on group membership
for (m in 1:dim(M)[[1]]) {
  tmp_comm[m, ] <- as.numeric(M[m, names(memb)[order(memb,decreasing = FALSE)]])
  for (i in 1:z) {
    names <- names(memb)[memb == i]
    cluster_comm[m, i] <- sum(tmp_comm[m, names])
    colnames(cluster_comm)[i] <- names[[1]]
  }
}

# Expected outcome
cluster_comm

4 x 5 sparse Matrix of class "dgCMatrix"
   s1 s2 s3 s4 s6
v1  1  .  1  1  1
v2  .  2  1  .  .
v3  1  2  1  1  .
v4  .  1  .  1  .

Tags: r, matrix, sparse-matrix, matrix-multiplication

Solution


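Starting from step 3, here is an ugly solution that may be somewhat faster. I'm not sure whether merge makes sense for your case, so please test it with larger data. Converting M to a data.table may also improve performance.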

library(magrittr)
library(reshape2)
M <- as.data.frame.matrix(table(d$grids, d$sp))

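# Reshape M to long format, attach each species' cluster, then sum per site and cluster
M %>% cbind(., rw = rownames(.)) %>%
      melt %>%                                  # long format: rw, variable (species), value
      merge(.,
             (memb %>%
                   melt %>%                     # one-column data frame of cluster ids
                   cbind(., gr = rownames(.))), # gr = species name
              by.x = 'variable',
              by.y = 'gr',
              all.x = TRUE) %>%
       dcast(., rw ~ value.y, value.var = 'value.x', fun.aggregate = sum)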

This should return:

Using rw as id variables
  rw 1 2 3 4 5
1 v1 1 0 1 1 1
2 v2 0 2 1 0 0
3 v3 1 2 1 1 0
4 v4 0 1 0 1 0
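
The suggestion to switch to data.table could look roughly like the sketch below. It is not the answerer's code. It assumes the long-format table d from step 3 is still available, with one row per site/species presence, and the names dt, lookup and res are made up for illustration.

```r
library(data.table)

# Toy data from the question
d <- data.frame(grids = c("v1", "v1", "v2", "v2", "v3", "v3", "v3",
                          "v1", "v3", "v4", "v2", "v1", "v4", "v3"),
                sp = c("s1", "s3", "s2", "s3", "s1", "s2", "s3",
                       "s4", "s4", "s4", "s5", "s6", "s5", "s5"),
                stringsAsFactors = FALSE)
memb <- c(s1 = 1, s2 = 2, s3 = 3, s4 = 4, s5 = 2, s6 = 5)

dt <- as.data.table(d)

# Hypothetical lookup table: species name -> cluster id
lookup <- data.table(sp = names(memb), cluster = unname(memb))

# Update join: add the cluster id of each species to the long table
dt[lookup, on = "sp", cluster := i.cluster]

# Count presences per site and cluster, then spread clusters into columns;
# site/cluster combinations with no records are filled with length(NULL) = 0
res <- dcast(dt, grids ~ cluster, value.var = "sp", fun.aggregate = length)
res
```

Working from the long table skips the dense site x species matrix entirely, which may matter when there are thousands of sites and species. Whether it is actually faster than the merge above should be checked on the real data.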

Sparse matrix:

Assuming M is a sparse matrix,

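library(data.table)
library(magrittr)  # for %>% and %<>%
# Coerce to triplet form; "TsparseMatrix" avoids the direct coercion to
# "dgTMatrix", which is deprecated in recent versions of Matrix
m2 <- as(M, "TsparseMatrix")
# The @i and @j slots are 0-based, so shift them to 1-based row/column indices
dt2 <- data.table(row=m2@i+1, col=m2@j+1, value=m2@x)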

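# Either match the names of memb with the column indices, or the other way around.
# Here the "s" prefix is stripped so that names(memb) become "1", ..., "6".
names(memb) %<>% gsub('s','',.)
dt2 %>% merge(.,
              memb %>%
                   reshape2::melt(.) %>%   # explicit namespace: data.table masks melt()
                   cbind(., rw = rownames(.) %>% as.numeric),
              by.x = 'col',
              by.y = 'rw',
              all.x = TRUE) %>%
        dcast(., row ~ value.y, value.var = 'value.x', fun.aggregate = sum)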

This should return:

   row 1 2 3 4 5
1:   1 1 0 1 1 1
2:   2 0 2 1 0 0
3:   3 1 2 1 1 0
4:   4 0 1 0 1 0
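
Since the question asks for matrix algebra, here is a minimal sketch of that route. It is a different technique from the merge-based answer above, not part of it. The idea is to build a species x cluster indicator matrix G and multiply, so that (M %*% G)[site, k] is the number of species from cluster k present at that site. The names cl, ids, k and G are illustrative, and the cluster columns are labelled with the first species of each cluster only to mirror the expected outcome in the question.

```r
library(Matrix)

# Toy data from the question
d <- data.frame(grids = c("v1", "v1", "v2", "v2", "v3", "v3", "v3",
                          "v1", "v3", "v4", "v2", "v1", "v4", "v3"),
                sp = c("s1", "s3", "s2", "s3", "s1", "s2", "s3",
                       "s4", "s4", "s4", "s5", "s6", "s5", "s5"))
memb <- c(s1 = 1, s2 = 2, s3 = 3, s4 = 4, s5 = 2, s6 = 5)

# Site x species presence-absence matrix, stored sparsely (as in step 3)
M <- Matrix(as.matrix(as.data.frame.matrix(table(d$grids, d$sp))), sparse = TRUE)

# Align cluster membership with the column order of M
cl <- memb[colnames(M)]
ids <- sort(unique(cl))   # distinct cluster labels
k <- match(cl, ids)       # cluster position 1..z for every species

# Indicator matrix: G[j, c] = 1 if species j belongs to cluster c
G <- sparseMatrix(i = seq_along(k), j = k, x = 1,
                  dims = c(ncol(M), length(ids)))

# One sparse product replaces the double for loop
cluster_comm <- M %*% G

# Label each cluster column with its first member species
colnames(cluster_comm) <- names(cl)[match(seq_along(ids), k)]
cluster_comm
```

On the toy data this gives the same counts as the expected outcome in the question. Both M and G stay sparse throughout, so the approach should scale to thousands of sites and species, though it is worth timing on the real matrix.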
