Applying lm() using sapply or lapply

Question

So I'm trying to use lm() with sapply to fit one regression per column of a matrix.

#example data and labels
data <- matrix(data = runif(1000), nrow = 100, ncol = 10)
markers <- sample(0:1, replace = TRUE, size = 100)

# try to get linear model stuff
Lin <- sapply(data, function(x) lm(unlist(markers) ~ unlist(x))$coefficients)

My problem is that this gives me coefficients for 1,000 regressions rather than 10.

Tags: r, lapply, sapply

Solution


You need to supply sapply with a data frame, not a matrix.

#example data and labels
data <- data.frame(matrix(data = runif(1000), nrow = 100, ncol = 10))
markers <- sample(0:1, replace = TRUE, size = 100)

# fit one regression per column
sapply(data, function(x) coef(lm(markers ~ x)))
sapply(data, function(x) coef(lm(markers ~ x))[-1]) # slopes only (drop the intercepts)
        X1.x         X2.x         X3.x         X4.x         X5.x 
 0.017043626  0.518378546 -0.011110972 -0.145848478  0.335232991 
        X6.x         X7.x         X8.x         X9.x        X10.x 
 0.015122184  0.001985933  0.191279594 -0.077689961 -0.107411203
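If you would rather keep the matrix, a minimal sketch is to iterate over its columns with apply() instead of over its entries. The matrix name `m` and the `set.seed()` call are additions for illustration only; the last line checks that the slopes match those from the data frame approach above.

```r
set.seed(1)  # added for reproducibility; not part of the original example
m <- matrix(runif(1000), nrow = 100, ncol = 10)
markers <- sample(0:1, replace = TRUE, size = 100)

# apply() with MARGIN = 2 passes each column (a length-100 vector) to FUN
slopes <- apply(m, 2, function(x) coef(lm(markers ~ x))[-1])

# Cross-check against the data frame approach
slopes_df <- sapply(data.frame(m), function(x) coef(lm(markers ~ x))[-1])
all.equal(unname(slopes), unname(slopes_df))  # TRUE
```

Both calls fit the same 10 models; apply() just avoids the intermediate data frame.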

Your original matrix fails:

data <- matrix(data = runif(1000), nrow = 100, ncol = 10)
sapply(data, function(x) coef(lm(markers ~ x)))    
# Error: variable lengths differ (found for 'x')

This is because sapply calls lapply, which converts its first argument, X, to a list with as.list before applying the function. as.list applied to a matrix returns a list with one element per entry, 1,000 in your case, so each call to lm() receives a single number as x while markers still has 100 values. as.list applied to a data frame returns a list with one element per column, 10 in your case, each element holding that column's values.
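A quick way to check this yourself, assuming a fresh 100 x 10 matrix named `m` (a name used only for this sketch), is to compare the list lengths directly:

```r
m <- matrix(runif(1000), nrow = 100, ncol = 10)
df <- data.frame(m)

length(as.list(m))        # 1000: one element per matrix entry
length(as.list(df))       # 10: one element per column
length(as.list(df)[[1]])  # 100: each element is a whole column
length(as.list(m)[[1]])   # 1: each element is a single number
```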


The conversion is visible in the source of lapply:

> lapply
function (X, FUN, ...) 
{
    FUN <- match.fun(FUN)
    if (!is.vector(X) || is.object(X)) 
        X <- as.list(X)
    .Internal(lapply(X, FUN))
}
<bytecode: 0x000002397f5ce508>
<environment: namespace:base>
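The `if` condition in that source decides which inputs get converted. The sketch below copies that condition into a helper, `needs_as_list`, which is a hypothetical name introduced only for this illustration, and evaluates it on a matrix, a data frame, and a plain numeric vector (the vector is added here for comparison). Both the matrix and the data frame go through as.list, but for different reasons.

```r
m <- matrix(runif(20), nrow = 4, ncol = 5)
df <- data.frame(m)
v <- runif(5)

# Hypothetical helper: the same test lapply applies before calling as.list()
needs_as_list <- function(X) !is.vector(X) || is.object(X)

is.vector(m)       # FALSE: the dim attribute means a matrix is not a plain vector
is.object(df)      # TRUE: a data frame carries a class attribute
needs_as_list(m)   # TRUE
needs_as_list(df)  # TRUE
needs_as_list(v)   # FALSE: a plain vector goes to .Internal(lapply) unconverted
```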
