How can I match the variable names in one data frame with the regression data in another?

Problem description

I have two data frames:

x = data.frame(Var1= c("A", "B", "C", "D","E"),Var2=c("F","G","H","I","J"),
    Value= c(11, 12, 13, 14,18))

y = data.frame(A= c(11, 12, 13, 14,18), B= c(15, 16, 17, 14,18),C= c(17, 22, 23, 24,18), D= c(11, 12, 13, 34,18),E= c(11, 5, 13, 55,18),  F= c(8, 12, 13, 14,18),G= c(7, 5, 13, 14,18),
    H= c(8, 12, 13, 14,18), I= c(9, 5, 13, 14,18), J= c(11, 12, 13, 14,18))

Var3 <- rep("time", each=length(x$Var1))

x=cbind(x,Var3)

time=seq(1:length(y[,1]))
y=cbind(y,time)

> x
  Var1 Var2 Value Var3
1    A    F    11 time
2    B    G    12 time
3    C    H    13 time
4    D    I    14 time
5    E    J    18 time
> y
   A  B  C  D  E  F  G  H  I  J time
1 11 15 17 11 11  8  7  8  9 11    1
2 12 16 22 12  5 12  5 12  5 12    2
3 13 17 23 13 13 13 13 13 13 13    3
4 14 14 24 34 55 14 14 14 14 14    4
5 18 18 18 18 18 18 18 18 18 18    5

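Looking at the data frame x, the first row holds the variables A and F. I want to select these two variables in the data frame y and fit a simple regression, lm(A ~ F, data = y), saving the result in the first position of a list. I want to do the same for the second row of x, fitting lm(B ~ G, data = y), and so on.

How can I match the variable names in x with the regression data in y?


Edited question: what about a more complex regression, Var1 ~ Var2 + Var3?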

Tags: r, regression, formula, linear-regression, lm

Solution


x = data.frame(Var1= c("A", "B", "C", "D","E"),
               Var2=c("F","G","H","I","J"),
               Value= c(11, 12, 13, 14,18))

y = data.frame(A= c(11, 12, 13, 14,18),
               B= c(15, 16, 17, 14,18),
               C= c(17, 22, 23, 24,18),
               D= c(11, 12, 13, 34,18),
               E= c(11, 5, 13, 55,18),
               F= c(8, 12, 13, 14,18),
               G= c(7, 5, 13, 14,18),
               H= c(8, 12, 13, 14,18), 
               I= c(9, 5, 13, 14,18),
               J= c(11, 12, 13, 14,18))

We can use

fitmodel <- function (RHS, LHS) do.call("lm", list(formula = reformulate(RHS, LHS),
                                              data = quote(y)))

modList <- Map(fitmodel, as.character(x$Var2), as.character(x$Var1))

modList[[1]]  ## for example
#Call:
#lm(formula = A ~ F, data = y)
#
#Coefficients:
#(Intercept)            F  
#     4.3500       0.7115  
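To see what fitmodel hands to lm, it may help to look at reformulate on its own. The following is only a minimal sketch using column names from the example data; the object names f1 and f2 are made up for illustration.

```r
# reformulate() builds a formula from character strings:
# the first argument holds the right-hand-side term labels,
# `response` holds the left-hand-side variable name.
f1 <- reformulate("F", response = "A")
deparse(f1)   # "A ~ F"

# Several term labels are joined with `+`.
f2 <- reformulate(c("F", "time"), response = "A")
deparse(f2)   # "A ~ F + time"

# The result is an ordinary formula object that lm() accepts.
inherits(f1, "formula")   # TRUE
```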

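Remarks:

  1. The purpose of do.call is to make sure that reformulate is evaluated before its result is passed to lm. This matters because the model object then stores the actual formula in its call, which lets the function update work properly on it. See "Showing string in formula and not as variable in lm fit". For comparison: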

    oo <- Map(function (RHS, LHS) lm(reformulate(RHS, LHS), data = y),
              as.character(x$Var2), as.character(x$Var1))
    oo[[1]]
    #Call:
    #lm(formula = reformulate(RHS, LHS), data = y)
    #
    #Coefficients:
    #(Intercept)            F  
    #     4.3500       0.7115  
    
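  2. as.character on x$Var1 and x$Var2 is necessary when these two columns are "factor" variables rather than character strings, because reformulate cannot use factors. If you pass stringsAsFactors = FALSE to data.frame when building x, there is no such problem. Since R 4.0.0 this is the default, so the columns are already character and the as.character calls are harmless.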
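Both remarks can be checked on a reduced version of the data. This is a sketch under stated assumptions: fit_plain is a hypothetical helper that mirrors the plain lm variant above, and the subset condition F > 8 is chosen only to force a refit.

```r
y <- data.frame(A = c(11, 12, 13, 14, 18),
                F = c(8, 12, 13, 14, 18))

# Remark 1: do.call() evaluates reformulate() before lm() is called,
# so the stored call contains the actual formula.
good <- do.call("lm", list(formula = reformulate("F", "A"), data = quote(y)))

# Hypothetical helper: lm() records the unevaluated expression instead.
fit_plain <- function(rhs, lhs) lm(reformulate(rhs, lhs), data = y)
bad <- fit_plain("F", "A")

deparse(good$call$formula)   # "A ~ F"
deparse(bad$call$formula)    # "reformulate(rhs, lhs)"

# update() re-evaluates the stored call. `rhs` and `lhs` only existed
# inside fit_plain(), so refitting the second model fails.
refit_good <- update(good, subset = F > 8)
refit_bad  <- try(update(bad, subset = F > 8), silent = TRUE)
inherits(refit_bad, "try-error")   # TRUE

# Remark 2: reformulate() rejects factor term labels.
res <- try(reformulate(factor("F"), "A"), silent = TRUE)
inherits(res, "try-error")         # TRUE
```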

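Does it work for you? Shouldn't it have a "for" loop?

The Map function hides the "for" loop. It is a wrapper around the mapply function. The *apply family of functions in R is a kind of syntactic sugar.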
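For readers who prefer to see the loop, here is a sketch of the same fit written three ways on a two-row subset of the data. The names viaMap, viaLoop, and viaMapply are illustrative only.

```r
x <- data.frame(Var1 = c("A", "B"), Var2 = c("F", "G"),
                stringsAsFactors = FALSE)
y <- data.frame(A = c(11, 12, 13, 14, 18), B = c(15, 16, 17, 14, 18),
                F = c(8, 12, 13, 14, 18),  G = c(7, 5, 13, 14, 18))

fitmodel <- function(RHS, LHS) {
  do.call("lm", list(formula = reformulate(RHS, LHS), data = quote(y)))
}

# Map walks over both vectors in parallel and returns a list.
viaMap <- Map(fitmodel, x$Var2, x$Var1)

# The explicit loop that Map hides.
viaLoop <- vector("list", nrow(x))
for (i in seq_len(nrow(x))) {
  viaLoop[[i]] <- fitmodel(x$Var2[i], x$Var1[i])
}

# Map is mapply with simplification switched off.
viaMapply <- mapply(fitmodel, x$Var2, x$Var1, SIMPLIFY = FALSE)

# All three give the same coefficients for each row of x.
all.equal(coef(viaMap[[2]]), coef(viaLoop[[2]]))     # TRUE
all.equal(coef(viaMap[[2]]), coef(viaMapply[[2]]))   # TRUE
```

One practical difference: Map and mapply name the result after the first character vector (here "F" and "G"), while the loop leaves the list unnamed.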


Update for your edited question

Your original question was to build the model formula as Var1 ~ Var2.

Your new question wants Var1 ~ Var2 + Var3.

x$Var3 <- rep("time", each=length(x$Var1))
y$time <- seq(1:length(y[,1]))

## collect multiple RHS variables (using concatenation function `c`)
RHS <- Map(base::c, as.character(x$Var2), as.character(x$Var3))
#str(RHS)
#List of 5  ## oh this list has names! annoying!!
# $ F: chr [1:2] "F" "time"
# $ G: chr [1:2] "G" "time"
# $ H: chr [1:2] "H" "time"
# $ I: chr [1:2] "I" "time"
# $ J: chr [1:2] "J" "time"
LHS <- as.character(x$Var1)
modList <- Map(fitmodel, RHS, LHS)  ## `fitmodel` function unchanged
modList[[1]]  ## for example
#Call:
#lm(formula = A ~ F + time, data = y)
#
#Coefficients:
#(Intercept)            F         time  
#        5.6          0.5          0.5  
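If the names that Map attaches to the RHS list get in the way, they can be dropped or replaced. A small sketch on two rows, where RHS_plain and RHS_named are hypothetical names introduced here:

```r
# Same construction as above, restricted to two rows for brevity.
RHS <- Map(base::c, c("F", "G"), c("time", "time"))
LHS <- c("A", "B")
names(RHS)                      # "F" "G"

# Option 1: drop the names entirely.
RHS_plain <- unname(RHS)
is.null(names(RHS_plain))       # TRUE

# Option 2: name each entry after its response variable, so a list built
# with Map(fitmodel, RHS_named, LHS) could be indexed by response name.
RHS_named <- setNames(RHS, LHS)
names(RHS_named)                # "A" "B"
```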
