首页 > 解决方案 > R:计算每个模型的行数,不包括某些变量(如果存在)

问题描述

我有一个看起来像这样的表:

modelsummary <- data.frame(term = c("(Intercept)", "month1", "month2", "RateDiff", "var1", "var2", "var3", "(Intercept)", "month1", "var1", "var2", "var3"), mod_id = c(1,1,1,1,1,1,1,2,2,2,2,2))

我想计算每个模型中除截距、月份、比率差异之外的变量数。我想要的输出是:

modelsummary <- data.frame(term = c("(Intercept)", "month1", "month2", "RateDiff", "var1", "var2", "var3", "(Intercept)", "month1", "var1", "var2", "var3"), mod_id = c(1,1,1,1,1,1,1,2,2,2,2,2), variables = c(3,3,3,3,3,3,3,3,3,3,3,3))

我尝试使用以下方法获取标志:

modelsummary$dim <- apply(modelsummary[, "term"], MARGIN = 1, 
                  function(x) sum(!(x %in% c(grep("month", x), "RateDiff")), na.rm = T))

grep(month)不起作用。

modelsummary$dim <- apply(modelsummary[, "term"], MARGIN = 1, 
                  function(x) sum(!(x %in% c("month", "RateDiff")), na.rm = T))

这有效,但后缀后面的月份未被捕获。

我想要在变量intercept、month和RateDiff上从sql中获得与~ilike~等价的东西,因为我不希望它区分大小写,并且希望允许在变量上添加后缀和前缀。我怎么能做到这一点?

标签: rsumfrequency

解决方案


这是一种方法dplyr-

modelsummary %>% 
  mutate(
    variables = term[!grepl(pattern = "intercept|month|ratediff", tolower(term))] %>% 
      n_distinct()
  )

          term mod_id variables
1  (Intercept)      1         3
2       month1      1         3
3       month2      1         3
4     RateDiff      1         3
5         var1      1         3
6         var2      1         3
7         var3      1         3
8  (Intercept)      2         3
9       month1      2         3
10        var1      2         3
11        var2      2         3
12        var3      2         3

或使用dplyrand stringr

modelsummary %>%
  mutate(
    variables = str_subset(tolower(term), "intercept|month|ratediff", TRUE) %>% 
      n_distinct()
  )

如果要计算每个变量的数量,请group_by(mod_id)在.mutatemod_id

在基础 R -

modelsummary$variables <- with(modelsummary, 
               term[!grepl(pattern = "intercept|month|ratediff", tolower(term))] %>% 
               unique() %>% length()
               )

推荐阅读