Summarise multiple columns using a weighted t-test

Problem description

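I have the following data and want to compute weighted p-values. I looked at "dplyr summarise multiple columns using t.test", but my version needs to use weights. I can do this for a single column with Code 2, but there are more than 30 columns. How can I compute the weighted p-values efficiently?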

Code 1

# A tibble: 877 x 5
   cat     population farms farmland weight
   <chr>        <dbl> <dbl>    <dbl>  <dbl>
 1 Treated       9.89  8.00     12.3  1    
 2 Control      10.3   7.81     12.1  0.714
 3 Control      10.2   8.04     12.4  0.156
 4 Control      10.3   7.97     12.1  0.340
 5 Control      10.9   8.87     12.7  2.85 
 6 Control      10.4   8.35     12.5  0.934
 7 Control      10.5   8.58     12.9  0.193
 8 Control      10.6   8.57     12.6  0.276
 9 Control      10.2   8.54     12.5  0.344
10 Control      10.5   8.76     12.6  0.625
# … with 867 more rows

Code 2

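library(weights)  # provides wtd.t.test()

# Weighted t-test for a single column; the third coefficient is the p-value
wtd.t.test(
  x = df$population[df$cat == "Treated"],
  y = df$population[df$cat == "Control"],
  weight = df$weight[df$cat == "Treated"],
  weighty = df$weight[df$cat == "Control"])$coefficients[3]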

Tags: r, statistics, tidyverse

Solution


We can use summarise with across:

library(dplyr)

df %>%
  summarise(across(
    c(population:farmland),
    ~ weights::wtd.t.test(
      x       = .[cat == "Treated"],
      y       = .[cat == "Control"],
      weight  = weight[cat == "Treated"],
      weighty = weight[cat == "Control"]
    )$coefficients[3]
  ))

Or use lapply/sapply:

sapply(df[2:4], function(v)
  weights::wtd.t.test(
    x       = v[df$cat == "Treated"],
    y       = v[df$cat == "Control"],
    weight  = df$weight[df$cat == "Treated"],
    weighty = df$weight[df$cat == "Control"]
  )$coefficients[3])
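
With 30 or more columns, selecting by position (df[2:4]) or by range (population:farmland) can get awkward. One possible sketch is to treat every column other than cat and weight as an outcome and loop over those names. The helper weighted_p and the toy data below are made up for illustration and are not part of the original question; the sketch also assumes cat has exactly the two levels "Treated" and "Control".

```r
library(weights)  # provides wtd.t.test()

# Toy data standing in for the real `df` (values are invented for illustration).
set.seed(1)
n <- 100
df <- data.frame(
  cat        = rep(c("Treated", "Control"), times = c(30, 70)),
  population = rnorm(n, mean = 10),
  farms      = rnorm(n, mean = 8),
  farmland   = rnorm(n, mean = 12),
  weight     = runif(n, min = 0.1, max = 3)
)

# Hypothetical helper: weighted p-value for one column `v` of `data`.
# Assumes `cat` contains only "Treated" and "Control".
weighted_p <- function(v, data) {
  treated <- data$cat == "Treated"
  res <- weights::wtd.t.test(
    x       = v[treated],
    y       = v[!treated],
    weight  = data$weight[treated],
    weighty = data$weight[!treated]
  )
  unname(res$coefficients[3])  # third coefficient, as in the code above
}

# Every column except the grouping and weight columns is treated as an outcome.
outcome_cols <- setdiff(names(df), c("cat", "weight"))

# Named numeric vector with one p-value per outcome column.
p_values <- vapply(df[outcome_cols], weighted_p, numeric(1), data = df)
p_values
```

Using vapply rather than sapply makes the expected result type explicit, so a column that fails to yield a single number raises an error instead of silently changing the shape of the output.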
