Automating (manual) R code to calculate p-values

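Problem description

I am trying to automate the following R code, which calculates p-values. The data is in CSV format (opened in Excel). For each section and each version I have the number of clicks and the number of opens. Could someone help me apply a loop or something similar?

My data, in .csv format: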

Section          Version A   Version B   Version C   Version D
Section 1        2967        3353        495         559
Section 2        4840        4522        285         266
Section 3
Section 4
Section 5
Main emailbody
Total email


Version # Opens
A    18223
B    
C    
D    

Approach 1 (assigning the data from the csv file by hand):

S1_Click_A=2967 #(section 1, email A)
S1_Click_B=3353 #(section 1, email B)
S1_Click_C=495
S1_Click_D=559
S2_Click_A=4840
...
S5_Click_D=154
MainBody_Click_A=12408
...
MainBody_Click_D=260
TotalEmail_Click_A=13525
..
TotalEmail_Click_D=248

#no. email opens
Open_A=18223
Open_B=18368
Open_C=18223
Open_D=18368


#to test % total click is the comparable across versions
#section 1 test 
S1ab <- prop.test(x = c(S1_Click_A,S1_Click_B), n = c(Open_A,Open_B))
...
S1cd <- prop.test(x = c(S1_Click_C,S1_Click_D), n = c(Open_C,Open_D))

#section 2 test
S2ab <- prop.test(x = c(S2_Click_A,S2_Click_B), n = c(Open_A,Open_B))
...
S2cd <- prop.test(x = c(S2_Click_C,S2_Click_D), n = c(Open_C,Open_D))

#similarly for section 3,4 and 5

#Main body test
MainBodyab <- prop.test(x = c(MainBody_Click_A,MainBody_Click_B), n = c(Open_A,Open_B))
MainBodyac <- prop.test(x = c(MainBody_Click_A,MainBody_Click_C), n = c(Open_A,Open_C))
...
MainBodycd <- prop.test(x = c(MainBody_Click_C,MainBody_Click_D), n = c(Open_C,Open_D))

#Total Email test
TotalEmailab <- prop.test(x = c(TotalEmail_Click_A,TotalEmail_Click_B), n = c(Open_A,Open_B))
...
TotalEmailcd <- prop.test(x = c(TotalEmail_Click_C,TotalEmail_Click_D), n = c(Open_C,Open_D))

#FINAL P VALUE
S1ab$p.value
S1ac$p.value

Approach 2

# no. email opens
open <- 
c(
Open_A=18223,
Open_B=18368,
Open_C=18223,
Open_D=18368
)

s1 <- c(
S1_Click_A=2967, #(section 1, email A)
S1_Click_B=3353, #(section 1, email B)
S1_Click_C=495,
S1_Click_D=559
)

open_comb <- combn(names(open), 2)
s1_comb <- combn(names(s1), 2)
res_names <-  combn(c("A", "B", "C", "D"), 2)

# to test % total click is the comparable across versions
# section 1 test
result1 <- list()
for(k in 1:length(open)){
result1[[paste0("s1", res_names[1, k], res_names[2, k])]] <- prop.test(x = 
s1[s1_comb[,k]], n = open[open_comb[,k]])
}
result_section1 <- c(result1$s1AB$p.value, result1$s1AC$p.value, 
result1$s1AD$p.value, result1$s1BC$p.value, result1$s1BD$p.value, 
result1$s1CD$p.value)
result_section1

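However, this automated code only gives p-values for the combinations AB, AC, AD and BC, not for BD and CD. This is probably because of the length of open, which is only 4 (please help me fix this).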

What I expect:
1. Read the input data directly from the csv. That is, read the 
   section 1, version A value (2967), assign it to the 
   S1_Click_A variable, and do the same for the others.
2. Fix the code so that it provides p-values for all combinations: AB, AC, AD, BC, BD and CD.

Input (data)

structure(list(Section = structure(c(2L, 3L, 4L, 5L, 6L, 1L, 7L), .Label = 
c("Main email body", "Section 1", "Section 2", "Section 3", "Section 4", 
"Section 5", "Total email"), class = "factor"), Version.A = c(2967L, 4840L, 
2508L, 2093L, 1117L, 12408L, 13525L), Version.B = c(3353L, 4522L, 2250L, 
1333L, 925L, 11458L, 12383L), Version.C = c(495L, 285L, 228L, 209L, 186L, 
282L, 271L), Version.D = c(559L, 266L, 205L, 133L, 154L, 260L, 248L)), class 
= "data.frame", row.names = c(NA, -7L ))

Tags: r

Solution


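Consider reshaping your data from its original wide format to long format, then running prop.test within each Section across all pairwise combinations of Version. The code below builds a list with one element per section (7 in total), each holding the prop.test results (including, but not limited to, the p-values) for all 6 combinations.

Data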

txt <- '"Section" "Version A"   "Version B"   "Version C"   "Version D"
"Section 1"   2967    3353             495    559
"Section 2"   4840    4522             285    266
"Section 3"   2508    2250             228    205
"Section 4"   2093    1333             209    133
"Section 5"   1117    925              186    154
"Main emailbody"  12408   11458        282    260
"Total email" 13525   12383            271    248'

df <- read.table(text = txt, header = TRUE)

open_df <- data.frame(Version = c("A", "B", "C", "D"),
                      Open = c(18223, 18368, 18223, 18368))
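
The question also asks how to read the values straight from the CSV file instead of typing them in. A minimal sketch, assuming a comma-separated file with the same header as the table above; the file name clicks.csv and the object df_csv are hypothetical, and the sample file is written to a temporary directory only so the example runs on its own:

```r
# Write a small sample file so the sketch is self-contained; in practice
# the file would already exist ("clicks.csv" is a hypothetical name).
csv_path <- file.path(tempdir(), "clicks.csv")
writeLines(c(
  "Section,Version A,Version B,Version C,Version D",
  "Section 1,2967,3353,495,559",
  "Section 2,4840,4522,285,266"
), csv_path)

# check.names = TRUE (the default) turns "Version A" into "Version.A",
# the same column names that read.table produces for the inline text.
df_csv <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

# Look up a single cell by section and version rather than typing it in
S1_Click_A <- df_csv$Version.A[df_csv$Section == "Section 1"]
S1_Click_A
```

With the data frame in hand there is usually no need for one variable per cell; indexing by Section and Version as above is enough for the tests that follow.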

reshape+by

# RESHAPE WIDE TO LONG
rdf <- reshape(df, idvar = "Section", varying = list(names(df)[-1]),
               times = names(df)[-1], v.names = "Value", timevar = "Version",
               new.row.names = 1:1E5, direction = "long")

# STRIP THE "Version." PREFIX SO ONLY THE LETTER (A, B, C, D) REMAINS
rdf$Version <- gsub("Version.", "", rdf$Version, fixed = TRUE)

# SUBSET BY SECTION AND RUN prop.test ON ALL COMBS
prop_test_list <- by(rdf, rdf$Section, function(sub) {
    pairs <- combn(sub$Version, 2, simplify = FALSE)

    sapply(pairs, function(item) 
             prop.test(x = sub$Value[sub$Version %in% item], 
                       n = open_df$Open[open_df$Version %in% item])
          )
})
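
If only the p-values are needed, they can be collected into a section-by-pair table. The following is a standalone sketch rather than a continuation of the code above: the names clicks, opens and p_values are hypothetical, and the click counts are copied from the data in the question.

```r
# Click counts per section (rows) and version (columns), from the question
clicks <- data.frame(
  Section = c("Section 1", "Section 2", "Section 3", "Section 4",
              "Section 5", "Main email body", "Total email"),
  A = c(2967, 4840, 2508, 2093, 1117, 12408, 13525),
  B = c(3353, 4522, 2250, 1333, 925, 11458, 12383),
  C = c(495, 285, 228, 209, 186, 282, 271),
  D = c(559, 266, 205, 133, 154, 260, 248)
)
opens <- c(A = 18223, B = 18368, C = 18223, D = 18368)

# All 6 unordered version pairs: AB, AC, AD, BC, BD, CD
pairs <- combn(names(opens), 2, simplify = FALSE)

# One row per section, one column per pair, each cell a p-value
p_values <- t(sapply(seq_len(nrow(clicks)), function(i) {
  sapply(pairs, function(p) {
    prop.test(x = unlist(clicks[i, p]), n = opens[p])$p.value
  })
}))
dimnames(p_values) <- list(clicks$Section, sapply(pairs, paste, collapse = ""))
p_values
```

Indexing both the clicks and the opens by the same pair of names keeps the two vectors aligned without relying on row order.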

Rextester demo
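
As for the loop in Approach 2 of the question: it stops after four results because it runs over 1:length(open), and open has 4 elements, while combn() produces 6 columns (one per pair). A minimal sketch of that loop with the bound changed to the number of combinations, keeping the question's variable names:

```r
open <- c(Open_A = 18223, Open_B = 18368, Open_C = 18223, Open_D = 18368)
s1 <- c(S1_Click_A = 2967, S1_Click_B = 3353, S1_Click_C = 495, S1_Click_D = 559)

open_comb <- combn(names(open), 2)
s1_comb   <- combn(names(s1), 2)
res_names <- combn(c("A", "B", "C", "D"), 2)

result1 <- list()
# Iterate over the 6 columns of the combination matrix, not the 4 versions
for (k in seq_len(ncol(open_comb))) {
  key <- paste0("s1", res_names[1, k], res_names[2, k])
  result1[[key]] <- prop.test(x = s1[s1_comb[, k]], n = open[open_comb[, k]])
}

# Pull the p-value out of every test result in one step
result_section1 <- sapply(result1, function(res) res$p.value)
result_section1
```

The result is a named vector with six entries, s1AB through s1CD.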

