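r - Automating (manual) R code to calculate p-values
Problem description
I am trying to automate the following R code, which calculates p-values. The data is in csv format (opened in Excel). For each section and each of its versions I have the number of clicks and the number of opens. I would appreciate help applying a loop or some other approach.
I have the data in .csv format: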
Section Version A Version B Version C Version D
Section 1 2967 3353 495 559
Section 2 4840 4522 285 266
Section 3
Section 4
Section 5
Main emailbody
Total email
Version # Opens
A 18223
B
C
D
Method 1 (assigning the data manually from the csv file):
S1_Click_A=2967 #(section 1, email A)
S1_Click_B=3353 #(section 1, email B)
S1_Click_C=495
S1_Click_D=559
S2_Click_A=4840
...
S5_Click_D=154
MainBody_Click_A=12408
...
MainBody_Click_D=260
TotalEmail_Click_A=13525
..
TotalEmail_Click_D=248
#no. email opens
Open_A=18223
Open_B=18368
Open_C=18223
Open_D=18368
#to test % total click is the comparable across versions
#section 1 test
S1ab <- prop.test(x = c(S1_Click_A,S1_Click_B), n = c(Open_A,Open_B))
...
S1cd <- prop.test(x = c(S1_Click_C,S1_Click_D), n = c(Open_C,Open_D))
#section 2 test
S2ab <- prop.test(x = c(S2_Click_A,S2_Click_B), n = c(Open_A,Open_B))
...
S2cd <- prop.test(x = c(S2_Click_C,S2_Click_D), n = c(Open_C,Open_D))
#similarly for section 3,4 and 5
#Main body test
MainBodyab <- prop.test(x = c(MainBody_Click_A,MainBody_Click_B), n = c(Open_A,Open_B))
MainBodyac <- prop.test(x = c(MainBody_Click_A,MainBody_Click_C), n = c(Open_A,Open_C))
...
MainBodycd <- prop.test(x = c(MainBody_Click_C,MainBody_Click_D), n = c(Open_C,Open_D))
#Total Email test
TotalEmailab <- prop.test(x = c(TotalEmail_Click_A,TotalEmail_Click_B), n = c(Open_A,Open_B))
...
TotalEmailcd <- prop.test(x = c(TotalEmail_Click_C,TotalEmail_Click_D), n = c(Open_C,Open_D))
#FINAL P VALUE
S1ab$p.value
S1ac$p.value
Method 2
# no. email opens
open <-
c(
Open_A=18223,
Open_B=18368,
Open_C=18223,
Open_D=18368
)
s1 <- c(
S1_Click_A=2967, #(section 1, email A)
S1_Click_B=3353, #(section 1, email B)
S1_Click_C=495,
S1_Click_D=559
)
open_comb <- combn(names(open), 2)
s1_comb <- combn(names(s1), 2)
res_names <- combn(c("A", "B", "C", "D"), 2)
# to test whether the % of total clicks is comparable across versions
# section 1 test
result1 <- list()
for(k in 1:length(open)){
result1[[paste0("s1", res_names[1, k], res_names[2, k])]] <- prop.test(x =
s1[s1_comb[,k]], n = open[open_comb[,k]])
}
result_section1 <- c(result1$s1AB$p.value, result1$s1AC$p.value,
                     result1$s1AD$p.value, result1$s1BC$p.value,
                     result1$s1BD$p.value, result1$s1CD$p.value)
result_section1
However, this automated code only returns p-values for the combinations AB, AC, AD and BC, not for BD and CD. This may be because of the length of open, which is only 4 (please help me fix this).
What I expect:
1. I want to read the input data directly from the csv, i.e. read the Section 1, Version A value (2967), assign it to the variable S1_Click_A, and do the same for the other values.
2. Fix the code so that it returns p-values for all combinations: AB, AC, AD, BC, BD and CD.
Input (data)
structure(list(Section = structure(c(2L, 3L, 4L, 5L, 6L, 1L, 7L), .Label =
c("Main email body", "Section 1", "Section 2", "Section 3", "Section 4",
"Section 5", "Total email"), class = "factor"), Version.A = c(2967L, 4840L,
2508L, 2093L, 1117L, 12408L, 13525L), Version.B = c(3353L, 4522L, 2250L,
1333L, 925L, 11458L, 12383L), Version.C = c(495L, 285L, 228L, 209L, 186L,
282L, 271L), Version.D = c(559L, 266L, 205L, 133L, 154L, 260L, 248L)), class
= "data.frame", row.names = c(NA, -7L ))
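Solution
Consider reshaping your data from its original wide format to long format. Then run prop.test for each Section across all combinations of Version. The code below builds a list whose elements contain the prop.test results (including, but not limited to, the p-values) for all 6 combinations in each of the 7 sections.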
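As for why the loop in the question stopped at BC: it iterates over 1:length(open), which is 4, while combn() on four versions produces choose(4, 2) = 6 columns. The following is only a minimal sketch of that fix for Section 1, using the counts from the question; the names pairs and s1_pvalues are illustrative and not part of the original code.

```r
# Opens per version and Section 1 clicks (values taken from the question)
open <- c(A = 18223, B = 18368, C = 18223, D = 18368)
s1   <- c(A = 2967,  B = 3353,  C = 495,   D = 559)

# All 6 unordered pairs of versions, one pair per column
pairs <- combn(names(open), 2)

# Iterate over the columns of `pairs`, not over length(open)
s1_pvalues <- sapply(seq_len(ncol(pairs)), function(k) {
  v <- pairs[, k]
  prop.test(x = s1[v], n = open[v])$p.value
})
names(s1_pvalues) <- paste0(pairs[1, ], pairs[2, ])
s1_pvalues
```

Using seq_len(ncol(pairs)) keeps the loop correct if the number of versions changes later.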
Data
txt <- '"Section" "Version A" "Version B" "Version C" "Version D"
"Section 1" 2967 3353 495 559
"Section 2" 4840 4522 285 266
"Section 3" 2508 2250 228 205
"Section 4" 2093 1333 209 133
"Section 5" 1117 925 186 154
"Main emailbody" 12408 11458 282 260
"Total email" 13525 12383 271 248'
df <- read.table(text = txt, header = TRUE)
open_df <- data.frame(Version = c("A", "B", "C", "D"),
Open = c(18223, 18368, 18223, 18368))
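If the counts are stored in a csv file, they can be read in directly rather than typed by hand, which is the first thing the question asks for. This is a rough sketch only: the file is a hypothetical two-row export written to a temporary path so the example runs on its own, and the names csv_path and clicks are assumptions.

```r
# Hypothetical CSV contents; in practice csv_path would point to your own file
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Section,Version A,Version B,Version C,Version D",
  "Section 1,2967,3353,495,559",
  "Section 2,4840,4522,285,266"
), csv_path)

# By default read.csv turns "Version A" into the syntactic name Version.A
clicks <- read.csv(csv_path, stringsAsFactors = FALSE)

# Look up one cell by section and version instead of assigning it by hand
S1_Click_A <- clicks$Version.A[clicks$Section == "Section 1"]
S1_Click_A
```

A data frame read this way has the same column layout as df above, so it could be passed to the reshaping step in its place.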
reshape + by
# RESHAPE WIDE TO LONG
rdf <- reshape(df, idvar = "Section", varying = list(names(df)[-1]),
times = names(df)[-1], v.names = "Value", timevar = "Version",
new.row.names = 1:1E5, direction = "long")
rdf$Version <- gsub("Version.", "", rdf$Version)
# SUBSET BY SECTION AND RUN prop.test ON ALL COMBS
prop_test_list <- by(rdf, rdf$Section, function(sub) {
pairs <- combn(sub$Version, 2, simplify = FALSE)
sapply(pairs, function(item)
prop.test(x = sub$Value[sub$Version %in% item],
n = open_df$Open[open_df$Version %in% item])
)
})
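To get only the p-values in a compact form, one option is a matrix with one row per section and one column per version pair. The sketch below is self-contained and covers just two sections with the counts from the question; the names clicks, opens, pairs and pvalue_matrix are illustrative assumptions, and split() is swapped in for by() to keep the result a plain list.

```r
# Click counts in long format (two sections shown, values from the question)
clicks <- data.frame(
  Section = rep(c("Section 1", "Section 2"), each = 4),
  Version = rep(c("A", "B", "C", "D"), times = 2),
  Value   = c(2967, 3353, 495, 559, 4840, 4522, 285, 266)
)
opens <- c(A = 18223, B = 18368, C = 18223, D = 18368)

# All 6 unordered pairs of versions
pairs <- combn(names(opens), 2, simplify = FALSE)

# One row per section, one column per version pair
pvalue_matrix <- t(sapply(split(clicks, clicks$Section), function(sub) {
  clicks_by_version <- setNames(sub$Value, sub$Version)
  sapply(pairs, function(p) {
    prop.test(x = clicks_by_version[p], n = opens[p])$p.value
  })
}))
colnames(pvalue_matrix) <- sapply(pairs, paste, collapse = "")
pvalue_matrix
```

With the prop_test_list built above, the p-values for a single section should also be reachable with something like unlist(prop_test_list[["Section 1"]]["p.value", ]), since sapply() arranges the prop.test components as rows.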