Converting combinations of columns into an interpretable variable

Question

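I want to turn combinations of columns into some kind of interpretable variable. For each id, a factor with 3 levels is repeated across three columns. I would like a list of all combinations between the variables, and once I have that list, I want to know how many times each combination occurs. For example, when q1 and q2 are the same, it should return "A", and A then occurs XX times. Does anyone have a suggestion? Thanks!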

id <- 1:10
set.seed(1)
q1 <- sample(1:3, 10, replace=TRUE) 
set.seed(2)
q2 <- sample(1:3, 10, replace=TRUE) 
set.seed(2)
q3 <- sample(1:3, 10, replace=TRUE) 

df <- data.frame(id,q1,q2,q3)
df
   id q1 q2 q3
1   1  1  1  1
2   2  2  3  3
3   3  2  2  2
4   4  3  1  1
5   5  1  3  3
6   6  3  3  3
7   7  3  1  1
8   8  2  3  3
9   9  2  2  2
10 10  1  2  2

What I am after, in pseudocode:

if df$q1=="1" & df$q2=="1" print A
if df$q1=="1" & df$q2=="2" print B
if df$q1=="1" & df$q2=="3" print C
if df$q1=="2" & df$q2=="3" print D
if df$q1=="2" & df$q2=="2" print E
if df$q1=="3" & df$q2=="3" print F
if df$q2=="1" & df$q2=="1" print G
if df$q2=="1" & df$q2=="2" print H

response <- save(print A, print B, print C and so on....)
length(A)
length(B)
and so on...

Tags: r

Answer


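I think this should do what you want, using base R. I hope I have understood your desired output. I basically combine each pair of columns into its own variable, comb.var[, i]. These are then stacked into a single factor, output$fct, whose levels are counted with summary().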
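As a small illustration of the two building blocks this relies on, the sketch below shows what combn() and paste() return on their own. The vectors qa and qb are made-up toy values, not the question's data.

```r
# Toy vectors standing in for two columns (hypothetical values).
qa <- c(1, 2, 2)
qb <- c(1, 3, 2)

# combn() lists every unordered pair of positions; each column is one pair.
pairs <- combn(1:3, 2)
print(pairs)

# paste() fuses two columns element-wise into one key per row.
key <- paste(qa, qb, sep = "_")
print(key)
```

With three columns there are choose(3, 2) = 3 pairs, so each row of the data contributes three keys in total.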

Code:

# dimensions of df
n <- nrow(df)  # rows
p <- ncol(df)  # columns (the first one is id)

# unique pairs of q columns
pairs.n <- choose(p - 1, 2)   # number of unique pairs
pairs <- combn(1:(p - 1), 2)  # 2 x pairs.n matrix, one pair per column

# matrix of NAs of proper size
comb.var <- matrix(NA, nrow = n, ncol = pairs.n)

for (combo in seq_len(ncol(pairs))) {
  i <- pairs[1, combo]
  j <- pairs[2, combo]
  # get the right 2 columns from df (offset by 1 to skip id)
  qi <- df[, i + 1]
  qj <- df[, j + 1]
  # combine into 1 variable, e.g. "1_3"
  comb.var[, combo] <- paste(qi, qj, sep = "_")
}

# clean up the output: turn comb.var into a vector and add id columns
output <- data.frame(id = rep(df$id, times = pairs.n),
                     qi = rep(pairs[1, ], each = n),
                     qj = rep(pairs[2, ], each = n),
                     val = as.vector(comb.var))
# combine variables again, e.g. "1.2.1_3" means q1 = 1 and q2 = 3
output$fct <- with(output, paste(qi, qj, val, sep = "."))
# count number of different outputs
uniq.n <- length(unique(output$fct))
# re-label the factor (LETTERS has 26 entries, so this assumes uniq.n <= 26)
output$fct <- factor(output$fct, labels = LETTERS[1:uniq.n])
# count the group members
summary(output$fct)
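
Once the factor is re-labelled with letters, it is no longer obvious which letter stands for which combination. A possible variation, sketched below, counts the keys with table() instead and keeps a lookup table next to the letters. The data frame is typed in from the table printed in the question, so it does not depend on the random number generator of the R version in use. The names q.cols, keys, counts and lookup are illustrative choices, not part of the code above.

```r
# Data copied from the printed table in the question.
df <- data.frame(
  id = 1:10,
  q1 = c(1, 2, 2, 3, 1, 3, 3, 2, 2, 1),
  q2 = c(1, 3, 2, 1, 3, 3, 1, 3, 2, 2),
  q3 = c(1, 3, 2, 1, 3, 3, 1, 3, 2, 2)
)

# All columns except id are treated as question columns (assumption).
q.cols <- setdiff(names(df), "id")

# One key per row and per pair of columns, e.g. "q1.q2.2_3".
keys <- unlist(combn(q.cols, 2, simplify = FALSE, FUN = function(pr) {
  paste(pr[1], pr[2], paste(df[[pr[1]]], df[[pr[2]]], sep = "_"), sep = ".")
}))

# Count how often each key occurs.
counts <- table(keys)

# Lookup table: letter label, the combination it stands for, and its count.
# Like the code above, this assumes at most 26 distinct combinations.
lookup <- data.frame(
  label = LETTERS[seq_along(counts)],
  combination = names(counts),
  n = as.vector(counts)
)
print(lookup)
```

Each row of lookup ties a letter back to its column pair and values, so a count such as the one for "q2.q3.3_3" can be read off directly rather than inferred from the letter alone.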
