Creating an "instance number" for each instance of a combination of variable levels

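Problem description

I need to count the instances of each combination of variables and turn that count into a new variable. For example,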

set.seed(2)
V1 <- sample(rep(c(1:3),10))
V2 <- rep_len(c("small", "large"),30)
temp <- cbind(V1,V2)

produces a character matrix (cbind() coerces V1 to character to match V2) whose first ten rows look like this:

       V1  V2     
 [1,] "3" "small"
 [2,] "3" "large"
 [3,] "3" "small"
 [4,] "1" "large"
 [5,] "2" "small"
 [6,] "2" "large"
 [7,] "1" "small"
 [8,] "3" "large"
 [9,] "3" "small"
[10,] "3" "large"

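I need a new variable that counts how many times each combination of variables has appeared in the data so far. The result should look like this: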

       V1  V2      V3 
 [1,] "3" "small" "1"
 [2,] "3" "large" "1"
 [3,] "3" "small" "2"
 [4,] "1" "large" "1"
 [5,] "2" "small" "1"
 [6,] "2" "large" "1"
 [7,] "1" "small" "1"
 [8,] "3" "large" "2"
 [9,] "3" "small" "3"
[10,] "3" "large" "3"

Is there an efficient way to do this? (I don't necessarily need them to be character vectors; I just need a general solution.)

Tags: r, dataframe, recode

Solution

We can convert temp to a data.frame, group by 'V1' and 'V2', and then create the new column as the within-group row sequence using row_number():

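library(dplyr)

# Convert the matrix to a data frame, group by both columns, and
# number the rows within each group in their original order
as.data.frame(temp) %>%
      group_by(V1, V2) %>%
      mutate(V3 = row_number())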

Data

temp <- structure(list(V1 = c(3L, 3L, 3L, 1L, 2L, 2L, 1L, 3L, 3L, 3L), 
    V2 = c("small", "large", "small", "large", "small", "large", 
    "small", "large", "small", "large")), class = "data.frame", 
    row.names = c(NA, -10L))
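
If adding a package dependency is not desirable, the same running count can probably be obtained in base R. The following is only a sketch, not part of the original answer: it rebuilds the ten sample rows with data.frame() and uses ave() with seq_along to number the rows within each V1/V2 combination.

```r
# Sample data from the question (first ten rows), rebuilt as a data frame.
temp <- data.frame(
  V1 = c(3L, 3L, 3L, 1L, 2L, 2L, 1L, 3L, 3L, 3L),
  V2 = c("small", "large", "small", "large", "small",
         "large", "small", "large", "small", "large")
)

# ave() splits its first argument by the grouping variables, applies FUN to
# each piece, and returns the results in the original row order.
# seq_along() therefore numbers the rows 1, 2, 3, ... within each V1/V2 group.
temp$V3 <- ave(seq_len(nrow(temp)), temp$V1, temp$V2, FUN = seq_along)

print(temp)
```

For these ten rows, V3 should match the counts in the expected result shown in the question. Because ave() preserves row order, no sorting or re-joining is needed afterwards.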
