Randomly assigning samples to groups in R

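Question

I have a large dataset containing demographic information for individuals from different cities. I want to create a variable (e.g. class) that assigns individuals of the same age group within a city to groups of about 20 people (roughly 15-25). Here is R code that generates an example of my data: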

    set.seed(10)
    ID <- seq_len(10000)
    df <- as.data.frame(ID)
    # City: ten cities with unequal shares of the population
    df$City <- cut(runif(10000, 0, 100), breaks = c(0, 7, 20, 35, 47, 55, 61, 74, 85, 91, 100), include.lowest = TRUE, right = FALSE, labels = c("City 1", "City 2", "City 3", "City 4", "City 5", "City 6", "City 7", "City 8", "City 9", "City 10"))
    # Age_Group: ten-year age bands
    df$Age_Group <- cut(runif(10000, 0, 100), breaks = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 101), include.lowest = TRUE, right = FALSE, labels = c("0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90+"))
    table(df$Age_Group, df$City)

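I would like df$class to group individuals who share the same age group and city. The class values need to keep counting across all age groups and cities rather than restarting in each one. How can I do this?

Thanks

Tags: r, statistics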

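Answer

The caret package can help you with this. It will try to create n partitions while respecting categories such as Age and City; given the unbalanced nature of the input, the result won't be perfect. Note that createFolds stratifies only on the vector it is given, which in the code below is df$Age_Group. You can choose the number of partitions (a.k.a. folds) and see what suits your needs; I chose 5.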

require(caret)
#> Loading required package: caret
#> Loading required package: lattice
#> Loading required package: ggplot2
set.seed(10)
ID = seq(1:10000)
df <- as.data.frame(ID)
df$City <- cut(runif(10000, 0,100),breaks = c(0,7,20,35,47,55,61,74,85,91,100),include.lowest = T,right = F, labels = c("City 1","City 2","City 3","City 4","City 5","City 6","City 7","City 8","City 9","City 10"))
df$Age_Group <- cut(runif(10000, 0,100),breaks = c(0,10,20,30,40,50,60,70,80,90,101),include.lowest = T,right = F, labels = c("0-9","10-19","20-29","30-39","40-49","50-59","60-69","70-79","80-89","90+"))
# table(df$Age_Group, df$City)
# k = 5 folds; list = FALSE returns an integer vector of fold numbers, one per row
df$class <- caret::createFolds(df$Age_Group,
                               k = 5,
                               list = FALSE)
table(df$class, df$City, df$Age_Group)
#> , ,  = 0-9
#> 
#>    
#>     City 1 City 2 City 3 City 4 City 5 City 6 City 7 City 8 City 9 City 10
#>   1     18     27     28     29     15      8     22     21      9      21
#>   2     16     29     31     27      9     10     19     23     12      22
#>   3     12     20     26     26     20     11     30     22     12      18
#>   4      9     27     24     28     13     12     24     31     12      17
#>   5     10     22     36     31     13     13     23     24     11      15
#> 
#> , ,  = 10-19
#> 
#>    
#>     City 1 City 2 City 3 City 4 City 5 City 6 City 7 City 8 City 9 City 10
#>   1     13     22     13     22     11      9     38     18     22      23
#>   2     12     23     34     21     13      7     26     22     16      16
#>   3     14     25     30     25     13      7     30     23     11      12
#>   4     13     29     31     19     22     17     23     16      9      11
#>   5     17     22     24     23     18     20     22     15      9      20
#> 
#> , ,  = 20-29
#> 
#>    
#>     City 1 City 2 City 3 City 4 City 5 City 6 City 7 City 8 City 9 City 10
#>   1     14     28     31     24     12     10     35     22     12      14
#>   2      9     32     22     29     15      9     30     19     18      19
#>   3     18     35     25     17     14     13     22     18     19      21
#>   4     15     26     33     25     11     15     37     20      1      19
#>   5     14     20     31     32     12     14     23     16     18      21
#> 
#> , ,  = 30-39
#> 
#>    
#>     City 1 City 2 City 3 City 4 City 5 City 6 City 7 City 8 City 9 City 10
#>   1     13     28     29     22     24     14     24     19     18      21
#>   2     15     28     31     32     19     14     21     25     16      12
#>   3     17     30     28     22     20      9     22     29     14      21
#>   4     18     26     33     23     10     16     23     24     13      26
#>   5     13     26     40     24     12      8     25     21     20      23
#> 
#> , ,  = 40-49
#> 
#>    
#>     City 1 City 2 City 3 City 4 City 5 City 6 City 7 City 8 City 9 City 10
#>   1     16     26     41     16     19     13     19     18     16      22
#>   2     18     23     36     32      8     12     28     15     16      18
#>   3     19     27     29     23     11     16     33     13     15      21
#>   4     13     21     30     29     18     18     26     19      9      23
#>   5      9     34     27     27     17      9     27     22     11      23
#> 
#> , ,  = 50-59
#> 
#>    
#>     City 1 City 2 City 3 City 4 City 5 City 6 City 7 City 8 City 9 City 10
#>   1     21     28     28     21     15     10     25     26     21       8
#>   2     12     17     24     25     20     20     25     32     14      13
#>   3     19     27     35     30     10      8     19     24     13      17
#>   4     19     23     30     23     19     11     19     25     16      18
#>   5     15     37     38     18     10     15     23     25      9      13
#> 
#> , ,  = 60-69
#> 
#>    
#>     City 1 City 2 City 3 City 4 City 5 City 6 City 7 City 8 City 9 City 10
#>   1     12     29     31     25     14     15     12     27     11      20
#>   2     12     22     29     25     18     14     22     20     11      24
#>   3     11     27     30     21     15     16     22     23     15      16
#>   4     17     21     32     20     12     12     24     28     11      19
#>   5     12     27     37     31     11     11     17     16     17      18
#> 
#> , ,  = 70-79
#> 
#>    
#>     City 1 City 2 City 3 City 4 City 5 City 6 City 7 City 8 City 9 City 10
#>   1     10     23     27     36     13      7     29     20     13      17
#>   2     25     19     27     27     18      8     25     17     10      20
#>   3     12     17     27     26     13      5     34     24     14      23
#>   4     12     28     34     22     15      8     28     21     14      13
#>   5     17     30     40     23     13     11     21     17      7      16
#> 
#> , ,  = 80-89
#> 
#>    
#>     City 1 City 2 City 3 City 4 City 5 City 6 City 7 City 8 City 9 City 10
#>   1     10     27     26     34     17     16     23     19      8      16
#>   2     17     19     33     16     19     19     16     31     12      14
#>   3     14     24     27     23     14     10     25     23     12      23
#>   4     12     25     30     33     14     16     19     14     12      20
#>   5     24     24     25     26     20      6     18     20     13      20
#> 
#> , ,  = 90+
#> 
#>    
#>     City 1 City 2 City 3 City 4 City 5 City 6 City 7 City 8 City 9 City 10
#>   1     16     21     30     25     20     15     31     23     10      11
#>   2     15     25     34     28     16     13     25     19     10      17
#>   3     12     23     30     26     19     14     24     23     13      18
#>   4     13     30     30     24     15     10     23     25     14      18
#>   5     13     16     24     24     23     17     30     23     18      15

Created on 2020-05-08 by the reprex package (v0.3.0)
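
The question asks for groups of roughly 20 people inside each city and age-group combination, with class numbers that keep counting from one combination to the next. The following is a minimal base R sketch of one way that could be done. It is an alternative to the createFolds call above, not the answerer's method. The names `target`, `assign_within_cell`, `cell` and `local_group` are hypothetical, chosen only for illustration, and the rule `round(n / target)` for the number of groups per cell is an assumption.

```r
set.seed(10)
n <- 10000
df <- data.frame(ID = seq_len(n))
df$City <- cut(runif(n, 0, 100),
               breaks = c(0, 7, 20, 35, 47, 55, 61, 74, 85, 91, 100),
               include.lowest = TRUE, right = FALSE,
               labels = paste("City", 1:10))
df$Age_Group <- cut(runif(n, 0, 100),
                    breaks = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 101),
                    include.lowest = TRUE, right = FALSE,
                    labels = c("0-9", "10-19", "20-29", "30-39", "40-49",
                               "50-59", "60-69", "70-79", "80-89", "90+"))

target <- 20  # assumed target group size

# For one City x Age_Group cell: choose the number of groups so that each
# holds about `target` people, then hand out the group labels in random order.
assign_within_cell <- function(ids) {
  k <- max(1, round(length(ids) / target))
  sample(rep_len(seq_len(k), length(ids)))
}

# One factor level per City x Age_Group combination that occurs in the data
cell <- interaction(df$City, df$Age_Group, drop = TRUE, lex.order = TRUE)

# Group number within each cell (restarts at 1 in every cell)
local_group <- ave(df$ID, cell, FUN = assign_within_cell)

# Running class number across all cells: 1, 2, 3, ... with no restarts
df$class <- as.integer(interaction(cell, local_group,
                                   drop = TRUE, lex.order = TRUE))

# Inspect the distribution of group sizes
table(table(df$class))
```

With this rule, any cell holding at least 30 people is split into groups of 15 to 25. Smaller cells stay as a single group and can fall outside that range, so sparse data may need a different rule such as merging neighbouring age bands.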
