Cluster sampling in R

Problem description

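I would like to understand what is happening in this script. Why are a mean and a standard deviation needed for cluster sampling? What does rnorm(200, mean=7, sd=1) mean in the context of this data.frame?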

#make this example reproducible 
set.seed(1)  

#create data frame
df <- data.frame(tour = rep(1:10, each=20),
                 experience = rnorm(200, mean=7, sd=1))  

#view first six rows of data frame
head(df)  

#randomly choose 4 tour groups out of the 10
clusters <- sample(unique(df$tour), size=4, replace=FALSE)
  
#define sample as all members who belong to one of the 4 tour groups
cluster_sample <- df[df$tour %in% clusters,]  

#view how many customers came from each tour
table(cluster_sample$tour) 
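
To make the role of rnorm() concrete, here is a minimal sketch that separates the two steps of the script. It rests on one assumption: the experience column is simulated example data, and the mean and sd arguments only describe that made-up data, not the sampling procedure. The standalone vectors experience and tour are names introduced here for illustration; the original script keeps both as columns of df.

```r
# Same seed as the script above, so the draws are reproducible
set.seed(1)

# Step 1: simulate data. rnorm(n, mean, sd) draws n random values from a
# normal distribution; here they stand in for 200 made-up experience ratings.
experience <- rnorm(200, mean = 7, sd = 1)

length(experience)  # 200 values, one per customer
mean(experience)    # near 7, but not exactly 7 (it is a random sample)
sd(experience)      # near 1, but not exactly 1

# rep(1:10, each = 20) labels the customers: 10 tours, 20 customers each
tour <- rep(1:10, each = 20)
table(tour)

# Step 2: cluster sampling. Only the tour labels are used to pick clusters;
# mean and sd play no part in this step.
clusters <- sample(unique(tour), size = 4, replace = FALSE)
cluster_sample <- experience[tour %in% clusters]

length(cluster_sample)  # 4 tours x 20 customers = 80
```

Under this reading, swapping rnorm() for real survey ratings would leave the sampling lines unchanged, since whole tours are selected regardless of the values inside them.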

Tags: r, cluster-analysis, cluster-computing

Solution

