r - Renaming factor levels in R using wildcard matching
Question
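Is it possible to replace factor levels in R using wildcard matching?
I have a column named years of experience with levels such as "0 YEAR, 9 MONTHS", "1 YEAR, 0 MONTHS", "1 YEAR, 1 MONTHS", "1 YEAR, 10 MONTHS", ..., "1 YEAR, 9 MONTHS", "10 YEAR, 0 MONTHS", "10 YEAR, 1 MONTHS", "10 YEAR, 10 MONTHS", and so on, for close to 600 levels. I would like to store all "0 YEAR..." levels as "<1", "1 YEAR" as "1", and more than 5 YEAR as ">5", giving 5 levels in total.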
grep("9 YEAR", data$Service, ignore.case = TRUE, value = TRUE)
I also tried mutate, but I could not narrow down every level this way. I would like to end up with only 5 or 6 levels.
Answer
First, let's generate some random sample data:
set.seed(2018)
x <- factor(paste(sample(0:10, 10, replace = TRUE), "YEAR,", sample(0:11, 10, replace = TRUE), "MONTHS"))
df <- data.frame(years_of_experience = x)
# years_of_experience
#1 3 YEAR, 4 MONTHS
#2 5 YEAR, 7 MONTHS
#3 0 YEAR, 11 MONTHS
#4 2 YEAR, 8 MONTHS
#5 5 YEAR, 9 MONTHS
#6 3 YEAR, 7 MONTHS
#7 6 YEAR, 3 MONTHS
#8 1 YEAR, 6 MONTHS
#9 10 YEAR, 8 MONTHS
#10 6 YEAR, 9 MONTHS
We can then use case_when to bin years_of_experience based on the number of years:
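library(dplyr)

df.new <- df %>%
  mutate(
    # strip everything from " YEAR" onwards, e.g. "3 YEAR, 4 MONTHS" -> 3
    yr = as.numeric(gsub(" YEAR.*$", "", years_of_experience)),
    # collapse the year count into a small set of labels
    bucket = case_when(
      yr < 1 ~ "<1",
      yr >= 5 ~ ">=5",
      TRUE ~ as.character(yr)))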
df.new
# years_of_experience yr bucket
#1 3 YEAR, 4 MONTHS 3 3
#2 5 YEAR, 7 MONTHS 5 >=5
#3 0 YEAR, 11 MONTHS 0 <1
#4 2 YEAR, 8 MONTHS 2 2
#5 5 YEAR, 9 MONTHS 5 >=5
#6 3 YEAR, 7 MONTHS 3 3
#7 6 YEAR, 3 MONTHS 6 >=5
#8 1 YEAR, 6 MONTHS 1 1
#9 10 YEAR, 8 MONTHS 10 >=5
#10 6 YEAR, 9 MONTHS 6 >=5
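Since the question asks about renaming the levels themselves, the same pattern can also be applied to levels() directly, which touches each of the ~600 levels once instead of every row. The following is only a minimal base R sketch: the vector name service and its values are made-up sample data in the question's format, and it reuses the answer's ">=5" cut-off rather than the ">5" mentioned in the question.

```r
# Hypothetical sample data in the same format as the question
service <- factor(c("0 YEAR, 9 MONTHS", "1 YEAR, 0 MONTHS", "1 YEAR, 10 MONTHS",
                    "4 YEAR, 2 MONTHS", "5 YEAR, 0 MONTHS", "10 YEAR, 1 MONTHS"))

# Leading year count of each existing level
yr <- as.integer(sub(" YEAR.*$", "", levels(service)))

# One new label per old level
new_levels <- ifelse(yr < 1, "<1",
              ifelse(yr >= 5, ">=5", as.character(yr)))

# Assigning duplicated labels merges the corresponding levels
levels(service) <- new_levels

as.character(service)
nlevels(service)
```

Because the replacement happens at the level of the factor, no row-wise string processing is needed, and old levels that map to the same label are merged into one.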
We can convert df.new$bucket into a factor with 5 levels:
df.new %>% mutate(bucket = as.factor(bucket)) %>% pull(bucket)
# [1] 3 >=5 <1 2 >=5 3 >=5 1 >=5 >=5
#Levels: <1 >=5 1 2 3
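Note that as.factor orders the levels by sorting the labels as strings, and that order depends on the locale. If the buckets should appear in their natural order, the levels can be given explicitly. A minimal sketch, assuming the bucket values from the output above (the names bucket, bucket_levels and bucket_f are introduced here for illustration only):

```r
# Bucket labels copied from the output above
bucket <- c("3", ">=5", "<1", "2", ">=5", "3", ">=5", "1", ">=5", ">=5")

# Desired order of all labels the binning can produce
bucket_levels <- c("<1", "1", "2", "3", "4", ">=5")

# Build the factor with explicit levels, then drop labels that never occur
bucket_f <- droplevels(factor(bucket, levels = bucket_levels))

levels(bucket_f)
```

Skipping droplevels keeps the unused "4" level, which may be preferable when tables or plots should show every bucket even if it is empty.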