r - How to rewrite my data frame transformation with data.table
Problem description
I have a data frame:
ID value
1 he following object is masked from ‘package:purrr’. R is free software and comes with ABSOLUTELY NO WARRANTY
2 Attaching package: ‘magrittr’. Natural language support but running in an English locale
2 Attaching package: ‘DT’. Natural language support but running in an English locale
2 Attaching package: ‘anytime’. Natural language support but running in an English locale
3 package ‘ggplot2’ was built under R version 3.6.2. Type 'contributors()' for more information
4 Warning messages: type 'demo()' for some demos, 'help()' for on-line help
4 Warning messages: 'help.start()' for an HTML browser interface to help
How to create it:
library(data.table)
ID <- c(1,2,2,2,3,4,4)
value <- c("he following object is masked from ‘package:purrr’. R is free software and comes with ABSOLUTELY NO WARRANTY",
"Attaching package: ‘magrittr’. Natural language support but running in an English locale",
"Attaching package: ‘DT’. Natural language support but running in an English locale",
"Attaching package: ‘anytime’. Natural language support but running in an English locale",
"package ‘ggplot2’ was built under R version 3.6.2. Type 'contributors()' for more information",
"Warning messages:type 'demo()' for some demos, 'help()' for on-line help",
"Warning messages:'help.start()' for an HTML browser interface to help")
df <- data.table(ID, value)
I transform it with this code:
library(dplyr)

df_patterns <- df %>%
  mutate(pattern = stringr::str_extract(value, "\\S+\\s+\\S+\\s+\\S+"),
         pattern = coalesce(stringr::str_extract(pattern, "^Attaching package:|Warning messages:"), pattern),
         id_type = case_when(ID %in% c(1, 5) ~ "extra_type")
  ) %>%
  # id_type is part of the grouping so that summarise() keeps the column
  group_by(ID, pattern, id_type) %>%
  summarise(example = sample(value, 1)) %>%
  ungroup() %>%
  mutate(pattern = coalesce(pattern, example))
The output is:
ID pattern example id_type
1 he following object he following object is masked from ‘package:purrr’. R is free software and comes with ABSOLUTELY NO WARRANTY extra_type
2 Attaching package: Attaching package: ‘anytime’. Natural language support but running in an English locale NA
3 package ‘ggplot2’ was package ‘ggplot2’ was built under R version 3.6.2. Type 'contributors()' for more information NA
4 Warning messages: Warning messages:'help.start()' for an HTML browser interface to help NA
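This is also the desired output. As you can see, I created a new column, pattern, and grouped the table by it. I also added a column, example, holding one example value for each pattern.
How can I rewrite this transformation with data.table? I don't want to use mutate and the other dplyr functions; I want to use data.table's own functionality instead, but I'm not good at it. I tried this, but I don't know what to do next: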
df_patterns <- df[, c("pattern", "id_type") := list(
pattern = coalesce(stringr::str_extract(pattern= stringr::str_extract(value, "\\S+\\s+\\S+\\s+\\S+"), "^Attaching package:|Warning messages:"),pattern= stringr::str_extract(value, "\\S+\\s+\\S+\\s+\\S+")),
case_when(ID %in% c(1, 5) ~ "extra_type")), by = ID, pattern]
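Solution
The code below removes every dependency other than data.table and should match your expected output (although the sampled example will of course differ unless a seed is set):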
library(data.table)

df_patterns <-
  # copy() so that := does not modify df itself by reference
  copy(df)[, pattern := fcase(
    startsWith(value, "Attaching package:"), "Attaching package:",
    startsWith(value, "Warning messages:"), "Warning messages:",
    # fallback: keep the first three whitespace-separated tokens
    rep(TRUE, nrow(df)), sub("((\\S+\\s+){2}\\S+).+", "\\1", value)
  )][,
    .(
      example = sample(value, 1),  # one random example per group
      id_type = fifelse(ID %in% c(1, 5), "extra_type", NA_character_)
    ),
    by = .(ID, pattern)]
ID pattern example id_type
1: 1 he following object he following object is masked from ‘package:purrr’. R is free software and comes with ABSOLUTELY NO WARRANTY extra_type
2: 2 Attaching package: Attaching package: ‘DT’. Natural language support but running in an English locale <NA>
3: 3 package ‘ggplot2’ was package ‘ggplot2’ was built under R version 3.6.2. Type 'contributors()' for more information <NA>
4: 4 Warning messages: Warning messages:type 'demo()' for some demos, 'help()' for on-line help <NA>
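The remark that the result differs without a seed can be made concrete. The following is a minimal sketch, assuming a small made-up table (`toy`) and a hypothetical helper `pick_examples()`, neither of which appears in the answer above. It only illustrates that calling `set.seed()` before `sample()` makes the per-group pick repeatable.

```r
library(data.table)

# Hypothetical toy data (not the question's data): group 2 has three candidates
toy <- data.table(ID = c(1, 2, 2, 2), value = c("a", "b", "c", "d"))

# Hypothetical helper: fix the RNG state, then draw one example per ID
pick_examples <- function(dt, seed) {
  set.seed(seed)  # same seed -> same sequence of random draws
  dt[, .(example = sample(value, 1)), by = ID]
}

first_run  <- pick_examples(toy, seed = 42)
second_run <- pick_examples(toy, seed = 42)

# TRUE: both runs picked the same example in every group
identical(first_run$example, second_run$example)
```

In the solution itself, the same effect would presumably come from a single `set.seed()` call placed right before the `df_patterns <- ...` statement; which example gets picked still depends on the seed value chosen.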