r - Reverse summarise to expand comma-separated strings in a data frame
Problem description
I have the following data frame
group = c("cat", "dog", "horse")
value = c("1", "2", "3")
list = c("siamese,burmese,balinese","corgi,sheltie,collie","arabian,friesian,andalusian" )
df = data.frame(group, value, list)
df
  group value                        list
1   cat     1    siamese,burmese,balinese
2   dog     2        corgi,sheltie,collie
3 horse     3 arabian,friesian,andalusian
and I am trying to get to this:
  group value       list
1   cat     1    siamese
2   cat     1    burmese
3   cat     1   balinese
4   dog     2      corgi
5   dog     2    sheltie
6   dog     2     collie
7 horse     3    arabian
8 horse     3   friesian
9 horse     3 andalusian
I know how to summarise a data frame, but I now realise I don't know how to "un-summarise" one that holds comma-separated strings.
Solution
data.frame(
group = c("cat", "dog", "horse"),
value = c("1", "2", "3"),
list = c("siamese,burmese,balinese","corgi,sheltie,collie","arabian,friesian,andalusian"),
stringsAsFactors = FALSE
) -> xdf
tidyverse:
tidyr::separate_rows(xdf, list, sep=",")
##   group value       list
## 1   cat     1    siamese
## 2   cat     1    burmese
## 3   cat     1   balinese
## 4   dog     2      corgi
## 5   dog     2    sheltie
## 6   dog     2     collie
## 7 horse     3    arabian
## 8 horse     3   friesian
## 9 horse     3 andalusian
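As far as I know, tidyr 1.3.0 and later mark separate_rows() as superseded in favour of separate_longer_delim(). A minimal sketch of the same split with the newer function, assuming that tidyr version is installed (the name `out` is only for illustration):

```r
# Rebuild the example data so this snippet runs on its own.
xdf <- data.frame(
  group = c("cat", "dog", "horse"),
  value = c("1", "2", "3"),
  list = c("siamese,burmese,balinese", "corgi,sheltie,collie", "arabian,friesian,andalusian"),
  stringsAsFactors = FALSE
)

# Split the `list` column on a literal comma, producing one row per piece.
# Assumes tidyr >= 1.3.0, where separate_longer_delim() was introduced.
out <- tidyr::separate_longer_delim(xdf, list, delim = ",")
out
```

Unlike separate_rows(), the `delim` argument here is a fixed string rather than a regular expression, so no escaping is needed for the delimiter.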
Base R:
do.call(
rbind.data.frame,
lapply(1:nrow(xdf), function(idx) {
data.frame(
group = xdf[idx, "group"],
value = xdf[idx, "value"],
list = strsplit(xdf[idx, "list"], ",")[[1]],
stringsAsFactors = FALSE
)
})
)
##   group value       list
## 1   cat     1    siamese
## 2   cat     1    burmese
## 3   cat     1   balinese
## 4   dog     2      corgi
## 5   dog     2    sheltie
## 6   dog     2     collie
## 7 horse     3    arabian
## 8 horse     3   friesian
## 9 horse     3 andalusian
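The loop above builds one small data frame per input row and then binds them together. A vectorised base R variant is also possible: split every string once, then repeat the other columns by the number of pieces each row produced. This is only a sketch of that alternative, not part of the benchmark in this answer; the names `parts`, `n` and `out` are mine.

```r
# Rebuild the example data so this snippet runs on its own.
xdf <- data.frame(
  group = c("cat", "dog", "horse"),
  value = c("1", "2", "3"),
  list = c("siamese,burmese,balinese", "corgi,sheltie,collie", "arabian,friesian,andalusian"),
  stringsAsFactors = FALSE
)

# Split each string on a literal comma; the result is a list of character vectors.
parts <- strsplit(xdf$list, ",", fixed = TRUE)

# Number of pieces per original row.
n <- lengths(parts)

# Repeat each group/value once per piece and flatten the pieces into one column.
out <- data.frame(
  group = rep(xdf$group, times = n),
  value = rep(xdf$value, times = n),
  list = unlist(parts),
  stringsAsFactors = FALSE
)
out
```

Because rep() and unlist() work on whole columns, this avoids creating an intermediate data frame for every row, which may matter on larger inputs.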
Benchmark shootout:
library(magrittr) # provides the %>% pipe used in the unnest expression
microbenchmark::microbenchmark(
unnest = transform(xdf, list = strsplit(list, ",")) %>%
tidyr::unnest(list),
separate_rows = tidyr::separate_rows(xdf, list, sep=","),
base = do.call(
rbind.data.frame,
lapply(1:nrow(xdf), function(idx) {
data.frame(
group = xdf[idx, "group"],
value = xdf[idx, "value"],
list = strsplit(xdf[idx, "list"], ",")[[1]],
stringsAsFactors = FALSE
)
})
)
)
## Unit: microseconds
##           expr      min        lq     mean   median        uq       max neval
##         unnest 3689.890 4280.7045 6326.231 4881.160  6428.508 16670.715   100
##  separate_rows 5093.618 5602.2510 8479.712 6289.193 10352.847 24447.528   100
##           base  872.343  975.1615 1589.915 1099.391  1660.324  6663.132   100
I'm continually surprised by how poorly tidyr operations perform.
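Note that the timings above come from a three-row data frame, so fixed per-call overhead may account for much of the difference. A rough sketch for repeating the comparison on a larger input, assuming tidyr and microbenchmark are installed; the data frame `big` and the repeat count `n` are hypothetical, and I have not included results because they will vary by machine and package version:

```r
# Rebuild the example data so this snippet runs on its own.
xdf <- data.frame(
  group = c("cat", "dog", "horse"),
  value = c("1", "2", "3"),
  list = c("siamese,burmese,balinese", "corgi,sheltie,collie", "arabian,friesian,andalusian"),
  stringsAsFactors = FALSE
)

# Hypothetical larger input: stack the three example rows n times.
n <- 200
big <- xdf[rep(seq_len(nrow(xdf)), times = n), ]
rownames(big) <- NULL

# Same two approaches as above, run against the larger data frame.
res <- microbenchmark::microbenchmark(
  separate_rows = tidyr::separate_rows(big, list, sep = ","),
  base = do.call(
    rbind.data.frame,
    lapply(seq_len(nrow(big)), function(idx) {
      data.frame(
        group = big[idx, "group"],
        value = big[idx, "value"],
        list = strsplit(big[idx, "list"], ",")[[1]],
        stringsAsFactors = FALSE
      )
    })
  ),
  times = 10
)
res
```

Raising `n` shows how each approach scales with the number of rows rather than how much it costs to call once.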