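r - Split columns on the ":" delimiter and number them by column
Problem description
I have a huge data frame in the input format below. I am trying to split the columns on the ":" delimiter and output each value together with its column number and the row value from the first column.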
input <- structure(list(V1 = structure(1:2, .Label = c("a1", "a2"), class = "factor"),
V2 = structure(1:2, .Label = c("aaa-1-c:bbb-1-d:ccc:a", "www-1-c"
), class = "factor"), V3 = structure(1:2, .Label = c("cc:nnn:ttt-cc",
"cdd:aaa:pp"), class = "factor"), V4 = structure(c(1L, NA
), .Label = "aaa-1-d", class = "factor")), class = "data.frame", row.names = c(NA,
-2L))
I tried this, but the column numbers and the order of the values come out wrong.
output <- structure(list(V1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L), .Label = c("a1", "a2 "), class = "factor"),
V2 = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 1L, 1L, 1L, 1L), V3 = structure(c(3L,
5L, 7L, 1L, 6L, 9L, 11L, 4L, 12L, 8L, 2L, 10L), .Label = c("a",
"aaa", "aaa-1-c", "aaa-1-d", "bbb-1-d", "cc", "ccc", "cdd",
"nnn", "pp", "ttt-cc", "www-1-c"), class = "factor")), class = "data.frame", row.names = c(NA,
-12L))
Can anyone please help? Thanks!
Solution
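Here is one option: reshape the dataset from "wide" to "long" with pivot_longer (available from tidyr 1.0.0 onwards), split V3 into one row per value on the ":" delimiter with separate_rows, and then, grouped by V1, convert V2 to an integer column index with match.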
library(dplyr)
library(tidyr)
input %>%
pivot_longer(cols = -V1, names_to = "V2", values_to = "V3",
values_drop_na = TRUE) %>%
# older versions use gather
# gather(V2, V3, -V1, na.rm = TRUE) %>%
separate_rows(V3, sep=":") %>%
group_by(V1) %>%
mutate(V2 = match(V2, unique(V2))) %>%
ungroup()
# A tibble: 12 x 3
#    V1       V2 V3
#    <fct> <int> <chr>
#  1 a1        1 aaa-1-c
#  2 a1        1 bbb-1-d
#  3 a1        1 ccc
#  4 a1        1 a
#  5 a1        2 cc
#  6 a1        2 nnn
#  7 a1        2 ttt-cc
#  8 a1        3 aaa-1-d
#  9 a2        1 www-1-c
# 10 a2        2 cdd
# 11 a2        2 aaa
# 12 a2        2 pp
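To make the match step less opaque, and as a cross-check that does not need the tidyverse, here is a rough base R sketch of the same idea. It is only an illustration, not part of the answer above: it assumes the input is rebuilt with plain character columns instead of factors, and the names cols, idx, rows and long are made up for this example.

```r
# Assumption: same data as the question, but with character columns
# instead of factors (as.character() would convert the originals).
input <- data.frame(
  V1 = c("a1", "a2"),
  V2 = c("aaa-1-c:bbb-1-d:ccc:a", "www-1-c"),
  V3 = c("cc:nnn:ttt-cc", "cdd:aaa:pp"),
  V4 = c("aaa-1-d", NA),
  stringsAsFactors = FALSE
)

# match(x, unique(x)) numbers the values by order of first appearance,
# which is how the column names become 1, 2, 3 within each V1 group.
cols <- c("V2", "V2", "V3", "V4")
idx <- match(cols, unique(cols))

# Base R version of the whole reshape: handle one input row at a time.
rows <- lapply(seq_len(nrow(input)), function(i) {
  vals <- unlist(input[i, -1])              # named vector of V2, V3, V4
  vals <- vals[!is.na(vals)]                # drop missing cells
  pieces <- strsplit(vals, ":", fixed = TRUE)
  v3 <- as.character(unlist(pieces, use.names = FALSE))
  data.frame(
    V1 = rep(input$V1[i], length(v3)),
    V2 = rep(seq_along(pieces), lengths(pieces)),  # index among non-NA columns
    V3 = v3,
    stringsAsFactors = FALSE
  )
})
long <- do.call(rbind, rows)
print(long)
```

Like the grouped match call, this numbers the columns only among the non-missing cells of each row, so a row with an NA in the middle would be numbered consecutively rather than by original column position.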