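r - Convert an R data frame with 2 columns and a row delimiter
Question
I have a data frame with two columns, "id" and "detail" (df_current below). I need to group the data frame by id and spread it out so that the columns become "Interface1", "Interface2", etc., and the contents under each interface column are the values that immediately follow each occurrence of that interface value. Essentially, "!" is used as a delimiter, but it is not needed in the output.
The desired output is shown below as "df_needed_from_current".
I have tried several approaches (group_by, spread, reshape, dcast, etc.) but could not get any of them to work. Any help would be greatly appreciated!
Sample current data frame (code to create it below):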
id | detail |
---|---|
1 | ! |
1 | Interface1 |
1 | a |
1 | b |
1 | ! |
1 | Interface2 |
1 | a |
1 | b |
2 | ! |
2 | Interface1 |
2 | a |
2 | b |
2 | c |
2 | ! |
2 | Interface2 |
2 | a |
3 | ! |
3 | Interface1 |
3 | a |
3 | b |
3 | c |
3 | d |
df_current <- data.frame(
id = c("1","1","1","1","1","1","1","1","2",
"2","2","2","2","2","2","2","3","3",
"3","3","3","3","4","4","4","4","4",
"4","4","4","4","4","4","4","4","4",
"5","5","5","5","5","5","5","5","5",
"5","5","5","5"),
detail = c("!", "Interface1","a","b","!",
"Interface2","a","b","!","Interface1",
"a","b","c","!","Interface2","a",
"!", "Interface1","a","b","c","d",
"!", "Interface1","a","b","!",
"Interface2","a","b","c","!","Interface3",
"a","b","c","!","Interface1","a","b","!",
"Interface2","a","b","c","!","Interface3",
"a","b"))
Needed data frame (code to create it below):
id | Interface1 | Interface2 | Interface3 |
---|---|---|---|
1 | a | a | NA |
1 | b | b | NA |
2 | a | a | NA |
2 | b | NA | NA |
2 | c | NA | NA |
3 | a | NA | NA |
3 | b | NA | NA |
3 | c | NA | NA |
3 | d | NA | NA |
df_needed_from_current <- data.frame(
id = c("1","1","2","2","2","3","3","3","3","4","4","4","5","5","5"),
Interface1 = c("a","b","a","b","c","a","b","c","d","a","b","NA","a","b","NA"),
Interface2 = c("a","b","a","NA","NA","NA","NA","NA","NA","a","b","c","a","b","c"),
Interface3 = c("NA","NA","NA","NA","NA","NA","NA","NA","NA","a","b","c","a","b","NA")
)
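One caveat about the frame above: a quoted "NA" is the literal two-character string, not a missing value, so it will not compare equal to the <NA> entries that a reshape produces. A minimal sketch of converting such strings to real missing values, assuming dplyr is available; df_quoted and df_missing are illustrative names that do not appear in the question:

```r
library(dplyr)

# Hypothetical two-row frame that uses quoted "NA", as df_needed_from_current does.
df_quoted <- data.frame(
  id = c("2", "2"),
  Interface1 = c("b", "c"),
  Interface2 = c("NA", "NA"),
  stringsAsFactors = FALSE
)

# Replace the literal string "NA" with a real missing value in every Interface column.
df_missing <- df_quoted %>%
  mutate(across(starts_with("Interface"), ~ na_if(.x, "NA")))

is.na(df_quoted$Interface2)   # the quoted strings are not treated as missing
is.na(df_missing$Interface2)  # after na_if() they are
```

Alternatively, writing NA without quotes inside c() when building the frame avoids the conversion step.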
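Solution
We remove the rows where the 'detail' value is "!", then create a new column 'interface' that keeps only the values from 'detail' that have the prefix 'Interface'. We use fill (from tidyr) to fill the NA elements with the previous non-NA value, filter out the rows where the 'detail' value is the same as the 'interface' column, create a row sequence id with rowid (from data.table), and reshape to 'wide' format with pivot_wider.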
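library(dplyr)
library(tidyr)
library(data.table)
library(stringr)
df_current %>%
    # drop the delimiter rows
    filter(detail != "!") %>%
    # keep the interface names, NA everywhere else
    mutate(interface = case_when(str_detect(detail, 'Interface') ~ detail)) %>%
    group_by(id) %>%
    # carry each interface name down to the values that follow it
    fill(interface) %>%
    ungroup() %>%
    # drop the rows that hold the interface names themselves
    filter(detail != interface) %>%
    # running counter within each id/interface pair, so pivoted rows line up
    mutate(rn = rowid(id, interface)) %>%
    pivot_wider(names_from = interface, values_from = detail) %>%
    select(-rn)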
# A tibble: 15 x 4
# id Interface1 Interface2 Interface3
# <chr> <chr> <chr> <chr>
# 1 1 a a <NA>
# 2 1 b b <NA>
# 3 2 a a <NA>
# 4 2 b <NA> <NA>
# 5 2 c <NA> <NA>
# 6 3 a <NA> <NA>
# 7 3 b <NA> <NA>
# 8 3 c <NA> <NA>
# 9 3 d <NA> <NA>
#10 4 a a a
#11 4 b b b
#12 4 <NA> c c
#13 5 a a a
#14 5 b b b
#15 5 <NA> c <NA>
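If loading data.table only for rowid is not desirable, the same running counter can probably be built with dplyr alone by grouping on id and interface and calling row_number(). The following is a sketch under stated assumptions, not the answer's own method: df_small and res are illustrative names, the input is a shortened copy of df_current (ids 1 and 2 only), and base R's startsWith() stands in for str_detect(), which makes the match a strict prefix test rather than a match anywhere in the string.

```r
library(dplyr)
library(tidyr)

# Shortened copy of df_current (ids 1 and 2 only), for illustration.
df_small <- data.frame(
  id = c("1", "1", "1", "1", "1", "1", "1", "1",
         "2", "2", "2", "2", "2", "2", "2", "2"),
  detail = c("!", "Interface1", "a", "b", "!", "Interface2", "a", "b",
             "!", "Interface1", "a", "b", "c", "!", "Interface2", "a"),
  stringsAsFactors = FALSE
)

res <- df_small %>%
  # drop the delimiter rows
  filter(detail != "!") %>%
  # keep interface names, NA everywhere else
  mutate(interface = if_else(startsWith(detail, "Interface"),
                             detail, NA_character_)) %>%
  group_by(id) %>%
  # carry each interface name down to the values that follow it
  fill(interface) %>%
  # drop the rows that hold the interface names themselves
  filter(detail != interface) %>%
  # dplyr-only replacement for data.table::rowid(id, interface)
  group_by(id, interface) %>%
  mutate(rn = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = interface, values_from = detail) %>%
  select(-rn)

res
```

On this shortened input, res has one row per position within each id, with Interface2 left missing where id 2 has fewer Interface2 values than Interface1 values.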