Reshaping data from a table converted from XML

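Problem description

I have a data frame that was converted from XML to CSV. I need to build an Excel sheet from it, but the data is a mess. Could you help me find R code that solves this?

Here is the problem in detail. Suppose the dataset looks like this: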

student.data <- data.frame(id = c(1:17),
                       student_id = c(1111,"","","","","","","","","2222","","","","","","",""),
                       exam_id =c("",10,10,20,20,20,30,40,40,"",10,10,10,20,30,40,40), 
                       status = c("","AAA","BBB","CCC","DDD","FFF","GGG","AAA","GGG","","BBB","HHH","MMM","FFF","DDD","GGG","GGG"))

The result must be:

[Image: expected output table]

I know this is a bit complicated, but thanks in advance for your help.

Tags: r, xml, reshape, data-mining, data-cleaning

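Solution

We can convert the blank ("") elements of 'student_id' to NA (na_if), then use fill to replace each NA with the nearest non-NA value above it. Next, group by 'student_id' and 'exam_id', take the unique non-blank elements of 'status', and paste them into a single string (toString). Finally, filter out the rows in which both 'exam_id' and 'status' are blank, and reshape the output to "wide" format with pivot_wider.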
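The first two steps are easiest to follow in isolation. Below is a minimal sketch, assuming dplyr and tidyr are installed; the toy data frame is hypothetical and only illustrates how na_if and fill interact.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: a blank student_id means "same student as the row above"
toy <- data.frame(student_id = c("1111", "", "", "2222", ""),
                  stringsAsFactors = FALSE)

filled <- toy %>%
  mutate(student_id = na_if(student_id, "")) %>%  # "" becomes NA
  fill(student_id)                                 # each NA takes the last non-NA value above it

filled$student_id
# [1] "1111" "1111" "1111" "2222" "2222"
```

The complete pipeline applies the same two steps to student.data and then groups, summarises, and reshapes: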

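library(dplyr)
library(tidyr)

student.data %>%
  # turn blank student_id values into NA so they can be filled
  mutate(student_id = na_if(student_id, "")) %>%
  # carry the last non-NA student_id down to the rows below it
  fill(student_id) %>%
  group_by(student_id, exam_id) %>%
  # collapse the distinct non-blank statuses into one comma-separated string
  summarise(status = toString(unique(status[status != '']))) %>%
  # drop the rows in which both exam_id and status are blank
  filter_at(vars(exam_id, status), any_vars(. != '')) %>%
  # one column per exam_id
  pivot_wider(names_from = exam_id, values_from = status)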
# A tibble: 2 x 5
# Groups:   student_id [3]
#  student_id `10`          `20`          `30`  `40`    
#  <fct>      <chr>         <chr>         <chr> <chr>   
#1 1111       AAA, BBB      CCC, DDD, FFF GGG   AAA, GGG
#2 2222       BBB, HHH, MMM FFF           DDD   GGG     
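
As a cross-check that does not depend on the tidyverse, the same reshaping can be sketched in base R. This is an alternative and not the method used in the answer above. It assumes the first row always carries a non-blank student_id, and it relies on aggregate and reshape, so the exam columns come out named status.10, status.20, and so on instead of 10, 20, and so on.

```r
# Rebuild the example data; stringsAsFactors = FALSE keeps the columns as character
student.data <- data.frame(
  id = 1:17,
  student_id = c(1111, "", "", "", "", "", "", "", "", "2222", "", "", "", "", "", "", ""),
  exam_id = c("", 10, 10, 20, 20, 20, 30, 40, 40, "", 10, 10, 10, 20, 30, 40, 40),
  status = c("", "AAA", "BBB", "CCC", "DDD", "FFF", "GGG", "AAA", "GGG",
             "", "BBB", "HHH", "MMM", "FFF", "DDD", "GGG", "GGG"),
  stringsAsFactors = FALSE
)

# Forward-fill student_id: count the non-blank ids seen so far and use that
# count to index into the vector of non-blank ids
ids <- student.data$student_id
seen <- cumsum(ids != "")
student.data$student_id <- ids[ids != ""][seen]

# Keep only the exam rows (the student header rows have a blank exam_id)
exams <- student.data[student.data$exam_id != "", ]

# One row per student/exam pair, with the distinct statuses joined by ", "
agg <- aggregate(status ~ student_id + exam_id, data = exams,
                 FUN = function(x) toString(unique(x)))

# Spread exam_id into columns, one row per student
wide <- reshape(agg, idvar = "student_id", timevar = "exam_id", direction = "wide")
wide
```

If the two approaches agree on the cell contents, the wide table can then be saved with write.csv(wide, "students.csv", row.names = FALSE) and opened in Excel; the file name here is only a placeholder.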
