首页 > 解决方案 > 将行分成不同的列并在 R 中连接它们

问题描述

我有一个重复此模式的 2000 多行数据框: 在此处输入图像描述

我需要的是与此类似的东西:

change_order  material   error_message
C601987       600099882  Invalid Model for Material, Invalid Material for Model
C601987       600099883  Invalid Model for Material, Invalid Material for Model

我已经能够得到一个只有变更单和材料编号的数据框,但我不知道如何组合以无效开头的行。任何建议将不胜感激。

用于再现数据的请求信息。

structure(list(change_order = c("C601987", "C601987", "C601987", 
"C601987", "C601987", "C601987", "C601987", "C601987", "C601987", 
"C601987", "C601987", "C601987", "C601987", "C601987", "C601987"
), error_message = c("Material:000000000600099882 - Component:000000000600134722: Error Message - E", 
"- Invalid Model for Material", "- Invalid Material for Model", 
"Material:000000000600099882 - Component:000000000600138896: Error Message - E", 
"- Invalid Model for Material", "- Invalid Material for Model", 
"Material:000000000600099882 - Component:000000000600134722: Error Message - E", 
"- Invalid Model for Material", "- Invalid Material for Model", 
"Material:000000000600099882 - Component:000000000600138896: Error Message - E", 
"- Invalid Model for Material", "- Invalid Material for Model", 
"Material:000000000600099883 - Component:000000000600134722: Error Message - E", 
"- Invalid Model for Material", "- Invalid Material for Model"
)), row.names = c(NA, -15L), class = c("tbl_df", "tbl", "data.frame"
))

标签: r

解决方案


The general idea is to first split the values we need out into their own columns (material and error message). Once we have those parts in their own columns, we have the grouping variables necessary to aggregate to your desired summary. Note: if you need component broken out as well, you would essentially do the same thing that I demonstrate with Material.

 library(tidyverse)
 library(stringr)

 data %>% 
    # separate material number into its own column, I coerce to numeric to trim the leading zeros
    mutate(material = as.numeric(gsub("^Material:(\\d+).*$", "\\1", error_message))) %>% 
    # pull the error messages into their own column
    mutate(Errors = ifelse(is.na(material), error_message, NA)) %>%
    group_by(change_order) %>%
    # fill the material ID down with the group so it can be matched with the Errors
    fill(material) %>% 
    distinct() %>% # if you care about components, don't use this line
    group_by(change_order, material) %>%
    filter(!is.na(Errors)) %>% 
    # concatenate the error messages into one row
    summarize(error_message = str_c(Errors, collapse = ", ")) %>%
    # clean up the dashes in the error messages
    mutate(error_message = gsub("- ", "", error_message, fixed = TRUE))

# A tibble: 2 x 3
# Groups:   change_order [1]
  change_order  material error_message                                         
  <chr>            <dbl> <chr>                                                 
1 C601987      600099882 Invalid Model for Material, Invalid Material for Model
2 C601987      600099883 Invalid Model for Material, Invalid Material for Model

推荐阅读