Reshaping a table in R while parsing information from column names and using it to collect information from specific columns

Problem description

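I was given this badly organized data table with hundreds of columns (a subset is shown below).

The column names are dot-separated. The first field identifies an object (e.g. Item123, object_AB), with no naming convention, and the columns are in no particular order. The second field names a property of that object (e.g. color, manufacturer), and the same property names are shared across objects.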

Item123.type.value  Item123.mass.value  Item123.color.value object_AB.type.value  object_AB.mass.value  object_AB.color.value
Desk  11.2  blue  Chair 2.3 orange
Desk 14.2 red Sofa  22  grey
Armchair  23.3  black  Monitor 2.2 white

Edit: added the dput() structure:

structure(list(Item123.type.value = structure(c(2L, 2L, 1L),
levels = c("Armchair", "Desk"), class = "factor"), Item123.mass.value = structure(1:3,
levels = c("11.2", "14.2", "23.3"), class = "factor"), Item123.color.value = structure(c(2L,
3L, 1L), levels = c("black", "blue", "red"), class = "factor"),
object_AB.type.value = structure(c(1L, 3L, 2L), levels = c("Chair",
"Monitor", "Sofa"), class = "factor"), object_AB.mass.value = structure(c(2L,
3L, 1L), levels = c("2.2", "2.3", "22"), class = "factor"),
object_AB.color.value = structure(c(2L, 1L, 3L), levels = c("grey",
"orange", "white"), class = "factor")), row.names = c(NA_integer_,
-3L), class = "data.frame")

I need to convert the table into something like this (the order of the rows does not matter):

type  name  mass  color
Item123  Desk  11.2  blue
Item123  Desk  14.2  red
object_AB  Chair 2.3 orange
object_AB  Sofa  22  grey
Item123  Armchair  23.3  black
object_AB  Monitor 2.2 white

I would really appreciate any help I can get!

Tags: r, data.table, reshape

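Solution

I would suggest this approach, using the data you added as df. It may be rather long and tedious. The code looks for a specific pattern in your column names (type, mass, color), reshapes each group of columns to long format, and finally joins them all together: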

library(tidyverse)

# Reshape one property at a time, then join the pieces on row id (id) and object name (V1)
df %>% select(contains('type')) %>%
  mutate(id=1:n()) %>%                                        # remember the original row
  pivot_longer(-id) %>%
  separate(name,into = c(paste0('V',1:3)),sep = '\\.') %>%    # split e.g. "Item123.type.value"
  select(-c(V2,V3)) %>%                                       # keep only the object name
  rename(Value1=value) %>%
  # Same steps for the mass columns
  left_join(df %>% select(contains('mass')) %>%
              mutate(id=1:n()) %>%
              pivot_longer(-id) %>%
              separate(name,into = c(paste0('V',1:3)),sep = '\\.') %>%
              select(-c(V2,V3)) %>%
              rename(Value2=value),
            by = c('id','V1')) %>%
  # Same steps for the color columns
  left_join(df %>% select(contains('color')) %>%
              mutate(id=1:n()) %>%
              pivot_longer(-id) %>%
              separate(name,into = c(paste0('V',1:3)),sep = '\\.') %>%
              select(-c(V2,V3)) %>%
              rename(Value3=value),
            by = c('id','V1'))

Output:

# A tibble: 6 x 5
     id V1        Value1   Value2 Value3
  <int> <chr>     <chr>     <dbl> <chr> 
1     1 Item123   Desk       11.2 blue  
2     1 object_AB Chair       2.3 orange
3     2 Item123   Desk       14.2 red   
4     2 object_AB Sofa       22   grey  
5     3 Item123   Armchair   23.3 black 
6     3 object_AB Monitor     2.2 white 
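
As a shorter alternative (not part of the original answer), the same reshape can probably be done in a single pivot_longer() call. tidyr's ".value" sentinel in names_to turns the second field of each column name into its own output column. The following is a minimal sketch, assuming every column follows the <object>.<property>.value pattern and that the properties are type, mass and color. The temporary column name object and the result name long are chosen here for illustration only.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (factor columns, as in the dput() output)
df <- data.frame(
  Item123.type.value    = c("Desk", "Desk", "Armchair"),
  Item123.mass.value    = c("11.2", "14.2", "23.3"),
  Item123.color.value   = c("blue", "red", "black"),
  object_AB.type.value  = c("Chair", "Sofa", "Monitor"),
  object_AB.mass.value  = c("2.3", "22", "2.2"),
  object_AB.color.value = c("orange", "grey", "white"),
  stringsAsFactors = TRUE
)

long <- df %>%
  # Factors with different level sets are easier to stack as plain strings
  mutate(across(everything(), as.character)) %>%
  pivot_longer(
    everything(),
    # First field -> "object" column; second field -> one column per property
    names_to = c("object", ".value"),
    names_pattern = "^([^.]+)\\.([^.]+)\\.value$"
  ) %>%
  # Match the column names requested in the question, in two steps
  # because "type" is both a property name and the target name for "object"
  rename(name = type) %>%
  rename(type = object) %>%
  mutate(mass = as.numeric(mass))

print(long)
```

Because the property names come from the column names themselves, further properties (for example a manufacturer field) should be picked up without extra code, as long as they follow the same naming pattern.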
