How to convert a layered dataset into a regular dataset in R

Problem description

I have a dataset like the following:

Number                  Func
A01 Metabolism
B011 Suger metabolism
C                       fun_1
C                       fun_2
C                       fun_3
B012 Lipid metabolism
C                       func_4
C                       func_5
C                       func_6
A02 Degradation
B021 Suger degradation
C                       fun_7
C                       fun_8
C                       fun_9
B022 Lipid degradation
C                       fun_10
C                       fun_11
C                       fun_12
...

The data frame I want to end up with looks like this:

Level_1         Level_2                 Level_3    Func
A01 Metabolism  B011 Suger metabolism   C          fun_1
A01 Metabolism  B011 Suger metabolism   C          fun_2
A01 Metabolism  B011 Suger metabolism   C          fun_3
A01 Metabolism  B012 Lipid metabolism   C          func_4
A01 Metabolism  B012 Lipid metabolism   C          func_5
A01 Metabolism  B012 Lipid metabolism   C          func_6
A02 Degradation B021 Suger degradation  C          fun_7
A02 Degradation B021 Suger degradation  C          fun_8
A02 Degradation B021 Suger degradation  C          fun_9
A02 Degradation B022 Lipid degradation  C          fun_10
A02 Degradation B022 Lipid degradation  C          fun_11
A02 Degradation B022 Lipid degradation  C          fun_12
...

I have searched for and tried several ways to do this, but I still can't get it to work. Does anyone have any ideas? Thanks in advance.

Tags: r, dataset

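Solution

You can use some basic ifelse and regex to create new columns that split the Number column into Level1 and Level2. With tidyr::fill, a filter that keeps only the rows equal to "C", and some column renaming, we get where we need to be. I'm assuming that the Number column contains no other values between the rows starting with A or B and the "C" rows that follow them.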

library(dplyr)
library(tidyr)
library(stringr)
library(tibble)

data <- tibble(
  Number = c("A01 Metabolism", "B011 Suger metabolism", "C", "C", "C", "B012 Lipid metabolism", "C", "C", "C", "A02 Degradation", "B021 Suger degradation", "C", "C", "C", "B022 Lipid degradation", "C", "C", "C"),
  Func = c(NA_character_, NA_character_, "func1", "func2", "func3", NA_character_, "func4", "func5", "func6", NA_character_, NA_character_, "func7", "func8", "func9", NA_character_, "func10", "func11", "func12")
)

data <- data %>%
  # Copy the label into Id_A on rows starting with "A" plus a digit, NA elsewhere
  mutate(Id_A = ifelse(str_detect(Number, "^A[0-9]{1}"), Number, NA_character_)) %>%
  # Same for rows starting with "B" plus a digit
  mutate(Id_B = ifelse(str_detect(Number, "^B[0-9]{1}"), Number, NA_character_)) %>%
  # Carry each label down into the NA rows below it
  fill(Id_A,
       Id_B) %>%
  # Keep only the leaf rows, then rename the columns
  filter(Number == "C") %>%
  select(Level1 = Id_A, Level2 = Id_B, Level3 = Number, Func = Func)

This gives the following output:

> data
# A tibble: 12 x 4
   Level1          Level2                 Level3 Func  
   <chr>           <chr>                  <chr>  <chr> 
 1 A01 Metabolism  B011 Suger metabolism  C      func1 
 2 A01 Metabolism  B011 Suger metabolism  C      func2 
 3 A01 Metabolism  B011 Suger metabolism  C      func3 
 4 A01 Metabolism  B012 Lipid metabolism  C      func4 
 5 A01 Metabolism  B012 Lipid metabolism  C      func5 
 6 A01 Metabolism  B012 Lipid metabolism  C      func6 
 7 A02 Degradation B021 Suger degradation C      func7 
 8 A02 Degradation B021 Suger degradation C      func8 
 9 A02 Degradation B021 Suger degradation C      func9 
10 A02 Degradation B022 Lipid degradation C      func10
11 A02 Degradation B022 Lipid degradation C      func11
12 A02 Degradation B022 Lipid degradation C      func12
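
If you would rather not load any packages, the same fill-down idea can be sketched in base R. This is only an illustration, not part of the answer above: fill_down is a hypothetical helper written for this example (it is not a base R or tidyr function), the sample data is a shortened made-up version of the question's data, and the same assumption applies (Number holds only A rows, B rows, and "C" rows).

```r
# Shortened sample data in the same shape as the question (values are illustrative).
data <- data.frame(
  Number = c("A01 Metabolism", "B011 Suger metabolism", "C", "C",
             "B012 Lipid metabolism", "C",
             "A02 Degradation", "B021 Suger degradation", "C"),
  Func = c(NA, NA, "func1", "func2", NA, "func3", NA, NA, "func4"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace each NA with the most recent non-NA value above it.
# Leading NAs (nothing above them yet) stay NA.
fill_down <- function(x) {
  idx <- cumsum(!is.na(x))        # how many non-NA values have been seen so far
  c(NA, x[!is.na(x)])[idx + 1]    # position 1 is the NA used when idx is 0
}

# Mark the rows that carry a level-1 (A...) or level-2 (B...) label.
is_a <- grepl("^A[0-9]", data$Number)
is_b <- grepl("^B[0-9]", data$Number)

# Keep the label on its own row, NA elsewhere, then carry it downwards.
level1 <- fill_down(ifelse(is_a, data$Number, NA))
level2 <- fill_down(ifelse(is_b, data$Number, NA))

# Keep only the leaf rows and assemble the flat data frame.
keep <- data$Number == "C"
result <- data.frame(
  Level1 = level1[keep],
  Level2 = level2[keep],
  Level3 = data$Number[keep],
  Func   = data$Func[keep],
  stringsAsFactors = FALSE
)

print(result)
```

Note that the level-2 label is also carried onto the next A row, where it is stale. That does no harm here because those rows are dropped by the "C" filter, but it matters if you keep the header rows.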
