首页 > 解决方案 > 如何使用宽数据格式的第二行作为额外的变量名

问题描述

我有一个数据框,其中第一行是列名(当然),但第二行是一个额外的信息和一个我想对其应用统计信息的变量/例如,第一行是植物的 ID,第二行是位置,其余行是时间序列的因变量。重要的是要注意第一个列是我的 x 轴,它代表时间。这是我的一段数据:

Days L-FCS L-DRC    L-PCH   S-PCH   S-PCH   S-SSV   M-SSV   L-SSV   S-DRC   L-MNS   L-DRC
Room  1-BR-SW 1-BR-SW   1-BR-SW 1-BR-SW 1-BR-SW 1-BR-SW 1-BR-SW 2-BR-SE 2-BR-SE 2-BR-SE 
0.00    0   0   0   0   0   0   0   0   0   0   0
0.04    0   0   1   0   0   0   1   0   0   0   0
0.08    0   0   1   0   0   1   2   0   0   0   0
0.13    0   0   -1  0   0   2   3   0   0   0   0
0.17    0   0   -1  0   0   3   4   0   0   0   0
0.21    0   0   -1  0   0   4   5   0   0   0   0
0.25    0   0   -1  0   0   4   6   0   0   0   0
0.29    0   0   -2  0   0   4   6   0   0   0   0
0.33    0   0   -1  0   0   4   6   0   0   0   0
0.38    -1  0   -1  0   0   4   6   0   0   0   0
0.42    -2  0   -1  0   0   4   6   0   0   0   0
0.46    -5  0   -1  0   0   4   6   0   0   0   0
0.50    -5  0   -2  0   0   4   6   0   0   -1  0
0.54    -5  0   -2  0   0   4   6   0   0   -2  0
0.58    -6  0   -3  0   0   4   7   0   0   -3  0
0.63    -8  0   -3  0   0   4   8   0   0   -3  0
0.67    -9  0   -3  0   0   4   8   0   0   -3  0
0.71    -9  0   -3  0   0   4   11  0   -1  -3  0
0.75    -9  0   -3  0   0   4   11  0   -1  -4  0
0.79    -9  0   -3  0   0   4   13  0   -1  -5  0
0.83    -10 0   -3  0   0   4   13  0   -1  -5  0
0.88    -12 0   -3  0   0   4   13  0   -1  -5  0
0.92    -13 0   -4  0   0   4   13  0   -1  -6  0
0.96    -14 0   -5  0   -1  4   13  0   -1  -6  0
1.00    -14 0   -5  0   -1  4   13  0   -1  -6  0
1.04    -15 0   -5  0   -1  4   13  0   -2  -6  0
1.08    -16 0   -5  0   -1  4   13  0   -2  -6  0

我省略了一些行名,因为它不适合这里的 col(第一行和第二行的名称对于数字来说太宽了)

为了将来使用,我很想知道如何将任何多行用作我的数据的变量。我试图将其重塑为长格式(并且我将此数据用于其他目的的长格式)但我找不到如何重塑它,所以我也有这些列。

到目前为止我所做的是完全省略第二行,所以我只有 Days col,它的第一个调用是 0.00

再次附上 dput() :

structure(list(Days = c("Room", "0.00", "0.04", "0.08", "0.13", 
"0.17"), L.FCS = c("1-BR-SW", "0", "0", "0", "0", "0"), L.DRC = c("1-BR-SW", 
"0", "0", "0", "0", "0"), L.PCH = c("1-BR-SW", "0", "0", "0", 
"0", "0"), S.PCH = c("1-BR-SW", "0", "0", "0", "0", "0"), S.PCH.1 = c("1-BR-SW", 
"0", "0", "0", "0", "0"), S.SSV = c("1-BR-SW", "0", "0", "0", 
"0", "0"), Hodaya_M = c("1-BR-SW", "0", "1", "1", "3", "3"), 
    L.SSV = c("2-BR-SE", "0", "-1", "-1", "-2", "-2"), S.DRC = c("2-BR-SE", 
    "0", "0", "-1", "-1", "-1")), row.names = c(NA, 6L), class = "data.frame")

标签: rrows

解决方案


我建议分别提取标题信息和数据,然后将两者结合起来。

library(tidyverse)
df1_headers <- df1 %>%
  janitor::clean_names() %>%   # Reformats names and makes them unique
  slice(1) %>%   # only keep the first data row, your "location"
  mutate(across(,as.character)) %>%  # make everything character data
  pivot_longer(everything(), values_to = "location")

df1_body <- df1 %>% 
  janitor::clean_names() %>%
  slice(-1) %>%   # remove only the first row
  mutate(across(,as.numeric),       # make numeric
         row = row_number()) %>%    # add row number in case days not unique / ordered
  pivot_longer(-c(row, days))

df_long <- df1_body %>% left_join(df1_headers)

这种格式对于进一步的分析和可视化应该非常灵活,例如

ggplot(df_long, aes(days, value, color = name)) +
  geom_line() +
  facet_wrap(~location)

在此处输入图像描述

源数据

df1 <- data.frame(
  stringsAsFactors = FALSE,
       check.names = FALSE,
              Days = c("Room","0","0.04","0.08",
                       "0.13","0.17","0.21","0.25","0.29","0.33","0.38",
                       "0.42","0.46","0.5","0.54","0.58","0.63","0.67",
                       "0.71","0.75","0.79","0.83","0.88","0.92","0.96","1",
                       "1.04","1.08"),
           `L-FCS` = c("1-BR-SW","0","0","0","0",
                       "0","0","0","0","0","-1","-2","-5","-5","-5",
                       "-6","-8","-9","-9","-9","-9","-10","-12","-13",
                       "-14","-14","-15","-16"),
           `L-DRC` = c("1-BR-SW","0","0","0","0",
                       "0","0","0","0","0","0","0","0","0","0","0",
                       "0","0","0","0","0","0","0","0","0","0","0","0"),
           `L-PCH` = c("1-BR-SW","0","1","1","-1",
                       "-1","-1","-1","-2","-1","-1","-1","-1","-2",
                       "-2","-3","-3","-3","-3","-3","-3","-3","-3","-4",
                       "-5","-5","-5","-5"),
           `S-PCH` = c("1-BR-SW","0","0","0","0",
                       "0","0","0","0","0","0","0","0","0","0","0",
                       "0","0","0","0","0","0","0","0","0","0","0","0"),
           `S-PCH` = c("1-BR-SW","0","0","0","0",
                       "0","0","0","0","0","0","0","0","0","0","0",
                       "0","0","0","0","0","0","0","0","-1","-1","-1",
                       "-1"),
           `S-SSV` = c("1-BR-SW","0","0","1","2",
                       "3","4","4","4","4","4","4","4","4","4","4",
                       "4","4","4","4","4","4","4","4","4","4","4","4"),
           `M-SSV` = c("1-BR-SW","0","1","2","3",
                       "4","5","6","6","6","6","6","6","6","6","7",
                       "8","8","11","11","13","13","13","13","13","13",
                       "13","13"),
           `L-SSV` = c("2-BR-SE","0","0","0","0",
                       "0","0","0","0","0","0","0","0","0","0","0",
                       "0","0","0","0","0","0","0","0","0","0","0","0"),
           `S-DRC` = c("2-BR-SE","0","0","0","0",
                       "0","0","0","0","0","0","0","0","0","0","0",
                       "0","0","-1","-1","-1","-1","-1","-1","-1","-1",
                       "-2","-2"),
           `L-MNS` = c("2-BR-SE","0","0","0","0",
                       "0","0","0","0","0","0","0","0","-1","-2","-3",
                       "-3","-3","-3","-4","-5","-5","-5","-6","-6",
                       "-6","-6","-6"),
           `L-DRC` = c(NA,0L,0L,0L,0L,0L,0L,0L,
                       0L,0L,0L,0L,0L,0L,0L,0L,0L,0L,0L,0L,0L,0L,
                       0L,0L,0L,0L,0L,0L)

推荐阅读