Rearranging data: adding the week number from the inline line above

Problem description

I could use some help rearranging data in R.

My data looks like this:

"sometext for week 1"
"headertext", "1900", "1910", "1920"
"data1", 1,2,3
"data2", 2,2,2
"data3", 0,0,1
"sometext for week 2"
"headertext", "1900", "1910", "1930"
"data1", 0,0,3
"data2", 1,1,1
"data3", 1,0,0

I am currently importing it into R with read.csv.

I need to arrange my data for plotting. I can do this quite easily outside R. The Python code looks like this:

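week = None
with open("mydata.csv", "r") as data:
    for line in data:
        # Lines such as "sometext for week 1" carry the week number
        if line.lstrip('"').startswith("sometext"):
            week = line.strip().strip('"').split(" ")[-1]
        else:
            # Prefix every other line with the current week number
            print(week + "," + line, end="")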

From there it is also easy to format it in Python:

"data1", 1900, 1, 1
"data1", 1900, 2, 0
"data1", 1910, 1, 2
"data1", 1910, 2, 0
"data1", 1920, 1, 3
"data1", 1920, 2, 3
...

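But I think it would be better to skip any extra external steps and do it inside R.

Any suggestions on how to do this in R?

Are there any functions or libraries I should use?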

Tags: r

Solution


library(tidyverse)

d <- read_lines('"sometext for week 1"
"headertext", "1900", "1910", "1920"
"data1", 1,2,3
"data2", 2,2,2
"data3", 0,0,1
"sometext for week 2"
"headertext", "1900", "1910", "1930"
"data1", 0,0,3
"data2", 1,1,1
"data3", 1,0,0') 

d1 <- d %>%
  split(cumsum(str_detect(., "week \\d"))) %>% # Use Regex to detect where subtables begin and split vector based on that. Results in list of the subtables
  set_names(map_chr(., ~ .x[[1]])) %>% # Set names of the list vector to the first line of the subtables
  map(~ read.csv(text = .x, skip = 1)) %>% # Parse each subtable with read.csv(), skipping the first line.
  bind_rows(.id = "week") %>% # Bind the subtables together into one. Names of the list of tables become the week variable
  as_tibble() # Convert to tibble, the tidyverse version of the data frame. Mostly for better readability of the console output
  
d1
#> # A tibble: 6 × 6
#>   week                      headertext X1900 X1910 X1920 X1930
#>   <chr>                     <chr>      <int> <int> <int> <int>
#> 1 "\"sometext for week 1\"" data1          1     2     3    NA
#> 2 "\"sometext for week 1\"" data2          2     2     2    NA
#> 3 "\"sometext for week 1\"" data3          0     0     1    NA
#> 4 "\"sometext for week 2\"" data1          0     0    NA     3
#> 5 "\"sometext for week 2\"" data2          1     1    NA     1
#> 6 "\"sometext for week 2\"" data3          1     0    NA     0

Created on 2021-09-08 by the reprex package (v2.0.1)
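
The result above keeps the whole "sometext for week N" line in the week column and stays in wide format, while the question asks for a numeric week and one row per name, year, and week. A minimal sketch of one way to get there follows. It is not part of the original answer: the names `lines`, `blocks`, `week_no`, and `long` are illustrative, and it assumes that every subtable starts with a line containing "week N".

```r
library(tidyverse)

# Same sample input as in the question, one element per line.
lines <- c(
  '"sometext for week 1"',
  '"headertext", "1900", "1910", "1920"',
  '"data1", 1,2,3',
  '"data2", 2,2,2',
  '"data3", 0,0,1',
  '"sometext for week 2"',
  '"headertext", "1900", "1910", "1930"',
  '"data1", 0,0,3',
  '"data2", 1,1,1',
  '"data3", 1,0,0'
)

# Start a new block at every line that contains "week N" (assumption).
blocks <- split(lines, cumsum(str_detect(lines, "week \\d")))

long <- blocks %>%
  map(function(block) {
    # The first line of each block carries the week number.
    week_no <- as.integer(str_extract(block[1], "(?<=week )\\d+"))
    # The remaining lines are an ordinary CSV table;
    # check.names = FALSE keeps "1900" instead of "X1900".
    read.csv(text = block[-1], check.names = FALSE) %>%
      pivot_longer(-headertext, names_to = "year", values_to = "value") %>%
      mutate(week = week_no, year = as.integer(year))
  }) %>%
  bind_rows() %>%
  select(headertext, year, week, value) %>%
  arrange(headertext, year, week)

long
```

Because each subtable is reshaped before binding, no NA columns appear when the headers differ between weeks. In the sample data the last column of week 2 is labelled 1930, so those values end up under year 1930 rather than 1920.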

