首页 > 解决方案 > 将一致间隔的行移动到列 R

问题描述

使用一些商业劳工统计数据 ( https://www.bls.gov/regions/mid-atlantic/data/producerpriceindexconcrete_us_table.htm )。我已经从这个 url 刮掉了表格,并试图把它变成一个整洁的格式。这是一个工作示例:

Commodity                       jan   feb   mar
Nonmetallic mineral products         
2020                            257.2 258.1 258.5   
2021                            262.6 263.4 264.4
Concrete ingredients
2020                            316.0 316.9 317.8
2021                            328.4 328.4 328.4
Construction gravel
2020                            359.2 360.7 362.1
2021                            375.0 374.7 374.1

如何将 2020 和 2021 行放入“Year”列,并将 jan、feb、mar 等放入“Month”列,如下所示?

Commodity                     Month Year Value
Nonmetallic mineral products  jan   2020 257.2
Nonmetallic mineral products  feb   2020 258.1
Concrete ingredients          jan   2020 316.0
Concrete ingredients          jan   2021 328.4

标签: rdata-structuresdplyrtidy

解决方案


我们可以使用tidyverse将数据转换为所需的格式。

  1. 通过对逻辑向量进行累积和 ( ) 创建分组列“grp”,cumsum即“商品”列中存在字母
  2. 用于mutate创建“年份”,通过replace将第一个元素设置为并通过使用值NA更新它来修改“商品”first
  3. 删除第一行slice
  4. ungroup并将数据重塑为长格式pivot_longer
library(dplyr)
library(stringr)
library(tidyr)
df1 %>% 
    group_by(grp = cumsum(str_detect(Commodity, "[A-Za-z]"))) %>% 
    mutate(Year = replace(Commodity,  1, NA), 
           Commodity = first(Commodity)) %>%     
    slice(-1) %>%
    ungroup %>% 
    select(-grp) %>%
    pivot_longer(cols = jan:mar, names_to = 'Month')

-输出

# A tibble: 18 x 4
   Commodity                    Year  Month value
   <chr>                        <chr> <chr> <dbl>
 1 Nonmetallic mineral products 2020  jan    257.
 2 Nonmetallic mineral products 2020  feb    258.
 3 Nonmetallic mineral products 2020  mar    258.
 4 Nonmetallic mineral products 2021  jan    263.
 5 Nonmetallic mineral products 2021  feb    263.
 6 Nonmetallic mineral products 2021  mar    264.
 7 Concrete ingredients         2020  jan    316 
 8 Concrete ingredients         2020  feb    317.
 9 Concrete ingredients         2020  mar    318.
10 Concrete ingredients         2021  jan    328.
11 Concrete ingredients         2021  feb    328.
12 Concrete ingredients         2021  mar    328.
13 Construction gravel          2020  jan    359.
14 Construction gravel          2020  feb    361.
15 Construction gravel          2020  mar    362.
16 Construction gravel          2021  jan    375 
17 Construction gravel          2021  feb    375.
18 Construction gravel          2021  mar    374.

数据

df1 <- structure(list(Commodity = c("Nonmetallic mineral products", 
"2020", "2021", "Concrete ingredients", "2020", "2021", "Construction gravel", 
"2020", "2021"), jan = c(NA, 257.2, 262.6, NA, 316, 328.4, NA, 
359.2, 375), feb = c(NA, 258.1, 263.4, NA, 316.9, 328.4, NA, 
360.7, 374.7), mar = c(NA, 258.5, 264.4, NA, 317.8, 328.4, NA, 
362.1, 374.1)), class = "data.frame", row.names = c(NA, -9L))

推荐阅读