r - 将一致间隔的行移动到列 R
问题描述
使用一些商业劳工统计数据 ( https://www.bls.gov/regions/mid-atlantic/data/producerpriceindexconcrete_us_table.htm )。我已经从这个 url 刮掉了表格,并试图把它变成一个整洁的格式。这是一个工作示例:
Commodity jan feb mar
Nonmetallic mineral products
2020 257.2 258.1 258.5
2021 262.6 263.4 264.4
Concrete ingredients
2020 316.0 316.9 317.8
2021 328.4 328.4 328.4
Construction gravel
2020 359.2 360.7 362.1
2021 375.0 374.7 374.1
如何将 2020 和 2021 行放入“Year”列,并将 jan、feb、mar 等放入“Month”列,如下所示?
Commodity Month Year Value
Nonmetallic mineral products jan 2020 257.2
Nonmetallic mineral products feb 2020 258.1
Concrete ingredients jan 2020 316.0
Concrete ingredients jan 2021 328.4
解决方案
我们可以使用tidyverse
将数据转换为所需的格式。
- 通过对逻辑向量进行累积和 ( ) 创建分组列“grp”,
cumsum
即“商品”列中存在字母 - 用于
mutate
创建“年份”,通过replace
将第一个元素设置为并通过使用值NA
更新它来修改“商品”first
- 删除第一行
slice
ungroup
并将数据重塑为长格式pivot_longer
library(dplyr)
library(stringr)
library(tidyr)
df1 %>%
group_by(grp = cumsum(str_detect(Commodity, "[A-Za-z]"))) %>%
mutate(Year = replace(Commodity, 1, NA),
Commodity = first(Commodity)) %>%
slice(-1) %>%
ungroup %>%
select(-grp) %>%
pivot_longer(cols = jan:mar, names_to = 'Month')
-输出
# A tibble: 18 x 4
Commodity Year Month value
<chr> <chr> <chr> <dbl>
1 Nonmetallic mineral products 2020 jan 257.
2 Nonmetallic mineral products 2020 feb 258.
3 Nonmetallic mineral products 2020 mar 258.
4 Nonmetallic mineral products 2021 jan 263.
5 Nonmetallic mineral products 2021 feb 263.
6 Nonmetallic mineral products 2021 mar 264.
7 Concrete ingredients 2020 jan 316
8 Concrete ingredients 2020 feb 317.
9 Concrete ingredients 2020 mar 318.
10 Concrete ingredients 2021 jan 328.
11 Concrete ingredients 2021 feb 328.
12 Concrete ingredients 2021 mar 328.
13 Construction gravel 2020 jan 359.
14 Construction gravel 2020 feb 361.
15 Construction gravel 2020 mar 362.
16 Construction gravel 2021 jan 375
17 Construction gravel 2021 feb 375.
18 Construction gravel 2021 mar 374.
数据
df1 <- structure(list(Commodity = c("Nonmetallic mineral products",
"2020", "2021", "Concrete ingredients", "2020", "2021", "Construction gravel",
"2020", "2021"), jan = c(NA, 257.2, 262.6, NA, 316, 328.4, NA,
359.2, 375), feb = c(NA, 258.1, 263.4, NA, 316.9, 328.4, NA,
360.7, 374.7), mar = c(NA, 258.5, 264.4, NA, 317.8, 328.4, NA,
362.1, 374.1)), class = "data.frame", row.names = c(NA, -9L))
推荐阅读
- r - 在R中按冬季而不是年份分组月份
- javascript - Discord.js 审核日志
- phpstorm - 如何只从远程服务器下载特定文件而不是整个项目?
- angular - 模态关闭后以角度更新视图
- java - 在 Eclipse 中创建 Maven 项目时出错
- sails.js - 如何在sails js中集成管理面板?
- java - 如何在 Artifactory 上上传的选定快照上运行 Jenkins 构建
- sql - Postgres 索引统计
- java - JAVA SESSION Cookie 问题
- sonarqube - Checkmarx SonarQube 插件 8.60 中的身份验证失败