r - Split a month/day string in R with `-`
Question
I have a column that looks like this:
fiscal_year_end
1 1231
2 1231
3 1231
4 1231
5 202
6 1231
7 1231
8 202
9 1231
10 927
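The values are month-day pairs: 1231 is 12-31, 927 is 9-27, and 202 is 2-02.
I am trying to put them into this format, but I can't seem to get it right.
I tried str_replace_all(df$fiscal_year_end, "(?<=^\\d{2}|^\\d{4})", "-") from the stringr package, but the result was not what I expected.
Where am I going wrong?
Data: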
structure(list(fiscal_year_end = c(1231L, 1231L, 1231L, 1231L,
202L, 1231L, 1231L, 202L, 1231L, 927L, 228L, 1231L, 1231L, 1231L,
1231L, 928L, 1231L, 1231L, 930L, 1231L, 1231L, 628L, 1231L, 1231L,
1228L, 930L, 1231L, 1231L, 1231L, 1231L, 927L, 630L, 1231L, 202L,
1231L, 1231L, 1231L, 1231L, 927L, 930L, 1231L, 1231L, 1231L,
1231L, 228L, 928L, 1231L, 1231L, 1231L, 1231L, 1231L, 1231L,
1231L, 1231L, 1231L, 1231L, 1228L, 1231L, 1231L, 1231L, 1231L,
131L, 1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 930L, 1231L,
1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 831L, 1231L, 102L,
1231L, 1231L, 1231L, 1130L, 1231L, 1228L, 1231L, 1231L, 1231L,
1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 930L, 1031L, 1231L,
1231L, 1231L, 1231L, 1231L, 1231L, 203L, 1231L, 1231L, 1231L,
1231L, 1231L, 1229L, 1231L, 1231L, 1231L, 426L, 1231L, 1231L,
1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 202L, 1231L,
1231L, 1231L, 1231L, 1231L, 1231L, 1229L, 1231L, 1231L, 630L,
1231L, 1231L, 1209L, 1231L, 1231L, 1231L, 728L, 1231L, 1231L,
1231L, 1231L, 1231L, 1231L, 630L, 1231L, 1231L, 1231L, 1231L,
1231L, 1231L, 727L, 1231L, 201L, 1231L, 1231L, 1231L, 1231L,
1231L, 630L, 1231L, 1231L, 1231L, 1130L, 1231L, 1231L, 1231L,
1231L, 1231L, 1231L, 1231L, 930L, 930L, 1231L, 1231L, 331L, 1231L,
1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 1031L, 1229L, 1231L,
1231L, 1231L, 201L, 1231L, 1231L, 1231L, 1231L, 1231L, 1231L,
831L, 630L, 831L)), row.names = c(NA, -200L), class = "data.frame")
Edit:
datadate fiscal_year_end
1 2012-08-31 831
2 2017-01-31 201
3 1999-12-31 1231
4 2009-02-28 228
5 2010-12-31 1231
6 2005-12-31 1231
7 <NA> 630
8 2010-09-30 928
9 2009-09-30 930
10 2018-01-31 201
11 2017-12-31 1231
12 2004-12-31 1231
Answer
We can use `separate` after formatting the values to 4 digits:
library(dplyr)
library(tidyr)
df1 %>%
mutate(fiscal_year_end = sprintf("%04d", fiscal_year_end)) %>%
separate(fiscal_year_end, c("month", "day"), sep= 2)
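To make the padding step concrete, here is a minimal base R sketch of the same idea without dplyr or tidyr. The vector `x` and the names `padded`, `month`, `day`, and `formatted` are illustrative and not part of the original answer; the sketch assumes the values are integers of at most four digits.

```r
# Illustrative sample values taken from the question's column
x <- c(1231L, 202L, 927L)

# Left-pad with zeros to a fixed width of 4, so the month always
# occupies the first two characters
padded <- sprintf("%04d", x)

# Split at the fixed position and join with a dash
month <- substr(padded, 1, 2)
day <- substr(padded, 3, 4)
formatted <- paste(month, day, sep = "-")

print(padded)
print(formatted)
```

Because of the padding, single-digit months keep a leading zero in this variant (for example "02-02" rather than "2-02").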
Or use a negative index in `separate`, which counts the split position from the right, so no padding is needed:
df1 %>%
separate(fiscal_year_end, c("month", "day"), sep= -2)
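The right-anchored split can also be sketched in base R, which may help show what a negative position means. This is an approximation of the idea, not tidyr's implementation; `x_chr`, `n`, `month`, and `day` are illustrative names, and the sketch assumes every value has at least three digits.

```r
# Illustrative sample values, converted to character
x_chr <- as.character(c(1231L, 202L, 927L))

# Number of characters in each value
n <- nchar(x_chr)

# The last two characters are the day; everything before them is the month
month <- substr(x_chr, 1, n - 2)
day <- substr(x_chr, n - 1, n)

print(data.frame(month = month, day = day))
```

Counting from the right keeps the day at two characters whether the month has one digit or two.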
Or using base R only: we use `sub` to insert a delimiter (with a single capture group) and then read the result into a two-column data.frame with `read.csv`:
out <- read.csv(text = sub("(\\d{2})$", ",\\1", df1[[1]]), header = FALSE,
col.names = c("month", "day"), stringsAsFactors = FALSE)
head(out, 5)
# month day
#1 12 31
#2 12 31
#3 12 31
#4 12 31
#5 2 2
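If the goal is the dashed string from the question rather than two columns, the same `sub` pattern can insert a `-` instead of a comma. A minimal sketch, using a small illustrative vector `x` in place of the full column (`with_dash` is also an illustrative name):

```r
# Illustrative sample values from the question's column
x <- c(1231L, 202L, 927L, 1130L)

# Capture the last two digits and put a dash in front of them;
# sub() coerces the integers to character first
with_dash <- sub("(\\d{2})$", "-\\1", x)

print(with_dash)
```

This leaves single-digit months unpadded, e.g. "2-02" and "9-27".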