Subtracting date objects?

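Problem description

I am trying to simply subtract child_date from survey_date, but I keep getting the "character string is not in a standard unambiguous format" error. Both columns are in character format, so what is the problem?

This does not work: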

df %>% mutate(child_age = survey_date-child_date)

df <- structure(list(
  case_id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
  person_id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
  home_id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6),
  year = c(2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018),
  month = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
  survey_date_cmc = c(1417, 1417, 1417, 1417, 1417, 1417, 1417, 1417, 1417, 1417, 1417, 1417, 1417, 1417, 1417, 1417, 1417, 1417, 1417, 1417),
  mom_age = c(28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37),
  mom_dob_cmc = c(1081, 1081, 1081, 1081, 1081, 1081, 1081, 1081, 1081, 1081, 973, 973, 973, 973, 973, 973, 973, 973, 973, 973),
  name = c("b3_01", "b3_02", "b3_03", "b3_04", "b3_05", "b3_06", "b3_07", "b3_08", "b3_09", "b3_10", "b3_01", "b3_02", "b3_03", "b3_04", "b3_05", "b3_06", "b3_07", "b3_08", "b3_09", "b3_10"),
  value = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1297, 1297, NA, NA, NA, NA, NA, NA, NA, NA),
  child_date = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "2008-01-01", "2008-01-01", NA, NA, NA, NA, NA, NA, NA, NA),
  survey_date = c("2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01")),
  class = c("grouped_df", "tbl_df", "tbl", "data.frame"),
  row.names = c(NA, -20L),
  groups = structure(list(mom_age = c(28, 37), case_id = 1:2, .rows = list(1:10, 11:20)),
    row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE))

Data frame

Tags: r, dataframe, date, dplyr, calculated-columns

Solution


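The columns are of class character. They need to be converted to Date class with as.Date before they can be subtracted.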

library(dplyr)
df %>% 
   mutate(child_age = as.Date(survey_date) - as.Date(child_date))
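To see the effect of the conversion in isolation, here is a minimal base R sketch. The two short vectors are hypothetical stand-ins for the child_date and survey_date columns, not the full data frame from the question.

```r
# Hypothetical toy vectors standing in for the two character columns.
child_date  <- c(NA, "2008-01-01")
survey_date <- c("2018-01-01", "2018-01-01")

# Subtracting the raw strings fails because "-" is not defined for character
# vectors; tryCatch captures the error message instead of stopping the script.
raw_result <- tryCatch(survey_date - child_date,
                       error = function(e) conditionMessage(e))

# After conversion to Date, the subtraction returns a difftime in days.
# Missing child dates simply propagate as NA.
child_age <- as.Date(survey_date) - as.Date(child_date)
print(child_age)
```

The result is a difftime object, so wrapping it in as.numeric() gives a plain number of days if that is easier to work with downstream.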

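For more control over the units, use difftime. It computes its first argument minus its second, so the later date goes first: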

df %>%
   mutate(child_age = difftime(as.Date(survey_date), as.Date(child_date), units = 'weeks'))
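As a rough illustration of how the units argument changes the result, the sketch below uses two single dates taken from the example data. Note that difftime has no "years" unit; dividing days by 365.25 is an approximation introduced here, not something from the original answer.

```r
# Single dates taken from the example data.
survey <- as.Date("2018-01-01")
child  <- as.Date("2008-01-01")

# difftime(time1, time2) is time1 - time2, so the later date comes first.
age_days  <- difftime(survey, child, units = "days")
age_weeks <- difftime(survey, child, units = "weeks")

# Approximate age in years (assumes an average year length of 365.25 days).
age_years_approx <- as.numeric(age_days) / 365.25

print(age_days)
print(age_weeks)
print(age_years_approx)
```

Swapping the two arguments yields the same magnitude with a negative sign.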

Or use interval from lubridate, which gives the age in years:

library(lubridate)
df %>% 
     mutate(child_age = interval( as.Date(child_date), as.Date(survey_date))/years(1))
