Sum from January to the latest available month, for the current year and the previous year

Problem description

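I am trying to find a way to compute a year-to-date total for both the current year and the previous year. For the current year, I want the sum from January through the latest available month, and for the previous year the sum over the same months. For example, if the latest available month is July 2020, I want January to July 2019 summed, and January to July 2020 summed.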

I have cut the script down to the bare bones and added the dput() output:

library(tidyverse)

# Get the data for table 1
data1 <- read_csv("test-table.csv")
data1

dput(data1)

data1 <- data1 %>%
  select(DATE, TC, VALUE)

dput(data1)
 structure(list(DATE = structure(c(18444, 18444, 18444), class = "Date"), 
  TC = c("Canada", "Canada", "Canada"), VALUE = c(141772, 113414, 
  100351)), row.names = c(NA, -3L), spec = structure(list(cols = list(
  REF_DATE = structure(list(), class = c("collector_character", 
  "collector")), GEO = structure(list(), class = c("collector_character", 
  "collector")), DGUID = structure(list(), class = c("collector_character", 
  "collector")), `Traveller characteristics` = structure(list(), class = c("collector_character", 
  "collector")), UOM = structure(list(), class = c("collector_character", 
  "collector")), UOM_ID = structure(list(), class = c("collector_double", 
  "collector")), SCALAR_FACTOR = structure(list(), class = c("collector_character", 
  "collector")), SCALAR_ID = structure(list(), class = c("collector_double", 
  "collector")), VECTOR = structure(list(), class = c("collector_character", 
  "collector")), COORDINATE = structure(list(), class = c("collector_double", 
  "collector")), VALUE = structure(list(), class = c("collector_double", 
  "collector")), STATUS = structure(list(), class = c("collector_logical", 
  "collector")), SYMBOL = structure(list(), class = c("collector_logical", 
  "collector")), TERMINATED = structure(list(), class = c("collector_logical", 
  "collector")), DECIMALS = structure(list(), class = c("collector_double", 
  "collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1), class = "col_spec"), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"))

Tags: r, dataframe, dplyr

Solution


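Once the data is tidy, it is simple to filter out the months you are not interested in, and to get the summed totals for each tc by grouping by tc and year and summarizing the sum of value.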

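To get a dynamic list of months, take the unique values of month from months_df, a filtered copy of the data that contains only the latest year (the max of year). We can then use that list to feed the filter for our final summary df.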

library(tidyverse)

# Tidy up the data.
# Note: this assumes data1 was read as a single tab-delimited column named
# `DATE    TC  VALUE`; the date is split on "-" into year and month.
tidy_data <- data1 %>%
  separate(`DATE    TC  VALUE`, into = c("date", "tc", "value"), sep = "\t") %>%
  separate(date, into = c("year", "month"), sep = "-") %>%
  mutate(across(c(year, month, value), as.integer))

# Keep only the rows for the latest year
months_df <- tidy_data %>%
  filter(year == max(tidy_data$year))

# Use the months present in months_df to feed the filter,
# then total value by year and tc
summary <- tidy_data %>%
  filter(month %in% unique(months_df$month)) %>%
  group_by(year, tc) %>%
  summarize(total = sum(value, na.rm = TRUE))

Output:

   year tc       total
  <int> <chr>    <int>
1  2019 TC-1  18271577
2  2019 TC-2  14094089
3  2019 TC-3   9415440
4  2020 TC-1   4340588
5  2020 TC-2   3431912
6  2020 TC-3   2551697

Edit: updated to use a dynamic list of months based on the most recent year. I'm sure there is a more elegant way, but this should work.
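
The dput() in the question shows DATE already stored as a Date column, with TC and VALUE as separate columns. In that case the separate() steps are not needed. The following is a minimal sketch of the same idea under that assumption: the toy data frame and its numbers are made up purely for illustration, and the restriction to the latest two years is an extra step that only matters if the data holds more than two years.

```r
library(dplyr)

# Hypothetical toy data (not from the question): one TC, every month of 2019
# and January to July of 2020, with made-up values.
toy <- tibble(
  DATE  = c(seq(as.Date("2019-01-01"), by = "month", length.out = 12),
            seq(as.Date("2020-01-01"), by = "month", length.out = 7)),
  TC    = "Canada",
  VALUE = c(rep(10, 12), rep(1, 7))
)

ytd <- toy %>%
  # Derive integer year and month from the Date column
  mutate(year  = as.integer(format(DATE, "%Y")),
         month = as.integer(format(DATE, "%m"))) %>%
  # Keep only the months that exist in the most recent year
  filter(month %in% month[year == max(year)]) %>%
  # Keep only the most recent year and the one before it
  filter(year >= max(year) - 1) %>%
  group_by(year, TC) %>%
  summarize(total = sum(VALUE, na.rm = TRUE), .groups = "drop")

ytd
```

With these toy numbers, the 2019 row totals 70 (seven months of 10) and the 2020 row totals 7 (seven months of 1), so August to December 2019 are left out as intended.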

