ts() function - defining unique quarters

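Question

I have a dataset that looks like this (below).

I am trying to convert this data to a ts() object, grouping the data by the Season column into the seasons listed. For example, Q1 is SU15, Q2 is FA15, and so on. Also, not all seasons have the same number of data points. I tried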

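library(lubridate)  # provides decimal_date() and ymd()

DF_ts <- ts(DF$Variable, frequency = 52,
            start = decimal_date(ymd(DF$Date[1])))

# aggregate the weekly series defined above to 4 periods per year
DF_ts_2 <- aggregate(DF_ts, nfrequency = 4)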

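but I noticed the quarters are not grouped the way I want. I am also doing this for a very large dataset, so it needs to be somewhat automated. The example below is just that, an example. I also need to make sure the seasons stay in order (i.e. Summer 2015, Fall 2015, Winter 2015, Spring 2016).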

Date	Season	Variable
6/20/15	SU15	67859
6/27/15	SU15	75251
7/4/15	SU15	100085
7/11/15	FA15	98760
7/18/15	FA15	95053
7/25/15	FA15	91286
8/1/15	FA15	88573
8/8/15	FA15	23084
8/15/15	FA15	31939
8/22/15	FA15	31445
8/29/15	FA15	30854
9/5/15	FA15	21890
9/12/15	FA15	29948
9/19/15	FA15	54254
9/26/15	FA15	52819
10/3/15	FA15	51974
10/10/15	WN15	55826
10/17/15	WN15	53300
10/24/15	WN15	52442
10/31/15	WN15	23084
11/7/15	WN15	31939
11/14/15	WN15	31445
11/21/15	WN15	30854
11/28/15	WN15	21890
12/5/15	WN15	29948
12/12/15	WN15	54254
12/19/15	WN15	52819
12/26/15	WN15	51974
1/2/16	WN15	55826
1/9/16	SP16	53300
1/16/16	SP16	52442
1/23/16	SP16	23084
1/30/16	SP16	31939
2/6/16	SP16	31445
2/13/16	SP16	30854
2/20/16	SP16	21890
2/27/16	SP16	29948
3/5/16	SP16	54254
3/12/16	SP16	52819
3/19/16	SP16	51974
3/26/16	SP16	55826
4/2/16	SP16	53300
		

Data in dput format.

DF <-
structure(list(Date = structure(c(28L, 29L, 33L, 30L, 31L, 32L, 
34L, 38L, 35L, 36L, 37L, 42L, 39L, 40L, 41L, 9L, 6L, 7L, 8L, 
10L, 14L, 11L, 12L, 13L, 18L, 15L, 16L, 17L, 2L, 5L, 1L, 3L, 
4L, 22L, 19L, 20L, 21L, 26L, 23L, 24L, 25L, 27L, 28L, 29L, 33L, 
30L, 31L, 32L, 34L, 38L, 35L, 36L, 37L, 42L, 39L, 40L, 41L, 9L, 
6L, 7L, 8L, 10L, 14L, 11L, 12L, 13L, 18L, 15L, 16L, 17L, 2L, 
5L, 1L, 3L, 4L, 22L, 19L, 20L, 21L, 26L, 23L, 24L, 25L, 27L), .Label = c("1/16/16", 
"1/2/16", "1/23/16", "1/30/16", "1/9/16", "10/10/15", "10/17/15", 
"10/24/15", "10/3/15", "10/31/15", "11/14/15", "11/21/15", "11/28/15", 
"11/7/15", "12/12/15", "12/19/15", "12/26/15", "12/5/15", "2/13/16", 
"2/20/16", "2/27/16", "2/6/16", "3/12/16", "3/19/16", "3/26/16", 
"3/5/16", "4/2/16", "6/20/15", "6/27/15", "7/11/15", "7/18/15", 
"7/25/15", "7/4/15", "8/1/15", "8/15/15", "8/22/15", "8/29/15", 
"8/8/15", "9/12/15", "9/19/15", "9/26/15", "9/5/15"), class = "factor"), 
    Season = structure(c(3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
    4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
    4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
    ), .Label = c("FA15", "SP16", "SU15", "WN15"), class = "factor"), 
    Variable = c(67859L, 75251L, 100085L, 98760L, 95053L, 91286L, 
    88573L, 23084L, 31939L, 31445L, 30854L, 21890L, 29948L, 54254L, 
    52819L, 51974L, 55826L, 53300L, 52442L, 23084L, 31939L, 31445L, 
    30854L, 21890L, 29948L, 54254L, 52819L, 51974L, 55826L, 53300L, 
    52442L, 23084L, 31939L, 31445L, 30854L, 21890L, 29948L, 54254L, 
    52819L, 51974L, 55826L, 53300L, 67859L, 75251L, 100085L, 
    98760L, 95053L, 91286L, 88573L, 23084L, 31939L, 31445L, 30854L, 
    21890L, 29948L, 54254L, 52819L, 51974L, 55826L, 53300L, 52442L, 
    23084L, 31939L, 31445L, 30854L, 21890L, 29948L, 54254L, 52819L, 
    51974L, 55826L, 53300L, 52442L, 23084L, 31939L, 31445L, 30854L, 
    21890L, 29948L, 54254L, 52819L, 51974L, 55826L, 53300L)), class = "data.frame", row.names = c(NA, 
-84L))

Tags: r, time-series

Answer


The question never defines the desired output, so we assume the goal is the mean of Variable for each year/quarter, returned as a ts series.
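As background on why the attempt in the question does not follow the Season column: aggregate() on a ts object ignores any labels and simply combines consecutive blocks of frequency / nfrequency observations (13 weeks here), using sum by default. A minimal sketch with made-up values 1 to 52 (not the question's data) shows the fixed-size blocks:

```r
# Hypothetical weekly series: the values 1..52 make the blocks easy to see
x <- ts(1:52, frequency = 52, start = 2015)

# Going from 52 to 4 periods per year groups every 13 consecutive observations;
# the default aggregation function is sum
agg <- aggregate(x, nfrequency = 4)

print(as.vector(agg))  # sums of 1:13, 14:26, 27:39, 40:52
```

Because the seasons in the data have unequal lengths (3, 13, 13 and 13 rows per block of 42), fixed 13-week blocks cannot line up with them.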

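Calculate the two-letter season, season, and from that calculate qtr, where SP is quarter 1, SU is quarter 2, and so on. Also calculate the two-digit year yy and from that calculate the "yearqtr" class object yq. Then aggregate zoo(Variable) using yq and mean. (If an aggregation function other than mean is needed, replace mean with it.) Finally, convert the resulting zoo object ag to ts.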

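library(zoo)

# strip the digits to get the two-letter season code, e.g. "SU15" -> "SU"
season <- gsub("\\d", "", DF$Season)
# position in this vector is the quarter number: SP = 1, SU = 2, FA = 3, WN = 4
qtr <- match(season, c("SP", "SU", "FA", "WN"))
# drop the first two characters to get the two-digit year, e.g. "SU15" -> "15"
yy <- sub("..", "", DF$Season)
yq <- as.yearqtr(paste(yy, qtr), "%y %q")

# mean of Variable within each year/quarter, then convert zoo -> ts
ag <- aggregate(zoo(DF$Variable), yq, mean)
as.ts(ag)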

giving:

         Qtr1     Qtr2     Qtr3     Qtr4
2015          81065.00 53990.69 41969.31
2016 41775.00   
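The same mapping can also be sketched in base R alone, without zoo. This is only an illustrative variant, not part of the answer above: DF_small is a hypothetical eight-row subset of the question's data, the names idx and res are made up here, and the sketch assumes every quarter between the first and the last is present (ts() needs a regular series).

```r
# Hypothetical small subset of the question's data (not the full DF)
DF_small <- data.frame(
  Season   = c("SU15", "SU15", "SU15", "FA15", "FA15", "WN15", "WN15", "SP16"),
  Variable = c(67859, 75251, 100085, 98760, 95053, 55826, 53300, 53300)
)

# Season code -> quarter number and four-digit year (assumes years 2000-2099)
season <- substr(DF_small$Season, 1, 2)
qtr    <- match(season, c("SP", "SU", "FA", "WN"))
year   <- 2000 + as.integer(substr(DF_small$Season, 3, 4))

# Numeric time index year + (quarter - 1) / 4; sorting by it gives calendar order
idx <- year + (qtr - 1) / 4

# Mean per quarter; tapply orders the groups by the sorted numeric index
means <- tapply(DF_small$Variable, idx, mean)

# Quarterly ts starting at the earliest year/quarter found in the data
first <- which.min(idx)
res <- ts(as.vector(means), start = c(year[first], qtr[first]), frequency = 4)
print(res)
```

Because the groups are ordered by the numeric index rather than by the season label, the result comes out as Summer 2015, Fall 2015, Winter 2015, Spring 2016 regardless of the row order in the input.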
