Error in seq.int(0, to0 - from, by): 'to' must be a finite number, with a time series in R

Problem description

I am doing a time series analysis. Example data:

df2=structure(list(supplier = c("TKP", "S7", "Travelfusion", "MyAgent", 
"S7", "TKP", "Travelfusion", "MyAgent", "TKP", "S7", "MyAgent", 
"Travelfusion", "S7", "TKP", "Travelfusion", "MyAgent", "TKP", 
"S7", "Travelfusion", "MyAgent"), date = c("2021-01-06", "2021-01-06", 
"2021-01-06", "2021-01-06", "2021-01-06", "2021-01-06", "2021-01-06", 
"2021-01-06", "2021-01-06", "2021-01-06", "2021-01-06", "2021-01-06", 
"2021-01-06", "2021-01-06", "2021-01-06", "2021-01-06", "2021-01-06", 
"2021-01-06", "2021-01-06", "2021-01-06"), hour = c(18L, 18L, 
18L, 18L, 19L, 19L, 19L, 19L, 20L, 20L, 20L, 20L, 21L, 21L, 21L, 
21L, 22L, 22L, 22L, 22L), base_price = c(4770, 49881, 244.45, 
0, 39253, 13168, 101.1, 0, 4156, 12946, 0, 0, 51737, 54711, 0, 
0, 23875, 41853, 52.61, 0)), row.names = c(NA, 20L), class = "data.frame")

My code:

u_supplier<-unique(df2$supplier)
u_supplier
for(i in 1:length(u_supplier)) {

  s_df<-df[df$supplier==u_supplier[i],]

  date_time <- apply(X = s_df[,c('date','hour')],MARGIN = 1, FUN = function(x) {
    x = paste(as.vector(x),collapse = "")
    return(x)
  })
  dt_index <- seq(from = as.POSIXct(date_time[1],format = "%m-%d-%Y %H"),
                  to  = as.POSIXct(date_time[length(date_time)],format = "%m-%d-%Y %H"),
                  by = "hour")

  ts_data <- xts(x = s_df$base_price, order.by = dt_index)
  names(ts_data)<-c('base_price')

  m<-holt(y = ts_data,h = 24*3)

  ts_data$hlt = m$fitted

  pr_dt_index <- seq(from = as.POSIXct(date_time[length(date_time)],format = "%m-%d-%Y %H")+hours(1),
                     to  = as.POSIXct(date_time[length(date_time)],format = "%m-%d-%Y %H")+hours(24*3),
                     by = "hour")
  pr_s_dt<-cbind(supplier = u_supplier[i],
                 date = gsub(" ","",format(pr_dt_index, "%e/%e/%Y")),
                 hour = hour(pr_dt_index),
                 weekday = as.POSIXlt(pr_dt_index)$wday,
                 base_price = round(m$mean,2))

  write.csv(x = data.frame(pr_s_dt),file = paste0(u_supplier[i],"_results.csv"),row.names = F,quote = F)

  p<-plot(ts_data, main = paste(u_supplier[i],'\nMAE=',round(MAE(y_pred = ts_data$hlt,y_true = ts_data$base_price),2)))
  print(p)
  
}

I don't understand why I get this error:

Error in seq.int(0, to0 - from, by) : 'to' must be a finite number

I have seen similar topics and solutions here, but they did not help me. How can I fix this error for my data?

Tags: r, dplyr, data.table

Solution


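When you get an error from a specific line in a script, look at the data used on that line. Most problems like this resolve themselves (and much faster!) once you see how far what you think the data is differs from what R says it is.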
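
One way to make that check automatic is to fail as soon as a timestamp does not parse, rather than letting an NA travel on into seq(). The following is only a sketch: parse_hourly is a hypothetical helper name, not something from the original code, and tz = "UTC" is an assumption made to keep the output independent of the local time zone.

```r
# Hypothetical helper (not from the original post): parse "date hour" strings
# and stop immediately if any value cannot be parsed with the given format.
parse_hourly <- function(x, format = "%Y-%m-%d %H", tz = "UTC") {
  parsed <- as.POSIXct(x, format = format, tz = tz)
  bad <- is.na(parsed)
  if (any(bad)) {
    stop("could not parse with format '", format, "': ",
         paste(unique(x[bad]), collapse = ", "))
  }
  parsed
}

# A well-formed input parses normally.
ok <- parse_hourly("2021-01-06 18")

# The question's inputs (no space before the hour, month-first format)
# now fail at the parsing step, with the offending value in the message.
msg <- tryCatch(
  parse_hourly("2021-01-0618", format = "%m-%d-%Y %H"),
  error = function(e) conditionMessage(e)
)
print(ok)
print(msg)
```

With a guard like this, the error names the string and format that did not match instead of surfacing later inside seq.int().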

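First, there are two errors in your code:

  1. Replace paste(..., collapse="") with paste(..., collapse=" "), especially since your format string relies on that space.
  2. Fix your as.POSIXct format so that it reads "%Y-%m-%d %H".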

Walk-through, replacing df with df2

On the first pass of the for loop (i <- 1), I see:

  s_df<-df2[df2$supplier==u_supplier[i],]
  date_time <- apply(X = s_df[,c('date','hour')],MARGIN = 1, FUN = function(x) {
    x = paste(as.vector(x),collapse = "")
    return(x)
  })
  dt_index <- seq(from = as.POSIXct(date_time[1],format = "%m-%d-%Y %H"),
                  to  = as.POSIXct(date_time[length(date_time)],format = "%m-%d-%Y %H"),
                  by = "hour")
# Error in seq.int(0, to0 - from, by) : 'to' must be a finite number

Digging into the values:

as.POSIXct(date_time[1],format = "%m-%d-%Y %H")
#  1 
# NA 

date_time[1]
#              1 
# "2021-01-0618" 
  1. 2021-01-06 is not in "%m-%d-%Y" format (it is year-month-day).
  2. There is no space before the hour.
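
To see how those two points lead to the reported error, the failure can be reproduced outside the loop. This is a small sketch; tz = "UTC" is an assumption added for reproducibility, and the exact wording of the seq() error may differ between R versions.

```r
# The string that paste(..., collapse = "") produces for the first row.
stamp <- "2021-01-0618"

# A month-first format applied to a year-first string with no space:
# the parse fails and as.POSIXct() returns NA instead of raising an error.
bad <- as.POSIXct(stamp, format = "%m-%d-%Y %H", tz = "UTC")
print(is.na(bad))

# seq() cannot build a sequence between missing endpoints, so it stops here.
res <- tryCatch(
  seq(from = bad, to = bad, by = "hour"),
  error = function(e) conditionMessage(e)
)
print(res)

# With the space restored and a year-first format, the same value parses.
good <- as.POSIXct("2021-01-06 18", format = "%Y-%m-%d %H", tz = "UTC")
print(good)
```

The NA comes from as.POSIXct(), which fails silently; seq() is only where it first becomes visible.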

Try this instead:

date_time <- apply(X = s_df[,c('date','hour')],MARGIN = 1, FUN = function(x) {
  x = paste(as.vector(x), collapse = " ")
  ##                                  ^ add this space
  return(x)
})
as.POSIXct(date_time, format = "%Y-%m-%d %H")
#                         1                         6                         9                        14 
# "2021-01-06 18:00:00 EST" "2021-01-06 19:00:00 EST" "2021-01-06 20:00:00 EST" "2021-01-06 21:00:00 EST" 
#                        17 
# "2021-01-06 22:00:00 EST" 

From there:

dt_index <- seq(from = as.POSIXct(date_time[1],format = "%Y-%m-%d %H"),
                to  = as.POSIXct(date_time[length(date_time)],format = "%Y-%m-%d %H"),
                by = "hour")  ##   update two formats   ^^^            ^^^
dt_index
# [1] "2021-01-06 18:00:00 EST" "2021-01-06 19:00:00 EST" "2021-01-06 20:00:00 EST" "2021-01-06 21:00:00 EST"
# [5] "2021-01-06 22:00:00 EST"
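
The forecast index later in the question's loop (pr_dt_index) parses date_time with the same "%m-%d-%Y %H" format, so it presumably needs the same correction. Below is a minimal sketch of one way to write both steps, assuming a vectorized paste() in place of apply(), base-R arithmetic in seconds in place of hours(), and tz = "UTC"; none of these choices come from the original answer, and the small s_df here just stands in for one supplier's rows.

```r
# Stand-in for one supplier's rows from df2 (values copied from the example data).
s_df <- data.frame(
  date = rep("2021-01-06", 5),
  hour = 18:22,
  base_price = c(4770, 13168, 4156, 54711, 23875)
)

# paste() is vectorized and separates its arguments with a space by default,
# so apply() is not needed to build "YYYY-MM-DD HH" strings.
stamps <- as.POSIXct(paste(s_df$date, s_df$hour),
                     format = "%Y-%m-%d %H", tz = "UTC")
stopifnot(!anyNA(stamps))

# Forecast index: 72 hourly steps starting one hour after the last observation.
last_obs <- stamps[length(stamps)]
pr_dt_index <- seq(from = last_obs + 3600, by = "hour", length.out = 24 * 3)

print(stamps)
print(range(pr_dt_index))
```

Using length.out rather than a second parsed endpoint means only one timestamp has to be parsed correctly for the forecast range.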
