Time series function not working, possibly due to an indexing error?

Problem description

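I am working with a folder of CSV files that will be put through time series analysis. To run the tests, I decided to write a few functions that are called from a for loop to build a time series for a specific variable in each file in the path.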

My code below can be divided into three parts: the data import, the function definition, and the for loop that calls the function.

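The problem I am having is that I think my function is coded incorrectly. When I run my code and call my time series function, I receive the error message: 'ts' object must have one or more observations. When I try to plot my data_frame, or a specific element of csv_data, my only option is to plot csv_data$file_name. This is a problem because I should be able to select csv_data$placeholderName instead.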

Is there something wrong with the way I wrote my function? If so, how can I fix it? Are there any other problems with my code?

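Note: many time series objects will be made using the same method, to make it easier to analyze multiple files. For example, with the sample data below, the function would create time series objects from the 5 placeholderName variables.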

Code:

# retrieve data from the csv files from PATH
data <- list.files("/path of 1160 csv files", pattern = ".csv", all.files = TRUE, full.names = TRUE)

# get list element for data
csv_data <- lapply(data, read.csv)

# Set the name of each list element to its respective file name.
# NOTE: full.names = FALSE to get only the file names, not the full path.
names(csv_data) <- gsub(".csv","", list.files("/path of 1160 csv files", pattern = ".csv", all.files = TRUE, full.names = FALSE), fixed = TRUE)
names(csv_data) <- gsub(", ", "-", names(csv_data), fixed = TRUE)

# Determine the amount of csvs in path
csv_length <- as.integer(length(csv_data))

# placeholderName is converted into time-series data. 
# Since dates were not mentioned in the Date format (shown as string)
# Thus general index were used (1 to 248)
# Declaring time series object

count <- 0
timeSeries <- function(count) {
  data_frame <- csv_data[count]
  data_frame[is.na(data_frame)] = 0
  placeholderName_ts = ts(data_frame$placeholderName, start = 1, end=248, frequency = 1)
  return(placeholderName_ts)
}

for (i in 1:csv_length) {
  count = i

  # time series creation
  timeSeries(i)
}

Sample data:

# placeholderName = phN
$`File-name-1`
       date            phN1   phN2      phN3     phN4         phN5
1   2020-02-15          0  8.331944  1.8722222 65.29108        NA
2   2020-02-16          1 15.045833 11.9569444 83.02963        NA
3   2020-02-17          1 15.090278 14.2013889 94.59667        NA
4   2020-02-18          5 20.806944 19.0736111 90.42332        NA
5   2020-02-19          1 13.134722 11.9388889 92.53200        NA
6   2020-02-20          0  9.240278  8.0916667 92.64821        NA
7   2020-02-21          2  5.838889 -0.8875000 64.58893        NA

Any further information, if needed, will be added below.

Tags: r, function, csv, indexing, time-series

Solution


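In the original function, csv_data[count] uses single brackets, which return a list of length one rather than the data frame inside it. As a result data_frame$placeholderName is NULL, and ts() stops with "'ts' object must have one or more observations". Extracting the element with double brackets fixes this. You can try this function: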

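timeSeries <- function(count) {
  # [[ ]] extracts the data frame itself; [ ] would return a one-element list
  data_frame <- csv_data[[count]]
  # replace missing values with 0
  data_frame[is.na(data_frame)] <- 0

  # the column name must be quoted; unquoted, R would look for a variable
  # called placeholderName instead of the column
  placeholderName_ts <- ts(data_frame[["placeholderName"]],
                           start = 1, end = 248, frequency = 1)
  return(placeholderName_ts)
}

# collect one time series per file in a list instead of discarding the result
result <- lapply(seq_len(csv_length), timeSeries)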
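
To see the difference between the two kinds of brackets, here is a minimal sketch on a small made-up list. The name `toy_list` and its values are illustrative assumptions standing in for `csv_data`, not the asker's real files, and `end = 248` is left out so the toy series keeps its own length.

```r
# Toy stand-in for csv_data: a named list holding one data frame (assumed data).
toy_list <- list(
  `File-name-1` = data.frame(placeholderName = c(0, 1, 1, 5, 1, 0, 2))
)

# Single brackets return a list of length one, not the data frame itself.
sub_list <- toy_list[1]
class(sub_list)            # "list"
sub_list$placeholderName   # NULL, because the list has no element of that name

# Passing NULL to ts() raises the error; capture its message instead of stopping.
err <- tryCatch(
  ts(sub_list$placeholderName, start = 1, frequency = 1),
  error = function(e) conditionMessage(e)
)
err

# Double brackets extract the data frame, so the column is found.
df <- toy_list[[1]]
class(df)                  # "data.frame"
series <- ts(df$placeholderName, start = 1, frequency = 1)
length(series)             # 7
```

Printing `err` should show the same message that appears in the question, which suggests the indexing, not `ts()` itself, is where things go wrong.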

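
The note in the question mentions building series for several placeholderName variables per file. One possible way to do that, sketched under assumptions, is to pass the column names as an argument. The helper `make_series`, the list `toy_list`, and the choice of columns are hypothetical, and the series length here follows the data rather than being fixed with `end = 248`.

```r
# Hypothetical helper: build one ts object per requested column of one data frame.
make_series <- function(df, columns) {
  df[is.na(df)] <- 0   # same NA handling as in the answer above
  lapply(df[columns], function(x) ts(x, start = 1, frequency = 1))
}

# Toy stand-in for csv_data, using two columns shaped like the sample data.
toy_list <- list(
  `File-name-1` = data.frame(
    phN1 = c(0, 1, 1, 5, 1, 0, 2),
    phN5 = rep(NA_real_, 7)
  )
)

# The outer lapply walks the files; make_series walks the columns of each file.
all_series <- lapply(toy_list, make_series, columns = c("phN1", "phN5"))

names(all_series[["File-name-1"]])   # "phN1" "phN5"
all_series[["File-name-1"]]$phN5     # seven zeros after the NA replacement
```

The result is a nested list indexed first by file name and then by column name, so a single series can be pulled out with `all_series[["File-name-1"]][["phN1"]]`.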