r - Error with multivariate time series clustering using dtwclust: "There are missing values in the series"
Problem description
I am working on time series clustering with the dtwclust package in R, using fundamental data for S&P 500 stocks. The head of the data frame is as follows:
```
> print(head(DFCleanPF))
  ticker reportperiod  price    ebitusd epsusd    pe   marketcap         gp
1    ADS   2019-06-29 140.13 1761800000  16.13  8.44  7340500696 3185200000
2    ADS   2019-03-30 174.98 1818600000  17.42  9.78  9273534921 3295100000
3    ADS   2018-12-30 150.08 1894300000  17.56  8.49  8175386182 3570300000
4    ADS   2018-09-29 236.16 1735100000  17.21 13.67 12975478451 3439600000
5    ADS   2018-06-29 233.20 1692000000  16.01 14.58 12919222400 3398900000
6    ADS   2018-03-30 212.86 1652700000  14.55 14.64 11805497214 3402700000
```
So I have quarterly filing data for a few stocks, ordered by ticker and date. I then standardize the data frame and try to cluster it:
```r
library(BBmisc)    # normalize()
library(Amelia)    # missmap()
library(dtwclust)  # tsclust(), tsclust_args()

DFCleanPF.norm <- BBmisc::normalize(DFCleanPF, method = "standardize")
# show(DFCleanPF.norm)
missmap(DFCleanPF.norm)
mvc <- tsclust(DFCleanPF.norm, k = 4L, distance = "gak", seed = 390,
               args = tsclust_args(dist = list(sigma = 100)))
```
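As far as I can tell from the dtwclust documentation, `tsclust()` coerces a data frame to a list row-wise, so each row of `DFCleanPF.norm` (one quarter of one stock, `ticker` and `reportperiod` included) would be treated as a separate series. For multivariate clustering, dtwclust instead expects a list with one numeric matrix per series, time along the rows and variables along the columns. Below is a minimal base-R sketch of that reshaping. It assumes the column names shown above; `to_series_list` is a hypothetical helper, and standardizing each variable within each ticker is my own design choice, not something from the question.

```r
# Hypothetical helper (not part of dtwclust): turn the long data frame
# (one row per ticker and quarter) into a named list with one numeric
# matrix per ticker: rows = quarters in date order, columns = fundamentals.
to_series_list <- function(df,
                           id_col = "ticker",
                           time_col = "reportperiod",
                           value_cols = c("price", "ebitusd", "epsusd",
                                          "pe", "marketcap", "gp")) {
  # Sort by ticker, then chronologically (Date, character or factor dates)
  df <- df[order(as.character(df[[id_col]]), as.Date(df[[time_col]])), ]
  pieces <- split(df, df[[id_col]], drop = TRUE)
  lapply(pieces, function(d) {
    m <- as.matrix(d[, value_cols, drop = FALSE])
    if (!is.numeric(m)) stop("value columns must all be numeric")
    if (anyNA(m)) stop("missing values for ticker ", d[[id_col]][1])
    # Assumption: per-stock z-scores are wanted; constant columns are
    # only centred, to avoid dividing by a zero standard deviation
    m[] <- apply(m, 2, function(v) {
      s <- sd(v)
      if (is.na(s) || s == 0) v - mean(v) else (v - mean(v)) / s
    })
    rownames(m) <- NULL
    m
  })
}

# Made-up toy data in the same long format (two tickers, three quarters each)
toy <- data.frame(
  ticker       = c("AAA", "AAA", "AAA", "BBB", "BBB", "BBB"),
  reportperiod = c("2019-06-29", "2018-12-30", "2019-03-30",
                   "2019-03-30", "2018-12-30", "2019-06-29"),
  price        = c(3, 1, 2, 10, 20, 30),
  ebitusd      = c(1, 2, 3, 5, 5, 5),
  stringsAsFactors = FALSE
)
series <- to_series_list(toy, value_cols = c("price", "ebitusd"))
str(series)  # a list of two 3 x 2 numeric matrices, "AAA" and "BBB"

# With dtwclust installed, the list (not the data frame) would be passed on:
# mvc <- dtwclust::tsclust(to_series_list(DFCleanPF), k = 4L, distance = "gak",
#                          seed = 390,
#                          args = dtwclust::tsclust_args(dist = list(sigma = 100)))
```

Per-ticker z-scores make the clustering compare the shape of each stock's trajectory rather than its level. If absolute size matters for the risk/return question (market cap, for instance), standardizing each column across all stocks, as the original `normalize()` call does, would keep that information.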
missmap(DFCleanPF.norm) reports zero missing values. However, tsclust() still fails with the following error, even though the data set has no missing values:
```
Error in FUN(X[[i]], ...) : There are missing values in the series
In addition: There were 50 or more warnings (use warnings() to see the first 50)
```
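One thing worth checking, as an unconfirmed guess about the cause, is whether the NAs are created during conversion rather than being present in the data. `missmap()` inspects the data frame as it is, but the `ticker` and `reportperiod` columns are not numeric, and forcing them to numbers in R yields `NA` together with an "NAs introduced by coercion" warning. A small base-R check, where `non_numeric_cols` and `coercion_nas` are hypothetical helper names:

```r
# Hypothetical diagnostic helpers (base R only)

# Names of the columns that are not plain numeric vectors (Date columns count
# as non-numeric here, because is.numeric() returns FALSE for them)
non_numeric_cols <- function(df) {
  names(df)[!vapply(df, is.numeric, logical(1))]
}

# NAs per column after forcing every column to numeric
# (a guess at where the reported missing values come from)
coercion_nas <- function(df) {
  vapply(df, function(col) {
    if (!is.numeric(col)) col <- suppressWarnings(as.numeric(as.character(col)))
    sum(is.na(col))
  }, integer(1))
}

# Made-up two-row example in the question's layout
toy <- data.frame(ticker = c("ADS", "ADS"),
                  reportperiod = as.Date(c("2019-06-29", "2019-03-30")),
                  price = c(140.13, 174.98),
                  stringsAsFactors = FALSE)
non_numeric_cols(toy)  # "ticker" "reportperiod"
coercion_nas(toy)      # ticker 2, reportperiod 2, price 0
```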
I am new to time series clustering. My goal is to find a suitable method for clustering the stocks by risk and return, based on the key parameters captured over time (multiple quarterly records per stock). I would therefore appreciate help with:
1. Creating a suitable data frame from the source data set above and getting rid of the error.
2. Choosing the right dtwclust call, with an optimal value of k, to get the best clustering.
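On choosing k: a common approach is to fit several values of k and compare an internal validity index such as the average silhouette width. Within dtwclust, that would mean passing several values of k to `tsclust()` and comparing the results with its `cvi()` function. The sketch below swaps in a dependency-free stand-in so the idea is visible: plain DTW distances, average-linkage `hclust()` and a hand-written silhouette. It is not the GAK-based clustering from the question. `dtw_dist`, `dtw_matrix`, `mean_silhouette` and `choose_k` are hypothetical names, and the code assumes a named list of numeric matrices with the same columns (one per ticker, rows = quarters) and candidate values with 2 <= k < number of series.

```r
# Plain dynamic time warping between two multivariate series (rows = time
# points), using Euclidean distance between time points as the local cost
dtw_dist <- function(a, b) {
  n <- nrow(a); m <- nrow(b)
  acc <- matrix(Inf, n + 1, m + 1)  # accumulated cost, padded with Inf
  acc[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- sqrt(sum((a[i, ] - b[j, ])^2))
      acc[i + 1, j + 1] <- cost + min(acc[i, j], acc[i, j + 1], acc[i + 1, j])
    }
  }
  acc[n + 1, m + 1]
}

# Symmetric DTW distance matrix ("dist" object) for a named list of series
dtw_matrix <- function(series) {
  n <- length(series)
  d <- matrix(0, n, n, dimnames = list(names(series), names(series)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      d[i, j] <- d[j, i] <- dtw_dist(series[[i]], series[[j]])
    }
  }
  as.dist(d)
}

# Average silhouette width of a clustering, given a "dist" object and labels
mean_silhouette <- function(d, labels) {
  d <- as.matrix(d)
  s <- vapply(seq_along(labels), function(i) {
    own <- labels == labels[i]
    if (sum(own) == 1) return(0)          # singleton clusters score 0
    a <- sum(d[i, own]) / (sum(own) - 1)  # mean distance within own cluster
    b <- min(vapply(setdiff(unique(labels), labels[i]),
                    function(l) mean(d[i, labels == l]), numeric(1)))
    if (max(a, b) == 0) 0 else (b - a) / max(a, b)
  }, numeric(1))
  mean(s)
}

# Cut one average-linkage tree at each candidate k and keep the k with the
# largest mean silhouette width
choose_k <- function(series, ks = 2:4) {
  d <- dtw_matrix(series)
  tree <- hclust(d, method = "average")
  widths <- vapply(ks, function(k) mean_silhouette(d, cutree(tree, k = k)),
                   numeric(1))
  names(widths) <- ks
  best <- ks[which.max(widths)]
  list(k = best, widths = widths, clusters = cutree(tree, k = best))
}

# Made-up toy series: two rising and two falling 5 x 2 trajectories
x <- 1:5
toy <- list(up1   = cbind(x, x),   up2   = cbind(x + 0.1, x),
            down1 = cbind(-x, -x), down2 = cbind(-x - 0.1, -x))
res <- choose_k(toy, ks = 2:3)
res$k         # 2 for this toy data
res$clusters  # up1/up2 in one cluster, down1/down2 in the other
```

The nested R loops will be slow for around 500 tickers, so dtwclust's compiled distance functions are the practical route on the real data. Average linkage and the silhouette width are just one reasonable default; other validity indices can disagree, so it is worth comparing several before settling on k.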
I am also open to other clustering models that could help achieve these goals. I am not sure whether the current data frame can be used as input as it is, or whether I need to treat the symbols as numeric variables and create a separate matrix for each symbol.
Here is the sample data set: