Unable to fit data to an HMM-Learn model (Python 3.9)

Problem description

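I am trying to fit a hidden Markov model to some S&P 500 stock data.

The data was downloaded from Yahoo Finance and is stored in a CSV file covering 250 trading days. This code was working a week ago, but now it doesn't seem to work.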

import pandas as pd
from hmmlearn import hmm
import numpy as np
from matplotlib import cm, pyplot as plt
from matplotlib.dates import YearLocator, MonthLocator

df = pd.read_csv( "SnP500_1Yhist.csv",
                   header      = 0,
                   index_col   = "Date",
                   parse_dates = True
                   )
df["Returns"] = df["Adj Close"].pct_change()
df.dropna( inplace = True )

hmm_model = hmm.GaussianHMM( n_components    =   4,
                             covariance_type =   "full",
                             n_iter          = 100
                             )               # %Create the model
df = df["Returns"]                           # %Extract the wanted column of data
training_set = np.column_stack( df )         # %Shape = [1,250]

hmm_model.fit( training_set )                # %This is where I get the error

The error I get is:

ValueError                                Traceback (most recent call last)
<ipython-input-51-c8f66806fad6> in <module>
      9 print(training_set.shape)
     10 print(training_set)
---> 11 hmm_model.fit(training_set)

~/Git Projects/Aiguille Systems/allocationmodel/macromodelv2_venv/lib/python3.9/site-packages/hmmlearn/base.py in fit(self, X, lengths)
    460         """
    461         X = check_array(X)
--> 462         self._init(X, lengths=lengths)
    463         self._check()
    464 

~/Git Projects/Aiguille Systems/allocationmodel/macromodelv2_venv/lib/python3.9/site-packages/hmmlearn/hmm.py in _init(self, X, lengths)
    205             kmeans = cluster.KMeans(n_clusters=self.n_components,
    206                                     random_state=self.random_state)
--> 207             kmeans.fit(X)
    208             self.means_ = kmeans.cluster_centers_
    209         if self._needs_init("c", "covars_"):

~/Git Projects/Aiguille Systems/allocationmodel/macromodelv2_venv/lib/python3.9/site-packages/sklearn/cluster/_kmeans.py in fit(self, X, y, sample_weight)
   1033                                 accept_large_sparse=False)
   1034 
-> 1035         self._check_params(X)
   1036         random_state = check_random_state(self.random_state)
   1037 

~/Git Projects/Aiguille Systems/allocationmodel/macromodelv2_venv/lib/python3.9/site-packages/sklearn/cluster/_kmeans.py in _check_params(self, X)
    956         # n_clusters
    957         if X.shape[0] < self.n_clusters:
--> 958             raise ValueError(f"n_samples={X.shape[0]} should be >= "
    959                              f"n_clusters={self.n_clusters}.")
    960 

ValueError: n_samples=1 should be >= n_clusters=4.

Tags: python, pandas, finance, quantitative-finance, hmmlearn

Solution


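“... it doesn't seem to work.”

Well, indeed it doesn't. If you test the actual state of your training_set (which we cannot reproduce here) before calling the .fit() method, you will see the immediate cause of the reported error: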

N_COMPONENTS = 4
ERR_MASK     = ( "ERR: training_set was smaller than the N_COMPONENTS == {0:} "
               + "requested,\n"
               + "     whereas the actual shape[0] was {1:}"
               )
...

hmm_model = hmm.GaussianHMM( n_components    =   N_COMPONENTS,
                             covariance_type =   "full",
                             n_iter          = 100
                             )
...

if training_set.shape[0] >= N_COMPONENTS:    # enough samples (rows) to initialise the model
    hmm_model.fit( training_set )
else:                                        # report the shape mismatch instead of raising
    print( ERR_MASK.format( N_COMPONENTS,
                            training_set.shape[0]
                            )
           )
~/Git Projects/Aiguille Systems/allocationmodel/macromodelv2_venv/lib/python3.9/site-packages/sklearn/cluster/_kmeans.py in _check_params(self, X)
    956         # n_clusters
    957         if X.shape[0] < self.n_clusters:
--> 958             raise ValueError(f"n_samples={X.shape[0]} should be >= "
    959                              f"n_clusters={self.n_clusters}.")
--------------------------------------------------X.shape[0]------------

ValueError: n_samples=1 should be >= n_clusters=4.

fit( X, lengths = None )

    Estimate model parameters.

    An initialization step is performed before entering the EM algorithm.
       If you want to avoid this step for a subset of the parameters,
       pass proper init_params keyword argument to estimator’s constructor.

    Parameters

            X ( array-like, shape ( n_samples, n_features ) )
              – Feature matrix of individual samples.

            lengths ( array-like of integers, shape ( n_sequences, ) )
              – Lengths of the individual sequences in X.
                The sum of these should be n_samples.
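
Since fit() expects X with shape ( n_samples, n_features ), the 250 returns have to be arranged as 250 rows of one feature rather than one row of 250 features. The following is only a minimal sketch of that reshaping, not the asker's actual pipeline: the helper name as_feature_matrix is hypothetical, and the synthetic returns array merely stands in for df["Returns"], because the original CSV file is not available here.

```python
import numpy as np


def as_feature_matrix(values):
    """Arrange a 1-D sequence as a 2-D array of shape (n_samples, 1).

    Hypothetical helper, introduced only for this illustration.
    """
    arr = np.asarray(values, dtype=float)
    if arr.ndim != 1:
        raise ValueError(f"expected a 1-D sequence, got ndim={arr.ndim}")
    return arr.reshape(-1, 1)


# Stand-in for df["Returns"]: 250 synthetic daily returns (not real market data).
rng = np.random.default_rng(0)
returns = rng.normal(loc=0.0, scale=0.01, size=250)

# What the question does: each scalar becomes its own column -> one sample, 250 features.
wrong = np.column_stack(returns)

# What fit() expects: one row per observation -> 250 samples, one feature.
right = as_feature_matrix(returns)

print(wrong.shape, right.shape)  # (1, 250) (250, 1)
```

With the question's variables, the equivalent would be something like training_set = df["Returns"].to_numpy().reshape( -1, 1 ), after which training_set.shape[0] is 250 and the n_samples >= n_clusters check no longer fails.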

