python - Unable to fit data to an HMM-Learn model (Python 3.9)
Problem description
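I am trying to fit a Hidden Markov Model to some S&P 500 stock data.
The data was downloaded from Yahoo Finance and is stored in a CSV file containing 250 trading days of data. I had this code working a week ago, but now it doesn't seem to work.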
import pandas as pd
from hmmlearn import hmm
import numpy as np
from matplotlib import cm, pyplot as plt
from matplotlib.dates import YearLocator, MonthLocator
df = pd.read_csv( "SnP500_1Yhist.csv",
                  header      = 0,
                  index_col   = "Date",
                  parse_dates = True
                  )
df["Returns"] = df["Adj Close"].pct_change()
df.dropna( inplace = True )
hmm_model = hmm.GaussianHMM( n_components    = 4,
                             covariance_type = "full",
                             n_iter          = 100
                             )               # %Create the model
df = df["Returns"] # %Extract the wanted column of data
training_set = np.column_stack( df ) # %Shape = [1,250]
hmm_model.fit( training_set ) # %This is where I get the error
The error I get is:
ValueError Traceback (most recent call last)
<ipython-input-51-c8f66806fad6> in <module>
9 print(training_set.shape)
10 print(training_set)
---> 11 hmm_model.fit(training_set)
~/Git Projects/Aiguille Systems/allocationmodel/macromodelv2_venv/lib/python3.9/site-packages/hmmlearn/base.py in fit(self, X, lengths)
460 """
461 X = check_array(X)
--> 462 self._init(X, lengths=lengths)
463 self._check()
464
~/Git Projects/Aiguille Systems/allocationmodel/macromodelv2_venv/lib/python3.9/site-packages/hmmlearn/hmm.py in _init(self, X, lengths)
205 kmeans = cluster.KMeans(n_clusters=self.n_components,
206 random_state=self.random_state)
--> 207 kmeans.fit(X)
208 self.means_ = kmeans.cluster_centers_
209 if self._needs_init("c", "covars_"):
~/Git Projects/Aiguille Systems/allocationmodel/macromodelv2_venv/lib/python3.9/site-packages/sklearn/cluster/_kmeans.py in fit(self, X, y, sample_weight)
1033 accept_large_sparse=False)
1034
-> 1035 self._check_params(X)
1036 random_state = check_random_state(self.random_state)
1037
~/Git Projects/Aiguille Systems/allocationmodel/macromodelv2_venv/lib/python3.9/site-packages/sklearn/cluster/_kmeans.py in _check_params(self, X)
956 # n_clusters
957 if X.shape[0] < self.n_clusters:
--> 958 raise ValueError(f"n_samples={X.shape[0]} should be >= "
959 f"n_clusters={self.n_clusters}.")
960
ValueError: n_samples=1 should be >= n_clusters=4.
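Solution
Q: "... it doesn't seem to work."
Well,
it does. If you test your actual training_set (which we cannot reproduce here) before calling the .fit() method, you will find the direct cause of the reported error: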
N_COMPONENTS = 4
ERR_MASK = ( "ERR: training_set was smaller than the N_COMPONENTS == {0:} "
           + "requested,\n"
           + "     whereas the actual shape[0] was {1:}"
             )
...
hmm_model = hmm.GaussianHMM( n_components    = N_COMPONENTS,
                             covariance_type = "full",
                             n_iter          = 100
                             )
...
( hmm_model.fit( training_set ) if training_set.shape[0] >= N_COMPONENTS
  else print( ERR_MASK.format( N_COMPONENTS,
                               training_set.shape[0]
                               )
              )
  )
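If a hard failure is preferred over a printed message, the same check can be wrapped in a small function. This is a sketch under stated assumptions: check_training_set is a hypothetical helper (not part of hmmlearn), and besides the n_samples >= n_components test it also rejects arrays that are not 2-D, following the (n_samples, n_features) shape that GaussianHMM.fit() documents.

```python
import numpy as np


def check_training_set(X, n_components):
    """Hypothetical helper: raise a descriptive ValueError if X is unfit for GaussianHMM.fit().

    Assumes the documented hmmlearn convention X.shape == (n_samples, n_features).
    """
    X = np.asarray(X)
    if X.ndim != 2:
        raise ValueError(
            f"expected a 2-D array (n_samples, n_features), got shape {X.shape}"
        )
    n_samples = X.shape[0]
    if n_samples < n_components:
        # Same condition that KMeans enforces during hmmlearn's initialization step.
        raise ValueError(
            f"training_set has {n_samples} sample(s), fewer than the "
            f"{n_components} components requested; shape is {X.shape}"
        )
    return X


# Usage: a (250, 1) column passes, the (1, 250) row from the question is rejected.
ok = check_training_set(np.zeros((250, 1)), n_components=4)
try:
    check_training_set(np.zeros((1, 250)), n_components=4)
except ValueError as err:
    print(err)
```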
~/Git Projects/Aiguille Systems/allocationmodel/macromodelv2_venv/lib/python3.9/site-packages/sklearn/cluster/_kmeans.py in _check_params(self, X)
956 # n_clusters
957 if X.shape[0] < self.n_clusters:
--> 958 raise ValueError(f"n_samples={X.shape[0]} should be >= "
959 f"n_clusters={self.n_clusters}.")
--------------------------------------------------X.shape[0]------------
ValueError: n_samples=1 should be >= n_clusters=4.
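The 1 in n_samples=1 comes from how np.column_stack treats its argument: it iterates over the sequence it receives, so passing one 1-D array (or a pandas Series, whose iteration also yields scalars) stacks 250 one-element columns side by side, giving shape (1, 250), i.e. a single sample with 250 features. A minimal NumPy-only sketch, with synthetic returns standing in for the CSV data (an assumption, since the file is not available here):

```python
import numpy as np

# Synthetic stand-in for df["Returns"]: 250 daily returns (hypothetical data).
rng = np.random.default_rng(0)
returns = rng.normal(loc=0.0, scale=0.01, size=250)

# What the question does: every scalar becomes its own column,
# so the result is one row (sample) with 250 columns (features).
wrong = np.column_stack(returns)

# Wrapping the 1-D array in a list makes it a single column instead.
stacked = np.column_stack([returns])

# Equivalent and more explicit: reshape to (n_samples, 1).
reshaped = returns.reshape(-1, 1)

print(wrong.shape, stacked.shape, reshaped.shape)  # (1, 250) (250, 1) (250, 1)
```

Passing either 2-D column to hmm_model.fit() should give the KMeans initialization 250 samples to cluster instead of 1.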
fit( X, lengths = None )
Estimate model parameters.
An initialization step is performed before entering the EM algorithm.
If you want to avoid this step for a subset of the parameters,
pass proper init_params keyword argument to estimator’s constructor.
Parameters
X ( array-like, shape ( n_samples, n_features ) )
– Feature matrix of individual samples.
lengths ( array-like of integers, shape ( n_sequences, ) )
– Lengths of the individual sequences in X.
The sum of these should be n_samples.
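The lengths parameter matters when X holds more than one independent sequence, for example return series from separate periods or tickers that should not be chained together. A sketch assuming two hypothetical return series: each is reshaped to an (n_i, 1) block, the blocks are concatenated row-wise, and lengths records how many rows belong to each. The actual fit call is left as a comment so the sketch runs without hmmlearn installed.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two hypothetical, independent return sequences of different lengths.
seq_a = rng.normal(0.0, 0.01, size=250)
seq_b = rng.normal(0.0, 0.02, size=120)

# Each sequence becomes an (n_i, 1) block; stack them vertically into one X.
X = np.concatenate([seq_a.reshape(-1, 1), seq_b.reshape(-1, 1)])
lengths = [len(seq_a), len(seq_b)]

# Documented contract: X is (n_samples, n_features) and sum(lengths) == n_samples.
assert X.shape == (370, 1)
assert sum(lengths) == X.shape[0]

# With hmmlearn installed, the model would then be fitted as:
# hmm_model.fit(X, lengths=lengths)
```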