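python - How do I estimate a Cox model with the lifelines package?
Problem description
I want to estimate a Cox model, but when I run my code I get an error. It seems to be a problem with CoxPHFitter(). Can anyone here help me solve it? I suspect the lifelines library cannot compute the coefficients by maximum likelihood. Below I have copied the sample code and the error. I should say that I wrote the code only as an example; the inputs are not real.
Code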
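from lifelines import CoxPHFitter

# One data frame per outcome: duration, covariate, event indicator
df_l = df[['Observed', 'HighLTV', 'Liquidation']]
df_c = df[['Observed', 'HighLTV', 'Cure']]
cph_l = CoxPHFitter()
cph_c = CoxPHFitter()
cph_l.fit(df_l, duration_col='Observed', event_col='Liquidation')
cph_c.fit(df_c, duration_col='Observed', event_col='Cure')
# Coefficient of HighLTV in each model, rounded to three decimals
beta_cure = round(float(cph_c.params_.iloc[0]), 3)
beta_liquidation = round(float(cph_l.params_.iloc[0]), 3)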
Error
LinAlgError Traceback (most recent call last)
~\anaconda3\lib\site-packages\lifelines\fitters\coxph_fitter.py in _newton_rhapson_for_efron_model(self, X, T, E, weights, entries, initial_point, step_size, precision, show_progress, max_steps)
1497 try:
-> 1498 inv_h_dot_g_T = spsolve(-h, g, assume_a="pos", check_finite=False)
1499 except (ValueError, LinAlgError) as e:
~\anaconda3\lib\site-packages\scipy\linalg\basic.py in solve(a, b, sym_pos, lower, overwrite_a, overwrite_b, debug, check_finite, assume_a, transposed)
247 overwrite_b=overwrite_b)
--> 248 _solve_check(n, info)
249 rcond, info = pocon(lu, anorm)
~\anaconda3\lib\site-packages\scipy\linalg\basic.py in _solve_check(n, info, lamch, rcond)
28 elif 0 < info:
---> 29 raise LinAlgError('Matrix is singular.')
30
LinAlgError: Matrix is singular.
During handling of the above exception, another exception occurred:
ConvergenceError Traceback (most recent call last)
<ipython-input-145-7cb92b8db8fe> in <module>
8 k.append(list(map(lambda x: random.choice(o),range(10))))
9 s=pd.DataFrame(k[i],columns=df.columns)
---> 10 c.append(CCR(s))
<ipython-input-144-da506c585def> in CCR(data)
30 cph_c=CoxPHFitter()
31 cph_l.fit(df_l,'Observed',event_col='Liquidation')
---> 32 cph_c.fit(df_c,'Observed',event_col='Cure')
33 beta_cure=float('{:.3f}'.format((cph_c.params_[0])))
34 beta_liquidation=float('{:.3f}'.format((cph_l.params_[0])))
~\anaconda3\lib\site-packages\lifelines\utils\__init__.py in f(model, *args, **kwargs)
52 def f(model, *args, **kwargs):
53 cls.set_censoring_type(model, cls.RIGHT)
---> 54 return function(model, *args, **kwargs)
55
56 return f
~\anaconda3\lib\site-packages\lifelines\fitters\coxph_fitter.py in fit(self, df, duration_col, event_col, show_progress, initial_point, strata, step_size, weights_col, cluster_col, robust, batch_mode, timeline, formula, entry_col)
274 """
275 self.strata = utils.coalesce(strata, self.strata)
--> 276 self._model = self._fit_model(
277 df,
278 duration_col,
~\anaconda3\lib\site-packages\lifelines\fitters\coxph_fitter.py in _fit_model(self, *args, **kwargs)
595 def _fit_model(self, *args, **kwargs):
596 if self.baseline_estimation_method == "breslow":
--> 597 return self._fit_model_breslow(*args, **kwargs)
598 elif self.baseline_estimation_method == "spline":
599 return self._fit_model_spline(*args, **kwargs)
~\anaconda3\lib\site-packages\lifelines\fitters\coxph_fitter.py in _fit_model_breslow(self, *args, **kwargs)
608 )
609 if utils.CensoringType.is_right_censoring(self):
--> 610 model.fit(*args, **kwargs)
611 return model
612 else:
~\anaconda3\lib\site-packages\lifelines\utils\__init__.py in f(model, *args, **kwargs)
52 def f(model, *args, **kwargs):
53 cls.set_censoring_type(model, cls.RIGHT)
---> 54 return function(model, *args, **kwargs)
55
56 return f
~\anaconda3\lib\site-packages\lifelines\fitters\coxph_fitter.py in fit(self, df, duration_col, event_col, show_progress, initial_point, strata, step_size, weights_col, cluster_col, robust, batch_mode, timeline, formula, entry_col)
1225 )
1226
-> 1227 params_, ll_, variance_matrix_, baseline_hazard_, baseline_cumulative_hazard_, model = self._fit_model(
1228 X_norm,
1229 T,
~\anaconda3\lib\site-packages\lifelines\fitters\coxph_fitter.py in _fit_model(self, X, T, E, weights, entries, initial_point, step_size, show_progress)
1353 show_progress: bool = True,
1354 ):
-> 1355 beta_, ll_, hessian_ = self._newton_rhapson_for_efron_model(
1356 X, T, E, weights, entries, initial_point=initial_point, step_size=step_size, show_progress=show_progress
1357 )
~\anaconda3\lib\site-packages\lifelines\fitters\coxph_fitter.py in _newton_rhapson_for_efron_model(self, X, T, E, weights, entries, initial_point, step_size, precision, show_progress, max_steps)
1505 )
1506 elif isinstance(e, LinAlgError):
-> 1507 raise exceptions.ConvergenceError(
1508 """Convergence halted due to matrix inversion problems. Suspicion is high collinearity. {0}""".format(
1509 CONVERGENCE_DOCS
ConvergenceError: Convergence halted due to matrix inversion problems. Suspicion is high collinearity. Please see the following tips in the lifelines documentation: https://lifelines.readthedocs.io/en/latest/Examples.html#problems-with-convergence-in-the-cox-proportional-hazard-modelMatrix is singular.
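Solution
The error message states the problem clearly:
ConvergenceError: Convergence halted due to matrix inversion problems. Suspicion is high collinearity. Please see the following tips in the lifelines documentation: https://lifelines.readthedocs.io/en/latest/Examples.html#problems-with-convergence-in-the-cox-proportional-hazard-model Matrix is singular.
Without the real data I cannot offer more specific advice, but the lifelines documentation gives several suggestions for this problem: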
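Convergence halted due to matrix inversion problems: This means that there is high collinearity in your dataset. That is, a column is equal to the linear combination of 1 or more other columns. A common cause of this error is dummying categorical variables but not dropping a column, or some hierarchical structure in your dataset. Try to find the relationship by:
- adding a penalizer to the model, e.g. CoxPHFitter(penalizer=0.1).fit(...), until the model converges. In print_summary(), the coefficients that have high collinearity will have large (absolute) magnitude in the coefs column.
- using the variance inflation factor (VIF) to find redundant variables.
- looking at the correlation matrix of your dataset, or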
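The checks suggested above can be roughed out before calling fit(). The following is only a sketch using pandas: the helper name find_collinearity_problems, the threshold of 0.999, the LowLTV column, and the toy data are all assumptions made for illustration, not part of lifelines or of the question's data. It flags constant covariates (plausible when a model is refit on a tiny resample, as the traceback suggests) and pairs of covariates that are almost perfectly correlated. Pairwise correlation will not catch a column that is a combination of several others.

```python
import pandas as pd


def find_collinearity_problems(df, duration_col, event_col, corr_threshold=0.999):
    """Return human-readable warnings about the covariates (hypothetical helper)."""
    problems = []
    covariates = df.drop(columns=[duration_col, event_col])

    # With no observed events the partial likelihood carries no information.
    if df[event_col].sum() == 0:
        problems.append(f"no events observed in '{event_col}'")

    # A constant covariate has zero variance, so its coefficient is not identifiable.
    for name in covariates.columns:
        if covariates[name].nunique() <= 1:
            problems.append(f"covariate '{name}' is constant")

    # Pairwise check on the remaining covariates via the correlation matrix.
    varying = covariates.loc[:, covariates.nunique() > 1]
    if varying.shape[1] >= 2:
        corr = varying.corr().abs()
        cols = list(corr.columns)
        for i, a in enumerate(cols):
            for b in cols[i + 1:]:
                if corr.loc[a, b] >= corr_threshold:
                    problems.append(f"covariates '{a}' and '{b}' are nearly collinear")
    return problems


# Toy data (made up): HighLTV happens to be constant in this small sample.
constant_sample = pd.DataFrame({
    "Observed": [3, 5, 2, 8, 6, 4],
    "HighLTV": [1, 1, 1, 1, 1, 1],
    "Cure": [0, 1, 0, 1, 0, 0],
})
constant_problems = find_collinearity_problems(constant_sample, "Observed", "Cure")
print(constant_problems)

# Toy data (made up): LowLTV is a redundant dummy, exactly 1 - HighLTV.
dummy_sample = pd.DataFrame({
    "Observed": [3, 5, 2, 8, 6, 4],
    "HighLTV": [1, 0, 1, 0, 1, 0],
    "LowLTV": [0, 1, 0, 1, 0, 1],
    "Cure": [0, 1, 0, 1, 0, 0],
})
dummy_problems = find_collinearity_problems(dummy_sample, "Observed", "Cure")
print(dummy_problems)
```

An empty list does not guarantee convergence; it only rules out the most obvious causes.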
This is most likely not a bug in lifelines, but a problem with your data or with how you are applying the model to it.