How to estimate a Cox model with the lifelines package?

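Problem description

I want to estimate a Cox model, but when I run my code I get an error. The problem seems to be related to CoxPHFitter(). Can anyone here help me solve it? I think the lifelines library cannot compute the coefficients with the ML method. I have copied the sample code and the error below. Note that I wrote the code only as an example; the input is not real data.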

Code

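    from lifelines import CoxPHFitter

    # One data set per event type: duration, the single covariate, and the event indicator.
    df_l=df[['Observed','HighLTV','Liquidation']]
    df_c=df[['Observed','HighLTV','Cure']]
    cph_l=CoxPHFitter()
    cph_c=CoxPHFitter()
    cph_l.fit(df_l,'Observed',event_col='Liquidation')
    cph_c.fit(df_c,'Observed',event_col='Cure')
    # Keep the first (and only) coefficient of each model, rounded to 3 decimals.
    beta_cure=float('{:.3f}'.format((cph_c.params_[0])))
    beta_liquidation=float('{:.3f}'.format((cph_l.params_[0])))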

Error

LinAlgError                               Traceback (most recent call last)
~\anaconda3\lib\site-packages\lifelines\fitters\coxph_fitter.py in _newton_rhapson_for_efron_model(self, X, T, E, weights, entries, initial_point, step_size, precision, show_progress, max_steps)
   1497             try:
-> 1498                 inv_h_dot_g_T = spsolve(-h, g, assume_a="pos", check_finite=False)
   1499             except (ValueError, LinAlgError) as e:

~\anaconda3\lib\site-packages\scipy\linalg\basic.py in solve(a, b, sym_pos, lower, overwrite_a, overwrite_b, debug, check_finite, assume_a, transposed)
    247                            overwrite_b=overwrite_b)
--> 248         _solve_check(n, info)
    249         rcond, info = pocon(lu, anorm)

~\anaconda3\lib\site-packages\scipy\linalg\basic.py in _solve_check(n, info, lamch, rcond)
     28     elif 0 < info:
---> 29         raise LinAlgError('Matrix is singular.')
     30 

LinAlgError: Matrix is singular.

During handling of the above exception, another exception occurred:

ConvergenceError                          Traceback (most recent call last)
<ipython-input-145-7cb92b8db8fe> in <module>
      8     k.append(list(map(lambda x: random.choice(o),range(10))))
      9     s=pd.DataFrame(k[i],columns=df.columns)
---> 10     c.append(CCR(s))

<ipython-input-144-da506c585def> in CCR(data)
     30     cph_c=CoxPHFitter()
     31     cph_l.fit(df_l,'Observed',event_col='Liquidation')
---> 32     cph_c.fit(df_c,'Observed',event_col='Cure')
     33     beta_cure=float('{:.3f}'.format((cph_c.params_[0])))
     34     beta_liquidation=float('{:.3f}'.format((cph_l.params_[0])))

~\anaconda3\lib\site-packages\lifelines\utils\__init__.py in f(model, *args, **kwargs)
     52         def f(model, *args, **kwargs):
     53             cls.set_censoring_type(model, cls.RIGHT)
---> 54             return function(model, *args, **kwargs)
     55 
     56         return f

~\anaconda3\lib\site-packages\lifelines\fitters\coxph_fitter.py in fit(self, df, duration_col, event_col, show_progress, initial_point, strata, step_size, weights_col, cluster_col, robust, batch_mode, timeline, formula, entry_col)
    274         """
    275         self.strata = utils.coalesce(strata, self.strata)
--> 276         self._model = self._fit_model(
    277             df,
    278             duration_col,

~\anaconda3\lib\site-packages\lifelines\fitters\coxph_fitter.py in _fit_model(self, *args, **kwargs)
    595     def _fit_model(self, *args, **kwargs):
    596         if self.baseline_estimation_method == "breslow":
--> 597             return self._fit_model_breslow(*args, **kwargs)
    598         elif self.baseline_estimation_method == "spline":
    599             return self._fit_model_spline(*args, **kwargs)

~\anaconda3\lib\site-packages\lifelines\fitters\coxph_fitter.py in _fit_model_breslow(self, *args, **kwargs)
    608         )
    609         if utils.CensoringType.is_right_censoring(self):
--> 610             model.fit(*args, **kwargs)
    611             return model
    612         else:

~\anaconda3\lib\site-packages\lifelines\utils\__init__.py in f(model, *args, **kwargs)
     52         def f(model, *args, **kwargs):
     53             cls.set_censoring_type(model, cls.RIGHT)
---> 54             return function(model, *args, **kwargs)
     55 
     56         return f

~\anaconda3\lib\site-packages\lifelines\fitters\coxph_fitter.py in fit(self, df, duration_col, event_col, show_progress, initial_point, strata, step_size, weights_col, cluster_col, robust, batch_mode, timeline, formula, entry_col)
   1225         )
   1226 
-> 1227         params_, ll_, variance_matrix_, baseline_hazard_, baseline_cumulative_hazard_, model = self._fit_model(
   1228             X_norm,
   1229             T,

~\anaconda3\lib\site-packages\lifelines\fitters\coxph_fitter.py in _fit_model(self, X, T, E, weights, entries, initial_point, step_size, show_progress)
   1353         show_progress: bool = True,
   1354     ):
-> 1355         beta_, ll_, hessian_ = self._newton_rhapson_for_efron_model(
   1356             X, T, E, weights, entries, initial_point=initial_point, step_size=step_size, show_progress=show_progress
   1357         )

~\anaconda3\lib\site-packages\lifelines\fitters\coxph_fitter.py in _newton_rhapson_for_efron_model(self, X, T, E, weights, entries, initial_point, step_size, precision, show_progress, max_steps)
   1505                     )
   1506                 elif isinstance(e, LinAlgError):
-> 1507                     raise exceptions.ConvergenceError(
   1508                         """Convergence halted due to matrix inversion problems. Suspicion is high collinearity. {0}""".format(
   1509                             CONVERGENCE_DOCS

ConvergenceError: Convergence halted due to matrix inversion problems. Suspicion is high collinearity. Please see the following tips in the lifelines documentation: https://lifelines.readthedocs.io/en/latest/Examples.html#problems-with-convergence-in-the-cox-proportional-hazard-modelMatrix is singular.

        
    
        
    

Tags: python

Solution


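The error message states the problem clearly:

ConvergenceError: Convergence halted due to matrix inversion problems. Suspicion is high collinearity. Please see the following tips in the lifelines documentation: https://lifelines.readthedocs.io/en/latest/Examples.html#problems-with-convergence-in-the-cox-proportional-hazard-model Matrix is singular.

Without the real data I cannot offer any more specific advice. The lifelines documentation, however, gives plenty of guidance on this issue: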

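Convergence halted due to matrix inversion problems: this means that there is high collinearity in your dataset. That is, a column is equal to a linear combination of one or more other columns. A common cause of this error is dummying categorical variables but not dropping a column, or some hierarchical structure in your dataset. Try to find the relationship by: adding a penalizer to the model, e.g. CoxPHFitter(penalizer=0.1).fit(...), until the model converges (in print_summary(), the coefficients that have high collinearity will have large (absolute) magnitude in the coefs column); using the variance inflation factor (VIF) to find redundant variables; looking at the correlation matrix of your dataset, or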
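The VIF and correlation-matrix checks from the quoted advice can be sketched with pandas and NumPy alone. This is only an illustration: the helper name variance_inflation_factors and the toy columns a, b, c are made up here and are not part of lifelines or of the asker's data. It uses the standard definition VIF = 1 / (1 - R^2), where R^2 comes from regressing each covariate on all the others.

```python
import numpy as np
import pandas as pd


def variance_inflation_factors(covariates: pd.DataFrame) -> pd.Series:
    """Return the VIF of each column: 1 / (1 - R^2) from regressing it on the others."""
    X = covariates.to_numpy(dtype=float)
    vifs = {}
    for j, name in enumerate(covariates.columns):
        y = X[:, j]
        # Design matrix: an intercept plus every other covariate.
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        residual_ss = float(np.sum((y - others @ coef) ** 2))
        total_ss = float(np.sum((y - y.mean()) ** 2))
        if total_ss == 0.0 or residual_ss <= 1e-12 * total_ss:
            # Constant column, or (numerically) an exact linear combination of the others.
            vifs[name] = np.inf
        else:
            # Algebraically equal to 1 / (1 - R^2).
            vifs[name] = total_ss / residual_ss
    return pd.Series(vifs)


# Hypothetical toy data: column "c" is an exact linear combination of "a" and "b".
example = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "b": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
})
example["c"] = example["a"] + example["b"]

vif = variance_inflation_factors(example)
print(vif)             # infinite VIFs flag the redundant columns
print(example.corr())  # pairwise correlations of the covariates
```

Pass only the covariate columns, not the duration or event columns. A very large or infinite VIF points at a column that could be dropped before fitting.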

This is most likely not a bug in lifelines, but rather a problem with your data or with how you apply the model to the data.
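One plausible data-side cause, inferred from the traceback rather than confirmed by the asker: the model appears to be fitted on small resampled data sets (the traceback shows range(10)), and a sample that small can easily contain a constant covariate or no observed events, either of which leaves nothing to estimate. The sketch below is a minimal pre-fit check under that assumption; fit_problems is a hypothetical helper, not a lifelines function, and the sample values are invented.

```python
import pandas as pd


def fit_problems(df: pd.DataFrame, duration_col: str, event_col: str) -> list:
    """List reasons a Cox fit on df is likely to be degenerate (empty list if none found)."""
    problems = []
    covariates = df.drop(columns=[duration_col, event_col])
    # With no observed events the partial likelihood carries no information.
    if df[event_col].sum() == 0:
        problems.append("no observed events")
    # A covariate that never varies cannot have its coefficient estimated.
    for name in covariates.columns:
        if covariates[name].nunique() <= 1:
            problems.append(f"covariate {name!r} is constant")
    return problems


# Hypothetical resample in which HighLTV never varies and no Cure event occurs.
sample = pd.DataFrame({
    "Observed": [5, 8, 3, 9, 4],
    "HighLTV": [1, 1, 1, 1, 1],
    "Cure": [0, 0, 0, 0, 0],
})
problems = fit_problems(sample, "Observed", "Cure")
print(problems)
```

Inside a resampling loop, a sample for which the list is non-empty could be skipped or redrawn before calling fit. Whether that is appropriate depends on what the resampling is meant to estimate.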

