python - statsmodels: difference in output between "formula.api" and "regression.quantile_regression"
Problem description
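Using the Python module statsmodels, I would like to know why calling the same procedure through statsmodels.formula.api and through statsmodels.regression.quantile_regression gives different results. In particular, I get different parameter estimates.
A minimal working example follows.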
#%% Modules;
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.regression.quantile_regression import QuantReg
#%% Load in sample data;
data = sm.datasets.engel.load_pandas().data
#%% smf-Version;
model1 = smf.quantreg(formula='foodexp ~ income', data=data, missing="drop")
result1 = model1.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)
#%% QuantReg-Version;
model2 = QuantReg(
    data['foodexp'].values,
    exog=sm.tools.tools.add_constant(data['income']).values,
    missing='drop',
)
result2 = model2.fit(
    q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06
)
#%% Compare Results;
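# result1.params is a pandas Series (formula API), so index it by position with .iloc;
# result2.params is a NumPy array because the model was built from arrays.
print(result1.params.iloc[0])
print(result2.params[0])
print('Difference times 10^9: ' + str(abs(10**9*(result1.params.iloc[0]-result2.params[0]))))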
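Edit:
I need to amend my question. The workaround proposed below, for which I am still very grateful, does not work in my applied setting, because I have more than one regressor. The revised version follows.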
#%% Modules;
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.regression.quantile_regression import QuantReg
#%% Load in sample data;
data = sm.datasets.engel.load_pandas().data
data['income2'] = data['income']**2
#%% smf-Version;
model1 = smf.quantreg(formula='foodexp ~ income + income2', data=data, missing="drop")
result1 = model1.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)
#%% QuantReg-Version;
model2 = QuantReg(
    data['foodexp'].values,
    exog=sm.tools.tools.add_constant(data[['income', 'income2']].values),
    missing='drop',
)
result2 = model2.fit(
    q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06
)
#%% Compare Results;
# result1.params is a pandas Series (formula API), so index it by position with .iloc.
print(result1.params.iloc[0])
print(result2.params[0])
print('Difference times 10^9: ' + str(abs(10**9*(result1.params.iloc[0]-result2.params[0]))))
Solution
You need to make one small change to your code, and it makes a big difference:
#%% QuantReg-Version;
model2 = QuantReg ( data['foodexp'].values, exog = sm.tools.tools.add_constant(data['income'].values), missing = 'drop')
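You had applied .values outside the add_constant call (to its result) rather than inside it (to the income column). That changes how the design matrix is handled internally, which is what affected the estimates.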
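To see what actually changes when `.values` is moved, it can help to compare the two design matrices directly. The sketch below is only illustrative: `compare_design_matrices` is a hypothetical helper, the arrays are synthetic stand-ins for the two `exog` matrices, and the idea that memory layout or dtype (rather than the values themselves) drives the tiny differences is an assumption to be checked, not a confirmed diagnosis.

```python
import numpy as np

def compare_design_matrices(a, b):
    """Summarise how two design matrices differ (hypothetical helper)."""
    a = np.asarray(a)
    b = np.asarray(b)
    same_shape = a.shape == b.shape
    return {
        "same_shape": same_shape,
        "same_dtype": a.dtype == b.dtype,
        # Row-major (C) vs column-major (Fortran) storage
        "same_layout": (a.flags["C_CONTIGUOUS"] == b.flags["C_CONTIGUOUS"]
                        and a.flags["F_CONTIGUOUS"] == b.flags["F_CONTIGUOUS"]),
        "values_equal": same_shape and bool(np.array_equal(a, b)),
        "max_abs_diff": float(np.max(np.abs(a - b))) if same_shape else None,
    }

# Synthetic stand-ins for two exog matrices (constant column + one regressor)
x = np.array([1.0, 2.0, 3.0, 4.0])
exog_c = np.column_stack([np.ones_like(x), x])  # row-major copy
exog_f = np.asfortranarray(exog_c)              # same values, column-major

report = compare_design_matrices(exog_c, exog_f)
print(report)
```

With the models from the question, the same helper could be called as `compare_design_matrices(model1.exog, model2.exog)`. If the values are equal but the layout or dtype differs, the discrepancy in the estimates is more likely floating-point noise than a modelling difference.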
Final implementation
#%% Modules;
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.regression.quantile_regression import QuantReg
#%% Load in sample data;
data = sm.datasets.engel.load_pandas().data
#%% smf-Version;
model1 = smf.quantreg(formula='foodexp ~ income', data=data, missing="drop")
result1 = model1.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather',
max_iter=1000, p_tol=1e-06)
#%% QuantReg-Version;
model2 = QuantReg(
    data['foodexp'].values,
    exog=sm.tools.tools.add_constant(data['income'].values),
    missing="drop",
)
result2 = model2.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)
#%% Compare Results;
# result1.params is a pandas Series (formula API), so index it by position with .iloc.
print(result1.params.iloc[0])
print(result2.params[0])
print('Difference times 10^9: ' + str(abs(10**9*(result1.params.iloc[0]-result2.params[0]))))
For the revised question with two regressors, in addition to my code above, I copied exog from model 2 to model 1:
#%% Modules;
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.regression.quantile_regression import QuantReg
#%% Load in sample data;
data = sm.datasets.engel.load_pandas().data
data['income2'] = data['income']**2
model1 = smf.quantreg(formula='foodexp ~ income + income2', data=data, missing="drop")
model2 = QuantReg (data['foodexp'].values, exog = sm.tools.tools.add_constant(data[['income', 'income2']].values), missing = 'drop')
model1.exog = model2.exog
result1 = model1.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)
result2 = model2.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)
#%% Compare Results;
# result1.params is a pandas Series (formula API), so index it by position with .iloc.
print(result1.params.iloc[0])
print(result2.params[0])
print('Difference times 10^9: ' + str(abs(10**9*(result1.params.iloc[0]-result2.params[0]))))
Second approach: I copied exog from model 1 to model 2:
#%% Modules;
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.regression.quantile_regression import QuantReg
#%% Load in sample data;
data = sm.datasets.engel.load_pandas().data
data['income2'] = data['income']**2
model1 = smf.quantreg(formula='foodexp ~ income + income2', data=data, missing="drop")
model2 = QuantReg (data['foodexp'].values, exog = sm.tools.tools.add_constant(data[['income', 'income2']].values), missing = 'drop')
model2.exog = model1.exog
result1 = model1.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)
result2 = model2.fit(q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather', max_iter=1000, p_tol=1e-06)
#%% Compare Results;
# result1.params is a pandas Series (formula API), so index it by position with .iloc.
print(result1.params.iloc[0])
print(result2.params[0])
print('Difference times 10^9: ' + str(abs(10**9*(result1.params.iloc[0]-result2.params[0]))))
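If both models are given the same exog, the estimates are equal. So, as I said earlier, the two routes clearly differ in how they transform the data before fitting.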
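Since both fits run an iterative solver with p_tol=1e-06, it may be more meaningful to compare the estimates against that tolerance than to expect bit-for-bit equality. The following is a minimal sketch: `params_agree` is a hypothetical helper, and the numbers are made up for illustration and are not output from the Engel data.

```python
import numpy as np

def params_agree(p1, p2, tol=1e-6):
    """Return (agree, max_abs_diff) for two parameter vectors (hypothetical helper)."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    max_diff = float(np.max(np.abs(p1 - p2)))
    return max_diff <= tol, max_diff

# Made-up estimates that differ by about 2e-9 in the first coefficient
est_a = [10.0, 0.5]
est_b = [10.0 + 2e-9, 0.5]

ok, diff = params_agree(est_a, est_b)
print(ok, diff)
```

With the fitted results above, this could be called as `params_agree(result1.params, result2.params)`. A difference far below `p_tol` suggests the two routes agree to within what the solver was asked to deliver.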