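r - Different output from factorial logistic regression in SAS and R
Question
I am trying to run these factorial logistic regressions in SAS and R, but I get different results for dry = rt*chi_ur. Why?
My data: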
id dry rt chi_ur
1 1 0 1
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 1
6 0 0 0
7 0 0 0
8 0 0 1
9 0 0 0
10 0 0 0
11 0 0 0
12 0 0 0
13 1 0 0
14 0 0 0
15 0 0 1
16 0 0 1
17 0 0 0
18 1 0 0
19 0 0 0
20 0 0 0
21 0 0 1
22 1 1 0
23 0 1 1
24 0 0 1
25 0 0 1
26 1 0 0
27 1 0 0
28 0 0 0
29 1 0 0
30 1 0 0
31 1 0 1
32 1 0 0
33 0 0 0
34 1 0 0
35 0 0 0
36 0 0 1
37 1 0 0
38 1 0 0
39 0 0 1
40 0 1 0
41 0 1 0
42 1 1 0
43 0 1 0
44 0 0 0
45 0 0 0
46 0 0 1
47 0 0 0
48 0 0 1
49 1 0 0
50 0 0 1
51 0 0 0
52 1 0 0
53 1 0 0
54 1 0 0
55 1 0 0
56 0 0 0
57 1 0 0
58 0 0 0
59 1 0 0
60 1 0 0
61 0 0 0
62 0 1 0
63 0 0 0
64 0 0 0
65 1 1 0
66 0 0 0
67 1 0 0
68 1 0 0
69 1 0 0
70 1 0 0
71 1 0 0
72 1 0 0
73 1 0 0
74 1 0 0
75 1 0 0
76 1 0 0
77 0 1 0
78 1 0 0
79 0 1 0
80 0 1 0
81 1 0 0
82 1 0 0
83 1 0 0
84 1 0 0
85 1 0 0
86 0 0 1
87 1 0 0
88 1 0 0
89 1 0 0
90 1 0 1
91 1 0
92 1 0
93 0 0
94 0 1
95 0 1
96 0 1
97 1 0
98 1 0
R code:
summary(glm(dry ~ chi_ur, data = en, family = binomial))
summary(glm(dry ~ rt, data = en, family = binomial))
summary(glm(dry ~ rt*chi_ur, data = en, family = binomial))
SAS code:
proc logistic data = en.en1 desc;
class chi_ur ;
model dry = chi_ur / expb;
run;
proc logistic data = en.en1 desc;
class rt ;
model dry = rt / expb;
run;
proc logistic data = en.en1 desc;
class rt chi_ur ;
model dry = rt chi_ur rt*chi_ur/ expb;
run;
My R results:
> summary(glm(dry ~ chi_ur, data = en, family = binomial))
Call:
glm(formula = dry ~ chi_ur, family = binomial, data = en)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.2601 -1.2601 -0.6231 1.0969 1.8626
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.1924 0.2352 0.818 0.4133
chi_ur -1.7328 0.6782 -2.555 0.0106 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 124.59 on 89 degrees of freedom
Residual deviance: 116.37 on 88 degrees of freedom
(8 observations deleted due to missingness)
AIC: 120.37
Number of Fisher Scoring iterations: 3
> summary(glm(dry ~ rt, data = en, family = binomial))
Call:
glm(formula = dry ~ rt, family = binomial, data = en)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.2181 -1.2181 -0.6945 1.1372 1.7552
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.09531 0.21847 0.436 0.6626
rt -1.39459 0.68700 -2.030 0.0424 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 135.69 on 97 degrees of freedom
Residual deviance: 130.81 on 96 degrees of freedom
AIC: 134.81
Number of Fisher Scoring iterations: 4
> summary(glm(dry ~ rt*chi_ur, data = en, family = binomial))
Call:
glm(formula = dry ~ rt * chi_ur, family = binomial, data = en)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.3304 -1.3304 -0.6444 1.0317 1.8297
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.3528 0.2559 1.379 0.16798
rt -1.2001 0.7360 -1.631 0.10297
chi_ur -1.8192 0.6897 -2.637 0.00835 **
rt:chi_ur -12.8996 1455.3979 -0.009 0.99293
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 124.59 on 89 degrees of freedom
Residual deviance: 113.07 on 86 degrees of freedom
(8 observations deleted due to missingness)
AIC: 121.07
Number of Fisher Scoring iterations: 14
My SAS results:
The SAS System
The LOGISTIC Procedure
Model Information
Data Set EN.EN1
Response Variable dry
Number of Response Levels 2
Model binary logit
Optimization Technique Fisher's scoring
Number of Observations Read 98
Number of Observations Used 90
Response Profile
Ordered Value  dry  Total Frequency
1              1    43
2              0    47
Probability modeled is dry='1'.
Note: 8 observations were deleted due to missing values for the response or explanatory variables.
Class Level Information
Class   Value  Design Variables
chi_ur  0      1
        1      -1
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics
Criterion  Intercept Only  Intercept and Covariates
AIC        126.589         120.371
SC         129.088         125.371
-2 Log L   124.589         116.371
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 8.2175 1 0.0041
Score 7.6262 1 0.0058
Wald 6.5262 1 0.0106
Type 3 Analysis of Effects
Effect  DF  Wald Chi-Square  Pr > ChiSq
chi_ur  1   6.5262           0.0106
Analysis of Maximum Likelihood Estimates
Parameter  DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq  Exp(Est)
Intercept  1   -0.6740   0.3391          3.9498           0.0469      0.510
chi_ur 0   1   0.8664    0.3391          6.5262           0.0106      2.378
Odds Ratio Estimates
Effect         Point Estimate  95% Wald Confidence Limits
chi_ur 0 vs 1  5.656           1.497  21.372
Association of Predicted Probabilities and Observed Responses
Percent Concordant 27.7 Somers' D 0.228
Percent Discordant 4.9 Gamma 0.700
Percent Tied 67.4 Tau-a 0.115
Pairs 2021 c 0.614
--------------------------------------------------------------------------------
The SAS System
The LOGISTIC Procedure
Model Information
Data Set EN.EN1
Response Variable dry
Number of Response Levels 2
Model binary logit
Optimization Technique Fisher's scoring
Number of Observations Read 98
Number of Observations Used 98
Response Profile
Ordered Value  dry  Total Frequency
1              1    47
2              0    51
Probability modeled is dry='1'.
Class Level Information
Class  Value  Design Variables
rt     0      1
       1      -1
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics
Criterion  Intercept Only  Intercept and Covariates
AIC        137.694         134.806
SC         140.279         139.976
-2 Log L   135.694         130.806
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 4.8871 1 0.0271
Score 4.6063 1 0.0319
Wald 4.1208 1 0.0424
Type 3 Analysis of Effects
Effect  DF  Wald Chi-Square  Pr > ChiSq
rt      1   4.1208           0.0424
Analysis of Maximum Likelihood Estimates
Parameter  DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq  Exp(Est)
Intercept  1   -0.6020   0.3435          3.0712           0.0797      0.548
rt 0       1   0.6973    0.3435          4.1208           0.0424      2.008
Odds Ratio Estimates
Effect     Point Estimate  95% Wald Confidence Limits
rt 0 vs 1  4.033           1.049  15.504
Association of Predicted Probabilities and Observed Responses
Percent Concordant 20.2 Somers' D 0.152
Percent Discordant 5.0 Gamma 0.603
Percent Tied 74.8 Tau-a 0.077
Pairs 2397 c 0.576
--------------------------------------------------------------------------------
The SAS System
The LOGISTIC Procedure
Model Information
Data Set EN.EN1
Response Variable dry
Number of Response Levels 2
Model binary logit
Optimization Technique Fisher's scoring
Number of Observations Read 98
Number of Observations Used 90
Response Profile
Ordered Value  dry  Total Frequency
1              1    43
2              0    47
Probability modeled is dry='1'.
Note: 8 observations were deleted due to missing values for the response or explanatory variables.
Class Level Information
Class   Value  Design Variables
rt      0      1
        1      -1
chi_ur  0      1
        1      -1
Model Convergence Status
Quasi-complete separation of data points detected.
Warning: The maximum likelihood estimate may not exist.
Warning: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood iteration. Validity of the model fit is questionable.
Model Fit Statistics
Criterion  Intercept Only  Intercept and Covariates
AIC        126.589         121.066
SC         129.088         131.065
-2 Log L   124.589         113.066
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 11.5228 3 0.0092
Score 10.6138 3 0.0140
Wald 8.6501 3 0.0343
Joint Tests
Effect     DF  Wald Chi-Square  Pr > ChiSq
rt         1   0.0007           0.9787
chi_ur     1   0.0009           0.9765
rt*chi_ur  1   0.0005           0.9830
Note: Under full-rank parameterizations, Type 3 effect tests are replaced by joint tests. The joint test for an effect is a test that all the parameters associated with that effect are zero. Such joint tests might not be equivalent to Type 3 effect tests under GLM parameterization.
Analysis of Maximum Likelihood Estimates
Parameter      DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq  Exp(Est)
Intercept      1   -3.5417   111.8           0.0010           0.9747      0.029
rt 0           1   2.9849    111.8           0.0007           0.9787      19.785
chi_ur 0       1   3.2945    111.8           0.0009           0.9765      26.963
rt*chi_ur 0 0  1   -2.3849   111.8           0.0005           0.9830      0.092
Association of Predicted Probabilities and Observed Responses
Percent Concordant 40.7 Somers' D 0.319
Percent Discordant 8.8 Gamma 0.646
Percent Tied 50.6 Tau-a 0.161
Pairs 2021 c 0.660
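I find it a bit suspicious that the standard errors in SAS's Analysis of Maximum Likelihood Estimates are identical for every parameter...
Any ideas? How can I fix this? Thanks!
Answer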
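I suspect this is because you did not specify the PARAM= and REF= options on the CLASS statement in PROC LOGISTIC, so the two programs parameterize the model differently. The Class Level Information tables in your output show design variables of 1 and -1, which is effect coding (the PROC LOGISTIC default), whereas R's glm treats the numeric 0/1 columns as reference-coded dummies with 0 as the baseline. R also does not state what the "event" is; assuming it models dry = 1, as SAS does with the desc option, the results should then be similar. For the interaction model, note that SAS also warns of quasi-complete separation (the only complete row with rt = 1 and chi_ur = 1 has dry = 0), so the interaction estimate and its standard error are unreliable in both programs; R's standard error of 1455.4 on rt:chi_ur is a symptom of the same problem.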
class rt (param=ref ref='0');  /* reference coding with level 0 as the baseline, as in R */
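As a rough check of the parameterization explanation, the sketch below refits the single-predictor model dry ~ chi_ur in R under both codings. It is only an illustration under stated assumptions: the names agg, fit_ref and fit_eff are made up for this example, the cell counts are tallied by hand from the 90 complete rows in the question rather than read from en, and contr.sum is used as R's counterpart to the 1 / -1 design variables shown in the SAS output.

```r
# Cell counts tallied by hand from the 90 complete rows in the question
# (assumption: en itself is not available here, so the data are aggregated).
agg <- data.frame(
  chi_ur    = factor(c(0, 1)),
  events    = c(40, 3),    # rows with dry = 1
  nonevents = c(33, 14)    # rows with dry = 0
)

# Reference (treatment) coding with level 0 as baseline: R's default for factors,
# and equivalent to using the numeric 0/1 column as in the question.
fit_ref <- glm(cbind(events, nonevents) ~ chi_ur,
               data = agg, family = binomial)

# Effect (sum-to-zero) coding: level 0 -> +1, level 1 -> -1,
# the same design variables as in the SAS Class Level Information table.
fit_eff <- glm(cbind(events, nonevents) ~ chi_ur,
               data = agg, family = binomial,
               contrasts = list(chi_ur = "contr.sum"))

print(round(coef(fit_ref), 4))  # compare with the R output in the question
print(round(coef(fit_eff), 4))  # compare with the SAS output in the question

# Converting effect-coded estimates back to reference coding:
#   intercept_ref = intercept_eff + effect
#   slope_ref     = -2 * effect
b <- unname(coef(fit_eff))
print(round(c(intercept_ref = b[1] + b[2], slope_ref = -2 * b[2]), 4))
```

The same arithmetic can be applied directly to the SAS estimates: -0.6740 + 0.8664 = 0.1924 and -2 × 0.8664 = -1.7328 for chi_ur, and -0.6020 + 0.6973 = 0.0953 and -2 × 0.6973 = -1.3946 for rt, which agree with the R coefficients. So the two single-predictor fits describe the same model, only written in different codings.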