How to keep 'Time' and 'Group' in this mixed linear regression analysis

Problem description

I have the following study results. Two groups of mice were used: group A (which received drug A) and group B (which received drug B). Weight was measured at baseline and after 1 month. The data is arranged as follows:

ID      Group       Time        Weight
1       A           basal       25          
1       A           1month      28
2       B           basal       29
2       B           1month      28
...
...

I want to determine whether the weight change differs between Group A and Group B.

I took code from a similar example on this page: https://scientificallysound.org/2017/08/24/the-likelihood-ratio-test-relevance-and-application/

How do I conduct mixed linear regression for my study? I have 2 options:

md = smf.mixedlm("Weight ~ Group", data, groups=data["Time"])
mdf = md.fit(reml=False)
print(mdf.summary())

Or:

md = smf.mixedlm("Weight ~ Time", data, groups=data["Group"])
mdf = md.fit(reml=False)
print(mdf.summary())

Or should I just use ordinary linear regression? Here, too, there are 2 options:

`"Weight ~ Time + Group"` 

and

`"Weight ~ Time + Group + Time*Group"` ?

Note: the code above requires `import statsmodels.formula.api as smf`.

Tags: python, regression, linear-regression, mixed-models

Solution


You can use the difference-in-differences (diff-in-diff) technique to get the result you want.

Create a dummy variable for the group (1 if Group == B, 0 if Group == A). Then create another dummy variable for the sampling time (0 for baseline, 1 for post-treatment).
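As a minimal sketch of the two dummy variables described above, assuming the data is in a pandas DataFrame laid out as in the question. The column names `Group_B` and `Post` are hypothetical names chosen for this illustration, and the four rows are just the ones shown in the question:

```python
import pandas as pd

# Illustrative rows only; they mirror the layout shown in the question.
data = pd.DataFrame({
    "ID":     [1, 1, 2, 2],
    "Group":  ["A", "A", "B", "B"],
    "Time":   ["basal", "1month", "basal", "1month"],
    "Weight": [25, 28, 29, 28],
})

# Group dummy (hypothetical name): 1 for drug B, 0 for drug A.
data["Group_B"] = (data["Group"] == "B").astype(int)

# Time dummy (hypothetical name): 0 at baseline, 1 after one month.
data["Post"] = (data["Time"] == "1month").astype(int)

print(data)
```

statsmodels formulas can also encode string columns such as `Group` and `Time` automatically, but explicit 0/1 columns make it unambiguous which level is the reference.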

Then your third option will work fine for obtaining the coefficient of the Time*Group interaction term.
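To see why the interaction coefficient is the quantity of interest, the model can be written out with the two 0/1 dummies described above. The $\beta$ symbols and the subscripts ($i$ for mouse, $t$ for time point) are notation introduced only for this illustration:

```latex
\text{Weight}_{it} = \beta_0
  + \beta_1\,\text{Group}_i
  + \beta_2\,\text{Time}_t
  + \beta_3\,(\text{Group}_i \times \text{Time}_t)
  + \varepsilon_{it}
```

Under this coding, $\beta_0$ is the mean baseline weight in group A, $\beta_1$ is the baseline difference between group B and group A, $\beta_2$ is the one-month change in group A, and $\beta_3$ is the additional change in group B relative to group A, i.e. the difference-in-differences.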

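I guess you know better than I do how to write the code, but from a statistical point of view the third option is definitely what you are looking for to assess the effect in your study.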

Edit: to be clear, the third option is (Weight ~ Group + Time + Group*Time), that is, the last formula listed in the question.
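A rough sketch of fitting that formula with statsmodels follows. The weights are made up for illustration (six hypothetical mice, three per group), and `Group_B` and `Post` are hypothetical names for the two dummies. The hand-computed difference of mean changes is included only as a cross-check on the interaction coefficient:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up weights for six mice (three per group); illustration only.
data = pd.DataFrame({
    "ID":     [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "Group":  ["A"] * 6 + ["B"] * 6,
    "Time":   ["basal", "1month"] * 6,
    "Weight": [25, 28, 27, 29, 26, 30, 29, 28, 30, 30, 28, 29],
})
data["Group_B"] = (data["Group"] == "B").astype(int)   # 1 = drug B
data["Post"] = (data["Time"] == "1month").astype(int)  # 1 = after 1 month

# In formula syntax, Group_B * Post expands to Group_B + Post + Group_B:Post.
ols_fit = smf.ols("Weight ~ Group_B * Post", data=data).fit()
did_estimate = ols_fit.params["Group_B:Post"]

# Cross-check: (change in B) minus (change in A), from the cell means.
means = data.groupby(["Group", "Time"])["Weight"].mean()
did_by_hand = (
    (means.loc[("B", "1month")] - means.loc[("B", "basal")])
    - (means.loc[("A", "1month")] - means.loc[("A", "basal")])
)

print(ols_fit.params)
print(did_estimate, did_by_hand)
```

Plain OLS treats the two measurements of each mouse as independent observations, so the standard errors it reports should be read with that in mind.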

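I would probably use a mixed model when dealing with panel datasets that include fixed effects in the error component. With mixed regression you can overcome the problems that arise when data is sampled from so-called random or differing distributions (for example, a dataset may be distributed differently across countries and across time). In my experience, this situation is most common with panel datasets. For your needs, I would definitely use the diff-in-diff model, because the difference between time and group is exactly what you want to measure (and not an effect you are trying to cancel out).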
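For comparison only: if a mixed model were used for this design, one common choice would be a random intercept per mouse, grouping on `ID` rather than on `Time` or `Group`. This is an assumption of the sketch, not something stated in the answer. The data is made up, and `Group_B` and `Post` are hypothetical dummy names:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up weights for six mice (three per group); illustration only.
data = pd.DataFrame({
    "ID":     [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "Group":  ["A"] * 6 + ["B"] * 6,
    "Time":   ["basal", "1month"] * 6,
    "Weight": [25, 28, 27, 29, 26, 30, 29, 28, 30, 30, 28, 29],
})
data["Group_B"] = (data["Group"] == "B").astype(int)   # 1 = drug B
data["Post"] = (data["Time"] == "1month").astype(int)  # 1 = after 1 month

# Same fixed effects as the diff-in-diff formula, plus a random
# intercept for each mouse (assumption: repeated measures per ID).
md = smf.mixedlm("Weight ~ Group_B * Post", data, groups=data["ID"])
mdf = md.fit(reml=False)

# Fixed-effect estimate of the Group x Time interaction.
mixed_did = mdf.params["Group_B:Post"]
print(mdf.params)
```

With a balanced design like this one (every mouse measured at both time points), the interaction estimate should match the OLS diff-in-diff estimate. What changes is how the uncertainty is estimated, since the model accounts for the two weights of a mouse being correlated.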

