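r - Panel data with multiple indices
Question
How can I work with panel data, or transform a dataset so that it can be modeled as panel data, when it has multiple indices?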
library(tibble)
library(dplyr)  # provides %>% and select(), both used below
library(plm)
library(fastDummies)
dataset <- tribble(
~country, ~year, ~sex, ~age, ~suicides_no,
"Albania", 1987, "male", "15-24", 50,
"Albania", 1987, "male", "35-50", 20,
"Albania", 1987, "male", "50-", 11,
"Albania", 1987, "female", "15-24", 18,
"Albania", 1987, "female", "35-50", 2,
"Albania", 1987, "female", "50-", 1,
"Albania", 1988, "male", "15-24", 50,
"Albania", 1988, "male", "35-50", 2,
"Albania", 1988, "male", "50-", 11,
"Albania", 1988, "female", "15-24", 17,
"Albania", 1988, "female", "35-50", 20,
"Albania", 1988, "female", "50-", 10,
"Albania", 1989, "male", "15-24", 0,
"Albania", 1989, "male", "35-50", 2,
"Albania", 1989, "male", "50-", 1,
"Albania", 1989, "female", "15-24", 7,
"Albania", 1989, "female", "35-50", 2,
"Albania", 1989, "female", "50-", 1,
"Germany", 1987, "male", "15-24", 50,
"Germany", 1987, "male", "35-50", 2,
"Germany", 1987, "male", "50-", 11,
"Germany", 1987, "female", "15-24", 18,
"Germany", 1987, "female", "35-50", 20,
"Germany", 1987, "female", "50-", 1,
"Germany", 1988, "male", "15-24", 0,
"Germany", 1988, "male", "35-50", 2,
"Germany", 1988, "male", "50-", 110,
"Germany", 1988, "female", "15-24", 17,
"Germany", 1988, "female", "35-50", 20,
"Germany", 1988, "female", "50-", 10,
"Germany", 1989, "male", "15-24", 0,
"Germany", 1989, "male", "35-50", 20,
"Germany", 1989, "male", "50-", 1,
"Germany", 1989, "female", "15-24", 73,
"Germany", 1989, "female", "35-50", 2,
"Germany", 1989, "female", "50-", 11
)
dataset %>% tail
dataset2 <- dummy_cols(dataset, "age") %>% select(-age)
panel <- pdata.frame(dataset2, index = c("country", "year"))
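Because of the age brackets, we have multiple observations for one cross-sectional unit within a single year.
How would we transform this dataset so that it can be used as panel data with random or fixed effects?
Using: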
library(plm)
fixex = plm(suicides_no ~ factor(sex) + factor(age), index = c("country", "year"), data = dataset, model = "within")
does not work. How can the data be transformed so that the model can be estimated?
Answer
The plm() function needs unique combinations of ID and time, as indicated by the error message duplicate couples (id-time). When you run:
library(dplyr)
dataset %>%
count(country, year)
you can see that every combination of country and year has six observations:
country year n
<chr> <dbl> <int>
1 Albania 1987 6
2 Albania 1988 6
3 Albania 1989 6
4 Germany 1987 6
5 Germany 1988 6
6 Germany 1989 6
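The same check can be sketched in base R without dplyr. The small `toy` data frame below is a hypothetical stand-in for the question's data, not the full dataset; it only illustrates how `duplicated()` flags repeated id-time pairs and how widening the key removes them.

```r
# Hypothetical stand-in for the question's data:
# two rows share the same (country, year) pair
toy <- data.frame(
  country = c("Albania", "Albania", "Germany"),
  year = c(1987, 1987, 1987),
  sex = c("male", "female", "male"),
  suicides_no = c(50, 18, 50)
)

# duplicated() marks the second and later occurrences of each (country, year) pair
dup_pairs <- duplicated(toy[, c("country", "year")])
has_dups <- any(dup_pairs)

# Adding sex to the key makes every row unique in this toy data
has_dups_wide <- any(duplicated(toy[, c("country", "year", "sex")]))

print(has_dups)
print(has_dups_wide)
```

If the first check reports duplicates, the chosen id-time pair does not identify rows uniquely, which is the condition plm() complains about.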
To avoid this, you need to create unique IDs. I assume they can be built from country, age, and sex. Then you can do:
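library(broom)
dataset %>%
  # One ID per country/sex/age combination (every column except the outcome and year)
  mutate(ID = group_indices(., !!!select(., -suicides_no, -year))) %>%
  mutate_at(vars(sex, age), as.factor) %>%
  # plm() reads index as c(individual, time). With "year" listed first, the within
  # transformation removes year effects. If ID came first, sex and age would be
  # constant within each ID and could not be estimated in a within model.
  do(tidy(plm(suicides_no ~ sex + age,
              index = c("year", "ID"),
              model = "within",
              data = .)))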
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 sexmale 5.17 7.82 0.661 0.514
2 age35-50 -15.5 9.57 -1.62 0.116
3 age50- -10.1 9.57 -1.05 0.301
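A minimal base-R sketch of the same ID construction, for readers who prefer not to rely on `group_indices()`. The `toy` data, the placeholder outcome values, and the `_` separator are assumptions made for illustration. The plm step is guarded so the sketch still runs where plm is not installed; `pdata.frame()` takes the index as c(individual, time).

```r
# Hypothetical toy panel: 2 countries x 2 sexes x 3 years
toy <- expand.grid(
  country = c("Albania", "Germany"),
  sex = c("male", "female"),
  year = 1987:1989,
  stringsAsFactors = FALSE
)
toy$suicides_no <- seq_len(nrow(toy))  # placeholder values, not real data

# Build one ID per country/sex combination by pasting the grouping columns
toy$ID <- paste(toy$country, toy$sex, sep = "_")

# anyDuplicated() returns 0 when every (ID, year) pair occurs exactly once
n_dup <- anyDuplicated(toy[, c("ID", "year")])
print(n_dup)

# Optional: declare the panel structure with ID as individual and year as time
if (requireNamespace("plm", quietly = TRUE)) {
  panel <- plm::pdata.frame(toy, index = c("ID", "year"))
  print(plm::pdim(panel))
}
```

With ID as the individual index, regressors that never change within an ID (such as sex here) drop out of a within model, so a random-effects or pooled specification may be the more natural fit for estimating them.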