How to find whether observations within a group have certain values

Problem description

Suppose I have this MWE:

clear all
input str2 person enr_year enr_term
"a" 2000 1     
"a" 2000 2   
"a" 2000 2 
"a" 2000 3   
"a" 2000 3 
"a" 2001 1   
"a" 2001 2   
"a" 2001 3   
"a" 2002 2
"a" 2002 2    
"a" 2003 2      
"a" 2006 1
"a" 2006 2
"a" 2008 2  
"b" 2000 2
"b" 2001 3
end

label define term 1 "Summer" 2 "Fall" 3 "Spring"
label values enr_term term

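Some explanation is in order. This is school enrollment data. person identifies a person, and everything needs to be done within person.

enr_year is an academic year. enr_term is an academic term. Summer and Fall come before Spring because the year is an academic year, not a calendar year.

Each row in the data implicitly indicates that the person was enrolled in the given year and term.

My task is to create two indicator variables: enr_this_spring and enr_next_fall. I can get enr_this_spring without trouble. I have included the code for it in case the logic helps figure out how to get enr_next_fall.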

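These indicator variables should only be created for observations of fall enrollment.

enr_this_spring indicates that the person enrolled in the following spring (the spring of the same academic year). Because we only set this variable for fall terms, it is 1 if there is a spring observation in the same year. Otherwise it is 0, even if there is a spring observation in the next year.

enr_next_fall is 1 if there is a fall observation in the next year. As described below, I don't know how to handle the case where a student enrolls in the fall of year x, is not enrolled in the fall of x+1, but enrolls again in the fall of x+n, where n>1.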

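If there are two fall observations in the same year (multiple enrollment spells; perhaps the student was enrolled at two schools at the same time), both take the same value.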
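Stated compactly, the two definitions amount to the following. The notation is introduced here only for illustration: E_p is the set of (academic year, term) pairs observed for person p, and the formulas apply to a fall observation of p in academic year y.

```latex
\texttt{enr\_this\_spring}_{p,y} = \mathbf{1}\{(y,\ \text{Spring}) \in E_p\},
\qquad
\texttt{enr\_next\_fall}_{p,y} = \mathbf{1}\{(y+1,\ \text{Fall}) \in E_p\}
```

Both are missing for non-fall observations. Because the right-hand sides depend only on p and y, duplicate fall rows in the same person-year automatically get the same value, and a fall in year y+n with n>1 does not count toward enr_next_fall.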

Here is what I want to get:

clear all
input str2 person enr_year enr_term enr_this_spring enr_next_fall
"a" 2000 1  .   . // missing because not Fall
"a" 2000 2  1   1 // 1 b/c a/2000/3; 1 b/c a/2001/2
"a" 2000 2  1   1 // same reasons as line directly above
"a" 2000 3  .   . // missing because not Fall
"a" 2000 3  .   . // missing because not Fall
"a" 2001 1  .   . // missing because not Fall
"a" 2001 2  1   1 // 1 b/c a/2001/3; 1 b/c a/2002/2
"a" 2001 3  .   . // missing because not Fall
"a" 2002 2  0   1 // 0 b/c no a/2002/3; 1 b/c a/2003/2
"a" 2002 2  0   1 // same reasons as line directly above
"a" 2003 2  0   0 // 0 b/c no a/2003/3; 0 b/c no a/2004/2
"a" 2006 1  .   . // missing because not Fall
"a" 2006 2  0   0 // 0 b/c no a/2006/3; 0 b/c no a/2007/2
"a" 2008 2  0   0 // 0 b/c no a/2008/3; 0 b/c no a/2009/2
"b" 2000 2  0   0 // 0 b/c no b/2000/3; 0 b/c no b/2001/2
"b" 2001 3  .   . // missing because not Fall
end
label define term 1 "Summer" 2 "Fall" 3 "Spring"
label values enr_term term

Starting from the original data, I can successfully get enr_this_spring as follows:

*Create indicators for if the term is spring and if term is fall
gen is_spring = enr_term == 3
gen is_fall = enr_term ==2
*Get the maximum value, within person and year
bys person enr_year: egen enr_this_spring = max(is_spring)
replace enr_this_spring=. if is_fall!=1

I'm not sure how to create an indicator for whether they enrolled in the following fall.

Here is what I tried, with an explanation after the code of why it doesn't work:

*Preserve the data. We are going to process it and merge back on
preserve
*We are only concerned about fall attendance for this part
keep if enr_term==2
*We only want one observation per term, as duplicates mess up the code
bys person enr_year enr_term: keep if _n==1 
*Make a variable that is a constant 1
gen one = 1
*Make a variable, enr_next_fall that is 1 if the person enrolled in the fall
* in the following observation. Note that we do this within group and sort
* by enr_year
bys person (enr_year): gen enr_next_fall = one[_n+1]
* Replace missing with 0. This only affects the final observation within group
replace enr_next_fall = 0 if missing(enr_next_fall)
*Create temporary file, to be merged on
tempfile a
save `a'
restore
*Merge on the temporary file
merge m:1 person enr_year enr_term using `a'
drop is_spring is_fall one _merge

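This does not give me what I want when the person did not enroll in the following fall but came back later (maybe they got sick and missed a whole academic year). Because one[_n+1] only checks whether the person has any later fall observation, a/2003/2 gets enr_next_fall = 1 from a/2006/2, and a/2006/2 gets 1 from a/2008/2, although the desired value is 0 in both cases. How should I handle this?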
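A minimal sketch of the difference, assuming the data have already been reduced to one fall observation per person-year, as the attempt does with keep and bys ... _n==1. The variable names any_later_fall and fall_next_year are hypothetical, introduced only for this illustration: the first reproduces the attempt's one[_n+1] logic, the second compares the year of the next fall observation with enr_year + 1.

```stata
clear
* fall-only data, one observation per person-year (years taken from the MWE)
input str2 person enr_year
"a" 2000
"a" 2001
"a" 2002
"a" 2003
"a" 2006
"a" 2008
"b" 2000
end

* the attempt's logic: 1 whenever ANY later fall observation exists
gen one = 1
bysort person (enr_year): gen byte any_later_fall = one[_n+1]
replace any_later_fall = 0 if missing(any_later_fall)

* what enr_next_fall needs: the next fall observation is exactly one year later
* (for the last observation of a person, enr_year[_n+1] is missing, giving 0)
bysort person (enr_year): gen byte fall_next_year = enr_year[_n+1] == enr_year + 1

list person enr_year any_later_fall fall_next_year, sepby(person) noobs
```

The two columns disagree only for a/2003 and a/2006, which are exactly the gap years described above.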

Tags: variables, stata, indicator

Solution


I think I've figured it out:

clear all
input str2 person enr_year enr_term
"a" 2000 1     
"a" 2000 2   
"a" 2000 2 
"a" 2000 3   
"a" 2000 3 
"a" 2001 1   
"a" 2001 2   
"a" 2001 3   
"a" 2002 2
"a" 2002 2    
"a" 2003 2      
"a" 2006 1
"a" 2006 2
"a" 2008 2  
"b" 2000 2
"b" 2001 3
end

label define term 1 "Summer" 2 "Fall" 3 "Spring"
label values enr_term term

*Create indicators for if the term is spring and if term is fall
gen is_spring = enr_term == 3
gen is_fall = enr_term ==2
*Get the maximum value, within person and year
bys person enr_year: egen enr_this_spring = max(is_spring)
replace enr_this_spring=. if is_fall!=1

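*Create enr_next_fall: work on fall terms only, then merge back on
preserve
keep if enr_term==2
*Keep one fall observation per person-year so duplicates do not interfere
bys person enr_year: keep if _n==1
*Year of the person's next fall enrollment (missing for the last one)
bys person (enr_year): gen next = enr_year[_n+1]
*enr_next_fall is 1 only if that next fall is exactly one year later
replace next = next - 1
gen enr_next_fall = enr_year==next
drop next
tempfile fall
save `fall'
restore
*Attach the person-year value to every observation in that person-year
merge m:1 person enr_year using `fall'
drop _merge is_spring is_fall
*The indicator is defined only for fall observations
replace enr_next_fall = . if enr_term!=2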
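An alternative sketch that avoids preserve/merge. This is not the method above, just one other way to get the same indicator, assuming (as in the MWE) that Fall is coded enr_term == 2. It flags each person-year that contains a fall term, compares every person-year with that person's next observed person-year, and copies the result back to all rows with egen. The helper variables fall_y, py_tag and nf are hypothetical names used only here.

```stata
clear all
input str2 person enr_year enr_term
"a" 2000 1
"a" 2000 2
"a" 2000 2
"a" 2000 3
"a" 2000 3
"a" 2001 1
"a" 2001 2
"a" 2001 3
"a" 2002 2
"a" 2002 2
"a" 2003 2
"a" 2006 1
"a" 2006 2
"a" 2008 2
"b" 2000 2
"b" 2001 3
end
label define term 1 "Summer" 2 "Fall" 3 "Spring"
label values enr_term term

* 1 on every row of a person-year that contains at least one fall term
bysort person enr_year: egen byte fall_y = max(enr_term == 2)

* tag one representative row per person-year
egen byte py_tag = tag(person enr_year)

* among tagged rows sorted by year, the next row is the person's next observed
* academic year: require it to be y+1 and to contain a fall term
bysort person py_tag (enr_year): gen byte nf = py_tag == 1 & enr_year[_n+1] == enr_year + 1 & fall_y[_n+1] == 1

* spread the person-year result to all its rows; keep it for fall rows only
bysort person enr_year: egen byte enr_next_fall = max(nf)
replace enr_next_fall = . if enr_term != 2
drop fall_y py_tag nf

list person enr_year enr_term enr_next_fall, sepby(person) noobs
```

Comparing with the next observed person-year of any term, rather than with the next fall, is what sends the gap cases to 0: for a/2003 the next observed year is 2006, not 2004, and for b/2000 the next year is 2001 but it has no fall term.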
