首页 > 解决方案 > 如何在 SAS 中按类和 ID 创建运行条件和变量

问题描述

从累积的剧集计数(如果时间间隔小于 10 天,则视为一集),我想根据 ID 的类计算运行剧集计数的“宽”和“长”版本。

这就是我的数据现在的样子。

id  Class   Date      Obsvn Episode_Sum
9   Wide    3/10/2012   1   1
9   Wide    3/12/2012   2   1
9   Wide    7/1/2012    111 2
9   Wide    7/3/2012    2   2
108 Wide    3/31/2011   1   1
108 Long    3/31/2011   1   1
108 Wide    4/17/2011   17  2
108 Wide    6/24/2011   68  3
108 Wide    6/16/2012   358 4
108 Wide    7/20/2012   34  5
108 Wide    7/27/2012   7   5

我通过这段代码实现了运行计数:

data want (drop=lag); set have;

by id date;

format lag mmddyy10.;
lag=lag(date);

if first.id then obsvn=1;
else obsvn=max(intck("Day", Lag, date),1);

if first.id then episode_sum=1;
else if obsvn>10 then episode_sum+1;

run;

我希望我的数据看起来像这样:

id  Class   Date     Obsvn Sum Wide Long
9   Wide    3/10/2012   1   1   1   0
9   Wide    3/12/2012   2   1   1   0
9   Wide    7/1/2012    111 2   2   0
9   Wide    7/3/2012    2   2   2   0
108 Wide    3/31/2011   1   1   1   0
108 Long    3/31/2011   1   1   1   1
108 Wide    4/17/2011   17  2   2   1
108 Wide    6/24/2011   68  3   3   1
108 Wide    6/16/2012   358 4   4   1
108 Wide    7/20/2012   34  5   5   1
108 Wide    7/27/2012   7   5   5   1

但我得到了这个:

id  Class   Date    Obsvn  Sum Wide Long
9   Wide    3/10/2012   1   1   1   0
9   Wide    3/12/2012   2   1   1   0
9   Wide    7/1/2012    111 2   2   0
9   Wide    7/3/2012    2   2   **1**   0
108 Wide    3/31/2011   1   1   1   **1**
108 Long    3/31/2011   1   1   1   1
108 Wide    4/17/2011   17  2   2   1
108 Wide    6/24/2011   68  3   3   1
108 Wide    6/16/2012   358 4   4   1
108 Wide    7/20/2012   34  5   5   1
108 Wide    7/27/2012   7   5   **1**   1

这是我创建剧集的代码。我试图解释每个 ID 何时切换类。我如何实现这一目标?

/*Calculating Long*/

if (first.id and class in ("Long")) then Episode_Long=1;
else if obsvn>10 and class in ("Long") then Episode_Long+1;
retain Episode_Long;

if (obsvn<10 and class in ("Long")) then Episode_Long=1;

if class not in ("Long") then do;
if first.id and class not in ("Long") then Episode_Long=0;
retain Episode_Long;
end;

/*Calculating Wide */

if (obsvn<10 and class in ("Wide")) then Episode_Wide=1   ;  
if (first.id and class in ("Wide")) then Episode_Wide=1;
else if obsvn>10 and class in ("Wide") then Episode_Wide+1;
retain Episode_Wide;

标签: sas

解决方案


棘手的部分是您在第二个 ID 组中有两个相同 DATE 的记录。因此,您希望在计算天数变化时跟踪这一点。

这是一种方法。首先让我们输入您的源数据(和所需的结果)。

data have ;
  input id Class $ Date :mmddyy. EObsvn ESum EWide ELong ;
  format date yymmdd10.;
cards;
9   Wide    3/10/2012   1   1   1   0
9   Wide    3/12/2012   2   1   1   0
9   Wide    7/1/2012    111 2   2   0
9   Wide    7/3/2012    2   2   2   0
108 Wide    3/31/2011   1   1   1   0
108 Long    3/31/2011   1   1   1   1
108 Wide    4/17/2011   17  2   2   1
108 Wide    6/24/2011   68  3   3   1
108 Wide    6/16/2012   358 4   4   1
108 Wide    7/20/2012   34  5   5   1
108 Wide    7/27/2012   7   5   5   1
;

您可能希望首先找到存在 WIDE 或 LONG 间隔的日期。

data long ;
  set have ;
  by id date;
  where class='Long';
  if first.date;
  lag=lag(date);

  if first.id then call missing(lag,obsvn);
  else obsvn=max(intck("Day", Lag, date),1);
  lflag = missing(lag) or obsvn > 10 ;
  keep id date lflag ;

run;


data wide ;
  set have ;
  by id date;
  where class='Wide';
  if first.date;
  lag=lag(date);

  if first.id then call missing(lag,obsvn);
  else obsvn=max(intck("Day", Lag, date),1);
  wflag = missing(lag) or obsvn > 10 ;
  keep id date wflag ;

run;

然后按日期将其合并回源并计算您的计数器。

data want ;
  merge have wide long ;
  by id date;

  if first.date then do ;
    lag=lag(date);
    format lag yymmdd10.;
    if first.id then call missing(lag,obsvn);
    else obsvn=max(intck("Day", Lag, date),1);
    retain lag obsvn;
  end;

  if first.id then call missing(sum,wide,long);
  if missing(lag) or obsvn > 10 then sum+first.date ;
  wide + (wflag and first.date);
  long + (lflag and first.date);

run;  

在此处输入图像描述


推荐阅读