首页 > 解决方案 > SAS:提取按日期排序的最后一个唯一组合

问题描述

我很难提取工人在按时间安排的事件中执行的独特任务。唯一组合由 ID 和 Mode 定义。以下数据集模拟了该场景:

  ID        Time       Mode     Event
 23456     20120101    A        Open
 23456     20120101    B        Closed
 87690     20120311    G        Closed
 98000     20120201    B        Open
 98000     20120301    A        Open
 98000     20120101    A        Open
 87889     20121009    C        Closed
 87889     20120101    C        Open
 87900     20120411    A        Closed
 87900     20120102    A        Closed

希望得到以下结果:

  ID        Time       Mode     Event
 23456     20120101    A        Open
 23456     20120101    B        Closed
 87690     20120311    G        Closed
 98000     20120201    B        Open
 98000     20120301    A        Open
 87889     20121009    C        Closed
 87900     20120411    A        Closed

我将首先按时间降序排序:

  proc sort data=df; by ID descending time; run;

然后我可以再次使用 sort 通过 ID 和 Mode 获得唯一的组合:

  proc sort data=df dupout=nodup nodupkey;
     by ID Mode; run;

在最后一步,如何确保非重复记录也是最新事件?

谢谢!

标签: sortingsasduplicates

解决方案


你可以通过先使用来做到这一点。最后一个概念

data have;
 input ID        Time:yymmdd8.      Mode $    Event $;
 format time yymmdd10.;
  datalines;
 23456     20120101    A        Open
 23456     20120101    B        Closed
 87690     20120311    G        Closed
 98000     20120201    B        Open
98000     20120301    A        Open
98000     20120101    A        Open
87889     20121009    C        Closed
87889     20120101    C        Open
 87900     20120411    A        Closed
 87900     20120102    A        Closed
  ;

 proc sort data = have out=have1;
 by id mode time;
 run;

 data want;
 set have1;
 by id mode time;
if last.mode and last.time then output;
 run;

或者我你可以简单的 proc sql 如下所示

proc sql;
create table want1 as
select id, time, mode,  event from have
 group by  id, mode
 having time = max(time);

为了使您的代码正常工作,在您的第一个排序中,您需要成为您的第一个排序为 proc sort data=df; 按ID模式下降时间;跑;


推荐阅读