Computing a distribution table in SAS

Problem description

SAS beginner here. I have a table that records an education level for each issue date. A sample is shown below:

| id | dt_issue   | EDUCATION |
|----|------------|-----------|
| 1  | 12.02.2010 | higher    |
| 2  | 04.12.2009 | phd       |
| 3  | 23.07.2010 | higher    |
| 4  | 18.08.2010 | higher    |
| 5  | 26.11.2009 | higher    |
| 6  | 02.12.2009 | higher    |
| 7  | 25.11.2009 | higher    |
| 8  | 06.09.2010 | higher    |
| 9  | 29.12.2010 | secondary |
| 10 | 04.05.2010 | secondary |

I want to compute the percentage distribution of education levels for each year. The final result should look like this:

| Education | 2007   | 2008   | 2009   | 2010   |
|-----------|--------|--------|--------|--------|
| PHD       | 0.0888 | 0.0733 | 0.0851 | 0.0802 |
| Higher    | 0.5898 | 0.5532 | 0.5717 | 0.5561 |
| Secondary | 0.2429 | 0.2993 | 0.2636 | 0.2902 |
| Undergrad | 0.0785 | 0.0742 | 0.0796 | 0.0735 |

So far I have tried this:

proc sql;
    create table stat_educ as
    select year(dt_issue) as year
          ,education
          ,count(*)/b.cnt_total as percent
    from sample as a
    left join
    (
        select year(dt_issue) as year
              ,count(*) as cnt_total
        from sample
        group by 1
    ) as b
        on year(a.dt_issue) = b.year
    group by 1,2;
quit;

But this only produces a long table like this:

| Year | Education | percent |
|------|-----------|---------|
| 2007 | higher    | 0.5898  |
| 2007 | Secondary | 0.2429  |
| 2007 | Undergrad | 0.0785  |
| 2007 | PHD       | 0.0888  |
| 2008 | higher    | 0.5532  |
| 2008 | Secondary | 0.2993  |
| 2008 | Undergrad | 0.0742  |
| 2008 | PHD       | 0.0733  |

Two questions:
1) How can I get the table I want?
2) Is there a simpler or more concise way to do this than the code I used?

Tags: sas

Solution


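Use PROC TABULATE. Formatting dt_issue with year. makes the class variable group by calendar year, and colpctn reports each education level as a percentage of its column (year) total: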

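/* have = the input table (called sample in the question) */
proc tabulate data=have;
  class education dt_issue;
  /* group the dates by calendar year */
  format dt_issue year.;
  /* rows: education, columns: year, cells: column percentage of N */
  table education='', dt_issue=''*colpctn='' / box='Education';
run;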
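
Note that colpctn in the PROC TABULATE step above prints percentages on a 0 to 100 scale in a report. If you need the wide table as a dataset with fractions, as in the desired output, one possible approach is PROC FREQ followed by PROC TRANSPOSE. This is only a sketch under assumptions: the dataset names have_year, freq_long and want, the variable share, and the column prefix y are made up for illustration, and the datalines are a few rows copied from the question's sample so the step can run on its own.

```sas
/* Assumption: a few rows from the question's sample, stored as WORK.HAVE */
data have;
  input id dt_issue :ddmmyy10. education :$10.;
  format dt_issue date9.;
  datalines;
1 12.02.2010 higher
2 04.12.2009 phd
3 23.07.2010 higher
5 26.11.2009 higher
9 29.12.2010 secondary
;
run;

/* Derive the calendar year once so it can be used as a plain variable */
data have_year;
  set have;
  year = year(dt_issue);
run;

/* OUTPCT adds PCT_ROW: the share of each education level within a year */
proc freq data=have_year noprint;
  tables year*education / out=freq_long outpct;
run;

/* Convert the 0-100 percentage to a fraction */
data freq_long;
  set freq_long;
  share = pct_row / 100;
run;

/* PROC TRANSPOSE with BY needs the data sorted by the BY variable */
proc sort data=freq_long;
  by education year;
run;

/* One row per education level, one column per year (y2009, y2010, ...) */
proc transpose data=freq_long out=want (drop=_name_) prefix=y;
  by education;
  id year;
  var share;
run;
```

The prefix is there because the year values are numeric and cannot be used directly as SAS variable names. Education levels that do not occur in a given year end up with a missing value in that year's column.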


