首页 > 解决方案 > 如何在 sas 合并中对不在 select ... 中的 proc sql 进行编码?

问题描述

下午好,

所以,我开始学习 SAS 中的代码。

我想得到完全相同的结果

proc sql;

create table award_print_new as
select * 
from awards_try2
where BOR_ITEM_TYPE NOT IN (SELECT BOR_ITEM_TYPE from FADSFUND);
run;

我虽然这是确切的代码,但结果不同,所以我错了。

proc sort data=awards_try2;
by BOR_ITEM_TYPE;
run;

proc sort data=FADSFUND;
by BOR_ITEM_TYPE;
run;

data award_print;
set awards_try2 (in=a) FADSFUND (in=b);
by BOR_ITEM_TYPE;
if a and not b;
run;

如下所示,有 9525 个观察值而不是 681 个。如何在 SAS 代码中获得 681?

1665  data award_print;
1666  set awards_try2 (in=a) FADSFUND (in=b);
1667  by BOR_ITEM_TYPE;
1668  if a and not b;
1669  run;

NOTE: There were 9525 observations read from the data set WORK.AWARDS_TRY2.
NOTE: There were 1226 observations read from the data set WORK.FADSFUND.
NOTE: The data set WORK.AWARD_PRINT has 9525 observations and 22 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


1670
1671  proc sql;
1672  create table award_print_new as
1673  select *
1674  from awards_try2
1675  where BOR_ITEM_TYPE NOT IN (SELECT BOR_ITEM_TYPE from FADSFUND);
NOTE: Table WORK.AWARD_PRINT_NEW created, with 681 rows and 15 columns.

1676  run;

标签: sas

解决方案


您没有合并数据集。因为您使用了 SET 语句,所以您只是将它们交错。你可以从笔记中看出:

NOTE: There were 9525 observations read from the data set WORK.AWARDS_TRY2.
NOTE: There were 1226 observations read from the data set WORK.FADSFUND.
NOTE: The data set WORK.AWARD_PRINT has 9525 observations and 22 variables.

您只是写出从第一个数据集中读取的观察结果。

data award_print;
  merge awards_try2 (in=a) FADSFUND (in=b);
  by BOR_ITEM_TYPE;
  if a and not b;
run;

请注意,您不需要 IF 条件的第一部分。如果您只合并两个数据集,则不在 B 中的任何观察都必须来自 A。

当 B 拥有比 A 更多的 BY 变量的特定值的副本时,其他差异来源可能是。此外,如果 B 有其他非 BY 变量,它们的值可能会覆盖从 A 读取的相同变量的值。但是在这种情况下不会覆盖任何数据,因为您没有写入 B 贡献数据的观察结果。但是,如果 B 具有不在 A 中的变量,则它们将被添加到输出数据集中,但所有观察值都缺少值。


推荐阅读