sorting - SAS Proc 排序不同于 Proc SQL 顺序
问题描述
我使用 proc sort 对多个变量的数据集进行了排序,然后继续执行数据步骤,根据排序的字段分配一个计数变量。多次运行此代码给出相同的答案。如果我使用 proc sql 和 order by 而不是 proc sort 然后使用相同的 datastep 代码,我会得到不同的答案。为什么会发生这种情况?
解决方案
您的 KEY 变量具有重复项,其中第三个(或更多)变量不同。并且排序不完全相同,因为您没有指定第三个(或其他)变量。
这是如何发生的示例。
/*Creates duplicate records with variables that are different for the remaining values*/
data temp_class;
set class (obs=3);
call streaminit(25);
weight = weight + rand('normal', 0, 5);
run;
data class;
set sashelp.class temp_class;
run;
*proc sort;
proc sort data=class out=class2;
by name sex;
run;
*proc sql sort;
proc sql;
create table class3 as
select *
from class
order by name, sex;
quit;
*comparison to show difference;
proc compare data=class2 compare=class3;
run;
结果表明,这些确实会产生不同的结果,因为关键变量没有很好地定义。为避免这种情况,您还应该按变量 WEIGHT 进行排序,以确保获得所需的顺序。
The COMPARE Procedure
Comparison of WORK.CLASS2 with WORK.CLASS3
(Method=EXACT)
Data Set Summary
Dataset Created Modified NVar NObs
WORK.CLASS2 27SEP18:16:15:28 27SEP18:16:15:28 5 22
WORK.CLASS3 27SEP18:16:14:47 27SEP18:16:14:47 5 22
Variables Summary
Number of Variables in Common: 5.
Observation Summary
Observation Base Compare
First Obs 1 1
First Unequal 1 1
Last Unequal 4 4
Last Obs 22 22
Number of Observations in Common: 22.
Total Number of Observations Read from WORK.CLASS2: 22.
Total Number of Observations Read from WORK.CLASS3: 22.
Number of Observations with Some Compared Variables Unequal: 4.
Number of Observations with All Compared Variables Equal: 18.
Values Comparison Summary
Number of Variables Compared with All Observations Equal: 4.
Number of Variables Compared with Some Observations Unequal: 1.
Total Number of Values which Compare Unequal: 4.
Maximum Difference: 5.4405.
Variables with Unequal Values
Variable Type Len Ndif MaxDif
Weight NUM 8 4 5.440
The COMPARE Procedure
Comparison of WORK.CLASS2 with WORK.CLASS3
(Method=EXACT)
Value Comparison Results for Variables
__________________________________________________________
|| Base Compare
Obs || Weight Weight Diff. % Diff
________ || _________ _________ _________ _________
||
1 || 112.5000 117.3936 4.8936 4.3499
2 || 117.3936 112.5000 -4.8936 -4.1686
3 || 84.0000 78.5595 -5.4405 -6.4767
4 || 78.5595 84.0000 5.4405 6.9253
__________________________________________________________
这是变量列表中包含 WEIGHT 的版本,以确保排序正确。
*proc sort;
proc sort data=class out=class4;
by name sex;
run;
*proc sql sort;
proc sql;
create table class5 as
select *
from class
order by name, sex;
quit;
*comparison to show difference;
proc compare data=class4 compare=class5;
run;
结果表明在这种情况下没有区别:
The COMPARE Procedure
Comparison of WORK.CLASS4 with WORK.CLASS5
(Method=EXACT)
Data Set Summary
Dataset Created Modified NVar NObs
WORK.CLASS4 27SEP18:16:21:13 27SEP18:16:21:13 5 22
WORK.CLASS5 27SEP18:16:21:13 27SEP18:16:21:13 5 22
Variables Summary
Number of Variables in Common: 5.
Observation Summary
Observation Base Compare
First Obs 1 1
Last Obs 22 22
Number of Observations in Common: 22.
Total Number of Observations Read from WORK.CLASS4: 22.
Total Number of Observations Read from WORK.CLASS5: 22.
Number of Observations with Some Compared Variables Unequal: 0.
Number of Observations with All Compared Variables Equal: 22.
NOTE: No unequal values were found. All values compared are exactly equal.
推荐阅读
- xorg - Xorg 未在 NixOS 上的 Radeon RX Vega 7 上运行
- python - 如何将熊猫表中的列名旋转为乳胶方法?
- python - 处理大型数据集并训练模型
- reactjs - 无法转换 Figma 设计以做出反应
- autodesk-forge - 为什么 viewer.select(dbids) 会使查看器变慢并隐藏所选项目的某些部分?
- php - Laravel如何模拟自定义验证规则的SoapClient响应
- django - 如何修复管理面板上的 Django ImageField 上传错误?
- c# - 带有消息对话框的 Uwp 条件返回导航
- reactjs - 如何处理 redux 或组件函数中的 AG-Grid API 操作?
- android-studio - 是否可以在 Windows 上构建的 android studio 项目中包含 Linux 库?