首页 > 解决方案 > SAS Proc 排序不同于 Proc SQL 顺序

问题描述

我使用 proc sort 对多个变量的数据集进行了排序,然后继续执行数据步骤,根据排序的字段分配一个计数变量。多次运行此代码给出相同的答案。如果我使用 proc sql 和 order by 而不是 proc sort 然后使用相同的 datastep 代码,我会得到不同的答案。为什么会发生这种情况?

标签: sortingsas

解决方案


您的 KEY 变量具有重复项,其中第三个(或更多)变量不同。并且排序不完全相同,因为您没有指定第三个(或其他)变量。

这是如何发生的示例。

/*Creates duplicate records with variables that are different for the remaining values*/

data temp_class;
set class (obs=3);
call streaminit(25);
weight = weight + rand('normal', 0, 5);
run;

data class;
set sashelp.class temp_class;
run;


*proc sort;
proc sort data=class out=class2;
by name sex;
run;

*proc sql sort;
proc sql;
create table class3 as
select *
from class
order by name, sex;
quit;

*comparison to show difference;
proc compare data=class2 compare=class3;
run;

结果表明,这些确实会产生不同的结果,因为关键变量没有很好地定义。为避免这种情况,您还应该按变量 WEIGHT 进行排序,以确保获得所需的顺序。

              The COMPARE Procedure                                                        
                                         Comparison of WORK.CLASS2 with    WORK.CLASS3                                             
                                                       (Method=EXACT)                                                           

                                                     Data Set Summary                                                           

                               Dataset               Created          Modified  NVar    NObs                                    

                               WORK.CLASS2  27SEP18:16:15:28  27SEP18:16:15:28     5      22                                    
                               WORK.CLASS3  27SEP18:16:14:47  27SEP18:16:14:47     5      22                                    


                                                     Variables Summary                                                          

                                           Number of Variables in Common: 5.                                                    


                                                    Observation Summary                                                         

                                               Observation      Base  Compare                                                   

                                               First Obs           1        1                                                   
                                               First Unequal       1        1                                                   
                                               Last  Unequal       4        4                                                   
                                               Last  Obs          22       22                                                   

                              Number of Observations in Common: 22.                                                             
                              Total Number of Observations Read from WORK.CLASS2: 22.                                           
                              Total Number of Observations Read from WORK.CLASS3: 22.                                           

                              Number of Observations with Some Compared Variables Unequal: 4.                                   
                              Number of Observations with All Compared Variables Equal: 18.                                     


                                                 Values Comparison Summary                                                      

                              Number of Variables Compared with All Observations Equal: 4.                                      
                              Number of Variables Compared with Some Observations Unequal: 1.                                   
                              Total Number of Values which Compare Unequal: 4.                                                  
                              Maximum Difference: 5.4405.                                                                       


                                               Variables with Unequal Values                                                    

                                            Variable  Type  Len  Ndif   MaxDif                                                  

                                            Weight    NUM     8     4    5.440                                                  

                                                   The COMPARE Procedure                                                        
                                         Comparison of WORK.CLASS2 with WORK.CLASS3                                             
                                                       (Method=EXACT)                                                           

                                           Value Comparison Results for Variables                                               

                                 __________________________________________________________                                     
                                            ||       Base    Compare                                                            
                                        Obs ||     Weight     Weight      Diff.     % Diff                                      
                                  ________  ||  _________  _________  _________  _________                                      
                                            ||                                                                                  
                                         1  ||   112.5000   117.3936     4.8936     4.3499                                      
                                         2  ||   117.3936   112.5000    -4.8936    -4.1686                                      
                                         3  ||    84.0000    78.5595    -5.4405    -6.4767                                      
                                         4  ||    78.5595    84.0000     5.4405     6.9253                                      
                                 __________________________________________________________                                 

这是变量列表中包含 WEIGHT 的版本,以确保排序正确。

*proc sort;
proc sort data=class out=class4;
by name sex;
run;

*proc sql sort;
proc sql;
create table class5 as
select *
from class
order by name, sex;
quit;

*comparison to show difference;
proc compare data=class4 compare=class5;
run;

结果表明在这种情况下没有区别:

  The COMPARE Procedure                                                        
                                         Comparison of WORK.CLASS4 with WORK.CLASS5                                             
                                                       (Method=EXACT)                                                           

                                                     Data Set Summary                                                           

                               Dataset               Created          Modified  NVar    NObs                                    

                               WORK.CLASS4  27SEP18:16:21:13  27SEP18:16:21:13     5      22                                    
                               WORK.CLASS5  27SEP18:16:21:13  27SEP18:16:21:13     5      22                                    


                                                     Variables Summary                                                          

                                           Number of Variables in Common: 5.                                                    


                                                    Observation Summary                                                         

                                               Observation      Base  Compare                                                   

                                               First Obs           1        1                                                   
                                               Last  Obs          22       22                                                   

                              Number of Observations in Common: 22.                                                             
                              Total Number of Observations Read from WORK.CLASS4: 22.                                           
                              Total Number of Observations Read from WORK.CLASS5: 22.                                           

                              Number of Observations with Some Compared Variables Unequal: 0.                                   
                              Number of Observations with All Compared Variables Equal: 22.                                     

                              NOTE: No unequal values were found. All values compared are exactly equal.     

推荐阅读