首页 > 解决方案 > SAS - How to skip certain columns in a loop using arrays

问题描述

I have a dataset that contains several numeric and character columns. Most of the character columns are simple 'yes' or 'no' questions but some are not. For the numeric columns I feel that I have a valid solution. For the character columns I have tried to minimize it but have not succeeded. What I want to do, and tried, is a loop that states that when j = column number of var4 then use the Big/Small/IU replacements. The idea is that this would make the code less “variable dependent” due to the fact that I can change the variable for var4 in the %let statement and the code would be shorter in length.

As the dim statement give me the total number of the numeric or character columns I guess that there are some statment that gives me the column placement of a specific variable that I would use for the secound loop. However, I have not found such a statement.

My code so far,

%let v1 = var1;
%let v2 = var2;
%let v3 = var3;
%let v4 = var4;

data have;
infile datalines delimiter = '|'; 
input surveyYear id var1 var2$ var3 var4$20. ;
datalines;
2016 |  1 |     10 |   Yes |     5 |   BIG
2016 |  2 |      6 |   YES |     8 |   Big
2016 |  3 |      8 |   YEs | 99999 |   big
2016 |  4 |      . |   yes |     5 | 99999
2017 |  5 |      6 |    No |     7 | SMALL
2017 |  6 |      5 |    Ye |     . | small
2017 |  7 |  99999 |    no |     3 | 99999
2018 |  8 |      3 | 99999 |     1 | SMall
2018 |  9 |      2 |    iu |     2 |    IU
2018 | 10 |     15 |    IU |     . |    Iu
;
run; 

data want;
set have;

    array var_num[*] _numeric_;
    do i=3 to dim(var_num);
        if var_num[i] = 99999 then var_num[i] = .;
    end;

    array var_cha[*] _character_;
    do j=1 to dim(var_cha);
        var_cha(j) = upcase(var_cha(j));
        if var_cha[j]  = 'YES'     then var_cha[j] = 'Yes';
        if var_cha[j]  = 'NO'      then var_cha[j] = 'No';
        if var_cha[j]  = 'IU'      then var_cha[j] = 'IU';
        if var_cha[j]  = '99999'   then var_cha[j] = 'IU';
        if var_cha[j]  = ''        then var_cha[j] = 'IU';
    end;
/* Integrate the code below into the loop*/
    if &v4 = 'BIG'     then &v4 = 'Big city';
    if &v4 = 'SMALL'   then &v4 = 'Small city';
    if &v4 = 'IU'      then &v4 = 'Unknow city size';
    if &v4 = '99999'   then &v4 = 'Unknow city size';

drop i j;
run;

The conceptual idea how i would like to program it,

data want;
set have;
array var_cha[*] _character_;
    do j=1 to dim(var_cha);
        var_cha(j) = upcase(var_cha(j));
/*      All of the if statements for the yes, no and IU */

        if var_cha[j] = ColumnNumerOfVar4 then do;
        /* where ColumnNumerOfVar4 is equal to the column number of var4*/
            if var_cha[j] =  'BIG'      then var_cha[j] = 'Big city';
            if var_cha[j] =  'SMALL'    then var_cha[j] = 'Small city';
            if var_cha[j] =  'IU'       then var_cha[j] = 'Unknow city size';
            if var_cha[j] =  '99999'    then var_cha[j] = 'Unknow city size';
        end;
    end;
run;

标签: arraysif-statementsas

解决方案


Perhaps I am not understanding exactly what you want, but it looks like you just want to make a macro variable with the list of variable names that the DO loop applies to and use that to define the array.

%let yesnovars=var4 ;
array yesno &yesnovars;

Then you DO loop looks like this:

 do j=1 to dim(yesno);
    yesno(j) = upcase(yesno(j));
    if yesno[j]  = 'YES'     then yesno[j] = 'Yes';
    if yesno[j]  = 'NO'      then yesno[j] = 'No';
    if yesno[j]  = 'IU'      then yesno[j] = 'IU';
    if yesno[j]  = '99999'   then yesno[j] = 'IU';
    if yesno[j]  = ''        then yesno[j] = 'IU';
 end;

But it really looks like you just want to apply formats to the values.
So if you had defined a $YESNO format and a $CITYSIZE format you might do something like this to convert all of the character variables from the raw to formatted values. Just remember to define them as long enough to hold the formatted values instead of just the raw values.

format var2 $yesno. var4 $citysize.;
array _c _character_;
do j=1 to dim(_c);
  _c(j)=vvalue(_c(i));
end;

Or even better define INformats and use them when reading the raw data.


推荐阅读