Send As SMS

Thursday, February 22, 2007

Counting across variables

On SAS-L, 2/22/2005, Laughing Beggar asked about counting the occurence of strings in variables:

These 3 must be easy, but I am forced to defer to the wisdom of the group. My data set has, say, ten character variables. Each variable can have a text value that starts with a letter of the aplhabet and is then followed by some numbers (eg A24.6). What I want to do is count the number of times a string starting with eg "A" appears (1) in the dataset, (2) for each record and (3) in each variable. I also need to do this for all the other letters of the alphabet. So for (2), for example, I'd like a new variable created for each letter of the alphabet that has the count of the number of times that letter appears as the leading character in the string across the ten variables. All hints gratefully accepted. SAS V8.02, WIN-XP.

Arrays are one solution to this problem.

The code can probably be made simpler, but something like this will do it:

data test;
   infile cards;
   input (var1-var5) ($);
   cards;
A1 B1 A1 Z2 C1
A1 A1 A1 A1 A1
B1 B2 A1 T2 D1
B1 B2 B3 B4 Z5
;;;;

data colcounts (keep= c1_c1-c1_c26  c2_c1-c2_c26  
                      c3_c1-c3_c26  c4_c1-c4_c26  
                      c5_c1-c5_c26)
     rowcounts (keep= r1-r26)
     dscounts  (keep=d1-d26);
   retain alpha 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';  
   drop alpha;

   set test end=end;

   array vars var1-var5;
   array letters_in_dataset{26} d1-d26;
   array letters_in_columns{5, 26} c1_c1-c1_c26  
                                   c2_c1-c2_c26
                                   c3_c1-c3_c26
                                   c4_c1-c4_c26
                                   c5_c1-c5_c26;
   array letters_in_row{26} r1-r26;

   do p = 1 to 26;
      letters_in_row{p} = 0;
   end;

   do v = 1 to 5;
      do p = 1 to 26;
         letters_in_dataset(p) + (vars{v} =: substr(alpha, p, 1));
         letters_in_row(p) + (vars{v} =: substr(alpha, p, 1));
         letters_in_columns(v, p) + (vars{v} =: substr(alpha, p, 1));
      end;
   end;

   output rowcounts;

   if end then 
      output colcounts dscounts;

run;
;

0 Comments:

Post a Comment

<< Home