Monday, April 28, 2014

Manipulating data sets: %split

Splitting a data set:
1. Split equally by _n_ (row number):
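%macro split(ndsn=2);
  %local i;
  /* Create dsn1 ... dsn&ndsn. from ORIG; the helper variable X is not kept */
  data %do i = 1 %to &ndsn.; dsn&i. %end; ;
    drop x;
    retain x;
    set orig nobs=nobs;
    /* First iteration: chunk size X = ceiling of nobs / &ndsn. */
    if _n_ eq 1 then do;
      if mod(nobs, &ndsn.) eq 0 then x = int(nobs/&ndsn.);
      else x = int(nobs/&ndsn.) + 1;
    end;
    /* Send each observation to the chunk its row number falls into */
    if _n_ le x then output dsn1;
    %do i = 2 %to &ndsn.;
      else if _n_ le (&i.*x) then output dsn&i.;
    %end;
  run;
%mend split;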

http://www2.sas.com/proceedings/sugi27/p083-27.pdf
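For comparison, here is a hedged sketch of the same idea with the input data set passed as a parameter instead of the hard-coded ORIG. The macro name %split2 and its DATA= and PREFIX= parameters are hypothetical (not from the SUGI paper). Instead of a chain of IF/ELSE IF tests, each row's chunk number is computed directly as ceil(_n_ / chunk size) and routed with a SELECT group.

```sas
/* Hypothetical variant of %split: DATA= and PREFIX= are illustrative   */
/* parameters; each row's chunk is ceil(_n_ / chunk size).              */
%macro split2(data=, ndsn=2, prefix=dsn);
  %local i;
  data %do i = 1 %to &ndsn.; &prefix.&i. %end; ;
    set &data. nobs=nobs;
    drop _size _grp;
    _size = ceil(nobs / &ndsn.);  /* rows per chunk; the last may be smaller */
    _grp  = ceil(_n_ / _size);    /* 1-based chunk number for this row       */
    select (_grp);
      %do i = 1 %to &ndsn.;
        when (&i.) output &prefix.&i.;
      %end;
      otherwise;
    end;
  run;
%mend split2;

/* Usage: split SASHELP.CLASS (19 observations) into 3 pieces */
%split2(data=sashelp.class, ndsn=3, prefix=part)
```

With 19 observations and NDSN=3, the chunk size is ceil(19/3) = 7, so PART1 and PART2 get 7 rows each and PART3 gets the remaining 5. As with the original macro, when NOBS is small relative to NDSN, the last data sets can come out empty.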


2. The number of observations in a data set
a. The NOBS= option on a SET statement does not always return the correct number of observations; see:
http://www2.sas.com/proceedings/sugi26/p095-26.pdf

b. The usual solution:
http://www.sascommunity.org/wiki/Determining_the_number_of_observations_in_a_SAS_data_set_efficiently
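A short sketch of both points, using SASHELP.CLASS (19 students, 9 of them with SEX='F'). Part (a) shows one case where NOBS= is not the count you might want: its value comes from the data set header at compile time, so it ignores a WHERE= filter, while COUNT(*) in PROC SQL applies it. Part (b) is a hypothetical helper macro, %nlobs, in the spirit of the sascommunity page: it opens the data set and asks ATTRN for NLOBS (logical observations) without reading any rows. ATTRN can return -1 when the count is not available (for example, for some views), so treat -1 as "unknown", not zero.

```sas
/* (a) NOBS= is read from the header, so it ignores WHERE=. */
data _null_;
  call symputx('nobs_opt', nobs);  /* NOBS= is already set at compile time */
  stop;                            /* no rows need to be read             */
  set sashelp.class(where=(sex = 'F')) nobs=nobs;
run;

proc sql noprint;                  /* COUNT(*) does apply the WHERE       */
  select count(*) into :nobs_sql trimmed
  from sashelp.class
  where sex = 'F';
quit;

%put NOTE: NOBS= gives &nobs_opt., COUNT(*) gives &nobs_sql..;

/* (b) Hypothetical helper: logical observation count via ATTRN. */
%macro nlobs(data);
  %local dsid n rc;
  %let n = -1;                     /* -1 means unknown or not opened      */
  %let dsid = %sysfunc(open(&data.));
  %if &dsid. > 0 %then %do;
    %let n = %sysfunc(attrn(&dsid., nlobs));
    %let rc = %sysfunc(close(&dsid.));
  %end;
  &n.
%mend nlobs;

%put NOTE: SASHELP.CLASS has %nlobs(sashelp.class) observations.;
```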

3. Removing duplicate records:
Using SQL: http://www2.sas.com/proceedings/sugi25/25/cc/25p106.pdf
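A minimal PROC SQL sketch of the idea, on a small made-up data set HAVE (table and column names are hypothetical). SELECT DISTINCT * keeps one copy of each fully identical row, and GROUP BY with HAVING COUNT(*) > 1 lists which rows were duplicated and how often. DISTINCT only collapses rows that match on every column; rows that share a key but differ elsewhere need a different rule, such as PROC SORT with NODUPKEY, which keeps one row per BY group.

```sas
/* Made-up sample: ID 2 appears twice with identical values */
data have;
  input id name $ score;
  datalines;
1 Ann 90
2 Bob 85
2 Bob 85
3 Cal 70
;
run;

/* One copy of each fully identical row */
proc sql;
  create table nodup as
  select distinct *
  from have;
quit;

/* Which rows occur more than once, and how many times */
proc sql;
  create table dups as
  select id, name, score, count(*) as n_copies
  from have
  group by id, name, score
  having count(*) > 1;
quit;
```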

4. The pros and cons of using PROC FREQ, PROC MEANS, and PROC TABULATE to check the quality of date variables
http://support.sas.com/resources/papers/proceedings13/335-2013.pdf
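A hedged sketch of the kind of checks being compared, on a made-up data set VISITS with one missing date and one implausible one (01JAN1900); the data set and variable names are assumptions for illustration. PROC MEANS writes N, NMISS, MIN and MAX to an output data set, where a date format on MIN and MAX makes out-of-range values easy to spot. PROC FREQ with the YEAR4. format tabulates dates by year, so stray years stand out.

```sas
/* Made-up data: one missing date and one suspicious 1900 date */
data visits;
  input id visit_dt :date9.;
  format visit_dt date9.;
  datalines;
1 15JAN2013
2 03MAR2013
3 .
4 01JAN1900
;
run;

/* PROC MEANS: counts plus earliest and latest date */
proc means data=visits noprint;
  var visit_dt;
  output out=date_qc n=n_dt nmiss=nmiss_dt min=first_dt max=last_dt;
run;

proc print data=date_qc noobs;
  format first_dt last_dt date9.;  /* show MIN/MAX as readable dates */
run;

/* PROC FREQ: dates grouped by year, missing values included */
proc freq data=visits;
  tables visit_dt / missing;
  format visit_dt year4.;
run;
```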

5. Split Data into Subsets
Best practices:
http://www.sascommunity.org/wiki/Split_Data_into_Subsets
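One common pattern for splitting by the values of a variable, sketched here with SASHELP.CLASS; it is not necessarily the method the sascommunity page recommends. PROC SQL builds both the list of output data set names and the matching WHEN clauses from the data, and a single DATA step then writes every subset in one pass. The CLASS_ prefix is an arbitrary choice, and the sketch assumes the split variable's values are valid in SAS names (here F and M).

```sas
/* Build "class_F class_M" and the matching WHEN clauses from the data */
proc sql noprint;
  select distinct cats('class_', sex),
                  cats('when ("', sex, '") output class_', sex, ';')
    into :dsnames separated by ' ',
         :whens   separated by ' '
  from sashelp.class;
quit;

/* One pass over the input writes every subset */
data &dsnames.;
  set sashelp.class;
  select (sex);
    &whens.
    otherwise;
  end;
run;
```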

6. The LAG function:
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212547.htm
http://support.sas.com/resources/papers/proceedings09/055-2009.pdf
http://www.phusewiki.org/docs/2011%20Papers/CC08%20paperl.pdf
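A small sketch of the behavior these papers warn about, on made-up data. Each LAG call keeps its own queue, and that queue is updated only when the call actually executes. Called inside an IF, LAG returns the value from the last time the condition was true, not from the previous row.

```sas
data lagdemo;
  input x @@;
  prev_x = lag(x);    /* runs on every row: value of X from the previous row */
  diff   = x - prev_x; /* missing on the first row                          */
  /* Runs only on even rows, so its queue skips the odd rows */
  if mod(_n_, 2) = 0 then cond_lag = lag(x);
  datalines;
10 20 30 40 50 60
;
run;
```

At row 4, PREV_X is 30, but COND_LAG is 20, the value of X at row 2. The usual fix is to call LAG unconditionally, as for PREV_X, and use the result inside the IF.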

7. Set, Match, Merge … Don’t You Love SAS
http://analytics.ncsu.edu/sesug/2007/IS02.pdf
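A minimal match-merge sketch with IN= flags, on two made-up data sets LEFT and RIGHT that are already sorted by ID (MERGE with BY requires that, or an index). Observations found in both inputs go to MATCHED, and the rest go to ONLY_LEFT or ONLY_RIGHT. This shows only the one-to-one case; one-to-many and many-to-many BY groups behave differently.

```sas
data left;
  input id a $;
  datalines;
1 x
2 y
4 z
;
run;

data right;
  input id b $;
  datalines;
1 p
3 q
4 r
;
run;

/* IN= flags record which input contributed to the current BY group */
data matched only_left only_right;
  merge left(in=inl) right(in=inr);
  by id;
  if inl and inr then output matched;
  else if inl then output only_left;
  else output only_right;
run;
```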

