1. Split a data set into equal parts by _n_:
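%macro split(ndsn=2);
  %local i;
  /* one output data set per part: dsn1 ... dsn&ndsn.; the helper variable x is dropped */
  data %do i = 1 %to &ndsn.; dsn&i.(drop=x) %end; ;
    retain x;
    set orig nobs=nobs;
    /* x = rows per part, rounded up so that no rows are left over */
    if _n_ eq 1 then x = ceil(nobs / &ndsn.);
    /* route each row to a part by its position in orig */
    if _n_ le x then output dsn1;
    %do i = 2 %to &ndsn.;
    else if _n_ le (&i. * x) then output dsn&i.;
    %end;
  run;
%mend split;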
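A quick way to try the macro is sketched below. The macro reads from a data set named orig, so the example first builds a small orig with 10 rows; the variable id is made up for illustration only.

```sas
/* hypothetical test data: 10 observations in WORK.ORIG */
data orig;
  do id = 1 to 10;
    output;
  end;
run;

/* split into 3 parts: the part size is ceil(10/3) = 4,
   so dsn1 gets rows 1-4, dsn2 gets rows 5-8, dsn3 gets rows 9-10 */
%split(ndsn=3)
```

When the row count is not a multiple of ndsn, the last data set is the short one.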
2. Determining the number of observations in a data set:
a. The NOBS= option on a SET statement does not always return the correct number; see:
http://www2.sas.com/proceedings/sugi26/p095-26.pdf
b. The usual solution:
http://www.sascommunity.org/wiki/Determining_the_number_of_observations_in_a_SAS_data_set_efficiently
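One common approach, which may or may not be the one the linked page recommends, is to read the count from the data set header with OPEN and ATTRN instead of running a DATA step. The sketch below assumes a data set named orig; the macro name nobs is made up here.

```sas
/* returns the number of logical observations (deleted rows excluded) */
%macro nobs(ds);
  %local dsid n rc;
  %let n = .;
  %let dsid = %sysfunc(open(&ds.));
  %if &dsid. %then %do;
    /* NLOBS comes from the header; it can be -1 when the count is unknown, e.g. for some views */
    %let n = %sysfunc(attrn(&dsid., nlobs));
    %let rc = %sysfunc(close(&dsid.));
  %end;
  &n.
%mend nobs;

%put NOTE: orig has %nobs(orig) observations.;
```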
3. Removing duplicate records:
With PROC SQL: http://www2.sas.com/proceedings/sugi25/25/cc/25p106.pdf
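A minimal sketch of the SQL approach, assuming the input is called orig and that "duplicate" means rows identical in every column (nodup is a made-up output name):

```sas
proc sql;
  /* DISTINCT keeps one copy of each fully identical row */
  create table nodup as
  select distinct *
  from orig;
quit;
```

To treat rows as duplicates on a subset of columns only, list those columns in place of the asterisk.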
4. The pros and cons of using PROC FREQ, PROC MEANS, and PROC TABULATE to check the quality of date variables
http://support.sas.com/resources/papers/proceedings13/335-2013.pdf
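As a rough illustration of this kind of check (not taken from the paper), the sketch below assumes a data set orig with a numeric SAS date variable named visit_date; both names are hypothetical.

```sas
/* PROC FREQ groups by the formatted value, so a year format
   gives one row per year plus a row for missing dates */
proc freq data=orig;
  tables visit_date / missing;
  format visit_date year4.;
run;

/* PROC MEANS gives counts and the range; its printed statistics
   typically appear as raw SAS date numbers, not formatted dates */
proc means data=orig n nmiss min max;
  var visit_date;
run;
```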
5. Split Data into Subsets
best practice:
http://www.sascommunity.org/wiki/Split_Data_into_Subsets
6. The LAG function:
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212547.htm
http://support.sas.com/resources/papers/proceedings09/055-2009.pdf
http://www.phusewiki.org/docs/2011%20Papers/CC08%20paperl.pdf
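A small sketch of typical LAG usage, assuming a data set orig with a numeric variable named value (a made-up name): it computes the change from the previous row.

```sas
data diffs;
  set orig;
  /* LAG returns the value stored at its previous call; missing on the first row */
  prev_value = lag(value);
  change = value - prev_value;
run;
```

LAG is called unconditionally here on purpose. Because it works as a queue that is updated only when the function executes, calling it inside an IF/THEN branch returns the value from the last time that branch ran, which is not necessarily the previous observation.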
7. Set, Match, Merge … Don’t You Love SAS
http://analytics.ncsu.edu/sesug/2007/IS02.pdf
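For reference, a minimal match-merge sketch, assuming two data sets a and b that share a key variable id (all three names are hypothetical):

```sas
/* both inputs must be sorted by the BY variable before MERGE */
proc sort data=a; by id; run;
proc sort data=b; by id; run;

data both;
  merge a(in=ina) b(in=inb);
  by id;
  /* keep only keys present in both inputs */
  if ina and inb;
run;
```

Dropping the subsetting IF keeps every key from either input, with missing values where one side has no match.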