David Franklin, SAS Programmer/Consultant - Useful SAS Tips and other code
The following are some of my more useful SAS tips, and other pieces of code that may or may not be directly SAS related, that I have collected over time. Some may have been seen on the Tip of the Month page, although only some pass from that page to this one, and others are just tidbits that are interesting and/or useful. There is no order to these nuggets, so please look carefully at the index on the left for what is available, as it is updated from time to time.
There are a number of ways that SAS code can be written to get the number of observations in a SAS dataset. My favorite, and an oldie, is:

    %macro numobs(dsn);
      %global num;
      data _null_;
        if 0 then set &dsn nobs=nobs;
        call symput('num', trim(left(put(nobs, 8.))));
        stop;
      run;
    %mend numobs;
Another adaptation of this counts not only the number of observations but also the number of variables in a dataset:

    %macro numobs(dsn);
      %global numobs numvars;
      data _null_;
        set &dsn (obs=1) NOBS=obscnt;
        array aa{*} $ _character_;
        array nn{*} _numeric_;
        vars=dim(aa)+dim(nn);
        call symput('numvars', vars);
        call symput('numobs', obscnt);
        stop;
      run;
    %mend numobs;
SAS 6.12 introduced the Base SAS programmer to some of the SAS functions that were previously only available in SCL. One of these was the ATTRN function, which can be used to get the number of observations with the following code:

    %macro numobs(dsn);
      %global num;
      %let dsid=%sysfunc(open(&dsn));
      %if &dsid %then %do;
        %let num=%sysfunc(attrn(&dsid,nobs));
        %let rc=%sysfunc(close(&dsid));
      %end;
      %else %put Opening dataset &dsn failed - %sysfunc(sysmsg());
    %mend numobs;

In both cases calling the macro counts the number of observations inside the dataset specified by the macro variable DSN and puts the number found in the global macro variable NUM. Which one is best is down to personal preference, but remember that the latter can only be used where the ATTRN function is available, that is, in SAS 6.12 or later.
www.theprogrammerscabin.com/Useful.htm
2/1/14
Using PROC SUMMARY for Descriptive and Frequency Statistics
To many, the PROC MEANS and PROC SUMMARY SAS procedures are the same. There are, however, two differences.
The first difference is that the SUMMARY procedure by default prints no output to an output file, while the MEANS procedure does. The option that controls this is PRINT/NOPRINT, so it is possible to print the output from the SUMMARY procedure and have no output from the MEANS procedure.
The second difference is not widely known but it is useful. When the VAR statement is missing in the MEANS procedure, analysis is carried out on all numeric variables, as shown in the following example (output below):
    data vitals;
      infile cards;
      input patid $3. heart_rate temperature trtcd $1.;
      cards;
    001 72 35.8 A
    002 80 36.4 B
    003 99 36.6 A
    ;
    run;

    proc means data=vitals nway;
      class trtcd;
    run;

    The SAS System
    The MEANS Procedure

             N
    trtcd  Obs  Variable     N        Mean     Std Dev     Minimum     Maximum
    A        2  heart_rate   2  85.5000000  19.0918831  72.0000000  99.0000000
                temperature  2  36.2000000   0.5656854  35.8000000  36.6000000
    B        1  heart_rate   1  80.0000000           .  80.0000000  80.0000000
                temperature  1  36.4000000           .  36.4000000  36.4000000
However, what happens when the SUMMARY procedure is used instead on the same data?
    proc summary data=vitals nway print;
      class trtcd;
    run;

    The SAS System
    The SUMMARY Procedure

             N
    trtcd  Obs
    A        2
    B        1
Notice that the only result that came out of the SUMMARY procedure was the number of observations in each treatment group, similar to what the FREQ procedure would produce. The same result would occur if the only variables in the VITALS dataset were PATID and TRTCD, that is, only the character variables.
So what does this finding mean? In most cases SAS programmers use the FREQ procedure for frequency counts and the MEANS procedure for summary statistics; however, these two most commonly requested sets of statistics can be produced by one procedure. A possible macro for calculating both from one procedure is given below:
    %macro summstat(dsin=,          /* Input file REQUIRED */
                    dsout=,         /* Results file REQUIRED */
                    classvar=,      /* Class Variable(s) REQUIRED */
                    vvars=,         /* Analysis Variables, only if
                                       descriptive statistics
                                       requested. Can use one or more
                                       numeric variable names or
                                       _NUMERIC_ for all numeric
                                       variables. AUTONAME option will
                                       set variable names in output
                                       file */
                    outprt=NOPRINT  /* Output to listings file.
                                       Values: PRINT|NOPRINT */
                    );
      proc summary data=&dsin nway &outprt;
        class &classvar;
        %if (&vvars ne) %then %do;
          var &vvars;
          output out=&dsout (drop=_type_ _freq_)
                 n= mean= std= median= min= max= / autoname;
        %end;
        %else %do;
          output out=&dsout (drop=_type_ rename=(_freq_=N));
        %end;
      run;
    %mend summstat;
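To make the two branches of the macro concrete, the following is a minimal sketch of the PROC SUMMARY steps it would generate, written out by hand rather than through the macro. The output dataset names STATS and COUNTS are illustrative assumptions, and the input data simply repeats the VITALS values used earlier.

```sas
/* Small illustrative dataset (same values as VITALS above) */
data vitals;
  input patid $ heart_rate temperature trtcd $;
  datalines;
001 72 35.8 A
002 80 36.4 B
003 99 36.6 A
;
run;

/* With a VAR statement: descriptive statistics per class level.
   STATS is a hypothetical output dataset name. */
proc summary data=vitals nway noprint;
  class trtcd;
  var heart_rate temperature;
  output out=stats (drop=_type_ _freq_)
         n= mean= std= median= min= max= / autoname;
run;

/* Without a VAR statement: frequency counts only, with _FREQ_
   renamed to N. COUNTS is a hypothetical output dataset name. */
proc summary data=vitals nway noprint;
  class trtcd;
  output out=counts (drop=_type_ rename=(_freq_=N));
run;

proc print data=stats;
run;
proc print data=counts;
run;
```

The first step corresponds to calling the macro with VVARS set, the second to calling it with VVARS left blank.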
Note the use of the INTNX function which is useful for time interval calculations.
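As a hedged illustration of how INTNX can give the last date of a month (this is a sketch, not necessarily the code the note refers to), the alignment argument 'end' can be combined with an increment of zero. The date literal and variable names below are arbitrary example choices.

```sas
/* Illustrative only: the date literal is an arbitrary example value */
data _null_;
  anydate = '15FEB2004'd;
  /* Interval 'month', increment 0, alignment 'end':
     the last day of the month containing ANYDATE */
  lastday = intnx('month', anydate, 0, 'end');
  put lastday= date9.;   /* writes lastday=29FEB2004 to the log */
run;
```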
issue. One of the usual ways this is done is by converting the data into a common format that both applications can read and write; CSV is one such established format. There are a number of ways that SAS can create a CSV file. The first is to use the EXPORT procedure with CSV as the filename extension, as the following example shows:
    PROC EXPORT DATA=sashelp.class
                OUTFILE="c:\taui\class.csv";
    RUN;
Before the EXPORT procedure came into SAS, the data step was used to generate the CSV file, as the following example shows:
    DATA _NULL_;
      FILE "c:\taui\class.csv";
      SET sashelp.class;
      PUT (_all_) (',');
    RUN;
Also available is the DEXPORT command that is run inside the command line of the SAS Desktop, which is shown in the following example:
    DEXPORT sashelp.class "c:\taui\class.csv"
The last method that will be looked at here is using a form of SAS ODS that was introduced in SAS 8.2 - the best way to show it is using the following example:
    ODS CSV FILE='c:\taui\class.csv';
    PROC PRINT DATA=sashelp.class NOOBS;
    RUN;
    ODS CSV CLOSE;
With this usage it is possible to use the standard VAR and WHERE statements to limit the observations and variables being passed to the CSV file. CSV files can be easily read into Excel or most other spreadsheet or database programs, and CSV is a useful format when transferring data from SAS to another application.
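The point about VAR and WHERE can be sketched as follows. The output file name class_subset.csv and the particular variables and condition are assumptions chosen only for illustration.

```sas
/* Assumption: class_subset.csv is a hypothetical output file
   written to the current working directory */
ods csv file='class_subset.csv';
proc print data=sashelp.class noobs;
  var name sex age;     /* limit the variables written */
  where age >= 14;      /* limit the observations written */
run;
ods csv close;
```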
where the variable EXIST will contain a value of 1 if the dataset is present or 0 if not.
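One way to obtain such a 1/0 flag is the EXIST function. The sketch below is an illustration only and may differ from the code this description belongs to; SASHELP.CLASS is used simply because it ships with most SAS installations.

```sas
/* DATA step form: EXIST is 1 if the dataset is present, 0 if not */
data _null_;
  exist = exist('sashelp.class');
  put exist=;
run;

/* Macro form of the same check */
%put exist=%sysfunc(exist(sashelp.class));
```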
Reordering Variables
Volumes have been written in the past on how to reorder variables in a SAS dataset. There are the LENGTH, ATTRIB and RETAIN statements inside a DATA step, but I have found in the past that the best way is to use the SQL procedure. Let's say you have a SAS dataset DEMOG with the variables AGE, GENDER, SUBJECTID, HEIGHT and WEIGHT, and you want the variable SUBJECTID first (the placement of the other variables is okay). The following code is useful:
    PROC SQL;
      CREATE TABLE demog AS
        SELECT subjectid, *
        FROM demog;
    QUIT;
    RUN;
The resulting dataset DEMOG will have the variables in the order of SUBJECTID, AGE, GENDER, HEIGHT and WEIGHT.
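For comparison, here is a minimal sketch of the RETAIN approach mentioned earlier as an alternative. The sample values and the output name DEMOG2 are hypothetical; naming SUBJECTID in a RETAIN statement before the SET statement places it first in the new dataset.

```sas
/* Hypothetical sample data in the original variable order */
data demog;
  input age gender $ subjectid $ height weight;
  datalines;
34 M S001 178 81
29 F S002 165 60
;
run;

data demog2;            /* DEMOG2 is a hypothetical output name */
  retain subjectid;     /* named before SET, so it is placed first */
  set demog;
run;

/* VARNUM lists the variables in their stored order */
proc contents data=demog2 varnum;
run;
```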
A better way though, avoiding transcription problems, is to make the new YNX format by nesting the old format YN into the new as shown in the following code:
    proc format;
      value ynx 3="DON'T KNOW"
                other=[yn.];
    run;
Note that you must enclose the existing format name in square brackets, as shown above, or with parentheses and vertical bars, for example (|yn.|).
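A self-contained sketch of the nesting is shown below. The definition of the existing YN format (1=YES, 2=NO) is an assumption made for illustration, since only the YNX definition is given above.

```sas
proc format;
  value yn                  /* hypothetical existing format */
    1 = "YES"
    2 = "NO";
  value ynx                 /* new format: one extra code, the rest from YN */
    3     = "DON'T KNOW"
    other = [yn.];
run;

/* Write each code and its formatted value to the log */
data _null_;
  do code = 1 to 3;
    put code= code ynx.;
  end;
run;
```

Codes 1 and 2 are passed through to YN, so a later change to YN is picked up by YNX without retyping the labels.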
Concatenating Datasets
Concatenating datasets can be one of the trickiest activities in SAS, as you have to know the structure and content of the datasets before the task is done. However, one very useful trick is to use SQL to do the concatenation, thereby avoiding a number of the problems associated with similar tasks using either the DATA step or the APPEND procedure. The following example shows the SQL code used:

    PROC SQL;
      CREATE TABLE outa AS
        SELECT *, 'in1' AS dset
        FROM in1
        OUTER UNION CORR
        SELECT *, 'in2' AS dset
        FROM in2;
    QUIT;
    RUN;
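The behavior is easiest to see with two small datasets whose variables only partly overlap. The data below are hypothetical and exist only to make the sketch runnable.

```sas
/* Hypothetical inputs: IN1 has ID and NAME, IN2 has ID and WEIGHT */
data in1;
  input id name $;
  datalines;
1 Ann
2 Bob
;
run;

data in2;
  input id weight;
  datalines;
3 70
;
run;

/* OUTER UNION CORR lines up same-named columns and keeps the rest,
   leaving missing values where a table lacks a column */
proc sql;
  create table outa as
    select *, 'in1' as dset from in1
    outer union corr
    select *, 'in2' as dset from in2;
quit;

proc print data=outa;
run;
```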
Deleting SAS Datasets based on a Date
Occasionally it is necessary to clean up old datasets in a directory based on a date. The following SAS code is an example where datasets in the directory referenced by the LIBNAME OLDDATA that were created before 15MAR2007 are deleted:
    %let dslist=%str();
    proc sql;
      select memname into :dslist separated by ' '
      from sashelp.vtable
      where libname='OLDDATA' and datepart(crdate) < "15MAR2007"d;
    quit;
    run;
    proc datasets library=OLDDATA nolist nodetails;
      delete &dslist;
    quit;
    run;
This code can be modified to apply to the datasets in any specified directory, and to any date and/or time that the user may choose.
Changing the Height of a HEADLINE in PROC REPORT when using ODS RTF
Sometimes when creating a report using ODS RTF in SAS v8.2, the line under the column headers produced by the HEADLINE option in the REPORT procedure is too thin to be displayed properly on a computer screen, yet can be seen on a printout. The simple reason is that most computer screens use a display resolution of 72 cells per inch while even basic printers have the capability of 300 or more cells per inch. To alter the height of the headline line, some RTF code has to be sent to the RTF file being created. The following illustrates the usage:
    %let hdrbrdr=%str(style(header)=[pretext="\brdrb\brdrs\brdrw30"]);
    %let hdropt =%str(style(header column)=[protectspecialchars=off]);
    proc report data=pop1 nowindows headline &hdropt split='\' missing;
      columns state pop landarea psqmile;
      define state / group order=internal
             style=[cellwidth=80 just=left] &hdrbrdr 'State';
      define pop / display
             style=[cellwidth=30 just=center] &hdrbrdr 'Population\(2003)'
In the example above it is the HDRBRDR macro variable that controls the width of the line, specifically the number at the end of the definition. Note that in this example the width is set to 30 twips, but it can be altered to any value.
Why the difference? There is actually no single standard for the calculation of percentiles, and the choice depends on what a statistician is looking for. SAS has five methods of calculating percentiles, the two common ones being:
Method 5: $y = (x_j + x_{j+1})/2$ if $g = 0$, or $y = x_{j+1}$ if $g > 0$, where $np = j + g$
Method 4: $y = (1-g)\,x_j + g\,x_{j+1}$, where $(n+1)p = j + g$ and $x_{n+1}$ is taken to be $x_n$
Excel, S-Plus and StarOffice Calc by comparison use a different method, specifically:
$y = (1-g)\,x_{j+1} + g\,x_{j+2}$, where $(n-1)p = j + g$, and both $x_{n+1}$ and $x_{n+2}$ are taken to be $x_n$
In each formula the $x_j$ are the ordered data values, $n$ is the number of values, $p$ is the percentile expressed as a proportion, $j$ is the integer part of the product and $g$ is its fractional part.
Among the major statistical software packages only Excel, S-Plus and StarOffice Calc use this method. Minitab and SPSS use SAS Method 4 for their calculation. The moral of this example is that you should know how your software calculates a statistic before blindly reporting the result.
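In SAS the method is selected with the PCTLDEF= option of the UNIVARIATE procedure. The following sketch compares Methods 5 and 4 on the values 1 to 10; the dataset and output names (SCORES, Q5, Q4) are hypothetical.

```sas
/* Hypothetical data: the integers 1 to 10 */
data scores;
  input x @@;
  datalines;
1 2 3 4 5 6 7 8 9 10
;
run;

/* Method 5 (the SAS default) */
proc univariate data=scores pctldef=5 noprint;
  var x;
  output out=q5 q1=q1 median=median q3=q3;
run;

/* Method 4 */
proc univariate data=scores pctldef=4 noprint;
  var x;
  output out=q4 q1=q1 median=median q3=q3;
run;

proc print data=q5;
run;
proc print data=q4;
run;
```

Working the formulas by hand for the first quartile (n = 10, p = 0.25): Method 5 gives np = 2.5, so y = x_3 = 3; Method 4 gives (n+1)p = 2.75, so y = 0.25*2 + 0.75*3 = 2.75; the Excel method gives (n-1)p = 2.25, so y = 0.75*3 + 0.25*4 = 3.25.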
(a) $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
(b) $s^2 = \dfrac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}$
The first equation (a), the Traditional Formula, requires two passes of the data: the first pass to calculate the mean and the second to calculate the variance. The second equation (b) is more commonly known as the Desktop Calculator Formula, as it makes it possible to calculate the variance with one pass of the data. There is one issue with this formula: if the numbers are big and the differences between them are small, an incorrect result can be produced. This is because computers and calculators store results as real numbers with limited precision, and some precision may be lost when storing the sum of the squared values across a long list of numbers. As an example, calculate the variance of 10,000,001, 10,000,003 and 10,000,005. The answer is 4, but some calculators may produce an answer of 0.
Now what about SAS and Excel, what do they do? SAS uses the Traditional Formula. Microsoft used the Desktop Calculator Formula until Excel 2003 (for Windows) and Excel (for Mac), when it changed to use the Traditional Formula [1]. The Desktop Calculator Formula appears to have been used by Microsoft as it was faster than the Traditional Formula, the latter requiring two passes of the data.
Are there any calculations that will produce a result with one pass of the data? The answer is yes. One formula is known as the Method of Provisional Means, in which the mean and the sum of squared deviations are updated as each value is read.
There are other methods that calculate the variance in one pass of the data that are found in some statistical textbooks. The moral of the story is to always check with your software documentation to see how statistics are being calculated.
-------------
[1] Microsoft Knowledge Base, article 828888, dated March 10, 2005
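As a sketch of a one-pass calculation, the DATA step below uses one common form of the provisional-means recurrence (often attributed to Welford). The notation may differ from the formulas originally shown on this page, and the variable names are illustrative assumptions. It is applied to the three values from the example above.

```sas
data _null_;
  array x{3} _temporary_ (10000001 10000003 10000005);
  mean = 0;   /* provisional mean */
  ss   = 0;   /* provisional sum of squared deviations */
  do i = 1 to dim(x);
    delta = x{i} - mean;
    mean  = mean + delta / i;             /* update the mean */
    ss    = ss + delta * (x{i} - mean);   /* update the sum of squares */
  end;
  variance = ss / (dim(x) - 1);
  put mean= variance=;   /* writes mean=10000003 variance=4 to the log */
run;
```

Because only deviations from the running mean are squared, the intermediate values stay small even when the data values themselves are large.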
then
    Insert > Module
The function is now ready for use within Excel and can be accessed from the User Defined function list.
then
    Insert > Module
The function is now ready for use within Excel and can be accessed from the User Defined function list. The codes returned from the function will show the background color of the cell. Microsoft unfortunately did not keep codes consistent across different releases of Excel. The following list is a general guide that is useful when referring to color codes for Excel 97:
    -4142 = No Color
    1 = Black
    2 = White
    3 = Red
    5 = Blue
The usage for this file, where the code is stored in the file BKUP.BAT, is

    BKUP filename

An example of its usage is the file DEMO.SAS. After running the command

    BKUP DEMO.SAS

the file would appear in the BKUP directory with the name DEMO-01-13-2004-1238.SAS, where "01-13-2004" is the date in local format from the DOS country setting and "1238" is the time. This code will only work when your operating system is Windows NT, Windows 2000 or Windows XP; it will not work in any other environment. Note that if the directory BKUP is not present then it will be created.
nautical miles to standard miles multiply the result by 1.150779. Note that if your calculator returns the ARCOS result in radians, you will have to convert the radians to degrees before multiplying by 60, that is, degrees = (radians/PI)*180, where PI is approximately 3.141592654.
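The conversion steps described here can be sketched in a DATA step. The distance formula used below is the spherical law of cosines, which is an assumption: it is consistent with the ARCOS and multiply-by-60 steps mentioned, but the formula itself is not shown in this part of the page. The two points are hypothetical, chosen one degree of longitude apart on the equator.

```sas
data _null_;
  /* Hypothetical example points, in decimal degrees */
  lat1 = 0; lon1 = 0;
  lat2 = 0; lon2 = 1;

  pi  = constant('pi');
  d2r = pi / 180;                 /* degrees to radians */

  /* Spherical law of cosines: central angle in radians */
  angle = arcos( sin(lat1*d2r)*sin(lat2*d2r)
               + cos(lat1*d2r)*cos(lat2*d2r)*cos((lon2-lon1)*d2r) );

  /* radians -> degrees -> nautical miles (60 per degree) */
  nautical = (angle / pi) * 180 * 60;
  statute  = nautical * 1.150779;
  put nautical= 8.2 statute= 8.2;
run;
```

For these two points the central angle is one degree, so the result should be about 60 nautical miles, or roughly 69 standard miles.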
Updated January 11, 2011