2/1/14

David Franklin, SAS Programmer/Consultant - Useful SAS Tips and other code

The following are some of my more useful SAS tips, or other pieces of code that may or may not be directly SAS related, that I have collected over time. Some may have appeared on the Tip of the Month page, although only some pass from that page to this one, and others are just tidbits that are interesting and/or useful. There is no order to these nuggets, so please look carefully at the index on the left for what is available, as it is updated from time to time.

Counting the Number of Observations in a Dataset

There are a number of ways that SAS code can be written to get the number of observations in a SAS dataset. My favorite, and an oldie, is:

    %macro numobs(dsn);
      %global num;
      data _null_;
        if 0 then set &dsn nobs=nobs;
        call symput('num',trim(left(put(nobs,8.))));
        stop;
      run;
    %mend numobs;

Another adaptation of this counts not only the number of observations but also the number of variables in a dataset:

    %macro numobs(dsn);
      %global numobs numvars;
      data _null_;
        set &dsn (obs=1) NOBS=obscnt;
        array aa{*} $ _character_;
        array nn{*} _numeric_;
        vars=dim(aa)+dim(nn);
        call symput('numvars',vars);
        call symput('numobs',obscnt);
        stop;
      run;
    %mend numobs;

SAS version 6.12 introduced the Base SAS programmer to some of the SAS functions that were only available in SCL. One of these was the ATTRN function, which can be used to get the number of observations with the following code:

    %macro numobs(dsn);
      %global num;
      %let dsid=%sysfunc(open(&dsn));
      %if &dsid %then %do;
        %let num=%sysfunc(attrn(&dsid,nobs));
        %let rc=%sysfunc(close(&dsid));
      %end;
      %else %put Opening dataset &dsn failed - %sysfunc(sysmsg());
    %mend numobs;

In both cases calling the macro counts the number of observations inside the dataset specified by the macro variable DSN and puts the number found in the global macro variable NUM. Which one is best is down to personal preference, but remember that the latter can only be used where you have access to SAS version 6.12 or above.
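As a further option, offered as a sketch rather than one of the methods described above, the SQL dictionary tables in SAS 9 hold both the observation and variable counts. SASHELP.CLASS is used purely for illustration, and the macro variable names NUMOBS and NUMVARS are assumptions.

```sas
/* Sketch: read observation and variable counts from DICTIONARY.TABLES. */
/* LIBNAME and MEMNAME values are stored in upper case.                 */
proc sql noprint;
  select nobs, nvar
    into :numobs, :numvars
    from dictionary.tables
    where libname='SASHELP' and memname='CLASS';
quit;

/* Re-assigning with %let strips the leading blanks left by INTO. */
%let numobs=&numobs;
%let numvars=&numvars;
%put NOTE: SASHELP.CLASS has &numobs observations and &numvars variables.;
```

If no row matches the WHERE clause the macro variables are not created, so a check on the automatic macro variable SQLOBS may be worth adding in production code.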

Using PROC SUMMARY for Descriptive and Frequency Statistics

To many, the PROC MEANS and PROC SUMMARY SAS procedures are the same. There are, however, two differences.

The first difference is that the SUMMARY procedure by default prints no output to an output file, while the MEANS procedure does. The option that controls this is PRINT/NOPRINT, so it is possible to print the output from the SUMMARY procedure and have no output from the MEANS procedure.

The second difference is not widely known but it is useful. When the VAR statement is missing in the MEANS procedure, analysis is carried out on all numeric variables, as shown in the following example (output below):

    data vitals;
      infile cards;
      input patid $3. heart_rate temperature trtcd $1.;
      cards;
    001 72 35.8 A
    002 80 36.4 B
    003 99 36.6 A
    ;
    run;
    proc means data=vitals nway;
      class trtcd;
    run;

    The SAS System
    The MEANS Procedure

             N
    trtcd  Obs  Variable     N        Mean     Std Dev     Minimum     Maximum
    A        2  heart_rate   2  85.5000000  19.0918831  72.0000000  99.0000000
                temperature  2  36.2000000   0.5656854  35.8000000  36.6000000
    B        1  heart_rate   1  80.0000000           .  80.0000000  80.0000000
                temperature  1  36.4000000           .  36.4000000  36.4000000

However, what happens when the SUMMARY procedure is used instead on the same data?
    proc summary data=vitals nway print;
      class trtcd;
    run;

    The SAS System
    The SUMMARY Procedure

             N
    trtcd  Obs
    A        2
    B        1

Notice that the only result that came out from the SUMMARY procedure was the number of observations from each treatment group, similar to what the FREQ procedure will produce. The same result will prevail if the only variables in the VITALS dataset were PATID and TRTCD, that is, only the character variables.

So what does this finding mean? In most cases SAS programmers use the FREQ procedure for frequency counts and the MEANS procedure for summary statistics; however, these two most commonly used sets of statistics can be produced by one procedure. A possible macro for calculating both from one procedure is given below:
    %macro summstat(dsin=,          /* Input file REQUIRED                         */
                    dsout=,         /* Results file REQUIRED                       */
                    classvar=,      /* Class Variable(s) REQUIRED                  */
                    vvars=,         /* Analysis Variables, only if descriptive     */
                                    /* statistics requested. Can use one or more   */
                                    /* numeric variable names or _NUMERIC_ for all */
                                    /* numeric variables. AUTONAME option will set */
                                    /* variable names in output file               */
                    outprt=NOPRINT  /* Output to listings file.                    */
                                    /* Values: PRINT|NOPRINT                       */
                   );
      proc summary data=&dsin nway &outprt;
        class &classvar;
        %if (&vvars ne ) %then %do;
          var &vvars;
          output out=&dsout (drop=_type_ _freq_)
                 n= mean= std= median= min= max= / autoname;
        %end;
        %else %do;
          output out=&dsout (drop=_type_ rename=(_freq_=N));
        %end;
      run;
    %mend summstat;

Finding the Last date of the Month


Sometimes it is necessary to find the last date of a month. Since the last date is not the same across all months of the year, the following code will help:

    Last_Day_of_Month=INTNX('MONTH',SAS_Date,0,'END');

Note the use of the INTNX function which is useful for time interval calculations.
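A short sketch of the call in context follows. The dataset name MONTH_ENDS and the three dates are made up for illustration; the third date is included to show that leap years are handled.

```sas
/* Sketch: last day of the month for a few hypothetical dates. */
data month_ends;
  input sas_date date9.;
  /* Shift by 0 months and align to the END of that month. */
  last_day_of_month = intnx('MONTH', sas_date, 0, 'END');
  format sas_date last_day_of_month date9.;
  datalines;
15JAN2011
03FEB2011
10FEB2012
;
run;

proc print data=month_ends noobs;
run;
```

The three rows should show 31JAN2011, 28FEB2011 and 29FEB2012 as the month-end dates.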

Creating a CSV file of a SAS Dataset


Passing a SAS dataset to Excel? Transferring data from one application to another is always a tricky issue. One of the usual ways this is done is converting the data into a common format that both applications can read and write - CSV is one such established format. There are a number of ways that SAS will create the CSV file. The first is using the EXPORT procedure with CSV as the filename extension, as the following example shows:

    PROC EXPORT DATA=sashelp.class
                OUTFILE="c:\taui\class.csv";
    RUN;

Before the EXPORT procedure came into SAS, the DATA step was used to generate the CSV file, as the following example shows:

    DATA _NULL_;
      FILE "c:\taui\class.csv";
      SET sashelp.class;
      PUT (_all_) (',');
    RUN;

Also available is the DEXPORT command that is run inside the command line of the SAS Desktop, which is shown in the following example:
    DEXPORT sashelp.class "c:\taui\class.csv"

The last method that will be looked at here is using a form of SAS ODS that was introduced in SAS 8.2 - the best way to show it is using the following example:
    ODS CSV FILE='c:\taui\class.csv';
    PROC PRINT DATA=sashelp.class NOOBS;
    RUN;
    ODS CSV CLOSE;

With this usage it is possible to use the standard VAR and WHERE statements to limit the observations and variables being passed to the CSV file. CSV files can be easily read into Excel or most other spreadsheet or database programs, and CSV is a useful format when transferring data from SAS to another application.
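For the DATA step route, a commonly used variation relies on the DSD option of the FILE statement, which writes the delimiters and quotes any value that itself contains a comma. The sketch below is an assumption-laden illustration rather than the original method: the fileref CSVOUT and its temporary location are invented for the example, and no header row is written.

```sas
/* Sketch: DATA step CSV export using the DSD option on FILE.      */
/* The fileref CSVOUT and its TEMP location are illustrative only. */
filename csvout temp;

data _null_;
  file csvout dsd;     /* DSD: comma-delimited, values quoted when needed */
  set sashelp.class;
  put (_all_) (+0);    /* write every variable in delimited form */
run;
```

Pointing the FILENAME statement at a real path instead of TEMP keeps the file after the session ends.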

Checking a SAS Dataset Exists


Want to check if a particular SAS dataset exists? If you have SAS version 6.12 or above the following code fragment will be useful:
    exist = %SYSFUNC(EXIST(dataset_name));

where the variable EXIST will contain a value of 1 if the dataset is present or 0 if not.
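The same function can drive conditional logic in a macro. The sketch below uses an invented macro name, CHECKDS, and an invented dataset, WORK.DEMOG, purely for illustration.

```sas
/* Sketch: run a step only when the dataset exists. */
/* CHECKDS and WORK.DEMOG are illustrative names.   */
%macro checkds(dsn);
  %if %sysfunc(exist(&dsn)) %then %do;
    proc print data=&dsn (obs=5);
    run;
  %end;
  %else %put NOTE: Dataset &dsn does not exist.;
%mend checkds;

%checkds(work.demog);
```

When the dataset is missing, only the NOTE is written to the log and no procedure is run.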

Reordering Variables
Volumes have been written in the past on how to reorder variables in a SAS dataset. There are the LENGTH, ATTRIB and RETAIN statements inside a DATA step, but I have found in the past that the best way is to use the SQL procedure. Let's say you have a SAS dataset DEMOG with the variables AGE, GENDER, SUBJECTID, HEIGHT and WEIGHT, and you want the variable SUBJECTID first (the placement of the other variables is okay). The following code is useful:

    PROC SQL;
      CREATE TABLE demog AS
      SELECT subjectid, *
      FROM demog;
    QUIT;
    RUN;

The resulting dataset DEMOG will have the variables in the order of SUBJECTID, AGE, GENDER, HEIGHT and WEIGHT.
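For comparison, here is a sketch of the RETAIN approach mentioned above. The SQL version may write warnings to the log, because the table references itself and SUBJECTID is selected twice; the DATA step version avoids both. The one-row DEMOG dataset is made up so the example can be run on its own.

```sas
/* Sketch: a made-up DEMOG with SUBJECTID in third position. */
data demog;
  age=34; gender='F'; subjectid='001'; height=165; weight=60;
run;

/* RETAIN before SET puts SUBJECTID first in the new dataset. */
/* With no initial value given, the data values are unchanged. */
data demog;
  retain subjectid;
  set demog;
run;

/* VARNUM lists the variables in their stored order. */
proc contents data=demog varnum;
run;
```

The PROC CONTENTS listing should show SUBJECTID as variable 1, followed by AGE, GENDER, HEIGHT and WEIGHT.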

Additional Codes to an Existing Format


Your program has a variable that is coded as 1=YES and 2=NO with an associated format called YN. Now there is a new value of 3=DON'T KNOW in your data but the format YN has not been changed. In your reporting program you can create another format, in this example YNX, using the following code:
    proc format;
      value ynx 1='YES'
                2='NO'
                3="DON'T KNOW";
    run;

A better way though, avoiding transcription problems, is to make the new YNX format by nesting the old format YN into the new as shown in the following code:
    proc format;
      value ynx 3="DON'T KNOW"
                other=[yn.];
    run;

Note that you must enclose the existing format name in square brackets, as shown above, or with parentheses and vertical bars, for example (|yn.|).
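A small end-to-end sketch of the nested format in use is shown here. The YN format is recreated from the description above, and the ANSWERS dataset is invented for the example.

```sas
/* Sketch: define YN, nest it inside YNX, then apply YNX. */
proc format;
  value yn  1='YES'
            2='NO';
  value ynx 3="DON'T KNOW"
            other=[yn.];      /* all other values fall through to YN */
run;

/* Hypothetical data holding all three codes. */
data answers;
  input answer @@;
  datalines;
1 2 3
;
run;

proc print data=answers noobs;
  format answer ynx.;
run;
```

Any later change to YN is picked up by YNX automatically, which is the main attraction of nesting over retyping the values.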

Concatenating Datasets
Concatenating datasets can be one of the trickiest activities in SAS, as you have to know the structure and content of the datasets before this task is done. However, one trick that is very useful is to use SQL to do the concatenation, thereby avoiding a number of the problems associated with similar tasks using either the DATA step or the APPEND procedure. The following example shows the SQL code used:

    PROC SQL;
      CREATE TABLE outa AS
      SELECT *, 'in1' AS dset
      FROM in1
      OUTER UNION CORR
      SELECT *, 'in2' AS dset
      FROM in2;
    QUIT;
    RUN;
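To see how OUTER UNION CORR treats datasets whose structures differ, the following sketch builds two tiny made-up datasets, IN1 and IN2, that share only the variable ID.

```sas
/* Sketch: two hypothetical datasets with different variables. */
data in1;
  id=1; name='Ann';
run;

data in2;
  id=2; age=40;
run;

/* CORR lines up same-named columns; the others are kept as-is. */
proc sql;
  create table outa as
    select *, 'in1' as dset from in1
    outer union corr
    select *, 'in2' as dset from in2;
quit;

proc print data=outa noobs;
run;
```

The result has one row per input observation, with NAME blank for the IN2 row and AGE missing for the IN1 row.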

Deleting SAS Datasets based on a Date


Occasionally it is necessary to clean up old datasets in a directory based on a date. The following SAS code is an example where datasets in the directory referenced by the LIBNAME OLDDATA that are older than 15MAR2007 are deleted:

    %let dslist=%str();
    proc sql;
      select memname into :dslist separated by ' '
      from sashelp.vtable
      where libname='OLDDATA' and datepart(crdate)<"15MAR2007"d;
    quit;
    run;
    proc datasets library=OLDDATA nolist nodetails;
      delete &dslist;
    quit;
    run;

This code can be modified for any datasets in a specified directory, as well as for any date and/or time that the user may choose.

Reading Variable Length Record Files


Ever try to read in a file with variable length records? The following example may be useful and looks at the file INTEXT.TXT, which has columns 1-10 as a FLAG for the record and columns 11 onwards as input text:

    DATA a;
      INFILE 'intext.txt' LENGTH=len;
      INPUT flag 1-10 @;
      varlen=len-10;
      INPUT @11 text $VARYING200. varlen;
    RUN;

Changing the Height of a HEADLINE in PROC REPORT when using ODS RTF
Sometimes when creating a report using ODS RTF in SAS v8.2, the line under the columns produced by the HEADLINE option in the REPORT procedure is too thin to be displayed properly on a computer screen, yet it can be seen on a printout. The simple reason is that most computer screens use a display width of 72 cells per inch, while even the most basic printers have the capability of 300 or more cells per inch. To alter the height of the headline, some RTF code has to be sent to the RTF file being created. The following illustrates the usage:

    %let hdrbrdr=%str(style(header)=[pretext="\brdrb\brdrs\brdrw30"]);
    %let hdropt =%str(style(header column)=[protectspecialchars=off]);
    proc report data=pop1 nowindows headline &hdropt split='\' missing;
      columns state pop landarea psqmile;
      define state    / group order=internal
                        style=[cellwidth=80 just=left] &hdrbrdr 'State';
      define pop      / display
                        style=[cellwidth=30 just=center] &hdrbrdr
                        'Population\(2003)' format=comma11.;
      define landarea / display
                        style=[cellwidth=30 just=center] &hdrbrdr
                        'Land Area\(sq. miles)' format=comma11.;
      define psqmile  / display
                        style=[cellwidth=30 just=center] &hdrbrdr 'People per square mile'
                        format=comma11.1;
    quit;
    run;

In the example above it is the HDRBRDR macro variable that controls the width of the line, specifically the number at the end of the definition. Note that in this example the width is set to '30' twips but can be altered to any value.

How Quantiles are Calculated


The way a statistic is calculated may be more important than the result it produces. Recently an example showed up when some statistics were being checked using Excel on results produced with SAS, specifically with the calculation of a first and third quartile, also known as the 25th and 75th percentile. For the data 1, 2, 3, 4, 5, 6, 7 and 8 the following results are calculated for the 25th percentile:
    SAS Method 5 (default) = 2.5
    SAS Method 4           = 2.25
    Excel                  = 2.75

Why the difference? There actually is no standard for the calculation of percentile and it does depend on what a statistician is looking for. SAS has six methods of calculating the percentile, the two common ones being:
In the formulas below, $n$ is the number of values, $p$ is the percentile expressed as a fraction, $j$ is the integer part and $g$ the fractional part of the stated product, and $x_j$ is the $j$-th ordered value.

Method 5: $y = (x_j + x_{j+1})/2$ if $g = 0$, or $y = x_{j+1}$ if $g > 0$, where $n p = j + g$
Method 4: $y = (1-g)\,x_j + g\,x_{j+1}$, where $(n+1) p = j + g$ and $x_{n+1}$ is taken to be $x_n$

Excel, S-Plus and StarOffice Calc by comparison use a different method, specifically:
$y = (1-g)\,x_{j+1} + g\,x_{j+2}$, where $(n-1) p = j + g$, and both $x_{n+1}$ and $x_{n+2}$ are taken to be $x_n$

Among the major statistical software packages only Excel, S-Plus and StarOffice Calc use this method. Minitab and SPSS use SAS Method 4 for their calculation. The moral of this example is that you should know how your software calculates a statistic before blindly reporting the result.
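The two SAS results quoted above can be reproduced with the PCTLDEF= option of the UNIVARIATE procedure. The sketch below is one way to do it; the dataset and variable names (Q, X, DEF5, DEF4, BOTH) are invented for the example.

```sas
/* Sketch: first quartile of 1..8 under percentile definitions 5 and 4. */
data q;
  do x=1 to 8;
    output;
  end;
run;

proc univariate data=q pctldef=5 noprint;
  var x;
  output out=def5 q1=q1_def5;
run;

proc univariate data=q pctldef=4 noprint;
  var x;
  output out=def4 q1=q1_def4;
run;

/* Each output dataset has one row, so a plain MERGE sets them side by side. */
data both;
  merge def5 def4;
run;

proc print data=both noobs;
run;
```

By hand, Method 5 gives $8 \times 0.25 = 2$, so $g = 0$ and $y = (2 + 3)/2 = 2.5$, while Method 4 gives $9 \times 0.25 = 2.25$, so $y = 0.75 \times 2 + 0.25 \times 3 = 2.25$, matching the values listed earlier.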

Variance Calculation Differences


I received an email in early 2005 from a SAS programmer asking me if I could help explain why he was getting differences when comparing the variance results between SAS and Excel using the same set of numbers. Traditionally the variance is calculated as

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$ (a)

Alternately the equation can be written as

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n-1}$$ (b)

The first equation (a) requires two passes of the data - the first pass to calculate the mean and the second to calculate the variance. The second equation (b) is more commonly known as the Desktop Calculator Formula, as it is possible to calculate the variance with one pass of the data. There is one issue with this formula: if the numbers are big and the differences between the numbers are small, an incorrect result can be produced. This is because computers and calculators store results as real numbers, and some precision may be lost when storing the sum of the squared values across a long list of numbers. As an example, calculate the variance of 10,000,001, 10,000,003 and 10,000,005 - the answer is 4 but some calculators may produce an answer of 0.

Now what about SAS and Excel, what do they do? SAS uses the Traditional Formula. Microsoft used the Desktop Calculator Formula until Excel 2003 (for Windows) and Excel (for Mac), when it changed to use the Traditional Formula [1]. The Desktop Calculator Formula appears to have been used by Microsoft as it was faster than the Traditional Formula, the latter requiring two passes of the data.

Are there any calculations that will produce a result with one pass of the data? The answer is yes. One formula is known as the Method of Provisional Means, in which a running mean and a running sum of squared deviations are updated as each value is read.

There are other methods that calculate the variance in one pass of the data that are found in some statistical textbooks. The moral of the story is to always check with your software documentation to see how statistics are being calculated.

[1] Reference: Microsoft Knowledgebase, article 828888, dated March 10, 2005
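The one-pass idea can be sketched in a DATA step. The update rules used below are one commonly cited form of the Method of Provisional Means and are an assumption here, since they may differ in notation from the formulas originally shown; the dataset NUMS and the variable names are also invented for the example.

```sas
/* Sketch: one-pass variance using running (provisional) means. */
data nums;
  input x;
  datalines;
10000001
10000003
10000005
;
run;

data _null_;
  set nums end=eof;
  retain mean 0 ss 0;
  delta = x - mean;              /* deviation from the previous running mean */
  mean  = mean + delta / _n_;    /* updated running mean after _n_ values    */
  ss    = ss + delta * (x - mean);  /* running sum of squared deviations     */
  if eof then do;
    variance = ss / (_n_ - 1);   /* sample variance, divisor n-1 */
    put variance=;
  end;
run;
```

For the three values above the running sum of squares ends at 8, so the log should show a variance of 4, in agreement with the two-pass answer.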

Getting the Comment text of an Excel Cell


Not a SAS tip but one that is useful within Excel. The following VBA can be used to get the text of a comment behind a cell in an Excel Spreadsheet and place it in another specified cell:
    Function GComTxt(rCellComment As Range)
        On Error Resume Next
        GComTxt = WorksheetFunction.Clean(rCellComment.Comment.Text)
        On Error GoTo 0
    End Function

To enter the VBA code into Excel select

    Tools > Macro > Visual Basic Editor

then

    Insert > Module

Then paste the code above and then press

    File > Close and Return to Microsoft Excel

The function is now ready for use within Excel and can be accessed from the User Defined function list.

Getting the Background Color of a Cell in Excel


Again, not a SAS tip but one that is useful within Excel. The following VBA can be used to get the background color of a cell in an Excel spreadsheet and place it in another specified cell:

    Function showColorCode(rcell)
        showColorCode = rcell.Interior.ColorIndex
    End Function

To enter the VBA code into Excel select

    Tools > Macro > Visual Basic Editor

then

    Insert > Module

Then paste the code above and then press

    File > Close and Return to Microsoft Excel

The function is now ready for use within Excel and can be accessed from the User Defined function list. The codes returned from the function will show the background color of the cell. Microsoft unfortunately did not keep codes consistent across different releases of Excel. The following list is a general guide that is useful when referring to color codes for Excel 97:

    -4142 = No Color
    1 = Black
    2 = White
    3 = Red
    5 = Blue
    6 = Yellow
    10 = Green

A DOS .BAT File for Backing Up a File with a Date Stamp


This is not a direct SAS tip but a DOS .BAT file that I find useful when writing programs and wanting to keep a copy of a file with a date/time stamp embedded in the file name, in a backup directory, directly under the directory I am working in. The code for this file is the following:

    @ECHO OFF
    if z%1==z goto usage
    If not exist bkup\Nul MD bkup
    FOR /F "TOKENS=2-4 DELIMS=/ " %%F IN ('DATE /T') DO (SET TODAY=-%%F-%%G-%%H)
    FOR /F "TOKENS=1-2 DELIMS=: " %%I IN ('TIME /T') DO (SET NOW=-%%I%%J)
    copy %~n1%~x1 bkup\%~n1%today%%now%%~x1
    goto end
    :usage
    echo. Usage is BKUP filename.
    :end

The usage for this file, where the code is stored in the file BKUP.BAT, is

    BKUP filename

An example of its usage is the file DEMO.SAS, where after running the command

    BKUP DEMO.SAS

the file would appear in the BKUP directory with the name DEMO-01-13-2004-1238.SAS, where the "01-13-2004" is the date in local format from the DOS country setting and the "1238" is the time. This code will only work when your operating system is Windows NT, Windows 2000 or Windows XP - it will not work in any other environment. Note that if the directory BKUP is not present then it will be created.

Calculating the Distance Between Two Points on the Earth's Surface


Not actually SAS related but here is something that may be useful to someone. Before GPS could tell us the distance between two points on the Earth's surface, there were many ways to estimate that distance. One of the simplest methods was the Great Circle Distance, which treats the Earth as a sphere and uses the following formula:

    d = 60 * ARCOS( SIN(l1) * SIN(l2) + COS(l1) * COS(l2) * COS(g2 - g1) )

where

    l1 = latitude at the first point (degrees)
    l2 = latitude at the second point (degrees)
    g1 = longitude at the first point (degrees)
    g2 = longitude at the second point (degrees)
    d  = computed distance (nautical miles)

This method of calculation does assume that 1 minute of arc is 1 nautical mile. To convert the distance from nautical miles to kilometers, multiply the result d by 1.852. Similarly, to convert the distance from nautical miles to standard miles, multiply the result by 1.150779. Note that if your calculator returns the ARCOS result as radians you will have to convert the radians to degrees before multiplying by 60, i.e. degrees = (radians/PI)*180, where PI is approximately 3.141592654.
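For anyone who does want it in SAS, the formula can be sketched in a DATA step. SAS trigonometric functions work in radians, so the inputs are converted from degrees and the ARCOS result is converted back. The two test points are hypothetical, chosen on the equator 90 degrees of longitude apart so the answer is easy to check by hand; the variable names are also invented.

```sas
/* Sketch: Great Circle Distance between two hypothetical points. */
data _null_;
  l1 = 0; g1 = 0;       /* first point: latitude, longitude in degrees  */
  l2 = 0; g2 = 90;      /* second point: latitude, longitude in degrees */

  pi  = constant('pi');
  d2r = pi / 180;       /* degrees-to-radians factor */

  /* Angle between the points, in radians. */
  arc_rad = arcos( sin(l1*d2r)*sin(l2*d2r)
                 + cos(l1*d2r)*cos(l2*d2r)*cos((g2-g1)*d2r) );
  arc_deg = arc_rad / pi * 180;   /* radians back to degrees */

  d_nm = 60 * arc_deg;            /* nautical miles  */
  d_km = d_nm * 1.852;            /* kilometers      */
  d_mi = d_nm * 1.150779;         /* standard miles  */
  put d_nm= d_km= d_mi=;
run;
```

A quarter of the way around the equator is 90 degrees of arc, so the distance written to the log should be about 5400 nautical miles.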
Updated January 11, 2011
