NESUG 2010

Coders' Corner

LOG CHECKING: What to check and why?
Sridhar R Dodlapati, i3 Statprobe, Basking Ridge, NJ
Kiran Kumar Karidi, Novartis, Florham Park, NJ
Mahipal R Vanam, EMD Serono, Rockland, MA

ABSTRACT
Not every log message that signals a potential problem gets enough attention from programmers. Every SAS programmer needs to know the significance of certain important log messages in order to avoid the danger of ignoring them. After executing a SAS program, we all check the corresponding log file for ERROR and WARNING messages, as these are considered serious problems. However, many other important messages are issued as NOTEs and can point to real issues, for example "uninitialized" or "MERGE statement has more than one data set with repeats of BY values". Many programmers ignore them, not realizing the magnitude of the problems that can result if these messages are not taken care of.

INTRODUCTION
The messages in the log appear as ERROR, WARNING and NOTE messages and are produced at compilation or execution time. They can relate to syntax, improper usage of SAS elements, data issues and processing. ERROR and WARNING messages usually get enough attention from a SAS programmer, but some of the most important messages are issued as NOTEs and are ignored. The probable reason is that there are hundreds of different NOTE messages, most of them purely informative, and the programmer is overwhelmed by the prospect of checking all of them. But certain NOTE messages ought to be checked, because they are as important as ERROR and WARNING messages and indicate potential issues in the program code, logic or data.

This paper lists some of the most important log messages that need to be checked thoroughly. It also explains the importance of checking the last modified date and time of the SAS program and the corresponding log file. There are hundreds of issues one could check for in the log, but the ones mentioned in this paper are worth checking every time a SAS program is run. Apart from reporting syntax mistakes, the log file also provides vital information about the correctness of the logic or algorithm used in the SAS program. When updating a program based on the information in the log, it is important to start from the top of the program and work down (top to bottom approach), since a problem early in the program can trigger messages further down.

WHAT TO CHECK AND WHY?


1. LAST MODIFIED DATE & TIME OF SAS FILE AND CORRESPONDING LOG FILE: One of the most important points to check, even before looking at the contents of a log file, is the last modified date and time of the SAS program and of the corresponding log file. The log file's date-time stamp should always be greater than or equal to the last modified date-time stamp of the corresponding SAS program. If a SAS program is updated for any reason, it has to be rerun to recreate the corresponding deliverable or whatever else the program is supposed to produce. This information has to come from the operating system, not from the log file itself.

In the Windows environment, irrespective of how old the files are, we can always get the last modified date-time stamp by using operating system commands, either directly (calling the operating system commands) or indirectly (using SAS functions which in turn call the operating system commands). Calling the operating system commands directly is more efficient. Below is the command for the Windows operating system:

    %let file=c:\logcheck\test.sas;
    filename foo pipe "dir &file /t:w /a:-d";

/t:w requests a time field ('T') of type last modified ('W', last written).
/a:-d returns information for files only, not directories.
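The FILENAME statement above only defines the pipe; the command runs when a DATA step reads from it. The following is a minimal sketch of that read, not the authors' own code. It assumes that the SAS session is allowed to run operating system commands (the XCMD system option) and that the path in &file exists; both are assumptions to adapt.

```sas
/* Example path taken from the text; replace with the program to check. */
%let file=c:\logcheck\test.sas;

/* Same command as above: list the file with its last-written time. */
filename foo pipe "dir &file /t:w /a:-d";

/* Read the command output line by line and echo it to the SAS log. */
data _null_;
   infile foo;
   input;
   put _infile_;
run;

/* Release the fileref when done. */
filename foo clear;
```

Running the same steps for the .log file gives the second time stamp needed for the comparison described above.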

Below is the command for the UNIX operating system:

    filename foo pipe "ls -g -o ~/test.sas";

ls is the UNIX equivalent of DIR.
-g specifies not to print the file's owner.
-o specifies not to print the file's group.

However, in the UNIX environment, if the files are older than six months, we cannot get the complete last modified date-time stamp by using the operating system commands directly or indirectly: the time (hour and minute) is replaced by the year. In such cases we need to use Perl or another scripting language to extract the complete date-time stamp. Below is a Perl script to get the last modified date and time:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::stat;

    # Print the full last-modified date and time of each file named on the command line.
    foreach my $file (@ARGV) {
        my $sb = stat($file) or die "Cannot stat $file: $!";
        print scalar(localtime($sb->mtime)), " -- $file\n";
    }

2. ERROR: These are serious messages that MUST BE FIXED in order to make the program run; otherwise the program will abort. Without fixing these messages, the program does not even execute completely. There are five types of errors:

    Syntax: Occurs at compilation time when programming statements do not conform to the rules of the SAS language.
    Semantic: Occurs at compilation time when the language element is correct, but the element might not be valid for a particular usage.
    Execution-time: Occurs at execution time when SAS attempts to execute a program and execution fails.
    Data-related: Occurs at execution time when data values are invalid.
    Macro-related: Occurs when the macro facility is used incorrectly, either during macro compilation or execution, or during DATA or PROC step compilation or execution.

3. WARNING: Like ERROR messages, these are serious messages most of the time, but not always. Because of this uncertainty, they are considered MAY or MAY NOT be serious messages, meaning that they may or may not prevent the program from running correctly. Hence, when a WARNING appears in the log, the programmer must make every effort to clear it from the log, or make sure that the program runs correctly even with the warning present.

4. FATAL: Many different scenarios can cause this message in the log and, as the name indicates, it is as serious as an ERROR message and causes the program to abort.

NOTE: Most of the time NOTE messages are just informative and not important, but some NOTE messages should be considered as serious as ERROR and WARNING messages because of the potential problems they can cause. The following belong to this category:

5. UNINITIALIZED: When SAS is unable to find a variable in a DATA step, SAS prints the "variable ... is uninitialized" message. SAS then creates the variable, sets its value to missing for all observations, and runs the DATA step. It's nice that SAS runs the DATA step, but you probably don't want the variable to have missing values for all observations. A more serious problem ensues when SAS is unable to find a variable in a PROC step: SAS prints the variable-not-found message and does not run the procedure at all.

6. NOT RESOLVED: When a macro or macro variable does not get resolved, SAS issues a message saying that it is not resolved. Without proper resolution of the macro or macro variable, correct results are not assured.
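Items 5 and 6 can be reproduced with a few lines. The sketch below is illustrative only: the data set HEIGHTS, the misspelled variable HEIGHT_CN and the macro variable STUDYID are hypothetical names, and STUDYID is assumed not to be defined in the session.

```sas
/* Small hypothetical input data set. */
data heights;
   input name $ height_cm;
   datalines;
Ann 160
Bob 175
;
run;

data heights_m;
   set heights;
   /* Typo: HEIGHT_CN does not exist in HEIGHTS, so SAS creates it */
   /* with missing values and HEIGHT_M is missing on every row.    */
   height_m = height_cn / 100;
run;

/* STUDYID is assumed to be undefined, so the reference cannot be resolved. */
%put Study is &studyid;
```

The second DATA step runs to completion and creates HEIGHTS_M, yet the log should carry an "uninitialized" note for HEIGHT_CN and a "missing values were generated" note (item 10). The %PUT statement should produce a "not resolved" warning. Nothing stops the program, which is why these messages are easy to overlook.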

7. MERGE STATEMENT HAS MORE THAN ONE DATA SET WITH REPEATS OF BY VALUES: When the MERGE statement is used in a DATA step to perform a many-to-many merge, this message appears in the log as a caution about potentially incorrect results. Be wary of using the MERGE statement for a many-to-many merge; PROC SQL can be used instead. A very complicated method for getting correct results from a DATA step many-to-many merge exists, but the simple, straightforward use of the MERGE statement almost always yields incorrect results.

8. INVALID ARGUMENT TO FUNCTION: Every function must be passed correct arguments with appropriate attributes in order to work normally. This note appears when an argument is not valid for the function.

9. INVALID DATA: Whenever SAS encounters invalid data while reading with an INPUT statement, SAS sets the problematic variable to missing for that observation and then prints a detailed message. Occasionally programmers get invalid-data messages because they are trying to read non-printable characters such as carriage returns.

10. MISSING VALUES WERE GENERATED: This note tells you that SAS was unable to compute the value of a new variable because of existing missing values in the data, and that SAS automatically assigned a missing value instead. This may not indicate a problem, but it warrants an investigation.

11. ILLEGAL: Many scenarios can lead to this note in the log. Some of them are "illegal arguments to functions", "illegal mathematical operations" such as taking the logarithm of zero or the square root of a negative number, and "Illegal Instruction In Task".

12. DATA SET DOES NOT EXIST: This message is generated in the log when the specified data set does not exist.

13. AT LEAST: Two scenarios can lead to this note in the log. One is when SAS expects no occurrence of an event and encounters one or more. The other is when SAS expects one or more occurrences of an event and encounters none.

14. INSUFFICIENT: Occurs when SAS encounters an out-of-resources condition, such as a full disk or insufficient memory for a SAS procedure to complete. When these conditions occur, SAS attempts to find resources for current use. For example, SAS may ask the user for permission to delete temporary data sets that might no longer be needed, or to free the memory in which macro variables are stored.

15. HAS 0 OBSERVATIONS: SAS writes this note to the log whenever a newly created data set has zero observations. The note alerts us to possible unintended results, because when a data set is created we do not normally expect it to be empty.

16. NO OBSERVATIONS: Similar to the previous message, but issued when the input data set that is read has zero observations.

17. NUMERIC VALUES HAVE BEEN CONVERTED TO CHARACTER: If the user accidentally mixes numeric and character variables, SAS converts the data from one type to the other, runs the program anyway, and prints a note stating that the "values have been converted". It's nice that SAS tries to fix the problem, but this doesn't mean that the note can be ignored. SAS uses the default format for the conversion, which can be undesirable. If you let SAS convert your variables, it can come back to haunt you later, when the variable that you think is numeric is now character or vice versa. If a variable needs to be converted, the user should do it explicitly, so that there are no surprises.

18. CHARACTER VALUES HAVE BEEN CONVERTED TO NUMERIC: For the same reason as above, one would not want to see this message in the log. Here SAS uses the default informat for the conversion, which can be undesirable.

19. DIVISION BY ZERO: Division by zero is mathematically undefined. SAS sets the result to missing and prints this note, so the program should guard against a zero denominator.

20. MATHEMATICAL OPERATIONS COULD NOT BE PERFORMED: This message appears when SAS encounters an incorrect or impossible mathematical operation, or a valid operation that SAS is not capable of performing.

21. OUTSIDE THE AXIS RANGE: When data points that need to be plotted in a graph fall outside the specified axis range, they do not appear in the graph and this note is printed to the log.

22. SAS WENT TO A NEW LINE WHEN INPUT STATEMENT REACHED PAST THE END OF A LINE: When the INPUT statement tries to read past the end of the current input record, it moves the input pointer to column 1 of the next record to read the remaining values. This is fine only if the intention is to read data that flows over several input lines; otherwise the data are read incorrectly.

23. OBSERVATIONS WITH DUPLICATE KEY VALUES WERE DELETED: PROC SORT deletes records with duplicate key values when the NODUPKEY option is used. If duplicates are not expected and one wants to confirm that none exist, or if they are expected and one wants to know how many were deleted, one has to look for this message in the log.

24. W.D FORMAT WAS TOO SMALL: The width of the specified format is not wide enough to hold the complete value for one or more values. Specifying the correct overall format width and an appropriate number of decimal places is essential to having SAS present the data the way you want them to appear. Note that when the format specifies fewer decimal places than the stored value has, the output is rounded to the specified number of decimal places.

25. THE SAS SYSTEM STOPPED PROCESSING THIS STEP BECAUSE OF ERRORS: When SAS encounters a fatal error, this note is printed in the log immediately after the error message.

This paper does not contain a complete list of messages, and programmers will need to add further messages depending on their requirements. Sometimes a log file appears clean, without any messages, but the program or its results can still be defective. For example, if the BY statement is omitted after a MERGE statement, the log does not show any message indicating this, yet the results are almost always erroneous, unless the intention is a blind (one-to-one) merge, which does not require a BY statement. Log checking does not guarantee the accuracy of the results, but it improves the validation process and gives assurance that the log file has been checked for the specified messages. We all know that a perfectly normal log file does not mean that the program has worked correctly, as the program can hide logical mistakes.
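The many-to-many merge described in item 7 is easier to see with a small example. The sketch below uses two hypothetical data sets, A and B, each with two rows for the same ID, and compares a DATA step merge with a PROC SQL join. It is an illustration under those assumptions, not a general recipe.

```sas
/* Two hypothetical data sets, each with two rows for ID 1. */
data a;
   input id x $;
   datalines;
1 a1
1 a2
;
run;

data b;
   input id y $;
   datalines;
1 b1
1 b2
;
run;

/* DATA step merge: rows are paired by position within the BY group, */
/* giving 2 rows (a1-b1, a2-b2) and the "repeats of BY values" note. */
data merged;
   merge a b;
   by id;
run;

/* PROC SQL join: every combination within the key, giving 4 rows. */
proc sql;
   create table joined as
   select a.id, a.x, b.y
   from a inner join b
   on a.id = b.id;
quit;
```

MERGED contains two observations and JOINED contains four. Neither step fails, so the NOTE in the log is the only signal that the DATA step did not produce all combinations.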

AUTOMATION
Checking log files for the various messages manually is not a good idea, for the following reasons:

    Impractical: each log file can be huge and there can be many log files. As if this were not enough, each SAS program is usually rerun multiple times in its lifetime, forcing us to check the same log file again and again.
    Error prone: there is always a possibility of overlooking some of the messages.
    Inefficient: it consumes a lot of resources, both time and effort.
    No proof: no hard evidence is left to indicate that a log file has been checked manually. Only the opposite can be proved, if the log file still contains any of the messages.

With these reasons in mind, automated log checking is always preferable to manual checking.
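One possible shape for such automation is a DATA step that reads the log as text and keeps only the lines of interest. The sketch below is a minimal example under stated assumptions: the log path is hypothetical, the phrase list covers only some of the messages discussed in this paper, and continuation lines of multi-line messages are not captured.

```sas
/* Hypothetical log path; replace with the log to be checked. */
%let logfile=c:\logcheck\test.log;

data log_issues;
   infile "&logfile" truncover lrecl=32767;
   input line $char256.;
   lineno = _n_;   /* line number within the log file */
   /* Keep ERROR and WARNING lines, plus NOTEs containing selected phrases. */
   if prxmatch('/^(ERROR|WARNING)/', line)
      or (line =: 'NOTE:' and prxmatch(
         '/uninitialized|repeats of BY values|Invalid|Missing values were generated|have been converted|Division by zero|0 observations/i',
         line));
   keep lineno line;
run;

/* The listing doubles as evidence that the log was checked. */
proc print data=log_issues noobs;
run;
```

An empty LOG_ISSUES data set means none of the listed messages were found. Saving the listing alongside the log addresses the "no proof" concern above, and the phrase list can be extended to match local requirements.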

CONCLUSION
Considering the sheer volume of SAS log files, it is not practical to check every message each time a program is run. However, certain messages and pieces of information must be checked to make sure that there are no, or only minimal, issues in the program or data. SAS programmers must pay attention to these messages in the log and always check them, keeping in mind the consequences of ignoring them. Knowing the trouble they can cause, one would not want to see them in a log file.

REFERENCES
SAS online documentation.
Smoak, Carey G. "A Utility Program for Checking SAS Log Files" (SUGI 27, Paper 96-27).
Chakravarthy, Venky. "You be Your own QC Cop - Check your .SAS, .LOG and .LST dates from Within SAS using Operating System Commands".

ACKNOWLEDGMENTS
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

CONTACT INFORMATION
We appreciate your valuable comments and suggestions. Please contact the authors at:

Sridhar R Dodlapati
i3 Statprobe
131 Morristown Road
Basking Ridge, NJ 07920
Sridhar.Dodlapati@i3statprobe.com

Kiran Karidi
Novartis
180 Park Avenue
Florham Park, NJ 07932
Kiran.Karidi@novartis.com

Mahipal Vanam
EMD Serono
Rockland, MA
Mahipal.V@gmail.com
