DISCUSSION DOCUMENT
Chicago, IL
Bangalore, India
Month 2005
www.mu-sigma.com
Proprietary Information
"This document and its attachments are confidential. Any unauthorized copying, disclosure or distribution of the material is strictly forbidden"
Course Contents
SAS: A Brief Introduction
Numerous software packages are used in academia and industry for data
management, statistical analysis, and optimization
[Figure: software tools grouped by use. Categories shown: Statistical Analysis, Predictive Modeling, Forecasting & Simulation, Campaign Management. Tools named in the surviving text: WinCross, Unica.]
SAS provides a complete set of solutions for enterprise-wide
business users for data management and analysis
SAS Help
SAS Language
DATA steps typically create or modify SAS datasets. They can also be used to produce custom-designed reports. For example, you can use DATA steps to:
put your data into a SAS data set
compute values
check for and correct errors in your data
produce new SAS data sets by subsetting, merging, and updating existing data sets
PROC (procedure) steps are pre-written routines that enable you to analyze and process the data in a SAS data set. For example, you can use PROC steps to:
create a report that lists the data
produce descriptive statistics
create a summary report
produce plots and charts
Let's look at a sample SAS program
DATA STEP
PROC STEP
A typical SAS program can run to thousands of lines of code consisting of DATA and PROC steps
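The sample program from the slide is not reproduced in this text. As a stand-in, here is a minimal sketch of a program with one DATA step and two PROC steps. The data set name work.scores and its variables are assumptions made up for illustration.

```sas
/* DATA STEP: build a small data set from inline records */
/* (work.scores and its variables are hypothetical)      */
data work.scores;
  input name $ math science;
  total = math + science;   /* compute a new value per observation */
datalines;
Asha 82 91
Ben 75 68
Chen 90 88
;
run;

/* PROC STEP: create a report that lists the data */
proc print data=work.scores;
run;

/* PROC STEP: produce descriptive statistics */
proc means data=work.scores n mean min max;
  var math science total;
run;
```

Each step ends at its RUN statement, so a longer program is simply a sequence of such DATA and PROC blocks.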
A SAS Library is basically an address to a particular
location on your hard drive
By default, all SAS datasets are created in the WORK library
SAS datasets in the WORK library are stored temporarily and exist only during the session
Usually the SAS WORK library is located at C:\Documents and Settings\<User name>\Local Settings\Temp\SAS Temporary Files
SAS gives you the ability to change the location of your default WORK library
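To see where WORK points in a given session, something like the following sketch can help. The data set name test is an assumption for illustration. The WORK location itself is controlled by the WORK= system option, which is set when SAS starts (for example in the configuration file) rather than in the middle of a session.

```sas
/* A one-level name is stored in WORK, so this creates WORK.TEST */
/* (the data set name is hypothetical)                           */
data test;
  x = 1;
run;

/* Write the physical path of the WORK library to the log */
%put NOTE: WORK library path is %sysfunc(pathname(work));

/* Show the current value of the WORK= system option */
proc options option=work;
run;
```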
The LIBNAME statement can be used to create libraries at
user-defined locations
A permanent library will store datasets permanently, i.e., they will not get deleted after the SAS
session is closed
A valid library name must start with a letter or an underscore and cannot have more than 8 characters
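A minimal sketch of assigning a permanent library follows. The libref mylib and the folder C:\sas_course\data are assumptions for illustration, and the folder is assumed to exist already.

```sas
/* Assign the libref MYLIB to a folder (hypothetical path) */
libname mylib 'C:\sas_course\data';

/* Confirm where the libref points */
libname mylib list;

/* List the data sets stored in the library, without details */
proc contents data=mylib._all_ nods;
run;
```

The libref mylib satisfies the naming rule: it starts with a letter and has five characters.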
To store datasets in a permanent library, use a two-level
naming convention
To store datasets in your permanent library, the syntax is libname.dataset_name
run;
When the SAS session is closed, the library reference will get deleted but the SAS datasets in that
location will be retained
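The two-level name can be sketched as follows. The libref mylib, the folder path, and the data set customers are assumptions for illustration.

```sas
/* Hypothetical permanent library; the folder must already exist */
libname mylib 'C:\sas_course\data';

/* Two-level name: the data set is stored in the MYLIB folder */
data mylib.customers;
  input cust_id region $;
datalines;
1 North
2 South
;
run;

/* Read it back into a temporary copy in WORK */
data work.customers_copy;
  set mylib.customers;
run;

/* Clearing the libref removes the reference, not the stored data set */
libname mylib clear;
```

In a later session, reissuing the same LIBNAME statement makes mylib.customers available again.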
SAS system options control the way SAS performs
operations
SAS system options differ from SAS data set options and statement options in that once you invoke a
system option, it remains in effect for all subsequent DATA and PROC steps unless modified
In this session we will cover how to set SAS system options only through the OPTIONS statement
SAS system options (contd.)
The syntax for the OPTIONS statement in SAS is as follows:
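The syntax example from the slide is not reproduced in this text. In general the statement takes the form OPTIONS option-1 option-2 ...; and a minimal sketch using a few commonly used system options might look like this. The particular option values are chosen only for illustration.

```sas
/* Suppress the date and page number in output and set the page layout */
options nodate nonumber linesize=80 pagesize=60;

/* Limit every subsequent step to the first 100 observations */
options obs=100;

/* Check the current setting of a single option */
proc options option=obs;
run;

/* Restore the default so that later steps read all observations */
options obs=max;
```

Because these are system options, each setting stays in effect for all later DATA and PROC steps until another OPTIONS statement changes it.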
Reading Delimited Files
Reading delimited files requires knowledge of the layout of the dataset along with the type of
delimiter used in the file
Example: Reading delimited files using the DATA step INFILE statement
The TRUNCOVER option prevents SAS from going to the next row to read data if any observation is
incomplete
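The slide's example is not reproduced in this text. The sketch below reads comma-delimited records with INFILE and TRUNCOVER. To keep it self-contained it reads inline DATALINES; for an external file, the INFILE statement would name the file path instead. The data set and variable names are assumptions for illustration.

```sas
/* Read comma-delimited records; names here are hypothetical.   */
/* For an external file, replace DATALINES in the INFILE        */
/* statement with a quoted path and drop the inline records.    */
data work.sales;
  infile datalines dlm=',' truncover;
  input region $ product $ units price;
datalines;
East,Widget,10,2.50
West,Gadget,4
North,Widget,7,3.00
;
run;

/* The second record has no price; TRUNCOVER leaves it missing */
/* instead of reading the value from the next record.          */
proc print data=work.sales;
run;
```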
Import options
FLOWOVER    The default. Causes the INPUT statement to jump to the next record if it doesn't find values for all variables.
MISSOVER    Sets all empty variables to missing when reading a short line. However, it can also skip values.
TRUNCOVER   Forces the INPUT statement to stop reading when it gets to the end of a short line. This option will not skip information.
Other options:
DEBUG       Writes information to the SAS log about the FTP process.
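The difference between FLOWOVER and MISSOVER is easiest to see on a record that is one value short. The following is a minimal sketch with made-up data; the data set names are assumptions. It uses list input, where MISSOVER and TRUNCOVER behave alike, so only the first two options are contrasted.

```sas
/* FLOWOVER (the default): the short record borrows its missing */
/* value from the next line, and that next line is used up.     */
data work.flow_demo;
  infile datalines flowover;
  input a b c;
datalines;
1 2 3
4 5
6 7 8
9 10 11
;
run;

/* MISSOVER: the short record keeps c as missing and every line */
/* becomes its own observation.                                 */
data work.miss_demo;
  infile datalines missover;
  input a b c;
datalines;
1 2 3
4 5
6 7 8
9 10 11
;
run;

proc print data=work.flow_demo; run;
proc print data=work.miss_demo; run;
```

With FLOWOVER the log also carries a note that SAS went to a new line when the INPUT statement reached past the end of a line, which is a useful signal that records are being consumed unexpectedly.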
Importing data from Data Import Wizard
The SAS windowing environment lets you read delimited files using an import wizard
SAS tries to guess the format of each variable while reading, so the result may not be
perfect
Similar to the IMPORT wizard, there also exists an EXPORT wizard in SAS
Importing data with a lot of variables
1. Run PROC IMPORT on the file or a small subset of the file
2. If possible, select the option to use the first line as variable names
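The two steps above can be sketched as follows. To stay self-contained, the sketch first writes a tiny CSV to a temporary fileref; in practice DATAFILE= would point at the real file. The fileref csvfile, the data set work.customers, and the column names are assumptions for illustration.

```sas
/* Write a small CSV to a temporary file (stand-in for the real file) */
filename csvfile temp;
data _null_;
  file csvfile;
  put 'cust_id,region,signup_date,spend';
  put '1,North,2005-01-15,120.50';
  put '2,South,2005-02-03,89.99';
run;

/* Step 1: run PROC IMPORT on the file                    */
/* Step 2: GETNAMES=YES uses the first line as variable names */
proc import datafile=csvfile
            out=work.customers
            dbms=csv
            replace;
  getnames=yes;
  guessingrows=100;   /* rows scanned when guessing variable types */
run;

/* Review the types and lengths that SAS guessed */
proc contents data=work.customers;
run;
```

For delimited files, PROC IMPORT writes the DATA step it generated to the SAS log, so with many variables that code can be copied and corrected by hand instead of typing every variable name.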
Importing concepts review - Import
DSD:
Changes the default delimiter from blank to comma
If there are two delimiters in a row, it assumes there is a missing value between them
If character values are placed in quotes, the quotes are stripped from the value
DLM:
DLM= can be used along with DSD to specify any other delimiter; it overrides the first condition above
FILENAME:
FILENAME <name> '<path>'; assigns an alias (fileref) that stands in for the full path in import or export steps
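A minimal sketch combining the three ideas (FILENAME, DSD, DLM=) follows. It writes a small pipe-delimited file to a temporary fileref so that it runs without any external file; the fileref rawfile, the data set work.scores, and the records are assumptions for illustration.

```sas
/* FILENAME: the fileref RAWFILE stands in for a physical file. */
/* TEMP creates a scratch file; a real path would go in quotes. */
filename rawfile temp;

/* Write three pipe-delimited lines, including a quoted value  */
/* and an empty field                                           */
data _null_;
  file rawfile;
  put 'id|name|score';
  put '1|"Lee, Ann"|90';
  put '2||85';
run;

/* DSD with DLM='|': consecutive delimiters mean a missing value */
/* and quotes around character values are stripped              */
data work.scores;
  infile rawfile dsd dlm='|' firstobs=2 truncover;
  input id name :$20. score;
run;

proc print data=work.scores;
run;
```

FIRSTOBS=2 skips the header line, and the :$20. modifier lets name hold up to 20 characters while still stopping at the delimiter.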
Suppose the following program is written and submitted to SAS
Compilation Phase: During the compilation phase, the whole DATA step is scanned and the Program Data Vector (PDV) is created along with the temporary data set.
Execution Phase: Once the execution phase starts, SAS executes each statement sequentially and populates the PDV in the same order.
As SAS reaches the end of the DATA step, the values from the PDV are written to the temporary data set and the values of the variables in the PDV are set to missing.
Once SAS has read all the observations, the temporary data set is replaced with the permanent data set.
Data processing using SAS
Concept of Program Data Vector (PDV)
Compilation Phase: Compile program
Execution Phase: Initialize variables to missing in PDV, load one observation in PDV, execute statements, output to SAS dataset
The PDV, created during the compilation phase, is a logical area in memory where a dataset is processed one observation at a time
SAS creates two temporary variables
_N_: number of the current DATA step iteration (usually the current observation number in process)
_ERROR_: binary flag to indicate data processing errors
Other variables such as _TYPE_, _FREQ_, and _STAT_ get created for specific procedures
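One way to watch the PDV at work is to print it to the log during execution. The following is a minimal sketch with made-up data; the data set work.pdv_demo and its variables are assumptions for illustration.

```sas
/* PUT _ALL_ writes every variable in the PDV to the log, */
/* including the automatic variables _N_ and _ERROR_.     */
data work.pdv_demo;
  input x y;
  put 'After INPUT:   ' _all_;   /* z has not been computed yet */
  z = x + y;
  put 'After compute: ' _all_;
datalines;
1 2
3 .
5 6
;
run;
```

In the log, z should show as missing right after each INPUT, because it is reset at the start of every iteration, while _N_ counts the iterations. Neither _N_ nor _ERROR_ is written to the output data set.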
Common SAS log notes and warnings
NOTE: Missing values were generated as a result of performing an operation on missing values.
NOTE: MERGE statement has more than one data set with repeats of BY values.
WARNING: Multiple lengths were specified for the BY variable emp_id by input data sets. This may cause unexpected results.
WARNING: The variable x in the DROP, KEEP, or RENAME list has never been referenced.
NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column). 52:8
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column). 55:9
Before you dare to merge
1. Check if the key variable exists in both the datasets that are to be merged
2. The key variable should have the same format and length in both the datasets
5. If there are common variables other than the key variable in the datasets, treat them appropriately
8. Subsetting of variables and observations can be done while merging; there is no need to write a separate step
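Several of these checks can be sketched in one short program. The data sets employees and salaries, and all variables other than emp_id, are assumptions for illustration.

```sas
/* Two small hypothetical data sets sharing the key emp_id */
data work.employees;
  input emp_id name $;
datalines;
101 Asha
102 Ben
103 Chen
;
run;

data work.salaries;
  input emp_id salary;
datalines;
101 52000
103 61000
104 48000
;
run;

/* Check that the key exists with the same type and length in both */
proc contents data=work.employees; run;
proc contents data=work.salaries; run;

/* MERGE with BY requires both inputs sorted by the key */
proc sort data=work.employees; by emp_id; run;
proc sort data=work.salaries;  by emp_id; run;

/* Subset variables (KEEP=) and observations (IN= flags) while merging */
data work.emp_salary;
  merge work.employees (in=in_emp)
        work.salaries  (in=in_sal keep=emp_id salary);
  by emp_id;
  if in_emp and in_sal;   /* keep only employees found in both inputs */
run;
```

The IN= flags exist only during the step, so they cost nothing in the output and make the match logic explicit.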
CONVERTING VARIABLES FROM NUMERIC TO CHARACTER AND CHARACTER TO NUMERIC
SAS variables are either character or numeric. If you use character variables in a numeric context or vice versa, as users
frequently do, the SAS system automatically converts them in one of the following ways.
From numeric to character using the BEST12. format
From character to numeric using the w. informat (w is the length of the character variable)
When automatic conversion takes place, a note similar to one of the following is written to the SAS log. Since the desired result
is usually (but not always) generated, many users ignore these ubiquitous notes.
NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column). 52:8
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).55:9
To ensure appropriate results when converting between character and numeric data, use the PUT or INPUT function to
control type conversion, as follows.
1. Convert from numeric to character with the PUT function. The syntax is as follows. Use a format large enough to hold
the largest possible numeric value being converted.
character-variable = put(numeric-variable,format.);
The PUT function right-justifies (aligns) the result. Whenever the size (number of digits) of the numeric variable is smaller
than the format, the character variable will have leading blanks. Remove the blanks with the COMPRESS function.
character-variable = compress(put(numeric-variable,format.));
2. Convert from character to numeric with the INPUT function. The syntax is as follows. Make the informat large enough
to hold the largest possible character value being converted.
numeric-variable = input(character-variable,informat.);
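Both conversions can be sketched in a single DATA step. The variable names and values are assumptions for illustration.

```sas
data work.convert_demo;
  num_id   = 1234;        /* numeric */
  char_amt = '1500.75';   /* character */

  /* Numeric to character: right-aligned within 8 characters */
  char_id = put(num_id, 8.);

  /* Same conversion with the leading blanks removed */
  char_id_trim = compress(put(num_id, 8.));

  /* Character to numeric: the informat width covers the longest value */
  num_amt = input(char_amt, 8.);
run;

/* Confirm the resulting variable types */
proc contents data=work.convert_demo;
run;
```

Because PUT and INPUT make each conversion explicit, this step should not produce the automatic conversion notes in the log.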