Econometric Analysis
SAS Notes
February 8, 2005
Availability
Resource
User Interface
SAS Language
    The DATA step
    The PROC step
    Comments
SAS Data
    SAS Variables
    Missing values
    Data libraries
Data Step
    Inputting data
    Outputting data: writing out text files
    Copying data: Set
    Functions
    If statements
    Merging data
Procedures
    Contents, Print
    Format
    Sort
    Means, Summary
    Univariate
    Plot
    Corr(elation)
    REG
    Autoreg
    Syslin
    Tscsreg
    Probit
    Lifereg
Summary of Procedures
Availability
Resource
User Interface
When you open up SAS you'll see three windows: the program, log, and output windows.
The program window is where you will open your programs, edit your programs, save your
programs, and submit your programs. At the top of this window (on the bar) you will see the name
of the file you are currently working with (if it has been saved).
The log window will contain all of the relevant information about your current SAS session
including assigned library names and the version of SAS you're running. Once you submit a
program it will contain any warnings or error messages generated by your program as well as how
long each data and procedure step took, the number of observations in your data sets, etc. After
running a program, you should always look at your log window first. Often programs will
generate appropriate-looking output but will have errors that may be important.
SAS Language
SAS programming statements are separated by semicolons (;) and can occupy more than one
line.
SAS programs consist of DATA steps and PROC steps.
Most SAS statements are part of a DATA step or a PROC step, but there are some that are used
outside of these. An example is the LIBNAME statement, which points SAS to a physical area on
the disk for reading and/or writing SAS data sets.
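As a minimal sketch of how the two kinds of steps fit together (the data set name squares and its variables are made up for illustration), a DATA step that builds a small data set followed by a PROC step that prints it might look like:
data squares ;
do n = 1 to 5 ;
nsq = n**2 ; /* square of n */
output ; /* write one observation per loop pass */
end ;
run;
proc print data = squares ;
run;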
Comments
Comments in a SAS program can be written in two ways:
/* This is a comment */
* So is this;
SAS Data
SAS Variables
• Start with a letter or an underscore
• Have a maximum of 32 characters (letters and numbers plus underscore)
• Cannot contain blanks or special characters
• Can be lower, upper or mixed case
Missing values
• Numeric missing values are represented by a dot (.).
• Character missing values are just blanks.
Data libraries
SAS can read data from a variety of formats. Typically, users want to read in data from ASCII files.
ASCII files are just plain text files: you could open them in WordPerfect or any text editor
and read them. Of course, SAS also reads permanent SAS data sets, which are (usually) created by
SAS. These data sets are binary (which is important if you're transporting them; see below) and
are not viewable in most other applications. SAS data sets typically contain more information than
ASCII files (unless the ASCII file has a header; see below). You can assign variable names, labels,
etc., plus SAS keeps information such as when the data were created and how the data are sorted.
A SAS library is simply a place on your hard drive where you are storing SAS data sets. You tell
SAS where your data sets are by using a LIBNAME statement:
libname mywork 'c:\econ\temp' ;
This tells SAS that I want to create a library (or that I have a library of SAS data sets) on the
subdirectory c:\econ\temp and that within my program I will refer to this library as mywork.
Library names are limited to 8 characters. You typically put libname statements at the beginning of
your program (or in the autoexec.sas file).
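As an illustration of how a library name is used once it has been assigned, the following sketch (assuming a working data set workdata already exists; permdata is an arbitrary name) saves a permanent copy in the mywork library:
libname mywork 'c:\econ\temp' ;
data mywork.permdata ;
set workdata ;
run;
A data set referred to with a one-level name (e.g., workdata) goes to the temporary WORK library and is deleted when the SAS session ends.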
Data Step
Inputting data
In a DATA step, the datalines statement tells SAS to expect raw data within the program itself, while
the input statement tells SAS the order of the variables and their type (here year and date as well
as x, y, and z are numeric, while month is a character string, marked by $).
data workdata ;
input year month $ date x y z;
datalines;
1987 jan 20 1 2 3
1987 feb 22 3 4 5
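1988 may 18 7 8 9
;
run;
To read the same variables from an external text file instead, replace the datalines with an infile statement: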
data workdata ;
infile 'c:\textfile.txt' ;
input year month $ date x y z;
run;
The infile statement tells SAS where the file is. By default, SAS assumes that the file is space
delimited. If you have some other delimiter, for instance CSV (comma delimited text), you will
use the following:
data workdata ;
infile 'c:\textfile.txt' dlm = ',' ;
input year month $ date x y z;
run;
A data set libname.file can be used after this (but won't be saved as a permanent SAS data set
unless you do so in a DATA step). PROC ACCESS can also be used to read in Dbase files.
4) Import wizard — reading Excel and Access data through Menu Command
You can import Excel data using the Import Wizard. First, choose your original file. Second, choose
which table (sheet) in the Excel file should be imported. Third, give the new data set a name in the
Member text box. Once you have the SAS data set, you can use it directly in a data step with:
data libname.dataname
……
The wizard makes it very easy to convert data from one format to another. However, it is recommended
that you use an INFILE statement or the ACCESS procedure to read data instead, because when the
source data change you can simply rerun the program to update your SAS data.
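To write a SAS data set out to a text file, a DATA step with file and put statements can be used. The following is only a sketch, assuming the workdata set created earlier and a hypothetical output path:
data _null_ ;
set workdata ;
file 'c:\outfile.txt' ; /* hypothetical output file */
put year month date x y z ; /* one space-separated line per observation */
run;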
The first line just tells SAS not to create a SAS working data set.
or, equivalently,
You'll choose keep versus drop based on whichever requires the least typing. It's a good
idea to eliminate variables you're not using at the time, both in terms of conservation of memory
and hard drive space. If instead you want to use data from a permanent data set (PERMDATA in
your library MYWORK) you could write:
data whatever ;
set mywork.permdata;
run;
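A sketch of the keep and drop alternatives discussed above, assuming the workdata set with variables year, month, date, x, y, and z. Both steps produce a data set containing only year, x, and y:
data whatever ;
set workdata (keep = year x y) ;
run;
data whatever ;
set workdata (drop = month date z) ;
run;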
Functions
Mathematical operations and functions can only be used in a data step. Adding, dividing, taking
square roots, trigonometric functions, rounding, etc. are all available in a data step.
The language guide contains a list of functions, but you can just look in the index to
find whatever it is you want; most are named something obvious (e.g., ROUND is used to round
off numbers).
Raising to the nth power requires a double asterisk, e.g., x**0.5 is the square root of x.
If you are dividing, make sure to use an if statement if the divisor may sometimes be zero (ditto
if you're taking logs, etc.; SAS will otherwise write a message to the log every time it
encounters an illegal operation):
data whatever ;
set workdata ;
if x ne 0 then v = (y+z)/x ; /* v is left missing when x is zero */
run;
Notes:
Unless you specify otherwise in a keep or drop statement, or unless you overwrite the variables, you
keep all of the original variables plus whatever you create.
If you're using U.S. state or county data or data with zip codes, SAS has built-in functions (e.g.,
ZIPSTATE) which convert zip codes to states, etc.
See the back of the SAS language guide (the card) for FUNCTIONS and additional statements
used in the data step.
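As a rough illustration of these points (the new variable names are arbitrary), the following step applies a few common functions to workdata and only takes the log when its argument is positive:
data whatever ;
set workdata ;
xroot = sqrt(x) ; /* same as x**0.5 */
yround = round(y/3, 0.01) ; /* round to two decimal places */
if z > 0 then logz = log(z) ; /* logz stays missing otherwise */
run;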
If statements
If statements (subsetting if, if-then, if-then-else) are all used within a data step. See the language
guide for syntax (I always have to look it up). A subsetting if just selects the observations that meet
its condition. For example, if we were only interested in data from the month of March for
1978-1987 we could use the following subsetting if:
data whatever ;
set workdata ;
if month = 'March' and 1978<=year<=1987 ;
run;
Note that because month is a character variable, the value must be in quotes, and the comparison is case-sensitive ('March' does not match 'march').
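For comparison, an if-then-else keeps every observation and assigns a value conditionally. A minimal sketch, assuming workdata as above (period is a made-up variable name):
data whatever ;
set workdata ;
if year < 1988 then period = 'early' ;
else period = 'late' ;
run;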
Merging data
In order to merge data you need to sort each data set by the variable(s) you are merging on (the
CLASS variables, listed in a by statement). For example, if you have annual state GDP (STGDP) in
one SAS data set and annual state income tax rates in another (INTAX) you will need to sort each
by the class variables year and state before you can merge on these.
proc sort data= stgdp; by year state ;run;
proc sort data = intax ; by year state ; run;
data gdp_tax; merge stgdp intax ; by year state ; run;
However, suppose you have another data set that only varies by state, say, square miles
(SQMILES). To merge this in you only need to sort it by state. However, the data set gdp_tax is
sorted by year, state. You should PLAN AHEAD if you know you'll be merging in a lot of data.
The sort procedure is very time consuming if you have large data sets. So a better way to merge
all of these data sets would be to do the following:
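One way this could look (a sketch only, assuming sqmiles contains one observation per state): sort everything with state as the first by variable, so that the state-year merge and the state-only merge can both be done without re-sorting gdp_tax.
proc sort data = stgdp ; by state year ; run;
proc sort data = intax ; by state year ; run;
proc sort data = sqmiles ; by state ; run;
data gdp_tax ; merge stgdp intax ; by state year ; run;
data gdp_tax ; merge gdp_tax sqmiles ; by state ; run;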
Warning: when using a merge statement you will OVERWRITE duplicate variables with the
value that occurs in the last merged data set.
Procedures
Contents, print
Proc contents allows you to see the names, labels, type, length, position, sorting, etc., of any
SAS data set. The syntax is:
proc contents data = x; run;
Proc print prints out SAS data sets. You can specify which variables you're interested in (the
default is all variables in the data set). If your variables have labels, you can tell SAS to print
them instead of the variable names. If you've created formats, you can use those here, too (see
below for proc format). The following prints variables a, b, and c using (already-created) labels:
proc print data = x label;
var a b c ;
run;
Format
Suppose you have a variable (GENDER) that takes the (numeric) value 1 for males and 2 for females.
You can create a format for the variable:
proc format;
value sex
1 = 'male'
2 = 'female' ;
run;
This creates a format called SEX that you can apply any time you use the variable
GENDER (note that I find it easier if I call the format the same thing as I do the variable, but
here I have chosen different names to illustrate syntax). If you're printing the data you can format
it so that in the column under gender instead of seeing 1's and 2's you'll see male and female. This
is especially useful if you have a variable which takes a lot of values: you don't have to keep
the codes straight.
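A sketch of applying the format when printing, assuming a data set x that contains GENDER. Note the period after the format name in the format statement:
proc print data = x ;
var gender ;
format gender sex. ;
run;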
Sort
Data is sorted according to class variables (e.g., by year, state above). The syntax is
straightforward:
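A minimal sketch, assuming a data set x containing the variables year and state:
proc sort data = x ;
by year state ;
run;
Adding out = y to the proc sort statement would write the sorted observations to a new data set y (a hypothetical name) and leave x unchanged.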
Again, give a little forethought to your sorting. Data remains sorted in a particular order until you
re-sort it differently. If you have a large data set that you will use a lot, it is sometimes useful to
sort the permanent data set once instead of sorting it every time you read it into SAS.
Means, Summary
Proc means and proc summary perform the same functions (as far as I can tell), but means prints a
lot of output by default whereas summary prints nothing by default.
Unless you specify otherwise, proc means produces tables of the number of observations, means,
standard deviations, minimums, and maximums. There are other statistics available (e.g., sum); see
the procedures guide. Suppose you're interested not only in the summary statistics of a data set but
also want to use them in additional analysis. You can output statistics to another data set by doing the following:
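A sketch consistent with the description that follows, assuming a data set x with numeric variables a, b, and c:
proc means data = x noprint ;
var a b c ;
output out = z sum = asum bsum csum ;
run;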
Noprint means that SAS will not generate the printed output of means, etc. in the output window.
This sums the variables a,b, and c and writes them to a SAS data set z as the variables asum, bsum,
and csum, respectively.
If instead you were interested in the sums, but really didn't need to do anything other than look at
them, you could use the following (note that sum is not one of the proc means default stats):
proc means data = x sum ;
var a b c ;
run;
This would print the sum of these variables to the output screen.
Suppose that you were interested in, say, average test scores by gender. You can sort the data by
gender and take the average using proc means.
proc sort data = scores ;
by gender ;
run;
proc means data = scores ;
var test1 ;
by gender ;
run;
This would generate means, std. deviations, etc. by gender. But what if you were interested in
scores by gender AS WELL AS overall? You would use a class statement, not a by statement:
proc means data = scores ;
var test1 ;
class gender ;
output out =scormean mean = testmean ;
run;
This generates the means (testmean) by gender and overall. You will get output that looks like:
This tells you that overall there are 10 students and the mean test score is 84, there are four
males, six females, etc...
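To look at the output data set itself, you could print it as in the sketch below, where scormean is the data set created above. Proc means adds the automatic variables _TYPE_ and _FREQ_: with a single class variable, _TYPE_ = 0 marks the overall row and _TYPE_ = 1 the rows for each value of gender, while _FREQ_ gives the number of observations in each group.
proc print data = scormean ;
run;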
Univariate
Proc univariate creates statistics that means/summary do not produce by default, e.g., percentiles
and quartiles. I've never really used it, but I'm told that in addition to providing good summary
statistics it is valuable for spotting outliers in your data set (i.e., look at the 1st and 99th
percentiles if there's the possibility of crazy data). You do not need to sort the data first unless
you use a by statement.
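A minimal sketch, assuming a data set x with a numeric variable a. The default output includes a table of quantiles (among them the 1st and 99th percentiles) and a list of the extreme observations:
proc univariate data = x ;
var a ;
run;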
Plot
Proc plot does not provide very pretty graphs, but for quick and dirty diagnostics, it's sometimes
useful to plot your data. Suppose you were interested in what tax rates were by state over time.
You'd need to sort the data by these class variables (state, year) before plotting:
proc sort data = gdp_tax;
by state year ;
run;
proc plot data = gdp_tax ;
by state ;
plot taxrate*year ;
run;
Corr(elation)
Proc corr produces tables of correlations (as well as simple descriptive statistics, the number of
observations in common, etc.). The syntax is straightforward:
proc corr data = x;
var a b ;
run;
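This will produce the correlation between a and b along with simple descriptive statistics for each
variable; variances and covariances are printed if you add the cov option. See the manual for
details. You can also use a by statement to do the analysis by any class variable you're using
(the data must be sorted by that variable first):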
proc corr data= x;
var a b ;
by year ;
run;
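Covariances can be requested explicitly with the cov option on the proc corr statement. A sketch, using the same assumed data set x:
proc corr data = x cov ;
var a b ;
run;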
REG
Proc reg is the basic (ordinary least squares) procedure for regression analysis and is the least
flexible. You will find documentation in the STAT manual. By default, an intercept is included.
The basic syntax (with no intercept here) is:
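A sketch matching the description that follows, assuming an input data set named indata (a made-up name) and treating REG1 as a model label, which is one way to attach a name to a model's output:
proc reg data = indata ;
reg1: model y = x1 x2 x3 / noint ;
output out = outdata p = yhat r = ehat ; /* fitted values and residuals */
test1: test x1 = 0 ;
test2: test x1 = 0, x2 = 0 ;
run;
quit;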
This regresses y on x1, x2 and x3 with no intercept and prints the title REG1 for all of the output
associated with this model. It outputs the fitted values (named yhat by me) and the residuals
(named ehat by me) to a data set outdata. The parameter estimates are not included in that output
data set; to save them, add the outest= option to the proc reg statement. If you need the
coefficients for further calculations, proc model is another option.
The first test (test1) is a test on the coefficient for x1 (equivalent to the t-test). Test2 is an
F-test that the coefficients on x1 and x2 are jointly zero (zero is the default hypothesized value).
Note: Add the /ACOV option to the MODEL statement to obtain the heteroscedasticity-consistent
covariance matrix of the estimates, from which robust standard errors can be computed. Add a
WEIGHT statement to run WLS regressions.
Autoreg
Proc Autoreg estimates and forecasts linear regression models for time series data when the
errors are autocorrelated or heteroscedastic. If you want to get iterative Yule-Walker estimates,
the basic syntax is:
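A minimal sketch, assuming a data set indata with dependent variable y and regressors x1 and x2 (all made-up names) and a first-order autoregressive error; nlag = 1 is an arbitrary choice for illustration, and method = ityw requests the iterated Yule-Walker estimates:
proc autoreg data = indata ;
model y = x1 x2 / nlag = 1 method = ityw ;
run;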
Syslin
Tscsreg
Proc Tscsreg analyzes panel data. If you want fixed-effects estimates, the basic
syntax is:
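A minimal sketch, assuming a panel data set paneldata (a hypothetical name) sorted by state and year, with state as the cross-section identifier and year as the time identifier; fixone requests one-way (cross-section) fixed effects:
proc tscsreg data = paneldata ;
id state year ;
model y = x1 x2 / fixone ;
run;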
Probit
Proc Probit estimates parameters in a probit model with a binary dependent variable. The basic
syntax is
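A minimal sketch, assuming a data set indata in which y is a 0/1 variable and x1 and x2 are regressors (all made-up names). Listing y in a class statement tells the procedure to treat it as a categorical response; check the output to see which level of y is being modeled:
proc probit data = indata ;
class y ;
model y = x1 x2 ;
run;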
Lifereg
Proc Lifereg can be used to estimate the parameters of a Tobit model. The basic syntax is:
data whatever;
set libname.whatever;
if y>0 then lower_y=y; /* lower_y should be MISSING if y is zero */
upper_y=y;
run;
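A sketch of the estimation step that would follow, assuming regressors x1 and x2 (made-up names) and normally distributed errors. With the (lower, upper) response syntax, an observation whose lower bound is missing is treated as left-censored at the upper bound, which here is zero:
proc lifereg data = whatever ;
model (lower_y, upper_y) = x1 x2 / d = normal ;
run;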
Summary of Procedures