
Econ 6160

Econometric Analysis

SAS Notes
February 8, 2005

Availability
Resource
User Interface
SAS Language
  The DATA step
  The PROC step
  Comments
SAS Data
  SAS Variables
  Missing values
  Data libraries
Data Step
  Inputting data
  Outputting data - writing out text files
  Copying data - Set
  Functions
  If statements
  Merging data
Procedures
  Contents, Print
  Format
  Sort
  Means, Summary
  Univariate
  Plot
  Corr(elation)
  REG
  Autoreg
  Syslin
  Tscsreg
  Probit
  Lifereg
Summary of Procedures

Availability

Computer Lab of Habsham Building.

Resource

SAS Manual: Computer Lab of Habsham Building


Online SAS Manual: http://v8doc.sas.com/sashtml/
http://gsbwww.uchicago.edu/computing/research/SASManual/onldoc.htm
SAS Learning: http://www.ats.ucla.edu/stat/sas/ (recommended)

User Interface
When you open up SAS you'll see three windows: the program, log, and output windows.

The program window is where you will open your programs, edit your programs, save your
programs, and submit your programs. At the top of this window (on the bar) you will see the name
of the file you are currently working with (if it has been saved).

The log window will contain all of the relevant information about your current SAS session
including assigned library names and the version of SAS you're running. Once you submit a
program it will contain any warnings or error messages generated by your program as well as how
long each data and procedure step took, the number of observations in your data sets, etc. After
running a program, you should always look at your log window first. Often programs will
generate appropriate-looking output but will have errors that may be important.

The output window displays your output.

SAS Language
SAS programming statements are separated by semicolons (;) and can occupy more than one
line.
SAS programs consist of DATA steps and PROC steps.

The DATA step


• starts with a DATA statement
• ends with a RUN statement (or with the start of another DATA or PROC step)
• has programming statements in between
• typically creates a new SAS data set (usually from another SAS data set or another file)
• allows you to create new variables from existing ones
• facilitates various data management tasks (e.g. combining data sets).

The PROC step


• starts with a PROC statement
• ends with a RUN (or sometimes a QUIT) statement (or with the start of another step)
• has specification statements in between
• is typically a reporting routine (e.g. PROC PRINT)
• a data management routine (e.g. PROC SORT)
• or a statistical analysis routine (e.g. PROC REG)

Most SAS statements are part of a DATA step or a PROC step, but there are some that are used
outside of these. An example is the LIBNAME statement, which points SAS to a physical area on
the disk for reading and/or writing SAS data sets.

Comments
In a SAS program, comments can be written in two ways:

/* This is a comment */
* So is this;

SAS Data

SAS Variables
• Start with a letter or an underscore
• Have a maximum of 32 characters (letters and numbers plus underscore)
• Cannot contain blanks or special characters
• Can be lower, upper or mixed case

Missing values
• Numeric missing values are represented by a dot (.).
• Character missing values are just blanks.

Data libraries
SAS can read data from a variety of formats. Typically, users want to read in data from ASCII files.
ASCII files are plain text files: you could open them in WordPerfect or any text editor
and read them. SAS also reads permanent SAS data sets, which are (usually) created by
SAS. These data sets are binary (which matters if you're transporting them between systems) and
are not viewable in most other applications. SAS data sets typically contain more information than
ASCII files (unless the ASCII file has a header row). You can assign variable names, labels,
etc., and SAS keeps information such as when the data were created and how the data are sorted.

A SAS library is simply a place on your hard drive where you are storing SAS data sets. You tell
SAS where your data sets are by using a LIBNAME statement:
libname mywork 'c:\econ\temp' ;

This tells SAS that I want to create a library (or that I have a library of SAS data sets) on the
subdirectory c:\econ\temp and that within my program I will refer to this library as mywork.
Library names are limited to 8 characters. You typically put libname statements at the beginning of
your program (or in the autoexec.sas file).
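
As an illustration of how a library name is used once it is assigned, here is a minimal sketch that saves a temporary data set as a permanent one. It assumes a working data set called workdata already exists; permdata is simply an example name.

```sas
libname mywork 'c:\econ\temp' ;

/* Two-level name: library.dataset. This writes permdata into c:\econ\temp. */
data mywork.permdata ;
    set workdata ;   /* workdata is a temporary data set in the WORK library */
run;
```

A data set written with a one-level name (e.g. workdata) goes to the temporary WORK library and disappears when the SAS session ends.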

Data Step

Inputting data

1) Datalines/Cards — reading in program

In a data step, the datalines statement tells SAS to expect raw data within the program itself, while
the input statement tells SAS the order of the variables and their type (here year and date as well
as x, y, and z are numeric, while month is a character string, marked by $).

data workdata ;
input year month $ date x y z;
datalines;
1987 jan 20 1 2 3
1987 feb 22 3 4 5
1988 may 18 7 8 9
;
run;

This method is practical only for very small data sets.

2) Infile — reading in text files


To read in an ASCII file (say, C:\textfile.txt) with several variables (say: year, month, date, x, y, z)
and create a working SAS data set (say, WORKDATA) use the following syntax:

data workdata ;
infile 'c:\textfile.txt' ;
input year month $ date x y z;
run;

The infile statement tells SAS where the file is. By default, SAS assumes that the file is space
delimited. If you have some other delimiter, for instance a comma in a CSV (comma-delimited text)
file, use the dlm= (delimiter=) option:
data workdata ;
infile 'c:\textfile.txt' dlm = ',';
input year month $ date x y z;
run;
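
Real CSV files often have a header row and empty fields. The following is a sketch only, assuming the same file has variable names on its first line (an assumption; the file above is not described in that detail). It uses the dsd and firstobs= options of the infile statement.

```sas
data workdata ;
    /* dsd: comma is the delimiter, two commas in a row mean a missing value,
       and quotes around values are removed.
       firstobs = 2: start reading at line 2, skipping the assumed header row. */
    infile 'c:\textfile.txt' dsd firstobs = 2 ;
    input year month $ date x y z ;
run;
```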

3) Access procedure — reading from an Excel spreadsheet

Note: first save the Excel spreadsheet in the Excel 5.0/95 format.

proc access dbms=xls;


create libname.file.access; /* write the name of dataset here */
path='c:\econ\temp\file.xls'; /* write the original excel data file here */
scantype=yes; /* don’t have to change below */
getnames=yes;
assign=yes;
list all;
create libname.file.view;
select all;
run;

The data set libname.file can be used after this (but it won't be saved as a permanent SAS data set
unless you do so in a DATA step). PROC ACCESS can also be used to read in dBase files.

4) Import wizard — reading Excel and Access data through Menu Command

You can import Excel data using the Import Wizard. First, choose your original file. Second, choose
which table (sheet) in the Excel file should be imported. Third, give the new data set a name in the
Member text box. Once you have the SAS data set, you can use it directly in a data step with:
data libname.dataname

……
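
The wizard's steps can also be scripted with PROC IMPORT. This is a minimal sketch, not a tested recipe. It assumes SAS/ACCESS Interface to PC Files is installed, that the workbook is the c:\econ\temp\file.xls file used above, and that it contains a sheet named Sheet1 (a hypothetical name). The dbms identifier accepted for Excel varies with the SAS release.

```sas
libname mywork 'c:\econ\temp' ;

proc import datafile = 'c:\econ\temp\file.xls'
            out = mywork.dataname      /* name of the SAS data set to create */
            dbms = excel replace ;     /* replace: overwrite dataname if it exists */
    sheet = 'Sheet1' ;                 /* hypothetical sheet name */
    getnames = yes ;                   /* first row holds the variable names */
run;
```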

5) Stat/Transfer — transferring most of file formats to SAS data

This software makes it very easy to convert between data formats. However, it is better to use
the INFILE statement or the ACCESS procedure to read your data: when the source data change,
you can simply rerun the program to update your SAS data.

Outputting data — writing out text files


File is the opposite of infile: you use it to write SAS data sets out as ASCII files. To write
out a file (c:\writeout.txt) containing variables x, y and z from a SAS data set (WORKDATA) use
the following:
data _null_ ;
set workdata;
file 'c:\writeout.txt' ;
put x y z ;
run;

The first line (data _null_) tells SAS to run the data step without creating a SAS data set.
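
If the text file is meant for a spreadsheet or another program, a comma-delimited file with a header line is often more convenient. A minimal sketch, assuming the same WORKDATA data set; the header text is written by hand here:

```sas
data _null_ ;
    set workdata ;
    file 'c:\writeout.txt' dlm = ',' ;   /* separate values with commas */
    if _n_ = 1 then put 'x,y,z' ;        /* write the header once, on the first pass */
    put x y z ;
run;
```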

Copying data — Set


In each data step you have to tell SAS which data to use. If the data is already a SAS data set
(either a working data set or a permanent data set) you use the SET statement. To use only some
variables from a working data set (say, WORKDATA from above) you could write:
data whatever (keep = year x);
set workdata ;
run;

or, equivalently,

data whatever (drop = month date y z);


set workdata ;
run;

You'll choose keep versus drop based on whichever requires the least typing. It's a good
idea to eliminate variables you're not using at the time, both in terms of conservation of memory
and hard drive space. If instead you want to use data from a permanent data set (PERMDATA in
your library MYWORK) you could write:
data whatever ;
set mywork.permdata;
run;

In the set statement the syntax is LIBNAME.DATASET .

Functions
Mathematical operations and functions can only be used in a data step. Adding, dividing, taking
square roots, trigonometric functions, rounding, etc. are all available in a data step.
The language guide contains a list of functions, but you can just look in the index to
find whatever you want; most have obvious names (e.g., ROUND is used to round
off numbers).

Raising to the nth power requires double asterisks, e.g., x**0.5 is the square root of x.

If you are dividing, use an if statement whenever the divisor may sometimes be zero (ditto
if you're taking logs, etc.; otherwise SAS will generate a lot of error messages every time it
encounters an illegal operation):
data whatever ;
set workdata ;
if x ne 0 then v = (y+z)/x ;
run;
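
The same guarding idea applies to other functions. The sketch below assumes the variables x, y and z from WORKDATA; the new variable names (ratio, logx, rootx, yround) are made up for illustration.

```sas
data whatever ;
    set workdata ;
    if x ne 0 then ratio = (y + z) / x ;   /* ratio stays missing when x is 0 */
    if x > 0 then do ;
        logx  = log(x) ;                   /* natural log, defined only for x > 0 */
        rootx = sqrt(x) ;                  /* same result as x**0.5 */
    end ;
    yround = round(y, 0.1) ;               /* round y to the nearest tenth */
run;
```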

Notes:
Unless you specify a keep or drop statement, or unless you overwrite the variables, you keep all
of the original variables plus whatever you create.

If you're using U.S. state or county data or data with zip codes, SAS has reserved functions (e.g.,
ZIPSTATE) which convert zip codes to states, etc.

See the back of the SAS language guide (the card) for FUNCTIONS and additional statements
used in the data step.

If statements
If statements (subsetting if, if-then, if-then-else) are all used within a data step. See the language
guide for syntax (I always have to look it up). A subsetting if just selects the observations that meet
its criterion. For example, if we were only interested in data from the month of March for
1978-1987, we could use the following subsetting if:

data whatever ;
set workdata ;
if month = 'March' and 1978<=year<=1987 ;
run;

Note that because month is a character variable, its value must be enclosed in quotes.
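
For comparison with the subsetting if, here is a short sketch of if-then-else and of if-then with delete. It assumes the WORKDATA data set read in earlier; the variable period and the cutoff year are invented for illustration.

```sas
data whatever ;
    set workdata ;
    length period $ 5 ;                   /* hypothetical new character variable */
    if year < 1988 then period = 'early' ;
    else period = 'late' ;
    if month = 'jan' then delete ;        /* drop January observations */
run;
```

A subsetting if keeps the observations that satisfy the condition, while if ... then delete removes them.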

Merging data
In order to merge data you need to sort it according to some CLASS variable. For example, if
you have annual state GDP (STGDP) in one SAS data set and annual state income tax rates in
another (INTAX) you will need to sort each by the class variables year and state before you can
merge on these.
proc sort data= stgdp; by year state ;run;
proc sort data = intax ; by year state ; run;
data gdp_tax; merge stgdp intax ; by year state ; run;

However, suppose you have another data set that only varies by state ~ say, square miles
(SQMILES). To merge this in you only need to sort it by state. However, the data set gdp_tax is
sorted by year, state. You should PLAN AHEAD if you know you'll be merging in a lot of data.
The sort procedure is very time consuming if you have large data sets. So a better way to merge
all of these data sets would be to do the following:

proc sort data= stgdp; by state year ;run;


proc sort data = intax ; by state year ; run;
data gdp_tax; merge stgdp intax ; by state year ; run;
proc sort data = sqmiles ; by state ;run;
data gdptax2; merge gdp_tax sqmiles; by state ; run;

Warning: when using a merge statement you will OVERWRITE duplicate variables with the
value that occurs in the last merged data set.
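
It is often useful to know which data set each merged observation came from. A minimal sketch using the in= data set option, assuming stgdp and intax are already sorted by state and year; in_gdp and in_tax are temporary flag names chosen here:

```sas
data gdp_tax ;
    merge stgdp (in = in_gdp) intax (in = in_tax) ;
    by state year ;
    if in_gdp and in_tax ;   /* keep only state-years present in both data sets */
run;
```

Without the subsetting if, observations found in only one data set are kept, with missing values for the variables from the other.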

Procedures

Contents, Print
Proc contents allows you to see the names, labels, type, length, position, sorting, etc., of any
SAS data set. The syntax is:
proc contents data = x; run;

Proc print prints out SAS data sets. You can specify which variables you're interested in (the
default is all variables in the data set). If your variables have labels, you can tell SAS to print
them instead of the variable names. If you've created formats, you can use those here, too (see
below for proc format). The following prints variables a, b, and c using (already-created) labels:
proc print data = x label;
var a b c ;
run;

Format
Suppose you have a variable (GENDER) that takes the (numeric) value 1 for males, 2 for females.
You can create a format for the variable:

proc format;
value sex
1 = 'male'
2 = 'female' ;
run;

This creates a format called SEX that you can apply any time you use the variable
GENDER (I find it easier to give the format the same name as the variable, but
here I have chosen different names to illustrate the syntax). If you're printing the data, you can format
it so that in the column under gender you'll see male and female instead of 1's and 2's. This
is especially useful if you have a variable that takes a lot of values: you don't have to keep
the codes straight.

proc print data = x label;
var gender score grade ;
format gender sex. ;
run;
This tells SAS to format the variable GENDER using the created format SEX.

Sort
Data is sorted according to class variables (e.g., by year, state above). The syntax is
straightforward:

proc sort data =x ;


by var1 var2 ;
run;

Again, give a little forethought to your sorting. Data remains sorted in a particular order until you
re-sort it differently. If you have a large data set that you will use a lot, it is sometimes useful to
sort the permanent data set once instead of sorting it every time you read it into SAS.

Means, Summary

Proc means and proc summary perform the same functions (as far as I can tell), but means prints
its results by default whereas summary prints nothing by default.
Unless you specify otherwise, proc means produces a table of the number of observations, means,
standard deviations, minimums and maximums. Other statistics are available (e.g., sum); see the
procedures guide. Suppose you're interested not only in the summary statistics of a data set but
also want to use them in additional analysis. You can output statistics to another data set by doing the following:

proc means data = x noprint;


var a b c ;
output out = z sum = asum bsum csum ;
run;

Noprint means that SAS will not generate the printed output of means, etc. in the output window.
This sums the variables a,b, and c and writes them to a SAS data set z as the variables asum, bsum,
and csum, respectively.

If instead you were interested in the sums but really didn't need to do anything other than look at
them, you could use the following (note that sum is not one of the proc means default stats):
proc means data = x sum ;
var a b c ;
run;

This would print the sum of these variables to the output screen.

Suppose that you were interested in, say, average test scores by gender. You can sort the data by
gender and take the average using proc means.
proc sort data = scores ;
by gender ;
run;
proc means data = scores ;
var test1 ;
by gender ;
run;

This would generate means, std. deviations, etc. by gender. But what if you were interested in
scores by gender AS WELL AS overall? You would use a class statement, not a by statement:
proc means data = scores ;
var test1 ;
class gender ;
output out =scormean mean = testmean ;
run;

This generates the means (testmean) by gender and overall. The output data set will look like:

GENDER  _TYPE_  _FREQ_  TESTMEAN
     .       0      10        84
     1       1       4        82
     2       1       6        85

This tells you that overall there are 10 students and the mean test score is 84, there are four
males, six females, etc...
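
If you want only the by-gender rows in the output data set, without the overall row, one option is nway on the PROC MEANS statement. A minimal sketch using the same data set and variable names as above:

```sas
proc means data = scores noprint nway ;
    class gender ;
    var test1 ;
    output out = scormean mean = testmean ;   /* one row per gender, no overall row */
run;
```

Alternatively, keep the full output and select rows afterwards in a data step using the automatic variable _TYPE_ (for example, if _type_ = 1 ;).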

Univariate
Proc univariate creates statistics that means/summary do not produce by default, e.g., percentiles
and quartiles. I've never really used it, but I'm told that in addition to providing good summary
statistics it is valuable for finding outliers in your data set (i.e., look at the 1st and 99th
percentiles if there's the possibility of crazy data). You do not need to sort the data first;
sorting is only required for variables named in a by statement.
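
A minimal sketch of pulling the 1st and 99th percentiles of one variable into a data set, assuming a data set x with a numeric variable a (as in the other examples). The output data set name pctiles and the prefix a_p are chosen here for illustration.

```sas
proc univariate data = x noprint ;
    var a ;
    /* pctlpts lists the percentiles wanted; pctlpre gives the name prefix,
       so the output variables are a_p1 and a_p99 */
    output out = pctiles pctlpts = 1 99 pctlpre = a_p ;
run;
```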

Plot
Proc plot does not provide very pretty graphs, but for quick and dirty diagnostics, it's sometimes
useful to plot your data. Suppose you were interested in what tax rates were by state over time.
You'd need to sort the data by these class variables (state, year) before plotting:
proc sort data = gdp_tax;
by state year ;
run;
proc plot data = gdp_tax ;
by state ;
plot taxrate*year ;
run;

Corr(elation)
Proc corr produces tables of correlations (as well as simple descriptive statistics, the number of
observations, etc.). The syntax is straightforward:
proc corr data = x;
var a b ;
run;

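By default this prints simple statistics (mean, standard deviation, etc.) for a and b and their
correlation; add the cov option to the PROC CORR statement to also print variances and covariances.
See the manual for details. You can also use a by statement to do the analysis by any class variable
you're using (the data must first be sorted by that variable):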
proc corr data= x;
var a b ;
by year ;
run;

REG
Proc reg is the basic (ordinary least squares) procedure for regression analysis and is the least
flexible. You will find documentation in the STAT manual. By default, an intercept is included.
The basic syntax (with no intercept here) is:

proc reg data = whatever ;


reg1: model y = x1 x2 x3 / noint ;
output out = outdata p = yhat r = ehat ;
run;

This regresses y on x1, x2 and x3 with no intercept and prints the label REG1 on all of the output
associated with this model. It outputs the fitted values (named yhat by me) and the residuals
(named ehat by me) to a data set outdata. The OUTPUT statement does not write the vector of parameter
estimates to a data set. If you need these coefficients for calculations and you don't feel like
copying them by hand, add the OUTEST= option to the PROC REG statement (e.g., proc reg data = whatever
outest = est ;), which saves the estimates to the data set you name.

You can test restrictions using TEST:

proc reg data = whatever ;


reg1: model y = x1 x2 x3 / noint ;
test1: test x1 = 0 ;
test2: test x1,x2 ;
output out = outdata p = yhat r = ehat ;
run;

The first test (test1) tests that the coefficient on x1 is zero (equivalent to the t-test on that
coefficient). Test2 is an F-test that the coefficients on x1 and x2 are jointly zero (zero is the
default when no value is given).

Note: Add the ACOV option to the MODEL statement to calculate heteroscedasticity-robust
standard errors. Add a WEIGHT statement to run WLS regressions.
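
The two notes above can be sketched as follows. The weight variable w is hypothetical (it is not part of the earlier examples), and the labels ols and wls are arbitrary.

```sas
proc reg data = whatever ;
    /* acov prints the heteroscedasticity-consistent covariance matrix of the estimates */
    ols: model y = x1 x2 x3 / acov ;
run;
quit;

proc reg data = whatever ;
    weight w ;                      /* w: hypothetical variable holding the WLS weights */
    wls: model y = x1 x2 x3 ;
run;
quit;
```

Robust standard errors are the square roots of the diagonal elements of the matrix printed by acov.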

Autoreg

Proc Autoreg estimates and forecasts linear regression models for time series data when the
errors are autocorrelated or heteroscedastic. If you want to get iterative Yule-Walker estimates,
the basic syntax is:

proc autoreg data = whatever ;


reg1: model y = x xlag1 xlag2 xlag3 /nlag=1 method=ITYW;
run;
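
The lagged regressors in the model above have to exist in the data set before the procedure runs. A minimal sketch of creating them in a data step with the LAG functions, assuming the observations in whatever are already in time order:

```sas
data whatever ;
    set whatever ;         /* assumed to be sorted by time already */
    xlag1 = lag(x) ;       /* value of x one observation earlier */
    xlag2 = lag2(x) ;      /* two observations earlier */
    xlag3 = lag3(x) ;      /* three observations earlier */
run;
```

The first few observations have missing lags, so they are not used in estimation.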

Syslin

Proc Syslin estimates parameters in an interdependent system of linear regression equations. If
you want to get 2SLS estimates, the basic syntax is:
proc syslin data=whatever 2sls;
endogenous education;
instruments mother_educ;
model wage = education;
run;

Tscsreg

Proc Tscsreg analyzes panel data. If you want to get fixed effect estimates, the basic
syntax is:

proc tscsreg data=whatever;


id place time;
model y = x1 x2 x3 /fixone;
run;
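
Proc Tscsreg expects the data to be sorted by the cross-section variable and then by time. The sketch below adds that sort and, for comparison, fits a one-way random effects model with the ranone option instead of fixone. It uses the same hypothetical variable names as above.

```sas
proc sort data = whatever ;
    by place time ;                   /* cross-section id first, then time */
run;

proc tscsreg data = whatever ;
    id place time ;
    model y = x1 x2 x3 / ranone ;     /* one-way random effects */
run;
```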

Probit

Proc Probit estimates parameters in a probit model with a binary dependent variable. The basic
syntax is

proc probit data=whatever;


class binary_y;
model binary_y = x1 x2 x3;
run;

Lifereg

Proc Lifereg can be used to estimate the parameters of a Tobit model. The basic syntax is

data whatever;
set libname.whatever;
if y>0 then lower_y=y; /* lower_y should be MISSING if y is zero */
upper_y=y;
run;

proc lifereg data=whatever;


model (lower_y, upper_y) = x1 x2 x3 x4 / d=normal;
run;

Summary of Procedures

Procedure                           Purpose                    Module

Access                              Read Excel data            SAS/ACCESS
Contents, Print                     List data                  Basic
Format                              Generate variable labels   Basic
Sort                                Sort data                  Basic
Means, Summary, Freq, Univariate    Descriptive statistics     Basic
Plot                                Plot data                  Basic
Corr                                Correlation coefficient    Basic
Reg                                 Multiple regression        SAS/STAT
Autoreg                             Time series                SAS/ETS
Syslin                              Two-stage least squares    SAS/ETS
Tscsreg                             Panel regression           SAS/ETS
Probit                              Probit regression          SAS/STAT
Lifereg                             Tobit regression           SAS/STAT
