
SAS Training Session 1

Getting Started with SAS

Sun Li
Centre for Academic Computing
lsun@smu.edu.sg

Outline
Section 1: Getting to know SAS
See an overview of SAS
Explore the SAS workspace
Work with SAS data sets
Create and run SAS programs
Work with SAS dates and times

<10-min Break>

Section 2: Exploring sample code for SAS DATA steps
Create SAS data sets from raw data or from existing SAS data sets
Manage data
Use SQL to query and combine SAS data sets

Software Installation
Faculty:
CAC website: https://mercury.smu.edu.sg/research_portal/index.asp => Software Requisition

Postgraduate students:
Obtain your supervisor's consent. Ask your supervisor to do one of the following:
Send a consent email to cacstaff@smu.edu.sg, stating the reason for the request and the student's details.
Raise a Software Requisition request directly from the CAC website, stating the reason and the student's details.

A CAC Support Engineer will contact you once the request has been received.

See an overview of SAS

Data access
You can read raw data in almost any format, from any kind of file, including variable-length records, binary files, and free-formatted data, even files with messy or missing data.

Managing your data


After you've accessed your data, you can use the SAS programming language to manipulate it any way you choose. For example, you can format your data, create variables, use functions to create data values, subset data, perform conditional processing, merge data sources, etc.

Analysis & Presentation


SAS provides powerful data analysis tools. You can create graphs and generate a variety of statistics and models. For reporting and displaying results, SAS offers a wide range of output formats, including HTML, PDF, PostScript, RTF and color graphs.

Explore the SAS workspace



Solutions and Tools menus: a set of ready-to-use solutions, applications, and tools.

Help menu:

Work with SAS data sets


The most commonly used SAS files:
SAS data set: *.sas7bdat
SAS program file: *.sas
SAS list file (procedure output): *.lst
SAS log file: *.log

Spreadsheet data format:


SAS library:
All SAS files are stored in a SAS library, which is a group of files in the same folder or directory. To access a library, you assign it a name (also known as a libref, or library reference). A libref must be 8 characters or fewer. The full reference to a SAS file is libref.filename.


Pre-assigned SAS libraries:


Sashelp: a permanent library that contains sample data and other files that control how SAS works at your site. This is a read-only library.
Sasuser: a permanent library that contains SAS files in the Profile catalog that store your personal settings.
Work: a temporary library for files that do not need to be saved from session to session.

Manage files in a SAS library:


Open: double-click to open a library or file.
View data properties: right-click and select Properties.
Copy files: click and drag in Show Tree mode. With the Explorer window active, go to menu View -> Show Tree.
View column format: open the SAS data file, right-click a column heading, and select Column Attributes.


SAS data formats & informats:


Formats are variable attributes that affect the way data values are written. The general form of a format is: <$>format<w>.<d>
$ indicates a character format;
format is the format name;
w is the total width, including decimal places and special characters;
. (period) is a required delimiter;
d is the number of decimal places.

Example:
Data          Format
12234.21      8.2
KATHY         $5.
12,234.21     COMMA9.2
$12,234.21    DOLLAR10.2

An informat (input format) is the instruction that specifies how SAS reads raw data. Its general form is the same as that of a format.
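As a minimal sketch of the example formats above, the following DATA step writes the same values to the SAS log with the PUT statement. The variable names amount and name are made up for illustration, and `data _null_` runs the step without creating a data set.

```sas
data _null_;
  amount = 12234.21;       /* hypothetical numeric value   */
  name   = 'KATHY';        /* hypothetical character value */
  put amount 8.2;          /* writes 12234.21   */
  put name $5.;            /* writes KATHY      */
  put amount comma9.2;     /* writes 12,234.21  */
  put amount dollar10.2;   /* writes $12,234.21 */
run;
```

The stored value of amount never changes; only the way it is written differs from one PUT statement to the next.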

Create and run SAS programs

Three major steps:


DATA step
Enter, manage and manipulate data
Merge, recode, label and format data
Convert non-SAS data sets into SAS data sets

PROC step
Conduct statistical analysis and other procedures
Output data into tables and graphs
Convert data into other formats

Macro
Streamline repetitive sections of programs
Macro code is preprocessed before the resulting SAS code runs

Components of SAS programs:

SAS statements:
begin with an identifying keyword;
end with a semicolon.

SAS steps:
begin with a DATA statement or a PROC statement;
end with a RUN statement, a QUIT statement, or the beginning of another step.

SAS ignores extra spaces, line breaks, and blank lines.
To write a comment, either:
begin with an asterisk (*) and end with a semicolon (;), or
begin with /* and end with */.
SAS statements and commands are not case sensitive.
SAS keywords are color-coded.
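The rules above can be seen in a short program. This is only an illustrative sketch: it assumes the sample table sashelp.class that normally ships with SAS, and the output data set name class_copy is made up.

```sas
* A comment statement: starts with an asterisk and ends with a semicolon;

/* First step: a DATA step made of DATA, SET and RUN statements */
data work.class_copy;
  set sashelp.class;
run;

/* Second step: a PROC step. Keywords are not case sensitive,
   and the extra spaces and line breaks are ignored. */
PROC PRINT
     data = work.class_copy;
RUN;
```

Each step here ends with an explicit RUN statement, although the PROC statement alone would also have ended the DATA step.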

Question:
How many SAS statements? How many SAS steps?


Submit SAS programs and view output:


Select the program lines and click menu Run -> Submit. Any program output is displayed in the Output window. For graphs, a Graph window opens.

Save SAS programs and output:


With the Editor window / Output window active, click menu File -> Save / Save as.

Operators

Arithmetic operators:
Operator   Operation        Example
**         Exponentiation   x**2, x**.5
*          Multiplication   x*y, x*12
/          Division         year/12
+          Addition         dose1+dose2
-          Subtraction      x-y

Comparison operators:
Operator   Alternative   Meaning
=          eq            Equal
^= ~=      ne            Not equal
>          gt            Greater than
<          lt            Less than
>=         ge            Greater than or equal to
<=         le            Less than or equal to

Logical operators:
Operator   Alternative   Meaning
&          and           Both must be true.
|          or            At least one must be true.
^ ~        not           The expression is not true.
in                       The value on the left matches at least one of the group on the right.
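To make the operators concrete, here is a small sketch that combines arithmetic, comparison and logical operators in one DATA step. It assumes the sample table sashelp.class (with variables Age, Sex, Height and Weight); the new variable names and the grouping rule are invented purely for illustration.

```sas
data work.op_demo;
  set sashelp.class;
  height_sq = height**2;        /* exponentiation */
  ratio     = weight / height;  /* division       */
  /* mnemonic (ge, and, or, not) and symbolic (=) forms can be mixed */
  if age ge 13 and sex = 'F' then grp = 'teen girl';
  else if age in (11, 12) or not (sex = 'F') then grp = 'other A';
  else grp = 'other B';
run;

proc print data=work.op_demo (obs=5);
run;
```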

Work with SAS dates and times
SAS date and time values:
SAS date value: a value representing the number of days between January 1, 1960 and a specified date.
Date          SAS date value
Jan 1, 1959   -365
Jan 1, 1960   0
Jan 1, 1961   366

SAS time value: a value representing the number of seconds since midnight of the current day. SAS time values are between 0 and 86400.
Time                  SAS time value
midnight (12:00 am)   0
12:15 pm              44100
5:00 pm (17:00)       61200

SAS datetime value: a value representing the number of seconds between midnight on January 1, 1960 and an hour/minute/second within a specified date.
Date and time              SAS datetime value
July 4, 1776 11:30:23      -5790400177
Jan 1, 1960 midnight       0
April 22, 1989 16:10:45    924883845
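The stored values can be checked with date, time and datetime literals. The sketch below uses three of the examples from the tables above and writes both the raw numbers and formatted versions to the log; the variable names d, t and dt are arbitrary.

```sas
data _null_;
  d  = '01JAN1961'd;              /* date literal     */
  t  = '12:15't;                  /* time literal     */
  dt = '22APR1989:16:10:45'dt;    /* datetime literal */
  put d= t= dt=;                  /* writes d=366 t=44100 dt=924883845 */
  put d= date9. t= time8. dt= datetime20.;   /* same values, formatted */
run;
```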

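Common informats and formats for dates:
Date expression             SAS informat or format   SAS date value
02Jan2000                   DATEw.                   14611
02Jan00                     DATEw.                   14611
01/02/2000                  MMDDYYw.                 14611
January 02, 2000            WORDDATEw.               14611
Sunday, January 02, 2000    WEEKDATEw.               14611
Note: DATEw. and MMDDYYw. can be used both as informats (to read values) and as formats (to write values). WORDDATEw. and WEEKDATEw. are formats only: they display a date value in a long form like the ones shown, but cannot be used to read it.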

Handling two-digit years:
When a two-digit year value is read, SAS interprets it based on a 100-year span which starts with 1920.
Date expression   Interpreted as
12/07/41          12/07/1941
18Dec15           18Dec2015
04/15/30          04/15/1930
15Apr95           15Apr1995

However, you can override the default by specifying the YEARCUTOFF= system option in an OPTIONS statement:
options yearcutoff=1950;

Date expression   YEARCUTOFF=1920 (default)   YEARCUTOFF=1950
12/07/41          12/07/1941                  12/07/2041
18Dec15           18Dec2015                   18Dec2015
04/15/30          04/15/1930                  04/15/2030
15Apr95           15Apr1995                   15Apr1995
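The effect of YEARCUTOFF= can be reproduced with the INPUT function. This sketch reads the same two-digit-year string under both settings and writes the result to the log; note that the option stays in effect for the rest of the session until it is changed again.

```sas
options yearcutoff=1920;
data _null_;
  d = input('12/07/41', mmddyy8.);
  put d= mmddyy10.;    /* writes d=12/07/1941 */
run;

options yearcutoff=1950;
data _null_;
  d = input('12/07/41', mmddyy8.);
  put d= mmddyy10.;    /* writes d=12/07/2041 */
run;
```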


QUIZ 1
See the file QUIZ-SAS1.pdf.

Hint for Q4: The w value of the informat MMDDYY8. is too short to read the entire value, so the last two digits of the year are truncated.

Section 1: Getting to know SAS


We have learnt in Section 1:
SAS windows
SAS files
SAS library
SAS data formats & informats
SAS program steps & statements
SAS operators
SAS date & time values

Section 2: SAS DATA Steps


Exploring sample code for SAS DATA steps:
Create a SAS library
Read external raw data
Create SAS data sets from direct input
Sort and print data
Subset data: writing to a single or multiple SAS data sets
Data conversion between character and numeric values
Query data using the SQL procedure
Simple joins using the SQL procedure

Create SAS data sets

Create a SAS library:


To refer to a SAS file in a permanent library, use a two-level name: libref.filename. To refer to a SAS file in the Work library (the temporary library), you can omit the libref work and write the filename directly. Example:
libname sas1 'E:\lsun';
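A short sketch of the difference between one-level and two-level names follows. It reuses the folder path from the example above, which must exist on your machine, and assumes the sample table sashelp.class is available; the data set name class is only an illustration.

```sas
libname sas1 'E:\lsun';     /* folder from the example above */

/* Two-level name: stored permanently in the sas1 library */
data sas1.class;
  set sashelp.class;
run;

/* One-level name: stored in the temporary Work library */
data class;
  set sashelp.class;
run;

/* work.class and class refer to the same temporary data set */
proc contents data=work.class;
run;
```

The copy in sas1 is still there in the next session (after the libref is assigned again), while the Work copy is deleted when the session ends.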

Read external raw data:

Standard and nonstandard numeric data:
Example of standard numeric data: 15 -15 15.4 +1.23 1.23E3
Example of nonstandard numeric data: 5,000 $1,000.00 10/31/02 (23)

Three methods to read data into SAS as a SAS data set:

Formatted input:
reads standard and nonstandard character and numeric data;
reads calendar-style date values and converts them to SAS date values.

Column input:
reads data in fixed columns;
reads standard character and numeric data.

Point-and-click: SAS Import Wizard
Click menu File -> Import Data.
Supports Microsoft Excel (*.xls), comma-separated values (*.csv), tab-delimited files (*.txt), and some database data sets.

Example:

sp1.txt
0,70,4,1,1,general,4feb1989
1,121,4,2,1,vocati,11nov1989
0,86,4,3,1,general,22oct1991
0,141,4,3,1,vocati,4feb1993

** read standard and nonstandard data;
DATA sas1.sp1;
  infile 'E:\lsun\sp1.txt' dlm=',';
  input gender id race ses schtyp prgtype $ date : date9.;
RUN;

infile: identifies an external raw data file to read.
input: lists the variables to read from the input file.

sp2.txt
Billy Ray Washington12M
Sally Hamrick       19M
Robert J. Castle    45M
Glenda Smith-Neil   21F

Codebook for sp2.txt:
Variable name   Column number
Name            1-20
Age             21-22
Gender          23

DATA sas1.sp2;
  infile 'E:\lsun\sp2.txt';
  input name $ 1-20 age 21-22 gender $ 23;
RUN;

Create SAS data sets from direct input:

** create SAS data set from direct input;
DATA quarter1;
  length Department $ 7 Site $ 8;
  input Department Site Quarter Sales;
  datalines;
Parts Sydney 1 4043.97
Parts Atlanta 1 6225.26
Parts Paris 1 3543.97
Repairs Sydney 1 5592.82
Repairs Atlanta 1 9210.21
Repairs Paris 1 8591.98
Tools Sydney 1 1775.74
Tools Atlanta 1 2424.19
Tools Paris 1 5914.25
;
RUN;
PROC PRINT data=quarter1;
RUN;

length: specifies the length of variables.
datalines: indicates that the data lines follow within the program.

Manage SAS data sets
Sort and print data:

PROC CONTENTS: displays the descriptor portion of a SAS data set.
PROC PRINT: displays the data portion of a SAS data set.
PROC SORT: sorts a SAS data set.
where: selects observations that meet a certain condition.
var: lists the variables to print.

** sort and print data;
PROC CONTENTS data=sas1.prdsale;
RUN;
PROC CONTENTS nods data=work._all_;
RUN;

PROC SORT data=sas1.prdsale;
  by product descending actual;
RUN;
PROC SORT data=sas1.prdsale out=sortsale;
  by product;
RUN;

PROC PRINT data=sas1.prdsale (firstobs=100 obs=200);
RUN;
PROC PRINT data=sas1.prdsale;
  where year=1993;
  var actual predict product year;
RUN;

Subset data:

set: reads an existing SAS data set.
keep / drop: keeps or drops the named variables.
if-then-else: executes a statement only if the condition is true.

** writing to a single SAS data set;
DATA prdsale;
  set sashelp.prdsale;
RUN;
PROC PRINT data=prdsale (obs=10);
RUN;

DATA diffsale;
  set prdsale;
  diff=actual-predict;
  drop actual predict;
RUN;

DATA lowsale;
  set prdsale;
  if predict < 500;
RUN;
PROC PRINT data=lowsale (obs=10);
RUN;

DATA sale_cat;
  length cat $ 8;
  set prdsale;
  if 0 <= actual < 400 then cat='low';
  else if 400 <= actual < 600 then cat='moderate';
  else cat='high';
  keep actual cat product year;
RUN;

do-end: runs an iterative DO loop.
point=: accesses an observation directly by its number; stop: ends the DATA step so that it does not loop continuously.
abort: stops execution of the current DATA step.
output: writes the current observation explicitly, optionally to a named data set.

DATA product_sample;
  do i=1 to 1440 by 10;
    set prdsale point=i;
    if _error_ then abort;
    output;
  end;
  stop;
RUN;
PROC PRINT data=product_sample (obs=10);
RUN;

** Writing to multiple SAS data sets;
DATA lperf mperf hperf;
  drop cat;
  set sale_cat;
  if cat='low' then output lperf;
  else if cat='moderate' then output mperf;
  else output hperf;
RUN;
PROC PRINT data=lperf (obs=10);
RUN;
PROC PRINT data=mperf (obs=10);
RUN;
PROC PRINT data=hperf (obs=10);
RUN;


QUIZ 2
See the file QUIZ-SAS1.pdf.

Hint for Q3: To avoid a continuous loop when using direct access, either include a STOP statement or use programming logic that checks for an invalid value of the POINT= variable. If SAS reads an invalid value, it sets the automatic variable _ERROR_=1. You can use that information to check for conditions that cause continuous processing.
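As a possible variation on the direct-access example, the loop bound does not have to be hard-coded. The sketch below uses the NOBS= option of the SET statement, which stores the number of observations in a temporary variable before the step executes; the names product_sample2 and total are made up for illustration.

```sas
data product_sample2;
  /* total is filled in at compile time, so it can be used before SET runs */
  do i = 1 to total by 10;
    set sashelp.prdsale point=i nobs=total;
    output;
  end;
  stop;   /* still required: POINT= never reaches an end-of-file condition */
run;

proc print data=product_sample2 (obs=10);
run;
```

The POINT= and NOBS= variables are temporary, so neither i nor total is written to the output data set.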

Data conversion:

**character-to-numeric conversion;
DATA conversion1;
  cvar1='32000';
  cvar2='32,000';
  cvar3='03may2008';
  cvar4='050308';
  nvar1=input(cvar1, 5.);
  nvar2=input(cvar2, comma6.);
  nvar3=input(cvar3, date9.);
  nvar4=input(cvar4, mmddyy6.);
RUN;
PROC CONTENTS data=conversion1;
RUN;
PROC PRINT data=conversion1;
RUN;

input: explicitly converts character values to numeric values.
NumVar=INPUT(source, informat);

SAS automatically converts a character value to a numeric value when the character value is used in a numeric context, such as an arithmetic operation or a function that takes numeric arguments.

**numeric-to-character conversion;
DATA conversion2;
  nvar1=614;
  nvar2=55000;
  nvar3=366;
  cvar1=put(nvar1, 3.);
  cvar2=put(nvar2, dollar7.);
  cvar3=put(nvar3, date9.);
RUN;
PROC CONTENTS data=conversion2;
RUN;
PROC PRINT data=conversion2;
RUN;

put: explicitly converts numeric values to character values.
CharVar=PUT(source, format);

Note: Numeric formats right-align the results. Character formats left-align the results.

SAS automatically converts a numeric value to a character value when the numeric value is used in a character context, such as an assignment to a character variable or a function that accepts character arguments.
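For contrast with the explicit INPUT and PUT calls, here is a small sketch of automatic conversion in both directions. The variable names are invented; when it runs, SAS should write notes to the log saying that character values were converted to numeric values and numeric values to character values, which is usually a cue to switch to explicit conversion.

```sas
data _null_;
  cvar = '32000';
  nsum = cvar + 1;        /* character value used in arithmetic: converted to numeric */
  length ctext $ 12;
  ctext = 614;            /* numeric value assigned to a character variable */
  put nsum=;              /* writes nsum=32001 */
  put ctext=;
run;
```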

PROC SQL -- query data:

Structured Query Language (SQL) is a standardized, widely used language that retrieves and updates data in tables and views based on those tables.

DATA payroll;
  input IdNumber $ 1-4 Sex $ 6 Jobcode $ 8-10 Salary 12-16
        @18 Birth date7. @26 Hired date7.;
  datalines;
1009 M TA1 28880 02MAR59 26MAR92
1017 M TA3 40858 28DEC57 16OCT81
1036 F TA3 39392 19MAY65 23OCT84
1037 F TA1 28558 10APR64 13SEP92
1038 F TA1 26533 09NOV69 23NOV91
1050 M ME2 35167 14JUL63 24AUG86
1065 M ME2 35090 26JAN44 07JAN87
1076 M PT1 66558 14OCT55 03OCT91
1094 M FA1 22268 02APR70 17APR91
1100 M BCK 25004 01DEC60 07MAY88
;
RUN;

**retrieve data using a query;
PROC SQL;
  select Jobcode,
         count(jobcode) as number label='Number',
         avg(int((today()-birth)/365.25)) as average format=2. label='Average Age',
         avg(salary) as avgsal format=dollar8. label='Average Salary'
    from payroll
    group by jobcode;
QUIT;

select: specifies the columns to retrieve.
format: assigns a format to the column.
label: assigns a label to the column.
from: specifies the table to select from.
group by: groups data by the named variable.
quit: ends the SQL procedure.
create table - as: creates a table from the results of the subsequent query.

**create a new table from a query;
PROC SQL;
  create table bonus as
  select IdNumber, Salary format=dollar8.,
         salary*.025 as Bonus format=dollar8.
    from payroll;
QUIT;

PROC SQL -- join tables:

Inner joins:

PROC SQL enables you to:
join tables and produce a report in one step without creating a SAS data set;
join tables without sorting the data first;
use complex matching criteria.

PROC SQL;
  create table oildata as
  select p.country, barrelsperday 'Production', barrels 'Reserves'
    from sas1.oilprod as p, sas1.oilrsrvs as r
    where p.country = r.country
    order by barrelsperday desc;
QUIT;

The SELECT clause selects variables with labels assigned. Because the Country columns are common to both tables, you qualify them in the SELECT and WHERE clauses with the table aliases (p and r) that are assigned in the FROM clause. You could also qualify the columns by prefixing the column names with the table names. The WHERE clause selects only rows that have matching values for Country. The ORDER BY clause orders the output in descending order by the BarrelsPerDay column. Because that column name appears only in the OilProd table, you don't have to qualify it.
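The same inner join can also be written with the explicit INNER JOIN ... ON syntax, which keeps the matching condition next to the tables it relates. This is only a sketch: it assumes that sas1.oilprod and sas1.oilrsrvs exist as in the example above, and the output table name oildata2 is made up.

```sas
proc sql;
  create table oildata2 as
  select p.country,
         p.barrelsperday 'Production',
         r.barrels 'Reserves'
    from sas1.oilprod as p
         inner join sas1.oilrsrvs as r
         on p.country = r.country
    order by p.barrelsperday desc;
quit;
```

Here every column is qualified with its table alias. That is optional for columns that occur in only one table, but it makes the origin of each column explicit.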


QUIZ 3
See the file QUIZ-SAS1.pdf.

Resources and books


You may find the following books helpful:
Combining and Modifying SAS Data Sets: Examples
The Little SAS Book: A Primer, Second Edition
SAS SQL Procedure User's Guide
Step-by-Step Programming with Base SAS Software

CAC statistical consultation support:
CAC statistical wiki page: http://research2.smu.edu.sg/CAC/StatisticalComputing/Wiki/SAS.aspx
Statistical consultation service: lsun@smu.edu.sg

Next session
Statistical Analysis Using SAS
06 Oct, Monday, 9.30am-12pm, Training Room @ Library Level 5
Using the Output Delivery System (ODS) to produce output
Producing summary reports using PROC MEANS & PROC FREQ
Simple inference using PROC FREQ & PROC UNIVARIATE
Correlation using PROC CORR
Group comparison using PROC TTEST
ANOVA procedures using PROC ANOVA and PROC GLM
General linear regression using PROC GLM and PROC REG
Logistic models using PROC LOGISTIC
