Sie sind auf Seite 1von 135

1 Foreword

This Exam Preparation book is intended for those preparing for the
SAS Certified Base Programming for SAS 9 Exam.

The Art of Service is an Accredited Training Organization and has


been training IT professionals since 1998. The strategies and content
in this book is a result of experience and understanding of the SAS
programming, and the exam requirements.

This book is not a replacement for completing the course. This is a


study aid to assist those who have completed an accredited course
and preparing for the exam.

Do not underestimate the value of your own notes and study aids.
The more you have, the more prepared you will be.

While it is not possible to pre-empt every question that may be asked


in the SAS Certified Base Programmer exam, this book covers the
main concepts covered within the SAS Programming discipline.

The SAS Certified Base Programmer for SAS 9 is the ideal


certification for those relatively new to SAS programming or new to
SAS certification. It is also the principle certification for many of the
advanced certifications available from SAS.

These study notes and sample exam questions in this book will allow
you to more easily prepare for an SAS Programming exam.

Ivanka Menken
Executive Director
The Art of Service

Copyright The Art of Service


1 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Write a review - receive Any Free
eBook from our Catalog - $99 Value
If you recently bought this book we would love to hear from you!
Benefit from receiving a free eBook from our catalog at
http://www.emereo.org if you write a review on Amazon (eg the online
store where you purchased this book) about your last purchase!

How does it work?

To post a review on Amazon, just log in to your account and click on


the Create your own review (under Customer Reviews) button of the
relevant product page. You can find examples of product reviews in
Amazon. IF you purchased from another online store, simply follow
their procedures.

What happens when I submit my review?

Once you have submitted your review, send us an email at


review@emereo.org with the link to your review, and the eBook you'd
like as our thank you from http://www.emereo.org. Pick any book you
like from the catalog, up to $ 99 RRP. You will receive an email with
your eBook as download link. It's that simple.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 2
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
2 Table of Contents

1 Foreword............................................................................ 1
2 Table of Contents............................................................... 3
3 SAS Certified Base Programmer for SAS 9 ....................... 6
4 Exam Specifics .................................................................. 7
5 Exam Prerequisites............................................................ 7
6 SAS Fundamentals ............................................................ 8
6.1 SAS Programs ............................................................. 8
6.2 SAS Libraries ............................................................. 10
6.3 SAS Data Sets ........................................................... 14
6.4 SAS System Outputs ................................................. 17
7 Creating SAS Data Sets .................................................. 19
7.1 Raw Data Files........................................................... 19
7.2 The Data Step............................................................ 21
7.3 Processing DATA Steps ............................................ 27
7.4 Debugging DATA Steps............................................. 30
7.5 SAS Data Sets ........................................................... 33
7.6 Combining SAS Data Sets......................................... 37
7.7 Processing Match-Merge ........................................... 41
8 Working with Data............................................................ 44
8.1 Column Input ............................................................. 44
8.2 Reading Free Format Data ........................................ 47
8.3 Reading Data and Time Values ................................. 51
9 Working with Observations .............................................. 52
9.1 Creating One Observation From Multiple Records .... 52
9.2 Creating Multiple Observations From Single Records 54
10 Creating Reports.............................................................. 55
10.1 Basic Reports ......................................................... 55
10.2 Advanced Reporting ............................................... 63
10.3 Descriptive Statistics............................................... 68
10.4 Creating Frequency Tables .................................... 72
10.5 Producing HTML Output ......................................... 75
11 Advanced Work with Variables ........................................ 78
11.1 User-Defined Formatting ........................................ 78
11.2 Using SAS Functions.............................................. 85
Copyright The Art of Service
3 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
11.3 Expanding DO Loops.............................................. 93
11.4 Process Arrays ....................................................... 95
12 Troubleshooting SAS Programs ...................................... 98
12.1 Types of Errors ....................................................... 98
13 Practice Exam................................................................ 100
14 Answer Guide ................................................................ 111
14.1 Answers to Questions........................................... 111
15 References .................................................................... 118

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 4
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Notice of Rights
All rights reserved. No part of this book may be reproduced or transmitted in any form
by any means, electronic, mechanical, photocopying, recording, or otherwise, without
the prior written permission of the publisher.

Notice of Liability
The information in this book is distributed on an “As Is” basis without warranty. While
every precaution has been taken in the preparation of the book, neither the author
nor the publisher shall have any liability to any person or entity with respect to any
loss or damage caused or alleged to be caused directly or indirectly by the
instructions contained in this book or by the products described in it.

Trademarks
Many of the designations used by manufacturers and sellers to distinguish their
products are claimed as trademarks. Where those designations appear in this book,
and the publisher was aware of a trademark claim, the designations appear as
requested by the owner of the trademark. All other product names and services
identified throughout this book are used in editorial fashion only and for the benefit of
such companies with no intention of infringement of the trademark. No such use, or
the use of any trade name, is intended to convey endorsement or other affiliation with
this book.

Copyright The Art of Service


5 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
3 SAS Certified Base Programmer for SAS 9

SAS Certified Base Programmer for SAS 9 covers a basic


programming knowledge to develop efficient code to meet business
requirements. The certification is geared towards individuals who are
new to SAS programming or new to SAS certifications. The
certification program for Base Programmer for SAS 9 is the basis for
all other certifications provided by SAS.

A SAS Base Programmer should have some experience in data


management and SAS programming, specifically with tasks related
to:
x manipulating and transforming data
x importing and exporting raw data files
x combining SAS date sets
x creating basic reports
x correcting errors in data syntax and programming logic.

Certification is valid for three years, at which time a recertification


exam must be completed.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 6
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
4 Exam Specifics

Exams are delivered in a secure environment, proctored, and timed


by Prometric. Details about the exam include:
x 70 multiple-choice questions
x Score of 65%
x 2 hour testing time.

Tests can be scheduled online. Prometric requires a 24-hour waiting


period for first-time registrations of new candidate. The first-time
register will receive an email confirmation.

If a candidate fails the exam, they must wait a minimum of 30


calendar days before attempting again.

Candidates are required to bring two forms of identification to the


testing center on the day of their test. They are recommended to
arrive 15minutes before the test. If they are more than fifteen minutes
late, a seat may not be available, nor a refund guaranteed.

5 Exam Prerequisites

There are no prerequisites for SAS Certified Base Programmer for


SAS 9.

Copyright The Art of Service


7 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
6 SAS Fundamentals

Key components of SAS


x Programs
x Libraries
x Files
x Data sets.

6.1 SAS Programs

SAS programs are used to access, manage, analyze, or present data.


Most SAP programs are a combination of DATA and PROC
(procedure) steps.

DATA steps create or modify data sets, specifically:


x Entering data into a SAS data set
x Checking for errors in the data
x Computation of data
x Producing new SAS data sets
x Producing customer-designed reports.

PROC steps are pre-defined procedures to analyze and process data


in a SAS data set.

PROC steps are used to manipulate the data typically for reporting
purposes. As a result, PROC steps can:
x Create new SAS data sets
x List, sort, and provide summaries of data
x Produce statistical results
x Produce plots and charts.

6.1.1 Formatting and Behavior of SAS Programs


x Programs consist of statements.
x Statements start with a keyword.
x Statements end with a semicolon.
Copyright The Art of Service
Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 8
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Statements are in free format:
x Can begin and end anywhere on a line
x Can exist on multiple lines
x Can exist with multiple statements on a single line

Statements can be in uppercase or lowercase.


Case sensitivity is typically resolved using quotation marks.
Separation of words is resolved using spaces or special characters.

6.1.2 Processing Programs


Upon submission of a SAS program, SAS will read the statements.
While statements are being read, error checking is in place.

New steps begin with a DATA and PROC statement. The completion
of a step is identified by a RUN or QUIT statement.

SAS will read the step completely. When a new step is found, SAS
stops reading and will execute the step it just read.

When a step is executed, the processing activity is logged with the


results of the processing. The SAS log is a collection of these
messages as well as any errors that occurred.

General results of processing:


x Creation of a SAS data set
x Production of SAS log messages
x Creation of a report
x Sorting and managing data.

Copyright The Art of Service


9 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
6.2 SAS Libraries

Libraries are collections of files. Implementation of libraries depends


on the operating environment. A library can be a physical or logical
collection.

Summary of libraries in different operating environments:


x Windows – stored in the same directory
x UNIX – stored in the same directory
x OpenVMS – stored in the same directory
x CMS – same file type
x z/OS – specially formatted host data

Files can be stored temporarily or permanent. When a library name is


‘Work’ or unspecified, files are stored in a temporary SAS data library.
Temporary files are deleted at the end of a session.

Files are permanently stored in libraries that have a specified name


other than ‘Work’.

6.2.1 File Referencing


To reference, a two-level name is used. The format of a two-level
name is:
libref.filename.

libref is ‘library reference’, the SAS data library containing the file.
filename is the name of the specified file.

If there is no libref identified or ‘work’, the file reference is a temporary


file. All other files are permanent.

Several libraries are defined by default:


x ‘Sashelp’ is a permanent read-only library containing
sample data and other files that control the behavior of the
SAS
x ‘Sasuser’ is a permanent library containing SAS files that
Copyright The Art of Service
Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 10
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
stores personal settings in the Profile catalog
x ‘Work’ is a temporary library for files.

Defining libraries is often the first step in setting up a SAS session.

Assign a library name (libref):


x Use the LIBNAME statement
x Format:
LIBNAME libref 'SAS-data-library'
x libref is 1 to 8 characters long beginning with a letter or
underscore and containing only letters, numbers, or
underscores
x SAS-data-library is the name of the library where files are
stored
x Multiple LIBNAME statements can be used to assign as
many librefs as required.
x LIBNAME statements are global and stay in effect until the
libref is modified, canceled or the SAS session ends.
The files in the library are inaccessible if the library is deleted or the
session ends. The files still exist on the operating system.

Comparison of physical names of SAS data within operating systems


for the library “sample”:
x Windows – c:\sample\data
x UNIX - /users/month/sample/sasdata
x OpenVMS – dua0:[month.fitness]
x CMS – b
x z\OS – month.sample.sasdata

6.2.2 Referencing Other Formats


The LIBNAME statement can be used to reference files created by
other software programs.

To read or write to foreign files, an engine for the file type must be
used. A SAS engine is a set of internal instructions used by SAS.
Sometimes, the appropriate engine is brought up automatically; other
times, the engine must be specified.

Copyright The Art of Service


11 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
To specify the engine to be used
x Use the LIBNAME statement
x Format:
LIBNAME libref engine 'SAS-data-library'
x libref is 1 to 8 characters long beginning with a letter or
underscore and containing only letters, numbers, or
underscores
x engine is the name of the library engine used in the
operating system
x SAS-data-library is the name of the library where files are
stored

Interface library engines support read-only access to BMDP, ORISIS,


and SPSS files. Physical filenames associated with a libref are actual
filenames (a rule exception).
x Example: Using libref = Differ and the SPSS engine for
the c:\newdata.dat in the windows operating environment:
libname differ spss 'c:\newdata.dat'

Other types of data that can be read or written to are dependent on


the operating environment and the SAS/ACCESS products that are
licensed, including:
x Relational Databases
ORACLE
SYBASE
Infomix
DB2 for z/OS (OS/390)
DB2 for UNIX and PC
Oracle Rdb
ODBC
CA-Openingres

x Non-relational Files
ADABAS
IMS/DL-I
CA-IDMS
SYSTEM 2000

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 12
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
x PC Files
Excel (.xls)
Lotus (.wkn)
DBF
DIF

6.2.3 View Library Contents


The CONTENTS procedure creates output to:
x describe the contents of a library
x describe the descriptor information for a SAS data set.

To create SAS output of library contents:


x Use the PROC CONTENTS statement
x Format:
PROC CONTENTS DATA=libref._ALL_NODETAILS;
RUN;
x libref is the libref assigned to the SAS library
x _ALL _ requests a listing of all files in the library
x NODETAILS prevents detailed information about each file
from being in the output.

PROC DATASETS can be used to view the contents of a SAS library


or SAS data set with a CONTENTS statement.

The difference between the two procedures is the type of statement


and the defaults for libref:
x PROC DATASETS is an interactive procedure requiring a
QUIT statement
x The PROC DATASETS default is DATA=option
x The PROC CONTENTS default is either Work or User.

To create SAS output of library contents:


x Use the PROC DATASETS statement with CONTENTS
x Format:
PROC DATASETS;
CONTENTS DATA=libref._ALL_NODETAILS;
QUIT;
x CONTENTS describes the contents of SAS data sets
Copyright The Art of Service
13 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
specified and prints the directory of the library
x libref is the libref assigned to the SAS library
x _ALL _ requests a listing of all files in the library
x NODETAILS prevents detailed information about each file
from being in the output.

When viewing descriptor information for a specific data set, the


variables are listed alphabetically by default.
Descriptor information can be viewed in logical position, or creation
order, by using the VARNUM option:
x PROC DATASETS;
CONTENTS DATA=libref VARNUM;
QUIT;

6.3 SAS Data Sets

SAS data sets are one type of SAS file.


To process data, it must be in the form of a data set.
Processing can consist of:
x Accessing data
x Analyzing, managing, and presenting data.

Data sets must have a name. Valid data set names can:
x Be 1 to 32 characters long
x Begin with a letter, uppercase or lowercase, or an
underscore
x Consist of any combination of letters, numbers, and
underscores.

Data set has two parts: a descriptive portion and a data portion that
the data set can locate.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 14
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
6.3.1 Descriptive Portion
Contains information about the data set, including:
x data set name
x data type
x creation date and time
x number of observations
x number of variables
x number of indexes.

Contains information about the variables in the data set, including:


x name
x type
x length
x format
x informat
x label.

6.3.2 Data Portion


Collection of data values arranged in a table. Observations refer to
the rows in the table. Each row is a collection of data values that
relate to a single object.

Variables refer to the columns in the table. Each column is a


collection of values that describe a particular characteristic. Each
variable in a data set has a series of attributes. Variables of the same
name can aid in combining SAS data sets. To combine data sets,
variables must be the same type.

A variable name conforms to SAS naming conventions.


Variable names follow the same rules as data set names:
x Be 1 to 32 characters long
x Begin with a letter, uppercase or lowercase, or an
underscore
x Consist of any combination of letters, numbers, and
underscores.

Copyright The Art of Service


15 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Variable types are either character or numeric:
x Character variables can contain any value
x Numeric variables can contain only numeric values
x Representation of missing values is determined by
variable type:
o A blank represents a missing value for character
values
o A period represents a missing value for numeric
values.

Variable length identifies the number of bytes used to store the


variable. Length is dependent on type:
x Character variables can be up to 32,767 bytes long
x Numeric variables have a constant default length of 8 with
an infinite number of digits possible
x Numeric variables have a constant length because they
are stored as floating-point numbers
x A different length can be specified for numeric variables.

Variable format affects the way data values are written.


x Formats are provided by SAS software or created by a
programmer
x The appropriate format must be selected to write value in
a given formatting. The maximum width of the written
value usually required to be specified
x Some formats require specification of the number of
decimal places
x A format can be permanently assigned to a variable in a
SAS data set or temporarily assigned in a PROC step to
control output appearance.

When data values must be read into a SAS data set, how they are
read must be specified. Reading specifications are controlled by
informats.

Variables can have labels:


x Labels are descriptive text up to 256 characters long
x By default, variables are identified by their names: for
more information on the variables, labels are applied.
Copyright The Art of Service
Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 16
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
6.4 SAS System Outputs

Result formats for SAS formats can be specified:


x Traditional SAS listing
x HTML document
x Both a listing and HTML document.

The appearance of the output can be controlled, specifically (for SAS


listings):
x line size (the maximum width of the log and output)
x page size (the number of lines per printed page)
x page numbers displayed
x data and time displayed.

To modify system options, use an OPTIONS statement (a global


statement).
x OPTIONS:
o options identifies one or more system options to be
changed
o The system options available are dependent on the
operating system.

Survey of common options:


NUMBER | Page numbers appear by default, so NONUMBER
NONUMBER prevents the printing of page numbers.
DATE | NODATE Date and time appear by default, so NODATE
prevents the printing of date and time.
PAGENO=option Specifies the beginning page number for report.
The default is 'page 1'.
PAGESIZE=option Specifies how many lines on each page of the
output.
LINESIZE=option Specifies the width of the print line. Observations
that are unable to fit on a single line will continue
on a different line.
TEARCUTOFF=option Fulfills Year 2000 Compliance.

Copyright The Art of Service


17 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
OBS= Specifies the last observation to be processed.
The default is MAX.
FIRSTOBS= Specifies the first observation to be processed.
The default is 1.
FIRSTOBS = and Specifies a range of observation to be processed.
OBS=
FORMCHAR= Specifies the formatting characters for the output
'formatting-character' device.
FORMDLIM= Specifies the character used to delimit page
'delimiting-character' breaks. Default is null.
LABEL | NOLABEL Can replace variable names with descriptive
labels. The default is LABEL.
SOURCE | Controls if SAS source statements are written to
NOSOURCE the log. Default is SOURCE.

FIRSTOBS= and OBS= can be used as data set options instead of


system options.
A data set option is set in parenthesis following the input data set
name.
x Format:
options firstobs=8 obs=16;
proc print data=newdata.set (firstobs=5 obs=22);
run;

To view the current setting of one or all SAS system options, use the
OPTIONS procedure.
x Format:
PROC OPTIONS <option(s)>;
RUN;

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 18
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
7 Creating SAS Data Sets

7.1 Raw Data Files

A raw data file is an external text file. Raw data files are non-
proprietary. Many software programs use raw data files.

The external file contains data values which are organized in fields.
Fields may be difficult to locate in raw data format. Fixed fields
ensures the values for specific variables are located in the same
position for all records.

7.1.1 Creating a Data Set with Fixed Fields


1. The SAS library used to store the data set must be
referenced.
2. A DATA step program is written to read the raw data file and
create a SAS data set
x The following instructions must be provided to SAS by the
DATA step in order to read the raw data file:
o Name or location of the external file
o Name for the new SAS data file
o Reference for the external file
o Description of the data values to be read
x Additional SAS statements can be used to customize the
data.
3. A PROC PRINT step can produce a list report to display the
data values.

Basic statements for construction:


LIBNAME statement Reference a SAS data library.
FILENAME statement Reference an external file.
DATA statement Name a SAS data set.
INFILE statement Identify an external file.
INPUT statement Describe data.
Copyright The Art of Service
19 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
RUN statement Execute DATA step.
PROC PRINT List data.
statement
RUN statement Execute final program statement.

7.1.2 Creating References


To reference a permanent SAS library, a LIBNAME statement is
required:
x Use the LIBNAME statement
x Format:
LIBNAME libref 'SAS-data-library'
o libref is 1 to 8 characters long beginning with a
letter or underscore and containing only letters,
numbers, or underscores
o SAS-data-library is the name of the library where
files are stored.

A LIBNAME statement is not required for all situations, such as


temporary SAS data set or if the libref has been automatically
assigned for the permanent library by SAS.

The external file containing the raw data must be referenced using a
FILENAME statement:
x Use the FILENAME statement
x Format:
FILENAME fileref 'filename'
o fileref is a name associated to an external file and
must be 1 to 8 characters long beginning with a
letter or underscore and containing only letters,
numbers, or underscores
o filename is the fully qualified name or location of
the file.

When a fileref is associated with an individual external file, the fileref


is specified in subsequent SAS statements and commands.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 20
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
When a fileref is associated with an aggregated storage location, the
fileref is specified with the individual filename in parenthesis.

When reading raw data, the INFILE statement identifies which file is
storing the data.
To identify the external file where data is stored:
x Use the INFILE statement
x Format:
INFILE file-specification <options>;
o file-specification takes the form of fileref from a
previously defined file reference or 'filename' to
point to the actual name and location of the file
o options describes the characteristics of the input
file and how the raw data should be read.

LIBNAME and FILENAME are global statements which are in place


until they are changed, canceled, or the SAS session ends.

7.2 The Data Step

7.2.1 Writing a Basic Step Program


The start of a DATA step is the DATA statement. Within the DATA
statement, the SAS data set to be created is named.

To begin a DATA step and name a SAS data set:


x Use the DATA statement
x Format:
DATA SAS-data-set-1 (...SAS-data-set-n>;
o SAS-data-set-n is the name of the data set(s) to be
created

Columns are the common input style specifying actual locations for
values. Columns should not be used in all situations, but only when
data is:
x consists of standard character and numeric values
x in fixed fields.

Copyright The Art of Service


21 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Standard numeric data values contain only:
x numbers
x decimal points
x scientific notations (e-notation)
x positive or negative numbers.

Nonstandard numeric data values include:


x contain special characters such as % or $
x date and time values
x fractions
x integer binaries
x real binaries
x hexadecimal numbers.

Raw data organized in columns, or fixed format, allow a beginning


and ending column for each file to be specified. Raw data that is not
organized in columns is considered in free format. Free format cannot
be read by column input.

7.2.2 Describing Data


The INPUT statement is used to describe the fields of raw data. For
each field of raw data to be read, the information required for the
INPUT statement includes:
x a valid SAS variable name
x a variable type (character or numeric)
x a range identifying starting and ending columns.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 22
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
To describe the field of raw data to be read and placed in the SAS
data step:
x Use the INPUT statement
x Format:
INPUT variable; <$> startcol-endcol...
o variable is the SAS name that you assign to the
field
o The dollar sign indicates the variable type as
character. To indicate numeric, no dollar sign
should exist
o startcol is the starting column for the variable
o endcol is the ending column for the variable.

Using column input, you can:


x read any or all fields from the file
x read the filed in any order
x specify only starting columns when values occupy only
one column.

Variable names conform to SAS naming conventions:


x Be 1 to 32 characters long
x Begin with a letter, uppercase or lowercase, or an
underscore
x Consist of any combination of letters, numbers, and
underscores.

To verify data, use the OBS=option in the infile statement. Using


OBS=n allows a portion of the data to be processed.

The log will verify that the raw data file was read correctly, typically
showing:
x The number of records read
x The number of observations and variables created.

If invalid data occurs, a note in the log will occur which include the
variable that erred, a column ruler and actual data line.

To view the data processed, use a PROC PRINT step.

Copyright The Art of Service


23 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
7.2.3 Creating and Modifying Variables
To modify existing values or create new variables, an assignment
statement can be used in any DATA step.

To modify or create variables:


x Format:
variable=expression;
o variable is the name of the new or existing variable
o expression is any valid SAS expression.

SAS expressions are used to:


x transform variables
x create new variables
x conditionally process variables
x calculate new values
x assign new values.

Expressions are instructions that consist of a sequence of operands


and operators:
x Operands are variable names or constants in character or
numeric format
x Operators are special-characters, grouping parenthesis, or
functions.

Arithmetic operators allow calculations to be performed. Arithmetic


operators include:
x negative prefix (-)
x exponentiation (**)
x multiplication (*)
x division (/)
x addition (+)
x subtraction (-)

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 24
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Arithmetic operators have priorities associated with them to assist in
properly calculating an expression when more than one operator
exists in the expression. To use priorities properly:
x Higher priorities are performed before lower priorities
x If consecutive operators have the same priority to be
performed:
o from right to left within priority I
o from left to right within priorities II and III
x The control of operations can be done with parentheses.

Comparison operators express a condition. Comparison operators


include:
x equal to (= or eq)
x not equal to (^= or ne)
x greater than (> or gt)
x less than (< or lt)
x greater than or equal to (>= or ge)
x less than or equal to (<= or le)
x equal to one of a list (in)

Logical operators create a link between sequences of expressions.


Logical operators (boolean) include:
x both and (AND or &)
x either or (OR or |)
x not (^)

Data values can be assigned to variables using date constants.


Constants are represented by 'ddmmmyy' or ddmmmyyyy' followed
by a D
x Format:
'ddmmmyyyy' D

Copyright The Art of Service


25 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Some situations require processing to be done only when an
observation meets a specific condition, which uses a subsetting IF
statement to indicate the condition. To express a condition:
x Use the IF statement
x Format:
IF expression;
o expression is any valid SAS expression:
o If the expression is true the DATA step continues
o If the expression is false, no further statements are
processed.

7.2.4 Using Instream Data Lines


The INFILE statement identifies an external file to read. The
DATALINES statement identifies data in the SAS program to be read,
and is the last statement in the DATA step.

To read instream data lines:


x Use the DATALINES statement
x Format:
DATALINES;
Only one DATALINES statement for each DATA
step

x CARDS is an alias for DATALINES and can be used as


well
x If the data contains semicolons, use DATALINES4;;;;

7.2.5 Creating a Raw Data File


To create a raw data file:
x Use the SET statement
x Format:
DATA _null_;
SET dataset;
_null _ allows the DATA step to be used to without
creating a SAS data set
SET identifies the SAS data set to read
Writing observations from the data set into the raw data can be done
Copyright The Art of Service
Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 26
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
using FILE and PUT statements.
x The FILE statement specifies the output file.
x The PUT statement how the lines should be written to the
file.

To specify an output file:


x Use the FILE statement
x Format:
FILE file-specification;
o file-specification takes the form of fileref from a
previously defined file reference or 'filename' to
point to the actual name and location of the file
o options describes the characteristics of the input
file and how the raw data should be read.

To describe the lines written to the raw data:


x Use the PUT statement
x Format:
PUT variable startcol-endcol...;
o variable is the name of the variable being written
o startcol identifies the line to begin writing the value
o endcol identifies the line to end writing the value.

7.3 Processing DATA Steps

When a DATA step is submitted, the SAS processes the DATA step
and creates a new SAS data set. The process consists of two
phases:
x The compilation phase scans each statement for errors.
An error will prevent further processing. When complete
with no errors, the descriptor portion of the data set is
created
x The execution phase, the DATA step reads and processes
the input data. By default, the DATA step is executed
once for each record in the input file.

Copyright The Art of Service


27 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
7.3.1 Compiling the DATA Step
To start the compilation phase, the input buffer is created:
x The input buffer is an area of memory used to hold a
record from the external file
x It is only required when reading raw data
x Refers to logical, not physical, memory.

The creation of a program data vector begins after the input buffer is
created. The program data vector is an area of memory where the
SAS will build the data set one observation at a time:
x Refers to logical, not physical, memory
x Contains two variables automatically which are used for
processing but not written to the data set
x _N_ is the number of times the DATA step begins
execution
x _ERROR_ is the occurrence of an error caused by data.
The default is 0, identifying no errors. One of more errors
existing, the value is 1.

Each statement is scanned for syntax errors, which include:


x missing or misspelled words
x invalid variable names
x missing or invalid punctuation
x invalid options.

INPUT statements add a slot to the program data vector for each
variable in the new data set.
Variable attributes are identified when the variable is first
encountered.

At the end of the DATA step, the compilation phase is complete.


The descriptor portion of the new data set is created, which includes:
x the name of the data set
x the number of observations and variables
x the name and attributes of the variables.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 28
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
7.3.2 Executing the DATA Step
The execution phase will create the data portion of the data set which
contains the data values. The input raw data file is read from the
program data vector. Observations are new data that are written out.
One observation is created for each record being inputted.

At the start of the execution phase, the file is checked for errors.
x _N_ is set at value 1.
The INFILE statement is read to identify the location of the raw data.
Any INPUT statements are read from the input buffer:
x an input pointer will track of the reading position
x the input statement starts at column 1 by default.
At the end of the DATA step:
x Values in the program data vector are written as the first
observation
x The value of _N_ is incremented by 1 and a return to the
top of the DATA step
x Variable values in the program data vector are re-set to
missing.

The DATA step becomes a loop with each pass through called an
iteration.

The execution phase continues until the end-of-file marker in the raw
data file is reached.

Copyright The Art of Service


29 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
7.4 Debugging DATA Steps

7.4.1 Errors During Compilation


During the compilation phase, many errors can be detected,
including:
x misspelled keywords and data set names
x missing semicolons
x unbalanced quotation marks
x invalid options.

The SAS will interpret some syntax errors. If the error is not
interpreted:
x prints the word ERROR in the log followed by the error
message
x compiles the step, but does not execute
x prints a warning - NOTE: The SAS System stopped this
step because of errors.

7.4.2 Errors During Execution


Errors can occur during the Execution phase. How errors are dealt
with is dependent on the type of error. Actions taken can include:
x A note warning, or error message in the log
x Values stored in the program data vector displayed in the
log
x Processing continues
x Process stops.

7.4.3 Performing Testing


DATA steps can be compiled and executed without creating a
dataset, by using a NULL keyword. Used to detect common error
messages and saves on development time.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 30
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
To create a NULL Data Set:
x Use the DATA statement
x Format:
DATA _null_;

All compilation and execution errors are written to the log, without
creating a data set. After correcting errors, the NULL statement can
be replaced with the appropriate data set name.

The OBS= option in the INFILE statement can limit the number of
observations that are read or created during execution.

The PUT statement can be used to examine variable values. A


personal message can be created in the log.

IF-THEN/ELSE statements can be used to conditionally test for


values.

7.4.4 Creating Message Strings


Several forms of messages can be added to the log using the PUT
statement. To specify a character string, enclose the string with
quotations marks:
x Format:
PUT 'NOTE: Character Strings' ;

To specify one or more data set variables for a single iteration of the
DATA step, add the expression 'code type':
x Format:
PUT 'NOTE: Data Set Variables:'
code type;
o This expression will display only the value of the
variable.
x Format:
PUT 'NOTE: Data Set Variables:'
code= type=;
o This expression will display the name and value of
the variable.

Copyright The Art of Service


31 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
To specify the automatic variables, _N_ and _ERROR_:
x Format:
PUT 'NOTE: Automatic variables:'
code= _n_= _error_=;

To specify all variables and automatic variables, use the _all_


expression:
x Format:
PUT 'NOTE: All variables:' _all_;

Conditional processing can be used to flag errors or out of range


data, using IF-THEN/ELSE statements:
x Format:
IF desired option THEN
desired action
ELSE PUT 'NOTE: Conditional processing';

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 32
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
7.5 SAS Data Sets

SAS Data Sets are created from either raw data sets or other SAS
data sets.

7.5.1 Reading SAS Data Sets


To create a data set, the libraries have to be referenced first:
x The library where the data is stored
x The library where the data will be stored
Then DATA and SET statements are used to create a data set using
values from another data set.

To create a data set from values found in another data set:


x Use the DATA statement and SET statement
x Format:
DATA SAS-data-set
SET SAS-data-set
RUN;
o SAS-data-set in the DATA statement is the libref.
filename of the SAS data set to be created
o SAS-data-set in the SET statement is the libref.
filename of the SAS data set to be read

Sometimes, the data to be used needs some manipulation. Some


common actions include:

IF variable condition Use a subset of data.


THEN action; Variable conditions can be <, >, <=, >=,
IF variable limit; Variable limits are typically equalities or
inequalities.
DROP variable1, Drop any unwanted variables.
variable2...;
NewVariable = Create or modify a variable.
variable condition1;
LENGTH variable; Specify a variable's length.

Copyright The Art of Service


33 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
IF variable condition Conditional statements.
THEN action;
ELSE alternate action;
LABEL variable; Label a variable.
FORMAT variable; Format a variable.

DROP= and KEEP+ statements are options that can be used


whenever data sets are named, namely either the DATA or SET
statements:
x If a variable should be processed but not appear in the
new data set, use the DROP= option in the DATA
statement
x If a variable should not be processed nor appear in the
new data set, use the DROP= option in the SET
statement.

A BY statement can be used to group observations:


x Datasets can be sorted by the value of one or more
variables specified from a BY statement
x Two temporary variables can be created: FIRST.variable
and LAST.variable. The value for both is either 1 or 0 and
they identify the first and last observation in each BY
group.

Multiple BY variables can be used to assign conditions to data sets.


The result of multiple BY variables:
x FIRST.variable for each variable is set to 1 for the first
occurrence of a new value
x A change in the value of a primary BY variable forces
LAST.variable to be 1 for secondary BY variables.

Observations in the input data set are read as they appear in the
physical file, or sequentially.
Sequential reading can be bypassed using a POINT= option.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 34
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
To read observations directly:
x Use the POINT= statement
x Format:
POINT=variable;
o variable specifies a temporary numeric variable
containing the observation number of the desired
observation and must be given a value before
executing a SET statement.

Reading observations continues until an end-of-file marker is


reached: the use of a POINT= option will force the observation
reading to never reach the end-of-file marker.
The condition causes continuous looping; to resolve:
x Use a STOP statement
x Use programming logic that checks for an invalid value of
the POINT= variable

7.5.2 Writing Observations


To override the way the DATA step writes observations to output, an
OUTPUT statement can be used. Observations are added to the
data set only when the OUTPUT statement is executed.

To control the observations written to a data set:


x Use the OUTPUT statement
x Format:
DATA SAS-data-set;
SET SAS-data-set;
OUTPUT <SAS-data-set(s)>;
STOP;
o SAS-data-set(s) names the data set(s) where
observations will be written
o If a data set is not named, the OUTPUT statement
impacts all data sets named by the DATA step.

Copyright The Art of Service


35 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
In some cases, identifying the end of the data set is required to
control the observations that are written, such as identifying totals.
To identify the last observation in a DATA set:
x Use the END= option
x Format:
END= variable
o variable is the creation of a temporary variable
containing an end-of-file marker. The variable is
initialized at 0 and set to 1 when the SET
statement reads the last observation of the data
set.

The variable specified in the END= option will not be added to the
data set. END= and POINT= cannot be used in conjunction.

7.5.3 Running a Data Step


Reading SAS data sets is similar to reading raw data, except that
SAS retains the values of the variables from each observation.
Includes a compilation and execution phase.

Compilation steps:
x Program data vector is created
x Each statement is scanned for syntax errors
x Any created variables are added to the program data
vector
x At the bottom of the DATA step, compilation ends and the
descriptor portion of the new SAS data set is created.

Execution steps:
x DATA step executes for each observation in the input data
set
x The value of _N_ is 1 and the value of _ERROR_ is 0; all
other variables are initialized to missing
x The SET statement reads the first observation from the
input data set and write the values to the program data
vector
x Any assignment statements are executed
x At the end of the first iteration of the DATA step, the
Copyright The Art of Service
Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 36
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
values in the program data vector are written to the new
data set
x The value of _N_ is set to N+1
x The values of variables read from the SAS data set with
the SET statement are retained and all other variables set
to missing
x The process continues until all observations are read.

7.6 Combining SAS Data Sets

Methods of combining data sets:


x One-to-one reading – will create observations that contain
all variables from each contributing data set based on
relative position in each data set
x Concatenating – Appends observations from one data set
to another
x Interweaving – Intersperses observations from two or
more data sets, based on one or more common variables
x Match-merging – Matches observations from two or more
data sets into a single observation in a new data set based
on the values of a common variable.

7.6.1 One-to-One Reading


To create a data set combining multiple data sets using one-to-one:
x Use the DATA statement and multiple SET statements
x Format:
DATA output-SAS-data-set;
SET SAS-data-set-1;
SET SAS-data-set-n;
RUN;
o output-SAS-data-set names the data set to be
created
o SAS-data-set-n names the data sets to be read.

Copyright The Art of Service


37 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
The criteria used by one-to-one readings to select data:
x The new data set contains all variables from all input data
sets
x If data sets have variables of the same name, the values
from the last data set overwrite the values read from
previous data sets
x The number of observations in the new data set will be the
same as the number of observations in the smallest
original data set
x Observations are combined based on their relative
position, that is, the first observations from each data set
are combined, and so on.

7.6.2 Concatenating
To create a data set combining multiple data sets using
concatenation:
x Use the DATA statement and the SET statement
x Format:
DATA output-SAS-data-set;
SET SAS-data-set-1 SAS-data-set-n;
RUN;
o output-SAS-data-set names the data set to be
created
o SAS-data-set-n names the data sets to be read.

The criteria used by Concatenation readings to select data:


x The new data set contains all variables and observations
from all input data sets
x All observations are read from the data sets listed in the
SET statement
x Conditions to watch for:
o Instances of a common variable must have the
same type attribute, or processing is stopped and
an error message issued
o Common variables with different attributes other
than type will retain the attribute from the first data
set containing the variable, including length, label,
format, and informat.
Copyright The Art of Service
Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 38
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
7.6.3 Interweaving
To create a data set combining multiple data sets using interweaving:
x Use the DATA statement and the SET statement
x Format:
DATA output-SAS-data-set;
SET SAS-data-set-1 SAS-data-set-n;
BY variable(s);
RUN;
o output-SAS-data-set names the data set to be
created
o SAS-data-set-n names the data sets to be read
o variable(s) specifies one or more variables used to
interweave.

The criteria used by Interweaving readings to select data:


x Observations in each BY group in each data set is read
sequentially in the order of the data sets and BY variables
are listed
x The new data set contains all variables from all input data
sets and contains the total number of observations from all
input data sets
x Observations are grouped by a common variable.

Copyright The Art of Service


39 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
7.6.4 Simple Match-Merging
To create a data set combining multiple data sets using match-
merging:
x Use the DATA statement and the MERGE statement
x Format:
DATA output-SAS-data-set;
MERGE SAS-data-set-1 SAS-data-set-n;
BY <DESCENDING> variable(s);
RUN;
o output-SAS-data-set names the data set to be
created
o SAS-data-set-n names the data sets to be read
o variable(s) specifies one or more variables used to
merge
o DESCENDING identifies if the input data set
should be sorted from largest to smallest. The
absence of DESCENDING has the input data set
sorted in ascending order.

The criteria used by Interweaving readings to select data:


x The new data set contains all observations from all input
data sets, though options can be in place that would
reduce the number of observations found in the output
x Each observation of each data set is checked sequentially
for matching BY values and writes the combined
observation to the new data set
x If an input data set doesn't have any observations for a
particular value for a same-name variable, missing values
will be displayed in the output.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 40
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
7.7 Processing Match-Merge

7.7.1 Match-merge Compilation and Execution


To compile data sets used for a merge-match:
x Descriptor portions of all data sets listed in the MERGE
statement are read
x The remaining DATA step program is read for variables
and options impacting the merge
x A program data vector is created for the merged data set
x A tracking pointer is assigned for each data set listed in
the MERGE statement.

After compilation, the observations are sequentially match-merged by


moving the tracking pointers through each observation of each data
set and checking for BY values that match:
x If a match is found, the observation is written to the
program data vector in the order the data sets appear in
the MERGE statement.
x The value of same-named variable retain the value of the
last data set where the variable appears
x If a match is not found, the order of the values is
determined by SAS and the observation is written to the
program data vector.

By default, all observations are written to the program data vector,


including observations with missing data and no matching BY values.
Observations written to the PDV are written to the output data set.

If an observation contains missing values for a variable, the


observation in the output data set contains missing values.
If a BY variable exists, observations that have missing values will
appear at the top of the output data set.
If an input data set doesn't have a matching BY value, observations in
the output data set contains missing values for the variables.

Copyright The Art of Service


41 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
7.7.2 Renaming Variables
Same-named variables can appear in multiple data sets to be
merged, which will overwrite each other based on the order the data
set appears in the MERGE statement.
Sometimes, the name of a variable in one or more data set needs to
be renamed to prevent being overwritten.

To rename variables:
x Use the RENAME-data set option and the MERGE
statement
x Format:
DATA output-SAS-data-set;
MERGE SAS-data-set-1(RENAME=old-variable-
name=new-variable-name))
SAS-data-set-n;
BY variable(s);
RUN;
o output-SAS-data-set names the data set to be
created
o SAS-data-set-n names the data sets to be read
o variable(s) specifies one or more variables used to
merge
o RENAME=options follows the name of each data
set that contains the variable(s) to be renamed
o old-variable-name identifies the name of the
variable to be renamed
o new-variable-name identifies the new name for the
variable.

Variables can be renamed for data sets specified by the DATA and
SET statements.

7.7.3 Excluding Unmatched Observations


Match-merging combines all observations in all input data sets.
Sometimes, the desired result displays only those observations that
match for two or more specific input data sets.
Exclusion of unmatched observations can be performed using the
IN=data set option and a subsetting IF statement in the DATA step:
Copyright The Art of Service
Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 42
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
x The IN= data set option creates and names the variable
that identifies if a data set contributed data to the current
observation
x The subsetting IF statement checks the IN= values and
writes to the merged data set only those observations
where IN= is specified.

To exclude unmatched observations:


x Use the IN= option and the MERGE statement
x Format:
DATA output-SAS-data-set;
MERGE SAS-data-set-1(IN=variable1)
SAS-data-set-n (IN=variablen) ;
BY variable(s);
IF variable1=1 and variablen=1
RUN;
o output-SAS-data-set names the data set to be
created
o SAS-data-set-n names the data sets to be read
o variable(s) specifies one or more variables used to
merge
o IN=option, in parentheses, follows the data set
name
o variablen names the variable to be created
o new-variable-name identifies the new name for the
variable
o The value of the variable is 1 if the data set is
contributing to the current observation and 0 if it is
not
o IF statement provides the condition that an
observation is created only if both conditions are
met.

Variables can be dropped or kept by using the DROP= and


KEEP=data set options.

Copyright The Art of Service


43 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
8 Working with Data

Raw data can be organized in several ways:


x arranged in columns, or fixed fields
x arranged without columns, or free format.

Raw data can contain:


x standard data without any special characters
x nonstandard data with special characters.

SAS has three input styles


x column input
x formatted input
x list input.

8.1 Column Input

Column input allows fields to be read in any order. Character variable


values that contain embedded blanks can be read. Missing data does
not require a placeholder.

A blank field will be read as missing and will not cause other fields to
be misread. Fields or parts of a field can be re-read. Fields do not
have to be separated by blanks or other delimiters.

8.1.1 Numeric Data


When raw data is organized in fixed fields:
x use column input to read standard data only
x use formatted input to read both standard and
nonstandard data.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 44
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Standard numeric data values contain:
x numbers
x decimal points
x numbers in scientific notation
x positive and negative numbers.

Nonstandard numeric data values contain:


x special characters
x data and time values
x fractions
x integer binary
x real binary
x hexadecimal forms.

To read both standard and nonstandard data in fixed fields:


x Use an INPUT statement
x Format:
INPUT <pointer-control> variable informat.;
o pointer-control places the input pointer in the
appropriate column
o variable is the name of the variable to be created
o informat is the special instruction specifying how
SAS reads raw data.

Two column pointer controls are utilized:


x the @n moves the input pointer to a specific column
number
x the+n moves the input pointer forward to a column number
relative to the current position
x Format:
INPUT @n variable informat.;
x Format:
INPUT +n variable informat.;

Copyright The Art of Service


45 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
@n is an absolute pointer control.
@ moves to column n, the first column to be read.
When reading a record, the @n pointer can be moved forward and
backward.

+n moves the input pointer forward to a column number relative to the


current position.
The notation +(-n) will move the pointer control backwards.

8.1.2 Informats
Informats are used to explain how the raw data is read. The $w.
informat allows character data to be read. The dollar sign indicates
character only data. The w represents the field width, or number of
columns, of the data. The period ends the informat.

The w.d format allows standard numeric data to be read. The w


represents the field width, or number of columns, of the data.
If a decimal point exists with the raw data, that acts as one decimal.
The period acts as a delimiter. The optional d specifies the number of
implied decimal places (not necessary if the value already has
decimal places).

The COMMAw.d will read nonstandard numeric data, removing any


embedded:
x blanks
x commas
x dashes
x dollar signs
x percent signs
x right parentheses
x left parentheses.

8.1.3 Record Formats


How data is read by column input and formatted input processes can
be affected by the record format of the external file. Two common
record formats are fixed-length records and variable-length formats.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 46
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
A fixed-length record format has a predetermined number of columns.
The end-of-record marker exists after the last column. The typical
fixed-length record will have a length of 80 columns.

A variable-length record format has an end-of-record market after the


last field of each record. Variable-length records containing fixed-field
data might have values which are shorter than others or values that
are missing.

With varied data, the end of column can conflict with the end-of-
record marker causing errors. A PAD option can be used to bypass
common problems with reading fixed-field data in variable-length
records. The PAD option will add blanks to each record to give all
data lines the same length.

The default value of the maximum record length is determined by the


operating environment. The maximum record length can be changed
using the LRECL=option in the INFILE statement.

8.2 Reading Free Format Data

In free format records, the fields are often separated by a delimiter,


such as blanks or the pound sign.

A List input can read standard and nonstandard data in a free-format


record:
x Use an INPUT statement
x Format:
INPUT variable <$>;
o variable is the name of the variable to be read
o $ identifies the variable as a character variable.

By default, List input does not have specified column locations, so:
x all fields must be separated by at least one delimiter
x fields cannot be skipped or re-read
x the order for reading fields is from left to right.

Copyright The Art of Service


47 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
8.2.1 Processing List Input
List input processing will scan for values instead of reading from
specific columns.
x SAS reads the first field until it encounters a delimiter
x The delimiter indicates the end of the field, so the data
value is assigned to the first variable in the program data
vector
x Then the scan continues until the next delimiter is
encountered
x The data value for the next variable is assigned in the
program data vector
x This continues until all the fields have been read and
values are assigned to variables.

The data set can be displayed using the PRINT procedure.

Blanks as delimiters are easy to read. Other delimiters need to be


explicitly identified.

To specify a delimiter:
x Use an INFILE statement and DLM-option
x Format:
INFILE file-specification DLM=delimiter(s);
o file-specification takes the form of fileref from a
previously defined file reference or 'filename' to
point to the actual name and location of the file
o delimiter(s) specifies a delimiter for list input.

Delimiters can be specified in two ways:


x A 'list of delimiting characters' will specify one or more
characters (must be enclosed in quotation marks)
x character-variable specifies a character variable to be a
delimiter.
A delimiter should be the same as a character within the field values.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 48
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
When the variable values are sequential and separated by a
delimiter, variables can be expressed and specified in a range.

By default several limitations exist on the type of data that can be


read using list input:
x Character values that are longer than eight characters will
be truncated
x Data must be in standard numeric or character format
x Character values cannot contain embedded delimiters
x Missing numeric and character values must be
represented by a period or some other character.

8.2.2 Reading Missing Values


Missing values can exist anywhere in the data set. If missing values
occur at the end of the record, the MISSOVER option in the INFILE
statement can be used to read them. The MISSOVER option will
prevent the SAS from going to another record if values cannot be
found for every specified variable in the current line.

MISSOVER only works with missing values at the end of a record.


To begin to read missing values in the beginning or middle of the
record, the DSD option in the INFILE statement can be used.

DSD changes how delimiters are treated when using a list input:
x by setting the default delimiter to a comma
x treating two consecutive delimiters as a missing value
x removing quotation marks from values.

If multiple delimiters are used in the original file, they can be identified
by using the DLM=option.

Modifying List inputs allows it to be more versatile.


Two modifiers can be used:
x the ampersand is used to read character values that
contain embedded blanks
x the colon is used to read nonstandard data values and
character values longer than eight characters.

Copyright The Art of Service


49 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
8.2.3 Managing Length
The default length of character values is 8. Variables that are longer
than 8 are truncated when written to the program data vector.

Using a LENGTH statement before the INPUT statement will define


the length and type of the variable.

8.2.4 Creating Free-Format Data


To create free-format data:
x Use a PUT statement
x Format:
PUT variable <: format>;
o variable is the name of the variable to be written
o a colon precedes a format
o format specifies a format to use when writing
variables.

Using a DLM=option in the FILE statement can be used to be create


a character-delimited raw data file. If special characters are required
in the data format, use the DSD=option in the FILE statement,
including quoted strings.

To create a simple raw data file:


x Use a PROC EXPORT statement
x Format:
PROC EXPORT DATA=SAS-data-set;
OUTFILE=filename <DELIMITER='delimiter'>;
RUN;
o SAS-data-set names the input SAS data set
o filename identifies the complete path and file name
of the output
o delimiter identifies the delimiter used to separate
columns in the output file.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 50
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
8.3 Reading Data and Time Values

SAS stores a date value as the number of days from January 1, 1960
to a given date. SAS stores a time value as the number of seconds
since midnight.

The SAS datetime value is the number of seconds between midnight


on January 1, 1960 and a given data and time.

Data and Time informats are used to read data and time expressions
and convert them to SAS data and time values. Like other informats,
they consist of:
x a name
x a field width
x a period delimiter.

When specifying an informat:


x Use an INPUT statement
x Format:
INPUT <pointer-control> variable informat.;
o pointer-control places the input pointer in the
appropriate column
o variable is the name of the variable to be created
o informat is the special instruction specifying how
SAS reads raw data.

Some common data informats:


x MMDDYYw. - 11/15/09
x Dates. - 15Nov09
x MMDDYYw. - 11-15-09
x YYMMDDw. - 09/11/15
MM will always represent the month and be an integer between 01
and 12. DD will always represent the day and be an integer between
01 and 31. YY is the year, which can be represented YYYY. w is the
field width.

Copyright The Art of Service


51 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Date fields can be separated by blanks, hyphens, and slashes.
The field width should be large enough to include all delimiters.

x Time informat is expressed as hh:mm:ss:ss.


x The hh represents hours and be an integer between 00
and 23.
x The mm represents minutes and be an integer between 00
and 59.
x The ss:ss represents seconds and hundredths of seconds
and is optional.

The WEEKDATEw. format will display the day of the week, month-
name, day and year.

The WORDDATEw. format will write the data in the form month-
name, day, and year.

9 Working with Observations

In the simplest form, the information for a single observation will be


found in a single record.
In many cases, the information for a single observation can be found
in multiple records.
To access this information:
x write multiple INPUT statement to read each record
x write one INPUT statement contain a line pointer control to
specify the record(s) being read.

9.1 Creating One Observation From Multiple Records

9.1.1 Line Pointer Controls


When SAS reads raw data, its position is tracked using an input
pointer. Column pointer controls are used to determine the column
position of the input pointer. The input pointer can also be positioned
on a specific record using a line pointer control.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 52
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Two types of line pointer controls exist:
x the forward slash specifies the line location relative to the
current one
x the #n specifies an absolute number of the line for moving
the pointer.

9.1.2 Reading Multiple Records


To read multiple records sequentially:
x use the forward slash line pointer
x the / line pointer control will only move the input pointer
forward
x the / line pointer control is specified after instructions for
reading values in the current record.

When a DATA step executes, the values of the first record are read
and the input pointer is moved by the line pointer control to the
second record. The second record is read and the second line pointer
control moves the input pointer to the third record. This continues until
the INPUT statement is complete.

The values read are written to the data set as an observation.


Control returns to the top of the DATA steps and the variable values
are reinitialized to missing. The process continues starting where the
last iteration left off and continues until all records have been read.
The number of records for each observation must be the same.

To read records non-sequentially:


x The #n line pointer control specifies the absolute number
of the line where the input pointer will be moved
x This absolute condition allows records to be read in any
order
x The #n line pointer control must be specified before any
instructions for reading values in a record.

A SAS program can use the forward slash line pointer control and #n
line pointer control together to read records sequentially and non-
sequentially.

Copyright The Art of Service


53 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
9.2 Creating Multiple Observations From Single Records

In some cases, the information for multiple observations can be found


in a single record.
To manage the control for reading a record, a line-hold specifier is
required.
SAS provides two line-hold specifiers:
x a trailing at sign (@) will hold the input record for the
execution of the next INPUT statement
x a double trailing at sign (@@) holds the input record for
the execution of the next INPUT statement, even across
iterations of the DATA step.

Normally, each time a DATA step executes, the INPUT statement


reads a new record. When @@ is used:
x to read multiple SAS observations from single data line
x the data line is held in the input buffer across multiple
executions of the DATA step
x the @ pointer control, column input, nor the MISSOVER
option should be used.

A record being held by @@ will not be released until:


x the input pointer moves past the end of the record
x an INPUT statement that has no line-hold specifier
executes.

With a single @, reading records:


x will allow the next INPUT statement to read from the same
record
x release the current record when a subsequent INPUT
statement executes without a line-hold specifier.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 54
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
10 Creating Reports

10.1 Basic Reports

The PROC PRINT statement can be used to create a printout, or


report, of the information in a data set. The report can be enhanced
with additional statements and options.

To print the contents of a SAS data set:


x Use the PROC PRINT statement
x Format:
LIBNAME libref 'SAS-data-library'
PROC PRINT data=SAS-data-set;
RUN;
o libref is 1 to 8 characters long beginning with a
letter or underscore and containing only letters,
numbers, or underscores
o SAS-data-library is the name of the library where
files are stored
o SAS-data-set is the name of the SAS data set to
be printed.

Default layout for a report includes:


x all observations and variables are printed
x observations numbers are listed in the far left column
x variables appear in the order they occur in the data set.

Statements and options for changing report characteristics


SUM variable(s) Produce column totals for numeric values
VAR variable(s); Select the variables to include in the report and the
order they appear
PROC PRINT Observation numbers are not displayed
data=libref
NOOBS
ID variable(s); Identifies one or more variables that will replace the
Copyright The Art of Service
55 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
OBS column. If a variable is listed in an ID statement
and a VAR statement, the output will have two
columns for the same variable
WHERE where- Identifies the conditions for selecting observations in
expression the report. The where-expression can be any SAS
expression
LABEL Change the label name of a variable
variable=new-
variable-name

10.1.1 WHERE Expressions


Any variable found in the SAS data set can be specified in the
WHERE statement, not just variables specified in the VAR statement.
Either character or numeric variables can be included in an WHERE
statement.

Conditions based on the value of a character variable must:


x be enclosed in quotation marks
x must be written exactly as it appears in the data set.

The operators, CONTAINS and ?, can be used with the WHERE


statement to select observations with a specific substring.

Multiple conditions can be used to select observations using logical


operators. When testing for multiple values of the same variable:
x The variable name is specified in each expression
WHERE variable=value1 OR variable=value2;
x The use of the IN operator
WHERE variable IN (value1, value2);
x Control the evaluation of compound expression, use
parentheses.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 56
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
10.1.2 Sorting Data
Default reports list observation in the order they appear in the data
set. To sort the report based on the values of a variable, use the
PROC SORT statement.

The SORT procedure will:


x rearrange observations in the SAS data set
x create a new SAS data set containing the rearranged
observations
x replace the original SAS data set
x sort on multiple variables
x sort in ascending or descending order
x treats missing values as the smallest value
x not generate printed output.

To perform a basic sort of variables:


x Use the PROC SORT statement
x Format:
PROC SORT DATA=SAS-data-set <OUT=SAS-data-set>;
BY <DESCENDING> BY-variable(s);
RUN;
o DATA= specifies the data set to be read
o OUT= specifies the output data set that contains
the data in sorted order
o BY-variable(s) specifies one or more variables
used to sort the data
o DESCENDING identifies if the input data set
should be sorted from largest to smallest. The
absence of DESCENDING has the input data set
sorted in ascending order.

Without the OUT= option, the data set specified is permanently


sorted.

Copyright The Art of Service


57 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
10.1.3 Creating Subtotals
Creating column totals for numeric variables is done using SUM
statement. To subtotal numeric variables, use a BY statement in
conjunction with the SUM statement.

To create a total and subtotals for numeric variables:


x Use the SUM statement and the BY statement
x Format:
SUM variable(s);
BY <<DESCENDING> <BY-variable(s)>>
<NOTSORTED>;
o variable(s) specifies one or more variables,
separated by blanks.
o BY-variable(s) specifies variables to be used to
form BY groups. More than one variable should be
separated by blanks
o DESCENDING identifies if the input data set
should be sorted from largest to smallest. The
absence of DESCENDING has the input data set
sorted in ascending order
o NOTSORTED specifies that the observations are
not necessarily sorted in any order.

Variables listed in the SUM statement are not required to be specified


in the VAR statement.

10.1.4 Creating a Customized Layout


Using the ID statement in conjunction with the ID and SUM
statements will show the BY variable heading only once:
If the variable specified by the IN statement is the same as the BY
statement, then:
x The OBS column is suppressed
x The ID/BY variable is printed in the far left column
x Each ID/BY value is printed at the start of each BY group
and on the same line as the group's subtotal.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 58
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Each BY group can be printed on a separate page using the
PAGEBY statement:
x Format:
PAGEBY BY-variable;

To double space the report, use the DOUBLE option in the PROC
PRINT statement.

10.1.5 Titles and Footnotes


Reports can be created so they are easy to interpret, including:
x adding titles and footnotes
x replacing variable names with descriptive labels
x formatting variable values.

Up to 10 titles with procedure output can be specified using the TITLE


statement before the PROC step.
Up to 10 footnotes can be specified using the FOOTNOTE statement
before the PROC step.
x Format:
TITLE <n> 'text';
FOOTNOTE <n> 'text';
PROC PRINT SAS-data-set;
o n is a number from 1 to 10 which specifies the title
or footnote
o 'text' is the title or footnote to be displayed.

x The maximum length is dependent on the operating


environment and the value of the LINESIZE= option
x TITLE is like TITLE1; FOOTNOTE is like FOONOTE1
x The default tile is The SAS System
x There is no default footnote.

Blank lines can be created in the title or footnote by skipping title


expressions; such as title1, title3.

Copyright The Art of Service


59 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
When multiple titles occur:
x In HTML output, the lines appear consecutively without
extra spacing
x In SAS listing output, lines are centered by default.

When multiple footnotes occur:


x The FOOTNOTE statement with the largest number
appear on the bottom line
x In HTML output, the lines appear consecutively without
extra spacing
x In SAS listing output, lines are centered by default.

TITLE and FOOTNOTE statements are global statements and are in


place until they are modified, canceled, or the SAS session ends.

When redefining a title or footnote, all higher-numbered titles or


footnotes are canceled. A null TITLE or FOOTNOTE statement has
no number or text.

10.1.6 Assigning Descriptive Labels


Columns can be labeled with descriptive text.
To label columns, use:
x the LABEL statement to assign a descriptive label to a
variable
x the LABEL option in the PROC PRINT statement to
specify the labels to be displayed

x Format:
LABEL variable1='label1';
variable2='label2';
... ;

Variables can be assigned separate LABEL statements or be


grouped with several variables assigned a single LABEL statement.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 60
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
10.1.7 Assigning Formats
To associate formats with variables, use the FORMAT statement.
Formats only affect the output of the data values, not the actual data
values stored.

To format a variable:
x Format:
FORMAT variable(s) format-name;
o variable(s) is a name of one or more variables
whose values are to be written according to a
particular pattern
o format-name specifies a SAS or user-defined
format for the values.

Variables can be assigned separate FORMAT statements or be


grouped with several variables assigned a single FORMAT
statement.

Common SAS formats:


w. Rounded to the nearest integer in w spaces.
w.d Rounded to d decimal places in w spaces.
$w. Character values in w spaces.
COMMAw.d Contains commas and decimal places.
DOLLARw.d Contains dollar signs, commas and decimal spaces.
DATEw. Date values in the form 12/12/10 (DATE7.) or
12DEC2010 (DATE9.)
MMDDYYw. Date values in the form 12DEC10 (DATE8.) or
12/12/2010 (DATE10.)

All SAS formats specify a total field width (w).


The field width includes the numeric digits and commas.
Stored Value 4060
Desired Format COMMAw.d

Copyright The Art of Service


61 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Displayed Value 4,060
Positions 5
Displayed
FORMAT format variable
Statement comma5.0;

The number of decimal places used for a numeric decimal can be


specified (D).
Stored Value 4060
Desired Format COMMAw.d
Displayed Value 4,060.00
Positions 8
Displayed
FORMAT format variable
Statement comma8.2;

Currency is displayed with appropriate dollar signs, commas, and


decimal spaces.
Stored Value 4060
Desired Format DOLLARw.d
Displayed Value $4,060.00
Positions 9
Displayed
FORMAT format variable
Statement dollar9.2;

To temporarily assign a label or format to the data output, use the


LABEL or FORMAT statements within the PROC PRINT step. To
permanently assign a label or format to the data , use the LABEL or
FORMAT statements within the DATA step.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 62
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
10.2 Advanced Reporting

To create advanced forms of reports, use the PROC REPORT


statement.
Types of customization allow:
x Request separate subtotals and grand totals
x Calculate columns
x Create and store report definitions.

The PROC REPORT statement can be used:


x In a windowing mode with prompts to guide reporting
building
x In a windowing mode without the prompts
x In a nonwindowing mode.

10.2.1 Default List Reports


To create a basic list report:
x Use the PROC REPORT statement
x Format:
PROC PRINT <data=SAS-data-set> <options>;
RUN;
o SAS-data-set is the name of the SAS data set to
be used for the report
o options provides the report in a window or non-
windowing environment.

The COLUMN statement identifies the variables displayed in the


report and their order of appearance.
x Format:
COLUMN variable(s);

The WHERE statement identifies the observations, or rows, to be


selected based on a condition.

Copyright The Art of Service


63 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
10.2.2 Defining Variables
Reports can be enhanced by:
x defining how variables can be used in the report
x assigning formats to variables
x specifying column headings and widths
x justifying the variable values
x justifying column headings
x changing the order of rows.

The DEFINE statement is used to describe the use and display of


variables.
To define variables:
x Use the DEFINE statement
x Format:
DEFINE variable/ <usage> <attribute(s)> <option(s)>
<justification> <”column heading”>;
o variable is the name of the variable to be defined
o usage specifies how a variable is to be used. Valid
options are ACROSS, ANALYSIS, COMPUTED,
DISPLAY, GROUP, and ORDER
o attribute(s) specifies the attributes of the variables,
including FORMAT=, WIDTH=, and SPACING=
o option(s) specifies formatting options, including
DESCENDING, NOPRINT, NOZERO, and PAGE
o justification specifies the justification of the column
(CENTER, LEFT, or RIGHT)
o “column heading” specifies the label for the
column.

Format controls the appearance of data in the report. Any SAS or


user-defined format can be used with FORMAT=.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 64
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
By default, the column width is the length of the variable for character
variables or 9 for numeric variables.
x The WIDTH=attribute can be used to specify the width for
the columns in the report
x The value of WIDTH can be 1 to the value of the
LINESIZE= system option
x WIDTH= attribute has no effect on HTML output.

Spacing of columns can allow reports to be easier to read. Defining


spacing allows the number of blank characters between the selected
column and the column immediately to the left to be specified.
x The default spacing is 2
x The spacing can be redefined using SPACING= attribute
x The SPACING= attribute has no effect on HTML output.

Column headings can be changed by enclosing the heading text in


quotation marks. How words break in column headings can be
controlled using a split character. To use a split character:
x use the default slash (/)
x use the SPLIT= option and define a split character.

By default, PROC REPORT will left-justify character variables and


right-justify numeric variables. To define a different justification,
specify the desired option (right, left, center) in the DEFINE
statement. The option will justify the column headings and the values
within the column width.

The appearance of the headings found in the list report can be


changed using two options:
x HEADLINE underlines all column headings and the
spaces between them
x HEADSKIP creates a blank line beneath all column
headings or after the underline if the HEADLINE option is
used.

These options have no effect on HTML output.

Copyright The Art of Service


65 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
10.2.3 Variable Usage
How variables are used will change the layout of the report.
Each variable can be used:
x DISPLAY – default for character variables
x ORDER
x GROUP
x ACROSS
x ANALYSIS – default for numeric variables
x COMPUTED.

Display variables don't affect the order of rows in the report. If one or
more display variables are in the report, a detail row for each
observation is read from the data set. A detail contains a value for
each display variable.

Order variables determine the order of rows in the report. The detail
rows are ordered according to their formatted values. By default, the
order is ascending: to change this, use the DESCENDING option.

Summary reports are possible using group variables. A group


variable will group the detail rows in a report according to their
formatting value. The PROC REPORT will consolidate the relevant
observations into one row.

To create a summary report, all variables in the report have to be


defined as group, analysis, across, or computed. If a variable in the
report is not defined or defined as display or group, a group variable
is displayed as an order variable.

The default statistic for analysis variables is SUM. To associate a


statistic with an analysis variable, the statistic should be specified as
an attribute in the DEFINE statement.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 66
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Possible statistical attributes:
CSS Corrected sum of squares
CV Coefficient of variation
MAX Maximum value
MEAN Average
MIN Minimum value
N Number of observations with nonmissing values
NMISS Number of observations with missing values
PCTN Percentage of a cell or row frequency to a total frequency
PCTSUM Percentage of a cell or row sum to a total sum
PRT Probability of a greater absolution value of student's t
RANGE Range
STD Standard deviation
STDERR Standard error of the mean
SUM Sum
SUMWGT Sum of the Weight variable values
T Student's t for testing the hypothesis that the population mean
is 0
USS Uncorrected sum of squares
VAR Variance

Across variables are functionally similar to group variables. Groups


are created horizontally, not vertically, with across variables. Analysis
variables will have all values for a variable summed when using
across variables.

Computed variables can be associated to character or numeric


variables. Computed variables are not in the input data set, and they
are not added to the input data set by the PROC REPORT. The use
of a computed variable cannot be changed.

Copyright The Art of Service


67 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Computed variables can be added in a nonwindowing environment:
x Include the computed variable in the COLUMN statement
x Define the usage as COMPUTED
x Compute the value of the variable in a compute block
associated with the variable.

10.3 Descriptive Statistics

Descriptive statistics are used to summarize large amounts of data.


The type of descriptive statistics required and the SAS procedure
used is dependent on what needs to be summarized.
Continuous data values or discrete data values can be summarized.

10.3.1 Computing Numeric Variables


The MEANS procedure is used to provide mean, minimum, and
maximum and other data summarization to control the output of the
report.

To create a MEANS procedure:


x Use PROC MEANS statement
x Format:
PROC MEANS <DATA=SAS-data-set>
<statistic-keyword(s)> <option(s)>;
RUN;
o SAS-data-set is the name of the SAS data set used
o statistic-keyword(s) identifies the statistics to
compute
o options will control the content, analysis, and
appearance of the output.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 68
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
In its simplest form, PROC MEANS will print for every numeric
variable in a data set:
x the number of nonmissing values (n-count)
x the mean
x the standard deviation
x the minimum value
x the maximum value.

To limit the output, the use of statistic keywords will focus the output.
DESCRIPTIVE STATISTICS
CLM Two-sided confidence limit for the mean
CSS Corrected sum of squares
CV Coefficient of variation
KURTOSIS Kurtosis
/ KURT
LCLM One-sided confidence limit below the mean
MAX Maximum value
MEAN Average
MIN Minimum value
N Number of observations with nonmissing values
NMISS Number of observations with missing values
RANGE Range
SKEWNES Skewness
S /SKEW
STDDEV Standard deviation
/STD
STDERR / Standard error of the mean
STDMEAN
SUM Sum
SUMWGT Sum of the Weight variable values
UCLM One-sided confidence limit above the mean
USS Uncorrected sum of squares
VAR Variance
Copyright The Art of Service
69 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
QUANTILE STATISTICS
th
MEDIAM / Median or 50 percentile
P50
st
P1 1 percentile
th
P5 5 percentile
st
P10 10 percentile
th
Q1 / P25 Lower quartile or 25 percentile
th
Q3 / P75 Upper quartile or 75 percentile
st
P90 90 percentile
st
P95 95 percentile
st
P99 99 percentile
QRANGE Difference between upper and lower quartiles: Q3-Q1
HYPOTHESIS TESTING
PROBT Probability of a greater absolute value for the t value
T Student's t for testing the hypothesis that the population mean
is 0

10.3.2 Formatting Statistics


By default, PROC MEANS uses the best format to display values
which can lead to unnecessary decimal places. MAXDEC= can limit
the number of decimal places displayed.

By default, statistics are generated for every numeric variable in a


data set. To focus on specific variables, a VAR statement can be
used.

By default, the statistics will be applied to all observations in a data


set. To apply statistics to groups of observations, add a CLASS
statement.

The values of CLASS variables are used only to categorize data,


therefore, statistics are not generated for CLASS variables. They can
be either character or numeric, but must have a limited number of
Copyright The Art of Service
Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 70
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
discrete values to be grouped meaningfully. The order of the
variables in the CLASS statement determines their order in the
output.

BY statements can specify variables for categorizing like CLASS


statements. Differences include:
x BY processing requires that data is already sorted or
indexed. The SORT procedure must be used before using
PROC MEANS
x The layout of group results are different: A CLASS
statement will create a single table, while a BY statement
will create multiple tables.

10.3.3 Creating Summarized Data Sets


A SAS data set can be created that contains only a summarized
variable.
To create summarized DATA set using PROC MEANS:
x Use PROC MEANS statement and OUTPUT statement
x Format:
PROC MEANS <DATA=SAS-data-set>
<statistic-keyword(s)> <option(s)>;
OUTPUT OUT=SAS-data-set-out <statistic-
keyword=variable-name(s)>;
RUN;
o SAS-data-set is the name of the SAS data set to
be used
o statistic-keyword(s) identifies the statistics to
compute
o options will control the content, analysis, and
appearance of the output
o SAS-data-set-out is the name of the output SAS
data set
o statistic-keyword identifies the summary statistic to
be written out
o variable-name(s) identifies the names of the
variables created to contain the values of the
summary variables.

Copyright The Art of Service


71 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
When the statistic-keyword= option is not used, the summary
statistics produced will be N, MEAN, STD, MIN, and MAX for all
numeric variables listed in the VAR statement or all numeric variables
is no VAR statement exists.

The keyword for the statistics must be specified and then all the
variables to be computed. If a VAR statement exists, the variables
must be listed in the same order as the variables in the VAR
statement.

Summarized output data set can be created using the SUMMARY


procedure. The PROC SUMMARY uses the same code to produce
output data as PROC MEANS. The difference between PROC
MEANS and PROC SUMMARY is the use of defaults: PROC
MEANS produces a report by default, PROC SUMMARY must
include a PRINT option.

10.4 Creating Frequency Tables

To describe the data by reporting the distribution of variable values,


the FREQ procedure should be used. The FREQ procedure is a
descriptive procedure and a statistical procedures. One-way and n-
way, or crosstabulation, frequency tables can be created.

The FREQ procedures can include several statements and options


for controlling output.
To create a basic frequency report:
x Use PROC FREQ statement
x Format:
PROC FREQ <DATA=SAS-data-set>;
RUN;
o SAS-data-set is the name of the SAS data set to
be used.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 72
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
By default, a one-way table is created with the frequency, percent,
cumulative frequency, and cumulative percent of every value of all
variables in a data set.
x Frequency is the number of observations with the desired
value
x Percent is the frequency of the value divided by the total
number of observations
x Cumulative frequency is the sum of the frequency counts
of the value and all other values above it in the table
x Cumulative percent is the cumulative frequency of the
value divided by the total number of observations.

By default, the FREQ procedure will create frequency tables for every
variable in the data set. Frequency distributions work best with
variables that can be categorized and whose values are best
summarized by counts.

To identify the variables to be processed by the FREQ procedure:


x Use TABLES statement
x Format:
TABLES variable(s);

10.4.1 Creating N-Way Tables


Crosstabulation is used when two distinct, but related, variables are
compared. The most basic form of crosstabulation is a two-way table.

To create an n-way table:


x Use PROC FREQ statement and TABLES statement
x Format:
PROC FREQ <DATA=SAS-data-set>;
TABLES variable-1* variable-2 <*.. variable-n>;
RUN;
o SAS-data-set is the name of the SAS data set to
be used
o variable-1 identifies table rows
o variable-2 identifies table columns
o variable-n identifies a multi-way table.

Copyright The Art of Service


73 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
When specifying crosstabulation, PROC FREQ produces tables with
cells that contain:
x cell frequency
x cell percentage of total frequency
x cell percentage of row frequency
x cell percentage of column frequency.

When creating n-way tables, a series of two-way tables are produced


with a table for each level of the variables.

10.4.2 Table Layout


To read n-way tables, the order of variables is an important
consideration. The last two variables of the TABLES statement will
become the two-way rows and columns. A table will be created for
each value of the of the variables preceding the last two variables.

A CROSSLIST option will display crosstabulation tables in ODS


column format. CROSSLIST creates a table with a customizable table
definition. A table definition is customized using the TEMPLATE
procedure.
x Format:
TABLES variable-1* variable-2 <*.. variable-n>
/CROSSLIST;

A LIST option will allow bulky, complex crosstabulations to be read as


a continuous list - great for n-way tables that have three or more
variables specified. It eliminates row and column frequencies and
percents
x Format:
TABLES variable-1* variable-2 <*.. variable-n> /LIST;

The output of the FREQ procedures can be limited to a few specific


statistics:
x NOFREQ suppresses cell frequencies
x NOPERCENT suppresses cell percentages
x NOROW suppresses row percentages
x NOCOL suppresses column percentages.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 74
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
10.5 Producing HTML Output

SAS uses Output Delivery System (ODS) statements to generate


HTML output.
With ODS, HTML output in any operating environment can be
created, customized, and managed through programming statements.

10.5.1 Output Delivery System


Output can be created in a variety of formats with ODS:
x HTML output
x Output data set of procedures results
x Traditional SAS listing output.

When ODS statements are submitted and the output is created by the
SAS program:
x ODS creates the output in the form of output objects; each
output object containing:
o Data component - the results of a procedure or
DATA step
o Table definition – information about how the results
are rendered.
x The output object is sent to specified ODS destination(s)
and creates the formatted output controlled by the
destination.

ODS destinations supported include:


x HTML – output formatted in HTML
x Listing – output formatted like traditional SAS procedure
output
x Markup Language Family – output formatted in markup
languages, such as XML
x ODS Document – hierarchy of output objects
x Output – SAS data sets
x Printer Family – output formatted in PS, PDF, or PCL files
x RTF – output formatted in RTF format using Microsoft
Word.
Copyright The Art of Service
75 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
For each type of formatted output created, an ODS statement is
required to open the destination and another ODS statement to close
the destination. The default destination is Listing.

To open and close ODS destinations:


x Use ODS statement
x Format:
ODS open-destination;
ODS close-destination CLOSE ;
o open-destination is the keyword and any required
options for the output type to be created
o close-destination is the keyword for the type of
output.

Since the default destination is Listing, it is considered always open.


Best practices would close the listing destination at the beginning of
the program.

If multiple ODS destinations are open concurrently, they can be


closed at the same time using the keyword _ALL_.

10.5.2 Creating Simple HTML


To create HTML output:
x Use ODS HTML statement
x Format:
ODS HTML BODY=file-specification;
ODS HTML CLOSE ;
o file-specification identifies the file containing the
HTML output. The specification can be:
ƒ an HTML filename
ƒ a fileref
ƒ a SAS catalog entry.

ODS HTML statements can be used to direct the output from multiple
procedures to the same HTML file.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 76
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
To create HTML output with a table of contents:
x Use ODS HTML statement
x Format:
ODS HTML
o BODY=body-file-specification;
o CONTENTS=contents-file-specification;
o FRAME=frame-file-specification;
ODS HTML CLOSE ;
ƒ body-file-specification is the name of the
HTML file containing the procedure output
ƒ contents-file-specification is the name of the
HTML file containing a table of contents
with links to the procedure output
ƒ frame-file-specification is the name of the
HTML file connecting the table of contents
with the body file

x A CONTENTS= option is required if FRAME= is used


x A table of contents will contain a numbered heading for
each procedure that creates output
x Links are generated between files by using HTML
filenames specified in the ODS HTML statement.

The URL=suboption in the BODY or CONTENTS file specification will


allow an URL to be used for all links that is created to the file. Either
relative URLs or absolute URLs can be specified.

PATH=option will allow the specification of a location where HTML


output can be store and assists in streamlining the ODS HTML
statement.

STYLE=option can be used to change the appearance of the HTML


output, and valid SAS or user-defined style definition can be used.

Copyright The Art of Service


77 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
11 Advanced Work with Variables

11.1 User-Defined Formatting

Raw data or data found in a basic SAS data set may not be easily
read in its basic form requiring additional formatting to clarify the
information.

A common occurrence is the existence of coded information: a set of


numerical expressions used to identify meaningful information.
PROC FORMAT can be used to replace the code with the information
it represents.

To define a specific format:


x Use PROC FORMAT statement
x Format:
PROC FORMAT <options>;
options include:
ƒ LIBRARY=libref identifies the SAS data
library which contains the catalog storing
user-defined formats
ƒ FMTLIB will print the contents of a format
catalog

All user-defined formats are stored in a format catalog. If one doesn't


exist, SAS will automatically create one named Work.Formats.
Work.Formats is a temporary library that exists until the end of the
SAS session.

A permanent format catalog named Formats can be used when


specified in the LIBRARY=option of the PROC FORMAT statement.
To use the Formats library, a LIBNAME statement is required to
associate the libref to the permanent library. When a permanent
format is associated with a variable, the libref is used to reference the
location of the format catalog.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 78
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
11.1.1 Creating User-Defined Formats
To define format for displaying one or more values:
x Use VALUE statement
x Format:
VALUE format-name
range1='label1'
range2=label2'
...;
o format-name is the name of the format that is being
created
o range specifies one or more variable values and a
character string or existing format
o label is a text string enclosed in quotation marks.

A VALUE format name must:


x begin with a dollar sign if the format is applied to character
data
x not be longer than eight characters
x not be the name of an existing SAS formatting
x not end in a number
x not end in a period when in the VALUE statement.

A VALUE range will identify:


x a single value
x a range of numeric values
x a range of character values enclosed in quotation marks
x a list of unique values separated by commas.

Character values must be enclosed in quotation marks and match the


variable's values. The format's name must start with a dollar sign.

Numeric values do not need to be enclosed in quotation marks. The


format's name does not require a dollar sign.

Copyright The Art of Service


79 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Non-inclusive ranges can be specified:
x using the “less than” symbol to avoid overlapping, as in a-
<b
x using the keywords LOW and HIGH to identify lower and
upper limits, as in low-<b and a-<high
x using the keywords OTHER to label missing values.

Labels used to display ranges must be:


x enclosed in quotation marks (double quotations if an
apostrophe is part of the label)
x limited to 256 characters.

Multiple formats can be defined by using multiple VALUE statements


in a single PROC FORMAT statement.

Each VALUE statement will define a different format.

11.1.2 Using User-Defined Formats


Permanent, user-defined formats are stored in a format catalog.

When using a user-defined format, SAS will search in two libraries:


x the temporary library, Work
x a permanent library, Library
SAS uses the first instance of the format found.

Association of formats with variables is done using a FORMAT


statement:
x use the same format name in the FORMAT statement as
in the VALUE statement
x end the format name in the FORMAT statement with a
period.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 80
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
11.1.3 Displaying User-Defined Formats
Top display a list of all the formats in the format catalog with the
descriptions of their values, the keyword FNTLIB can be added to the
PROC FORMAT statement.

The name, range, and label for each format is provided, as well as:
x the length of the longest value
x number of values defined by this format
x version of SAS the format is compatible with
x the data and time the format was created.

11.1.4 Creating Variables


Anew variable can be created to accumulate the values of multiple
variables.
To create an accumulator variable:
x Use SUM statement
x Format:
variable+expression;
o variable is the name of the accumulator variable
o expression is any valid SAS expression.

An accumulator variable must be numeric. The variable is


automatically set to 0 before reading the first observation.

The sum statement will add the results of the expression on the right
side of the plus sign to the numeric variable on the left. The numeric
variable will retain the new value in the program data vector until the
next observation is read.

By default, an accumulator variable is initialized to 0. To initialize to a


different number, use the RETAIN statement.

The RETAIN statement will:


x assign an initial value to a retained variable
x prevent variables from being initialized with each
execution of the DATA step
Copyright The Art of Service
81 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
To initialize a variable to a value other than 0:
x Use RETAIN statement
x Format:
RETAIN variable initial-value;
o variable is the name of the variable whose value
will be retained
o initial-value is the initial numeric or character value
for the variable.

Restrictions to the RETAIN statement include:


x no effect on variables which are read with SET, MERGE,
and UPDATE statements
x the retained variable will be initialized to missing of no
initial value is specified
x the RETAIN statement is a compile-time only statement to
create variables that do not already exist.

11.1.5 Grouping Variables


Variables can be assigned values conditionally.
To conditionally assign a value:
x Use IF-THEN statement
x Format:
IF expression THEN statement;
o expression is any valid SAS expression
o statement is any executable SAS statement.

The assignment statement is executed when the condition is true; if


false the value of the variable is missing. Any of the comparison of
logical operators can be used in an IF-THEN statement. Logical
comparisons are enclosed in parentheses to be evaluated.

IF-THEN ELSE statements allow alternative actions to be applied to


the condition. Multiple IF-THEN ELSE can be used to provide
numerous alternatives. SAS will execute the IF-THEN statements
until a true statement is found.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 82
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
The length of the variable can vary based on the expressions used
unless it is explicit specified using the LENGTH statement.
x Format:
LENGTH variable(s) <$> length;
The dollar sign is required if the variable is a character variable.

Some observations may be unwanted; to remove them use the


DELETE statement.
x Format:
DELETE ;

To conditionally delete an observation:


x Use IF-THEN statement and DELETE statement
x Format:
IF expression THEN DELETE
If the expression is:
o TRUE, the DELETE statement executes
o FALSE, the DELETE statement does not execute
and the processing continues.

To read and process fields that are not kept in the data set, use the
DROP= and KEEP= data set options:
x Format:
(DROP variable(s))
(KEEP variable(s))

The KEEP option is used when there are more variables dropped
than kept.

Exclusion of variables can be done using statements instead of data


set options; the differences are:
x DROP statements cannot be used in the SAS procedure
steps
x The DROP statement applies to all output data sets
named in the DATA statement
x To exclude variables in some data sets but not all, use the
DROP data set options and associate it to the individual
data sets to impact.

Copyright The Art of Service


83 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Another method for conditionally assigning values is to sue SELECT
groups.

To conditionally assign values using SELECT:


x Use SELECT statement
x Format:
SELECT <select-expression)>;
WHEN-1 (when-expression-1 <..., when-expression-
n>) statement;
WHEN-n (when-expression-1 <..., when-expression-
n>) statement;
<OTHERWISE statement;>
END;
o SELECT begins the SELECT group
o select-expression is the SAS expression that is
evaluated
o WHEN identifies the SAS statements to be
executed when a condition is true
o when-expression identifies any SAS expression,
value. There must be at least one expression
o statement is any executable SAS statement
o OTHERWISE specifies a statement to be executed
when all WHEN statement prove false (optional)
o END terminates the SELECT group processing.

Conditional processing can be done on groups of statements rather


than single SAS statements by creating DO groups.
To create a DO group:
x Use DO statement
x Format:
DO;
SAS statements
END;
o DO statement begins the processing
o SAS statement are called a DO group and
executed as a unit
o END terminates the DO group processing.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 84
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
DO groups can be used within IF/THEN.ELSE statements and
SELECT groups.

Three forms of DO statements:


x The iterative DO statement will execute any statement
between DO and END repetitively
x The DO UNTIL statement executes statements in a DO
loop repetitively until a condition is true
x The DO WHILE statement executes statements in a DO
loop repetitively while a condition is true

11.2 Using SAS Functions

SAS functions are built-in routines enabling data manipulations


quickly and easily.
Categories of SAS functions include:
x Array
x Bitwise Logical Operations
x Character
x Character String Matching
x Currency Conversion
x Data and Time
x Descriptive Statistics
x Double Byte Character Set
x External Files
x Financial
x Hyperbolic
x Macro
x Mathematical
x Probability
x Quantile
x Random Number
x SAS File I/O
x Special
x State and ZIP Code
x Trigonometric
x Truncation
Copyright The Art of Service
85 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
x Variable Control
x Variable Information
x Web Tools.

With SAS functions:


x Calculate sample statistics
x Create SAS date values
x Convert Zip Codes to state postal codes
x Round values
x Generate random numbers
x Extract a portion of a character value
x Convert data from one type to another.

11.2.1 Basics of SAS Functions

To use a SAS function:


x Format:
function-name (argument-1 <,argument-n>);
arguments can be variables, constants, or
expressions

If more than one argument, a comma is used to separate arguments.


If the argument is a list or array, the argument must be preceded by
the word OF to be interpreted correctly.

Target variables are the variables where the result of a function is


assigned. The default length of the target variable is dependent on
the function, causing more space than necessary in the data set. Add
a LENGTH statement for the target variable to control the length of
the variable.

11.2.2 Data Conversions


The INPUT function will convert character data values to numeric
values. The PUT function will convert numeric data values to
character values.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 86
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
When conversion between variable types is required, SAS will
automatically attempt the conversion with unpredictable results.
Explicit conversions provide better results.

Automatic character-to-numeric conversion happens when a


character value is:
x assigned to a defined numeric variable
x used in an arithmetic operation
x compared to a numeric value
x specified in a function requiring numeric arguments.

Automatic conversions:
x use the w.d informat
x produce a numeric missing value from any character value
that does not conform to the standard numeric notation.

WHERE statements do not perform automatic conversions.

The INPUT statement will explicitly convert characters to numeric


values. To explicitly convert character to numeric values:
x Format:
INPUT (source, informat);
o source identifies the character variable, constant,
or expression to be converted to a numeric value
o informat must be numeric and identified

Automatic numeric-to-character conversion happens when a numeric


value is:
x assigned to a defined character variable
x used with an operator that requires character values
x specified in a function requiring character arguments.

Copyright The Art of Service


87 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
The INPUT statement will explicitly convert characters to numeric
values.
To explicitly convert numeric to character values:
x Format:
PUT (source, format);
o source identifies the numeric variable, constant, or
expression to be converted to a character value
o format must match the data type of the source
being specified.

A PUT statement will:


x always return a character string
x return the source written with a format.

The format and the source must agree in type.

11.2.3 Modifying Date Values


Many SAS functions work with data and time values. SAS stores a
date value as the number of days from January 1, 1960 to a given
date. SAS stores a time value as the number of seconds since
midnight. The SAS datetime value is the number of seconds between
midnight on January 1, 1960 and a given data and time.

SAS Date Functions


MDY SAS data (month, day, year)
TODAY Today's date
DATE
TIME Current time
DAY Extracts the day value from a SAS data.
QTR Extracts the QTR value from a SAS data.
WEEKDAY Represents the days of the week.
MONTH Extracts the month value from a SAS data.
YEAR Extracts the year value from a SAS data.
INTCK Returns the number of time intervals that occur in
a given time span.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 88
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
INTNX Applies multiples of a given interval to a date,
time, or datetime to obtains a result.
DATDIF Calculates the days or years between two SAS
YRDIF dates.

11.2.4 Modifying Character Values


Functions which modify character values can:
x separate character string variables into multiple smaller
variables
x replace a portion of a character variable's values
x search for a specific string.

Character functions include:


SCAN Returns a specified word from a character value.
SUBSTR Extracts a substring or replaces character values.
TRIM Trims trailing blanks from character values.
CATX Concatenates character strings, removing leading
and trailing blanks and inserts separators.
INDEX Searches a character value for a specific string.
FIND Searches for a specific substring of characters
within a character string specified.
UPCASE Converts all letters in a value to uppercase.
LOWCASE Converts all letters in a value to lowercase.
PROPCASE Converts all letters in a value to proper case.
TRANWRD Replaces or removes all occurrences of a pattern
of characters within a character string.

The SCAN function can be used to separate a character value into


words and to return a specified word.

Delimiters are used to separate a character string into words. Two or


more contiguous delimiters are treated as one delimiter.

Copyright The Art of Service


89 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Default delimiters include: blanks . < ( + | & ! $ * ) ; ^ - / , %

To use a SCAN function with delimiters:


x Format:
SCAN (argument, n, delimiters);
o argument identifies the character variable or
expression to be scanned
o n identifies which word to read
o delimiters are special characters enclosed in
parentheses.

The SCAN function assigns a length of 200 to every target variable.


A LENGTH statement in the DATA step will limit the length.

The SUBSTR function allows:


x a portion of the character value to be extracted
x a portion of the character value to be replaced.

To use a SUBSTR function:


x Format:
SUBSTR (argument, position, <n>);
o argument identifies the character variable or
expression to be scanned
o position is the character position to start from
o n identifies the number of characters to extract.

To use a TRIM function to remove trailing blanks:


x Format:
TRIM (argument);
o argument is any character expression: character
variable or character function

To use a CATX function for concatenation:


x Format:
CATX (separator, string-1 <,...string-n>);
o separator identifies the character string to be used
between concatenated strings
o string specifies a SAS character string.
Copyright The Art of Service
Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 90
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
To use a INDEX function to search for a string within a value:
x Format:
INDEX (source, excerpt);
o source is the character variable or expression to
search
o excerpt is the character string enclosed in
quotation marks.

To use a FIND function to search for a string within a character string:


x Format:
FIND (string, substring<,modifiers>);
o string is the character constant, variable, or
expression to be searched
o substring is the character constant, variable, or
expression to be searched for in string
o modifiers is the character constant, variable, or
expression which specifies one or more modifiers
o startpos is an integer specifying the position where
the search should start and its direction.

To use an UPPERCASE function:


x Format:
UPCASE (argument);
o argument is any character expression: character
variable or character constant.

To use a LOWERCASE function:


x Format:
LOWCASE (argument);
o argument is any character expression: character
variable or character constant.

To use a PROPCASE function to convert all words to proper names:


x Format:
PROPCASE (argument <,delimiter(s)>);
o argument is any character expression: character
variable or character constant
o delimiter(s) identify the separators of the string.

Copyright The Art of Service


91 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
To use a TRANWRD function to replace or remove all occurrences of
a pattern:
x Format:
TRANWRD (source, target, replacement)
o source is the string to translate
o target is the string to be searched for in the source
o replacement is the string that replaces target.

11.2.5 Modifying Numeric Values


To return the integer portion of a numeric value:
x Format:
INT (argument);
o argument is any numeric variable, constant or
expression.

To round values to the nearest unit:


x Format:
ROUND (argument,round-off-unit);
o argument is any numeric variable, constant or
expression
o round-off-unit is numeric and nonnegative.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 92
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
11.3 Expanding DO Loops

DO loops allow:
x concise DATA steps to be written
x data to be generated
x statements to be conditionally executed
x data to be read.

Groups of statements are processes repeatedly, not just once.

To construct a DO loop:
x Use a DO statement
x Format:
DO index-variable=start TO stop BY increment;
SAS statements
END;
o index variable will store the value of the current
iteration of the DO loop
o The start, stop, and increment values:
ƒ are set when the entering the DO loop
ƒ cannot be changed during processing
ƒ can be numbers, variables, or SAS
expressions.

11.3.1 Executing DO Loops


Initiating DO loops within DATA steps begin with compiling the data.
A program data vector is created. When the DATA step is executed,
the values are assigned to the variables that will be used within the
DO loop.

The DO loop executes where the specified calculations are made and
the result added to the previous value for the variable in the program
data vector. The DO loop continues to execute for as many times as
specified in the program. At this point the DO loop ends and finally
the DATA step.

Copyright The Art of Service


93 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
If an observation for each iteration of the DO loop is desired, place an
OUTPUT statement inside the loop.

The DO loop's index variable can be decremented by specifying a


negative value for the BY value of the DO statement.

The number of times a DO loop is executed can be specified by


listing items in a series. The DO loop will execute for each item listed
in the series. The items in the series must be separated by commas.

DO loops can be nested in a DO loop. For multiple nested DO loops,


a unique index-variable name must be assigned for each iterative DO
statement. Each DO loop must have an END statement.

The number of times a DO loop is executed can be conditionally


determined using the DO UNTIL and DO WHERE statements.

DO UNTIL statements are used when DO loops should be executed


repeatedly until a condition is met.
To execute a DO loop until a condition is met:
x Use a DO UNTIL statement
x Format:
DO UNTIL (expression);
more SAS statements
END;
expression is a valid SAS expression enclosed in
parentheses.

The DO UNTIL statement will execute at least once.

DO WHILE statements are used when DO loops should be executed


repeatedly while a condition is met.
To execute a DO loop until a condition is met:
x Use a DO WHILE statement
x Format:
DO WHILE (expression);
more SAS statements
END;
expression is a valid SAS expression enclosed in
parentheses
Copyright The Art of Service
Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 94
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
If the expression is false during the first iteration, the DO loop will
never execute.

DO loops can be used to draw sample observations from a data set.


To create sample data, enclose a SET statement in the DO loop. Use
the start, stop, and increment values to control which observations to
select from.

A POINT=option should be associated with the index variable.

11.4 Process Arrays

Arrays can be used to:


x perform repetitive calculations
x create several variables with the same attributes
x read data
x change variables to observations
x change observations to variables
x compare variables
x perform a table lookup.

A SAS array is a temporary grouping of variables under a single


name. An array exists for the duration of the DATA step. Arrays are
used to reduce the number of statements required for processing
variables.

To group variables into an array:


x Use an ARRAY statement
x Format:
ARRAY array-name{dimension} <elements>;
o array-name specifies the name of the array
o dimension shows the number and arrangement of
array elements. The default dimension is one.
o elements list the variables to include in the array.

Copyright The Art of Service


95 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
11.4.1 One-dimensional Arrays
A dimension for a one-dimensional array can be specified in several
ways:
x The number of array elements is specified
x A range of values is specified
x An asterisk will have the SAS count the number of
elements
x The dimension can be enclosed in parentheses, graces, or
brackets.

When elements are specified, they need to be separated by a space.


Elements can be specified as a variable list.
Common variable lists:
x a numbered range of variables Var1-Varn
x all numeric variables _NUMERIC_
x all character variables CHARACTER
x all variables _ALL_

The advantage of arrays is the ability to reference the elements by an


index value. An index value is assigned to each array element. An
array reference is used to perform actions on an array element during
execution.

To create an array reference:


x Format:
array-name{index value}
index value:
o is enclosed in parentheses, braces, or brackets
o specifies an integer, numeric variable, or SAS
numeric expression
o Is within the lower and upper bounds of the array's
dimension.

11.4.2 Executing Arrays


During compilation, a program data vector is created. The DATA step
is scanned for syntax errors. The index values of the array elements
are assigned. Since an ARRAY statement is a compile-time only
Copyright The Art of Service
Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 96
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
statement, it is ignored during execution. The DATA step executes.

The DIM function can be used to return the number of elements in the
array.
x Format:
DIM (array-name)

11.4.3 Advanced Array Functions


Variables can be created in an ARRAY statement. Omit the array
elements from the ARRAY statement. With no array elements, SAS
will automatically create variables and assigns default names to them.

To create variables into an array:


x Use an ARRAY statement
x Format:
ARRAY array-name{dimension};
o array-name specifies the name of the array
o dimension shows the number and arrangement of
array elements. The default dimension is one.

Default variable names are created by concatenating the array name


and the numeric value of the dimension.

An array of character variables can be created by adding a dollar sign


after the array dimension.
By default, the length of the variable name is 8, but that can be
changed by assigning a length after the dollar sign.

To assign initial values in an ARRAY statement:


x place the values after the array elements
x specify an initial value for each corresponding array
element
x separate each value with a blank or comma
x enclose the initial values in parentheses.

Temporary array elements for the DATA step process can be done by
specifying _TEMPORARY_ after the array name and dimension.
Temporary array elements do not appear in the resulting data set.
Copyright The Art of Service
97 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
11.4.4 Multidimensional Arrays
Variables can be grouped into table-like structures, called
multidimensional arrays. To define a multidimensional array, the
number of elements in each dimension is specified separated by a
comma.

The first dimension specifies the number of rows. The second


dimension specifies the number of columns. Array elements are
grouped in the order they are listed in the ARRAY statement.

Multidimensional arrays are typically used with nested DO loops.

Arrays can also rotate a SAS data set:


x changing variables into observations
x changing observations into variables.

12 Troubleshooting SAS Programs

SAS programs consist of SAS statements. A consistent layout


enhances readability.

Best practices:
x begin DATA and PROC steps in column one
x indent statements within a step
x include a RUN statement after every DATA or PROC step
x begin RUN statements in column one.

12.1 Types of Errors

The most common errors:


x syntax errors – occur when statements do not conform to
language rules
x data errors – occur when data values are not appropriate
for the specified SAS statements.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 98
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
12.1.1 Syntax Errors
SAS scans for syntax errors in each step before processing the step.
Information, including any errors, is written to the SAS log.

Errors are displayed in the SAS log:


x beginning with the word ERROR
x the possible location of the error
x explanation for the error.

Programs are edited in the Editor window. After correcting programs,


you resubmit it.

12.1.2 Resolving Common Problems


Common errors that may require editing:
x spelling mistakes
x omitted semicolons
x unbalanced quotation marks
x invalid options specified.

Many errors come with messages in the SAS log:


x omitted semicolon – log message indicates an error in a
statement which seems valid
x unbalanced quotations – log message indicates test string
is too long or ambiguous
x invalid options – log message indicated option is invalid or
unrecognized.

A missing RUN statement will have an error message “DATA (or


PROC) step running” at the top of the active window.

Each step in a SAS program is compiled and executed independent


of all other steps. A DATA or PROC statement indicated the
beginning of the step. A RUN or QUIT statement indicated the end of
the step. To resolve, submit a RUN statement.

Copyright The Art of Service


99 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
13 Practice Exam

The following multiple-choice questions are a refresher from the


Foundation level as a prelude.

Question 1

Which of the following items is not considered a category for SAS functions?

A) Special
B) Descriptive Statistics
C) Arithmetic Operations
D) Variable Control

Question 2

Statements are free format expressions that make up the SAS program.
Which of the following are true about statements?

A) Statements must begin at the start of the line


B) Statements are always in lowercase to delineate from keywords
C) Only one statement is allowed any line of programming code
D) Statements always end in a semicolon

Question 3

Which of the following is true about priorities in relation to calculating


arithmetic expressions?

A) If consecutive operators have the same priority, they are always


read from right to left
B) Exponential calculations will always be done before multiplication in
any simple expression
C) Multiplication and addition have a higher priority than subtraction
D) Arithmetic operations have to be enclosed by parentheses

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 100
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Question 4

Which of the following statements are correct when using the RETAIN
statement?

A) Variables read with the SET statement will not be affected by the
RETAIN statement
B) The RETAIN statement is used when a variable needs to be
initialized to a value of 0
C) Variables will be initialized to the desired value each time the DATA
step is executed
D) Variables are created during execution

Question 5

When multiple expressions are used within a function, what delimiter should
be used to separate them?

A) Space
B) Semicolon
C) Parentheses
D) Comma

Question 6

Which of the following statements is not correct about crosstabulation of


data?

A) Frequency calculations for the number of cells that appear will be


included in the table
B) Crosstabulation is used when comparing three or more distinct
variables
C) The first variable listed in the TABLES statement will define the rows
of the table
D) Each level of variables will be represented by a separate two-way
table

Copyright The Art of Service


101 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Question 7

Which of the following statements is not true for list inputs?

A) Fields cannot be skipped or re-read


B) Fields are read from left to right
C) Fields must be separated by a delimiter
D) The MISSOVER option must be used at all times to read missing
values

Question 8

Each data set has a descriptive portion providing useful information about
the data set. Which of the following items would not be found in the
descriptive portion?

A) Creation data and time


B) Set name
C) Information on labels
D) Error messages

Question 9

What would be the correct raw data value of a datatime field for Dec 15,
1986 10:00pm for SAS?

A) 10942
B) 10935
C) 15756360
D) 121519862200

Question 10

Which location pointer identifies the exact line number from which to begin
reading data from?

A) #n pointer
B) input pointer
C) / pointer
D) Column pointer
Copyright The Art of Service
Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 102
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Question 11

What is the purpose of the SUM statement?

A) To combine numeric variables


B) To obtain a total output of multiple values
C) To produce column totals
D) All of the above

Question 12

Which programming statement is used to perform conditional processing on


groups of SAS statements?

A) DO
B) BY
C) WHERE
D) IF/THEN

Question 13

Which character function would be used to remove extra spaces from


character strings?

A) SUBSTR
B) TRIM
C) FIND
D) CATX

Question 14

Which of the statements below is incorrect?

A) SAS will read and execute steps simultaneously


B) A step is completed when the QUIT statement appears
C) All programming steps begin with either a DATA or PROC keyword
D) A step is completed when the RUN statement appears

Copyright The Art of Service


103 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Question 15

Which of the following statements is true about column inputs of data?

A) Once a field is read, the data step cannot come back to it


B) Should not be used to read nonstandard numeric data
C) All fields have to be read sequentially
D) Missing data must have a placeholder in the form of a period

Question 16

Which of the following is true about performance testing of DATA steps?

A) Performance testing cannot be done on data steps


B) A data set is created during performance testing
C) The number of observations can be limited using the OBS= in the
FILENAME statement
D) Conditions can be placed in the performance testing to enable value
testing

Question 17

Which of the following conditions should discourage the use of the one-to-
one readings to combine data sets?

A) When a better method is available


B) All observations from each data set read should be contained in the
final data set
C) All variables from each data set read should be contained in the final
data set
D) Variables have the same name in multiple data sets

Question 18

When is the best time to use the DROP= statement?

A) Only when used with the KEEP= statement


B) Only when a variable should not be processed
C) When the variable should not appear in the data set
D) Never
Copyright The Art of Service
Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 104
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Question 19

In order to sequentially read multiple records to create a single observation,


what condition must be in place to succeed?

A) The records must be in free format.


B) Variables have to be specified explicitly
C) The #n line pointer has to be specified
D) The number of observations for each record must be the same

Question 20

When would you use a CLASS statement?

A) The CLASS statement is used to sort data in a defined structure


B) The CLASS statement is used to define and group observations
C) The CLASS statement is used when applying a specified statistic to
a group of observations
D) The CLASS statement is used when a table must be created

Question 21

Which of the following is not considered a character type of variable?

A) Miller
B) Miller12
C) Miller.2
D) 2miller4u

Question 22

When would the keyword, FILE, be appropriate to use?

A) When attempting to describe the file where raw data is being written
B) When referencing the file to be read
C) When creating a raw data file
D) When referencing the file to be written to

Copyright The Art of Service


105 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Question 23

What data summarizations are provided by the PROC MEANS statement by


default?

A) The minimum value


B) the n-count
C) The standard deviation
D) All of the above

Question 24

Which of the following is a true statement about executing a data step?

A) The input raw data is read from the input buffer


B) Error checking is performed at the beginning of the execution phase
C) Multiple observations can be found in a single fixed field record
D) The iteration for a DATA step refers to the number of INPUT
statement in the step

Question 25

When using the VALUE statement to define the format of specific values,
which of the following statements is not correct about identifying the range.

A) A series of character values can be identified as the range for a


VALUE statement
B) A single value can be identified as the range for a VALUE statement
C) A list of unique values separated by commas can be identified as the
range for a VALUE statement
D) A series of numeric values can be identified as the range for a
VALUE statement

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 106
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Question 26

Which of the following describes the programming process of adding


observations from one data set to the observations of another data set?

A) Match-merging
B) Interweaving
C) Concatenating
D) None of the above

Question 27

What statement is used to identify the conditions for selecting observations


in a report?

A) WHERE
B) IF/THEN ELSE
C) SET
D) INPUT

Question 28

Which of the following items can be expected by the SORT procedure?

A) Replace the original SAS data set


B) Will not generate printed output
C) Rearrange observations in the SAS data set
D) All of the above

Question 29

When is character-to-numeric conversion not automatically performed?

A) When used in an arithmetic operation


B) When used within a WHERE statement
C) When compared to a numeric value
D) When assigned a defined numeric variable

Copyright The Art of Service


107 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Question 30

Several libraries are created during a SAS session by default. Which of the
following is not one of those libraries?

A) Sasuser
B) Libname
C) Sashelp
D) Work

Question 31

When are SAS arrays not used?

A) SAS arrays are used to compare variables


B) SAS arrays are used to change variables to observations
C) SAS arrays are used to group data within the data set
D) SAS arrays are used to perform repetitive calculations

Question 32

Which of the following in a correctly created data set name?

A) _Set.name
B) addAnew_1
C) 0pen_2C1other_rtm*XCX
D) create\B4infinity

Question 33

What must happen for DO Loops to be nested within each other?

A) Each DO loop must end in a RUN statement


B) A condition must be in place to determine if the DO loop should be
done
C) An OUTPUT statement is required to ensure the result of the DO-
loop is retained
D) A unique index-variable name must be assigned for each iterative
DO statement

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 108
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Question 34

Which of the following programming conventions can be used in conjunction


with the line-hold specifier, @@?

A) @ pointer control
B) MISSOVER option
C) column input
D) INPUT statement

Question 35

Which of the following is not an error identified during the compilation


phase?

A) Quotation marks are unbalances


B) No RUN statement in the step
C) An option is invalid
D) Semicolons are missing in statements

Question 36

Which of the following methods for combining data sets is distinguished for
its use of the BY statement?

A) Interweaving
B) Match-merging
C) Concatenation
D) None of the above

Question 37

Which of the following options does not describe how variables are to be
used in creating a report?

A) DISPLAY
B) ACROSS
C) GROUP
D) NUMERIC

Copyright The Art of Service


109 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Question 38

Which of the following is a standard numeric data value?

A) Integer binaries
B) Scientific notation
C) Fractions
D) Hexadecimal numbers

Question 39

Which of the following keywords is used to create a raw data file?

A) filename
B) libname
C) infile
D) set

Question 40

Which of the following is not an attribute of the program data vector?

A) Will scan data for errors


B) Is created after the creation of the input buffer
C) Refers to physical and logical memory available to the SAS program
D) Will expand as the number of variables increase

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 110
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
14 Answer Guide

14.1 Answers to Questions

Question 1
Answer: C
Reasoning: There are many categories for SAS functions, but
Arithmetic Operations is not one of them. The category is actually
Bitwise Logical Operations.

Question 2
Answer: D
Reasoning: Statements can start anywhere in a line which also
allows for multiple statement to exist on a single line. There are no
restrictions to the case required for statements. Statements always
begin with keywords and end in a semicolon.

Question 3
Answer: B
Reasoning: Exponential expressions have a higher priority than
multiplication expressions, unless parentheses exist to control any
calculations.

Question 4
Answer: A
Reasoning: The RETAIN statement is used when a variable needs to
be initialized to a value other than 0. The statement will prevent
variables from being initialized. All of this is done during the
compilation stage, not the execution stage. Variables that are read
with the SET, MERGE, and UPDATE statements are not affected by
the RETAIN statement.

Question 5
Answer: D
Reasoning: Multiple variables, constants, and expressions must be
separated by a comma when used in an argument.

Copyright The Art of Service


111 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Question 6
Answer: B
Reasoning: Crosstabulation is used when comparing two distinct
variables. More variables may be compared as well.

Question 7
Answer: D
Reasoning: If missing values are represented by a special character
that is not used as a separated of data, then the MISSOVER option is
not required.

Question 8
Answer: D
Reasoning: Error messages are found in the program log, not the
descriptive portion of the data set.

Question 9
Answer: C
Reasoning: The datetime value is the number of seconds from
midnight on 1/1/1960. This of course would be a rather large number,
in fact, 120 seconds from 10942 days (including 7 extra days from
leap years). The last answer is incorrect as it is the date and time
written out without any delimiters.

Question 10
Answer: A
Reasoning: The #n pointer references the absolute number of the
line for moving the pointer to read data.

Question 11
Answer: C
Reasoning: In creating reports for SAS programs, the SUM
statement is used to create column totals for numeric values.

Question 12
Answer: A
Reasoning: The key word in this question is ‘group’. Conditional
processing can be performed using some iteration of BY, WHERE,
and IF/THEN, but DO provides an opportunity to group statements
together.
Copyright The Art of Service
Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 112
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Question 13
Answer: D
Reasoning: CATX is the most correct answer as it removes both
leading and trailing blanks from character strings. TRIM only
removes the trailing blanks from character values.

Question 14
Answer: A
Reasoning: Programming steps begin with DATA or PROC and end
with either QUIT or END. SAS steps are completely read before
executing.

Question 15
Answer: B
Reasoning: Column input is used when data fields are fixed. In this
situation, the columns can be read in any order and re-read if
necessary. Missing data does not require a placeholder either.
Unfortunately, only standard data should be read using column input:
nonstandard data should utilize the formatted input.

Question 16
Answer: D
Reasoning: Performance testing can be done on data steps by
putting a NULL in the DATA step. This will prevent a data set from
being created. The number of observations can be limited by putting
the OBS= option in the INFILE statement. Conditions can be set
using IF/THEN ELSE statements to test for values.

Question 17
Answer: B
Reasoning: One-to-one readings are the simplest form of combining
data sets which will provide the best option in ensuring the
programming code is not overly complex. In this type of combining,
all the variables from each of the data sets will exist in the new data
set. Variables with the same name does not matter, since the values
of the last data set read will overwrite all previous values from existing
data sets. If the values should not be overwritten, the variable name
can be changed before the output is rendered. Not all observations
will appear in the final data set: if this is a requirement than
concatenation is the next best choice for combining data sets.
Copyright The Art of Service
113 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Question 18
Answer: C
Reasoning: DROP= and KEEP= can be used independently from
each other. Whether the DROP= statement appears in the same
statement as the DATA or SET statement will determine if the variable
will be processed or not. In all cases though, the variable will not
appear in the data set.

Question 19
Answer: D
Reasoning: The only correct answer is that the number of
observations for each record must be the same.

Question 20
Answer: C
Reasoning: The CLASS statement will allow statistics to be applied
to a group of observations. As a result of this statement, the data
would be sorted, grouped and displayed in a single table; but is not
the primary reason for using the CLASS statement.

Question 21
Answer: C
Reasoning: SAS naming conventions require variables names to be
1-32 characters long, can begin and consist of any combination of
letters, numbers, and underscores.

Question 22
Answer: D
Reasoning: SET is used to create a raw data file. FILENAME
references a file to be read, while FILE is used to reference a file to
be written to. The PUT statement will describe the lines written to the
raw data.

Question 23
Answer: D
Reasoning: PROC MEANS will also provide the mean and the
maximum value without limiting the output.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 114
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Question 24
Answer: B
Reasoning: Input raw data is read from the program data vector while
INPUT statements are read from the input buffer. In raw data with
fixed fields, a single record provides only one observation. Multiple
observations can be contained in a single record for free-format files.
An iteration refers to the complete cycle through the data step,
typically one iteration for each observation found.

Question 25
Answer: A
Reasoning: C shows that detail is required in the answer. The range
can consist of character values which are enclosed by quotation
marks.

Question 26
Answer: C
Reasoning: Match merging will merge several data sets together
based on the value of a common variable. Interweaving can do the
same as match-weaving, but with more than one common variable.
Concatenation will append, or add on, to existing data sets.
.
Question 27
Answer: A
Reasoning: The WHERE statement will identify conditions for
identifying variables and values to print in a report.

Question 28
Answer: D
Reasoning: All of the situations describe are possible with the SORT
procedure.

Question 29
Answer: B
Reasoning: A WHERE statement will not use automatic conversion of
any kind.

Copyright The Art of Service


115 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Question 30
Answer: B
Reasoning: All of the libraries are created by default except libref,
which is not a library but a keyword for naming a library.

Question 31
Answer: C
Reasoning: SAS arrays are a temporary grouping of variables that
exist for the duration of a data step. They will not 'group' data within
the data set.

Question 32
Answer: B
Reasoning: Data set names can be 1-32 characters long and
consists and begin with alphanumeric characters and underscores.

Question 33
Answer: D
Reasoning: DO statements must end with an END statement. A
condition is not necessary. The OUTPUT statement is used to create
observations for each iteration of the DO loop. A unique index-
variable name is required when nesting DO loops.

Question 34
Answer: D
Reasoning: A, B, C should not be used with the double training at
line-hold specifier.

Question 35
Answer: B
Reasoning: The compilation phase will identify most syntax errors,
while the executing phase will identify processing errors. No RUN or
QUIT statement will result in processing to continue indefinitely.

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 116
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Question 36
Answer: A
Reasoning: Match-merging is distinguished by the MERGE
statement and concatenation by multiple SET statements. The
interweaving method uses BY statements to choose which
observations are appropriate from each of the data sets. Though the
match-merging method also uses BY statements to focus the
observations used, it must have the MERGE statement to perform
properly.

Question 37
Answer: D
Reasoning: There is no numeric options: the default for numeric
values is ANALYSIS. Additional options not mentioned include
ORDER and COMPUTED.

Question 38
Answer: B
Reasoning: Standard numeric data values consist of positive and
negative numbers, decimal points, and scientific notation.

Question 39
Answer: D
Reasoning: 'Libname' is used to create a library. 'Filename' will
reference an external file where raw data is stored and after it has
been referenced once, the 'infile' is used to identify the external file.
'Set' will create a raw data file.

Question 40
Answer: C
Reasoning: The program data vector refers to logical memory only.

Copyright The Art of Service


117 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
15 References
SAS Certification Prep Guide: Base Programming for SAS 9,
www.scripd.com

For more information:


www.support.sas.com

Websites
www.artofservice.com.au
www.theartofservice.org
www.theartofservice.com

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 118
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
INDEX*

A
absolution value 67, 119
accumulator variable 81, 119
Advanced Reporting 3, 63
Amazon 2
analysis variables 66-7, 119
answer 112, 115, 119
Answer 4, 111-17, 119
answer, correct 113-14
Answer Guide 4, 111, 119
appearance 17, 63-5, 68, 71, 77, 119
Appends observations 37, 119
argument 86, 90-2, 111, 119
Arithmetic Operations 100, 107, 111, 119
Arithmetic operators 24-5, 119
array elements 96-8, 119
arrangement of 95, 97
array-name 95-7, 119
ARRAY statement 96-8, 119
ARRAY statement Format 95, 97, 119
arrays 86, 95-8, 119
multidimensional 98
Art of Service Copyright 1, 119
ascending order 40, 57-8, 119
attributes 15, 28, 38, 64-6, 110, 119
automatic conversions 87, 115, 119
automatic variables 32, 119

B
blanks 16, 44, 47-8, 52, 58, 90, 97, 119
BODY 76-7, 119
book 1-2, 5, 119
brackets 96, 119
Brisbane 4-6, 17-20, 32-4, 39, 55-6, 61-2, 67, 69-70, 75, 88-9, 92, 104,
118-19, 126, 129
bytes 16, 119

C
candidate 7, 119
catalog 2
CATX 89-90, 103, 113, 119
cells 67, 74, 101, 119
certifications 6, 119
character constant 91, 119
Copyright The Art of Service
119 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
character data 46, 79, 119
character data values 86, 119
character expression 90-1, 119
Character functions 89-90, 103, 120
character strings 31, 79, 89-91, 103, 113, 120
character value 86, 120
character values 16, 49-50, 61, 79, 82, 86-90, 106, 113, 115, 120
character values 87, 120
character variable 47-8, 56, 65, 83, 87, 90-1, 97, 120
character variables ORDER xf0b7 GROUP 66, 120
characters 11-12, 14-16, 18, 20, 22-4, 46, 48-9, 55-6, 67, 70, 80, 87-
90, 114, 116, 120
formatting 18
special 9, 22, 44-5, 50, 90, 112
split 65
CLASS statement 70-1, 105, 114, 120
CLASS variables 70, 120
code 31-2, 72, 78, 120
code type 31, 120
coded information 78, 120
collection 9-10, 15, 120
colon 49-50, 120
column frequencies 74, 120
column headings 64-5, 120
column input 3, 22-3, 44, 46, 54, 104, 109, 113, 120
column number 45-6, 120
COLUMN statement 63, 120
COLUMN statement 68, 120
column totals 55, 58, 103, 112, 120
column width 65, 120
columns 15, 21-3, 29, 44-8, 50-1, 56, 60, 64-5, 74, 98, 113, 120
combining data sets 37, 109, 113, 120
combining multiple data sets 37-40, 120
commas 61-2, 79, 86, 94, 98, 101, 106, 111, 120
format variable 62
COMMAw 46, 61-2, 120
compilation 31, 36, 41, 96, 120
compilation phase 28, 30, 109, 116, 120
concatenation 38, 90, 109, 113, 115, 117, 120
Conditional processing 32, 84, 103, 112, 120
conditions 25-6, 34-5, 43, 56, 63, 82, 84-5, 94, 104-5, 107-8, 113, 115-
16, 120
conjunction 36, 58, 109, 120
CONTENTS DATA 13-14, 120
contents-file-specification, procedure output 77
CONTENTS statement 13, 120
Copyright The Art of Service
Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 120
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
Continuous data values 68, 120
control 10, 25, 35-6, 53-4, 68, 71, 86, 95, 111, 121
conversion 87, 121
correct raw data value 102, 121
created data set name 108, 121
CROSSLIST 74, 121
crosstabulation 72-4, 101, 112, 121

D
DATA and PROC statement 9, 121
DATA and PROC steps 98, 121
Data and Time informats 51, 121
data output 62, 121
DATA output-SAS-data-set 37-40, 42-3, 121
DATA SAS-data-set 33, 35, 121
data set names 30, 121
data set 28, 37, 121
DATA statement 19, 21, 33-4, 37-40, 83, 121
DATA statement and SET statement Format 33, 121
DATA statement Format 21, 31, 121
data step 3, 21, 36, 104, 106, 113, 115-16, 121
DATA step 8, 19, 21, 24, 26-31, 35-6, 42, 53-4, 62, 75, 90, 93, 95-6,
101, 104, 106 [2]
DATA Step 28-9, 121
DATA step Copyright 81, 121
DATA step executes 53-4, 97, 121
DATA step process 97, 121
DATA step program 19, 41, 121
DATA step 54, 121
DATALINES 26, 121
DATALINES statement 26, 121
DATALINES statement Format 26, 121
date 17, 51, 88-9, 112, 121
DATE 17, 61, 88, 121
Date values 51, 61, 88, 121
Create SAS 86
datetime value 112, 121
days 7, 51-2, 88-9, 112, 121
DB2 12, 121
Debugging DATA Steps 3, 30, 121
decimal places 16, 46, 61-2, 70, 121
Default delimiters 49, 90, 121
default names 97
Default reports list observation 57, 121
default value 47
Default variable names 97
Copyright The Art of Service
121 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
delimiters 44, 46-50, 52, 89-91, 101-2, 112, 121
DESCENDING 40, 57-8, 64, 121
descriptive labels formatting variable values 59, 121
descriptive portion 14-15, 102, 112, 121
Descriptive Statistics 3, 68-9, 100, 122
descriptor information 13-14, 122
descriptor portion 27-8, 36, 122
designations 5, 122
Desired Format 61-2, 122
destination 75-6, 122
default 76
detail rows 66
deviation, standard 67, 69, 106
dimension 95-8, 122
default 95, 97
display 19, 31, 52, 64, 66, 80, 109, 122
display values 70, 122
display variables 66, 122
DLM 49-50, 122
dollar sign 23, 46, 61-2, 79, 83, 97, 122
DOLLARw 61-2, 122
DSD 49-50, 122

E
eBook 2
END 36, 84-5, 93-4, 113, 122
end-of-file marker 29, 35-6, 122
end-of-record marker 47, 122
END statement 94, 116, 122
engine 11-12, 122
environment, operating 10, 12, 47, 59, 75
Error messages 30, 38, 99, 102, 112, 122
errors 4, 8-9, 27-30, 32, 36, 47, 98-9, 106, 109-10, 122
common 98-9
syntax 28, 30, 96, 99, 116
exam 1, 7, 122
excerpt 91, 122
Exclusion of unmatched observations 42, 122
execute 9, 20, 82-3, 85, 93-5, 122
Execute DATA step 20, 122
execution 31, 41, 54, 81, 96-7, 101, 122
execution phase 27, 29-30, 36, 106, 122
exporting raw data files xf0b7 6, 122
expression 24-6, 31-2, 56, 81-4, 86-8, 90-2, 94-5, 100, 111, 122
Extracts 88-90, 122

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 122
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
F
field width 46, 51-2, 61, 122
fields 19, 22-3, 44, 47-8, 102, 104, 122
delimiter 47
fixed 19, 21, 44-5, 115
file name 50, 123
file-specification 21, 27, 48, 76, 123
FILE statement 27, 50, 123
FILE statement Format 27, 123
filename 10, 12, 20-1, 27, 33, 48, 50, 76-7, 110, 117, 123
FILENAME fileref 20, 123
FILENAME references 114, 123
FILENAME statement 19-20, 104, 123
FILENAME statement Format 20, 123
fileref 20-1, 27, 48, 76, 123
files 10-14, 20-2, 27, 29, 48-9, 55, 76-7, 105, 114, 123
external 19-21, 26, 28, 46, 117
free-format 115
input 21, 27
FIRSTOBS 18, 123
fixed-field data 47, 123
floating-point numbers 16, 123
following 100-2, 104-10, 123
FOOTNOTE statement 59-60, 123
format 10-11, 16, 34, 38, 46, 50, 52, 61-2, 64, 72, 75, 78-81, 88, 106,
123, 125
assigning 61, 64
free 9, 22, 44, 105, 129
format catalog 78, 80-1, 123
format name 80, 123
format-name 61, 79, 123
FORMAT statement 61, 80, 123
FORMAT Statement 62, 123
single 61
single PROC 80
FORMAT variable 34, 61, 123
format's name 79, 123
formatted input processes 46, 123
formatted values 66, 123
formatting 8, 16, 78, 120, 123
formatting-character 18, 123
formatting value 66, 123
frame-file-specification, procedure output 77
free-format data 50, 123
free format expressions 100, 123
FREQ procedures 72-4, 123
Copyright The Art of Service
123 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
frequency 73, 123
cumulative 73
frequency tables 3, 72-3, 123
function 24, 86-7, 89, 91, 101, 123
function-name 86, 123

G
group processing 84, 124
group variables 66-7, 95, 124
groups 34, 39, 58-9, 64, 66-7, 70, 84-5, 93, 103, 109, 116, 124, 127

H
Hexadecimal numbers 110, 124
hh 52, 124

I
ID/BY value 58, 124
ID statement 56, 58, 124
IF/THEN 103, 107, 112-13, 124
IF-THEN 82, 124
IF-THEN statement 82, 124
increment values 93, 95, 124
INDEX 89, 91, 119, 124
index value 96, 124
index variable 93, 95, 124
index-variable name 124
unique 94, 108, 116
INFILE statement 19, 21, 26, 29, 31, 47, 49, 113, 124
INFILE statement and DLM-option Format 48, 124
INFILE statement Format 21, 124
infinite number 16, 124
information 5, 13-16, 22, 52, 54-5, 75, 78, 99, 102, 118, 124
informats 16, 38, 45-6, 51, 87, 124
initialize 81-2, 124
input 123, 129, 131
INPUT 28-9, 45, 51, 87, 107, 115, 124
input, formatted 44, 113, 123
input buffer 28-9, 54, 106, 110, 115, 124
input data set name 18, 124
input data sets 34, 36, 38-42, 57-8, 67, 124
input pointer 29, 45-6, 51-4, 102, 124
input record 54
INPUT statement 19, 22, 50, 52-4, 87-8, 106, 109, 124
multiple 52
INPUT statement executes 54, 124
INPUT statement Format 23, 45, 47, 51, 124
Copyright The Art of Service
Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 124
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
instructions 5, 11, 19, 24, 53, 124
integer 51-2, 91, 96, 124
time values fractions xf0b7 45
Intersperses observations 37, 124
Interweaving 39, 107, 109, 115, 124
interweaving readings 39-40, 124
iteration 29, 36, 53-4, 93-5, 106, 112, 115-16, 124

J
justification 64-5, 125

K
keywords 8, 72, 76, 80, 100, 105, 110-11, 116, 125

L
label 16, 18, 34, 38, 60, 62, 64, 79-81, 102, 125
label columns 60, 125
label name 56, 125
LABEL statement 60, 125
single 60
length 16, 38, 47, 50, 65, 81, 83, 86, 90, 97, 125
LENGTH statement 50, 83, 86, 90, 125
letters 11-12, 14-15, 20, 23, 55, 89, 114, 125
liability 5, 125
libname 12, 21, 108, 110, 117, 125
LIBNAME libref 11, 20, 55, 125
LIBNAME libref engine 12, 125
LIBNAME statement 11, 19-20, 78, 125
LIBNAME statement Format 11-12, 20, 125
libraries 10-12, 20, 33, 55, 78, 80, 108, 116-17, 125
formats 78
permanent 10, 20, 78, 80
temporary 11, 78, 80
library contents 13, 125, 130
library name 10-11, 125
library NODETAILS 13-14
libref 10-14, 20, 33, 55, 78, 116, 125
libref.filename 10, 125
limited number 70, 125
line pointer control 52-3, 125
lines 17, 27, 49, 53, 58, 60, 100, 111-12, 114, 125
links 2, 25, 77, 125
list input 47-9, 102, 125
listing 13-14, 17, 76, 125
location 19-21, 27, 29, 48, 77-8, 99, 125
log 2, 17-18, 23, 30-1, 125
Copyright The Art of Service
125 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
log message 99, 125
log Values 30, 125
loops 4, 29, 85, 93-5, 98, 108, 116, 125
Lower quartile 70, 125
lowercase 9, 14-15, 23, 89, 100, 125

M
match 41-2, 79, 88, 125
Match-merging 42, 107, 109, 117, 125
Matches observations 37, 126
MAX 18, 67, 69, 72, 126
maximum value 67, 69, 114, 126
memory 28, 126
merge 40-3, 82, 111, 115, 126
MERGE SAS-data-set-1 40, 42-3, 126
MERGE statement 41-2, 117, 126
MERGE statement Format 40, 43, 126
Miller 105, 126
minimum value 67, 69, 106, 126
missing value 49, 126
MISSOVER option 49, 54, 102, 109, 112, 126
modifiers 49, 91, 126
Modifying Character Values 89, 126
Modifying Date Values 88, 126
Modifying Numeric Values 92, 126
multiplication 100, 126

N
name 10-12, 14-16, 19-21, 24, 27-8, 31, 37-40, 42-3, 47-8, 50-1, 55,
63-4, 71-3, 77, 79, 81-2 [10]
negative numbers 22, 45, 117, 126
negative value 94, 126
newdata.dat 12, 126
NODATE 17, 126
NODETAILS 13, 126
Nonstandard numeric data values 22, 45, 126
NONUMBER 17, 126
NOTSORTED 58, 126
NULL statement 31, 126
number 11-12, 14-17, 20, 23, 28, 31, 38, 46, 51, 59-60, 67, 69-70, 88,
93-8, 104-6, 112-14 [11]
numbered heading 77, 126
numbered range 96, 126
numeric 16, 22-3, 70, 81, 87-8, 92, 96, 126
numeric data values 86, 126
numeric value 87, 126
Copyright The Art of Service
Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 126
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
numeric values 16, 55, 79, 86-8, 92, 97, 106-7, 112, 117, 126
numeric values 21, 79, 126
numeric values Representation 16, 126
numeric variables 16, 56, 58, 65, 67, 69-70, 72, 81, 88, 92, 96, 103,
126

O
obs 18, 127
OBS 18, 23, 31, 104, 113, 127
OBS column 56, 58, 127
Observation numbers 35, 55, 127
observations 3, 15, 17-18, 26, 28-9, 34-43, 52-3, 55-6, 66-7, 69-70,
73, 83, 94-5, 104-5, 107-8, 113-17 [6]
changing 98
combined 40
first 18, 29, 36, 38, 81
group 34, 105
group of 105, 114
last 18, 34, 36
multiple 3, 54, 106, 115
multiple SAS 54
observations change 95
reading 35, 129
sample 95
selecting 56, 107
single 37, 52, 105
observations 98, 127
rearranged 57
observations change 127
observations number 15, 127
ODS (Output Delivery System) 75-7, 127
ODS column format 74, 127
ODS destinations 75, 127
ODS statement 76, 127
One-dimensional Arrays 96, 127
One-sided confidence limit 69, 127
One-to-One Reading 37-8, 113, 127
online 2
operators 24-5, 56, 87, 127
comparison 25, 120
logical 25, 56, 82
OPTIONS statement 17, 127
OS 10-12, 127
output 3, 13-14, 17, 35, 40, 50, 56, 60-1, 65, 68-9, 71-2, 74-7, 113-14,
127
formatted 75-6
Copyright The Art of Service
127 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
printed 57, 107
output data 72, 127
output data set 41, 57, 83, 127
Output Delivery System, see ODS
output file 27, 50, 127
output objects 75, 127
output objects Output 75, 127
output SAS data set 71, 127
output-SAS-data-set names 37-40, 42-3, 128
OUTPUT statement 35, 94, 108, 116, 128
OUTPUT statement Format 35, 71, 128
output Links 77, 128

P
PAD option 47, 128
PAGEBY statement 59, 128
parentheses 18, 21, 25, 43, 82, 90, 94, 96-7, 101, 111, 128
percentile 70, 128
performance 104, 113, 128
physical names 11, 128
pointer 46, 53, 102, 112, 128
position 19, 45-6, 52, 90-1, 128
relative 37-8
predetermined number 47, 128
printing 17, 128
priorities 25, 100, 111, 128
PROC 8, 19-20, 23, 55, 59-60, 62-3, 68-72, 99, 106, 113-14, 128
PROC CONTENTS statement Format 13, 128
PROC DATASETS 13, 128
PROC DATASETS statement 13, 128
PROC EXPORT statement Format 50, 128
PROC FORMAT 78, 128
PROC FORMAT statement 78, 81, 128
PROC FREQ 72-4, 128
PROC REPORT 65-7, 128
PROC REPORT statement 63, 128
PROC REPORT statement Format 63, 128
PROC SORT statement 57, 128
PROC SORT statement Format 57, 128
PROC statement 9, 99, 121, 128
PROC step 98, 128
PROC steps 8, 16, 59, 98, 121, 128
PROC SUMMARY 72, 128
procedure output 120, 123
processing 9, 14, 26-8, 38, 71, 83-4, 99, 116, 128
Processing DATA Steps 3, 27, 128
Copyright The Art of Service
Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 128
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
processing errors 116, 128
program data vector 28-30, 36-7, 41, 50, 81, 93, 96, 110, 115, 117,
128
program data vector 36, 48, 128
program statement 128
final 20
Prometric 7, 128
publisher 5, 128
PUT 27, 31-2, 88, 128
PUT statement 27, 31, 88, 114, 128
PUT statement Format 27, 50, 129

Q
QTR value 88, 129
qualified name 20, 129
QUIT 13-14, 113, 129
QUIT statement 9, 13, 99, 103, 116, 129

R
random numbers 86, 129
range 18, 22, 49, 67, 69, 79, 81, 96, 106, 115, 129
raw data 20-3, 26-7, 29, 36, 44-6, 51-2, 78, 105, 114-15, 117, 129
raw data file 19, 23, 26, 29, 50, 105, 110, 114, 117, 129
character-delimited 50
Raw data files 19, 129
Raw Data Files 3, 19, 129
Raw data files, software programs use 19
Reading Data and Time Values 3, 51, 129
Reading Free Format Data 3, 47, 129
Reading Missing Values 49, 129
Rearrange observations 107, 129
Reasoning 111-17, 129
record formats 46, 129
common 46
fixed-length 47
records 19, 23, 27-8, 46-7, 49, 52-4, 105, 114, 129
fixed-length 46-7
free format 47
free-format 47
multiple 3, 52-3, 105
single fixed field 106
variable-length 47
reference 10, 19-20, 78, 96, 114, 117
RENAME 42, 129
rename variables 42, 129
replacement 1, 92, 129
Copyright The Art of Service
129 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
report 17, 55-7, 59, 63-6, 68, 72, 107, 109, 112, 115, 129
review 2
round-off-unit 92, 129
Rounded 61, 129
rows 15, 63, 66, 74, 98, 101, 129
order of 64, 66

S
Same-named variables 41-2, 130
sample 11, 127, 130
SAS 1, 6, 9-11, 19-20, 28, 30, 36, 41, 44-5, 49, 51-2, 80-2, 84, 93-4,
96-8, 102-3 [7]
SAS arrays 95, 108, 116, 130
SAS Certified Base Programmer for SAS 1, 3, 6-7, 130
SAS data 11, 51, 88, 130
SAS-data-library 11-12, 20, 55, 130
SAS data set 8, 13-16, 19, 21, 26-7, 33, 36-7, 55-7, 63, 68, 71-3, 98,
107, 130
SAS-data-set 33, 35, 50, 55, 57, 59, 63, 68, 71-3, 130
SAS-data-set names 50, 130
SAS-data-set-out 71, 130
SAS data set 8, 19, 57, 130
SAS Data Sets 3, 14, 19, 33, 37, 130
SAS data step 23, 130
SAS datetime value 51, 88, 130
SAS expressions 24, 56, 84, 93, 130
valid 24, 26, 81-2, 94
SAS files 10, 14, 130
SAS formats 17, 61, 130
SAS functions 85-6, 100, 111, 130
SAS library 13-14, 130
SAS listing output 60, 130
SAS log 9, 99, 130
SAS name 23, 130
SAS output of library contents 13, 130
SAS procedure output Markup 130
traditional 75
SAS procedure steps 83, 130
SAS programming 1, 6, 130
SAS programs 8-9, 26, 53, 75, 98-100, 110, 112, 130
SAS Programs 3, 8, 130
SAS session 11, 21, 60, 78, 108, 130
SAS statement 84, 130
executable 82, 84
SAS stores 51, 88, 130
SAS System Outputs 3, 17, 130
Copyright The Art of Service
Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 130
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
SAS uses Output Delivery System 75, 130
scan 48, 89-90, 130
SCAN function 89-90, 130
search 80, 91, 130
character variable's values 89
Searches 89, 130
seconds 51-2, 88, 112, 130
semicolons 8, 26, 100-1, 109, 111, 130
omitted 99
separators 89-91, 131
sequences 24-5, 131
session 10-11, 131
SET SAS-data-set 33, 35, 131
SET SAS-data-set-1 37-9, 131
SET statement 33-8, 95, 101, 114, 131
SET statement Format 26, 38-9, 131
Set.name 108, 131
sign 46, 54, 81, 131
Single Records 3, 52, 54, 115, 131
source 18, 87-8, 91-2, 131
spaces 9, 61, 65, 86, 96, 101, 131
decimal 61-2
spacing 64-5, 131
extra 60
specifier 54, 109, 116, 131
squares 67, 69, 131-2
ss 52, 131
Standard error 67, 69, 131
Standard numeric data values 22, 45, 110, 117, 131
statement 20, 26-8, 34, 42-3, 55-6, 58-60, 63-6, 71, 77, 81-5, 94, 97-
101, 103-4, 106-9, 111, 114-15 [3]
assignment 24, 82
global 17
infile 23
input 29
multiple 111
programming 103
statement Copyright 108, 131
statement executes 83, 131
statement Format 26, 35, 55, 58, 64, 68, 76-7, 82-4, 93-4, 131
statistic-keyword 68, 71-2, 131
statistics 66, 68, 70-2, 74, 105, 114, 131
summary 71-2
STD 67, 69, 72, 131
STDERR 67, 69, 131
step 8-9, 11, 19, 23, 30, 62, 99, 103, 106, 109, 126, 131
Copyright The Art of Service
131 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
stop 35, 93, 95, 131
STOP statement 35, 131
Stored Value 61-2, 131
string 31, 89-92, 131
subsetting 26, 42-3, 131
SUBSTR 89-90, 103, 131
SUBSTR function 90, 131
substring 56, 89, 91, 131
subtotals 58, 63, 132
Sum 67, 69, 132
SUM 58, 66-7, 69, 132
sum statement 81, 132
SUM statement 58, 103, 112, 132
SUM variable, changing report characteristics 55
Summarized output data set 72, 132
SUMWGT 67, 69, 132
system options 17-18, 65, 132

T
table 15, 73-4, 101, 105, 132
n-way 73-4
two-way 73-4
table of contents 77, 132
Table of Contents 3, 132
TABLES statement 74, 101, 132
TABLES statement Format 73, 132
TABLES variable-1 73-4, 132
Target variables 86, 90, 132
Temporary array elements 97, 132
Temporary files 10, 132
temporary variables 34, 36, 132
text 59-60, 132
time informats 51, 121, 132
time values 3, 51, 88, 129, 132
time values fractions 22, 132
time values fractions xf0b7 124
time number 15, 132
title 59-60, 132
higher-numbered 60
TITLE statement 59, 132
tracking pointers 41, 132
trade name 5, 132
trademarks 5, 132
trailing blanks 89-90, 113, 132
TRIM 89-90, 103, 113, 132

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 132
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
U
Uncorrected sum of squares 67, 69, 132
unmatched observations 42-3, 122, 132
uppercase 9, 14-15, 23, 89, 132
URL 77, 132
usage 64, 68, 132
user-defined formats 61, 64, 78-81, 132
User-Defined Formatting 3, 78, 132
USS 67, 69, 133

V
Valid data set names 14, 133
valid SAS variable name 22, 133
VALUE format name 79, 133
VALUE format-name 79, 133
VALUE statement 79-80, 106, 133
VALUE statement Copyright 106, 133
value 126, 133
smallest 57
value number 133
longest 81
value Numeric variables 16
values 15-16, 23-4, 27-9, 31, 33-8, 40-1, 46-9, 53, 56-7, 61-2, 65-8,
70-1, 73-4, 81-2, 91-4, 113 [15]
values 24, 86, 133
character variable's 130
VAR 67, 69, 133
VAR statement 56, 58, 70, 72, 133
variable-2 73-4, 133
Variable conditions 33-4, 133
variable informat 45, 51, 133
variable-length record format 47
variable names 133
invalid 28
variable values 64, 133
variablen 43, 133
variablen names 43, 133
variable's values 79, 133
variables number 15, 133
Variance 67, 69, 133

W
Weight variable values 67, 69, 133
when-expression-1 84, 133
width 17, 64-5, 133
maximum 16-17
Copyright The Art of Service
133 Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055
windowing mode 63, 133
Work.Formats 78, 133
Work‟ 10-11, 133
www.emereo.org 2

Copyright The Art of Service


Brisbane, Australia │ Email: service@theartofservice.com │Web: http://theartofservice.com 134
│eLearning: http://theartofservice.org
Phone: +61 (0)7 3252 2055