
Accessing Data

Last Updated : 29 June, 2004

Center of
Excellence
Agenda
Creating datasets using DATA step.

Infile statement

Different input styles.

Combining datasets using DATA step


What Is the SAS System?
 The SAS System is an integrated system of software products
that enables you to perform

 data entry, retrieval, and management

 report writing and graphics

 statistical and mathematical analysis

 business planning, forecasting, and decision support

 operations research and project management

 quality improvement

 applications development.
What Is the SAS System?
In addition, you can integrate many SAS business
solutions with the SAS System to perform
large-scale business functions, such as

 Data Warehousing and Data Mining

 Human Resources Management and Decision Support

 Financial Management and Decision Support


Overview of Base SAS Software
 The core of the SAS System is base SAS software
 SAS language
 a programming language that you use to manage your
data.
 SAS procedures
 software tools for data analysis and reporting.
 Macro facility
 a tool for extending and customizing SAS software
programs and for reducing text in your programs.
 DATA step debugger
 a programming tool that helps you find logic problems in
DATA step programs.
 Output Delivery System (ODS)
 a system that delivers output in a variety of easy-to-
access formats, such as SAS data sets, listing files, or
Hypertext Markup Language (HTML).
Base SAS

data access
management
analysis
presentation
Components of the SAS
Language
 SAS Files
 Files with formats or structures known to SAS.
 All SAS files reside in a SAS data library.

 SAS data set


 is structured in a format that SAS can process.

 SAS catalog
 Many different kinds of information that are used in a SAS job
are stored in SAS catalogs, such as instructions for reading and
printing data values, or function key settings that you use in the
SAS windowing environment.

 SAS stored program


 contains compiled code that you create and save for repeated
use.
Components of the SAS
Language
SAS Data Sets
SAS data file
 both describes and physically stores data values.
 descriptor portion
– describes the contents of the SAS data set to SAS.
 Data portion
– data that has been collected or calculated.
– An observation is a collection of data values that usually
relate to a single object. A variable is the set of data values
that describe a given characteristic.
 SAS data view
 does not actually store values but creates logical SAS data
sets without using the storage space required by SAS data
files.
Structure of SAS Data Sets
SAS Data Set

Descriptor Portion
General Data Set Information
Name             Number of Obs.
*Label           Number of Variables
*Date/Time Created
Storage Information
Information for Each Variable
Name   Type   Length   Position
*Label   *Format   *Informat

Data Portion
IDNUM  NAME         WAGECAT  WAGERATE
1351   Farr, Sue    S        3392.50
161                 S        5093.75
212    Moore, Ron   S        .
2512   Ruth, G H    S        1572.50
...
5151   Coxe, Susan  S        3163.00
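The two portions of a data set can be inspected separately. A minimal sketch, assuming the sample data set SASHELP.CLASS that ships with SAS is available (it is not part of these slides):

```sas
/* Descriptor portion: data set name, number of observations and   */
/* variables, creation date, and the attributes of each variable.  */
proc contents data=sashelp.class;
run;

/* Data portion: the stored data values, one row per observation. */
proc print data=sashelp.class;
run;
```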
Components of the SAS
Language
 External Files
 Data files that you use to read and write data, but which
are in a structure unknown to SAS.
 External files can be used for storing
 raw data that you want to read into a SAS data file
 SAS program statements
 procedure output

 Database Management System Files


 SAS software is able to read and write data to and from
other vendors' software, such as many common database
management system (DBMS) files.
 In addition to base SAS software, you must license the
SAS/ACCESS software for your DBMS and operating
environment.
Components of the SAS
Language
 SAS Language Elements
 DATA step
 consists of a group of statements in the SAS language that reads
raw data or existing SAS data sets to create a SAS data set.
 PROC step
 A group of procedure statements used to analyze data in SAS data
sets to produce statistics, tables, reports, charts, and plots, to
create SQL queries, and to perform other analyses and operations
on your data. They also provide ways to manage and print SAS
files.
 SAS Macro Facility
 a powerful programming tool for extending and customizing your
SAS programs, and for reducing the amount of code that you must
enter to do common tasks. Macros are SAS files that contain
compiled macro program statements and stored text.
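A short sketch of how DATA and PROC steps fit together in one program. The data set name work.wages and the variable names are illustrative assumptions; the values are borrowed from the sample data set shown earlier.

```sas
/* DATA step: read instream data lines and create a SAS data set. */
data work.wages;
   input IDNum WageCat $ WageRate;
   datalines;
1351 S 3392.50
161 S 5093.75
2512 S 1572.50
5151 S 3163.00
;
run;

/* PROC steps: print the data set, then compute simple statistics. */
proc print data=work.wages;
run;

proc means data=work.wages mean min max;
   var WageRate;
run;
```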
What Can the DATA Step Do?
You can use the DATA step in the
following ways to transform your
information:

 Read from a raw data file into the SAS system.


[Figure: Raw Data File -> DATA Step -> SAS Data Set]
What Can the DATA Step Do?
 Create multiple SAS data sets in one DATA
step.

[Figure: one DATA Step writing several SAS data sets]
What Can the DATA Step Do?
 Combine existing data sets.

[Figure: SAS Data Set 1 and SAS Data Set 2 -> DATA Step]
What Can the DATA Step Do?
You can also add or augment information
in a variety of ways.

 Create accumulating totals.

SaleDate   SaleAmt   Mth2Dte
01APR2001   498.49    498.49
02APR2001   946.50   1444.99
03APR2001   994.97   2439.96
04APR2001   564.59   3004.55
05APR2001   783.01   3787.56
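One way to build such a month-to-date total is the sum statement. This is a sketch only: the data set name work.sales_mtd is hypothetical, and the slides do not say which technique was used.

```sas
data work.sales_mtd;
   input SaleDate : date9. SaleAmt;
   /* Sum statement: Mth2Dte starts at 0 and is retained across */
   /* iterations, so each row adds to the running total.         */
   Mth2Dte + SaleAmt;
   format SaleDate date9.;
   datalines;
01APR2001 498.49
02APR2001 946.50
03APR2001 994.97
04APR2001 564.59
05APR2001 783.01
;
run;

proc print data=work.sales_mtd;
run;
```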
What Can the DATA Step Do?
 Manipulate numeric values.

BirthDay: 4253  ->  SAS Function  ->  Age: 30
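A sketch of one such calculation. The reference date AsOf is a hypothetical value chosen for illustration, and dividing by 365.25 gives only an approximate age in whole years.

```sas
data _null_;
   BirthDay = 4253;                /* SAS date value from the slide    */
   AsOf     = '01jan2002'd;        /* hypothetical reference date      */
   /* Days between the two dates, converted to whole years. */
   Age      = int((AsOf - BirthDay) / 365.25);
   put BirthDay= date9. AsOf= date9. Age=;
run;
```

With this reference date the step writes Age=30 to the log.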
What Can the DATA Step Do?
 Summarize data sets.

Input data set:
Salary  Div
42000   HUMRES
34000   FINACE
27000   FLTOPS
20000   FINACE
19000   FINACE
19000   FLTOPS

After the DATA Step:
Div     DivSal
FINACE  73000
FLTOPS  46000
HUMRES  42000
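A sketch of summarizing by group in a DATA step with BY-group processing. The data set names are hypothetical, and the slides do not show the program behind the figure.

```sas
data work.salaries;
   input Salary Div $;
   datalines;
42000 HUMRES
34000 FINACE
27000 FLTOPS
20000 FINACE
19000 FINACE
19000 FLTOPS
;
run;

/* BY-group processing requires the data to be sorted by Div. */
proc sort data=work.salaries;
   by Div;
run;

data work.divsalaries(keep=Div DivSal);
   set work.salaries;
   by Div;
   if first.Div then DivSal = 0;   /* restart the total for each division */
   DivSal + Salary;                /* sum statement retains DivSal        */
   if last.Div;                    /* write one observation per division  */
run;
```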
What Can the DATA Step Do?

 Present your data.

 Manage files.

And much more.


Words in the SAS Language

 A word or token in the SAS language is a collection of
characters that communicates a meaning to SAS.
 A word or token ends when SAS encounters one of the
following:
 the beginning of a new token
 a blank after a name or a number token
 the ending quotation mark of a literal token.
 Each word or token in the SAS language is classified into one of
four categories.
 Names - a series of characters that begins with a letter or an
underscore. Ex.: data, _old, yearcutoff, _n_, year_04, descending
 Literal - consists of 1 to 32,767 characters enclosed in single or
double quotation marks. Ex.: 'Bangalore', "2003-04", 'Wipro''s Plan',
"Report for the Third Quarter"
Words in the SAS Language
 Number - in general is composed entirely of
numeric digits, with an optional decimal point and
a leading plus or minus sign. SAS also recognizes
numeric values in the following forms as number
tokens: scientific (E-) notation, hexadecimal
notation, missing value symbols, and date and
time literals.
 Ex: 1234, -2004, 1.25, 5.4E-1, '30jun04'd
 Special character - is usually any single keyboard
character other than letters, numbers, the
underscore, and the blank. In general, each special
character is a single token, although some
two-character operators, such as ** and <=, form
single tokens.
 Ex: =, :, @, ', +, /
Names in the SAS Language
 A SAS name is a name token that represents
 variables
 SAS data sets
 formats or informats
 SAS procedures
 options
 arrays
 statement labels
 SAS macros or macro variables
 SAS catalog entries
 librefs or filerefs.
 There are two kinds of names in SAS.
 names of elements of the SAS language
 names supplied by SAS users.
Names in the SAS Language
 Rules for User-Supplied SAS Names: maximum length in characters
 Members of SAS data libraries (SAS data sets, views, catalogs, indexes) - 32
 Generation data sets - 28
 Catalog entries - 32
 Engines, Librefs, Filerefs, Passwords - 8
 DATA step variables - 32
 DATA step variable labels - 256
 DATA step statement labels - 32
 Arrays - 32
 Functions - 16
 Formats - 8
 Informats - 7
 Macros, Macro variables - 32
Names in the SAS Language
 Rules for User-Supplied SAS Names
 The first character must be a letter (A, B, C, . . ., Z) or underscore (_). Subsequent
characters can be letters, numeric digits (0, 1, . . ., 9), or underscores.
 You can use upper or lowercase letters. SAS processes names as uppercase
regardless of how you type them.
 Blanks cannot appear in SAS names.
 SAS reserves a few names for automatic variables and variable lists, SAS data
sets, and librefs.
 When creating variables, do not use the names of special SAS automatic variables (for
example, _N_ and _ERROR_) or special variable list names (for example, _CHARACTER_,
_NUMERIC_, and _ALL_).
 When associating a libref with a SAS data library, do not use SASHELP, SASMSG,
SASUSER, WORK .
 When you create SAS data sets, do not use _NULL_, _DATA_, _LAST_.
Names in the SAS Language
Rules for User-Supplied SAS Names
 Special characters, except for the underscore,
are not allowed. In filerefs only, you can use
the dollar sign ($), pound sign (#), and at sign
(@).
 When assigning a fileref to an external file, do
not use SASCAT.
 When you create a macro variable, do not use
names that begin with SYS.
SAS Dates
SAS dates are special numeric values
representing the number of days between
January 1, 1960 and a specified date.
Calendar date     SAS date value    Displayed with
(DATE9. informat)                   MMDDYY10. format
1jan1959            -365            01/01/1959
1jan1960               0            01/01/1960
1jan1961             366            01/01/1961
1jan2000           14610            01/01/2000
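The relationship between date literals, informats, formats, and the stored number can be checked in the log. A minimal sketch; the variable names d and d2 are arbitrary.

```sas
data _null_;
   d = '01jan2000'd;                    /* date literal -> SAS date value    */
   put d=;                              /* writes d=14610                    */
   put d= mmddyy10.;                    /* writes d=01/01/2000               */
   d2 = input('01jan1961', date9.);     /* DATE9. informat converts the text */
   put d2=;                             /* writes d2=366                     */
run;
```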
Standard Data

The term standard data refers to character and
numeric data that SAS recognizes automatically.

Some examples of standard numeric data include
 35469.93
 3E5 (exponential notation)
 -46859

Standard character data is any character you can
type on your keyboard. Standard character
values are always left-justified by SAS.
Nonstandard Data
The term nonstandard data refers to
character and numeric data that SAS does
not recognize automatically.

Examples of nonstandard numeric data include
 12/12/2012
 29FEB2000
 4,242
 $89,000
Create a SAS Data Set from a Raw
Data File

A raw data file contains employee information for
the level 1 flight attendants. Use the raw data file to
create the work.fltat1 SAS data set.

E1232 15OCT1999 61065
E2341 01JUN1997 91688
E3452 26OCT1993 32639
E6781 16SEP1992 28305
E8321 26NOV1996 40440
E1052 27FEB1997 39461
E1062 10MAY1987 41463
E8172 06JAN2000 40650
E1091 20AUG1991 40950
Desired Output

Obs  EmpID  HireDate  Salary  Bonus
1    E1232  14532     61065   3053.25
2    E2341  13666     91688   4584.40
3    E3452  12352     32639   1631.95
4    E6781  11947     28305   1415.25
5    E8321  13479     40440   2022.00
6    E1052  13572     39461   1973.05
7    E1062   9991     41463   2073.15
8    E8172  14615     40650   2032.50
9    E1091  11554     40950   2047.50
The DATA Statement
A DATA step always begins with a DATA
statement.

General form of a DATA statement:

DATA SAS-data-set;

The DATA statement starts the DATA step
and names the SAS data set being created.
The INFILE Statement
If you are reading data from a raw data file,
you need an INFILE statement.

General form of an INFILE statement:

INFILE 'raw-data-file' <options>;

The INFILE statement points to the raw
data file being read. Options in the INFILE
statement affect how SAS reads the raw
data file.
The INPUT Statement
When you are reading from a raw data file, the
INPUT statement follows the INFILE statement.

General form of an INPUT statement:

INPUT variable-specification …;

The INPUT statement describes the raw data
fields and specifies how you want them
converted into SAS variables.
Formatted Input
The input style tells SAS where to find the
fields and how to read them into SAS.

INPUT @n variable-name informat. ...;

@n - moves the pointer to the starting
point of the field.
variable-name - names the SAS variable
being created.
informat - specifies how many positions to
read and how to convert the raw data into
a SAS value.
The INPUT Statement
Common SAS informats:

$w. - reads a standard character field, where w
specifies the width of the field in bytes.

w.d - reads a standard numeric field, where w
specifies the width of the field in bytes and the
optional d specifies the number of implied decimal places.

DATE9. - reads dates in the form 31DEC2012.


The Assignment Statement
To create a new variable in the DATA step,
use an assignment statement:

variable-name=expression;

The assignment statement creates a SAS
variable and specifies how to calculate that
variable's value.
Create a SAS Data Set from a
Raw Data File

data work.fltat1;
infile 'raw-data-file';
input @1 EmpID $5.
@7 HireDate date9.
@17 Salary 5.;
Bonus=.05*Salary;
run;
Create a SAS Data Set from a
Raw Data File
Partial Log
NOTE: 9 records were read from the infile
'fltat1.dat'.
The minimum record length was 21.
The maximum record length was 21.
NOTE: The data set WORK.FLTAT1 has
9 observations and 4 variables.
Overview of DATA Step
Processing
Processing the DATA Step
 The SAS System processes the DATA step in two
phases:
 compilation
 execution.
 When you submit a DATA step for execution, SAS
checks the syntax of the SAS statements and
compiles them. During the compile phase, SAS
creates the following three items
 input buffer
 is a logical area in memory into which SAS reads each
record of raw data when SAS executes an INPUT statement.
 program data vector (PDV)
 is a logical area in memory where SAS builds a data set, one
observation at a time. When a program executes, SAS reads
data values from the input buffer or creates them by
executing SAS language statements.
DATA Step Processing
 The data values are assigned to the appropriate
variables in the program data vector. From here, SAS
writes the values to a SAS data set as a single
observation. Along with data set variables and computed
variables, the PDV contains two automatic variables, _N_
and _ERROR_. The _N_ variable counts the number of
times the DATA step begins to iterate. The _ERROR_
variable signals the occurrence of an error caused by
the data during execution.

 descriptor information
 is information that SAS creates and maintains about
each SAS data set, including data set attributes and
variable attributes. It contains, for example, the name
of the data set and its member type, the date and time
that the data set was created, and the number, names
and data types (character or numeric) of the variables.
DATA Step Processing
 The flow of action in the Execution Phase of a simple
DATA step
 The DATA step begins with a DATA statement. Each time the
DATA statement executes, a new iteration of the DATA step
begins, and the _N_ automatic variable is incremented by 1.

 SAS sets the newly created program variables to missing in
the program data vector (PDV).

 SAS reads a data record from a raw data file into the input
buffer, or it reads an observation from a SAS data set directly
into the program data vector. You can use an INPUT, MERGE,
SET, MODIFY, or UPDATE statement to read a record.

 SAS executes any subsequent programming statements for
the current record.
DATA Step Processing
 At the end of the statements, an output, return, and reset occur
automatically. SAS writes an observation to the SAS data set, the
system automatically returns to the top of the DATA step, and
the values of variables created by INPUT and assignment
statements are reset to missing in the program data vector. Note
that variables that you read with a SET, MERGE, MODIFY, or
UPDATE statement are not reset to missing here.
 SAS counts another iteration, reads the next record or
observation, and executes the subsequent programming
statements for the current observation.
 The DATA step terminates when SAS encounters the end-of-file in
a SAS data set or a raw data file.
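The iteration counter and error flag described above can be watched in the log. A minimal sketch with made-up data lines; the second line deliberately contains invalid numeric data.

```sas
data work.demo;
   input x y;
   z = x + y;
   /* Write the automatic variables and the PDV values for each iteration. */
   put _n_= _error_= x= y= z=;
   datalines;
1 2
3 abc
5 6
;
run;
```

On the second iteration SAS cannot read abc as a number, so y and z are missing and _ERROR_ is set to 1 for that iteration; _N_ still advances by 1 each time.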
DATA step
 Reading External File Data

 data bonus_04; [1]
 infile 'your-input-file'; [2]
 input IDnumber name $ salary ; [3]
 bonus=salary * 0.25; [4]
 run; [5]

 1- Begin the DATA step and create a SAS data set
called bonus_04.
 2- Specify the external file that contains your data.
 3- Read a record and assign values to three
variables.
 4- Calculate a value for variable bonus.
 5- Execute the DATA step.
DATA step Input Styles
 The INPUT statement reads raw data from instream
data lines or external files into a SAS data set. The input
style that you choose depends on the layout of the data
values in the records.

 INPUT, Column - Reads input values from specified columns
and assigns them to the corresponding SAS variables

 INPUT, Formatted - Reads input values with specified informats and
assigns them to the corresponding SAS variables

 INPUT, List - Scans the input data record for input values and
assigns them to the corresponding SAS variables

 INPUT, Named - Reads data values that appear after a variable
name that is followed by an equal sign and assigns them to
the corresponding SAS variables
DATA step
An informat is an instruction that SAS uses to read
data values into a variable.
 The INPUT statement with an informat after a variable name
is the simplest way to read values into a variable.
 $w. - Reads standard character data
 DATEw. - Reads date values in the form ddmmmyy or
ddmmmyyyy
 MMDDYYw. - Reads date values in the form mmddyy or
mmddyyyy
 w.d - Reads standard numeric data
 COMMAw.d - Removes embedded characters
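The INPUT function applies an informat to a text value, which makes it easy to try informats without a raw data file. A sketch using the nonstandard values shown earlier; the variable names are arbitrary.

```sas
data _null_;
   amount = input('$89,000', comma7.);       /* strips $ and comma -> 89000 */
   hired  = input('29FEB2000', date9.);      /* SAS date value              */
   when   = input('12/12/2012', mmddyy10.);  /* SAS date value              */
   put amount= hired= date9. when= mmddyy10.;
run;
```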
Accessing Data – List input
 List input uses a scanning method for locating data
values. Data values must be separated by at
least one blank (or other defined delimiter). List
input requires only that you specify the variable
names and a dollar sign ($), if defining a character
variable.

libname new '/wipro/dw/data';
data new.scores;
   length name $ 12;
   input name $ score1 score2;
   datalines;
aaaaaa 1132 1187
bbbbbbbb 1015 1102
cccc 246 357
;
run;
Modified List Input
data scores;
infile datalines dsd;
input Name : $9. Score1-Score3 Team ~ $25. Div $;
datalines;
Smith,12,22,46,"Green Hornets, Atlanta",AAA
Mitchel,23,19,25,"High Volts, Portland",AAA
Jones,09,17,54,"Vulcans, Las Vegas",AA
;
run;
Output
Name     Score1  Score2  Score3  Team                      Div
Smith    12      22      46      "Green Hornets, Atlanta"  AAA
Mitchel  23      19      25      "High Volts, Portland"    AAA
Jones    9       17      54      "Vulcans, Las Vegas"      AA
Modified List Input
 The : (colon) format modifier enables you to use list input
but also to specify an informat after a variable name,
whether character or numeric. SAS reads until it
encounters a blank column.
 The ~ (tilde) format modifier enables you to read and
retain single quotation marks, double quotation marks, and
delimiters within character values.
 If you want SAS to read consecutive delimiters as though
there is a missing value between them, specify the DSD
option in the INFILE statement.
 To read and store a character input value longer than 8
bytes, define the variable's length by using a LENGTH,
INFORMAT, or ATTRIB statement, or use the : modifier
with an informat.
 Character values cannot contain embedded blanks when
the file is delimited by blanks.
 Fields must be read in order.
 Data must be in standard numeric or character format.
Data Accessing - Column Input
 Column input enables you to read standard data values that
are aligned in columns in the data records. Specify the
variable name, followed by a dollar sign ($) if it is a
character variable, and specify the columns in which the
data values are located in each record:

data scores;
   infile datalines truncover;
   /* name in columns 1-12, score1 in columns 17-20, score2 in columns 27-30 */
   input name $ 1-12 score1 17-20 score2 27-30;
   datalines;
Riley           1132       987
Henderson       1015      1102
;
run;
Data Accessing - Column Input
 To use column input, data values must be
 in the same field on all the input lines
 in standard numeric or character form.
 Features of column input include the following
 Character values can contain embedded blanks.
 Character values can be from 1 to 32,767 characters long.
 Placeholders, such as a single period (.), are not required for
missing data.
 Input values can be read in any order, regardless of their position in
the record.
 Values or parts of values can be reread.
 Both leading and trailing blanks within the field are ignored.
 Values do not need to be separated by blanks or other delimiters.
 Use the TRUNCOVER option on the INFILE statement to ensure that
SAS handles data values of varying lengths appropriately.  
Data Accessing - Formatted
Input
 Formatted input combines the flexibility of using
informats with many of the features of column
input. By using formatted input, you can read
nonstandard data for which SAS requires
additional instructions. Formatted input is
typically used with pointer controls that enable
you to control the position of the input pointer in
the input buffer when you read data.

data scores;
   /* name in columns 1-12, score1 in columns 17-21, score2 in columns 28-32 */
   input name $12. +4 score1 comma5. +6 score2 comma5.;
   datalines;
Riley           1,132      1,187
Henderson       1,015      1,102
;
run;
Data Accessing - Formatted
Input
Important points about formatted input are
 Character values can contain embedded blanks.
 Character values can be from 1 to 32,767
characters long.
 Placeholders, such as a single period (.), are not
required for missing data.
 With the use of pointer controls to position the
pointer, input values can be read in any order,
regardless of their positions in the record.
 Values or parts of values can be reread.
 Formatted input enables you to read data stored
in nonstandard form, such as packed decimal or
numbers with commas.
Data Accessing - Named Input
 You can use named input to read records in
which data values are preceded by the name of
the variable and an equal sign (=). The following
INPUT statement reads the data lines containing
equal signs.

data games;
input name=$ score1= score2=;
datalines;
name=abc score1=1132 score2=1187
;
run;
The MISSOVER Option
 The MISSOVER option prevents SAS from loading
a new record when the end of the current record
is reached.

 General form of the INFILE statement with the
MISSOVER option:

INFILE 'raw-data-file' MISSOVER;

 If SAS reaches the end of the row without finding
values for all fields, variables without values are
set to missing.
Using the MISSOVER Option

data airplanes3;
length ID $ 5;
infile 'raw-data-file' dlm=',' missover;
input ID $
InService : date9.
PassCap CargoCap;
run;

Raw Data File


50001 ,25feb1989,132, 530
50002, 11nov1989,152
50003, 22oct1991,168, 530
50004, 4feb1993,172
50005, 24jun1993, 170, 510
50006, 20dec1994, 180, 520
Using the MISSOVER Option

Partial SAS Log

NOTE: 6 records were read from the infile


'aircraft3.dat'.
The minimum record length was 19.
The maximum record length was 26.
NOTE: The data set WORK.AIRPLANES3 has 6 observations
and 4 variables.
NOTE: DATA statement used:
real time 0.42 seconds
cpu time 0.07 seconds
Missing Values without
Placeholders
There is missing data represented by two
consecutive delimiters.

50001 ,25feb1989,, 540


50002, 11nov1989,132, 530
50003, 22oct1991,168, 530
50004, 4feb1993,172, 550
50005, 24jun1993,, 510
50006, 20dec1994, 180,520
Missing Values without
Placeholders
By default, SAS treats two consecutive
delimiters as one. Missing data should be
represented by a placeholder.

50001 ,25feb1989, . , 530
The DSD Option
 General form of the DSD option in the INFILE
statement:

INFILE 'file-name' DSD;
Missing Values without
Placeholders
The DSD option
 sets the default delimiter to a comma
 treats consecutive delimiters as missing values
 enables SAS to read values with embedded
delimiters if the value is surrounded by double quotes.
Using the DSD Option
data airplanes4;
length ID $ 5;
infile 'raw-data-file' dsd;
input ID $
InService : date9.
PassCap CargoCap;
run;

50001 ,25feb1989,, 540


50002, 11nov1989,132, 530
50003, 22oct1991,168, 530
50004, 4feb1993,172, 550
50005, 24jun1993,, 510
50006, 20dec1994, 180,520
INFILE Statement Options
Problem                                        Option
Non-blank delimiters                           DLM='delimiter(s)'
Missing data at end of row                     MISSOVER
Missing data represented by consecutive        DSD
delimiters, or embedded delimiters where
values are surrounded by double quotes

These options can be used separately or
together in the INFILE statement.
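A sketch of the three options used together on instream data. The data set name work.parts, the variable names, and the data lines are all made up for illustration.

```sas
data work.parts;
   /* DLM sets the delimiter, DSD handles consecutive and quoted   */
   /* delimiters, MISSOVER sets trailing fields to missing instead */
   /* of reading the next record.                                  */
   infile datalines dlm='|' dsd missover;
   input PartID $ Descr : $20. Qty Price;
   datalines;
A100|"Bolt | hex"|25|0.15
A200||10
A300|Washer
;
run;

proc print data=work.parts;
run;
```

The first line keeps the embedded delimiter inside Descr, the second yields a missing Descr and a missing Price, and the third yields missing Qty and Price.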
Multiple Records Per
Observation

Farr, Sue A raw data file has


three records per
Anaheim, CA employee. Record 1
869-7008 contains the first and
Anderson, Kay B. last names, record 2
Chicago, IL contains the city and
483-3321 state of residence,
and record 3 contains
Tennenbaum, Mary Ann
the employee’s phone
Jefferson, MO number.
589-9030
Desired Output
The SAS data set should have one
observation per employee.
LName       FName     City       State  Phone
Farr        Sue       Anaheim    CA     869-7008
Anderson    Kay B.    Chicago    IL     483-3321
Tennenbaum  Mary Ann  Jefferson  MO     589-9030
Multiple INPUT Statements

data address;
   length LName FName $ 20
          City $ 25 State $ 2
          Phone $ 8;
   infile 'raw-data-file' dlm=',';
   input LName $ FName $;   /* load record 1 */
   input City $ State $;    /* load record 2 */
   input Phone $;           /* load record 3 */
run;
Line Pointer Controls
You can also use line pointer controls to
control when SAS loads a new record.

DATA SAS-data-set;
   INPUT var-1 var-2 var-3 / var-4 var-5;
   additional SAS statements

SAS loads the next record when it
encounters a forward slash.
Reading Multiple Records Per
Observation

data address;
   length LName FName $ 20
          City $ 25 State $ 2
          Phone $ 8;
   infile 'raw-data-file' dlm=',';
   input LName $ FName $ /   /* load record 1; the slash loads record 2 */
         City $ State $ /    /* the slash loads record 3 */
         Phone $;
run;
Reading Multiple Records Per
Observation
Partial Log

NOTE: 9 records were read from


the infile 'addresses.dat'.
The minimum record length was 8.
The maximum record length was 20.
NOTE: The data set WORK.ADDRESS has
3 observations and 5 variables.
Reading Raw Data Files with
Multiple Records Per
Observation
Mixed Record Types

101 USA 1-20-1999 3295.50


3034 EUR 30JAN1999 1876,30
101 USA 1-30-1999 2938.00
128 USA 2-5-1999 2908.74
1345 EUR 6FEB1999 3145,60
109 USA 3-17-1999 2789.10
Reading Raw Data Files with
Multiple Records Per
Observation
Desired Output

SalesID  Location  SaleDate  Amount
101      USA       14264     3295.50
3034     EUR       14274     1876.30
101      USA       14274     2938.00
128      USA       14280     2908.74
1345     EUR       14281     3145.60
109      USA       14320     2789.10
Reading Raw Data Files with
Multiple Records Per
Observation
The Single Trailing @

 The single trailing @ option holds a raw data record in the input
buffer until SAS
 executes an INPUT statement with no trailing @
 reaches the bottom of the DATA step.

 General form of an INPUT statement with the single trailing @:

INPUT var1 var2 var3 … @;
Reading Raw Data Files with
Multiple Records Per
Observation
Processing the Trailing @

input SalesID $ Location $ @;      /* hold record for next INPUT statement */
if Location='USA' then
   input SaleDate : mmddyy10.      /* no trailing @: the record is released */
         Amount;                   /* and the next iteration loads the next record */
else if Location='EUR' then
   input SaleDate : date9.
         Amount : commax8.;
Multiple Observations Per
Record
The raw data file RETIRE contains each
employee's identification number and this
year's contribution to his or her
retirement plan. Each record contains
information for three employees.

E00973 1400 E09872 2003 E73150 2400
E45671 4500 E34805 1980 E47200 4371

Desired Output
EmpID   Contrib
E00973  1400
E09872  2003
E73150  2400
E45671  4500
E34805  1980
E47200  4371
Multiple Observations Per
Record
 Processing: What Is Required?
E00973 1400 E09872 2003 E73150 2400

For each of the three observations in the record (Obs. 1, Obs. 2, Obs. 3):
read the values, process other statements, and output.

Multiple Observations Per
Record
The Double Trailing @
 The double trailing @ holds the raw data
record across iterations of the DATA step until
the line pointer moves past the end of the line.

INPUT var1 var2 var3 … @@;

data work.retire;
   length EmpID $ 6;
   infile 'raw-data-file';
   input EmpID $ Contrib @@;   /* hold until end of record */
run;
Multiple Observations Per
Record
Partial Log
NOTE: 2 records were read from the
infile 'retire.dat'.
The minimum record length was 35.
The maximum record length was 36.
NOTE: SAS went to a new line when INPUT
statement reached past the end of
a line.
NOTE: The data set WORK.RETIRE has
6 observations and 2 variables.
The “SAS went to a new line” message is expected because the
@@ option indicates that SAS should read until the end of each
record.
Multiple Observations Per
Record
Trailing @ Versus Double Trailing @

Option                 Effect
Trailing @             Holds raw data record until
INPUT var-1 ... @;     1) an INPUT statement with no trailing @
                       2) the bottom of the DATA step.
Double trailing @      Holds raw data records in input buffer until
INPUT var-1 ... @@;    SAS reads past end of line.
Reading Hierarchical Raw Data
Files
Processing Hierarchical Files
•Many files are hierarchical in structure, consisting of a
•header record
•one or more related detail records.
Typically, each record contains a field
that identifies whether it is a header
record or a detail record.
[Figure: a file of header records, each followed by its detail records]
Reading Hierarchical Raw Data
Files
Processing Hierarchical Files
You can read a hierarchical file into a SAS data set by creating
one observation per detail record and storing the header information
as part of each observation.

Hierarchical File    SAS Data Set
                     Header Variables  Detail Variables
Header 1
Detail 1             Header 1          Detail 1
Detail 2             Header 1          Detail 2
Detail 3             Header 1          Detail 3
Header 2
Detail 1             Header 2          Detail 1
Header 3
Detail 1             Header 3          Detail 1
Detail 2             Header 3          Detail 2
Reading Hierarchical Raw Data
Files
Creating One Observation Per Detail

 The raw data file DEPENDANTS has a header record
containing the name of the employee and a
detail record for each dependant on the
employee's health insurance.

E:Adams:Susan
D:Michael:C
D:Lindsay:C
E:Porter:David
D:Susan:S
E:Lewis:Dorian D.
D:Richard:C
E:Dansky:Ian
E:Nicholls:James
D:Roberta:C
E:Slaydon:Marla
D:John:S
Reading Hierarchical Raw Data
Files
 Desired Output
 Personnel would like a list of all the dependants and the
name of the associated employee.

EmpLName  EmpFName   DepName  Relation
Adams     Susan      Michael  C
Adams     Susan      Lindsay  C
Porter    David      Susan    S
Lewis     Dorian D.  Richard  C
Nicholls  James      Roberta  C
Slaydon   Marla      John     S
Reading Hierarchical Raw Data
Files
The RETAIN Statement
General form of the RETAIN statement:

RETAIN variable-name <initial-value>;

The RETAIN statement prevents SAS from reinitializing the values of
new variables at the top of the DATA step. This means that values from
previous records are available for processing.
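A small sketch of the effect of RETAIN, using a running maximum. The data set name work.runmax, the variable names, and the data values are illustrative assumptions.

```sas
data work.runmax;
   retain MaxSoFar;                   /* not reset to missing at the top of each iteration */
   input Score @@;                    /* several observations per data line */
   MaxSoFar = max(MaxSoFar, Score);   /* MAX ignores the initial missing value */
   datalines;
12 7 31 18
;
run;

proc print data=work.runmax;
run;
```

Without the RETAIN statement, MaxSoFar would be missing at the start of every iteration and would simply equal Score.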
Reading Hierarchical Raw Data
Files
data dependants(drop=Type);
   length EmpLName EmpFName
          DepName $ 20 Relation $ 1;
   retain EmpLName EmpFName;        /* hold EmpLName and EmpFName */
   infile 'raw-data-file' dlm=':';
   input Type $ @;
   if Type='E' then
      input EmpLName $ EmpFName $;
   else do;
      input DepName $ Relation $;
      output;
   end;
run;
Example – Modified List Input
This example explains
 Compilation Phase
 Execution Phase.
Example – Modified List
Input
Compile

Raw Data File
50001 4feb1989 132 530
50002 11nov1989 152 540
50003 22oct1991 90 530
50004 4feb1993 172 550
50005 24jun1993 170 510
50006 20dec1994 180 520

data airplanes;
   length ID $ 5;
   infile 'raw-data-file';
   input ID $
         InService : date9.
         PassCap CargoCap;
run;

Input Buffer

PDV
ID
$ 5
Compile

Input Buffer

PDV
ID     INSERVICE  PASSCAP  CARGOCAP
$ 5    N 8        N 8      N 8
Execute

Input Buffer

PDV
ID     INSERVICE  PASSCAP  CARGOCAP
$ 5    N 8        N 8      N 8
       .          .        .
Input Buffer
50001 4feb1989 132 530

PDV
ID     INSERVICE  PASSCAP  CARGOCAP
$ 5    N 8        N 8      N 8
       .          .        .
SAS reads the fields from the input buffer into the PDV. The date9. informat converts 4feb1989 to the SAS date value 10627.

Input Buffer
50001 4feb1989 132 530

PDV
ID      INSERVICE   PASSCAP   CARGOCAP
$       N           N         N
5       8           8         8
50001   10627       132       530
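The value 10627 stored in InService is a SAS date value, that is, the number of days from 1 January 1960 to 4 February 1989. A small sketch that writes the same value to the log, first unformatted and then with the DATE9. format (the DATA _NULL_ step is only a convenient way to run the PUT statements):

```sas
data _null_;
   InService = '04feb1989'd;    /* date literal for the first record's date */
   put InService=;              /* unformatted SAS date value */
   put InService= date9.;       /* same value displayed as a date */
run;
```

The log should show InService=10627 followed by InService=04FEB1989.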
At the bottom of the DATA step, SAS performs an implicit output followed by an implicit return.

Input Buffer
50001 4feb1989 132 530

PDV
ID      INSERVICE   PASSCAP   CARGOCAP
$       N           N         N
5       8           8         8
50001   10627       132       530

Implicit output: Write out observation to airplanes.
Implicit return: control returns to the top of the DATA step.
At the top of the second iteration, SAS reinitializes the PDV variables to missing. The input buffer still holds the first record until the INPUT statement executes again.

Input Buffer
50001 4feb1989 132 530

PDV
ID      INSERVICE   PASSCAP   CARGOCAP
$       N           N         N
5       8           8         8
        .           .         .
The INPUT statement loads the second record into the input buffer and reads its fields into the PDV.

Input Buffer
50002 11nov1989 152 540

PDV
ID      INSERVICE   PASSCAP   CARGOCAP
$       N           N         N
5       8           8         8
50002   10907       152       540
At the bottom of the DATA step, SAS again performs an implicit output followed by an implicit return.

Input Buffer
50002 11nov1989 152 540

PDV
ID      INSERVICE   PASSCAP   CARGOCAP
$       N           N         N
5       8           8         8
50002   10907       152       540

Implicit output: Write out observation to airplanes.
Implicit return: control returns to the top of the DATA step.
Continue processing until end of the raw data file.

Input Buffer
50002 11nov1989 152 540

PDV
ID      INSERVICE   PASSCAP   CARGOCAP
$       N           N         N
5       8           8         8
        .           .         .
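The walk-through above can be reproduced in the SAS log. The following is a sketch only: it reads the first two records from inline data instead of 'raw-data-file', and the two PUT statements are added purely for illustration. PUT _ALL_ writes every PDV variable, including the automatic variables _N_ and _ERROR_, to the log.

```sas
data airplanes;
   length ID $ 5;
   put 'Before INPUT: ' _all_;     /* PDV at the top of the iteration */
   input ID $
         InService : date9.
         PassCap CargoCap;
   put 'After INPUT:  ' _all_;     /* PDV after the record has been read */
   datalines;
50001 4feb1989 132 530
50002 11nov1989 152 540
;
run;
```

The log should show the variables as missing before each INPUT and populated afterwards, with _N_ counting the iterations. The "Before INPUT" line appears one extra time, in the iteration where INPUT finds no further record and the step ends.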
Output of Dataset

proc print data=airplanes noobs;
run;

           In      Pass    Cargo
ID      Service     Cap      Cap

50001    10627      132      530
50002    10907      152      540
50003    11617       90      530
50004    12088      172      550
50005    12228      170      510
50006    12772      180      520
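The InService column above shows raw SAS date values. A minimal sketch, assuming you want readable dates in the report: a FORMAT statement in PROC PRINT changes only how the value is displayed, not what is stored. The DATA step here rebuilds a two-record version of airplanes from inline data so that the example runs on its own.

```sas
data airplanes;
   length ID $ 5;
   input ID $ InService : date9. PassCap CargoCap;
   datalines;
50001 4feb1989 132 530
50002 11nov1989 152 540
;
run;

proc print data=airplanes noobs;
   format InService date9.;    /* display 10627 as 04FEB1989 */
run;
```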
Combining SAS Data Sets
Create a data set from two or more existing data sets, either by joining observations side by side or by appending the observations from one data set to another.

Methods to combine SAS data sets


 concatenating
 interleaving
 one-to-one reading
 one-to-one merging
 match merging
 updating.
Combine SAS data sets

 Concatenating
 Concatenating the data sets appends the observations from one data set to another data set. The DATA step reads DATA1 sequentially until all observations have been processed, and then reads DATA2. Data set COMBINED contains the results of the concatenation.
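A minimal sketch of concatenation. The data set names DATA1, DATA2, and COMBINED follow the slide; the variables Year and Amount and all values are hypothetical sample data.

```sas
data data1;
   input Year Amount;
   datalines;
1991 10
1992 20
;
run;

data data2;
   input Year Amount;
   datalines;
1991 30
1993 40
;
run;

data combined;
   set data1 data2;    /* all of DATA1 first, then all of DATA2 */
run;

proc print data=combined noobs;
run;
```

COMBINED holds four observations: the two from DATA1 in their original order, followed by the two from DATA2.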
Combine SAS data sets
 Interleaving
 Interleaving intersperses observations from two or more data sets, based on one or more common variables.
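A minimal sketch of interleaving: a SET statement that names several data sets, followed by a BY statement. The data set names, the variables Year and Amount, and all values are hypothetical. Each input data set must already be sorted by the BY variable.

```sas
data data1;
   input Year Amount;
   datalines;
1991 10
1993 20
;
run;

data data2;
   input Year Amount;
   datalines;
1992 30
1994 40
;
run;

data combined;
   set data1 data2;
   by Year;            /* both inputs are already in Year order */
run;

proc print data=combined noobs;
run;
```

COMBINED holds the four observations in Year order (1991, 1992, 1993, 1994), alternating between the two inputs.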
Combine SAS data sets
 One-to-One Reading and One-to-One Merging
 One-to-one reading combines observations from two or more SAS data sets by creating observations that contain all of the variables from each contributing data set. Observations are combined based on their relative position in each data set. The DATA step stops after it has read the last observation from the smallest data set. One-to-one merging is similar to one-to-one reading, with two exceptions: you use the MERGE statement instead of multiple SET statements, and the DATA step reads all observations from all data sets.
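A minimal sketch contrasting the two techniques on the same inputs. All data set names, variable names, and values are hypothetical. DATA1 has three observations and DATA2 has two, which makes the difference in stopping behavior visible.

```sas
data data1;
   input VarX $;
   datalines;
X1
X2
X3
;
run;

data data2;
   input VarY $;
   datalines;
Y1
Y2
;
run;

/* One-to-one reading: stops when the shortest input is exhausted */
data read1to1;
   set data1;
   set data2;
run;

/* One-to-one merging: continues until every input is exhausted */
data merge1to1;
   merge data1 data2;
run;
```

READ1TO1 ends up with two observations. MERGE1TO1 ends up with three, and VarY is missing in the third.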
Combine SAS data sets
 Match merging
 Match merging combines observations from two or more SAS data sets into a single observation in a new data set, based on the values of one or more common variables.
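A minimal sketch of match merging: a MERGE statement followed by a BY statement. All data set names, variable names, and values are hypothetical, and both inputs are already sorted by the BY variable Year.

```sas
data data1;
   input Year VarX $;
   datalines;
1991 X1
1992 X2
1994 X3
;
run;

data data2;
   input Year VarY $;
   datalines;
1991 Y1
1993 Y2
1994 Y3
;
run;

data combined;
   merge data1 data2;
   by Year;            /* observations are matched on Year */
run;

proc print data=combined noobs;
run;
```

COMBINED has one observation per Year value. For 1992 VarY is missing, and for 1993 VarX is missing, because only one data set contributed to those observations.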
Combine SAS data sets
 Identifying Data Set Contributors
 When you read multiple SAS data sets in one DATA step, you can use the IN= data set option to detect which data set contributed to an observation.
 General form of the IN= data set option:

SAS-data-set(IN=variable)

 where variable is any valid SAS variable name.
 Variable is a temporary numeric variable with a value of:
 0 to indicate false; the data set did not contribute to the current observation
 1 to indicate true; the data set did contribute to the current observation
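Because IN= variables are temporary, they are not written to the output data set. A minimal sketch that copies them into ordinary variables so that their values can be inspected (the data set names one and two, the variable names, and the values are all hypothetical):

```sas
data one;
   input Num;
   datalines;
1
2
;
run;

data two;
   input Num;
   datalines;
2
3
;
run;

data flags;
   merge one(in=InOne) two(in=InTwo);
   by Num;
   FromOne = InOne;    /* 1 if data set ONE contributed, else 0 */
   FromTwo = InTwo;    /* 1 if data set TWO contributed, else 0 */
run;

proc print data=flags noobs;
run;
```

For Num=1 only FromOne is 1, for Num=2 both flags are 1, and for Num=3 only FromTwo is 1.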
Combine SAS data sets
 IN= Data Set Option

Transact                    Branch
ActNum  Trans  Amnt         ActNum  Branch
111     D      126.32       111     M.G.Road
111     C      560          112     Sivaji Nagar
113     C      235          114     Madiwala
114     D      14.56        115     Koramangala
116     C      371.69       116     BTM

■ A data set named newtrans shows this week’s transactions.
■ A data set named noactiv shows accounts with no transactions this week.
■ A data set named noacct shows transactions with no matching account number.
Combine SAS data sets

data newtrans
     noactiv(drop=Trans Amnt)
     noacct(drop=Branch);
   merge prog2.transact(in=InTrans)
         prog2.branch(in=InBanks);
   by ActNum;
   /* account has both transactions and a branch record */
   if InTrans and InBanks then output newtrans;
   /* branch record only: no transactions this week */
   else if InBanks and not InTrans then output noactiv;
   /* transactions only: no matching account number */
   else if InTrans and not InBanks then output noacct;
run;
Combine SAS data sets

A data set named newtrans shows this week’s transactions.

ActNum  Trans  Amnt    Branch
111     D      126.32  M.G.Road
111     C      560     M.G.Road
114     D      14.56   Madiwala
116     C      371.69  BTM

A data set named noactiv shows accounts with no transactions this week.

ActNum  Branch
112     Sivaji Nagar
115     Koramangala

A data set named noacct shows transactions with no matching account number.

ActNum  Trans  Amnt
113     C      235
Questions
