Center of Excellence
Agenda
Creating data sets using the DATA step
The INFILE statement
quality improvement
applications development.
What Is the SAS System?
In addition, SAS can be integrated with many SAS business solutions
that enable you to perform large-scale business functions, such as:
data access
data management
data analysis
data presentation
Components of the SAS Language
SAS Files
Files whose formats or structures are known to SAS.
All SAS files reside in a SAS data library.
SAS catalog
Many different kinds of information that are used in a SAS job
are stored in SAS catalogs, such as instructions for reading and
printing data values, or function key settings that you use in the
SAS windowing environment.
DATA Step
What Can the DATA Step Do?
Create multiple SAS data sets in one DATA step.
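A minimal sketch of this capability, assuming hypothetical in-stream data with a division code (Div) and a Salary: naming several data sets on the DATA statement and routing each observation with explicit OUTPUT statements creates all of them in a single pass.

```sas
/* Hypothetical in-stream data: division code and salary */
data work.finace work.fltops work.other;
   input Div $ Salary;
   /* Explicit OUTPUT statements route each observation to one data set */
   if Div = 'FINACE' then output work.finace;
   else if Div = 'FLTOPS' then output work.fltops;
   else output work.other;
   datalines;
HUMRES 42000
FINACE 34000
FLTOPS 27000
FINACE 20000
;
run;
```

Because the step contains OUTPUT statements, the implicit output at the bottom of the step is suppressed, so each observation lands only where it is routed.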
Combine existing data sets.
You can also add or augment information in a variety of ways.
SaleDate    SaleAmt    Mth2Dte
01APR2001   498.49     498.49
02APR2001   946.50     1444.99
03APR2001   994.97     2439.96
04APR2001   564.59     3004.55
05APR2001   783.01     3787.56
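The Mth2Dte column is a running total of SaleAmt. One way to build it, sketched here with the table's values typed in as in-stream data, is the sum statement: the accumulator is automatically retained across iterations, starts at 0, and treats missing values as 0.

```sas
data work.sales;
   input SaleDate : date9. SaleAmt;
   Mth2Dte + SaleAmt;     /* sum statement: retained, initialized to 0 */
   format SaleDate date9.;
   datalines;
01APR2001 498.49
02APR2001 946.50
03APR2001 994.97
04APR2001 564.59
05APR2001 783.01
;
run;
```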
Manipulate numeric values.
Figure: a SAS function converts BirthDay (the SAS date value 4253) into Age (30).
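The slide does not name the function. One possibility, assuming SAS 9.3 or later (where YRDIF accepts the 'AGE' basis) and a hypothetical reference date RefDate, is sketched below; BirthDay holds a SAS date value, that is, a count of days relative to 01JAN1960.

```sas
data work.ages;
   BirthDay = 4253;              /* SAS date value: days since 01JAN1960 */
   RefDate  = '01APR2002'd;      /* hypothetical "as of" date */
   /* YRDIF with the 'AGE' basis returns age in years (may be fractional) */
   Age = floor(yrdif(BirthDay, RefDate, 'AGE'));
   format BirthDay RefDate date9.;
run;
```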
Summarize data sets.
Input:
Salary   Div
42000    HUMRES
34000    FINACE
27000    FLTOPS
20000    FINACE
19000    FINACE
19000    FLTOPS
Output (total salary per division):
Div      DivSal
FINACE   73000
FLTOPS   46000
HUMRES   42000
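A sketch of how such a summary could be produced in a DATA step, using the input rows above: after sorting by Div, the FIRST.Div and LAST.Div variables created by the BY statement mark where each group starts and ends.

```sas
data work.salaries;
   input Salary Div $;
   datalines;
42000 HUMRES
34000 FINACE
27000 FLTOPS
20000 FINACE
19000 FINACE
19000 FLTOPS
;
run;

proc sort data=work.salaries;
   by Div;
run;

data work.divtotals(keep=Div DivSal);
   set work.salaries;
   by Div;
   if first.Div then DivSal = 0;   /* reset at the start of each division */
   DivSal + Salary;                /* sum statement retains DivSal */
   if last.Div then output;        /* one observation per division */
run;
```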
File management.
MMDDYY10. format: 01/01/1959   01/01/1960   01/01/1961   01/01/2000
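The dates above are displayed with the MMDDYY10. format, but SAS stores each one as a number of days relative to 01JAN1960 (here -365, 0, 366 and 14610). This small sketch writes both the stored value and the formatted value to the log.

```sas
data _null_;
   /* Date constants are stored as day counts relative to 01JAN1960 */
   do d = '01JAN1959'd, '01JAN1960'd, '01JAN1961'd, '01JAN2000'd;
      put d 8. +2 d mmddyy10.;   /* stored value, then formatted value */
   end;
run;
```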
Standard Data
Columns: Obs  EmpID  HireDate  Salary  Bonus
DATA SAS-data-set;
   INFILE 'raw-data-file' <options>;
   INPUT variable-specification …;
   variable-name=expression;
data work.fltat1;
   infile 'raw-data-file';
   /* @n moves the column pointer to column n before reading */
   input @1  EmpID    $5.
         @7  HireDate date9.
         @17 Salary   5.;
   Bonus=.05*Salary;   /* computed variable */
run;
Create a SAS Data Set from a Raw Data File
Partial Log
NOTE: 9 records were read from the infile
'fltat1.dat'.
The minimum record length was 21.
The maximum record length was 21.
NOTE: The data set WORK.FLTAT1 has
9 observations and 4 variables.
Overview of DATA Step Processing
Processing the DATA Step
The SAS System processes the DATA step in two
phases:
compilation
execution.
When you submit a DATA step for execution, SAS
checks the syntax of the SAS statements and
compiles them. During the compile phase, SAS
creates the following three items:
input buffer
is a logical area in memory into which SAS reads each
record of raw data when SAS executes an INPUT statement.
program data vector (PDV)
is a logical area in memory where SAS builds a data set, one
observation at a time. When a program executes, SAS reads
data values from the input buffer or creates them by
executing SAS language statements.
DATA Step Processing
The data values are assigned to the appropriate
variables in the program data vector. From here, SAS
writes the values to a SAS data set as a single
observation. Along with data set variables and computed
variables, the PDV contains two automatic variables, _N_
and _ERROR_. The _N_ variable counts the number of
times the DATA step begins to iterate. The _ERROR_
variable signals the occurrence of an error caused by
the data during execution.
descriptor information
is information that SAS creates and maintains about
each SAS data set, including data set attributes and
variable attributes. It contains, for example, the name
of the data set and its member type, the date and time
that the data set was created, and the number, names
and data types (character or numeric) of the variables.
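The descriptor information can be inspected with PROC CONTENTS. A minimal sketch, assuming a small hypothetical data set WORK.DEMO created only so there is something to describe:

```sas
/* Hypothetical two-observation data set */
data work.demo;
   input EmpID $ Salary;
   datalines;
E0001 42000
E0002 34000
;
run;

/* Lists data set attributes (name, member type, creation date, */
/* number of observations) and variable names, types and lengths */
proc contents data=work.demo;
run;
```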
The flow of action in the execution phase of a simple
DATA step:
The DATA step begins with a DATA statement. Each time the
DATA statement executes, a new iteration of the DATA step
begins, and the _N_ automatic variable is incremented by 1.
SAS reads a data record from a raw data file into the input
buffer, or it reads an observation from a SAS data set directly
into the program data vector. You can use an INPUT, MERGE,
SET, MODIFY, or UPDATE statement to read a record.
INPUT, List: list input scans the input data record for input values and
assigns them to the corresponding SAS variables.
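A minimal list input sketch, reusing the names and scores from the column input example: values only need to be separated by blanks. The LENGTH statement is needed because list input otherwise gives a character variable a length of 8, which would truncate Henderson.

```sas
data work.scores_list;
   length Name $ 12;                 /* avoid the default length of 8 */
   input Name $ Score1 Score2;       /* blank-delimited values */
   datalines;
Riley 1132 987
Henderson 1015 1102
;
run;
```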
Data Accessing - Column Input
data scores;
   infile datalines truncover;
   /* name: columns 1-12, score1: columns 17-20, score2: columns 27-30 */
   input name $ 1-12 score1 17-20 score2 27-30;
   datalines;
Riley           1132       987
Henderson       1015      1102
;
run;
To use column input, data values must be
in the same field on all the input lines
in standard numeric or character form.
Features of column input include the following:
Character values can contain embedded blanks.
Character values can be from 1 to 32,767 characters long.
Placeholders, such as a single period (.), are not required for
missing data.
Input values can be read in any order, regardless of their position in
the record.
Values or parts of values can be reread.
Both leading and trailing blanks within the field are ignored.
Values do not need to be separated by blanks or other delimiters.
Use the TRUNCOVER option on the INFILE statement to ensure that
SAS handles data values of varying lengths appropriately.
Data Accessing - Formatted Input
Formatted input combines the flexibility of using
informats with many of the features of column
input. By using formatted input, you can read
nonstandard data for which SAS requires
additional instructions. Formatted input is
typically used with pointer controls that enable
you to control the position of the input pointer in
the input buffer when you read data.
data scores;
   /* $12. reads columns 1-12; +4 and +6 skip columns before each score */
   input name $12. +4 score1 comma5. +6 score2 comma5.;
   datalines;
Riley           1,132      1,187
Henderson       1,015      1,102
;
run;
Important points about formatted input:
Character values can contain embedded blanks.
Character values can be from 1 to 32,767
characters long.
Placeholders, such as a single period (.), are not
required for missing data.
With the use of pointer controls to position the
pointer, input values can be read in any order,
regardless of their positions in the record.
Values or parts of values can be reread.
Formatted input enables you to read data stored
in nonstandard form, such as packed decimal or
numbers with commas.
Data Accessing - Named Input
You can use named input to read records in
which data values are preceded by the name of
the variable and an equal sign (=). The following
INPUT statement reads the data lines containing
equal signs.
data games;
input name=$ score1= score2=;
datalines;
name=abc score1=1132 score2=1187
;
run;
The MISSOVER Option
The MISSOVER option prevents SAS from loading
a new record when the end of the current record
is reached.
data airplanes3;
length ID $ 5;
infile 'raw-data-file' dlm=',' missover;
input ID $
InService : date9.
PassCap CargoCap;
run;
Input buffer: 50001 ,25feb1989 , . , 530
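To see MISSOVER at work, here is a sketch that swaps the raw data file for hypothetical in-stream records, one of which is missing its last value. With MISSOVER, CargoCap is simply set to missing for 50002; without it, SAS would move to the next record to look for CargoCap.

```sas
data work.airplanes3;
   length ID $ 5;
   infile datalines dlm=',' missover;   /* do not load a new record */
   input ID $ InService : date9. PassCap CargoCap;
   format InService date9.;
   datalines;
50001,4feb1989,132,530
50002,11nov1989,152
50003,22oct1991,90,530
;
run;
```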
The DSD Option
General form of the DSD option in the INFILE statement:
INFILE 'file-name' DSD;
Missing Values without Placeholders
The DSD option
sets the default delimiter to a comma
treats consecutive delimiters as missing values
enables SAS to read values with embedded delimiters if the value is surrounded by double quotes.
Using the DSD Option
data airplanes4;
length ID $ 5;
infile 'raw-data-file' dsd;
input ID $
InService : date9.
PassCap CargoCap;
run;
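A sketch of the same step against hypothetical in-stream data: the two consecutive commas in the second record produce a missing PassCap, and the quotation marks around 50003 are removed when the value is read.

```sas
data work.airplanes4;
   length ID $ 5;
   infile datalines dsd;   /* comma delimiter; ,, means a missing value */
   input ID $ InService : date9. PassCap CargoCap;
   format InService date9.;
   datalines;
50001,4feb1989,132,530
50002,11nov1989,,540
"50003",22oct1991,90,530
;
run;
```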
data address;
   length LName FName $ 20
          City $ 25 State $ 2
          Phone $ 8;
   infile 'raw-data-file' dlm=',';
   /* each INPUT statement loads a new record */
   input LName $ FName $;
   input City $ State $;
   input Phone $;
run;
Line Pointer Controls
You can also use line pointer controls to
control when SAS loads a new record.
DATA SAS-data-set;
   INPUT var-1 var-2 var-3 / var-4 var-5;
   additional SAS statements
data address;
   length LName FName $ 20
          City $ 25 State $ 2
          Phone $ 8;
   infile 'raw-data-file' dlm=',';
   /* the / line pointer loads the next record */
   input LName $ FName $ /
         City $ State $ /
         Phone $;
run;
Reading Multiple Records Per Observation
Columns: Sales ID, Location, Sale Date, Amount
The single trailing @ option holds a raw data record in the input
buffer until SAS:
executes an INPUT statement with no trailing @, or
reaches the bottom of the DATA step.
INPUT var1 var2 var3 … @;
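A minimal trailing @ sketch, assuming hypothetical sales records laid out as Sales ID, Location, Sale Date, Amount: the first INPUT reads only enough to decide whether the record is wanted, and the held record is then read further by a second INPUT.

```sas
data work.south;
   length SalesID $ 5 Location $ 8;
   input SalesID $ Location $ @;     /* hold the record in the input buffer */
   if Location = 'South';            /* subsetting IF: other records are skipped */
   input SaleDate : date9. Amount;   /* continue reading the same record */
   format SaleDate date9.;
   datalines;
S1001 North 03APR2001 498.49
S1002 South 04APR2001 946.50
S1003 South 05APR2001 994.97
;
run;
```

When the subsetting IF is false, the iteration ends and the held record is released, so the next iteration starts on a new record.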
Reading Raw Data Files with Multiple Records Per Observation
Processing the Trailing @
Desired Output
EmpID    Contrib
E00973   1400
E09872   2003
E73150   2400
E45671   4500
E34805   1980
Multiple Observations Per Record
Processing: What Is Required?
Raw data record: E00973 1400 E09872 2003 E73150 2400
INPUT var1 var2 var3 … @@;
data work.retire;
   length EmpID $ 6;
   infile 'raw-data-file';
   input EmpID $ Contrib @@;   /* hold the record until the end of the record */
run;
Partial Log
NOTE: 2 records were read from the
infile 'retire.dat'.
The minimum record length was 35.
The maximum record length was 36.
NOTE: SAS went to a new line when INPUT
statement reached past the end of
a line.
NOTE: The data set WORK.RETIRE has
6 observations and 2 variables.
The “SAS went to a new line” message is expected because the
@@ option indicates that SAS should read until the end of each
record.
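A runnable variant of the step above, sketched with in-stream data built from the desired output rows: two records, holding five EmpID/Contrib pairs between them, yield five observations because @@ keeps reading pairs from the same line.

```sas
data work.retire;
   length EmpID $ 6;
   input EmpID $ Contrib @@;   /* keep reading pairs from the same line */
   datalines;
E00973 1400 E09872 2003 E73150 2400
E45671 4500 E34805 1980
;
run;
```

The log for this step should also contain the "SAS went to a new line" note, for the same reason as above.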
Trailing @ Versus Double Trailing @
Option                 Effect
Trailing @             Holds the raw data record until
INPUT var-1 ... @;     1) an INPUT statement with no trailing @, or
                       2) the bottom of the DATA step.
Double trailing @      Holds the raw data record in the input buffer until
INPUT var-1 ... @@;    SAS reads past the end of the line.
Reading Hierarchical Raw Data Files
Processing Hierarchical Files
Many files are hierarchical in structure, consisting of
•a header record
•one or more related detail records.
Typically, each record contains a field that identifies whether it is a header
record or a detail record.
Figure: a file in which each header record is followed by its detail records.
You can read a hierarchical file into a SAS data set by creating one
observation per detail record and storing the header information as part
of each observation.
Figure: the hierarchical file (Header 1, Detail 1, Detail 2, Detail 3, Header 2, Detail 1, Header 3, Detail 1, Detail 2) is read into a SAS data set whose observations carry header variables and detail variables: Header 1/Detail 1, Header 1/Detail 2, Header 1/Detail 3, Header 2/Detail 1, Header 3/Detail 1, Header 3/Detail 2.
Creating One Observation Per Detail Record
RETAIN variable-name <initial-value>;
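A sketch of one observation per detail record, assuming a hypothetical layout where the first field is H (header, followed by a division) or D (detail, followed by a name and salary). RETAIN keeps the header value in the PDV across iterations, and the trailing @ lets the step read the record type before deciding how to read the rest.

```sas
data work.details(drop=Type);
   length Type $ 1 Dept $ 8 EmpName $ 12;
   retain Dept;                      /* header value survives later iterations */
   input Type $ @;                   /* read the record type, hold the record */
   if Type = 'H' then input Dept $;  /* header: store the division */
   else if Type = 'D' then do;
      input EmpName $ Salary;        /* detail: read the detail fields */
      output;                        /* one observation per detail record */
   end;
   datalines;
H FINACE
D Smith 34000
D Jones 20000
H FLTOPS
D Brown 27000
;
run;
```

Because the step contains an OUTPUT statement, header records produce no observation of their own.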
Raw data file:
50001 4feb1989 132 530
50002 11nov1989 152 540
50003 22oct1991 90 530
50004 4feb1993 172 550
50005 24jun1993 170 510
50006 20dec1994 180 520

data airplanes;
   length ID $ 5;
   infile 'raw-data-file';
   input ID $
         InService : date9.
         PassCap CargoCap;
run;

Compile: SAS creates the input buffer and the PDV. The PDV contains ID
(character, length 5) and INSERVICE, PASSCAP and CARGOCAP (numeric, length 8).
Execute:
1) At the start of each iteration, the PDV variables are set to missing.
2) The INPUT statement loads the first record, 50001 4feb1989 132 530, into the input buffer.
3) SAS assigns ID=50001, INSERVICE=10627 (the SAS date value for 4feb1989), PASSCAP=132 and CARGOCAP=530 in the PDV.
4) At the bottom of the step, an implicit output writes the observation to airplanes, and an implicit return sends control back to the top of the step.
5) The second iteration loads 50002 11nov1989 152 540 and writes ID=50002, INSERVICE=10907, PASSCAP=152, CARGOCAP=540.
6) Processing continues until the end of the raw data file.
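To watch the PDV during execution, a PUT _ALL_ statement can be added. This sketch uses the first two raw data records as in-stream data; each iteration writes ID, InService, PassCap, CargoCap, _ERROR_ and _N_ to the log. InService appears as its stored SAS date value because no format is assigned.

```sas
data work.airplanes;
   length ID $ 5;
   input ID $ InService : date9. PassCap CargoCap;
   put _all_;   /* writes the PDV, including _N_ and _ERROR_, to the log */
   datalines;
50001 4feb1989 132 530
50002 11nov1989 152 540
;
run;
```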
Output Data Set
ID   InService   PassCap   CargoCap
Concatenating
Concatenating the data sets appends the observations from one data set to
another data set. The DATA step reads DATA1 sequentially until all
observations have been processed, and then reads DATA2. Data set COMBINED
contains the results of the concatenation.
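A minimal concatenation sketch, with small hypothetical contents for DATA1 and DATA2: listing both data sets on one SET statement reads every observation of DATA1 first and then every observation of DATA2.

```sas
data work.data1;
   input ID Value;
   datalines;
1 10
2 20
;
run;

data work.data2;
   input ID Value;
   datalines;
3 30
;
run;

data work.combined;
   set work.data1 work.data2;   /* all of DATA1, then all of DATA2 */
run;
```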
Combine SAS data sets
Interleaving
Interleaving intersperses observations from two or more data sets, based on
one or more common variables.
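Interleaving uses the same SET statement as concatenation plus a BY statement. A sketch with hypothetical data sets that are already sorted by the common variable ID; the result is ordered 1, 2, 3, 4.

```sas
data work.data1;
   input ID Value;
   datalines;
1 10
3 30
;
run;

data work.data2;
   input ID Value;
   datalines;
2 20
4 40
;
run;

data work.interleaved;
   set work.data1 work.data2;
   by ID;   /* every input data set must be sorted by ID */
run;
```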
One-to-One Reading and One-to-One Merging
One-to-one reading combines observations from two or more SAS data sets by
creating observations that contain all of the variables from each
contributing data set. Observations are combined based on their relative
position in each data set. The DATA step stops after it has read the last
observation from the smallest data set. One-to-one merging is similar to
one-to-one reading, with two exceptions: you use the MERGE statement instead
of multiple SET statements, and the DATA step reads all observations from
all data sets.
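The difference can be seen with two hypothetical data sets of unequal size, A (3 observations) and B (2 observations): two SET statements stop when the smaller data set runs out, while MERGE without a BY statement keeps going until every observation has been read.

```sas
data work.a;
   input X;
   datalines;
1
2
3
;
run;

data work.b;
   input Y;
   datalines;
10
20
;
run;

/* One-to-one reading: stops after the last observation of B (2 obs) */
data work.onetoone_read;
   set work.a;
   set work.b;
run;

/* One-to-one merging: reads all observations (3 obs) */
data work.onetoone_merge;
   merge work.a work.b;
run;
```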
Match Merging
Match merging combines observations from two or more SAS data sets into a
single observation in a new data set based on the values of one or more
common variables.
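A match-merge sketch, assuming two hypothetical data sets that share the variable EmpID: both are sorted by EmpID, and MERGE with a BY statement joins observations that have the same EmpID value.

```sas
data work.emps;
   input EmpID $ Div $;
   datalines;
E001 FINACE
E002 FLTOPS
E003 HUMRES
;
run;

data work.pay;
   input EmpID $ Salary;
   datalines;
E002 27000
E001 34000
E003 42000
;
run;

proc sort data=work.emps; by EmpID; run;
proc sort data=work.pay;  by EmpID; run;

data work.emppay;
   merge work.emps work.pay;
   by EmpID;   /* combine observations with the same EmpID */
run;
```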
Identifying Data Set Contributors
When you read multiple SAS data sets in one DATA step, you can use the IN=
data set option to detect which data set contributed to an observation.
General form of the IN= data set option:
SAS-data-set(IN=variable)
where variable is any valid SAS variable name.
Variable is a temporary numeric variable with a value of:
0 to indicate false; the data set did not contribute to the current observation
1 to indicate true; the data set did contribute to the current observation
IN= Data Set Option
Transact
Num   Trans   Amnt
111   D       126.32
111   C       560
113   C       235
114   D       14.56
116   C       371.69

Branch
Num   Branch
111   M.G.Road
112   Sivaji Nagar
114   Madiwala
115   Koramangala
116   BTM
transactions
this week.
■ A data set named noacct shows transactions with no
matching
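A sketch of the IN= approach using the Transact and Branch values above. The flag names InTrans and InBranch and the extra output data set MATCHED are hypothetical. Transaction 113 has no matching Num in Branch, so it is the observation written to NOACCT. Column input is used for Branch because names such as Sivaji Nagar contain a blank.

```sas
data work.transact;
   input Num Trans $ Amnt;
   datalines;
111 D 126.32
111 C 560
113 C 235
114 D 14.56
116 C 371.69
;
run;

data work.branch;
   input Num 1-3 Branch $ 5-20;   /* column input keeps embedded blanks */
   datalines;
111 M.G.Road
112 Sivaji Nagar
114 Madiwala
115 Koramangala
116 BTM
;
run;

proc sort data=work.transact; by Num; run;
proc sort data=work.branch;   by Num; run;

data work.matched work.noacct(drop=Branch);
   merge work.transact(in=InTrans) work.branch(in=InBranch);
   by Num;
   /* IN= flags are 1 when that data set contributed to this observation */
   if InTrans and InBranch then output work.matched;
   else if InTrans and not InBranch then output work.noacct;
run;
```

The IN= variables exist only in the PDV and are not written to either output data set.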