
SAS Environment and Concepts of Libraries

SAS Training

Producing Descriptive Statistics:


PROC FREQ

Produces one-way and n-way frequency tables and concisely describes the data by reporting
the distribution of variable values

Creates crosstabulation tables that summarize data for two or more categorical variables by
showing the number of observations for each combination of variable values

Can include many statements and options for controlling frequency output

By default, PROC FREQ creates a one-way table with the frequency, percent, cumulative
frequency, and cumulative percent of every value of all variables in a data set

Syntax
Proc Freq Data = <SAS-data-set>;
Run;
Where,

SAS-data-set is the name of the data set to be used

Example:

proc freq data = parts.widgets;


run;

Here,

In the above program FREQ procedure creates a frequency table for each variable in
the data set Parts.Widgets

Specifying Variables in PROC FREQ:

To specify the variables to be processed by the FREQ procedure, include a TABLES statement

Syntax:
Proc Freq Data = <SAS-data-set> ;
Tables variable(s);
Run;
Where,

SAS-data-set is the name of the data set to be used

variable(s) lists the variables to include

Example:

proc freq data = finance.loans;


tables rate months;
run;
Here,

In the above program FREQ procedure creates a frequency table for variables rate and
months in the data set finance.loans
Rate      Frequency   Percent   Cumulative Frequency   Cumulative Percent
9.50%     2           22.22     2                      22.22
9.75%     1           11.11     3                      33.33
10.00%    2           22.22     5                      55.56
10.50%    4           44.44     9                      100.00

Months    Frequency   Percent   Cumulative Frequency   Cumulative Percent
12        1           11.11     1                      11.11
24        1           11.11     2                      22.22
36        1           11.11     3                      33.33
48        1           11.11     4                      44.44
60        2           22.22     6                      66.67
360       3           33.33     9                      100.00

Creating Two-Way Tables:

Crosstabulate frequencies with the values of other variables

Simplest crosstabulation is a two-way table

To create a two-way table, join two variables with an asterisk (*) in the TABLES statement of a
PROC FREQ step

Syntax:
Proc Freq Data = <SAS-data-set>;
Tables variable-1 * variable-2 <* ... variable-n>;
Run;
Where,

SAS-data-set is the name of the data set to be used

variable-1 specifies table rows

variable-2 specifies table columns

variable-n specifies a multi-way table.

Example:

proc freq data = clinic.diabetes;


tables weight * height;
run;
Here,

The above program creates the two-way table for variables weight and height

Creating N-Way Tables:

Create n-way crosstabulation tables

A series of two-way tables is produced, with a table for each level of the other variables

Example:

proc freq data = clinic.diabetes;


tables sex*weight*height;
run;

Here,

The above program will produce two crosstabulation tables, one for each value of Sex.

Suppressing Table Information:

Limit the output of the FREQ procedure to a few specific statistics

To control the depth of crosstabulation results, add a slash (/) and any combination of the
following options to the TABLES statement:

NOFREQ suppresses cell frequencies.


NOPERCENT suppresses cell percentages
NOROW suppresses row percentages.
NOCOL suppresses column percentages.

Example:
proc freq data = clinic.diabetes;
tables sex*weight / nofreq norow nocol;
run;
Here,

The output contains only the cell percentages, because cell frequencies, row percentages,
and column percentages are suppressed.
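
The opposite choice, keeping only the cell frequencies, can be sketched as follows. This is a minimal illustration: the data set work.patients and its variables are hypothetical and are created inline so the step runs on its own.

```sas
/* Hypothetical sample data; names are for illustration only */
data work.patients;
   input sex $ agegroup $;
   datalines;
F young
F old
M young
M old
M old
;
run;

/* Two-way table that shows cell frequencies only:                */
/* NOPERCENT, NOROW and NOCOL suppress the three percentage types */
proc freq data=work.patients;
   tables sex*agegroup / nopercent norow nocol;
run;
```

Any combination of the four options can be listed after the slash, depending on which statistics should remain in each cell.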

PROC MEANS

Provides mean, minimum, maximum and other data summarization tools, as well as helpful
options for controlling the output

Include many statements and options for specifying needed statistics

Syntax:
Proc Means <DATA=SAS-data-set> <statistic- keyword(s)> <option(s)>;
Run;
Where,

SAS-data-set is the name of the data set to be used

statistic- keyword(s) specifies the statistics to compute

option(s) controls the content, analysis, and appearance of output

Example:

proc means data = perm.survey;


run;

Here,

PROC MEANS prints the n-count (number of nonmissing values), the mean, the standard
deviation, and the minimum and maximum values of every numeric variable in the data set
perm.survey

Specifying Statistics:

To specify statistics, include statistic keywords as options in the PROC MEANS statement

When a statistic is specified in the PROC MEANS statement, default statistics are not produced

Example

proc means data=perm.survey median range;


run;
Here,

Means procedure prints only median and range for all the numeric variables

The following keywords can be used with PROC MEANS to compute statistics:
Descriptive Statistics

Keyword            Description
CLM                Two-sided confidence limit for the mean
CSS                Corrected sum of squares
CV                 Coefficient of variation
KURTOSIS / KURT    Kurtosis
LCLM               One-sided confidence limit below the mean
MAX                Maximum value
MEAN               Average
MIN                Minimum value
N                  Number of observations with non-missing values
NMISS              Number of observations with missing values
RANGE              Range
SKEWNESS / SKEW    Skewness
STDDEV / STD       Standard deviation
STDERR / STDMEAN   Standard error of the mean
SUM                Sum
SUMWGT             Sum of the Weight variable values
UCLM               One-sided confidence limit above the mean
USS                Uncorrected sum of squares
VAR                Variance

Quantile Statistics

Keyword            Description
MEDIAN / P50       Median or 50th percentile
P1                 1st percentile
P5                 5th percentile
P10                10th percentile
Q1 / P25           Lower quartile or 25th percentile
Q3 / P75           Upper quartile or 75th percentile
P90                90th percentile
P95                95th percentile
P99                99th percentile
QRANGE             Difference between upper and lower quartiles: Q3-Q1

Hypothesis Testing

Keyword            Description
PROBT              Probability of a greater absolute value for the t value
T                  Student's t for testing the hypothesis that the population mean is 0

Specifying Variables in PROC MEANS:

By default, the MEANS procedure generates statistics for every numeric variable in a data set

To specify the variables that PROC MEANS analyzes, add a VAR statement and list the variable
names

Syntax:
Proc Means Data = <SAS-data-set> <statistic- keyword(s)> <option(s)>;
Var variable(s);
Run;
Where,

SAS-data-set is the name of the data set to be used

statistic- keyword(s) specifies the statistics to compute

option(s) controls the content, analysis, and appearance of output

variable(s) lists numeric variables for which to calculate statistics

Example:

proc means data = clinic.diabetes min max;


var age height weight;
run;

Here,

The means procedure will calculate the result for age, height and weight only.

Group Processing Using the CLASS Statement:

Give statistics for grouped observations, instead of for observations as a whole

To produce separate analyses of grouped observations, add a CLASS statement to the MEANS
procedure

PROC MEANS does not generate statistics for CLASS variables, because their values are used only to
categorize data

CLASS variables can be either character or numeric, but they should contain a limited number of
discrete values that represent meaningful groupings

Syntax:
Proc Means Data = <SAS-data-set> <statistic- keyword(s)> <option(s)>;
Class variable(s);
Run;
Where,

SAS-data-set is the name of the data set to be used

statistic- keyword(s) specifies the statistics to compute

option(s) controls the content, analysis, and appearance of output

variable(s) specifies category variables for group processing

Example:

proc means data = clinic.heart;


var arterial heart cardiac urinary;
class survive sex;
run;

Here,

The output of the program shown above is categorized by values of the variables
Survive and Sex.

Group Processing Using the BY Statement:

Specifies variables to use for categorizing observations

Syntax:
Proc Means Data = <SAS-data-set> <statistic- keyword(s)> <option(s)>;
By variable(s);
Run;

Where,

SAS-data-set is the name of the data set to be used

statistic- keyword(s) specifies the statistics to compute

option(s) controls the content, analysis, and appearance of output

variable(s) specifies category variables for group processing

Example:

proc means data = work.heartsort;


var arterial heart cardiac urinary;
by survive sex;
run;

Here,

The output of the program shown above is categorized by values of the variables
Survive and Sex.

BY processing creates a separate table for each BY group

Differences Between BY and CLASS Statements:

Unlike CLASS processing, BY processing requires that the data is already sorted or indexed in
the order of the BY variables

BY group results have a layout that is different from the layout of CLASS group results.
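
The difference can be sketched with a small, deliberately unsorted data set. The data set work.scores and its variables are hypothetical and exist only for this illustration.

```sas
/* Hypothetical, unsorted sample data */
data work.scores;
   input group $ score;
   datalines;
B 70
A 90
B 80
A 60
;
run;

/* CLASS: no sorting required; the groups appear in a single table */
proc means data=work.scores mean min max;
   class group;
   var score;
run;

/* BY: the data must first be sorted (or indexed) by the BY variable */
proc sort data=work.scores out=work.scores_sorted;
   by group;
run;

/* One separate table is printed for each BY group */
proc means data=work.scores_sorted mean min max;
   by group;
   var score;
run;
```

Running the BY step against the unsorted work.scores instead would stop with an error, which is why the PROC SORT step comes first.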

Creating a Summarized Data Set Using PROC MEANS:

Create an output SAS data set that contains only the summarized variable

Syntax:
Proc Means Data = <SAS-data-set> <statistic- keyword(s)> <option(s)>;
Output Out = SAS-data-set <statistic-keyword= variable-name(s)>;
Run;
Where,

SAS-data-set in the output statement specifies the name of the output data set

statistic-keyword= specifies the summary statistic to be written out

variable- name(s) specifies the names of the variables that will be created to contain
the values of the summary statistic. These variables correspond to the analysis
variables that are listed in the VAR statement.

Example:

proc means data = clinic.diabetes;
var age height weight;
class sex;
output out = work.sum_gender
mean = AvgAge AvgHeight AvgWeight
min = MinAge MinHeight MinWeight;
run;
Here,

The above program creates a typical PROC MEANS report and also creates a
summarized output data set that includes only the MEAN and MIN statistics

Obs  Sex  _TYPE_  _FREQ_  AvgAge   AvgHeight  AvgWeight  MinAge  MinHeight  MinWeight
1         0       20      46.7000  66.9500    174.650    15      61         102
2    F    1       11      48.9091  63.9091    150.455    16      61         102
3    M    1       9       44.0000  70.6667    204.222    15      66         140

Creating a Summarized Data Set Using PROC SUMMARY

Create a summarized output data set

Similar to means procedure

The difference between the two procedures is that PROC MEANS produces a report by default.
By contrast, to produce a report with PROC SUMMARY, you must include the PRINT option in the PROC
SUMMARY statement.

Syntax:
Proc Summary Data = <SAS-data-set> <statistic- keyword(s)> <option(s)>;
Run;
Where,

SAS-data-set is the name of the data set to be used

statistic- keyword(s) specifies the statistics to compute

option(s) controls the content, analysis, and appearance of output

Example:

proc summary data = clinic.diabetes;


var age height weight;
class sex;
output out = work.sum_gender
mean = AvgAge AvgHeight AvgWeight;
run;
Here,

The above program creates an output data set but does not create a report
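
As a minimal sketch of the PRINT option, the step below makes PROC SUMMARY write a report in the same way PROC MEANS does by default. The data set work.scores is hypothetical and is created inline.

```sas
/* Hypothetical sample data */
data work.scores;
   input group $ score;
   datalines;
A 90
A 60
B 70
B 80
;
run;

/* PRINT requests a printed report; without it, PROC SUMMARY */
/* produces output only through an OUTPUT statement          */
proc summary data=work.scores print mean min max;
   class group;
   var score;
run;
```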

Output Delivery System (ODS)

Use ODS statements to specify destinations for your output

Create output in a variety of formats

The Listing destination is open by default

Syntax:
ODS open-destination;
ODS close-destination CLOSE;
Where,

open-destination is a keyword and any required options for the type of output that is to be
created, such as
HTML FILE='html-file-pathname'
LISTING

close-destination is a keyword for the type of output

Example:

ods html body = 'c:\mydata.html';


proc print data = sasuser.mydata;
run;
ods html close;

Here,

The ods html statement creates an HTML output of the name mydata.html in the path
specified.

ODS Destinations:

The table that follows lists the ODS destinations that are supported.

This destination         Produces
HTML                     output that is formatted in HyperText Markup Language (HTML)
Listing                  output that is formatted like traditional SAS procedure (listing) output
Markup Language Family   output that is formatted using markup languages such as
                         Extensible Markup Language (XML)
ODS Document             a hierarchy of output objects that enables you to render
                         multiple ODS output without re-running procedures
Output                   SAS data sets
Printer Family           output that is formatted for a high-resolution printer, such as
                         PostScript (PS), Portable Document Format (PDF), or
                         Printer Control Language (PCL) files
RTF                      Rich Text Format output for use with Microsoft Word

Closing Multiple ODS Destinations Concurrently:


Produce output in multiple formats concurrently by opening each ODS destination at the
beginning of the program

The keyword _ALL_ is used in the ODS CLOSE statement to close all open destinations
concurrently

Syntax:

ODS open-destination1;
ODS open-destination2;
ODS _all_ CLOSE;
Where,

open-destination1 is a keyword and any required options for the first type of output
that is to be created

open-destination2 is a keyword and any required options for the second type of output
that is to be created

_all_ is a keyword that closes all open destinations concurrently

Example:

ods html file = 'c:\admit.html';


ods pdf file = 'c:\admit.pdf' ;
proc print data = sasuser.admit;
run;
ods _all_ close;

Here,

The ods html statement creates an HTML output of the name admit.html in the path
specified

The ods pdf statement creates a PDF output of the name admit.pdf in the path specified

FILE= can also be used to specify the file that contains the HTML output. FILE= is an
alias for BODY=.

Creating HTML Output from Multiple Procedures:

Can also use the ODS HTML statement to direct the results from multiple procedures to the same
HTML file

Syntax:
ODS open-destination;
Procedure1
Procedure2
ODS close-destination CLOSE;
Where,

open-destination is a keyword and any required options for the type of output that is to
be created

Procedure1 is the proc step for first procedure

Procedure2 is the proc step for second procedure

close-destination is a keyword for the type of output

Example:

ods html body = 'c:\records\data.html';


proc print data = clinic.admit label;
var id sex age height weight actlevel;
label actlevel = 'Activity Level';
run;
proc tabulate data = clinic.stress2;
var resthr maxhr rechr;
table min mean, resthr maxhr rechr;
run;
ods html close;

Here,

The program above generates HTML output for the PRINT and TABULATE procedures

Creating and Applying User-Defined Formats

SAS formats can be associated with variables either temporarily or permanently

Users can create custom formats and apply them to variables. For example, we can format
a product number so that it is displayed as descriptive text

The FORMAT procedure can be used to create user-defined formats for variables

User-defined formats can be stored temporarily or permanently

Syntax:

Proc Format <options> ;


Value format-name
range1='label1'
range2='label2'
... ;
Where,

options includes :
Library= libref , specifies the libref for a SAS data library that contains a
permanent catalog in which user-defined formats are stored
Fmtlib , prints the contents of a format catalog

format-name names the format that is being created

must begin with a dollar sign ($) if the format applies to character data
cannot be longer than 32 characters, including the dollar sign (8 characters in releases before SAS 9)
cannot be the name of an existing SAS format
cannot end with a number
does not end in a period when specified in a VALUE statement

range specifies one or more variable values that are assigned to a label
label is a text string enclosed in quotation marks, or an existing format

When PROC FORMAT is used to create a format, the format is stored in a format catalog

If the SAS data library does not already contain a format catalog, SAS automatically creates one

If LIBRARY= option is not specified, then the formats are stored in a default format catalog
named Work.Formats

Formats are stored in a permanent format catalog named Formats when we specify the
LIBRARY= option in the PROC FORMAT statement
PROC FORMAT LIBRARY=libref;

A LIBNAME statement is needed to associate the libref with the permanent SAS data library in
which the format catalog is to be stored

It is recommended, but not required, to use the word Library as the libref when creating our own
permanent formats
libname library 'c:\sas\formats\lib';

Example:

Sample Data Set Empdata (Without Format):

FirstName   LastName   JobTitle   Salary
Donny       Evans      112        29996.63
Lisa        Helms      105        18567.23
John        Higgins    111        25309.00
Amy         Larson     113        32696.78
Mary        Moore      112        28945.89
Jason       Powell     103        35099.50

Here,

The values for JobTitle are coded, and they are not easily interpreted

Using proc format we can create a format for this variable which describes the values of
this variable

libname library 'c:\sas\formats\lib';

proc format lib = library ;
value jobfmt
103='manager'
105='text processor'
111='assoc. technical writer'
112='technical writer'
113='senior technical writer';
run;
Data empinfo ;
set empdata ;
format jobtitle jobfmt. ;
run ;
Here,

The format JOBFMT is stored in a catalog named Library.Formats, which is located in
the directory C:\Sas\Formats\Lib in the Windows environment

The user defined format JOBFMT is used for formatting a variable called jobtitle

Format statement can be placed in either a DATA step or a PROC step

Output:

(With Format)

FirstName   LastName   JobTitle                  Salary
Donny       Evans      technical writer          29996.63
Lisa        Helms      text processor            18567.23
John        Higgins    assoc. technical writer   25309.00
Amy         Larson     senior technical writer   32696.78
Mary        Moore      technical writer          28945.89
Jason       Powell     manager                   35099.50

Example:

proc format lib = library;
value $grade
'A'='Good'
'B'-'D'='Fair'
'F'='Poor'
'I','U'='See Instructor';
run;

Here,

Format is created for character variable ( $ sign before the format name)
proc format lib= library;
value jobfmt
103='manager'
105='text processor'
111='assoc. technical writer'
112='technical writer'
113='senior technical writer';
run;

Here,

Format is created for numeric variable ( no $ sign before the format name)

Example: Specifying Value Ranges

proc format lib = library;
value agefmt
0-<13 = 'child'
13-<20 = 'teenager'
20-<65 = 'adult'
65-100 = 'senior citizen';
run;
or
proc format lib = library;
value agefmt
low -<13 = 'child'
13-<20 = 'teenager'
20-<65 = 'adult'
65-high = 'senior citizen'
other = 'unknown';
run;
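
How the range boundaries behave can be checked with a short test. The sketch below is a self-contained illustration: it stores the format in Work.Formats (no LIBRARY= option), and the data set work.people is hypothetical.

```sas
/* Same ranges as the second AGEFMT definition, stored temporarily */
proc format;
   value agefmt
      low -< 13  = 'child'
      13  -< 20  = 'teenager'
      20  -< 65  = 'adult'
      65  - high = 'senior citizen'
      other      = 'unknown';
run;

/* Hypothetical test values placed on and around the boundaries */
data work.people;
   input name $ age;
   datalines;
Ann 12
Bob 13
Cy 64.5
Dee 65
Eve .
;
run;

/* Expected labels: child, teenager, adult, senior citizen, unknown */
proc print data=work.people;
   format age agefmt.;
run;
```

The less-than sign in 13-<20 excludes the upper endpoint, so 13 is labeled teenager rather than child. A missing numeric value is not covered by LOW, so it falls into OTHER.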

Defining Multiple Formats:


proc format lib=library;
value jobfmt
103='manager'
105='text processor'
111='assoc. technical writer'
112='technical writer'
113='senior technical writer';
value $response
'Y'='Yes'
'N'='No'
'U'='Undecided'
'NOP'='No opinion';
run;

To define several formats, use multiple VALUE statements in a single PROC FORMAT step

Displaying a List of Your Formats:


libname library 'c:\sas\formats\lib';
proc format library = library fmtlib ;
run;
Adding the keyword FMTLIB to the PROC FORMAT statement displays a list of all the formats in the
catalog, along with descriptions of their values
Output:

Format Name: JobFmt   Length: 23   Number of Values: 5
Min Length: 1   Max Length: 40   Default Length: 23   Fuzz: Std

START   END   LABEL (VER. 9.00 29AUG2002:11:13:14)
103     103   manager
105     105   text processor
111     111   assoc. technical writer
112     112   technical writer
113     113   senior technical writer

Proc Transpose

Restructures the data by changing the variables into observations

Syntax
PROC TRANSPOSE <DATA=input-data-set> <LABEL=label> <LET>
<NAME=name> <OUT=output-data-set> <PREFIX=prefix>;
BY <DESCENDING> variable-1 <...<DESCENDING> variable-n>;
COPY variable (s);
ID variable;
VAR variable (s);
Run;
where,

Label assigns a name to the variable that contains the label of the variable being transposed

Name assigns a name to the variable that contains the name of the variable being
transposed

Prefix assigns the prefix for the transposed variables. The default is COL, which would produce
COL1, COL2, COL3, etc.

Var selects which variables to transpose

By transposes the data separately within each combination of BY variable values

Id uses the values of the listed variable as the names of the transposed variables

Copy transfers variables without transposing them

Example:
proc transpose data=long1 out=wide1 prefix=faminc;
by famid ;
id year;
var faminc;
run;

Original Dataset

Obs   famid   year   faminc
1     1       96     40000
2     1       97     40500
3     1       98     41000
4     2       96     45000
5     2       97     45400
6     2       98     45800
7     3       96     75000
8     3       97     76000
9     3       98     77000

Result Dataset

Obs   famid   _NAME_   faminc96   faminc97   faminc98
1     1       faminc   40000      40500      41000
2     2       faminc   45000      45400      45800
3     3       faminc   75000      76000      77000

Example:

proc transpose data=long1 out=wide1 prefix=faminc name=family;


by famid ;
id year;
var faminc;
run;

Obs   famid   family   faminc96   faminc97   faminc98
1     1       faminc   40000      40500      41000
2     2       faminc   45000      45400      45800
3     3       faminc   75000      76000      77000
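
The transposition can be reproduced end to end with the sketch below. It assumes the famid values 1, 2 and 3, writes the data to the Work library, and drops the _NAME_ column, which is a choice made here for illustration rather than part of the examples above.

```sas
/* Long-format input: one row per family and year (famid values assumed) */
data work.long1;
   input famid year faminc;
   datalines;
1 96 40000
1 97 40500
1 98 41000
2 96 45000
2 97 45400
2 98 45800
3 96 75000
3 97 76000
3 98 77000
;
run;

/* One output row per famid; the ID values 96-98 are appended to the  */
/* prefix to form the new variable names faminc96, faminc97, faminc98 */
proc transpose data=work.long1 out=work.wide1(drop=_name_) prefix=faminc;
   by famid;
   id year;
   var faminc;
run;

proc print data=work.wide1;
run;
```

The input is already ordered by famid, which the BY statement requires. Unsorted data would need a PROC SORT step first.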

Exporting Data
Export Using SAS GUI:

SAS GUI can be used to export a SAS dataset

SAS dataset can be exported as an external file of any type such as:
Excel (.xls)
SAS dataset (.sas7bdat)
Text (.txt)
CSV (.csv)
HTML (.html)
Microsoft Access Files (.mdb)

Exporting SAS data set Using Proc Export:


Syntax:

Proc Export Data = <SAS-data-set>
Outfile = "filename" | Outtable = "table-name"
Dbms = <identifier>
Replace ;
Delimiter = 'character' ;
Run;
Where,

Data=SAS-data-set :- identifies the input SAS data set with either a one- or two-level
SAS name (library and member name)
Outfile="filename" :- specifies the complete path and filename of the output PC file,
spreadsheet, or delimited external file
Outtable="tablename" :- specifies the table name of the output DBMS table
DBMS=identifier :- specifies the type of data to export. For example, DBMS=DBF
specifies to export a dBASE file, DBMS=ACCESS exports a Microsoft Access table
REPLACE :- overwrites an existing file
Delimiter='character' :- specifies the delimiting character; it should be specified
when DBMS=DLM

Exporting a Delimited External File:


Example:
proc export data = myfiles.class outfile = "d:/myfiles/class" dbms = dlm;
delimiter = '&';
run ;
Here,
A text file with delimiter as & is created at the path specified in outfile=

Exporting to an Excel Spreadsheet:


Example:

proc export data = SASUSER.Accounts
outfile = "c:\myfiles\accounts.xls";
run;
Here,
An excel file is created at the path specified by outfile=

Exporting a Microsoft Access Table:


Example:
proc export data = sasuser.cust
Outtable = "customers"
Dbms = access;
Database = "c:\myfiles\mydatabase.mdb";
Run ;
Here,
An access file is created with table name customers in the database specified by Database=
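
A comma-separated file is another common target. The sketch below exports the sample data set sashelp.class that ships with SAS. The output path is hypothetical and must be changed to a folder that exists on the machine running the step.

```sas
/* Export to CSV; REPLACE overwrites the file if it already exists */
/* The path below is a placeholder, not a required location        */
proc export data=sashelp.class
   outfile="c:\myfiles\class.csv"
   dbms=csv
   replace;
run;
```

With DBMS=CSV the comma delimiter is implied, so no DELIMITER= statement is needed.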

General Form of SAS Functions

To use a SAS function, specify the function name followed by the function arguments, which are
enclosed in parentheses

Even if the function does not require arguments, the function name must still be followed by
parentheses

Unless the length of the target variable has been previously defined, a default length is assigned

Syntax:
function-name (argument-1 <, argument-n>);
where,

arguments can be
variables: x, y, z
constants: 456, 502, 612, 498
expressions: 37*2, 192/5, mean(22,34,56)

Example:

A function that contains multiple arguments

std(x1,x2,x3) ;

mean (of x1-x3) ;


AvgScore = mean (exam1,exam2,exam3) ;

Sum Function

Calculates the sum of values

Syntax:
sum( argument , argument,...)

where,

argument can be sas variables, constants and expressions

Example:

Data work.after;
Set work.before;
totalsal = sum (sal1,sal2,sal3);
Run;

Here,

The above program calculates the sum of the values in sal1, sal2 and sal3 variables.
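
One practical reason to prefer the SUM function over the + operator is its handling of missing values. The following sketch uses made-up salary values to show the difference.

```sas
data _null_;
   sal1 = 1000;
   sal2 = .;                            /* missing value */
   sal3 = 500;
   total_fn = sum(sal1, sal2, sal3);    /* SUM ignores missing arguments */
   total_op = sal1 + sal2 + sal3;       /* + propagates the missing value */
   put total_fn= total_op=;             /* writes: total_fn=1500 total_op=. */
run;
```

The same applies to MEAN, MIN, MAX, VAR and STD, which compute their results from the nonmissing arguments only.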

MEAN Function

Calculates the average of nonmissing values

Syntax:

mean (argument, argument,...)


where,

argument can be sas variables, constants and expressions

Example:

Data work.after;
Set work.before;
avg = mean (marks1,marks2,marks3);
Run;

Here,

The above program calculates the average of the values in marks1, marks2 and marks3
variables.

MIN Function

Finds the minimum value

Syntax:

min ( argument, argument,...)


where,

argument can be sas variables, constants and expressions

Example:

Data work.after;
Set work.before;
minimum =min (marks1,marks2,marks3);
Run;

Here,

The above program finds the minimum of the values in marks1, marks2 and marks3
variables.

MAX Function

Finds the maximum value

Syntax:
max(argument, argument,...)

where,

argument can be sas variables, constants and expressions

Example:

Data work.after;
Set work.before;
maximum =max (marks1,marks2,marks3);
Run;

Here,

The above program finds the maximum of the values in marks1, marks2 and marks3
variables.

VAR Function

calculates the variance of the values

Syntax:
var(argument, argument,...)

where,

argument can be sas variables, constants and expressions

Example:

Data work.after;
Set work.before;
variance = var (s1, s2, s3);
Run;

Here,

The above program calculates the variance of the values in s1, s2 and s3 variables.

STD Function

Calculates the standard deviation of the values

Syntax:

std(argument, argument,...)
where,

argument can be sas variables, constants and expressions

Example

Data work.after;
Set work.before;
stdev =std (s1, s2, s3);
Run;

Here,

The above program calculates the standard deviation of the values in s1, s2 and s3
variables.

Converting Data with Functions


INPUT function

Explicitly convert the character values to numeric values

Syntax:
INPUT (source, informat );
Where,

source indicates the character variable, constant, or expression to be converted to a
numeric value

informat is the numeric informat to be specified. When choosing the informat, be sure
to select a numeric informat that can read the form of the values.

Example

Data hrd.newtemp;
Set hrd.temp;
Test=input(saletest,comma9.);
Run;

Here,
The function uses the numeric informat COMMA9. to read the values of the character
variable SaleTest. Then the resulting numeric values are stored in the variable Test.
Character Value   Informat
2115233           7.
2,115,233         COMMA9.
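
The conversion can be tried in isolation with a DATA _NULL_ step. This is a minimal sketch: the literal value stands in for the SaleTest variable.

```sas
data _null_;
   saletest = '2,115,233';              /* character value containing commas */
   test = input(saletest, comma9.);     /* COMMA9. strips the commas while reading */
   put test=;                           /* writes: test=2115233 */
run;
```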

PUT Function

Explicitly convert the numeric values to character values

Format specified in the PUT function must match the data type of the source

Syntax:
PUT(source,format) ;

Where,

source indicates the numeric variable, constant, or expression to be converted to a
character value

format specifies the format to apply; its type must match the data type of the source

The PUT function always returns a character string.

The PUT function returns the source written with a format.

The format must agree with the source in type.

Numeric formats right-align the result; character formats left-align the result.

If you use the PUT function to create a variable that has not been previously identified,
it creates a character variable whose length is equal to the format width.

Example

data hrd.newtemp;
set hrd.temp;
Assignment = put (site,2.) || '/' || dept;
run;

Here,

Because Site has a length of 2, it is given 2. as the numeric format.

Put function converts the data type of site variable into character data type.

After that the value is concatenated and saved in the new variable assignment.
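
The same concatenation can be sketched with literal values. The values 26 and DP are invented for illustration and stand in for the Site and Dept variables.

```sas
data _null_;
   site = 26;                                    /* numeric */
   dept = 'DP';                                  /* character */
   assignment = put(site, 2.) || '/' || dept;    /* numeric converted explicitly */
   put assignment=;                              /* writes: assignment=26/DP */
run;
```

Converting explicitly with PUT avoids the automatic numeric-to-character conversion, which would pad the number with leading blanks and write a note to the log.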

Manipulating SAS Date Values with Functions


YEAR Function

Extracts the year value from a SAS date value

Syntax:
YEAR (date);

Where,

date is a SAS date value that is specified either as a variable or as a SAS date constant

Example

Data hrd.temp98;
Set hrd.temp;
yr = year(startdate);
Run;

Here,

The YEAR function extracts the year portion from the date variable startdate and saves it
in the new variable yr.

QTR Function

Extracts the quarter value from a SAS date value

Syntax:

QTR (date) ;
Where,

date is a SAS date value that is specified either as a variable or as a SAS date
constant.

Example

Data hrd.temp98;
Set hrd.temp;
quarter = qtr(startdate);
Run;

Here,

The QTR function extracts the quarter value from the date variable startdate and saves
it in the new variable quarter.

MONTH Function

Extracts the month value from a SAS date value

Syntax:

MONTH (date) ;
where,

date is a SAS date value that is specified either as a variable or as a SAS date
constant.

Example

data hrd.nov99;
set hrd.temp;
mn = month(startdate);
Run;

Here,

The MONTH function extracts the month value from the startdate variable and saves it in the
new variable mn.

DAY Function

Extracts the day value from a SAS date value.

Syntax:

DAY (date);
Where,

date is a SAS date value that is specified either as a variable or as a SAS date constant

Example:

data hrd.nov99;
set hrd.temp;
days = day(date);
Run;

Here,

The DAY function extracts the day value from the date variable and saves it in the new
variable days.

WEEKDAY Function

Extract the day of the week from a SAS date value

Syntax:
WEEKDAY (date) ;

where,

date is a SAS date value that is specified either as a variable or as a SAS date constant

Example

data hrd.nov99;
set hrd.temp;
weekday = weekday(date);
Run;

Here,

The WEEKDAY function extracts the day of the week value from the date variable and saves
it in the new variable weekday.

The WEEKDAY function returns a numeric value from 1 to 7. The values represent the days of the
week.

Value   equals   Day of the Week
1       =        Sunday
2       =        Monday
3       =        Tuesday
4       =        Wednesday
5       =        Thursday
6       =        Friday
7       =        Saturday

MDY Function

Creates a SAS date value from numeric values that represent the month, day, and year

Syntax:

MDY ( month , day , year );


Where,

month can be a variable that represents the month, or a number from 1-12

day can be a variable that represents the day, or a number from 1-31

year can be a variable that represents the year, or a number that has 2 or 4 digits.

Example:

data hrd.newtemp (drop=month day year);


set hrd.temp;
Date= mdy(month,day,year);
run;

Here,

A new variable date will be created by combining the values in the variables month,
day and year using the mdy function.
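
The date functions described so far can be checked against one another in a single step. The sketch below builds a date with MDY and then takes it apart again. The date June 15, 1995 is chosen arbitrarily.

```sas
data _null_;
   d  = mdy(6, 15, 1995);     /* SAS date value for June 15, 1995 */
   y  = year(d);              /* 1995 */
   q  = qtr(d);               /* 2 */
   m  = month(d);             /* 6 */
   dd = day(d);               /* 15 */
   w  = weekday(d);           /* 5, a Thursday (1 = Sunday) */
   put d= date9. y= q= m= dd= w=;
run;
```

The DATE9. format in the PUT statement only changes how d is displayed. The stored value remains the number of days since January 1, 1960.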

DATE and TODAY Functions

Return the current date from the system clock as a SAS date value

Syntax:

DATE()
TODAY()

These functions require no arguments, but they must still be followed by parentheses.

Example

data hrd.newtemp;
set hrd.temp;
EditDate = date();
run;

Here,

The DATE function returns the current system date and stores it in a new variable editdate.

TIME Function

Return the current time as a SAS time

Syntax:

time ( );

This function require no arguments, but it must still be followed by parentheses

Example:

data hrd.newtemp;
set hrd.temp;
starttime = time();
run;

Here,

The TIME function returns the current system time and stores it in a new variable starttime.

INTCK Function

Returns the number of time intervals that occur in a given time span

Used to count the passage of days, weeks, months, and so on

Counts intervals from fixed interval beginnings, not in multiples of an interval unit from the from
value

Partial intervals are not counted

For example :

WEEK intervals are counted by Sundays rather than seven-day multiples from the from
argument

MONTH intervals are counted by day 1 of each month

YEAR intervals are counted from 01JAN, not in 365-day multiples

Syntax:

INTCK ('interval' , from , to );

Where,

'interval' specifies a character constant or variable. The value must be one of the following:

DAY         DTMONTH
WEEKDAY     DTWEEK
WEEK        HOUR
TENDAY      MINUTE
SEMIMONTH   SECOND
MONTH
QTR
SEMIYEAR
YEAR

from specifies a SAS date, time, or datetime value that
identifies the beginning of the time span

to specifies a SAS date, time, or datetime value that
identifies the end of the time span

The type of interval (date, time, or datetime) must match the
type of value in from

Example:

Data work.anniv20;
Set flights.mechanics (Keep = id lastname firstname hired);
Years = intck ('year', hired, today());
If years = 20 and month(hired) = month(today());
Run;
Proc Print Data = work.anniv20;
Run;
Here,

The program identifies mechanics whose 20th year of employment occurs in the
current month

It uses the INTCK function to compare the value of the variable Hired to the date on
which the program is run.
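
Because INTCK counts interval boundaries rather than elapsed time, two dates that are one day apart can be one year apart in interval terms. The following sketch, using arbitrarily chosen dates, shows the effect.

```sas
data _null_;
   /* One 01JAN boundary lies between these dates, so the count is 1 */
   years1 = intck('year', '31dec2004'd, '01jan2005'd);
   /* No 01JAN boundary is crossed, so the count is 0 */
   years2 = intck('year', '01jan2005'd, '31dec2005'd);
   /* The first of February is crossed, so the count is 1 */
   months = intck('month', '31jan2005'd, '01feb2005'd);
   put years1= years2= months=;   /* writes: years1=1 years2=0 months=1 */
run;
```

This is why the anniversary program above also compares the month of Hired with the current month instead of relying on the year count alone.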

INTNX Function:

Applies multiples of a given interval to a date, time, or datetime value and returns the resulting
value

Used to identify past or future days, weeks, months, and so on

Syntax:
INTNX ('interval' , start-from , increment <, 'alignment'> )

Where,

'interval' specifies a character constant or variable

start-from specifies a starting SAS date, time, or datetime value

increment specifies a negative or positive integer that represents time intervals toward the past or
future

'alignment' (optional) forces the alignment of the returned date to the beginning, middle, or end
of the interval

The type of interval (date, time, or datetime) must match the type of value in start-from and
increment.

When specifying date intervals, the value of the character constant or variable that is used in
interval must be one of the following:

DAY         DTMONTH
WEEKDAY     DTWEEK
WEEK        HOUR
TENDAY      MINUTE
SEMIMONTH   SECOND
MONTH
QTR
SEMIYEAR
YEAR

When specifying date alignment in the INTNX function, use the following arguments or their
corresponding aliases:

BEGINNING   B
MIDDLE      M
END         E
SAMEDAY     S

Example:

SAS Statement                                   Date Value
MonthX = intnx ('month','01jan95'd,5,'b');      12935 (June 1, 1995)
MonthX = intnx ('month','01jan95'd,5,'m');      12949 (June 15, 1995)
MonthX = intnx ('month','01jan95'd,5,'e');      12964 (June 30, 1995)

The statements above count five months from January, but the returned value depends
on whether alignment specifies the beginning, middle, or end day of the resulting
month.
If alignment is not specified, the beginning day is returned by default.
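
As a further illustration of the increment and alignment arguments, the sketch below uses a negative increment, a zero increment, and the SAMEDAY alignment. The starting date and variable names are assumptions chosen for the example.

```sas
data _null_;
   start = '15jan1995'd;
   same_day   = intnx('month', start, 5, 's');   /* same day, five months later: 15JUN1995 */
   prev_month = intnx('month', start, -1);       /* default alignment, one month back: 01DEC1994 */
   year_end   = intnx('year', start, 0, 'e');    /* end of the current year: 31DEC1995 */
   put same_day= date9. prev_month= date9. year_end= date9.;
run;
```

The DATE9. format in the PUT statement only controls how the values are displayed in the log; the variables hold ordinary SAS date values.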

DATEPART Function

Extracts the date portion from a SAS datetime value

Syntax:
Datepart (variable);

where,

variable specifies the name of a variable that contains a SAS datetime value

Example

data hrd.newtemp;
set hrd.temp;
Date = datepart(saledate);
run;

Here,

The DATEPART function extracts the date portion from saledate, which holds a datetime
value, and saves it in the new variable Date.

DATDIF Function

Calculate the difference in days between two SAS dates

Accept dates that are specified as SAS date values

Syntax:

DATDIF( start_date , end_date , basis ) ;


Where,

start_date specifies the starting date as a SAS date value

end_date specifies the ending date as a SAS date value

basis specifies a character constant or variable that describes how SAS calculates the
date difference.

Example

data hrd.newtemp;
set hrd.temp;
date = DATDIF(sdate, edate, 'ACT/ACT');
run;

Here,

DATDIF function gives the difference between two dates in number of days.

YRDIF Function

Calculate the difference in years between two SAS dates

Accept start dates and end dates that are specified as SAS date values

Use a basis argument that describes how SAS calculates the date difference

Syntax
YRDIF ( start_date , end_date , basis )
where,

start_date specifies the starting date as a SAS date value

end_date specifies the ending date as a SAS date value

basis specifies a character constant or variable that describes how SAS calculates the
date difference.

Example:

data hrd.newtemp;
set hrd.temp;
date = YRDIF (sdate, edate, 'ACT/ACT');
run;

Here,

YRDIF function gives the difference between the two dates in number of years.

There are two character strings that are valid for basis in the DATDIF function and four character
strings that are valid for basis in the YRDIF function. These character strings and their meanings
are listed in the table below.

Character String   Meaning                                          Valid In DATDIF   Valid In YRDIF
'30/360'           specifies a 30 day month and a 360 day year      yes               yes
'ACT/ACT'          uses the actual number of days or years          yes               yes
                   between dates
'ACT/360'          uses the actual number of days between dates     no                yes
                   in calculating the number of years (calculated
                   by the number of days divided by 360)
'ACT/365'          uses the actual number of days between dates     no                yes
                   in calculating the number of years (calculated
                   by the number of days divided by 365)
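
To see how the basis argument changes the result, the following sketch applies several basis values to one pair of dates. The dates and variable names are assumptions for illustration; the comments show the arithmetic each basis implies.

```sas
data _null_;
   sdate = '01jan2000'd;
   edate = '01jul2000'd;                         /* 182 actual days after sdate */
   days_act = datdif(sdate, edate, 'ACT/ACT');   /* actual days: 182 */
   days_30  = datdif(sdate, edate, '30/360');    /* six 30-day months: 180 */
   yrs_360  = yrdif(sdate, edate, 'ACT/360');    /* 182 divided by 360 */
   yrs_365  = yrdif(sdate, edate, 'ACT/365');    /* 182 divided by 365 */
   put days_act= days_30= yrs_360= yrs_365=;
run;
```

Note that the basis is always passed as a quoted character string.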

Modifying Character Values with Functions


SCAN Function:

Enables you to separate a character value into words and to return a specified word

Uses delimiters, which are characters that are specified as word separators, to separate a
character string into words

Can specify as many delimiters as needed to correctly separate the character expression

The default delimiters are


blank . < ( + | & ! $ * ) ; ^ - / , %

Syntax:

SCAN (argument , n , delimiters);

where,

argument specifies the character variable or expression to scan

n specifies which word to read

delimiters are special characters that must be enclosed in single quotation marks (' ').

Example:

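Data hrd.newtemp ( DROP=name);
Set hrd.temp;
/* The delimiter argument is omitted here, so SCAN uses the default
   delimiters, which include the blank and the comma */
LastName = SCAN (name , 1);
FirstName = SCAN (name , 2);
MiddleName = SCAN (name , 3);
Run;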
Here,

It creates three variables, LastName, FirstName, and MiddleName, that hold the first, second,
and third words of the value stored in the variable name

SUBSTR Function:

Extract a portion of a character value

Replace the contents of a character value

When the function is on the right side of an assignment statement, the function returns the
requested string

When the function is on the left side of an assignment statement, the function is used to modify
variable values

Syntax:
SUBSTR (argument, position <, n>)

Where,

argument specifies the character variable or expression from which to extract the substring.

position is the character position to start from.

n specifies the number of characters to extract. If n is omitted, all remaining characters
are included in the substring.

Example:

Data work.newtemp (DROP = middlename);


Set hrd.newtemp;
MiddleInitial = Substr ( middlename , 1 ,1 );
Run;
Here,

It extracts the first letter of the MiddleName value to create the new variable MiddleInitial.

Data hrd.temp2 (DROP = exchange );


Set hrd.temp;
Exchange= Substr ( phone , 1 , 3 );
If exchange='622' Then Substr (phone , 1 , 3) = '433';
Run;
Here,
It checks whether the first three characters of phone are 622 and, if so, replaces them with 433

SCAN Function Compared with SUBSTR Function:

SCAN extracts words within a value that is marked by delimiters

The SCAN function is best used when we

know the order of the words in the character value

the starting position of the words varies

the words are marked by some delimiter

SUBSTR extracts a portion of a value by starting at a specified location

SUBSTR function is best used when the exact position of the substring that is to be extracted from
the character value is known

Substring does not need to be marked by delimiters
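
The difference between the two functions can be sketched as follows. The name and phone values are hypothetical; SCAN finds a word by its order, while SUBSTR reads fixed character positions.

```sas
data _null_;
   length name $ 30 phone $ 12;
   name  = 'SMITH, JOHN PAUL';
   phone = '622-555-0134';
   first    = scan(name, 2, ', ');   /* second word, wherever it starts: JOHN */
   exchange = substr(phone, 1, 3);   /* always positions 1 through 3: 622 */
   put first= exchange=;
run;
```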

TRIM Function:

Removes trailing blanks from character values

Whenever the value of a character variable does not match the length of the variable, SAS pads
the value with trailing blanks

This padding causes problems when two variable values are concatenated, because the
trailing blanks of the first value are kept.

If you trim the values of a variable and then assign them to a new variable, the trimmed values
are padded with trailing blanks again if they are shorter than the length of the new variable
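
The effect of the padding can be seen in a small sketch. The variable names and lengths below are assumptions; LENGTH returns the position of the last nonblank character, which makes the embedded blanks visible.

```sas
data _null_;
   length first last $ 10 padded trimmed $ 21;
   first = 'Ann';
   last  = 'Lee';
   padded  = first || ' ' || last;         /* the 7 padding blanks of FIRST are kept */
   trimmed = trim(first) || ' ' || last;   /* Ann Lee */
   len_padded  = length(padded);           /* 3 + 7 + 1 + 3 = 14 */
   len_trimmed = length(trimmed);          /* 7 */
   put len_padded= len_trimmed=;
run;
```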

Syntax:
TRIM ( argument )

Where,

argument can be any character expression, such as

a character variable: trim ( address )


another character function: trim (left (id) )

Examples:

Data hrd.newtemp ( Drop = address city state zip);


Set hrd.temp;
NewAddress = Trim (address) || ', ' || TRIM (city) || ', ' || zip;
Run;
Here,

A new variable called NewAddress is created, which contains the full address built from
the three variables address, city, and zip
The trailing blanks of the variables address and city are removed with the TRIM function.

CATX Function:

Concatenates character strings, removes leading and trailing blanks, and inserts
separators

Returns a value to a variable, or returns a value to a temporary buffer

Results of the CATX function are usually equivalent to those that are produced by a combination
of the concatenation operator and the TRIM and LEFT functions

Syntax:
CATX ( separator , string-1 <,...string-n> )
Where,

separator specifies the character string that is used as a separator between
concatenated strings

string specifies a SAS character string.

Example:

Data hrd.newtemp ( DROP = address city state zip);


Set hrd.temp;
NewAddress = CATX ( ', ' , address , city , zip);
Run;

Here,

The above program uses the CATX function to concatenate the variables address, city, and
zip into the new variable NewAddress and separates the values with a comma and a blank.

INDEX Function:

Searches a character value for a specified string

Searches values from left to right, looking for the first occurrence of the string

Returns the position of the string's first character

If the string is not found, it returns a value of 0

Is case sensitive

Syntax:
INDEX (source ,excerpt )
Where,

source specifies the character variable or expression to search

excerpt specifies a character string that is enclosed in quotation marks (' ').

Example:

Data hrd.datapool;
Set hrd.temp;
If Index ( job , 'word processing' ) > 0;
Run;

Here,

It creates a new data set with only those observations in which the function locates
the string word processing and returns a value greater than 0.

FIND Function:

Searches for a specified substring of characters within a character string

Returns the position of that substring

If the substring is not found in the string, returns a value of 0

Similar to the INDEX function

Syntax:
FIND (string , substring < , modifiers > < , startpos > )
Where,

string specifies a character constant, variable, or expression that will be searched for substrings

substring is a character constant, variable, or expression that specifies the substring of
characters to search for in string

modifiers is a character constant, variable, or expression that specifies one or more modifiers

startpos is an integer that specifies the position at which the search should start and the direction
of the search

If startpos is not specified, FIND starts the search at the beginning of the string and searches
the string from left to right.
If startpos is positive, FIND searches from startpos to the right
If startpos is negative, FIND searches from startpos to the left

The modifiers argument specifies one or more modifiers for the function, as listed below.

The modifier i causes the FIND function to ignore character case during the search. If
this modifier is not specified, FIND searches for character substrings with the same
case as the characters in substring.
The modifier t trims trailing blanks from string and substring
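
The modifiers and startpos arguments can be tried out with a sketch like the one below. The job value is a made-up example; the positions in the comments were counted from that literal.

```sas
data _null_;
   job = 'Filing, Word Processing';
   pos_exact  = find(job, 'word processing');        /* 0: the case does not match */
   pos_nocase = find(job, 'word processing', 'i');   /* 9: case is ignored */
   pos_index  = index(job, 'Word');                  /* 9: INDEX is always case sensitive */
   pos_from5  = find(job, 'i', 5);                   /* 21: first i at or after position 5 */
   put pos_exact= pos_nocase= pos_index= pos_from5=;
run;
```

Modifiers are character values, so they are written in quotation marks, whereas startpos is a plain integer.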

Example:

Data hrd.datapool;
Set hrd.temp;
If Find ( job , 'word processing' , 't' ) > 0;
Run;

Here,

It creates a new data set with only those observations in which the function locates the
string word processing and returns a value greater than 0.

UPCASE Function:

Converts all letters in a character expression to uppercase

Syntax:
UPCASE (argument)

Where,

argument can be any SAS expression, such as a character variable or constant

Example:

Data hrd.newtemp;
Set hrd.temp;
Job = UPCASE (job) ;
Run;

Here,

The above program converts the values of Job to uppercase and saves them in a new
data set.

LOWCASE Function:

Converts all letters in a character expression to lowercase

Syntax:

LOWCASE ( argument )
Where,

argument can be any SAS expression, such as a character variable or constant.

Example:

Data hrd.newtemp;
Set hrd.temp;
Contact = LOWCASE ( contact);
Run;
Here,

The above program converts the values of the variable contact to lowercase and stores them in a
new data set.

PROPCASE Function:

Converts all words in an argument to proper case (the first letter in each word is capitalized)

First copies a character argument and converts all uppercase letters to lowercase letters

Then converts to uppercase the first character of a word that is preceded by a delimiter

Uses the default delimiters unless specified

Syntax:
PROPCASE (argument < , delimiter(s) > )

Where,

argument can be any SAS expression, such as a character variable or constant

delimiter(s) specifies one or more delimiters that are enclosed in quotation marks. The
default delimiters are blank, forward slash, hyphen, open parenthesis, period, and tab.

Example:

Data hrd.newtemp;
Set hrd.temp;
Contact = PROPCASE(contact);
Run;

Here,

The program converts the values of the variable contact to proper case and saves them in a new
data set.

TRANWRD Function

Replaces or removes all occurrences of a pattern of characters within a character string

Translated characters can be located anywhere in the string

Syntax
TRANWRD (source, target, replacement)

where

source specifies the source string that you want to translate

target specifies the string that SAS searches for in source

replacement specifies the string that replaces target.

target and replacement can be specified as variables or as character strings

Example:

Data work.after;
Set work.before;
name = TRANWRD (name, 'Miss', 'Ms.');
name = TRANWRD (name, 'Mrs.', 'Ms.');
Run;

Here,

The above program changes all occurrences of Miss or Mrs. to Ms. in the variable name.

TRANSLATE Function

Replaces or removes all occurrences of a character within a character string

Syntax
TRANSLATE (source, to-1, from-1 <, ...to-n, from-n>)
where,

source specifies the source string or name of the variable whose value is to be translated

to 1-n specifies the replacement characters

from 1-n specifies the characters to be replaced

Example:

Data work.after;
Set work.before;
name = TRANSLATE (name, 'XYZ', 'ABC');
Run;

Here,

The above program will replace all the As with X, Bs with Y and Cs with Z in the name
variable.
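
The two functions are easy to confuse: TRANWRD replaces whole patterns, while TRANSLATE replaces single characters one for one (and takes the replacement characters before the characters to be replaced). A minimal sketch with made-up values:

```sas
data _null_;
   length title code $ 20;
   title = tranwrd('Miss Jones', 'Miss', 'Ms.');   /* pattern replaced: Ms. Jones */
   code  = translate('CAB-ABC', 'XYZ', 'ABC');     /* A to X, B to Y, C to Z: ZXY-XYZ */
   put title= code=;
run;
```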

Modifying Numeric Values with Functions


INT Function

Return the integer portion of a numeric value

Decimal portion of the INT function argument is discarded

Syntax:
INT (argument)

Where,

argument is a numeric variable, constant, or expression.

Example:

Data work.after;
Set work.before;
Intamt = INT(amount);
Run;

Here,

The integer portion of the variable amount is stored in the new variable Intamt.

ROUND Function

Round values to the nearest specified unit

If a round-off unit is not provided, a default value of 1 is used

Syntax:
ROUND ( argument < , round-off-unit > );
Where,

argument is a numeric variable, constant, or expression.

round-off-unit is numeric and nonnegative.

Example:

Data work.after;
Set work.before;
amt = ROUND(amount, .01);
Run;

Here,

The value of the variable amount is rounded to two decimal places (the nearest multiple of .01).
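
Because the round-off unit is the multiple to round to, not a count of decimal places, it helps to compare a few units side by side. The amount below is an arbitrary example value.

```sas
data _null_;
   amount = 1234.5678;
   whole   = int(amount);          /* decimal portion discarded: 1234 */
   nearest = round(amount);        /* default unit of 1: 1235 */
   cents   = round(amount, .01);   /* nearest multiple of .01: 1234.57 */
   fifths  = round(amount, .2);    /* nearest multiple of .2: 1234.6 */
   hundred = round(amount, 100);   /* nearest multiple of 100: 1200 */
   put whole= nearest= cents= fifths= hundred=;
run;
```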

SAS System Options

The OPTIONS statement is used to change the settings of SAS system options

An OPTIONS statement can be placed anywhere in a SAS program to change the settings from that
point onwards

The OPTIONS statement is global, i.e. the settings remain in effect until you modify them or end
your SAS session

Syntax:

OPTIONS options;
Where,
options specifies one or more system options to be changed

The available system options depend on the host operating system

NUMBER | NONUMBER and DATE | NODATE Options:

By default, page numbers and the date and time appear with listing output

NONUMBER & NODATE Options:


Syntax:
options nonumber nodate;

This suppresses the printing of both page numbers and the date and time in listing output

NUMBER & DATE Options:


Syntax:
options number date;

This prints both page numbers and the date and time in listing output

Example:

options nonumber nodate;


proc print data=clinic.admit ;
var id sex age height weight;
where age>=30;
run;
options date;
proc freq data = clinic.diabetes;
where fastgluc >= 300;
tables sex;
run;

Here,

Page numbers and the current date are not displayed in the PROC PRINT output

Page numbers are not displayed in the PROC FREQ output, either, but the date does
appear at the top of the page that contains the PROC FREQ report

Output:

The SAS System

Obs     ID     Sex    Age    Height    Weight
 2     2462     F      34      66        152
 3     2501     F      31      61        123
 4     2523     F      43      63        137
 5     2539     M      51      71        158
 7     2552     F      32      67        151
 8     2555     M      35      70        173

The SAS System                        15:19 Thursday, September 23, 1999

                                Cumulative    Cumulative
Sex    Frequency    Percent     Frequency      Percent
--------------------------------------------------------
F          2          25.0          2            25.0
M          6          75.0          8           100.0

PAGENO, PAGESIZE & LINESIZE Options:

PAGENO= option is used to specify the beginning page number for the report

If it is not specified, the output is numbered sequentially throughout the SAS session, starting with
page 1

The PAGESIZE= option specifies how many lines each page of output should contain

The LINESIZE= option specifies the width of the print line for the procedure output and log

Observations that do not fit within the line size continue on a different line

Syntax:
options pageno = n pagesize =n linesize = n;
Where,
n is any number

Example:

options pageno =1 pagesize=15 linesize =64 ;


proc print data = clinic.admit ;
run ;

Here,

Page numbering of the output starts at page 1

Each page of the output that the PRINT procedure produces contains at most 15 lines

Each output line is no longer than 64 characters

YEARCUTOFF Option:

This option specifies which 100-year span is used to interpret two-digit year values

When a two-digit year value is read, SAS interprets it based on a 100-year span that
starts with the YEARCUTOFF= value

The default value of YEARCUTOFF= is 1920

The default value of yearcutoff can be changed using the YEARCUTOFF= option

The value of the YEARCUTOFF= system option affects only two-digit year values

Using YEARCUTOFF=1920, dates are interpreted as shown below:

Date Expression    Interpreted As
12/07/41           12/07/1941
18Dec15            18Dec2015
04/15/30           04/15/1930
15Apr95            15Apr1995

Syntax:

options YEARCUTOFF = YEAR;


Where,
YEAR is the first year of the 100 year span

Example:

options yearcutoff =1950 ;


Here,

The 100-year span will be from 1950 to 2049

Using YEARCUTOFF=1950, dates are interpreted as shown below:

Date Expression    Interpreted As
12/07/41           12/07/2041
18Dec15            18Dec2015
04/15/30           04/15/2030
15Apr95            15Apr1995
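
The interpretation can be confirmed by reading two-digit years with the INPUT function after setting the option. This is a sketch only; the variable names are illustrative, and the option stays in effect for the rest of the session unless it is changed again.

```sas
options yearcutoff=1950;   /* two-digit years now fall in 1950 through 2049 */

data _null_;
   d1 = input('12/07/41', mmddyy8.);   /* 41 is read as 2041 */
   d2 = input('15Apr95', date7.);      /* 95 is read as 1995 */
   put d1= date9. d2= date9.;          /* d1=07DEC2041 d2=15APR1995 */
run;
```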

OBS, FIRSTOBS options:

Used to specify the observations to process from SAS data sets

Can specify either or both of these options as needed

OBS= to specify the last observation to be processed

FIRSTOBS= to specify the first observation to be processed

FIRSTOBS= and OBS= together to specify a range of observations to be processed

Syntax:

OPTIONS FIRSTOBS=n;
OPTIONS OBS=n;
Where,
n is a positive integer
For FIRSTOBS=, n specifies the number of the first observation to process
For OBS=, n specifies the number of the last observation to process
By default, FIRSTOBS=1. The default value for OBS= is MAX

Example:

options firstobs =10 ;


proc print data =sasuser.heart ;
run ;

Assume the data set Sasuser.Heart contains 20 observations.

Here SAS reads the 10th observation of the data set first and reads through the last observation
(for a total of 11 observations)
options firstobs =1 obs =10 ;
proc print data =sasuser.heart ;
run ;

Here SAS reads 1st to 10th observation (for a total of 10 observations)

To reset the number of the last observation to process, you can specify OBS=MAX in the
OPTIONS statement.
options obs = max;

This instructs any subsequent SAS programs in the SAS session to process through the last
observation in the data set being read

The FIRSTOBS= and OBS= settings remain in effect for the duration of the current SAS session, or
until they are changed
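
Using both options together selects a range. The sketch below assumes that the sample data set Sashelp.Class, which ships with most SAS installations, is available; any data set with at least 10 observations would do.

```sas
options firstobs=5 obs=10;       /* process observations 5 through 10 */

proc print data=sashelp.class;   /* prints 6 observations */
run;

options firstobs=1 obs=max;      /* restore the default settings */
```

Resetting the options at the end keeps later steps in the same session from silently processing only part of their input.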

Viewing System Options:

OPTIONS procedure can be used to display the current setting of one or all SAS system options
The results are displayed in the log

Syntax:
PROC OPTIONS < option (s ) > ;
RUN;
Where, option(s) specifies how SAS system options are displayed
Example:

proc options;
Run;

This lists all SAS system options, their settings, and a description

To list the value of one particular system option, use the OPTION= option in the PROC OPTIONS
statement as shown below:

proc options option = yearcutoff ;


run ;

If a SAS system option uses an equal sign, such as YEARCUTOFF=, you do not include the
equal sign when specifying the option to OPTION=.

Importing Raw Data Files

Raw Data Files:

A raw data file is an external text file whose records contain data values that are organized in fields

Raw data files are non-proprietary and can be read by a variety of software programs

Create Dataset From Raw Data Files:

1. Reference the SAS library to store the data set.

2. Write a DATA step program to read the raw data file and create a SAS data set.

To read the raw data file, the DATA step must provide the following instructions to SAS:

the location or name of the external text file

a name for the new SAS data set

a reference that identifies the external file

a description of the data values to be read.

The table below outlines the basic statements that are used to import a raw data file

To Do This                        Use This SAS Statement
Reference a SAS data library      LIBNAME statement
Reference an external file        FILENAME statement
Name a SAS data set               DATA statement
Identify an external file         INFILE statement
Describe data                     INPUT statement
Execute the DATA step             RUN statement
List the data                     PROC PRINT statement
Execute the final program step    RUN statement

FILENAME statement:

Is used to reference an external file

Before raw data can be read, SAS must be pointed to the location of the external file that contains
the data

The FILENAME statement works like the LIBNAME statement: it creates a reference (a fileref) that
temporarily points to a storage location, in this case an external file

Syntax:

FILENAME fileref 'path' ;


where ,

fileref is a name that you associate with an external file that contains data

The name must be 1 to 8 characters long

Should begin with a letter or underscore

Contain only letters, numbers, or underscores.

path is the physical location of the external file, enclosed in quotation marks

Example:

filename tests 'c:\users\tmill.dat' ;

Here,
The FILENAME statement temporarily associates the fileref Tests with the external file that
contains the data

Referencing Aggregate Storage Location:


A FILENAME statement can also be used to associate a fileref with an aggregate storage
location, such as a directory that contains multiple external files

Syntax:

FILENAME fileref 'directoryname' ;

Where,

fileref is a name that you associate with the aggregate storage location

The name must be 1 to 8 characters long

Begin with a letter or underscore

Should contain only letters, numbers, or underscores.

directoryname is the full path or location of the directory, enclosed in quotation marks.

Example:

filename finance 'c:\users\personal\finances' ;

Here,

The FILENAME statement temporarily associates the fileref Finance
with the aggregate storage directory C:\Users\Personal\Finances

Infile Statement:

Is used to identify the external file that contains the raw data

Syntax:
INFILE file-specification <options> ;
Where,

file-specification can take the form fileref to name a previously defined file reference or 'filename'
to point to the actual name and location of the file

options describes the input file's characteristics and specifies how it is to be read with the INFILE
statement.

Example:

FILENAME test 'c:\irs\personal\refund.dat';
INFILE test obs=100;
Here,

The INFILE statement is used along with the FILENAME statement
Test is the fileref that points to the file that contains the data
The OBS= option reads only the first 100 records from the file

INFILE statement can also specify the complete path of a file instead of using the FILENAME
statement:

Example:

INFILE 'c:\irs\personal\refund.dat' ;

Input Statement:

Describes the fields of raw data to be read and placed into the SAS data set.

Specify the variable names and data types

Syntax:
INPUT variable <$> startcol - endcol . . . ;
where

variable is the SAS variable name assigned to the field

($) identifies the variable type as character (if the variable is numeric, then $ is not specified)

startcol represents the starting column for this variable

endcol represents the ending column for this variable.

Example:

The following code reads data from the file below.


filename exer 'c:\users\exer.dat' ;
data exercise ;
infile exer ;
input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14 ;
run ;

Reading Column input or fixed field raw data files

It is the most common input style

Column input specifies actual column locations for values

In such files the values for each variable are in the same location in all records

When column input is used, the data must be:

Standard character or numeric values


In fixed fields

The file below contains fixed fields;

Syntax:

The complete sequence of statements for importing a raw data file from disk into SAS is:
LIBNAME statement
FILENAME statement

DATA statement
INFILE statement
INPUT statement
RUN statement

Example:

libname libref 'SAS-data-library' ;
filename exercise 'c:\users\exer.dat' ;
data exer ;
infile exercise ;
input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14 ;
Run ;

Here,
The LIBNAME statement creates a library reference
The FILENAME statement references an external file
The DATA statement names the SAS data set to be created
The INFILE statement identifies the external file to read
The INPUT statement describes the data in the external file
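
The same INPUT statement can be tried without an external file by placing the records in the program itself. In this sketch the three records are hypothetical and are laid out so that each value sits in the columns that the INPUT statement names; DATALINES stands in for the FILENAME and INFILE file reference.

```sas
data work.exer;
   infile datalines;   /* read the in-stream records below instead of a file */
   input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14;
   datalines;
2810 61 MOD  F
2804 38 HIGH F
2807 42 LOW  M
;
run;

proc print data=work.exer;
run;
```

The data lines must start in column 1, because column input reads by position.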

Features of Column Input:

It can be used to read character variable values that contain embedded blanks.
input Name $ 1-25;

No placeholder is required for missing data. A blank field is read as missing and does not cause
other fields to be read incorrectly.
input Item $ 1-13 IDnum $ 15-19 Instock 21-22 Backord 24-25;

Fields or parts of fields can be re-read.


input Item $ 1-13 IDnum $ 15-19 Supplier $ 15-16 InStock 21-22 BackOrd 24-25;

Fields do not have to be separated by blanks or other delimiters.


input Item $ 1-13 IDnum $ 14-18 InStock 19-20 BackOrd 21-22;

Standard and Nonstandard Numeric Data:

Standard numeric data values can contain only

numbers
decimal points
numbers in scientific or E-notation (2.3E4, for example)
plus or minus signs

Nonstandard numeric data includes

values that contain special characters, such as percent signs (%), dollar
signs ($), and commas (,)
date and time values
data in fraction, integer binary, real binary, and hexadecimal forms

The file below contains personnel information for a technical writing department of a small
computer manufacturer. The fields contain values for each employee's last name, first name, job
title, and annual salary.

The values for Salary contain commas. The values for Salary are considered to be nonstandard
numeric values.

Column input cannot be used to read these values.

Choosing an Input Style:

Nonstandard data values require an input style that is more flexible than column input

Formatted input can be used, which combines the features of column input with the ability to
read both standard and nonstandard data.

When raw data that is organized into fixed fields is to be read, use:

Column input to read standard data only

Formatted input to read both standard and nonstandard data.

Reading formatted input:


INPUT Statement:
General Form of the INPUT Statement Using Formatted Input is :

Syntax:

INPUT < column-pointer-control > variable informat. ;

Where,

Column pointer-control positions the input pointer on a specified column

variable is the name of the variable that is being created

informat is the special instruction that specifies how SAS reads raw data.

Column pointer controls:


The two column pointer controls are:

@n :- Moves the input pointer to a specific column number

+n :- Moves the input pointer forward to a column number that is relative to the current position

@n Column Pointer Control:

It moves the input pointer to a specific column number

The @ moves the pointer to column n, which is the first column of the field that is being read

The Syntax for Input using @n column pointer control is:

INPUT @n variable informat.;


Where,

variable is the name of the variable that is being created

informat is the special instruction that specifies how SAS reads raw data

Example:

input @9 FirstName $5. @1 LastName $7. @15 JobTitle 3. @19 Salary comma9. ;

Here,

The value for FirstName is read first, starting in column 9.
LastName is read next, after @1 moves the pointer back to column 1
JobTitle and Salary are read from column 15 and column 19 respectively

The +n Pointer Control:

It moves the input pointer forward to a column number that is relative to the current position

It moves the pointer forward n columns

The Syntax for Input using +n column pointer control is:


INPUT +n variable informat. ;
Where,

variable is the name of the variable that is being created

informat is the special instruction that specifies how SAS reads raw data

In order to count correctly, it is important to understand where the column pointer control is
located after each data value is read

Example:

input LastName $7. +1 FirstName $5. +5 Salary comma9. @15 JobTitle 3.;

Here,

Because the values for LastName begin in column 1, a column pointer control is not
needed
After LastName is read, the pointer moves to column 8
To start reading FirstName, which begins in column 9, move the column pointer control
ahead 1 column with +1
After FirstName is read, the column pointer moves to column 14
To read Salary, which begins in column 19, move the column pointer ahead 5 columns from
column 14 with +5
The @n column pointer control is then used to return to column 15 to read JobTitle
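
The pointer movements described above can be followed with in-stream data. This is a sketch under the assumption that the raw records are laid out as the column positions in the text imply (LastName in columns 1-7, FirstName in 9-13, JobTitle in 15-17, Salary in 19-27); the two records reuse values from this section's output listing, and the data set name is hypothetical.

```sas
data work.empinfo;
   /* pointer: col 1 -> 8 after LastName, +1 -> 9, -> 14 after FirstName,
      +5 -> 19 for Salary, then @15 jumps back for JobTitle */
   input LastName $7. +1 FirstName $5. +5 Salary comma9. @15 JobTitle 3.;
   datalines;
EVANS   DONNY 112 29,996.63
HIGGINS JOHN  111 25,309.00
;
run;

proc print data=work.empinfo;
run;
```

The COMMA9. informat removes the embedded comma, so Salary is stored as an ordinary numeric value.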

INFORMAT

Used to read data values in certain forms into standard SAS values

It determines how data values are read into a SAS data set

Informats are used to read numeric values that contain letters or other special characters

Informats must be used to read nonstandard data (for example, numeric data that contains letters or
special characters such as commas).

The value $1,234.00 contains two special characters, a dollar sign ($) and a comma (,).
An informat is used to read the value while removing the dollar sign and comma, and then to store the
resulting value as a standard numeric value

$ 1,000,000 is a nonstandard numeric value because it contains a dollar sign ($) and commas (,). To
remove the dollar sign and commas before storing the numeric value 1000000 in a
variable, read the value with the COMMA11. informat

INFORMAT statement:

It specifies the informat for reading the values of the variables that are listed in the INFORMAT
statement

An INFORMAT statement in a DATA step permanently associates an informat with a variable

Standard SAS informats or previously defined user-written informats can be used

A single INFORMAT statement can associate the same informat with several variables, or it can
associate different informats with different variables

If a variable appears in multiple INFORMAT statements, SAS uses the informat that is assigned
last.

Syntax:

INFORMAT variablename <$>informat<w>.<d>;


Where,

variablename is the name of the variable for which we are specifying the informat

$ Indicates a character informat; its absence indicates a numeric informat.

Informat names the informat

w Specifies the informat width, which for most informats is the number of columns in the input
data

d Specifies an optional decimal scaling factor in the numeric informats

If w and d values are omitted from the informat, SAS uses default values

Informat can be specified in INPUT statement also

Some important informats:

$w. reads standard character data.

w.d reads standard numeric data

COMMAw.d reads numeric values and removes embedded characters such as commas and dollar signs

DATEw. reads date values in the form ddmmmyy or ddmmmyyyy

DATETIMEw. reads datetime values in the form ddmmmyy hh:mm:ss.ss or ddmmmyyyy
hh:mm:ss.ss

DDMMYYw. reads date values in the form ddmmyy or ddmmyyyy

TIMEw. Reads hours, minutes, and seconds in the form hh:mm:ss.ss

Example:

INFORMAT Birthdate Interview date9.;

Here,
the DATE9. informat is associated with the variables Birthdate and Interview

Using Informat in Input Statement:

Informat is used in input statement to read the data in a particular format from the raw data file

Example:

input @9 FirstName $5. @1 LastName $7. +7 JobTitle 3. @19 Salary comma9.;


Here,

Because FirstName and LastName are character variables, $ is used. 5 and 7 are the widths of
FirstName and LastName respectively

Because JobTitle is a numeric value that is 3 columns wide, 3. is used to read those values

Comma9. is used to read the Salary values, because they are nonstandard numeric values

The COMMAw.d informat is used to read numeric values and to remove embedded
blanks, commas, dashes, dollar signs, percent signs, right parentheses, and left parentheses

Output:

Obs    FirstName    LastName    JobTitle      Salary

  1    DONNY        EVANS            112    29996.63
  2    ALISA        HELMS            105    18567.23
  3    JOHN         HIGGINS          111    25309.00
  4    AMY          LARSON           113    32696.78
  5    MARY         MOORE            112    28945.89
  6    JASON        POWELL           103    35099.50
  7    JUDY         RILEY            111    25309.00
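
The INPUT statement above can be tried out in a self-contained DATA step. This is a sketch under stated assumptions: the original raw data file is not shown, so the data lines below are a reconstructed layout (LastName in columns 1-7, FirstName in columns 9-13, JobTitle in columns 15-17, Salary in columns 19-27) built from the first three rows of the output, and the data set name work.staff is hypothetical.

```sas
data work.staff;                         /* hypothetical data set name */
   /* @9 and @1 move the pointer to a column; +7 moves it 7 columns ahead */
   input @9 FirstName $5. @1 LastName $7. +7 JobTitle 3. @19 Salary comma9.;
   /* Assumed raw data layout, reconstructed from the output above */
   datalines;
EVANS   DONNY 112 29,996.63
HELMS   ALISA 105 18,567.23
HIGGINS JOHN  111 25,309.00
;
run;

proc print data=work.staff;
run;
```

After LastName is read from columns 1-7 the pointer rests at column 8, so +7 places it at column 15, where JobTitle begins.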

Format

A Format is an instruction that SAS uses to write data values

It is used to control the written appearance of data values

In some cases, formats are used to group data values together for analysis

SAS software offers a variety of character, numeric, and date and time formats

You can also create and store your own formats

A format can be permanently assigned to a variable in a SAS data set

A format can be temporarily specified in a PROC step to determine the way the data values appear in
output

Syntax:

FORMAT <variablename> [<$>] format<w>.<d>;


Where,

variablename specifies the name of the variable for which the format is used

$ Indicates a character format; its absence indicates a numeric format.

Format names the format

w Specifies the format width, which for most formats is the number of columns in the
output data.

d Specifies an optional decimal scaling factor in the numeric formats.

Formats always contain a period (.) as a part of the name.

If the w and d values are omitted from the format, SAS uses default values

The d value specified with format tells SAS to display that many decimal places, regardless of
how many decimal places are in the data

Formats never change or truncate the internally stored data values.

If the format width is too narrow to represent a value, SAS tries to squeeze the value into the
space available

Character formats truncate values on the right

Numeric formats sometimes revert to the BESTw.d format

SAS prints asterisks if adequate width is not specified

When a FORMAT statement is used in a procedure step, the formats that are associated with the
variables remain in effect only for that particular step. That is, the format association is
temporary, not permanent
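
The difference between a permanent and a temporary format association can be sketched as follows. The data set work.pay, its variables, and its two data lines are made up for illustration. The sketch relies on the fact that a FORMAT statement in a DATA step stores the format with the variable, while one in a PROC step applies only to that step.

```sas
/* Permanent: the format is stored with the variable in the data set */
data work.pay;                           /* hypothetical data set name */
   input Name $ Salary;
   format Salary comma10.2;
   datalines;
Ann 29996.63
Raj 18567.23
;
run;

/* Temporary: DOLLAR12.2 applies only within this PROC PRINT step */
proc print data=work.pay;
   format Salary dollar12.2;
run;

/* No FORMAT statement here, so the stored COMMA10.2 format is used */
proc print data=work.pay;
run;
```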

Some Important Formats:

$w. writes standard character data.

w.d writes standard numeric data

COMMAw.d writes numeric values with commas and decimal points

DATEw. writes date values in the form ddmmmyy or ddmmmyyyy

DATETIMEw.d writes datetime values in the form ddmmmyy hh:mm:ss.ss or ddmmmyyyy hh:mm:ss.ss

DDMMYYw. writes date values in the form ddmmyy or ddmmyyyy

TIMEw.d writes time values as hours, minutes, and seconds in the form hh:mm:ss.ss

Example:

To display the value 1234 as $1234.00 in a report, use the DOLLAR8.2 format

The WORDS22. format, which converts numeric values to their equivalent in words, writes the
numeric value 692 as six hundred ninety-two
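
The two examples above can be checked with PUT statements, which write formatted values to the SAS log. This is a minimal sketch; the variable names amount and count are made up, and the comments only restate the results given in the text.

```sas
data _null_;                 /* no output data set is needed */
   amount = 1234;
   count  = 692;
   put amount dollar8.2;     /* per the text above: $1234.00 */
   put count words22.;       /* per the text above: six hundred ninety-two */
run;
```

The stored values of amount and count are unchanged; only their written appearance differs.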

Reading Variable-Length Records (Using PAD option):


Variable-Length Records:

Some files have a variable-length record format, with an end-of-record marker after the last
field in each record

In such files, some records contain values that are shorter than in other records, or that are missing

This can cause problems when the raw data is read into a SAS data set

Example:

input Dept $ 1-11 @13 Receipts comma8.;

Here,
In the raw data, an asterisk symbolizes the end-of-record marker and is not part of the data
The INPUT statement specifies a field width of 8 columns for Receipts
In the third record, the input pointer encounters the end-of-record marker before the 8th
column of the field
The input pointer moves down to the next record in an attempt to find a value for Receipts
However, GRILL is a character value, and Receipts is a numeric variable. Thus, an
invalid data error occurs, and Receipts is set to missing

The PAD Option:

When using column input or formatted input to read fixed-field data in variable-length records,
PAD option can be used to avoid problems

The PAD option is used in the INFILE statement

It pads each record with blanks so that all data lines have the same length

Example:

infile receipts pad;

Here,
receipts is the fileref of the raw data file, not a variable. The PAD option pads each record read from that file with blanks
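
Putting the INFILE and INPUT statements from this section together gives a DATA step like the one sketched below. The file path in the FILENAME statement and the data set name work.receipts are assumptions for illustration; the original material does not say where the raw data file is stored.

```sas
/* Hypothetical path: point the fileref at the actual raw data file */
filename receipts 'receipts.dat';

data work.receipts;                      /* hypothetical data set name */
   /* PAD fills short records with blanks up to the record length */
   infile receipts pad;
   /* Column input for Dept, formatted input for Receipts */
   input Dept $ 1-11 @13 Receipts comma8.;
run;

proc print data=work.receipts;
run;
```

With PAD in effect, a short Receipts field is read from blank-padded columns of its own record, so the input pointer does not move to the next record.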
