The SQL Procedure

Overview
The SQL procedure implements Structured Query Language (SQL) for the SAS
System. SQL is a standardized, widely used language that retrieves and updates data
in tables and views based on those tables.

The SAS System's SQL procedure enables you to

• retrieve and manipulate data that are stored in tables or views.
• create tables, views, and indexes on columns in tables.
• create SAS macro variables that contain values from rows in a query's result.
• add or modify the data values in a table's columns or insert and delete rows.
You can also modify the table itself by adding, modifying, or dropping
columns.
• send DBMS-specific SQL statements to a database management system
(DBMS) and retrieve DBMS data.
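
As a minimal sketch of the macro-variable capability listed above, the following step stores two values from a query result in macro variables. It assumes the sample table SASHELP.CLASS that ships with SAS; the macro variable names n_students and avg_height are hypothetical.

```sas
/* Sketch only: SASHELP.CLASS is a sample table shipped with SAS.  */
/* n_students and avg_height are hypothetical macro variable names. */
proc sql noprint;
   select count(*), mean(height)
      into :n_students, :avg_height
      from sashelp.class;
quit;

/* Write the stored values to the SAS log */
%put Students: &n_students   Mean height: &avg_height;
```

NOPRINT is used here because the values are wanted only in the macro variables, not in the output.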

PROC SQL Input and Output summarizes the variety of source material that you can
use with PROC SQL and what the procedure can produce.

Figure: PROC SQL Input and Output

What Are PROC SQL Tables?


A PROC SQL table is synonymous with a SAS data file and has a member type of
DATA. You can use PROC SQL tables as input into DATA steps and procedures.

You create PROC SQL tables from SAS data files, from SAS data views, or from
DBMS tables using PROC SQL's Pass-Through Facility. The Pass-Through Facility is
described in Connecting to a DBMS Using the SQL Procedure Pass-Through
Facility .

In PROC SQL terminology, a row in a table is the same as an observation in a SAS
data file. A column is the same as a variable.

What Are Views?
A SAS data view defines a virtual data set that is named and stored for later use. A
view contains no data but describes or defines data that are stored elsewhere. There
are three types of SAS data views:

• PROC SQL views
• SAS/ACCESS views
• DATA step views.

You can refer to views in queries as if they were tables. The view derives its data
from the tables or views that are listed in its FROM clause. The data accessed by a
view are a subset or superset of the data in its underlying table(s) or view(s).

A PROC SQL view is a SAS data set of type VIEW created by PROC SQL. A PROC
SQL view contains no data. It is a stored query expression that reads data values from
its underlying files, which can include SAS data files, SAS/ACCESS views, DATA
step views, other PROC SQL views, or DBMS data. When executed, a PROC SQL
view's output can be a subset or superset of one or more underlying files.

SAS/ACCESS views and DATA step views are similar to PROC SQL views in that
they are both stored programs of member type VIEW. SAS/ACCESS views describe
data in DBMS tables from other software vendors. DATA step views are stored
DATA step programs.

You can update data through a PROC SQL or SAS/ACCESS view with certain
restrictions. See Updating PROC SQL and SAS/ACCESS Views .

You can use all types of views as input to DATA steps and procedures.

Note: In this chapter, the term view collectively refers to PROC SQL views, DATA
step views, and SAS/ACCESS views, unless otherwise noted.
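
The idea that a PROC SQL view stores a query rather than data can be sketched as follows. The example assumes the sample table SASHELP.CLASS; the view name WORK.TEENS is hypothetical.

```sas
proc sql;
   /* Define the view: only the query is stored, no rows are copied */
   create view work.teens as
      select name, sex, age
         from sashelp.class
         where age >= 13;

   /* Refer to the view in a query as if it were a table */
   select sex, count(*) as n_teens
      from work.teens
      group by sex;
quit;

/* A view can also be used as input to other procedures */
proc print data=work.teens;
run;
```

The view's query runs each time the view is read, so the results reflect the current contents of the underlying table.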

SQL Procedure Coding Conventions


Because PROC SQL implements Structured Query Language, it works somewhat
differently from other base SAS procedures, as described here:

• You do not need to repeat the PROC SQL statement with each SQL statement.
You need only to repeat the PROC SQL statement if you execute a DATA
step or another SAS procedure between statements.
• SQL procedure statements are divided into clauses. For example, the most
basic SELECT statement contains the SELECT and FROM clauses. Items
within clauses are separated with commas in SQL, not with blanks as in the
SAS System. For example, if you list three columns in the SELECT clause,
the columns are separated with commas.
• The SELECT statement, which is used to retrieve data, also outputs the data
automatically unless you specify the NOPRINT option in the PROC SQL
statement. This means you can display your output or send it to a list file
without specifying the PRINT procedure.
• The ORDER BY clause sorts data by columns. In addition, tables do not need
to be presorted by a variable for use with PROC SQL. Therefore, you do not
need to use the SORT procedure with your PROC SQL programs.
• A PROC SQL statement runs when you submit it; you do not have to specify a
RUN statement. If you follow a PROC SQL statement with a RUN statement,
the SAS System ignores the RUN statement and submits the statements as
usual.
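
The conventions above can be sketched in one short step. This is an illustration only and assumes the sample table SASHELP.CLASS.

```sas
proc sql;
   /* Items in a clause are separated by commas; the result prints automatically */
   select name, age, height
      from sashelp.class
      order by age;          /* no PROC SORT needed */

   /* A second statement runs without repeating PROC SQL and without RUN */
   select sex, mean(height) as avg_height
      from sashelp.class
      group by sex;
quit;   /* QUIT ends the procedure */
```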

PROC SQL <option(s)>;


ALTER TABLE table-name
<constraint-clause <,constraint-clause>...>
<ADD column-definition <,column-definition>...>
<MODIFY column-definition <,column-definition>...>
<DROP column <,column>...>;
CREATE <UNIQUE> INDEX index-name
ON table-name (column <,column>...);
CREATE TABLE table-name (column-definition <,column-definition>...);
(column-specification , ...<constraint-specification > ,...) ;
CREATE TABLE table-name LIKE table-name;
CREATE TABLE table-name AS query-expression
<ORDER BY order-by-item <,order-by-item>...>;
CREATE VIEW proc-sql-view AS query-expression
<ORDER BY order-by-item <,order-by-item>...>
<USING libname-clause <,libname-clause>...>;
DELETE
FROM table-name|proc-sql-view |sas/access-view <AS alias>
<WHERE sql-expression>;
DESCRIBE TABLE table-name <,table-name>...;
DESCRIBE TABLE CONSTRAINTS table-name <, table-name>... ;
DESCRIBE VIEW proc-sql-view <,proc-sql-view>... ;
DROP INDEX index-name <,index-name>...
FROM table-name;
DROP TABLE table-name <,table-name>...;
DROP VIEW view-name <,view-name>...;
INSERT INTO table-name|sas/access-view|proc-sql-view
<(column<,column>...) >
SET column=sql-expression
<,column=sql-expression>...
<SET column=sql-expression
<,column=sql-expression>...>;
INSERT INTO table-name|sas/access-view|proc-sql-view
<(column<,column>...)>
VALUES (value<,value>...)
<VALUES (value <,value>...)>...;
INSERT INTO table-name|sas/access-view|proc-sql-view
<(column<,column>...)> query-expression;
RESET <option(s)>;
SELECT <DISTINCT> object-item <,object-item>...
<INTO :macro-variable-specification
<, :macro-variable-specification>...>
FROM from-list
<WHERE sql-expression>
<GROUP BY group-by-item <,group-by-item>...>
<HAVING sql-expression>
<ORDER BY order-by-item <,order-by-item>...>;
UPDATE table-name|sas/access-view|proc-sql-view <AS alias>
SET column=sql-expression <,column=sql-expression>...
<SET column=sql-expression <,column=sql-expression>...>
<WHERE sql-expression>;
VALIDATE query-expression;

To connect to a DBMS and send it a DBMS-specific nonquery SQL statement, use
this form:

PROC SQL;
<CONNECT TO dbms-name <AS alias>
<(connect-statement-argument-1=value
...<connect-statement-argument-n=value>)>
<(dbms-argument-1=value
...<dbms-argument-n=value>)>;>
EXECUTE (dbms-SQL-statement)
BY dbms-name|alias;
<DISCONNECT FROM dbms-name|alias;>
<QUIT;>

To connect to a DBMS and query the DBMS data, use this form:

PROC SQL;
<CONNECT TO dbms-name <AS alias>
<(connect-statement-argument-1=value
...<connect-statement-argument-n=value>)>
<(dbms-argument-1=value
...<dbms-argument-n=value>)>;>
SELECT column-list
FROM CONNECTION TO dbms-name|alias
(dbms-query)
optional PROC SQL clauses;
<DISCONNECT FROM dbms-name|alias;>
<QUIT;>
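
The two forms above might be combined as in the following sketch. It assumes the SAS/ACCESS Interface to Oracle is licensed; the alias ora, the credentials, the path value, and the DBMS table EMPLOYEES with its columns are all hypothetical placeholders.

```sas
proc sql;
   /* Hypothetical credentials and server name */
   connect to oracle as ora (user=myuser password=mypass path='mydb');

   /* Nonquery form: the text in parentheses is sent to the DBMS as is */
   execute (create index emp_idx on employees (dept_id)) by ora;

   /* Query form: the inner query runs in the DBMS,  */
   /* the outer SELECT and WHERE run in PROC SQL      */
   select *
      from connection to ora
         (select dept_id, count(*) as n_emp
             from employees
             group by dept_id)
      where n_emp > 10;

   disconnect from ora;
quit;
```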

To do this                                                Use this statement

Modify, add, or drop columns                              ALTER TABLE
Establish a connection with a DBMS                        CONNECT
Create an index on a column                               CREATE INDEX
Create a PROC SQL table                                   CREATE TABLE
Create a PROC SQL view                                    CREATE VIEW
Delete rows                                               DELETE
Display a definition of a table or view                   DESCRIBE
Terminate the connection with a DBMS                      DISCONNECT
Delete tables, views, or indexes                          DROP
Send a DBMS-specific nonquery SQL statement to a DBMS     EXECUTE
Add rows                                                  INSERT
Reset options that affect the procedure environment       RESET
without restarting the procedure
Select and execute rows                                   SELECT
Query a DBMS                                              CONNECTION TO
Modify values                                             UPDATE
Verify the accuracy of your query                         VALIDATE

PROC SQL Statement

PROC SQL <option(s)>;

To do this                                                Use this option

Control output
  Double-space the report                                 DOUBLE|NODOUBLE
  Write a statement to the SAS log that expands           FEEDBACK|NOFEEDBACK
  the query
  Flow characters within a column                         FLOW|NOFLOW
  Include a column of row numbers                         NUMBER|NONUMBER
  Specify whether PROC SQL prints the query's result      PRINT|NOPRINT
  Specify whether PROC SQL should display sorting         SORTMSG|NOSORTMSG
  information
  Specify a collating sequence                            SORTSEQ=

Control execution
  Allow PROC SQL to use names other than SAS names        DQUOTE=
  Specify whether PROC SQL should stop executing          ERRORSTOP|NOERRORSTOP
  after an error
  Specify whether PROC SQL should execute statements      EXEC|NOEXEC
  Restrict the number of input rows                       INOBS=
  Restrict the number of output rows                      OUTOBS=
  Restrict the number of loops                            LOOPS=
  Specify whether PROC SQL prompts you when a limit       PROMPT|NOPROMPT
  is reached with the INOBS=, OUTOBS=, or LOOPS=
  options
  Specify whether PROC SQL writes timing information      STIMER|NOSTIMER
  to the SAS log
  Specify how PROC SQL handles updates when there         UNDO_POLICY=
  is an interruption

Options
DOUBLE|NODOUBLE
double-spaces the report.
Default: NODOUBLE
Featured in: Combining Two Tables
DQUOTE=ANSI|SAS
specifies whether PROC SQL treats values within double-quotes as variables
or strings. With DQUOTE=ANSI, PROC SQL treats a quoted value as a
variable. This enables you to use the following as table names, column names,
or aliases:

• reserved words such as AS, JOIN, GROUP, and so on.


• DBMS names and other names not normally permissible in
SAS.

The quoted value can contain any character.

With DQUOTE=SAS, values within quotes are treated as strings.

Default: SAS
ERRORSTOP|NOERRORSTOP
specifies whether PROC SQL stops executing if it encounters an error. In a
batch or noninteractive session, ERRORSTOP instructs PROC SQL to stop
executing the statements but to continue checking the syntax after it has
encountered an error.

NOERRORSTOP instructs PROC SQL to execute the statements and to
continue checking the syntax after an error occurs.
Default: NOERRORSTOP in an interactive SAS session; ERRORSTOP
in a batch or noninteractive session
Interaction: This option is useful only when the EXEC option is in effect.
Tip: ERRORSTOP has an effect only when SAS is running in the
batch or noninteractive execution mode.
Tip: NOERRORSTOP is useful if you want a batch job to continue
executing SQL procedure statements after an error is
encountered.
EXEC | NOEXEC
specifies whether a statement should be executed after its syntax is checked
for accuracy.
Default: EXEC
Tip: NOEXEC is useful if you want to check the syntax of your SQL
statements without executing the statements.
See also: ERRORSTOP option
FEEDBACK|NOFEEDBACK
specifies whether PROC SQL displays a statement after it expands view
references or makes certain transformations on the statement.

This option expands any use of an asterisk (for example, SELECT *) into the
list of qualified columns that it represents. Any PROC SQL view is expanded
into the underlying query, and parentheses are shown around all expressions to
further indicate their order of evaluation.

Default: NOFEEDBACK
FLOW<=n <m>>|NOFLOW
specifies that character columns longer than n are flowed to multiple lines.
PROC SQL sets the column width at n. When you specify FLOW=n m,
PROC SQL floats the width of the columns between these limits to achieve a
balanced layout. FLOW is equivalent to FLOW=12 200.
Default: NOFLOW
INOBS=n
restricts the number of rows (observations) that PROC SQL retrieves from any
single source.
Tip: This option is useful for debugging queries on large tables.
LOOPS=n
restricts PROC SQL to n iterations through its inner loop. You use the number
of iterations reported in the SQLOOPS macro variable (after each SQL
statement is executed) to discover the number of loops. Set a limit to prevent
queries from consuming excessive computer resources. For example, joining
three large tables without meeting the join-matching conditions could create a
huge internal table that would be inefficient to execute.
See also: Using Macro Variables Set by PROC SQL
NODOUBLE
See DOUBLE|NODOUBLE .
NOERRORSTOP
See ERRORSTOP|NOERRORSTOP .
NOEXEC
See EXEC|NOEXEC .
NOFEEDBACK
See FEEDBACK|NOFEEDBACK .
NOFLOW
See FLOW|NOFLOW .
NONUMBER
See NUMBER|NONUMBER .
NOPRINT
See PRINT|NOPRINT .
NOPROMPT
See PROMPT|NOPROMPT .
NOSORTMSG
See SORTMSG|NOSORTMSG .
NOSTIMER
See STIMER|NOSTIMER .
NUMBER|NONUMBER
specifies whether the SELECT statement includes a column called ROW,
which is the row (or observation) number of the data as they are retrieved.
Default: NONUMBER
Featured in: Joining Two Tables
OUTOBS=n
restricts the number of rows (observations) in the output. For example, if you
specify OUTOBS=10 and insert values into a table using a query-expression,
the SQL procedure inserts a maximum of 10 rows. Likewise, OUTOBS=10
limits the output to 10 rows.
PRINT|NOPRINT
specifies whether the output from a SELECT statement is printed.
Default: PRINT
Tip: NOPRINT is useful when you are selecting values from a table
into macro variables and do not want anything to be displayed.
PROMPT|NOPROMPT
modifies the effect of the INOBS=, OUTOBS=, and LOOPS= options. If you
specify the PROMPT option and reach the limit specified by INOBS=,
OUTOBS=, or LOOPS=, PROC SQL prompts you to stop or continue. The
prompting repeats if the same limit is reached again.
Default: NOPROMPT
SORTMSG|NOSORTMSG
Certain operations, such as ORDER BY, may sort tables internally using
PROC SORT. Specifying SORTMSG requests information from PROC SORT
about the sort and displays the information in the log.
Default: NOSORTMSG
SORTSEQ=sort-table
specifies the collating sequence to use when a query contains an ORDER BY
clause. Use this option only if you want a collating sequence other than your
system's or installation's default collating sequence.
See also: SORTSEQ= option in SAS Language Reference: Dictionary.
STIMER|NOSTIMER
specifies whether PROC SQL writes timing information to the SAS log for
each statement, rather than as a cumulative value for the entire procedure. For
this option to work, you must also specify the SAS system option STIMER.
Some operating environments require that you specify this system option
when you invoke SAS. If you use the system option alone, you receive timing
information for the entire SQL procedure, not on a statement-by-statement
basis.
Default: NOSTIMER
UNDO_POLICY=NONE|OPTIONAL|REQUIRED
specifies how PROC SQL handles updated data if errors occur while you are
updating data. You can use UNDO_POLICY= to control whether your
changes will be permanent:
NONE
keeps any updates or inserts.
OPTIONAL
reverses any updates or inserts that it can reverse reliably.
REQUIRED
undoes all inserts or updates that have been done to the point of the error. In
some cases, the UNDO operation cannot be done reliably. For example, when
a program uses a SAS/ACCESS view, it may not be able to reverse the effects
of the INSERT and UPDATE statements without reversing the effects of other
changes at the same time. In that case, PROC SQL issues an error message
and does not execute the statement. Also, when a SAS data set is accessed
through a SAS/SHARE server and is opened with the data set option
CNTLLEV=RECORD, you cannot reliably reverse your changes.

This option may enable other users to update newly inserted rows. If an error
occurs during the insert, PROC SQL can delete a record that another user
updated. In that case, the statement is not executed, and an error message is
issued.

Default: REQUIRED
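
A few of the options described above can be tried together, as in this sketch. It assumes the sample table SASHELP.CLASS and is meant only to illustrate the options.

```sas
/* NUMBER adds a ROW column; OUTOBS=5 limits the output to 5 rows */
proc sql number outobs=5;
   select name, age
      from sashelp.class
      order by age;

   /* RESET changes options without restarting the procedure.  */
   /* FEEDBACK writes the expanded form of SELECT * to the log. */
   reset nonumber feedback;
   select *
      from sashelp.class;

   /* NOEXEC: the next statement is syntax-checked but not executed */
   reset noexec;
   select name
      from sashelp.class;
quit;
```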

ALTER TABLE Statement


Adds columns to, drops columns from, and changes column attributes in an
existing table. Adds, modifies, and drops integrity constraints from an existing
table.
Restriction: You cannot use any type of view in an ALTER TABLE statement.
Restriction: You cannot use ALTER TABLE on a table that is accessed via an
engine that does not support UPDATE processing.
Featured in: Updating Data in a PROC SQL Table
ALTER TABLE table-name
<constraint-clause <,constraint-clause>...>
<ADD column-definition <,column-definition>...>
<MODIFY column-definition <,column-definition>...>
<DROP column <,column>...>;
where each constraint-clause is one of the following:
ADD <CONSTRAINT constraint-name> constraint
DROP CONSTRAINT constraint-name
DROP FOREIGN KEY constraint-name [Note: This is a DB2 extension.]
DROP PRIMARY KEY [Note: This is a DB2 extension.]

where constraint can be one of the following:

NOT NULL (column)
CHECK (WHERE-clause)
PRIMARY KEY (columns)
DISTINCT (columns)
UNIQUE (columns)
FOREIGN KEY (columns)
REFERENCES table-name
<ON DELETE referential-action> <ON UPDATE referential-action>

Arguments
column
names a column in table-name.
column-definition
See column-definition .
constraint-name
specifies the name for the constraint being specified.
referential-action
specifies the type of action to be performed on all matching foreign key
values.
RESTRICT
prevents the delete or update of a primary key value if there are matching
foreign key values. This is the default referential action.
SET NULL
sets all matching foreign key values to NULL.
table-name
refers to the name of the table containing the primary key referenced by the
foreign key.
WHERE-clause
specifies a SAS WHERE-clause.
Specifying Initial Values of New Columns
When the ALTER TABLE statement adds a column to the table, it initializes the
column's values to missing in all rows of the table. Use the UPDATE statement to add
values to the new column(s).
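
A minimal sketch of this two-step pattern follows. It assumes the sample table SASHELP.CLASS, where height is in inches and weight is in pounds; the table WORK.CLASS and the column bmi are hypothetical names.

```sas
proc sql;
   /* Work on a copy so the shipped sample table is left untouched */
   create table work.class as
      select * from sashelp.class;

   /* The new column is initialized to missing in every row */
   alter table work.class
      add bmi num format=5.1 label='Body Mass Index';

   /* UPDATE supplies the values for the new column */
   update work.class
      set bmi = 703 * weight / (height * height);
quit;
```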

Changing Column Attributes


If a column is already in the table, you can change the following column attributes
using the MODIFY clause: length, informat, format, and label. The values in a table
are either truncated or padded with blanks (if character data) as necessary to meet the
specified length attribute.

You cannot change a character column to numeric and vice versa. To change a
column's data type, drop the column and then add it (and its data) again, or use the
DATA step.

Note: You cannot change the length of a numeric column with the ALTER TABLE
statement. Use the DATA step instead.

Renaming Columns
To change a column's name, you must use the SAS data set option RENAME=. You
cannot change this attribute with the ALTER TABLE statement. RENAME= is
described in the section on SAS data set options in SAS Language Reference:
Dictionary.
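
One way to apply RENAME= from within PROC SQL is sketched below. The example assumes the sample table SASHELP.CLASS; the new table WORK.CLASS2 and the new column name student are hypothetical.

```sas
proc sql;
   /* RENAME= is applied as the input table is read, */
   /* so the new table carries the new column name    */
   create table work.class2 as
      select *
         from sashelp.class(rename=(name=student));
quit;
```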

Indexes on Altered Columns


When you alter the attributes of a column and an index has been defined for that
column, the values in the altered column continue to have the index defined for them.
If you drop a column with the ALTER TABLE statement, all the indexes (simple and
composite) in which the column participates are also dropped. See CREATE INDEX
Statement for more information on creating and using indexes.

Integrity Constraints
Use ALTER TABLE to modify integrity constraints for existing tables. Use the
CREATE TABLE statement to attach integrity constraints to new tables. For more
information on integrity constraints, see the section on SAS files in SAS Language
Reference: Concepts.
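
As an illustration of adding and dropping constraints on an existing table, consider this sketch. It assumes a copy of the sample table SASHELP.CLASS; the constraint names pk_name and chk_age are hypothetical.

```sas
proc sql;
   create table work.class as
      select * from sashelp.class;

   /* Add a primary key; existing values must be unique and nonmissing */
   alter table work.class
      add constraint pk_name primary key (name);

   /* Add a check constraint written as a WHERE-style condition */
   alter table work.class
      add constraint chk_age check (age between 0 and 120);

   /* List the constraints in the SAS log */
   describe table constraints work.class;

   /* Remove one of them again */
   alter table work.class
      drop constraint chk_age;
quit;
```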

SQL with SAS

Using PROC SQL, the SAS Structured Query Language procedure, to insert, delete,
modify, and retrieve information from SAS data tables.

SAS does not allow SQL statements to be used in the DATA step. However, SAS
provides PROC SQL which allows operations on SAS datasets with SQL. The SAS
terms dataset, observation, and variable respectively correspond to the SQL terms
table, row, and column.

The syntax of PROC SQL is:

PROC SQL <option(s)>;
ALTER alter-statement;
CREATE create-statement;
DELETE delete-statement;
DESCRIBE describe-statement;
DROP drop-statement;
INSERT insert-statement;
RESET <option <option>...>;
SELECT select-statement;
UPDATE update-statement;
VALIDATE validate-statement;

ALTER     changes the attributes of columns or adds or drops columns
CREATE    creates tables
DELETE    removes rows from a table
DESCRIBE  displays a view definition
DROP      deletes the table
INSERT    inserts a new row into the table
RESET     allows options to be changed during execution
SELECT    retrieves data and outputs results
UPDATE    modifies columns in existing rows
VALIDATE  checks a query-expression for syntactic accuracy

Note that the DELETE statement serves the same purpose as the
DELETE statement used in the DATA step: it removes rows (observations).
However, DROP in PROC SQL deletes the entire table (dataset) rather than
individual columns (variables).

Some of the options that may be useful:

INOBS=n          restricts the number of rows processed from a source
NUMBER|NONUMBER  includes a column with the row number
PRINT|NOPRINT    turns printing for SELECT statements on or off

The Select Statement

Use the SELECT statement to select or create columns (variables) from
tables (datasets):

PROC SQL;
SELECT name, team, league, no_hits, no_bb, no_atbat
FROM sasuser.baseball;

                   Team at the    League at the   Hits in  Walks in  Times at Bat
Player's Name      end of 1986    end of 1986        1986      1986       in 1986
---------------------------------------------------------------------------------
Aldrete, Mike      SanFrancisco   National             54        33           216
Allanson, Andy     Cleveland      American             66        14           293
Almon, Bill        Pittsburgh     National             43        30           196
Anderson, Dave     LosAngeles     National             53        22           216
Armas, Tony        Boston         American            112        24           425
Ashby, Alan        Houston        National             81        39           315
Backman, Wally     NewYork        National            124        36           387
Baines, Harold     Chicago        American            169        38           570
...

The equivalent base SAS statements to generate this output are:

proc print label data=sasuser.baseball noobs split='*';
   var name team league no_hits no_bb no_atbat;
run;

The SELECT statement in SQL functions somewhat like the SET and KEEP
statements in base SAS. The SELECT statement in SQL should not be confused
with the SELECT statement in the DATA step.

To select all players who play for Cleveland:

PROC SQL;
SELECT name, team, league, no_hits, no_bb, no_atbat
FROM sasuser.baseball
WHERE team eq 'Cleveland';

                   Team at the    League at the   Hits in  Walks in  Times at Bat
Player's Name      end of 1986    end of 1986        1986      1986       in 1986
---------------------------------------------------------------------------------
Allanson, Andy     Cleveland      American             66        14           293
Bando, Chris       Cleveland      American             68        22           254
Bernazard, Tony    Cleveland      American            169        53           562
Butler, Brett      Cleveland      American            163        70           587
Carter, Joe        Cleveland      American            200        32           663
Castillo, Carmen   Cleveland      American             57         9           205
Franco, Julio      Cleveland      American            183        32           599
...

Note that the WHERE clause is a continuation of the SELECT statement.
There is only one semicolon, at the end of the entire statement.

To select players who drew between 80 and 100 bases on balls:

PROC SQL;
SELECT name, team, league, no_hits, no_bb
FROM sasuser.baseball
WHERE no_bb BETWEEN 80 AND 100;

                   Team at the    League at the   Hits in  Walks in
Player's Name      end of 1986    end of 1986        1986      1986
--------------------------------------------------------------------
Brett, George      KansasCity     American            128        80
Davis, Chili       SanFrancisco   National            146        84
Doran, Bill        Houston        National            152        81
Downing, Brian     California     American            137        90
Evans, Darrell     Detroit        American            122        91
Evans, Dwight      Boston         American            137        97
...

To select Pittsburgh players who had between 80 and 100 hits:

PROC SQL;
SELECT name, team, league, no_hits, no_bb
FROM sasuser.baseball
WHERE (no_hits BETWEEN 80 AND 100) AND (team EQ 'Pittsburgh');

                   Team at the    League at the   Hits in  Walks in
Player's Name      end of 1986    end of 1986        1986      1986
--------------------------------------------------------------------
Bonds, Barry       Pittsburgh     National             92        65
Orsulak, Joe       Pittsburgh     National            100        28

Other operators used in the WHERE clause in both PROC SQL and base SAS
include:

CONTAINS - finds all rows (observations) where the string is contained
in the variable.

Example: WHERE name CONTAINS 'Mike' will find all
players with "Mike" in their name.

LIKE - a pattern-matching operator in which one or more "wild cards"
can be substituted in the search. A percent sign (%) matches any sequence
of characters, and an underscore (_) matches exactly one character.

Example: WHERE team LIKE 'P%' will find all rows where the
team name starts with the letter P.
WHERE name LIKE 'D_n' would find all rows where the
player name starts with 'D', ends with 'n', and
has one letter in between ("Dan" and "Don" would
match, but "Dean" would not).
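
The operators described above can be tried on any character column. The sketch below assumes the sample table SASHELP.CLASS rather than the baseball data; note that both comparisons are case-sensitive.

```sas
proc sql;
   /* CONTAINS: the substring may appear anywhere in the value */
   select name
      from sashelp.class
      where name contains 'ar';

   /* LIKE with %: names that start with J, followed by anything */
   select name
      from sashelp.class
      where name like 'J%';

   /* LIKE with _: four-letter names of the form J?n? */
   select name
      from sashelp.class
      where name like 'J_n_';
quit;
```
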
SQL allows basic functions to be performed on columns.

Find the total number of hits, the mean number of bases on balls, the number of
nonmissing at bats, and the number of nonmissing salaries:

PROC SQL;
SELECT SUM(no_hits) as tot_hits, MEAN(no_bb) as ave_bb,
       COUNT(no_atbat) as nm_ab, COUNT(salary) as nm_pay
FROM sasuser.baseball;

TOT_HITS    AVE_BB     NM_AB    NM_PAY
--------------------------------------
   33294  39.85714       322       263

The base SAS code that produces the same results:

proc means data=sasuser.baseball sum mean n;
   var no_bb no_atbat salary no_hits;
   output out=mtemp n=nm_bb nm_ab nm_pay nm_hits
                    mean=ave_bb ave_ab ave_pay ave_hits
                    sum=tot_bb tot_ab tot_pay tot_hits;
proc print data=mtemp;
   var tot_hits ave_bb nm_ab nm_pay;
run;

The following statistics can be computed by the SQL procedure:

AVG, MEAN       mean or average
COUNT, FREQ, N  number of nonmissing values
CSS             corrected sum of squares
CV              coefficient of variation (percent)
MAX             largest value
MIN             smallest value
NMISS           number of missing values
PRT             probability of a greater absolute value of Student's t
STD             standard deviation
STDERR          standard error of the mean
SUM             sum of values

Note that the functions used in SQL do not work the same as the functions
in base SAS. In base SAS, the above functions operate on variables in one
observation, i.e. multiple columns in one row. The functions in SQL
work on one column across multiple rows. In that sense, the functions in SQL
perform the same task as summary procedures such as PROC MEANS in base SAS.
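
The difference can be seen in a short sketch, again assuming the sample table SASHELP.CLASS. With a single argument, MEAN summarizes a column. When the same function name is given several arguments inside PROC SQL, it is treated as the ordinary SAS function and works across each row.

```sas
proc sql;
   /* One argument: summary function, computed down the column (one row out) */
   select mean(height) as avg_height
      from sashelp.class;

   /* Several arguments: SAS function, computed across each row */
   select name, mean(height, weight) as row_mean
      from sashelp.class;
quit;
```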

PROC SQL allows use of formats and labels.

PROC SQL;
SELECT SUM(no_hits) as tot_hits format=comma7.0 label='Total Hits',
       MEAN(no_bb) as ave_bb format=7.2 label='Mean Base on Balls',
       COUNT(no_atbat) as nm_ab format=comma7.0 label='Frequency At Bat',
       COUNT(salary) as nm_pay format=comma7.0 label='Non-missing salary'
FROM sasuser.baseball;
Mean
Total Base on Frequency Non-missing
Hits Balls At Bat salary
----------------------------------------
33,294 39.86 322 263
To sum by a classification variable, add the GROUP BY clause.

PROC SQL;
SELECT team, league,
       SUM(no_hits) as tot_hits format=comma7.0 label='Total Hits',
       MEAN(no_bb) as ave_bb format=7.2 label='Mean Base on Balls',
       COUNT(no_atbat) as nm_ab format=comma7.0 label='Frequency At Bat',
       COUNT(salary) as nm_pay format=comma7.0 label='Non-missing salary'
FROM sasuser.baseball
GROUP BY team, league;
League
at the Mean
Team at the end of Total Base on Frequency Non-missing
end of 1986 1986 Hits Balls At Bat salary
----------------------------------------------------------------
Atlanta National 1,055 40.36 11 8
Baltimore American 1,336 35.33 15 10
Boston American 1,378 53.40 10 8
California American 1,324 49.85 13 10
Chicago American 1,257 34.38 13 11
Chicago National 1,188 39.82 11 11
Cincinnati National 1,203 42.08 12 10
Cleveland American 1,564 35.92 12 11
...

To generate this with base SAS:

proc means data=sasuser.baseball nway noprint SUM MEAN N;
   class team league;
   var no_hits no_bb no_atbat salary;
   output out=mtemp SUM=tot_hits tot_bb tot_ab tot_pay
                    MEAN=ave_hits ave_bb ave_ab ave_pay
                    N=nm_hits nm_bb nm_ab nm_pay;
run;

data temp;
   keep team league tot_hits ave_bb nm_ab nm_pay;
   set mtemp;
run;

proc sort data=temp;
   by team league;
run;

proc print;
   /* format and label statements omitted */
run;

SQL allows the merging of the summary statistics back into each
row.

PROC SQL;
SELECT name, team, no_hits, no_bb,
       SUM(no_hits) as tot_hits format=comma7.0 label='Total Hits',
       MEAN(no_bb) as ave_bb format=7.2 label='Mean Base on Balls'
FROM sasuser.baseball
GROUP BY team, league;

                   Team at the   Hits in  Walks in    Total  Mean Base
Player's Name      end of 1986      1986      1986     Hits   on Balls
----------------------------------------------------------------------
Horner, Bob        Atlanta           141        52    1,055      40.36
Oberkfell, Ken     Atlanta           136        83    1,055      40.36
Moreno, Omar       Atlanta            84        21    1,055      40.36
Murphy, Dale       Atlanta           163        75    1,055      40.36
Hubbard, Glenn     Atlanta            94        66    1,055      40.36
Harper, Terry      Atlanta            68        29    1,055      40.36
Virgil, Ozzie      Atlanta            80        63    1,055      40.36
Simmons, Ted       Atlanta            32        12    1,055      40.36

The corresponding base SAS code:

proc means data=sasuser.baseball nway noprint SUM MEAN N;
class team league;
var no_hits no_bb no_atbat salary;
output out=mtemp SUM=tot_hits tot_bb tot_ab tot_pay
MEAN=ave_hits ave_bb ave_ab ave_pay
N=nm_hits nm_bb nm_ab nm_pay;
run;

data temp;
keep team league tot_hits ave_bb;
set mtemp;

proc sort data=temp; by team league;

data individ;
keep name team league no_hits no_bb;
set sasuser.baseball;

proc sort data=individ; by team league;

data temp2;
merge individ temp; by team league;

proc print;
/* format and label statements omitted */
run;

Variables can be created in the SELECT statement without using the
summary functions. To calculate the "On Base Percentage" for an individual,
we add the bases on balls to the number of hits and divide this
total by the number of at bats plus the bases on balls. To calculate
this number for the team, we use the SUM function in SQL to calculate
the numerator and denominator for each team. If the result of a calculation
is not named with AS, the value is still calculated and
displayed, but with no column heading.

PROC SQL;
SELECT name, team,
       (no_hits+no_bb)/(no_atbat+no_bb) AS obp
          format=4.3 label='Individual On Base Percentage',
       SUM(no_hits+no_bb)/SUM(no_atbat+no_bb) AS team_obp
          format=4.3 label='Team On Base Percentage'
FROM sasuser.baseball
GROUP BY team, league;

Individual Team On
Team at the On Base Base
Player's Name end of 1986 Percentage Percentage
--------------------------------------------------------
Moreno, Omar Atlanta .276 .325
Harper, Terry Atlanta .330 .325
Horner, Bob Atlanta .339 .325
Hubbard, Glenn Atlanta .338 .325
Thomas, Andres Atlanta .269 .325
...

The base SAS code:

data individ;
   keep name team league no_hits no_bb no_atbat obp;
   set sasuser.baseball;
   obp=(no_hits+no_bb)/(no_atbat+no_bb);
proc summary data=sasuser.baseball nway;
   class team league;
   var no_hits no_bb no_atbat;
   output out=team sum=no_hits no_bb no_atbat;
data teamobp;
   keep team league team_obp;
   set team;
   team_obp = (no_hits+no_bb)/(no_atbat+no_bb);
proc sort data=individ; by team league;
proc sort data=teamobp; by team league;
data final;
   keep name team league obp team_obp;
   merge individ teamobp; by team league;
proc print;
run;

Calculated variables can be used for selection.

To select players with an On Base Percentage greater than .380:

PROC SQL;
SELECT name, team,
       (no_hits+no_bb)/(no_atbat+no_bb) AS obp
          format=4.3 label='Individual On Base Percentage',
       SUM(no_hits+no_bb)/SUM(no_atbat+no_bb) AS team_obp
          format=4.3 label='Team On Base Percentage'
FROM sasuser.baseball
WHERE CALCULATED obp > .380
GROUP BY team, league;

Individual Team On
Team at the On Base Base
Player's Name end of 1986 Percentage Percentage
--------------------------------------------------------
Murray, Eddie Baltimore .400 .400
Boggs, Wade Boston .455 .421
Rice, Jim Boston .385 .421
Daniels, Kal Cincinnati .394 .394
Grubb, Johnny Detroit .412 .412
...

*** This table contains an error. The team On Base Percentage has now been
calculated using only the individuals with obp > .380, because the WHERE
clause removes rows before the groups are summarized. The WHERE
clause should be replaced by HAVING, which is applied after the summary.
Notice also that the keyword CALCULATED needs to precede any variable name
that was not originally in the table.

PROC SQL;
SELECT name, team,
       (no_hits+no_bb)/(no_atbat+no_bb) AS obp
          format=4.3 label='Individual On Base Percentage',
       SUM(no_hits+no_bb)/SUM(no_atbat+no_bb) AS team_obp
          format=4.3 label='Team On Base Percentage'
FROM sasuser.baseball
GROUP BY team, league
HAVING CALCULATED obp > .380;

Individual Team On
Team at the On Base Base
Player's Name end of 1986 Percentage Percentage
--------------------------------------------------------
Murray, Eddie Baltimore .400 .333
Boggs, Wade Boston .455 .346
Rice, Jim Boston .385 .346
Daniels, Kal Cincinnati .394 .336
Grubb, Johnny Detroit .412 .340
Brett, George KansasCity .399 .313
...
The SELECT statement does not create tables (datasets). It only extracts
information from tables. To create a dataset from a PROC SQL query, use
the CREATE TABLE statement. The dataset can be a temporary dataset or
a permanent dataset. The name must adhere to the SAS naming conventions.

To create a temporary dataset "example" from the results of the previous
query:

PROC SQL;
   CREATE TABLE example AS
   SELECT name, team,
          (no_hits+no_bb)/(no_atbat+no_bb) AS obp
             format=4.3 label='Individual On Base Percentage',
          SUM(no_hits+no_bb)/SUM(no_atbat+no_bb) AS team_obp
             format=4.3 label='Team On Base Percentage'
   FROM sasuser.baseball
   GROUP BY team, league
   HAVING CALCULATED obp > .380;

PROC PRINT DATA=example (obs=5);
run;

OBS   NAME            TEAM          OBP    TEAM_OBP

  1   Murray, Eddie   Baltimore     .400     .333
  2   Boggs, Wade     Boston        .455     .346
  3   Rice, Jim       Boston        .385     .346
  4   Daniels, Kal    Cincinnati    .394     .336
  5   Grubb, Johnny   Detroit       .412     .340

Please note that when the SELECT statement is used without CREATE TABLE, the results
of the query appear in the output window (interactive SAS) or the .lst file (batch SAS).
If a dataset is created with the CREATE TABLE statement, the printing of the results is
suppressed.
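
The permanent-dataset case mentioned above can be sketched as follows. This is a minimal illustration, not part of the original examples: the libref mylib, the directory path, and the table name high_obp are hypothetical placeholders. A two-level name (libref.table) stores the table in a library that persists after the session ends. Because this query has no summary function, WHERE is the appropriate clause here.

```sas
/* Hypothetical libref and path: substitute a directory that exists on your system. */
LIBNAME mylib '/path/to/project/data';

PROC SQL;
   /* The two-level name mylib.high_obp makes the table permanent. */
   CREATE TABLE mylib.high_obp AS
   SELECT name, team,
          (no_hits+no_bb)/(no_atbat+no_bb) AS obp
             format=4.3 label='Individual On Base Percentage'
   FROM sasuser.baseball
   WHERE CALCULATED obp > .380;
QUIT;
```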

Another useful feature of SQL is the CASE expression. It can be used
for classification of variables. If we wanted to classify the number
of hits into five groups, we could use a CASE expression. The ORDER BY
clause sorts the results of the query.

PROC SQL;
   SELECT name, team, no_hits,
          CASE
             WHEN no_hits <= 50 THEN 'poor'
             WHEN no_hits BETWEEN 51 AND 100 THEN 'mediocre'
             WHEN no_hits BETWEEN 101 AND 150 THEN 'good'
             WHEN no_hits BETWEEN 151 AND 200 THEN 'very good'
             ELSE 'excellent'
          END AS hit_qnty label='Quality Hitter'
   FROM sasuser.baseball
   ORDER BY team;

                 Team at the    Hits in   Quality
Player's Name    end of 1986       1986   Hitter
-----------------------------------------------------
Hubbard, Glenn   Atlanta             94   mediocre
Murphy, Dale     Atlanta            163   very good
Sample, Billy    Atlanta             57   mediocre
Moreno, Omar     Atlanta             84   mediocre
Horner, Bob      Atlanta            141   good
Ramirez, Rafael  Atlanta            119   good
Oberkfell, Ken   Atlanta            136   good
Harper, Terry    Atlanta             68   mediocre
Simmons, Ted     Atlanta             32   poor
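
To see how many players fall into each class, the same CASE expression can be placed in an in-line view and counted. The following is a sketch under stated assumptions rather than part of the original tutorial: the column name n_players is hypothetical, and the first bracket is written as no_hits <= 50 so that a player with exactly 50 hits is classed as 'poor' instead of falling through to the ELSE branch.

```sas
PROC SQL;
   /* Outer query: count the rows in each class produced by the in-line view. */
   SELECT hit_qnty label='Quality Hitter',
          COUNT(*) AS n_players label='Number of Players'
   FROM (SELECT CASE
                   WHEN no_hits <= 50 THEN 'poor'
                   WHEN no_hits BETWEEN 51 AND 100 THEN 'mediocre'
                   WHEN no_hits BETWEEN 101 AND 150 THEN 'good'
                   WHEN no_hits BETWEEN 151 AND 200 THEN 'very good'
                   ELSE 'excellent'
                END AS hit_qnty
         FROM sasuser.baseball)
   GROUP BY hit_qnty;
QUIT;
```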

Subqueries - A SELECT may be used within another SELECT statement to
extract information from a second table. For example, we will use
a subquery to find all values of v2 in the rows of data2 where id = 2.
We will then select all rows in data1 that have var4 equal to one of
these values.

PROC SQL;
   SELECT key_val, var1, var2, var3, var4
   FROM data1
   WHERE var4 IN (SELECT v2 FROM data2 WHERE id = 2);

KEY_VAL  VAR1  VAR2  VAR3  VAR4
-------------------------------
      2     4     2     3  B
      5     6     2     1  C
      1     1     1     9  C
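
The same selection can also be written as a correlated subquery with EXISTS. This is an alternative sketch, not the form used in the original example; the aliases d1 and d2 are hypothetical. For each row of data1, the subquery checks whether data2 holds a row with id = 2 and a matching v2, so it should select the same three rows as the IN version, although the order of the rows is not guaranteed.

```sas
PROC SQL;
   SELECT key_val, var1, var2, var3, var4
   FROM data1 AS d1
   /* Keep a data1 row only if a matching data2 row exists. */
   WHERE EXISTS (SELECT *
                 FROM data2 AS d2
                 WHERE d2.id = 2
                   AND d2.v2 = d1.var4);
QUIT;
```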

Tables can be merged or joined in several ways. The following code
creates every possible combination of rows from the two tables whose key
values match. The ORDER BY clause sorts the output.

PROC SQL;
   SELECT id, v1, var3, var4
   FROM data1 AS first, data2 AS second
   WHERE first.key_val = second.id
   ORDER BY id, v1, var3;

  ID    V1  VAR3  VAR4
----------------------
   1     2     4  A
   1     2     5  E
   1     2     9  C
   1     3     4  A
   1     3     5  E
   1     3     9  C
   1     5     4  A
   1     5     5  E
   1     5     9  C
   2     1     3  B
   2     1     3  B
   2     1     9  A
   2     1     9  A
   2     2     3  B
   2     2     9  A
   3     1     8  E
   3     2     8  E
   3     3     8  E
   4     1     5  A
   4     2     5  A

IMPORTANT: If SQL is being used to merge or join tables,
please be aware that if the tables are very large, PROC SQL may fail to
complete or may be slower than the DATA step. Also, only one table can be
created per SQL CREATE TABLE statement, whereas Base SAS allows multiple
datasets to be created in one DATA step.
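
The difference can be sketched as follows. This is an illustrative example, not part of the original text: the output dataset names are hypothetical, and the league values are taken from the SASUSER.BASEBALL listing in this document.

```sas
/* DATA step: one pass over the input writes two datasets. */
DATA american national;
   SET sasuser.baseball;
   IF league = 'American' THEN OUTPUT american;
   ELSE IF league = 'National' THEN OUTPUT national;
RUN;

/* PROC SQL: each table needs its own CREATE TABLE statement. */
PROC SQL;
   CREATE TABLE american_sql AS
      SELECT * FROM sasuser.baseball WHERE league = 'American';
   CREATE TABLE national_sql AS
      SELECT * FROM sasuser.baseball WHERE league = 'National';
QUIT;
```
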
JOIN keywords in the FROM clause can also be used to combine tables.

The following code produces output identical to the preceding code:

PROC SQL;
   SELECT id, v1, var3, var4
   FROM data1 AS first INNER JOIN data2 AS second
        ON first.key_val = second.id
   ORDER BY id, v1, var3;

  ID    V1  VAR3  VAR4
----------------------
   1     2     4  A
   1     2     5  E
   1     2     9  C
   1     3     4  A
   1     3     5  E
   1     3     9  C
   1     5     4  A
   1     5     5  E
   1     5     9  C
   2     1     3  B
   2     1     3  B
   2     1     9  A
   2     1     9  A
   2     2     3  B
   2     2     9  A
   3     1     8  E
   3     2     8  E
   3     3     8  E
   4     1     5  A
   4     2     5  A

The LEFT JOIN combines two tables on key values that are equal, plus
any unmatched values from the first table.

PROC SQL;
   SELECT id, key_val, v1, var1, var2, var3, var4, v2
   FROM data1 AS first LEFT JOIN data2 AS second
        ON first.key_val = second.id
   ORDER BY id, v1, var3;

  ID  KEY_VAL  V1  VAR1  VAR2  VAR3  VAR4  V2
---------------------------------------------
   .        5   .     6     2     1  C
   .        5   .     2     2     2  A
   1        1   2     2     3     4  A     A
   1        1   2     3     4     5  E     A
   1        1   2     1     1     9  C     A
   1        1   3     2     3     4  A     D
   1        1   3     3     4     5  E     D
   1        1   3     1     1     9  C     D
   1        1   5     2     3     4  A     A
   1        1   5     3     4     5  E     A
   1        1   5     1     1     9  C     A
   2        2   1     4     2     3  B     C
   2        2   1     4     2     3  B     B
   2        2   1     4     1     9  A     C
   2        2   1     4     1     9  A     B
   2        2   2     4     2     3  B     D
   2        2   2     4     1     9  A     D
   3        3   1     5     7     8  E     B
   3        3   2     5     7     8  E     C
   3        3   3     5     7     8  E     D
   4        4   1     5     5     5  A     D
   4        4   2     5     5     5  A     C

The first two rows with the missing ID correspond to key_val = 5 in the first table.

The RIGHT JOIN combines the tables by selecting the rows where the
key values match, plus any unmatched rows from the second table.

PROC SQL;
   SELECT id, key_val, v1, var1, var2, var3, var4, v2
   FROM data1 AS first RIGHT JOIN data2 AS second
        ON first.key_val = second.id
   ORDER BY id, v1, var3;

  ID  KEY_VAL  V1  VAR1  VAR2  VAR3  VAR4  V2
---------------------------------------------
   1        1   2     2     3     4  A     A
   1        1   2     3     4     5  E     A
   1        1   2     1     1     9  C     A
   1        1   3     2     3     4  A     D
   1        1   3     3     4     5  E     D
   1        1   3     1     1     9  C     D
   1        1   5     2     3     4  A     A
   1        1   5     3     4     5  E     A
   1        1   5     1     1     9  C     A
   2        2   1     4     2     3  B     C
   2        2   1     4     2     3  B     B
   2        2   1     4     1     9  A     B
   2        2   1     4     1     9  A     C
   2        2   2     4     2     3  B     D
   2        2   2     4     1     9  A     D
   3        3   1     5     7     8  E     B
   3        3   2     5     7     8  E     C
   3        3   3     5     7     8  E     D
   4        4   1     5     5     5  A     D
   4        4   2     5     5     5  A     C
   9        .   3     .     .     .        B
   9        .   3     .     .     .        A

The FULL JOIN selects the rows where the key values match, plus any
unmatched rows from both tables:

PROC SQL;
   SELECT id, key_val, v1, var1, var2, var3, var4, v2
   FROM data1 AS first FULL JOIN data2 AS second
        ON first.key_val = second.id
   ORDER BY id, v1, var3;

  ID  KEY_VAL  V1  VAR1  VAR2  VAR3  VAR4  V2
---------------------------------------------
   .        5   .     6     2     1  C
   .        5   .     2     2     2  A
   1        1   2     2     3     4  A     A
   1        1   2     3     4     5  E     A
   1        1   2     1     1     9  C     A
   1        1   3     2     3     4  A     D
   1        1   3     3     4     5  E     D
   1        1   3     1     1     9  C     D
   1        1   5     2     3     4  A     A
   1        1   5     3     4     5  E     A
   1        1   5     1     1     9  C     A
   2        2   1     4     2     3  B     C
   2        2   1     4     2     3  B     B
   2        2   1     4     1     9  A     B
   2        2   1     4     1     9  A     C
   2        2   2     4     2     3  B     D
   2        2   2     4     1     9  A     D
   3        3   1     5     7     8  E     B
   3        3   2     5     7     8  E     C
   3        3   3     5     7     8  E     D
   4        4   1     5     5     5  A     D
   4        4   2     5     5     5  A     C
   9        .   3     .     .     .        B
   9        .   3     .     .     .        A
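
In the full join output, unmatched rows carry the key in only one of the two key columns (key_val for rows from data1, id for rows from data2). A common way to obtain a single key column is the COALESCE function, which returns its first non-missing argument. The following is a sketch, with the column name join_key chosen here for illustration only:

```sas
PROC SQL;
   /* COALESCE picks id when it is present, otherwise key_val. */
   SELECT COALESCE(second.id, first.key_val) AS join_key,
          v1, var1, var2, var3, var4, v2
   FROM data1 AS first FULL JOIN data2 AS second
        ON first.key_val = second.id
   ORDER BY join_key, v1, var3;
QUIT;
```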

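The DATA step MERGE statements corresponding to the various joins are:

INNER JOIN   merge data1 (in=a) data2 (in=b); by id; if a and b;
LEFT JOIN    merge data1 (in=a) data2 (in=b); by id; if a;
RIGHT JOIN   merge data1 (in=a) data2 (in=b); by id; if b;
FULL JOIN    merge data1 (in=a) data2 (in=b); by id;

These statements assume that both datasets are sorted by a common key
variable named id (in the sample data, key_val in data1 would first have
to be renamed). They match the SQL joins only when the key values are
unique in at least one of the datasets; when both datasets repeat a key
value, MERGE pairs the rows in sequence within each BY group instead of
forming every combination.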
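
Written out as a complete program, the LEFT JOIN line above might look like the following sketch. The dataset names d1_sorted, lookup, and left_merge are hypothetical. Because both sample datasets repeat their key values, and MERGE does not form every combination of repeated keys the way SQL does, the sketch first reduces data2 to one row per id so that the merge is many-to-one. Under that assumption the result keeps every row of data1, with v1 and v2 missing for key value 5.

```sas
/* Sort data1 by its key; the merge below renames key_val to id. */
PROC SORT DATA=data1 OUT=d1_sorted;
   BY key_val;
RUN;

/* Keep only the first row of each id so the key is unique in the lookup. */
PROC SORT DATA=data2 OUT=lookup NODUPKEY;
   BY id;
RUN;

/* LEFT JOIN equivalent: keep every row that came from data1. */
DATA left_merge;
   MERGE d1_sorted (IN=a RENAME=(key_val=id)) lookup (IN=b);
   BY id;
   IF a;
RUN;
```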

Datasets used in examples

Selected variables from SASUSER.BASEBALL

NAME            TEAM          LEAGUE    NO_ATBAT  NO_HITS  NO_BB  CR_ATBAT  CR_HITS  CR_BB  SALARY

Aldrete, Mike   SanFrancisco  National       216       54     33       216       54     33      75
Allanson, Andy  Cleveland     American       293       66     14       293       66     14       .
Almon, Bill     Pittsburgh    National       196       43     30      3231      825    238     240
Anderson, Dave  LosAngeles    National       216       53     22       926      210    114     225
Armas, Tony     Boston        American       425      112     24      4513     1134    230       .
Ashby, Alan     Houston       National       315       81     39      3449      835    375     475
Backman, Wally  NewYork       National       387      124     36      1775      506    194     550
Baines, Harold  Chicago       American       570      169     38      3754     1077    263     950
Baker, Dusty    Oakland       American       242       58     27      7117     1981    762       .
Balboni, Steve  KansasCity    American       512      117     43      1750      412    155     100

Sample dataset "data1"


KEY_VAL VAR1 VAR2 VAR3 VAR4

1 2 3 4 A
1 3 4 5 E
2 4 2 3 B
3 5 7 8 E
4 5 5 5 A
5 6 2 1 C
2 4 1 9 A
5 2 2 2 A
1 1 1 9 C

Sample dataset "data2"


ID V1 V2

1 2 A
1 3 D
1 5 A
2 1 B
2 1 C
2 2 D
3 1 B
3 2 C
3 3 D
4 1 D
4 2 C
9 3 B
9 3 A
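
To run the join and subquery examples, the two sample datasets can be recreated from the listings above. This is a minimal sketch that assumes var4 and v2 are character variables (as their values suggest) and that all other variables are numeric.

```sas
/* Recreate sample dataset "data1" from the listing. */
DATA data1;
   INPUT key_val var1 var2 var3 var4 $;
   DATALINES;
1 2 3 4 A
1 3 4 5 E
2 4 2 3 B
3 5 7 8 E
4 5 5 5 A
5 6 2 1 C
2 4 1 9 A
5 2 2 2 A
1 1 1 9 C
;
RUN;

/* Recreate sample dataset "data2" from the listing. */
DATA data2;
   INPUT id v1 v2 $;
   DATALINES;
1 2 A
1 3 D
1 5 A
2 1 B
2 1 C
2 2 D
3 1 B
3 2 C
3 3 D
4 1 D
4 2 C
9 3 B
9 3 A
;
RUN;
```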
