Overview
The SQL procedure implements Structured Query Language (SQL) for the SAS
System. SQL is a standardized, widely used language that retrieves and updates data
in tables and views based on those tables.
PROC SQL Input and Output summarizes the variety of source material that you can
use with PROC SQL and what the procedure can produce.
You create PROC SQL tables from SAS data files, from SAS data views, or from
DBMS tables using PROC SQL's Pass-Through Facility. The Pass-Through Facility is
described in Connecting to a DBMS Using the SQL Procedure Pass-Through
Facility.
You can refer to views in queries as if they were tables. The view derives its data
from the tables or views that are listed in its FROM clause. The data accessed by a
view are a subset or superset of the data in its underlying table(s) or view(s).
A PROC SQL view is a SAS data set of type VIEW created by PROC SQL. A PROC
SQL view contains no data. It is a stored query expression that reads data values from
its underlying files, which can include SAS data files, SAS/ACCESS views, DATA
step views, other PROC SQL views, or DBMS data. When executed, a PROC SQL
view's output can be a subset or superset of one or more underlying files.
SAS/ACCESS views and DATA step views are similar to PROC SQL views in that
they are both stored programs of member type VIEW. SAS/ACCESS views describe
data in DBMS tables from other software vendors. DATA step views are stored
DATA step programs.
You can update data through a PROC SQL or SAS/ACCESS view with certain
restrictions. See Updating PROC SQL and SAS/ACCESS Views.
You can use all types of views as input to DATA steps and procedures.
Note: In this chapter, the term view collectively refers to PROC SQL views, DATA
step views, and SAS/ACCESS views, unless otherwise noted.
PROC SQL differs from most other SAS procedures in several ways:
• You do not need to repeat the PROC SQL statement with each SQL statement.
You need to repeat the PROC SQL statement only if you execute a DATA
step or another SAS procedure between statements.
• SQL procedure statements are divided into clauses. For example, the most
basic SELECT statement contains the SELECT and FROM clauses. Items
within clauses are separated with commas in SQL, not with blanks as in the
SAS System. For example, if you list three columns in the SELECT clause,
the columns are separated with commas.
• The SELECT statement, which is used to retrieve data, also outputs the data
automatically unless you specify the NOPRINT option in the PROC SQL
statement. This means you can display your output or send it to a list file
without specifying the PRINT procedure.
• The ORDER BY clause sorts data by columns. In addition, tables do not need
to be presorted by a variable for use with PROC SQL. Therefore, you do not
need to use the SORT procedure with your PROC SQL programs.
• A PROC SQL statement runs when you submit it; you do not have to specify a
RUN statement. If you follow a PROC SQL statement with a RUN statement,
the SAS System ignores the RUN statement and submits the statements as
usual.
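The points above can be seen together in a short program. This is a minimal sketch: the table WORK.PLAYERS, its columns, and its values are hypothetical and invented for illustration. A single PROC SQL statement covers two SQL statements. Items in each clause are separated by commas. ORDER BY replaces PROC SORT, and no RUN statement is needed.

```sas
/* Hypothetical sample data, invented for illustration */
data work.players;
   input name $ team $ no_hits;
   datalines;
Adams Boston 120
Baker Atlanta 95
Clark Boston 140
;
run;

/* One PROC SQL statement covers several SQL statements */
proc sql;
   /* Items within a clause are separated by commas, not blanks */
   create table work.boston as
      select name, no_hits
         from work.players
         where team = 'Boston';

   /* SELECT prints its result; ORDER BY sorts it, so no PROC SORT is needed */
   select name, team, no_hits
      from work.players
      order by team, no_hits desc;
quit;   /* no RUN statement; QUIT ends the procedure */
```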
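ALTER TABLE table-name
<constraint-clause <,constraint-clause>...>
<ADD column-definition <,column-definition>...>
<MODIFY column-definition <,column-definition>...>
<DROP column <,column>...>;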
CREATE <UNIQUE> INDEX index-name
ON table-name (column <,column>...);
CREATE TABLE table-name (column-definition <,column-definition>...);
CREATE TABLE table-name
(column-specification, ...<constraint-specification>, ...);
CREATE TABLE table-name LIKE table-name;
CREATE TABLE table-name AS query-expression
<ORDER BY order-by-item <,order-by-item>...>;
CREATE VIEW proc-sql-view AS query-expression
<ORDER BY order-by-item <,order-by-item>...>
<USING libname-clause <,libname-clause>...>;
DELETE
FROM table-name|proc-sql-view |sas/access-view <AS alias>
<WHERE sql-expression>;
DESCRIBE TABLE table-name <,table-name>...;
DESCRIBE TABLE CONSTRAINTS table-name <,table-name>...;
DESCRIBE VIEW proc-sql-view <,proc-sql-view>...;
DROP INDEX index-name <,index-name>...
FROM table-name;
DROP TABLE table-name <,table-name>...;
DROP VIEW view-name <,view-name>...;
INSERT INTO table-name|sas/access-view|proc-sql-view
<(column<,column>...) >
SET column=sql-expression
<,column=sql-expression>...
<SET column=sql-expression
<,column=sql-expression>...>;
INSERT INTO table-name|sas/access-view|proc-sql-view
<(column<,column>...)>
VALUES (value<,value>...)
<VALUES (value <,value>...)>...;
INSERT INTO table-name|sas/access-view|proc-sql-view
<(column<,column>...)> query-expression;
RESET <option(s)>;
SELECT <DISTINCT> object-item <,object-item>...
<INTO :macro-variable-specification
<, :macro-variable-specification>...>
FROM from-list
<WHERE sql-expression>
<GROUP BY group-by-item
<,group-by-item>...>
<HAVING sql-expression>
<ORDER BY order-by-item
<,order-by-item>...>;
UPDATE table-name|sas/access-view|proc-sql-view <AS alias>
SET column=sql-expression
<,column=sql-expression>...
<SET column=sql-expression
<,column=sql-expression>...>
<WHERE sql-expression>;
VALIDATE query-expression;
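The following minimal sketch shows how several of these statement forms fit together in one PROC SQL step. The table WORK.ROSTER, its columns, and its rows are hypothetical.

```sas
proc sql;
   /* CREATE TABLE with column definitions (hypothetical table) */
   create table work.roster
      (name char(20), team char(12), no_hits num);

   /* INSERT ... VALUES: one VALUES clause per row */
   insert into work.roster
      values ('Adams', 'Boston', 120)
      values ('Baker', 'Atlanta', 95);

   /* INSERT ... SET: one SET clause per row */
   insert into work.roster
      set name='Clark', team='Boston', no_hits=140;

   /* UPDATE changes only the rows that satisfy the WHERE clause */
   update work.roster
      set no_hits = no_hits + 1
      where team = 'Boston';

   /* DELETE removes the rows that satisfy the WHERE clause */
   delete from work.roster
      where no_hits < 100;

   /* Display what is left: Adams (121) and Clark (141) */
   select * from work.roster;
quit;
```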
To send a DBMS-specific SQL statement to a DBMS, use this form:
PROC SQL;
<CONNECT TO dbms-name <AS alias><
<(connect-statement-argument-1=value
...<connect-statement-argument-n=value>)>>
<(dbms-argument-1=value
...<dbms-argument-n=value>)>>;
EXECUTE (dbms-SQL-statement)
BY dbms-name|alias;
<DISCONNECT FROM dbms-name|alias;>
<QUIT;>
To connect to a DBMS and query the DBMS data, use this form:
PROC SQL;
<CONNECT TO dbms-name <AS alias><
<(connect-statement-argument-1=value
...<connect-statement-argument-n=value>)>>
<(dbms-argument-1=value
...<dbms-argument-n=value>)>>;
SELECT column-list
FROM CONNECTION TO dbms-name|alias
(dbms-query)
optional PROC SQL clauses;
<DISCONNECT FROM dbms-name|alias;>
<QUIT;>
Options
DOUBLE|NODOUBLE
double-spaces the report.
Default: NODOUBLE
Featured in: Combining Two Tables
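DQUOTE=ANSI|SAS
specifies whether PROC SQL treats values within double quotation marks as
variables or strings. With DQUOTE=ANSI, PROC SQL treats a quoted value as a
variable. This enables you to use the following as table names, column names,
or aliases:
• reserved words such as AS, JOIN, GROUP, and so on
• DBMS names and other names that are not normally permissible in SAS
With DQUOTE=SAS, values within double quotation marks are treated as strings.
Default: SAS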
ERRORSTOP|NOERRORSTOP
specifies whether PROC SQL stops executing if it encounters an error. In a
batch or noninteractive session, ERRORSTOP instructs PROC SQL to stop
executing the statements but to continue checking the syntax after it has
encountered an error.
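EXEC|NOEXEC
specifies whether a statement is executed after its syntax is checked for
accuracy.
Default: EXEC
FEEDBACK|NOFEEDBACK
specifies whether PROC SQL displays a statement after it expands view
references or makes certain transformations on the statement.
This option expands any use of an asterisk (for example, SELECT *) into the
list of qualified columns that it represents. Any PROC SQL view is expanded
into the underlying query, and parentheses are shown around all expressions to
further indicate their order of evaluation.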
Default: NOFEEDBACK
FLOW<=n <m>>|NOFLOW
specifies that character columns longer than n are flowed to multiple lines.
PROC SQL sets the column width at n. When you specify FLOW=n m,
PROC SQL floats the width of the columns between these limits to achieve a
balanced layout. Specifying FLOW without arguments is equivalent to
FLOW=12 200.
Default: NOFLOW
INOBS=n
restricts the number of rows (observations) that PROC SQL retrieves from any
single source.
Tip: This option is useful for debugging queries on large tables.
LOOPS=n
restricts PROC SQL to n iterations through its inner loop. To discover the
number of loops that a statement needs, check the number of iterations reported
in the SQLOOPS macro variable after the statement executes. Set a limit to prevent
queries from consuming excessive computer resources. For example, joining
three large tables without meeting the join-matching conditions could create a
huge internal table that would be inefficient to execute.
See also: Using Macro Variables Set by PROC SQL
NODOUBLE
See DOUBLE|NODOUBLE .
NOERRORSTOP
See ERRORSTOP|NOERRORSTOP .
NOEXEC
See EXEC|NOEXEC .
NOFEEDBACK
See FEEDBACK|NOFEEDBACK .
NOFLOW
See FLOW|NOFLOW .
NONUMBER
See NUMBER|NONUMBER .
NOPRINT
See PRINT|NOPRINT .
NOPROMPT
See PROMPT|NOPROMPT .
NOSORTMSG
See SORTMSG|NOSORTMSG .
NOSTIMER
See STIMER|NOSTIMER .
NUMBER|NONUMBER
specifies whether the SELECT statement includes a column called ROW,
which is the row (or observation) number of the data as they are retrieved.
Default: NONUMBER
Featured in: Joining Two Tables
OUTOBS=n
restricts the number of rows (observations) in the output. For example, if you
specify OUTOBS=10 and insert values into a table using a query-expression,
the SQL procedure inserts a maximum of 10 rows. Likewise, OUTOBS=10
limits the output to 10 rows.
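Here is a minimal sketch of INOBS= and OUTOBS= on a hypothetical WORK.PLAYERS table with invented values. Because OUTOBS= limits the rows written after ORDER BY has sorted them, it can be used to keep only the top rows. When either limit is reached, SAS writes a warning to the log.

```sas
/* Hypothetical sample data */
data work.players;
   input name $ no_hits;
   datalines;
Adams 120
Baker 95
Clark 140
Davis 150
;
run;

/* INOBS=2: read at most two rows from the source table */
proc sql inobs=2;
   create table work.first2 as
      select name, no_hits
         from work.players;
quit;

/* OUTOBS=2: write at most two rows, after ORDER BY has sorted them */
proc sql outobs=2;
   create table work.top2 as
      select name, no_hits
         from work.players
         order by no_hits desc;
quit;
```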
PRINT|NOPRINT
specifies whether the output from a SELECT statement is printed.
Default: PRINT
Tip: NOPRINT is useful when you are selecting values from a table
into macro variables and do not want anything to be displayed.
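The tip above can be sketched as follows, assuming a hypothetical WORK.PLAYERS table. The macro variable names N_PLAYERS and MAX_HITS are illustrative. NOPRINT suppresses the display, and INTO stores the query results for later use.

```sas
/* Hypothetical sample data */
data work.players;
   input name $ no_hits;
   datalines;
Adams 120
Baker 95
Clark 140
;
run;

/* NOPRINT: the SELECT displays nothing; INTO stores the results */
proc sql noprint;
   select count(*), max(no_hits)
      into :n_players, :max_hits
      from work.players;
quit;

%put NOTE: n_players=&n_players max_hits=&max_hits;
```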
PROMPT|NOPROMPT
modifies the effect of the INOBS=, OUTOBS=, and LOOPS= options. If you
specify the PROMPT option and reach the limit specified by INOBS=,
OUTOBS=, or LOOPS=, PROC SQL prompts you to stop or continue. The
prompting repeats if the same limit is reached again.
Default: NOPROMPT
SORTMSG|NOSORTMSG
Certain operations, such as ORDER BY, may sort tables internally using
PROC SORT. Specifying SORTMSG requests information from PROC SORT
about the sort and displays the information in the log.
Default: NOSORTMSG
SORTSEQ=sort-table
specifies the collating sequence to use when a query contains an ORDER BY
clause. Use this option only if you want a collating sequence other than your
system's or installation's default collating sequence.
See also: SORTSEQ= option in SAS Language Reference: Dictionary.
STIMER|NOSTIMER
specifies whether PROC SQL writes timing information to the SAS log for
each statement, rather than as a cumulative value for the entire procedure. For
this option to work, you must also specify the SAS system option STIMER.
Some operating environments require that you specify this system option
when you invoke SAS. If you use the system option alone, you receive timing
information for the entire SQL procedure, not on a statement-by-statement
basis.
Default: NOSTIMER
UNDO_POLICY=NONE|OPTIONAL|REQUIRED
specifies how PROC SQL handles updated data if errors occur while you are
updating data. You can use UNDO_POLICY= to control whether your
changes will be permanent:
NONE
keeps any updates or inserts.
OPTIONAL
reverses any updates or inserts that it can reverse reliably.
REQUIRED
undoes all inserts or updates that have been done to the point of the error. In
some cases, the UNDO operation cannot be done reliably. For example, when
a program uses a SAS/ACCESS view, it may not be able to reverse the effects
of the INSERT and UPDATE statements without reversing the effects of other
changes at the same time. In that case, PROC SQL issues an error message
and does not execute the statement. Also, when a SAS data set is accessed
through a SAS/SHARE server and is opened with the data set option
CNTLLEV=RECORD, you cannot reliably reverse your changes.
This option may enable other users to update newly inserted rows. If an error
occurs during the insert, PROC SQL can delete a record that another user
updated. In that case, the statement is not executed, and an error message is
issued.
Default: REQUIRED
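ALTER TABLE table-name
<constraint-clause <,constraint-clause>...>
<ADD column-definition <,column-definition>...>
<MODIFY column-definition <,column-definition>...>
<DROP column <,column>...>;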
where each constraint-clause is one of the following:
ADD <CONSTRAINT constraint-name> constraint
DROP CONSTRAINT constraint-name
DROP FOREIGN KEY constraint-name [Note: This is a DB2 extension.]
DROP PRIMARY KEY [Note: This is a DB2 extension.]
Arguments
column
names a column in table-name.
column-definition
See column-definition .
constraint-name
specifies the name for the constraint being specified.
referential-action
specifies the type of action to be performed on all matching foreign key
values.
RESTRICT
prevents a primary key value from being updated or deleted when there are
matching foreign key values. This is the default referential action.
SET NULL
sets all matching foreign key values to NULL.
table-name
refers to the name of the table containing the primary key referenced by the
foreign key.
WHERE-clause
specifies a SAS WHERE-clause.
Specifying Initial Values of New Columns
When the ALTER TABLE statement adds a column to the table, it initializes the
column's values to missing in all rows of the table. Use the UPDATE statement to add
values to the new column(s).
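As a minimal sketch of adding a column and then filling it, consider a hypothetical WORK.BASEBALL table with invented values. The new OBP column and its formula are illustrative assumptions. The column starts out missing in every row until UPDATE assigns values.

```sas
/* Hypothetical stand-in for a baseball table */
data work.baseball;
   input name $ no_hits no_bb no_atbat;
   datalines;
Adams 100 50 450
Baker 90 30 370
Clark 140 60 540
;
run;

proc sql;
   /* ADD: the new column is initialized to missing in every row */
   alter table work.baseball
      add obp num format=5.3 label='On-base percentage';

   /* UPDATE supplies values for the new column */
   update work.baseball
      set obp = (no_hits + no_bb) / (no_atbat + no_bb);
quit;
```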
You cannot change a character column to numeric and vice versa. To change a
column's data type, drop the column and then add it (and its data) again, or use the
DATA step.
Note: You cannot change the length of a numeric column with the ALTER TABLE
statement. Use the DATA step instead.
Renaming Columns
To change a column's name, you must use the SAS data set option RENAME=. You
cannot change this attribute with the ALTER TABLE statement. RENAME= is
described in the section on SAS data set options in SAS Language Reference:
Dictionary.
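Because RENAME= is a data set option, it can be attached to a table reference inside a PROC SQL query. The minimal sketch below uses a hypothetical table. Note that it writes a new table with the renamed column rather than renaming the column in place.

```sas
/* Hypothetical sample data */
data work.baseball;
   input name $ no_hits;
   datalines;
Adams 100
Baker 90
;
run;

proc sql;
   /* RENAME= is applied as WORK.BASEBALL is read, so the new
      table stores the column under its new name, HITS */
   create table work.baseball2 as
      select *
         from work.baseball(rename=(no_hits=hits));
quit;
```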
Integrity Constraints
Use ALTER TABLE to modify integrity constraints for existing tables. Use the
CREATE TABLE statement to attach integrity constraints to new tables. For more
information on integrity constraints, see the section on SAS files in SAS Language
Reference: Concepts.
Using PROC SQL, the SAS structured query language procedure to insert, delete,
modify and retrieve information from SAS data tables
SAS does not allow SQL statements to be used in the DATA step. However, SAS
provides PROC SQL, which allows operations on SAS datasets with SQL. The SAS
terms dataset, observation, and variable respectively correspond to the SQL terms
table, row, and column.
Note that the DELETE statement, like the DELETE statement used in the DATA
step, removes rows (observations). However, DROP TABLE in PROC SQL deletes
the entire table (dataset) rather than columns (variables); to remove columns
in PROC SQL, use ALTER TABLE with a DROP clause.
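The difference between removing rows, removing columns, and removing a whole table can be sketched as follows. WORK.SCRATCH and WORK.KEPT are hypothetical tables. The copy in WORK.KEPT exists only so that the effect of the first two statements can be inspected after the table is dropped.

```sas
/* Hypothetical sample data */
data work.scratch;
   input name $ no_hits no_bb;
   datalines;
Adams 100 50
Baker 90 30
Clark 140 60
;
run;

proc sql;
   /* DELETE removes rows (observations) that meet the WHERE condition */
   delete from work.scratch
      where no_hits < 100;

   /* ALTER TABLE ... DROP removes a column (variable) */
   alter table work.scratch
      drop no_bb;

   /* Keep a copy so the two changes above can be inspected */
   create table work.kept as
      select * from work.scratch;

   /* DROP TABLE deletes the entire table (dataset) */
   drop table work.scratch;
quit;
```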
Some options that may be useful:
INOBS=n restricts the number of rows processed from a source
NUMBER|NONUMBER includes a column with the row number
PRINT|NOPRINT turns printing for SELECT statements on or off
                       Team at the     League at the   Hits in   Walks in
Player's Name          end of 1986     end of 1986        1986       1986
-------------------------------------------------------------------------
Brett, George          KansasCity      American            128         80
Davis, Chili           SanFrancisco    National            146         84
Doran, Bill            Houston         National            152         81
Downing, Brian         California      American            137         90
Evans, Darrell         Detroit         American            122         91
Evans, Dwight          Boston          American            137         97
...
To select Pittsburgh players who had between 80 and 100 hits:
PROC SQL;
   SELECT name, team, league, no_hits, no_bb
      FROM sasuser.baseball
      WHERE (no_hits BETWEEN 80 AND 100) AND (team EQ 'Pittsburgh');
QUIT;
                       Team at the     League at the   Hits in   Walks in
Player's Name          end of 1986     end of 1986        1986       1986
-------------------------------------------------------------------------
Bonds, Barry           Pittsburgh      National             92         65
Orsulak, Joe           Pittsburgh      National            100         28
Other relations that can be used in the WHERE clause of PROC SQL and in the
WHERE statement of base SAS include:
CONTAINS - finds all rows (observations) where the string is contained in the
variable. Example: WHERE name CONTAINS 'Mike' finds all players with "Mike" in
their name.
LIKE - finds all rows where the value matches a pattern, in which % stands for
any sequence of zero or more characters and _ stands for exactly one character.
Example: WHERE team LIKE 'P%' finds all rows where the team name starts with
the letter P. WHERE name LIKE 'D_n' would find all rows where the player name
starts with 'D', ends with 'n', and has one letter in between ("Dan" and "Don"
would match, but "Dean" would not).
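The following minimal sketch runs the three patterns above against a hypothetical table of first names and teams with invented values. The output table names are illustrative only.

```sas
/* Hypothetical sample data */
data work.players;
   length first $10 team $12;
   input first $ team $;
   datalines;
Mike Pittsburgh
Mikey Boston
Don Philadelphia
Dean Houston
Dan Atlanta
;
run;

proc sql;
   /* CONTAINS: the string appears anywhere in the value (Mike, Mikey) */
   create table work.mikes as
      select * from work.players where first contains 'Mike';

   /* LIKE with %: any number of characters (Pittsburgh, Philadelphia) */
   create table work.p_teams as
      select * from work.players where team like 'P%';

   /* LIKE with _: exactly one character (Don, Dan; not Dean) */
   create table work.d_n as
      select * from work.players where first like 'D_n';
quit;
```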
SQL allows basic functions to be performed on columns.
Find the total number of hits, the mean number of base on balls, the number of non-
missing at bats, and the number of non-missing salaries.
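A PROC SQL version of this task might look like the following minimal sketch. WORK.BASEBALL is a small hypothetical stand-in for SASUSER.BASEBALL with invented values. The column names follow those used in the PROC MEANS step. COUNT(column) counts only non-missing values.

```sas
/* Hypothetical stand-in for SASUSER.BASEBALL */
data work.baseball;
   input name $ no_hits no_bb no_atbat salary;
   datalines;
Adams 120 45 500 300
Baker 95 60 450 .
Clark 140 70 520 650
;
run;

proc sql;
   /* Drop "create table ... as" to print the result instead */
   create table work.totals as
      select sum(no_hits)    as tot_hits,
             mean(no_bb)     as ave_bb,
             count(no_atbat) as nm_ab,    /* non-missing at bats */
             count(salary)   as nm_pay    /* non-missing salaries */
         from work.baseball;
quit;
```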
In base SAS, the same statistics can be computed with PROC MEANS:
proc means data=sasuser.baseball sum mean n;
   var no_bb no_atbat salary no_hits;
   output out=mtemp n=nm_bb nm_ab nm_pay nm_hits
      mean=ave_bb ave_ab ave_pay ave_hits sum=tot_bb tot_ab tot_pay tot_hits;
run;
proc print data=mtemp;
   var tot_hits ave_bb nm_ab nm_pay;
run;
Note that the functions used in SQL do not work the same as the functions
in base SAS. In base SAS (for example, in a DATA step), these functions operate
on variables within one observation, i.e., across multiple columns in one row.
The SQL functions, given a single column, work on that column across multiple
rows, so they perform the same task as summary procedures such as PROC MEANS.
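The contrast can be sketched on a tiny hypothetical table. In the DATA step, SUM(a, b, c) adds across one row. In PROC SQL, SUM(a) with a single column argument adds down all rows.

```sas
/* Hypothetical sample data */
data work.scores;
   input a b c;
   datalines;
1 2 3
4 5 6
;
run;

/* DATA step: SUM(a, b, c) works across columns within each row */
data work.rowwise;
   set work.scores;
   row_total = sum(a, b, c);   /* 6 for row 1, 15 for row 2 */
run;

/* PROC SQL: SUM(a) works down one column across all rows */
proc sql;
   create table work.colwise as
      select sum(a) as a_total,   /* 1 + 4 = 5 */
             sum(b) as b_total    /* 2 + 5 = 7 */
         from work.scores;
quit;
```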
data temp;
keep team league tot_hits ave_bb nm_ab nm_pay;
set mtemp;
run;
proc print;
/* format and label statements omitted */
run;
SQL allows the merging of the summary statistics back into each
row.
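In PROC SQL, this can be sketched as a single query that mixes detail columns with summary functions and a GROUP BY clause. PROC SQL then remerges the group statistics with each row and writes a note about remerging to the log. WORK.BASEBALL here is a hypothetical stand-in with invented values.

```sas
/* Hypothetical stand-in for SASUSER.BASEBALL */
data work.baseball;
   input name $ team $ league $ no_hits no_bb;
   datalines;
Adams Boston American 120 45
Clark Boston American 140 65
Baker Atlanta National 95 60
;
run;

proc sql;
   /* Each row keeps its own values and gains its team's totals */
   create table work.remerged as
      select name, team, league, no_hits, no_bb,
             sum(no_hits) as tot_hits,
             mean(no_bb)  as ave_bb
         from work.baseball
         group by team, league
         order by team, name;
quit;
```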
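In base SAS, merging the statistics back takes several steps, and both data
sets must be sorted by the BY variables before the merge:
data temp;
   keep team league tot_hits ave_bb;
   set mtemp;
run;
data individ;
   keep name team league no_hits no_bb;
   set sasuser.baseball;
run;
proc sort data=temp; by team league; run;
proc sort data=individ; by team league; run;
data temp2;
   merge individ temp;
   by team league;
run;
proc print;
   /* format and label statements omitted */
run;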
data individ;
   keep name team league no_hits no_bb no_atbat obp;
   set sasuser.baseball;
   obp = (no_hits + no_bb) / (no_atbat + no_bb);
run;
proc summary data=sasuser.baseball nway;
   class team league;
   var no_hits no_bb no_atbat;
   output out=team sum=no_hits no_bb no_atbat;
run;
data teamobp;
   keep team league team_obp;
   set team;
   team_obp = (no_hits + no_bb) / (no_atbat + no_bb);
run;
proc sort data=individ; by team league; run;
proc sort data=teamobp; by team league; run;
data final;
   keep name team league obp team_obp;
   merge individ teamobp;
   by team league;
run;
proc print;
run;
Calculated variables can be used for selection.
                                       Individual      Team
                    Team at the        On Base         On Base
Player's Name       end of 1986        Percentage      Percentage
-----------------------------------------------------------------
Murray, Eddie       Baltimore            .400            .333
Boggs, Wade         Boston               .455            .346
Rice, Jim           Boston               .385            .346
Daniels, Kal        Cincinnati           .394            .336
Grubb, Johnny       Detroit              .412            .340
Brett, George       KansasCity           .399            .313
...
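In PROC SQL, a column computed in the SELECT clause can be referred to in the WHERE clause with the CALCULATED keyword. The following minimal sketch uses a hypothetical WORK.BASEBALL table with invented values. The .350 cutoff is an arbitrary illustration, not the condition behind the listing above.

```sas
/* Hypothetical stand-in for SASUSER.BASEBALL */
data work.baseball;
   input name $ team $ no_hits no_bb no_atbat;
   datalines;
Adams Boston 150 50 450
Clark Boston 90 30 570
Baker Atlanta 100 20 380
;
run;

proc sql;
   create table work.high_obp as
      select name, team,
             (no_hits + no_bb) / (no_atbat + no_bb) as obp format=4.3
         from work.baseball
         /* CALCULATED refers to the OBP column computed above */
         where calculated obp > .350;
quit;
```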
The SELECT statement does not create tables (datasets). It only extracts
information from tables. To create a dataset from a PROC SQL query, use
the CREATE TABLE statement. The dataset can be a temporary dataset or
a permanent dataset. The name must adhere to the SAS naming conventions.
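Here is a minimal sketch of both kinds of tables. MYLIB is a hypothetical libref. So that the sketch runs anywhere, it is pointed at the WORK directory. In practice it would point at a real folder so that the table survives the session.

```sas
/* Hypothetical libref; point it at a real folder for permanent storage */
libname mylib "%sysfunc(pathname(work))";

/* Hypothetical sample data */
data work.baseball;
   input name $ no_hits;
   datalines;
Adams 120
Baker 95
Clark 140
;
run;

proc sql;
   /* Temporary table: a one-level name is normally stored in WORK */
   create table hitters as
      select name, no_hits
         from work.baseball
         where no_hits > 100;

   /* Permanent table: a two-level name with a libref */
   create table mylib.hitters_perm as
      select * from hitters;
quit;
```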
A CASE expression assigns a value to a new column based on conditions:
PROC SQL;
   SELECT name, team, no_hits,
          CASE
             WHEN no_hits < 50 THEN 'poor'
             WHEN no_hits BETWEEN 50 AND 100 THEN 'mediocre'
             WHEN no_hits BETWEEN 101 AND 150 THEN 'good'
             WHEN no_hits BETWEEN 151 AND 200 THEN 'very good'
             ELSE 'excellent'
          END AS hit_qnty LABEL='Quality Hitter'
      FROM sasuser.baseball
      ORDER BY team;
QUIT;
                       Team at the     Hits in   Quality
Player's Name          end of 1986        1986   Hitter
-------------------------------------------------------
Hubbard, Glenn         Atlanta              94   mediocre
Murphy, Dale           Atlanta             163   very good
Sample, Billy          Atlanta              57   mediocre
Moreno, Omar           Atlanta              84   mediocre
Horner, Bob            Atlanta             141   good
Ramirez, Rafael        Atlanta             119   good
Oberkfell, Ken         Atlanta             136   good
Harper, Terry          Atlanta              68   mediocre
Simmons, Ted           Atlanta              32   poor
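A subquery can supply the list of values for an IN condition. This query selects
the rows of data1 whose var4 value appears among the v2 values for id 2 in data2:
PROC SQL;
   SELECT key_val, var1, var2, var3, var4
      FROM data1
      WHERE var4 IN (SELECT v2 FROM data2 WHERE id = 2);
QUIT;
KEY_VAL  VAR1  VAR2  VAR3  VAR4
-------------------------------
   2       4     2     3    B
   5       6     2     1    C
   1       1     1     9    C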
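An IN subquery can often be rewritten as a join. The following minimal sketch rebuilds data1 and data2 from the table listings given with the join examples and runs both forms. The output table names are hypothetical. DISTINCT guards against repeated matches, but it would also collapse rows that are genuinely duplicated in data1, so the two forms are not interchangeable in general.

```sas
/* data1 and data2, rebuilt from the listings of the two tables */
data data1;
   input key_val var1 var2 var3 var4 $;
   datalines;
1 2 3 4 A
1 3 4 5 E
2 4 2 3 B
3 5 7 8 E
4 5 5 5 A
5 6 2 1 C
2 4 1 9 A
5 2 2 2 A
1 1 1 9 C
;
run;

data data2;
   input id v1 v2 $;
   datalines;
1 2 A
1 3 D
1 5 A
2 1 B
2 1 C
2 2 D
3 1 B
3 2 C
3 3 D
4 1 D
4 2 C
9 3 B
9 3 A
;
run;

proc sql;
   /* Subquery form */
   create table sub_rows as
      select key_val, var1, var2, var3, var4
         from data1
         where var4 in (select v2 from data2 where id = 2);

   /* Join form: match VAR4 to V2, restricted to ID = 2 */
   create table join_rows as
      select distinct d1.key_val, d1.var1, d1.var2, d1.var3, d1.var4
         from data1 as d1 inner join data2 as d2
              on d1.var4 = d2.v2
         where d2.id = 2;
quit;
```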
The INNER JOIN combines two tables on key values that are equal and keeps only
the rows whose keys match in both tables:
PROC SQL;
   SELECT id, v1, var3, var4
      FROM data1 AS first INNER JOIN data2 AS second
           ON first.key_val = second.id
      ORDER BY id, v1, var3;
QUIT;
ID V1 VAR3 VAR4
----------------------------------
1 2 4 A
1 2 5 E
1 2 9 C
1 3 4 A
1 3 5 E
1 3 9 C
1 5 4 A
1 5 5 E
1 5 9 C
2 1 3 B
2 1 3 B
2 1 9 A
2 1 9 A
2 2 3 B
2 2 9 A
3 1 8 E
3 2 8 E
3 3 8 E
4 1 5 A
4 2 5 A
The LEFT JOIN combines two tables on key values that are equal, plus
any unmatched rows from the first (left) table.
PROC SQL;
   SELECT id, key_val, v1, var1, var2, var3, var4, v2
      FROM data1 AS first LEFT JOIN data2 AS second
           ON first.key_val = second.id
      ORDER BY id, v1, var3;
QUIT;
ID  KEY_VAL  V1  VAR1  VAR2  VAR3  VAR4  V2
-------------------------------------------
 .     5      .    6     2     1    C
 .     5      .    2     2     2    A
 1     1      2    2     3     4    A     A
 1     1      2    3     4     5    E     A
 1     1      2    1     1     9    C     A
 1     1      3    2     3     4    A     D
 1     1      3    3     4     5    E     D
 1     1      3    1     1     9    C     D
 1     1      5    2     3     4    A     A
 1     1      5    3     4     5    E     A
 1     1      5    1     1     9    C     A
 2     2      1    4     2     3    B     C
 2     2      1    4     2     3    B     B
 2     2      1    4     1     9    A     C
 2     2      1    4     1     9    A     B
 2     2      2    4     2     3    B     D
 2     2      2    4     1     9    A     D
 3     3      1    5     7     8    E     B
 3     3      2    5     7     8    E     C
 3     3      3    5     7     8    E     D
 4     4      1    5     5     5    A     D
 4     4      2    5     5     5    A     C
The first two rows, with a missing ID, are the rows with key_val = 5 in the first table, which have no matching id in the second table.
The RIGHT JOIN combines the tables by selecting the rows where the
key values match, plus any unmatched rows from the second (right) table.
PROC SQL;
   SELECT id, key_val, v1, var1, var2, var3, var4, v2
      FROM data1 AS first RIGHT JOIN data2 AS second
           ON first.key_val = second.id
      ORDER BY id, v1, var3;
QUIT;
The FULL JOIN combines the tables on key values that are equal, plus the
unmatched rows from both tables:
PROC SQL;
   SELECT id, key_val, v1, var1, var2, var3, var4, v2
      FROM data1 AS first FULL JOIN data2 AS second
           ON first.key_val = second.id
      ORDER BY id, v1, var3;
QUIT;
ID  KEY_VAL  V1  VAR1  VAR2  VAR3  VAR4  V2
-------------------------------------------
 .     5      .    6     2     1    C
 .     5      .    2     2     2    A
 1     1      2    2     3     4    A     A
 1     1      2    3     4     5    E     A
 1     1      2    1     1     9    C     A
 1     1      3    2     3     4    A     D
 1     1      3    3     4     5    E     D
 1     1      3    1     1     9    C     D
 1     1      5    2     3     4    A     A
 1     1      5    3     4     5    E     A
 1     1      5    1     1     9    C     A
 2     2      1    4     2     3    B     C
 2     2      1    4     2     3    B     B
 2     2      1    4     1     9    A     B
 2     2      1    4     1     9    A     C
 2     2      2    4     2     3    B     D
 2     2      2    4     1     9    A     D
 3     3      1    5     7     8    E     B
 3     3      2    5     7     8    E     C
 3     3      3    5     7     8    E     D
 4     4      1    5     5     5    A     D
 4     4      2    5     5     5    A     C
 9     .      3    .     .     .          B
 9     .      3    .     .     .          A
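These joins correspond roughly to the following DATA step MERGE statements:
INNER JOIN   merge data1 (in=a) data2 (in=b); by id; if a and b;
LEFT JOIN    merge data1 (in=a) data2 (in=b); by id; if a;
RIGHT JOIN   merge data1 (in=a) data2 (in=b); by id; if b;
FULL JOIN    merge data1 (in=a) data2 (in=b); by id;
The MERGE versions require both data sets to be sorted by a common BY variable
(in data1 the key is named key_val, so it must first be renamed to id). They
reproduce the join results only when each BY value occurs in just one row of at
least one data set; with many-to-many matches, such as id 1 and id 2 here, MERGE
does not form every combination of rows the way a join does.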
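The many-to-many caveat can be checked directly. This minimal sketch rebuilds data1 and data2 from their listings and renames key_val to id before sorting. It then compares an inner MERGE with the SQL inner join. The intermediate table names are hypothetical. Within each BY group, MERGE writes as many rows as the larger group. The join writes every pairing, so the two row counts differ.

```sas
data data1;
   input key_val var1 var2 var3 var4 $;
   datalines;
1 2 3 4 A
1 3 4 5 E
2 4 2 3 B
3 5 7 8 E
4 5 5 5 A
5 6 2 1 C
2 4 1 9 A
5 2 2 2 A
1 1 1 9 C
;
run;
data data2;
   input id v1 v2 $;
   datalines;
1 2 A
1 3 D
1 5 A
2 1 B
2 1 C
2 2 D
3 1 B
3 2 C
3 3 D
4 1 D
4 2 C
9 3 B
9 3 A
;
run;

/* MERGE needs a common, sorted BY variable */
proc sort data=data1(rename=(key_val=id)) out=d1s; by id; run;
proc sort data=data2 out=d2s; by id; run;

data merged;   /* 3 + 3 + 3 + 2 = 11 rows */
   merge d1s(in=a) d2s(in=b);
   by id;
   if a and b;
run;

proc sql;      /* 9 + 6 + 3 + 2 = 20 rows */
   create table sql_inner as
      select * from data1 as first inner join data2 as second
         on first.key_val = second.id;
quit;
```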
Contents of data1:
KEY_VAL  VAR1  VAR2  VAR3  VAR4
   1       2     3     4    A
   1       3     4     5    E
   2       4     2     3    B
   3       5     7     8    E
   4       5     5     5    A
   5       6     2     1    C
   2       4     1     9    A
   5       2     2     2    A
   1       1     1     9    C
Contents of data2:
ID  V1  V2
 1   2   A
 1   3   D
 1   5   A
 2   1   B
 2   1   C
 2   2   D
 3   1   B
 3   2   C
 3   3   D
 4   1   D
 4   2   C
 9   3   B
 9   3   A
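A quick way to confirm what each join type keeps is to count the rows it produces from these two tables. In the following minimal sketch, the output table names are hypothetical. The inner join yields the 20 matched pairings. The left and right joins each add the 2 unmatched rows from their side (key_val 5 and id 9, respectively). The full join adds both, for 24 rows.

```sas
data data1;
   input key_val var1 var2 var3 var4 $;
   datalines;
1 2 3 4 A
1 3 4 5 E
2 4 2 3 B
3 5 7 8 E
4 5 5 5 A
5 6 2 1 C
2 4 1 9 A
5 2 2 2 A
1 1 1 9 C
;
run;
data data2;
   input id v1 v2 $;
   datalines;
1 2 A
1 3 D
1 5 A
2 1 B
2 1 C
2 2 D
3 1 B
3 2 C
3 3 D
4 1 D
4 2 C
9 3 B
9 3 A
;
run;

proc sql;
   create table j_inner as   /* matched pairs only: 20 rows */
      select * from data1 as first inner join data2 as second
         on first.key_val = second.id;
   create table j_left as    /* plus unmatched data1 rows: 22 rows */
      select * from data1 as first left join data2 as second
         on first.key_val = second.id;
   create table j_right as   /* plus unmatched data2 rows: 22 rows */
      select * from data1 as first right join data2 as second
         on first.key_val = second.id;
   create table j_full as    /* plus unmatched rows from both: 24 rows */
      select * from data1 as first full join data2 as second
         on first.key_val = second.id;
quit;
```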