Beruflich Dokumente
Kultur Dokumente
What is SQL?
SQL is an ANSI (American National Standards Institute) standard computer language for accessing and manipulating database systems. SQL
statements are used to retrieve and update data in a database. SQL works with database programs like MS Access, DB2, Informix, MS SQL Server,
Oracle, Sybase, etc.
Unfortunately, there are many different versions of the SQL language, but to be in compliance with the ANSI standard, they must support the same
major keywords in a similar manner (such as SELECT, UPDATE, DELETE, INSERT, WHERE, and others).
Note: Most of the SQL database programs also have their own proprietary extensions in addition to the SQL standard!
A database most often contains one or more tables. Each table is identified by a name (e.g. "Customers" or "Orders"). Tables contain records (rows)
with data.
The table above contains three records (one for each person) and four columns (LastName, FirstName, Address, and City).
SQL Queries
With SQL, we can query a database and have a result set returned.
LastName
Hansen
Svendson
Pettersen
Note: Some database systems require a semicolon at the end of the SQL statement. We don't use the semicolon in our tutorials.
SQL (Structured Query Language) is a syntax for executing queries. But the SQL language also includes a syntax to update, insert, and delete
records.
These query and update commands together form the Data Manipulation Language (DML) part of SQL:
The Data Definition Language (DDL) part of SQL permits database tables to be created or deleted. We can also define indexes (keys), specify links
between tables, and impose constraints between database tables.
The SELECT statement is used to select data from a table. The tabular result is stored in a result table (called the result-set).
Syntax
SELECT column_name(s)
FROM table_name
Note: SQL statements are not case sensitive. SELECT is the same as select.
To select the content of columns named "LastName" and "FirstName", from the database table called "Persons", use a SELECT statement like this:
The result
LastName FirstName
Hansen Ola
Svendson Tove
Pettersen Kari
To select all columns from the "Persons" table, use a * symbol instead of column names, like this:
Result
The result from a SQL query is stored in a result-set. Most database software systems allow navigation of the result set with programming functions,
like: Move-To-First-Record, Get-Record-Content, Move-To-Next-Record, etc.
Programming functions like these are not a part of this tutorial. To learn about accessing data with function calls, please visit our ADO tutorial.
Semicolon is the standard way to separate each SQL statement in database systems that allow more than one SQL statement to be executed in the
same call to the server.
Some SQL tutorials end each SQL statement with a semicolon. Is this necessary? We are using MS Access and SQL Server 2000 and we do not have
to put a semicolon after each SQL statement, but some database programs force you to use it.
The SELECT statement returns information from table columns. But what if we only want to select distinct elements?
With SQL, all we need to do is to add a DISTINCT keyword to the SELECT statement:
Syntax
To select ALL values from the column named "Company" we use a SELECT statement like this:
"Orders" table
Company OrderNumber
Sega 3412
W3Schools 2312
Trio 4678
W3Schools 6798
Result
Company
Sega
W3Schools
Trio
W3Schools
To select only DIFFERENT values from the column named "Company" we use a SELECT DISTINCT statement like this:
Result:
Company
Sega
W3Schools
Trio
To conditionally select data from a table, a WHERE clause can be added to the SELECT statement.
Syntax
= Equal
IN If you know the exact value you want to return for at least
one of the columns
To select only the persons living in the city "Sandnes", we add a WHERE clause to the SELECT statement:
"Persons" table
Result
Using Quotes
Note that we have used single quotes around the conditional values in the examples.
SQL uses single quotes around text values (most database systems will also accept double quotes). Numeric values should not be enclosed in
quotes.
This is correct:
SELECT * FROM Persons WHERE FirstName='Tove'
This is wrong:
SELECT * FROM Persons WHERE FirstName=Tove
This is correct:
SELECT * FROM Persons WHERE Year>1965
This is wrong:
SELECT * FROM Persons WHERE Year>'1965'
Syntax
A "%" sign can be used to define wildcards (missing letters in the pattern) both before and after the pattern.
Using LIKE
The following SQL statement will return persons with first names that start with an 'O':
The following SQL statement will return persons with first names that end with an 'a':
SELECT * FROM Persons
WHERE FirstName LIKE '%a'
The following SQL statement will return persons with first names that contain the pattern 'la':
The INSERT INTO statement is used to insert new rows into a table.
Syntax
You can also specify the columns for which you want to insert data:
Rasmussen Storgt 67
Syntax
UPDATE table_name
SET column_name = new_value
WHERE column_name = some_value
Person:
Rasmussen Storgt 67
Update one Column in a Row
We want to add a first name to the person with a last name of "Rasmussen":
Result:
We want to change the address and add the name of the city:
UPDATE Person
SET Address = 'Stien 12', City = 'Stavanger'
WHERE LastName = 'Rasmussen'
Result:
Syntax
Person:
Delete a Row
Result
It is possible to delete all rows in a table without deleting the table. This means that the table structure, attributes, and indexes will be intact:
ORDER BY
SELECT Company, OrderNumber FROM Orders
ORDER BY Company
AND & OR
IN
The BETWEEN ... AND operator selects a range of data between two values. These values can be numbers, text, or dates.
SELECT column_name FROM table_name
WHERE column_name
BETWEEN value1 AND value2
Employees:
Employee_ID Name
01 Hansen, Ola
02 Svendson, Tove
03 Svendson, Stephen
04 Pettersen, Kari
Orders:
234 Printer 01
657 Table 03
865 Chair 03
We can select data from two tables by referring to two tables, like this:
Example
Result
Name Product
Example
SELECT Employees.Name
FROM Employees, Orders
WHERE Employees.Employee_ID=Orders.Employee_ID
AND Orders.Product='Printer'
Result
Name
Hansen, Ola
Using Joins
OR we can select data from two tables with the JOIN keyword, like this:
Syntax
The INNER JOIN returns all rows from both tables where there is a match. If there are rows in Employees that do not have matches in Orders, those
rows will not be listed.
Result
Name Product
Syntax
The LEFT JOIN returns all the rows from the first table (Employees), even if there are no matches in the second table (Orders). If there are rows in
Employees that do not have matches in Orders, those rows also will be listed.
Result
Name Product
Svendson, Tove
Pettersen, Kari
Example RIGHT JOIN
Syntax
The RIGHT JOIN returns all the rows from the second table (Orders), even if there are no matches in the first table (Employees). If there had been
any rows in Orders that did not have matches in Employees, those rows also would have been listed.
Result
Name Product
Example
SELECT Employees.Name
FROM Employees
INNER JOIN Orders
ON Employees.Employee_ID=Orders.Employee_ID
WHERE Orders.Product = 'Printer'
Result
Name
Hansen, Ola
SQL Statement 1
UNION ALL
SQL Statement 2
Example
Create a Database
To create a database:
Create a Table
Example
This example demonstrates how you can create a table named "Person", with four columns. The column names will be "LastName", "FirstName",
"Address", and "Age":
CREATE TABLE Person
(
LastName varchar,
FirstName varchar,
Address varchar,
Age int
)
This example demonstrates how you can specify a maximum length for some columns:
The data type specifies what type of data the column can hold. The table below contains the most common data types in SQL:
integer(size) Hold integers only. The maximum number of digits are specified in parenthesis.
int(size)
smallint(size)
tinyint(size)
decimal(size,d) Hold numbers with fractions. The maximum number of digits are specified in "size".
numeric(size,d) The maximum number of digits to the right of the decimal is specified in "d".
char(size) Holds a fixed length string (can contain letters, numbers, and special characters).
The fixed size is specified in parenthesis.
varchar(size) Holds a variable length string (can contain letters, numbers, and special characters).
The maximum size is specified in parenthesis.
Create Index
Indices are created in an existing table to locate rows more quickly and efficiently. It is possible to create an index on one or more columns of a
table, and each index is given a name. The users cannot see the indexes, they are just used to speed up queries.
Note: Updating a table containing indexes takes more time than updating a table without, this is because the indexes also need an update. So, it is
a good idea to create indexes only on columns that are often used for a search.
A Unique Index
Creates a unique index on a table. A unique index means that two rows cannot have the same index value.
Creates a simple index on a table. When the UNIQUE keyword is omitted, duplicate values are allowed.
Example
This example creates a simple index, named "PersonIndex", on the LastName field of the Person table:
If you want to index the values in a column in descending order, you can add the reserved word DESC after the column name:
If you want to index more than one column you can list the column names within the parentheses, separated by commas:
You can delete an existing index in a table with the DROP INDEX statement.
To delete a table (the table structure, attributes, and indexes will also be deleted):
To delete a database:
Truncate a Table
What if we only want to get rid of the data inside a table, and not the table itself? Use the TRUNCATE TABLE command (deletes only the data inside
the table):
ALTER TABLE
The ALTER TABLE statement is used to add or drop columns in an existing table.
Note: Some database systems don't allow the dropping of a column in a database table (DROP COLUMN column_name).
SQL Functions
Function Syntax
Types of Functions
There are several basic types and categories of functions in SQL. The basic types of functions are:
• Aggregate Functions
• Scalar functions
Aggregate functions
Aggregate functions operate against a collection of values, but return a single value.
Note: If used among many other expressions in the item list of a SELECT statement, the SELECT must have a GROUP BY clause!!
Name Age
Hansen, Ola 34
Svendson, Tove 45
Pettersen, Kari 19
Function Description
STDEV(column)
STDEVP(column)
VAR(column)
VARP(column)
Function Description
BINARY_CHECKSUM
CHECKSUM
CHECKSUM_AGG
FIRST(column) Returns the value of the first record in a specified field (not supported in
SQLServer2K)
LAST(column) Returns the value of the last record in a specified field (not supported in
SQLServer2K)
STDEV(column)
STDEVP(column)
VAR(column)
VARP(column)
Scalar functions
Scalar functions operate against a single value, and return a single value based on the input value.
Function Description
INSTR(c,char) Returns the numeric position of a named character within a text field
GROUP BY...
GROUP BY... was added to SQL because aggregate functions (like SUM) return the aggregate of all column values every time they are called, and
without the GROUP BY function it was impossible to find the sum for each individual group of column values.
GROUP BY Example
W3Schools 5500
IBM 4500
W3Schools 7100
Company SUM(Amount)
W3Schools 17100
IBM 17100
W3Schools 17100
The above code is invalid because the column returned is not part of an aggregate. A GROUP BY clause will solve this problem:
Company SUM(Amount)
W3Schools 12600
IBM 4500
HAVING...
HAVING... was added to SQL because the WHERE keyword could not be used against aggregate functions (like SUM), and without HAVING... it
would be impossible to test for result conditions.
W3Schools 5500
IBM 4500
W3Schools 7100
This SQL:
Company SUM(Amount)
W3Schools 12600
The SELECT INTO statement is most often used to create backup copies of tables or for archiving records.
Syntax
If you only want to copy a few fields, you can do so by listing them after the SELECT statement:
What is a View?
A view contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables in the database. You can add
SQL functions, WHERE, and JOIN statements to a view and present the data as if the data were coming from a single table.
Note: The database design and structure will NOT be affected by the functions, where, or join statements in a view.
Syntax
Note: The database does not store the view data! The database engine recreates the data, using the view's SELECT statement, every time a user
queries a view.
Using Views
A view could be used from inside a query, a stored procedure, or from inside another view. By adding functions, joins, etc., to a view, it allows you
to present exactly the data you want to the user.
The sample database Northwind has some views installed by default. The view "Current Product List" lists all active products (products that are not
discontinued) from the Products table. The view is created with the following SQL:
Another view from the Northwind sample database selects every product in the Products table that has a unit price that is higher than the average
unit price:
CREATE VIEW [Products Above Average Price] AS
SELECT ProductName,UnitPrice
FROM Products
WHERE UnitPrice>(SELECT AVG(UnitPrice) FROM Products)
Advanced Concepts
Transact-SQL Variables
• @@CONNECTIONS - Returns the number of connections, or attempted connections, from the last
time the SQL server was started.
• @@CPU_BUSY - Returns the milliseconds that the CPU has benn working since the SQL Server was
last started.
• @@CURSOR_ROWS - Returns the number of rows from the last cursor opened. If the value is a
negative number, the cursor was opened asynchronously. If the number is -1 the last cursor was
opened dynamically. Because dynamic cursors can be changed at any time, it can never be determined
how many rows were returned.
• @@DATEFIRST - Gives you the number of days from the first day of the week. Monday = 1 and
Sunday = 7.
• @@DBTS - Gets the last used timestamp when a row with a timestamp value was updated or deleted.
• @@ERROR - Returns 0 if no error occured with the transaction. Otherwise it returns the error
number generated by the transaction.
• @@FETCH_STATUS - Gets the status of the last cursor FETCH. 0 = successful. -1 = failed or EOF.
-2 = missing row.
• @@IDENTITY - Returns the ID of the last inserted row. Since it doesn't work for UPDATE you
should query the inserted table like this:
In the above example the field myID is your table's primary key field.
• @@IDLE - Gets the milliseconds idle time since the SQL Server was last started.
• @@IO_BUSY - Gets the milliseconds the server has performed input and output function since last
started.
• @@LANGID - Gets the language ID. Look at the syslanguages table for a list of languages. Use SET
LANGUAGE to change the language ID.
SELECT strFirstName
FROM tblStewShack
WHERE strLastName = 'Stewart'
IF @@ROWCOUNT = 0
print 'No records in the database for that last name.'
Since @@ROWCOUNT is an integer datatype, if you are going to return more than 2 billion rows
you need to use:
• ROWCOUNT_BIG() This gets the number of rows affected and puts it in a bigint datatype.
• @@SERVERNAME - Gets the local server name.
• @@SERVICENAME - Gets the registry key for the server.
• @@SPID - Gets the server process ID.
• @@TEXTSIZE - Gets the TEXTSIZE or maximum byte length of the text or image datatype.
• @@TIMETICKS - Every computer has a differnt microsecond time tick rate. The typical operating
system has 31.25 milliseconds to a tick.
• @@TOTAL_ERRORS - Gets the number of disk read/write errors since the server was last started.
• @@TOTAL_READ - Gets the number of disk reads, but not cache reads since the server was last
started. It would be better to run the sp_monitor system stored procedure to get this information.
• @@TOTAL_WRITE - Gets the number of disk writes since the server was last started. It would be
better to run the sp_monitor system stored procedure to get this information.
• @@TRANCOUNT Gets the number of transactions in the current connection. When you run a stored
procedure with the BEGIN TRANSACTION statement, it adds 1 to @@TRANCOUNT. When you
ROLLBACK TRANSACTION it subtracts 1, except when you use ROLLBACK
TRANSACTION [savepoint_name],. COMMIT TRANSACTION or COMMIT WORK also
subtracts 1 from @@TRANCOUNT.
• @@VERSION - Gets the date, version, and processor for the server.
• DBCC SHRINKDATABASE (database name [ , target percent ] [ , { NOTRUNCATE |
TRUNCATEONLY } ]) - If you pass in the NOTRUNCATE parameter the released memory will go
to the database files. The parameter TRUNCATEONLY is the default. The released memory will go to
the operating system. The target percent is ignored when TRUNCATEONLY is used.
• DBCC CLEANTABLE ( { 'database_name' | database_id } , { 'table_name' | table_id | 'view_name' |
view_id } [ , batch_size ]) - This releases memory after you drop a text column in a table. This doesn't
work on temp or system tables.
Cursors
Cursor Locking
• Static cursors make a copy of the data in a table named tempdb. All the changes happen in this temp
table and does not modify the real data.
• OPTIMISTIC - no rows are locked. If a change occurs the last person making the change wins.
• Read Committed - SQL Server acquires a share lock while reading a row into a cursor but frees the
lock immediately after reading the row. Because shared lock requests are blocked by an exclusive
lock, a cursor is prevented from reading a row that another task has updated but not yet committed.
Read committed is the default isolation level setting for both SQL Server and ODBC.
• Read Uncommitted - SQL Server requests no locks while reading a row into a cursor and honors no
exclusive locks. Cursors can be populated with values that have already been updated but not yet
committed. The user is bypassing all of the locking transaction control mechanisms in SQL Server.
• Repeatable Read or Serializable - SQL Server requests a shared lock on each row as it is read into the
cursor as in READ COMMITTED, but if the cursor is opened within a transaction, the shared locks
are held until the end of the transaction instead of being freed after the row is read. This has the same
effect as specifying HOLDLOCK on a SELECT statement.
There are built in stored procedures that give information for cursors.
You can set cursor threshold with the sp_configure stored procedure. This specifies the number of rows in
the cursor keysets that are generated asynchronously. If you set it -1, all the keyset cursors are created
synchronously. This is good for small cursor record sets. If you set it to 0 all keyset cursors are generated
asynchronously.
SQL File System
RAID
One option is to put the database on a RAID. RAID stands for Redundant Array of Independent (or
Inexpensive) Disks. A RAID disk drive uses two or more hard drives in case one goes down or becomes
loaded with traffic.
RAID levels:
Level 0
No Fault Tolerancing; level 0 shows improved performance over the other levels, but if a drive fails
all data is lost.
Level 1
Mirrors and Duplexes; level 1 writes data to two duplicate disks at the same time. If one disk goes
down, data is drawn from the other disk.
Level 2
Error-Correcting Code; level 2 is not used very often. It requests data from the bit level instead of the
block level.
Level 3
Bit-Interleaved Parity; level 3 requests data from the byte-level. It cannot handle multiple requests at
the same time. That means it is hardly ever used.
Level 4
Dedicated Parity Drive; level 4 requests data at the block-level. If a drive fails a back-up disk is made
from the other disk. This is a big disadvantage because level 4 creates bottlenecks.
Level 5
Block Interleaved Distributed Parity; level 5 is byte level. Level 5 provides great performance and
fault tolerancing. It is the most used level of RAID. It spreads commands across many drives. These
commands can then be run independently and at the same time.
Level 6
Independent Data Disks with Double Parity; level 6 uses block-level.
Level 0+1
Mirror of Stripes; level 0+1 is an add-on RAID level. Two RAID 0 levels are added to a RAID 1
mirror.
Level 10
Stripe of Mirrors; level 10 is also an add-on RAID level. It is made up of many RAID 1 mirrors.
Level 7
Storage Computer Corporation owns the rights to this RAID level. It adds caching to Levels 3 or 4.
RAID S
EMC Corporation owns this RAID system. It is used in its Symmetrix storage systems.
Data Files
1. Primary - every database must have one and only one. Its file extension should be .mdf.
2. Secondary - it's not required to have one, but you can have more than one if you want to. Its extension
should be .ndf.
3. Logs - these files are used to recover the database after a disaster. The extension should be .ldf.
These files cannot be compressed. They grow automatically, but you can define how they will grow. If you
do not specifiy a maximum size, the file will continue to grow until it has filled up the drive.
Files can be grouped into filegroups. No file can be a member of two filegroups. There are two types of
filegroups, primary and user-defined.
USE master
GO
CREATE DATABASE stewshack
ON PRIMARY
( NAME='stewshack_primary',
FILENAME=
'c:\Program Files\Microsoft SQL Server\MSSQL\data\stewshack.mdf',
SIZE=4,
MAXSIZE=10,
FILEGROWTH=1),
FILEGROUP stewshack_filegroup
( NAME = 'stewshack_filegroup_data',
FILENAME =
'c:\Program Files\Microsoft SQL Server\MSSQL\data\stewshack.ndf',
SIZE = 1MB,
MAXSIZE=10,
FILEGROWTH=1),
( NAME = 'stewshack_userinfo',
FILENAME =
'c:\Program Files\Microsoft SQL Server\MSSQL\data\stewshack_userinfo.ndf',
SIZE = 1MB,
MAXSIZE=10,
FILEGROWTH=1)
LOG ON
( NAME='stewshack_log',
FILENAME =
'c:\Program Files\Microsoft SQL Server\MSSQL\data\stewshack.ldf',
SIZE=1,
MAXSIZE=10,
FILEGROWTH=1)
GO
ALTER DATABASE stewshack
MODIFY FILEGROUP stewshack_filegroup DEFAULT
GO
If your database changes size as the day goes on. You might want to turn autoshrink to true. When you make
a back-up of the transaction log, it gets automatically shrunk. If you do not specify a database size it is
defaulted to 1 MB. If you do not specify a growth rate, it defaults to 10%.
Normalization
1. Eliminate Repeating Groups
Every field in the table must contain different information.
We've all seen the Address 1 and Address 2 fields on web sites. That does not mean there needs to be an
address 1 and address 2 field in the database. Having them would break the first rul of normalization.
Bad Example:
In the following example, the name and date of the last course the user has taken is mentioned in the
employee table. This course name does not correspond to the employee id. The course date does not describe
the employee. They should both have their own table with a unique ID. This would tell us that a junction
table should exist between employees and courses.
Bad Example:
In the following example I have two tables, tblEmployee and tblPaycheck. Now for some reason pay checks
can sometimes be sent to a different address. So there exists a field in tblPayCheck named address. This field
is often the same as the employee's address. So a duplication of data is occuring. This can cause confusion if
the employee's address is only updated in tblEmployee. They won't get their pay check! A separate table
named tblPayCheckAddress should exist with a foreign key going to tblEmployee. If a record exists, than a
different address needs to be used to send the pay check.
tblEmployee tblPaycheck
emp_id Name address paycheck_id emp_id address
1 Dan Stewart 1490 W 121 Suite 201 1 1 PO BOX 12713
2 Abhay Mehta 1490 W 121 Suite 201 2 2 1490 W 121 Suite 201
An example of a many-to-many relationship is one unit can contain many documents and one document can
be in many units.
A junction table needs to exist between units and documents.
Stored Procedures
SQL Server allows you to create stored procedures. At first I didn't like them. Then I realized that they make
your web page much more object oriented if you use them. You can pass in variables by using this SQL
statement.
EXECUTE getName
@lname = 'Stewart'
@fname = 'Dan'
If you run EXECUTE getName WITH RECOMPILE the stored procedure will not use a cached copy. This is
good if the data changes often and running a cached copy will return old results.
Triggers
CREATE TRIGGER tblUser_onInsertUpdateDelete
ON tblUser
WITH ENCRYPTION
FOR INSERT,UPDATE, DELETE
AS
INSERT INTO tblLog (numOfChanges) VALUES (@@ROWCOUNT)
GO
CREATE TRIGGER tblUser_AfterInsertUpdateDelete
ON tblUser
WITH ENCRYPTION
AFTER INSERT,UPDATE,DELETE
AS
INSERT INTO tblLog (numOfChanges) VALUES (@@ROWCOUNT)
GO
The WITH ENCRYPTION is not neccessary, but it does protect your code from replication. The AFTER
trigger cannot be placed on views. The INSTEAD OF trigger can be placed on updateable views if the WITH
CHECK OPTION is not on the view.