
CHAPTER 42

What's New for Transact-SQL in SQL Server 2008
IN THIS CHAPTER

. MERGE Statement
. Insert over DML
. GROUP BY Clause Enhancements
. Variable Assignment in DECLARE Statement
. Compound Assignment Operators
. Row Constructors
. New date and time Data Types and Functions
. Table-Valued Parameters
. Hierarchyid Data Type
. Using FILESTREAM Storage
. Sparse Columns
. Spatial Data Types
. Change Data Capture
. Change Tracking
Although SQL Server 2008 introduces some new features and changes to the Transact-SQL (T-SQL) language that provide additional capabilities, the number of new features is not as significant as what was introduced in SQL Server 2005. T-SQL does offer the following new features:
. MERGE statement
. Insert over DML
. GROUP BY clause enhancements
. Variable assignment in DECLARE statement
. Compound assignment operators
. Row Constructors
. date and time data types
. Table-valued parameters
. Hierarchyid data type
. FILESTREAM Storage
. Sparse Columns
. Spatial Data Types
. Change Data Capture
. Change Tracking
NOTE
If you are making the leap from SQL Server 2000 (or earlier) to SQL Server 2008 or SQL Server 2008 R2, you may not be familiar with a number of T-SQL enhancements introduced in SQL Server 2005. Some of these enhancements are used in the examples in this chapter. If you are looking for an introduction to the new T-SQL features introduced in SQL Server 2005, check out the "In Case You Missed It..." section in Chapter 43, "Transact-SQL Programming Guidelines, Tips, and Tricks," which is provided on the CD included with this book.
NOTE
Unless stated otherwise, all examples in this chapter use tables in the bigpubs2008
database.
MERGE Statement
In versions of SQL Server prior to SQL Server 2008, if you had a set of data rows in a source table that you wanted to synchronize with a target table, you had to perform at least three operations: one scan of the source table to find matching rows to update in the target table, another scan of the source table to find nonmatching rows to insert into the target table, and a third scan to find rows in the target table not contained in the source table that needed to be deleted. SQL Server 2008, however, introduces the MERGE statement. With the MERGE statement, you can synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table, all in just a single statement, minimizing the number of times that rows in the source and target tables need to be processed. The MERGE statement can also be used for performing conditional inserts or updates of rows in a target table from a source table.
The MERGE syntax consists of the following primary clauses:
. The MERGE clause specifies the table or view that is the target of the insert, update, or
delete operations.
. The USING clause specifies the data source being joined with the target.
. The ON clause specifies the join conditions that determine how the target and
source match.
. The WHEN MATCHED clause specifies either the update or delete operation to perform
when rows of target table match rows in the source table and any additional search
conditions.
. WHEN NOT MATCHED BY TARGET specifies the insert operation when a row in the
source table does not have a match in the target table.
. WHEN NOT MATCHED BY SOURCE specifies the update or delete operation to perform
when rows of the target table do not have matches in the source table.
. The OUTPUT clause returns a row for each row in the target that is inserted, updated,
or deleted.
The basic syntax of the MERGE statement is as follows:
[ WITH common_table_expression [,...n] ]
MERGE
[ TOP ( N ) [ PERCENT ] ]
[ INTO ] target_table [ [ AS ] table_alias ]
USING table_or_view_name [ [ AS ] table_alias ]
ON merge_search_condition
[ WHEN MATCHED [ AND search_condition ]
THEN { UPDATE SET set_clause | DELETE } ] [ ...n ]
[ WHEN NOT MATCHED [ BY TARGET ] [ AND search_condition ]
THEN { INSERT [ ( column_list ) ] { VALUES ( values_list ) | DEFAULT VALUES } } ]
[ WHEN NOT MATCHED BY SOURCE [ AND search_condition ]
THEN { UPDATE SET set_clause | DELETE } ] [ ...n ]
[ OUTPUT column_name | scalar_expression
INTO { @table_variable | output_table } [ (column_list) ] ]
[ OUTPUT column_name | scalar_expression [ [AS] column_alias_identifier ] [
,...n ] ] ;
The WHEN clauses specify the actions to take on the rows identified by the conditions specified in the ON clause. The conditions specified in the ON clause determine the full result set that will be operated on. Additional filtering to restrict the affected rows can be specified in the WHEN clauses. Multiple WHEN clauses with different search conditions can be specified. However, if there is a WHEN MATCHED clause that includes a search condition, it must be specified before all other WHEN MATCHED clauses.
Note that the MERGE command must be terminated with a semicolon (;). Otherwise, you
receive a syntax error.
When you run a MERGE statement, rows in the source are matched with rows in the target based on the join predicate that you specify in the ON clause. The rows are processed in a single pass, and one insert, update, or delete operation is performed per input row depending on the WHEN clauses specified. The WHEN clauses determine which of the following matches exist in the result set:
. A matched pair consisting of one row from the target and one from the source as a result of the matching condition in the WHEN MATCHED clause
. A row from the source that has no matching row in the target as a result of the condition specified in the WHEN NOT MATCHED BY TARGET clause
. A row from the target that has no corresponding row in the source as a result of the condition specified in the WHEN NOT MATCHED BY SOURCE clause
TABLE 42.1 Join Methods Used for WHEN Clauses

. WHEN MATCHED clause only: INNER JOIN
. WHEN NOT MATCHED BY TARGET clause, but not the WHEN NOT MATCHED BY SOURCE clause: LEFT OUTER JOIN from source to target
. WHEN MATCHED clause and the WHEN NOT MATCHED BY SOURCE clause, but not the WHEN NOT MATCHED BY TARGET clause: RIGHT OUTER JOIN from source to target
. WHEN NOT MATCHED BY TARGET clause and the WHEN NOT MATCHED BY SOURCE clause: FULL OUTER JOIN
. WHEN NOT MATCHED BY SOURCE clause only: ANTI SEMI JOIN
The combination of WHEN clauses specified in the MERGE statement determines the join
method that SQL Server will use to process the query (see Table 42.1).
To improve the performance of the MERGE statement, you should make sure you have appropriate indexes to support the join columns between the source table and target table. Any additional columns in the source table index that help to cover the query may improve performance even more (for information on index covering, see Chapter 34, "Data Structures, Indexes, and Performance"). The indexes should ensure that the join keys are unique and, if possible, sort the data in the tables in the order it will be processed so additional sort operations are not necessary. Unique indexes supporting the join conditions for the MERGE statement will improve query performance because the query optimizer does not need to perform extra validation processing to locate and update duplicate rows.
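For example, a minimal sketch of such a supporting index on the source table used later in this chapter might look like the following (the index name and the choice to INCLUDE qty are illustrative only; Listing 42.1 instead creates primary key constraints on these columns):

-- unique index on the join keys, with qty included to help cover
-- the quantity comparison in the WHEN MATCHED clause
create unique index ix_inventory_load_join
on inventory_load (stor_id, title_id)
include (qty)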
To better understand how the MERGE statement works, let's look at an example. First, you need to set up some data in a source table. In the bigpubs2008 database, there is a table called stores. For this example, let's assume you want to set up a new table that keeps track of each store's inventory to support an application that can monitor each store's inventory and send notifications when certain items run low, as well as to support the ability of each store to search other stores' inventories to locate rare and out-of-print books that other stores may have available. On a daily basis, each store uploads a full refresh of its current inventory to a staging table (inventory_load), which is the source table for the MERGE. You then use the inventory_load table to modify the store's inventory in the store_inventory table (which is the target table for the MERGE operation).
First, let's create the new store_inventory table (see Listing 42.1). Just for the sake of the example, you can create and populate it with the existing data from the sales table for stor_id 'A011' and create a primary key constraint on the stor_id and title_id columns. The next step is to load the inventory_load table. Normally, in a real-world scenario, this table would likely be populated via a BULK INSERT statement or SQL Server Integration Services. However, for the sake of this example, you simply are going to create some test data by creating and populating the inventory_load table using SELECT INTO with data merged from the sales data for both stor_id 'A011' and 'A017'.
When the inventory_load table is created and populated, you can create a primary key
on the stor_id and title_id columns as well to support the join with the
store_inventory table.
The next step is to build out the MERGE statement. Following are the rules to be applied:
. If there is a matching row between the source and target tables and the qty value is
different, update the qty value in the target table to the value in the source table.
. If a row in the source table doesn't have a match in the target table, this is a new inventory item, so insert the new row into the target table.
. If a row in the target table doesn't have a matching row in the source table, that inventory item no longer exists, so delete it from the target table.
Also, for the sake of the example, so that you can see just what the MERGE statement ends up doing, the OUTPUT clause has been added with the $action column included. The $action column displays which operation (INSERT, UPDATE, or DELETE) was performed on each row, and the OUTPUT clause displays the title_id and qty values from both the source and target tables for each row processed (note that if the title_id and qty columns are NULL, that was a nonmatching row).
LISTING 42.1 A MERGE Example
use bigpubs2008
go
if OBJECT_ID('store_inventory') is not null
drop table store_inventory
go
-- Create and populate the store_inventory table
select stor_id, title_id, qty = SUM(qty), update_dt = GETDATE()
into store_inventory
from sales s
where stor_id = 'A011'
group by stor_id, title_id
go
-- add primary key on store_inventory to support the join to source table
alter table store_inventory add constraint PK_store_inventory primary key
(stor_id, title_id)
Go
if OBJECT_ID('inventory_load') is not null
drop table inventory_load
go
-- Now, create and populate the inventory_load table
select stor_id = 'A011',
title_id,
qty = SUM(qty)
into inventory_load
from sales s
where stor_id like 'A01[17]'
and title_id not like '%8'
group by title_id
go
-- add primary key on inventory_load to support the join to target table
alter table inventory_load add constraint PK_inventory_load primary key
(stor_id, title_id)
go
select * from store_inventory
go
-- perform the merge, updating any matching rows with different quantities
-- adding any rows in source not in the target, and deleting any rows from the
-- target that are not in the source.
-- Output clause is specified to display the results of the MERGE
MERGE
INTO store_inventory as s
USING inventory_load as i
ON s.stor_id = i.stor_id
and s.title_id = i.title_id
WHEN MATCHED and s.qty <> i.qty
THEN UPDATE
SET s.qty = i.qty,
update_dt = getdate()
WHEN NOT MATCHED
THEN INSERT (stor_id, title_id, qty, update_dt)
VALUES (i.stor_id, i.title_id, i.qty, getdate())
WHEN NOT MATCHED BY SOURCE
THEN DELETE
OUTPUT $action,
isnull(inserted.title_id, '') as src_titleid,
isnull(str(inserted.qty, 5), '') as src_qty,
isnull(deleted.title_id, '') as tgt_titleid,
isnull(str(deleted.qty, 5), '') as tgt_qty
;
go
select * from store_inventory
go
If you run the script in Listing 42.1, you should see output like the following.
stor_id title_id qty update_dt
------- -------- ----------- -----------------------
A011 CH0741 1452 2010-03-25 00:34:25.597
A011 CH3348 24 2010-03-25 00:34:25.597
A011 FI0324 1392 2010-03-25 00:34:25.597
A011 FI0392 1176 2010-03-25 00:34:25.597
A011 FI1552 1476 2010-03-25 00:34:25.597
A011 FI1872 540 2010-03-25 00:34:25.597
A011 FI3484 1428 2010-03-25 00:34:25.597
A011 FI3660 984 2010-03-25 00:34:25.597
A011 FI4020 1704 2010-03-25 00:34:25.597
A011 FI4970 1140 2010-03-25 00:34:25.597
A011 FI4992 180 2010-03-25 00:34:25.597
A011 FI5832 1632 2010-03-25 00:34:25.597
A011 NF8918 1140 2010-03-25 00:34:25.597
A011 PC9999 1272 2010-03-25 00:34:25.597
A011 TC7777 1692 2010-03-25 00:34:25.597
(15 row(s) affected)
$action    src_titleid src_qty tgt_titleid tgt_qty
---------- ----------- ------- ----------- -------
INSERT BU2075 1536
DELETE CH3348 24
INSERT CH5390 888
INSERT CH7553 540
INSERT FI1950 1308
INSERT FI2100 1104
INSERT FI3822 996
UPDATE FI4970 1632 FI4970 1140
INSERT FI7040 1596
INSERT LC8400 732
DELETE NF8918 1140
(11 row(s) affected)
stor_id title_id qty update_dt
------- -------- ----------- -----------------------
A011 BU2075 1536 2010-03-25 00:54:54.547
A011 CH0741 1452 2010-03-25 00:34:25.597
A011 CH5390 888 2010-03-25 00:54:54.547
A011 CH7553 540 2010-03-25 00:54:54.547
A011 FI0324 1392 2010-03-25 00:34:25.597
A011 FI0392 1176 2010-03-25 00:34:25.597
A011 FI1552 1476 2010-03-25 00:34:25.597
A011 FI1872 540 2010-03-25 00:34:25.597
A011 FI1950 1308 2010-03-25 00:54:54.547
A011 FI2100 1104 2010-03-25 00:54:54.547
A011 FI3484 1428 2010-03-25 00:34:25.597
A011 FI3660 984 2010-03-25 00:34:25.597
A011 FI3822 996 2010-03-25 00:54:54.547
A011 FI4020 1704 2010-03-25 00:34:25.597
A011 FI4970 1632 2010-03-25 00:54:54.547
A011 FI4992 180 2010-03-25 00:34:25.597
A011 FI5832 1632 2010-03-25 00:34:25.597
A011 FI7040 1596 2010-03-25 00:54:54.547
A011 LC8400 732 2010-03-25 00:54:54.547
A011 PC9999 1272 2010-03-25 00:34:25.597
A011 TC7777 1692 2010-03-25 00:34:25.597
(21 row(s) affected)
If you examine the results and compare the before and after contents of the store_inventory table, you see that eight new rows were inserted into store_inventory, two rows were deleted, and one row was updated.
MERGE Statement Best Practices and Guidelines
The MERGE statement is a great addition to the T-SQL language. It provides a concise and efficient mechanism to perform multiple operations on a table based on contents in a source table without having to resort to using a cursor or running multiple set-oriented operations against the table. However, there are some guidelines and best practices you should keep in mind to help ensure you get the best performance from your MERGE statements.
First, you should try to reduce the number of rows accessed by the MERGE statement early in the process by specifying any additional search condition in the ON clause that filters out rows that do not need to be processed. You should avoid using the conditions in the WHEN clauses as row filters. However, you need to be careful if you are using any of the WHEN NOT MATCHED clauses because the elimination of rows via the ON clause may cause unexpected and incorrect results. Because the additional search conditions specified in the ON clause are not used for matching the source and target data, they can be misapplied.
To ensure correct results are obtained, you should specify only search conditions in the ON
clause that determine the criteria for matching data in the source and target tables. That is,
specify only columns from the target table that are compared to the corresponding columns
of the source table. Do not include comparisons to other values such as a constant.
To filter out rows from the source or target tables, you should consider using one of the following methods:
. Specify the search condition for row filtering in the appropriate WHEN clause, for example, WHEN NOT MATCHED AND qty > 0 THEN INSERT... (see the sketch following this list).
. Define a view on the source or target that returns the filtered rows and reference the view as the source or target table. If the view is used as the target, make sure the view is updateable (for more information about updating data by using a view, see Chapter 27, "Creating and Managing Views").
. Use the WITH <common table expression> clause to filter out rows from the source or target tables. However, if you are not careful, this method is similar to specifying additional search criteria in the ON clause and may produce incorrect results. You should test this approach thoroughly before implementing it (for information on using common table expressions, see Chapter 43, "Transact-SQL Programming Guidelines, Tips, and Tricks").
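For example, a minimal sketch of the first method, applied to the inventory tables from Listing 42.1 (the qty > 0 filters are illustrative only):

-- filter rows in the WHEN clauses, not the ON clause, so the
-- NOT MATCHED logic still evaluates the full source and target sets
MERGE
INTO store_inventory as s
USING inventory_load as i
ON s.stor_id = i.stor_id
and s.title_id = i.title_id
WHEN MATCHED and i.qty > 0 and s.qty <> i.qty
THEN UPDATE
SET s.qty = i.qty,
update_dt = getdate()
WHEN NOT MATCHED and i.qty > 0
THEN INSERT (stor_id, title_id, qty, update_dt)
VALUES (i.stor_id, i.title_id, i.qty, getdate())
;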
Insert over DML
Another T-SQL enhancement in SQL Server 2008 applies to the use of the OUTPUT clause.
The OUTPUT clause allows you to return data from a modification statement (INSERT,
UPDATE, MERGE, or DELETE) as a result set or into a table variable or an output table. In SQL
Server 2008, you can include one of these Data Manipulation Language (DML) statements
with an OUTPUT clause within the context of an INSERT...SELECT statement.
In the MERGE statement in Listing 42.1, the OUTPUT clause was used to display the rows affected by the statement. Suppose that you want the output of this to be put into a separate audit or processing table. In SQL Server 2008, you can do so by allowing the MERGE statement with the OUTPUT clause to be incorporated as a derived table in the SELECT clause of an INSERT statement.
To demonstrate this approach, you first need to create a table for storing that data:
if OBJECT_ID('inventory_audit') is not null
drop table inventory_audit
go
CREATE TABLE inventory_audit
(
Action varchar(10) not null,
Src_title_id varchar(6) null,
Src_qty int null,
Tgt_title_id varchar(6) null,
Tgt_qty int null,
Loginname varchar(30) null default suser_name(),
Action_DT datetime2 null default sysdatetime()
)
Now it is possible to put a SELECT statement atop the MERGE command to serve as the values clause for an INSERT into the inventory_audit table (see Listing 42.2).
LISTING 42.2 Insert over DML Example
-- NOTE: to see the results for this example
-- you first need to clear out and repopulate
-- the store_inventory table
Truncate table store_inventory
Insert store_inventory (stor_id, title_id, qty, update_dt)
select stor_id, title_id, qty = SUM(qty), update_dt = GETDATE()
from sales s
where stor_id = 'A011'
group by stor_id, title_id
go
insert inventory_audit
(action,
Src_title_id,
Src_qty ,
Tgt_title_id,
Tgt_qty ,
Loginname,
Action_DT
)
select *, SUSER_NAME(), SYSDATETIME()
from (
MERGE
INTO store_inventory as s
USING inventory_load as i
ON s.stor_id = i.stor_id
and s.title_id = i.title_id
WHEN MATCHED and s.qty <> i.qty
THEN UPDATE
SET s.qty = i.qty,
update_dt = getdate()
WHEN NOT MATCHED
THEN INSERT (stor_id, title_id, qty, update_dt)
VALUES (i.stor_id, i.title_id, i.qty, getdate())
WHEN NOT MATCHED BY SOURCE
THEN DELETE
OUTPUT $action,
isnull(inserted.title_id, '') as src_titleid,
isnull(str(inserted.qty, 5), '') as src_qty,
isnull(deleted.title_id, '') as tgt_titleid,
isnull(str(deleted.qty, 5), '') as tgt_qty
) changes ( action,
Src_title_id,
Src_qty ,
Tgt_title_id,
Tgt_qty );
go
select * from inventory_audit
go
Action Src_title_id Src_qty Tgt_title_id Tgt_qty Loginname Action_DT
------ ------------ ------- ------------ ------- --------- ----------------------
INSERT BU2075 1536 0 rrankins 2010-04-02 22:20:59.48
DELETE 0 CH3348 24 rrankins 2010-04-02 22:20:59.48
INSERT CH5390 888 0 rrankins 2010-04-02 22:20:59.48
INSERT CH7553 540 0 rrankins 2010-04-02 22:20:59.48
INSERT FI1950 1308 0 rrankins 2010-04-02 22:20:59.48
INSERT FI2100 1104 0 rrankins 2010-04-02 22:20:59.48
INSERT FI3822 996 0 rrankins 2010-04-02 22:20:59.48
UPDATE FI4970 1632 FI4970 1140 rrankins 2010-04-02 22:20:59.48
INSERT FI7040 1596 0 rrankins 2010-04-02 22:20:59.48
INSERT LC8400 732 0 rrankins 2010-04-02 22:20:59.48
DELETE 0 NF8918 1140 rrankins 2010-04-02 22:20:59.48
GROUP BY Clause Enhancements
SQL Server 2008 introduces a number of enhancements and changes to the GROUP BY clause and the grouping of aggregate result sets. These changes include the following:
. ROLLUP and CUBE operator syntax changes
. New GROUPING SETS operator
. New GROUPING_ID() function
ROLLUP and CUBE Operator Syntax Changes
The ROLLUP and CUBE operators produce additional aggregate groupings and are appended
to the GROUP BY clause. Prior to SQL Server 2008, to include ROLLUP or CUBE groupings,
you had to specify the WITH ROLLUP or WITH CUBE options in the GROUP BY clause after the
list of grouping columns. In SQL Server 2008, the syntax now follows the ANSI standard
for ROLLUP and CUBE; you first designate the ROLLUP or CUBE option and then provide the
grouping columns to these operators as a comma-separated list enclosed in parentheses.
The new syntax is
GROUP BY [ROLLUP | CUBE ( non-aggregate_column_list ) ]
Following are examples using the pre-2008 syntax:
SELECT type, pub_id, AVG(price) AS average
FROM titles
GROUP BY type, pub_id
WITH CUBE
SELECT pub_id, type, SUM(ytd_sales) as ytd_sales
FROM dbo.titles
where type like '%cook%' or type = 'business'
GROUP BY type, pub_id
WITH ROLLUP
An example of the new ANSI standard syntax supported in SQL Server 2008 is as follows:
SELECT type, pub_id, AVG(price) AS average
FROM titles
GROUP BY CUBE ( type, pub_id)
SELECT pub_id, type, SUM(ytd_sales) as ytd_sales
FROM dbo.titles
where type like '%cook%' or type = 'business'
GROUP BY ROLLUP (type, pub_id)
NOTE
The old-style CUBE and ROLLUP syntax is still supported for backward-compatibility purposes but is being deprecated. You should convert any existing queries using the pre-2008 WITH CUBE or WITH ROLLUP syntax to the new syntax to ensure future compatibility.
GROUPING SETS
The CUBE and ROLLUP operators allow you to run a single query and generate multiple sets of groupings. However, the sets of groupings are fixed. For example, if you use GROUP BY ROLLUP (A, B, C), you get aggregates generated for the following groupings of nonaggregate columns:
. GROUP BY A, B, C
. GROUP BY A, B
. GROUP BY A
. A super-aggregate for all rows
If you use GROUP BY CUBE (A, B, C), you get aggregates generated for the following
groupings of nonaggregate columns:
. GROUP BY A, B, C
. GROUP BY A, B
. GROUP BY A, C
. GROUP BY B, C
. GROUP BY A
. GROUP BY B
. GROUP BY C
. A super-aggregate for all rows
SQL Server 2008 introduces the GROUPING SETS operator in addition to the CUBE and
ROLLUP operators for performing several groupings in a single query. With GROUPING SETS,
only the specified groups are aggregated instead of the full set of aggregations generated
by CUBE or ROLLUP. GROUPING SETS enables you to generate results with multiple groupings
in a single query, without having to resort to writing multiple GROUP BY queries and
combining the results using a UNION ALL statement.
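For example, the following two queries against the titles table are logically equivalent (a minimal sketch; the column choices are arbitrary), but the GROUPING SETS form does the work in a single statement:

-- one query with GROUPING SETS
SELECT type, pub_id, SUM(ytd_sales) as ytd_sales
FROM titles
GROUP BY GROUPING SETS (type, pub_id)

-- equivalent UNION ALL of two separate GROUP BY queries
SELECT type, NULL as pub_id, SUM(ytd_sales) as ytd_sales
FROM titles
GROUP BY type
UNION ALL
SELECT NULL, pub_id, SUM(ytd_sales)
FROM titles
GROUP BY pub_id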
The GROUPING SETS operator supports concatenating column groupings and an optional super-aggregate row. The syntax for defining grouping sets is as follows:
GROUP BY [ GROUPING SETS ( ( ) | grouping_set_item | grouping_set_item_list [, ...n ] ) ]
The GROUPING SETS items can be single columns or a list of columns. The null field list, (), can also be used to generate a super-aggregate (that is, a grand total for the entire result set). A non-nested list of columns works as separate simple GROUP BY statements, which are then combined in an implied UNION ALL. A nested list of columns in parentheses within the GROUPING SETS item list works as a GROUP BY on that set of columns. Table 42.2 demonstrates examples of GROUPING SETS clauses and the corresponding groupings that the query generates.
TABLE 42.2 Grouping Sets Examples

. GROUP BY GROUPING SETS (A,B,C) is equivalent to GROUP BY A UNION ALL GROUP BY B UNION ALL GROUP BY C
. GROUP BY GROUPING SETS ((A,B,C)) is equivalent to GROUP BY A,B,C
. GROUP BY GROUPING SETS (A,(B,C)) is equivalent to GROUP BY A UNION ALL GROUP BY B,C
. GROUP BY GROUPING SETS ((A,C),(B,C)) is equivalent to GROUP BY A,C UNION ALL GROUP BY B,C
Listing 42.3 demonstrates how to use the GROUPING SETS operator to perform three groupings on three individual columns in a single query.
LISTING 42.3 GROUPING SETS Example
/***
** Perform a grouping by type, grouping by pub_id, and grouping by price
***/
SELECT type, pub_id, price, sum(isnull(ytd_sales, 0)) AS ytd_sales
FROM titles
where pub_id < '9'
GROUP BY GROUPING SETS ( type, pub_id, price)
go
type pub_id price ytd_sales
------------ ------ --------------------- -----------
NULL NULL NULL 0
NULL NULL 0.0006 111
NULL NULL 0.0017 750
NULL NULL 14.3279 4095
NULL NULL 14.595 18972
NULL NULL 14.9532 14294
NULL NULL 14.9611 4095
NULL NULL 15.894 40968
NULL NULL 15.9329 3336
NULL NULL 17.0884 2045
NULL NULL 17.1675 8780
NULL 0736 NULL 28286
NULL 0877 NULL 44219
NULL 1389 NULL 24941
business NULL NULL 30788
mod_cook NULL NULL 24278
popular_comp NULL NULL 12875
psychology NULL NULL 9939
trad_cook NULL NULL 19566
In the output in Listing 42.3, the first 11 rows are the results grouped by price, the next 3
rows are grouped by pub_id, and the bottom 5 rows are grouped by type. Now, you can
modify this query to include a super-aggregate for all rows by adding a null field list, as
shown in Listing 42.4.
LISTING 42.4 GROUPING SETS Example with Null Field List to Generate Super-Aggregate
SELECT type, pub_id, price, sum(isnull(ytd_sales, 0)) AS ytd_sales
FROM titles
where pub_id < '9'
GROUP BY GROUPING SETS ( type, pub_id, price, () )
go
type pub_id price ytd_sales
------------ ------ --------------------- -----------
NULL NULL NULL 0
NULL NULL 0.0006 111
NULL NULL 0.0017 750
NULL NULL 14.3279 4095
NULL NULL 14.595 18972
NULL NULL 14.9532 14294
NULL NULL 14.9611 4095
NULL NULL 15.894 40968
NULL NULL 15.9329 3336
NULL NULL 17.0884 2045
NULL NULL 17.1675 8780
NULL NULL NULL 97446
NULL 0736 NULL 28286
NULL 0877 NULL 44219
NULL 1389 NULL 24941
business NULL NULL 30788
mod_cook NULL NULL 24278
popular_comp NULL NULL 12875
psychology NULL NULL 9939
trad_cook NULL NULL 19566
If you look closely at the results in Listing 42.4, you see there are two rows with NULL values in all three columns: type, pub_id, and price. How can you determine definitively which row is the super-aggregate for the entire result set, and which is a row grouped by price where the value of price is NULL? This is where the new grouping_id() function comes in.
The grouping_id() Function
The grouping_id() function, new in SQL Server 2008, can be used to determine the level
of grouping in a query using GROUPING SETS or the CUBE and ROLLUP operators. Unlike the
GROUPING() function, which takes only a single column expression as an argument and
returns a 1 or 0 to indicate whether that individual column is being aggregated, the
grouping_id() function accepts multiple column expressions and returns a bitmap to
indicate which columns are being aggregated for that row.
For example, you can add the grouping_id() and grouping() functions to the query in
Listing 42.4 and examine the results (see Listing 42.5).
LISTING 42.5 Using the grouping_id() Function
SELECT type, pub_id, price, sum(isnull(ytd_sales, 0)) AS ytd_sales,
grouping_id(type, pub_id, price) as grping_id,
grouping(type) type_rlp,
grouping(pub_id) pub_id_rlp,
grouping(price) price_rlp
FROM titles
where pub_id < '9'
GROUP BY GROUPING SETS ( type, pub_id, price, () )
go
type pub_id price ytd_sales grping_id type_rlp pub_id_rlp price_rlp
------------ ------ ------- --------- --------- -------- ---------- ---------
NULL NULL NULL 0 6 1 1 0
NULL NULL 0.0006 111 6 1 1 0
NULL NULL 0.0017 750 6 1 1 0
NULL NULL 14.3279 4095 6 1 1 0
NULL NULL 14.595 18972 6 1 1 0
NULL NULL 14.9532 14294 6 1 1 0
NULL NULL 14.9611 4095 6 1 1 0
NULL NULL 15.894 40968 6 1 1 0
NULL NULL 15.9329 3336 6 1 1 0
NULL NULL 17.0884 2045 6 1 1 0
NULL NULL 17.1675 8780 6 1 1 0
NULL NULL NULL 97446 7 1 1 1
NULL 0736 NULL 28286 5 1 0 1
NULL 0877 NULL 44219 5 1 0 1
NULL 1389 NULL 24941 5 1 0 1
business NULL NULL 30788 3 0 1 1
mod_cook NULL NULL 24278 3 0 1 1
popular_comp NULL NULL 12875 3 0 1 1
psychology NULL NULL 9939 3 0 1 1
trad_cook NULL NULL 19566 3 0 1 1
Unlike the grouping() function, which takes only a single column name as an argument, the grouping_id() function accepts all columns that participate in any grouping set. The grouping_id() function produces an integer result that is a bitmap, where each bit represents a different column, producing a unique integer for each grouping set. The bits in the bitmap indicate whether the column is being aggregated in the grouping set (bit value is 1) or if the column is used to determine the grouping set (bit value is 0) used to calculate the aggregate value.
The bit values are assigned to columns from right to left in the order the columns are listed in the grouping_id() function. For example, in the query in Listing 42.5, price is assigned the rightmost bit value, bit 1; pub_id is assigned the next bit value, bit 2; and type is assigned the leftmost bit value, bit 3. When the grouping_id() value equals 6, bits 2 and 3 are turned on (4 + 2 + 0 = 6). This indicates that the type and pub_id columns are being aggregated in the grouping set, and the price column defines the grouping set.
The grouping_id() column can thus be used to determine which of the two rows where
type, pub_id, and price are all NULL is the row with the super-aggregate of all three
columns (grouping_id = 7), and which row is an aggregate rolled up where the value of
price is NULL (grouping_id = 6).
The values returned by the grouping_id() function can also be used for further filtering
your grouping set results or for sorting your grouping set results, as shown in Listing 42.6.
LISTING 42.6 Using the grouping_id() Function to Sort Results
SELECT type, pub_id, price, sum(isnull(ytd_sales, 0)) AS ytd_sales,
grouping_id(type, pub_id, price) as grping_id
FROM titles
where pub_id < '9'
GROUP BY GROUPING SETS ( type, pub_id, price, () )
order by grping_id
go
type pub_id price ytd_sales grping_id
------------ ------ -------- ----------- -----------
business NULL NULL 30788 3
mod_cook NULL NULL 24278 3
popular_comp NULL NULL 12875 3
psychology NULL NULL 9939 3
trad_cook NULL NULL 19566 3
NULL 0736 NULL 28286 5
NULL 0877 NULL 44219 5
NULL 1389 NULL 24941 5
NULL NULL NULL 0 6
NULL NULL 0.0006 111 6
NULL NULL 0.0017 750 6
NULL NULL 14.3279 4095 6
NULL NULL 14.595 18972 6
NULL NULL 14.9532 14294 6
NULL NULL 14.9611 4095 6
NULL NULL 15.894 40968 6
NULL NULL 15.9329 3336 6
NULL NULL 17.0884 2045 6
NULL NULL 17.1675 8780 6
NULL NULL NULL 97446 7
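The grouping_id() value can likewise be used in a HAVING clause to filter the grouping set results. For example, a minimal sketch that keeps only the super-aggregate row from the previous query:

SELECT type, pub_id, price, sum(isnull(ytd_sales, 0)) AS ytd_sales
FROM titles
where pub_id < '9'
GROUP BY GROUPING SETS ( type, pub_id, price, () )
-- keep only the grand total row (all three columns aggregated)
HAVING grouping_id(type, pub_id, price) = 7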
Variable Assignment in DECLARE Statement
In SQL Server 2008, you can now set a variable's initial value at the same time you declare it. For example, the following line of code declares a variable named @ctr of type int and sets its value to 100:
DECLARE @ctr int = 100
Previously, this functionality was only possible with stored procedure parameters.
Assigning an initial value to a variable required a separate SET or SELECT statement. This
new syntax simply streamlines the process of assigning an initial value to a variable. The
value specified can be a constant or a constant expression, as in the following:
DECLARE @start_time datetime = getdate()
You can even assign the initial value via a subquery, as long as the subquery returns only a
single value, as in the following example:
declare @max_price money = (select MAX(price) from titles)
The value being assigned to the variable must be of the same type as the variable or be
implicitly convertible to that type.
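Note that multiple variables can be declared and initialized in a single DECLARE statement as well (the variable names here are just for illustration):

declare @ctr int = 100,
@start_time datetime = getdate(),
@status varchar(20) = 'active'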
Compound Assignment Operators
Another new feature that streamlines and improves the efficiency of your T-SQL code is compound operators. This is a concept that has been around in many other programming languages for a long time but has now finally found its way into T-SQL. Compound operators are used when you want to apply an arithmetic operation on a variable and assign the value back into the variable.
The += operator, for example, adds the specified value to the variable and then assigns the new value back into the variable. For example,
SET @ctr += 1
is functionally the same as
SET @ctr = @ctr + 1
The compound operators are quicker to type, and they offer a cleaner piece of finished code. Following is the complete list of compound operators provided in SQL Server 2008, with a short demonstration after the list:
+= Add and assign
-= Subtract and assign
*= Multiply and assign
/= Divide and assign
%= Modulo and assign
&= Bitwise AND and assign
^= Bitwise XOR and assign
|= Bitwise OR and assign
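The following short batch demonstrates several of the compound operators applied in sequence (the values are chosen just for illustration):

declare @val int = 100
set @val += 10   -- @val is now 110
set @val -= 30   -- 80
set @val *= 2    -- 160
set @val /= 3    -- 53 (integer division)
set @val %= 10   -- 3
select @val      -- returns 3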
Row Constructors
SQL Server 2008 provides a new method to insert data into SQL Server tables, referred to as row constructors. Row constructors are a feature that can be used to simplify data insertion, allowing multiple rows of data to be specified in a single DML statement. Row constructors are used to specify a set of row value expressions to be constructed into a data row. Row constructors can be specified in the VALUES clause of the INSERT statement, in the USING clause of the MERGE statement, and in the definition of a derived table in the FROM clause. The general syntax of the row constructor is as follows:
VALUES ( { expression | DEFAULT | NULL } [ ,...n ] ) [ ,...n ]
Each column of data defined in the VALUES clause is separated from the next using a comma. Multiple rows (which may also contain multiple columns) are separated from each other using parentheses and a comma. When multiple rows are specified, the corresponding column values must be of the same data type or an implicitly convertible data type. The following example shows the row constructor VALUES clause being used within a SELECT statement to define a set of rows and columns with explicit values:
SELECT a, b FROM (VALUES (1, 2), (3, 4), (5, 6), (7, 8), (9, 10) )
AS MyTable(a, b);
GO
a b
----------- -----------
1 2
3 4
5 6
7 8
9 10
The VALUES clause is commonly used in this manner to populate temporary tables but can
also be used in a view, as shown in Listing 42.7.
LISTING 42.7 Using the VALUES Clause in a View
create view book_types
as
SELECT type, description
FROM (VALUES ('mod_cook', 'Modern Cooking'),
('trad_cook', 'Traditional Cooking'),
('popular_comp', 'Popular Computing'),
('biography', 'Biography'),
('business', 'Business Development'),
('children', 'Children''s Literature'),
('fiction', 'Fiction'),
('nonfiction', 'NonFiction'),
('psychology', 'Psychology and Self Help'),
('drama', 'Drama and Theater'),
('lit crit', 'Literary Criticism')
) AS type_lookup(type, description)
go
Defining a view in this manner can be useful as a code lookup table:
select top 10
convert(varchar(50), title) as title, description
from titles t
inner join
book_types bt
on t.type = bt.type
order by title_id desc
go
title                                              description
-------------------------------------------------- ------------------------
Sushi, Anyone? Traditional Cooking
Fifty Years in Buckingham Palace Kitchens Traditional Cooking
Onions, Leeks, and Garlic: Cooking Secrets of the Traditional Cooking
Emotional Security: A New Algorithm Psychology and Self Help
Prolonged Data Deprivation: Four Case Studies Psychology and Self Help
Life Without Fear Psychology and Self Help
Is Anger the Enemy? Psychology and Self Help
Computer Phobic AND Non-Phobic Individuals: Behavi Psychology and Self Help
Net Etiquette Popular Computing
Secrets of Silicon Valley Popular Computing
The advantage of this approach is that unlike a permanent code table, the view with the VALUES clause doesn't really take up any space; it's materialized only when it's referenced. Maintaining it involves simply dropping and re-creating the view rather than having to perform inserts, updates, and deletes as you would for a permanent table.
The primary use of row constructors is to insert multiple rows of data in a single INSERT
statement. Essentially, if you have multiple rows to insert, you can specify multiple rows
in the VALUES clause. The maximum number of rows that can be specified in the VALUES
clause is 1000. The following example shows how to use the row constructor VALUES
clause in a single INSERT statement to insert five rows:
insert sales (stor_id, ord_num, ord_date, qty, payterms, title_id)
VALUES ('6380', '1234', '3/26/2010', 50, 'Net 30', 'BU1032'),
('6380', '1234', '3/26/2010', 150, 'Net 30', 'PS2091'),
('6380', '1234', '3/26/2010', 25, 'Net 30', 'CH2480'),
('6380', '1234', '3/26/2010', 30, 'Net 30', 'FI2046'),
('6380', '1234', '3/26/2010', 10, 'Net 30', 'FI6318')
As you can see, this new syntax is much more concise and simple than having to issue
five individual INSERT statements as you would have had to do in versions of SQL Server
prior to SQL Server 2008.
The VALUES clause can also be used in the MERGE statement as the source table. Listing 42.8
uses the VALUES clause to define five rows as the source data to perform INSERT/UPDATE
operations on the store_inventory table defined in Listing 42.1.
LISTING 42.8 Using the VALUES Clause in a MERGE Statement
MERGE
INTO store_inventory as s
USING
(VALUES
('A011', 'CH3348', 41, getdate()),
('A011', 'CH2480', 125, getdate()),
('A011', 'FI0392', 1100, getdate()),
('A011', 'FI2046', 1476, getdate()),
('A011', 'FI1872', 520, getdate())
) as i (stor_id, title_id, qty, update_dt)
ON s.stor_id = i.stor_id
and s.title_id = i.title_id
WHEN MATCHED and s.qty <> i.qty
THEN UPDATE
SET s.qty = i.qty,
update_dt = getdate()
WHEN NOT MATCHED
THEN INSERT (stor_id, title_id, qty, update_dt)
VALUES (i.stor_id, i.title_id, i.qty, getdate())
OUTPUT $action,
isnull(inserted.title_id, '') as src_titleid,
isnull(str(inserted.qty, 5), '') as src_qty,
isnull(deleted.title_id, '') as tgt_titleid,
isnull(str(deleted.qty, 5), '') as tgt_qty
;
go
$action src_titleid src_qty tgt_titleid tgt_qty
---------- ----------- ------- ----------- -------
INSERT CH2480 125
UPDATE CH3348 41 CH3348 24
UPDATE FI0392 1100 FI0392 1176
UPDATE FI1872 520 FI1872 540
INSERT FI2046 1476
New date and time Data Types and Functions
SQL Server 2008 introduces four new date and time data types:
. date
. time (precision)
. datetime2 (precision)
. datetimeoffset (precision)
Two of the most welcome of these new types are the new date and time data types. These new data types allow you to store date-only and time-only values. In previous versions of SQL Server, the datetime and smalldatetime data types were the only available types for storing date or time values, and they always store both the date and time. This made date-only or time-only comparisons tricky at times because you always had to account for the other component (for more detailed examples on working with datetime values in SQL Server, see Chapter 43). In addition, the datetime data type stores date values ranging only from 1/1/1753 to 12/31/9999, with accuracy only to 3.33 milliseconds. The smalldatetime data type stores date values ranging only from 1/1/1900 to 6/6/2079, with accuracy of only 1 minute.
The new date data type stores only the date component without the time component,
and stores date values ranging from 1/1/0001 to 12/31/9999.
The new time data type stores only the time component with accuracy that can be specified
down to seven decimal places (100 nanoseconds). The default is seven decimal places.
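For example, the following minimal sketch (the literal value is arbitrary) shows how the specified precision affects the fractional seconds that are stored:

declare @t0 time(0) = '23:18:30.4904294',
@t3 time(3) = '23:18:30.4904294',
@t7 time(7) = '23:18:30.4904294'
select @t0 as t0, @t3 as t3, @t7 as t7
-- t0: 23:18:30   t3: 23:18:30.490   t7: 23:18:30.4904294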
The datetime2 data type stores both date and time components, similar to datetime, but increases the range of allowed values to 1/1/0001 through 12/31/9999, with accuracy down to seven decimal places (100 ns). The default precision is seven decimal places.
The datetimeoffset data type also stores both date and time components just like
datetime2, but includes the time zone offset from Universal Time Coordinates (UTC). The
time zone offset ranges from -14:00 to +14:00.
Along with the new date and time data types, SQL Server 2008 also introduces some new date and time functions for returning the current system date and time in different formats:
. SYSDATETIME(): Returns the current system datetime as a DATETIME2(7) value
. SYSDATETIMEOFFSET(): Returns the current system datetime as a DATETIMEOFFSET(7) value
. SYSUTCDATETIME(): Returns the current system datetime as a DATETIME2(7) value representing the current UTC time
. SWITCHOFFSET(DATETIMEOFFSET, time_zone): Changes the DATETIMEOFFSET value from the stored time zone offset to the specified time zone
. TODATETIMEOFFSET(datetime, time_zone): Applies the specified time zone to a datetime value that does not reflect the time zone difference from UTC
Listing 42.9 demonstrates the use of some of the new data types and functions. Notice the
difference in the specified decimal precision returned for the time values.
LISTING 42.9 Using the new date and time Data Types and Functions
declare @date date,
@time time,
@time3 time(3),
@datetime2 datetime2(7),
@datetimeoffset datetimeoffset,
@datetime datetime,
@utcdatetime datetime2(7)
select @datetime = getdate(),
@date = getdate(),
@time = sysdatetime(),
@time3 = sysdatetime(),
@datetime2 = SYSDATETIME(),
@datetimeoffset = SYSDATETIMEOFFSET(),
@utcdatetime = SYSUTCDATETIME()
select @datetime as datetime,
@date as date,
@time as time,
@time3 as time3
select
@datetime2 as datetime2,
@datetimeoffset as datetimeoffset,
@utcdatetime as utcdatetime
select SYSDATETIMEOFFSET() as sysdatetimeoffset,
SYSDATETIME() as sysdatetime
go
datetime date time time3
----------------------- ---------- ---------------- -----------------
2010-03-28 23:18:30.490 2010-03-28 23:18:30.4904294 23:18:30.492
datetime2 datetimeoffset utcdatetime
---------------------- ---------------------------------- ----------------------
2010-03-28 23:18:30.49 2010-03-28 23:18:30.4924295 -04:00 2010-03-29 03:18:30.49
sysdatetimeoffset sysdatetime
---------------------------------- ----------------------
2010-03-28 23:24:10.7485902 -04:00 2010-03-28 23:24:10.74
Be aware that retrieving the value from getdate() or sysdatetime() does not capture the offset from UTC, even if you store the returned value in a column or variable defined with the datetimeoffset data type. To capture the offset, you need to use the SYSDATETIMEOFFSET() function:
declare @datetimeoffset1 datetimeoffset,
@datetimeoffset2 datetimeoffset
select
@datetimeoffset1 = SYSDATETIME(),
@datetimeoffset2 = SYSDATETIMEOFFSET()
select @datetimeoffset1, @datetimeoffset2
go
---------------------------------- ----------------------------------
2010-03-28 23:36:39.7271831 +00:00 2010-03-28 23:36:39.7271831 -04:00
Note that in the output, SQL Server Management Studio (SSMS) trims the time values
down to two decimal places when it displays the results in the Text Results tab. However,
this is just for display purposes (and applies only with text results; grid results display the
full decimal precision). The actual value does store the precision down to the specified
number of decimal places, which can be seen if you convert the datetime2 value to a
string format that displays all the decimal places:
select SYSDATETIME() as datetime2_trim,
convert(varchar(30), SYSDATETIME(), 121) as datetime2_full
go
datetime2_trim datetime2_full
---------------------- ------------------------------
2010-03-30 23:52:30.68 2010-03-30 23:52:30.6851262
The SWITCHOFFSET() function can be used to convert a datetimeoffset value into a different time zone offset value:
select SYSDATETIMEOFFSET(), SWITCHOFFSET ( SYSDATETIMEOFFSET(), '-07:00' )
go
---------------------------------- ----------------------------------
2010-03-29 00:07:21.1335738 -04:00 2010-03-28 21:07:21.1335738 -07:00
When you are specifying a time zone value for the SWITCHOFFSET or TODATETIMEOFFSET functions, the value can be specified as an integer value representing the number of minutes of offset or as a time value in hh:mm format. The range of allowed values is -14:00 to +14:00.
select TODATETIMEOFFSET ( SYSDATETIME(), -300 )
select TODATETIMEOFFSET ( SYSDATETIME(), '-05:00' )
go
----------------------------------
2010-03-29 00:23:05.5773288 -05:00
----------------------------------
2010-03-29 00:23:05.5773288 -05:00
Date and Time Conversions
If an existing CONVERT style includes the time part, and the conversion is from datetimeoffset to a string, the time zone offset (except for style 127) is included. If you do not want the time zone offset, you need to cast or convert the datetimeoffset value to datetime2 first and then to a string:
select convert(varchar(35), SYSDATETIMEOFFSET(), 121) as datetime_offset,
CONVERT(varchar(30), cast(SYSDATETIMEOFFSET() as datetime2),121) as datetime2
go
datetime_offset datetime2
----------------------------------- ------------------------------
2010-03-30 23:57:36.1015950 -04:00 2010-03-30 23:57:36.1015950
When you convert from datetime2 or datetimeoffset to date, there is no rounding, and the date part is extracted explicitly. For any implicit conversion from datetimeoffset to date, time, datetime2, datetime, or smalldatetime, conversion is based on the local date and time value (respecting the persisted time zone offset). For example, when the datetimeoffset(3) value '2006-10-21 12:20:20.999 -8:00' is converted to time(3), the result is 12:20:20.999, not 20:20:20.999 (UTC).
If you convert from a higher-precision time value to a lower-precision value, the conversion is permitted, and the higher-precision values are truncated to fit the lower-precision type.
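For example, assigning a datetime2(7) value to a datetime2(2) variable simply drops the extra fractional digits (a minimal sketch; the literal is arbitrary):

declare @dt7 datetime2(7) = '2010-03-31 00:04:37.3306880',
@dt2 datetime2(2)
set @dt2 = @dt7
select @dt7 as dt7, @dt2 as dt2
-- dt7: 2010-03-31 00:04:37.3306880   dt2: 2010-03-31 00:04:37.33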
If you are converting a time(n), datetime2(n), or datetimeoffset(n) value to a string,
the number of digits depends on the type specification. If you want a specific precision in
the resulting string, convert to a data type with the appropriate precision first and then to
a string, as follows:
select
convert(varchar(35), sysdatetime(), 121) as datetime_offset,
CONVERT(varchar(30), cast(sysdatetime() as datetime2(3)), 121) as datetime2
go
datetime_offset datetime2
----------------------------------- ------------------------------
2010-03-31 00:04:37.3306880 2010-03-31 00:04:37.331
If you attempt to cast a string literal with a fractional seconds precision that is more than
that allowed for smalldatetime or datetime, Error 241 is raised:
declare @datetime datetime
select @datetime = '2010-03-31 00:04:37.3306880'
go
Msg 241, Level 16, State 1, Line 2
Conversion failed when converting date and/or time from character string.
Table-Valued Parameters
In previous versions of SQL Server, it was not possible to share the contents of table variables between stored procedures. SQL Server 2008 changes that with the introduction of table-valued parameters, which allow you to pass table variables to stored procedures as input parameters. Table-valued parameters provide more flexibility and, in many cases, better performance than temporary tables as a means to pass result sets between stored procedures.
To create and use table-valued parameters, you must first create a user-defined table type
as a TABLE data type and define the table structure. This is done using the CREATE TYPE
command, as shown in Listing 42.10.
LISTING 42.10 Defining a User-Defined Table Type
if exists (select * from sys.systypes t where t.name = 'ytdsales_tabletype'
and t.uid = USER_ID('dbo'))
drop type ytdsales_tabletype
go
CREATE TYPE ytdsales_tabletype AS TABLE
(title_id char(6),
title varchar(50),
pubdate date,
ytd_sales int)
go
After creating the user-defined table data type, you can use it for declaring local table variables and for stored procedure parameters. To use the table-valued parameter in a procedure, you create a procedure to receive and access data through a table-valued parameter, as shown in Listing 42.11.
LISTING 42.11 Defining a Stored Procedure with a Table-Valued Parameter
/* Create a procedure to receive data for the table-valued parameter. */
if OBJECT_ID(tab_parm_test) is not null
drop proc tab_parm_test
go
create proc tab_parm_test
@pubdate datetime = null,
@sales_minimum int = 0,
@ytd_sales_tab ytdsales_tabletype READONLY
as
set nocount on
if @pubdate is null
-- if no date is specified, set date to last year
set @pubdate = dateadd(month, -12, getdate())
select * from @ytd_sales_tab
where pubdate > @pubdate
and ytd_sales >= @sales_minimum
return
go
Then, when calling that stored procedure, you declare a local table variable using the table
data type defined previously, populate the table variable with data, and then pass the table
variable to the stored procedure (see Listing 42.12).
LISTING 42.12 Executing a Stored Procedure with a Table-Valued Parameter
/* Declare a variable that references the table type. */
declare @ytd_sales_tab ytdsales_tabletype
/* Add data to the table variable. */
insert @ytd_sales_tab
select title_id, convert(varchar(50), title), pubdate, ytd_sales
from titles
/* Pass the table variable populated with data to a stored procedure. */
exec tab_parm_test '6/1/2001', 10000, @ytd_sales_tab
go
title_id title ytd_sales
-------- -------------------------------------------------- -----------
BU2075 You Can Combat Computer Stress! 18722
MC3021 The Gourmet Microwave 22246
TC4203 Fifty Years in Buckingham Palace Kitchens 15096
The scope of a table-valued parameter is limited to only the stored procedure to which it is passed. To access the contents of a table-valued parameter in a procedure called by another procedure that contains a table-valued parameter, you need to pass the table-valued parameter to the subprocedure. Listing 42.13 provides an example of a subprocedure and alters the procedure created in Listing 42.11 to call the subprocedure.
LISTING 42.13 Passing a Table-Valued Parameter to a Subprocedure
/* Create the sub-procedure */
create proc tab_parm_subproc
@pubdate datetime = null,
@sales_minimum int = 0,
@ytd_sales_tab ytdsales_tabletype READONLY
as
select * from @ytd_sales_tab
where ytd_sales <= @sales_minimum
and ytd_sales <> 0
go
/* modify the tab_part_test proc to call the sub-procedure */
alter proc tab_parm_test
@pubdate datetime = null,
@sales_minimum int = 0,
@ytd_sales_tab ytdsales_tabletype READONLY
as
set nocount on
if @pubdate is null
-- if no date is specified, set date to last year
set @pubdate = dateadd(month, -12, getdate())
select * from @ytd_sales_tab
where pubdate > @pubdate
and ytd_sales >= @sales_minimum
exec tab_parm_subproc @pubdate,
@sales_minimum,
@ytd_sales_tab
return
go
/* Declare a variable that references the type. */
declare @ytd_sales_tab ytdsales_tabletype
/* Add data to the table variable. */
insert @ytd_sales_tab
select title_id, convert(varchar(50), title), pubdate, ytd_sales
from titles
where type = 'business'
/* Pass the table variable populated with data to a stored procedure. */
exec tab_parm_test '6/1/2001', 10000, @ytd_sales_tab
go
title_id title pubdate ytd_sales
-------- -------------------------------------------------- ---------- -----------
BU2075 You Can Combat Computer Stress! 2004-06-30 18722
title_id title pubdate ytd_sales
-------- -------------------------------------------------- ---------- -----------
BU1032 The Busy Executive's Database Guide 2004-06-12 4095
BU1111 Cooking with Computers: Surreptitious Balance Shee 2004-06-09 3876
BU7832 Straight Talk About Computers 2004-06-22 4095
Table-Valued Parameters Versus Temporary Tables
Table-valued parameters offer more flexibility and, in some cases, better performance than temporary tables or other ways to pass a list of values to a stored procedure. One benefit is that table-valued parameters do not acquire locks for the initial population of data from a client. Also, table-valued parameters are memory resident and do not incur physical I/O unless they grow too large to remain in cache memory.
However, table-valued parameters do have some restrictions:
. SQL Server does not create or maintain statistics on columns of table-valued parameters.
. Table-valued parameters can be passed only as READONLY input parameters to T-SQL routines. You cannot perform UPDATE, DELETE, or INSERT operations on a table-valued parameter within the body of the stored procedure to which it is passed (see the sketch following this list).
. Like table variables, a table-valued parameter cannot be specified as the target of a SELECT INTO or INSERT EXEC statement. They can only be populated using an INSERT statement.
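For example, the second restriction means that a procedure like the following hypothetical one is rejected when you try to create it, with an error stating that the table-valued parameter is READONLY and cannot be modified:

-- fails at CREATE time: a READONLY table-valued parameter cannot
-- be the target of an UPDATE, DELETE, or INSERT
create proc tab_parm_bad
@ytd_sales_tab ytdsales_tabletype READONLY
as
delete from @ytd_sales_tab
go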
Hierarchyid Data Type
The Hierarchyid data type introduced in SQL Server 2008 is actually a system-supplied common language runtime (CLR) user-defined type (UDT) that can be used for storing and manipulating hierarchical structures (for example, parent-child relationships) in a relational database. The Hierarchyid type is stored as a varbinary value that represents the position of the current node in the hierarchy (both in terms of parent-child position and position among siblings). You can perform manipulations on the type in Transact-SQL by invoking methods exposed by the type.
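As a quick sketch of the method syntax (the path value here is arbitrary), instance methods are invoked on a Hierarchyid value with dot notation, and static methods with the :: syntax:

declare @h hierarchyid = hierarchyid::Parse('/1/2/')
select @h.ToString() as path, -- returns '/1/2/'
@h.GetLevel() as lvl          -- returns 2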
Creating a Hierarchy
First, let's define a hierarchy in a table using the Hierarchyid data type. For example, this section uses the Parts table example used in Chapter 28, "Creating and Managing Stored Procedures," to demonstrate how a stored procedure could be used to traverse a hierarchy stored in a table. There is also an example in Chapter 52 using a recursive common table expression (CTE) to perform a similar action. Let's see how to implement an alternative solution by adding a Hierarchyid column to the Parts table. First, you create a version of the Parts table using the Hierarchyid data type (see Listing 42.14).
LISTING 42.14 Creating the Parts Table with a Hierarchyid Data Type
Use bigpubs2008
Go
CREATE TABLE PARTS_hierarchy(
partid int NOT NULL,
hid hierarchyid not null,
lvl as hid.GetLevel() persisted,
partname varchar(30) NOT NULL,
PRIMARY KEY NONCLUSTERED (partid),
UNIQUE NONCLUSTERED (partname)
)
Note the hid column defined with the Hierarchyid data type. Notice also how the lvl column is defined as a computed column, using the GetLevel method of the hid column to define the persisted computed column's level value. The GetLevel method returns the level of the current node in the hierarchy.
The Hierarchyid data type provides topological sorting, meaning that a child's sort value is guaranteed to be greater than the parent's sort value. This guarantees that a node's sort value will be higher than that of all its ancestors. You can take advantage of this feature by creating an index on the Hierarchyid column because the index will sort the data in a depth-first manner. This ensures that all members of the same subtree are close to each other in the leaf level of the index, which makes the index useful as an efficient mechanism for returning all descendants of a node. To take advantage of this, you can create a clustered index on the hid column:
CREATE UNIQUE CLUSTERED INDEX idx_hid_first ON Parts_hierarchy (hid);
You can also use another indexing strategy called breadth-first, in which you organize all
nodes from the same level close to each other in the leaf level of the index. You do this by
building the index with the level in the hierarchy as the leading column. Queries that need
to get all nodes from the same level in the hierarchy can benefit from this type of index:
CREATE UNIQUE INDEX idx_lvl_first ON Parts_hierarchy(lvl, hid);
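For example, a simple query like the following (an added illustration, not from the original text) returns all parts at a given level and can be satisfied efficiently by the idx_lvl_first index:
SELECT partid, partname
FROM Parts_hierarchy
WHERE lvl = 2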
Populating the Hierarchy
Now that you've created the hierarchy table, the next step is to populate it. To insert a
new node into the hierarchy, you must first produce a new Hierarchyid value that
represents the correct position in the hierarchy. There are two methods available with the
Hierarchyid data type to do this: the HIERARCHYID::GetRoot() method and
GetDescendant method. You use the HIERARCHYID::GetRoot() method to produce the
value for the root node of the hierarchy. This method simply produces a Hierarchyid
value that is internally an empty binary string representing the root of the tree.
You can use the GetDescendant method to produce a value below a given parent. The
GetDescendant method accepts two optional Hierarchyid input values that represent the
two nodes between which you want to position the new node. If both values are not NULL,
the method produces a new value positioned between the two nodes. If the first parameter
is not NULL and the second parameter is NULL, the method produces a value greater than
the first parameter. Finally, if the first parameter is NULL and the second parameter is not
NULL, the method produces a value smaller than the second parameter. If both parameters
are NULL, the method produces a value simply below the given parent.
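The following small self-contained sketch (an added illustration, not from the original text) shows all four cases; the first-child, next-sibling, and in-between values shown in the comments match the paths the Parts_hierarchy examples produce later in this section:
DECLARE @parent hierarchyid = hierarchyid::GetRoot()          -- '/'
DECLARE @c1 hierarchyid = @parent.GetDescendant(NULL, NULL)   -- first child: '/1/'
DECLARE @c2 hierarchyid = @parent.GetDescendant(@c1, NULL)    -- after @c1: '/2/'
DECLARE @c0 hierarchyid = @parent.GetDescendant(NULL, @c1)    -- a value that sorts before @c1
DECLARE @mid hierarchyid = @parent.GetDescendant(@c1, @c2)    -- between the two: '/1.1/'
SELECT @c0.ToString() AS before_c1, @c1.ToString() AS first_child,
       @mid.ToString() AS between, @c2.ToString() AS after_c1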
NOTE
The GetDescendant method does not guarantee that Hierarchyid values are unique.
To enforce uniqueness, you must define either a primary key, unique constraint, or
unique index on the Hierarchyid column.
The code in Listing 42.15 uses a cursor to loop through the rows currently in the Parts
table and populates the Parts_hierarchy table. If the part is the first node in the hierarchy,
the procedure uses the HIERARCHYID::GetRoot() method to assign the hid value for
the root node of the hierarchy. Otherwise, the code in the cursor looks for the last child
hid value of the new part's parent part and uses the GetDescendant method to produce a
value that positions the new node after the last child of that parent part.
NOTE
Listing 42.15 also makes use of a recursive common table expression to traverse the
existing Parts table in hierarchical order to add in the rows at the proper level, starting
with the topmost parent part. If you are unfamiliar with CTEs (which were introduced in
SQL Server 2005), you may want to review the "In Case You Missed It..." section in
Chapter 43.
LISTING 42.15 Populating the Parts_hierarchy Table
DECLARE
@hid AS HIERARCHYID,
@parent_hid AS HIERARCHYID,
@last_child_hid AS HIERARCHYID,
@partid int,
@partname varchar(30),
@parentpartid int
declare parts_cur cursor for
WITH PartsCTE(partid, partname, parentpartid, lvl)
AS
(
SELECT partid, partname, parentpartid, 0
FROM PARTS
WHERE parentpartid is null
UNION ALL
SELECT P.partid, P.partname, P.parentpartid, PP.lvl+1
FROM Parts as P
JOIN PartsCTE as PP
ON P.parentpartid = PP.Partid
)
SELECT PartID, Partname, ParentPartid
FROM PartsCTE
order by lvl
open parts_cur
fetch parts_cur into @partid, @partname, @parentpartid
while @@FETCH_STATUS = 0
begin
if @parentpartid is null
set @hid = HIERARCHYID::GetRoot()
else
begin
select @parent_hid = hid from PARTS_hierarchy
where partid = @parentpartid
select @last_child_hid = MAX(hid) from PARTS_hierarchy
where hid.GetAncestor(1) = @parent_hid
select @hid = @parent_hid.GetDescendant(@last_child_hid, NULL)
end
insert PARTS_hierarchy (partid, hid, partname)
values (@partid, @hid, @partname)
fetch parts_cur into @partid, @partname, @parentpartid
end
close parts_cur
deallocate parts_cur
go
Querying the Hierarchy
Now that you've populated the hierarchy, you should query it to view the data and verify
that the hierarchy was populated correctly. However, if you query the hid value directly, you
see only its binary representation, which is not very meaningful. To view the Hierarchyid
value in a more useful manner, you can use the ToString method, which returns a logical
string representation of the Hierarchyid. This string representation is shown as a path
with a slash sign used as a separator between the levels. For example, you can run the
following query to get both the binary and logical representations of the hid value:
select cast(hid as varbinary(6)) as hid,
substring(hid.ToString(), 1, 12) as path,
lvl,
partid,
partname
From parts_hierarchy
go
hid path lvl partid partname
-------------- ------------ ------ ----------- ------------------
0x / 0 22 Car
0x58 /1/ 1 1 DriveTrain
0x68 /2/ 1 23 Body
0x78 /3/ 1 24 Frame
0x5AC0 /1/1/ 2 2 Engine
0x5B40 /1/2/ 2 3 Transmission
0x5BC0 /1/3/ 2 4 Axle
0x5C20 /1/4/ 2 12 Drive Shaft
0x5B56 /1/2/1/ 3 9 Flywheel
0x5B5A /1/2/2/ 3 10 Clutch
0x5B5E /1/2/3/ 3 16 Gear Box
0x5AD6 /1/1/1/ 3 5 Radiator
0x5ADA /1/1/2/ 3 6 Intake Manifold
0x5ADE /1/1/3/ 3 7 Exhaust Manifold
0x5AE1 /1/1/4/ 3 8 Carburetor
0x5AE3 /1/1/5/ 3 13 Piston
0x5AE5 /1/1/6/ 3 14 Crankshaft
0x5AE358 /1/1/5/1/ 4 21 Piston Rings
0x5AE158 /1/1/4/1/ 4 11 Float Valve
0x5B5EB0 /1/2/3/1/ 4 15 Reverse Gear
0x5B5ED0 /1/2/3/2/ 4 17 First Gear
0x5B5EF0 /1/2/3/3/ 4 18 Second Gear
0x5B5F08 /1/2/3/4/ 4 19 Third Gear
0x5B5F18 /1/2/3/5/ 4 20 Fourth Gear
As stated previously, the values stored in a Hierarchyid column provide topological sorting
of the nodes in the hierarchy. The GetLevel method can be used to produce the level in the
hierarchy (as it was to store the level in the computed lvl column in the Parts_hierarchy
table). Using the lvl column or the GetLevel method, you can easily produce a graphical
depiction of the hierarchy by simply sorting the rows by hid and generating indentation
for each row based on the lvl column, as shown in the following example:
SELECT
REPLICATE('--', lvl)
+ right('>',lvl)
+ partname AS partname
FROM Parts_hierarchy
order by hid
go
partname
-------------------------
Car
-->DriveTrain
---->Engine
------>Radiator
------>Intake Manifold
------>Exhaust Manifold
------>Carburetor
-------->Float Valve
------>Piston
-------->Piston Rings
------>Crankshaft
---->Transmission
------>Flywheel
------>Clutch
------>Gear Box
-------->Reverse Gear
-------->First Gear
-------->Second Gear
-------->Third Gear
-------->Fourth Gear
---->Axle
---->Drive Shaft
-->Body
-->Frame
To return only the subparts of a specific part, you can use the IsDescendantOf method.
The parameter passed to this method is a node's Hierarchyid value. The method returns 1
if the queried node is a descendant of the input node. For example, the following query
returns all subparts of the engine:
select child.partid, child.partname, child.lvl
from
parts_hierarchy as parent
inner join
parts_hierarchy as child
on parent.partname = 'Engine'
and child.hid.IsDescendantOf(parent.hid) = 1
go
partid partname lvl
----------- ------------------------------ ------
2 Engine 2
5 Radiator 3
6 Intake Manifold 3
7 Exhaust Manifold 3
8 Carburetor 3
13 Piston 3
14 Crankshaft 3
21 Piston Rings 4
11 Float Valve 4
You can also use the IsDescendantOf method to return all parent parts of a given part:
select parent.partid, parent.partname, parent.lvl
from
parts_hierarchy as parent
inner join
parts_hierarchy as child
on child.partname = 'Piston'
and child.hid.IsDescendantOf(parent.hid) = 1
go
partid partname lvl
----------- ------------------------------ ------
22 Car 0
1 DriveTrain 1
2 Engine 2
13 Piston 3
To return a specific level of subparts for a given part, you can use the GetAncestor
method. You pass this method an integer value indicating the level below the parent you
want to display. The function returns the Hierarchyid value of the ancestor n levels above
the queried node. For example, the following query returns all the subparts two levels
down from the drivetrain:
select child.partid, child.partname, child.lvl
from
parts_hierarchy as parent
inner join
parts_hierarchy as child
on parent.partname = 'Drivetrain'
and child.hid.GetAncestor(2) = parent.hid
go
partid partname lvl
----------- ------------------------------ ------
9 Flywheel 3
10 Clutch 3
16 Gear Box 3
5 Radiator 3
6 Intake Manifold 3
7 Exhaust Manifold 3
8 Carburetor 3
13 Piston 3
14 Crankshaft 3
Modifying the Hierarchy
The script in Listing 42.15 performs the initial population of the Parts_hierarchy table.
What if you need to add additional records into the table? Let's look at how to use the
GetDescendant method to add new records at different levels of the hierarchy.
For example, to add a child part to the Body node (node /2/), you can use the
GetDescendant method without any arguments to add the new row below Body node at
node /2/1/:
INSERT Parts_hierarchy (hid, partid, partname)
select hid.GetDescendant(null, null), 25, 'left front fender'
from Parts_hierarchy
where partname = 'Body'
To add a new row as a higher descendant node at the same level as the left front fender
inserted in the previous example, you use the GetDescendant method again, but this time
passing the Hierarchyid of the existing child node as the first parameter. This specifies
that the new node will follow the existing node, becoming /2/2/. There are a couple of
ways to specify the Hierarchyid of the existing child node. You can retrieve it from the
table as a Hierarchyid data type, or if you know the string representation of the node,
you can use the Parse method. The Parse method converts a canonical string representation
of a hierarchical value to Hierarchyid. Parse is also called implicitly when a conversion
from a string type to Hierarchyid occurs, as in CAST (input AS hierarchyid). Parse is
essentially the opposite of the ToString method.
INSERT Parts_hierarchy (hid, partid, partname)
select hid.GetDescendant(hierarchyid::Parse('/2/1/'), null), 26,
    'right front fender'
from Parts_hierarchy
where partname = 'Body'
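As a quick aside (an added sketch, not from the original text), you can see that Parse and ToString are inverses, and that CAST invokes Parse implicitly:
DECLARE @h hierarchyid = hierarchyid::Parse('/2/1/')
SELECT @h.ToString() AS explicit_parse,                           -- '/2/1/'
       CAST('/2/1/' AS hierarchyid).ToString() AS implicit_parse  -- '/2/1/'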
Now, what if you need to add a new node between the two existing nodes you just added?
Again, you use the GetDescendant method, but this time, you pass it the hierarchy IDs of
both existing nodes between which you want to insert the new node:
declare @child1 hierarchyid,
@child2 hierarchyid
select @child1 = hid from Parts_hierarchy where partname = 'left front fender'
select @child2 = hid from Parts_hierarchy where partname = 'right front fender'
INSERT Parts_hierarchy (hid, partid, partname)
select hid.GetDescendant(@child1, @child2), 27, 'front bumper'
from Parts_hierarchy
where partname = 'Body'
Now, let's run a query of the Body subtree to examine the newly inserted child nodes:
select child.partid, child.partname, child.lvl,
substring(child.hid.ToString(), 1, 12) as path
from
parts_hierarchy as parent
inner join
parts_hierarchy as child
on parent.partname = 'Body'
and child.hid.IsDescendantOf(parent.hid) = 1
order by child.hid
go
partid partname lvl path
----------- ------------------------------ ------ ------------
23 Body 1 /2/
25 left front fender 2 /2/1/
27 front bumper 2 /2/1.1/
26 right front fender 2 /2/2/
Notice that the first child added (left front fender) has a node path of /2/1/, and the
second row added (right front fender) has a node path of /2/2/. The new child node
inserted between these two nodes (front bumper) was given a node path of /2/1.1/ so
that it maintains the designated topological ordering of the nodes.
What if you need to make other types of changes within hierarchies? For example, you
might need to move a whole subtree of parts from one part to another (that is, move a
part and all its subordinates). To move nodes or subtrees in a hierarchy, you can use the
GetReparentedValue method of the Hierarchyid data type. You invoke this method on
the Hierarchyid value of the node you want to reparent and provide as inputs the value
of the old parent and the value of the new parent.
Note that this method doesn't change the Hierarchyid value for the existing node that
you want to move. Instead, it returns a new Hierarchyid value that you can use to update
the target node's Hierarchyid value. Logically, the GetReparentedValue method simply
substitutes the part of the existing node's path that represents the old parent's path with
the new parent's path. For example, if the path of the existing node is /1/2/1/, the path
of the old parent is /1/2/, and the path of the new parent is /2/1/3/, the
GetReparentedValue method would return /2/1/3/1/.
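You can verify this path substitution directly with a small added sketch (not from the original text); the string arguments are implicitly parsed to Hierarchyid values:
DECLARE @node hierarchyid = '/1/2/1/'
-- Substitutes the old parent path /1/2/ with the new parent path /2/1/3/
SELECT @node.GetReparentedValue('/1/2/', '/2/1/3/').ToString()   -- returns /2/1/3/1/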
You have to be careful, though. If the target parent node already has child nodes, the
GetReparentedValue method may not produce a unique hierarchy path. If you reparent
node /1/2/1/ from old parent /1/2/ to new parent /2/1/3/, and /2/1/3/ already has a
child /2/1/3/1/, you generate a duplicate value. To avoid this situation when moving a
single node from one parent to another, you should not use the GetReparentedValue
method but instead use the GetDescendant method to produce a completely new value for
the single node. For example, lets assume you want to move the Flywheel part from the
Transmission node to the Engine node. A sample approach is shown in Listing 42.16. This
example uses the GetDescendant method to generate a new Hierarchyid under the Engine
node following the last child node and updates the hid column for the Flywheel record to
the new Hierarchyid generated.
LISTING 42.16 Moving a Single Node in a Hierarchy
declare @newhid hierarchyid,
@maxchild hierarchyid
-- first, find the max child node under the Engine node
-- this is the node we will move the Flywheel node after
select @maxchild = max(child.hid)
from
parts_hierarchy as parent
inner join
parts_hierarchy as child
on parent.partname = 'Engine'
and child.hid.GetAncestor(1) = parent.hid
select 'Child to insert after' = @maxchild.ToString()
-- Now, generate a new descendant hid for the Engine node
-- after the max child node
select @newhid = hid.GetDescendant(@maxchild, null)
from Parts_hierarchy
where partname = 'Engine'
-- Update the hid for the Flywheel node to the new hid
update Parts_hierarchy
set hid = @newhid
where partname = 'Flywheel'
go
Child to insert after
----------------------
/1/1/6/
If you need to move an entire subtree within a hierarchy, you can use the
GetReparentedValue method in conjunction with the GetDescendant method. For
example, suppose you want to move the whole Engine subtree from its current parent node
of Drivetrain to the new parent node of Car. The Car node obviously already has children.
If you want to avoid conflicts, the best approach is to generate a new Hierarchyid value for
the root node of the subtree. You can achieve this with the following steps:
1. Use the GetDescendant method to produce a completely new Hierarchyid value for
the root node of the subtree.
2. Update the Hierarchyid value of all nodes in the subtree to the value returned by
the GetReparentedValue method.
Because you are generating a completely new Hierarchyid value under the target parent,
this new child node has no existing children, which avoids any duplicate Hierarchyid
values. Listing 42.17 provides an example for changing the parent node of the Engine
subtree from Drivetrain to Car.
LISTING 42.17 Reparenting a Subtree in a Hierarchy
DECLARE
@old_root AS HIERARCHYID,
@new_root AS HIERARCHYID,
@new_parent_hid AS HIERARCHYID,
@max_child as hierarchyid
-- Get the hid of the new parent
select @new_parent_hid = hid
FROM dbo.parts_hierarchy
WHERE partname = 'Car'
-- Get the hid of the current root of the subnode
Select @old_root = hid
FROM dbo.parts_hierarchy
WHERE partname = 'Engine'
-- Get the max hid of child nodes of the new parent
select @max_child = MAX(hid)
FROM parts_hierarchy
WHERE hid.GetAncestor(1) = @new_parent_hid
-- get a new hid for the moving child node
-- that is after the current max child node of the new parent
SET @new_root = @new_parent_hid.GetDescendant (@max_child, null)
-- Next, reparent the moving child node and all descendants
UPDATE dbo.parts_hierarchy
SET hid = hid.GetReparentedValue(@old_root, @new_root)
WHERE hid.IsDescendantOf(@old_root) = 1
Now, let's reexamine the hierarchy after the updates made in Listings 42.16 and 42.17:
SELECT
left(REPLICATE('--', lvl)
+ right('>',lvl)
+ partname, 30) AS partname,
hid.ToString() AS path
FROM Parts_hierarchy
order by hid
go
partname path
------------------------------ ------------
Car /
-->DriveTrain /1/
---->Transmission /1/2/
------>Clutch /1/2/2/
------>Gear Box /1/2/3/
-------->Reverse Gear /1/2/3/1/
-------->First Gear /1/2/3/2/
-------->Second Gear /1/2/3/3/
-------->Third Gear /1/2/3/4/
-------->Fourth Gear /1/2/3/5/
---->Axle /1/3/
---->Drive Shaft /1/4/
-->Body /2/
---->left front fender /2/1/
---->front bumper /2/1.1/
---->right front fender /2/2/
-->Frame /3/
-->Engine /4/
---->Radiator /4/1/
---->Intake Manifold /4/2/
---->Exhaust Manifold /4/3/
---->Carburetor /4/4/
------>Float Valve /4/4/1/
---->Piston /4/5/
------>Piston Rings /4/5/1/
---->Crankshaft /4/6/
---->Flywheel /4/7/
As you can see from the results, the Flywheel node is now under the Engine node, and the
entire Engine subtree is now under the Car node.
Using FILESTREAM Storage
In versions of SQL Server prior to SQL Server 2008, there were two ways of storing
unstructured data: as a binary large object (BLOB) in an image or varbinary(max) column,
or in files outside the database, separate from the structured relational data, storing a
reference or pathname to the file in a varchar column. Neither of these methods is ideal for
handling unstructured data. Storing the data outside the database makes managing the
unstructured data and keeping it associated with structured data more complex. This
approach lacks transactional consistency, coordinating backups and restores with the
structured data in the database is difficult, and implementing proper data security can be
quite cumbersome.
Storing the unstructured data in the database solves the transactional consistency,
backup/restore, and security issues, but BLOBs have different usage patterns than
relational data. SQL Server's storage engine is primarily concerned with doing I/O on
relational data stored in pages and extents, not streaming large BLOBs. I/O performance
typically degrades dramatically if the size of the BLOB data increases beyond 1MB.
Accessing BLOB data stored inside a SQL Server database is generally slower than storing it
externally in a location such as the NTFS file system. In addition, BLOB storage is not as
efficient as the file system for storing large data values, so more storage space is required.
FILESTREAM storage, introduced in SQL Server 2008, helps to solve the issues with using
unstructured data by integrating the SQL Server Database Engine with the NTFS file
system for storing unstructured data such as documents and images on the file system
with a pointer to the data in the database. The file pointer is implemented in SQL Server
as a varbinary(max) column, and the actual data is stored in files in the file system.
In addition to enabling client applications to leverage the rich NTFS streaming APIs and
the performance of the file system for storing and retrieving unstructured data, other
advantages of FILESTREAM storage include the following:
. You are able to use T-SQL statements to insert, update, query, and back up
FILESTREAM data even though the actual data resides outside the database in the
NTFS file system.
. You are able to maintain transactional consistency between the unstructured data
and corresponding structured data.
. You are able to enforce the same level of security on the unstructured data as with
your relational data using built-in SQL Server security mechanisms.
. FILESTREAM uses the NT system cache for caching file data rather than caching the
data in the SQL Server buffer pool, leaving more memory available for query
processing.
. FILESTREAM storage also eliminates the size limitation of BLOBs stored in the
database. Whereas standard image and varbinary(max) columns have a size limitation of
2GB, the sizes of the FILESTREAM BLOBs are limited only by the available space of
the file system.
Columns with the FILESTREAM attribute set can be managed just like any other BLOB
column in SQL Server. Administrators can use the manageability and security capabilities
of SQL Server to integrate FILESTREAM data management with the rest of the data in the
relational database, without needing to manage the file system data separately. This
includes maintenance operations such as backup and restore, complete integration with
the SQL Server security model, and full-transaction support to ensure data-level consistency
between the relational data in the database and the unstructured data physically
stored on the file system.
Whether you should use database storage or file system storage for your BLOB data is
determined by the size and use of the unstructured data. If the following conditions are
true, you should consider using FILESTREAM:
. The objects being stored as BLOBs are, on average, larger than 1MB.
. Fast read access is important.
. You are developing applications that use a middle tier for application logic.
Enabling FILESTREAM Storage
If you decide to use FILESTREAM storage, it first needs to be enabled at both the Windows
level and the SQL Server instance level. FILESTREAM storage can be enabled
automatically during SQL Server installation or manually after installation.
If you are enabling FILESTREAM during SQL Server installation, you need to provide the
Windows share location where the FILESTREAM data will be stored. You can also choose
whether to allow remote clients to access the FILESTREAM data. For more information on
how to enable FILESTREAM storage during installation, see Chapter 8, "Installing SQL
Server 2008."
If you did not enable the FILESTREAM option during installation, you can enable it for a
running instance of SQL Server 2008 at any time using SQL Server Configuration Manager
(SSCM). In SSCM, right-click on the SQL Server Service and select Properties. Then select
the FILESTREAM tab, which provides similar options as those displayed during SQL Server
installation (see Figure 42.1). This enables SQL Server to work directly with the Windows
file system for storing FILESTREAM data.
FIGURE 42.1 Setting FILESTREAM options in SQL Server Configuration Manager.
You have three options for how FILESTREAM
functionality will be enabled:
. Allowing only T-SQL access (by checking only the Enable FILESTREAM for
Transact-SQL Access option).
. Allowing both T-SQL and Win32 access to FILESTREAM data (by checking the Enable
FILESTREAM for File I/O Streaming Access option and providing a Windows share
name to be used to access the FILESTREAM data). This allows Win32 file system
interfaces to provide streaming access to the data.
. Allowing remote clients to have access to the FILESTREAM data that is stored on this
share (by selecting the Allow Remote Clients to Have Streaming Access to
FILESTREAM Data option).
NOTE
You need to be a Windows administrator on the local system and have sysadmin rights to
enable FILESTREAM for SQL Server.
After you enable FILESTREAM in SQL Server Configuration Manager, a new share is
created on the host system with the name specified. This share is intended only to allow
very low-level streaming interaction between SQL Server and authorized clients. It is
recommended that only the service account used by the SQL Server instance should have
access to this share. Also, because this change takes place at the OS level and not from
within SQL Server, you need to stop and restart the SQL Server instance for the change to
take effect.
After restarting the SQL Server instance to enable FILESTREAM at the Windows OS level,
you next need to enable FILESTREAM for the SQL Server Instance. You can do this either
through SQL Server Management Studio or via T-SQL. To enable FILESTREAM for the SQL
Server instance using SQL Server Management Studio, right-click on the SQL Server
instance in the Object Explorer, select Properties, select the Advanced page, and set the
Filestream Access Level property as shown in Figure 42.2. The available options are
. Disabled (0): FILESTREAM access is not permitted.
. Transact SQL Access Enabled (1): FILESTREAM data can be accessed only by
T-SQL commands.
. Full Access Enabled (2): Both T-SQL and Win32 access to FILESTREAM data are
permitted.
You can also optionally enable FILESTREAM for the SQL Server instance using the
sp_configure system procedure, specifying 'filestream access level' as the setting
and passing the option of 0 (disabled), 1 (T-SQL access), or 2 (full access). The following
example shows full access being enabled for the current SQL Server instance:
EXEC sp_configure 'filestream access level', 2
GO
RECONFIGURE
GO
FIGURE 42.2 Enabling FILESTREAM for a SQL Server Instance in SSMS.
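To confirm that the setting took effect, a quick check (not shown in the original listing) is to query the configured and running values from sys.configurations:
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'filestream access level'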
After you configure the SQL Server instance for FILESTREAM access, the next step is to set
up a database to store FILESTREAM data.
Setting Up a Database for FILESTREAM Storage
After you enable FILESTREAM for the SQL Server instance, you can store FILESTREAM data
in a database by creating a FILESTREAM filegroup. You can do this when creating the
database or by adding a new filegroup to an existing database. The filegroup designated for
FILESTREAM storage must be defined with the CONTAINS FILESTREAM clause. The
code in Listing 42.18 creates the Customer database and then adds a FILESTREAM filegroup.
LISTING 42.18 Setting Up a Database for FILESTREAM Storage
CREATE DATABASE Customer
ON ( NAME = Customer_Data,
FILENAME = 'C:\SQLData\Customer_Data1.mdf',
SIZE = 50,
MAXSIZE = 100,
FILEGROWTH = 10)
LOG ON ( NAME = Customer_Log,
FILENAME = 'C:\SQLData\Customer_Log.ldf',
SIZE = 50,
FILEGROWTH = 20%)
GO
ALTER DATABASE Customer
ADD FILEGROUP Cust_FSGroup CONTAINS FILESTREAM
GO
ALTER DATABASE Customer
ADD FILE ( NAME = custinfo_FS,
FILENAME = 'G:\SQLData\custinfo_FS')
TO FILEGROUP Cust_FSGroup
GO
Notice in Listing 42.18 that the FILESTREAM filegroup points to a file system folder rather than
an actual file. This folder must not already exist (although the path up to the folder must
exist); SQL Server creates the FILESTREAM folder (for example, in Listing 42.18, the
custinfo_FS folder is created automatically by SQL Server in the G:\SQLData folder). The
FILESTREAM files and file data actually end up being stored in the created folder. A
FILESTREAM filegroup is restricted to referencing only a single file folder.
Using FILESTREAM Storage for Data Columns
Once FILESTREAM storage is enabled for a database, you can specify the FILESTREAM
attribute on a varbinary(max) column to indicate that a column should store data in the
FILESTREAM filegroup on the file system. When columns are defined with the FILESTREAM
attribute, the Database Engine stores all data for that column on the file system instead of
in the database file. In addition to a varbinary(max) column with the FILESTREAM
attribute, tables used to store FILESTREAM data also require the existence of a UNIQUE
ROWGUIDCOL, as shown in Listing 42.19, which creates a custinfo table on the FILESTREAM
filegroup. CUSTDATA is defined as the FILESTREAM column, and ID is defined as the unique
ROWGUID column.
LISTING 42.19 Creating a FILESTREAM-Enabled Table
CREATE TABLE CUSTINFO
(ID UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE,
CUSTDATA VARBINARY (MAX) FILESTREAM NULL )
FILESTREAM_ON Cust_FSGroup
Each table created with one or more FILESTREAM columns gets a new subfolder in the FILESTREAM
filegroup folder, and each FILESTREAM column in the table creates a separate subfolder
under the table folder. These column folders are where the actual FILESTREAM files are
stored. Initially, these folders are empty until you start adding rows into the table. A file is
created in the column subfolder for each row inserted into the table with a non-NULL value
for the FILESTREAM column.
NOTE
For more detailed information on how FILESTREAM data is stored and managed, see
Chapter 34.
To ensure that SQL Server creates a new, blank file within the FILESTREAM storage folder
for each row inserted in the table, you can specify a default value of 0x for the
FILESTREAM column:
alter table CUSTINFO add constraint custdata_def default 0x for CUSTDATA
Creating a default is not required if all access to the FILESTREAM data is going to be done
through T-SQL. However, if you will be using Win32 streaming clients to upload file
contents into the FILESTREAM column, the file needs to exist already. Without the default
to ensure creation of a blank file for each row, new files would have to be created first
by inserting contents directly through T-SQL before they could be accessed via Win32
client streaming applications.
To insert data into a FILESTREAM column, you use a normal INSERT statement and provide
a varbinary(max) value to store into the FILESTREAM column:
INSERT CUSTINFO (ID, CUSTDATA)
VALUES (NEWID(), CONVERT(VARBINARY(MAX), REPLICATE ('CUST DATA', 100000)))
To retrieve FILESTREAM data, you can use a simple T-SQL SELECT statement, although you
may need to convert the varbinary(max) to varchar to be able to display text data:
select ID, CONVERT(varchar(40), CUSTDATA) as CUSTDATA
from CUSTINFO
go
ID CUSTDATA
------------------------------------ ----------------------------------------------
FA67BF05-51B5-4BA7-A383-7F88DAAE9C49 CUST DATACUST DATACUST DATACUST DATACUST
The preceding examples work fine if the FILESTREAM data is essentially text data;
however, neither SQL Server Management Studio nor SQL Server itself really has any user
interface, or native way, to let you stream the contents of an actual file into a table that's
been marked with the FILESTREAM attribute on one of your varbinary(max) columns. In
other words, if you have a .jpg or .mp3 file that you want to store within SQL Server,
there's no native functionality to convert that image's byte stream into something that
you could put, for example, into a simple INSERT statement. To read or store this type of
data, you need to use Win32 to read and write data to a FILESTREAM BLOB. Following are
the steps you need to perform in your client applications:
1. Read the FILESTREAM file path.
2. Read the current transaction context.
3. Obtain a Win32 handle and use the handle to read and write data to the
FILESTREAM BLOB.
Each cell in a FILESTREAM table has a file path associated with it. You can use the PathName()
method to retrieve the file path of a varbinary(max) column in a T-SQL statement:
DECLARE @filePath varchar(max)
SELECT @filePath = CUSTDATA.PathName()
FROM CUSTINFO
WHERE ID = 'FA67BF05-51B5-4BA7-A383-7F88DAAE9C49'
PRINT @filePath
go
\\LATITUDED830-W7\FILESTREAM\v1\Customer\dbo\CUSTINFO\CUSTDATA
\FA67BF05-51B5-4BA7-A383-7F88DAAE9C49
Next, to obtain the current transaction context and return it to the client application, use
the GET_FILESTREAM_TRANSACTION_CONTEXT() T-SQL function:
BEGIN TRAN
SELECT GET_FILESTREAM_TRANSACTION_CONTEXT()
After you obtain the transaction context, the next step in your application code is to
obtain a Win32 file handle to read or write the data to the FILESTREAM column. To obtain
a Win32 file handle, you call the OpenSqlFilestream API. The returned handle can then be
passed to any of the following Win32 APIs to read and write data to a FILESTREAM BLOB:
. ReadFile
. WriteFile
. TransmitFile
. SetFilePointer
. SetEndOfFile
. FlushFileBuffers
To summarize, the steps you perform to upload a file to a FILESTREAM column are as follows:
1. Start a new transaction and obtain the transaction context ID that can be used to
initiate the Win32 file-streaming process.
2. Execute a SqlDataReader connection to pull back the full path (in SQL Server) of the
FILESTREAM file to which you will be uploading data.
3. Initiate a straight file-streaming operation using the
System.Data.SqlTypes.SqlFileStream class.
4. Create a new System.IO.FileStream object to read the file locally and buffer bytes
along to the SqlFileStream object until there are no more bytes to transfer.
5. Close the transaction.
NOTE
Because you're streaming file contents via a Win32 process, you need to use integrated
security to connect to SQL Server because native SQL logins can't generate the
needed security tokens to access the underlying file system where the FILESTREAM
data is stored.
To retrieve data from a FILESTREAM column to a file on the client, you primarily follow the
same steps as you do for inserting data; however, instead you pull data from a
SqlFileStream object into a buffer and push it into a local System.IO.FileStream object until there
are no more bytes left to retrieve.
TIP
Refer to the "Managing FILESTREAM Data by Using Win32" topic in SQL Server 2008
R2 Books Online for specific C#, Visual Basic, and Visual C++ application code examples
showing how to obtain a Win32 file handle and use it to read and write data to a
FILESTREAM column.
Sparse Columns
SQL Server 2008 provides a new space-saving storage option referred to as sparse columns.
Sparse columns can provide optimized and efficient storage for columns that contain
predominately NULL values. The NULL values require no storage space, but these space
savings come at a cost of increased space for storing non-NULL values (an additional 24
bytes of space is needed for non-NULL values). For this reason, Microsoft recommends
using sparse columns only when the space saved is at least 20% to 40%. However, the
consensus rule of thumb that is emerging from experience with sparse columns is that it is
best to use them only when more than 90% of the values are NULL.
There are a number of restrictions and limitations regarding the use of sparse columns,
including the following:
. Sparse columns cannot be defined with the ROWGUIDCOL or IDENTITY properties.
. Sparse columns cannot be defined with a default value.
. Sparse columns cannot be used in a user-defined table type.
. Although sparse columns allow up to 30,000 columns per table, the total row size is
reduced to 8,018 bytes due to the additional overhead for sparse columns.
. If a table has sparse columns, you can't compress it at either the row or page level.
. Columns defined with the geography, geometry, text, ntext, timestamp, image, or
user-defined data types cannot be defined as sparse columns.
. You can't define varbinary(max) fields that use FILESTREAM storage as sparse
columns.
. You can't define a computed column as sparse, but you can use a sparse column in
the calculation of a computed column.
. A table cannot have more than 1,024 non-sparse columns.
Column Sets
Column sets provide an alternative way to view and work with all the sparse columns in a
table. The sparse columns are aggregated into a single untyped XML column, which
simplifies working with many sparse columns in a table. The XML column used for a
column set is similar to a computed column in that it is not physically stored, but unlike
computed columns, it is updateable.
There are some restrictions on column sets:
. You cannot add a column set to a table that already has sparse columns.
. You can define only one column set per table.
. Constraints or default values cannot be defined on a column set.
. Computed columns cannot contain column set columns.
. A column set cannot be changed; you must delete and re-create the column set.
However, sparse columns can be added to the table after a column set has been
defined, and they are automatically included in the column set.
. Distributed queries, replication, and Change Data Capture do not support column sets.
. A column set cannot be part of any kind of index, including XML indexes, full-text
indexes, and indexed views.
NOTE
Sparse columns and column sets are defined by using the CREATE TABLE or ALTER
TABLE statements. This chapter focuses on using and working with sparse columns.
For more information on defining sparse columns and column sets, see Chapter 24,
"Creating and Managing Tables."
Working with Sparse Columns
Querying and manipulation of sparse columns is the same as for regular columns, with
one exception described later in this chapter. There's nothing functionally different about
a table that includes sparse columns, except the way the sparse columns are stored. You
can still use all the standard INSERT, UPDATE, and DELETE statements on tables with sparse
columns just like a table that doesn't have sparse columns. You can also wrap operations
on a table with sparse columns in transactions as usual.
To work with sparse columns, let's first create a table with sparse columns. Listing 42.20
creates a version of the Product table in the AdventureWorks2008R2 database and then
populates the table with data from the Production.Product table. The Color, Weight, and
SellEndDate columns are defined as sparse columns (the source data contains a significant
number of NULL values for these columns). These columns are also defined as part of the
column set, ProductInfo.
LISTING 42.20 Creating a Table with Sparse Columns
USE AdventureWorks2008R2
GO
CREATE TABLE Product_sparse
(
ProductID INT NOT NULL PRIMARY KEY,
ProductName NVARCHAR(50) NOT NULL,
Color NVARCHAR(15) SPARSE NULL,
Weight DECIMAL(8,2) SPARSE NULL,
SellEndDate DATETIME SPARSE NULL,
ProductInfo XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
)
GO
INSERT INTO Product_sparse
(ProductID, ProductName, Color, Weight, SellEndDate)
SELECT ProductID, Name, Color, Weight, SellEndDate
FROM Production.Product
GO
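Before querying the data, a quick way to confirm the definitions (an added sketch using the catalog views, not part of the original listing) is to check the is_sparse and is_column_set flags in sys.columns:
SELECT name, is_sparse, is_column_set
FROM sys.columns
WHERE object_id = OBJECT_ID('Product_sparse')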
You can reference the sparse columns in your queries just as you would any type of column:
SELECT productID, productName, Color, Weight, SellEndDate
FROM Product_sparse
where ProductID < 320
go
productID productName Color Weight SellEndDate
--------- --------------------- ------------ ------------- -----------
1 Adjustable Race NULL NULL NULL
2 Bearing Ball NULL NULL NULL
3 BB Ball Bearing NULL NULL NULL
4 Headset Ball Bearings NULL NULL NULL
316 Blade NULL NULL NULL
317 LL Crankarm Black NULL NULL
318 ML Crankarm Black NULL NULL
319 HL Crankarm Black NULL NULL
Note, however, that if you use SELECT * in a query and the table has a column set defined
for the sparse columns, the column set is returned as a single XML column instead of the
individual columns:
SELECT *
FROM Product_sparse
where ProductID < 320
go
ProductID ProductName ProductInfo
----------- ---------------------- ----------------------------------
1 Adjustable Race NULL
2 Bearing Ball NULL
3 BB Ball Bearing NULL
4 Headset Ball Bearings NULL
316 Blade NULL
317 LL Crankarm <Color>Black</Color>
318 ML Crankarm <Color>Black</Color>
319 HL Crankarm <Color>Black</Color>
You need to explicitly list the columns in the SELECT clause to have the result columns
returned as relational columns.
When the column set is defined, you can also operate on the column set by using XML
operations instead of relational operations. For example, the following code inserts a row
into the table by using the column set and specifying a value for Weight as XML:
INSERT Product_sparse(ProductID, ProductName, ProductInfo)
VALUES(5, 'ValveStem', '<Weight>.12</Weight>')
go
SELECT productID, productName, Color, Weight, SellEndDate
FROM Product_sparse
where productID = 5
go
productID productName Color Weight SellEndDate
----------- ----------- ----- ------ -----------
5 ValveStem NULL 0.12 NULL
Notice that NULL is assumed for any column omitted from the XML value, such as Color
and SellEndDate in this example.
When updating a column set using an XML value, you must include values for all the
columns in the column set that you want set, including any existing values you want to
keep. Any values not specified in the XML string are set to NULL. For example, the
following query sets both Color and Weight where ProductID = 5:
Update Product_sparse
set ProductInfo = '<Color>black</Color><Weight>.20</Weight>'
where productID = 5
SELECT productID, productName, Color, Weight, SellEndDate
FROM Product_sparse
where productID = 5
go
productID productName Color Weight SellEndDate
----------- ----------- ----- ------ -----------
5 ValveStem black 0.20 NULL
Now, if you run another update but only specify a value for Weight in the XML string, the
Color column is set to NULL:
Update Product_sparse
set ProductInfo = '<Weight>.10</Weight>'
where productID = 5
SELECT productID, productName, Color, Weight, SellEndDate
FROM Product_sparse
where productID = 5
go
productID productName Color Weight SellEndDate
----------- ----------- ----- ------ -----------
5 ValveStem NULL 0.10 NULL
However, if you reference the sparse columns explicitly in an UPDATE statement, the other
values remain unchanged:
Update Product_sparse
set Color = 'silver'
where ProductID = 5
SELECT productID, productName, Color, Weight, SellEndDate
FROM Product_sparse
where productID = 5
go
productID productName Color Weight SellEndDate
----------- ----------- ------ ------ -----------
5 ValveStem silver 0.10 NULL
Column sets are most useful when you have many sparse columns in a table (for example,
hundreds) and operating on them individually is cumbersome. Your client applications
may more easily and efficiently generate the appropriate XML string to populate the
column set rather than your having to build an UPDATE statement dynamically to
determine which of the sparse columns need to be included in the SET clause. Applications
might actually see some performance improvement when they select, insert, or update
data by using column sets on tables that have lots of columns.
Sparse Columns: Good or Bad?
There is some disagreement in the SQL Server community whether or not sparse columns
are appropriate. A number of professionals are of the opinion that any table design that
requires sparse columns is a bad design that does not follow good relational design
guidelines. Sparse columns, by their nature, are heavily denormalized. On the other hand,
many times you have to live in the real world and make the best of a bad database design
that you've inherited. Sparse columns can help solve performance and storage issues in
databases that may have been poorly designed.
Although sparse columns can solve certain kinds of problems with database design, you
should never use them as an alternative to proper database and table design. As cool as
sparse columns are, they aren't appropriate for every scenario, particularly when you're
tempted to violate normalization rules to be able to cram more fields into a table.
Spatial Data Types
SQL Server's support of SQLCLR allows for very rich user-defined types to be utilized. For
example, a developer could create a single object that contains multiple properties and
can also perform calculations internally (methods), yet still store it in a single column in a
single row in a database table. This allows multiple complex types of data to be stored and
queried in the database, instead of just strings and numbers.
SQL Server 2008 makes use of SQLCLR to support two new .NET CLR data types for
storing spatial data: GEOMETRY and GEOGRAPHY. These types support methods and properties
that allow for the creation, comparison, analysis, and retrieval of spatial data. Spatial data
types provide a comprehensive, high-performance, and extensible data storage solution for
spatial data, enabling organizations of any scale to integrate geospatial features into their
applications and services.
The GEOMETRY data type is a .NET CLR data type that supports the planar model/data,
which assumes a flat projection and is therefore sometimes called "flat earth." Geometry
data represents information in a uniform two-dimensional plane as points, lines, and
polygons on a flat surface, such as maps and interior floor plans where the curvature of
the earth does not need to be taken into account. For example, perhaps your user-defined
coordinate space is being used to represent a warehouse facility. Within that coordinate
space, you can use the GEOMETRY data type to define areas that represent storage bays
within the warehouse. You can then store data in your database that tracks which inven-
tory is located in which area. You could then query the data to determine which forklift
driver is closest to a certain type of item, for example.
The GEOGRAPHY data type provides a storage structure for geodetic data, sometimes referred
to as "round-earth" data because it assumes a roughly spherical model of the world. It
provides a storage structure for spatial data that is defined by latitude and longitude
coordinates using an industry standard ellipsoid such as WGS84, the projection method used
by Global Positioning System (GPS) applications. The SQL Server GEOGRAPHY data type uses
latitude and longitude angles to identify points on the earth. Latitude measures how far
north (or south) of the equator a point is, while longitude measures how far east (or west)
of a prime meridian a point is. Note that this coordinate system can be used to identify
points on any spherical object, be it a baseball, the earth, or even the moon.
The GEOMETRY and GEOGRAPHY data types support seven instance types that you can create
and work with in a database:
. POINT: A POINT is an exact location and is defined in terms of an X and Y pair of
coordinates, as well as optionally by Z (elevation) and M (measure) coordinates. It
does not have a length or any area associated with it. These instance types are used
as the fundamental building blocks of more complex spatial types.
. MULTIPOINT: A MULTIPOINT is a collection of zero or more points.
. LINESTRING: A LINESTRING is the path between a sequence of points (that is, a series
of connected line segments). It is considered simple if it does not cross over itself and
is considered a ring if the starting point is the same as the ending point. A
LINESTRING is always considered to be a one-dimensional object; it has length but
does not have area (even if it is a ring).
. MULTILINESTRING: A MULTILINESTRING is a collection of zero or more GEOMETRY or
GEOGRAPHY LINESTRING instances.
. POLYGON: A POLYGON is a closed two-dimensional shape defined by a ring. It has both
length and area and has at least three distinct points. A POLYGON may also have holes
in its interior (a hole is defined by another POLYGON). Area within a hole is considered
to be exterior to the POLYGON itself.
. MULTIPOLYGON: A MULTIPOLYGON instance is a collection of zero or more POLYGON
instances.
. GEOMETRYCOLLECTION: A GEOMETRYCOLLECTION is a collection of zero or more
GEOMETRY or GEOGRAPHY instances. A GEOMETRYCOLLECTION can be empty. This is similar
to a list or an array in most programming languages; it is the most generic type of
collection, and its members can be of any type.
Representing Spatial Data
The Open Geospatial Consortium, Inc. (OGC) is a nonprofit, international, voluntary
consensus standards organization that is leading the development of standards for
geospatial and location-based services. The OGC defines different ways to represent
geospatial information as bytes of data that can then be interpreted by the GEOMETRY or
GEOGRAPHY types as being POINTS, LINESTRINGS, and so on. SQL Server 2008 supports three
such formats:
. Well-Known Text (WKT)
. Well-Known Binary (WKB)
. Geography Markup Language (GML)
For the purposes of this chapter, we stick to WKT examples because they are both concise
and somewhat readable. The syntax of WKT is not too difficult to understand, so let's look
at some examples:
. POINT(10 100): Here, 10 and 100 represent X and Y values of the point.
. POINT(10 100 10 1): This example shows Z and M values in addition to X and Y.
. LINESTRING(0 0, 10 100): The first two values represent the starting point, and
the last two values represent the end point of the line.
. POLYGON((0 0, 0 10, 10 10, 10 0, 0 0)): Each pair of numbers represents a
point on the edge of the polygon. Note that the end point is the same as the starting
point.
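WKT can express the collection types from the earlier list as well. The following sketch (an added illustration, not from the original text) builds a GEOMETRYCOLLECTION and pulls out its members using the STNumGeometries() and STGeometryN() methods:
DECLARE @gc GEOMETRY
SET @gc = geometry::STGeomFromText(
    'GEOMETRYCOLLECTION(POINT(10 100), LINESTRING(0 0, 10 100))', 0)
-- STNumGeometries() returns 2; STGeometryN(2) extracts the LINESTRING member
SELECT @gc.STNumGeometries() AS member_count,
       @gc.STGeometryN(2).ToString() AS second_member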
Working with Geometry Data
As mentioned previously, the geometry data type is implemented as a common language
runtime (CLR) data type in SQL Server and is used to represent data in a Euclidean (flat)
coordinate system. The GEOMETRY type is predefined and available in each database. Any
variable, parameter, or table column can be declared with the GEOMETRY data type, and you
can operate on geometry data in the same manner as you would use other CLR types
using the built-in methods to create, validate, and query geometry data.
NOTE
SQL Server provides a number of methods for the GEOMETRY and GEOGRAPHY data types.
Covering all the available methods is beyond the scope of this chapter. The examples
provided here touch on some of the more common methods. For more information on
other GEOMETRY and GEOGRAPHY methods, refer to SQL Server 2008 Books Online.
To assign a value to a column or variable of type GEOMETRY, you must use one of the
static methods to parse the representation of the data into the spatial data type. For
example, to parse geometry data provided in a valid WKT syntax, you can use the
STGeomFromText method:
Declare @geom GEOMETRY
Declare @geom2 GEOMETRY
SET @geom = geometry::STGeomFromText('LINESTRING (100 100, 20 180, 180 180)', 0)
SET @geom2 = geometry::STGeomFromText
('POLYGON ((0 0, 150 0, 150 150, 0 150, 0 0))', 0)
NOTE
The last parameter passed to the method is the spatial reference ID (SRID) parameter.
The SRID is required. SQL Server 2008 does not perform calculations on pieces of
spatial information that belong to separate spatial reference systems (for example, if
one system uses centimeters and another uses miles, SQL Server simply does not
have the means to automatically convert units). For the GEOMETRY type, the default
SRID value is 0. The default SRID for GEOGRAPHY is 4326, which maps to the WGS 84
spatial reference system.
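As a brief illustration of this behavior (an added sketch, not from the original text), methods that combine two instances return NULL rather than an answer when the SRIDs differ:
DECLARE @a GEOMETRY = geometry::STGeomFromText('POINT(0 0)', 0)
DECLARE @b GEOMETRY = geometry::STGeomFromText('POINT(3 4)', 4326)
-- STDistance() returns NULL here because the SRIDs (0 and 4326) do not match
SELECT @a.STDistance(@b) AS mixed_srid_distance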
If you are declaring a LINESTRING specifically, you can use the STLineFromText static
method that accepts only valid LINESTRINGs as input:
Declare @geom GEOMETRY
SET @geom = geometry::STLineFromText('LINESTRING (100 100, 20 180, 180 180)', 0)
The GEOMETRY type, like other SQLCLR UDTs, supports implicit conversion to and from a
string. The string format supported by the GEOMETRY type for implicit conversion is WKT.
Due to this feature, all the following SET statements are functionally equivalent (the last
two SET statements use an implicit SRID of 0):
DECLARE @geom GEOMETRY
SET @geom = geometry::STLineFromText('LINESTRING (100 100, 20 180, 180 180)', 0)
set @geom = Geometry::Parse('LINESTRING (100 100, 20 180, 180 180)')
set @geom = 'LINESTRING (100 100, 20 180, 180 180)'
After defining a GEOMETRY instance, you can use the CLR UDT dot notation to access other
properties and methods of the instance. For example, the following code uses
the STLength() method to return the length of the LINESTRING:
DECLARE @geom GEOMETRY
SET @geom = geometry::STLineFromText('LINESTRING (100 100, 20 180, 180 180)', 0)
select @geom.STLength() as Length
go
Length
----------------------
273.137084989848
The following example uses the STIntersection() method to return the points where two
GEOMETRY instances intersect:
DECLARE @geom1 GEOMETRY;
DECLARE @geom2 GEOMETRY;
DECLARE @result GEOMETRY;
SET @geom1 = geometry::STGeomFromText('LINESTRING (100 100, 20 180, 180 180)', 0)
SET @geom2 = geometry::STGeomFromText('POLYGON ((0 0, 150 0, 150 150, 0 150, 0 0))', 0)
SELECT @result = @geom1.STIntersection(@geom2);
SELECT @result.STAsText();
go
----------------------------
LINESTRING (50 150, 100 100)
All the preceding examples use local variables in a batch. You also can declare columns in
a table with the GEOMETRY type, and you can use the instance properties and methods
against the columns as well:
CREATE TABLE #geom_demo
(
GeomID INT IDENTITY NOT NULL,
GeomCol GEOMETRY
)
INSERT INTO #geom_demo (GeomCol)
VALUES ('LINESTRING (100 100, 20 180, 180 180)'),
('POLYGON ((0 0, 150 0, 150 150, 0 150, 0 0))'),
('POINT(10 10)')
SELECT
GeomID,
GeomCol.ToString() AS WKT,
GeomCol.STLength() AS LENGTH,
GeomCol.STArea() as Area
FROM #geom_demo
drop table #geom_demo
go
GeomID WKT LENGTH Area
----------- -------------------------------------------- ----------------- ------
1 LINESTRING (100 100, 20 180, 180 180) 273.137084989848 0
2 POLYGON ((0 0, 150 0, 150 150, 0 150, 0 0)) 600 22500
3 POINT (10 10) 0 0
Working with Geography Data
The GEOGRAPHY data type is also implemented as a .NET common language runtime data
type in SQL Server. Unlike the GEOMETRY data type, in which locations are defined in terms
of X and Y coordinates that can conceivably extend to infinity, the GEOGRAPHY type
represents data in a round-earth coordinate system. Whereas flat models do not wrap around,
the round-earth coordinate system does wrap around such that if you start at a point on
the globe and continue in one direction, you eventually return to the starting point.
Because defining points on a ball using X and Y is not very practical, the GEOGRAPHY data
type instead defines points using angles. The SQL Server GEOGRAPHY data type stores
ellipsoidal (round-earth) data as GPS latitude and longitude coordinates. Longitude represents
the horizontal angle and ranges from -180 degrees to 180 degrees, and latitude represents
the vertical angle and ranges from -90 degrees to 90 degrees.
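Note that WKT lists the longitude coordinate first. The following added sketch (not from the original text) confirms this with the Lat and Long properties of the GEOGRAPHY type:
DECLARE @g GEOGRAPHY = geography::Parse('POINT(-122.343 47.656)')
-- In WKT the order is (longitude latitude): Lat returns 47.656, Long returns -122.343
SELECT @g.Lat AS latitude, @g.Long AS longitude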
The GEOGRAPHY data type provides similar built-in methods as the GEOMETRY data type that
you can use to create, validate, and query geography instances.
To assign a value to a geography column or variable, you can use the STGeomFromText
method to parse geography data provided in a valid WKT syntax into a valid
geography value:
Declare @geog GEOGRAPHY
Declare @geog2 GEOGRAPHY
SET @geog =
geography::STGeomFromText('LINESTRING(-122.360 47.656,
-122.343 47.656)', 4326)
SET @geog2 =
geography::STGeomFromText('POLYGON((-122.358 47.653,
-122.348 47.649,
-122.348 47.658,
-122.358 47.658,
-122.358 47.653))', 4326)
As with the GEOMETRY data type, you can also use the STLineFromText static method that
accepts only valid LINESTRINGS as input, or you can take advantage of the support for
implicit conversion of WKT strings:
DECLARE @geog GEOGRAPHY
SET @geog = geography::STLineFromText('LINESTRING (-122.360 47.656,
-122.343 47.656)', 4326)
set @geog = geography::Parse('LINESTRING (-122.360 47.656,
-122.343 47.656)')
set @geog = 'LINESTRING (-122.360 47.656, -122.343 47.656)'
GEOGRAPHY instances support the same instance methods. For comparison, recall the
earlier GEOMETRY example, which uses the STLength() method to return the length of the
LINESTRING in planar units:
DECLARE @geom GEOMETRY
SET @geom = geometry::STLineFromText('LINESTRING (100 100, 20 180, 180 180)', 0)
select @geom.STLength() as Length
go
Length
----------------------
273.137084989848
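Calling STLength() on a GEOGRAPHY instance works the same way, but the result is
expressed in meters along the ellipsoid rather than in planar units. The following
sketch illustrates this (output not shown; the value for this Seattle-area segment
works out to roughly 1.3 kilometers):
DECLARE @geog GEOGRAPHY
SET @geog = geography::STLineFromText('LINESTRING (-122.360 47.656,
-122.343 47.656)', 4326)
SELECT @geog.STLength() as LengthInMeters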
The preceding examples use local variables in a batch. You also can declare columns in a
table using the geography data type, and you can use the instance properties and methods
against the columns as well:
CREATE TABLE #geog
( id int IDENTITY (1,1),
GeogCol1 GEOGRAPHY,
GeogCol2 AS GeogCol1.STAsText() );
GO
INSERT INTO #geog (GeogCol1)
VALUES (geography::STGeomFromText
('LINESTRING(-122.360 47.656, -122.343 47.656)', 4326));
INSERT INTO #geog (GeogCol1)
VALUES (geography::STGeomFromText
('POLYGON((-122.358 47.653,
-122.348 47.649,
-122.348 47.658,
-122.358 47.658,
-122.358 47.653))', 4326));
GO
DECLARE @geog1 GEOGRAPHY;
DECLARE @geog2 GEOGRAPHY;
DECLARE @result GEOGRAPHY;
SELECT @geog1 = GeogCol1 FROM #geog WHERE id = 1;
SELECT @geog2 = GeogCol1 FROM #geog WHERE id = 2;
SELECT @result = @geog1.STIntersection(@geog2);
SELECT Intersection = @result.STAsText();
go
Intersection
-------------------------------------------------
-----------------------------------------
LINESTRING (-122.3479999999668 47.656000260658459
, -122.35799999998773 47.656000130309728)
Spatial Data Support in SSMS
When querying spatial data in SSMS, you'll find that SSMS has a built-in capability to plot
and display some basic maps of your spatial data.
To demonstrate this, you can run the following query in the AdventureWorks2008R2 or
AdventureWorks2008 database in SSMS:
select SpatialLocation
from person.Address a
inner join
person.StateProvince sp
on a.StateProvinceID = sp.StateProvinceID
and sp.CountryRegionCode = 'US'
After the query runs, you should see a Spatial Results tab next to the Results tab (see
Figure 42.3). Click on this tab, and the location points are plotted on a map. Select the
Bonne Projection. If you look closely, you can see that the geographical points plotted
roughly provide an outline of the United States. If you mouse over one of the points,
SSMS displays the associated address information stored in the Person.Address table.
In addition to displaying maps of geography data values, SSMS can also display geometry
data, showing lines, points, and polygons in an X-Y grid. For example, if you run the
following query and click on the Spatial Results tab, it should display a box like the one
shown in Figure 42.4:
declare @smallBox GEOMETRY = 'polygon((0 0, 0 2, 2 2, 2 0, 0 0))';
select @smallbox
If you want to display multiple polygons, points, or lines together at the same time, they
have to be returned as multiple rows in a single table. If you return them as multiple
columns, SSMS displays only one column at a time in the Spatial Results tab. For example,
if you run the following query, SSMS displays two boxes, the polygon defined by the inter-
section of the two boxes, as well as the overlapping line defined by the LineString, as
shown in Figure 42.5:
FIGURE 42.3 Displaying a map of Person.Address records in SSMS.
FIGURE 42.4 Displaying a polygon in SSMS.
FIGURE 42.5 Displaying intersecting polygons and an overlapping Line in SSMS.
declare @smallBox GEOMETRY = 'polygon((0 0, 0 2, 2 2, 2 0, 0 0))';
declare @largeBox GEOMETRY = 'polygon((1 1, 1 4, 4 4, 4 1, 1 1))';
declare @line GEOMETRY = 'linestring(0 2, 4 4)';
select @smallBox
union all
select @largeBox
union all
select @smallBox.STIntersection(@largeBox)
union all
select @line
Spatial Data Types: Where to Go from Here?
The preceding sections provide only a brief introduction to spatial data types and how to
work with geometry and geography data. For more information on working with spatial
data, in addition to Books Online, you might want to visit the Microsoft SQL Server 2008
Spatial Data page at http://www.microsoft.com/sqlserver/2008/en/us/spatial-data.aspx.
This page provides links to whitepapers and other technical documents related to working
with spatial data in SQL Server 2008.
In addition, all examples here deal with spatial data only as data values and coordinates.
Spatial data is often most useful when it can be displayed visually, such as on a map. SQL
Server 2008 R2 Reporting Services provides new map controls and a map wizard for
creating map reports based on spatial data. For more information, see Chapter 53, "SQL
Server 2008 Reporting Services."
Change Data Capture
In SQL Server 2008, Microsoft introduced a new feature called Change Data Capture
(CDC), which is designed to make it much easier and less resource intensive to identify
and retrieve changed data from tables in an online transaction processing (OLTP) data-
base. In a nutshell, CDC captures and records INSERT, UPDATE, and DELETE activity in an
OLTP database and stores it in a form that is easily consumed by an application, such as a
SQL Server Integration Services (SSIS) package.
In the past, capturing data changes for your tables for auditing or extract, transform, and
load (ETL) purposes required using replication, time stamp columns, triggers, complex
queries, or expensive third-party tools. None of these other methods is easy to implement,
and many of them use a lot of server resources, negatively affecting the
performance of the OLTP server.
Change Data Capture provides for a more efficient mechanism for capturing the data
changes in a table.
NOTE
Change Data Capture is available only in the SQL Server 2008 Developer, Enterprise,
and Datacenter Editions.
The source of change data for Change Data Capture is the SQL Server transaction log. As
inserts, updates, and deletes are applied to tables, entries that describe those changes are
added to the transaction log. When Change Data Capture is enabled for a database, a SQL
Server Agent capture job is created to invoke the sp_replcmds system procedure. This
procedure is an internal server function and is the same mechanism used by transactional
replication to harvest changes from the transaction log.
NOTE
If replication is already enabled for the database, the transaction log reader used for
replication is also used for CDC. This strategy significantly reduces log contention when
both replication and Change Data Capture are enabled for the same database.
The principal task of the Change Data Capture process is to scan the log and identify
changes to data rows in any tables configured for Change Data Capture. As these changes
are identified, the process writes column data and transaction-related information to the
Change Data Capture tables. The changes can then be read from these change tables to be
applied as needed.
The Change Data Capture Tables
When CDC is enabled for a database and one or more tables, an associated Change Data
Capture table is created for each table being monitored. The Change Data Capture tables
are used to store the changes made to the data in corresponding source tables, along with
some metadata used to track the changes. By default, the name of the CDC change table is
schemaname_tablename_CT and is based on the name of the source table.
The first five columns of a Change Data Capture change table are metadata columns and
contain additional information relevant to the recorded change:
. __$start_lsn: Identifies the commit log sequence number (LSN) assigned to the
change. This value can be used to determine the order of the transactions.
. __$end_lsn: Is currently not used and in SQL Server 2008 is always NULL.
. __$seqval: Can be used to order changes that occur within the same transaction.
. __$operation: Records the operation associated with the change: 1 = delete, 2 =
insert, 3 = update before image (delete), and 4 = update after image (insert).
. __$update_mask: Is a variable bit mask with one defined bit for each captured col-
umn to identify which columns were changed. For insert and delete entries, the
update mask always has all bits set. Update rows have the bits set only for the
columns that were modified.
The remaining columns in the Change Data Capture change table are identical to the
columns from the source table in name and type and are used to store the column data
gathered from the source table when an insert, update, or delete operation is performed
on the table.
For every row inserted into the source table, a single row is inserted into the
change table, and this row contains the column values inserted into the source table.
Every row deleted from the source table is also inserted as a single row into the change
table but contains the column values in the row before the delete operation. An update
operation is captured as a delete followed by an insert, so two rows are captured for each
update: one row entry to capture the column values before the update, and a second row
entry to capture the column values after the update.
In addition to the Change Data Capture tables, the following Change Data Capture meta-
data tables are also created:
. cdc.change_tables: Contains one row for each change table in the database, created
when Change Data Capture is enabled on a source table.
. cdc.index_columns: Contains one row for each index column used by Change Data
Capture to uniquely identify rows in the source table. By default, this is the column
of the primary key of the source table, but a different unique index on the source
table can be specified when Change Data Capture is enabled on the source table. A
primary key or unique index is required on the source table only if Net Change
Tracking is enabled.
. cdc.captured_columns: Contains one row for each column tracked in each source
table. By default, all columns of the source table are captured, but you can include or
exclude columns when enabling Change Data Capture for a table by specifying a
column list.
. cdc.ddl_history: Contains a row for each Data Definition Language (DDL) change
made to any table enabled for Change Data Capture. You can use this table to deter-
mine when a DDL change occurred on a source table and what the change was.
. cdc.lsn_time_mapping: Contains a row for each transaction stored in a change
table and is used to map between log sequence number (LSN) commit values and the
actual time the transaction was committed.
Although you can query the Change Data Capture tables directly, it is not recommended.
Instead, you should use the Change Data Capture functions, which are discussed later.
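The metadata tables themselves, however, are safe to query directly. For example, the
following sketch (runnable once CDC has been enabled, as shown in the following
sections) lists the capture instances defined in the current database:
SELECT capture_instance,
OBJECT_NAME(source_object_id) as source_table,
supports_net_changes,
create_date
FROM cdc.change_tables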
All these objects associated with a CDC instance are created in the special schema called
cdc when Change Data Capture is enabled for a database.
Enabling CDC for a Database
Before you can begin capturing data changes for a table, you must first enable the data-
base for Change Data Capture. You do this by running the stored procedure
sys.sp_cdc_enable_db within the desired database context. When a database is enabled
for Change Data Capture, the cdc schema, cdc user, and metadata tables are created for
the database, along with the system functions used to query for change data.
NOTE
To determine whether a database is already enabled for CDC, you can check the value
in the is_cdc_enabled column in the sys.databases catalog view. A value of 1 indi-
cates that CDC is enabled for the specified database.
The following SQL code enables CDC for the AdventureWorks2008R2 database and then
checks that CDC is enabled by querying the sys.databases catalog view:
use AdventureWorks2008R2
go
exec sys.sp_cdc_enable_db
go
select is_cdc_enabled
from sys.databases
where name = 'AdventureWorks2008R2'
go
is_cdc_enabled
--------------
1
NOTE
Although the examples presented here are run against the AdventureWorks2008R2 data-
base, they can also be run against the AdventureWorks2008 database. However, you
should be aware that some of the column values displayed may not be exactly the same.
Enabling CDC for a Table
When the database is enabled for Change Data Capture, you can use the
sys.sp_cdc_enable_table stored procedure to enable a Change Data Capture instance for
any tables in that database. The sys.sp_cdc_enable_table stored procedure supports the
following parameters:
. @source_schema: Specifies the name of the schema in which the source table
resides.
. @source_name: Specifies the name of the source table.
. @role_name: Indicates the name of the database role used to control access to
Change Data Capture tables. If this parameter is set to NULL, no role is used to limit
access to the change data. If the specified role does not exist, SQL Server creates a
database role with the specified name.
. @capture_instance: Specifies the name of the capture instance used to name the
instance-specific Change Data Capture objects. By default, this is the source schema
name plus the source table name in the format schemaname_sourcename. A source
table can have a maximum of two capture instances.
. @supports_net_changes: Is set to 1 or 0 to indicate whether support for querying
for net changes is to be enabled for this capture instance. If this parameter is set to 1,
the source table must have a defined primary key, or an alternate unique index must
be specified for the @index_name parameter.
. @index_name: Specifies the name of a unique index to use to uniquely identify rows
in the source table.
. @captured_column_list: Specifies the source table columns to be included in the
change table. By default, all columns are included in the change table.
. @filegroup_name: Specifies the filegroup to be used for the change table created for
the capture instance. If this parameter is NULL or not specified, the default filegroup
is used. If possible, it is recommended you create a filegroup separate from your
source tables for the Change Data Capture change tables.
. @allow_partition_switch: Indicates whether the SWITCH PARTITION command of
ALTER TABLE can be executed against a table that is enabled for Change Data
Capture. The default is 1 (enabled). If any partition switches occur, Change Data
Capture does not track the changes resulting from the switch. This causes data
inconsistencies when the change data is consumed.
The @source_schema, @source_name, and @role_name parameters are the only required
parameters. All the others are optional and apply default values if not specified.
To implement basic change data tracking for a table, let's first create a copy of the
Customer table to play around with:
select * into MyCustomer from Sales.Customer
alter table MyCustomer add primary key (CustomerID)
Now, to enable CDC on the MyCustomer table, you can execute the following:
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'MyCustomer',
@role_name = NULL
NOTE
If this is the first time you are enabling CDC for a table in the database, you may see
the following messages, which indicate that SQL Server is enabling the SQL Agent jobs
to begin capturing the data changes in the database:
Job cdc.AdventureWorks2008R2_capture started successfully.
Job cdc.AdventureWorks2008R2_cleanup started successfully.
The Capture job that is created generally runs continuously and is used to move
changed data to the CDC tables from the transaction log. The Cleanup job runs on a
scheduled basis to remove older data from the CDC tables so that they dont grow too
large. By default, it automatically removes data that is more than three days old. The
properties of these jobs can be viewed and modified using the sys.sp_cdc_help_jobs
and sys.sp_cdc_change_job procedures, respectively.
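For example, to lengthen the retention window used by the Cleanup job, you might run
something like the following sketch (the @retention value is expressed in minutes):
-- extend CDC cleanup retention to 7 days (7 * 24 * 60 = 10080 minutes)
EXEC sys.sp_cdc_change_job
@job_type = N'cleanup',
@retention = 10080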
To determine whether or not a source table has been enabled for Change Data Capture, you
can query the is_tracked_by_cdc column in the sys.tables catalog view for that table:
select is_tracked_by_cdc
from sys.tables
where name = 'MyCustomer'
go
is_tracked_by_cdc
-----------------
1
TIP
To get information on which tables are configured for CDC and what the settings for
each are, you can execute the sys.sp_cdc_help_change_data_capture stored proce-
dure. It reports the name and ID of the source and change tables, the CDC
table properties, the columns included in the capture, and the date CDC was
enabled/created for the source table.
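Should you later need to stop capturing changes for a table, you can drop the capture
instance with the sys.sp_cdc_disable_table stored procedure, as in this sketch:
EXEC sys.sp_cdc_disable_table
@source_schema = N'dbo',
@source_name = N'MyCustomer',
@capture_instance = N'dbo_MyCustomer'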
Querying the CDC Tables
After you enable change data tracking for a table, SQL Server begins capturing any data
changes for the table in the Change Data Capture tables. To identify the data changes, you
need to query the Change Data Capture tables. Although you can query the Change Data
Capture tables directly, it is recommended that you use the CDC functions instead. The
main CDC table-valued functions (TVFs) are
. cdc.fn_cdc_get_all_changes_capture_instance
. cdc.fn_cdc_get_net_changes_capture_instance
NOTE
The Change Data Capture change table and associated CDC table-valued functions
created along with it constitute what is referred to as a capture instance. A capture
instance is created for every source table that is enabled for CDC.
Each capture instance is given a unique name based on the schema and table names.
For example, if the table named sales.products is CDC enabled, the capture instance
created is named sales_products. The name of the CDC change table within the cap-
ture instance is sales_products_CT, and the names of the two associated CDC query
functions are cdc.fn_cdc_get_all_changes_sales_products and
cdc.fn_cdc_get_net_changes_sales_products.
Both of the CDC table-valued functions require two parameters to define the range of log
sequence numbers to use as the upper and lower bounds to determine which records are
to be included in the returned result set. A third required parameter, the
row_filter_option, specifies the content of the metadata columns as well as the rows to
be returned in the result set. Two values can be specified for the row_filter_option for the
cdc.fn_cdc_get_all_changes_capture_instance function: 'all' and 'all update old'.
If 'all' is specified, the function returns all changes within the specified log sequence
number (LSN) range. For changes due to an update operation, only the row containing the
new values after the update is returned. If 'all update old' is specified, the function
returns all changes within the specified LSN range. For changes due to an update opera-
tion, this option returns both the before and after update copies of the row.
For the cdc.fn_cdc_get_net_changes_capture_instance function, three values can be
specified for the row_filter_option parameter: 'all', 'all with mask', and 'all with merge'.
If 'all' is specified, the function returns the LSN of the final change to the row, and the
operation needed to apply the change to the row is returned in the __$start_lsn and
__$operation metadata columns. The __$update_mask column is always NULL. If 'all
with mask' is specified, the function returns the LSN of the final change to the row and
the operation needed to apply the change to the row. Plus, if the __$operation equals 4
(that is, it contains the after update row values), the columns actually modified in the
update are identified by the bit mask returned in the __$update_mask column.
If the 'all with merge' option is passed, the function returns the LSN of the final change
to the row and the operation needed to apply the change to the row. The __$operation
column will have one of two values: 1 for delete and 5 to indicate that the operation
needed to apply the change is either an insert or update. The column __$update_mask is
always NULL.
So how do you determine what LSNs to specify to return the rows you need? Fortunately,
SQL Server provides several functions to help determine the appropriate LSN values for use
in querying the TVFs:
. sys.fn_cdc_get_min_lsn: Returns the smallest LSN associated with a capture
instance validity interval. The validity interval is the time interval for which change
data is currently available for its capture instances.
. sys.fn_cdc_get_max_lsn: Returns the largest LSN in the validity interval.
. sys.fn_cdc_map_time_to_lsn and sys.fn_cdc_map_lsn_to_time: Are used to corre-
late LSN values with a standard time value.
. sys.fn_cdc_increment_lsn and sys.fn_cdc_decrement_lsn: Can be used to make
an incremental adjustment to an LSN value. This adjustment is sometimes necessary
to ensure that changes are not duplicated in consecutive query windows.
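For example, the following sketch combines two of these functions to report when the
most recent captured change was committed:
DECLARE @max_lsn BINARY(10)
-- largest LSN in the validity interval
SELECT @max_lsn = sys.fn_cdc_get_max_lsn()
-- map the LSN back to the commit time of that transaction
SELECT sys.fn_cdc_map_lsn_to_time(@max_lsn) as last_commit_time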
So, before you can start querying the CDC tables, you need to generate some records in
them by running some data modifications against the source tables. First, you need to run
the statements in Listing 42.21 against the MyCustomer table to generate some records in
the dbo_MyCustomer_CT Change Data Capture change table.
LISTING 42.21 Some Data Modifications to Populate the MyCustomer CDC Capture Table
delete MyCustomer where CustomerID = 22
Insert MyCustomer (PersonID, StoreID, TerritoryID,
AccountNumber, rowguid, ModifiedDate)
Values (20778, null, 9,
'AW' + RIGHT('00000000'
+ convert(varchar(8), IDENT_CURRENT('MyCustomer')), 8),
NEWID(),
GETDATE())
declare @ident int
select @ident = SCOPE_IDENTITY()
update MyCustomer
set TerritoryID = 3,
ModifiedDate = GETDATE()
where CustomerID = @ident
Now that you have some rows in the CDC capture table, you can start retrieving them.
First, you need to identify the min and max LSN values to pass to the
cdc.fn_cdc_get_all_changes_dbo_MyCustomer function. This can be done using the
sys.fn_cdc_get_min_lsn and sys.fn_cdc_get_max_lsn functions. Listing 42.22 puts all
these pieces together to return the records stored in the CDC capture table.
LISTING 42.22 Querying the MyCustomer CDC Capture Table
USE AdventureWorks2008R2
GO
--declare variables to represent beginning and ending lsn
DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10)
-- get the first LSN for table changes
SELECT @from_lsn = sys.fn_cdc_get_min_lsn('dbo_MyCustomer')
-- get the last LSN for table changes
SELECT @to_lsn = sys.fn_cdc_get_max_lsn()
-- get all changes in the range using the 'all update old' parameter
SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_MyCustomer
(@from_lsn, @to_lsn, 'all update old');
GO
__$start_lsn __$seqval __$operation
__$update_mask CustomerID PersonID StoreID TerritoryID
AccountNumber rowguid
ModifiedDate
---------------------- ---------------------- ------------
-------------- ----------- ----------- ----------- -----------
------------- ------------------------------------
-----------------------
0x00000039000014400004 0x00000039000014400002 1
0x7F 22 NULL 494 3
AW00000022 9774AED6-D673-412D-B481-2573E470B478
2008-10-13 11:15:07.263
0x00000039000014410004 0x00000039000014410003 2
0x7F 30119 20778 NULL 9
AW00030119 2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74
2010-04-27 22:38:44.267
0x000000390000144C0004 0x000000390000144C0002 3
0x48 30119 20778 NULL 9
AW00030119 2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74
2010-04-27 22:38:44.267
0x000000390000144C0004 0x000000390000144C0002 4
0x48 30119 20778 NULL 3
AW00030119 2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74
2010-04-27 22:38:48.263
Because the option 'all update old' is specified in Listing 42.22, all the rows in the
dbo_MyCustomer_CT capture table are returned, including the deleted row, inserted row,
and both the before and after copies of the row updated.
If you want to return only the final version of each row within the LSN range (and the
@supports_net_changes was set to 1 when CDC was enabled for the table), you can use
the cdc.fn_cdc_get_net_changes_capture_instance function, as shown in Listing 42.23.
LISTING 42.23 Querying the MyCustomer CDC Capture Table for Net Changes
USE AdventureWorks2008R2
GO
--declare variables to represent beginning and ending lsn
DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10)
-- get the first LSN for table changes
SELECT @from_lsn = sys.fn_cdc_get_min_lsn('dbo_MyCustomer')
-- get the last LSN for table changes
SELECT @to_lsn = sys.fn_cdc_get_max_lsn()
-- get all changes in the range using the 'all with merge' parameter
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_MyCustomer
(@from_lsn, @to_lsn, 'all with merge');
GO
__$start_lsn __$operation __$update_mask CustomerID
PersonID StoreID TerritoryID AccountNumber
rowguid ModifiedDate
---------------------- ------------ -------------- -----------
----------- ----------- ----------- -------------
------------------------------------ -----------------------
0x00000039000014400004 1 NULL 22
NULL 494 3 AW00000022
9774AED6-D673-412D-B481-2573E470B478 2008-10-13 11:15:07.263
0x000000390000144C0004 5 NULL 30119
20778 NULL 3 AW00030119
2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:48.263
For typical ETL-type applications, querying for change data is an ongoing process, making
periodic requests for all the changes that occurred since the last request, which then need
to be applied to the target. For these types of queries, you can use the
sys.fn_cdc_increment_lsn function to determine the next lowest LSN boundary that is
greater than the max LSN boundary of the previous query. To demonstrate this, let's first
execute some additional data modifications against the MyCustomer table:
Insert MyCustomer (PersonID, StoreID, TerritoryID,
AccountNumber, rowguid, ModifiedDate)
Values (20779, null, 12,
'AW' + RIGHT('00000000'
+ convert(varchar(8), IDENT_CURRENT('MyCustomer')), 8),
NEWID(),
GETDATE())
delete MyCustomer where CustomerID = 30119
The max LSN from the previous examples is 0x000000390000144C0004. We want to incre-
ment from this LSN to find the next set of changes. In Listing 42.24, you pass this value
to the sys.fn_cdc_increment_lsn function to set the min LSN value you'll use with the
cdc.fn_cdc_get_net_changes_dbo_MyCustomer function as the lower bound.
LISTING 42.24 Using sys.fn_cdc_increment_lsn to Return the Net Changes to the
MyCustomer CDC Capture Table Since the Last Retrieval
--declare variables to represent beginning and ending lsn
DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10)
-- get the Next lowest LSN after the previous Max LSN
SELECT @from_lsn = sys.fn_cdc_increment_lsn(0x000000390000144C0004)
-- get the last LSN for table changes
SELECT @to_lsn = sys.fn_cdc_get_max_lsn()
-- get all changes in the range using the 'all with merge' parameter
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_MyCustomer
(@from_lsn, @to_lsn, 'all with merge');
GO
__$start_lsn __$operation __$update_mask CustomerID
PersonID StoreID TerritoryID AccountNumber
rowguid ModifiedDate
---------------------- ------------ -------------- -----------
----------- ----------- ----------- ------------- ---------------------------------
--- -----------------------
0x00000039000017D30004 5 NULL 30120
20779 NULL 12 AW00030120
CE8BBAA1-04C0-4A81-9A7E-85B4EDB5C36D 2010-04-27 23:52:36.477
0x00000039000017E50004 1 NULL 30119
20778 NULL 3 AW00030119
2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:48.263
If you want to retrieve the changes captured during a specific time period, you can use the
sys.fn_cdc_map_time_to_lsn function, as shown in Listing 42.25.
LISTING 42.25 Retrieving all Changes to MyCustomer During a Specific Time Period
DECLARE @begin_time datetime,
@end_time datetime,
@begin_lsn binary(10),
@end_lsn binary(10);
SET @begin_time = '2010-04-27 22:38:48.250'
SET @end_time = '2010-04-27 23:52:36.500'
SELECT @begin_lsn = sys.fn_cdc_map_time_to_lsn
('smallest greater than', @begin_time);
SELECT @end_lsn = sys.fn_cdc_map_time_to_lsn
('largest less than or equal', @end_time);
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_MyCustomer
(@begin_lsn, @end_lsn, 'all');
Go
__$start_lsn __$operation __$update_mask CustomerID
PersonID StoreID TerritoryID AccountNumber
rowguid ModifiedDate
---------------------- ------------ -------------- -----------
----------- ----------- ----------- -------------
------------------------------------ -----------------------
0x000000390000144C0004 4 NULL 30119
20778 NULL 3 AW00030119
2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:48.263
0x00000039000017D30004 2 NULL 30120
20779 NULL 12 AW00030120
CE8BBAA1-04C0-4A81-9A7E-85B4EDB5C36D 2010-04-27 23:52:36.477
CDC and DDL Changes to Source Tables
One of the common challenges when capturing data changes from your source tables is
how to handle DDL changes to the source tables. This can be an issue if the downstream
consumer of the changes has not reflected the same DDL changes for its destination tables.
Enabling Change Data Capture on a source table in SQL Server 2008 does not prevent
DDL changes from occurring. However, Change Data Capture does help to mitigate the
effect on the downstream consumers by allowing the delivered result sets that are returned
from the CDC capture tables to remain unchanged even as the column structure of the
underlying source table changes. Essentially, the capture process responsible for populat-
ing the change table ignores any new columns not present when the source table was
enabled for Change Data Capture. If a tracked column is dropped, NULL values are supplied
for the column in the subsequent change entries.
However, if the data type of a tracked column is modified, the data type change is also
propagated to the change table to ensure that the capture mechanism does not introduce
data loss in tracked columns as a result of mismatched data types. When a column is
modified, the capture process posts any detected changes to the cdc.ddl_history table.
Downstream consumers of the change data from the source tables that may need to be
alerted of the column changes (and make similar adjustments to the destination tables)
can use the stored procedure sys.sp_cdc_get_ddl_history to identify any modifications
to the source table columns.
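For example, the following sketch returns the DDL change history recorded for the
capture instance created earlier in this chapter:
EXEC sys.sp_cdc_get_ddl_history
@capture_instance = N'dbo_MyCustomer'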
So how do you modify the capture instance to recognize any added or dropped columns
in the source table? Unfortunately, the only way to do this is to disable CDC on the table
and re-enable it. However, in an active source environment where it's not possible to
suspend processing while CDC is being disabled and re-enabled, there is the possibility of
data loss between when CDC is disabled and re-enabled.
Fortunately, CDC allows two capture instances to be associated with a single source table.
This makes it possible to create a second capture instance for the table that reflects the
new column structure. The capture process then captures changes to the same source table
into two distinct change tables having two different column structures. While the original
change table continues to feed current operational programs, the new change table feeds
environments that have been modified to incorporate the new column data. Allowing the
capture mechanism to populate both change tables in tandem provides a mechanism for
smoothly transitioning from one table structure to the other without any loss of change
data. When the transition to the new table structure has been fully effected, the obsolete
capture instance can be removed.
Change Tracking
In addition to Change Data Capture, SQL Server 2008 also introduces Change Tracking.
Change Tracking is a lightweight solution that provides an efficient change tracking
mechanism for applications. Although they are similar in name, the purposes of Change
Tracking and Change Data Capture are different.
Change Data Capture is an asynchronous mechanism that uses the transaction log to
record all the changes to a data row and store them in change tables. All intermediate
versions of a row are available in the change tables. The information captured is stored in
a relational format that can be queried by client applications such as ETL processes.
Change Tracking, in contrast, is a synchronous mechanism that tracks modifications to a
table but stores only the fact that a row has been modified and when. It does not keep
track of how many times the row has changed or the values of any of the intermediate
changes. However, because Change Tracking records the fact that a row has changed, you can
check to see whether data has changed and obtain the latest version of the row directly
from the table itself rather than by querying a change capture table.
NOTE
Unlike Change Data Capture, which is available only in the Enterprise, Datacenter, and
Developer Editions of SQL Server, Change Tracking is available in all editions.
Change Tracking operates by using tracking tables that store a primary key and version
number for each row in a table that has been enabled for Change Tracking. Applications
can then check to see whether a row has changed by looking up the row in the tracking
table by its primary key and seeing whether the version number is different from when the
row was first retrieved.
One of the common uses of Change Tracking is for applications that have to synchronize
data with SQL Server. Change Tracking can be used as a foundation for both one-way and
two-way synchronization applications.
One-way synchronization applications, such as a client or mid-tier caching application,
can be built to use Change Tracking. The caching application, which requires data from a
SQL Server database to be cached in other data stores, can use Change Tracking to deter-
mine when changes have been made to the database tables and refresh the cache store by
retrieving data from the modified rows only to keep the cache up-to-date.
Two-way synchronization applications can also be built to use Change Tracking. A typical
example of a two-way synchronization application is the occasionally connected applica-
tion: for example, a sales application that runs on a laptop and is disconnected from the
central SQL Server database while the salesperson is out in the field. Initially, the client
application queries and updates its local data store from the SQL Server database. When it
reconnects with the database later, the application synchronizes with the database, and
data changes will flow from the laptop to the database and from the database to the
laptop. Because data changes happen in both locations while the client application is
disconnected, the two-way synchronization application must be able to detect conflicts. A
conflict occurs if the same data is changed in both data stores in the time between
synchronizations. The client application can use Change Tracking to detect conflicts by
identifying rows whose version number has changed since the last synchronization. The
application can implement a mechanism to resolve the conflicts so that the data changes
are not lost.
Implementing Change Tracking
To use Change Tracking, you must first enable it for the database and then enable it at the
table level for any tables for which you want to track changes. Change Tracking can be
enabled via T-SQL statements or through SQL Server Management Studio.
To enable Change Tracking for a database in SSMS, right-click on the database in Object
Explorer to bring up the Properties dialog and select the Change Tracking page. To enable
Change Tracking, set the Change Tracking option to True (see Figure 42.6). Also on this
page, you can configure the retention period for how long SQL Server retains the Change
Tracking information for each data row and whether to automatically clean up the
Change Tracking information when the retention period has been exceeded.
FIGURE 42.6 Enabling Change Tracking for a database.
Change Tracking can also be enabled with the ALTER DATABASE command:
ALTER DATABASE AdventureWorks2008R2
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)
After enabling Change Tracking at the database level, you can then enable Change
Tracking for the tables for which you want to track changes. To enable Change Tracking
for a table in SSMS, right-click on the table in Object Explorer to bring up the Properties
dialog and select the Change Tracking page. Set the Change Tracking option to True to
enable Change Tracking (see Figure 42.7). The TRACK_COLUMNS_UPDATED option specifies
whether SQL Server should store in the internal Change Tracking table any extra informa-
tion about which specific columns were updated. Column tracking allows an application
to synchronize only when specific columns are updated. This capability can improve the
efficiency and performance of the synchronization process, but at the cost of additional
storage overhead. This option is set to OFF by default.
Change Tracking can also be enabled via T-SQL with the ALTER TABLE command:
FIGURE 42.7 Enabling Change Tracking for a table.
USE [AdventureWorks2008R2]
GO
ALTER TABLE [dbo].[MyCustomer]
ENABLE CHANGE_TRACKING WITH(TRACK_COLUMNS_UPDATED = ON)
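To turn Change Tracking off again, disable it for each tracked table first and then for
the database, as in the following sketch:
ALTER TABLE [dbo].[MyCustomer]
DISABLE CHANGE_TRACKING
ALTER DATABASE AdventureWorks2008R2
SET CHANGE_TRACKING = OFF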
TIP
To determine which tables and databases have Change Tracking enabled, you can use the
sys.change_tracking_databases and sys.change_tracking_tables catalog views.
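For example, the following sketch lists the tables in the current database that have
Change Tracking enabled, along with whether column tracking is turned on for each:
SELECT t.name as table_name,
ctt.is_track_columns_updated_on
FROM sys.change_tracking_tables ctt
INNER JOIN sys.tables t
ON t.object_id = ctt.object_id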
Identifying Tracked Changes
After Change Tracking is enabled for a table, any data modification statements that affect
rows in the table cause Change Tracking information for each modified row to be
recorded. To query for the rows that have changed and to obtain information about the
changes, you can use the built-in Change Tracking functions.
Unless you enabled the TRACK_COLUMNS_UPDATED option, only the values of the primary key
column are recorded with the change information to allow you to identify the rows that
have been changed. To identify the changed rows, use the CHANGETABLE (CHANGES ...)
Change Tracking function. The CHANGETABLE (CHANGES ...) function takes two parame-
ters: the first is the table name, and the second is the last synchronization version number.
If you pass 0 for the last synchronization version parameter, you get a list of all the rows
that have been modified since version 0, which means all the changes to the table since
first enabling Change Tracking. Typically, however, you do not want all the rows that have
changed from the beginning of Change Tracking, but only those rows that have changed
since the last time you retrieved the changed rows.
Rather than having to keep track of the version numbers, you can use the
CHANGE_TRACKING_CURRENT_VERSION() function to obtain the current version that will be
used the next time you query for changes. The version returned represents the version of
the last committed transaction.
Before an application can obtain changes for the first time, the application must first
execute a query to obtain the initial data from the table and a query to retrieve the initial
synchronization version using the CHANGE_TRACKING_CURRENT_VERSION() function. The version
number that is retrieved is passed to the CHANGETABLE(CHANGES ...) function the next
time it is invoked.
The following example illustrates how to obtain the initial synchronization version and
initial data set:
USE AdventureWorks2008R2
Go
declare @synchronization_version bigint
set @synchronization_version = CHANGE_TRACKING_CURRENT_VERSION()
select change_tracking_version = @synchronization_version
-- Obtain initial data set.
select CustomerID, TerritoryID
from MyCustomer
where CustomerID <= 5
go
change_tracking_version
-----------------------
0
CustomerID TerritoryID
----------- -----------
1 1
2 1
3 4
4 4
5 4
As you can see, because no updates have been performed since Change Tracking was
enabled, the initial version is 0.
Now let's perform some updates on these rows to effect some changes:
update MyCustomer
set TerritoryID = 5
where CustomerID = 4
update MyCustomer
set TerritoryID = 4
where CustomerID = 5
Now you can use the CHANGETABLE(CHANGES ...) function to find the rows that have
changed since the last version (0):
declare @last_synchronization_version bigint
set @last_synchronization_version = 0
SELECT
CT.CustomerID as CustID, CT.SYS_CHANGE_OPERATION,
CT.SYS_CHANGE_COLUMNS, CT.SYS_CHANGE_CONTEXT
FROM
CHANGETABLE(CHANGES MyCustomer, @last_synchronization_version) AS CT
Go
CustID SYS_CHANGE_OPERATION SYS_CHANGE_COLUMNS SYS_CHANGE_CONTEXT
------ -------------------- ------------------ ------------------
4 U 0x0000000004000000 NULL
5 U 0x0000000004000000 NULL
You can see in these results that this query returns the CustomerIDs of the two rows that
were changed. However, most applications want the data from these rows as well. To
return the data, you can join the results from CHANGETABLE(CHANGES ...) with the data in
the user table. For example, the following query joins with the MyCustomer table to obtain
the values for the PersonID, StoreID, and TerritoryID columns. Note that the query
uses an OUTER JOIN to make sure that the change information is returned for any rows
that may have been deleted from the user table. Also, at the same time you are retrieving
the data rows, you also want to retrieve the current version as well to use the next time
the application comes back to retrieve the latest changes:
declare @last_synchronization_version bigint
set @last_synchronization_version = 0
select current_version = CHANGE_TRACKING_CURRENT_VERSION()
SELECT
CT.CustomerID as CustID,
C.PersonID,
C.StoreID,
C.TerritoryID,
CT.SYS_CHANGE_OPERATION,
CT.SYS_CHANGE_COLUMNS, CT.SYS_CHANGE_CONTEXT
FROM
MyCustomer C
RIGHT OUTER JOIN
CHANGETABLE(CHANGES MyCustomer, @last_synchronization_version) AS CT
on C.CustomerID = CT.CustomerID
go
current_version
--------------------
2
CustID PersonID StoreID TerritoryID
SYS_CHANGE_OPERATION SYS_CHANGE_COLUMNS SYS_CHANGE_CONTEXT
----------- ----------- ----------- -----------
-------------------- ------------------ -------------------
4 NULL 932 5
U 0x0000000004000000 NULL
5 NULL 1026 4
U 0x0000000004000000 NULL
You can see in the output from this query that the current version is now 2. The next time
the application issues a query to identify the rows that have been changed since this
query, it will pass the value of 2 as the @last_synchronization_version to the
CHANGETABLE(CHANGES ...) function.
CAUTION
The version number is NOT specific to a table or user session. The Change Tracking
version number is maintained across the entire database for all users and change
tracked tables. Whenever a data modification is performed by any user on any table that
has Change Tracking enabled, the version number is incremented.
For example, immediately after running an update on change tracked table A in the cur-
rent application and incrementing the version to 3, another application could run an
update on change tracked table B and increment the version to 4, and so on. This is
why you should always capture the current version number whenever you are retrieving
the latest set of changes from the change tracked tables.
If an application has not synchronized with the database in a while, the stored version
number could no longer be valid if the Change Tracking retention period has expired for
any row modifications that have occurred since that version. To validate the version
number, you can use the CHANGE_TRACKING_MIN_VALID_VERSION() function. This function
returns the minimum valid version that a client can have and still obtain valid results
from CHANGETABLE(). Your client applications should check the last synchronization
version obtained against the value returned by this function. If the last synchroniza-
tion version is less than the version returned, it is no longer valid, and the
client application has to reinitialize all the data rows from the table. The following T-SQL
code snippet can be used to validate the last_synchronization_version:
-- Check individual table.
IF (@last_synchronization_version <
CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('MyCustomer')))
BEGIN
-- Handle invalid version and do not enumerate changes.
-- Client must be reinitialized.
END
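Putting these pieces together, a single synchronization cycle might look something like
the following sketch (the variable values are illustrative; a real application would
persist the last synchronization version between runs and save the current version after
applying the changes):
DECLARE @last_synchronization_version bigint = 0 -- loaded from application storage
DECLARE @current_version bigint = CHANGE_TRACKING_CURRENT_VERSION()
IF (@last_synchronization_version <
CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('MyCustomer')))
BEGIN
-- version is too old; changes may have been cleaned up, so reload all rows
SELECT CustomerID, PersonID, StoreID, TerritoryID FROM MyCustomer
END
ELSE
BEGIN
-- retrieve only the rows changed since the last synchronization
SELECT C.CustomerID, C.PersonID, C.StoreID, C.TerritoryID,
CT.SYS_CHANGE_OPERATION
FROM CHANGETABLE(CHANGES MyCustomer, @last_synchronization_version) as CT
LEFT OUTER JOIN MyCustomer C
ON C.CustomerID = CT.CustomerID
END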
Identifying Changed Columns
In addition to information about which rows were changed and the operation that caused
the change (insert, update, or delete, reported as I, U, or D in the SYS_CHANGE_OPERATION
column), the CHANGETABLE(CHANGES ...) function also provides information on which columns
were modified if you enabled the TRACK_COLUMNS_UPDATED option. You can use this infor-
mation to determine whether any action is needed in your client application based on
which columns changed.
To identify whether a specific column has changed, you can use the
CHANGE_TRACKING_IS_COLUMN_IN_MASK (column_id, change_columns) function. This func-
tion interprets the SYS_CHANGE_COLUMNS bitmap value returned by the CHANGETABLE(CHANGES
...) function and returns a 1 if the column was modified or 0 if it was not:
declare @last_synchronization_version bigint
set @last_synchronization_version = 0
SELECT
CT.CustomerID as CustID,
TerritoryChanged = CHANGE_TRACKING_IS_COLUMN_IN_MASK
(COLUMNPROPERTY(OBJECT_ID('MyCustomer'),
'TerritoryID', 'ColumnId'),
CT.SYS_CHANGE_COLUMNS),
CT.SYS_CHANGE_OPERATION,
CT.SYS_CHANGE_COLUMNS
FROM
CHANGETABLE(CHANGES MyCustomer, @last_synchronization_version) AS CT
go
CustID TerritoryChanged SYS_CHANGE_OPERATION SYS_CHANGE_COLUMNS
----------- ---------------- -------------------- ------------------
4 1 U 0x0000000004000000
5 1 U 0x0000000004000000
In the query results, you can see that both update operations (SYS_CHANGE_OPERATION =
'U') modified the TerritoryID column (TerritoryChanged = 1).
Change Tracking Overhead
Although Change Tracking has been optimized to minimize the performance overhead on
DML operations, it is important to know that there is some performance overhead, as well
as space requirements, within the application databases when implementing Change Tracking.
The performance overhead associated with using Change Tracking on a table is similar to
the index maintenance overhead incurred for insert, update, and delete operations. For
each row changed by a DML operation, a row is added to the internal Change Tracking
table. The amount of overhead incurred depends on various factors, such as
. The number of primary key columns
. The amount of data being changed in the user table row
. The number of operations being performed in a transaction
. Whether column Change Tracking is enabled
Change Tracking also consumes some space in the databases where it is enabled as well.
Change Tracking data is stored in the following types of internal tables:
. Internal change tables: There is one internal change table for each user table that
has Change Tracking enabled.
. Internal transaction table: There is one internal transaction table for the
database.
These internal tables affect storage requirements in the following ways:
. For each change to each row in the user table, a row is added to the internal change
table. This row has a small fixed overhead plus a variable overhead equal to the size
of the primary key columns. The row can contain optional context information set
by an application. In addition, if column tracking is enabled, each changed column
requires an additional 4 bytes per row in the tracking table.
. For each committed transaction, a row is added to an internal transaction table.
If you are concerned about the space usage requirements of the internal Change Tracking
tables, you can determine the space they use by executing the sp_spaceused stored proce-
dure. The internal transaction table is called sys.syscommittab. The names of the internal
change tables are in the form sys.change_tracking_object_id, where object_id is the object
ID of the tracked user table. The following
example returns the size of the internal transaction table and internal change table for the
MyCustomer table:
exec sp_spaceused 'sys.syscommittab'
declare @tablename varchar(128)
set @tablename = 'sys.change_tracking_'
+ CONVERT(varchar(16), object_id('MyCustomer'))
exec sp_spaceused @tablename
Summary
Transact-SQL has always been a powerful data access and data modification language,
providing additional features, such as functions, variables, and commands, to control
execution flow. SQL Server 2008 further expands the power and capabilities of T-SQL with
the addition of a number of new features. These new T-SQL features can be incorporated
into the building blocks for creating even more powerful SQL Server database compo-
nents, such as views, stored procedures, triggers, and user-defined functions.
In addition to the powerful features available in T-SQL for developing SQL code and stored
procedures, triggers, and user-defined functions, SQL Server 2008 also enables you to
define custom-managed database objects such as stored procedures, triggers, functions,
data types, and custom aggregates using .NET code. The next chapter, Creating .NET CLR
Objects in SQL Server 2008, provides an overview of using the .NET common language
runtime (CLR) to develop these custom-managed objects.