This document describes new features introduced in SQL Server 2008 for the Transact-SQL (T-SQL) language. It lists 14 new features, including the MERGE statement, insert over DML, GROUP BY clause enhancements, and new date and time data types. It then provides details on the MERGE statement, describing its syntax and how it allows inserting, updating, or deleting rows in one table based on differences found in another table in a single statement. An example demonstrates how the MERGE statement can be used to synchronize data between tables.
Transact-SQL in SQL Server 2008

IN THIS CHAPTER

. MERGE Statement
. Insert over DML
. GROUP BY Clause Enhancements
. Variable Assignment in DECLARE Statement
. Compound Assignment Operators
. Row Constructors
. New date and time Data Types and Functions
. Table-Valued Parameters
. Hierarchyid Data Type
. Using FILESTREAM Storage
. Sparse Columns
. Spatial Data Types
. Change Data Capture
. Change Tracking

Although SQL Server 2008 introduces some new features and changes to the Transact-SQL (T-SQL) language that provide additional capabilities, there is not a significant number of new features over what was available in SQL Server 2005. T-SQL does offer the following new features:

. MERGE statement
. Insert over DML
. GROUP BY clause enhancements
. Variable assignment in DECLARE statement
. Compound assignment operators
. Row Constructors
. date and time data types
. Table-valued parameters
. Hierarchyid data type
. FILESTREAM Storage
. Sparse Columns
. Spatial Data Types
. Change Data Capture
. Change Tracking

42_9780672330568_ch42.qxp 8/19/10 3:07 PM Page 1551

CHAPTER 42 What's New for Transact-SQL in SQL Server 2008

NOTE
If you are making the leap from SQL Server 2000 (or earlier) to SQL Server 2008 or SQL Server 2008 R2, you may not be familiar with a number of T-SQL enhancements introduced in SQL Server 2005. Some of these enhancements are used in the examples in this chapter. If you are looking for an introduction to the new T-SQL features introduced in SQL Server 2005, check out the "In Case You Missed It..." section in Chapter 43, "Transact-SQL Programming Guidelines, Tips, and Tricks," which is provided on the CD included with this book.

NOTE
Unless stated otherwise, all examples in this chapter use tables in the bigpubs2008 database.
MERGE Statement

In versions of SQL Server prior to SQL Server 2008, if you had a set of data rows in a source table that you wanted to synchronize with a target table, you had to perform at least three operations: one scan of the source table to find matching rows to update in the target table, another scan of the source table to find nonmatching rows to insert into the target table, and a third scan to find rows in the target table not contained in the source table that needed to be deleted. SQL Server 2008, however, introduces the MERGE statement. With the MERGE statement, you can synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table, all in just a single statement, minimizing the number of times that rows in the source and target tables need to be processed. The MERGE statement can also be used for performing conditional inserts or updates of rows in a target table from a source table.

The MERGE syntax consists of the following primary clauses:

. The MERGE clause specifies the table or view that is the target of the insert, update, or delete operations.
. The USING clause specifies the data source being joined with the target.
. The ON clause specifies the join conditions that determine how the target and source match.
. The WHEN MATCHED clause specifies either the update or delete operation to perform when rows of the target table match rows in the source table and any additional search conditions.
. WHEN NOT MATCHED BY TARGET specifies the insert operation when a row in the source table does not have a match in the target table.
. WHEN NOT MATCHED BY SOURCE specifies the update or delete operation to perform when rows of the target table do not have matches in the source table.
. The OUTPUT clause returns a row for each row in the target that is inserted, updated, or deleted.
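The clauses listed above can be seen together in a minimal sketch. It does not use the bigpubs2008 tables; the table variables @target and @source, their columns, and their values are hypothetical and exist only to make the example self-contained.

```sql
-- Hypothetical table variables; names and values are illustrative only.
DECLARE @target TABLE (id int PRIMARY KEY, qty int NOT NULL);
DECLARE @source TABLE (id int PRIMARY KEY, qty int NOT NULL);

INSERT INTO @target (id, qty) VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO @source (id, qty) VALUES (2, 25), (3, 30), (4, 40);

MERGE INTO @target AS t                    -- MERGE clause: the target
USING @source AS s                         -- USING clause: the source
   ON t.id = s.id                          -- ON clause: how rows are matched
WHEN MATCHED AND t.qty <> s.qty THEN       -- matched pair whose qty differs
    UPDATE SET t.qty = s.qty
WHEN NOT MATCHED BY TARGET THEN            -- source row with no target row
    INSERT (id, qty) VALUES (s.id, s.qty)
WHEN NOT MATCHED BY SOURCE THEN            -- target row with no source row
    DELETE
OUTPUT $action AS action_taken,            -- one output row per modified row
       inserted.id AS new_id,
       deleted.id AS old_id;
```

With this sample data, the statement should report three changes: id 1 is deleted, id 2 is updated to a qty of 25, and id 4 is inserted. The row with id 3 matches but has the same qty, so no WHEN clause applies to it. The order of the OUTPUT rows is not guaranteed.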
The basic syntax of the MERGE statement is as follows:

    [ WITH common_table_expression [,...n] ]
    MERGE [ TOP ( N ) [ PERCENT ] ]
        [ INTO ] target_table [ [ AS ] table_alias ]
        USING table_or_view_name [ [ AS ] table_alias ]
        ON merge_search_condition
        [ WHEN MATCHED [ AND search_condition ]
            THEN { UPDATE SET set_clause | DELETE } ] [ ...n ]
        [ WHEN NOT MATCHED [ BY TARGET ] [ AND search_condition ]
            THEN { INSERT [ ( column_list ) ]
                   { VALUES ( values_list ) | DEFAULT VALUES } } ]
        [ WHEN NOT MATCHED BY SOURCE [ AND search_condition ]
            THEN { UPDATE SET set_clause | DELETE } ] [ ...n ]
        [ OUTPUT column_name | scalar_expression
            INTO { @table_variable | output_table } [ (column_list) ] ]
        [ OUTPUT column_name | scalar_expression
            [ [AS] column_alias_identifier ] [ ,...n ] ] ;

The WHEN clauses specify the actions to take on the rows identified by the conditions specified in the ON clause. The conditions specified in the ON clause determine the full result set that will be operated on. Additional filtering to restrict the affected rows can be specified in the WHEN clauses. Multiple WHEN clauses with different search conditions can be specified. However, if there is a WHEN MATCHED clause that includes a search condition, it must be specified before all other WHEN MATCHED clauses.

Note that the MERGE command must be terminated with a semicolon (;). Otherwise, you receive a syntax error.

When you run a MERGE statement, rows in the source are matched with rows in the target based on the join predicate that you specify in the ON clause. The rows are processed in a single pass, and one insert, update, or delete operation is performed per input row depending on the WHEN clauses specified. The WHEN clauses determine which of the following matches exist in the result set:

. A matched pair consisting of one row from the target and one from the source as a result of the matching condition in the WHEN MATCHED clause
. A row from the source that has no matching row in the target as a result of the condition specified in the WHEN NOT MATCHED BY TARGET clause
. A row from the target that has no corresponding row in the source as a result of the condition specified in the WHEN NOT MATCHED BY SOURCE clause

The combination of WHEN clauses specified in the MERGE statement determines the join method that SQL Server will use to process the query (see Table 42.1).

TABLE 42.1 Join Methods Used for WHEN Clauses

    Specified WHEN Clauses                            Join Method
    ------------------------------------------------  -------------------------------------
    WHEN MATCHED clause only                          INNER JOIN
    WHEN NOT MATCHED BY TARGET clause, but not the    LEFT OUTER JOIN from source to target
      WHEN NOT MATCHED BY SOURCE clause
    WHEN MATCHED clause and the WHEN NOT MATCHED      RIGHT OUTER JOIN from source to target
      BY SOURCE clause, but not the WHEN NOT
      MATCHED BY TARGET clause
    WHEN NOT MATCHED BY TARGET clause and the         FULL OUTER JOIN
      WHEN NOT MATCHED BY SOURCE clause
    WHEN NOT MATCHED BY SOURCE clause only            ANTI SEMI JOIN

To improve the performance of the MERGE statement, you should make sure you have appropriate indexes to support the join columns between the source table and target table. Any additional columns in the source table index that will help to cover the query may help improve performance even more (for information on index covering, see Chapter 34, "Data Structures, Indexes, and Performance"). The indexes should ensure that the join keys are unique and, if possible, sort the data in the tables in the order it will be processed so additional sort operations are not necessary. Unique indexes supporting the join conditions for the MERGE statement will improve query performance because the query optimizer does not need to perform extra validation processing to locate and update duplicate rows.

To better understand how the MERGE statement works, let's look at an example.
First, you need to set up some data in a source table. In the bigpubs2008 database, there is a table called stores. For this example, let's assume you want to set up a new table that keeps track of each store's inventory. The table supports an application that monitors each store's inventory and sends notifications when certain items run low, and it lets each store search other stores' inventories to locate rare and out-of-print books that other stores may have available. On a daily basis, each store uploads a full refresh of its current inventory to a staging table (inventory_load), which is the source table for the MERGE. You then use the inventory_load table to modify the store's inventory in the store_inventory table (which is the target table for the MERGE operation).

First, let's create the new store_inventory table (see Listing 42.1). Just for the sake of the example, you can create and populate it with the existing data from the sales table for stor_id A011 and create a primary key constraint on the stor_id and title_id columns.

The next step is to load the inventory_load table. Normally, in a real-world scenario, this table would likely be populated via a BULK INSERT statement or SQL Server Integration Services. However, for the sake of this example, you simply are going to create some test data by creating and populating the inventory_load table using SELECT INTO with data merged from the sales data for both stor_id A011 and A017. When the inventory_load table is created and populated, you can create a primary key on the stor_id and title_id columns as well to support the join with the store_inventory table.

The next step is to build out the MERGE statement. Following are the rules to be applied:

. If there is a matching row between the source and target tables and the qty value is different, update the qty value in the target table to the value in the source table.
. If a row in the source table doesn't have a match in the target table, this is a new inventory item, so insert the new row to the target table.
. If a row in the target table doesn't have a matching row in the source table, that inventory item no longer exists, so delete it from the target table.

Also for the sake of the example, so that you can see just what the MERGE statement ends up doing, the OUTPUT clause has been added with the $action column included. The $action column displays which operation (INSERT, UPDATE, or DELETE) was performed on each row. The remaining columns display the title_id and qty values from the inserted (source) and deleted (target) versions of each row processed. If a pair of title_id and qty columns is blank, the underlying values were NULL, which means that side had no matching row.

LISTING 42.1 A MERGE Example

    use bigpubs2008
    go
    if OBJECT_ID('store_inventory') is not null
        drop table store_inventory
    go
    -- Create and populate the store_inventory table
    select stor_id, title_id, qty = SUM(qty), update_dt = GETDATE()
        into store_inventory
        from sales s
        where stor_id = 'A011'
        group by stor_id, title_id
    go
    -- add primary key on store_inventory to support the join to source table
    alter table store_inventory
        add constraint PK_store_inventory primary key (stor_id, title_id)
    go
    if OBJECT_ID('inventory_load') is not null
        drop table inventory_load
    go
    -- Now, create and populate the inventory_load table
    select stor_id = 'A011', title_id, qty = SUM(qty)
        into inventory_load
        from sales s
        where stor_id like 'A01[17]'
          and title_id not like '%8'
        group by title_id
    go
    -- add primary key on inventory_load to support the join to target table
    alter table inventory_load
        add constraint PK_inventory_load primary key (stor_id, title_id)
    go
    select * from store_inventory
    go
    -- perform the merge, updating any matching rows with different quantities,
    -- adding any rows in source not in the target, and deleting any rows from the
    -- target that are not in the source.
    -- Output clause is specified to display the results of the MERGE
    MERGE INTO store_inventory as s
    USING inventory_load as i
        ON s.stor_id = i.stor_id
        and s.title_id = i.title_id
    WHEN MATCHED and s.qty <> i.qty
        THEN UPDATE SET s.qty = i.qty, update_dt = getdate()
    WHEN NOT MATCHED
        THEN INSERT (stor_id, title_id, qty, update_dt)
             VALUES (i.stor_id, i.title_id, i.qty, getdate())
    WHEN NOT MATCHED BY SOURCE
        THEN DELETE
    OUTPUT $action,
        isnull(inserted.title_id, '') as src_titleid,
        isnull(str(inserted.qty, 5), '') as src_qty,
        isnull(deleted.title_id, '') as tgt_titleid,
        isnull(str(deleted.qty, 5), '') as tgt_qty
    ;
    go
    select * from store_inventory
    go

If you run the script in Listing 42.1, you should see output like the following.

    stor_id title_id qty         update_dt
    ------- -------- ----------- -----------------------
    A011    CH0741   1452        2010-03-25 00:34:25.597
    A011    CH3348   24          2010-03-25 00:34:25.597
    A011    FI0324   1392        2010-03-25 00:34:25.597
    A011    FI0392   1176        2010-03-25 00:34:25.597
    A011    FI1552   1476        2010-03-25 00:34:25.597
    A011    FI1872   540         2010-03-25 00:34:25.597
    A011    FI3484   1428        2010-03-25 00:34:25.597
    A011    FI3660   984         2010-03-25 00:34:25.597
    A011    FI4020   1704        2010-03-25 00:34:25.597
    A011    FI4970   1140        2010-03-25 00:34:25.597
    A011    FI4992   180         2010-03-25 00:34:25.597
    A011    FI5832   1632        2010-03-25 00:34:25.597
    A011    NF8918   1140        2010-03-25 00:34:25.597
    A011    PC9999   1272        2010-03-25 00:34:25.597
    A011    TC7777   1692        2010-03-25 00:34:25.597

    (15 row(s) affected)

    $action
    ---------- ------ ----- ------ -----
    INSERT     BU2075  1536
    DELETE                  CH3348    24
    INSERT     CH5390   888
    INSERT     CH7553   540
    INSERT     FI1950  1308
    INSERT     FI2100  1104
    INSERT     FI3822   996
    UPDATE     FI4970  1632 FI4970  1140
    INSERT     FI7040  1596
    INSERT     LC8400   732
    DELETE                  NF8918  1140

    (11 row(s) affected)

    stor_id title_id qty         update_dt
    ------- -------- ----------- -----------------------
    A011    BU2075   1536        2010-03-25 00:54:54.547
    A011    CH0741   1452        2010-03-25 00:34:25.597
    A011    CH5390   888         2010-03-25 00:54:54.547
    A011    CH7553   540         2010-03-25 00:54:54.547
    A011    FI0324   1392        2010-03-25 00:34:25.597
    A011    FI0392   1176        2010-03-25 00:34:25.597
    A011    FI1552   1476        2010-03-25 00:34:25.597
    A011    FI1872   540         2010-03-25 00:34:25.597
    A011    FI1950   1308        2010-03-25 00:54:54.547
    A011    FI2100   1104        2010-03-25 00:54:54.547
    A011    FI3484   1428        2010-03-25 00:34:25.597
    A011    FI3660   984         2010-03-25 00:34:25.597
    A011    FI3822   996         2010-03-25 00:54:54.547
    A011    FI4020   1704        2010-03-25 00:34:25.597
    A011    FI4970   1632        2010-03-25 00:54:54.547
    A011    FI4992   180         2010-03-25 00:34:25.597
    A011    FI5832   1632        2010-03-25 00:34:25.597
    A011    FI7040   1596        2010-03-25 00:54:54.547
    A011    LC8400   732         2010-03-25 00:54:54.547
    A011    PC9999   1272        2010-03-25 00:34:25.597
    A011    TC7777   1692        2010-03-25 00:34:25.597

    (21 row(s) affected)

If you examine the results and compare the before and after contents of the store_inventory table, you see that eight new rows were inserted into store_inventory, two rows were deleted, and one row was updated.

MERGE Statement Best Practices and Guidelines

The MERGE statement is a great addition to the T-SQL language. It provides a concise and efficient mechanism to perform multiple operations on a table based on contents in a source table without having to resort to using a cursor or running multiple set-oriented operations against the table. However, there are some guidelines and best practices you should keep in mind to help ensure you get the best performance from your MERGE statements.

First, you should try to reduce the number of rows accessed by the MERGE statement early in the process by specifying any additional search condition in the ON clause that filters out rows that do not need to be processed. You should avoid using the conditions in the WHEN clauses as row filters.
However, you need to be careful if you are using any of the WHEN NOT MATCHED clauses because the elimination of rows via the ON clause may cause unexpected and incorrect results. Because the additional search conditions specified in the ON clause are not used for matching the source and target data, they can be misapplied.

To ensure correct results are obtained, you should specify only search conditions in the ON clause that determine the criteria for matching data in the source and target tables. That is, specify only columns from the target table that are compared to the corresponding columns of the source table. Do not include comparisons to other values such as a constant.

To filter out rows from the source or target tables, you should consider using one of the following methods:

. Specify the search condition for row filtering in the appropriate WHEN clause. For example, WHEN NOT MATCHED AND qty > 0 THEN INSERT....
. Define a view on the source or target that returns the filtered rows and reference the view as the source or target table. If the view is used as the target, make sure the view is updateable (for more information about updating data by using a view, see Chapter 27, "Creating and Managing Views").
. Use the WITH <common table expression> clause to filter out rows from the source or target tables. However, if you are not careful, this method is similar to specifying additional search criteria in the ON clause and may produce incorrect results. You should test this approach thoroughly before implementing it (for information on using common table expressions, see Chapter 43, "Transact-SQL Programming Guidelines, Tips, and Tricks").

Insert over DML

Another T-SQL enhancement in SQL Server 2008 applies to the use of the OUTPUT clause.
The OUTPUT clause allows you to return data from a modification statement (INSERT, UPDATE, MERGE, or DELETE) as a result set or into a table variable or an output table. In SQL Server 2008, you can include one of these Data Manipulation Language (DML) statements with an OUTPUT clause within the context of an INSERT...SELECT statement.

In the MERGE statement in Listing 42.1, the OUTPUT clause was used to display the rows affected by the statement. Suppose that you want this output to be put into a separate audit or processing table. In SQL Server 2008, you can do so by incorporating the MERGE statement with its OUTPUT clause as a derived table in the FROM clause of the SELECT in an INSERT...SELECT statement. To demonstrate this approach, you first need to create a table for storing that data:

    if OBJECT_ID('inventory_audit') is not null
        drop table inventory_audit
    go
    CREATE TABLE inventory_audit
    (
        Action varchar(10) not null,
        Src_title_id varchar(6) null,
        Src_qty int null,
        Tgt_title_id varchar(6) null,
        Tgt_qty int null,
        Loginname varchar(30) null default suser_name(),
        Action_DT datetime2 null default sysdatetime()
    )

Now you can put a SELECT statement on top of the MERGE command and use it as the row source for an INSERT into the inventory_audit table (see Listing 42.2).
LISTING 42.2 Insert over DML Example

    -- NOTE: to see the results for this example
    -- you first need to clear out and repopulate
    -- the store_inventory table
    Truncate table store_inventory
    Insert store_inventory (stor_id, title_id, qty, update_dt)
    select stor_id, title_id, qty = SUM(qty), update_dt = GETDATE()
        from sales s
        where stor_id = 'A011'
        group by stor_id, title_id
    go
    insert inventory_audit
        (action, Src_title_id, Src_qty, Tgt_title_id, Tgt_qty,
         Loginname, Action_DT)
    select *, SUSER_NAME(), SYSDATETIME()
    from (
        MERGE INTO store_inventory as s
        USING inventory_load as i
            ON s.stor_id = i.stor_id
            and s.title_id = i.title_id
        WHEN MATCHED and s.qty <> i.qty
            THEN UPDATE SET s.qty = i.qty, update_dt = getdate()
        WHEN NOT MATCHED
            THEN INSERT (stor_id, title_id, qty, update_dt)
                 VALUES (i.stor_id, i.title_id, i.qty, getdate())
        WHEN NOT MATCHED BY SOURCE
            THEN DELETE
        OUTPUT $action,
            isnull(inserted.title_id, '') as src_titleid,
            isnull(str(inserted.qty, 5), '') as src_qty,
            isnull(deleted.title_id, '') as tgt_titleid,
            isnull(str(deleted.qty, 5), '') as tgt_qty
        ) changes (action, Src_title_id, Src_qty, Tgt_title_id, Tgt_qty);
    go
    select * from inventory_audit
    go

    Action Src_title_id Src_qty Tgt_title_id Tgt_qty Loginname Action_DT
    ------ ------------ ------- ------------ ------- --------- ----------------------
    INSERT BU2075       1536                 0       rrankins  2010-04-02 22:20:59.48
    DELETE              0       CH3348       24      rrankins  2010-04-02 22:20:59.48
    INSERT CH5390       888                  0       rrankins  2010-04-02 22:20:59.48
    INSERT CH7553       540                  0       rrankins  2010-04-02 22:20:59.48
    INSERT FI1950       1308                 0       rrankins  2010-04-02 22:20:59.48
    INSERT FI2100       1104                 0       rrankins  2010-04-02 22:20:59.48
    INSERT FI3822       996                  0       rrankins  2010-04-02 22:20:59.48
    UPDATE FI4970       1632    FI4970       1140    rrankins  2010-04-02 22:20:59.48
    INSERT FI7040       1596                 0       rrankins  2010-04-02 22:20:59.48
    INSERT LC8400       732                  0       rrankins  2010-04-02 22:20:59.48
    DELETE              0       NF8918       1140    rrankins  2010-04-02 22:20:59.48

GROUP BY Clause Enhancements

SQL Server 2008 introduces a number of enhancements and changes to the GROUP BY clause for generating grouped aggregate result sets. These changes include the following:

. ROLLUP and CUBE operator syntax changes
. New GROUPING SETS operator
. New GROUPING_ID() function

ROLLUP and CUBE Operator Syntax Changes

The ROLLUP and CUBE operators produce additional aggregate groupings and are appended to the GROUP BY clause. Prior to SQL Server 2008, to include ROLLUP or CUBE groupings, you had to specify the WITH ROLLUP or WITH CUBE options in the GROUP BY clause after the list of grouping columns. In SQL Server 2008, the syntax now follows the ANSI standard for ROLLUP and CUBE; you first designate the ROLLUP or CUBE option and then provide the grouping columns to these operators as a comma-separated list enclosed in parentheses. The new syntax is

    GROUP BY [ROLLUP | CUBE ( non-aggregate_column_list ) ]

Following are examples using the pre-2008 syntax:

    SELECT type, pub_id, AVG(price) AS average
        FROM titles
        GROUP BY type, pub_id WITH CUBE

    SELECT pub_id, type, SUM(ytd_sales) as ytd_sales
        FROM dbo.titles
        where type like '%cook%' or type = 'business'
        GROUP BY type, pub_id WITH ROLLUP

An example of the new ANSI standard syntax supported in SQL Server 2008 is as follows:

    SELECT type, pub_id, AVG(price) AS average
        FROM titles
        GROUP BY CUBE (type, pub_id)

    SELECT pub_id, type, SUM(ytd_sales) as ytd_sales
        FROM dbo.titles
        where type like '%cook%' or type = 'business'
        GROUP BY ROLLUP (type, pub_id)

NOTE
The old-style CUBE and ROLLUP syntax is still supported for backward-compatibility purposes but is being deprecated.
You should convert any existing queries using the pre-2008 WITH CUBE or WITH ROLLUP syntax to the new syntax to ensure future compatibility.

GROUPING SETS

The CUBE and ROLLUP operators allow you to run a single query and generate multiple sets of groupings. However, the sets of groupings are fixed. For example, if you use GROUP BY ROLLUP (A, B, C), you get aggregates generated for the following groupings of nonaggregate columns:

. GROUP BY A, B, C
. GROUP BY A, B
. GROUP BY A
. A super-aggregate for all rows

If you use GROUP BY CUBE (A, B, C), you get aggregates generated for the following groupings of nonaggregate columns:

. GROUP BY A, B, C
. GROUP BY A, B
. GROUP BY A, C
. GROUP BY B, C
. GROUP BY A
. GROUP BY B
. GROUP BY C
. A super-aggregate for all rows

SQL Server 2008 introduces the GROUPING SETS operator in addition to the CUBE and ROLLUP operators for performing several groupings in a single query. With GROUPING SETS, only the specified groups are aggregated instead of the full set of aggregations generated by CUBE or ROLLUP. GROUPING SETS enables you to generate results with multiple groupings in a single query, without having to resort to writing multiple GROUP BY queries and combining the results using a UNION ALL statement.

The GROUPING SETS operator supports concatenating column groupings and an optional super-aggregate row. The syntax for defining grouping sets is as follows:

    GROUP BY [ GROUPING SETS ( ( ) | grouping_set_item | grouping_set_item_list [, ...n ] ) ]

The GROUPING SETS items can be single columns or a list of columns. The null field list ( ) can also be used to generate a super-aggregate (that is, a grand total for the entire result set). A non-nested list of columns works as separate simple GROUP BY statements, which are then combined in an implied UNION ALL.
A nested list of columns in parentheses within the GROUPING SETS item list works as a GROUP BY on that set of columns. Table 42.2 demonstrates examples of GROUPING SETS clauses and the corresponding groupings that the query generates.

TABLE 42.2 Grouping Sets Examples

    GROUPING SETS Clause                   Equivalent Statement
    -------------------------------------  ----------------------------------------------------
    GROUP BY GROUPING SETS (A,B,C)         GROUP BY A UNION ALL GROUP BY B UNION ALL GROUP BY C
    GROUP BY GROUPING SETS ((A,B,C))       GROUP BY A,B,C
    GROUP BY GROUPING SETS (A,(B,C))       GROUP BY A UNION ALL GROUP BY B,C
    GROUP BY GROUPING SETS ((A,C),(B,C))   GROUP BY A,C UNION ALL GROUP BY B,C

Listing 42.3 demonstrates how to use the GROUPING SETS operator to perform three groupings on three individual columns in a single query.

LISTING 42.3 GROUPING SETS Example

    /***
    ** Perform a grouping by type, grouping by pub_id, and grouping by price
    ***/
    SELECT type, pub_id, price, sum(isnull(ytd_sales, 0)) AS ytd_sales
        FROM titles
        where pub_id < '9'
        GROUP BY GROUPING SETS (type, pub_id, price)
    go

    type         pub_id price                 ytd_sales
    ------------ ------ --------------------- -----------
    NULL         NULL   NULL                  0
    NULL         NULL   0.0006                111
    NULL         NULL   0.0017                750
    NULL         NULL   14.3279               4095
    NULL         NULL   14.595                18972
    NULL         NULL   14.9532               14294
    NULL         NULL   14.9611               4095
    NULL         NULL   15.894                40968
    NULL         NULL   15.9329               3336
    NULL         NULL   17.0884               2045
    NULL         NULL   17.1675               8780
    NULL         0736   NULL                  28286
    NULL         0877   NULL                  44219
    NULL         1389   NULL                  24941
    business     NULL   NULL                  30788
    mod_cook     NULL   NULL                  24278
    popular_comp NULL   NULL                  12875
    psychology   NULL   NULL                  9939
    trad_cook    NULL   NULL                  19566

In the output in Listing 42.3, the first 11 rows are the results grouped by price, the next 3 rows are grouped by pub_id, and the bottom 5 rows are grouped by type. Now, you can modify this query to include a super-aggregate for all rows by adding a null field list, as shown in Listing 42.4.
LISTING 42.4 GROUPING SETS Example with Null Field List to Generate Super-Aggregate

    SELECT type, pub_id, price, sum(isnull(ytd_sales, 0)) AS ytd_sales
        FROM titles
        where pub_id < '9'
        GROUP BY GROUPING SETS (type, pub_id, price, ())
    go

    type         pub_id price                 ytd_sales
    ------------ ------ --------------------- -----------
    NULL         NULL   NULL                  0
    NULL         NULL   0.0006                111
    NULL         NULL   0.0017                750
    NULL         NULL   14.3279               4095
    NULL         NULL   14.595                18972
    NULL         NULL   14.9532               14294
    NULL         NULL   14.9611               4095
    NULL         NULL   15.894                40968
    NULL         NULL   15.9329               3336
    NULL         NULL   17.0884               2045
    NULL         NULL   17.1675               8780
    NULL         NULL   NULL                  97446
    NULL         0736   NULL                  28286
    NULL         0877   NULL                  44219
    NULL         1389   NULL                  24941
    business     NULL   NULL                  30788
    mod_cook     NULL   NULL                  24278
    popular_comp NULL   NULL                  12875
    psychology   NULL   NULL                  9939
    trad_cook    NULL   NULL                  19566

If you look closely at the results in Listing 42.4, you see there are two rows with NULL values for all three columns: type, pub_id, and price. How can you determine definitively which row is the super-aggregate of all rows, and which is a row grouped by price where the value of price is NULL? This is where the new grouping_id() function comes in.

The grouping_id() Function

The grouping_id() function, new in SQL Server 2008, can be used to determine the level of grouping in a query using GROUPING SETS or the CUBE and ROLLUP operators. Unlike the GROUPING() function, which takes only a single column expression as an argument and returns a 1 or 0 to indicate whether that individual column is being aggregated, the grouping_id() function accepts multiple column expressions and returns a bitmap to indicate which columns are being aggregated for that row.

For example, you can add the grouping_id() and grouping() functions to the query in Listing 42.4 and examine the results (see Listing 42.5).
LISTING 42.5 Using the grouping_id() Function

    SELECT type, pub_id, price,
           sum(isnull(ytd_sales, 0)) AS ytd_sales,
           grouping_id(type, pub_id, price) as grping_id,
           grouping(type) type_rlp,
           grouping(pub_id) pub_id_rlp,
           grouping(price) price_rlp
        FROM titles
        where pub_id < '9'
        GROUP BY GROUPING SETS (type, pub_id, price, ())
    go

    type         pub_id price   ytd_sales grping_id type_rlp pub_id_rlp price_rlp
    ------------ ------ ------- --------- --------- -------- ---------- ---------
    NULL         NULL   NULL    0         6         1        1          0
    NULL         NULL   0.0006  111       6         1        1          0
    NULL         NULL   0.0017  750       6         1        1          0
    NULL         NULL   14.3279 4095      6         1        1          0
    NULL         NULL   14.595  18972     6         1        1          0
    NULL         NULL   14.9532 14294     6         1        1          0
    NULL         NULL   14.9611 4095      6         1        1          0
    NULL         NULL   15.894  40968     6         1        1          0
    NULL         NULL   15.9329 3336      6         1        1          0
    NULL         NULL   17.0884 2045      6         1        1          0
    NULL         NULL   17.1675 8780      6         1        1          0
    NULL         NULL   NULL    97446     7         1        1          1
    NULL         0736   NULL    28286     5         1        0          1
    NULL         0877   NULL    44219     5         1        0          1
    NULL         1389   NULL    24941     5         1        0          1
    business     NULL   NULL    30788     3         0        1          1
    mod_cook     NULL   NULL    24278     3         0        1          1
    popular_comp NULL   NULL    12875     3         0        1          1
    psychology   NULL   NULL    9939      3         0        1          1
    trad_cook    NULL   NULL    19566     3         0        1          1

Unlike the grouping() function, which takes only a single column name as an argument, the grouping_id() function accepts all columns that participate in any grouping set. The grouping_id() function produces an integer result that is a bitmap, where each bit represents a different column, producing a unique integer for each grouping set. Each bit in the bitmap indicates whether the corresponding column is being aggregated in the grouping set (bit value is 1) or whether the column is used to define the grouping set for which the aggregate value is calculated (bit value is 0).

The bit values are assigned to columns from right to left in the order the columns are listed in the grouping_id() function.
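The bit arithmetic can be checked with a small standalone sketch that does not depend on bigpubs2008. The derived table t, its columns a, b, and c, and the sample values are hypothetical; the query simply places each grouping() bit next to the grouping_id() value so the weights are visible.

```sql
-- Hypothetical sample rows; names and values are illustrative only.
SELECT a, b, c,
       COUNT(*)             AS row_count,
       GROUPING(a)          AS a_bit,    -- leftmost argument, weight 4
       GROUPING(b)          AS b_bit,    -- middle argument, weight 2
       GROUPING(c)          AS c_bit,    -- rightmost argument, weight 1
       GROUPING_ID(a, b, c) AS grp_id    -- 4 * a_bit + 2 * b_bit + 1 * c_bit
FROM (VALUES (1, 10, 100),
             (1, 10, 200),
             (2, 20, 100)) AS t(a, b, c)
GROUP BY GROUPING SETS (a, b, c, ())
ORDER BY grp_id;
```

Rows grouped by a should show grp_id 3 (b and c aggregated, 2 + 1), rows grouped by b should show 5 (4 + 1), rows grouped by c should show 6 (4 + 2), and the single super-aggregate row should show 7 (4 + 2 + 1).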
For example, in the query in Listing 42.5, price is the rightmost bit value, bit 1; pub_id is assigned the next bit value, bit 2, and type is assigned the leftmost bit value, bit 3. When the grouping_id() value equals 6, that means the bits 2 and 3 are turned on (4 + 2 + 0 = 6). This indicates that the type and pub_id columns are being aggregated in the grouping set, and the price column defines the grouping set. The grouping_id() column can thus be used to determine which of the two rows where type, pub_id, and price are all NULL is the row with the super-aggregate of all three columns (grouping_id = 7), and which row is an aggregate rolled up where the value of price is NULL (grouping_id = 6). The values returned by the grouping_id() function can also be used for further filtering your grouping set results or for sorting your grouping set results, as shown in Listing 42.6. LISTING 42.6 Using the grouping_id() Function to Sort Results SELECT type, pub_id, price, sum(isnull(ytd_sales, 0)) AS ytd_sales, grouping_id(type, pub_id, price) as grping_id FROM titles where pub_id < 9 GROUP BY GROUPING SETS ( type, pub_id, price, () ) order by grping_id go type pub_id price ytd_sales grping_id ------------ ------ -------- ----------- ----------- business NULL NULL 30788 3 mod_cook NULL NULL 24278 3 popular_comp NULL NULL 12875 3 psychology NULL NULL 9939 3 trad_cook NULL NULL 19566 3 NULL 0736 NULL 28286 5 NULL 0877 NULL 44219 5 NULL 1389 NULL 24941 5 NULL NULL NULL 0 6 NULL NULL 0.0006 111 6 NULL NULL 0.0017 750 6 NULL NULL 14.3279 4095 6 NULL NULL 14.595 18972 6 NULL NULL 14.9532 14294 6 NULL NULL 14.9611 4095 6 NULL NULL 15.894 40968 6 NULL NULL 15.9329 3336 6 42_9780672330568_ch42.qxp 8/19/10 3:07 PM Page 1567 1568 CHAPTER 42 Whats New for Transact-SQL in SQL Server 2008 NULL NULL 17.0884 2045 6 NULL NULL 17.1675 8780 6 NULL NULL NULL 97446 7 Variable Assignment in DECLARE Statement In SQL Server 2008, you can now set a variables initial value at the same time you 
declare it. For example, the following line of code declares a variable named @ctr of type int and sets its value to 100:

    DECLARE @ctr int = 100

Previously, this functionality was possible only with stored procedure parameters. Assigning an initial value to a variable required a separate SET or SELECT statement. The new syntax simply streamlines the process of assigning an initial value to a variable. The value specified can be a constant or an expression, as in the following:

    DECLARE @start_time datetime = getdate()

You can even assign the initial value via a subquery, as long as the subquery returns only a single value, as in the following example:

    declare @max_price money = (select MAX(price) from titles)

The value being assigned to the variable must be of the same type as the variable or be implicitly convertible to that type.

Compound Assignment Operators

Another new feature that streamlines your T-SQL code is compound operators. This concept has been around in many other programming languages for a long time but has now finally found its way into T-SQL. Compound operators are used when you want to apply an arithmetic operation to a variable and assign the result back to the variable. For example, the += operator adds the specified value to the variable and then assigns the new value back to the variable. For example,

    SET @ctr += 1

is functionally the same as

    SET @ctr = @ctr + 1

The compound operators are quicker to type, and they offer a cleaner piece of finished code.
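To make the equivalence concrete, here is a minimal sketch that combines variable initialization in DECLARE with two compound operators. The variable names and the loop bound are illustrative only.

```sql
-- Both variables are initialized inline in the DECLARE statement.
DECLARE @ctr int = 0,
        @total int = 0;

WHILE @ctr < 5
BEGIN
    SET @ctr += 1;       -- same as SET @ctr = @ctr + 1
    SET @total += @ctr;  -- running sum: 1 + 2 + 3 + 4 + 5
END

SET @total *= 2;         -- multiply and assign: 15 * 2

SELECT @ctr AS ctr, @total AS total;   -- ctr = 5, total = 30
```

Each compound statement reads and writes the same variable, so the longhand form differs only in how much you type.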
Following is the complete list of compound operators provided in SQL Server 2008:

    +=   Add and assign
    -=   Subtract and assign
    *=   Multiply and assign
    /=   Divide and assign
    %=   Modulo and assign
    &=   Bitwise AND and assign
    ^=   Bitwise XOR and assign
    |=   Bitwise OR and assign

Row Constructors

SQL Server 2008 provides a new method for inserting data into SQL Server tables, referred to as row constructors. Row constructors simplify data insertion by allowing multiple rows of data to be specified in a single DML statement. A row constructor specifies a set of row value expressions to be constructed into a data row. Row constructors can be specified in the VALUES clause of the INSERT statement, in the USING clause of the MERGE statement, and in the definition of a derived table in the FROM clause. The general syntax of the row constructor is as follows:

    VALUES ( { expression | DEFAULT | NULL } [ ,...n ] ) [ ,...n ]

Each column of data defined in the VALUES clause is separated from the next by a comma. Each row (which may contain multiple columns) is enclosed in parentheses, and rows are separated from each other by commas. When multiple rows are specified, the corresponding column values must be of the same data type or an implicitly convertible data type. The following example shows the row constructor VALUES clause being used within a SELECT statement to define a set of rows and columns with explicit values:

    SELECT a, b
    FROM (VALUES (1, 2), (3, 4), (5, 6), (7, 8), (9, 10) ) AS MyTable(a, b);
    GO

    a           b
    ----------- -----------
    1           2
    3           4
    5           6
    7           8
    9           10

The VALUES clause is commonly used in this manner to populate temporary tables but can also be used in a view, as shown in Listing 42.7.
LISTING 42.7 Using the VALUES Clause in a View

    create view book_types
    as
    SELECT type, description
    FROM (VALUES ('mod_cook', 'Modern Cooking'),
                 ('trad_cook', 'Traditional Cooking'),
                 ('popular_comp', 'Popular Computing'),
                 ('biography', 'Biography'),
                 ('business', 'Business Development'),
                 ('children', 'Children''s Literature'),
                 ('fiction', 'Fiction'),
                 ('nonfiction', 'NonFiction'),
                 ('psychology', 'Psychology and Self Help'),
                 ('drama', 'Drama and Theater'),
                 ('lit crit', 'Literary Criticism')
         ) AS type_lookup(type, description)
    go

Defining a view in this manner can be useful as a code lookup table:

    select top 10 convert(varchar(50), title) as title, description
    from titles t
    inner join book_types bt on t.type = bt.type
    order by title_id desc
    go

    title                                              description
    Sushi, Anyone?                                     Traditional Cooking
    Fifty Years in Buckingham Palace Kitchens          Traditional Cooking
    Onions, Leeks, and Garlic: Cooking Secrets of the  Traditional Cooking
    Emotional Security: A New Algorithm                Psychology and Self Help
    Prolonged Data Deprivation: Four Case Studies      Psychology and Self Help
    Life Without Fear                                  Psychology and Self Help
    Is Anger the Enemy?                                Psychology and Self Help
    Computer Phobic AND Non-Phobic Individuals: Behavi Psychology and Self Help
    Net Etiquette                                      Popular Computing
    Secrets of Silicon Valley                          Popular Computing

The advantage of this approach is that, unlike a permanent code table, the view with the VALUES clause doesn't really take up any space; it's materialized only when it's referenced. Maintaining it involves simply dropping and re-creating the view rather than having to perform inserts, updates, and deletes as you would for a permanent table.

The primary use of row constructors is to insert multiple rows of data in a single INSERT statement. Essentially, if you have multiple rows to insert, you can specify all of them in the VALUES clause. The maximum number of rows that can be specified in the VALUES clause is 1000. The following example shows how to use the row constructor VALUES clause in a single INSERT statement to insert five rows:

    insert sales (stor_id, ord_num, ord_date, qty, payterms, title_id)
    VALUES ('6380', '1234', '3/26/2010', 50, 'Net 30', 'BU1032'),
           ('6380', '1234', '3/26/2010', 150, 'Net 30', 'PS2091'),
           ('6380', '1234', '3/26/2010', 25, 'Net 30', 'CH2480'),
           ('6380', '1234', '3/26/2010', 30, 'Net 30', 'FI2046'),
           ('6380', '1234', '3/26/2010', 10, 'Net 30', 'FI6318')

As you can see, this new syntax is much more concise than the five individual INSERT statements you would have had to issue in versions of SQL Server prior to SQL Server 2008.

The VALUES clause can also be used in the MERGE statement as the source table.
Listing 42.8 uses the VALUES clause to define five rows as the source data to perform INSERT/UPDATE operations on the store_inventory table defined in Listing 42.1.

LISTING 42.8 Using the VALUES Clause in a MERGE Statement

    MERGE INTO store_inventory as s
    USING (VALUES ('A011', 'CH3348', 41 , getdate()),
                  ('A011', 'CH2480', 125 , getdate()),
                  ('A011', 'FI0392', 1100 , getdate()),
                  ('A011', 'FI2046', 1476 , getdate()),
                  ('A011', 'FI1872', 520 , getdate())
          ) as i (stor_id, title_id, qty, update_dt)
    ON s.stor_id = i.stor_id
       and s.title_id = i.title_id
    WHEN MATCHED and s.qty <> i.qty
        THEN UPDATE SET s.qty = i.qty, update_dt = getdate()
    WHEN NOT MATCHED
        THEN INSERT (stor_id, title_id, qty, update_dt)
             VALUES (i.stor_id, i.title_id, i.qty, getdate())
    OUTPUT $action,
           isnull(inserted.title_id, '') as src_titleid,
           isnull(str(inserted.qty, 5), '') as src_qty,
           isnull(deleted.title_id, '') as tgt_titleid,
           isnull(str(deleted.qty, 5), '') as tgt_qty
    ;
    go

    $action    src_titleid src_qty tgt_titleid tgt_qty
    ---------- ----------- ------- ----------- -------
    INSERT     CH2480        125
    UPDATE     CH3348         41   CH3348         24
    UPDATE     FI0392       1100   FI0392       1176
    UPDATE     FI1872        520   FI1872        540
    INSERT     FI2046       1476

New date and time Data Types and Functions

SQL Server 2008 introduces four new date and time data types:

. date
. time (precision)
. datetime2 (precision)
. datetimeoffset (precision)

Two of the most welcome of these new types are the date and time data types, which allow you to store date-only and time-only values. In previous versions of SQL Server, the datetime and smalldatetime data types were the only available types for storing date or time values, and they always store both the date and the time. This made date-only or time-only comparisons tricky at times because you always had to account for the other component (for more detailed examples of working with datetime values in SQL Server, see Chapter 43).
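The following sketch illustrates why the time component gets in the way of date-only comparisons and how the date type sidesteps it. The @orders table variable and its rows are hypothetical and are not part of the bigpubs2008 database.

```sql
-- Hypothetical data: three datetime values, two of them on March 26, 2010.
DECLARE @orders TABLE (order_id int, ordered_at datetime);

INSERT @orders (order_id, ordered_at)
VALUES (1, '2010-03-26T00:00:00'),
       (2, '2010-03-26T14:35:10'),
       (3, '2010-03-27T09:00:00');

-- Equality against a date-only literal matches only the midnight row (order 1).
SELECT order_id FROM @orders
WHERE ordered_at = '20100326';

-- With datetime, a half-open range accounts for the time component (orders 1 and 2).
SELECT order_id FROM @orders
WHERE ordered_at >= '20100326' AND ordered_at < '20100327';

-- Casting to the date type compares only the date portion (orders 1 and 2).
SELECT order_id FROM @orders
WHERE CAST(ordered_at AS date) = '20100326';
```

The range form is usually the better choice against an indexed datetime column; the cast is shown only to illustrate what the date type stores.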
In addition, the datetime data type stores date values ranging only from 1/1/1753 to 12/31/9999, with accuracy only to 3.33 milliseconds. The smalldatetime data type stores date values ranging only from 1/1/1900 to 6/6/2079, with accuracy of only 1 minute.

The new date data type stores only the date component without the time component, and stores date values ranging from 1/1/0001 to 12/31/9999.

The new time data type stores only the time component with accuracy that can be specified down to seven decimal places (100 nanoseconds). The default is seven decimal places.

The datetime2 data type stores both date and time components, similar to datetime, but increases the range of allowed values to 1/1/0001 to 12/31/9999, with accuracy down to seven decimal places (100 ns). The default precision is seven decimal places.

The datetimeoffset data type also stores both date and time components just like datetime2, but includes the time zone offset from Coordinated Universal Time (UTC). The time zone offset ranges from -14:00 to +14:00.

Along with the new date and time data types, SQL Server 2008 also introduces some new date and time functions for returning the current system date and time in different formats and for working with time zone offsets:

. SYSDATETIME(): Returns the current system datetime as a DATETIME2(7) value
. SYSDATETIMEOFFSET(): Returns the current system datetime as a DATETIMEOFFSET(7) value
. SYSUTCDATETIME(): Returns the current system datetime as a DATETIME2(7) value representing the current UTC time
. SWITCHOFFSET (DATETIMEOFFSET, time_zone): Changes the DATETIMEOFFSET value from the stored time zone offset to the specified time zone
. TODATETIMEOFFSET (datetime, time_zone): Applies the specified time zone to a datetime value that does not reflect the time zone difference from UTC

Listing 42.9 demonstrates the use of some of the new data types and functions.
Notice the difference in the specified decimal precision returned for the time values.

LISTING 42.9 Using the new date and time Data Types and Functions

    declare @date date,
            @time time,
            @time3 time(3),
            @datetime2 datetime2(7),
            @datetimeoffset datetimeoffset,
            @datetime datetime,
            @utcdatetime datetime2(7)

    select @datetime = getdate(),
           @date = getdate(),
           @time = sysdatetime(),
           @time3 = sysdatetime(),
           @datetime2 = SYSDATETIME(),
           @datetimeoffset = SYSDATETIMEOFFSET(),
           @utcdatetime = SYSUTCDATETIME()

    select @datetime as datetime,
           @date as date,
           @time as time,
           @time3 as time3
    select @datetime2 as datetime2,
           @datetimeoffset as datetimeoffset,
           @utcdatetime as utcdatetime
    select SYSDATETIMEOFFSET() as sysdatetimeoffset,
           SYSDATETIME() as sysdatetime
    go

    datetime                date       time             time3
    ----------------------- ---------- ---------------- -----------------
    2010-03-28 23:18:30.490 2010-03-28 23:18:30.4904294 23:18:30.492

    datetime2              datetimeoffset                     utcdatetime
    ---------------------- ---------------------------------- ----------------------
    2010-03-28 23:18:30.49 2010-03-28 23:18:30.4924295 -04:00 2010-03-29 03:18:30.49

    sysdatetimeoffset                  sysdatetime
    ---------------------------------- ----------------------
    2010-03-28 23:24:10.7485902 -04:00 2010-03-28 23:24:10.74

Be aware that assigning the value returned by getdate() or sysdatetime() to a column or variable defined with the datetimeoffset data type does not capture the offset from UTC.
To do so, you need to use the SYSDATETIMEOFFSET() function:

    declare @datetimeoffset1 datetimeoffset,
            @datetimeoffset2 datetimeoffset
    select @datetimeoffset1 = SYSDATETIME(),
           @datetimeoffset2 = SYSDATETIMEOFFSET()
    select @datetimeoffset1, @datetimeoffset2
    go

    ---------------------------------- ----------------------------------
    2010-03-28 23:36:39.7271831 +00:00 2010-03-28 23:36:39.7271831 -04:00

Note that in the output, SQL Server Management Studio (SSMS) trims the time values down to two decimal places when it displays the results in the Text Results tab. However, this is just for display purposes (and applies only to text results; grid results display the full decimal precision). The actual value does store the precision down to the specified number of decimal places, which can be seen if you convert the datetime2 value to a string format that displays all the decimal places:

    select SYSDATETIME() as datetime2_trim,
           convert(varchar(30), SYSDATETIME(), 121) as datetime2_full
    go

    datetime2_trim         datetime2_full
    ---------------------- ------------------------------
    2010-03-30 23:52:30.68 2010-03-30 23:52:30.6851262

The SWITCHOFFSET() function can be used to convert a datetimeoffset value into a different time zone offset value:

    select SYSDATETIMEOFFSET(),
           SWITCHOFFSET ( SYSDATETIMEOFFSET(), '-07:00' )
    go

    ---------------------------------- ----------------------------------
    2010-03-29 00:07:21.1335738 -04:00 2010-03-28 21:07:21.1335738 -07:00

When you are specifying a time zone value for the SWITCHOFFSET or TODATETIMEOFFSET functions, the value can be specified as an integer representing the number of minutes of offset or as a string in [+|-]hh:mm format. The range of allowed values is -14 hours to +14 hours.
    select TODATETIMEOFFSET ( SYSDATETIME(), -300 )
    select TODATETIMEOFFSET ( SYSDATETIME(), '-05:00' )
    go

    ----------------------------------
    2010-03-29 00:23:05.5773288 -05:00

    ----------------------------------
    2010-03-29 00:23:05.5773288 -05:00

Date and Time Conversions

If an existing CONVERT style includes the time part, and the conversion is from datetimeoffset to a string, the time zone offset (except for style 127) is included. If you do not want the time zone offset, you need to cast or convert the datetimeoffset value to datetime2 first and then to a string:

    select convert(varchar(35), SYSDATETIMEOFFSET(), 121) as datetime_offset,
           CONVERT(varchar(30), cast(SYSDATETIMEOFFSET() as datetime2), 121)
               as datetime2
    go

    datetime_offset                     datetime2
    ----------------------------------- ------------------------------
    2010-03-30 23:57:36.1015950 -04:00  2010-03-30 23:57:36.1015950

When you convert from datetime2 or datetimeoffset to date, there is no rounding and the date part is extracted explicitly. For any implicit conversion from datetimeoffset to date, time, datetime2, datetime, or smalldatetime, conversion is based on the local date and time value (that is, the value at the stored time zone offset). For example, when the datetimeoffset(3) value 2006-10-21 12:20:20.999 -8:00 is converted to time(3), the result is 12:20:20.999, not 20:20:20.999 (UTC).

If you convert from a higher-precision time value to a lower-precision value, the conversion is permitted, and the higher-precision values are truncated to fit the lower-precision type.

If you are converting a time(n), datetime2(n), or datetimeoffset(n) value to a string, the number of digits depends on the type specification.
If you want a specific precision in the resulting string, convert to a data type with the appropriate precision first and then to a string, as follows:

    select convert(varchar(35), sysdatetime(), 121) as datetime_offset,
           CONVERT(varchar(30), cast(sysdatetime() as datetime2(3)), 121)
               as datetime2
    go

    datetime_offset                     datetime2
    ----------------------------------- ------------------------------
    2010-03-31 00:04:37.3306880         2010-03-31 00:04:37.331

If you attempt to cast a string literal with a fractional seconds precision that is greater than that allowed for smalldatetime or datetime, Error 241 is raised:

    declare @datetime datetime
    select @datetime = '2010-03-31 00:04:37.3306880'
    go

    Msg 241, Level 16, State 1, Line 2
    Conversion failed when converting date and/or time from character string.

Table-Valued Parameters

In previous versions of SQL Server, it was not possible to share the contents of table variables between stored procedures. SQL Server 2008 changes that with the introduction of table-valued parameters, which allow you to pass table variables to stored procedures as input parameters. Table-valued parameters provide more flexibility and, in many cases, better performance than temporary tables as a means to pass result sets between stored procedures.

To create and use table-valued parameters, you must first create a user-defined table type as a TABLE data type and define the table structure. This is done using the CREATE TYPE command, as shown in Listing 42.10.

LISTING 42.10 Defining a User-Defined Table Type

    if exists (select * from sys.systypes t
               where t.name = 'ytdsales_tabletype'
                 and t.uid = USER_ID('dbo'))
        drop type ytdsales_tabletype
    go
    CREATE TYPE ytdsales_tabletype AS TABLE
        (title_id char(6),
         title varchar(50),
         pubdate date,
         ytd_sales int)
    go

After creating the user-defined table data type, you can use it for declaring local table variables and for stored procedure parameters.
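Before moving on to procedures, here is a minimal sketch of the first use, a local table variable declared from a table type. The type name demo_sales_tabletype and the sample rows are hypothetical; the sketch creates and drops its own type so that it does not interfere with the type from the chapter's listings.

```sql
-- Hypothetical table type used only for this sketch.
CREATE TYPE demo_sales_tabletype AS TABLE
    (title_id char(6),
     ytd_sales int);
GO

-- A local variable of the table type behaves like any other table variable.
DECLARE @sales demo_sales_tabletype;

INSERT @sales (title_id, ytd_sales)
VALUES ('BU1032', 4095),
       ('PS2091', 2045);

SELECT title_id, ytd_sales
FROM @sales
WHERE ytd_sales > 3000;
GO

-- Clean up the hypothetical type.
DROP TYPE demo_sales_tabletype;
GO
```

Declaring the structure once in the type means every variable and parameter based on it shares the same column definitions.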
To use the table-valued parameter in a procedure, you create a procedure to receive and access data through a table-valued parameter, as shown in Listing 42.11.

LISTING 42.11 Defining a Stored Procedure with a Table-Valued Parameter

    /* Create a procedure to receive data for the table-valued parameter. */
    if OBJECT_ID('tab_parm_test') is not null
        drop proc tab_parm_test
    go
    create proc tab_parm_test
        @pubdate datetime = null,
        @sales_minimum int = 0,
        @ytd_sales_tab ytdsales_tabletype READONLY
    as
    set nocount on
    if @pubdate is null
        -- if no date is specified, set date to last year
        set @pubdate = dateadd(month, -12, getdate())
    select * from @ytd_sales_tab
    where pubdate > @pubdate
      and ytd_sales >= @sales_minimum
    return
    go

Then, when calling that stored procedure, you declare a local table variable using the table data type defined previously, populate the table variable with data, and then pass the table variable to the stored procedure (see Listing 42.12).

LISTING 42.12 Executing a Stored Procedure with a Table-Valued Parameter

    /* Declare a variable that references the table type. */
    declare @ytd_sales_tab ytdsales_tabletype

    /* Add data to the table variable. */
    insert @ytd_sales_tab
    select title_id, convert(varchar(50), title), pubdate, ytd_sales
    from titles

    /* Pass the table variable populated with data to a stored procedure. */
    exec tab_parm_test '6/1/2001', 10000, @ytd_sales_tab
    go

    title_id title                                              ytd_sales
    -------- -------------------------------------------------- -----------
    BU2075   You Can Combat Computer Stress!                    18722
    MC3021   The Gourmet Microwave                              22246
    TC4203   Fifty Years in Buckingham Palace Kitchens          15096

The scope of a table-valued parameter is limited to only the stored procedure to which it is passed.
To access the contents of a table-valued parameter in a procedure called by another procedure that contains a table-valued parameter, you need to pass the table-valued parameter to the subprocedure. Listing 42.13 provides an example of a subprocedure and alters the procedure created in Listing 42.11 to call the subprocedure.

LISTING 42.13 Passing a Table-Valued Parameter to a Subprocedure

    /* Create the sub-procedure */
    create proc tab_parm_subproc
        @pubdate datetime = null,
        @sales_minimum int = 0,
        @ytd_sales_tab ytdsales_tabletype READONLY
    as
    select * from @ytd_sales_tab
    where ytd_sales <= @sales_minimum
      and ytd_sales <> 0
    go

    /* modify the tab_parm_test proc to call the sub-procedure */
    alter proc tab_parm_test
        @pubdate datetime = null,
        @sales_minimum int = 0,
        @ytd_sales_tab ytdsales_tabletype READONLY
    as
    set nocount on
    if @pubdate is null
        -- if no date is specified, set date to last year
        set @pubdate = dateadd(month, -12, getdate())
    select * from @ytd_sales_tab
    where pubdate > @pubdate
      and ytd_sales >= @sales_minimum
    exec tab_parm_subproc @pubdate, @sales_minimum, @ytd_sales_tab
    return
    go

    /* Declare a variable that references the type. */
    declare @ytd_sales_tab ytdsales_tabletype

    /* Add data to the table variable. */
    insert @ytd_sales_tab
    select title_id, convert(varchar(50), title), pubdate, ytd_sales
    from titles
    where type = 'business'

    /* Pass the table variable populated with data to a stored procedure. */
    exec tab_parm_test '6/1/2001', 10000, @ytd_sales_tab
    go

    title_id title                                              pubdate    ytd_sales
    -------- -------------------------------------------------- ---------- -----------
    BU2075   You Can Combat Computer Stress!
                                                                2004-06-30 18722

    title_id title                                              pubdate    ytd_sales
    -------- -------------------------------------------------- ---------- -----------
    BU1032   The Busy Executive's Database Guide                2004-06-12 4095
    BU1111   Cooking with Computers: Surreptitious Balance Shee 2004-06-09 3876
    BU7832   Straight Talk About Computers                      2004-06-22 4095

Table-Valued Parameters Versus Temporary Tables

Table-valued parameters offer more flexibility and in some cases better performance than temporary tables or other ways to pass a list of values to a stored procedure. One benefit is that table-valued parameters do not acquire locks for the initial population of data from a client. Also, table-valued parameters are memory resident and do not incur physical I/O unless they grow too large to remain in cache memory.

However, table-valued parameters do have some restrictions:

. SQL Server does not create or maintain statistics on columns of table-valued parameters.
. Table-valued parameters can be passed only as READONLY input parameters to T-SQL routines. You cannot perform UPDATE, DELETE, or INSERT operations on a table-valued parameter within the body of the stored procedure to which it is passed.
. Like table variables, a table-valued parameter cannot be specified as the target of a SELECT INTO or INSERT EXEC statement. It can be populated only by using an INSERT statement.

Hierarchyid Data Type

The Hierarchyid data type introduced in SQL Server 2008 is actually a system-supplied common language runtime (CLR) user-defined type (UDT) that can be used for storing and manipulating hierarchical structures (for example, parent-child relationships) in a relational database. The Hierarchyid type is stored as a varbinary value that represents the position of the current node in the hierarchy (both in terms of parent-child position and position among siblings).
You can perform manipulations on the type in Transact-SQL by invoking methods exposed by the type.

Creating a Hierarchy

First, let's define a hierarchy in a table using the Hierarchyid data type. This section uses the Parts table example used in Chapter 28, "Creating and Managing Stored Procedures," to demonstrate how a stored procedure could be used to traverse a hierarchy stored in a table. There is also an example in Chapter 52 that uses a recursive common table expression (CTE) to perform a similar action. Let's see how to implement an alternative solution by adding a Hierarchyid column to the Parts table.

First, you create a version of the Parts table using the Hierarchyid data type (see Listing 42.14).

LISTING 42.14 Creating the Parts Table with a Hierarchyid Data Type

    Use bigpubs2008
    Go
    CREATE TABLE PARTS_hierarchy(
        partid int NOT NULL,
        hid hierarchyid not null,
        lvl as hid.GetLevel() persisted,
        partname varchar(30) NOT NULL,
        PRIMARY KEY NONCLUSTERED (partid),
        UNIQUE NONCLUSTERED (partname)
    )

Note the hid column defined with the Hierarchyid data type. Notice also how the lvl column is defined as a persisted computed column that uses the GetLevel method of the hid column. The GetLevel method returns the level of the current node in the hierarchy.

The Hierarchyid data type provides topological sorting, meaning that a child's sort value is guaranteed to be greater than the parent's sort value. This guarantees that a node's sort value will be higher than those of all its ancestors. You can take advantage of this feature by creating an index on the Hierarchyid column because the index will sort the data in a depth-first manner. This ensures that all members of the same subtree are close to each other in the leaf level of the index, which makes the index an efficient mechanism for returning all descendants of a node.
To take advantage of this, you can create a clustered index on the hid column:

    CREATE UNIQUE CLUSTERED INDEX idx_hid_first ON Parts_hierarchy (hid);

You can also use another indexing strategy called breadth-first, in which you organize all nodes from the same level close to each other in the leaf level of the index. This is done by building the index such that the leading column is the level in the hierarchy. Queries that need to get all nodes from the same level in the hierarchy can benefit from this type of index:

    CREATE UNIQUE INDEX idx_lvl_first ON Parts_hierarchy(lvl, hid);

Populating the Hierarchy

Now that you've created the hierarchy table, the next step is to populate it. To insert a new node into the hierarchy, you must first produce a new Hierarchyid value that represents the correct position in the hierarchy. There are two methods available with the Hierarchyid data type to do this: the HIERARCHYID::GetRoot() method and the GetDescendant method. You use the HIERARCHYID::GetRoot() method to produce the value for the root node of the hierarchy. This method simply produces a Hierarchyid value that is internally an empty binary string representing the root of the tree.

You can use the GetDescendant method to produce a value below a given parent. The GetDescendant method accepts two Hierarchyid input values, either of which can be NULL, that represent the two nodes between which you want to position the new node. If both values are not NULL, the method produces a new value positioned between the two nodes. If the first parameter is not NULL and the second parameter is NULL, the method produces a value greater than the first parameter. If the first parameter is NULL and the second parameter is not NULL, the method produces a value smaller than the second parameter. Finally, if both parameters are NULL, the method produces a value simply below the given parent.
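The four parameter combinations described above can be tried out without any table, as in the following sketch. The variable names are illustrative only, and the ToString method is used here simply to make the generated positions readable.

```sql
-- Start from the root and generate children in each of the four ways.
DECLARE @root hierarchyid = hierarchyid::GetRoot();

-- Both parameters NULL: a child directly below the parent.
DECLARE @first hierarchyid = @root.GetDescendant(NULL, NULL);

-- First parameter set, second NULL: a child that sorts after @first.
DECLARE @after hierarchyid = @root.GetDescendant(@first, NULL);

-- First parameter NULL, second set: a child that sorts before @first.
DECLARE @before hierarchyid = @root.GetDescendant(NULL, @first);

-- Both parameters set: a child positioned between @first and @after.
DECLARE @between hierarchyid = @root.GetDescendant(@first, @after);

SELECT @root.ToString()    AS root_path,
       @before.ToString()  AS before_first,
       @first.ToString()   AS first_child,
       @between.ToString() AS between_children,
       @after.ToString()   AS after_first;
```

Sorting the five values as Hierarchyid values should return them in the column order shown in the SELECT list, because a parent sorts before its children and siblings sort by position.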
NOTE
The GetDescendant method does not guarantee that Hierarchyid values are unique. To enforce uniqueness, you must define a primary key, unique constraint, or unique index on the Hierarchyid column.

The code in Listing 42.15 uses a cursor to loop through the rows currently in the Parts table and populates the Parts_hierarchy table. If the part is the first node in the hierarchy, the code uses the HIERARCHYID::GetRoot() method to assign the hid value for the root node of the hierarchy. Otherwise, the code in the cursor loop looks for the last child hid value of the new part's parent part and uses the GetDescendant method to produce a value that positions the new node after the last child of that parent part.

NOTE
Listing 42.15 also makes use of a recursive common table expression to traverse the existing Parts table in hierarchical order to add in the rows at the proper level, starting with the topmost parent part. If you are unfamiliar with CTEs (which were introduced in SQL Server 2005), you may want to review the "In Case You Missed It..." section in Chapter 43.
LISTING 42.15 Populating the Parts_hierarchy Table

    DECLARE @hid AS HIERARCHYID,
            @parent_hid AS HIERARCHYID,
            @last_child_hid AS HIERARCHYID,
            @partid int,
            @partname varchar(30),
            @parentpartid int

    declare parts_cur cursor for
    WITH PartsCTE(partid, partname, parentpartid, lvl)
    AS
    (
        SELECT partid, partname, parentpartid, 0
        FROM PARTS
        WHERE parentpartid is null
        UNION ALL
        SELECT P.partid, P.partname, P.parentpartid, PP.lvl+1
        FROM Parts as P
        JOIN PartsCTE as PP
            ON P.parentpartid = PP.Partid
    )
    SELECT PartID, Partname, ParentPartid
    FROM PartsCTE
    order by lvl

    open parts_cur
    fetch parts_cur into @partid, @partname, @parentpartid
    while @@FETCH_STATUS = 0
    begin
        if @parentpartid is null
            set @hid = HIERARCHYID::GetRoot()
        else
        begin
            select @parent_hid = hid
            from PARTS_hierarchy
            where partid = @parentpartid

            select @last_child_hid = MAX(hid)
            from PARTS_hierarchy
            where hid.GetAncestor(1) = @parent_hid

            select @hid = @parent_hid.GetDescendant(@last_child_hid, NULL)
        end
        insert PARTS_hierarchy (partid, hid, partname)
        values (@partid, @hid, @partname)
        fetch parts_cur into @partid, @partname, @parentpartid
    end
    close parts_cur
    deallocate parts_cur
    go

Querying the Hierarchy

Now that you've populated the hierarchy, you should query it to view the data and verify that the hierarchy was populated correctly. However, if you query the hid value directly, you see only its binary representation, which is not very meaningful. To view the Hierarchyid value in a more useful manner, you can use the ToString method, which returns a logical string representation of the Hierarchyid. This string representation is shown as a path with a slash sign used as a separator between the levels.
For example, you can run the following query to get both the binary and logical representations of the hid value:

    select cast(hid as varbinary(6)) as hid,
           substring(hid.ToString(), 1, 12) as path,
           lvl, partid, partname
    From parts_hierarchy
    go

    hid            path         lvl    partid      partname
    -------------- ------------ ------ ----------- ------------------
    0x             /            0      22          Car
    0x58           /1/          1      1           DriveTrain
    0x68           /2/          1      23          Body
    0x78           /3/          1      24          Frame
    0x5AC0         /1/1/        2      2           Engine
    0x5B40         /1/2/        2      3           Transmission
    0x5BC0         /1/3/        2      4           Axle
    0x5C20         /1/4/        2      12          Drive Shaft
    0x5B56         /1/2/1/      3      9           Flywheel
    0x5B5A         /1/2/2/      3      10          Clutch
    0x5B5E         /1/2/3/      3      16          Gear Box
    0x5AD6         /1/1/1/      3      5           Radiator
    0x5ADA         /1/1/2/      3      6           Intake Manifold
    0x5ADE         /1/1/3/      3      7           Exhaust Manifold
    0x5AE1         /1/1/4/      3      8           Carburetor
    0x5AE3         /1/1/5/      3      13          Piston
    0x5AE5         /1/1/6/      3      14          Crankshaft
    0x5AE358       /1/1/5/1/    4      21          Piston Rings
    0x5AE158       /1/1/4/1/    4      11          Float Valve
    0x5B5EB0       /1/2/3/1/    4      15          Reverse Gear
    0x5B5ED0       /1/2/3/2/    4      17          First Gear
    0x5B5EF0       /1/2/3/3/    4      18          Second Gear
    0x5B5F08       /1/2/3/4/    4      19          Third Gear
    0x5B5F18       /1/2/3/5/    4      20          Fourth Gear

As stated previously, the values stored in a Hierarchyid column provide topological sorting of the nodes in the hierarchy. The GetLevel method can be used to produce the level in the hierarchy (as it was to store the level in the computed lvl column in the Parts_hierarchy table).
Using the lvl column or the GetLevel method, you can easily produce a graphical depiction of the hierarchy by simply sorting the rows by hid and generating indentation for each row based on the lvl column, as shown in the following example:

    SELECT REPLICATE('--', lvl) + right('>', lvl) + partname AS partname
    FROM Parts_hierarchy
    order by hid
    go

    partname
    -------------------------
    Car
    -->DriveTrain
    ---->Engine
    ------>Radiator
    ------>Intake Manifold
    ------>Exhaust Manifold
    ------>Carburetor
    -------->Float Valve
    ------>Piston
    -------->Piston Rings
    ------>Crankshaft
    ---->Transmission
    ------>Flywheel
    ------>Clutch
    ------>Gear Box
    -------->Reverse Gear
    -------->First Gear
    -------->Second Gear
    -------->Third Gear
    -------->Fourth Gear
    ---->Axle
    ---->Drive Shaft
    -->Body
    -->Frame

To return only the subparts of a specific part, you can use the IsDescendantOf method. The parameter passed to this method is a node's Hierarchyid value. The method returns 1 if the queried node is a descendant of the input node.
For example, the following query returns all subparts of the engine:

    select child.partid, child.partname, child.lvl
      from parts_hierarchy as parent
           inner join parts_hierarchy as child
              on parent.partname = 'Engine'
             and child.hid.IsDescendantOf(parent.hid) = 1
    go

    partid      partname                       lvl
    ----------- ------------------------------ ------
    2           Engine                         2
    5           Radiator                       3
    6           Intake Manifold                3
    7           Exhaust Manifold               3
    8           Carburetor                     3
    13          Piston                         3
    14          Crankshaft                     3
    21          Piston Rings                   4
    11          Float Valve                    4

Note that the Engine row itself is returned because a node is considered a descendant of itself.

You can also use the IsDescendantOf method to return all parent parts of a given part:

    select parent.partid, parent.partname, parent.lvl
      from parts_hierarchy as parent
           inner join parts_hierarchy as child
              on child.partname = 'Piston'
             and child.hid.IsDescendantOf(parent.hid) = 1
    go

    partid      partname                       lvl
    ----------- ------------------------------ ------
    22          Car                            0
    1           DriveTrain                     1
    2           Engine                         2
    13          Piston                         3

To return a specific level of subparts for a given part, you can use the GetAncestor method. You pass this method an integer value indicating how many levels below the parent you want to display. The method returns the Hierarchyid value of the ancestor n levels above the queried node. For example, the following query returns all the subparts two levels down from the drivetrain:

    select child.partid, child.partname, child.lvl
      from parts_hierarchy as parent
           inner join parts_hierarchy as child
              on parent.partname = 'DriveTrain'
             and child.hid.GetAncestor(2) = parent.hid
    go

    partid      partname                       lvl
    ----------- ------------------------------ ------
    9           Flywheel                       3
    10          Clutch                         3
    16          Gear Box                       3
    5           Radiator                       3
    6           Intake Manifold                3
    7           Exhaust Manifold               3
    8           Carburetor                     3
    13          Piston                         3
    14          Crankshaft                     3

Modifying the Hierarchy

The script in Listing 42.15 performs the initial population of the Parts_hierarchy table.
What if you need to add additional records into the table? Let's look at how to use the GetDescendant method to add new records at different levels of the hierarchy. For example, to add a child part to the Body node (node /2/), you can call the GetDescendant method with NULL for both arguments to add the new row below the Body node at node /2/1/:

    INSERT Parts_hierarchy (hid, partid, partname)
    select hid.GetDescendant(null, null), 25, 'left front fender'
      from Parts_hierarchy
     where partname = 'Body'

To add a new row as a higher descendant node at the same level as the left front fender inserted in the previous example, you use the GetDescendant method again, but this time passing the Hierarchyid of the existing child node as the first parameter. This specifies that the new node will follow the existing node, becoming /2/2/.

There are a couple of ways to specify the Hierarchyid of the existing child node. You can retrieve it from the table as a Hierarchyid data type, or if you know the string representation of the node, you can use the Parse method. The Parse method converts a canonical string representation of a hierarchical value to Hierarchyid. Parse is also called implicitly when a conversion from a string type to Hierarchyid occurs, as in CAST (input AS hierarchyid). Parse is essentially the opposite of the ToString method.

    INSERT Parts_hierarchy (hid, partid, partname)
    select hid.GetDescendant(hierarchyid::Parse('/2/1/'), null), 26, 'right front fender'
      from Parts_hierarchy
     where partname = 'Body'

Now, what if you need to add a new node between the two existing nodes you just added?
Again, you use the GetDescendant method, but this time, you pass it the hierarchy IDs of both existing nodes between which you want to insert the new node:

    declare @child1 hierarchyid, @child2 hierarchyid

    select @child1 = hid from Parts_hierarchy
     where partname = 'left front fender'
    select @child2 = hid from Parts_hierarchy
     where partname = 'right front fender'

    INSERT Parts_hierarchy (hid, partid, partname)
    select hid.GetDescendant(@child1, @child2), 27, 'front bumper'
      from Parts_hierarchy
     where partname = 'Body'

Now, let's run a query of the Body subtree to examine the newly inserted child nodes:

    select child.partid, child.partname, child.lvl,
           substring(child.hid.ToString(), 1, 12) as path
      from parts_hierarchy as parent
           inner join parts_hierarchy as child
              on parent.partname = 'Body'
             and child.hid.IsDescendantOf(parent.hid) = 1
     order by child.hid
    go

    partid      partname                       lvl    path
    ----------- ------------------------------ ------ ------------
    23          Body                           1      /2/
    25          left front fender              2      /2/1/
    27          front bumper                   2      /2/1.1/
    26          right front fender             2      /2/2/

Notice that the first child added (left front fender) has a node path of /2/1/, and the second row added (right front fender) has a node path of /2/2/. The new child node inserted between these two nodes (front bumper) was given a node path of /2/1.1/ so that it maintains the designated topological ordering of the nodes.

What if you need to make other types of changes within hierarchies? For example, you might need to move a whole subtree of parts from one part to another (that is, move a part and all its subordinates). To move nodes or subtrees in a hierarchy, you can use the GetReparentedValue method of the Hierarchyid data type. You invoke this method on the Hierarchyid value of the node you want to reparent and provide as inputs the value of the old parent and the value of the new parent. Note that this method doesn't change the Hierarchyid value for the existing node that you want to move.
Instead, it returns a new Hierarchyid value that you can use to update the target node's Hierarchyid value. Logically, the GetReparentedValue method simply substitutes the part of the existing node's path that represents the old parent's path with the new parent's path. For example, if the path of the existing node is /1/2/1/, the path of the old parent is /1/2/, and the path of the new parent is /2/1/3/, the GetReparentedValue method would return /2/1/3/1/.

You have to be careful, though. If the target parent node already has child nodes, the GetReparentedValue method may not produce a unique hierarchy path. If you reparent node /1/2/1/ from old parent /1/2/ to new parent /2/1/3/, and /2/1/3/ already has a child /2/1/3/1/, you generate a duplicate value. To avoid this situation when moving a single node from one parent to another, you should not use the GetReparentedValue method but instead use the GetDescendant method to produce a completely new value for the single node.

For example, let's assume you want to move the Flywheel part from the Transmission node to the Engine node. A sample approach is shown in Listing 42.16. This example uses the GetDescendant method to generate a new Hierarchyid under the Engine node following the last child node and updates the hid column for the Flywheel record to the new Hierarchyid generated.
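Before turning to the listing, the path substitution that GetReparentedValue performs can be tried on literal values without touching any table. The following is only a sketch using the example paths from the discussion above; the variable names are hypothetical.

```sql
-- Sketch: GetReparentedValue swaps the old parent's path prefix for
-- the new parent's path and returns a new value; @node is unchanged.
DECLARE @node       hierarchyid = hierarchyid::Parse('/1/2/1/'),
        @old_parent hierarchyid = hierarchyid::Parse('/1/2/'),
        @new_parent hierarchyid = hierarchyid::Parse('/2/1/3/');

SELECT @node.ToString() AS original_path,
       @node.GetReparentedValue(@old_parent, @new_parent).ToString() AS new_path;
-- new_path is /2/1/3/1/, matching the example paths in the text
```

Because the method only computes a value, nothing prevents the result from colliding with an existing child of the new parent; uniqueness still has to be ensured by the caller or by a unique constraint on the column.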
LISTING 42.16 Moving a Single Node in a Hierarchy

    declare @newhid hierarchyid, @maxchild hierarchyid

    -- first, find the max child node under the Engine node
    -- this is the node we will move the Flywheel node after
    select @maxchild = max(child.hid)
      from parts_hierarchy as parent
           inner join parts_hierarchy as child
              on parent.partname = 'Engine'
             and child.hid.GetAncestor(1) = parent.hid

    select 'Child to insert after' = @maxchild.ToString()

    -- Now, generate a new descendant hid for the Engine node
    -- after the max child node
    select @newhid = hid.GetDescendant(@maxchild, null)
      from Parts_hierarchy
     where partname = 'Engine'

    -- Update the hid for the Flywheel node to the new hid
    update Parts_hierarchy
       set hid = @newhid
     where partname = 'Flywheel'
    go

    Child to insert after
    ----------------------
    /1/1/6/

If you need to move an entire subtree within a hierarchy, you can use the GetReparentedValue method in conjunction with the GetDescendant method. For example, suppose you want to move the whole Engine subtree from its current parent node of Drivetrain to the new parent node of Car. The Car node obviously already has children. If you want to avoid conflicts, the best approach is to generate a new Hierarchyid value for the root node of the subtree. You can achieve this with the following steps:

1. Use the GetDescendant method to produce a completely new Hierarchyid value for the root node of the subtree.

2. Update the Hierarchyid value of all nodes in the subtree to the value returned by the GetReparentedValue method.

Because you are generating a completely new Hierarchyid value under the target parent, this new child node has no existing children, which avoids any duplicate Hierarchyid values. Listing 42.17 provides an example for changing the parent node of the Engine subtree from Drivetrain to Car.
LISTING 42.17 Reparenting a Subtree in a Hierarchy

    DECLARE @old_root AS HIERARCHYID,
            @new_root AS HIERARCHYID,
            @new_parent_hid AS HIERARCHYID,
            @max_child as hierarchyid

    -- Get the hid of the new parent
    select @new_parent_hid = hid
      FROM dbo.parts_hierarchy
     WHERE partname = 'Car'

    -- Get the hid of the current root of the subtree
    Select @old_root = hid
      FROM dbo.parts_hierarchy
     WHERE partname = 'Engine'

    -- Get the max hid of child nodes of the new parent
    select @max_child = MAX(hid)
      FROM parts_hierarchy
     WHERE hid.GetAncestor(1) = @new_parent_hid

    -- get a new hid for the moving child node
    -- that is after the current max child node of the new parent
    SET @new_root = @new_parent_hid.GetDescendant(@max_child, null)

    -- Next, reparent the moving child node and all descendants
    UPDATE dbo.parts_hierarchy
       SET hid = hid.GetReparentedValue(@old_root, @new_root)
     WHERE hid.IsDescendantOf(@old_root) = 1

Now, let's reexamine the hierarchy after the updates made in Listings 42.16 and 42.17:

    SELECT left(REPLICATE('--', lvl) + right('>', lvl) + partname, 30) AS partname,
           hid.ToString() AS path
      FROM Parts_hierarchy
     order by hid
    go

    partname                       path
    ------------------------------ ------------
    Car                            /
    -->DriveTrain                  /1/
    ---->Transmission              /1/2/
    ------>Clutch                  /1/2/2/
    ------>Gear Box                /1/2/3/
    -------->Reverse Gear          /1/2/3/1/
    -------->First Gear            /1/2/3/2/
    -------->Second Gear           /1/2/3/3/
    -------->Third Gear            /1/2/3/4/
    -------->Fourth Gear           /1/2/3/5/
    ---->Axle                      /1/3/
    ---->Drive Shaft               /1/4/
    -->Body                        /2/
    ---->left front fender         /2/1/
    ---->front bumper              /2/1.1/
    ---->right front fender        /2/2/
    -->Frame                       /3/
    -->Engine                      /4/
    ---->Radiator                  /4/1/
    ---->Intake Manifold           /4/2/
    ---->Exhaust Manifold          /4/3/
    ---->Carburetor                /4/4/
    ------>Float Valve             /4/4/1/
    ---->Piston                    /4/5/
    ------>Piston Rings            /4/5/1/
    ---->Crankshaft                /4/6/
    ---->Flywheel                  /4/7/

As you can see from the results, the Flywheel node is now under the Engine node, and the entire Engine subtree is now under the Car node.

Using FILESTREAM Storage

In versions of SQL Server prior to SQL Server 2008, there were two ways of storing unstructured data: as a binary large object (BLOB) in an image or varbinary(max) column, or in files outside the database, separate from the structured relational data, storing a reference or pathname to the file in a varchar column. Neither of these methods is ideal for handling unstructured data.

Storing the data outside the database makes managing the unstructured data and keeping it associated with structured data more complex. This approach lacks transactional consistency, coordinating backups and restores with the structured data in the database is difficult, and implementing proper data security can be quite cumbersome.
Storing the unstructured data in the database solves the transactional consistency, backup/restore, and security issues, but BLOBs have different usage patterns than relational data. SQL Server's storage engine is primarily concerned with doing I/O on relational data stored in pages and extents, not streaming large BLOBs. I/O performance typically degrades dramatically if the size of the BLOB data increases beyond 1MB. Accessing BLOB data stored inside a SQL Server database is generally slower than accessing it when it is stored externally in a location such as the NTFS file system. In addition, BLOB storage is not as efficient as the file system for storing large data values, so more storage space is required.

FILESTREAM storage, introduced in SQL Server 2008, helps to solve the issues with using unstructured data by integrating the SQL Server Database Engine with the NTFS file system for storing unstructured data such as documents and images on the file system with a pointer to the data in the database. The file pointer is implemented in SQL Server as a varbinary(max) column, and the actual data is stored in files in the file system.

In addition to enabling client applications to leverage the rich NTFS streaming APIs and the performance of the file system for storing and retrieving unstructured data, other advantages of FILESTREAM storage include the following:

. You are able to use T-SQL statements to insert, update, query, and back up FILESTREAM data even though the actual data resides outside the database in the NTFS file system.

. You are able to maintain transactional consistency between the unstructured data and corresponding structured data.

. You are able to enforce the same level of security on the unstructured data as with your relational data using built-in SQL Server security mechanisms.

. FILESTREAM uses the NT system cache for caching file data rather than caching the data in the SQL Server buffer pool, leaving more memory available for query processing.

. FILESTREAM storage also eliminates the size limitation of BLOBs stored in the database. Whereas standard image and varbinary(max) columns have a size limitation of 2GB, the sizes of the FILESTREAM BLOBs are limited only by the available space of the file system.

Columns with the FILESTREAM attribute set can be managed just like any other BLOB column in SQL Server. Administrators can use the manageability and security capabilities of SQL Server to integrate FILESTREAM data management with the rest of the data in the relational database, without needing to manage the file system data separately. This includes maintenance operations such as backup and restore, complete integration with the SQL Server security model, and full transaction support to ensure data-level consistency between the relational data in the database and the unstructured data physically stored on the file system.

Whether you should use database storage or file system storage for your BLOB data is determined by the size and use of the unstructured data. If the following conditions are true, you should consider using FILESTREAM:

. The objects being stored as BLOBs are, on average, larger than 1MB.

. Fast read access is important.

. You are developing applications that use a middle tier for application logic.

Enabling FILESTREAM Storage

If you decide to use FILESTREAM storage, it first needs to be enabled at both the Windows level and the SQL Server instance level. FILESTREAM storage can be enabled automatically during SQL Server installation or manually after installation.

If you are enabling FILESTREAM during SQL Server installation, you need to provide the Windows share location where the FILESTREAM data will be stored.
You can also choose whether to allow remote clients to access the FILESTREAM data. For more information on how to enable FILESTREAM storage during installation, see Chapter 8, "Installing SQL Server 2008."

If you did not enable the FILESTREAM option during installation, you can enable it for a running instance of SQL Server 2008 at any time using SQL Server Configuration Manager (SSCM). In SSCM, right-click on the SQL Server Service and select Properties. Then select the FILESTREAM tab, which provides similar options as those displayed during SQL Server installation (see Figure 42.1). This enables SQL Server to work directly with the Windows file system for storing FILESTREAM data.

FIGURE 42.1 Setting FILESTREAM options in SQL Server Configuration Manager.

You have three options for how FILESTREAM functionality will be enabled:

. Allowing only T-SQL access (by checking only the Enable FILESTREAM for Transact-SQL Access option).

. Allowing both T-SQL and Win32 access to FILESTREAM data (by checking the Enable FILESTREAM for File I/O Streaming Access option and providing a Windows share name to be used to access the FILESTREAM data). This allows Win32 file system interfaces to provide streaming access to the data.

. Allowing remote clients to have access to the FILESTREAM data that is stored on this share (by selecting the Allow Remote Clients to Have Streaming Access to FILESTREAM Data option).

NOTE
You need to be Windows Administrator on a local system and have sysadmin rights to enable FILESTREAM for SQL Server.

After you enable FILESTREAM in SQL Server Configuration Manager, a new share is created on the host system with the name specified. This share is intended only to allow very low-level streaming interaction between SQL Server and authorized clients.
It is recommended that only the service account used by the SQL Server instance should have access to this share. Also, because this change takes place at the OS level and not from within SQL Server, you need to stop and restart the SQL Server instance for the change to take effect.

After restarting the SQL Server instance to enable FILESTREAM at the Windows OS level, you next need to enable FILESTREAM for the SQL Server instance. You can do this either through SQL Server Management Studio or via T-SQL. To enable FILESTREAM for the SQL Server instance using SQL Server Management Studio, right-click on the SQL Server instance in the Object Explorer, select Properties, select the Advanced page, and set the Filestream Access Level property as shown in Figure 42.2. The available options are

. Disabled (0): FILESTREAM access is not permitted.

. Transact-SQL Access Enabled (1): FILESTREAM data can be accessed only by T-SQL commands.

. Full Access Enabled (2): Both T-SQL and Win32 access to FILESTREAM data are permitted.

FIGURE 42.2 Enabling FILESTREAM for a SQL Server Instance in SSMS.

You can also optionally enable FILESTREAM for the SQL Server instance using the sp_configure system procedure, specifying 'filestream access level' as the setting and passing the option of 0 (disabled), 1 (T-SQL access), or 2 (full access). The following example shows full access being enabled for the current SQL Server instance:

    EXEC sp_configure 'filestream access level', 2
    GO
    RECONFIGURE
    GO

After you configure the SQL Server instance for FILESTREAM access, the next step is to set up a database to store FILESTREAM data.
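Before moving on, it can be useful to confirm that the Windows-level and instance-level settings actually took effect. The following is a minimal sketch that reads the FILESTREAM server properties and the corresponding sys.configurations row; the column aliases are made up for this example.

```sql
-- Sketch: inspect the FILESTREAM configuration of the current instance.
SELECT SERVERPROPERTY('FilestreamShareName')       AS share_name,
       SERVERPROPERTY('FilestreamConfiguredLevel') AS configured_level,
       SERVERPROPERTY('FilestreamEffectiveLevel')  AS effective_level;

-- The instance-level option set through sp_configure
SELECT name, value, value_in_use
  FROM sys.configurations
 WHERE name = 'filestream access level';
```

If the effective level is lower than the configured level, the usual suspects are a pending instance restart or a Windows-level setting in SQL Server Configuration Manager that is more restrictive than the instance-level option.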
Setting Up a Database for FILESTREAM Storage

After you enable FILESTREAM for the SQL Server instance, you can store FILESTREAM data in a database by creating a FILESTREAM filegroup. You can do this when creating the database or by adding a new filegroup to an existing database. The filegroup designated for FILESTREAM storage must be defined with the CONTAINS FILESTREAM clause. The code in Listing 42.18 creates the Customer database and then adds a FILESTREAM filegroup.

LISTING 42.18 Setting Up a Database for FILESTREAM Storage

    CREATE DATABASE Customer
    ON ( NAME=Customer_Data,
         FILENAME='C:\SQLData\Customer_Data1.mdf',
         SIZE=50,
         MAXSIZE=100,
         FILEGROWTH=10)
    LOG ON ( NAME=Customer_Log,
             FILENAME='C:\SQLData\Customer_Log.ldf',
             SIZE=50,
             FILEGROWTH=20%)
    GO

    ALTER DATABASE Customer
      ADD FILEGROUP Cust_FSGroup CONTAINS FILESTREAM
    GO

    ALTER DATABASE Customer
      ADD FILE ( NAME=custinfo_FS,
                 FILENAME = 'G:\SQLData\custinfo_FS')
       TO FILEGROUP Cust_FSGroup
    GO

Notice in Listing 42.18 that the FILESTREAM filegroup points to a file system folder rather than an actual file. This folder must not exist already (although the path up to the folder must exist); SQL Server creates the FILESTREAM folder (for example, in Listing 42.18, the custinfo_FS folder is created automatically by SQL Server in the G:\SQLData folder). The FILESTREAM files and file data actually end up being stored in the created folder. A FILESTREAM filegroup is restricted to referencing only a single file folder.

Using FILESTREAM Storage for Data Columns

Once FILESTREAM storage is enabled for a database, you can specify the FILESTREAM attribute on a varbinary(max) column to indicate that a column should store data in the FILESTREAM filegroup on the file system. When columns are defined with the FILESTREAM attribute, the Database Engine stores all data for that column on the file system instead of in the database file.
In addition to a varbinary(max) column with the FILESTREAM attribute, tables used to store FILESTREAM data also require the existence of a UNIQUE ROWGUIDCOL, as shown in Listing 42.19, which creates a custinfo table on the FILESTREAM filegroup. CUSTDATA is defined as the FILESTREAM column, and ID is defined as the unique ROWGUID column.

LISTING 42.19 Creating a FILESTREAM-Enabled Table

    CREATE TABLE CUSTINFO
    (ID UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE,
     CUSTDATA VARBINARY (MAX) FILESTREAM NULL )
    FILESTREAM_ON Cust_FSGroup

Each table created with one or more FILESTREAM columns creates a new subfolder in the FILESTREAM filegroup folder, and each FILESTREAM column in the table creates a separate subfolder under the table folder. These column folders are where the actual FILESTREAM files are stored. Initially, these folders are empty until you start adding rows into the table. A file is created in the column subfolder for each row inserted into the table with a non-NULL value for the FILESTREAM column.

NOTE
For more detailed information on how FILESTREAM data is stored and managed, see Chapter 34.

To ensure that SQL Server creates a new, blank file within the FILESTREAM storage folder for each row inserted in the table, you can specify a default value of 0x for the FILESTREAM column:

    alter table CUSTINFO add constraint custdata_def default 0x for CUSTDATA

Creating a default is not required if all access to the FILESTREAM data is going to be done through T-SQL. However, if you will be using Win32 streaming clients to upload file contents into the FILESTREAM column, the file needs to exist already. Without the default to ensure creation of a blank file for each row, new files would have to be created first by inserting contents directly through T-SQL before they could be accessed via Win32 client streaming applications.
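The effect of the 0x default can be checked from T-SQL. The following sketch assumes the CUSTINFO table and the custdata_def default described above; the @id variable and the column aliases are made up for illustration.

```sql
-- Sketch: with the custdata_def default in place, a row inserted
-- without a CUSTDATA value gets a zero-length value instead of NULL.
DECLARE @id uniqueidentifier = NEWID();

INSERT CUSTINFO (ID) VALUES (@id);

SELECT ID,
       DATALENGTH(CUSTDATA) AS data_length,  -- 0 when the value is 0x
       CASE WHEN CUSTDATA IS NULL THEN 1 ELSE 0 END AS is_null
  FROM CUSTINFO
 WHERE ID = @id;
```

A zero-length, non-NULL value is what allows a streaming client to open the row's file for writing later.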
To insert data into a FILESTREAM column, you use a normal INSERT statement and provide a varbinary(max) value to store into the FILESTREAM column. The string is converted to varchar(max) before it is replicated because REPLICATE otherwise truncates its result at 8,000 bytes:

    INSERT CUSTINFO (ID, CUSTDATA)
    VALUES (NEWID(),
            CONVERT(VARBINARY(MAX),
                    REPLICATE(CONVERT(VARCHAR(MAX), 'CUST DATA'), 100000)))

To retrieve FILESTREAM data, you can use a simple T-SQL SELECT statement, although you may need to convert the varbinary(max) to varchar to be able to display text data:

    select ID, CONVERT(varchar(40), CUSTDATA) as CUSTDATA
      from CUSTINFO
    go

    ID                                   CUSTDATA
    ------------------------------------ ----------------------------------------
    FA67BF05-51B5-4BA7-A383-7F88DAAE9C49 CUST DATACUST DATACUST DATACUST DATACUST

The preceding examples work fine if the FILESTREAM data is essentially text data; however, neither SQL Server Management Studio nor SQL Server itself really has any user interface, or native way, to let you stream the contents of an actual file into a table that's been marked with the FILESTREAM attribute on one of your varbinary(max) columns. In other words, if you have a .jpg or .mp3 file that you want to store within SQL Server, there's no native functionality to convert that image's byte stream into something that you could put, for example, into a simple INSERT statement. To read or store this type of data, you need to use Win32 to read and write data to a FILESTREAM BLOB. Following are the steps you need to perform in your client applications:

1. Read the FILESTREAM file path.

2. Read the current transaction context.

3. Obtain a Win32 handle and use the handle to read and write data to the FILESTREAM BLOB.

Each cell in a FILESTREAM table has a file path associated with it.
You can use the PathName method to retrieve the file path of a FILESTREAM varbinary(max) column in a T-SQL statement:

    DECLARE @filePath varchar(max)

    SELECT @filePath = CUSTDATA.PathName()
      FROM CUSTINFO
     WHERE ID = 'FA67BF05-51B5-4BA7-A383-7F88DAAE9C49'

    PRINT @filePath
    go

    \\LATITUDED830-W7\FILESTREAM\v1\Customer\dbo\CUSTINFO\CUSTDATA\FA67BF05-51B5-4BA7-A383-7F88DAAE9C49

Next, to obtain the current transaction context and return it to the client application, use the GET_FILESTREAM_TRANSACTION_CONTEXT() T-SQL function:

    BEGIN TRAN
    SELECT GET_FILESTREAM_TRANSACTION_CONTEXT()

After you obtain the transaction context, the next step in your application code is to obtain a Win32 file handle to read or write the data to the FILESTREAM column. To obtain a Win32 file handle, you call the OpenSqlFilestream API. The returned handle can then be passed to any of the following Win32 APIs to read and write data to a FILESTREAM BLOB:

. ReadFile

. WriteFile

. TransmitFile

. SetFilePointer

. SetEndOfFile

. FlushFileBuffers

To summarize, the steps you perform to upload a file to a FILESTREAM column are as follows:

1. Start a new transaction and obtain the transaction context ID that can be used to initiate the Win32 file-streaming process.

2. Execute a query (for example, through a SqlDataReader) to pull back the full path (in SQL Server) of the FILESTREAM file to which you will be uploading data.

3. Initiate a straight file-streaming operation using the System.Data.SqlTypes.SqlFileStream class.

4. Create a new System.IO.FileStream object to read the file locally and buffer bytes along to the SqlFileStream object until there are no more bytes to transfer.

5. Close the transaction.
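The server-side portion of these steps can be sketched in T-SQL as follows. This is only an outline of the statements a streaming client might send; the client-side streaming itself happens in application code and is represented here by a comment. The ID value is the sample row shown earlier, and the column aliases are made up for this example.

```sql
-- Sketch: statements issued around a client-side streaming operation.
BEGIN TRAN;

-- Fetch the logical file path and the transaction context together;
-- the client passes both to OpenSqlFilestream or SqlFileStream.
SELECT CUSTDATA.PathName()                  AS file_path,
       GET_FILESTREAM_TRANSACTION_CONTEXT() AS txn_context
  FROM CUSTINFO
 WHERE ID = 'FA67BF05-51B5-4BA7-A383-7F88DAAE9C49';

-- ... the client application streams the file contents here ...

-- Committing makes the streamed changes durable; ROLLBACK TRAN
-- would discard them instead.
COMMIT TRAN;
```

Keeping the path lookup and the context call in the same transaction matters because the transaction context is valid only while that transaction remains open.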
NOTE
Because you're streaming file contents via a Win32 process, you need to use integrated security to connect to SQL Server because native SQL logins can't generate the needed security tokens to access the underlying file system where the FILESTREAM data is stored.

To retrieve data from a FILESTREAM column to a file on the client, you primarily follow the same steps as you do for inserting data; however, instead you pull data from a SqlFileStream object into a buffer and push it into a local FileStream object until there are no more bytes left to retrieve.

TIP
Refer to the "Managing FILESTREAM Data by Using Win32" topic in SQL Server 2008 R2 Books Online for specific C#, Visual Basic, and Visual C++ application code examples showing how to obtain a Win32 file handle and use it to read and write data to a FILESTREAM column.

Sparse Columns

SQL Server 2008 provides a new space-saving storage option referred to as sparse columns. Sparse columns can provide optimized and efficient storage for columns that contain predominantly NULL values. The NULL values require no storage space, but these space savings come at a cost of increased space for storing non-NULL values (an additional 2 to 4 bytes of space is needed for non-NULL values). For this reason, Microsoft recommends using sparse columns only when the space saved is at least 20% to 40%. However, the consensus rule of thumb that is emerging from experience with sparse columns is that it is best to use them only when more than 90% of the values are NULL.

There are a number of restrictions and limitations regarding the use of sparse columns, including the following:

. Sparse columns cannot be defined with the ROWGUIDCOL or IDENTITY properties.

. Sparse columns cannot be defined with a default value.

. Sparse columns cannot be used in a user-defined table type.

. Although sparse columns allow up to 30,000 columns per table, the total row size is reduced to 8,018 bytes due to the additional overhead for sparse columns.

. If a table has sparse columns, you can't compress it at either the row or page level.

. Columns defined with the geography, geometry, text, ntext, timestamp, image, or user-defined data types cannot be defined as sparse columns.

. You can't define varbinary(max) fields that use FILESTREAM storage as sparse columns.

. You can't define a computed column as sparse, but you can use a sparse column in the calculation of a computed column.

. A table cannot have more than 1,024 non-sparse columns.

Column Sets

Column sets provide an alternative way to view and work with all the sparse columns in a table. The sparse columns are aggregated into a single untyped XML column, which simplifies working with many sparse columns in a table. The XML column used for a column set is similar to a computed column in that it is not physically stored, but unlike computed columns, it is updateable.

There are some restrictions on column sets:

. You cannot add a column set to a table that already has sparse columns.

. You can define only one column set per table.

. Constraints or default values cannot be defined on a column set.

. Computed columns cannot contain column set columns.

. A column set cannot be changed; you must delete and re-create the column set. However, sparse columns can be added to the table after a column set has been defined and are automatically included in the column set.

. Distributed queries, replication, and Change Data Capture do not support column sets.

. A column set cannot be part of any kind of index, including XML indexes, full-text indexes, and indexed views.

NOTE
Sparse columns and column sets are defined by using the CREATE TABLE or ALTER TABLE statements. This chapter focuses on using and working with sparse columns.
For more information on defining sparse columns and column sets, see Chapter 24, "Creating and Managing Tables."

Working with Sparse Columns

Querying and manipulation of sparse columns is the same as for regular columns, with one exception described later in this chapter. There's nothing functionally different about a table that includes sparse columns, except the way the sparse columns are stored. You can still use all the standard INSERT, UPDATE, and DELETE statements on tables with sparse columns just like a table that doesn't have sparse columns. You can also wrap operations on a table with sparse columns in transactions as usual.

To work with sparse columns, let's first create a table with sparse columns. Listing 42.20 creates a version of the Product table in the AdventureWorks2008R2 database and then populates the table with data from the Production.Product table. The Color, Weight, and SellEndDate columns are defined as sparse columns (the source data contains a significant number of NULL values for these columns). These columns are also defined as part of the column set, ProductInfo.
LISTING 42.20 Creating a Table with Sparse Columns

    USE AdventureWorks2008R2
    GO

    CREATE TABLE Product_sparse
    (
      ProductID INT NOT NULL PRIMARY KEY,
      ProductName NVARCHAR(50) NOT NULL,
      Color NVARCHAR(15) SPARSE NULL,
      Weight DECIMAL(8,2) SPARSE NULL,
      SellEndDate DATETIME SPARSE NULL,
      ProductInfo XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
    )
    GO

    INSERT INTO Product_sparse
           (ProductID, ProductName, Color, Weight, SellEndDate)
    SELECT ProductID, Name, Color, Weight, SellEndDate
      FROM Production.Product
    GO

You can reference the sparse columns in your queries just as you would any type of column:

    SELECT productID, productName, Color, Weight, SEllEndDate
      FROM Product_sparse
     where ProductID < 320
    go

    productID productName           Color        Weight        SEllEndDate
    --------- --------------------- ------------ ------------- -----------
    1         Adjustable Race       NULL         NULL          NULL
    2         Bearing Ball          NULL         NULL          NULL
    3         BB Ball Bearing       NULL         NULL          NULL
    4         Headset Ball Bearings NULL         NULL          NULL
    316       Blade                 NULL         NULL          NULL
    317       LL Crankarm           Black        NULL          NULL
    318       ML Crankarm           Black        NULL          NULL
    319       HL Crankarm           Black        NULL          NULL

Note, however, that if you use SELECT * in a query and the table has a column set defined for the sparse columns, the column set is returned as a single XML column instead of the individual columns:

    SELECT *
      FROM Product_sparse
     where ProductID < 320
    go

    ProductID   ProductName            ProductInfo
    ----------- ---------------------- ----------------------------------
    1           Adjustable Race        NULL
    2           Bearing Ball           NULL
    3           BB Ball Bearing        NULL
    4           Headset Ball Bearings  NULL
    316         Blade                  NULL
    317         LL Crankarm            <Color>Black</Color>
    318         ML Crankarm            <Color>Black</Color>
    319         HL Crankarm            <Color>Black</Color>

You need to explicitly list the columns in the SELECT clause to have the result columns returned as relational columns.
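To see which columns SELECT * will fold into the column set, you can look at the table's metadata. The following sketch assumes the Product_sparse table from Listing 42.20 and relies on the is_sparse and is_column_set flags in the sys.columns catalog view.

```sql
-- Sketch: list which columns of Product_sparse are sparse and which
-- one is the column set.
SELECT name, is_sparse, is_column_set
  FROM sys.columns
 WHERE object_id = OBJECT_ID('Product_sparse')
 ORDER BY column_id;

-- Individual sparse columns and the column set can be returned side
-- by side as long as both are named explicitly.
SELECT ProductID, ProductName, Color, ProductInfo
  FROM Product_sparse
 WHERE ProductID < 320;
```

Naming the columns explicitly, as in the second query, is the same rule as above applied to a mixed result: anything listed by name comes back in its own column.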
When the column set is defined, you can also operate on the column set by using XML operations instead of relational operations. For example, the following code inserts a row into the table by using the column set and specifying a value for Weight as XML:

    INSERT Product_sparse (ProductID, ProductName, ProductInfo)
    VALUES (5, 'ValveStem', '<Weight>.12</Weight>')
    GO
    SELECT ProductID, ProductName, Color, Weight, SellEndDate
    FROM Product_sparse
    WHERE ProductID = 5
    GO

    ProductID   ProductName Color Weight SellEndDate
    ----------- ----------- ----- ------ -----------
    5           ValveStem   NULL  0.12   NULL

Notice that NULL is assumed for any column omitted from the XML value, such as Color and SellEndDate in this example.

When updating a column set using an XML value, you must include values for all the columns in the column set that you want to have a value, including any existing values you want to keep. Any values not specified in the XML string are set to NULL. For example, the following query sets both Color and Weight where ProductID = 5:

    UPDATE Product_sparse
    SET ProductInfo = '<Color>black</Color><Weight>.20</Weight>'
    WHERE ProductID = 5
    SELECT ProductID, ProductName, Color, Weight, SellEndDate
    FROM Product_sparse
    WHERE ProductID = 5
    GO

    ProductID   ProductName Color Weight SellEndDate
    ----------- ----------- ----- ------ -----------
    5           ValveStem   black 0.20   NULL

Now, if you run another update but specify a value only for Weight in the XML string, the Color column is set to NULL:

    UPDATE Product_sparse
    SET ProductInfo = '<Weight>.10</Weight>'
    WHERE ProductID = 5
    SELECT ProductID, ProductName, Color, Weight, SellEndDate
    FROM Product_sparse
    WHERE ProductID = 5
    GO

    ProductID   ProductName Color Weight SellEndDate
    ----------- ----------- ----- ------ -----------
    5           ValveStem   NULL  0.10   NULL

However, if you reference the sparse columns explicitly in an UPDATE statement, the other values remain unchanged:

    UPDATE Product_sparse
    SET Color = 'silver'
    WHERE ProductID = 5
    SELECT ProductID, ProductName, Color, Weight, SellEndDate
    FROM Product_sparse
    WHERE ProductID = 5
    GO

    ProductID   ProductName Color  Weight SellEndDate
    ----------- ----------- ------ ------ -----------
    5           ValveStem   silver 0.10   NULL

Column sets are most useful when you have many sparse columns in a table (for example, hundreds) and operating on them individually is cumbersome. Your client applications may find it easier and more efficient to generate the appropriate XML string to populate the column set than to build an UPDATE statement dynamically after determining which of the sparse columns need to be included in the SET clause. Applications might also see some performance improvement when they select, insert, or update data by using column sets on tables that have a large number of columns.

Sparse Columns: Good or Bad?

There is some disagreement in the SQL Server community about whether sparse columns are appropriate. A number of professionals are of the opinion that any table design that requires sparse columns is a bad design that does not follow good relational design guidelines. Tables that rely on sparse columns are, by their nature, heavily denormalized. On the other hand, many times you have to live in the real world and make the best of a bad database design that you've inherited. Sparse columns can help solve performance and storage issues in databases that may have been poorly designed.

Although sparse columns can solve certain kinds of problems with database design, you should never use them as an alternative to proper database and table design. As cool as sparse columns are, they aren't appropriate for every scenario, particularly when you're tempted to violate normalization rules to be able to cram more fields into a table.

Spatial Data Types

SQL Server's support of SQLCLR allows very rich user-defined types to be created.
For example, a developer could create a single object that contains multiple properties and can also perform calculations internally (methods), yet still store it in a single column in a single row in a database table. This allows complex types of data to be stored and queried in the database, instead of just strings and numbers.

SQL Server 2008 makes use of SQLCLR to support two new .NET CLR data types for storing spatial data: GEOMETRY and GEOGRAPHY. These types support methods and properties that allow for the creation, comparison, analysis, and retrieval of spatial data. Spatial data types provide a comprehensive, high-performance, and extensible data storage solution for spatial data, enabling organizations of any scale to integrate geospatial features into their applications and services.

The GEOMETRY data type is a .NET CLR data type that supports planar data, which assumes a flat projection and is therefore sometimes called flat-earth data. Geometry data represents information in a uniform two-dimensional plane as points, lines, and polygons on a flat surface, such as maps and interior floor plans where the curvature of the earth does not need to be taken into account. For example, perhaps your user-defined coordinate space is being used to represent a warehouse facility. Within that coordinate space, you can use the GEOMETRY data type to define areas that represent storage bays within the warehouse. You can then store data in your database that tracks which inventory is located in which area. You could then query the data to determine, for example, which forklift driver is closest to a certain type of item.

The GEOGRAPHY data type provides a storage structure for geodetic data, sometimes referred to as round-earth data because it assumes a roughly spherical model of the world.
It provides a storage structure for spatial data that is defined by latitude and longitude coordinates using an industry-standard ellipsoid such as WGS 84, the reference system used by Global Positioning System (GPS) applications.

The SQL Server GEOGRAPHY data type uses latitude and longitude angles to identify points on the earth. Latitude measures how far north (or south) of the equator a point is, while longitude measures how far east (or west) of a prime meridian a point is. Note that this kind of coordinate system can be used to identify points on any spherical object, be it a baseball, the earth, or even the moon.

The GEOMETRY and GEOGRAPHY data types support seven instance types that you can create and work with in a database:
. POINT: A POINT is an exact location and is defined in terms of an X and Y pair of coordinates, as well as optionally by Z (elevation) and M (measure) coordinates. It does not have a length or any area associated with it. POINT instances are the fundamental building blocks of the more complex spatial types.
. MULTIPOINT: A MULTIPOINT is a collection of zero or more points.
. LINESTRING: A LINESTRING is the path between a sequence of points (that is, a series of connected line segments). It is considered simple if it does not cross over itself and is considered a ring if the starting point is the same as the ending point. A LINESTRING is always considered to be a one-dimensional object; it has length but does not have area (even if it is a ring).
. MULTILINESTRING: A MULTILINESTRING is a collection of zero or more GEOMETRY or GEOGRAPHY LINESTRING instances.
. POLYGON: A POLYGON is a closed two-dimensional shape defined by a ring. It has both length and area and has at least three distinct points. A POLYGON may also have holes in its interior (each hole is defined by another, interior ring). Area within a hole is considered to be exterior to the POLYGON itself.
. MULTIPOLYGON: A MULTIPOLYGON instance is a collection of zero or more POLYGON instances.
. GEOMETRYCOLLECTION: A GEOMETRYCOLLECTION is a collection of zero or more GEOMETRY or GEOGRAPHY instances. A GEOMETRYCOLLECTION can be empty. This is similar to a list or an array in most programming languages. The GEOMETRYCOLLECTION is the most generic type of collection because its members can be of any instance type.

Representing Spatial Data

The Open Geospatial Consortium, Inc. (OGC) is a nonprofit, international, voluntary consensus standards organization that is leading the development of standards for geospatial and location-based services. The OGC defines different ways to represent geospatial information as bytes of data that can then be interpreted by the GEOMETRY or GEOGRAPHY types as being POINTs, LINESTRINGs, and so on. SQL Server 2008 supports three such formats:
. Well-Known Text (WKT)
. Well-Known Binary (WKB)
. Geography Markup Language (GML)

For the purposes of this chapter, we stick to WKT examples because they are both concise and reasonably readable. The syntax of WKT is not too difficult to understand, so let's look at some examples:
. POINT(10 100): Here, 10 and 100 represent the X and Y values of the point.
. POINT(10 100 10 1): This example shows Z and M values in addition to X and Y.
. LINESTRING(0 0, 10 100): The first two values represent the starting point, and the last two values represent the end point of the line.
. POLYGON((0 0, 0 10, 10 10, 10 0, 0 0)): Each pair of numbers represents a point on the edge of the polygon. Note that the end point is the same as the starting point.

Working with Geometry Data

As mentioned previously, the GEOMETRY data type is implemented as a common language runtime (CLR) data type in SQL Server and is used to represent data in a Euclidean (flat) coordinate system.
The GEOMETRY type is predefined and available in each database. Any variable, parameter, or table column can be declared with the GEOMETRY data type, and you can operate on geometry data in the same manner as you would with other CLR types, using the built-in methods to create, validate, and query geometry data.

NOTE
SQL Server provides a number of methods for the GEOMETRY and GEOGRAPHY data types. Covering all the available methods is beyond the scope of this chapter. The examples provided here touch on some of the more common methods. For more information on other GEOMETRY and GEOGRAPHY methods, refer to SQL Server 2008 Books Online.

To assign a value to a column or variable of type GEOMETRY, you must use one of the static methods to parse the representation of the data into the spatial data type. For example, to parse geometry data provided in valid WKT syntax, you can use the STGeomFromText method:

    DECLARE @geom GEOMETRY
    DECLARE @geom2 GEOMETRY
    SET @geom = geometry::STGeomFromText('LINESTRING (100 100, 20 180, 180 180)', 0)
    SET @geom2 = geometry::STGeomFromText('POLYGON ((0 0, 150 0, 150 150, 0 150, 0 0))', 0)

NOTE
The last parameter passed to the method is the spatial reference ID (SRID) parameter. The SRID is required. SQL Server 2008 does not perform calculations on pieces of spatial information that belong to separate spatial reference systems (for example, if one system uses centimeters and another uses miles, SQL Server simply does not have the means to automatically convert units). For the GEOMETRY type, the default SRID value is 0. The default SRID for GEOGRAPHY is 4326, which maps to the WGS 84 spatial reference system.
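The effect of the SRID rule in the preceding note can be seen with a short experiment. The following is a minimal sketch (the variable names are illustrative only): two points share SRID 0 and a third carries a different SRID, so a distance calculation across the two reference systems yields NULL rather than a number. The second query looks up the definition of SRID 4326 in the sys.spatial_reference_systems catalog view.

```sql
-- Three points; @c has the same coordinates as @b but a different SRID.
DECLARE @a GEOMETRY = geometry::STGeomFromText('POINT (0 0)', 0);
DECLARE @b GEOMETRY = geometry::STGeomFromText('POINT (3 4)', 0);
DECLARE @c GEOMETRY = geometry::STGeomFromText('POINT (3 4)', 4326);

SELECT @a.STSrid         AS SridOfA,        -- 0
       @a.STDistance(@b) AS SameSrid,       -- 5 (a 3-4-5 triangle)
       @a.STDistance(@c) AS DifferentSrid;  -- NULL because the SRIDs differ

-- Show what SRID 4326 stands for, including its unit of measure.
SELECT spatial_reference_id, unit_of_measure, well_known_text
FROM sys.spatial_reference_systems
WHERE spatial_reference_id = 4326;
```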
If you are declaring a LINESTRING specifically, you can use the STLineFromText static method, which accepts only valid LINESTRINGs as input:

    DECLARE @geom GEOMETRY
    SET @geom = geometry::STLineFromText('LINESTRING (100 100, 20 180, 180 180)', 0)

The GEOMETRY type, like other SQLCLR UDTs, supports implicit conversion to and from a string. The string format supported by the GEOMETRY type for implicit conversion is WKT. Because of this feature, all the following SET statements are functionally equivalent (the last two SET statements use an implicit SRID of 0):

    DECLARE @geom GEOMETRY
    SET @geom = geometry::STLineFromText('LINESTRING (100 100, 20 180, 180 180)', 0)
    SET @geom = geometry::Parse('LINESTRING (100 100, 20 180, 180 180)')
    SET @geom = 'LINESTRING (100 100, 20 180, 180 180)'

After defining a GEOMETRY instance, you can use the CLR UDT dot notation to access the properties and methods of the GEOMETRY instance. For example, the following code uses the STLength() method to return the length of the LINESTRING:

    DECLARE @geom GEOMETRY
    SET @geom = geometry::STLineFromText('LINESTRING (100 100, 20 180, 180 180)', 0)
    SELECT @geom.STLength() AS Length
    GO

    Length
    ----------------------
    273.137084989848

The following example uses the STIntersection() method to return the points where two GEOMETRY instances intersect:

    DECLARE @geom1 GEOMETRY;
    DECLARE @geom2 GEOMETRY;
    DECLARE @result GEOMETRY;
    SET @geom1 = geometry::STGeomFromText('LINESTRING (100 100, 20 180, 180 180)', 0)
    SET @geom2 = geometry::STGeomFromText('POLYGON ((0 0, 150 0, 150 150, 0 150, 0 0))', 0)
    SELECT @result = @geom1.STIntersection(@geom2);
    SELECT @result.STAsText();
    GO

    ----------------------------
    LINESTRING (50 150, 100 100)

All the preceding examples use local variables in a batch.
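Other instance methods follow the same dot-notation pattern. As a minimal sketch that reuses the square POLYGON from the examples above (the two POINT values are made up for illustration), the following batch calls STArea(), STContains(), and STDistance():

```sql
DECLARE @square  GEOMETRY = geometry::STGeomFromText('POLYGON ((0 0, 150 0, 150 150, 0 150, 0 0))', 0);
DECLARE @inside  GEOMETRY = geometry::STGeomFromText('POINT (10 10)', 0);
DECLARE @outside GEOMETRY = geometry::STGeomFromText('POINT (200 75)', 0);

SELECT @square.STArea()             AS Area,             -- 22500 (150 x 150)
       @square.STContains(@inside)  AS ContainsInside,   -- 1: the point lies inside the square
       @square.STContains(@outside) AS ContainsOutside,  -- 0: the point lies outside the square
       @square.STDistance(@outside) AS DistanceToEdge;   -- 50: from x = 200 to the edge at x = 150
```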
You also can declare columns in a table with the GEOMETRY type, and you can use the instance properties and methods against the columns as well:

    CREATE TABLE #geom_demo
    (
        GeomID INT IDENTITY NOT NULL,
        GeomCol GEOMETRY
    )
    INSERT INTO #geom_demo (GeomCol)
    VALUES ('LINESTRING (100 100, 20 180, 180 180)'),
           ('POLYGON ((0 0, 150 0, 150 150, 0 150, 0 0))'),
           ('POINT(10 10)')
    SELECT GeomID,
           GeomCol.ToString() AS WKT,
           GeomCol.STLength() AS LENGTH,
           GeomCol.STArea() AS Area
    FROM #geom_demo
    DROP TABLE #geom_demo
    GO

    GeomID      WKT                                          LENGTH            Area
    ----------- -------------------------------------------- ----------------- ------
    1           LINESTRING (100 100, 20 180, 180 180)        273.137084989848  0
    2           POLYGON ((0 0, 150 0, 150 150, 0 150, 0 0))  600               22500
    3           POINT (10 10)                                0                 0

Working with Geography Data

The GEOGRAPHY data type is also implemented as a .NET common language runtime data type in SQL Server. Unlike the GEOMETRY data type, in which locations are defined in terms of X and Y coordinates that can conceivably extend to infinity, the GEOGRAPHY type represents data in a round-earth coordinate system. Whereas flat models do not wrap around, the round-earth coordinate system does wrap around: if you start at a point on the globe and continue in one direction, you eventually return to the starting point.

Because defining points on a ball using X and Y is not very practical, the GEOGRAPHY data type instead defines points using angles. The SQL Server GEOGRAPHY data type stores ellipsoidal (round-earth) data as GPS latitude and longitude coordinates. Longitude represents the horizontal angle and ranges from -180 degrees to 180 degrees, and latitude represents the vertical angle and ranges from -90 degrees to 90 degrees.

The GEOGRAPHY data type provides built-in methods similar to those of the GEOMETRY data type that you can use to create, validate, and query geography instances.
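One detail worth seeing in code is the order of the two angles. The following is a minimal sketch, using a coordinate pair borrowed from the examples in this section: the geography::Point static method takes latitude first and then longitude, whereas WKT lists the longitude (X) value first and the latitude (Y) value second. The Lat and Long properties read the angles back.

```sql
-- geography::Point(latitude, longitude, SRID)
DECLARE @p1 GEOGRAPHY = geography::Point(47.656, -122.360, 4326);
-- WKT order is longitude, then latitude
DECLARE @p2 GEOGRAPHY = geography::STGeomFromText('POINT(-122.360 47.656)', 4326);

SELECT 'Point()' AS CreatedWith, @p1.Lat AS Latitude, @p1.Long AS Longitude
UNION ALL
SELECT 'WKT', @p2.Lat, @p2.Long;
```

Both rows should report a latitude of 47.656 and a longitude of -122.360, which confirms that the two variables describe the same location.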
To assign a value to a geography column or variable, you can use the STGeomFromText method to parse geography data provided in valid WKT syntax into a valid geography value:

    DECLARE @geog GEOGRAPHY
    DECLARE @geog2 GEOGRAPHY
    SET @geog = geography::STGeomFromText('LINESTRING(-122.360 47.656, -122.343 47.656)', 4326)
    SET @geog2 = geography::STGeomFromText('POLYGON((-122.358 47.653, -122.348 47.649, -122.348 47.658, -122.358 47.658, -122.358 47.653))', 4326)

As with the GEOMETRY data type, you can also use the STLineFromText static method, which accepts only valid LINESTRINGs as input, or you can take advantage of the support for implicit conversion of WKT strings:

    DECLARE @geog GEOGRAPHY
    SET @geog = geography::STLineFromText('LINESTRING (-122.360 47.656, -122.343 47.656)', 4326)
    SET @geog = geography::Parse('LINESTRING (-122.360 47.656, -122.343 47.656)')
    SET @geog = 'LINESTRING (-122.360 47.656, -122.343 47.656)'

The STLength() and STArea() methods are called the same way as for GEOMETRY; for GEOGRAPHY instances, they report results in the unit of measure of the SRID (meters and square meters for SRID 4326). The following code, which repeats the earlier GEOMETRY example, uses the STLength() method to return the length of a LINESTRING:

    DECLARE @geom GEOMETRY
    SET @geom = geometry::STLineFromText('LINESTRING (100 100, 20 180, 180 180)', 0)
    SELECT @geom.STLength() AS Length
    GO

    Length
    ----------------------
    273.137084989848

The preceding examples use local variables in a batch.
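For a measurement that runs against GEOGRAPHY instances rather than a GEOMETRY variable, the following is a minimal sketch that applies STLength() and STArea() to the LINESTRING and POLYGON values used earlier in this section. The column aliases are illustrative, and no output is shown here because the exact figures depend on the ellipsoid calculation.

```sql
DECLARE @line GEOGRAPHY = geography::STGeomFromText(
    'LINESTRING(-122.360 47.656, -122.343 47.656)', 4326);
DECLARE @poly GEOGRAPHY = geography::STGeomFromText(
    'POLYGON((-122.358 47.653, -122.348 47.649, -122.348 47.658, -122.358 47.658, -122.358 47.653))', 4326);

-- With SRID 4326, lengths are reported in meters and areas in square meters.
SELECT @line.STLength() AS LineLengthMeters,
       @poly.STLength() AS PerimeterMeters,
       @poly.STArea()   AS AreaSquareMeters;
```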
You also can declare columns in a table using the GEOGRAPHY data type, and you can use the instance properties and methods against the columns as well:

    CREATE TABLE #geog
    (
        id INT IDENTITY (1,1),
        GeogCol1 GEOGRAPHY,
        GeogCol2 AS GeogCol1.STAsText()
    );
    GO
    INSERT INTO #geog (GeogCol1)
    VALUES (geography::STGeomFromText('LINESTRING(-122.360 47.656, -122.343 47.656)', 4326));
    INSERT INTO #geog (GeogCol1)
    VALUES (geography::STGeomFromText('POLYGON((-122.358 47.653, -122.348 47.649, -122.348 47.658, -122.358 47.658, -122.358 47.653))', 4326));
    GO
    DECLARE @geog1 GEOGRAPHY;
    DECLARE @geog2 GEOGRAPHY;
    DECLARE @result GEOGRAPHY;
    SELECT @geog1 = GeogCol1 FROM #geog WHERE id = 1;
    SELECT @geog2 = GeogCol1 FROM #geog WHERE id = 2;
    SELECT @result = @geog1.STIntersection(@geog2);
    SELECT Intersection = @result.STAsText();
    GO

    Intersection
    ------------------------------------------------------------------------------------------
    LINESTRING (-122.3479999999668 47.656000260658459, -122.35799999998773 47.656000130309728)

Spatial Data Support in SSMS

When querying spatial data in SSMS, you'll find that SSMS has a built-in capability to plot and display some basic maps of your spatial data. To demonstrate this, you can run the following query in the AdventureWorks2008R2 or AdventureWorks2008 database in SSMS:

    SELECT SpatialLocation
    FROM Person.Address a
    INNER JOIN Person.StateProvince sp
        ON a.StateProvinceID = sp.StateProvinceID
        AND sp.CountryRegionCode = 'US'

After the query runs, you should see a Spatial Results tab next to the Results tab (see Figure 42.3). Click on this tab, and the location points are plotted on a map. Select the Bonne projection. If you look closely, you can see that the geographical points plotted roughly provide an outline of the United States.
If you mouse over one of the points, SSMS displays the associated address information stored in the Person.Address table.

FIGURE 42.3 Displaying a map of Person.Address records in SSMS.

In addition to displaying maps of geography data values, SSMS can also display geometry data, showing lines, points, and polygons in an X-Y grid. For example, if you run the following query and click on the Spatial Results tab, it should display a box like the one shown in Figure 42.4:

    DECLARE @smallBox GEOMETRY = 'polygon((0 0, 0 2, 2 2, 2 0, 0 0))';
    SELECT @smallBox

FIGURE 42.4 Displaying a polygon in SSMS.

If you want to display multiple polygons, points, or lines together at the same time, they have to be returned as multiple rows in a single column. If you return them as multiple columns, SSMS displays only one column at a time in the Spatial Results tab. For example, if you run the following query, SSMS displays two boxes, the polygon defined by the intersection of the two boxes, and the overlapping line defined by the LINESTRING, as shown in Figure 42.5:

    DECLARE @smallBox GEOMETRY = 'polygon((0 0, 0 2, 2 2, 2 0, 0 0))';
    DECLARE @largeBox GEOMETRY = 'polygon((1 1, 1 4, 4 4, 4 1, 1 1))';
    DECLARE @line GEOMETRY = 'linestring(0 2, 4 4)';
    SELECT @smallBox
    UNION ALL
    SELECT @largeBox
    UNION ALL
    SELECT @smallBox.STIntersection(@largeBox)
    UNION ALL
    SELECT @line

FIGURE 42.5 Displaying intersecting polygons and an overlapping line in SSMS.

Spatial Data Types: Where to Go from Here?

The preceding sections provide only a brief introduction to spatial data types and how to work with geometry and geography data.
For more information on working with spatial data, in addition to Books Online, you might want to visit the Microsoft SQL Server 2008 Spatial Data page at http://www.microsoft.com/sqlserver/2008/en/us/spatial-data.aspx. This page provides links to whitepapers and other technical documents related to working with spatial data in SQL Server 2008.

In addition, all examples here deal with spatial data only as data values and coordinates. Spatial data is often most useful when it can be displayed visually, such as on a map. SQL Server 2008 R2 Reporting Services provides new map controls and a map wizard for creating map reports based on spatial data. For more information, see Chapter 53, "SQL Server 2008 Reporting Services."

Change Data Capture

In SQL Server 2008, Microsoft introduced a new feature called Change Data Capture (CDC), which is designed to make it much easier and less resource intensive to identify and retrieve changed data from tables in an online transaction processing (OLTP) database. In a nutshell, CDC captures and records INSERT, UPDATE, and DELETE activity in an OLTP database and stores it in a form that is easily consumed by an application, such as a SQL Server Integration Services (SSIS) package.

In the past, capturing data changes for your tables for auditing or extract, transform, and load (ETL) purposes required using replication, timestamp columns, triggers, complex queries, or expensive third-party tools. None of these other methods is easy to implement, and many of them use a lot of server resources, negatively affecting the performance of the OLTP server. Change Data Capture provides a more efficient mechanism for capturing the data changes in a table.

NOTE
Change Data Capture is available only in the SQL Server 2008 Developer, Enterprise, and Datacenter Editions.

The source of change data for Change Data Capture is the SQL Server transaction log.
As inserts, updates, and deletes are applied to tables, entries that describe those changes are added to the transaction log. When Change Data Capture is first enabled for a table in a database, a SQL Server Agent capture job is created to invoke the sp_replcmds system procedure. This procedure is an internal server function and is the same mechanism used by transactional replication to harvest changes from the transaction log.

NOTE
If replication is already enabled for the database, the transactional log reader used for replication is also used for CDC. This strategy significantly reduces log contention when both replication and Change Data Capture are enabled for the same database.

The principal task of the Change Data Capture process is to scan the log and identify changes to data rows in any tables configured for Change Data Capture. As these changes are identified, the process writes column data and transaction-related information to the Change Data Capture tables. The changes can then be read from these change tables and applied as needed.

The Change Data Capture Tables

When CDC is enabled for a database and one or more tables, an associated Change Data Capture table is created for each table being monitored. The Change Data Capture tables are used to store the changes made to the data in the corresponding source tables, along with some metadata used to track the changes. By default, the name of the CDC change table is schemaname_tablename_CT and is based on the name of the source table.

The first five columns of a Change Data Capture change table are metadata columns and contain additional information relevant to the recorded change:
. __$start_lsn: Identifies the commit log sequence number (LSN) assigned to the change. This value can be used to determine the order of the transactions.
. __$end_lsn: Is currently not used and in SQL Server 2008 is always NULL.
. __$seqval: Can be used to order changes that occur within the same transaction.
. __$operation: Records the operation associated with the change: 1 = delete, 2 = insert, 3 = update before image (delete), and 4 = update after image (insert).
. __$update_mask: Is a variable bit mask with one defined bit for each captured column to identify which columns were changed. For insert and delete entries, the update mask always has all bits set. Update rows have the bits set only for the columns that were modified.

The remaining columns in the Change Data Capture change table are identical to the columns from the source table in name and type and are used to store the column data gathered from the source table when an insert, update, or delete operation is performed on the table.

For every row inserted into the source table, a single row is inserted into the change table, and this row contains the column values inserted into the source table. Every row deleted from the source table is also inserted as a single row into the change table but contains the column values in the row before the delete operation. An update operation is captured as a delete followed by an insert, so two rows are captured for each update: one row entry to capture the column values before the update, and a second row entry to capture the column values after the update.

In addition to the Change Data Capture tables, the following Change Data Capture metadata tables are also created:
. cdc.change_tables: Contains one row for each change table in the database; a row is created when Change Data Capture is enabled on a source table.
. cdc.index_columns: Contains one row for each index column used by Change Data Capture to uniquely identify rows in the source table. By default, this is the column of the primary key of the source table, but a different unique index on the source table can be specified when Change Data Capture is enabled on the source table. A primary key or unique index is required on the source table only if support for net changes is enabled.
. cdc.captured_columns: Contains one row for each column tracked in each source table. By default, all columns of the source table are captured, but you can include or exclude columns when enabling Change Data Capture for a table by specifying a column list.
. cdc.ddl_history: Contains a row for each Data Definition Language (DDL) change made to any table enabled for Change Data Capture. You can use this table to determine when a DDL change occurred on a source table and what the change was.
. cdc.lsn_time_mapping: Contains a row for each transaction stored in a change table and is used to map between log sequence number (LSN) commit values and the actual time the transaction was committed.

Although you can query the Change Data Capture tables directly, it is not recommended. Instead, you should use the Change Data Capture functions, which are discussed later.

All these objects associated with a CDC instance are created in the special schema called cdc when Change Data Capture is enabled for a database.

Enabling CDC for a Database

Before you can begin capturing data changes for a table, you must first enable the database for Change Data Capture. You do this by running the stored procedure sys.sp_cdc_enable_db within the desired database context. When a database is enabled for Change Data Capture, the cdc schema, cdc user, and metadata tables are created, as well as the system functions used to query for change data.

NOTE
To determine whether a database is already enabled for CDC, you can check the value in the is_cdc_enabled column in the sys.databases catalog view. A value of 1 indicates that CDC is enabled for the specified database.
The following SQL code enables CDC for the AdventureWorks2008R2 database and then checks that CDC is enabled by querying the sys.databases catalog view:

    USE AdventureWorks2008R2
    GO
    EXEC sys.sp_cdc_enable_db
    GO
    SELECT is_cdc_enabled
    FROM sys.databases
    WHERE name = 'AdventureWorks2008R2'
    GO

    is_cdc_enabled
    --------------
    1

NOTE
Although the examples presented here are run against the AdventureWorks2008R2 database, they can also be run against the AdventureWorks2008 database. However, you should be aware that some of the column values displayed may not be exactly the same.

Enabling CDC for a Table

When the database is enabled for Change Data Capture, you can use the sys.sp_cdc_enable_table stored procedure to enable a Change Data Capture instance for any table in that database. The sys.sp_cdc_enable_table stored procedure supports the following parameters:
. @source_schema: Specifies the name of the schema in which the source table resides.
. @source_name: Specifies the name of the source table.
. @role_name: Indicates the name of the database role used to control access to Change Data Capture tables. If this parameter is set to NULL, no role is used to limit access to the change data. If the specified role does not exist, SQL Server creates a database role with the specified name.
. @capture_instance: Specifies the name of the capture instance used to name the instance-specific Change Data Capture objects. By default, this is the source schema name plus the source table name in the format schemaname_sourcename. A source table can have a maximum of two capture instances.
. @supports_net_changes: Is set to 1 or 0 to indicate whether support for querying for net changes is to be enabled for this capture instance. If this parameter is set to 1, the source table must have a defined primary key, or an alternate unique index must be specified for the @index_name parameter.
. @index_name: Specifies the name of a unique index to use to uniquely identify rows in the source table.
. @captured_column_list: Specifies the source table columns to be included in the change table. By default, all columns are included in the change table.
. @filegroup_name: Specifies the filegroup to be used for the change table created for the capture instance. If this parameter is NULL or not specified, the default filegroup is used. If possible, it is recommended that you place the Change Data Capture change tables in a filegroup separate from the one used by your source tables.
. @allow_partition_switch: Indicates whether the SWITCH PARTITION command of ALTER TABLE can be executed against a table that is enabled for Change Data Capture. The default is 1 (enabled). If any partition switches occur, Change Data Capture does not track the changes resulting from the switch. This causes data inconsistencies when the change data is consumed.

The @source_schema, @source_name, and @role_name parameters are the only required parameters. All the others are optional and apply default values if not specified.

To implement basic change data capture for a table, let's first create a copy of the Customer table to play around with:

    SELECT * INTO MyCustomer FROM Sales.Customer
    ALTER TABLE MyCustomer ADD PRIMARY KEY (CustomerID)

Now, to enable CDC on the MyCustomer table, you can execute the following:

    EXEC sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name = N'MyCustomer',
        @role_name = NULL

NOTE
If this is the first time you are enabling CDC for a table in the database, you may see the following messages, which indicate that SQL Server is enabling the SQL Agent jobs to begin capturing the data changes in the database:

    Job 'cdc.AdventureWorks2008R2_capture' started successfully.
    Job 'cdc.AdventureWorks2008R2_cleanup' started successfully.
The Capture job that is created generally runs continuously and is used to move changed data to the CDC tables from the transaction log. The Cleanup job runs on a scheduled basis to remove older data from the CDC tables so that they don't grow too large. By default, it automatically removes data that is more than three days old. The properties of these jobs can be viewed and modified using the sys.sp_cdc_help_jobs and sys.sp_cdc_change_job procedures, respectively.

To determine whether or not a source table has been enabled for Change Data Capture, you can query the is_tracked_by_cdc column in the sys.tables catalog view for that table:

    select is_tracked_by_cdc
        from sys.tables
        where name = 'MyCustomer'
    go

    is_tracked_by_cdc
    -----------------
    1

TIP
To get information on which tables are configured for CDC and what the settings for each are, you can execute the sys.sp_cdc_help_change_data_capture stored procedure. It reports the name and ID of the source and change tables, the CDC table properties, the columns included in the capture, and the date the CDC was enabled/created for the source table.

Querying the CDC Tables
After you enable change data tracking for a table, SQL Server begins capturing any data changes for the table in the Change Data Capture tables. To identify the data changes, you need to query the Change Data Capture tables. Although you can query the Change Data Capture tables directly, it is recommended that you use the CDC functions instead. The main CDC table-valued functions (TVFs) are

. cdc.fn_cdc_get_all_changes_<capture_instance>
. cdc.fn_cdc_get_net_changes_<capture_instance>

NOTE
The Change Data Capture change table and the associated CDC table-valued functions created along with it constitute what is referred to as a capture instance. A capture instance is created for every source table that is enabled for CDC.
Each capture instance is given a unique name based on the schema and table names. For example, if the table named sales.products is CDC enabled, the capture instance created is named sales_products. The name of the CDC change table within the capture instance is sales_products_CT, and the names of the two associated CDC query functions are cdc.fn_cdc_get_all_changes_sales_products and cdc.fn_cdc_get_net_changes_sales_products.

Both of the CDC table-valued functions require two parameters to define the range of log sequence numbers (LSNs) to use as the lower and upper bounds to determine which records are to be included in the returned result set. A third required parameter, row_filter_option, specifies the content of the metadata columns as well as the rows to be returned in the result set.

Two values can be specified for row_filter_option for the cdc.fn_cdc_get_all_changes_<capture_instance> function: 'all' and 'all update old'.

If 'all' is specified, the function returns all changes within the specified LSN range. For changes due to an update operation, only the row containing the new values after the update is returned.

If 'all update old' is specified, the function returns all changes within the specified LSN range. For changes due to an update operation, this option returns both the before and after update copies of the row.

For the cdc.fn_cdc_get_net_changes_<capture_instance> function, three values can be specified for row_filter_option: 'all', 'all with mask', and 'all with merge'.

If 'all' is specified, the function returns the LSN of the final change to the row and the operation needed to apply the change to the row in the __$start_lsn and __$operation metadata columns. The __$update_mask column is always NULL.

If 'all with mask' is specified, the function returns the LSN of the final change to the row and the operation needed to apply the change to the row.
In addition, if __$operation equals 4 (that is, the row contains the after update values), the columns actually modified in the update are identified by the bit mask returned in the __$update_mask column.

If the 'all with merge' option is passed, the function returns the LSN of the final change to the row and the operation needed to apply the change to the row. The __$operation column will have one of two values: 1 for delete and 5 to indicate that the operation needed to apply the change is either an insert or an update. The __$update_mask column is always NULL.

So how do you determine what LSNs to specify to return the rows you need? Fortunately, SQL Server provides several functions to help determine the appropriate LSN values for use in querying the TVFs:

. sys.fn_cdc_get_min_lsn: Returns the smallest LSN associated with a capture instance validity interval. The validity interval is the time interval for which change data is currently available for its capture instances.
. sys.fn_cdc_get_max_lsn: Returns the largest LSN in the validity interval.
. sys.fn_cdc_map_time_to_lsn and sys.fn_cdc_map_lsn_to_time: Are used to correlate LSN values with a standard time value.
. sys.fn_cdc_increment_lsn and sys.fn_cdc_decrement_lsn: Can be used to make an incremental adjustment to an LSN value. This adjustment is sometimes necessary to ensure that changes are not duplicated in consecutive query windows.

Before you can start querying the CDC tables, you need to generate some records in them by running some data modifications against the source tables. First, you need to run the statements in Listing 42.21 against the MyCustomer table to generate some records in the dbo_MyCustomer_CT Change Data Capture change table.
LISTING 42.21 Some Data Modifications to Populate the MyCustomer CDC Capture Table

    delete MyCustomer where CustomerID = 22

    Insert MyCustomer (PersonID, StoreID, TerritoryID,
                       AccountNumber, rowguid, ModifiedDate)
        Values (20778, null, 9,
                'AW' + RIGHT('00000000'
                    + convert(varchar(8), IDENT_CURRENT('MyCustomer')), 8),
                NEWID(), GETDATE())

    declare @ident int
    select @ident = SCOPE_IDENTITY()

    update MyCustomer
        set TerritoryID = 3,
            ModifiedDate = GETDATE()
        where CustomerID = @ident

Now that you have some rows in the CDC capture table, you can start retrieving them. First, you need to identify the min and max LSN values to pass to the cdc.fn_cdc_get_all_changes_dbo_MyCustomer function. This can be done using the sys.fn_cdc_get_min_lsn and sys.fn_cdc_get_max_lsn functions. Listing 42.22 puts all these pieces together to return the records stored in the CDC capture table.

LISTING 42.22 Querying the MyCustomer CDC Capture Table

    USE AdventureWorks2008R2
    GO
    --declare variables to represent beginning and ending lsn
    DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10)
    -- get the first LSN for table changes
    SELECT @from_lsn = sys.fn_cdc_get_min_lsn('dbo_MyCustomer')
    -- get the last LSN for table changes
    SELECT @to_lsn = sys.fn_cdc_get_max_lsn()
    -- get all changes in the range using 'all update old' parameter
    SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_MyCustomer
        (@from_lsn, @to_lsn, 'all update old');
    GO

    __$start_lsn           __$seqval              __$operation __$update_mask CustomerID PersonID StoreID TerritoryID AccountNumber rowguid                              ModifiedDate
    ---------------------- ---------------------- ------------ -------------- ---------- -------- ------- ----------- ------------- ------------------------------------ -----------------------
    0x00000039000014400004 0x00000039000014400002 1            0x7F           22         NULL     494     3           AW00000022    9774AED6-D673-412D-B481-2573E470B478 2008-10-13 11:15:07.263
    0x00000039000014410004 0x00000039000014410003 2            0x7F           30119      20778    NULL    9           AW00030119    2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:44.267
    0x000000390000144C0004 0x000000390000144C0002 3            0x48           30119      20778    NULL    9           AW00030119    2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:44.267
    0x000000390000144C0004 0x000000390000144C0002 4            0x48           30119      20778    NULL    3           AW00030119    2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:48.263

Because the option 'all update old' is specified in Listing 42.22, all the rows in the dbo_MyCustomer_CT capture table are returned, including the deleted row, the inserted row, and both the before and after copies of the updated row.

If you want to return only the final version of each row within the LSN range (and @supports_net_changes was set to 1 when CDC was enabled for the table), you can use the cdc.fn_cdc_get_net_changes_<capture_instance> function, as shown in Listing 42.23.

LISTING 42.23 Querying the MyCustomer CDC Capture Table for Net Changes

    USE AdventureWorks2008R2
    GO
    --declare variables to represent beginning and ending lsn
    DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10)
    -- get the first LSN for table changes
    SELECT @from_lsn = sys.fn_cdc_get_min_lsn('dbo_MyCustomer')
    -- get the last LSN for table changes
    SELECT @to_lsn = sys.fn_cdc_get_max_lsn()
    -- get net changes in the range using 'all with merge' parameter
    SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_MyCustomer
        (@from_lsn, @to_lsn, 'all with merge');
    GO

    __$start_lsn           __$operation __$update_mask CustomerID PersonID StoreID TerritoryID AccountNumber rowguid                              ModifiedDate
    ---------------------- ------------ -------------- ---------- -------- ------- ----------- ------------- ------------------------------------ -----------------------
    0x00000039000014400004 1            NULL           22         NULL     494     3           AW00000022    9774AED6-D673-412D-B481-2573E470B478 2008-10-13 11:15:07.263
    0x000000390000144C0004 5            NULL           30119      20778    NULL    3           AW00030119    2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:48.263

For typical ETL-type applications, querying for change data is an ongoing process: the application makes periodic requests for all the changes that occurred since the last request so that they can be applied to the target. For these types of queries, you can use the sys.fn_cdc_increment_lsn function to determine the next lowest LSN boundary that is greater than the max LSN boundary of the previous query. To demonstrate this, let's first execute some additional data modifications against the MyCustomer table:

    Insert MyCustomer (PersonID, StoreID, TerritoryID,
                       AccountNumber, rowguid, ModifiedDate)
        Values (20779, null, 12,
                'AW' + RIGHT('00000000'
                    + convert(varchar(8), IDENT_CURRENT('MyCustomer')), 8),
                NEWID(), GETDATE())

    delete MyCustomer where CustomerID = 30119

The max LSN from the previous examples is 0x000000390000144C0004. We want to increment from this LSN to find the next set of changes. In Listing 42.24, you pass this value to the sys.fn_cdc_increment_lsn function to set the min LSN value you'll use with the cdc.fn_cdc_get_net_changes_dbo_MyCustomer function as the lower bound.
LISTING 42.24 Using sys.fn_cdc_increment_lsn to Return the Net Changes to the MyCustomer CDC Capture Table Since the Last Retrieval

    --declare variables to represent beginning and ending lsn
    DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10)
    -- get the next lowest LSN after the previous max LSN
    SELECT @from_lsn = sys.fn_cdc_increment_lsn(0x000000390000144C0004)
    -- get the last LSN for table changes
    SELECT @to_lsn = sys.fn_cdc_get_max_lsn()
    -- get net changes in the range using 'all with merge' parameter
    SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_MyCustomer
        (@from_lsn, @to_lsn, 'all with merge');
    GO

    __$start_lsn           __$operation __$update_mask CustomerID PersonID StoreID TerritoryID AccountNumber rowguid                              ModifiedDate
    ---------------------- ------------ -------------- ---------- -------- ------- ----------- ------------- ------------------------------------ -----------------------
    0x00000039000017D30004 5            NULL           30120      20779    NULL    12          AW00030120    CE8BBAA1-04C0-4A81-9A7E-85B4EDB5C36D 2010-04-27 23:52:36.477
    0x00000039000017E50004 1            NULL           30119      20778    NULL    3           AW00030119    2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:48.263

If you want to retrieve the changes captured during a specific time period, you can use the sys.fn_cdc_map_time_to_lsn function, as shown in Listing 42.25.
LISTING 42.25 Retrieving All Changes to MyCustomer During a Specific Time Period

    DECLARE @begin_time datetime,
            @end_time datetime,
            @begin_lsn binary(10),
            @end_lsn binary(10);
    SET @begin_time = '2010-04-27 22:38:48.250'
    SET @end_time = '2010-04-27 23:52:36.500'
    SELECT @begin_lsn = sys.fn_cdc_map_time_to_lsn
        ('smallest greater than', @begin_time);
    SELECT @end_lsn = sys.fn_cdc_map_time_to_lsn
        ('largest less than or equal', @end_time);
    SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_MyCustomer
        (@begin_lsn, @end_lsn, 'all');
    GO

    __$start_lsn           __$operation __$update_mask CustomerID PersonID StoreID TerritoryID AccountNumber rowguid                              ModifiedDate
    ---------------------- ------------ -------------- ---------- -------- ------- ----------- ------------- ------------------------------------ -----------------------
    0x000000390000144C0004 4            NULL           30119      20778    NULL    3           AW00030119    2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:48.263
    0x00000039000017D30004 2            NULL           30120      20779    NULL    12          AW00030120    CE8BBAA1-04C0-4A81-9A7E-85B4EDB5C36D 2010-04-27 23:52:36.477

CDC and DDL Changes to Source Tables
One of the common challenges when capturing data changes from your source tables is how to handle DDL changes to the source tables. This can be an issue if the downstream consumer of the changes has not applied the same DDL changes to its destination tables.

Enabling Change Data Capture on a source table in SQL Server 2008 does not prevent DDL changes from occurring. However, Change Data Capture does help to mitigate the effect on the downstream consumers by allowing the result sets returned from the CDC capture tables to remain unchanged even as the column structure of the underlying source table changes.
Essentially, the capture process responsible for populating the change table ignores any new columns not present when the source table was enabled for Change Data Capture. If a tracked column is dropped, NULL values are supplied for the column in the subsequent change entries.

However, if the data type of a tracked column is modified, the data type change is also propagated to the change table to ensure that the capture mechanism does not introduce data loss in tracked columns as a result of mismatched data types. When a column is modified, the capture process posts any detected changes to the cdc.ddl_history table. Downstream consumers of the change data from the source tables that may need to be alerted to the column changes (and make similar adjustments to the destination tables) can use the stored procedure sys.sp_cdc_get_ddl_history to identify any modifications to the source table columns.

So how do you modify the capture instance to recognize any added or dropped columns in the source table? Unfortunately, the only way to do this is to disable CDC on the table and re-enable it. However, in an active source environment where it's not possible to suspend processing while CDC is being disabled and re-enabled, there is the possibility of data loss between when CDC is disabled and re-enabled.

Fortunately, CDC allows two capture instances to be associated with a single source table. This makes it possible to create a second capture instance for the table that reflects the new column structure. The capture process then captures changes to the same source table into two distinct change tables having two different column structures. While the original change table continues to feed current operational programs, the new change table feeds environments that have been modified to incorporate the new column data.
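The DDL history lookup and the second capture instance just described could be sketched as follows. This is an illustration rather than a listing from the chapter: the capture instance name dbo_MyCustomer_v2 is a hypothetical name chosen here, and the sketch assumes MyCustomer currently has only the default dbo_MyCustomer capture instance.

```sql
USE AdventureWorks2008R2
GO
-- Review the DDL changes recorded for the existing capture instance.
EXEC sys.sp_cdc_get_ddl_history @capture_instance = N'dbo_MyCustomer';
GO

-- Create a second capture instance that reflects the table's current columns.
-- N'dbo_MyCustomer_v2' is a hypothetical name used only for this sketch.
EXEC sys.sp_cdc_enable_table
    @source_schema    = N'dbo',
    @source_name      = N'MyCustomer',
    @role_name        = NULL,
    @capture_instance = N'dbo_MyCustomer_v2';
GO

-- Only after every consumer has moved to the new change table would the
-- original capture instance be removed (left commented out on purpose):
-- EXEC sys.sp_cdc_disable_table
--     @source_schema    = N'dbo',
--     @source_name      = N'MyCustomer',
--     @capture_instance = N'dbo_MyCustomer';
```

Under these assumptions, consumers that need the new column structure would query the functions generated for the second instance (for example, cdc.fn_cdc_get_all_changes_dbo_MyCustomer_v2), while existing consumers keep using the original ones.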
Allowing the capture mechanism to populate both change tables in tandem provides a mechanism for smoothly transitioning from one table structure to the other without any loss of change data. When the transition to the new table structure has been fully effected, the obsolete capture instance can be removed.

Change Tracking
In addition to Change Data Capture, SQL Server 2008 also introduces Change Tracking. Change Tracking is a lightweight solution that provides an efficient change tracking mechanism for applications. Although they are similar in name, the purposes of Change Tracking and Change Data Capture are different.

Change Data Capture is an asynchronous mechanism that uses the transaction log to record all the changes to a data row and store them in change tables. All intermediate versions of a row are available in the change tables. The information captured is stored in a relational format that can be queried by client applications such as ETL processes.

Change Tracking, in contrast, is a synchronous mechanism that tracks modifications to a table but stores only the fact that a row has been modified and when. It does not keep track of how many times the row has changed or the values of any of the intermediate changes. However, because the mechanism records that a row has changed, you can check to see whether data has changed and obtain the latest version of the row directly from the table itself rather than querying a change capture table.

NOTE
Unlike Change Data Capture, which is available only in the Enterprise, Datacenter, and Developer Editions of SQL Server, Change Tracking is available in all editions.

Change Tracking operates by using tracking tables that store a primary key and version number for each row in a table that has been enabled for Change Tracking.
Applications can then check to see whether a row has changed by looking up the row in the tracking table by its primary key and seeing whether the version number is different from when the row was first retrieved.

One of the common uses of Change Tracking is for applications that have to synchronize data with SQL Server. Change Tracking can be used as a foundation for both one-way and two-way synchronization applications.

One-way synchronization applications, such as a client or mid-tier caching application, can be built to use Change Tracking. The caching application, which requires data from a SQL Server database to be cached in other data stores, can use Change Tracking to determine when changes have been made to the database tables and refresh the cache store by retrieving data from the modified rows only to keep the cache up-to-date.

Two-way synchronization applications can also be built to use Change Tracking. A typical example of a two-way synchronization application is the occasionally connected application, for example, a sales application that runs on a laptop and is disconnected from the central SQL Server database while the salesperson is out in the field. Initially, the client application queries and updates its local data store from the SQL Server database. When it reconnects with the database later, the application synchronizes with the database, and data changes flow from the laptop to the database and from the database to the laptop.

Because data changes happen in both locations while the client application is disconnected, the two-way synchronization application must be able to detect conflicts. A conflict occurs if the same data is changed in both data stores in the time between synchronizations.
The client application can use Change Tracking to detect conflicts by identifying rows whose version number has changed since the last synchronization. The application can implement a mechanism to resolve the conflicts so that the data changes are not lost.

Implementing Change Tracking
To use Change Tracking, you must first enable it for the database and then enable it at the table level for any tables for which you want to track changes. Change Tracking can be enabled via T-SQL statements or through SQL Server Management Studio (SSMS).

To enable Change Tracking for a database in SSMS, right-click on the database in Object Explorer to bring up the Properties dialog and select the Change Tracking page. To enable Change Tracking, set the Change Tracking option to True (see Figure 42.6). Also on this page, you can configure the retention period for how long SQL Server retains the Change Tracking information for each data row and whether to automatically clean up the Change Tracking information when the retention period has been exceeded.

FIGURE 42.6 Enabling Change Tracking for a database.

Change Tracking can also be enabled with the ALTER DATABASE command:

    ALTER DATABASE AdventureWorks2008R2
        SET CHANGE_TRACKING = ON
        (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)

After enabling Change Tracking at the database level, you can then enable Change Tracking for the tables for which you want to track changes. To enable Change Tracking for a table in SSMS, right-click on the table in Object Explorer to bring up the Properties dialog and select the Change Tracking page. Set the Change Tracking option to True to enable Change Tracking (see Figure 42.7). The TRACK_COLUMNS_UPDATED option specifies whether SQL Server should store in the internal Change Tracking table any extra information about which specific columns were updated.
Column tracking allows an application to synchronize only when specific columns are updated. This capability can improve the efficiency and performance of the synchronization process, but at the cost of additional storage overhead. This option is set to OFF by default.

FIGURE 42.7 Enabling Change Tracking for a table.

Change Tracking can also be enabled via T-SQL with the ALTER TABLE command:

    USE [AdventureWorks2008R2]
    GO
    ALTER TABLE [dbo].[MyCustomer]
        ENABLE CHANGE_TRACKING
        WITH (TRACK_COLUMNS_UPDATED = ON)

TIP
To determine which tables and databases have Change Tracking enabled, you can use the sys.change_tracking_databases and sys.change_tracking_tables catalog views.

Identifying Tracked Changes
After Change Tracking is enabled for a table, any data modification statements that affect rows in the table cause Change Tracking information for each modified row to be recorded. To query for the rows that have changed and to obtain information about the changes, you can use the built-in Change Tracking functions.

Unless you enabled the TRACK_COLUMNS_UPDATED option, only the values of the primary key column are recorded with the change information to allow you to identify the rows that have been changed. To identify the changed rows, use the CHANGETABLE(CHANGES ...) Change Tracking function. The CHANGETABLE(CHANGES ...) function takes two parameters: the first is the table name, and the second is the last synchronization version number.

If you pass 0 for the last synchronization version parameter, you get a list of all the rows that have been modified since version 0, which means all the changes to the table since first enabling Change Tracking. Typically, however, you do not want all the rows that have changed from the beginning of Change Tracking, but only those rows that have changed since the last time you retrieved the changed rows.
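The shape of the two-parameter call described above can be sketched as follows. This is an illustrative query, not a listing from the chapter; the variable name @last_sync is an assumption, and the version columns are included only to make the effect of the second parameter visible.

```sql
USE AdventureWorks2008R2
GO
-- Sketch only: @last_sync is a hypothetical variable name.
DECLARE @last_sync bigint;
SET @last_sync = 0;   -- 0 = every change since Change Tracking was enabled

SELECT CT.CustomerID,
       CT.SYS_CHANGE_VERSION,           -- version of the latest change to the row
       CT.SYS_CHANGE_CREATION_VERSION,  -- version of the insert, when one was tracked
       CT.SYS_CHANGE_OPERATION          -- I, U, or D
FROM CHANGETABLE(CHANGES dbo.MyCustomer, @last_sync) AS CT
ORDER BY CT.SYS_CHANGE_VERSION;
GO
```

Raising @last_sync to a version saved during an earlier synchronization narrows the result to rows changed after that version.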
Rather than having to keep track of the version numbers, you can use the CHANGE_TRACKING_CURRENT_VERSION() function to obtain the current version that will be used the next time you query for changes. The version returned represents the version of the last committed transaction.

Before an application can obtain changes for the first time, the application must first execute a query to obtain the initial data from the table and a query to retrieve the initial synchronization version using the CHANGE_TRACKING_CURRENT_VERSION() function. The version number that is retrieved is passed to the CHANGETABLE(CHANGES ...) function the next time it is invoked.

The following example illustrates how to obtain the initial synchronization version and initial data set:

    USE AdventureWorks2008R2
    Go
    declare @synchronization_version bigint
    -- Obtain the initial synchronization version.
    set @synchronization_version = CHANGE_TRACKING_CURRENT_VERSION();
    select change_tracking_version = @synchronization_version

    -- Obtain initial data set.
    select CustomerID, TerritoryID
        from MyCustomer
        where CustomerID <= 5
    go

    change_tracking_version
    -----------------------
    0

    CustomerID  TerritoryID
    ----------- -----------
    1           1
    2           1
    3           4
    4           4
    5           4

As you can see, because no updates have been performed since Change Tracking was enabled, the initial version is 0.

Now let's perform some updates on these rows to effect some changes:

    update MyCustomer
        set TerritoryID = 5
        where CustomerID = 4

    update MyCustomer
        set TerritoryID = 4
        where CustomerID = 5

Now you can use the CHANGETABLE(CHANGES ...) function to find the rows that have changed since the last version (0):

    declare @last_synchronization_version bigint
    set @last_synchronization_version = 0

    SELECT CT.CustomerID as CustID,
           CT.SYS_CHANGE_OPERATION,
           CT.SYS_CHANGE_COLUMNS,
           CT.SYS_CHANGE_CONTEXT
        FROM CHANGETABLE(CHANGES MyCustomer, @last_synchronization_version) AS CT
    Go

    CustID SYS_CHANGE_OPERATION SYS_CHANGE_COLUMNS SYS_CHANGE_CONTEXT
    ------ -------------------- ------------------ ------------------
    4      U                    0x0000000004000000 NULL
    5      U                    0x0000000004000000 NULL

You can see in these results that this query returns the CustomerIDs of the two rows that were changed. However, most applications want the data from these rows as well. To return the data, you can join the results from CHANGETABLE(CHANGES ...) with the data in the user table. For example, the following query joins with the MyCustomer table to obtain the values for the PersonID, StoreID, and TerritoryID columns. Note that the query uses an OUTER JOIN to make sure that the change information is returned for any rows that may have been deleted from the user table.
Also, at the same time you are retrieving the data rows, you want to retrieve the current version to use the next time the application comes back to retrieve the latest changes:

    declare @last_synchronization_version bigint
    set @last_synchronization_version = 0

    select current_version = CHANGE_TRACKING_CURRENT_VERSION()

    SELECT CT.CustomerID as CustID,
           C.PersonID,
           C.StoreID,
           C.TerritoryID,
           CT.SYS_CHANGE_OPERATION,
           CT.SYS_CHANGE_COLUMNS,
           CT.SYS_CHANGE_CONTEXT
        FROM MyCustomer C
        RIGHT OUTER JOIN
            CHANGETABLE(CHANGES MyCustomer, @last_synchronization_version) AS CT
            on C.CustomerID = CT.CustomerID
    go

    current_version
    --------------------
    2

    CustID      PersonID    StoreID     TerritoryID SYS_CHANGE_OPERATION SYS_CHANGE_COLUMNS SYS_CHANGE_CONTEXT
    ----------- ----------- ----------- ----------- -------------------- ------------------ ------------------
    4           NULL        932         5           U                    0x0000000004000000 NULL
    5           NULL        1026        4           U                    0x0000000004000000 NULL

You can see in the output from this query that the current version is now 2. The next time the application issues a query to identify the rows that have been changed since this query, it will pass the value of 2 as the @last_synchronization_version to the CHANGETABLE(CHANGES ...) function.

CAUTION
The version number is NOT specific to a table or user session. The Change Tracking version number is maintained across the entire database for all users and change tracked tables. Whenever a data modification is performed by any user on any table that has Change Tracking enabled, the version number is incremented.
For example, immediately after running an update on change tracked table A in the current application and incrementing the version to 3, another application could run an update on change tracked table B and increment the version to 4, and so on.
This is why you should always capture the current version number whenever you are retrieving the latest set of changes from the change tracked tables.

If an application has not synchronized with the database in a while, the stored version number may no longer be valid if the Change Tracking retention period has expired for any row modifications that have occurred since that version. To validate the version number, you can use the CHANGE_TRACKING_MIN_VALID_VERSION() function. This function returns the minimum valid version that a client can have and still obtain valid results from CHANGETABLE(). Your client applications should check the last synchronization version obtained against the value returned by this function. If the last synchronization version is less than the version returned by this function, that version is invalid, and the client application has to reinitialize all the data rows from the table. The following T-SQL code snippet can be used to validate the last_synchronization_version:

    -- Check individual table.
    IF (@last_synchronization_version <
            CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('MyCustomer')))
    BEGIN
        -- Handle invalid version and do not enumerate changes.
        -- Client must be reinitialized.
        -- (The PRINT is a placeholder so that the block is not empty.)
        PRINT 'Last synchronization version is no longer valid.'
    END

Identifying Changed Columns
In addition to information about which rows were changed and the operation that caused the change (insert, update, or delete, reported as I, U, or D in the SYS_CHANGE_OPERATION column), the CHANGETABLE(CHANGES ...) function also provides information on which columns were modified if you enabled the TRACK_COLUMNS_UPDATED option. You can use this information to determine whether any action is needed in your client application based on which columns changed.

To identify whether a specific column has changed, you can use the CHANGE_TRACKING_IS_COLUMN_IN_MASK (column_id, change_columns) function.
This function interprets the SYS_CHANGE_COLUMNS bitmap value returned by the CHANGETABLE(CHANGES ...) function and returns 1 if the column was modified or 0 if it was not:

    declare @last_synchronization_version bigint
    set @last_synchronization_version = 0

    SELECT CT.CustomerID as CustID,
           TerritoryChanged = CHANGE_TRACKING_IS_COLUMN_IN_MASK
               (COLUMNPROPERTY(OBJECT_ID('MyCustomer'),
                               'TerritoryID', 'ColumnId'),
                CT.SYS_CHANGE_COLUMNS),
           CT.SYS_CHANGE_OPERATION,
           CT.SYS_CHANGE_COLUMNS
        FROM CHANGETABLE(CHANGES MyCustomer, @last_synchronization_version) AS CT
    go

    CustID      TerritoryChanged SYS_CHANGE_OPERATION SYS_CHANGE_COLUMNS
    ----------- ---------------- -------------------- ------------------
    4           1                U                    0x0000000004000000
    5           1                U                    0x0000000004000000

In the query results, you can see that both update operations (SYS_CHANGE_OPERATION = U) modified the TerritoryID column (TerritoryChanged = 1).

Change Tracking Overhead
Although Change Tracking has been optimized to minimize the performance overhead on DML operations, it is important to know that implementing Change Tracking incurs some performance overhead and space requirements within the application databases.

The performance overhead associated with using Change Tracking on a table is similar to the index maintenance overhead incurred for insert, update, and delete operations. For each row changed by a DML operation, a row is added to the internal Change Tracking table. The amount of overhead incurred depends on various factors, such as

. The number of primary key columns
. The amount of data being changed in the user table row
. The number of operations being performed in a transaction
. Whether column Change Tracking is enabled

Change Tracking also consumes some space in the databases where it is enabled. Change Tracking data is stored in the following types of internal tables:

. Internal change tables: There is one internal change table for each user table that has Change Tracking enabled.
. Internal transaction table: There is one internal transaction table for the database.

These internal tables affect storage requirements in the following ways:

. For each change to each row in the user table, a row is added to the internal change table. This row has a small fixed overhead plus a variable overhead equal to the size of the primary key columns. The row can contain optional context information set by an application. In addition, if column tracking is enabled, each changed column requires an additional 4 bytes per row in the tracking table.
. For each committed transaction, a row is added to an internal transaction table.

If you are concerned about the space usage requirements of the internal Change Tracking tables, you can determine the space they use by executing the sp_spaceused stored procedure. The internal transaction table is called sys.syscommittab. The names of the internal change tables for each table are in the form change_tracking_<object_id>. The following example returns the size of the internal transaction table and the internal change table for the MyCustomer table:

    exec sp_spaceused 'sys.syscommittab'

    declare @tablename varchar(128)
    set @tablename = 'sys.change_tracking_'
                     + CONVERT(varchar(16), object_id('MyCustomer'))
    exec sp_spaceused @tablename

Summary
Transact-SQL has always been a powerful data access and data modification language, providing additional features, such as functions, variables, and commands, to control execution flow. SQL Server 2008 further expands the power and capabilities of T-SQL with the addition of a number of new features. These new T-SQL features can be incorporated into the building blocks for creating even more powerful SQL Server database components, such as views, stored procedures, triggers, and user-defined functions.

In addition to the powerful features available in T-SQL for developing SQL code and stored procedures, triggers, and user-defined functions, SQL Server 2008 also enables you to define custom-managed database objects such as stored procedures, triggers, functions, data types, and custom aggregates using .NET code. The next chapter, "Creating .NET CLR Objects in SQL Server 2008," provides an overview of using the .NET common language runtime (CLR) to develop these custom-managed objects.