
SQL Server Performance

Contents

DBCC showcontig (dwg_record)
SQL Server Index Fragmentation and Its Resolution
How to Determine If an Index is fragmented?
Resolving Fragmentation Issues
What is the effect of DBCC CHECKDB and DBCC DBREINDEX on the Transaction log?
I want to add an identity or unique identifier column to a table in order to ensure that each row is unique. For best performance, which one should I use?
How do I estimate how large a database will grow?
Can you provide some ways that I might be able to avoid using cursors in my applications?
Experiencing a major performance loss with an indexed view that is using an INDEX SCAN on a table in spite of relevant clustered indexes on the joined tables?
Can you tell me more about the processor queue length?
Is it better to have one large server running multiple databases, or several smaller servers running one database each?
Our SQL Server is experiencing lots of blocking, but we don't have any resource bottlenecks
Reducing SQL Server Deadlocks
Reducing SQL Server Locks
How can I help identify the worst performing queries in my SQL Server database application?
What is the difference between a table and index scan in an execution plan?
Why doesn't query parallelism use all of the available CPUs?
Why can it take so long to drop a clustered index?
Why is my Memory: pages/sec counter exceeding 20 for my SQL Server?
What happens during the rebuild of a clustered index?
What happens when SQL Server executes a stored procedure or query?
Index Covering Boosts SQL Server Query Performance
Tips on Optimizing Covering Indexes
What's the difference between a covered query and a covering index?
Functional Dependency (Normalization)


DBCC showcontig (dwg_record)


TABLE level scan performed.
- Pages Scanned................................: 43867
- Extents Scanned..............................: 5493
- Extent Switches..............................: 5492
- Avg. Pages per Extent........................: 8.0
- Scan Density [Best Count:Actual Count].......: 99.84% [5484:5493]
- Logical Scan Fragmentation ..................: 0.00%
- Extent Scan Fragmentation ...................: 0.15%
- Avg. Bytes Free per Page.....................: 733.2
- Avg. Page Density (full).....................: 90.94%

SQL Server Index Fragmentation and Its Resolution


While there is no doubt about the benefits of adding indexes to your tables, and for the most part you have to do little work to keep them maintained, some maintenance is required because indexes can become fragmented during data modifications. This fragmentation can become a source of performance problems for your queries.

So what exactly is index fragmentation? Index fragmentation comes in two forms: external fragmentation and internal fragmentation. Both forms amount to an inefficient use of the pages within an index. The inefficiency may be because the logical order of the pages is wrong (external fragmentation) or because the amount of data stored within each page is less than the page can hold (internal fragmentation). Whichever type of fragmentation occurs in your index, you can face performance issues with your queries because of it.

External Fragmentation

External fragmentation occurs when the leaf pages of an index are not in logical order. When an index is created, the index keys are placed in logical order on a set of index pages. As new data is inserted into the index, it is possible for the new keys to be inserted in between existing keys. This may cause new index pages to be created to accommodate the existing keys that had to be moved so that the new keys could be inserted in the correct order. These new index pages usually will not be physically adjacent to the pages the moved keys were originally stored on. It is this process of creating new pages that causes the index pages to fall out of logical order.

The following example explains this concept more clearly than words alone. Assume this is the existing structure of an index on your table before any additional data is inserted:
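(The original article showed a diagram here; the plain-text sketch below, with purely illustrative page numbers and key values, stands in for it.)

    Page 100: [1, 2, 3]  -->  Page 101: [4, 6, 7, 8]      (both pages full, in logical and physical order)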


An INSERT statement adds new data to the index. In this case we will add a 5. The INSERT will cause a new page to be created and the 7 and 8 to be moved to the new page in order to make room for the 5 on the original page. This creation will cause the index pages to be out of logical order.
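(Again standing in for the original diagram, with the same illustrative page numbers and key values.)

    Page 100: [1, 2, 3]  -->  Page 101: [4, 5, 6]  -->  Page 250: [7, 8]
    (Page 250 is the newly allocated page; it follows page 101 logically but is not physically adjacent to it.)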

In the case of queries with specific searches or queries that return unordered result sets, out-of-order index pages do not pose a problem. For queries that return ordered result sets, extra processing is needed to traverse the index pages that are out of order. An example of an ordered result set would be a query returning everything from 4 to 10. This query would have to complete an extra page switch in order to return the 7 and 8. While one extra page switch is nothing in the long run, imagine this condition on a very large table with hundreds of pages out of order.

Internal Fragmentation

Internal fragmentation occurs when index pages are not filled to their maximum capacity. Some free space on index pages can be an advantage in an application with heavy data inserts (which is why setting a fill factor deliberately leaves space on index pages), but severe internal fragmentation increases the size of the index and causes additional reads to be performed to return the needed data. These extra reads lead to a degradation in query performance.

How to Determine If an Index is fragmented?


SQL Server provides a database console command, DBCC SHOWCONTIG, that you can use to determine whether a particular table or index is fragmented.

DBCC SHOWCONTIG

Displays fragmentation information for the data and indexes of the specified table. Permissions default to members of the sysadmin server role, the db_owner and db_ddladmin database roles, and the table owner, and are not transferable.

Syntax (SQL Server 2000)

DBCC SHOWCONTIG
[ ( { table_name | table_id | view_name | view_id }
    [ , index_name | index_id ] ) ]
[ WITH { ALL_INDEXES
       | FAST [ , ALL_INDEXES ]
       | TABLERESULTS [ , { ALL_INDEXES } ] [ , { FAST | ALL_LEVELS } ] } ]


Examples

Query to show fragmentation information on all indexes in a database:

--Show fragmentation information on all indexes in a database
--Clean up the display
SET NOCOUNT ON
--Use the pubs database
USE pubs
DBCC SHOWCONTIG WITH ALL_INDEXES
GO

Query to show fragmentation information on all indexes on a table:

--Show fragmentation information on all indexes on a table
--Clean up the display
SET NOCOUNT ON
--Use the pubs database
USE pubs
DBCC SHOWCONTIG (authors) WITH ALL_INDEXES
GO

Query to show fragmentation information on a specific index:

--Show fragmentation information on a specific index
--Clean up the display
SET NOCOUNT ON
--Use the pubs database
USE pubs
DBCC SHOWCONTIG (authors, aunmind)
GO

Result Set

DBCC SHOWCONTIG returns the number of pages scanned, the number of extents scanned, the number of times the DBCC statement moved from one extent to another while it traversed the pages of the table or index, the average number of pages per extent, and the scan density (the ratio of the best count, which is the ideal number of extent changes if everything were contiguously linked, to the actual count of extent changes).
DBCC SHOWCONTIG scanning 'authors' table...
Table: 'authors' (1977058079); index ID: 1, database ID: 5
TABLE level scan performed.
- Pages Scanned................................: 1
- Extents Scanned..............................: 1
- Extent Switches..............................: 0
- Avg. Pages per Extent........................: 1.0
- Scan Density [Best Count: Actual Count]......: 100.00% [1:1]
- Logical Scan Fragmentation ..................: 0.00%
- Extent Scan Fragmentation ...................: 0.00%
- Avg. Bytes Free per Page.....................: 6002.0
- Avg. Page Density (full).....................: 25.85%
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

What to Look For

Pages Scanned: If you know the approximate row size and the number of rows in your table or index, you can estimate how many pages the index should contain. Look at the number of pages scanned, and if it is significantly higher than your estimate, you have internal fragmentation.

Extents Scanned: Take the number of pages scanned and divide it by 8, rounding up to the next whole number. This figure should match the number of extents scanned returned by DBCC SHOWCONTIG. If the number returned by DBCC SHOWCONTIG is higher, you have some external fragmentation. The seriousness of the fragmentation depends on how far the shown value is from the estimated value.

Extent Switches: This number should be equal to (Extents Scanned - 1). Higher numbers indicate external fragmentation.

Avg. Pages per Extent: This number is Pages Scanned / Extents Scanned and should be 8. Numbers lower than 8 indicate external fragmentation.

Scan Density [Best Count: Actual Count]: One of the most useful of the percentages returned by DBCC SHOWCONTIG. This is the ratio between the best count of extents and the actual count of extents. This percentage should be as close to 100% as possible. Lower percentages indicate external fragmentation.

Logical Scan Fragmentation: Shows the ratio of pages that are out of order. This percentage should be between 0% and 10%, with anything higher indicating external fragmentation.

Extent Scan Fragmentation: Shows any gaps between extents. This percentage should be 0%; higher percentages indicate external fragmentation.

Avg. Bytes Free per Page: Shows the average number of free bytes on a page. Higher numbers indicate internal fragmentation, but you should take the fill factor into account before letting higher numbers convince you that you have internal fragmentation.

Avg. Page Density (full): Shown as a percentage; this is the inverse of Avg. Bytes Free per Page. Lower percentages indicate internal fragmentation.

Remarks

DBCC SHOWCONTIG is really only useful for large tables. Smaller tables will show results that do not meet the normal standards simply because they may not be made up of more than 8 pages. This small size will throw off what you should look for in the results of DBCC SHOWCONTIG. With smaller tables you should only pay attention to Extent Switches, Logical Scan Fragmentation, Avg. Bytes Free per Page, and Avg. Page Density (full).

The information output by DBCC SHOWCONTIG defaults to pages scanned, extents scanned, extent switches, avg. pages per extent, scan density [best count : actual count], logical scan fragmentation, extent scan fragmentation, avg. bytes free per page, and avg. page density (full); this output can be controlled with the FAST and TABLERESULTS options. The FAST option performs a fast scan of the index and outputs minimal information; it does not read the leaf or data-level pages of the index and returns only pages scanned, extent switches, scan density [best count : actual count], and logical scan fragmentation.


The TABLERESULTS option displays the information as a rowset and returns extent switches, avg. bytes free per page, avg. page density (full), scan density, best count, actual count, logical fragmentation, and extent fragmentation. Specifying both the FAST and TABLERESULTS options returns object name, object id, index name, index id, pages, extent switches, scan density, best count, actual count, and logical fragmentation. The ALL_INDEXES option displays results for all the indexes on the specified tables and views, even if a particular index is specified. The ALL_LEVELS option specifies whether to produce output for each level of each index processed (the default is to output only the index leaf level or table data level) and can only be used with the TABLERESULTS option.

Resolving Fragmentation Issues

Once you determine that your table or index has fragmentation issues, you have four choices to resolve them:

1. Drop and recreate the index
2. Recreate the index with the DROP_EXISTING clause
3. Execute DBCC DBREINDEX
4. Execute DBCC INDEXDEFRAG

While each of these techniques will achieve your ultimate purpose of defragmenting your index, each has its own pros and cons.

Drop and Recreate the Index

The disadvantages of dropping and recreating an index with DROP INDEX and CREATE INDEX (or ALTER TABLE) include the disappearance of the index while you are recreating it. While the index is dropped and recreated, it is not available for queries, and query performance may suffer dramatically until the index is rebuilt. Another disadvantage is the potential for blocking: all requests that would use the index are blocked until the index is rebuilt. The drop itself can also be blocked by other processes, as it cannot take place while other processes are using the index. A further major disadvantage of this technique is that dropping and recreating a clustered index with DROP INDEX and CREATE INDEX rebuilds all of the nonclustered indexes twice: once as the clustered index is dropped and the nonclustered index row pointers are changed to point to the data heap, and again as the clustered index is rebuilt and the nonclustered index row pointers are pointed back to the clustered index row locations.

Dropping and rebuilding an index does have the advantage of completely rebuilding the index, which reorders the index pages, compacts the pages, and drops any unneeded pages. You may need to consider dropping and rebuilding indexes that show high levels of both internal and external fragmentation to get those indexes back to where they should be.

Recreate the Index with the DROP_EXISTING Clause

To avoid the cost of rebuilding the nonclustered indexes twice when you rebuild a clustered index, you can use CREATE INDEX with the DROP_EXISTING clause. This clause keeps the clustered index key values, avoiding the need to rebuild the nonclustered indexes twice. Like the regular DROP INDEX and CREATE INDEX technique, this technique can cause (and be affected by) blocking and index-disappearance problems. Another disadvantage is that CREATE INDEX with DROP_EXISTING forces you to find and rebuild each index on the table separately. On top of the advantages associated with the regular drop-and-recreate technique, plus the advantage of not having to rebuild nonclustered indexes twice, CREATE INDEX with DROP_EXISTING can be used for indexes with constraints, provided that the index definition exactly matches the requirements of the constraints.

Execute DBCC DBREINDEX

DBCC DBREINDEX is similar to CREATE INDEX with DROP_EXISTING, but it rebuilds the index physically, allowing SQL Server to assign new pages to the index and reduce both internal and external fragmentation. DBCC DBREINDEX also has the ability to recreate indexes with constraints dynamically, unlike CREATE INDEX with DROP_EXISTING. The disadvantages of DBCC DBREINDEX are that it can cause (and be affected by) blocking, and that it runs as a single transaction, so if it is stopped before completion you lose all the defragmentation work performed so far.

Execute DBCC INDEXDEFRAG

DBCC INDEXDEFRAG (available in SQL Server 2000) reduces external fragmentation by rearranging the existing leaf pages of an index into the logical order of the index key, and reduces internal fragmentation by compacting the rows within index pages and then discarding unneeded pages. It does not face the blocking problems of the other techniques, but its results are not as complete. This is because DBCC INDEXDEFRAG skips locked pages due to its dynamic nature and does not use any new pages to reorder the index. You may also discover that DBCC INDEXDEFRAG takes longer than recreating an index if the amount of fragmentation is large. DBCC INDEXDEFRAG does have one advantage over the other techniques: it can defragment an index while other processes are accessing it, eliminating the blocking problems of the other techniques.
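For reference, the two DBCC approaches might be invoked as in the sketch below, reusing the pubs sample objects from the earlier examples (the 90 percent fill factor is just an illustrative choice):

--Rebuild all indexes on the authors table, restoring a 90 percent fill factor
DBCC DBREINDEX ('pubs..authors', '', 90)
GO
--Defragment a single index online, leaving the table available to other users
DBCC INDEXDEFRAG (pubs, authors, aunmind)
GO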

What is the effect of DBCC CHECKDB and DBCC DBREINDEX on the Transaction log?
As you may be aware, DBCC DBREINDEX can be used to rebuild one or more indexes for a specific table, and it is an offline operation. While this operation is running, the underlying table is unavailable to users of the database. DBCC DBREINDEX rebuilds indexes dynamically. During this operation it restores the page density levels to the original fill factor (the default), or the user can choose another target value for the page density. Internally, running DBCC DBREINDEX is very similar to using Transact-SQL statements to drop and recreate the indexes manually, and the DBREINDEX process occurs as a single, atomic transaction. To reduce fragmentation, the new indexes must be completely built and in place before the old index pages are released. Performing the rebuild requires adequate free space in the data files. All of the work involved in this process is logged by default and is written to disk whenever a CHECKPOINT or BACKUP LOG process is initiated. To accommodate the changes, SQL Server looks for free space in the data and log files; if there is insufficient free space in the data file(s), DBCC DBREINDEX may be unable to rebuild the indexes, or the indexes may be rebuilt with logical fragmentation values above zero. The amount of free space needed varies and depends on the number of indexes being created in the transaction.

DBCC CHECKDB performs a physical consistency check on indexed views and validates the integrity of every object in a database by collecting the information, then scanning the log for any additional changes made, and merging the two sets of information together to produce a consistent view of the data at the end of the scan. This process involves extensive locking, and in older versions of SQL Server (6.5 and 7.0) it had the negative effect of taking the database essentially offline. SQL Server 2000 reads the database transaction log to get a consistent view, so that CHECKDB can effectively run online. The transaction log is read from the LSN of the 'begin tran' log record of the oldest transaction that is active at the time the database scan starts, to the LSN at the time the database scan stops. The effects of REDO and UNDO of the transactions are as follows:

Log records from transactions that commit during that time are used to generate REDO facts. In the scenario of a row insertion record this would produce a REDO fact of 'a row with these index keys was inserted into page A of table B, index C at slot position X'.

Log records from transactions that rollback or don't commit during that time are used to generate UNDO facts. A row insert record would produce an UNDO fact of 'a row with these index keys was removed from page A of table B, index C at slot position X'

The REDO and UNDO processing described above uses the log extensively, and free log space can fluctuate sharply as a result. It is recommended to have at least 60% free space available in the transaction log if you are executing the DBREINDEX and CHECKDB statements on larger tables. It is also a best practice to perform frequent transaction log backups during these operations in order to keep the size of the database data and log files under control.
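A quick way to keep an eye on log growth while such maintenance runs is sketched below (the backup path is hypothetical, and the BACKUP LOG step assumes the database uses full or bulk-logged recovery; adjust for your environment):

--Report the current size and percentage used of every database's transaction log
DBCC SQLPERF (LOGSPACE)
GO
--Back up the log of the database being maintained to free reusable log space
BACKUP LOG pubs TO DISK = 'C:\Backups\pubs_log.trn'
GO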

I want to add an identity or unique identifier column to a table in order to ensure that each row is unique. For best performance, which one should I use?
One of the key aspects of database table design is to ensure what is called entity integrity. What this means is that you need to ensure that each row in your database tables is unique. If you don't take the proper precautions, it is possible for two or more rows of a table to be identical duplicates of each other.

One of the more common ways to guarantee entity integrity is to create a primary key. A primary key, which is enforced by an index created on one or more columns, is used to guarantee that no duplicate records are entered into a table. If you try to enter a duplicate record, SQL Server won't let you, and will give you an error message.

Some tables lend themselves to uniqueness, and it is easy to find a column that you know will always be unique; adding a primary key to that column is no problem. But in other cases there is no single column that is unique. Instead, it may take two, three, four, or more columns to uniquely identify a row. If you want, you can create a primary key on multiple columns. Unfortunately, creating a primary key on multiple columns has a major drawback: the index that is created to enforce each row's uniqueness may get very wide. The wider the index, the larger the physical index will be. This can hurt performance, because it takes SQL Server more resources to maintain or query wide indexes than narrow ones.

So is there any way to avoid creating primary keys with wide indexes? Yes, and that brings us to the point of this question. Instead of creating a primary key with a wide index, you can add either an identity or a uniqueidentifier column to the table and use this column as the primary key. This greatly reduces the width of the index, reducing its physical size and speeding up SQL Server's access to the table.

What an identity column or a uniqueidentifier column does is automatically create a unique value for each row inserted into your table. You don't have to worry about creating these values yourself; this is done automatically by SQL Server. But you must decide whether to use an identity column or a uniqueidentifier column. Ideally, from a performance perspective, you want to use the one that creates the smallest possible physical index.

So which one of these column types should you use? Let's first look at identity values. An identity value is an automatically incrementing integer that starts counting from a base seed value (such as 1, 100, or 1000, whatever integer you choose), and increments by a value you specify, called the increment (such as 1, -1, or 10, whatever integer you choose). The default seed is 1, and the default increment is 1.

Because the values are never repeated (they always increment), an identity value often makes a good column on which to base a primary key. When you create an identity column in a table, you must also specify the data type as some form of integer. As you probably know, SQL Server supports several different integer types, each with a different size (width):

Bigint    8 bytes    -2^63 to 2^63-1 (SQL Server 2000 and 2005 only)
Int       4 bytes    -2,147,483,648 to 2,147,483,647
Smallint  2 bytes    -32,768 to 32,767
Tinyint   1 byte     0 to 255

When creating an identity column, always select the narrowest integer type that will meet your needs. If an Int will do, then don't use a Bigint, as you will just be unnecessarily increasing the physical size of the index, which can hurt SQL Server's performance.

A uniqueidentifier column, like an identity column, is used to create a unique value for each row in your table. But instead of using an incrementing value, it creates a value based on a special internal algorithm that produces what is called a Globally Unique Identifier (GUID), which is 16 bytes in size.

Right away, you should notice which unique column type, identity or uniqueidentifier, will provide the best performance as a primary key. Because the uniqueidentifier column will always be 16 bytes wide, it will always create a primary key index that is larger than any of the identity column options, and unless you have a special reason that is outside the scope of this question, it should not be used as the basis of a primary key. Instead, use one of the variations of the identity column, selecting the smallest one that meets your data's needs. This way, you can ensure that the index created by the primary key is as small as it can be, helping to ensure better overall performance of your database while still ensuring the entity integrity of your database tables.
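A minimal sketch of the two choices, using a hypothetical OrderDetail table (table names, column names, and sizes are illustrative only):

--Narrow identity-based primary key (4 bytes per key value)
CREATE TABLE dbo.OrderDetail
(
    OrderDetailID int IDENTITY(1,1) NOT NULL,
    ProductCode   varchar(20)       NOT NULL,
    Quantity      smallint          NOT NULL,
    CONSTRAINT PK_OrderDetail PRIMARY KEY CLUSTERED (OrderDetailID)
)
GO
--GUID-based alternative (16 bytes per key value, so a wider index)
CREATE TABLE dbo.OrderDetailGuid
(
    OrderDetailID uniqueidentifier NOT NULL DEFAULT NEWID(),
    ProductCode   varchar(20)      NOT NULL,
    Quantity      smallint         NOT NULL,
    CONSTRAINT PK_OrderDetailGuid PRIMARY KEY CLUSTERED (OrderDetailID)
)
GO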

How do I estimate how large a database will grow?


Unfortunately, SQL Server does not come with any built-in tools to estimate database size. If you don't have a third-party estimating tool, you have to do it the hard way, which is by calculating the space taken up by each table, and then adding the total space needed for each table to get a grand total for the database. Here are some guidelines for estimating the space needed for a database.

For each table, find out how many bytes each row will use up on average. It is easy to calculate the size of fixed length columns, but calculating the space used by variable length fields is much more difficult. About the only way to do this is to get some of the actual data that will be stored in each variable length column, and then based on the data you have, estimate the average byte length of each variable length column. Once you know the typical sizes of each column, you can then calculate the size of each typical row in your table.

The above step is a good start, but it usually significantly underestimates the amount of space you need. Besides the actual data, you must also estimate the type and number of indexes you will use for each table. Indexes can use a huge amount of space, and you must estimate how much space you think they will take. This is a function of the type of each index (clustered or non-clustered), the number of indexes, and the width of the indexes.

Besides estimating the size of the indexes, you also must take into consideration the Fillfactor and Pad Index used when the indexes are created. Both of these affect how much empty space is left in an index, and this empty space must be included in your estimate.

One more factor affecting how much space it takes to store data in a table is how many rows can be fitted onto one SQL Server 8 KB data page. Depending on the size of each row, it is likely that not all of the space in each data page is fully used, and this must also be accounted for when estimating table size (see the worked example after this list).

While tables, and their associated indexes take up most of the physical space in most databases, keep in mind that every object in SQL Server takes up space, and must be accounted for.
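As a rough worked example of the rows-per-page point above, assume a hypothetical table with a 200-byte average row and the roughly 8,096 bytes of usable row space on an 8 KB data page:

rows per page  =  8096 / 200          =  about 40 rows
pages needed   =  100,000 rows / 40   =  2,500 pages
data size      =  2,500 pages x 8 KB  =  about 20 MB, before indexes, fill factor, and per-row overhead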

As you can see, without a tool to help out, manually estimating database size is not a fun task, and it is, at best, only a rough estimate.

Another option you might consider, assuming that you already have an existing database with data, is to extrapolate from the current size of your database on a table-by-table basis. For example, if you know that a particular table has 100,000 rows and is 1 MB in size, then assuming that neither the indexing nor the fillfactor changes, when the table gets to 200,000 rows it should be about 2 MB in size. If you do this for every table, you can get a fairly good idea of how much disk space you will need in the future. To find out how much space a particular table uses, use this command: sp_spaceused 'table_name'.
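The extrapolation approach can be scripted along these lines, again using the pubs sample database (sp_MSforeachtable is undocumented but widely used; treat it as a convenience, not a guarantee):

USE pubs
GO
--Space used by a single table
EXEC sp_spaceused 'authors'
GO
--Space used by every user table in the database
EXEC sp_MSforeachtable 'EXEC sp_spaceused ''?'''
GO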

Can you provide some ways that I might be able to avoid using cursors in my applications?
A SQL Server cursor should only be considered in situations where you need to scroll through a set of rows and then, based on criteria you specify, do something potentially different to each row (and in many cases even this can be done using a standard query). If what you need to do to each row is the same, then you should definitely avoid a cursor and instead use a Transact-SQL query. Keep in mind that one of the biggest benefits of using a relational database such as SQL Server is that it acts on entire sets of records in one fell swoop, which results in very fast performance. But if you have to perform different actions on each record, then you often have to use a cursor to accomplish your goal. Because records have to be examined one at a time, cursors often result in poor performance.

While it is true that a query will always outperform a cursor (assuming they are performing the same task), this doesn't mean that you should never use a cursor. For example, sometimes I need to perform a fairly simple task on an occasional basis. In these cases, I often use a cursor because they are fairly easy to write, and because performance is not an issue for the task at hand. On the other hand, if the task is repeated often and performance is an issue, then you should avoid cursors if at all possible. Some ways to avoid cursors include:

Rewriting the cursor as a normal query. Some people write cursors that perform the same task over and over on a set of records. This is a waste of server resources because it could easily be handled by a standard query. Even if what you need to do to each row is conditional on data in the row, you may still be able to use a standard query with a CASE expression (see the sketch after this list).

Rewriting the cursor as a derived query.

Rewriting the cursor using temporary tables in a query.

Rewriting the cursor using table variables in a query (SQL Server 2000 or 2005).
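To make the first option concrete, here is a minimal sketch of row-by-row cursor logic collapsed into a single set-based statement with CASE (the Orders table and its columns are hypothetical):

--One statement replaces a cursor that looped over every order and updated its status
UPDATE dbo.Orders
SET Status = CASE
                 WHEN ShipDate IS NULL AND OrderDate < DATEADD(day, -30, GETDATE()) THEN 'Overdue'
                 WHEN ShipDate IS NULL THEN 'Open'
                 ELSE 'Shipped'
             END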


If you find that you have to use a cursor, then try to use a FAST-FORWARD, READ-ONLY cursor, which is the cursor that uses the least resources.
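If a cursor really is unavoidable, a FAST_FORWARD declaration looks roughly like the sketch below (the pubs titles table and the @title variable are illustrative; FAST_FORWARD cursors are forward-only and read-only):

DECLARE @title varchar(80)
DECLARE title_cursor CURSOR FAST_FORWARD FOR
    SELECT title FROM titles
OPEN title_cursor
FETCH NEXT FROM title_cursor INTO @title
WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @title   --do the per-row work here
    FETCH NEXT FROM title_cursor INTO @title
END
CLOSE title_cursor
DEALLOCATE title_cursor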

Experiencing a major performance loss with an indexed view that is using an INDEX SCAN on a table in spite of relevant clustered indexes on the joined tables?
A bit of background on this issue: the first query doesn't use the indexed view; it uses an index scan on a table, joined by a nested loops operator to an index seek on another table, and the index scan costs 0.17 plus other operations. The second query is forced to use the indexed view, and it is an index seek with a cost of 0.003. The third query uses an index scan on the indexed view. Why is the indexed view not used for the first query, when it is much cheaper? I run Enterprise Edition (you can see that the third query uses the indexed view without NOEXPAND).

It is important to understand indexed views before going into the details. Consider these few guidelines when designing indexed views:
- Design indexed views that can be used by several queries or multiple operations. For example, an indexed view that contains the SUM of a column and the COUNT_BIG of a column can be used by queries that contain the functions SUM, COUNT, COUNT_BIG, or AVG. The queries will be faster because only a small number of rows from the view need to be retrieved rather than the full number of rows from the base tables, and a portion of the computations required for performing the AVG function have already been done.

- Keep the index key compact.

- Consider the size of the resulting indexed view. In the case of pure aggregation, the indexed view may not provide any significant performance gains if its size is similar to the size of the original table.

- Design multiple smaller indexed views that accelerate parts of the process.

Like ordinary indexes, indexed views are maintained automatically; however, if the view references several tables, updating any of them may require updating the indexed view. Unlike ordinary indexes, a single row insert into any of the participating tables may cause multiple row changes in an indexed view, because the single row may join with multiple rows of another table. The same is true for updates and deletes. Consequently, the maintenance of an indexed view may be more expensive than maintaining an index on the table. Conversely, the maintenance of an indexed view with a highly selective condition may be much less expensive than maintaining an index on a table, because most inserts, deletes, and updates to the base tables the view references will not affect the view; these operations can be filtered out with respect to the indexed view without accessing other database data. As a general recommendation, any modifications or updates to the view or the base tables underlying the view should be performed in batches if possible, rather than as singleton operations. This may reduce some of the overhead of view maintenance.

In short, you can achieve the desired performance by taking the table structure and schema into consideration, adopting a proper indexing strategy, and, where appropriate, using hints such as NOEXPAND.
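For orientation, the general shape of an indexed view and the NOEXPAND hint is sketched below; the SalesSummary view, the Sales table, and their columns are hypothetical, and the aggregated column is assumed to be NOT NULL (a requirement for SUM in an indexed view):

--The view must be schema-bound and reference tables with two-part names
CREATE VIEW dbo.SalesSummary
WITH SCHEMABINDING
AS
SELECT ProductID,
       SUM(Amount)  AS TotalAmount,
       COUNT_BIG(*) AS RowCnt
FROM dbo.Sales
GROUP BY ProductID
GO
--The unique clustered index is what materializes the view
CREATE UNIQUE CLUSTERED INDEX IX_SalesSummary ON dbo.SalesSummary (ProductID)
GO
--On editions other than Enterprise, the NOEXPAND hint is needed for the optimizer to use the view's index
SELECT ProductID, TotalAmount
FROM dbo.SalesSummary WITH (NOEXPAND)
WHERE ProductID = 42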

Can you tell me more about the processor queue length?


The processor counter Processor Queue Length refers to the number of threads that are in the server's processor queue, waiting to be executed by the CPU. All servers, whether they have a single CPU or multiple CPUs, have only one processor queue. As processes execute on your server, they spawn one or more threads that need to be executed by the CPU or CPUs in the server. The processor queue length is a measurement of the last observed value; it is not an average of any kind.

If the CPU or CPUs are not busy when a thread is put into the processor queue, it is immediately executed by a CPU. But if all of the available CPUs are busy executing threads, then incoming threads have to wait in the processor queue until there is a CPU available to process them. As SQL Server becomes busier, it can send threads to the CPU or CPUs faster than they can be executed, causing the processor queue to fill.

As a rule of thumb, if the processor queue exceeds a total of 2 threads, this is a strong indication that the CPU or CPUs in your server have become a bottleneck. On occasion, the processor queue length will spike over 2 for short periods of time. This is normal and not a reason to worry. But if you see a processor queue length of 2 or greater for extended periods of time (say 5 minutes or more) on a regular basis, then most likely your server is experiencing a CPU bottleneck.


Generally, when evaluating a server for potential CPU bottlenecks, I like to not only watch the processor queue length, I also like to watch the % Processor Time for all of the CPUs in the server. If I see that the processor queue length is often over 2, and I see that the Total % Processor Time is over 80% for all of the CPUs, then I can safely state that the server is experiencing a CPU bottleneck.

Is it better to have one large server running multiple databases, or several smaller servers running one database each?
Assuming you have the budget to spend, you will get better performance by "scaling out" your databases onto multiple SQL Server servers than by running them all on the same physical server. The reason why this is true is obvious to DBAs, but trying to justify the extra cost to managers or accountants is not always easy. Here's some ammunition you can use to help build your case for running multiple SQL Servers instead of one really big SQL Server.

Benefits of Using Multiple Smaller SQL Servers Instead of One Large SQL Server

In fact, purchasing several smaller SQL Servers may be less expensive than buying one huge SQL Server, although this may not seem logical to many people not familiar with server hardware. Generally, smaller servers can be purchased at commodity prices, while very large servers are special orders and have a cost premium associated with them.

In many cases, your current, older SQL Server may not be upgradeable, or if it is, it may not be cost-effective to upgrade it as compared to purchasing new physical servers.

As physical servers get larger (more CPUs), there is more and more CPU overhead generated, which in effect reduces the overall performance of the server. Each additional CPU adds a decreasing amount of additional CPU power. Two 4-CPU servers are more efficient from a CPU perspective than a single 8-CPU server is.

Some databases need to be tuned differently than other databases. If all of your databases are located on a single server, then you can't take advantage of SQL Server-wide performance tuning techniques to tune each separate database, as every database has to share the same SQL Server-wide performance tuning settings.

If the single, large server goes down, then all the databases go down. If the databases are on separate servers, then only one database goes down.


Most SQL Server-based applications grow over time. If each database is located on its own server, then it is easier to add incremental hardware (such as RAM) to the servers that need it. On a large server, you may find that you can't expand beyond a certain point.

Some databases require a different sort order than others.

Of course, there are no easy answers. Where I work, critical databases are given their own individual servers. But less critical databases often share the same server, especially if they are small, not used much, or are used only for testing purposes.

Our SQL Server is experiencing lots of blocking, but we don't have any resource bottlenecks.
Blocking and timeout issues are generally (but not always) related to poor application design. They cannot be solved simply by having a more powerful server, as you have already found out. You may or may not be able to resolve this issue, depending on who controls the code. If the code has been written in-house, then you have a chance at resolving it. If the code has been written by a third party, you may be out of luck, unless the vendor is interested in listening to your feedback (good luck with that).

One tool you can use to help determine the cause of the blocking is a trace run with Profiler. It allows you to observe the communications between your application and SQL Server, helping you to diagnose the cause of the blocking. This is not a particularly easy task, but it can be done.

Blocking occurs when one connection from an application holds a lock, preventing a second connection from acquiring a conflicting lock type. When this happens, the second connection has to wait until the first lock is removed. If the wait is long enough, the waiting connection can time out. It is not unusual for a connection to acquire a lock and then hold it for an unusually long time, blocking not just one other connection but many, causing a cascading effect that can be disastrous to your application's performance. Some common causes of blocking include, but are not limited to:

Queries with long execution times (very common)

Lack of appropriate indexes, causing queries to run long

Cancelled queries that were not committed or rolled back as they should be


Applications that don't process all their results to final completion

Distributed client/server deadlocks

To help resolve blocking issues by reducing locks in your applications, see the two tips pages on this website: Tips for Reducing SQL Server Locks and Tips for Reducing SQL Server Deadlocks.

Until the blocking problem can be fixed in the application, your best work-around (although not ideal) is to kill the process that is causing the block (assuming that it does not go away on its own in a reasonable amount of time). Blocking processes can be viewed and killed through Enterprise Manager or Management Studio. Before you kill the blocking process, make a note of the activity going on so that you can use this information to help pinpoint the root cause of the problem.
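From a query window, the same investigate-then-kill workflow might look like the sketch below (SPID 57 is purely illustrative; take it from the BlkBy column of your own sp_who2 output):

--List current sessions; the BlkBy column shows which SPID is blocking which
EXEC sp_who2
GO
--Capture what the blocking session was last running before you kill it
DBCC INPUTBUFFER (57)
GO
--Kill the blocking session as a last resort
KILL 57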

Reducing SQL Server Deadlocks


Deadlocking occurs when two user processes have locks on separate objects and each process is trying to acquire a lock on the object that the other process holds. When this happens, SQL Server identifies the problem and ends the deadlock by automatically choosing one process as the victim and aborting it, allowing the other process to continue. The aborted transaction is rolled back and an error message is sent to the user of the aborted process. Generally, the transaction that requires the least amount of overhead to roll back is the one that is aborted.

As you might imagine, deadlocks can use up SQL Server's resources, especially CPU power, wasting them unnecessarily. Most well-designed applications, after receiving a deadlock message, will resubmit the aborted transaction, which most likely can now run successfully. This process, if it happens often on your server, can drag down performance. If the application has not been written to trap deadlock errors and automatically resubmit the aborted transaction, users may become confused about what is happening when they receive deadlock error messages on their computers.

Here are some tips on how to avoid deadlocking on your SQL Server:

Ensure the database design is properly normalized.



Have the application access server objects in the same order each time.

During transactions, don't allow any user input. Collect it before the transaction begins.

Avoid cursors.

Keep transactions as short as possible. One way to help accomplish this is to reduce the number of round trips between your application and SQL Server by using stored procedures or by keeping transactions within a single batch. Another way of reducing the time a transaction takes to complete is to make sure you are not performing the same reads over and over again. If your application does need to read the same data more than once, cache it by storing it in a variable or an array, and then re-read it from there, not from SQL Server.

Reduce lock time. Try to develop your application so that it grabs locks at the latest possible time, and then releases them at the very earliest time.

If appropriate, reduce lock escalation by using the ROWLOCK or PAGLOCK hints. Consider using the NOLOCK hint to prevent locking if the data being locked is not modified often (see the sketch after this list).

If appropriate, use as low of an isolation level as possible for the user connection running the transaction.

Consider using bound connections.
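As a reference for the hint suggestions in the list above, table-level locking hints attach to individual statements like this (the Products table is a hypothetical name):

--Read without taking shared locks; acceptable only if dirty reads are tolerable
SELECT ProductID, ProductName
FROM dbo.Products WITH (NOLOCK)
--Keep the lock granularity at the row level for this update
UPDATE dbo.Products WITH (ROWLOCK)
SET ProductName = 'Widget'
WHERE ProductID = 1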

*****

When a deadlock occurs, by default, SQL Server chooses a deadlock "victim" by identifying which of the two processes will use the least amount of resources to roll back, and then returns error message 1205. But what if you don't like the default behavior? Can you change it? Yes, you can, by using the following command:
SET DEADLOCK_PRIORITY { LOW | NORMAL | @deadlock_var }

WHERE:


Low tells SQL Server that the current session should be the preferred deadlock victim, not the session that incurs the least amount of rollback resources. The standard deadlock error message 1205 is returned.

Normal tells SQL Server to use the default deadlock method.

@deadlock_var is a character variable specifying which deadlock method you want to use. Specify "3" for low, or "6" for normal.

*****

To help identify deadlock problems, use the SQL Server Profiler's Create Trace Wizard to run the "Identify The Cause of a Deadlock" trace. This will provide you with the raw data you need to help isolate the causes of deadlocks in your databases.

*****

To help identify which tables or stored procedures are causing deadlock problems, turn on trace flag 1204 (outputs basic trace data) or trace flag 1205 (outputs more detailed trace data).
DBCC TRACEON (3605,1204,-1)

Be sure to turn off this trace flag when you are done, as this trace can eat up SQL Server's resources unnecessarily, hurting performance.
*****

Ideally, deadlocks should be eliminated from your applications. But if you are unable to eliminate all deadlocks in your application, be sure to include program logic in your application to deal with killed deadlock transactions in a user-friendly way. For example, let's say that two transactions are deadlocked and that SQL Server kills one of the transactions. In this case, SQL Server will raise an error message that your application needs to respond to. In most cases, you will want your application to wait a random amount of time after the deadlock before resubmitting the killed transaction to SQL Server. It is important that the waiting period is random, because another contending transaction could also be waiting, and you don't want both contending transactions to wait the same amount of time and then both try to execute at the same time, causing another deadlock.
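The resubmit-after-a-short-wait logic described above usually lives in the client application, but the same idea can be sketched in Transact-SQL using SQL Server 2005 TRY/CATCH (the Accounts table, the amounts, and the fixed retry count are all hypothetical):

DECLARE @retries int
SET @retries = 3
WHILE @retries > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION
        UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountID = 1
        UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE AccountID = 2
        COMMIT TRANSACTION
        SET @retries = 0                    --success, stop retrying
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION
        IF ERROR_NUMBER() = 1205            --this session was chosen as the deadlock victim
        BEGIN
            SET @retries = @retries - 1
            WAITFOR DELAY '00:00:01'        --ideally a randomized delay, per the advice above
        END
        ELSE
            SET @retries = 0                --some other error; handle or re-raise as appropriate
    END CATCH
END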

Reducing SQL Server Locks


If your users are complaining that they have to wait for their transactions to complete, you may want to find out if object locking on the server is contributing to the problem. To do this, use the SQL Server Locks object: Average Wait Time (ms) counter. You can use this counter to measure the average wait time of a variety of locks, including database, extent, key, page, RID, and table locks. If you identify one or more types of locks causing transaction delays, then you will want to investigate further to see if you can identify which specific transactions are causing the locking. The Profiler is a good tool for this detailed analysis.

*****

Use sp_who and sp_who2 (the sp_who2 stored procedure is not documented in the SQL Server Books Online, but offers more detail than sp_who) to identify which processes may be blocking other processes. While blocking can also be identified using Enterprise Manager or Management Studio, these two commands work much faster and more efficiently than the GUI-based front-ends.

*****

On tables that change little, if at all, such as lookup tables, consider altering the default lock level. By default, SQL Server uses row-level locking for all tables, unless the Query Optimizer determines that a different locking level, such as page or table locks, is more appropriate. For most lookup tables that aren't huge, SQL Server will automatically use row-level locking. Because row locking has to be done at the row level, SQL Server needs to work harder to maintain row locks than it does for page or table locks. Since lookup tables aren't being changed by users, it would be more efficient to use a table lock instead of many individual row locks. How do you accomplish this? You can override how SQL Server performs locking on a table by using the SP_INDEXOPTION command. Below is an example of code you can run to tell SQL Server to use table locks, rather than row or page locks, for a specific table:


SP_INDEXOPTION 'table_name', 'AllowRowLocks', FALSE
GO
SP_INDEXOPTION 'table_name', 'AllowPageLocks', FALSE
GO

This code turns off both row and page locking for the table; thus only table locking is available.

*****

Keep all Transact-SQL transactions as short as possible. This helps to reduce the number of locks (of all types), helping to speed up the overall performance of your SQL Server applications. If practical, you may want to break down long transactions into groups of smaller transactions. In addition, only include those Transact-SQL commands within a transaction that are necessary for it; leave all other code outside of the transaction.

*****

An often overlooked cause of locking is an I/O bottleneck. When your server experiences an I/O bottleneck, users' transactions take longer to complete, and the longer they take to complete, the longer locks must be held, which can lead to other transactions having to wait for previous locks to be released. If your server is experiencing excessive locking problems, be sure to check whether you are also running into an I/O bottleneck. If you do find an I/O bottleneck, then resolving it will help to resolve your locking problem, speeding up the performance of your server.

*****

To help reduce the amount of time tables are locked, which hurts concurrency and performance, avoid interleaving reads and database changes within the same transaction. Instead, try to do all your reads first, then perform all of the database changes (UPDATEs, INSERTs, DELETEs) near the end of the transaction. This helps to minimize the amount of time that exclusive locks are held.

*****

Any conditional logic, variable assignment, and other related preliminary setup should be done outside of transactions, not inside them. Don't ever pause a transaction to wait for user input; user input should always be collected outside of a transaction. Otherwise, you will be contributing to locking, hurting SQL Server's overall performance.

*****

Encapsulate all transactions within stored procedures, including both the BEGIN TRANSACTION and COMMIT TRANSACTION statements in the procedure. This provides two benefits that help to reduce blocking locks. First, it limits the client application and SQL Server to communications before and after the transaction, forcing any messages between the client and the server to occur at a time other than when the transaction is running (reducing transaction time). Second, it prevents the user from leaving an open transaction (holding locks open), because the stored procedure forces any transactions that it starts to complete or abort.

*****

If you have a client application that needs to "check out" data for a while, then perhaps update it later, or maybe not, you don't want the records locked during the entire time the record is being viewed. Assuming "viewing" the data is much more common than "updating" it, one way to handle this situation is to have the application select the record, without locking it for update, and send it to the client. If the user just views the record and never updates it, then nothing more has to be done. But if the user decides to update the record, then the application can perform an UPDATE that adds a WHERE clause checking whether the values currently in the row are the same as those that were originally retrieved. Similarly, you can check a timestamp column in the record, if one exists. If the data is the same, the UPDATE can be made. If the record has changed, then the application must include code to notify the user so he or she can decide how to proceed. While this requires extra coding, it reduces locking and can increase overall application performance.

*****

Use the least restrictive transaction isolation level possible for your user connection, instead of always using the default READ COMMITTED. In order to do this without causing other problems, the nature of the transaction must be carefully analyzed as to what the effect of a different isolation level will be.


One example of when not to use the default READ COMMITTED isolation level is when running queries to produce reports. Using an isolation level of READ UNCOMMITTED turns off locking, speeding up the performance of the query and of other queries hitting the same tables. This, of course, will only work if your reports are tolerant of potentially dirty data, which is generally not a problem for many reports.

*****

Try one or more of the following suggestions to help avoid blocking locks: 1) use clustered indexes on heavily used tables; 2) make appropriate use of non-clustered indexes; 3) try to avoid Transact-SQL statements that affect large numbers of rows at once, especially the INSERT and UPDATE statements; 4) try to have your UPDATE and DELETE statements use an index; and 5) when using nested transactions, avoid commit and rollback conflicts.

*****

If there is a lot of contention for a particular table in your database, consider turning off page locking for that table, requiring SQL Server to use row-level locking instead. This will help to reduce the contention for rows located on the same page. It will also cause SQL Server to work a little harder in order to track all of the row locks. How well this option works for you depends on the trade-off between the reduced contention and the extra work SQL Server has to perform to maintain many row locks. Testing will be needed to determine what is best for your particular environment. Use the SP_INDEXOPTION stored procedure to turn off page locking for any particular table.

*****

If table scans are used regularly to access data in a table, and the table doesn't have any useful indexes to prevent this, then consider turning off both row locking and page locking for that table. This in effect tells SQL Server to only use table locking when accessing the table, which speeds access because SQL Server will not have to escalate from row locking, to page locking, to table locking each time a table lock is needed to perform the table scan. On the negative side, doing this can increase contention for the table. Assuming the data in the table is mostly read-only, this should not be too much of a problem. Testing will be needed to determine what is best for your particular environment.

*****


Do not create temporary tables from within a stored procedure that is invoked by the INSERT INTO ... EXECUTE statement. If you do, locks on the syscolumns, sysobjects, and sysindexes tables in the TEMPDB database will be created, blocking others from using the TEMPDB database, which can significantly affect performance.

*****

To help reduce the amount of time it takes to complete a transaction (and thus how long records are locked), try to avoid using the WHILE statement or Data Definition Language (DDL) within a transaction. In addition, do not open a transaction while browsing data, and don't SELECT more data than you need for the transaction at hand. For best performance, you always want to keep transactions as short as possible.

*****

While nesting transactions is perfectly legal, it is not recommended because of its many pitfalls. If you nest transactions and your code fails to commit or roll back a transaction properly, it can hold locks open indefinitely, significantly impacting performance.

*****

By default in SQL Server, a transaction will wait indefinitely for a lock to be removed before continuing. If you want, you can assign a locking timeout value to SQL Server so that long-running locks won't cause other transactions to wait long periods of time. To assign a locking timeout value, run the command "SET LOCK_TIMEOUT length_of_time_in_milliseconds" from within your connection.

*****

Sometimes you need to perform a mass INSERT or UPDATE of thousands, if not millions, of rows. Depending on what you are doing, this could take some time. Unfortunately, performing such an operation can cause locking problems for other users. If you know users could be affected by your long-running operation, consider breaking up the job into smaller batches, perhaps even with a WAITFOR statement, in order to allow others to "sneak in" and get some of their work done.
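A minimal sketch of that batching idea, using SET ROWCOUNT in the SQL Server 2000 style (the OrderHistory table, the batch size, and the cutoff date are hypothetical):

SET ROWCOUNT 5000                  --limit each UPDATE to 5,000 rows
WHILE 1 = 1
BEGIN
    UPDATE dbo.OrderHistory
    SET Archived = 1
    WHERE Archived = 0
      AND OrderDate < '20040101'
    IF @@ROWCOUNT = 0 BREAK        --nothing left to update
    WAITFOR DELAY '00:00:02'       --give other connections a chance to get their locks
END
SET ROWCOUNT 0                     --always reset ROWCOUNT when finished

*****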


In SQL Server 7.0, when a replication snapshot is generated, SQL Server puts shared locks on all of the tables being published for replication. As you can imagine, this can affect users who are trying to update records in the locked tables. Because of this, you may want to schedule snapshots to be created during less busy times of the day, especially if there are a lot of tables, or if the tables are very large. In SQL Server 2000 and 2005, this behavior has changed: assuming that all subscribers are SQL Server, SQL Server 2000 and 2005 use what is called concurrent snapshot processing, which does not put a shared lock on the affected tables, helping to boost concurrency.

One way to help reduce locking issues is to identify transactions that take a long time to run. The longer they take to run, the longer their locks block other processes, causing contention and reducing performance. The following script can be run to identify current long-running transactions. It will give you a clue as to which transactions are taking a long time, allowing you to investigate and resolve the cause.
SELECT spid, cmd, status, loginame, open_tran,
       DATEDIFF(s, last_batch, GETDATE()) AS [WaitTime(s)]
FROM master..sysprocesses p
WHERE open_tran > 0
  AND spid > 50
  AND DATEDIFF(s, last_batch, GETDATE()) > 30
  AND EXISTS (SELECT * FROM master..syslockinfo l
              WHERE l.req_spid = p.spid AND l.rsc_type <> 2)

This query provides results based on the instant it runs, and the results will vary each time you run it. The goal is to review the results and look for transactions that have been open a long time; this generally indicates a problem that should be investigated. *****

In order to reduce blocking locks in an application, you must first identify them. Once they are identified, you can evaluate what is going on and perhaps take the necessary action to prevent them. The following script can be run to identify processes holding blocking locks for longer than a time you specify. You can set the value used to identify blocking locks to any value you want; in the example below, it is 10 seconds.


SELECT spid, waittime, lastwaittype, waitresource
FROM master..sysprocesses
WHERE waittime > 10000   -- the wait time is measured in milliseconds
  AND spid > 50          -- use > 50 for SQL Server 2000, use > 12 for SQL Server 7.0

This query measures blocking locks in real time, which means you will only get results if there is a blocking lock fitting your time criteria at the moment you run it. If you like, you can add some additional code that loops through the above query periodically in order to more easily identify locking issues. Or, you can just run the above code during times when you think that locking is a problem.

How can I help identify the worst performing queries in my SQL Server database application?

This is an easy task using the SQL Server Profiler. Create a new trace in Profiler using the following configuration:

Select These Event Classes
    SQL:BatchCompleted

Select These Data Columns
    Groups:  Duration
    Columns: EventClass, TextData, CPU, Application Name, LoginName, NTUserName, SPID

Select These Filters (as needed)
    Application Name
    DatabaseName
    Duration

As the above trace runs, you will notice that the duration of each query execution is sorted from shortest to longest running duration, with the longest running queries at the bottom. You will very quickly be able to identify those queries that are taking a long time to run. Note: duration is measured in milliseconds, so a query that runs in 1,000 milliseconds takes one second to run. Once you have identified the worst performing queries in your application, you can start to analyze each one, trying to find ways to boost its performance. Don't waste your time on queries that are seldom run. Instead, focus your time on those long running queries that run most often in your application.

What is the difference between a table and index scan in an execution plan?
When the Query Optimizer is asked to optimize a query and create an execution plan for it, it tries its best to use an Index Seek. An Index Seek means that the Query Optimizer was able to find a useful index in order to locate the appropriate records. As you probably know, indexes make data retrieval in SQL Server very fast. But when the Query Optimizer is not able to perform an Index Seek, either because there are no indexes or no useful indexes available, then SQL Server has to scan all the records, looking for all the records that meet the requirements of the query. There are two types of scans that SQL Server can perform. When a Table Scan is performed, all the records in a table are examined, one by one. For large tables, this can take a long time. But for very small tables, a Table Scan can actually be faster than an Index Seek. So if you see that SQL Server has performed a Table Scan, take note of how many rows are in the table. If there aren't many, then in this case a Table Scan is a good thing. When an Index Scan is performed, all the rows in the leaf level of the index are scanned instead of the table directly. Sometimes the Query Optimizer determines that an Index Scan is more efficient than a Table Scan, so one is performed, although the performance difference between them is generally not large.


You might ask: if there is an index available, why can't an Index Seek be performed? In some cases, such as when a huge quantity of rows needs to be returned, it is faster to do an Index Scan than an Index Seek. Or it may be that the index is not selective enough. In any case, the Query Optimizer doesn't consider the available index useful, other than for performing an Index Scan. So what does all this mean from an analysis standpoint? Generally speaking, a Table Scan and an Index Scan are almost the same thing from a performance perspective. If you see either of them in a query execution plan, the first thing to do is to check whether there are only a few rows in the table. If so, then a scan is OK. Or, if many rows are being returned, then a scan is often faster than an Index Seek, and the Query Optimizer made the correct choice in selecting a scan. The only way to speed up this particular situation would be to find a way to rewrite the query so that it returns fewer rows, assuming this is possible. If the above two reasons don't apply, then your next step would be to try to identify usable indexes to speed up the query, assuming that its current performance is unacceptable, so that an Index Seek is performed instead of an Index or Table Scan.
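One quick way to see whether a statement uses a Table Scan, an Index Scan, or an Index Seek is to ask SQL Server for the estimated plan in text form. A minimal sketch, using a hypothetical dbo.Orders table:

SET SHOWPLAN_TEXT ON;
GO
-- Replace with the statement you are analyzing
SELECT OrderID, Amount FROM dbo.Orders WHERE CustomerID = 42;
GO
SET SHOWPLAN_TEXT OFF;
GO

Look for "Table Scan", "Index Scan", or "Index Seek" in the returned plan text.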

Why doesn't query parallelism use all of the available CPUs?


First, let me answer your question, and then I will discuss a little about what you are experiencing. No, you can't force SQL Server to use more CPUs than it wants to use when executing a parallel query. On the other hand, you can prevent SQL Server from using all of the available CPUs; but the most you can do is tell SQL Server that parallelism is allowed, and that is it. Only SQL Server can decide how to take advantage of the available CPUs. Now, let's talk a little about how parallelism works and why SQL Server might not take full advantage of all the available CPUs in your server. By default, if your server has two or more CPUs, parallelism has not been turned off, the number of CPUs available to SQL Server has not been restricted to a single CPU, and the Query Optimizer estimates that the query will cost more than 5 seconds to execute (the default cost threshold, which you can change if you want), then the Query Optimizer will consider using parallelism to execute the query. By parallelism, we mean that SQL Server will consider splitting the query into two or more execution threads and running them on multiple CPUs in your server. In many cases, this
can speed up the execution of the query, but not in all cases. The use of parallelism has its own overhead, and the time savings from using parallelism must exceed that overhead before the Query Optimizer will use it. This makes sense. When the Query Optimizer evaluates a query for parallelism, it must consider many factors, such as the current load on the server, the nature of the query, I/O requirements, and so on. This is a very complex decision-making process, as you might expect. Once it is done with this process, it decides on the optimum query plan, which may use two or more CPUs. We can't control how many will be used; only the Query Optimizer can decide this. As for your query and the CPU usage you saw with Performance Monitor, there are several potential explanations for the behavior, including:

The Query Optimizer decided that using more CPUs to execute the query would not provide any performance benefit. For example, the bottleneck to performance for this query may be disk I/O, not CPU, and using more threads may not have helped performance.

Just because there was only one very busy CPU does not necessarily mean that other CPUs were not also executing part of the query. For example, it is possible that most of the CPUs were involved in executing the query, but that only one of the execution threads was overly busy, while the other execution threads were running at a much lower level based on the amount of work allocated to them.

Another thing that can affect your observations with Performance Monitor is that an executing SQL Server thread can jump from CPU to CPU as needed in order to optimize performance. Because of this, watching CPU activity in order to evaluate the performance of parallelism is not always very helpful.

Perhaps the Query Optimizer made a mistake. This doesn't happen often, but it does happen.

So, unfortunately, there is no way to force SQL Server to use all of the CPU power that is available.
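What you can control is the ceiling. A hedged sketch of the relevant knobs (the option names are standard, but the values shown are only examples):

-- Server-wide: cap parallel plans at 4 CPUs and raise the cost threshold
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 4;
EXEC sp_configure 'cost threshold for parallelism', 10;
RECONFIGURE;

-- Per-query: force a serial plan with a hint (hypothetical query)
SELECT CustomerID, SUM(Amount)
FROM dbo.Orders
GROUP BY CustomerID
OPTION (MAXDOP 1);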

Why can it take so long to drop a clustered index?


Generally speaking, indexes can speed up queries tremendously. This comes at a cost, however, as changes to the underlying data have to be reflected in the indexes whenever the indexed column(s) are modified.


Before we get into the reasons why dropping a clustered index can be time-consuming, we need to take a short look at the different index structures in SQL Server. Every table can have one, and only one, clustered index. A clustered index sorts the data physically according to its index key, and since there can be only one physical sort order on a table at a time, the one-clustered-index limit is fairly obvious. If a table does not have a clustered index, it is called a heap. The second index structure is the non-clustered index. You can create non-clustered indexes on tables with clustered indexes, on heaps, and on indexed views. The difference between the two index structures lies at the leaf level of the index: while the leaf level of a clustered index is the table's data itself, at the leaf level of a non-clustered index you only find pointers to the data. Now we need to understand an important difference:

When a table has a clustered index created, the pointers contain the clustered index keys for that row. When a table does not have a clustered index, the pointers consist of the so-called RowID, which is a combination of FileNumber:PageNumber:Slot.

When you understand this distinction, you can derive the answer to the original question yourself. When you drop a clustered index, SQL Server has to recreate all non-clustered indexes on that table (assuming there are any), because during this recreation the clustered index keys in their leaf levels are replaced by RowIDs. This is a time-consuming operation, especially on larger tables or tables with many indexes.
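If your goal is to rebuild or change the clustered index rather than remove it outright, the DROP_EXISTING clause avoids the drop-then-recreate cycle, so the non-clustered indexes are not rebuilt twice. A minimal sketch, assuming a table dbo.Orders with a clustered index named CIX_Orders_CustomerID:

-- Rebuild (or redefine) the clustered index in one step instead of
-- DROP INDEX followed by CREATE INDEX.
CREATE CLUSTERED INDEX CIX_Orders_CustomerID
ON dbo.Orders (CustomerID)
WITH DROP_EXISTING;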

Why is my Memory: pages/sec counter exceeding 20 for my SQL Server?


On a dedicated SQL Server box that is properly tuned and has sufficient hardware to carry its load, pages/sec should average less than 20, although you will often see spikes higher than this. These occasional spikes present no performance problems. For those who may not be familiar with the pages/sec counter, what it measures is the number of physical pages read from (or written to) disk in order to resolve hard page faults. A hard page fault occurs when data an application needs is not in RAM but on the hard disk instead, and must be moved from disk to RAM so that it can be used. In addition, if there is no more room in RAM for an application's data, it must often be written to disk. As you might imagine, both of these processes are time consuming (relatively speaking), and because of this,
performance is always better if there are fewer hard page faults. For optimum performance, all the data an application needs should always be in RAM. This, of course, is not always possible, which is why hard page faults occur; a certain number of them are expected and normal. So what can cause high page fault rates? Here is a list of some of the most common causes, although it is not comprehensive:

Normal Causes of High Paging Rates

When a computer is first booted.
When an application is first started, or exited.
When data is loaded into an application, or saved from an application to disk.
When a file is being written to a disk, or copied off of a disk.
When backups are being made or restored.

Fixable Causes of High Paging Rates


Defective I/O hardware.
Defective or buggy I/O drivers.
When the operating system doesn't have enough RAM for all the needs of the currently running applications on the system.

So, if your server is running at a high rate of pages/sec, most likely one of the above circumstances is causing it. As you can see, some of these events can't be avoided, but others potentially can. The first step in identifying the potential cause of high paging rates on your server is to use Task Manager. Start Task Manager and go to the "Processes" tab. There, you will see a large number of columns for each of the processes running on your server. Check to see if there is one called "Page Faults." If not, go to the "View" drop-down menu, select "Select Columns," and check the box next to "Page Faults." The page fault figures you see in Task Manager are the number of page faults that have occurred for each of the various processes that are currently running. This is a cumulative figure, so it is the total number of page faults each process has incurred since it was last started. If your server has been running a long time, some of these processes will have hundreds of thousands, if not millions, of page faults. These figures may or may not be an indication of which
process is causing an excessive paging problem, but they will provide a clue. Once you have "Page Faults" displayed on the Processes tab, click on the "Page Faults" column header; this will order all of the processes from the most page faults to the least. Now your goal is to look at each process, and the number of page faults each one has, to help you determine which process is causing the most page faults. Hopefully, one or more of these processes will have a disproportionate number of page faults, which may indicate that they are the cause of your high page fault problem. You may also find that a process with a lot of page faults is normal, even though the number is high. I can't tell you what to expect when you look at this data, as each server is different, but your goal is to try to identify any potential problems.

You may be surprised to see that the sqlservr.exe process (the SQL Server engine) has a high number of page faults. This may be normal; if your server has been up a long time, it will accumulate a large number of page faults over time. Assuming that SQL Server's memory is correctly configured, it will rarely exceed 20 pages/sec during normal operations. But if you prevent SQL Server from getting enough memory (by manually assigning memory instead of letting SQL Server dynamically allocate it, for example), it is possible to force SQL Server to page excessively, greatly hurting its performance. So you will want to check that SQL Server's memory setting is appropriate (a sketch follows below).

If SQL Server is not the problem (and it probably won't be), then the next thing to look at closely is other programs or services running on the same server. Ideally, SQL Server should be on a dedicated server. But if you add other programs to the same server as SQL Server, and you don't have enough RAM for both SQL Server and the other programs, then the other programs may be causing the paging problem. If this is the case, either get more RAM or remove the offending software. If the above doesn't resolve your paging issue, then take a look at your I/O hardware and drivers to see if they might be causing the problem. Although this is not common, I have seen it before.
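To review or adjust SQL Server's memory configuration, a minimal sketch (the values shown are only examples, not recommendations):

-- Show the current memory limits
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)';
EXEC sp_configure 'min server memory (MB)';

-- Cap SQL Server's memory if it must share the box with other software
EXEC sp_configure 'max server memory (MB)', 2048;
RECONFIGURE;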

What happens during the rebuild of a clustered index?


Before SQL Server 2000 Service Pack 2 (SP2), the rebuild of a clustered index automatically forced all nonclustered indexes on that table to be rebuilt as well. This behavior changed with SP2. Now, whether the nonclustered indexes have to be rebuilt depends on how the clustered index was initially created. What does this mean?


You probably know that you can explicitly specify a clustered index as UNIQUE or not. If you don't specify it to be UNIQUE, SQL Server automatically adds a 4-byte uniqueifier to enforce uniqueness whenever a duplicate key value is encountered. And this small difference decides whether the nonclustered indexes have to be rebuilt or not. During the rebuild of a non-unique clustered index, this uniqueifier is generated anew, and because the nonclustered indexes contain the clustered index keys at their leaf level, it follows that all nonclustered indexes have to be rebuilt as well. A UNIQUE clustered index does not contain the artificial uniqueifier, only the clustered index keys. These values do not change during the rebuild, so no nonclustered index has to be rebuilt.
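So, where the data allows it, declaring the clustered index UNIQUE avoids both the uniqueifier and the cascading nonclustered index rebuilds. A minimal sketch on a hypothetical dbo.Orders table:

-- OrderID values are unique, so no uniqueifier is added; rebuilding this
-- clustered index later will not force the nonclustered indexes to be rebuilt.
CREATE UNIQUE CLUSTERED INDEX CIX_Orders_OrderID
ON dbo.Orders (OrderID);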

What happens when SQL Server executes a stored procedure or query?


SQL Server performs a couple of internal steps before executing a query or stored procedure. The steps that interest us here are compilation and execution. When SQL Server receives a query for execution, its execution plan may already be present in memory (the procedure cache); if not, SQL Server will have to compile the query before executing it. The compilation process is divided into four parts: parsing, normalization, compilation, and optimization.

Parsing
During this stage, SQL Server checks the query for syntax errors and transforms it into a compiler-ready structure that it will use later to optimize the query. It does not check for object names or column names.

Normalization
At this stage, SQL Server checks all references to objects in the query. This is where we typically get the "Object not found" message when an object referenced in the query is not found in the database. SQL Server also checks that the query makes sense; for example, we cannot execute a table or select from a stored procedure. Bear in mind that while we can optimize SELECT, INSERT, and UPDATE statements, there is no way to optimize IF, WHILE, and FOR operators.

Compilation


This is where we start building the execution plan for the query we passed to SQL Server. First, a sequence tree is created. The sequence tree is normalized, which includes adding implicit conversions if necessary. Also during this phase, if the query references views, the view definitions are expanded into the query. If the statement is a DML statement, a special object called the query graph is created. The query graph is the object on which the optimizer works to generate an optimized plan for the query. This is the compiled plan that is stored in the procedure cache for reuse.

Optimization
The SQL Server optimizer is a cost-based optimizer, which means that it will come up with the cheapest execution plan available for each SQL statement. Each SQL statement needs resources such as CPU, memory, and disk I/O to run; the cheapest plan is the one that will use the least amount of resources to get the desired output. For optimizing DML statements, SQL Server will test different indexes and join orders to get the best plan for executing the query. Your index definitions help the optimizer by reducing resource usage; an index with high selectivity is most suitable for optimization. Because a complex query must take into account all indexes and joins, there can be many possible paths for executing the query, and in such cases determining the best path can take a long time. The longer this process takes, the higher the cost involved. So first a trivial plan is generated; this reflects the fact that cost-based optimization is itself costly, and if there is only one possible execution path, there is no point optimizing the query. For example, for a simple INSERT into a table there is no way that indexes or join orders can improve the plan, so the trivial plan is used. For any particular query, SQL Server uses statistics to understand the distribution of data; statistics are stored in the statblob column of the sysindexes table in each database. Join orders can also be optimized based on the number of rows fetched by each join condition.
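If you want to see which compiled plans are currently sitting in the procedure cache and how often they have been reused, a hedged sketch for SQL Server 2000/2005 follows (syscacheobjects is a system table, but column availability can vary by version):

-- Most frequently reused cached plans first
SELECT cacheobjtype, objtype, usecounts, sql
FROM master..syscacheobjects
ORDER BY usecounts DESC;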

Index Covering Boosts SQL Server Query Performance


Creating a non-clustered index that contains all the columns used in a SQL query, a technique called index covering, is a quick and easy solution to many query performance problems. Sometimes just adding a column or two to an index can really boost the query's performance. Check out a couple of all-too-common scenarios where applying index covering can speed up the queries in your Microsoft SQL Server database.
Scenario 1: Speed Up a Query for a Table with a Clustered Index
Consider a very typical scenario: you have a table ORDERS with a clustered index on the CUSTOMER_ID column, and you need to speed up the following select query quickly:


SELECT ORDER_DATE, SUM(AMOUNT) FROM ORDERS GROUP BY ORDER_DATE


The query's execution plan is quite simple: the database engine scans the whole clustered index, and then it sorts the intermediate result set to satisfy the GROUP BY clause. Can a nonclustered index speed up the query? Definitely. Just create a non-clustered index that contains all the columns used in the query:

CREATE INDEX order_amt ON dbo.ORDERS(ORDER_DATE, AMOUNT)

Rerun the query and you'll find it runs very fast. Why? First, look at the execution plan: the query accesses only the index order_amt; it doesn't touch the table at all. In fact, there is no need to access the table because all the columns necessary to satisfy the query are already stored in the index. This is index covering. The index is much smaller than the table, which is why the query read many fewer pages. Also, the index entries are already stored in the necessary order, so there is no need to sort the intermediate result set. Not only does the plan look good, but the real execution cost of the query after creating the index also is much cheaper:

logical reads 1919, (snip)

CPU time = 875 ms

This is significantly less than the real execution costs before the index was created:

logical reads 8365, (snip)

CPU time = 2015 ms

As you can see, the query ran much faster because it read fewer pages and did not sort intermediate results. Now that the query runs fast enough, it is time to consider the price you paid for speeding it up. The index uses up some storage. Also, modifying the table might become slower. If having this query return as fast as possible is a high priority for you, you can stop at this point. Depending on your circumstances, however, you might want to consider the impact of this new index on the overall performance of the system. You can use the Index Tuning Wizard in SQL Server to validate your quick fix against a realistic workload, but that is beyond the scope of this article.

Scenario 2: Speed Up Select Queries Using Predicates on a Column
Consider another typical scenario where index covering is very useful: you need to speed up several select queries, like the following two, using predicates on ITEM_ID:

SELECT TURNAROUND, COUNT(*) FROM DBO.ORDERS WHERE ITEM_ID=10000 GROUP BY TURNAROUND

SELECT SUM(AMOUNT) FROM DBO.ORDERS WHERE ITEM_ID=10093
The table ORDERS is the same as in the previous scenario. It has a non-clustered index on the ITEM_ID column. Both queries use the non-clustered index on ITEM_ID and access the qualifying rows in the table via bookmark lookups. It would be great to create a clustered index on ITEM_ID (having a clustered index on a foreign key is very common), but that is not an option: a clustered index already exists on CUSTOMER_ID. Instead, you can go for a covering non-clustered index on (ITEM_ID, AMOUNT, TURNAROUND). Both queries will run much faster, because the need to do bookmark lookups is eliminated; all the columns necessary for both queries are stored in the index entries. Also consider dropping the index on (ITEM_ID), since you probably will no longer need it.
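A minimal sketch of that change; the index names are assumptions, so substitute the real name of your existing single-column index before dropping it:

-- Covering index for both queries above
CREATE INDEX IX_ORDERS_ITEM_COVER
ON dbo.ORDERS (ITEM_ID, AMOUNT, TURNAROUND);

-- The old single-column index on (ITEM_ID) is now probably redundant
DROP INDEX ORDERS.IX_ORDERS_ITEM_ID;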

The Fine Print
When you add an index on a table, you usually slow down modifications against that table. This is why keeping the number of indexes as small as possible usually (not always) is very important. So under most circumstances you should have just one index on (ITEM_ID, AMOUNT, TURNAROUND), not separate indexes on (ITEM_ID, AMOUNT) and (ITEM_ID, TURNAROUND). This one index will use less space, and it will not slow down modifications as much as several indexes would. If you have a clustered index, then every index entry in a non-clustered index also stores all the columns necessary to locate the row via the clustered index (they are called bookmarks). These bookmark columns also count towards index covering. For instance, if the table ORDERS is clustered on CUSTOMER_ID, then a non-clustered index on ORDER_DT covers the query:

SELECT DISTINCT CUSTOMER_ID FROM ORDERS WHERE ORDER_DT='20050917'


Note that some well-known rules of thumb about non-clustered indexes are not relevant to covering non-clustered ones. For instance, even though ORDER_DT is less selective than ITEM_ID, you can successfully use an index on ORDERS(ORDER_DT, ITEM_ID) to cover a query:

SELECT DISTINCT ITEM_ID FROM ORDERS WHERE ORDER_DT='20050202'


More to the point, putting the more selective column ITEM_ID first in the index will result in worse performance. Another example is the following query:

SELECT DISTINCT LAST_NAME, FIRST_NAME FROM CUSTOMERS WHERE FIRST_NAME LIKE '%eather%'
This query is covered by an index on (LAST_NAME, FIRST_NAME) even though only the second column in the index is used in the WHERE clause, and even though the predicate FIRST_NAME LIKE '%eather%' is not index-friendly (that is, not sargable).

For Oracle-to-SQL Server Migrants
If you have recently switched to MS SQL Server from Oracle, be aware that MS SQL Server's indexes store entries even for rows in which all the columns in the index definition are null. For example, even if both the LAST_NAME and FIRST_NAME columns are nullable, the index on (LAST_NAME, FIRST_NAME) will still have entries for all the rows in the table, even for those rows with both LAST_NAME and FIRST_NAME null. So, unlike Oracle, MS SQL Server will consider the index on (LAST_NAME, FIRST_NAME) as covering the query:

SELECT DISTINCT LAST_NAME, FIRST_NAME FROM CUSTOMERS
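For completeness, a sketch of the index assumed in these last two examples (the index name is hypothetical):

CREATE INDEX IX_CUSTOMERS_NAME
ON dbo.CUSTOMERS (LAST_NAME, FIRST_NAME);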

Tips on Optimizing Covering Indexes


If you have to use a non-clustered index (because your single clustered index can be used better elsewhere in the table), and if you know that your application will be performing the same query over and over on the same table, consider creating a covering index on the table. A covering index, which is a form of composite index, includes all of the columns referenced in the SELECT, JOIN, and WHERE clauses of a query. Because of this, the index contains the data you are looking for and SQL Server doesn't have to look up the actual data in the table, reducing logical and/or physical I/O and boosting performance. On the other hand, if the covering index gets too big (has too many columns), this could actually increase I/O and degrade performance. Generally, when creating covering indexes, follow these guidelines:

If the query or queries you run using the covering index are seldom run, then the overhead of the covering index may outweigh the benefits it provides.

The covering index should not add significantly to the size of the key. If it does, then its use may not outweigh the benefits it provides.

The covering index must include all columns found in the SELECT list, the JOIN clause, and the WHERE clause.

One clue to whether or not a query can be helped by a covering index is whether the execution plan of the query uses a Bookmark Lookup. If it does, then adding a covering index is often beneficial. *****

If a query makes use of aggregates, and it is run often, then you may want to consider adding a covering index for this query. Non-clustered indexes include a row with an index key value for every row in a table. Because of this, SQL Server can use the entries in the index's leaf level to perform aggregate calculations. This means that SQL Server does not have to go to the actual table to perform the aggregate calculations, which can boost performance. *****

If you want to create a covering index, try if possible to piggyback on indexes that already exist for the table. For example, say you need a covering index for columns c1 and c3. If you already have an index on column c1, instead of creating a new covering index, change the index on c1 to be a composite index on c1 and c3 (see the sketch below). Any time you can avoid indexing the same column more than once, SQL Server experiences less I/O overhead and performance improves. *****
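A minimal sketch of widening an existing index instead of adding a second one; the table and index names here are placeholders:

-- Assume dbo.t already has an index IX_t_c1 on (c1).
-- Rebuild it in place as a composite index on (c1, c3).
CREATE INDEX IX_t_c1
ON dbo.t (c1, c3)
WITH DROP_EXISTING;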


How can you tell if a covering index you created is actually being used by the Query Optimizer? You can find out by turning on and viewing the graphical execution plan output. If you see the phrase "Scanning a non-clustered index entirely or only a range," it means the Query Optimizer was able to cover that particular query with an index. *****

An alternative to creating covering non-clustered indexes yourself is to let SQL Server create covering indexes for you automatically. Here's how this works. The Query Optimizer can perform what is called index intersection. This allows the optimizer to consider multiple indexes from a table, build a hash table based on those indexes, and then use the hash table to reduce I/O for the query. In effect, the hash table becomes a covering index for the query. Although index intersection is performed automatically by the Query Optimizer, you can help it along by creating single-column, non-clustered indexes on all the columns in a table that will be queried frequently. This provides the Query Optimizer with the data it needs to create covering indexes as needed, on the fly. *****

One way to help determine if a covering index could help a query's performance is to create a graphical execution plan of the query in Query Analyzer or Management Studio and see if any Bookmark Lookups are being performed. Essentially, a Bookmark Lookup tells you that the Query Processor had to look up the row columns it needs from the table or clustered index, instead of being able to read them directly from a non-clustered index. Bookmark Lookups can reduce query performance because they produce extra disk I/O to retrieve the column data. One way to avoid a Bookmark Lookup is to create a covering index. This way, all the columns from the query are available directly from the non-clustered index, which means that Bookmark Lookups are unnecessary, which reduces disk I/O and helps boost performance.

What's the difference between a covered query and a covering index?


Covered queries and covering indexes are different, yet closely related. A query is covered if all the columns it uses come from one or more indexes. These columns include the columns you want the query to return as well as columns in any JOIN, WHERE, HAVING, and ORDER BY clause.



If you take multicolumn index awareness a step further, you can greatly boost performance in some situations: when all the data you need is in the index, the optimizer may retrieve it from the index without going to the data pages. This index is sometimes called a "covering" index and can be a big plus. Why? Because index entries are shorter than data rows and there are more of them on a page, the SQL engine can get to the covering index faster than to the data. To use a covering index, you must meet the following conditions:

Every column in the SELECT list and every column in every other clause (WHERE, ORDER BY, GROUP BY, HAVING) must be in the index.

The index must not be disabled by functions or math or conversions, or any of the constructions covered earlier in this chapter.

The WHERE clause must use columns in the order in which they appear in the index (it can go directly to a specific ordnum or combination of ordnum and prodnum, but not to prodnum alone, as discussed in "Using Multicolumn Indexes" at the end of the previous section).

For example, to resolve a query including only the orderdetail ordnum and prodnum columns, everything you're asking for (prodnum and ordnum) is in the ordprodix index in ordnum order. If you specify ordnum or ordnum and prodnum in the WHERE clause, the SQL engine can find the index rows that contain that value and return results from there, never touching the data pages.

Adaptive Server Anywhere

select prodnum, ordnum
from orderdetail
where ordnum = 84

prodnum      ordnum
===========  ===========
       1099           84
       1255           84
       2050           84
[3 rows]

A less efficient variation results when all the columns in the query are in the index but you specify only a nonleading column (prodnum) in the WHERE clause. The SQL engine may resolve the query by reading the entire index sequentially. In some performance-monitoring tools, this is identified as an index scan, parallel to a table scan but faster.

Adaptive Server Anywhere

select prodnum, ordnum
from orderdetail
where prodnum = 1099

prodnum      ordnum
===========  ===========
       1099           84
       1099           89
[2 rows]

But back to the leading column situation: if you add a column that is not part of the index (here unit), the optimizer will still be able to use the index, but it will have to get the nonindex values from the data pages (Figure 6-16). The same thing happens when you add nonindex conditions to the WHERE clause. The index no longer "covers" the query.


Covering Indexes
You can cover queries by understanding your indexes and avoiding adding extra columns to your SELECT list. Don't do a SELECT * if all you want is the product number! Remember, the index covers the query only if everything in the SELECT and WHERE clauses is in the index, if you haven't disabled the index, and if you pay attention to order in multicolumn indexes.

Functional Dependency (Normalization)


The concept of functional dependency (also known as normalization) was introduced by Professor Codd in 1970 when he defined the first three normal forms (first, second, and third normal forms). Normalization is used to avoid or eliminate the three types of anomalies (insertion, deletion, and update anomalies) that a database may suffer from. These concepts will be clarified soon, but first let us define the first three normal forms.

First Normal Form: A relation is in first normal form if all its attributes are simple. In other words, none of the attributes of the relation is itself a relation. Notice that a relation means a two-dimensional table.

Example 1. Assume the following relation:

Student-courses (Sid:pk, Sname, Phone, Courses-taken)

where attribute Sid is the primary key, Sname is the student name, Phone is the student's phone number, and Courses-taken is a table that contains the course id, course description, credit hours, and grade for each course taken by the student. A more precise definition of the Courses-taken table is:

Courses-taken (Course-id:pk, Course-description, Credit-hours, Grade)

According to the definition of first normal form, the relation Student-courses is not in first normal form because one of its attributes, Courses-taken, is itself a table and is not a simple attribute. To clarify this, assume the above tables contain the data shown below:

Student-courses
Sid   Sname    Phone      Courses-taken
100   John     487 2454   St-100-Courses-taken
200   Smith    671 8120   St-200-Courses-taken
300   Russell  871 2356   St-300-Courses-taken

St-100-Courses-taken
Course-id  Course-description     Credit-hours  Grade
IS380      Database Concepts      3             A
IS416      Unix Operating System  3             B

St-200-Courses-taken
Course-id  Course-description     Credit-hours  Grade
IS380      Database Concepts      3             B
IS416      Unix Operating System  3             B
IS420      Data Net Work          3             C

St-300-Courses-taken
Course-id  Course-description     Credit-hours  Grade
IS417      System Analysis        3             A

Definition of the three types of anomalies:

Insertion anomaly means that some data cannot be inserted into the database. For example, we cannot add a new course to the database of Example 1 unless we insert a student who has taken that course.

Update anomaly means we have data redundancy in the database, and to make any modification we have to change all copies of the redundant data, or else the database will contain incorrect data. For example, in our database the course description "Database Concepts" for IS380 appears in both the St-100-Courses-taken and St-200-Courses-taken tables. To change its description to "New Database Concepts" we have to change it in all places. Indeed, one of the purposes of normalization is to eliminate data redundancy in the database.

Deletion anomaly means deleting some data causes other information to be lost. For example, if student Russell is deleted from the St-300-Courses-taken table, we also lose the information that we had a course called IS417 with the description System Analysis.

Thus the Student-courses table suffers from all three anomalies. To convert the above structure to first normal form relations, all non-simple attributes must be removed or converted to simple attributes. To do that, a new relation is created by combining each row of Student-courses with all rows of its corresponding course table, that is, the courses taken by that specific student. Following is the Student-courses table in first normal form:

Student-courses (Sid:pk1, Sname, Phone, Course-id:pk2, Course-description, Credit-hours, Grade)

Notice that the primary key of this table is a composite key made up of two parts: Sid and Course-id. Note that pk1 following an attribute indicates that the attribute is the first part of the primary key, and pk2 indicates that the attribute is the second part of the primary key.

Student-courses
Sid  Sname    Phone     Course-id  Course-description     Credit-hours  Grade
100  John     487 2454  IS380      Database Concepts      3             A
100  John     487 2454  IS416      Unix Operating System  3             B
200  Smith    671 8120  IS380      Database Concepts      3             B
200  Smith    671 8120  IS416      Unix Operating System  3             B
200  Smith    671 8120  IS420      Data Net Work          3             C
300  Russell  871 2356  IS417      System Analysis        3             A

Examination of the above Student-courses relation reveals that Sid does not uniquely identify a row (tuple) in the relation and hence cannot be the primary key. For the same reason, Course-id cannot be the primary key. However, the combination of Sid and Course-id uniquely identifies a row in Student-courses; therefore (Sid, Course-id) is the primary key of the above relation. The primary key determines every attribute. For example, if you know both Sid and Course-id for any student, you will be able to retrieve Sname, Phone, Course-description, Credit-hours, and Grade, because these attributes are dependent on the primary key. Figure 1 below is the graphical representation of the functional dependency between the primary key and the attributes of the above relation.

Note that the attribute to the right of the arrow is functionally dependent on the attribute to the left of the arrow. Thus the combination (Sid, Course-id) is the determinant (that is, it determines the other attributes), and the attributes Sname, Phone, Course-description, Credit-hours, and Grade are dependent attributes. Formally speaking, a determinant is an attribute, or a group of attributes, that determines the value of other attributes. In addition to (Sid, Course-id) there are two other determinants in the above Student-courses relation: the Sid and Course-id attributes. Note that Sid alone determines both Sname and Phone, and the attribute Course-id alone determines both the Credit-hours and Course-description attributes.

The attribute Grade is fully functionally dependent on the primary key (Sid, Course-id) because both parts of the primary key are needed to determine Grade. On the other hand, the Sname and Phone attributes are not fully functionally dependent on the primary key, because only a part of the primary key, namely Sid, is needed to determine both Sname and Phone. Also, the attributes Credit-hours and Course-description are not fully functionally dependent on the primary key because only Course-id is needed to determine their values. The new relation Student-courses still suffers from all three anomalies, for the following reasons:

The relation contains redundant data (note that Database Concepts, the course description for IS380, appears in more than one place). The relation contains information about two entities, Student and Course.

Following is a detailed description of the anomalies that the relation Student-courses suffers from.

Insertion anomaly: We cannot add a new course such as IS247 with course description Programming Techniques to the database unless we add a student who takes that course.

Update anomaly: If we change the course description for IS380 from Database Concepts to New Database Concepts, we have to make the change in more than one place or else the database will be inconsistent. In other words, in some places the course description will be New Database Concepts and in any place where we forgot to make the change the description will still be Database Concepts.

Deletion anomaly: If student Russell is deleted from the database, we also lose the information that we had on course IS417 with the description System Analysis.

The above discussion indicates that having the single table Student-courses for our database causes problems (anomalies). Therefore we break the table into smaller tables to get higher normal form relations. Before doing that, let us define the second normal form.

Second Normal Form: A first normal form relation is in second normal form if all its non-primary attributes are fully functionally dependent on the primary key. Note that primary attributes are those attributes that are part of the primary key, while non-primary attributes do not participate in the primary key. In the Student-courses relation, both Sid and Course-id are primary attributes because they are components of the primary key. However, the attributes Sname, Phone, Course-description, Credit-hours, and Grade are all non-primary attributes because none of them is a component of the primary key. To convert Student-courses to second normal form relations, we have to make all non-primary attributes fully functionally dependent on the primary key. To do that, we need to project (that is, break down) the Student-courses table into two or more tables. However, projections may cause problems. To avoid such problems it is important, when a relation is projected into smaller relations, to keep attributes that are dependent on each other in the same table. Following this principle, examination of Figure 1 indicates that we should divide the Student-courses relation into the following three relations: PROJECT Student-courses ON (Sid, Sname, Phone) creates a table; call it Student. The relation Student will be Student (Sid:pk, Sname, Phone). PROJECT Student-courses ON (Sid, Course-id, Grade) creates a table; call it Student-grade.


The relation Student-grade will be Student-grade (Sid:pk1:fk:Student, Course-id:pk2:fk:Courses, Grade). Finally, PROJECT Student-courses ON (Course-id, Course-description, Credit-hours) creates a table; call it Courses. Following are these three relations and their contents:

Student (Sid:pk, Sname, Phone)
Sid  Sname    Phone
100  John     487 2454
200  Smith    671 8120
300  Russell  871 2356

Courses (Course-id:pk, Course-description, Credit-hours)
Course-id  Course-description     Credit-hours
IS380      Database Concepts      3
IS416      Unix Operating System  3
IS420      Data Net Work          3
IS417      System Analysis        3

Student-grade (Sid:pk1:fk:Student, Course-id:pk2:fk:Courses, Grade)
Sid  Course-id  Grade
100  IS380      A
100  IS416      B
200  IS380      B
200  IS416      B
200  IS420      C
300  IS417      A

All three of these relations are in second normal form. Examination of these relations shows that we have eliminated the redundancy in the database. Now the relation Student contains information related only to the entity Student, the relation Courses contains information related only to the entity Course, and the relation Student-grade contains information related to the relationship between these two entities. Further, these three relations are free from all the anomalies. Let us clarify this in more detail.

Insertion anomaly: A new course with course-id IS247 and its course description can now be inserted into the table Courses. Equally, we can add any new student to the database by adding their id, name, and phone to the Student table. Therefore our database, which is made up of these three tables, does not suffer from the insertion anomaly.

Update anomaly: Since the redundancy of the data was eliminated, no update anomaly can occur. To change the course description for IS380, only one change is needed, in the table Courses.

Deletion anomaly: The deletion of student Russell from the database is achieved by deleting Russell's records from both the Student and Student-grade relations, and this does not have any side effects because course IS417 remains untouched in the table Courses.
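A minimal T-SQL sketch of these three second normal form relations; the data types and lengths are assumptions, not part of the original design:

CREATE TABLE Student (
    Sid   INT         NOT NULL PRIMARY KEY,
    Sname VARCHAR(50) NOT NULL,
    Phone VARCHAR(20) NULL
);

CREATE TABLE Courses (
    Course_id          CHAR(5)     NOT NULL PRIMARY KEY,
    Course_description VARCHAR(60) NOT NULL,
    Credit_hours       TINYINT     NOT NULL
);

CREATE TABLE Student_grade (
    Sid       INT     NOT NULL REFERENCES Student (Sid),
    Course_id CHAR(5) NOT NULL REFERENCES Courses (Course_id),
    Grade     CHAR(1) NULL,
    PRIMARY KEY (Sid, Course_id)
);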


Third Normal Form: A second normal form relation is in third normal form if all non-primary attributes (that is, attributes that are not part of the primary key or of any candidate key) have no transitive dependency on the primary key. Assume the relation:

STUDENT (Sid:pk, Activity, Fee)

Further, Activity --> Fee; that is, Activity determines Fee.

Sid  Activity  Fee
100  Swimming  100
200  Tennis    100
300  Golf      300
400  Swimming  100

The table STUDENT is in first normal form because all its attributes are simple. STUDENT is also in second normal form because all its non-primary attributes are fully functionally dependent on the primary key (Sid). Notice that a first normal form relation with a non-composite (that is, simple) primary key is automatically in second normal form, because all its non-primary attributes are fully functionally dependent on the primary key. Table STUDENT nevertheless suffers from all three anomalies: a new student cannot be added to the database unless he/she takes an activity, and no activity can be inserted into the database unless we get a student to take that activity. There is redundancy in the table (see Swimming), so to change the fee for Swimming we must make changes in more than one place, which causes an update anomaly. And if student 300 is deleted from the table, we also lose the fact that we had a Golf activity with a fee of 300. To overcome these anomalies, the STUDENT table should be converted to smaller tables. Consider the following three projections of the STUDENT relation:

PROJECT STUDENT ON [Sid, Activity] gives a relation; name it STUD-ACT (Sid:pk, Activity) with the following data:

STUD-ACT
Sid  Activity
100  Swimming
200  Tennis
300  Golf
400  Swimming

PROJECT STUDENT ON [Activity, Fee] gives a relation; name it ACT-Fee (Activity:pk, Fee) with the following data:


ACT-Fee
Activity  Fee
Swimming  100
Tennis    100
Golf      300

PROJECT STUDENT ON [Sid, Fee] gives a relation; name it Sid-Fee (Sid:pk, Fee) with the following data:

Sid-Fee
Sid  Fee
100  100
200  100
300  300
400  100

The question is: which pair of these projections should we choose? The answer is to choose the pair STUD-ACT and ACT-Fee, because the join of these two projections produces the original STUDENT table. Such projections are called non-loss projections: the join of STUD-ACT and ACT-Fee on their common attribute Activity recreates the original STUDENT table. On the other hand, as shown below, the join of the projections Sid-Fee and ACT-Fee on their common attribute Fee generates erroneous data that was not in the original STUDENT table; such projections are called loss projections. Following is the join of the projections Sid-Fee and ACT-Fee on their common attribute Fee:

Sid  Activity  Fee
100  Swimming  100
100  Tennis    100   (not in the original STUDENT table)
200  Tennis    100
200  Swimming  100   (not in the original STUDENT table)
300  Golf      300
400  Swimming  100
400  Tennis    100   (not in the original STUDENT table)

The three marked rows were not in the original STUDENT table; thus we would have erroneous data in the database. Both projections STUD-ACT and ACT-Fee are in third normal form, and they do not suffer from any anomalies.
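A hedged T-SQL sketch of the loss-projection problem above, assuming the projections were materialized as tables Sid_Fee and Act_Fee; the join on the shared Fee column reproduces the spurious rows:

SELECT s.Sid, a.Activity, s.Fee
FROM Sid_Fee AS s
JOIN Act_Fee AS a ON a.Fee = s.Fee   -- joining on Fee, the only common attribute
ORDER BY s.Sid, a.Activity;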


Boyce-Codd Normal Form (BCNF): A relation is in BCNF if every determinant is a candidate key. This is an improved form of third normal form.

Fourth Normal Form: A BCNF relation is in fourth normal form if there is no multivalued dependency in the relation, or if there are multivalued dependencies but the attributes that are multivalued dependent on a specific attribute are dependent on each other. This is best discussed through mathematical notation. Assume the following relation:

R(a:pk1, b:pk2, c:pk3)

Recall that a relation is in BCNF if all its determinants are candidate keys; in other words, each determinant can be used as a primary key. Because relation R has only one determinant, (a, b, c), which is the composite primary key, and since the primary key is a candidate key, R is in BCNF. Now R may or may not be in fourth normal form.

1. If R contains no multivalued dependency, then R is in fourth normal form.

2. Assume R has the following two multivalued dependencies:

a -->> b and a -->> c

In this case R will be in fourth normal form if b and c are dependent on each other. However, if b and c are independent of each other, then R is not in fourth normal form and the relation has to be projected into two non-loss projections (on (a, b) and on (a, c)); these non-loss projections will be in fourth normal form.

Example:

Case 1: Assume the following relation: Employee (Eid:pk1, Language:pk2, Skill:pk3). There is no multivalued dependency, therefore the relation is in fourth normal form.

Case 2: Assume the following relation with multivalued dependencies: Employee (Eid:pk1, Languages:pk2, Skills:pk3)

Eid -->> Languages
Eid -->> Skills

Languages and Skills are dependent. This says an employee speaks several languages and has several skills; however, for each skill a specific language is used when that skill is practiced.

Eid  Language  Skill
100  English   Teaching
100  Kurdish   Politic
100  French    Cooking
200  English   Cooking
200  Arabic    Singing

Thus employee 100 speaks English when teaching but French when cooking. This relation is in fourth normal form and does not suffer from any anomalies.


Case 3: Assume the following relation with multivalued dependencies: Employee (Eid:pk1, Languages:pk2, Skills:pk3)

Eid -->> Languages
Eid -->> Skills

Languages and Skills are independent.

Eid  Language  Skill
100  English   Teaching
100  Kurdish   Politic
100  English   Politic
100  Kurdish   Teaching
200  Arabic    Singing

This relation is not in fourth normal form and suffers from all three types of anomalies.

Insertion anomaly: To insert the row (200, English, Cooking) we have to insert two extra rows, (200, Arabic, Cooking) and (200, English, Singing), otherwise the database will be inconsistent. Note that the table would then be as follows:

Eid  Language  Skill
100  English   Teaching
100  Kurdish   Politic
100  English   Politic
100  Kurdish   Teaching
200  Arabic    Singing
200  English   Cooking
200  Arabic    Cooking
200  English   Singing

Deletion anomaly: If employee 100 discontinues the Politic skill, we have to delete two rows, (100, Kurdish, Politic) and (100, English, Politic), otherwise the database will be inconsistent.

Update anomaly: If employee 200 changes his skill from Singing to Dancing, we have to make changes in more than one place.

The relation is therefore projected into the following two non-loss projections, which are in fourth normal form:

Employee_Language (Eid:pk1, Language:pk2)
Eid  Language
100  English
100  Kurdish
200  Arabic

Employee_Skill (Eid:pk1, Skill:pk2)
Eid  Skill
100  Teaching
100  Politic
200  Singing
