
Database Systems Journal vol. V, no. 2/2014

Query Optimization Techniques in Microsoft SQL Server

Costel Gabriel CORLĂŢAN, Marius Mihai LAZĂR,


Valentina LUCA, Octavian Teodor PETRICICĂ
University of Economic Studies, Bucharest, Romania
gabi.corlatan@yahoo.com, lazar_mariusmihai@yahoo.com,
luca.valentina@ymail.com, octavian.petricica@ymail.com

Microsoft SQL Server is a relational database management system whose primary structured query language is Transact-SQL (T-SQL). T-SQL is grounded in relational algebra and is used mainly for data insertion, modification, deletion and retrieval, as well as for controlling access to data. Producing the expected results efficiently is the task of the management system, whose purpose is to find the best execution plan; this process is called optimization. The most frequently used queries are data retrievals issued through the SELECT command, but SELECT queries are not the only objects that need optimization: indexes, views and statistics must be tuned as well.
Keywords: SQL Server, Query, Index, View, Statistics, Optimization.

1. Introduction
We consider the following problems as being responsible for the low performance of a Microsoft SQL Server system. After optimizing the hardware, the operating system and the SQL Server settings, the main factors which affect the speed of execution are:
1. Missing indexes;
2. Inexact statistics;
3. Badly written queries;
4. Deadlocks;
5. T-SQL operations which do not rely on a single set of results (cursors);
6. Excessive fragmentation of indexes;
7. Frequent recompilation of queries.
These are only a few of the factors which can negatively influence the performance of a database. In the following sections we discuss each of these situations in more detail.

2. Missing indexes
This factor affects SQL Server performance the most. When a table lacks an index, the system has to go step by step through the entire table in order to find the searched value. This overloads RAM and CPU and thus considerably increases the execution time of a query. Moreover, deadlocks can appear when, for example, session 1 is running and session 2 queries the same table as the first session.
Let us consider a table with 10,000 rows and 4 columns, among which a column named ID that is automatically incremented by one.
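The test table described above can be sketched as follows. The table name T_1 and the columns ID, Col_1 and Col_2 are taken from the paper's own examples; the exact data types are not given in the text, so they are an assumption here.

```sql
-- Hypothetical reconstruction of the paper's test table.
-- Data types and Col_* definitions are assumed, not taken from the article.
CREATE TABLE T_1 (
    ID    INT IDENTITY(1,1) NOT NULL,  -- automatically incremented by one
    Col_1 VARCHAR(50),
    Col_2 VARCHAR(50),
    Col_3 VARCHAR(50)
);

-- With a clustered index on ID, the lookup in Q1 becomes an index seek
-- instead of a full table scan.
CREATE UNIQUE CLUSTERED INDEX IX_T_1_ID ON T_1 (ID);
```

Dropping the index and re-running the same query reproduces the "without clustered index" column of Tables 1.1 and 1.2.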

Table 1.1. Running a simple query to retrieve a row in a table (execution time and query plan, with and without a clustered index)

Table 1.2. Running a two-table join query (execution time and query plan, with and without a clustered index)

Table 1.3. Running a join between two tables


Query (Q1):

select * from T_1 where ID = 50000

Query (Q2):

select *
from T_1 as a
inner join T_2 as b
on a.ID = b.ID
where a.ID = 50000
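The execution times reported in Tables 1.1 and 1.2 can be reproduced with SQL Server's built-in measurement switches; a minimal sketch (the measured statement here is Q1 from Table 1.3):

```sql
-- Report parse/compile and execution times for each statement,
-- plus logical/physical reads per table.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT * FROM T_1 WHERE ID = 50000;  -- Q1

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The actual query plan can be inspected in Management Studio, or by running the statement with SET STATISTICS XML ON.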

In Table 1.1, the query uses a single table, with and without a clustered index on the column specified in the WHERE clause (Q1). In the second case (Table 1.2), the query involves two tables, a join on the ID column of the two tables and a WHERE clause (Q2).
According to [1] and [3], SQL Server supports the following types of indexes:
- Clustered index;
- Nonclustered index;
- Unique index;
- Columnstore index;
- Index with included columns;
- Index on computed columns;
- Filtered index;
- Spatial index;
- XML index;
- Full-text index.
According to [2], the main index optimization methods are the following:
- It is recommended that the created indexes actually be used by the query optimizer. In general, clustered indexes are better suited for interval selections and ordered queries. Clustered indexes are also more suitable for dense keys (many duplicated values): because the rows are stored in index order, queries which search on such non-unique values will find them with a minimum of I/O operations. Nonclustered indexes are more suitable for unique selections and for retrieving individual rows;
- It is recommended that nonclustered indexes be created with as low density as possible. The selectivity of an index can be estimated with the formula: number of unique keys / number of rows. Nonclustered indexes with a selectivity below 0.1 are not efficient and the optimizer will refuse to use them. Nonclustered indexes are best used when searching for a single row. Obviously, duplicate keys force the system to use more resources to find one particular row;
- Apart from increasing the selectivity of indexes, you should order the key columns of a multi-column index by selectivity: place the columns with higher selectivity first. As the system walks the index tree to find a value for a given key, using the more selective key columns first means that fewer I/O operations are needed to reach the leaf level of the index, which results in a much faster query;
- When an index is created, the transactions and key operations in the database should be taken into consideration. Indexes should be built so that the optimizer can use them for the most important transactions;
- It is recommended, at index creation time, that indexes serve the most frequent join conditions. For example, if you often join two tables on a set of columns, you can build an index that will accelerate the join;
- Give up the indexes which are not used. If, following the analysis of the execution plans of the queries which should use the indexes, we see that they cannot actually be used, they should be deleted;
- It is recommended to create indexes on foreign-key references. A foreign key requires a unique-key index on the referenced table, but there is no such restriction on the referencing table. Creating an index on the dependent table can accelerate the foreign-key integrity checks which result from modifications to the referenced table, and can improve the performance of joining the two tables;
- In order to serve rare queries and user reports, we recommend creating temporary indexes. For example, a report which is run only once a year or once a semester does not require a permanent index. Create the index right before running the report and drop it afterwards, if that makes things faster than running the report without any index;
- To control page locking for an index, the system procedure sys.sp_indexoptions can be used. It can force the server to use locking at row level and table level. As long as row locks do not escalate too often into table locks, this solution improves performance in the case of multiple simultaneous users;
- Because the optimizer can use multiple indexes on a single table, several single-key indexes can lead to a better overall performance than one index with a compound key. That is because the optimizer can query the indexes separately and combine them to return a result set. This is more flexible than using a compound-key index, because single-column index keys can be specified in any combination, which cannot be done with compound keys: the columns of a compound key have to be used in order, from left to right;
- We recommend using the Index Tuning Wizard application, which will suggest the optimal indexes for your queries. This is a very complex tool that can scan trace files collected by SQL Server Profiler in order to recommend the indexes that will improve performance.

3. Inexact statistics
According to [3], the SQL Server database management system relies mostly on cost-based optimization, so exact statistics are very important for an efficient use of indexes. Without them, the system cannot accurately estimate the number of rows affected by a query. The quantity of data which will be extracted from one or more tables (in the case of a join) matters when deciding on the optimization method for the query execution. Query optimization is less efficient when the data statistics are not correctly updated.
The SQL Server query optimizer is cost based, meaning that it decides the best data access mechanism for each type of query while applying a selectivity identification strategy. Each index has a statistic attached, but statistics can also be created manually, on columns that do not belong to any index. Using statistics, the optimizer can make reasonably good estimates of the time the system needs to return a result set.

Indexed column statistics
The utility of an index is entirely dependent on the statistics of the indexed columns. Without statistics, the SQL Server cost-based query optimizer cannot decide which is the most efficient way of using an index. In order to satisfy this requirement, it automatically creates statistics on an index key every time an index is created. The data-access mechanism that keeps the cost low can change as the data changes. For example, if a table has a single row that matches some unique value, then using a nonclustered index makes sense. But if the data changes and a large number of rows with the same column value (duplicates) is added, using the same index no longer makes sense.
According to [5], SQL Server uses an efficient algorithm to decide when to execute the system procedure that updates the statistics, based on factors such as the number of updates and the table size:
- when inserting a row into an empty table;
- when inserting more than 1,000 rows into a table that already has 1,000 rows.
Automatic updating of statistics is recommended in the vast majority of cases, except for very large tables, where statistics updates can lead to slowdowns or to blocking the system. This is an isolated case and the best decision must be taken regarding its update.
Statistics are updated using the system procedure sys.sp_updatestats on an indexed table or view.

Unindexed column statistics
Sometimes a query is executed on an unindexed column. Even in this situation the query optimizer can take the best decision if it knows the data distribution of those columns. In contrast with index statistics, SQL Server can also create statistics on unindexed columns. Information about the data distribution, or the probability of some value occurring in an unindexed column, can help the optimizer establish an optimal strategy. SQL Server benefits from query optimization even when it cannot use an index to locate the values: it automatically creates statistics on an unindexed column when the available information helps it build the best plan, usually when the column is used in a predicate (e.g. WHERE).
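The update and creation commands mentioned above can be sketched as follows; the statistic name in the last statement is a hypothetical choice, not taken from the article:

```sql
-- Refresh the statistics of every user table in the current database.
EXEC sys.sp_updatestats;

-- Or update the statistics of a single table explicitly.
UPDATE STATISTICS T_1;

-- Manually create a statistic on an unindexed column
-- (the name S_T1_Col2 is illustrative).
CREATE STATISTICS S_T1_Col2 ON T_1 (Col_2);
```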

Table 1.4. Query plan



Table 1.5. Statistical data of the table "T_1"


select a.ID, a.Col_1, a.Col_2, b.Col_3
from T_1 as a
inner join T_2 as b
on a.ID = b.ID
where a.ID = 50000
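The statistical data shown in Table 1.5 can be inspected with DBCC SHOW_STATISTICS; a sketch, assuming the statistics object of interest is the one attached to a clustered index on T_1 (the index name IX_T_1_ID is hypothetical):

```sql
-- Display the header, density vector and histogram of the statistic
-- attached to the (assumed) clustered index IX_T_1_ID on table T_1.
DBCC SHOW_STATISTICS ('dbo.T_1', IX_T_1_ID);
```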

4. Badly written queries
Index efficiency depends a lot on the way the queries are written. Retrieving a very large number of rows from a table can render an index inefficient. To improve performance, SQL queries must be written so that they use the existing indexes.

Table 1.6. T-SQL query that returns the names of the tables containing at least one row, using the system view "sys.partitions" (Q1)

Table 1.7. T-SQL query that returns the names of the tables containing at least one row, using the system view "sys.sysindexes" (Q2)

Table 1.8. Query execution time using the system view "sys.partitions"

Table 1.9. Query execution time using the system view "sys.sysindexes"
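The queries themselves appear only as screenshots in the original tables; a hedged reconstruction of the sys.partitions variant (Q1) might look like the following. The "TI Word Map" name filter and the "dbo" schema come from the text; the exact column list and aliases are assumptions.

```sql
-- Tables in schema dbo whose name starts with 'TI Word Map'
-- and which contain at least one row, counted via sys.partitions.
SELECT t.name
FROM sys.tables AS t
JOIN sys.partitions AS p
  ON p.object_id = t.object_id
WHERE t.schema_id = SCHEMA_ID('dbo')
  AND t.name LIKE 'TI Word Map%'
  AND p.index_id IN (0, 1)          -- heap (0) or clustered index (1)
GROUP BY t.name
HAVING SUM(p.rows) > 0;
```

The Q2 variant would join sys.sysindexes instead, reading its rows column; that compatibility view is deprecated, which is why the text recommends Q1.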

As an example we take two T-SQL queries executed on system tables (views). Both return the names of the tables which start with "TI Word Map", belong to the "dbo" schema and contain at least one row. This method is more efficient than running a cursor over all the tables in "sys.tables", issuing a "select count(*) from table_name" query for each cursor row, inserting the tables which contain at least one row into a temporary table, and then querying it. The row count can instead be determined by the two methods of Table 1.6 and Table 1.7. Although the two queries may look identical, the only difference between them is that, in order to determine the number of rows, Q1 extracts the count from sys.partitions while Q2 extracts it from sys.sysindexes. As we can see, Q1 is approximately three times quicker than Q2. It is therefore recommended to use the system view sys.partitions rather than sys.sysindexes, which, according to Microsoft, will be removed in a future version of SQL Server.
Methods for optimizing the SELECT statement:
- Whenever possible, it is recommended to use the leftmost columns of an index as search columns in queries. An index on col_1 and col_2 is of no help to a query which filters results by col_2;
- It is recommended to build WHERE terms which the query optimizer can recognize and use as search arguments;
- Don't use DISTINCT or ORDER BY without need. They should be used only to eliminate duplicate values or to

select a specific order in the result set. Except when the optimizer can find an index that serves them, they require an intermediate work table, which can be expensive in terms of performance;
- Use UNION ALL instead of UNION when eliminating duplicates from the result set is not a priority. Because it eliminates duplicates, UNION must sort or hash the result set before returning it;
- You may use SET LOCK_TIMEOUT to control how long a connection waits for a locked resource. At the start of a session, the automatic variable @@LOCK_TIMEOUT returns -1, which means that no timeout value was set. You can set LOCK_TIMEOUT to any positive number, which establishes the number of milliseconds a query waits for a locked resource before timing out. In more difficult situations this is necessary to prevent apparently hung applications;
- If a query includes an IN predicate containing a list of constant values (instead of a subquery), order the values according to their frequency of occurrence in the outer query, especially when you know the data tendencies. A common solution is alphabetical or numerical ordering of the values, but this may not be optimal. Because the predicate returns TRUE as soon as any of its values matches, moving the most frequently occurring values to the first positions of the list should accelerate the query, especially when the searched column is not indexed;
- It is recommended to choose joins over nested subqueries. A subquery may require a nested loop, that is, a cycle within a cycle. In a nested loop, the rows of the inner table are scanned for each row of the outer table. This works very well for small tables, and it was the only join strategy before SQL Server version 7.0, but as tables grow bigger this solution becomes less and less efficient. It is much better to write normal joins between the tables and to let the optimizer select the best way to process them. In most cases the optimizer will try to transform needless subqueries into joins anyway;
- When possible, it is recommended to avoid CROSS JOIN operations. Unless the Cartesian product of two tables is genuinely needed, it is more efficient to join one table after the other. Returning an unwanted Cartesian product and then eliminating the duplicates it generated using DISTINCT and GROUP BY is a problem which causes serious damage to the query;
- You may use the TOP (n) extension to restrict the number of rows returned by a query. This is useful mainly when you insert values using SELECT, because you may view only the values from the first part of the table;
- You may use the OPTION clause of the SELECT statement to influence the query optimizer with query hints. You may also specify hints for tables and for specific joins. As a rule the optimizer should optimize the queries, but there are cases in which the chosen execution plan is not the best. Using hints for a query, table or join, you can force a certain type of join or grouping, the use of a certain index, and so on. These are called query hints; here are some of them:
  o FAST number_of_rows – specifies that the query is optimized

to rapidly return the first number_of_rows rows. After the first number_of_rows rows are returned, the query continues and produces the complete result set;
  o FORCE ORDER – specifies that the join order given by the query syntax is preserved during query optimization;
  o MAXDOP number_of_processors – overrides the maximum degree of parallelism configured with sp_configure;
  o OPTIMIZE FOR (@variable_name = value) – tells the query optimizer to use a specific value for a local variable when the query is compiled and optimized;
  o USE PLAN – forces the query optimizer to use an existing query plan. It can be used with the INSERT, UPDATE, MERGE and DELETE statements.
- Besides query hints there are also table hints, which influence the query optimizer in decisions such as: using a locking method for a table, using a certain index, locking rows, etc. Here are some of them:
  o NOLOCK – the READUNCOMMITTED and NOLOCK hints apply only to data locks. They take Sch-S (schema stability) locks when compiling and executing. This means no data is locked, and other transactions are not prevented from accessing the data, including modifying it;
  o INDEX – forces the use of an index;
  o READPAST – tells the system not to read the rows which are locked by other transactions;
  o ROWLOCK – a row-level lock is taken;
  o TABLOCK – specifies that locking is done at table level;
  o HOLDLOCK – the equivalent of the SERIALIZABLE isolation level;
  o TABLOCKX – specifies an exclusive lock on the table.
- When testing and comparing two queries to determine the most efficient way of accessing data, one must make sure that SQL Server's data caching does not distort the test results. One way of doing this is by restarting the server between query runs. Another way is by using undocumented DBCC commands to clean the relevant caches: DBCC FREEPROCCACHE empties the procedure cache, and DBCC DROPCLEANBUFFERS cleans all cache buffers.

5. Excessive blocking (deadlocks)
According to [2], SQL Server is a software compliant with the "ACID" rules, where "ACID" is an acronym for atomicity, consistency, isolation and durability. This ensures that the changes made by simultaneous transactions are properly isolated from one another. Compliance of the transactions with these rules is mandatory for data safety.
- Atomicity: a transaction is atomic if it complies with the principle "all or nothing". When a transaction succeeds, all its changes become permanent; when a transaction fails, all its changes are canceled;
- Consistency: a transaction is consistent if it takes the database from one valid state to another valid state;
- Isolation: a transaction is isolated if it does not affect, and is not affected by, other concurrent

transactions on the same data (isolation levels);
- Durability: a transaction is durable if it can be completed, or rolled back, even in the event of a system failure.

SQL Server locking
When a session executes a query, the system determines the database resources it touches and grants locks for the respective session. The query is blocked if another session already holds a conflicting lock. Nevertheless, in order to offer both isolation and concurrency, SQL Server offers several locking levels:
• Row (RID) – this lock is taken on a single row in a table and is the smallest locking level. For example, when a query changes a row in a table, a RID lock is taken for that query;
• Key (KEY) – this is a lock on a row in an index, identified as a key lock. For example, for a table with a clustered index, the data pages of the table and the data pages of the clustered index are the same. Since both rows are the same for a clustered-index table, only a KEY lock is acquired on the clustered index row, or on the limited set of rows, while accessing the rows of the table;
• Page (PAG) – a PAG lock is held on a single page of a table or index. When a query requires several rows in a page, the consistency of all the required rows can be maintained either by RID or KEY locks at row level, or by a PAG lock at page level;
• Extent (EXT) – this type of lock is taken when the ALTER INDEX REBUILD command is used. This command operates at table level, and the table pages can be moved from one extent to another; during this time the integrity of the extent is protected by the EXT lock;
• Heap or B-tree (HoBT) – a HEAP or B-TREE lock is used when data is spread over several filegroups. The target object may be a table without a clustered index, or a B-tree object. A setting in ALTER TABLE offers a certain level of control over the locking mode (lock escalation). Because the parts are spread over several filegroups, each has to have its own data allocation definition. The HoBT lock type acts at partition level, not at table level;
• Table (TAB) – this lock type is the highest locking level. It locks access to the entire table and its indexes;
• File (FIL);
• Application (APP);
• MetaData (MDT);
• Allocation Unit (AU);
• Database (DB) – a lock held at database level. When an application gets access to a database, the lock manager grants a database-level lock corresponding to its SPID (Server Process ID).
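The locks granted to the current sessions can be observed through a dynamic management view; a minimal sketch (the column selection is illustrative):

```sql
-- List currently granted or waiting locks, with their level
-- (RID, KEY, PAG, EXT, HoBT, TAB, DB, ...) and mode.
SELECT request_session_id,
       resource_type,         -- the locking level discussed above
       resource_database_id,
       request_mode,          -- e.g. S, X, IS, IX
       request_status         -- GRANT or WAIT
FROM sys.dm_tran_locks;
```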

Table 1.10. The event session used to capture database deadlocks
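The event session of Table 1.10 is shown only as a screenshot; a hedged sketch of such a session, using the Extended Events deadlock report (the session and file names are assumptions):

```sql
-- Capture the XML deadlock report produced by the lock monitor.
CREATE EVENT SESSION deadlock_capture ON SERVER
ADD EVENT sqlserver.xml_deadlock_report
ADD TARGET package0.event_file (SET filename = N'deadlock_capture.xel');

ALTER EVENT SESSION deadlock_capture ON SERVER STATE = START;
```

The captured .xel file can later be read with sys.fn_xe_file_target_read_file to analyze the sessions and resources involved in each deadlock.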

Transaction isolation levels
SQL Server accepts four isolation levels for a transaction. As mentioned earlier, the isolation level of a transaction controls the way in which it affects, and is affected by, other transactions. At any isolation level there is a trade-off between data consistency and user concurrency: selecting a more restrictive isolation level increases data consistency to the detriment of accessibility, while selecting a less restrictive isolation level increases concurrency to the detriment of data consistency. It is important to balance these opposing interests so that the needs of the application are met.
- READ UNCOMMITTED – specifying READ UNCOMMITTED is essentially the same as using the NOLOCK option on each table referenced by a transaction. It is the least restrictive of the four isolation levels in SQL Server. It allows "dirty reads" (reading uncommitted changes of other transactions) and non-repeatable reads (data which changes between readings during a transaction);
- READ COMMITTED – the default isolation level in SQL Server, so if you do not specify otherwise, you get the READ COMMITTED level. READ COMMITTED avoids dirty reads by enforcing shared locks on the accessed data, but allows the underlying data to be modified during a transaction, thus making non-repeatable reads and/or phantom data possible;
- REPEATABLE READ – the REPEATABLE READ level generates locks that prohibit other users from modifying the data accessed by a transaction, but does not prohibit the insertion of new rows, which may result in phantom rows between readings during a transaction;
- SERIALIZABLE – the SERIALIZABLE level prevents dirty reads and phantom rows by

introducing key-range locks on data access. It is the most restrictive of the four isolation levels in SQL Server, and is equivalent to using the option HOLDLOCK on each table referenced by a transaction.
In order to enforce an isolation level for a transaction, the command SET TRANSACTION ISOLATION LEVEL must be used. The valid parameters for the isolation level are READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ and SERIALIZABLE.

6. T-SQL operations that are not based on a single set of results (cursors)
Given that Transact-SQL is a set-based language, we know that it operates on sets. Writing non-set-based code can lead to excessive use of cursors and loops. To improve performance, it is recommended to use set-based queries rather than the row-by-row approach, as the latter overloads the hardware and results in a much higher data access time. Excessive use of cursors increases the stress on SQL Server and reduces system performance.

Classification
Based on the concurrency model, according to [3] and [4], cursors may be:
- Read-only – the cursor cannot be updated;
- Optimistic – an updatable cursor which uses the optimistic concurrency model (it does not lock data rows);
- Scroll Locks – an updatable cursor which locks any row that is to be updated.

Types of cursors
- Forward-only – the default cursor type. It returns the rows from the data set sequentially. No extra space is needed in tempdb, and changes to the data are visible as soon as the cursor reaches them;
- Static – static cursors return a read-only result set which is impervious to changes in the data. They are sometimes referred to as insensitive, and are the opposite of dynamic cursors;
- Dynamic – as with forward-only cursors, dynamic cursors reflect changes to rows as they reach them. No extra space in tempdb is needed, and unlike forward-only cursors they are scrollable, which means they are not restricted to sequential access to the rows. These cursors are sometimes called sensitive;
- Keyset – keyset cursors return a fully scrollable result set whose membership and order are fixed. As with static cursors, the set of unique key values of the cursor rows is copied to tempdb when the cursor is opened; this is why the cursor membership is fixed.

Cost comparison
Read-only cursors
The read-only model has the following advantages:
- The lowest level of locking: the read-only model introduces the locks with the least impact on database concurrency. Locking the underlying rows while the cursor loops over them can be avoided by using the NOLOCK hint in the SELECT statement, but then dirty reads must be taken into account;
- The highest level of concurrency: as no supplementary locks are held on the underlying rows, the read-only cursor does not block other users from accessing the base table.
The main disadvantage of the read-only cursor is the lack of updates: the contents of the base table cannot be modified through the cursor.

Optimistic cursors
The optimistic model has the following advantages:

- Low level of locking: similar to the read-only model, the optimistic concurrency model does not hold a specific type of lock. To further improve concurrency, the NOLOCK hint may also be used, as in the case of the read-only model. Using the cursor to update a row requires exclusive rights on that row;
- High concurrency: since only a shared lock is used on the underlying rows, the cursor does not block other users from accessing the base table. Changing an underlying row will block other users from accessing that row only during the update.

Scroll Locks cursors
The scroll-locks model has the following advantages:
- Concurrency control: by locking the base row corresponding to the last row fetched by the cursor, it ensures that the base row cannot be changed by another user;
- For very large data sets, the use of asynchronous cursors is recommended. Returning the cursor immediately allows further processing while the cursor is being populated. Asynchronous cursors are configured as follows:

EXEC sp_configure 'show advanced options', 1;
GO
RECONFIGURE
GO
EXEC sp_configure 'cursor threshold', 0;
GO
RECONFIGURE
GO

- When configuring read-only, unidirectional result sets, the FAST_FORWARD cursor option is recommended instead of FORWARD_ONLY. The FAST_FORWARD option creates a FORWARD_ONLY, READ_ONLY cursor with several built-in performance optimizations;
- It is recommended to avoid changing a large number of rows in a cursor loop contained in a transaction, because each row it changes may remain locked until the end of the transaction, depending on the transaction isolation level.

7. Excessive index fragmentation
Normally, data is organized in an orderly way. However, when pages contain fragmented data, or contain only a small amount of data due to frequent page splits, the number of read operations needed to return the data will be higher than usual. This increase in the number of reads is due to fragmentation.
Fragmentation occurs when table data is changed. When executing statements that insert or update data (INSERT or UPDATE), the clustered and nonclustered indexes on the table are affected. This can cause an index leaf page to split when an index change cannot be stored on the same page. A new leaf page is then added as part of the original page, in order to maintain the logical order of the rows in the index key. Although the new leaf page maintains the logical order of the data rows from the original page, it usually will not be physically adjacent to the original page on disk.
For example, suppose that an index has nine key values (or index rows) and the average index row size allows a maximum of four rows in a leaf page. The 8 KB leaf pages are linked to the previous and next pages to maintain the logical order of the index; Figure 1.1 shows the leaf page layer. Since the key values of the index leaf pages are always sorted, a new index row with a key value of 25 must occupy a place between the existing key values of 20 and 30. Since the leaf page containing these existing index rows is already filled with four rows, the new index row will result in a page split.

Fig. 1.1. The leaf page layer

A new page will be assigned to the index and part of the first page will be moved onto this new page, so that the new index key can be inserted in the correct logical position. The links between index pages will also be updated, as the pages are logically linked in index order. As can be seen in Figure 1.1, even if the new page is linked to the other pages in the correct logical order, physically the linking can sometimes be out of order.

Pages are grouped into larger units called extents, each of which contains eight pages. SQL Server uses the extent as the physical unit of disk allocation. Ideally, the physical order of the extents containing the leaf pages should match the logical order of the index. This reduces the number of switches between extents required when reading a series of rows from the index. However, page splits can physically disorder the pages within extents and can also cause physical disorder among the extents themselves. For example, suppose the first two leaf pages of the index are in extent 1 and the third page is in extent 2. If extent 2 contains unallocated space, then a new leaf page allocated to the index because of a page split will be placed in extent 2, as shown in Figure 1.3.

With the leaf pages distributed between two extents, one would ideally expect to read a series of index rows with at most one switch between the two extents. However, the disorganization of pages between extents can cause more than one switch while reading a series of rows from the index. For example, to retrieve the index rows with keys between 25 and 90, three switches between the two extents are needed, as follows:
- first, an extent switch to retrieve the value of key 30 after the value of key 25;
- second, an extent switch to retrieve the value of key 50 after the value of key 45;
- third, an extent switch to retrieve the value of key 90 after the value of key 80.

This type of fragmentation is called external fragmentation. Fragmentation can also occur inside an index page. If an INSERT or UPDATE operation causes a page split, free space is left behind in the original leaf page. Free space can also be caused by a DELETE operation. The net effect is a reduction of the number of rows that fit in one leaf page. For example, in Figure 1.3, the page split caused by the INSERT operation has created a gap in the first leaf page. This is known as internal fragmentation.

For a highly transactional database, it is preferable to deliberately leave free space inside the leaf pages, so that new rows can be added, or existing rows can change size, without causing a page split. In Figure 1.3 the free space in the first leaf page allows an index key value of 26 to be added to that leaf page without causing a page split.

Heap pages can become fragmented in exactly the same way. Unfortunately, because of the storage mechanism, and because every nonclustered index uses the physical location of the data to retrieve it from the heap, defragmenting a heap is quite difficult. The ALTER TABLE command with the REBUILD clause can be used to rebuild it.

SQL Server 2012 exposes the fragmentation of leaf pages and other data through a dynamic management view called sys.dm_db_index_physical_stats. It reports both the index size and the degree of fragmentation.
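As a sketch of the two points above, free space can be reserved in leaf pages with the FILLFACTOR option, and a fragmented heap can be rebuilt with ALTER TABLE ... REBUILD. The table and index names below are assumed for illustration only:

```sql
-- Leave roughly 20% free space in each leaf page when the index is built,
-- so that new rows can be inserted without immediately causing page splits.
CREATE INDEX IX_Customers_LastName
    ON dbo.Customers (LastName)
    WITH (FILLFACTOR = 80, PAD_INDEX = ON); -- PAD_INDEX applies the fill factor to intermediate pages too

-- Rebuild a fragmented heap (table without a clustered index):
ALTER TABLE dbo.Customers REBUILD;
```

Note that FILLFACTOR is applied only when the index is created, rebuilt or reorganized; ordinary INSERT and UPDATE operations are free to fill the reserved space afterwards.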

According to Microsoft, the ALTER INDEX REORGANIZE and ALTER INDEX REBUILD commands are used depending on the degree of fragmentation reported by the system view, as follows:

Table 1.11. Degree of fragmentation of the system view

  Value of avg_fragmentation_in_percent column   T-SQL code
  > 5% and <= 30%                                ALTER INDEX REORGANIZE
  > 30%                                          ALTER INDEX REBUILD

Rebuilding an index can be performed either online or offline, but reorganization can only be executed online.
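As a sketch, the two maintenance commands from Table 1.11 might be applied to a hypothetical index IX_Orders_OrderDate on dbo.Orders as follows:

```sql
-- Moderate fragmentation (> 5% and <= 30%): reorganize; always an online operation
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders REORGANIZE;

-- Heavy fragmentation (> 30%): rebuild; the ONLINE option requires Enterprise Edition
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders REBUILD WITH (ONLINE = ON, FILLFACTOR = 90);
```

Reorganization is also interruptible without losing the work done so far, whereas an interrupted rebuild is rolled back.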

Fig 1.2. Leaf pages out of order

Fig 1.3. Out-of-order leaf pages distributed across extents

Note that this index fragmentation is different from disk fragmentation. Index fragmentation cannot be determined by running the Disk Defragmenter tool, because the order of the pages in a SQL Server file is understood only by SQL Server, not by the operating system.

Fig 1.4. Determining the degree of fragmentation of the indexes on tables in the "dbo" schema with fragmentation greater than 10%

Fig 1.5. The result of the query to determine the degree of fragmentation of indexes
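Figures 1.4 and 1.5 are reproduced only as screenshots; a query producing a similar result (the schema name and 10% threshold follow the caption, the rest of the column choices are assumptions) might look like this:

```sql
-- Indexes on tables in schema 'dbo' with average fragmentation above 10%
SELECT s.name AS schema_name,
       t.name AS table_name,
       i.name AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i ON i.object_id = ps.object_id AND i.index_id = ps.index_id
JOIN sys.tables  AS t ON t.object_id = ps.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE s.name = 'dbo'
  AND ps.avg_fragmentation_in_percent > 10
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

The 'LIMITED' mode scans only the upper index levels and is the cheapest way to obtain the fragmentation percentage; 'DETAILED' also fills in page-density statistics at the cost of reading every leaf page.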

8. Rebuilding Frequent Queries

The most common way to provide a reusable execution plan, independently of the variables used in a query, is to use a stored procedure or a parameterized query. By creating a stored procedure to execute a set of SQL queries, the database system creates an execution plan that is independent of the parameter values supplied at execution. The generated execution plan will be reusable only if SQL Server does not have to recompile individual statements from the stored procedure each time it is executed (e.g. sequences of dynamic SQL). Frequent rebuilding of execution plans increases query time.

Methods for optimizing stored procedures:
- Whenever possible, it is recommended to use stored procedures instead of ad-hoc queries. In order to reuse the execution plan of an ad-hoc SQL query, the new query text has to match the cached one exactly and must fully qualify every object referenced. If anything is different in a future use of the query (the parameters, the object names, the SET options in effect), the plan will not be reused. A good solution that avoids the limitations of ad-hoc queries is to use the system stored procedure sys.sp_executesql. This is somewhere between rigid stored procedures and ad-hoc Transact-SQL queries, since it allows running ad-hoc queries with substituted parameters. This facilitates the reuse of ad-hoc execution plans without the need for an exact textual match;
- If, for a small portion of a stored procedure, the query plan must be rebuilt at every execution (e.g. due to data changes that make the current plan suboptimal), but we do not want the overhead of rebuilding the plan for the entire procedure each time, that portion should be moved into a stand-alone procedure. This allows its execution plan to be rebuilt at every run without affecting the plan of the larger procedure. If this is not possible, try using EXEC() to call the suspect code from the main procedure. Because such a subroutine is built dynamically, a new execution plan is generated at every execution without affecting the query plan of the whole stored procedure;
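As a sketch of the sp_executesql approach described above (the table, columns and parameter name are assumed for illustration), an ad-hoc query can be parameterized like this:

```sql
-- Parameterized ad-hoc query: the plan is cached once and reused
-- for any @CustomerId value, instead of one plan per literal value.
DECLARE @CustomerId INT = 42;

EXEC sys.sp_executesql
    N'SELECT OrderId, OrderDate
      FROM dbo.Orders
      WHERE CustomerId = @CustomerId;',
    N'@CustomerId INT',
    @CustomerId = @CustomerId;
```

The second argument declares the parameters used inside the statement; the remaining arguments supply their values, so the statement text stays identical between calls.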

- When possible, it is better to use output parameters of stored procedures instead of result sets. If you need to return the result of a calculation, or to locate a single value in a table, it is preferable to return it as an output parameter of a stored procedure instead of a single-row result set. Even if you return multiple columns, output parameters of stored procedures are more efficient than complete result sets;
- When you need to return a set of rows from one stored procedure to another, it is better to use cursor output parameters instead of result sets. This technique is considerably more flexible and allows the second procedure to run more quickly, since it does not work with result sets. The caller can then process the rows returned by the cursor as desired;
- It is recommended to minimize the number of network packets between the client and the server. A very effective way to achieve this goal is to disable DONE_IN_PROC messages. You can disable them at the procedure level with the SET NOCOUNT command, or at the server level with trace flag 3640. Doing so may lead to huge differences in performance, especially when relatively slow networks such as WANs are used. If you choose not to use trace flag 3640, SET NOCOUNT ON should be used at the beginning of every stored procedure that you write;
- When tuning queries, use the DBCC PROCCACHE command to list information about the memory cache reserved for procedures. Also use the DBCC FREEPROCCACHE command to clear the procedure cache, so that multiple executions of a given procedure do not alter the test results, and DBCC FLUSHPROCINDB to force the creation of new execution plans for the procedures of a database.

Conclusions

Performance optimization is an ongoing process. It requires continuous monitoring and improvement of database performance. The purpose of this paper is to provide a list of SQL scenarios that serve as a quick and easy reference guide during the development and maintenance phases of a database. The Transact-SQL language provides a wide range of techniques for tuning queries. At the top of the list are correct database design, adding indexes and rewriting queries. Performance optimization is a complex subject, which could certainly fill many books. The secret to success is knowing the instruments mentioned above, knowing how the server works, and having the skills required to solve problems using this knowledge.

Costel Gabriel CORLĂŢAN studies at the Academy of Economic Studies, in the Database for Business Support master program. The main technologies he works with are: MS SQL Server, .NET Framework, ASP.NET, C#.

Marius Mihai LAZĂR is a master's student in the Database for Business Support program and a programmer in the area of database development, tuning and optimization, mainly with Microsoft technologies such as SQL Server and Visual Studio. He also develops web, desktop and mobile applications. In 2012 he published and documented an application about the "Elementary cellular automaton".

Valentina LUCA graduated from the Faculty of Cybernetics, Statistics and Economic Informatics of the Bucharest University of Economic Studies in 2012. She is currently a master's student enrolled in the Databases for Business Support program. Her fields of interest: databases, data warehouses, business intelligence.

Octavian Teodor PETRICICĂ graduated from the Faculty of Mathematics and Informatics of the Hyperion University of Bucharest in 2012. He is currently a master's student in the Database for Business Support program. His scientific fields of interest include: databases, web development and application programming interfaces. At present he is a Software Developer in the IT Applications department at Cargus International SA.
