You are on page 1of 14

Indexes

Abstract
This is a small article on the Index in DB and defining Clustered and Non-Clustered Index with
reference to the Microsoft SQL server DB.

1 Introduction
In our day to day business application huge amount of transaction
information is being stored on our database. In the beginning when the
amount of records is less and everything is working fine and then
suddenly after some time the support guy comes and say query is taking
time to execute and the system is taking the huge amount of time to
process the records. Indexes comes to prevention where we forecast our
SQL queries and how db generating the Execution SQL plan for the query.
Index result in a tremendous improvement in SQL query if used properly
and vice versa also is true.

2. Indexes
Indexes are used to explicitly speed up the SQL statement execution on a table. The
index points to the location of the rows containing the value.
The default type of the Index used in the DB are B-Tree index
By default, when we execute a select command with no primary key and index
based on a search criteria e.g. Take the below numbers as an Id and we will be
writing our select command with id in where clause. So when the database will
execute the query it will search sequentially row by row the Id right from the
beginning till it find the desired row. So we can imagine the scenario if we have
billions of data.

So in order to minimize the sequential search problem there when B-Tree comes into
the picture. B-Tree is a self-balancing tree data structure which is an extension of
Binary tree that keeps data sorted and allows searches, sequential, insertion and
deletion in a quick manner.

Figure 2: B-Tree

Image Source: https://technet.microsoft.com/enus/library/ms177443(v=sql.105).aspx

Sequential Search is called Table Scan and when they are searched using B-tree
they are called Index seek or index scan.
Lets do a demo to check what kind of scan does SQL server does when there are no
indexes and primary key in table.
So below we have a table called CityName which doesnt have any indexer and will
contain information regarding the City.

Figure 3: TblCustomer Data

So lets run our select command with one particular Cityid selected
As shown in the diagram and we will use, display Estimated Execution plan.

Figure 4: Demonstrating Display Estimated Execution plan option.

Execution plan tells us how the SQL query performs when we dont have any index
in our table.

Figure 5: Demonstrating the execution plan of the query.

We will make use of STATISTICS IO. STATISTICS IO provides the detailed


information about the query impact on the DB server. It tells the number of logical
reads, physical reads and how many tables was scanned. A good performing query
is that query which has less number of logical reads.
Once we execute below sets of command and check the output statistics:
SET STATISTICS IO ON
SELECT * FROM CityName WHERE CityID='6137';

Figure 6: Number of logical reads 26

Scan Count: This count specifies that the optimizer has chosen a plan that
caused 1 Scan count.
Logical Reads: This number specifies the number of pages read from the
cache. This is the important parameter that is needed to be focused. This
number can be decreased using index structure.
Physical Read: This number specifies the number of pages actually read
from the disk. These are pages that were not in data cache. SQL server
performs everything on cache if the requested page is not present it will read
the page from the disk and then put the same in the cache, then use that
page.

Lob logical Reads: This number grows if request any large object such as
an image, varchar(max), nvarchar(max).
Lob physical read: is same as physical read for lob pages.
Lob read-ahead reads: This number specifies Number
of text, ntext, image or large value type pages placed into the cache for the
query.

Now lets follow the below steps and add an index into the table and check the
performance.

Figure 7: Table Design View and Add Indexes/Keys

Figure 8: Adding an Index and setting Create as Clustered Index as True

Now lets again run our execution plan and check is it again using Table Scan?

Figure 9: Demonstrating Index scan now.

Now lets run our performance statistics and check is there any difference?

(1 row(s) affected)
Table 'CityName'. Scan count 1, logical reads 2 and physical reads 0, read-ahead reads 0, lob logical
reads 0, lob physical reads 0, lob read-ahead reads 0.

We can see the difference before the logical reads were 26 and now they have
decreased to 2. So we can say there is a performance improvement by using an
Index.

Lets talk about Clustered and Non Clustered Index.

3. CLUSTERED INDEX:
Clustered Index determines the physical order of data in the table. In Clustered
Index the rows are stored physically on the disk in the same order as the Index. So
therefor there can only be one Clustered Index because the records can only be
ordered in one order.

Figure 10: Structure of a clustered index in a single partition

Image Source: https://technet.microsoft.com/enus/library/ms177443(v=sql.105).aspx

By the statement Clustered Index determines the physical order of data in the table
means that data inserted in the table is order wise reference to the Index Column.
That means that if we insert data in unordered way still SQL
server will insert
in order wise according to the index. If the table is not a Clustered Index, its rows
are inserted in unordered structure known as heap.
Let take an example and create two same table with different name and create
index in one table and leave the second and try to insert the values in the table and
fetch the same.

CREATE TABLE tblOrder


(
order_id int,
sku_id int,
description varchar(50),

mrp decimal,
sp decimal
)
CREATE TABLE tblOrderWithIndex
(
order_id int primary key,
sku_id int,
description varchar(50),
mrp decimal,
sp decimal
)

To check the Index in the table we can use the following command
Exec Sp_helptIndex tableName

Figure 11: Demonstrating Index created in SQL Server

So here we have created two tables, one with primary key and leaving the other
table without an index. Lets now insert some values into the table.
Inserting values in tblOrderWithIndex table
insert into tblOrderWithIndex values(10,1234,'Apple Iphone 5',23000,22000);
insert into tblOrderWithIndex values(1,1235,'Samsung Note 3',21000,20000);
Lets run the select command to check how the records are being retrieved in an order or not?

Figure 12: Order wise records are retrieved from the database.

Lets try the same with table without index.


insert into tblOrder values(10,1234,'Apple Iphone 5',23000,22000);
insert into tblOrder values(1,1235,'Samsung Note 3',21000,20000);

Figure 13. Checking if there is any Index present in table tblOrder.

Figure 14. Un Ordered records are retrieved from the table.

I.e the Clustered Index stores the records in an ordered format while if there is no
index the records are not stored in order format.
We can have only one Clustered Index but however the index can have multiple
columns can be termed as (Composite Index)

In order to create the Composite Index, we just add the column name to the Index
CREATE CLUSTERED INDEX In_tblOrderWithIndex_skui_id_description
on tblOrderWithIndex(description asc,sku_id asc)

If we try to execute the same we will face an error as mentioned below:


Msg 1902, Level 16, State 3, Line 2
Cannot create more than one clustered index on table 'tblOrderWithIndex'.
Drop the existing clustered index 'PK__tblOrder__46596229992B8137'
before creating another.
So we need to first drop the first Index and create the new composite index.

Figure 15. Deleting Index.

Delete the Index and create the composite index keys


CREATE CLUSTERED INDEX In_tblOrderWithIndex_skui_id_description
on tblOrderWithIndex(description asc,sku_id asc)

Now lets execute our select command to retrieve the rows and check how the
records are now stored.

Figure 16. Fetching records with Composite Clustered Index.

We can see that now we are getting records based on sku id and description in asc
mode.

4. Non-clustered Index
Non-clustered index is a pointer to the data. The data is stored in one place, the
index in another place. The index will have a pointer to the location of the data.
Since, the data is stored in another place than the actual data we can have a
different Non-Clustered Index. We can determine the following as shown below
where we have a pointer table with a row address pointing to the actual data.
order_
id
2
10
1

sku_id
1233
1234
1235

Description
Apple Iphone 4
Apple Iphone 5
Samsung Note 3

mrp
11000
23000
21000

sp
10000
22000
20000

Table 1. Actual Data Stored.

order_id

ROW
2 ROW
10 ROW
1 ROW

Locator
ADDRESS
ADDRESS
ADDRESS

Table 2. Reference Table pointing towards the Actual Table rows address.

Figure 17. Non Clustered Index pointing towards Clustered Index

Image source: https://technet.microsoft.com/en-us/library/ms177484(v=sql.105).aspx

A Non Clustered Index can be typically be imagined as Book Index.

Img source: https://www.prismnet.com/~hcexres/textbook/images/print_index.gif


Figure 18. Book Index Page.

We can have a many Non-Clustered Index as shown in above diagram where we


have a number of Indexes. In a Non-Clustered Index the data is stored in the
ascending and descending order of the Non-Clustered Index key. The Non-Clustered
Index doesnt influence the way records get inserted in the table. The leaf node of
the Non-Clustered index is made up of index pages rather than data pages.

Concluding the Clustered and Non-Clustered.

We can have only one clustered Index while we can have as many Nonclustered Index.
Clustered Index are faster than the Non Clustered as Clustered Index
determines the physical order of the records.

Disadvantage of Clustered Index: As we have already discussed about the


advantages of Clustered Index lets discuss the disadvantage of Clustered
index.
As we know that in Clustered Index determines the physical order of data in
the table. So if update a record and change the value of the indexed column,
the database will need to move an entire column to a new position to make
sure the every row is sorted. So the resulting in update query into a delete
query followed by an Insert query which obviously leads to decrease in
performance.

References
http://odetocode.com/articles/70.aspx
https://www.simple-talk.com/content/article.aspx?article=934