
Merge join Vs Hash join Vs Nested loop join
by Muthukkumaran kaliyamoorthy Published on: October 5, 2011
Comments: 25 Comments
Tags: Merge join Vs Hash join Vs Nested loop join, Physical join type
Categories: Performance, SQL party

This month's T-SQL Tuesday party is being hosted by Stuart R Ainsworth (Blog | Twitter). I am very glad to write my first T-SQL Tuesday post on my newly designed website.
SQL Server has three types of internal (physical) joins. I know most folks have never heard of these join types, because they are not logical joins and are rarely written explicitly in code.
So when will each one be used?
Well, the answer is: it depends.
It depends on the record sets and the indexes. The query optimizer is cost based: it estimates the cost of the candidate plans and tries to pick the cheapest physical join for the query.
How does the query optimizer choose the join type internally?
Well, an algorithm built into the query optimizer chooses the join type.
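To make the idea of a cost-based choice concrete, here is a toy sketch in Python. It is not SQL Server's cost model: the function names, the log2 estimate for an index seek, and the factor of 2 for building a hash table are all assumptions chosen only to show how row counts, indexes and sort order can change which join looks cheapest.

```python
import math


def toy_join_costs(outer_rows, inner_rows, inner_indexed=False, both_sorted=False):
    """Rough operation counts per physical join (toy model, NOT SQL Server's)."""
    # Nested loop: probe the inner input once per outer row.
    # Assumption: an index seek costs about log2(n); without an index it is a full scan.
    if inner_indexed and inner_rows > 1:
        probe = math.log2(inner_rows)
    else:
        probe = inner_rows
    nested_loop = outer_rows * max(probe, 1)

    # Merge: one pass over both inputs, plus a sort when they are not already ordered.
    sort_cost = 0
    if not both_sorted:
        sort_cost = sum(n * math.log2(n) for n in (outer_rows, inner_rows) if n > 1)
    merge = outer_rows + inner_rows + sort_cost

    # Hash: build a hash table on the smaller input, probe with the larger one.
    # Assumption: the factor 2 is an arbitrary stand-in for build overhead.
    hash_cost = 2 * min(outer_rows, inner_rows) + max(outer_rows, inner_rows)

    return {"nested loop": nested_loop, "merge": merge, "hash": hash_cost}


def cheapest(costs):
    """Return the name of the join with the lowest toy cost."""
    return min(costs, key=costs.get)
```

With two unsorted, unindexed inputs of 250600 rows each, this toy model favours the hash join; with both inputs indexed and sorted on the join key, it favours the merge join. The real optimizer works from statistics and a far more detailed model.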
Let's go through some practical examples and then summarize the results. First I will give a basic idea of how the joins work, and of when and how the optimizer decides to use each of the internal (physical) joins.

It depends upon the table size
It depends upon the index on the join column
It depends upon the sorted order of the join column
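As a rough illustration of why these three factors matter, the three physical joins can be sketched in a few lines of Python. This is a minimal sketch of the textbook algorithms, not SQL Server's implementation; rows are assumed to be tuples whose first element is the join key, and the merge join assumes both inputs are sorted on a unique key (as with a unique clustered index on id).

```python
from collections import defaultdict


def nested_loop_join(outer, inner):
    """Compare every outer row with every inner row."""
    return [(o, i) for o in outer for i in inner if o[0] == i[0]]


def merge_join(left, right):
    """Walk two inputs in step; both must be sorted on a unique join key."""
    result, l, r = [], 0, 0
    while l < len(left) and r < len(right):
        if left[l][0] == right[r][0]:
            result.append((left[l], right[r]))
            l += 1
            r += 1
        elif left[l][0] < right[r][0]:
            l += 1  # left key is behind, advance left
        else:
            r += 1  # right key is behind, advance right
    return result


def hash_join(build, probe):
    """Build a hash table on one input, then probe it with the other."""
    buckets = defaultdict(list)
    for b in build:
        buckets[b[0]].append(b)
    return [(b, p) for p in probe for b in buckets.get(p[0], [])]


# Hypothetical sample rows: (id, name)
rows_a = [(1, "a"), (2, "b"), (3, "c")]
rows_b = [(2, "x"), (3, "y"), (4, "z")]
```

All three functions return the same matching pairs for rows_a and rows_b. The nested loop does work proportional to the product of the two row counts, so it suits small inputs. The merge join makes a single pass but needs sorted inputs, which an index on the join column provides. The hash join needs neither an index nor sorted input, but it has to build a hash table first.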
Test:
The tests were run with the following configuration.
RAM: 4 GB
Server: SQL Server 2008 (RTM)

create table tableA (id int identity, name varchar(50))
declare @i int
set @i = 0
-- load the rows of spt_values 100 times to build a big table
while (@i < 100)
begin
    insert into tableA (name)
    select name from master.dbo.spt_values
    set @i = @i + 1
end
-- select COUNT(*) from dbo.tableA --250600
go
create table tableB (id int identity, name varchar(50))
declare @i int
set @i = 0
while (@i < 100)
begin
    insert into tableB (name)
    select name from master.dbo.spt_values
    set @i = @i + 1
end
-- select COUNT(*) from dbo.tableB --250600
-- join the two big tables on id
select * from dbo.tableA A join tableB B
on (A.id = B.id)

Test1: Without Index

Let's create a clustered index on each table.

create unique clustered index cx_tableA on tableA (id)
create unique clustered index cx_tableB on tableB (id)

Test1: With Index

If only one of the two tables is indexed, the optimizer uses a hash join. I haven't shown that plan here. You can drop the index on either table and test it.
Test2: Without Index
Let's create a medium table.

create table tableC (id int identity, name varchar(50))
insert into tableC (name)
select name from master.dbo.spt_values
-- select COUNT(*) from dbo.tableC --2506
create table tableD (id int identity, name varchar(50))
insert into tableD (name)
select name from master.dbo.spt_values
-- select COUNT(*) from dbo.tableD --2506
-- join the two medium tables on id
select * from dbo.tableC C join tableD D
on (C.id = D.id)

Test2: With Index


Let's create a clustered index on each table.

create unique clustered index cx_tableC on tableC (id)
create unique clustered index cx_tableD on tableD (id)

If only one of the two tables is indexed, the optimizer uses a merge join. I haven't shown that plan here. You can drop the index on either table and test it.
Test3: Without Index

create table tableE (id int identity, name varchar(50))
insert into tableE (name)
select top 10 name from master.dbo.spt_values
-- select COUNT(*) from dbo.tableE --10
create table tableF (id int identity, name varchar(50))
insert into tableF (name)
select top 10 name from master.dbo.spt_values
-- select COUNT(*) from dbo.tableF --10
-- join the two small tables on id (query assumed by analogy with Test1 and Test2)
select * from dbo.tableE E join tableF F
on (E.id = F.id)

Let's create a clustered index on each table.

create unique clustered index cx_tableE on tableE (id)
create unique clustered index cx_tableF on tableF (id)

Test3: With Index

If only one of the two tables is indexed, the optimizer uses a nested loop join. I haven't shown that plan here. You can drop the index on either table and test it.
You can also join tables of different sizes, such as big vs medium, big vs small and medium vs small.
-- big vs medium
select * from dbo.tableA A join tableC C
on (A.id = C.id)
-- big vs small
select * from dbo.tableA A join tableE E
on (A.id = E.id)
-- medium vs small
select * from dbo.tableC C join tableE E
on (C.id = E.id)

In these cases, if both tables are indexed the optimizer uses a nested loop join. If neither table is indexed, it uses a hash join. If only one of the two tables is indexed, it uses a nested loop join. I haven't shown these plans here.
You can still force the optimizer to use any one of the internal joins, but it is not good practice. The query optimizer is smart and will dynamically choose the best one.
Here I have used the merge hint, so the optimizer uses a merge join instead of a hash join (Test1 without an index).

-- query-level hint
select * from dbo.tableA A join tableB B
on (A.id = B.id) option (merge join)
-- join-level hint
select * from dbo.tableA A inner merge join tableB B
on (A.id = B.id)

Table 1: Test uses a unique clustered index

From the table diagram:

If neither table has an index, the query optimizer will choose a hash join internally.

If both tables have indexes, the query optimizer will choose a merge join (for big tables) or a nested loop join (for small tables) internally.

If only one of the two tables has an index, the query optimizer will choose a merge join (for medium tables), a hash join (for big tables) or a nested loop join (for small tables and for big vs small tables) internally.
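The observations above can be collected into a small lookup, sketched here in Python. It only records what this post reports for the unique clustered index tests on SQL Server 2008; it is not the optimizer's decision logic. The names OBSERVED_JOIN and observed_join are made up for illustration, and combinations the post does not state return None.

```python
# (index state, table size) -> join type reported in this post.
OBSERVED_JOIN = {
    ("none", "big"): "hash",
    ("none", "medium"): "hash",
    ("none", "small"): "hash",
    ("none", "big vs small"): "hash",
    ("both", "big"): "merge",
    ("both", "small"): "nested loop",
    ("both", "big vs small"): "nested loop",
    ("either", "big"): "hash",
    ("either", "medium"): "merge",
    ("either", "small"): "nested loop",
    ("either", "big vs small"): "nested loop",
}


def observed_join(index_state, table_size):
    """Return the join type the post reports, or None if it does not say."""
    return OBSERVED_JOIN.get((index_state, table_size))
```

For example, observed_join("both", "big") gives "merge", while observed_join("both", "medium") gives None because the summary does not cover that case.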
Table 1: Test using clustered index
(create clustered index cx_tableA on tableA (id))

Table size               With index (both)   Without index (both)   Either table has index
Big (both)               HASH                HASH                   HASH
Medium (both)            HASH                HASH                   HASH
Small (both)             NESTED LOOP         NESTED LOOP            HASH
Big vs small (medium)    HASH                HASH                   HASH
