Summary of the Most Commonly Used Join Algorithms

The following table summarizes some of the major characteristics of the most commonly used
join algorithms:
Join Method / Important Properties
Product
• Always selected by the Optimizer for WHERE clause inequality conditions.
• High cost because of the number of comparisons required.
Merge/Hash
• When done on matching primary indexes, do not require any data to be redistributed.
• Hash joins are often better performers and should be used whenever possible.
Nested
• Only join expression that generally does not require all AMPs.
• Preferred join expression for OLTP applications.

Product Join
Definition
A product join compares every qualifying row from one table to every qualifying row from the
other table and saves the rows that match the WHERE condition. This operation is called a
product join because the number of comparisons needed is the product of the number of
qualifying rows in the two tables.
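The comparison count described above can be illustrated with a minimal Python sketch. The function name `product_join`, the sample rows, and the inequality condition are all hypothetical and chosen only for illustration; this is not how the database implements the join internally.

```python
from itertools import product


def product_join(left_rows, right_rows, condition):
    """Compare every left row with every right row and keep the matching pairs."""
    comparisons = 0
    matches = []
    for left, right in product(left_rows, right_rows):
        comparisons += 1  # one comparison per (left, right) pair
        if condition(left, right):
            matches.append((left, right))
    return matches, comparisons


# Hypothetical qualifying rows and an inequality join condition.
left_rows = [1, 5, 9]
right_rows = [2, 4, 6, 8]
matches, comparisons = product_join(left_rows, right_rows, lambda l, r: l < r)

# The number of comparisons is the product of the two row counts.
print(comparisons == len(left_rows) * len(right_rows))
```

Because the condition is an arbitrary predicate, the same loop handles inequality and ORed conditions, which is why this method can resolve any combination of join conditions.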
Cost of a Product Join
Product joins are more time-consuming than other types of joins because of the number of
comparisons that must be made. The Optimizer uses product joins under the following
conditions:
• The join condition is not based on equality
• The join conditions are ORed
• It is less costly than other join forms
The product join is usually the most costly in terms of system resources, and is used only when
there is no more efficient method, such as a merge join or a nested join. However, a product join
is useful because it can resolve any combination of join conditions.

Merge Join
Definition
A merge join retrieves rows from two tables and then puts them onto a common AMP based on
the row hash of the columns involved in the join. The system sorts the rows into join column row
hash sequence, then joins those rows that have matching join column row hash values.
In a merge join, the columns on which tables are matched are also the columns on which both
tables, or redistributed spools of tables, are ordered. A merge join is generally more efficient than
a product join because it requires fewer comparisons and because blocks from both tables are
read only once.
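A rough single-process sketch of the sort-and-merge idea follows. It assumes rows are `(key, payload)` tuples and uses `zlib.crc32` as a stand-in for the system's row hash function; both choices are assumptions made for illustration, and the redistribution of rows to a common AMP is not modeled.

```python
import zlib


def row_hash(value):
    # Stand-in for the system row hash (an assumption, not the real function).
    return zlib.crc32(repr(value).encode("utf-8"))


def merge_join(left_rows, right_rows):
    """Join (key, payload) rows by sorting on row hash and merging in one pass."""
    left = sorted(left_rows, key=lambda r: row_hash(r[0]))
    right = sorted(right_rows, key=lambda r: row_hash(r[0]))
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lh, rh = row_hash(left[i][0]), row_hash(right[j][0])
        if lh < rh:
            i += 1
        elif lh > rh:
            j += 1
        else:
            # Find the run of right rows that share this row hash value.
            j_end = j
            while j_end < len(right) and row_hash(right[j_end][0]) == lh:
                j_end += 1
            # Join each left row in the same hash run against that right run.
            while i < len(left) and row_hash(left[i][0]) == lh:
                for k in range(j, j_end):
                    if left[i][0] == right[k][0]:  # guard against hash collisions
                        result.append((left[i][0], left[i][1], right[k][1]))
                i += 1
            j = j_end
    return result
```

Each input is traversed once after sorting, which mirrors the point that a merge join needs far fewer comparisons than a product join.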
Two different merge join algorithms are available:
• Slow path: used when the left table is accessed using a read mode other than an all-rows
scan. The determination is made in the AMP, not by the Optimizer.
• Fast path: used when the left table is accessed using the all-rows scan reading mode.

Hash Join
Introduction
Hash Join is a method that performs better than merge join under certain conditions. The
performance gain with the classic hash join comes mainly from eliminating the need for sorting
the join tables before performing the join.
Two different hash join algorithms are available:
• Classic
• Direct table
Classic Hash Join
Stage / Process
1. Read a row from the right table, which is an unsorted spool file containing the row hash
value for each row as well as its row data.
2. Match each right row with all the left table rows having the same row hash.
3. Join the rows.
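The three stages above can be sketched in Python as follows. The sketch assumes the left rows fit in an in-memory table keyed by row hash, uses Python's built-in `hash` as a stand-in for the system row hash, and models the right spool as `(row_hash, key, payload)` tuples; all of these are illustrative assumptions rather than details taken from the product.

```python
from collections import defaultdict


def classic_hash_join(left_rows, right_spool):
    """Probe an in-memory hash table of left rows with unsorted right spool rows."""
    # Assumption: the left table is small enough to hold in memory by row hash.
    buckets = defaultdict(list)
    for key, payload in left_rows:
        buckets[hash(key)].append((key, payload))

    joined = []
    for r_hash, r_key, r_payload in right_spool:  # stage 1: read a right row
        for l_key, l_payload in buckets.get(r_hash, []):  # stage 2: same row hash
            if l_key == r_key:  # confirm the join values really match
                joined.append((r_key, l_payload, r_payload))  # stage 3: join
    return joined


# Hypothetical data; the spool carries each row's hash alongside its data.
left_rows = [(10, "L10"), (20, "L20"), (20, "L20b")]
right_spool = [(hash(k), k, p) for k, p in [(20, "R20"), (30, "R30"), (10, "R10")]]
joined = classic_hash_join(left_rows, right_spool)
```

Note that neither input is sorted at any point, which is the source of the performance gain over a merge join.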

Nested Join
Definition
A nested join is a join for which the WHERE conditions specify a constant value for a unique
index in one table and those conditions also match some column of that single row to the primary
or secondary index of the second table.
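As a minimal sketch of this definition, the Python below models a unique index as a dictionary from index value to a single row, and the second table's index as a dictionary from index value to a list of rows. The table contents and the names `employee_by_emp_no` and `department_by_dept_no` are hypothetical.

```python
def nested_join(left_unique_index, constant, join_column, right_index):
    """Fetch one left row by a constant, then look up right rows through an index."""
    left_row = left_unique_index.get(constant)  # at most one row qualifies
    if left_row is None:
        return []
    join_value = left_row[join_column]
    # Only the right rows reachable through the index are touched.
    return [(left_row, right_row) for right_row in right_index.get(join_value, [])]


# Hypothetical tables: employees indexed uniquely by emp_no, departments by dept_no.
employee_by_emp_no = {
    101: {"emp_no": 101, "dept_no": 7},
    102: {"emp_no": 102, "dept_no": 9},
}
department_by_dept_no = {
    7: [{"dept_no": 7, "name": "Sales"}],
    8: [{"dept_no": 8, "name": "Support"}],
}

pairs = nested_join(employee_by_emp_no, 101, "dept_no", department_by_dept_no)
```

Neither table is scanned: the work is two index lookups, which is consistent with the observation that this method generally does not require all AMPs.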

Types of Nested Join
There are two types of nested joins: local and remote.

Local Nested Joins
Definition
Use of a local nested join implies several things.
• If necessary, the resulting rows of a nested join are redistributed by row hashing the
rowID of the right table rows.
• The rowID is used to retrieve the data rows from the right table.
Only local nested joins can result in a rowID join. A rowID join is needed if and only if a nested
join is carried out and only rowIDs for rows in the right table are retrieved.

Remote Nested Joins
Definition
Remote nested joins are used for the case in which a WHERE condition specifies a constant
value for a unique index of one table, and the conditions might also match some column of that
single row to the primary or secondary index of a second table.
The expression remote nested join implies that a message is to be sent to another AMP to get the
rows from the right table.
A remote nested join does not always use all AMPs. For this reason, it is the most efficient join in
terms of system resources and is almost always the best choice for OLTP applications.

Exclusion Join
Definition
An exclusion join is a product or merge join where only the rows that do not satisfy (are NOT IN)
any condition specified in the request are joined. In other words, an exclusion join finds rows in
the first table that do not have a matching row in the second table. Exclusion joins are an implicit
form of the outer join.
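The result of an exclusion join can be sketched in Python as shown below. For brevity the sketch uses a set lookup rather than product or merge mechanics, and it ignores the NULL handling rules of NOT IN; the function name and sample values are hypothetical.

```python
def exclusion_join(left_rows, right_rows, key=lambda row: row):
    """Return the left rows that have no matching row in the right table."""
    right_keys = {key(row) for row in right_rows}
    return [row for row in left_rows if key(row) not in right_keys]


# Hypothetical data: customer numbers, and customer numbers that placed orders.
customers = [1, 2, 3, 4]
customers_with_orders = [2, 4, 4]
without_orders = exclusion_join(customers, customers_with_orders)
```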

Inclusion Join
Definition
An inclusion join is a product or merge join where the first right table row that matches the left row
is joined.
There are two types of inclusion join.
• Inclusion merge
• Inclusion product
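A small sketch of the inclusion product variant, under the assumption that rows are plain Python values and the join condition is a caller-supplied predicate (both hypothetical): the inner loop stops at the first right row that matches, so each left row is joined at most once.

```python
def inclusion_product_join(left_rows, right_rows, condition):
    """Join each left row with the first right row that satisfies the condition."""
    result = []
    for left in left_rows:
        for right in right_rows:
            if condition(left, right):
                result.append((left, right))
                break  # stop at the first match; later matches are ignored
    return result


# Hypothetical data: the duplicate 2 in the right table yields only one result row.
included = inclusion_product_join([1, 2, 3], [2, 2, 3, 5], lambda l, r: l == r)
```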

RowID Join
Introduction
The RowID join is a special form of the nested join. The Optimizer selects a RowID join instead of
a nested join when the first condition in the query specifies a literal for the first table. This value is
then used to select a small number of rows which are then equijoined with a secondary index
from the second table.

Correlated Joins
Introduction
Correlated joins constitute a class of join methods developed to process correlated
subqueries. Some correlated joins are extensions of the following more general join types:
• Inclusion Merge
• Exclusion Merge
• Inclusion Product
• Exclusion Product
For each of these types the right table is a collection of groups and a left row can be returned
once for each group. Other members of the correlated join family are unique types. The following
graphic illustrates the generic correlated join process:

Minus All Join
Definition
The Minus All join method is used to implement MINUS, INTERSECT, and outer joins.
The process applied by the minus all join algorithm is provided in the following table:

Stage / Process
1. Distribute and sort the left and right tables based on their column_1 values.
2. For each left table row, start at the current right table row and read until a row having a
value >= the left table column_1 value is found.
3. If the right table row column_1 > the left table row column_1 or if no more right table rows
are found, then return the left table row.
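The stages above can be sketched for a single column in Python. Distribution across AMPs is not modeled, and one detail is an assumption not stated in the table: when the right value equals the left value, the sketch consumes that right row so that each right row cancels at most one left row.

```python
def minus_all(left_values, right_values):
    """Return left values that are not cancelled by a matching right value."""
    # Sort both inputs on column_1 (distribution is not modeled).
    left = sorted(left_values)
    right = sorted(right_values)
    result = []
    j = 0  # position of the current right table row
    for value in left:
        # Read right rows until one with a value >= the left value is found.
        while j < len(right) and right[j] < value:
            j += 1
        # Return the left row if the right value is greater or no rows remain.
        if j == len(right) or right[j] > value:
            result.append(value)
        else:
            j += 1  # assumption: an equal right row is consumed by this left row
    return result


# Hypothetical column_1 values for the left and right tables.
remaining = minus_all([1, 2, 2, 3, 5], [2, 3, 4])
```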
