Advanced Teradata Concepts

Last Updated : 21st Dec 2004
Center of Excellence
Data Warehousing
Topics
Primary and Secondary Indexes
Join Processing
Join Indexes
Hash Indexes
Partitioned Primary Indexes
Collect Statistics
Priority Scheduler
Teradata Dual Active Server
Primary and Secondary Indexes
Indexes
Teradata provides numerous indexing
options that can improve query
performance for different types of queries
and workloads. The following kinds of indexes
are available:
Primary Index
Secondary Indexes
Join Indexes
Hash Indexes
Partitioned Primary Indexes.
Primary Indexes
In Teradata, the Primary Index is the mechanism
that assigns each data row to an AMP and stores it there.
Because the primary index determines where rows are
stored, retrieving data by primary index is
very efficient.
A Primary Index can be Unique or Non-Unique.
Choosing the Primary Index is critical because it
determines how data is distributed across the
processing units (AMPs) and hence affects
performance.
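The idea of hashing a primary index value to pick an AMP can be sketched in a few lines of Python. This is only an illustration: `zlib.crc32` stands in for Teradata's actual row-hash function, and `NUM_AMPS`, `amp_for` and `distribute` are hypothetical names, not part of any Teradata interface.

```python
import zlib

NUM_AMPS = 4  # hypothetical system size, chosen for illustration


def amp_for(pi_value, num_amps=NUM_AMPS):
    """Map a primary index value to an AMP number (crc32 is a stand-in hash)."""
    row_hash = zlib.crc32(repr(pi_value).encode("utf-8"))
    return row_hash % num_amps


def distribute(rows, pi_column):
    """Group rows by the AMP that owns their primary index value."""
    amps = {amp: [] for amp in range(NUM_AMPS)}
    for row in rows:
        amps[amp_for(row[pi_column])].append(row)
    return amps


employees = [{"EmpNo": n, "DeptNo": 100 * (n % 3 + 1)} for n in range(1000, 1020)]
by_amp = distribute(employees, "EmpNo")

# A lookup on the PI value hashes straight to the single AMP holding the row.
target = amp_for(1007)
found = [row for row in by_amp[target] if row["EmpNo"] == 1007]
```

Because the same hash is used for storing and for retrieving, a primary index lookup only has to visit one AMP.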
Primary Index Choice Criteria

Access Demographics: Choose the column
most frequently used for access to
maximize the number of one-AMP
operations.
Distribution Demographics: Better
distribution optimizes parallel processing.
Volatility: Changing a PI value may cause the row
itself to be moved to another AMP. A stable
PI reduces data movement overhead.
UPI and NUPI

UPI
Best distribution due to unique value.
One AMP operation and uses only one I/O.
Best performance.
NUPI
Good distribution for near unique values.
Duplicate PI rows go to same block. No extra
I/O if all duplicate rows fit in single block.

Duplicate row check required if there is no
USI defined.
Multiple I/Os required if rows do not fit in a
single data block.
UPI and NUPI (cont.)

Highly non-unique values cause skewed distribution.
Highly non-unique values cause extra overhead in
duplicate row check.
A multi-column PI gives better distribution,
but as the number of columns increases the index
becomes less usable.
Partial values cannot be used for PI access.
Do not include a column in the index if it
does not improve the selectivity of the index.
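The effect of value uniqueness on distribution can be checked with a small simulation. This is a sketch under stated assumptions: `zlib.crc32` replaces the real hashing algorithm, four AMPs are assumed, and the key lists are invented test data.

```python
import zlib
from collections import Counter


def amp_for(value, num_amps=4):
    """Stand-in for the row hash: map a PI value to one of num_amps AMPs."""
    return zlib.crc32(repr(value).encode("utf-8")) % num_amps


def rows_per_amp(values, num_amps=4):
    """Count how many rows each AMP would receive for the given PI values."""
    counts = Counter(amp_for(v, num_amps) for v in values)
    return [counts.get(amp, 0) for amp in range(num_amps)]


unique_keys = list(range(10_000))               # UPI-like: every value distinct
skewed_keys = [1] * 9_000 + list(range(1_000))  # highly non-unique: one value dominates

unique_spread = rows_per_amp(unique_keys)
skewed_spread = rows_per_amp(skewed_keys)
```

With the skewed keys, all 9,000 rows sharing one value land on the same AMP, so that AMP does most of the work while the others sit idle.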
Secondary Indexes
Secondary Index values are stored in sub tables.
May be unique or non unique.
Teradata implements USI and NUSI differently.
[Figure: a secondary index value passes through the hash algorithm to the index subtable, whose rows pair each SI value with a base table (BT) row ID that points into the base table.]
Secondary Indexes
USIs are hash distributed across all AMPs.
A sub-table row may reside on an AMP other than
the one holding the base table row.
USI access is a two-AMP operation.
NUSIs are implemented on an AMP-local basis.
Sub-table rows are located on the same AMP as
their base table rows.
NUSI access is an all-AMP operation.
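The difference between the two access paths can be sketched as follows. This is a simplified model, not Teradata's implementation: `zlib.crc32` stands in for the hashing algorithm, the base table row ID is reduced to the primary index value, and all table contents are invented.

```python
import zlib

NUM_AMPS = 4  # hypothetical system size


def amp_for(value):
    """Stand-in for the hashing algorithm."""
    return zlib.crc32(repr(value).encode("utf-8")) % NUM_AMPS


# Base table hashed on EmpNo (the primary index). Name is unique, DeptNo is not.
rows = [(1000 + i, f"emp{i}", 100 * (i % 3 + 1)) for i in range(12)]
base = {amp: {} for amp in range(NUM_AMPS)}
for emp_no, name, dept in rows:
    base[amp_for(emp_no)][emp_no] = (emp_no, name, dept)

usi = {amp: {} for amp in range(NUM_AMPS)}   # USI on Name: hashed on the index value
nusi = {amp: {} for amp in range(NUM_AMPS)}  # NUSI on DeptNo: local to each AMP
for amp, local_rows in base.items():
    for emp_no, name, dept in local_rows.values():
        usi[amp_for(name)][name] = emp_no
        nusi[amp].setdefault(dept, []).append(emp_no)


def usi_lookup(name):
    """Visit the AMP owning the index row, then the AMP owning the base row."""
    index_amp = amp_for(name)
    emp_no = usi[index_amp][name]
    base_amp = amp_for(emp_no)
    return base[base_amp][emp_no], {index_amp, base_amp}


def nusi_lookup(dept):
    """Every AMP searches its own sub-table and returns its local base rows."""
    hits = []
    for amp in range(NUM_AMPS):
        hits += [base[amp][e] for e in nusi[amp].get(dept, [])]
    return hits, set(range(NUM_AMPS))


usi_row, usi_amps = usi_lookup("emp5")
nusi_rows, nusi_amps = nusi_lookup(200)
```

The USI lookup touches at most two AMPs, while the NUSI lookup involves all of them even though each AMP only reads its local sub-table.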
Secondary Index Considerations

Need additional storage to hold the sub-table.
Need additional I/O.
Choose as NUSI candidates only those columns
that are frequently used for access.
If COLLECTed STATISTICS are not available,
Teradata may not choose the NUSI as the access
path.
Use the EXPLAIN facility to see the plan chosen by
the optimizer.
NUSI Bit Mapping

Used when multiple NUSIs are combined with AND
conditions.
Identifies the Row IDs common to all the NUSI conditions in the
query before retrieving the base table rows.
Multiple-column secondary indexes are less usable. Define multiple
secondary indexes to allow bit mapping.
[Figure: two NUSI sub-tables, Indx1 and Indx2, each listing SI Value and Row Id.]
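Conceptually, bit mapping is an intersection of the row-ID lists produced by each NUSI. The sketch below uses Python sets rather than real bit maps, and the two index dictionaries (`indx1`, `indx2`) hold invented values.

```python
def qualifying_row_ids(*row_id_lists):
    """Intersect the row-ID lists returned by each NUSI (the AND of all conditions)."""
    common = set(row_id_lists[0])
    for ids in row_id_lists[1:]:
        common &= set(ids)
    return sorted(common)


# Hypothetical NUSI sub-tables: index value -> row IDs of matching base rows.
indx1 = {"red": [3, 7, 9, 12], "blue": [1, 4]}
indx2 = {"L": [2, 7, 12, 15], "M": [3, 9]}

# WHERE colour = 'red' AND size = 'L': only the common row IDs are fetched.
rows_to_fetch = qualifying_row_ids(indx1["red"], indx2["L"])
```

Only the rows that satisfy every condition are read from the base table, which is why several single-column NUSIs can be more useful than one multi-column index.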
Row Access Methods

[Figure: row access methods (PI/NUPI, USI, NUSI, FTS). Labels: PI Value, USI Value, NUSI Value, Hashing Algorithm, Sub Table, Base Table.]
Value Ordered NUSI

NUSI sub-tables are local to the AMP
that holds the corresponding base table rows and, by
default, are sorted by the row hash of the
secondary index column.
Value Ordered NUSI sub-tables are sorted by
secondary index column-value rather than its
row hash.
Value Ordered NUSI are efficient for
processing queries with range conditions and
inequality conditions on the secondary index
column.
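Why value ordering helps range conditions can be seen with a sorted list and a binary search. This is a sketch only: the index rows are invented, and `row_ids_before` is a hypothetical helper name.

```python
from bisect import bisect_left
from datetime import date

# (index value, row id) pairs kept in value order, as in a value-ordered NUSI.
value_ordered = sorted([
    (date(1992, 1, 15), 11),
    (date(1992, 2, 10), 12),
    (date(1992, 3, 1), 13),
    (date(1993, 6, 30), 14),
    (date(1992, 2, 27), 15),
])


def row_ids_before(index_rows, cutoff):
    """Return row IDs whose index value is < cutoff by slicing one contiguous range."""
    keys = [value for value, _ in index_rows]
    end = bisect_left(keys, cutoff)
    return [row_id for _, row_id in index_rows[:end]]


matches = row_ids_before(value_ordered, date(1992, 2, 28))
```

In a hash-ordered sub-table the qualifying values would be scattered, so every index row would have to be examined; in value order they form one contiguous slice.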
Value Ordered NUSI (Cont.)

CREATE INDEX Indx_Shipdate(L_Shipdate) on LineItem;
SELECT * FROM LineItem WHERE L_Shipdate < DATE '1992-02-28';
3) We do an all-AMPs RETRIEVE step from TPCH.lineitem by way of an

all-rows scan with a condition of ("TPCH.lineitem.L_SHIPDATE <
DATE '1992-02-28'") into Spool 1 (group_amps), which is built
locally on the AMPs. The input table will not be cached in memory,
but it is eligible for synchronized scanning. The size of Spool 1
is estimated with high confidence to be 1,764 rows. The estimated
time for this step is 2.58 seconds.
The optimizer did not choose the secondary index. Why?
Value Ordered NUSI (Cont.)

CREATE INDEX Indx_Shipdate(L_Shipdate) ORDER BY VALUES ON
Lineitem;
SELECT * FROM Lineitem WHERE L_Shipdate < DATE '1992-02-28';
3) We do an all-AMPs RETRIEVE step from TPCH.lineitem by way of a

traversal of index # 4 with a range constraint of (
"TPCH.lineitem.Field_1035 < DATE '1992-02-28'") extracting row ids
only into Spool 2 (all_amps), which is built locally on the AMPs.
Then we do a SORT to order Spool 2 by row id eliminating duplicate
rows. The size of Spool 2 is estimated to be 1,764 rows. The
estimated time for this step is 0.04 seconds.
4) We do an all-AMPs RETRIEVE step from TPCH.lineitem by way of row
ids from Spool 2 (Last Use) with no residual conditions into Spool
1 (group_amps), which is built locally on the AMPs. The input
table will not be cached in memory, but it is eligible for
synchronized scanning. The size of Spool 1 is estimated with high
confidence to be 1,764 rows. The estimated time for this step is
2.58 seconds.
Join Processing
Each AMP performs join processing in
parallel.
Optimizer chooses best join strategy based
on
Available indexes, and
Data Demographics (Collect
Statistics/Dynamic Sampling)
Rows must be on the same AMP for matching.
Teradata temporarily moves the rows to same
AMP if they are not in the same AMP for join.
Join Processing
General Join Scenarios:
Join column is the PI of both tables.
Join column is the PI of one of the tables.
Join column is not the PI of either table.
Case 1- PI of both the tables

Rows taking part in the join are already in
the same AMP.
No data movement is necessary.
This is the best-case scenario.
Case 1 - Example
CREATE SET TABLE EMPLOYEE
(
EmpNo SMALLINT,
Name VARCHAR(12),
DeptNo SMALLINT,
JobTitle VARCHAR(12),
Salary DECIMAL(8,2),
DOB DATE
)
UNIQUE PRIMARY INDEX ( EmpNo );
CREATE SET TABLE LOCATION
(
EmpNo INTEGER,
Loc VARCHAR(25)
)
PRIMARY INDEX ( EmpNo );
SELECT E.EmpNo, E.Name, L.Loc

FROM Employee E, Location L
WHERE E.EmpNo = L.EmpNo;
Case 1 - Explain Output

1) First, we lock a distinct PERSONNEL."pseudo table" for read on a
RowHash to prevent global deadlock for PERSONNEL.LOCATION.
2) Next, we lock a distinct PERSONNEL."pseudo table" for read on a
RowHash to prevent global deadlock for PERSONNEL.EMPLOYEE.
3) We lock PERSONNEL.LOCATION for read, and we lock
PERSONNEL.EMPLOYEE for read.
4) We do an all-AMPs JOIN step from PERSONNEL.EMPLOYEE by way of a
RowHash match scan with no residual conditions, which is joined to
PERSONNEL.LOCATION. PERSONNEL.EMPLOYEE and PERSONNEL.LOCATION are
joined using a merge join, with a join condition of (
"PERSONNEL.EMPLOYEE.EmpNo = PERSONNEL.LOCATION.EmpNo"). The
result goes into Spool 1 (group_amps), which is built locally on
the AMPs. The size of Spool 1 is estimated with low confidence to
be 24 rows. The estimated time for this step is 0.04 seconds.
5) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.04 seconds.
Case 2 - PI of one of the tables

One table has its rows on the target AMP.
Rows of the other table need to be
redistributed to their target AMPs by the
hash code of the join column value.
If the table is small, the optimizer may choose
to duplicate the table on all AMPs.
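The two preparation options for Case 2, redistribution by the hash of the join column and duplication of a small table, can be sketched like this. The hash (`zlib.crc32`), the AMP count and the sample rows are assumptions made for illustration.

```python
import zlib

NUM_AMPS = 4  # hypothetical system size


def amp_for(value):
    """Stand-in for the hash code of a join column value."""
    return zlib.crc32(repr(value).encode("utf-8")) % NUM_AMPS


def redistribute(rows, join_col):
    """Send each row to the AMP chosen by the hash of its join column value."""
    spool = {amp: [] for amp in range(NUM_AMPS)}
    for row in rows:
        spool[amp_for(row[join_col])].append(row)
    return spool


def duplicate(rows):
    """Copy the whole (small) table to every AMP."""
    return {amp: list(rows) for amp in range(NUM_AMPS)}


employee = [{"EmpNo": 1000 + i, "DeptNo": 100 * (i % 4 + 1)} for i in range(8)]
department = [{"DeptNo": d} for d in (100, 200, 300, 400)]

dept_by_pi = redistribute(department, "DeptNo")  # already hashed on its PI, DeptNo
emp_spool = redistribute(employee, "DeptNo")     # moved to match the join column
dept_copies = duplicate(department)
```

After redistribution every Employee row sits on the AMP that holds its Department row, so each AMP can join its rows locally.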
Case 2 Example
CREATE SET TABLE Employee
(
EmpNo SMALLINT,
Name VARCHAR(12),
DeptNo SMALLINT,
Salary DECIMAL(8,2),
DOB DATE
)
UNIQUE PRIMARY INDEX ( EmpNo );
CREATE SET TABLE Department

(
DeptNo SMALLINT,
DeptName VARCHAR(14),
Loc CHAR(3),
MgrNo SMALLINT
)
UNIQUE PRIMARY INDEX ( DeptNo );
SELECT E.EmpNo, E.Name, D.DeptName

FROM Employee E, Department D
WHERE E.Deptno = D.DeptNo;
Case 2 Explain Output
4) We do an all-AMPs RETRIEVE step from PERSONNEL.EMPLOYEE by way of

an all-rows scan with a condition of ("(PERSONNEL.EMPLOYEE.DeptNo
>= 100) AND ((PERSONNEL.EMPLOYEE.DeptNo <= 900) AND (NOT
(PERSONNEL.EMPLOYEE.DeptNo IS NULL )))") into Spool 2 (all_amps),
which is redistributed by hash code to all AMPs. Then we do a
SORT to order Spool 2 by row hash. The size of Spool 2 is
estimated with no confidence to be 5 rows. The estimated time for
this step is 0.03 seconds.
5) We do an all-AMPs JOIN step from PERSONNEL.Department by way of a
RowHash match scan with no residual conditions, which is joined to
Spool 2 (Last Use). PERSONNEL.Department and Spool 2 are joined
using a merge join, with a join condition of ("DeptNo =
PERSONNEL.Department.DeptNo"). The result goes into Spool 1
(group_amps), which is built locally on the AMPs. The size of
Spool 1 is estimated with no confidence to be 5 rows. The
Case 3 - Not a PI of either table

Rows of both tables need to be
redistributed to their target AMPs by the
hash code of the join column value.
The optimizer might choose to duplicate the
smaller table on all AMPs.
This join scenario involves the most
data movement.
Case 3 - Example
CREATE SET TABLE Partsupp
(
PS_PARTKEY INTEGER NOT NULL,
PS_SUPPKEY INTEGER NOT NULL,
PS_AVAILQTY INTEGER NOT NULL,
PS_SUPPLYCOST DECIMAL(15,2) NOT NULL,
PS_COMMENT VARCHAR(199)
)
PRIMARY INDEX ( PS_PARTKEY );
CREATE SET TABLE Lineitem
(
L_ORDERKEY INTEGER NOT NULL,
L_PARTKEY INTEGER NOT NULL,
L_SUPPKEY INTEGER NOT NULL,
L_LINENUMBER INTEGER NOT NULL,
L_QUANTITY DECIMAL(15,2) NOT NULL
)
PRIMARY INDEX ( L_ORDERKEY );
SELECT L_Suppkey, L_Quantity

FROM Lineitem, Partsupp
WHERE L_Suppkey = Ps_Suppkey;
Case 3 Explain Output
4) We execute the following steps in parallel.

1) We do an all-AMPs RETRIEVE step from TPCH.partsupp by way of
an all-rows scan with no residual conditions into Spool 2
(all_amps), which is redistributed by hash code to all AMPs.
Then we do a SORT to order Spool 2 by row hash. The size of
Spool 2 is estimated with low confidence to be 31,938 rows.
The estimated time for this step is 0.79 seconds.
2) We do an all-AMPs RETRIEVE step from TPCH.lineitem by way of
(all_amps), which is redistributed by hash code to all AMPs.
Then we do a SORT to order Spool 3 by row hash. The result
spool file will not be cached in memory. The size of Spool 3
is estimated with low confidence to be 240,480 rows. The
Case 3 Explain Output Cont

5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a
RowHash match scan, which is joined to Spool 3 (Last Use). Spool
2 and Spool 3 are joined using a merge join, with a join condition
of ("L_SUPPKEY = PS_SUPPKEY"). The result goes into Spool 1
(group_amps), which is built locally on the AMPs. The result
spool file will not be cached in memory. The size of Spool 1 is
estimated with no confidence to be 15,661,999 rows. The estimated
time for this step is 1 minute and 1 second.
statement 1. The total estimated time is 1 minute and 7 seconds.
Join Strategies
Nested Join
Merge Join
Product Join
Nested Join
The optimizer chooses this join strategy when the query has the form:
SELECT ...
FROM Table_1, Table_2
WHERE Table_1.Col1 = Table_2.<Any Index>
AND Table_1.<Unique Index> = <value>;
Example:
SELECT E.Name, D.DeptName
FROM Employee E, Department D
WHERE E.DeptNo = D.DeptNo
AND E.Name = 'Sandy M';
Nested Join Explain Output
1) First, we do a two-AMP JOIN step from PERSONNEL.E by way of unique

index # 4 "PERSONNEL.E.Name = 'Sandy M'" with a residual condition
of ("(PERSONNEL.E.DeptNo >= 100) AND ((PERSONNEL.E.DeptNo <= 900)
AND (NOT (PERSONNEL.E.DeptNo IS NULL )))"), which is joined to
PERSONNEL.D by way of the unique primary index "PERSONNEL.D.DeptNo
= PERSONNEL.E.DeptNo". PERSONNEL.E and PERSONNEL.D are joined
using a nested join, with a join condition of ("(1=1)"). The
result goes into Spool 1 (one-amp), which is built locally on the
AMPs. The size of Spool 1 is estimated with high confidence to be
1 row. The estimated time for this step is 0.04 seconds.
Merge Join
Commonly done when the join conditions are
based on equality.
Steps
Identify the smaller table.
Put the qualifying rows from one or both tables into
spool.
Move the spool rows to the AMPs based on join column
hash (if required).
Sort the spool rows by join column hash value (if
necessary).
Compare those rows with matching join column hash
values.
Example : Case 1, Case 2 and Case 3 as described.
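The sort-and-compare part of these steps, as it would run on a single AMP, can be sketched as below. The hash function (`zlib.crc32`) and the sample rows are assumptions; rows with equal hash but different values are filtered by an explicit value comparison.

```python
import zlib


def row_hash(value):
    """Stand-in for the join column hash."""
    return zlib.crc32(repr(value).encode("utf-8"))


def merge_join(left, right, left_key, right_key):
    """Sort both inputs by join column hash, then match runs of equal hash."""
    left_sorted = sorted(left, key=lambda r: row_hash(r[left_key]))
    right_sorted = sorted(right, key=lambda r: row_hash(r[right_key]))
    result, i, j = [], 0, 0
    while i < len(left_sorted) and j < len(right_sorted):
        lh = row_hash(left_sorted[i][left_key])
        rh = row_hash(right_sorted[j][right_key])
        if lh < rh:
            i += 1
        elif lh > rh:
            j += 1
        else:
            # Find the end of the run of equal hash values on each side.
            i_end, j_end = i, j
            while i_end < len(left_sorted) and row_hash(left_sorted[i_end][left_key]) == lh:
                i_end += 1
            while j_end < len(right_sorted) and row_hash(right_sorted[j_end][right_key]) == lh:
                j_end += 1
            for l_row in left_sorted[i:i_end]:
                for r_row in right_sorted[j:j_end]:
                    if l_row[left_key] == r_row[right_key]:  # reject hash synonyms
                        result.append({**l_row, **r_row})
            i, j = i_end, j_end
    return result


employee = [{"EmpNo": 1, "DeptNo": 100}, {"EmpNo": 2, "DeptNo": 200},
            {"EmpNo": 3, "DeptNo": 100}, {"EmpNo": 4, "DeptNo": 900}]
department = [{"DeptNo": 100, "DeptName": "Sales"}, {"DeptNo": 200, "DeptName": "Ops"}]
joined = merge_join(employee, department, "DeptNo", "DeptNo")
```

Each input is read once in hash order, so no row of one table is compared against the full contents of the other.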
Product Join
Most general form of join: A X B.
The optimizer usually chooses a product join under the
following conditions:
The WHERE clause is missing.
The join condition is not based on an equality
condition.
Steps:
Identify the smaller table
Duplicate it in spool on all AMPs.
Join each spool row of the smaller table to
every row of the larger table.
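A product join on one AMP amounts to a nested loop over the duplicated small table and the local rows of the large table. The sketch below uses an inequality (range) condition; the table contents and the helper name `product_join` are invented.

```python
def product_join(small, large, condition):
    """Compare every row of the small table with every row of the large table."""
    return [(s, l) for l in large for s in small if condition(s, l)]


# Hypothetical tables: salary bands (name, low, high) and employees (name, salary).
grades = [("junior", 0, 3000), ("senior", 3000, 10000)]
salaries = [("Ann", 2500), ("Bob", 4200), ("Cy", 3000)]

# Non-equality join condition: low <= salary < high.
banded = product_join(grades, salaries, lambda g, s: g[1] <= s[1] < g[2])
comparisons = len(grades) * len(salaries)
```

The number of comparisons is the product of the two row counts, which is why this strategy becomes expensive as either table grows.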
Exclusion Merge Join

CREATE SET TABLE Employee
(
EmpNo SMALLINT,
Name VARCHAR(12),
DeptNo SMALLINT NOT NULL,
Salary DECIMAL(8,2),
DOB DATE
)
UNIQUE PRIMARY INDEX ( EmpNo );
CREATE SET TABLE Department

(
DeptNo SMALLINT NOT NULL,
DeptName VARCHAR(14),
Loc CHAR(3),
MgrNo SMALLINT
)
UNIQUE PRIMARY INDEX ( DeptNo );
SELECT EmpNo, Name, Salary

FROM Employee
WHERE DeptNo NOT IN ( SELECT DeptNo FROM Department);
Exclusion Merge Join Explain Output
4) We do an all-AMPs RETRIEVE step from PERSONNEL.employee by way of
(all_amps), which is redistributed by hash code to all AMPs. Then
we do a SORT to order Spool 2 by row hash. The size of Spool 2 is
estimated with high confidence to be 21 rows. The estimated time
for this step is 0.03 seconds.
5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
all-rows scan, which is joined to PERSONNEL.department. Spool 2
and PERSONNEL.department are joined using an exclusion merge join,
with a join condition of ("DeptNo = PERSONNEL.department.DeptNo").
The result goes into Spool 1 (group_amps), which is built locally
on the AMPs. The size of Spool 1 is estimated with index join
confidence to be 21 rows. The estimated time for this step is
0.03 seconds.
Exclusion Merge Join Example

[Figure: exclusion merge join example. Employee rows (EmpNo, DeptNo) are distributed across AMP 1 to AMP 4 and compared with the Department DeptNo values 100, 200, 300, 400, 500 and 600; the employees in departments 700 and 900 (EmpNo 1016 and 1006) are the only rows without a matching department.]
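The exclusion logic can be sketched with the (EmpNo, DeptNo) pairs and Department values from the example above. For readability this sketch sorts by the join value rather than by its row hash and ignores AMP boundaries; both columns are NOT NULL, as in the table definitions, so NULL handling for NOT IN does not arise.

```python
employee = [
    (1005, 300), (1000, 300), (1003, 200), (1001, 400), (1009, 100),
    (1002, 100), (1004, 300), (1007, 400), (1014, 500), (1006, 900),
    (1010, 200), (1008, 500), (1019, 100), (1011, 200), (1013, 400),
    (1012, 300), (1017, 300), (1015, 200), (1018, 400), (1016, 700),
]
department = [200, 400, 100, 600, 500, 300]


def exclusion_merge_join(left, right_keys):
    """Return the left rows whose join value has no match on the right."""
    left_sorted = sorted(left, key=lambda row: row[1])
    right_sorted = sorted(set(right_keys))
    result, j = [], 0
    for emp_no, dept_no in left_sorted:
        # Advance the right-hand cursor until it reaches or passes dept_no.
        while j < len(right_sorted) and right_sorted[j] < dept_no:
            j += 1
        if j == len(right_sorted) or right_sorted[j] != dept_no:
            result.append((emp_no, dept_no))
    return result


unmatched = exclusion_merge_join(employee, department)
```

Both inputs are scanned once in sorted order, and a row is kept only when the scan of Department passes its DeptNo without finding it.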
Join Indexes
A join index is an index structure that stores and
maintains the results of joining two or more tables.
The optimizer can resolve a query from the join index
rather than performing the join every time the query
is executed.
Teradata supports a variety of join indexes, such
as:
Multi-table Join Indexes
Single-table Join Indexes
Aggregate Join Indexes
Join Index Example

CREATE JOIN INDEX EmpDept AS
SELECT (e.DeptNo, d.DeptName) ,
(E.Name, E.Salary)
FROM Employee e INNER JOIN
Department d
ON
e.DeptNo = d.DeptNo;
SELECT e.Name, d.DeptName, e.Salary
FROM Employee e INNER JOIN
Department d
ON e.DeptNo = d.DeptNo
ORDER BY d.DeptName;
Does the index cover the query?

1) First, we lock a distinct PERSONNEL."pseudo table" for read on a
RowHash to prevent global deadlock for PERSONNEL.EmpDept.
2) Next, we lock PERSONNEL.EmpDept for read.
3) We do an all-AMPs RETRIEVE step from PERSONNEL.EmpDept by way of
(group_amps), which is built locally on the AMPs. The size of
Spool 1 is estimated with low confidence to be 6 rows. The
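Join Index Example

CREATE JOIN INDEX EmpDept AS
SELECT (e.DeptNo, d.DeptName) ,
(E.Name, E.Salary)
FROM Employee e INNER JOIN
Department d
ON
e.DeptNo = d.DeptNo;
SELECT e.Name, d.DeptName, e.Salary, e.YrsExp
FROM Employee e INNER JOIN
Department d
ON e.DeptNo = d.DeptNo
ORDER BY d.DeptName;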
Join Index Example

(E.Name, E.Salary)
FROM Employee e LEFT JOIN
Department d
ON
SELECT e.Name, d.DeptName, e.Salary,
Department d
SELECT e.Name, d.DeptName, e.Salary,

FROM Employee e LEFT JOIN
Department d
Does the index cover the query?
Note: A join index with outer join covers both inner join query as well as outer join query.
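At its simplest, the coverage question is a subset test between the columns a query references and the columns stored in the join index. The sketch below captures only that column test; the real optimizer also considers join type, conditions and cost, and the helper name `covers` is hypothetical.

```python
def covers(index_columns, query_columns):
    """True when every column the query references is stored in the join index."""
    return set(query_columns) <= set(index_columns)


# Columns stored by the EmpDept join index defined earlier.
emp_dept_index = {"e.DeptNo", "d.DeptName", "e.Name", "e.Salary"}

first_query = ["e.Name", "d.DeptName", "e.Salary"]
second_query = ["e.Name", "d.DeptName", "e.Salary", "e.YrsExp"]

first_covered = covers(emp_dept_index, first_query)
second_covered = covers(emp_dept_index, second_query)
```

A query that references a column outside the index, such as e.YrsExp, fails the test and needs the base tables as well.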
Join Index Example

SELECT (e.EmpNo, e.DeptNo, d.DeptName),
(e.Name, e.Salary)
FROM Employee e JOIN
Department d

SELECT e.Name, d.DeptName, e.Salary , c.Hou
FROM Employee e,
Department d,
Charges c
WHERE e.DeptNo = d.DeptNo
AND
e.EmpNo = c.EmpNo;

4) We do an all-AMPs RETRIEVE step from PERSONNEL.c by way of an
all-rows scan with no residual conditions into Spool 2 (all_amps),
estimated with low confidence to be 16 rows. The estimated time
RowHash match scan, which is joined to PERSONNEL.EMPDEPT. Spool 2
and PERSONNEL.EMPDEPT are joined using a merge join, with a join
condition of ("PERSONNEL.EMPDEPT.EmpNo = EmpNo"). The result goes
into Spool 1 (group_amps), which is built locally on the AMPs.
The size of Spool 1 is estimated with index join confidence to be
32 rows. The estimated time for this step is 0.04 seconds.
Secondary Index On Top Of Join Index
Further performance improvement can be achieved by defining a
Secondary Index on top of join index to avoid full table scan of the
join index table.
SELECT C_Name, C_Address, O_Orderdate, O_TotalPrice
FROM Customer JOIN
Ordertbl
ON
C_Custkey = O_Custkey
WHERE O_Orderdate BETWEEN 950101 AND 970101;
CREATE JOIN INDEX OrderByCust AS
SELECT (C_Name ,C_Address ),
(O_Orderdate ,O_TotalPrice)
FROM Customer INNER JOIN Ordertbl
ON C_CustKey = O_Custkey;
CREATE INDEX(O_Orderdate) ORDER BY VALUES ON OrderByCust;
COLLECT STATISTICS ON OrderByCust INDEX(O_Orderdate);
Sparse Indexes
Sparse Index can be used to index a portion of a table.
SELECT (C_Custkey, C_Name ,C_Address, O_Orderdate),
(O_TotalPrice)
WHERE O_Orderdate > DATE '2004-01-01'
PRIMARY INDEX(C_Custkey);
SELECT C_Name ,C_Address,
O_Orderdate ,O_TotalPrice
WHERE O_Orderdate BETWEEN DATE '2004-06-01' AND DATE '2004-12-31';
Shorter index table enables faster full table scan.
Join Index - Compressed Form

Uncompressed Form

SELECT C_Custkey, C_Name ,C_Address, O_Orderdate, O_TotalP
SELECT SUM(CurrentPerm) FROM DBC.TableSize
WHERE DataBaseName = 'tpch' AND
TableName = 'OrderByCust';
Sum(CurrentPerm)
----------------
4,791,296
Join Index - Compressed Form

Compressed Form

SELECT (C_Custkey, C_Name ,C_Address), (O_Orderdate, O_TotalPri
SELECT SUM(CurrentPerm) FROM DBC.TableSize

WHERE DataBaseName = 'tpch' AND
TableName = 'OrderByCust';
Sum(CurrentPerm)
----------------
1,209,856
Single-Table Join Indexes

Single-table join indexes can improve the performance of certain
kinds of joins by partially covering the query.
SELECT d.DeptName, e.Name
FROM Employee e,Department d
WHERE e.DeptNo = d.DeptNo;
4) We do an all-AMPs RETRIEVE step from PERSONNEL.e by way of an
5) We do an all-AMPs JOIN step from PERSONNEL.d by way of a RowHash
match scan with no residual conditions, which is joined to Spool 2
(Last Use). PERSONNEL.d and Spool 2 are joined using a merge join,
with a join condition of ("DeptNo = PERSONNEL.d.DeptNo"). The
Cont..

SELECT Empno, Deptno, Name
FROM Employee
PRIMARY INDEX(DeptNo);
Because the join index covers the Employee part of the query, the optimizer joins the
Department table with the join index instead of the Employee table itself.

match scan with no residual conditions, which is joined to
PERSONNEL.EmpDept. PERSONNEL.d and PERSONNEL.EmpDept are joined
using a merge join, with a join condition of (
"PERSONNEL.EmpDept.DeptNo = PERSONNEL.d.DeptNo"). The result goes
into Spool 1 (group_amps), which is built locally on the AMPs.
The size of Spool 1 is estimated with low confidence to be 18 rows.

SELECT d.DeptName, e.Name, e.Salary
FROM Employee e,Department d
WHERE e.DeptNo = d.DeptNo;
4) We do an all-AMPs RETRIEVE step from PERSONNEL.e by way of an
(Last Use). PERSONNEL.d and Spool 2 are joined using a merge join,
with a join condition of ("DeptNo = PERSONNEL.d.DeptNo"). The
Note : Optimizer went for full table scan of the Employee table instead of using Jo
because the existing join index EmpDept does not fully cover the Employee part o
ROWID can be included in the join index definition to enable ro

join for partially covered queries.
CREATE JOIN INDEX JIorders AS
SELECT (O_CUSTKEY ), (O_ORDERDATE,O_TOTALPRICE, ROWID)
FROM Ordertbl
PRIMARY INDEX (O_CUSTKEY);
SELECT C_Name, C_Address, O_Orderdate,
O_TotalPrice, O_Orderstatus
FROM Customer, Ordertbl
WHERE C_Custkey = O_Custkey
AND C_Nationkey = 10;
Cont

5) We do an all-AMPs JOIN step from TPCH.Customer by way of a RowHash
match scan with a condition of ("TPCH.Customer.C_NATIONKEY = 10"),
which is joined to TPCH.JIorders. TPCH.Customer and TPCH.JIorders
are joined using a merge join, with a join condition of (
"TPCH.Customer.C_CUSTKEY = TPCH.JIorders.O_CUSTKEY"). The input
table TPCH.JIorders will not be cached in memory. The result goes
into Spool 2 (all_amps), which is redistributed by hash code to
all AMPs. Then we do a SORT to order Spool 2 by row hash. The
size of Spool 2 is estimated with no confidence to be 9,000 rows.
6) We do an all-AMPs JOIN step from TPCH.Ordertbl by way of a RowHash
(Last Use). TPCH.Ordertbl and Spool 2 are joined using a merge
join, with a join condition of ("Field_2 = TPCH.Ordertbl.RowID").
The input table TPCH.Ordertbl will not be cached in memory, but it
is eligible for synchronized scanning. The result goes into Spool
1 (group_amps), which is built locally on the AMPs. The size of
Spool 1 is estimated with no confidence to be 9,000 rows. The
Unique Primary Index column can also be used in place of ROW

as shown in the example below.
CREATE JOIN INDEX JIorders AS
SELECT (O_CUSTKEY ), (O_ORDERDATE,O_TOTALPRICE,
O_ORDERKEY)
FROM Ordertbl
UNIQUE PRIMARY INDEX (O_CUSTKEY);
SELECT C_Name, C_Address, O_Orderdate,
O_TotalPrice, O_Orderstatus
AND C_Nationkey = 10;
Aggregate Join Index

Aggregate join indexes are used to store pre-calculated summary data.
SELECT L_PartKey, L_ShipDate, SUM(L_Quantity) AS SumQty
FROM Lineitem GROUP BY 1,2;
3) We do an all-AMPs SUM step to aggregate from TPCH.lineitem by way
of an all-rows scan with no residual conditions, and the grouping
identifier in field 1. Aggregate Intermediate Results are
computed globally, then placed in Spool 3. The input table will
not be cached in memory, but it is eligible for synchronized
scanning. The aggregate spool file will not be cached in memory.
The size of Spool 3 is estimated with low confidence to be 238,809
rows. The estimated time for this step is 10.04 seconds.
4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
an all-rows scan into Spool 1 (group_amps), which is built locally
on the AMPs. The result spool file will not be cached in memory.
The size of Spool 1 is estimated with low confidence to be 238,809

CREATE JOIN INDEX AS

3) We do an all-AMPs RETRIEVE step from TPCH.JIAggLineItem by way of

(group_amps), which is built locally on the AMPs. The input table
will not be cached in memory, but it is eligible for synchronized
scanning. The result spool file will not be cached in memory.
The size of Spool 1 is estimated with high confidence to be
238,809 rows. The estimated time for this step is 0.19 seconds.

CREATE JOIN INDEX AS
SELECT L_ShipDate, SUM(L_Quantity) AS SumQty
FROM Lineitem GROUP BY 1;
Does the index cover the query?
3) We do an all-AMPs SUM step to aggregate from TPCH.JIAggLineItem by
way of an all-rows scan with no residual conditions, and the
grouping identifier in field 1. Aggregate Intermediate Results
are computed globally, then placed in Spool 3. The input table
scanning. The size of Spool 3 is estimated with no confidence to
4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
an all-rows scan into Spool 1 (group_amps), which is built locally
on the AMPs. The size of Spool 1 is estimated with no confidence
to be 491 rows. The estimated time for this step is 0.04 seconds.
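The two uses of an aggregate join index shown above, answering the same grouping directly and re-aggregating for a coarser grouping, can be sketched as follows. The rows and the helper name `aggregate` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical Lineitem rows: (L_PartKey, L_ShipDate, L_Quantity).
lineitem = [
    (1, "1998-01-01", 5.0),
    (1, "1998-01-01", 3.0),
    (2, "1998-01-01", 4.0),
    (1, "1998-01-02", 1.0),
]


def aggregate(rows, key, value):
    """SUM(value) GROUP BY key over any iterable of rows."""
    totals = defaultdict(float)
    for row in rows:
        totals[key(row)] += value(row)
    return dict(totals)


# The aggregate join index: SUM(L_Quantity) by (L_PartKey, L_ShipDate), stored once.
ji_agg = aggregate(lineitem, key=lambda r: (r[0], r[1]), value=lambda r: r[2])

# A query grouped only by L_ShipDate can re-aggregate the smaller summary.
by_date_from_index = aggregate(ji_agg.items(), key=lambda kv: kv[0][1], value=lambda kv: kv[1])
by_date_from_base = aggregate(lineitem, key=lambda r: r[1], value=lambda r: r[2])
```

Summing the stored partial sums gives the same totals as scanning the base table, but over far fewer rows.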
Hash Indexes
Hash indexes are index file structures that share properties with single-table join indexes and
secondary indexes.
Hash indexes are like single-table join indexes, but they automatically carry the base
table primary index value.
CREATE HASH INDEX HIOrder

(O_CustKey ,
O_OrderDate,
O_TotalPrice)
ON OrderTbl
BY (O_CustKey)
ORDER BY (O_CustKey)
SELECT O_CustKey,
O_Orderdate,
O_Totalprice
FROM OrderTbl
WHERE O_CustKey > 12;
SELECT O_CustKey,
O_Orderdate,
O_Totalprice,
O_Orderstatus
FROM OrderTbl
WHERE O_CustKey > 12;
Hash Indexes
CREATE HASH INDEX HIOrder
(O_CustKey ,
O_OrderDate,
O_TotalPrice)
ON OrderTbl
BY (O_CustKey)
ORDER BY (O_CustKey)
SELECT C_Name, C_Address,
O_Orderdate, O_TotalPrice, O_Orderstatus
AND O_Custkey < 10;
Explain
Hash Indexes
5) We do an all-AMPs JOIN step from TPCH.Customer by way of an
all-rows scan with a condition of ("TPCH.Customer.C_CUSTKEY < 10"),
which is joined to TPCH.HIOrder with a range constraint of (
"TPCH.HIOrder.O_CUSTKEY <= 9") with an additional condition of (
"TPCH.HIOrder.O_CUSTKEY <= 9"). TPCH.Customer and TPCH.HIOrder
are joined using a product join, with a join condition of (
"TPCH.Customer.C_CUSTKEY = TPCH.HIOrder.O_CUSTKEY"). The input
table TPCH.HIOrder will not be cached in memory, but it is
eligible for synchronized scanning. The result goes into Spool 2
(all_amps), which is redistributed by hash code to all AMPs. Then
we do a SORT to order Spool 2 by row hash. The size of Spool 2 is
RowHash match scan, which is joined to TPCH.Ordertbl. Spool 2 and
TPCH.Ordertbl are joined using a merge join, with a join condition
of ("(Field_3 = (SUBSTRING((TPCH.Ordertbl.RowID) FROM 7 FOR 4 )))
AND (Field_2 =)"). The input table TPCH.Ordertbl will not be
cached in memory. The result goes into Spool 1 (group_amps),
which is built locally on the AMPs. The size of Spool 1 is
Hash Indexes
A hash index (like a join index) can also be used to avoid row
redistribution for join preparation.
SELECT C_Name, C_Address,

O_Orderdate, O_TotalPrice
WHERE C_Custkey = O_Custkey;
Hash Indexes
Without Hash Index defined:
4) We do an all-AMPs RETRIEVE step from TPCH.Ordertbl by way of an
SORT to order Spool 2 by row hash. The result spool file will not
be cached in memory. The size of Spool 2 is estimated with high
confidence to be 60,000 rows. The estimated time for this step is
1.20 seconds.
(Last Use). TPCH.Customer and Spool 2 are joined using a merge
join, with a join condition of ("TPCH.Customer.C_CUSTKEY =
O_CUSTKEY"). The result goes into Spool 1 (group_amps), which is
built locally on the AMPs. The size of Spool 1 is estimated with
low confidence to be 60,000 rows. The estimated time for this
step is 0.48 seconds.
The total estimated time is 1.68 seconds.
Hash Indexes
With Hash Index defined:
CREATE HASH INDEX HIOrder(O_Custkey,
O_TotalPrice,
O_OrderDate)
ON OrderTbl BY (O_CustKey) ORDER BY HASH (O_CustKey);
match scan with no residual conditions, which is joined to
TPCH.HIOrder. TPCH.Customer and TPCH.HIOrder are joined using a
merge join, with a join condition of ("TPCH.Customer.C_CUSTKEY =
TPCH.HIOrder.O_CUSTKEY"). The input table TPCH.HIOrder will not
be cached in memory. The result goes into Spool 1 (group_amps),
which is built locally on the AMPs. The size of Spool 1 is
estimated with low confidence to be 60,000 rows. The estimated
The total estimated time is 0.52 seconds.
No redistribution, no sorting. Total join time significantly reduced.
Partitioned Primary Index
Partitioned Primary Indexes

Partitioned Primary Index (PPI) allows a
class of queries to access a portion of a
large table instead of the whole table.
PPI table rows are assigned to user-defined
partitions on each AMP, enabling enhanced
performance for range queries that are
predicated on primary index values.
PPIs increase query efficiency by avoiding
full table scan without the overhead and
maintenance cost of secondary indexes.
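Partition elimination for a date range can be sketched by computing which monthly partitions a range touches. The arithmetic below assumes month-long ranges counted from a start date of 1992-01-03, numbered from 1; `partition_no` and `partitions_to_scan` are hypothetical helper names, not Teradata functions.

```python
from datetime import date

START = date(1992, 1, 3)  # assumed first day of partition 1


def partition_no(ship_date, start=START):
    """1-based number of the month-long range (counted from start) holding ship_date."""
    months = (ship_date.year - start.year) * 12 + (ship_date.month - start.month)
    if ship_date.day < start.day:
        months -= 1  # still inside the range that began in the previous month
    return months + 1


def partitions_to_scan(low, high):
    """Partitions that can contain dates between low and high, inclusive."""
    return list(range(partition_no(low), partition_no(high) + 1))


total_partitions = partition_no(date(1998, 11, 30))
# Dates after 1997-12-31, up to the last date in the table.
scan = partitions_to_scan(date(1998, 1, 1), date(1998, 11, 30))
```

Under these assumptions only the last 12 of 83 partitions need to be read for dates after 1997-12-31; the rest are skipped without being scanned.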
NON-PPI Table
Records are sorted in row hash (not shown) sequence within the AMP.
[Figure: rows of the Employee table, each showing an employee number, a two-digit code and a month/day date (01/02, 01/10, 01/12, 01/18 or 01/20), spread across the AMPs with all dates intermixed on each AMP.]
SELECT * FROM Employee WHERE EmpId = 114;

SELECT * FROM Employee WHERE EmpId BETWEEN DATE '2004-01-12'
AND DATE '2004-01-18';
PPI Table
Records are sorted in row hash (not shown) sequence in each partition within t
[Figure: the same Employee rows, grouped on each AMP into partitions by date (01/02, 01/10, 01/12, 01/18, 01/20).]
SELECT * FROM Employee WHERE EmpId = 114;

SELECT * FROM Employee WHERE EmpId BETWEEN DATE '2004-01-12'
AND DATE '2004-01-18';
PPI Example
CREATE TABLE Lineitem (
L_ORDERKEY INTEGER,
L_PARTKEY INTEGER,
L_SUPPKEY INTEGER,
L_LINENUMBER INTEGER,
L_QUANTITY DECIMAL(15,2),
L_EXTENDEDPRICE DECIMAL(15,2),
L_DISCOUNT DECIMAL(15,2),
L_TAX DECIMAL(15,2),
L_RETURNFLAG CHAR(1),
L_LINESTATUS CHAR(1),
L_SHIPDATE DATE,
L_COMMITDATE DATE,
L_RECEIPTDATE DATE,
L_SHIPINSTRUCT CHAR(25),
L_SHIPMODE CHAR(10),
L_COMMENT VARCHAR(44)
)
PRIMARY INDEX (L_ORDERKEY);
CREATE TABLE LineitemPPI (
L_ORDERKEY INTEGER,
L_PARTKEY INTEGER,
L_SUPPKEY INTEGER,
L_SHIPDATE DATE,
L_COMMITDATE DATE,
L_RECEIPTDATE DATE
)
PRIMARY INDEX (L_ORDERKEY)
PARTITION BY RANGE_N(L_ShipDate BETWEEN
DATE '1992-01-03' AND DATE '1998-11-30'
EACH INTERVAL '1' MONTH );
PPI Example
SELECT MIN(L_Shipdate), MAX(L_Shipdate) FROM Lineitem;
Minimum(L_SHIPDATE) Maximum(L_SHIPDATE)
------------------- -------------------
1992-01-03 1998-11-30
NON-PPI Table:
EXPLAIN SELECT * FROM Lineitem WHERE l_Shipdate > DATE '1997-12-31';
3) We do an all-AMPs RETRIEVE step from TPCH.lineitem by way of an

all-rows scan with a condition of ("TPCH.lineitem.L_SHIPDATE >
DATE '1997-12-30'") into Spool 1 (group_amps), which is built
locally on the AMPs. The input table will not be cached in memory,
but it is eligible for synchronized scanning. The result spool
file will not be cached in memory. The size of Spool 1 is
estimated with high confidence to be 27,783 rows. The estimated
PPI Example
PPI Table:
EXPLAIN SELECT * FROM LineitemPPI WHERE l_Shipdate > DATE '1997-12-31';
3) We do an all-AMPs RETRIEVE step from 12 partitions of

TPCH.lineitemppi with a condition of (
"TPCH.lineitemppi.L_SHIPDATE > DATE '1997-12-30'") into Spool
(group_amps), which is built locally on the AMPs. The input table
scanning. The result spool file will not be cached in memory.
The size of Spool 1 is estimated with high confidence to be 27,73
Only 12 partitions are retrieved instead of a full table scan.
PPI Example
NON-PPI Table:
EXPLAIN SELECT * FROM Lineitem WHERE L_Orderkey = 240000

1) First, we do a single-AMP RETRIEVE step from TPCH.lineitem by
way of the primary index "TPCH.lineitem.L_ORDERKEY = 240000"
with no residual conditions into Spool 1 (one-amp), which is built
locally on that AMP. The input table will not be cached in memory,
but it is eligible for synchronized scanning. The size of Spool 1 is
Rows are stored in rowhash order within an AMP. Search is very

Only one block read.
PPI Example
PPI Table:
EXPLAIN SELECT * FROM LineitemPPI WHERE L_Orderkey = 240000
1) First, we do a single-AMP RETRIEVE step from all partitions of

TPCH.lineitemppi by way of the primary index
"TPCH.lineitemppi.L_ORDERKEY = 240000" with a residual
condition of ("TPCH.lineitemppi.L_ORDERKEY = 240000")
into Spool 1 (one-amp), which is built locally on that AMP.
The input table will not be cached in memory, but it is eligible
for synchronized scanning. The size of Spool 1 is estimated with
high confidence to be 5 rows. The estimated time for this step is
0.67 seconds.
The WHERE clause does not constrain the partitioning column (L_ShipDate),
so every partition on the AMP must be probed for the row hash.
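Why a primary index lookup touches all partitions can be sketched with a toy model. The sketch assumes rows on an AMP are ordered by partition first and by row hash within each partition. The names build_amp and lookup and the hash values are hypothetical, chosen only for illustration.

```python
from bisect import bisect_left


def build_amp(rows):
    """Group (partition, rowhash) pairs by partition, each list sorted by rowhash."""
    amp = {}
    for part, rowhash in rows:
        amp.setdefault(part, []).append(rowhash)
    for hashes in amp.values():
        hashes.sort()
    return amp


def lookup(amp, rowhash, partition=None):
    """Return (matching row count, number of partitions probed)."""
    # Without a partition value, every partition has to be probed.
    parts = [partition] if partition is not None else sorted(amp)
    found = 0
    for part in parts:
        hashes = amp.get(part, [])
        i = bisect_left(hashes, rowhash)
        while i < len(hashes) and hashes[i] == rowhash:
            found += 1
            i += 1
    return found, len(parts)


# Toy AMP: 83 partitions, three row hashes stored in each one.
amp = build_amp((part, h) for part in range(1, 84) for h in (1001, 2002, 3003))
print(lookup(amp, 2002))                # (83, 83): all partitions probed
print(lookup(amp, 2002, partition=72))  # (1, 1): one partition probed
```

In this model, adding a condition on the partitioning column to the query is what reduces the probe to a single partition.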
PPI Delete Performance

CREATE TABLE Lineitem (
L_ORDERKEY INTEGER,
L_PARTKEY INTEGER,
L_SUPPKEY INTEGER,
L_SHIPDATE DATE,
L_COMMITDATE DATE,
L_RECEIPTDATE DATE
)
PRIMARY INDEX (L_ORDERKEY);

CREATE TABLE LineitemPPI (
L_ORDERKEY INTEGER,
L_PARTKEY INTEGER,
L_SUPPKEY INTEGER,
L_SHIPDATE DATE,
L_COMMITDATE DATE,
L_RECEIPTDATE DATE
)
PRIMARY INDEX (L_ORDERKEY)
PARTITION BY RANGE_N(L_ShipDate BETWEEN
DATE '1992-01-03' AND DATE '1998-11-30'
PPI Delete Performance

NON-PPI Table:
DELETE FROM Lineitem
WHERE l_Shipdate BETWEEN DATE '1996-12-31' AND DATE '1997-12-31';
3) We do an all-AMPs DELETE from TPCH.Lineitem by way of an all-rows
scan with a condition of ("(TPCH.Lineitem.L_SHIPDATE >= DATE
'1996-12-31') AND (TPCH.Lineitem.L_SHIPDATE <= DATE '1997-12-31')").
PPI Table:
DELETE FROM LineitemPPI
WHERE l_Shipdate BETWEEN DATE '1996-12-31' AND DATE '1997-12-31';
3) We do an all-AMPs DELETE from 2 partitions of TPCH.LineitemPPI
with a condition of ("(TPCH.LineitemPPI.L_SHIPDATE >= DATE
'1996-12-31') AND (TPCH.LineitemPPI.L_SHIPDATE <= DATE
'1997-12-31')").
4) We do an all-AMPs DELETE of 11 partitions of TPCH.LineitemPPI with
a condition of ("(TPCH.LineitemPPI.L_SHIPDATE >= DATE
'1996-12-31') AND (TPCH.LineitemPPI.L_SHIPDATE <= DATE
'1997-12-31')").
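One reading of steps 3 and 4 is that the date range covers 2 partitions only partly (rows must be checked against the condition) and 11 partitions completely (the whole partition can be deleted). The sketch below checks that reading. It assumes one partition per month between 1992-01-03 and 1998-11-30; the monthly interval and the helper names add_months, partitions and classify are assumptions made for this illustration.

```python
from datetime import date, timedelta

RANGE_START = date(1992, 1, 3)
RANGE_END = date(1998, 11, 30)


def add_months(d, n):
    """Shift d by n months, keeping the day of month (the 3rd is always valid)."""
    m = d.month - 1 + n
    return date(d.year + m // 12, m % 12 + 1, d.day)


def partitions():
    """Yield (first_day, last_day) for each assumed monthly partition."""
    k = 0
    while add_months(RANGE_START, k) <= RANGE_END:
        first = add_months(RANGE_START, k)
        last = min(add_months(RANGE_START, k + 1) - timedelta(days=1), RANGE_END)
        yield first, last
        k += 1


def classify(lo, hi):
    """Count partitions fully and partially covered by lo <= date <= hi."""
    full = partial = 0
    for first, last in partitions():
        if last < lo or first > hi:
            continue  # partition eliminated
        if lo <= first and last <= hi:
            full += 1
        else:
            partial += 1
    return full, partial


full, partial = classify(date(1996, 12, 31), date(1997, 12, 31))
print(f"{partial} partial partitions, {full} full partitions")
```

With these assumptions the sketch reports 2 partial and 11 full partitions, matching the partition counts in steps 3 and 4.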
PPI Joins
CREATE TABLE LineitemPPI (
L_ORDERKEY INTEGER,
L_PARTKEY INTEGER,
L_SUPPKEY INTEGER,
)
PARTITION BY RANGE_N(L_ShipDate
BETWEEN DATE '1992-01-03'
AND
DATE '1998-11-30'
CREATE TABLE Shipping (

S_ORDERKEY INTEGER,
S_SHIPDATE DATE,
S_RECEIPTDATE DATE,
S_SHIPINSTRUCT CHAR(25),
S_SHIPMODE CHAR(10)
)
PRIMARY INDEX (S_ORDERKEY)
PARTITION BY RANGE_N(S_ShipDate BETWEEN
DATE '1992-01-03' AND DATE '1998-11-30'
SELECT L_Orderkey, L_Shipdate, S_Shipmode
FROM LineitemPPI
INNER JOIN Shipping
ON L_Orderkey = S_Orderkey;
PPI Joins
4) We do an all-AMPs JOIN step from all partitions of TPCH.shipping
by way of a RowHash match scan with a condition of ("NOT
(TPCH.shipping.S_SHIPDATE IS NULL)"), which is joined to
TPCH.lineitemppi with a condition of ("NOT
(TPCH.lineitemppi.L_SHIPDATE IS NULL)"). TPCH.shipping and
TPCH.lineitemppi are joined using a rowkey-based merge join, with
a join condition of ("(TPCH.lineitemppi.L_SHIPDATE =
TPCH.shipping.S_SHIPDATE) AND (TPCH.lineitemppi.L_ORDERKEY =
TPCH.shipping.S_ORDERKEY)"). The input tables TPCH.shipping and
TPCH.lineitemppi will not be cached in memory, but TPCH.shipping
is eligible for synchronized scanning. The result goes into Spool
1 (group_amps), which is built locally on the AMPs. The result
spool file will not be cached in memory. The size of Spool 1 is
estimated with low confidence to be 401,785 rows. The estimated
Collect Statistics
Collect Statistics
The optimizer needs accurate demographic
information about your data to choose an
optimal plan for your query.
Statistics tell the optimizer:
How many rows there are per value.
How many distinct values there are in the
column.
If collected statistics are not available, the
optimizer does random AMP sampling to
derive demographics.
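The two demographics named above can be illustrated with a minimal sketch. It only shows what the numbers mean for a list of column values. It is not how Teradata collects or stores statistics, and column_demographics is a hypothetical helper name.

```python
from collections import Counter


def column_demographics(values):
    """Summarize a column: row count, distinct values, rows per value."""
    counts = Counter(values)
    distinct = len(counts)
    return {
        "rows": len(values),
        "distinct_values": distinct,
        "avg_rows_per_value": len(values) / distinct if distinct else 0.0,
        "max_rows_per_value": max(counts.values(), default=0),
    }


# Toy L_Orderkey column: 10 rows, 4 distinct order keys.
demo = column_demographics([1, 1, 1, 2, 2, 3, 4, 4, 4, 4])
print(demo)
```

A column with few rows per value is a selective access path; a column with many rows per value is not.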
Collect Statistics
Collected statistics are not automatically
updated by the Teradata DBS.
Users must refresh statistics when 5% to
10% of the table's rows have changed.
Collect statistics on:
All non-unique indexes of a table or a join
index.
Any column used in a WHERE clause for set
selection or as a join constraint.
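The 5% to 10% guideline can be turned into a simple check. This is a sketch of the rule of thumb only; needs_refresh and its parameters are hypothetical, and the row counts would have to come from your own load or audit process.

```python
def needs_refresh(rows_at_collection, rows_changed, threshold=0.05):
    """Return True when the changed fraction reaches the threshold (default 5%)."""
    if rows_at_collection == 0:
        # Statistics collected on an empty table are stale after any change.
        return rows_changed > 0
    return rows_changed / rows_at_collection >= threshold


print(needs_refresh(1_000_000, 20_000))   # 2% changed
print(needs_refresh(1_000_000, 80_000))   # 8% changed
```

Setting threshold=0.10 gives the upper end of the guideline.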
Collect Statistics
COLLECT STATISTICS ON Lineitem COLUMN L_Orderkey;
COLLECT STATISTICS ON Lineitem COLUMN L_Shipdate;
COLLECT STATISTICS ON Lineitem COLUMN (L_Orderkey, L_Shipdate);
HELP STATISTICS Lineitem;
Date      Time      Unique Values  Column Names
--------  --------  -------------  ---------------------
04/10/05  11:04:48         60,000  L_ORDERKEY
04/10/05  09:57:52          2,524  L_SHIPDATE
04/10/05  11:49:47        236,352  L_ORDERKEY,L_SHIPDATE
Data Compression
Data Compression
Makes row sizes smaller
Allows more rows per block
Reduces the number of I/Os
Implemented at the column level
Compression benefits I/O-intensive workloads.
The improvement gained from storing more rows per block
is most significant in full-table scan operations.
Compression is transparent to applications.
Data Compression
Single-Value Compression
V2R4 and prior
CREATE TABLE Employee
(EmployeeNo INTEGER,
Jobtitle CHARACTER(30)
COMPRESS ('cashier')
);
Nulls and cashiers will be compressed.
Multi-Value Compression
V2R5 and later
CREATE TABLE Employee
(EmployeeNo INTEGER,
Jobtitle CHARACTER(30)
COMPRESS ('cashier',
'manager',
'programmer')
...
);
Cashiers, managers, and programmers will be compressed, including nulls.
Up to 255 distinct values can be compressed for an individual column.
Data Compression Implementation
The following graphic shows how data compression is implemented:
CREATE TABLE CompressExample (
StreetAddress VARCHAR(40),
City CHARACTER(20)
COMPRESS ('New York',
'Los Angeles',
'Chicago')
NOT NULL,
StateCode CHARACTER(2));
Table Header
Field: City CHAR(20)
01 Chicago
10 Los Angeles
11 New York
Actual Data Rows
City bits  StreetAddress VARCHAR(40)  City CHAR(20)  StateCode CHAR(2)
00         130 Sutter St.             San Francisco  CA
01         133 Wacker Drive.                         IL
11         5 Times Sq.                               NY
01         900 North Michigan Av.                    IL
11         135 East 57th                             NY
00         1525 Howe St.              Racine         WI
10         304 S. Broadway                           CA
A code of 00 means the value is not in the compress list and is stored in the row itself.
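The encoding in the graphic can be sketched as follows. This is an illustration of the idea, not Teradata's internal row format. It assumes codes are assigned to the compress list in sorted order, which happens to match the header shown (01 Chicago, 10 Los Angeles, 11 New York); build_dictionary and encode are hypothetical names.

```python
def build_dictionary(compress_values):
    """Assign codes 1..n to the compress list; code 0 means 'stored in the row'."""
    # Number of bits needed to represent the codes 0..n.
    bits = len(compress_values).bit_length()
    codes = {v: i for i, v in enumerate(sorted(compress_values), start=1)}
    return codes, bits


def encode(city, codes, bits):
    """Return (bit pattern, value stored in the row or None if compressed)."""
    code = codes.get(city, 0)
    stored = None if code else city
    return format(code, f"0{bits}b"), stored


codes, bits = build_dictionary(["New York", "Los Angeles", "Chicago"])
for city in ["San Francisco", "Chicago", "New York", "Racine", "Los Angeles"]:
    print(city, encode(city, codes, bits))
```

With 255 compress values the same calculation gives 8 bits, since one pattern is reserved for values that are not compressed.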
Multi-Value Compression & VARCHAR
VARCHAR consumes two extra bytes for each value, whereas
compression consumes CPU resources to decode compressed
values.
The data demographics determine whether a variable-length
character data type or fixed length plus compression is
more efficient.
VARCHAR is better when the difference between the maximum and
average field length is high and a low percentage of fields are
compressible.
Compression is better when the difference between the maximum and
average field length is low and a high percentage of fields
are compressible.
If no clear picture of the data demographics is available,
use VARCHAR, as it is less CPU-intensive.
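The trade-off can be estimated with a rough sizing sketch. It is a simplification under stated assumptions: VARCHAR costs 2 bytes plus the value length, a compressed fixed-length column costs a few bits per row plus the full width for every value that is not in the compress list, and row overhead and the table-header dictionary are ignored. The function names are hypothetical.

```python
def varchar_bytes(values):
    """Estimated storage for a VARCHAR column: 2 length bytes plus the data."""
    return sum(2 + len(v) for v in values)


def compressed_char_bytes(values, width, compress_values):
    """Estimated storage for CHAR(width) with multi-value compression."""
    bits = len(compress_values).bit_length()  # bits per row for the code
    uncompressed = sum(1 for v in values if v not in compress_values)
    return uncompressed * width + (len(values) * bits + 7) // 8


# Toy City column: most values are in the compress list.
cities = ["Chicago"] * 70 + ["New York"] * 20 + ["San Francisco"] * 10
print(varchar_bytes(cities))
print(compressed_char_bytes(cities, 20, {"Chicago", "New York"}))
```

For this toy column the estimate is 980 bytes for VARCHAR against 225 bytes for CHAR(20) with compression. Lowering the share of compressible values moves the result in favor of VARCHAR.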
Query Management
Priority Scheduler
A DBA may want to:
Configure the system to execute queries submitted by
Sales Managers at a higher priority.
Or
Configure the system to execute queries submitted by the Development
group at a lower priority between 8:00 AM and 3:00 PM and execute them
at medium priority between 3:00 PM and 8:00 AM.
Or
Lower the priority of a job if it takes more than one hour to complete.
Priority Scheduler
Can be used to control resources allocated
to users.
Administrator can specify performance
group while creating the user.
It manages resource distribution to
improve the performance of one application at
the expense of others.
Priority Scheduler Components

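[Figure: a Resource Partition (RP#) contains Performance Groups; each Performance Group has Performance Periods that map a time window to an Allocation Group, and each Allocation Group carries a weight.]
Performance Periods (first Performance Group): 8am-5pm AG1, 5pm-9pm AG2, 9pm-8am AG3
Performance Periods (second Performance Group): 8am-8pm AG3, 8pm-8am AG4
Allocation Groups (weight): AG1 5, AG2 10, AG3 20, AG4 40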
Priority Scheduler Components

Resource Partition
High level Resource
Partitioning
Default is Partition 0
Performance Group
Provides relative priority
within the Resource Partition
Can be specified in the
Account String in Create User
statement.
Can be specified in user Logon
String ($M$, $DEV$ etc).
Performance Period
Controls the scheduling
policy at that point in time.
Links a PG to an Allocation
Group's weight and policy
Allocation Group
Defines a method for
disbursing resources
among sessions active
within that allocation
group
Carries the weight.
Defines a scheduling policy
Example 1 Percentage of
Resource Allocation
User WHDev with performance group $L$ logged on to the system at
9:30 PM.
What is the percentage share of system resources the user WHDev
will get ?
Sum of the weights = 5 + 10 + 20 + 40 = 75
At 9:30 PM, performance group L is
assigned to allocation group AG3 (weight 20).
So % of resource allocation = 20/75 = 26.7%
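The arithmetic in Example 1 can be written out as a short sketch. The weights come from the components slide. The period table for performance group L (8am-5pm AG1, 5pm-9pm AG2, 9pm-8am AG3) is an assumption read from the same slide, and allocation_group_at and resource_share are hypothetical helper names.

```python
from datetime import time

WEIGHTS = {"AG1": 5, "AG2": 10, "AG3": 20, "AG4": 40}

# Assumed performance periods for performance group L: (start, end, group).
PERIODS = [
    (time(8), time(17), "AG1"),
    (time(17), time(21), "AG2"),
    (time(21), time(8), "AG3"),
]


def allocation_group_at(t):
    """Return the allocation group whose performance period contains time t."""
    for start, end, group in PERIODS:
        if start <= end:
            inside = start <= t < end
        else:  # the period wraps past midnight
            inside = t >= start or t < end
        if inside:
            return group
    raise ValueError("no performance period covers this time")


def resource_share(group):
    """Relative share in percent, assuming every allocation group is active."""
    return WEIGHTS[group] / sum(WEIGHTS.values()) * 100


group = allocation_group_at(time(21, 30))
share = resource_share(group)
print(group, f"{share:.1f}%")
```

For a 9:30 PM logon the sketch selects AG3 and a share of 26.7%.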
Example 2 - Automatic Change in Priority Based on CPU Usage
[Figure: priority plotted against time.]
Performance Period 1: Usage 3600 Seconds, Allocation Group AG11 (Weight = 40)
Performance Period 2: Usage 0 Seconds, Allocation Group AGDEF (Weight = 5)
Teradata Dynamic Query Manager
A DBA may want to:
Prevent all queries that are estimated to return more than 100
rows from running between the hours of 8:00 a.m. and 1:00 p.m.
on Fridays.
Or
Prevent all queries from the Testing group that are estimated to take
more than 3 minutes from running between the hours of 8:00 a.m. and
3:00 p.m. on Mondays.
Or
Schedule a request to run every Friday at 8:00 p.m.
Teradata Dynamic Query Manager
Teradata Dynamic Query Manager (TDQM) is a product
that enables you to effectively manage access to
and utilization of a Teradata database system.
Managing the database system increases the workload
capacity and efficiency of database usage.
TDQM addresses the key problems of database system
overload and network saturation that result from a large
number of clients accessing the Teradata system.
The two main functions of TDQM are:
Query Management: limiting the execution of some queries on the
Teradata database according to rules.
Scheduled Requests: scheduling SQL requests for batch execution.
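How a Query Management rule might be evaluated can be sketched as below. This is a conceptual model only and does not represent the TDQM rule format or API. The Rule class, is_rejected, and all thresholds are hypothetical; the 100-row and 3-minute limits simply mirror the DBA examples given earlier.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Optional


@dataclass
class Rule:
    weekday: int                         # 0 = Monday ... 6 = Sunday
    start: time                          # window start (inclusive)
    end: time                            # window end (exclusive)
    max_rows: Optional[int] = None       # limit on estimated rows
    max_seconds: Optional[float] = None  # limit on estimated run time
    group: Optional[str] = None          # None applies to every group


def is_rejected(rule, group, est_rows, est_seconds, when):
    """Return True when the rule blocks a query submitted at 'when'."""
    if rule.group is not None and rule.group != group:
        return False
    if when.weekday() != rule.weekday:
        return False
    if not (rule.start <= when.time() < rule.end):
        return False
    too_many_rows = rule.max_rows is not None and est_rows > rule.max_rows
    too_slow = rule.max_seconds is not None and est_seconds > rule.max_seconds
    return too_many_rows or too_slow


friday_rule = Rule(weekday=4, start=time(8), end=time(13), max_rows=100)
testing_rule = Rule(weekday=0, start=time(8), end=time(15),
                    max_seconds=180, group="Testing")

# 2004-12-17 was a Friday, 2004-12-20 a Monday.
print(is_rejected(friday_rule, "Sales", 5000, 1.0, datetime(2004, 12, 17, 9, 0)))
print(is_rejected(testing_rule, "Testing", 10, 200.0, datetime(2004, 12, 20, 10, 0)))
```

Both sample queries are rejected; the same queries submitted outside the rule windows would be allowed to run.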
TDQM Architecture
[Figure: TDQM architecture. Query Management components: all client systems accessing Teradata, TDQM Administrator, TDQM Partition, TDQM Metadata. Scheduled Requests components: Scheduled Requests Client, Scheduled Request Server, User Data. Database: Teradata RDBMS.]
Teradata Dual Active Solution
Provides support for unplanned downtime.
Eliminates the need for planned downtime.
Provides additional processing power to
smooth out peak workload on the primary
system.
Can replicate either 100% of the data or only
mission-critical data.
Teradata Dual Active Solution Architecture
[Figure: components shown are Users/Applications, Teradata Query Director, Primary System, Backup System, Data Synchronization, and Operation Control.]
Teradata Query Director
Designed to intelligently route queries
based on customer-established rules.
Helps to share workload between the
systems.
Provides failover capability.
Questions?
