Sie sind auf Seite 1von 99

Advanced

Teradata
Concepts

Last Updated : 21st Dec 2004

Center of Excellence
Data Warehousing

Topics
Primary and Secondary Indexes
Join Processing
Join Indexes
Hash Indexes
Partitioned Primary Indexes
Collect Statistics
Priority Scheduler
Teradata Dual Active Server

Primary and Secondary Indexes

Indexes
Teradata provides numerous indexing
options that can improve query
performance for different types of queries
and workloads. Following kinds of indexes
are available:
Primary Index
Secondary Indexes
Join Indexes
Hash Indexes
Partitioned Primary Indexes.

Primary Indexes
In Teradata, Primary Index is a mechanism
to assign and store a data row in an AMP.
Since primary index is used to store data
rows, retrieving data using primary index is
very efficient.
Primary Index can be Unique or Non-Unique.
Choosing Primary Index is critical as it
affects the data distribution across the
processing units (AMPs) and hence affects
the performance.

Primary Index Choice Criteria


Access Demographics Choose the column
most frequently used for access to
maximize the number of one AMP
operations.
Distribution Demographics Better
distribution optimizes parallel processing.
Volatility Changing PI may cause the row
itself to be moved to another AMP. Stable
PI reduces data movement overhead.

UPI and NUPI


UPI
Best distribution due to unique value.
One AMP operation and uses only one I/O.
Best performance.

NUPI
Good distribution for near unique values.
Duplicate PI rows go to same block. No extra

I/O if all duplicate rows fit in single block.


Duplicate row check required if there is no
USI defined.
Multiple I/Os required if rows do not fit in a
single data block.

UPI and NUPI (cont.)


Highly non-unique values cause skewed distribution.
Highly non-unique values cause extra overhead in
duplicate row check.
Multi-Column PI gives better distribution.
But as the number of column increases the index
becomes less usable.
Partial values can not be used for PI access.
Do not include a column for index selection that
does not improve the selectivity of the index.

Secondary Indexes
Secondary Index values are stored in sub tables.
May be unique or non unique.
Teradata implements USI and NUSI differently.

Index Subtable
Sec. Index value

Hash Algorithm

SI Value

BT Row ID

SI value

BT Row ID

Base Table

Secondary Indexes
USI are hash distributed across all AMPs.
Sub table rows may reside in a AMP other than
the base table row.
USI access involved two-AMP operation.

NUSI are implemented on a AMP local


basis.
Sub table rows located in the same AMP of
base table rows.
NUSI access involved all-AMP operation.

Secondary Index Considerations


Need additional storage to hold sub-table.
Need additional I/O.
Choose columns for NUSI candidate only those
having frequent access.
If COLLECTed STATISTICS are not available
Teradata may not choose NUSI as the access
path.
Use EXPLAIN facility to see the plan chosen by
the optimizer.

NUSI Bit Mapping


Used when multiple NUSI are being used will AND
condition.
Identifies common Row Ids in the satisfied by the
query before retrieving the base table rows.
SI Value

Row Id

Indx1

Multiple-column
secondary indexes are less
usable. Define multiple
secondary indexes to allow
bit mapping.

Indx2

Row Access Methods


PI/NUPI

USI

NUSI

FTS

PI Value

USI Value

NUSI Value

Hashing Algorithm

Hashing Algorithm

Hashing Algorithm

Sub Table

Sub Table

Sub Table

Sub Table

Base Table

Base Table

Base Table

Base Table

Value

Hashing Algorithm

Value Ordered NUSI


NUSI sub-tables are local to the AMP
corresponding to its base table and, by
default, are sorted in row hash of the
secondary index column.
Value Ordered NUSI sub-tables are sorted by
secondary index column-value rather than its
row hash.
Value Ordered NUSI are efficient for
processing queries with range conditions and
inequality conditions on the secondary index
column.

Value Ordered NUSI (Cont.)


CREATE INDEX Indx_Shipdate(L_Shipdate) on LineItem;
SELECT * FROM LineItem WHERE L_Shipdate < DATE '1992-02-28';

3) We do an all-AMPs RETRIEVE step from TPCH.lineitem by way of an


all-rows scan with a condition of ("TPCH.lineitem.L_SHIPDATE <
DATE '1992-02-28'") into Spool 1 (group_amps), which is built
locally on the AMPs. The input table will not be cached in memory,
but it is eligible for synchronized scanning. The size of Spool 1
is estimated with high confidence to be 1,764 rows. The estimated
time for this step is 2.58 seconds.

Optimizer did not choose Secondary Index. Why ?

Value Ordered NUSI (Cont.)


CREATE INDEX Indx_Shipdate(L_Shipdate) ORDER BY VALUES ON
Lineitem;
SELECT * FROM Lineitem WHERE L_Shipdate < DATE '1992-02-28';

3) We do an all-AMPs RETRIEVE step from TPCH.lineitem by way of a


traversal of index # 4 with a range constraint of (
"TPCH.lineitem.Field_1035 < DATE '1992-02-28'") extracting row ids
only into Spool 2 (all_amps), which is built locally on the AMPs.
Then we do a SORT to order Spool 2 by row id eliminating duplicate
rows. The size of Spool 2 is estimated to be 1,764 rows. The
estimated time for this step is 0.04 seconds.
4) We do an all-AMPs RETRIEVE step from TPCH.lineitem by way of row
ids from Spool 2 (Last Use) with no residual conditions into Spool
1 (group_amps), which is built locally on the AMPs. The input
table will not be cached in memory, but it is eligible for
synchronized scanning. The size of Spool 1 is estimated with high
confidence to be 1,764 rows. The estimated time for this step is
2.58 seconds.

Join Processing

Join Processing
Each AMP performs join processing in
parallel.
Optimizer chooses best join strategy based
on
Available indexes, and
Data Demographics (Collect
Statistics/Dynamic Sampling)
Rows must be on the same AMP for matching.
Teradata temporarily moves the rows to same
AMP if they are not in the same AMP for join.

Join Processing
General Join Scenarios:
Join column is the PI of both the tables.
Join column is PI of one of the tables.
Join column is not a PI of either of the table.

Case 1- PI of both the tables


Rows taking part in the join are already in
the same AMP.
No data movement is necessary.
This is the best-case scenario.

Case 1 - Example
CREATE SET TABLE EMPLOYEE
(
EmpNo SMALLINT
Name VARCHAR(12),
DeptNo SMALLINT,
JobTitle VARCHAR(12),
Salary DECIMAL(8,2)
DOB DATE,
)
UNIQUE PRIMARY INDEX ( EmpNo )

CREATE SET TABLE LOCATION


(
EmpNo INTEGER,
Loc VARCHAR(25)
)
PRIMARY INDEX ( EmpNo );

SELECT E.EmpNo, E.Name, L.Loc


FROM Employee E, Location L
WHERE E.EmpNo = L.EmpNo;

Case 1 - Explain Output


1) First, we lock a distinct PERSONNEL."pseudo table" for read on a
RowHash to prevent global deadlock for PERSONNEL.LOCATION.
2) Next, we lock a distinct PERSONNEL."pseudo table" for read on a
RowHash to prevent global deadlock for PERSONNEL.EMPLOYEE.
3) We lock PERSONNEL.LOCATION for read, and we lock
PERSONNEL.EMPLOYEE for read.
4) We do an all-AMPs JOIN step from PERSONNEL.EMPLOYEE by way of a
RowHash match scan with no residual conditions, which is joined to
PERSONNEL.LOCATION. PERSONNEL.EMPLOYEE and PERSONNEL.LOCATION are
joined using a merge join, with a join condition of (
"PERSONNEL.EMPLOYEE.EmpNo = PERSONNEL.LOCATION.EmpNo"). The
result goes into Spool 1 (group_amps), which is built locally on
the AMPs. The size of Spool 1 is estimated with low confidence to
be 24 rows. The estimated time for this step is 0.04 seconds.
5) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.04 seconds.

Case 2 - PI of one of the tables


One table has its rows on the target AMP.
Rows of the other table need to be
redistributed to their target AMPs by the
hash code of the join column value.
If the table is small, optimizer may choose
to duplicate the table on all AMPs

Case 2 Example
CREATE SET TABLE EMPLOYEE
(
EmpNo SMALLINT
Name VARCHAR(12),
DeptNo SMALLINT,
JobTitle VARCHAR(12),
Salary DECIMAL(8,2)
DOB DATE,
)
UNIQUE PRIMARY INDEX ( EmpNo )

CREATE SET TABLE Department


(
DeptNo SMALLINT,
DeptName VARCHAR(14),
Loc CHAR(3),
MgrNo SMALLINT
)
UNIQUE PRIMARY INDEX ( DeptNo );

SELECT E.EmpNo, E.Name, D.DeptName


FROM Employee E, Department D
WHERE E.Deptno = D.DeptNo;

Case 2 Explain Output

4) We do an all-AMPs RETRIEVE step from PERSONNEL.EMPLOYEE by way of


an all-rows scan with a condition of ("(PERSONNEL.EMPLOYEE.DeptNo
>= 100) AND ((PERSONNEL.EMPLOYEE.DeptNo <= 900) AND (NOT
(PERSONNEL.EMPLOYEE.DeptNo IS NULL )))") into Spool 2 (all_amps),
which is redistributed by hash code to all AMPs. Then we do a
SORT to order Spool 2 by row hash. The size of Spool 2 is
estimated with no confidence to be 5 rows. The estimated time for
this step is 0.03 seconds.
5) We do an all-AMPs JOIN step from PERSONNEL.Department by way of a
RowHash match scan with no residual conditions, which is joined to
Spool 2 (Last Use). PERSONNEL.Department and Spool 2 are joined
using a merge join, with a join condition of ("DeptNo =
PERSONNEL.Department.DeptNo"). The result goes into Spool 1
(group_amps), which is built locally on the AMPs. The size of
Spool 1 is estimated with no confidence to be 5 rows. The
estimated time for this step is 0.04 seconds.
6) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.08 seconds.

Case 3 - not a PI of either of the table


Rows of both the tables need to be
redistributed to their target AMPs by the
hash code of the join column value.
Optimizer might choose to duplicate the
smaller table on all AMPs.
This join scenario involves maximum
number of data movement.

Case 3 - Example

CREATE SET TABLE Partsupp


CREATE SET TABLE Lineitem
(
(
PS_PARTKEY INTEGER NOT NULL,
L_ORDERKEY INTEGER NOT NULL,
PS_SUPPKEY INTEGER NOT NULL,
L_PARTKEY INTEGER NOT NULL,
PS_AVAILQTY INTEGER NOT NULL,
L_SUPPKEY INTEGER NOT NULL,
PS_SUPPLYCOST DECIMAL(15,2) NOT NULL
L_LINENUMBER INTEGER NOT NULL,
PS_COMMENT VARCHAR(199)
L_QUANTITY DECIMAL(15,2) NOT NULL,
)
)
PRIMARY INDEX ( PS_PARTKEY );
PRIMARY INDEX ( L_ORDERKEY );

SELECT L_Suppkey, L_Quantity


FROM Lineitem, Partsupp
WHERE L_Suppkey = Ps_Suppkey;

Case 3 Explain Output

4) We execute the following steps in parallel.


1) We do an all-AMPs RETRIEVE step from TPCH.partsupp by way of
an all-rows scan with no residual conditions into Spool 2
(all_amps), which is redistributed by hash code to all AMPs.
Then we do a SORT to order Spool 2 by row hash. The size of
Spool 2 is estimated with low confidence to be 31,938 rows.
The estimated time for this step is 0.79 seconds.
2) We do an all-AMPs RETRIEVE step from TPCH.lineitem by way of
an all-rows scan with no residual conditions into Spool 3
(all_amps), which is redistributed by hash code to all AMPs.
Then we do a SORT to order Spool 3 by row hash. The result
spool file will not be cached in memory. The size of Spool 3
is estimated with low confidence to be 240,480 rows. The
estimated time for this step is 6.27 seconds.

Case 3 Explain Output Cont


5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a
RowHash match scan, which is joined to Spool 3 (Last Use). Spool
2 and Spool 3 are joined using a merge join, with a join condition
of ("L_SUPPKEY = PS_SUPPKEY"). The result goes into Spool 1
(group_amps), which is built locally on the AMPs. The result
spool file will not be cached in memory. The size of Spool 1 is
estimated with no confidence to be 15,661,999 rows. The estimated
time for this step is 1 minute and 1 second.
6) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 1 minute and 7 seconds.

Join Strategies
Nested Join
Merge Join
Product Join

Nested Join
Optimizer choose this join strategy when
SELECT ...
FROM Table_1, Table_2
WHERE Table_1.Col1 = Table_2.<Any Index>
AND Table_1.<Unique Index> = <value>;
Example:
SELECT E.Name, D.DeptName
FROM Employee E, Department D
WHERE E.DeptNo = D.DeptNo
AND E.Name = 'Sandy M';

Nested Join Explain Output

1) First, we do a two-AMP JOIN step from PERSONNEL.E by way of unique


index # 4 "PERSONNEL.E.Name = 'Sandy M'" with a residual condition
of ("(PERSONNEL.E.DeptNo >= 100) AND ((PERSONNEL.E.DeptNo <= 900)
AND (NOT (PERSONNEL.E.DeptNo IS NULL )))"), which is joined to
PERSONNEL.D by way of the unique primary index "PERSONNEL.D.DeptNo
= PERSONNEL.E.DeptNo". PERSONNEL.E and PERSONNEL.D are joined
using a nested join, with a join condition of ("(1=1)"). The
result goes into Spool 1 (one-amp), which is built locally on the
AMPs. The size of Spool 1 is estimated with high confidence to be
1 row. The estimated time for this step is 0.04 seconds.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.04 seconds.

Merge Join
Commonly done when the join conditions are
based on equality.
Steps
Identify the smaller table.
Put the qualifying rows from one or both table into
spool.
Move the spool rows to the AMPs based on join column
hash (if required).
Sort the spool rows by join column hash value (if
necessary).
Compare those rows with matching join column hash
values.
Example : Case 1, Case 2 and Case 3 as described.

Product Join
Most general for of join. A X B.
Optimizer choose product join usually in
following conditions
WHERE clause is missing.
Join condition is not based on equality
condition.

Steps:
Identify the smaller table
Duplicate it in spool on all AMPs.
Join each spool row of the smaller table to
every row of the larger table.

Exclusion Merge Join


CREATE SET TABLE EMPLOYEE
(
EmpNo SMALLINT
Name VARCHAR(12),
DeptNo SMALLINT NOT NULL,
JobTitle VARCHAR(12),
Salary DECIMAL(8,2)
DOB DATE,
)
UNIQUE PRIMARY INDEX ( EmpNo )

CREATE SET TABLE Department


(
DeptNo SMALLINT NOT NULL,
DeptName VARCHAR(14),
Loc CHAR(3),
MgrNo SMALLINT
)
UNIQUE PRIMARY INDEX ( DeptNo );

SELECT EmpNo, Name, Salary


FROM Employee
WHERE DeptNo NOT IN ( SELECT DeptNo FROM Department);

Exclusion Merge Join Explain


Output
4) We do an all-AMPs RETRIEVE step from PERSONNEL.employee by way of
an all-rows scan with no residual conditions into Spool 2
(all_amps), which is redistributed by hash code to all AMPs. Then
we do a SORT to order Spool 2 by row hash. The size of Spool 2 is
estimated with high confidence to be 21 rows. The estimated time
for this step is 0.03 seconds.
5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
all-rows scan, which is joined to PERSONNEL.department. Spool 2
and PERSONNEL.department are joined using an exclusion merge join,
with a join condition of ("DeptNo = PERSONNEL.department.DeptNo").
The result goes into Spool 1 (group_amps), which is built locally
on the AMPs. The size of Spool 1 is estimated with index join
confidence to be 21 rows. The estimated time for this step is
0.03 seconds.
6) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.06 seconds.

Exclusion Merge Join Example


AMP 2

AMP 1

AMP 3

AMP 4

1005

300

1000

300

1003

200

1001

400

1009

100

1002

100

1004

300

1007

400

1014

500

1006

900

1010

200

1008

500

1019

100

1011

200

1013

400

1012

300

1017

300

1015

200

1018

400

1016

700

Department

200

Result

400

100

600

500

300

1001

400

1009

100

1005

300

1010

200

1007

400

1019

100

1017

300

1011

200

1013

400

1002

100

1000

300

1015

200

1018

400

1014

500

1004

300

1003

200

1018

500

1012

300

1006

900

1016

700

400

100

600

500

300

200

Employee

Join Indexes

Join Indexes
Join Index is an index structure that stores and
maintains results from joining two or more tables.
Optimizer resolves the query using join index,
rather than performing joins every time the query
is executed.
Teradata supports a variety of Join Indexes such
as:
Multi-table Join Indexes
Single-table join Indexes
Aggregate Join Indexes

Join Index Example


CREATE JOIN INDEX EmpDept AS
SELECT (e.DeptNo, d.DeptName) ,
(E.Name, E.Salary)
FROM Employee e INNER JOIN
Department d
ON
e.DeptNo = d.DeptNo;

SELECT e.Name, d.DeptName, e.Salary


FROM Employee e INNER JOIN
Department d
ON e.DeptNo = d.DeptNo
ORDER BY d.DeptName;

Does the index cover the query ?


1) First, we lock a distinct PERSONNEL."pseudo table" for read on a
RowHash to prevent global deadlock for PERSONNEL.EmpDept.
2) Next, we lock PERSONNEL.EmpDept for read.
3) We do an all-AMPs RETRIEVE step from PERSONNEL.EmpDept by way of
an all-rows scan with no residual conditions into Spool 1
(group_amps), which is built locally on the AMPs. The size of
Spool 1 is estimated with low confidence to be 6 rows. The
estimated time for this step is 0.03 seconds.

Join Index Example


CREATE JOIN INDEX EmpDept AS
SELECT (e.DeptNo, d.DeptName) ,
(E.Name, E.Salary)
FROM Employee e INNER JOIN
Department d
ON
e.DeptNo = d.DeptNo;

SELECT e.Name, d.DeptName, e.Salary, e.YrsExp


FROM Employee e INNER JOIN
Department d
ON e.DeptNo = d.DeptNo
ORDER BY e.DeptName;

Does the index cover the query ?

Join Index Example


CREATE JOIN INDEX EmpDept AS
SELECT (e.DeptNo, d.DeptName) ,
(E.Name, E.Salary)
FROM Employee e LEFT JOIN
Department d
ON
e.DeptNo = d.DeptNo;
SELECT e.Name, d.DeptName, e.Salary,
FROM Employee e INNER JOIN
Department d
ON e.DeptNo = d.DeptNo
ORDER BY e.DeptName;

SELECT e.Name, d.DeptName, e.Salary,


FROM Employee e LEFT JOIN
Department d
ON e.DeptNo = d.DeptNo
ORDER BY e.DeptName;

Does the index cover the query ?Does the index cover the quer

Note: A join index with outer join covers both inner join query as well as outer join query.

Join Index Example

CREATE JOIN INDEX EmpDept AS


SELECT e.Name, d.DeptName, e.Salary , c.Hou
SELECT (e.EmpNo, e.DeptNo ,d.DeptNameFROM
),
Employee e,
(e.Name ,e.Salary )
Department d,
FROM Employee e JOIN
Charges c
Department d
WHERE e.DeptNo = d.DeptNo
ON e.DeptNo = d.DeptNo
AND
e.EmpNo = c.EmpNo;

Does the index cover the query ?


4) We do an all-AMPs RETRIEVE step from PERSONNEL.c by way of an
all-rows scan with no residual conditions into Spool 2 (all_amps),
which is redistributed by hash code to all AMPs. Then we do a
SORT to order Spool 2 by row hash. The size of Spool 2 is
estimated with low confidence to be 16 rows. The estimated time
for this step is 0.03 seconds.
5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a
RowHash match scan, which is joined to PERSONNEL.EMPDEPT. Spool 2
and PERSONNEL.EMPDEPT are joined using a merge join, with a join
condition of ("PERSONNEL.EMPDEPT.EmpNo = EmpNo"). The result goes
into Spool 1 (group_amps), which is built locally on the AMPs.
The size of Spool 1 is estimated with index join confidence to be
32 rows. The estimated time for this step is 0.04 seconds.

Secondary Index On Top Of Join


Index
Further performance improvement can be achieved by defining a
Secondary Index on top of join index to avoid full table scan of the
join index table.
SELECT C_Name, C_Address, O_Orderdate, O_TotalPrice
FROM Customer JOIN
Ordertbl
ON
C_Custkey = O_Custkey
WHERE O_Orderdate BETWEEN 950101 AND 970101;
CREATE JOIN INDEX OrderByCust AS
SELECT (C_Name ,C_Address ),
(O_Orderdate ,O_TotalPrice)
FROM Customer INNER JOIN Ordertbl
ON C_CustKey = O_Custkey
CREATE INDEX(O_Orderdate) ORDER BY VALUES ON OrderByCustomer;
COLLECT STATISTICS ON OrderByCust INDEX(O_Orderdate);

Sparse Indexes
Sparse Index can be used to index a portion of a table.
CREATE JOIN INDEX OrderByCust AS
SELECT (C_Custkey, C_Name ,C_Address, O_Orderdate),
(O_TotalPrice)
FROM Customer INNER JOIN Ordertbl
ON C_CustKey = O_Custkey
WHERE O_Orderdate > DATE 2004-01-01'
PRIMARY INDEX(C_Custkey);
SELECT C_Name ,C_Address,
O_Orderdate ,O_TotalPrice
FROM Customer INNER JOIN Ordertbl
ON C_CustKey = O_Custkey
WHERE O_Orderdate BETWEEN DATE '2004-06-01' AND DATE '2004-12-31';

Shorter index table enables faster full table scan.

Join Index - Compressed Form


Uncompressed Form

CREATE JOIN INDEX OrderByCust AS


SELECT C_Custkey, C_Name ,C_Address, O_Orderdate, O_TotalP
FROM Customer INNER JOIN Ordertbl
ON C_CustKey = O_Custkey
PRIMARY INDEX(C_Custkey);
SELECT SUM(CurrentPerm) FROM DBC.TableSize
WHERE DataBaseName = 'tpch' AND
TableName = 'OrderByCust';
Sum(CurrentPerm)
------------------------4,791,296

Join Index - Compressed Form


Compressed Form

CREATE JOIN INDEX OrderByCust AS


SELECT (C_Custkey, C_Name ,C_Address), (O_Orderdate, O_TotalPri
FROM Customer INNER JOIN Ordertbl
ON C_CustKey = O_Custkey
PRIMARY INDEX(C_Custkey);

SELECT SUM(CurrentPerm) FROM DBC.TableSize


WHERE DataBaseName = 'tpch' AND
TableName = 'OrderByCust';
Sum(CurrentPerm)
------------------------1,209,856

Single-Table Join Indexes


Single-table join indexes help in performance improvement in certain
kind of joins by partially covering the query.
SELECT d.DeptName, e.Name
FROM Employee e,Department d
WHERE e.DeptNo = d.DeptNo;
4) We do an all-AMPs RETRIEVE step from PERSONNEL.e by way of an
all-rows scan with no residual conditions into Spool 2 (all_amps),
which is redistributed by hash code to all AMPs. Then we do a
SORT to order Spool 2 by row hash. The size of Spool 2 is
estimated with high confidence to be 21 rows. The estimated time
for this step is 0.03 seconds.
5) We do an all-AMPs JOIN step from PERSONNEL.d by way of a RowHash
match scan with no residual conditions, which is joined to Spool 2
(Last Use). PERSONNEL.d and Spool 2 are joined using a merge join,
with a join condition of ("DeptNo = PERSONNEL.d.DeptNo"). The
result goes into Spool 1 (group_amps), which is built locally on
the AMPs. The size of Spool 1 is estimated with low confidence to
be 18 rows. The estimated time for this step is 0.04 seconds.

Cont..

Single-Table Join Indexes


CREATE JOIN INDEX EmpDept AS
SELECT Empno, Deptno, Name
FROME Employee
PRIMARY INDEX(DeptNo);

As the join index covers the employee part of the query, Optimizer join
Department table with the join index instead of Employee table itself.

4) We do an all-AMPs JOIN step from PERSONNEL.d by way of a RowHash


match scan with no residual conditions, which is joined to
PERSONNEL.EmpDept. PERSONNEL.d and PERSONNEL.EmpDept are joined
using a merge join, with a join condition of (
"PERSONNEL.EmpDept.DeptNo = PERSONNEL.d.DeptNo"). The result goes
into Spool 1 (group_amps), which is built locally on the AMPs.
The size of Spool 1 is estimated with low confidence to be 18 rows.
The estimated time for this step is 0.04 seconds.

Single-Table Join Indexes


SELECT d.DeptName, e.Name, e.Salary
FROM Employee e,Department d
WHERE e.DeptNo = d.DeptNo;
4) We do an all-AMPs RETRIEVE step from PERSONNEL.e by way of an
all-rows scan with no residual conditions into Spool 2 (all_amps),
which is redistributed by hash code to all AMPs. Then we do a
SORT to order Spool 2 by row hash. The size of Spool 2 is
estimated with high confidence to be 21 rows. The estimated time
for this step is 0.03 seconds.
5) We do an all-AMPs JOIN step from PERSONNEL.d by way of a RowHash
match scan with no residual conditions, which is joined to Spool 2
(Last Use). PERSONNEL.d and Spool 2 are joined using a merge join,
with a join condition of ("DeptNo = PERSONNEL.d.DeptNo"). The
result goes into Spool 1 (group_amps), which is built locally on
the AMPs. The size of Spool 1 is estimated with low confidence to
be 18 rows. The estimated time for this step is 0.04 seconds.

Note : Optimizer went for full table scan of the Employee table instead of using Jo
because the existing join index EmpDept does not fully cover the Employee part o

Single-Table Join Indexes

ROWID can be included in the join index definition to enable ro


join for partially covered queries.
CREATE JOIN INDEX JIorders AS
SELECT (O_CUSTKEY ), (O_ORDERDATE,O_TOTALPRICE, ROWID)
FROM Ordertbl
PRIMARY INDEX (O_CUSTKEY);
SELECT C_Name, C_Address, O_Orderdate,
O_TotalPrice, O_Orderstatus
FROM Customer, Ordertbl
WHERE C_Custkey = O_Custkey
AND C_Nationkey = 10;

Cont

Single-Table Join Indexes


5) We do an all-AMPs JOIN step from TPCH.Customer by way of a RowHash
match scan with a condition of ("TPCH.Customer.C_NATIONKEY = 10"),
which is joined to TPCH.JIorders. TPCH.Customer and TPCH.JIorders
are joined using a merge join, with a join condition of (
"TPCH.Customer.C_CUSTKEY = TPCH.JIorders.O_CUSTKEY"). The input
table TPCH.JIorders will not be cached in memory. The result goes
into Spool 2 (all_amps), which is redistributed by hash code to
all AMPs. Then we do a SORT to order Spool 2 by row hash. The
size of Spool 2 is estimated with no confidence to be 9,000 rows.
The estimated time for this step is 0.44 seconds.
6) We do an all-AMPs JOIN step from TPCH.Ordertbl by way of a RowHash
match scan with no residual conditions, which is joined to Spool 2
(Last Use). TPCH.Ordertbl and Spool 2 are joined using a merge
join, with a join condition of ("Field_2 = TPCH.Ordertbl.RowID").
The input table TPCH.Ordertbl will not be cached in memory, but it
is eligible for synchronized scanning. The result goes into Spool
1 (group_amps), which is built locally on the AMPs. The size of
Spool 1 is estimated with no confidence to be 9,000 rows. The
estimated time for this step is 0.56 seconds.

Single-Table Join Indexes

Unique Primary Index column can also be used in place of ROW


as shown in the example below.
CREATE JOIN INDEX JIorders AS
SELECT (O_CUSTKEY ), (O_ORDERDATE,O_TOTALPRICE,
O_ORDERKEY)
FROM Ordertbl
UNIQUE PRIMARY INDEX (O_CUSTKEY);
SELECT C_Name, C_Address, O_Orderdate,
O_TotalPrice, O_Orderstatus
FROM Customer, Ordertbl
WHERE C_Custkey = O_Custkey
AND C_Nationkey = 10;

Aggregate Join Index


Aggregate Join Index are used to store pre-calculated summary data.
SELECT L_PartKey, L_ShipDate, SUM(L_Quantity) AS SumQty
FROM Lineitem GROUP BY 1,2;
3) We do an all-AMPs SUM step to aggregate from TPCH.lineitem by way
of an all-rows scan with no residual conditions, and the grouping
identifier in field 1. Aggregate Intermediate Results are
computed globally, then placed in Spool 3. The input table will
not be cached in memory, but it is eligible for synchronized
scanning. The aggregate spool file will not be cached in memory.
The size of Spool 3 is estimated with low confidence to be 238,809
rows. The estimated time for this step is 10.04 seconds.
4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
an all-rows scan into Spool 1 (group_amps), which is built locally
on the AMPs. The result spool file will not be cached in memory.
The size of Spool 1 is estimated with low confidence to be 238,809
rows. The estimated time for this step is 1.56 seconds.

Aggregate Join Index


CREATE JOIN INDEX AS
SELECT L_PartKey, L_ShipDate, SUM(L_Quantity) AS SumQty
FROM Lineitem GROUP BY 1,2;

SELECT L_PartKey, L_ShipDate, SUM(L_Quantity) AS SumQty


FROM Lineitem GROUP BY 1,2;

3) We do an all-AMPs RETRIEVE step from TPCH.JIAggLineItem by way of


an all-rows scan with no residual conditions into Spool 1
(group_amps), which is built locally on the AMPs. The input table
will not be cached in memory, but it is eligible for synchronized
scanning. The result spool file will not be cached in memory.
The size of Spool 1 is estimated with high confidence to be
238,809 rows. The estimated time for this step is 0.19 seconds.

Aggregate Join Index


CREATE JOIN INDEX AS
SELECT L_PartKey, L_ShipDate, SUM(L_Quantity) AS SumQty
FROM Lineitem GROUP BY 1,2;

Does the
SELECT L_ShipDate, SUM(L_Quantity) AS SumQty
index
FROM Lineitem GROUP BY 1;
cover
the
3) We do an all-AMPs SUM step to aggregate from TPCH.JIAggLineItem by
query ?
way of an all-rows scan with no residual conditions, and the
grouping identifier in field 1. Aggregate Intermediate Results
are computed globally, then placed in Spool 3. The input table
will not be cached in memory, but it is eligible for synchronized
scanning. The size of Spool 3 is estimated with no confidence to
be 491 rows. The estimated time for this step is 0.84 seconds.
4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
an all-rows scan into Spool 1 (group_amps), which is built locally
on the AMPs. The size of Spool 1 is estimated with no confidence
to be 491 rows. The estimated time for this step is 0.04 seconds.

Hash Indexes

Hash Indexes
Index file structures that share properties with single table join indexes and
secondary indexes.

Hash indexes are like single table join indexes but they automatically carry bas
table primary index value.

CREATE HASH INDEX HIOrder


(O_CustKey ,
O_OrderDate,
O_TotalPrice)
ON OrderTbl
BY (O_CustKey)
ORDER BY (O_CustKey)

SELECT O_CustKey,
O_Orderdate,
O_Totalprice,
FROM OrderTbl
WHERE O_CustKey > 12;
SELECT O_CustKey,
O_Orderdate,
O_Totalprice,
O_Orderstatus
FROM OrderTbl
WHERE O_CustKey > 12;

Hash Indexes
CREATE HASH INDEX HIOrder
(O_CustKey ,
O_OrderDate,
O_TotalPrice)
ON OrderTbl
BY (O_CustKey)
ORDER BY (O_CustKey)
SELECT C_Name, C_Address,
O_Orderdate, O_TotalPrice, O_Orderstatus
FROM Customer, Ordertbl
WHERE C_Custkey = O_Custkey
AND O_Custkey < 10;

Explain

Hash Indexes
5) We do an all-AMPs JOIN step from TPCH.Customer by way of an
all-rows scan with a condition of ("TPCH.Customer.C_CUSTKEY < 10"),
which is joined to TPCH.HIOrder with a range constraint of (
"TPCH.HIOrder.O_CUSTKEY <= 9") with an additional condition of (
"TPCH.HIOrder.O_CUSTKEY <= 9"). TPCH.Customer and TPCH.HIOrder
are joined using a product join, with a join condition of (
"TPCH.Customer.C_CUSTKEY = TPCH.HIOrder.O_CUSTKEY"). The input
table TPCH.HIOrder will not be cached in memory, but it is
eligible for synchronized scanning. The result goes into Spool 2
(all_amps), which is redistributed by hash code to all AMPs. Then
we do a SORT to order Spool 2 by row hash. The size of Spool 2 is
estimated with no confidence to be 3 rows. The estimated time for
this step is 0.10 seconds.
6) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a
RowHash match scan, which is joined to TPCH.Ordertbl. Spool 2 and
TPCH.Ordertbl are joined using a merge join, with a join condition
of ("(Field_3 = (SUBSTRING((TPCH.Ordertbl.RowID) FROM 7 FOR 4 )))
AND (Field_2 =)"). The input table TPCH.Ordertbl will not be
cached in memory. The result goes into Spool 1 (group_amps),
which is built locally on the AMPs. The size of Spool 1 is
estimated with no confidence to be 3 rows. The estimated time for
this step is 0.05 seconds.

Hash Indexes
Hash Index ( also Join Index ) can also be used to avoid row
redistribution for join preparation.

SELECT C_Name, C_Address,


O_Orderdate, O_TotalPrice
FROM Customer, Ordertbl
WHERE C_Custkey = O_Custkey;

Hash Indexes
Without Hash Index defined:
4) We do an all-AMPs RETRIEVE step from TPCH.Ordertbl by way of an
all-rows scan with no residual conditions into Spool 2 (all_amps),
which is redistributed by hash code to all AMPs. Then we do a
SORT to order Spool 2 by row hash. The result spool file will not
be cached in memory. The size of Spool 2 is estimated with high
confidence to be 60,000 rows. The estimated time for this step is
1.20 seconds.
5) We do an all-AMPs JOIN step from TPCH.Customer by way of a RowHash
match scan with no residual conditions, which is joined to Spool 2
(Last Use). TPCH.Customer and Spool 2 are joined using a merge
join, with a join condition of ("TPCH.Customer.C_CUSTKEY =
O_CUSTKEY"). The result goes into Spool 1 (group_amps), which is
built locally on the AMPs. The size of Spool 1 is estimated with
low confidence to be 60,000 rows. The estimated time for this
step is 0.48 seconds.

The total estimated time is 1.68 seconds.

Hash Indexes
With Hash Index defined:
CREATE HASH INDEX HIOrder(O_Custkey,
O_TotalPrice,
O_OrderDate)
ON OrderTbl BY (O_CustKey) ORDER BY HASH (O_CustKey);
4) We do an all-AMPs JOIN step from TPCH.Customer by way of a RowHash
match scan with no residual conditions, which is joined to
TPCH.HIOrder. TPCH.Customer and TPCH.HIOrder are joined using a
merge join, with a join condition of ("TPCH.Customer.C_CUSTKEY =
TPCH.HIOrder.O_CUSTKEY"). The input table TPCH.HIOrder will not
be cached in memory. The result goes into Spool 1 (group_amps),
which is built locally on the AMPs. The size of Spool 1 is
No redistribution,
estimated with low confidence to be 60,000 rows. The estimated
No sorting.
time for this step is 0.52 seconds.
Total join time

significantly
reduced
The total estimated time is 0.52 seconds.

Partitioned Primary Index

Partitioned Primary Indexes


Partitioned Primary Index (PPI) allows a
class of queries to access a portion of a
large table instead of the whole table.
PPI table rows are assigned to user defined
partitions in each AMP enabling enhanced
performance for range queries that are
predicated on primary index values.
PPIs increase query efficiency by avoiding
full table scan without the overhead and
maintenance cost of secondary indexes.

NON-PPI Table
Records are sorted in row hash (not shown) sequence within the AMP.
101 10
120

30 01/10

131 20
115

01/10

30 01/02

114 40
119

01/12

30 01/20

107 20
135

01/18

30 01/10

129 10
102

01/02

01/20

30 01/18

110 10
125

30 01/12

122 20
132

01/10

10 01/18

113 40
106

01/18

30 01/20

123 40
118

01/20

40 01/02

138 30
128

01/10

01/02

40 01/12

140 10
121

01/10

40 01/12

116 30

01/18

101 10
130

20 01/10

139 40

112 20 01/10

134

103 40

105 10

133

10 01/20

126 20
127

01/02

30 01/18

136 20
109

01/18

01/10

30 01/02

01/02

01/18

30 01/20
01/02

104 20 01/12
124 20
117

30 01/18

137 20
108

01/20

01/02

10 01/12

SELECT * FROM Employee WHERE EmpId = 114;


SELECT * FROM Employee WHERE EmpId BETWEEN DATE 2004-01-12
AND DATE 2004-01-18;

PPI Table

Records are sorted in row hash (not shown) sequence in each partition within t
101 10

01/02

113 40

01/02

01/02

101 10

01/02

30 01/02

105 10

01/02

137 20

01/02

135

30 01/02

132

115

30 01/10

110 10

01/10

140 10

01/10

123 40

01/10

136 20

01/10

106

40 01/12

112 20

01/12

125

30 01/12

121

30 01/18

118

10 01/18

116 30

01/18

138

30 01/18

127

30 01/20

128

30 01/20

103 40

107 20
120

30 01/10

129 10
119

131 20
102

01/10

114 40

01/20

40 01/02

126 20

122 20

01/20

109

133

130

20 01/10

01/10

104

20 01/12

40 01/12

108

10 01/12

01/18

30 01/18
01/18

10 01/20

139 40

01/18

117

30 01/18

134

30 01/20

124 20

01/20

SELECT * FROM Employee WHERE EmpId = 114;


SELECT * FROM Employee WHERE EmpId BETWEEN DATE 2004-01-12
AND DATE 2004-01-18;

PPI Example
CREATE TABLE Lineitem (
CREATE TABLE LineitemPPI (
L_ORDERKEY INTEGER,
L_ORDERKEY INTEGER,
L_PARTKEY INTEGER,
L_PARTKEY INTEGER,
L_SUPPKEY INTEGER,
L_SUPPKEY INTEGER,
L_LINENUMBER INTEGER ,
L_LINENUMBER INTEGER ,
L_QUANTITY DECIMAL(15,2),
L_QUANTITY DECIMAL(15,2),
L_EXTENDEDPRICE DECIMAL(15,2),
L_EXTENDEDPRICE DECIMAL(15,2),
L_DISCOUNT DECIMAL(15,2),
L_DISCOUNT DECIMAL(15,2),
L_TAX DECIMAL(15,2),
L_TAX DECIMAL(15,2),
L_RETURNFLAG CHAR(1),
L_RETURNFLAG CHAR(1),
L_LINESTATUS CHAR(1),
L_LINESTATUS CHAR(1),
L_SHIPDATE DATE,
L_SHIPDATE DATE,
L_COMMITDATE DATE,
L_COMMITDATE DATE,
L_RECEIPTDATE DATE,
L_RECEIPTDATE DATE,
L_SHIPINSTRUCT CHAR(25),
L_SHIPINSTRUCT CHAR(25),
L_SHIPMODE CHAR(10),
L_SHIPMODE CHAR(10),
L_COMMENT VARCHAR(44)
L_COMMENT VARCHAR(44)
)
)
PRIMARY INDEX (L_ORDERKEY);
PRIMARY INDEX (L_ORDERKEY)
PARTITION BY RANGE_N(L_ShipDate BETWEEN
DATE '1992-01-03' AND DATE '1998-11-30'
EACH INTERVAL '1' MONTH );

PPI Example
SELECT MIN(L_Shipdate), MAX(L_Shipdate) FROM Lineitem;
Minimum(L_SHIPDATE)
------------------------------1992-01-03

Maximum(L_SHIPDATE)
-------------------------------1998-11-30

NON-PPI Table:
EXPLAIN SELECT * FROM Lineitem WHERE l_Shipdate > DATE '1997-12-31';

3) We do an all-AMPs RETRIEVE step from TPCH.lineitem by way of an


all-rows scan with a condition of ("TPCH.lineitem.L_SHIPDATE >
DATE '1997-12-30'") into Spool 1 (group_amps), which is built
locally on the AMPs. The input table will not be cached in memory,
but it is eligible for synchronized scanning. The result spool
file will not be cached in memory. The size of Spool 1 is
estimated with high confidence to be 27,783 rows. The estimated
time for this step is 3.00 seconds.

PPI Example
PPI Table:
EXPLAIN SELECT * FROM LineitemPPI WHERE l_Shipdate > DATE '1997-12-31';

3) We do an all-AMPs RETRIEVE step from 12 partitions of


TPCH.lineitemppi with a condition of (
"TPCH.lineitemppi.L_SHIPDATE > DATE '1997-12-30'") into Spool
(group_amps), which is built locally on the AMPs. The input table
will not be cached in memory, but it is eligible for synchronized
scanning. The result spool file will not be cached in memory.
The size of Spool 1 is estimated with high confidence to be 27,73
rows. The estimated time for this step is 0.88 seconds.
Only 12 partitions are retrieved instead of a full table scan

PPI Example
NON-PPI Table:

EXPLAIN SELECT * FROM Lineitem WHERE L_Orderkey = 240000


1) First, we do a single-AMP RETRIEVE step from TPCH.lineitem by
way of the primary index "TPCH.lineitem.L_ORDERKEY = 240000"
with no residual conditions into Spool 1 (one-amp), which is built
locally on that AMP. The input table will not be cached in memory,
but it is eligible for synchronized scanning. The size of Spool 1 is
estimated with high confidence to be 5 rows. The estimated time
for this step is 0.03 seconds.

Rows are stored in rowhash order within a AMP. Search is very


Only one block read.

PPI Example
PPI Table:

EXPLAIN SELECT * FROM LineitemPPI WHERE L_Orderkey = 240000

1) First, we do a single-AMP RETRIEVE step from all partitions of


TPCH.lineitemppi by way of the primary index
"TPCH.lineitemppi.L_ORDERKEY = 240000" with a residual
condition of ("TPCH.lineitemppi.L_ORDERKEY = 240000")
into Spool 1 (one-amp), which is built locally on that AMP.
The input table will not be cached in memory, but it is eligible
for synchronized scanning. The size of Spool 1 is estimated with
high confidence to be 5 rows. The estimated time for this step is
0.67 seconds.
All partitions are to be scanned for comparison.

PPI Delete Performance


CREATE TABLE Lineitem (
L_ORDERKEY INTEGER,
L_PARTKEY INTEGER,
L_SUPPKEY INTEGER,
L_LINENUMBER INTEGER ,
L_QUANTITY DECIMAL(15,2),
L_EXTENDEDPRICE DECIMAL(15,2),
L_DISCOUNT DECIMAL(15,2),
L_TAX DECIMAL(15,2),
L_RETURNFLAG CHAR(1),
L_LINESTATUS CHAR(1),
L_SHIPDATE DATE,
L_COMMITDATE DATE,
L_RECEIPTDATE DATE,
L_SHIPINSTRUCT CHAR(25),
L_SHIPMODE CHAR(10),
L_COMMENT VARCHAR(44)
)
PRIMARY INDEX (L_ORDERKEY);

CREATE TABLE LineitemPPI (


L_ORDERKEY INTEGER,
L_PARTKEY INTEGER,
L_SUPPKEY INTEGER,
L_LINENUMBER INTEGER ,
L_QUANTITY DECIMAL(15,2),
L_EXTENDEDPRICE DECIMAL(15,2),
L_DISCOUNT DECIMAL(15,2),
L_TAX DECIMAL(15,2),
L_RETURNFLAG CHAR(1),
L_LINESTATUS CHAR(1),
L_SHIPDATE DATE,
L_COMMITDATE DATE,
L_RECEIPTDATE DATE,
L_SHIPINSTRUCT CHAR(25),
L_SHIPMODE CHAR(10),
L_COMMENT VARCHAR(44)
)
PRIMARY INDEX (L_ORDERKEY)
PARTITION BY RANGE_N(L_ShipDate BETWEEN
DATE '1992-01-03' AND DATE '1998-11-30'
EACH INTERVAL '1' MONTH );

PPI Delete Performance


DELETE FROM Lineitem
WHERE l_Shipdate BETWEEN DATE '1996-12-31' AND DATE '1997-12-31';

3) We do an all-AMPs DELETE from TPCH.Lineitem by way of an all-rows


scan with a condition of ("(TPCH.Lineitem.L_SHIPDATE >= DATE
'1996-12-31') AND (TPCH.Lineitem.L_SHIPDATE <= DATE '1997-12-31')").

DELETE FROM LineitemPPI


WHERE l_Shipdate BETWEEN DATE '1996-12-31' AND DATE '1997-12-31';

3) We do an all-AMPs DELETE from 2 partitions of TPCH.LineitemPPI


with a condition of ("(TPCH.LineitemPPI.L_SHIPDATE >= DATE
'1996-12-31') AND (TPCH.LineitemPPI.L_SHIPDATE <= DATE
'1997-12-31')").
4) We do an all-AMPs DELETE of 11 partitions of TPCH.LineitemPPI with
a condition of ("(TPCH.LineitemPPI.L_SHIPDATE >= DATE
'1996-12-31') AND (TPCH.LineitemPPI.L_SHIPDATE <= DATE
'1997-12-31')").

PPI Joins
CREATE TABLE LineitemPPI (
L_ORDERKEY INTEGER,
L_PARTKEY INTEGER,
L_SUPPKEY INTEGER,
L_LINENUMBER INTEGER ,
L_QUANTITY DECIMAL(15,2),
L_EXTENDEDPRICE DECIMAL(15,2),
L_DISCOUNT DECIMAL(15,2),
L_TAX DECIMAL(15,2),
L_RETURNFLAG CHAR(1),
L_LINESTATUS CHAR(1),
L_COMMENT VARCHAR(44)
)
PRIMARY INDEX (L_ORDERKEY)
PARTITION BY RANGE_N(L_ShipDate
BETWEEN DATE '1992-01-03'
AND
DATE '1998-11-30'
EACH INTERVAL '1' MONTH );

CREATE TABLE Shipping (


S_ORDERKEY INTEGER,
S_SHIPDATE DATE,
S_RECEIPTDATE DATE,
S_SHIPINSTRUCT CHAR(25),
S_SHIPMODE CHAR(10)
)
PRIMARY INDEX (S_ORDERKEY)
PARTITION BY RANGE_N(S_ShipDate BETWEEN
DATE '1992-01-03' AND DATE '1998-11-30'
EACH INTERVAL '1' MONTH );

SELECT L_Orderkey, L_Shipdate,S_Shipmode


FROM LineitemPPI
INNER JOIN Shipping
ON L_Orderkey = S_Orderkey;

PPI Joins

4) We do an all-AMPs JOIN step from all partitions of TPCH.shipping


by way of a RowHash match scan with a condition of ("NOT
(TPCH.shipping.S_SHIPDATE IS NULL)"), which is joined to
TPCH.lineitemppi with a condition of ("NOT
(TPCH.lineitemppi.L_SHIPDATE IS NULL)"). TPCH.shipping and
TPCH.lineitemppi are joined using a rowkey-based merge join, with
a join condition of ("(TPCH.lineitemppi.L_SHIPDATE =
TPCH.shipping.S_SHIPDATE) AND (TPCH.lineitemppi.L_ORDERKEY =
TPCH.shipping.S_ORDERKEY)"). The input tables TPCH.shipping and
TPCH.lineitemppi will not be cached in memory, but TPCH.shipping
is eligible for synchronized scanning. The result goes into Spool
1 (group_amps), which is built locally on the AMPs. The result
spool file will not be cached in memory. The size of Spool 1 is
estimated with low confidence to be 401,785 rows. The estimated
time for this step is 4.85 seconds.

Collect Statistics

Collect Statistics
Optimizer must be provided with correct
demographic information on your data to
choose optimal plan to execute your query.
Statistics tells the optimizer
How many rows per value are there.
How many distinct values are there in the
column.

If Collected Statistics are not available,


optimizer does random AMP sampling to
derive demographics.

Collect Statistics
Collected statistics are not automatically
updated by Teradata DBS.
User must refresh statistics when 5% to
10% change on the table rows.
Collect Statistics on
All non-unique Indexes of a table or a join
index.
Any column used in WHERE clause for set
selection or join constraint.

Collect Statistics
COLLECT STATISTICS ON Lineitem COLUMN L_Orderkey;
COLLECT STATISTICS ON Lineitem COLUMN L_Shipdate;
COLLECT STATISTICS ON Lineitem COLUMN (L_Orderkey, L_Shipdate);
HELP STATISTICS Lineitem;
Date
Time
Unique Values
Column Names
------------- ------------ -------------------- -----------------------------------04/10/05 11:04:48
60,000
L_ORDERKEY
04/10/05 09:57:52
2,524
L_SHIPDATE
04/10/05 11:49:47 236,352
L_ORDERKEY,L_SHIPDATE

Data Compression

Data Compression
Makes row sizes smaller
Allows more rows per block
Reduces the number of I/Os
Implemented in column level
Compression is a I/O-intensive workload.
Improvement gained through the more-rows-per-block
concept is significant in the Full Table Scan operations.
Compression is transparent to applications.

Data Compression

Single-Value Compression
V2R4 and prior
CREATE TABLE Employee
(EmployeeNo INTEGER

Jobtitle CHARACTER(30)
COMPRESS (cashier)

);

Nulls and
cashiers will be compressed.

Multi-Value Compression
V2R5 and later
CREATE TABLE Employee
(EmployeeNo INTEGER

Jobtitle CHARACTER(30)
COMPRESS (cashier,
manager,
programmer)
...
);
Cashiers,
managers,
programmers
will be compressed including nulls.
255 distinct values for an individual
column can be compressed.

Data Compression Implementation

The following graphics shows how Data Compression is implemented in


CREATE TABLE CompressExample (
StreetAddress VARCHAR(40),
City CHARACTER(20)
Field:
COMPRESS (New York,
StreetAddres
Los Angeles,
s
Chicago)
VARCHAR(40)
NOT NULL,
StateCode CHARACTER(2));
00

Actual Data Rows

Table Header

Field: City
CAHR(20)
01 Chicago

10 Los Angeles
11 New York
130 Sutter
San
Francisco

01

St.
133 Wacker
Drive.

11

5 Times Sq.

NY

01

900 North Michigan Av.

11

135 East 57th

00

1525 Howe St.

10

304 S.
Broadway

Field:
StateCod
e Char(2)

CA

IL

IL

NY
Racine
CA

WI

Multi Value Compression &


VARCHAR
VARCHAR consumes two extra bytes for each value whereas
compression consumes CPU resource to decode compress
values.
The data demographics determine whether variable length
character data type or fixed length plus compression is
more efficient.
VARCHAR is better when the difference of max and average
field length is high and a low percentage of fields are
compressible.
Compression is better when the difference of max and
average field length is low and a high percentage of fields
are compressible.
If no clear pictures about data demographics are available,
use VARCHAR as it is less CPU intensive.

Query Management

Priority Scheduler
A DBA may want to:
Configure the system to execute queries at a higher
priority submitted by Sales Managers.
Or

Configure the system to execute queries submitted by Develop


group at a lower priority during 8:00 AM and 3:00 PM and exe
at medium priority during 3:00 PM and 8:00 AM.
Or

Lower the priority of a job if it takes more than one hour to co

Priority Scheduler
Can be used to control resources allocated
to users.
Administrator can specify performance
group while creating the user.
It manages resource distribution to
improve performance of one application at
the expense of other.

Priority Scheduler Components


Resource Partition

Performance
Groups

RP#

Performance
Periods
8am5pm 5pm-9pm 9pm-8am
AG1
AG2
AG3

Allocation
Groups

AG1
5

AG2
10

8pm-8am
AG4

AG3
20

8am-8pm
AG3

AG4
40

Priority Scheduler Components


Resource Partition
High level Resource
Partitioning
Default is Partition 0

Performance Group
Provides relative priority with
in the Resource Partition
Can be specified in the
Account String in Create User
statement.
Can be specified in user Logon
String ($M$, $DEV$ etc).

Performance Period
Controls the scheduling
policy at that point in time.
Links a PG to an Allocation
Groups weight and policy

Allocation Group
Defines a method for
disbursing resources
among sessions active
within that allocation
group
Carries the weight.
Defines a scheduling policy

Example 1 Percentage of
Resource Allocation
User WHDev with performance group $L$ logged on to the system at
9:30 PM.
What is the percentage share of system resources the user WHDev
will get ?

Sum of the weights = 5 + 10 + 20 + 40 = 75


At 9:30 PM performance group L will be
assigned to allocation group AG3.
So % of resource allocation = 20/75 = 26%

Priority

Example 2 - Automatic Change


in Priority Based CPU usage

20
10
5
0

1000

2000

3000

Time

Performance Period 1
Usage 3600 Seconds
Allocation Group AG11

Performance Period 2
Usage 0 Seconds
Allocation Group AGDEF

Allocation Grp=AG11
Weight=40

Allocation Grp=AGDEF
Weight=5

Teradata Dynamic Query


Manager
A DBA may want to:

Prevent all queries that are estimated to return more than 100
rows from running between the hours of 8:00 a.m. and 1:00 p.
on Fridays.
Or

Prevent all queries from Testing group that are estimated to t


more than 3 minutes running between the hours of 8:00 a.m. a
3.00 p.m. on Monday.
Or
Schedule a request to run on every Friday at 8.00 pm.

Teradata Dynamic Query


Manager
Teradata Dynamic Query Manager (TDQM) is product
that enables you to effectively manage the access to
and utilization of a Teradata database system.
Managing the database system increases the workload
capacity and efficiency of database usage.
TDQM addresses the key problems of database system
overload and network saturation that result from large
number of clients accessing the Teradata system.
Two main functionalities of TDQM are:
Limiting the execution of some queries on the Teradata
database according to rules Query Management.
Scheduling SQL request for batch execution Scheduled
Requests.

TDQM Architecture
Query Management

All Client
systems
accessing
Teradata.

TDQM
Administrat
or

TDQM
Partition
TDQM
Metadat
a

Scheduled Requests

Scheduled
Requests
Client

User Data

Scheduled
Request
Server

Teradata
RDBMS

Teradata Dual Active Solution


Provides support for unplanned down time.
Eliminates the need for planned down
time.
Provides additional processing power to
smooth out peak workload on the primary
system.
100% data replication as well as only
mission-critical data replication possible.

Teradata Dual Active Solution


Architecture
Primary System
Users/
Applications

Data
Synchronization

Operation Control

Teradata
Query Director

Users/
Applications

Users/
Applications
Backup System

Teradata Query Director


Designed to intelligently route queries
based on customer-established rules.
Helps to share workload between the
system.
Provide failover capability.

Questions ?

Das könnte Ihnen auch gefallen