
TERADATA- DAY 2

Teradata Indexes
Types of tables

Prepared By

AnilKumar P

- Primary Index
  - Unique Primary Index (UPI)
  - Non-Unique Primary Index (NUPI)
  - No Primary Index (NoPI)
  - Partitioned Primary Index (PPI)
- Secondary Index
  - Unique Secondary Index (USI)
  - Non-Unique Secondary Index (NUSI)
- Join Index
  - Single-Table Join Index (STJI)
  - Multi-Table Join Index (MTJI)
  - Aggregate Join Index (AJI)
- Hash Index
- Types of tables
  - Set table
  - Multiset table
  - Derived table
  - Volatile table
  - Global Temporary Table
- Locks

Types of tables:
Derived tables are always local to a single SQL request. They are built dynamically
using an additional SELECT within the query. The rows of the derived table are stored
in spool and discarded as soon as the query finishes.
Volatile temporary tables are local to a session rather than to a specific query. This
means that the table may be used repeatedly within a user session. That is the
major difference between volatile temporary tables (multiple use) and derived
tables (single use). Like a derived table, a volatile temporary table is materialized in
spool space. However, it is not discarded until the session ends or the user
manually drops it.
Global temporary tables are also local to a session, like volatile tables, but they use
temporary space. The major difference is that the definition of a global temporary
table is stored in the Data Dictionary, while its data is not. When the user logs off,
the data is deleted automatically but the definition remains.

Derived Table Example :

Ex 1 : SELECT * FROM (SELECT AVG(SAL) AS Avgsalary FROM Emp) sample;

Ex 2 : A derived table that joins to an existing table.
Show all employees and the average salary of their department:

SELECT Dept_No, First_Name, Last_Name, AVGSAL
FROM Employee_Table
INNER JOIN
(SELECT Dept_No, AVG(Salary) FROM Employee_Table
GROUP BY Dept_No) AS Sample (Dno, AVGSAL)
ON Dept_No = Dno;

The first three columns in the answer set come from Employee_Table. AVGSAL comes from
the derived table named Sample.
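The join to a derived table can be tried out without a Teradata system. The following is a minimal sketch using Python's built-in sqlite3 module, not Teradata itself. The sample rows are hypothetical, and because SQLite does not accept the alias column list `AS Sample (Dno, AVGSAL)`, the column names are assigned inside the inner SELECT instead.

```python
import sqlite3

# Hypothetical sample rows; the notes do not show the data in Employee_Table.
ROWS = [
    (10, "Ann", "Lee", 4000.0),
    (10, "Bob", "Ray", 6000.0),
    (20, "Cy", "Poe", 3000.0),
]


def employees_with_dept_average(rows):
    """Join each employee to the average salary of their department."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Employee_Table "
        "(Dept_No INTEGER, First_Name TEXT, Last_Name TEXT, Salary REAL)"
    )
    con.executemany("INSERT INTO Employee_Table VALUES (?, ?, ?, ?)", rows)
    # The inner SELECT is the derived table; it exists only for this query.
    result = con.execute(
        """
        SELECT E.Dept_No, E.First_Name, E.Last_Name, Sample.AVGSAL
        FROM Employee_Table AS E
        INNER JOIN
            (SELECT Dept_No AS Dno, AVG(Salary) AS AVGSAL
             FROM Employee_Table
             GROUP BY Dept_No) AS Sample
        ON E.Dept_No = Sample.Dno
        ORDER BY E.Dept_No, E.First_Name
        """
    ).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in employees_with_dept_average(ROWS):
        print(row)
```

Each employee row is repeated with the average of its own department, which is the behaviour the Teradata query describes.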

Derived table Example :

Get the top three selling items across all stores:
We must first aggregate the sales by product ID using a derived table. Once this
aggregation is done in spool, we can apply the RANK function to answer the
question.

SELECT Prodid, Sumsales, RANK(Sumsales) AS "Rank"
FROM (SELECT Prodid, SUM(sales) FROM Salestbl GROUP BY 1)
AS tmp (Prodid, Sumsales)
QUALIFY RANK(Sumsales) <= 3;

Result :

Prodid  Sumsales    Rank
------  ----------  ----
A       170000.00   1
C       115000.00   2
D       110000.00   3

- The derived table name is tmp.
- The table is required for this query but no others.
- The query will be run only one time with this data.
- The derived column names are Prodid and Sumsales.
- The table is created in spool using the inner SELECT.
- The inner SELECT statement is always in parentheses following FROM.

Volatile Table Syntax :

CREATE VOLATILE TABLE Dept_Agg_Vol , NO LOG
( Dept_no Integer
,Sum_Salary Decimal(10,2)
)
ON COMMIT PRESERVE ROWS ;

- NO LOG allows for better performance.
- LOG indicates that a transaction journal is maintained.
- PRESERVE ROWS indicates that the table rows are kept at transaction end.
- DELETE ROWS indicates that all table rows are deleted at transaction end.

The Three Steps to Use a Volatile Table :

1) A user creates the volatile table.

CREATE VOLATILE TABLE Dept_Agg_Vol , NO LOG
( Dept_no Integer
,Sum_Salary Decimal(10,2)
)
ON COMMIT PRESERVE ROWS ;

2) The user populates the volatile table with an INSERT/SELECT statement.

INSERT INTO Dept_Agg_Vol
SELECT Dept_no
,SUM(Salary)
FROM Employee_Table
GROUP BY Dept_no ;

3) The user queries it as often as needed until logoff.

SELECT * FROM Dept_Agg_Vol
ORDER BY 1;
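The session-local lifetime of a volatile table can be illustrated by analogy. The sketch below uses SQLite TEMP tables through Python's sqlite3 module, which is an assumption made for illustration only: a SQLite connection stands in for a Teradata session, and the sample salaries are hypothetical. It follows the same three steps and then shows that a second session cannot see the table.

```python
import os
import sqlite3
import tempfile


def temp_table_visibility():
    """Return (rows seen by the creating session, visible to another session?)."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        session_1 = sqlite3.connect(path)
        session_2 = sqlite3.connect(path)

        # A permanent table that both sessions can see (hypothetical data).
        session_1.execute(
            "CREATE TABLE Employee_Table (Dept_no INTEGER, Salary REAL)"
        )
        session_1.executemany(
            "INSERT INTO Employee_Table VALUES (?, ?)",
            [(10, 4000.0), (10, 6000.0), (20, 3000.0)],
        )
        session_1.commit()

        # Step 1: create the session-local table.
        session_1.execute(
            "CREATE TEMP TABLE Dept_Agg_Vol (Dept_no INTEGER, Sum_Salary REAL)"
        )
        # Step 2: populate it with an INSERT/SELECT.
        session_1.execute(
            "INSERT INTO Dept_Agg_Vol "
            "SELECT Dept_no, SUM(Salary) FROM Employee_Table GROUP BY Dept_no"
        )
        session_1.commit()
        # Step 3: query it from the same session.
        own_rows = session_1.execute(
            "SELECT * FROM Dept_Agg_Vol ORDER BY 1"
        ).fetchall()

        # Another session does not know the table at all.
        try:
            session_2.execute("SELECT * FROM Dept_Agg_Vol").fetchall()
            visible_elsewhere = True
        except sqlite3.OperationalError:
            visible_elsewhere = False

        session_1.close()
        session_2.close()
        return own_rows, visible_elsewhere
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(temp_table_visibility())
```

Closing the first connection discards the table, which corresponds to the volatile table disappearing at logoff.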

HELP VOLATILE TABLE ;

This command displays the names of all volatile temporary tables
active for the current user session.

SessionID  TableName     TableId  Protection  CreatorName  CommitOption  TransactionLog
---------  ------------  -------  ----------  -----------  ------------  --------------
1010       Dept_Agg_Vol  10C0C04  N           Anil         P             N

Global Temporary Table Syntax :

CREATE GLOBAL TEMPORARY TABLE Dept_Agg_GLO
( Dept_no Integer
,Sum_Salary Decimal(10,2)
)
ON COMMIT PRESERVE ROWS ;

Global temporary tables have the same LOG / NO LOG and ON COMMIT PRESERVE ROWS / DELETE ROWS options.

The Three Steps to Use a Global Temporary Table :

1) Create the global temporary table.

CREATE GLOBAL TEMPORARY TABLE Dept_Agg_GLO
( Dept_no Integer
,Sum_Salary Decimal(10,2)
)
ON COMMIT PRESERVE ROWS ;

2) Populate it with an INSERT/SELECT statement.

INSERT INTO Dept_Agg_GLO
SELECT Dept_no
,SUM(Salary)
FROM Employee_Table
GROUP BY Dept_no ;

3) Query it.

SELECT * FROM Dept_Agg_GLO
ORDER BY 1;

Primary Index :
A Primary Index (PI) is the physical mechanism for assigning a data row to an AMP
and to a location on that AMP's disks. It is also used to access rows without having to
search the entire table.
- The rows of every table are distributed among all AMPs.
- Each AMP is responsible for a subset of the rows of each table.
- Ideally, each table will be evenly distributed among all AMPs.
- Evenly distributed tables result in evenly distributed workloads.
- The uniformity of distribution of the rows of a table depends on the choice of the
  Primary Index.
Three purposes of a Primary Index :
1. Distribution of rows to the proper AMP.
2. Fastest way to retrieve a single row.
3. Accessing rows for joins.

Types of Primary Index :

- Unique Primary Index
- Non Unique Primary Index

Syntax :

CREATE TABLE sample_1
(col_a INTEGER
,col_b INTEGER
,col_c INTEGER)
UNIQUE PRIMARY INDEX (col_b);

CREATE TABLE sample_2
(col_x INTEGER
,col_y INTEGER
,col_z INTEGER)
PRIMARY INDEX (col_x);

Limitations:
1. Each table can have only one primary index, which may span up to 64 columns.
2. Once a primary index is defined, it cannot be altered or dropped; to change it, the table is normally recreated.
3. Access by primary index value is always a one-AMP operation.
4. If no primary index is specified at table creation, Teradata chooses one automatically, by default the first column of the table.

Physical Mechanism

Index value(s) -> hashing algorithm -> Row Hash -> DSW or Hash Bucket # -> Hash Map -> AMP #

- The hashing algorithm is designed to ensure even distribution of unique values across all AMPs.
- Different hashing algorithms are used for different international character sets.
- A Row Hash is the 32-bit result of applying a hashing algorithm to an index value.
- The DSW (Destination Selection Word) or Hash Bucket is represented by the high order 16 bits of the Row Hash.
- A Hash Map is uniquely configured for each system.
- It is an array of 65,536 entries (buckets) which associates bucket numbers with specific AMPs.
- Two systems with the same number of AMPs will have the same Hash Map.
- Changing the number of AMPs in a system requires a change to the Hash Map.
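The chain from index value to AMP can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Teradata's implementation: CRC-32 stands in for the real hashing algorithm, and the hash map simply assigns bucket i to AMP i modulo the number of AMPs. The function names are made up for this sketch.

```python
import zlib
from collections import Counter

NUM_BUCKETS = 65_536  # number of hash map entries, as stated above


def build_hash_map(num_amps):
    """Toy hash map: bucket i belongs to AMP (i mod num_amps). Assumed layout."""
    return [bucket % num_amps for bucket in range(NUM_BUCKETS)]


def row_hash(index_value):
    """Stand-in 32-bit hash (CRC-32). Teradata's real algorithm is different."""
    return zlib.crc32(str(index_value).encode("utf-8")) & 0xFFFFFFFF


def amp_for(index_value, hash_map):
    """Index value -> row hash -> hash bucket (high-order 16 bits) -> AMP."""
    bucket = row_hash(index_value) >> 16
    return hash_map[bucket]


if __name__ == "__main__":
    hash_map = build_hash_map(4)
    # Count how many of 100,000 unique values land on each AMP.
    counts = Counter(amp_for(value, hash_map) for value in range(100_000))
    print(sorted(counts.items()))
```

Calling build_hash_map with a different AMP count produces a different map, which mirrors the point that changing the number of AMPs requires a change to the Hash Map.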

A Hashing Example

Order table (Order Number is the PK and the UPI) :

Order Number  Customer Number  Order Date  Order Status
------------  ---------------  ----------  ------------
7325          2                4/13        O
7324          3                4/13        O
7415          3                4/13        O
7415          1                4/13        C
7103          1                4/10        O
7225          2                4/15        C
7384          1                4/12        C
7402          3                4/12        C
7188          1                4/13        C
7202          2                4/09        C

SELECT * FROM order
WHERE order_number = 7202;

7202 -> Hashing Algorithm -> 32 bit Row Hash 691B 14AE (hexadecimal)

Destination Selection Word : 691B = 0110 1001 0001 1011
Remaining 16 bits          : 14AE = 0001 0100 1010 1110
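The split of the row hash from this example can be checked with plain bit arithmetic. A small sketch follows; the function names are made up for illustration.

```python
def split_row_hash(row_hash):
    """Split a 32-bit row hash into (DSW, remaining 16 bits)."""
    dsw = (row_hash >> 16) & 0xFFFF   # high-order 16 bits
    remaining = row_hash & 0xFFFF     # low-order 16 bits
    return dsw, remaining


def as_nibbles(value16):
    """Format a 16-bit value as four groups of four bits."""
    bits = format(value16, "016b")
    return " ".join(bits[i:i + 4] for i in range(0, 16, 4))


if __name__ == "__main__":
    dsw, remaining = split_row_hash(0x691B14AE)
    print(format(dsw, "04X"), as_nibbles(dsw))              # 691B 0110 1001 0001 1011
    print(format(remaining, "04X"), as_nibbles(remaining))  # 14AE 0001 0100 1010 1110
```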

The Hash Map

7202 -> Hashing Algorithm -> 32 bit Row Hash 691B 14AE (hexadecimal)
Destination Selection Word : 691B

[Figure: partial HASH MAP grid. Rows are labeled 690 to 695, columns 0 to F, and each cell holds an AMP number. The entry for DSW 691B is AMP 9, so the row 7202 2 4/09 C is stored on AMP 9.]

Note: This partial Hash Map is based on a 16 AMP system and AMPs are shown in decimal format.

Identifying Rows
A row hash is not adequate to uniquely identify a row.

Consideration #1 : Hash Synonyms
A Row Hash = 32 bits = 4.2 billion possible values.
Because there is an infinite number of possible data values, some data values will
have to share the same row hash.
Example: the data values 1254 and 7769 both hash to 10A2 2936.

Consideration #2 : NUPI Duplicates
A Primary Index may be non-unique (NUPI). Different rows will have the same PI value
and thus the same row hash.
Example: (Dave) 'Smith' and (John) 'Smith' both hash to 0016 5557.

Conclusion
A row hash is not adequate to uniquely identify a row.
The Row ID
To uniquely identify a row, we add a 32-bit uniqueness value.
The combined row hash and uniqueness value is called a Row ID.

Row ID = Row Hash (32 bits) + Uniqueness Id (32 bits)

- Each stored row has a Row ID as a prefix.
- Rows are logically maintained in Row ID sequence.

Row ID                   Row Data
Row Hash   Unique ID     Emp_No  Last_Name  First_Name
---------  ---------     ------  ---------  ----------
3B11 5032  0000 0001     1018    Reynolds   Jane
3B11 5032  0000 0002     1020    Davidson   Evan
3B11 5032  0000 0003     1031    Green      Jason
3B11 5033  0000 0001     1014    Jacobs     Paul
3B11 5034  0000 0001     1012    Chevas     Jose
3B11 5034  0000 0002     1021    Carnet     Jean
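The way a uniqueness value separates rows that share a row hash can be sketched as follows. This is a simplified model, assuming the uniqueness value is just a per-hash counter in arrival order; the arrival order in the sample is hypothetical, while the hashes and employee numbers are the ones from the Row ID example.

```python
from collections import defaultdict


def assign_row_ids(rows):
    """rows: (row_hash, payload) pairs in arrival order.

    Returns ((row_hash, uniqueness), payload) pairs in Row ID sequence.
    """
    next_uniqueness = defaultdict(int)
    stored = []
    for row_hash, payload in rows:
        next_uniqueness[row_hash] += 1  # 1 for the first row with this hash
        stored.append(((row_hash, next_uniqueness[row_hash]), payload))
    # Rows are logically maintained in Row ID sequence.
    return sorted(stored, key=lambda item: item[0])


# (row hash, Emp_No) in a hypothetical arrival order.
ARRIVALS = [
    (0x3B115032, 1018),
    (0x3B115034, 1012),
    (0x3B115032, 1020),
    (0x3B115033, 1014),
    (0x3B115032, 1031),
    (0x3B115034, 1021),
]

if __name__ == "__main__":
    for (row_hash, uniqueness), emp_no in assign_row_ids(ARRIVALS):
        print(format(row_hash, "08X"), format(uniqueness, "08X"), emp_no)
```

Three rows share hash 3B11 5032 and receive uniqueness values 1, 2 and 3, so every Row ID is distinct even though the row hashes are not.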

Secondary Index :
There are 3 general ways to access a table:
- Primary Index access (one AMP access)
- Secondary Index access (two or all AMP access)
- Full Table Scan (all AMP access)

A Secondary Index provides an alternate path to the rows of a table.
A table can have from 0 to 32 secondary indexes.
Secondary Indexes:
- Do not affect table distribution.
- Add overhead, both in terms of disk space and maintenance.
- May be added or dropped dynamically as needed.
- Are chosen to improve table performance.

Choosing a Secondary Index

A Secondary Index may be defined ...
- at table creation (CREATE TABLE)
- following table creation (CREATE INDEX)
- on up to 64 columns

USI (Unique Secondary Index)
- If the index choice of column(s) is unique, it is called a USI.
- Accessing a row via a USI is a 2 AMP operation.

CREATE UNIQUE INDEX (Employee_Number) ON Employee;

NUSI (Non-Unique Secondary Index)
- If the index choice of column(s) is nonunique, it is called a NUSI.
- Accessing row(s) via a NUSI is an all AMP operation.

CREATE INDEX (Last_Name) ON Employee;

Notes:
- Secondary Indexes cause an internal sub-table to be built.
- Dropping the index causes the sub-table to be deleted.

Unique Secondary Index (USI) Access

Create USI :
CREATE UNIQUE INDEX (Cust) ON Customer;

Access via USI :
SELECT *
FROM Customer
WHERE Cust = 56;

Access path (Customer, Table ID = 100) :
1. The PE applies the hashing algorithm to the USI value 56. The row hash is 602.
2. Table ID 100, row hash 602 and USI value 56 go through the Message Passing Layer to the AMP whose USI subtable holds that row hash.
3. The subtable row 602, 1 for Cust 56 contains the base table RowID 778, 7.
4. Table ID 100 and RowID 778, 7 go through the Message Passing Layer to the AMP that holds the base row 778, 7 56 Smith 555-7777.

USI Subtables :

AMP 1
RowID   Cust  Base RowID
244, 1  74    884, 1
505, 1  77    639, 1
744, 4  51    915, 9
757, 1  27    388, 1

AMP 2
RowID   Cust  Base RowID
135, 1  98    555, 6
296, 1  84    536, 5
602, 1  56    778, 7
969, 1  49    147, 1

AMP 3
RowID   Cust  Base RowID
288, 1  31    638, 1
339, 1  40    640, 1
372, 2  45    471, 1
588, 1  95    778, 3

AMP 4
RowID   Cust  Base RowID
175, 1  37    107, 1
489, 1  72    717, 2
838, 4  12    147, 2
919, 1  62    822, 1

Base Tables :

AMP 1
RowID   Cust (USI)  Name    Phone (NUPI)
107, 1  37          White   555-4444
536, 5  84          Rice    666-5555
638, 1  31          Adams   111-2222
640, 1  40          Smith   222-3333

AMP 2
RowID   Cust (USI)  Name    Phone (NUPI)
471, 1  45          Adams   444-6666
555, 6  98          Brown   333-9999
717, 2  72          Adams   666-7777
884, 1  74          Smith   555-6666

AMP 3
RowID   Cust (USI)  Name    Phone (NUPI)
147, 1  49          Smith   111-6666
147, 2  12          Young   777-4444
388, 1  27          Jones   222-8888
822, 1  62          Black   444-5555

AMP 4
RowID   Cust (USI)  Name    Phone (NUPI)
639, 1  77          Jones   777-6666
778, 3  95          Peters  555-7777
778, 7  56          Smith   555-7777
915, 9  51          Marsh   888-2222

Non-Unique Secondary Index (NUSI) Access

Create NUSI :
CREATE INDEX (Name) ON Customer;

Access via NUSI :
SELECT *
FROM Customer
WHERE Name = 'Adams';

Access path (Customer, Table ID = 100) :
1. The PE applies the hashing algorithm to the NUSI value 'Adams'. The row hash is 567.
2. Table ID 100, row hash 567 and NUSI value 'Adams' go through the Message Passing Layer to all AMPs.
3. Each AMP searches its own NUSI subtable, which points only to base rows on the same AMP, and returns the matching base rows.

NUSI Subtables :

AMP 1
RowID   Name    Base RowID
432, 8  Smith   640, 1
448, 1  White   107, 1
567, 3  Adams   638, 1
656, 1  Rice    536, 5

AMP 2
RowID   Name    Base RowID
432, 3  Smith   884, 1
567, 2  Adams   471, 1  717, 2
852, 1  Brown   555, 6

AMP 3
RowID   Name    Base RowID
432, 1  Smith   147, 1
448, 4  Black   822, 1
567, 6  Jones   388, 1
770, 1  Young   147, 2

AMP 4
RowID   Name    Base RowID
155, 1  Marsh   915, 9
396, 1  Peters  778, 3
432, 5  Smith   778, 7
567, 1  Jones   639, 1

Base Tables :

AMP 1
RowID   Cust  Name (NUSI)  Phone (NUPI)
107, 1  37    White        555-4444
536, 5  84    Rice         666-5555
638, 1  31    Adams        111-2222
640, 1  40    Smith        222-3333

AMP 2
RowID   Cust  Name (NUSI)  Phone (NUPI)
471, 1  45    Adams        444-6666
555, 6  98    Brown        333-9999
717, 2  72    Adams        666-7777
884, 1  74    Smith        555-6666

AMP 3
RowID   Cust  Name (NUSI)  Phone (NUPI)
147, 1  49    Smith        111-6666
147, 2  12    Young        777-4444
388, 1  27    Jones        222-8888
822, 1  62    Black        444-5555

AMP 4
RowID   Cust  Name (NUSI)  Phone (NUPI)
639, 1  77    Jones        777-6666
778, 3  95    Peters       555-7777
778, 7  56    Smith        555-7777
915, 9  51    Marsh        888-2222
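The difference between the two-AMP USI path and the all-AMP NUSI path can be made concrete with a small simulation. The sketch below is a toy model built on the Customer rows from the examples above. Two things are assumptions: the placement of each USI subtable on a particular AMP, and the use of dictionary searches in place of the hashing step that Teradata would use to locate an AMP.

```python
# Base rows per AMP: RowID -> (Cust, Name), copied from the Customer example.
BASE = {
    1: {(107, 1): (37, "White"), (536, 5): (84, "Rice"),
        (638, 1): (31, "Adams"), (640, 1): (40, "Smith")},
    2: {(471, 1): (45, "Adams"), (555, 6): (98, "Brown"),
        (717, 2): (72, "Adams"), (884, 1): (74, "Smith")},
    3: {(147, 1): (49, "Smith"), (147, 2): (12, "Young"),
        (388, 1): (27, "Jones"), (822, 1): (62, "Black")},
    4: {(639, 1): (77, "Jones"), (778, 3): (95, "Peters"),
        (778, 7): (56, "Smith"), (915, 9): (51, "Marsh")},
}

# USI subtable rows per AMP: Cust -> base RowID (AMP placement is assumed).
USI = {
    1: {74: (884, 1), 77: (639, 1), 51: (915, 9), 27: (388, 1)},
    2: {98: (555, 6), 84: (536, 5), 56: (778, 7), 49: (147, 1)},
    3: {31: (638, 1), 40: (640, 1), 45: (471, 1), 95: (778, 3)},
    4: {37: (107, 1), 72: (717, 2), 12: (147, 2), 62: (822, 1)},
}


def build_nusi(amp):
    """NUSI subtable of one AMP: Name -> RowIDs of base rows on that same AMP."""
    subtable = {}
    for row_id, (_, name) in sorted(BASE[amp].items()):
        subtable.setdefault(name, []).append(row_id)
    return subtable


NUSI = {amp: build_nusi(amp) for amp in BASE}


def usi_access(cust):
    """Two-AMP path: one subtable AMP, then one base table AMP."""
    # Searching the dictionaries stands in for hashing the USI value.
    sub_amp = next(amp for amp, sub in USI.items() if cust in sub)
    row_id = USI[sub_amp][cust]
    # Searching again stands in for hashing the base RowID.
    base_amp = next(amp for amp, rows in BASE.items() if row_id in rows)
    return BASE[base_amp][row_id], [sub_amp, base_amp]


def nusi_access(name):
    """All-AMP path: every AMP checks its own NUSI subtable."""
    rows, touched = [], []
    for amp in sorted(BASE):
        touched.append(amp)
        for row_id in NUSI[amp].get(name, []):
            rows.append(BASE[amp][row_id])
    return rows, touched


if __name__ == "__main__":
    print(usi_access(56))        # one row, two AMPs involved
    print(nusi_access("Adams"))  # several rows, all four AMPs involved
```

The USI lookup returns a single row after involving two AMPs, while the NUSI lookup involves all four AMPs even though only two of them hold a matching row.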

Full Table Scans

- Every row of the table must be read.
- All AMPs scan their portion of the table in parallel.
- Fast and efficient on Teradata due to parallelism.

Full table scans typically occur when either:
- An index is not used in the query
- An index is used in a non-equality test

Customer table columns : Cust_ID (USI), Cust_Name (no index), Cust_Phone (NUPI)

Examples of Full Table Scans:

SELECT * FROM Customer WHERE Cust_Phone LIKE '524-____';
SELECT * FROM Customer WHERE Cust_Name = 'Davis';
SELECT * FROM Customer WHERE Cust_ID > 1000;

Partitioned Primary Indexes (PPI)

What is a Partitioned Primary Index or PPI?
- A new indexing mechanism in Teradata.
- Data rows can be grouped into partitions at the AMP level.
What advantages does a PPI provide?
- Increases the available options to improve the performance of certain types of
  queries.
- Only the rows of the qualified partitions in a query need to be accessed, which
  avoids full table scans.
Types of Partitioned Primary Index :
- Range-based partitioning (RANGE_N) and case-based partitioning (CASE_N).
As always, data is distributed among the AMPs, and on each AMP the rows are
automatically placed within partitions.
In a table defined with a PPI, each row is uniquely identified by its Row Key.

Row Key = Partition # + Row Hash + Uniqueness Value
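How the Row Key keeps the rows of one partition together on an AMP can be sketched with sorted tuples. This is a toy model: the rows are illustrative values, partition numbers 1 to 4 are assumed to stand for four consecutive order months, and the function names are made up for this sketch.

```python
# Rows on one AMP: (partition number, row hash, uniqueness value, order number).
# Illustrative values; partitions 1-4 stand for four consecutive order months.
ROWS = [
    (3, 0x01, 1, 1028), (2, 0x03, 1, 1016), (3, 0x12, 1, 1031),
    (1, 0x14, 1, 1001), (2, 0x17, 1, 1013), (4, 0x23, 1, 1040),
]


def store_in_row_key_order(rows):
    """Sort by Row Key = (partition #, row hash, uniqueness value)."""
    return sorted(rows, key=lambda row: (row[0], row[1], row[2]))


def scan_partitions(stored, wanted_partitions):
    """Read only the qualifying partitions.

    Returns (order numbers found, number of rows read).
    """
    read = [row for row in stored if row[0] in wanted_partitions]
    return [row[3] for row in read], len(read)


if __name__ == "__main__":
    stored = store_in_row_key_order(ROWS)
    print(scan_partitions(stored, {3}))  # only partition 3 is read
```

Because the partition number is the leading part of the Row Key, all rows of one partition sit next to each other, and a query restricted to one month reads two rows here instead of six.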

Logical Example of NPPI versus PPI

[Figure: the Orders table (48 rows with columns RH, O_# and O_Date, dates from 02/09 to 02/12) spread over 4 AMPs, shown twice. With the Orders table defined with an NPPI, each AMP keeps its rows in row hash (RH) order, so the dates are interleaved. With the Orders table defined with a PPI on O_Date, each AMP groups its rows into the partitions 02/09, 02/10, 02/11 and 02/12 and keeps them in row hash order within each partition.]

SELECT ...
WHERE O_Date
BETWEEN '2002-11-01'
AND '2002-11-30';

With the PPI, only the 02/11 partition on each AMP has to be read for this query.
Partitioning with RANGE_N

Notes:
- Partition the current sales table into daily partitions.
- Assume the current sales table only has data for the first 3 months of 2003,
  but we have defined partitions for the entire year 2003.
- It is relatively easy to ALTER the table to extend the partitions for 2004.
- A UPI is allowed because the partitioning columns are part of the PI.

CREATE TABLE Sales
( store_id INTEGER NOT NULL,
item_id INTEGER NOT NULL,
sales_date DATE FORMAT 'YYYY-MM-DD',
total_revenue DECIMAL(9,2),
total_sold INTEGER)
UNIQUE PRIMARY INDEX (store_id ,item_id ,sales_date)
PARTITION BY RANGE_N (
sales_date
BETWEEN DATE '2003-01-01' AND DATE '2003-12-31'
EACH INTERVAL '1' DAY);
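The partition number that a range expression assigns to a date can be worked out by hand. The sketch below is a simplified model of that arithmetic, not Teradata's implementation: it supports daily and monthly intervals only, assumes the start date is the first day of a month, and returns None where no partition applies.

```python
from datetime import date


def range_n(value, start, end, each="day"):
    """Return the 1-based partition number for value, or None if out of range."""
    if value is None or not (start <= value <= end):
        return None
    if each == "day":
        return (value - start).days + 1
    if each == "month":
        # Assumes start falls on the first day of a month.
        return (value.year - start.year) * 12 + (value.month - start.month) + 1
    raise ValueError("each must be 'day' or 'month'")


if __name__ == "__main__":
    start, end = date(2003, 1, 1), date(2003, 12, 31)
    print(range_n(date(2003, 3, 31), start, end, "day"))    # 90
    print(range_n(date(2003, 3, 31), start, end, "month"))  # 3
    print(range_n(date(2004, 1, 1), start, end, "day"))     # None
```

With daily intervals, the first three months of 2003 occupy partitions 1 to 90 of the 365 defined, which matches the note that partitions exist for the whole year while data exists only for the first quarter.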

Partitioning with CASE_N

Notes:
- Partition the data based on total revenue for the products.
- The NO CASE and UNKNOWN options allow for total_revenue >= 100,000 or unknown
  revenue.
- A UPI is NOT allowed because the partitioning columns are NOT part of the PI.

CREATE TABLE Sales_Revenue
( store_id INTEGER NOT NULL,
item_id INTEGER NOT NULL,
sales_date DATE FORMAT 'YYYY-MM-DD',
total_revenue DECIMAL(9,2),
total_sold INTEGER)
PRIMARY INDEX (store_id, item_id, sales_date)
PARTITION BY CASE_N
( total_revenue < 2000 , total_revenue < 4000 ,
total_revenue < 6000 , total_revenue < 8000 ,
total_revenue < 10000 , total_revenue < 20000 ,
total_revenue < 50000 , total_revenue < 100000 ,
NO CASE,
UNKNOWN);
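The way a list of conditions maps a revenue value to a partition can be sketched as a first-match search. This is a simplified model under stated assumptions: the eight thresholds are the ones that appear in the Sales_Revenue definition, the NO CASE partition is numbered one past the last condition, and the UNKNOWN partition (for a NULL revenue) one past that.

```python
# Thresholds taken from the total_revenue conditions in the table definition.
LIMITS = [2000, 4000, 6000, 8000, 10000, 20000, 50000, 100000]


def case_n(total_revenue, limits=LIMITS):
    """Return the partition number for a revenue value.

    Conditions are tested in order and the first true one wins.
    Assumed numbering: NO CASE = n + 1, UNKNOWN = n + 2.
    """
    n = len(limits)
    if total_revenue is None:
        return n + 2  # UNKNOWN: the comparison cannot be evaluated
    for position, limit in enumerate(limits, start=1):
        if total_revenue < limit:
            return position
    return n + 1  # NO CASE: no condition was true


if __name__ == "__main__":
    for revenue in (1500, 2000, 99999.99, 100000, None):
        print(revenue, case_n(revenue))
```

A revenue of 2000 falls into the second partition because the first condition is a strict less-than, and values of 100,000 or more land in the NO CASE partition.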

Join Index :