Sie sind auf Seite 1von 11

V14 Certification: Teradata Basics

by Tom Coffing and John Nolan


Coffing Data Warehousing. (c) 2014. Copying Prohibited.

Reprinted for Anand Ranjan, Accenture


anand.ranjan@accenture.com
Reprinted with permission as a subscription benefit of Skillport,
http://skillport.books24x7.com/

All rights reserved. Reproduction and/or distribution in whole or in part in electronic,paper or


other forms without written permission is prohibited.

V14Certification:TeradataBasics

Chapter 3: Hashing of the Primary Index


Overview
"Write a wise saying and your name will live forever."
- Anonymous

The Hashing Formula Facts

1. There is only one Hashing Formula.


2. A row's Primary Index value is hashed, and the output is its Row Hash.
3. The Row Hash will be stored with the row (on disk) as part of the Row_ID.
4. If the Hashing Formula hashes value 1001, and gets a row hash of 13, then it will produce a 13 every time it hashes a 1001
value. It's consistent!
There is one Hashing Formula in Teradata, and it is consistent. The concept is to take the value of a row's Primary Index and run it through the
Hash Formula. It will produce a Row Hash number. That Row Hash will stay with the row forever and reside as the first part of the row. The Row
Hash also determines which AMP owns the row.

The Hash Map Determines which AMP will own the Row

A row will be placed on an AMP after the Parsing Engine (PE) hashes the row's Primary Index value. The output of the Hashing Algorithm is a
row's Row Hash. The Row hash goes to a bucket in the Hash Map and is assigned to an AMP.

The Hash Map Determines which AMP will own the Row

Page 2 / 11
ReprintedforOET7P/1179538,Accenture

CoffingDataWarehousing,CoffingPublishing(c)2014,CopyingProhibited

V14Certification:TeradataBasics

The above example hashed Emp_No 1001 (Primary Index value) and the output was a Row Hash of 13. Teradata counted over to bucket 13 in
the Hash Map, and it has the number one (1) inside that bucket. This means that this row will go to AMP 1.

Placing rows on the AMP

The above example hashed Emp_No 1002 (Primary Index value) and the output was a Row Hash of 5. Teradata counted over to bucket 5 in
the Hash Map, and it has the number two (2) inside that bucket. This means that this row will go to AMP 2.

Placing rows on the AMP Continued

Page 3 / 11
ReprintedforOET7P/1179538,Accenture

CoffingDataWarehousing,CoffingPublishing(c)2014,CopyingProhibited

V14Certification:TeradataBasics

The above example hashed Emp_No 1003 (Primary Index value) and the output was a Row Hash of 9. Teradata counted over to bucket 9 in
the Hash Map, and it has the number one (3) inside that bucket. This means that this row will go to AMP 3.

A Review of the Hashing Process


1. Hash the Primary Index Value for a row with the Hash Formula.
2. The output of the Hash Formula is a 32-bit Row Hash.
3. Take the Row Hash and find its corresponding bucket in the Hash Map.
4. Send the row and its Row Hash to the AMP listed in the Hash Map Bucket.

Take a look at the row hash for each row, and notice it corresponds with the Hash Map.

Non-Unique Primary Indexes have Skewed Data


Imagine if we made Last_Name the Primary Index for a table. Here is an example of how it would distribute. Notice all duplicates
have the same Row Hash.

Page 4 / 11
ReprintedforOET7P/1179538,Accenture

CoffingDataWarehousing,CoffingPublishing(c)2014,CopyingProhibited

V14Certification:TeradataBasics

The Hash Formula is consistent so every Smith has the same Row Hash and the same goes for each Jones and each Patel. Therefore,
duplicate values land on the same AMP.

The Uniqueness Value

Each AMP will place a Uniqueness Value after the row hash to track duplicate values.

The Row Hash and Uniqueness Value make up the Row-ID

Page 5 / 11
ReprintedforOET7P/1179538,Accenture

CoffingDataWarehousing,CoffingPublishing(c)2014,CopyingProhibited

V14Certification:TeradataBasics

Row-ID equals the Row Hash of the Primary Index column and the Uniqueness Value.

A Row-ID Example for a Unique Primary Index

Notice two things for this Unique Primary Index (UPI) example:
1. The Uniqueness Value on each Row-ID is 1.
2. Each AMP sorts their rows by the Row-ID.
Row Hash and the Uniqueness Value make up the Row-ID. AMPs sort by the Row-ID.

A Row-ID Example for a Non-Unique Primary Index (NUPI)

Page 6 / 11
ReprintedforOET7P/1179538,Accenture

CoffingDataWarehousing,CoffingPublishing(c)2014,CopyingProhibited

V14Certification:TeradataBasics

Notice two things:


1. Uniqueness Value increases on all duplicate
names.
2. Each AMP sorts their rows by the Row-ID.

Row Hash and the Uniqueness Value make up the Row-ID. AMPs sort by the Row-ID.

Two Reasons why each AMP Sorts their rows by the Row-ID
The Two Reasons are:
1. So like Data such as 'Smith' is grouped together on the AMP's disk.
2. So each AMP can perform a Binary Search, like a Phone Book in Alphabetical order.

AMPs sort rows by Row-ID so like data is grouped together and for Binary searches.

AMPs sort their rows by Row-ID to Group like Data

Page 7 / 11
ReprintedforOET7P/1179538,Accenture

CoffingDataWarehousing,CoffingPublishing(c)2014,CopyingProhibited

V14Certification:TeradataBasics

Notice that all of the Smiths are lumped together because of the sorting by Row-ID.

AMPs sort their rows by Row-ID to do a Binary Search

A Binary Search knows the Row-IDs are in numeric order. It's like you using a phone book. Go to the middle first and then go up or down in
chunks to find things quickly.

Table CREATE Examples with four different Primary Indexes

Page 8 / 11
ReprintedforOET7P/1179538,Accenture

CoffingDataWarehousing,CoffingPublishing(c)2014,CopyingProhibited

V14Certification:TeradataBasics

Null Values all Hash to the Same AMP

If there are NULL values in the Primary Index, you could find this is the reason for your skew. A Table with a Unique Primary Index can have only
one Null value, but a NUPI table can have many NULL values, and each NULL value hashes to the same AMP.

A Unique Primary Index (UPI) Example

Page 9 / 11
ReprintedforOET7P/1179538,Accenture

CoffingDataWarehousing,CoffingPublishing(c)2014,CopyingProhibited

V14Certification:TeradataBasics

A Unique Primary Index will spread the data perfectly evenly.

A Non-Unique Primary Index (NUPI) Example

A Non-Unique Primary Index will NOT spread the data perfectly evenly.

A Multi-Column Primary Index Example

Page 10 / 11
ReprintedforOET7P/1179538,Accenture

CoffingDataWarehousing,CoffingPublishing(c)2014,CopyingProhibited

V14Certification:TeradataBasics

A Multi-Column Primary Index is often used to fix a data skew problem.

A No Primary Index (NoPI) Example

All AMPs read all of their rows (full table scan) because there is no Primary Index.

Page 11 / 11
ReprintedforOET7P/1179538,Accenture

CoffingDataWarehousing,CoffingPublishing(c)2014,CopyingProhibited

Das könnte Ihnen auch gefallen