
Module 8: Primary Index Mechanics

After completing this module, you will be able to:

Explain the role of the hashing algorithm and the hash map in locating a row.
Explain the makeup of the Row ID and its role in row storage.
Describe the sequence of events for locating a row given its PI value.

Hashing Primary Index Values


SQL with primary index values and data
  -> PARSER
  -> Hashing Algorithm
  -> Row Hash (DSW), sent with the PI values and data
  -> Message Passing Layer (Hash Maps)
  -> AMP # (one of AMP 0, AMP 1, ..., AMP x, ..., AMP n - 1, AMP n)

For example, assume the PI value is 38:

PI value = 38 -> Hashing Algorithm -> Row Hash 1177 7C3C (DSW = 1177)

Data Table on the selected AMP (Row Hash values range from x'00000000' to x'FFFFFFFF'):

Row ID                          Row Data
Row Hash        Uniq Value
x'1177 7C3C'    0000 0001       38

Summary
The Message Passing Layer (MPL) takes the DSW of 1177 and uses it to locate bucket #1177 in the Hash Map.
Bucket #1177 contains the number of the AMP that owns this hash value, which is effectively the AMP that holds this row.

Hashing Down to the AMPs


Index value(s) -> hashing algorithm -> Row Hash -> DSW or Hash Bucket # -> Hash Map -> AMP #

The hashing algorithm is designed to ensure even distribution of unique values across all AMPs.
Different hashing algorithms are used for different international character sets.

A Row Hash is the 32-bit result of applying a hashing algorithm to an index value.
The DSW or Hash Bucket is represented by the high-order 16 bits of the Row Hash.

A Hash Map is uniquely configured for each system.
It is an array of 65,536 entries (buckets) which associates bucket numbers with specific AMPs.

Two systems with the same number of AMPs will have the same Hash Map.
Changing the number of AMPs in a system requires a change to the Hash Map.
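The chain from Row Hash to AMP number can be sketched in a few lines of Python. This is only an illustration: the slides do not say how buckets are assigned to AMPs, so `build_hash_map` uses a hypothetical round-robin assignment, and the function names are invented for this sketch.

```python
NUM_BUCKETS = 65_536  # one hash map entry per possible 16-bit DSW value


def build_hash_map(num_amps: int) -> list[int]:
    """Assumption: assign buckets to AMPs round-robin (not the real layout)."""
    return [bucket % num_amps for bucket in range(NUM_BUCKETS)]


def amp_for_row_hash(row_hash: int, hash_map: list[int]) -> int:
    """Use the high-order 16 bits of the 32-bit row hash as the bucket number."""
    dsw = (row_hash >> 16) & 0xFFFF
    return hash_map[dsw]


map_16 = build_hash_map(16)  # hypothetical 16-AMP system
map_20 = build_hash_map(20)  # the same system after adding 4 AMPs

row_hash = 0x11777C3C  # row hash shown earlier for PI value 38
print("16 AMPs ->", amp_for_row_hash(row_hash, map_16))
print("20 AMPs ->", amp_for_row_hash(row_hash, map_20))

# Count the buckets whose owning AMP differs between the two maps.
moved = sum(1 for a, b in zip(map_16, map_20) if a != b)
print(f"{moved} of {NUM_BUCKETS} buckets map to a different AMP")
```

Because the map is built from nothing but the number of AMPs, two systems with the same AMP count get the same map in this sketch, and changing the AMP count produces a different map.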

A Hashing Example
Order table (Order Number is the PK and the UPI):

Order     Customer   Order   Order
Number    Number     Date    Status
7325      2          4/13    O
7324      3          4/13    O
7415      3          4/13    O
7415      1          4/13    C
7103      1          4/10    O
7225      2          4/15    C
7384      1          4/12    C
7402      3          4/12    C
7188      1          4/13    C
7202      2          4/09    C

SELECT * FROM order
WHERE order_number = 7202;

7202 -> Hashing Algorithm -> 691B 14AE (32-bit Row Hash)

Destination Selection Word (691B):   0110 1001 0001 1011
Remaining 16 bits (14AE):            0001 0100 1010 1110
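The split of the Row Hash into its two 16-bit halves can be checked with a short Python sketch. The helper name `nibbles` is made up for this illustration; the row hash value is the one from the example.

```python
row_hash = 0x691B14AE  # 32-bit row hash for order_number 7202 (from the slide)

dsw = row_hash >> 16           # high-order 16 bits: Destination Selection Word
remaining = row_hash & 0xFFFF  # low-order 16 bits


def nibbles(value: int) -> str:
    """Format a 16-bit value as four groups of four binary digits."""
    bits = f"{value:016b}"
    return " ".join(bits[i:i + 4] for i in range(0, 16, 4))


print(f"DSW       = {dsw:04X} = {nibbles(dsw)}")
print(f"Remaining = {remaining:04X} = {nibbles(remaining)}")
```

The two printed bit patterns match the ones on the slide: 0110 1001 0001 1011 for the DSW and 0001 0100 1010 1110 for the remaining bits.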

The Hash Map


7202 -> Hashing Algorithm -> 691B 14AE (32-bit Row Hash)

Destination Selection Word (691B):   0110 1001 0001 1011
Remaining 16 bits (14AE):            0001 0100 1010 1110

HASH MAP (bucket numbers in hexadecimal: the row label gives the first three digits, the column gives the last digit)

       0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
690   07  06  07  06  07  04  05  06  05  05  14  09  14  13  03  04
691   15  08  02  04  01  00  14  14  03  02  03  09  01  00  02  15
692   01  00  15  11  14  14  13  13  14  14  08  09  15  10  09  09
693   07  06  15  13  11  06  15  08  15  15  08  08  11  07  05  10
694   04  12  11  13  05  10  07  07  03  02  11  04  01  00  11  13
695   11  11  12  10  03  02  06  13  01  00  06  05  07  06  05  12

Bucket 691B -> AMP 9

AMP 9 holds the row: 7202 2 4/09 C

Note: This partial Hash Map is based on a 16-AMP system; AMP numbers are shown in decimal format.
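The lookup can be reproduced with the partial Hash Map from the slide. In this Python sketch the dictionary layout and the name `amp_for_bucket` are assumptions made for illustration; only the AMP numbers come from the slide.

```python
# Partial hash map from the slide. Each key is the first three hex digits of
# the bucket number; each list holds the AMP numbers for last digits 0 to F.
PARTIAL_HASH_MAP = {
    0x690: [7, 6, 7, 6, 7, 4, 5, 6, 5, 5, 14, 9, 14, 13, 3, 4],
    0x691: [15, 8, 2, 4, 1, 0, 14, 14, 3, 2, 3, 9, 1, 0, 2, 15],
    0x692: [1, 0, 15, 11, 14, 14, 13, 13, 14, 14, 8, 9, 15, 10, 9, 9],
    0x693: [7, 6, 15, 13, 11, 6, 15, 8, 15, 15, 8, 8, 11, 7, 5, 10],
    0x694: [4, 12, 11, 13, 5, 10, 7, 7, 3, 2, 11, 4, 1, 0, 11, 13],
    0x695: [11, 11, 12, 10, 3, 2, 6, 13, 1, 0, 6, 5, 7, 6, 5, 12],
}


def amp_for_bucket(bucket: int) -> int:
    """Look up the AMP for a 16-bit bucket number in the partial map."""
    row, column = bucket >> 4, bucket & 0xF
    return PARTIAL_HASH_MAP[row][column]


row_hash = 0x691B14AE    # row hash for order_number 7202
bucket = row_hash >> 16  # DSW = 691B
print(f"bucket {bucket:04X} -> AMP {amp_for_bucket(bucket)}")
```

Row 691, column B holds 09, so the request for order 7202 is sent to AMP 9.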

Identifying Rows
A row hash is not adequate to uniquely identify a row.
Consideration #1
A Row Hash = 32 bits = about 4.3 billion possible values (2^32 = 4,294,967,296).
Because there is an infinite number of possible data values, some data values will have to share the same row hash.

Consideration #2
A Primary Index may be non-unique (NUPI).
Different rows will have the same PI value and thus the same row hash.

Hash Synonyms
Data values input: 1254 and 7769
1254 -> Hash Algorithm -> 10A2 2936
7769 -> Hash Algorithm -> 10A2 2936

NUPI Duplicates
(John) 'Smith' -> Hash Algorithm -> 0016 5557
(Dave) 'Smith' -> Hash Algorithm -> 0016 5557

In both cases, the rows have the same hash.

Conclusion
A row hash is not adequate to uniquely identify a row.

The Row ID
To uniquely identify a row, we add a 32-bit uniqueness value.
The combined row hash and uniqueness value is called a Row ID.
Each stored row has a Row ID as a prefix.
Rows are logically maintained in Row ID sequence.

Row ID = Row Hash (32 bits) + Uniqueness Id (32 bits)

Row ID                     Row Data
Row Hash    Unique ID      Emp_No   Last_Name   First_Name
3B11 5032   0000 0001      1018     Reynolds    Jane
3B11 5032   0000 0002      1020     Davidson    Evan
3B11 5032   0000 0003      1031     Green       Jason
3B11 5033   0000 0001      1014     Jacobs      Paul
3B11 5034   0000 0001      1012     Chevas      Jose
3B11 5034   0000 0002      1021     Carnet      Jean
:           :              :        :           :
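Row ID sequence can be illustrated by treating the Row ID as a single 64-bit number. The sketch below is a Python illustration only: packing the row hash into the high 32 bits and the uniqueness value into the low 32 bits is an assumption based on the order shown on the slide, and the `Row` class is invented for this example.

```python
from typing import NamedTuple


class Row(NamedTuple):
    row_hash: int
    uniqueness: int
    emp_no: int
    last_name: str
    first_name: str

    @property
    def row_id(self) -> int:
        """Assumed layout: row hash in the high 32 bits, uniqueness in the low."""
        return (self.row_hash << 32) | self.uniqueness


# The rows from the slide, deliberately listed out of order.
rows = [
    Row(0x3B115034, 2, 1021, "Carnet", "Jean"),
    Row(0x3B115032, 3, 1031, "Green", "Jason"),
    Row(0x3B115033, 1, 1014, "Jacobs", "Paul"),
    Row(0x3B115032, 1, 1018, "Reynolds", "Jane"),
    Row(0x3B115034, 1, 1012, "Chevas", "Jose"),
    Row(0x3B115032, 2, 1020, "Davidson", "Evan"),
]

rows_in_sequence = sorted(rows, key=lambda r: r.row_id)
for r in rows_in_sequence:
    print(f"{r.row_hash:08X} {r.uniqueness:08X} {r.emp_no} {r.last_name}")
```

Sorting on the combined value orders the rows first by row hash and then by uniqueness value, which reproduces the sequence shown in the table.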

Storing Rows (1 of 2)
Assumptions:
Last_Name is defined as a NUPI.
All rows in this example hash to the same AMP.
Add a row for 'John Smith'

'Smith' -> Hash Algorithm -> 0016 5557 -> Hash Map -> AMP #3

Row ID                     Row Data
Row Hash    Unique ID      Last_Name   First_Name   Etc.
0016 5557   0000 0001      Smith       John

Add a row for 'Sam Adams'

'Adams' -> Hash Algorithm -> 1058 9829 -> Hash Map -> AMP #3

Row ID                     Row Data
Row Hash    Unique ID      Last_Name   First_Name   Etc.
0016 5557   0000 0001      Smith       John
1058 9829   0000 0001      Adams       Sam

Storing Rows (2 of 2)
Add a row for 'Fred Smith' - (NUPI Duplicate)

'Smith' -> Hash Algorithm -> 0016 5557 -> Hash Map -> AMP #3

Row ID                     Row Data
Row Hash    Unique ID      Last_Name   First_Name   Etc.
0016 5557   0000 0001      Smith       John
0016 5557   0000 0002      Smith       Fred
1058 9829   0000 0001      Adams       Sam

Add a row for 'Dan Jones' - (Hash Synonym)

'Jones' -> Hash Algorithm -> 0016 5557 -> Hash Map -> AMP #3

Row ID                     Row Data
Row Hash    Unique ID      Last_Name   First_Name   Etc.
0016 5557   0000 0001      Smith       John
0016 5557   0000 0002      Smith       Fred
0016 5557   0000 0003      Jones       Dan
1058 9829   0000 0001      Adams       Sam

Given the row hash, what other information would be needed to find the 'Dan Jones' row?
The 'Fred Smith' row?
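The four inserts can be replayed in a small Python sketch. The row hashes are copied from the slides instead of being computed, and assigning the next uniqueness value as "highest value in use for that row hash plus one" is an assumption that happens to match the values shown; the function names are invented for this illustration.

```python
# Row hashes copied from the slides; a real system would compute them.
SLIDE_ROW_HASH = {"Smith": 0x00165557, "Adams": 0x10589829, "Jones": 0x00165557}

# Rows on AMP #3 as (row_hash, uniqueness, last_name, first_name).
table = []


def insert_row(last_name, first_name):
    """Store a row under the next free uniqueness value for its row hash."""
    row_hash = SLIDE_ROW_HASH[last_name]
    used = [u for (h, u, _, _) in table if h == row_hash]
    uniqueness = max(used, default=0) + 1  # assumed assignment rule
    table.append((row_hash, uniqueness, last_name, first_name))
    table.sort()  # keep the rows in Row ID sequence
    return row_hash, uniqueness


def find_rows(last_name):
    """Match on the row hash first, then on the PI value itself."""
    row_hash = SLIDE_ROW_HASH[last_name]
    return [row for row in table if row[0] == row_hash and row[2] == last_name]


for last, first in [("Smith", "John"), ("Adams", "Sam"),
                    ("Smith", "Fred"), ("Jones", "Dan")]:
    insert_row(last, first)

for row_hash, uniqueness, last, first in table:
    print(f"{row_hash:08X} {uniqueness:08X} {last} {first}")

print(find_rows("Jones"))
print(find_rows("Smith"))
```

In this sketch the row hash 0016 5557 alone matches three rows. Supplying the PI value 'Jones' narrows the result to the 'Dan Jones' row, while 'Smith' still returns two rows, so singling out 'Fred Smith' would need something beyond the PI value, such as the uniqueness value or another column value.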

Locating a Row On An AMP Using a PI


Locating a row on an AMP
requires three input elements:
1. The Table ID
2. The Row Hash of the PI
3. The PI value itself
On the target AMP (AMP #3 in this example):

START WITH:                        APPLY TO:         FIND:
Table Id, Row Hash                 Master Index      Cylinder #
Table Id, Row Hash, Cylinder #     Cylinder Index    Data Block Address
Row Hash, PI Value                 Data Block        Data Row

(Diagram: the Table ID and Row Hash enter the Master Index, which points to one of the Cylinder Indexes, Cyl 1 Index through Cyl 7 Index. The Cylinder Index points to the Data Block, where the PI Value identifies the Data Row and its Row Data.)
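The three-step search can be sketched with plain Python dictionaries. Everything structural here is a simplifying assumption: the table id 100, cylinder number 4, block address "block-A", and the exact-match keys are hypothetical, and real indexes are not simple dictionaries. Only the order of the steps and the sample rows follow the slides.

```python
TABLE_ID = 100  # hypothetical table id

# Step 1 structure: (table id, row hash) -> cylinder number.
master_index = {(TABLE_ID, 0x00165557): 4, (TABLE_ID, 0x10589829): 4}

# Step 2 structure: cylinder number -> {(table id, row hash) -> block address}.
cylinder_indexes = {
    4: {(TABLE_ID, 0x00165557): "block-A", (TABLE_ID, 0x10589829): "block-A"},
}

# Step 3 structure: block address -> rows (row_hash, uniqueness, last, first).
data_blocks = {
    "block-A": [
        (0x00165557, 1, "Smith", "John"),
        (0x00165557, 2, "Smith", "Fred"),
        (0x00165557, 3, "Jones", "Dan"),
        (0x10589829, 1, "Adams", "Sam"),
    ],
}


def locate_rows(table_id, row_hash, pi_value):
    """Master Index -> Cylinder Index -> Data Block -> matching data rows."""
    cylinder = master_index[(table_id, row_hash)]
    block_address = cylinder_indexes[cylinder][(table_id, row_hash)]
    block = data_blocks[block_address]
    # Inside the block, the row hash and the PI value pick out the rows.
    return [row for row in block if row[0] == row_hash and row[2] == pi_value]


print(locate_rows(TABLE_ID, 0x00165557, "Jones"))
```

The PI value is only needed in the last step, where hash synonyms such as 'Smith' and 'Jones' share a row hash inside the same data block.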

Review Questions
Fill in the Blanks
1. The output of the hashing algorithm is called the _____ _____.
2. To determine the target AMP, the Message Passing Layer must look up an entry in the Hash Map based on the ________ number.
3. Two different PI values which hash to the same value are called Hash ___________.
4. A Row ID consists of a row hash plus a ____________ value.
5. A uniqueness value is required to produce a unique Row ID because of _______ _________ and ______ ___________.
6. Once the target AMP has been determined for a PI search, the _______ ________ for that AMP must be consulted.
7. The Cylinder Index points us to the address and length of the data _______.

Module 8: Review Question Answers


Fill in the Blanks
1. The output of the hashing algorithm is called the Row Hash.
2. To determine the target AMP, the Message Passing Layer must look up an entry in the Hash Map based on the DSW or bucket number.
3. Two different PI values which hash to the same value are called Hash Synonyms.
4. A Row ID consists of a row hash plus a uniqueness value.
5. A uniqueness value is required to produce a unique Row ID because of hash synonyms and NUPI duplicates.
6. Once the target AMP has been determined for a PI search, the Master Index for that AMP must be consulted.
7. The Cylinder Index points us to the address and length of the data block.