B108 PI Mechanics

Module 8: Primary Index Mechanics
After completing this module, you will be able to:

Explain the role of the hashing algorithm and the hash map in locating a row.
Explain the makeup of the Row ID and its role in row storage.
Describe the sequence of events for locating a row given its PI value.
Hashing Primary Index Values

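[Figure: SQL with primary index values and data enters the PARSER. The PI value (for example, 38) is passed to the Hashing Algorithm, which produces the Row Hash 1177 7C3C. The first half of the Row Hash (1177) is the DSW. The PI values and data are then sent on together with the DSW.]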
Message Passing Layer (Hash Maps)

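[Figure: The DSW is looked up in the Hash Maps to obtain an AMP # (AMP 0, AMP 1, ..., AMP x, ..., AMP n - 1, AMP n). On that AMP, the Data Table stores each row as a Row ID (Row Hash plus Uniq Value) followed by the Row Data. Row Hash values range from x'00000000' to x'FFFFFFFF'.]

Row Hash (RH)    Uniq Value    Data
x'1177 7C3C'     0000 0001     38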
Summary
The MPL takes the DSW of 1177 and uses this value to locate bucket #1177 in the Hash Map.
Bucket #1177 contains the number of the AMP that owns this hash value, which is effectively the AMP that holds this row.
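The split of the row hash into a DSW and a remainder can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes, as the example values suggest, that the DSW is the first 16 bits of the 32-bit row hash.

```python
def split_row_hash(row_hash):
    """Split a 32-bit row hash into (DSW, remaining 16 bits)."""
    if not 0 <= row_hash <= 0xFFFFFFFF:
        raise ValueError("row hash must fit in 32 bits")
    dsw = row_hash >> 16            # high-order 16 bits = hash bucket number
    remaining = row_hash & 0xFFFF   # low-order 16 bits
    return dsw, remaining


# Row hash from the example above (PI value 38).
dsw, remaining = split_row_hash(0x11777C3C)
print(f"DSW = {dsw:04X}, remaining = {remaining:04X}")  # DSW = 1177, remaining = 7C3C
```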
Hashing Down to the AMPs

Index value(s) -> hashing algorithm -> Row Hash -> DSW or Hash Bucket # -> Hash Map -> AMP #
The hashing algorithm is designed to ensure even distribution of unique values across all AMPs.
Different hashing algorithms are used for different international character sets.
A Row Hash is the 32-bit result of applying a hashing algorithm to an index value.
The DSW, or Hash Bucket number, is the high-order 16 bits of the Row Hash.
A Hash Map is uniquely configured for each system.
It is an array of 65,536 entries (buckets) that associates each bucket number with a specific AMP.
Two systems with the same number of AMPs will have the same Hash Map.
Changing the number of AMPs in a system requires a change to the Hash Map.
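A toy model may make the points above more concrete. The sketch below builds a 65,536-entry hash map for a given number of AMPs. The round-robin assignment of buckets to AMPs is purely an assumption for illustration (the actual assignment is not described here), and the function names are hypothetical.

```python
NUM_BUCKETS = 65_536  # 2**16: one entry for every possible DSW value


def build_hash_map(num_amps):
    """Toy hash map: bucket i is assigned to AMP (i mod num_amps).

    Round-robin assignment is an assumption made for this sketch only.
    """
    if num_amps < 1:
        raise ValueError("a system needs at least one AMP")
    return [bucket % num_amps for bucket in range(NUM_BUCKETS)]


def amp_for_row_hash(hash_map, row_hash):
    """Use the high-order 16 bits of the row hash as the bucket number."""
    return hash_map[row_hash >> 16]


map_16 = build_hash_map(16)
map_20 = build_hash_map(20)

# Same number of AMPs gives the same map; a different number changes the map.
print(build_hash_map(16) == map_16)
print(map_16 == map_20)
```

In this toy map every AMP owns either the same number of buckets or one more than its neighbours, which is the property that lets evenly distributed row hashes turn into evenly distributed rows.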
A Hashing Example
Order table

Order Number    Customer Number    Order Date    Order Status
(PK, UPI)
7325            2                  4/13          O
7324            3                  4/13          O
7415            3                  4/13          O
7415            1                  4/13          C
7103            1                  4/10          O
7225            2                  4/15          C
7384            1                  4/12          C
7402            3                  4/12          C
7188            1                  4/13          C
7202            2                  4/09          C

SELECT * FROM order
WHERE order_number = 7202;

7202 -> Hashing Algorithm -> 691B 14AE (32-bit Row Hash)

Destination Selection Word    691B    0110 1001 0001 1011
Remaining 16 bits             14AE    0001 0100 1010 1110
The Hash Map

7202 -> Hashing Algorithm -> 691B 14AE (32-bit Row Hash)

Destination Selection Word    691B    0110 1001 0001 1011
Remaining 16 bits             14AE    0001 0100 1010 1110

HASH MAP (bucket numbers in hexadecimal)

       0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
690   07  06  07  06  07  04  05  06  05  05  14  09  14  13  03  04
691   15  08  02  04  01  00  14  14  03  02  03  09  01  00  02  15
692   01  00  15  11  14  14  13  13  14  14  08  09  15  10  09  09
693   07  06  15  13  11  06  15  08  15  15  08  08  11  07  05  10
694   04  12  11  13  05  10  07  07  03  02  11  04  01  00  11  13
695   11  11  12  10  03  02  06  13  01  00  06  05  07  06  05  12

Bucket 691B (row 691, column B) -> AMP 9
Row found on AMP 9: 7202  2  4/09  C
Note: This partial Hash Map is based on a 16-AMP system; AMP numbers are shown in decimal format.
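The lookup for order 7202 can be replayed as a small Python sketch. The grid is a transcription of the partial Hash Map on this slide, with rows 690 to 695 and columns 0 to F; the dictionary layout and the function name are illustrative choices, not part of the course material.

```python
# Partial hash map: the key is the first three hex digits of the bucket number,
# and the list position (0-15) is the last hex digit. AMP numbers are decimal.
PARTIAL_HASH_MAP = {
    0x690: [7, 6, 7, 6, 7, 4, 5, 6, 5, 5, 14, 9, 14, 13, 3, 4],
    0x691: [15, 8, 2, 4, 1, 0, 14, 14, 3, 2, 3, 9, 1, 0, 2, 15],
    0x692: [1, 0, 15, 11, 14, 14, 13, 13, 14, 14, 8, 9, 15, 10, 9, 9],
    0x693: [7, 6, 15, 13, 11, 6, 15, 8, 15, 15, 8, 8, 11, 7, 5, 10],
    0x694: [4, 12, 11, 13, 5, 10, 7, 7, 3, 2, 11, 4, 1, 0, 11, 13],
    0x695: [11, 11, 12, 10, 3, 2, 6, 13, 1, 0, 6, 5, 7, 6, 5, 12],
}


def amp_for_row_hash(row_hash):
    """Return the AMP number for a row hash whose bucket is in the partial map."""
    bucket = row_hash >> 16                  # DSW: high-order 16 bits
    row, column = bucket >> 4, bucket & 0xF  # e.g. 0x691B -> row 0x691, column 0xB
    if row not in PARTIAL_HASH_MAP:
        raise KeyError(f"bucket {bucket:04X} is outside the partial hash map")
    return PARTIAL_HASH_MAP[row][column]


print(amp_for_row_hash(0x691B14AE))  # 9
```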
Identifying Rows
A row hash is not adequate to uniquely identify a row.
Consideration #1
A Row Hash = 32 bits = about 4.3 billion possible values (2^32 = 4,294,967,296).
Because there is an infinite number of possible data values, some data values will have to share the same row hash.
Consideration #2
A Primary Index may be non-unique (NUPI).
Different rows will have the same PI value and thus the same row hash.

Data values input    Hash Algorithm    Rows have same hash
1254                 10A2 2936         Hash Synonyms
7769                 10A2 2936
(John) 'Smith'       0016 5557         NUPI Duplicates
(Dave) 'Smith'       0016 5557

Conclusion
A row hash is not adequate to uniquely identify a row.
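Consideration #1 is a counting argument, and it can be tried out on a smaller scale. The sketch below uses a stand-in 16-bit hash (CRC-32 from Python's zlib module, truncated); it is not the hashing algorithm described in this module, and the function names are hypothetical. With only 65,536 possible hash values, any set of more than 65,536 distinct inputs must contain at least one pair of synonyms.

```python
import zlib


def toy_hash16(value):
    """Stand-in 16-bit hash (truncated CRC-32); NOT the real hashing algorithm."""
    return zlib.crc32(str(value).encode("utf-8")) & 0xFFFF


def first_synonym_pair(values):
    """Return (earlier_value, later_value, shared_hash) for the first collision."""
    seen = {}  # hash -> first value that produced it
    for value in values:
        h = toy_hash16(value)
        if h in seen:
            return seen[h], value, h
        seen[h] = value
    return None


# 70,000 distinct inputs but only 65,536 possible hashes: a collision must exist.
pair = first_synonym_pair(range(70_000))
print(pair is not None)
```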
The Row ID
To uniquely identify a row, we add a 32-bit uniqueness value.
The combined row hash and uniqueness value is called a Row ID.

Row ID = Row Hash (32 bits) + Uniqueness Id (32 bits)

Each stored row has a Row ID as a prefix.
Rows are logically maintained in Row ID sequence.

Row ID                     Row Data
Row Hash     Unique ID     Emp_No    Last_Name    First_Name
3B11 5032    0000 0001     1018      Reynolds     Jane
3B11 5032    0000 0002     1020      Davidson     Evan
3B11 5032    0000 0003     1031      Green        Jason
3B11 5033    0000 0001     1014      Jacobs       Paul
3B11 5034    0000 0001     1012      Chevas       Jose
3B11 5034    0000 0002     1021      Carnet       Jean
:            :             :         :            :
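One way to picture the Row ID is as a single 64-bit number with the row hash in the upper half and the uniqueness value in the lower half, so that sorting by Row ID sorts by row hash first and uniqueness second. Packing the two parts into one integer is an assumption made for this sketch, and the helper names are hypothetical.

```python
def make_row_id(row_hash, uniqueness):
    """Combine a 32-bit row hash and a 32-bit uniqueness value into one integer."""
    for part in (row_hash, uniqueness):
        if not 0 <= part <= 0xFFFFFFFF:
            raise ValueError("each part must fit in 32 bits")
    return (row_hash << 32) | uniqueness


def format_row_id(row_id):
    """Render a Row ID in the 'hhhh hhhh  uuuu uuuu' style used on the slide."""
    row_hash, uniq = row_id >> 32, row_id & 0xFFFFFFFF
    return (f"{row_hash >> 16:04X} {row_hash & 0xFFFF:04X}  "
            f"{uniq >> 16:04X} {uniq & 0xFFFF:04X}")


# A few rows from the example, deliberately out of order.
rows = [
    (make_row_id(0x3B115034, 2), 1021, "Carnet"),
    (make_row_id(0x3B115032, 1), 1018, "Reynolds"),
    (make_row_id(0x3B115033, 1), 1014, "Jacobs"),
    (make_row_id(0x3B115032, 3), 1031, "Green"),
]

# Sorting on the packed Row ID restores Row ID sequence.
for row_id, emp_no, last_name in sorted(rows):
    print(format_row_id(row_id), emp_no, last_name)
```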
Storing Rows (1 of 2)
Assumptions:
Last_Name is defined as a NUPI.
All rows in this example hash to the same AMP.

Add a row for 'John Smith'
'Smith' -> Hash Algorithm -> 0016 5557 -> Hash Map -> AMP #3

Row ID                     Row Data
Row Hash     Unique ID     Last_Name    First_Name    Etc.
0016 5557    0000 0001     Smith        John

Add a row for 'Sam Adams'
'Adams' -> Hash Algorithm -> 1058 9829 -> Hash Map -> AMP #3

Row ID                     Row Data
Row Hash     Unique ID     Last_Name    First_Name    Etc.
0016 5557    0000 0001     Smith        John
1058 9829    0000 0001     Adams        Sam
Storing Rows (2 of 2)
Add a row for 'Fred Smith' - (NUPI Duplicate)
'Smith' -> Hash Algorithm -> 0016 5557 -> Hash Map -> AMP #3

Row ID                     Row Data
Row Hash     Unique ID     Last_Name    First_Name    Etc.
0016 5557    0000 0001     Smith        John
0016 5557    0000 0002     Smith        Fred
1058 9829    0000 0001     Adams        Sam

Add a row for 'Dan Jones' - (Hash Synonym)
'Jones' -> Hash Algorithm -> 0016 5557 -> Hash Map -> AMP #3

Row ID                     Row Data
Row Hash     Unique ID     Last_Name    First_Name    Etc.
0016 5557    0000 0001     Smith        John
0016 5557    0000 0002     Smith        Fred
0016 5557    0000 0003     Jones        Dan
1058 9829    0000 0001     Adams        Sam

Given the row hash, what other information would be needed to find the 'Dan Jones' row?
The 'Fred Smith' row?
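The four inserts in this example can be replayed with a toy model of the table on AMP #3. The class and method names are hypothetical, the row hashes are taken from the slides rather than computed, and the rule "next uniqueness value = previous value for that row hash plus one" is a simplifying assumption that ignores deletes.

```python
from collections import defaultdict


class AmpTable:
    """Toy model of one table's rows on a single AMP (illustrative only)."""

    def __init__(self):
        self._last_uniq = defaultdict(int)  # row hash -> last uniqueness value issued
        self.rows = []                      # (row_hash, uniq, last_name, first_name)

    def insert(self, row_hash, last_name, first_name):
        """Assign the next uniqueness value for this row hash and store the row."""
        self._last_uniq[row_hash] += 1
        uniq = self._last_uniq[row_hash]
        self.rows.append((row_hash, uniq, last_name, first_name))
        self.rows.sort(key=lambda row: row[:2])  # keep rows in Row ID sequence
        return row_hash, uniq

    def find(self, row_hash, pi_value):
        """Return the rows that match both the row hash and the PI value."""
        return [row for row in self.rows
                if row[0] == row_hash and row[2] == pi_value]


table = AmpTable()
table.insert(0x00165557, "Smith", "John")
table.insert(0x10589829, "Adams", "Sam")
table.insert(0x00165557, "Smith", "Fred")  # NUPI duplicate
table.insert(0x00165557, "Jones", "Dan")   # hash synonym

for row_hash, uniq, last_name, first_name in table.rows:
    print(f"{row_hash:08X} {uniq:08X} {last_name} {first_name}")
```

Calling `table.find(0x00165557, "Jones")` returns one row, while `table.find(0x00165557, "Smith")` returns two, which is one way to explore the question above.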
Locating a Row On An AMP Using a PI

Locating a row on an AMP requires three input elements:
1. The Table ID
2. The Row Hash of the PI
3. The PI value itself

On the target AMP (AMP #3 in this example):

START WITH:                        APPLY TO:         FIND:
Table Id, Row Hash                 Master Index      Cylinder #
Table Id, Row Hash, Cylinder #     Cylinder Index    Data Block Address
Row Hash, PI Value                 Data Block        Data Row

[Figure: The Master Index points to one of several Cylinder Indexes (Cyl 1 Index through Cyl 7 Index). The selected Cylinder Index points to the DATA BLOCK, which holds the Data Row (Row ID and Row Data).]
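The three-step search can be sketched in Python. Everything about the layout here is an assumption made for illustration: each index is modelled as a sorted list of (starting key, target) pairs searched with `bisect`, the table ID 100 and the block names are made up, and the sample rows reuse the Last_Name example from the previous slides.

```python
from bisect import bisect_right

TABLE_ID = 100  # hypothetical table ID


def lookup(index, table_id, row_hash):
    """Return the target of the last entry whose start key <= (table_id, row_hash)."""
    keys = [start for start, _ in index]
    pos = bisect_right(keys, (table_id, row_hash)) - 1
    if pos < 0 or keys[pos][0] != table_id:
        return None  # no entry covers this table
    return index[pos][1]


# Master Index: (table_id, first row hash) -> cylinder number
master_index = [((TABLE_ID, 0x00000000), 1), ((TABLE_ID, 0x80000000), 2)]

# Cylinder Indexes: cylinder number -> [(table_id, first row hash) -> block address]
cylinder_indexes = {
    1: [((TABLE_ID, 0x00000000), "blk-A"), ((TABLE_ID, 0x10000000), "blk-B")],
    2: [((TABLE_ID, 0x80000000), "blk-C")],
}

# Data blocks: block address -> rows as (row_hash, uniqueness, PI value, other data)
data_blocks = {
    "blk-A": [(0x00165557, 1, "Smith", "John"),
              (0x00165557, 2, "Smith", "Fred"),
              (0x00165557, 3, "Jones", "Dan")],
    "blk-B": [(0x10589829, 1, "Adams", "Sam")],
    "blk-C": [],
}


def locate_rows(table_id, row_hash, pi_value):
    """Master Index -> Cylinder Index -> Data Block -> matching data rows."""
    cylinder = lookup(master_index, table_id, row_hash)
    if cylinder is None:
        return []
    block_address = lookup(cylinder_indexes[cylinder], table_id, row_hash)
    if block_address is None:
        return []
    # Inside the block, the row hash narrows the search and the PI value
    # separates hash synonyms from the rows actually requested.
    return [row for row in data_blocks[block_address]
            if row[0] == row_hash and row[2] == pi_value]


print(locate_rows(TABLE_ID, 0x00165557, "Jones"))
```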
Review Questions
Fill in the Blanks
1. The output of the hashing algorithm is called the _____ _____.
2. To determine the target AMP, the Message Passing Layer must look up an entry in the Hash Map based on the ________ number.
3. Two different PI values which hash to the same value are called Hash ___________.
4. A Row ID consists of a row hash plus a ____________ value.
5. A uniqueness value is required to produce a unique Row ID because of _______ _________ and ______ ___________.
6. Once the target AMP has been determined for a PI search, the _______ ________ for that AMP must be consulted.
7. The Cylinder Index points us to the address and length of the data _______.
Module 8: Review Question Answers

Fill in the Blanks
1. The output of the hashing algorithm is called the Row Hash.
2. To determine the target AMP, the Message Passing Layer must look up an entry in the Hash Map based on the DSW or bucket number.
3. Two different PI values which hash to the same value are called Hash Synonyms.
4. A Row ID consists of a row hash plus a uniqueness value.
5. A uniqueness value is required to produce a unique Row ID because of hash synonyms and NUPI duplicates.
6. Once the target AMP has been determined for a PI search, the Master Index for that AMP must be consulted.
7. The Cylinder Index points us to the address and length of the data block.
