Mastering Teradata

Tera-Tom on Teradata Basics for V13
Understanding is the key!
First Edition
Published by Coffing Publishing
First Edition May, 2010
Web Page: www.Tera-Tom.com and www.CoffingDW.com
E-Mail address: Tom.Coffing@CoffingDW.Com
Written by W. Coffing
Teradata, NCR, BYNET, V2R3, V2R4, V2R5, V2R6 are registered trademarks of NCR Corporation, Dayton, Ohio, U.S.A. IBM and DB2 are registered trademarks of IBM Corporation. ANSI is a registered trademark of the American National Standards Institute. In addition to these product names, all brands and product names in this document are registered names or trademarks of their respective holders.

Coffing Data Warehousing shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from the information contained in this book or from the use of programs or program segments that are included. The manual is not a publication of NCR Corporation, nor was it produced in conjunction with NCR Corporation.

Copyright May 2010 by Coffing Publishing. All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any means, electronic, mechanical, photocopying, recording, or otherwise, without written permission from the publisher. No patent liability is assumed with respect to the use of information contained herein. Although every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, neither is any liability assumed for damages resulting from the use of information contained herein.

Coffing Publishing
International Standard Book Number: ISBN 0-9704980-8-X
Printed in the United States of America

All terms mentioned in this book that are known to be trademarks or service marks have been stated. Coffing Publishing cannot attest to the accuracy of this information. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark.
About Coffing Data Warehousing's CEO Tom Coffing

Tom is President, CEO, and Founder of Coffing Data Warehousing. He is an internationally known consultant, facilitator, speaker, trainer, and executive coach with an extensive background in data warehousing. Tom has helped implement data warehousing in over 40 major data warehouse accounts, spoken in over 20 countries, and has provided consulting and Teradata training to over 20,000 individuals involved in data warehousing globally. Tom has co-authored over 30 books on Teradata and Data Warehousing. To name a few:

Secrets of the Best Data Warehouses in the World
Teradata SQL - Unleash the Power
Tera-Tom on Teradata Basics
Tera-Tom on Teradata E-business
Teradata SQL Quick Reference Guide - Simplicity by Design
Teradata Database Design - Giving Detailed Data Flight
Teradata Users Guide - The Ultimate Companion
Teradata Utilities - Breaking the Barriers
Mr. Coffing has also published over 30 data warehousing articles and has been a contributing columnist to DM Review on the subject of data warehousing. He wrote a monthly column for DM Review entitled, "Teradata Territory". He is a nationally known speaker and gives frequent seminars on Data Warehousing. He is also known as "The Speech Doctor" because of his presentation skills and sales seminars.

Tom Coffing has taken his expert speaking and data warehouse knowledge and revolutionized the way technical training and consultant services are delivered. He founded CoffingDW with the same philosophy more than a decade ago. Centered around 20 Teradata Certified Masters this dynamic and growing company teaches every Teradata class, provides world class Teradata consultants, offers a suite of software products to enhance Teradata data warehouses, and has eight books published on Teradata.

Tom has a bachelor's degree in Speech Communications and over 35 years of business and technical computer experience. Tom is considered by many to be the best technical and business speaker in the United States. He has trained and consulted at so many Teradata sites that students affectionately call him Tera-Tom.

Teradata Certified Master
- Teradata Certified Professional
- Teradata Certified Administrator
- Teradata Certified Developer
- Teradata Certified Designer
- Teradata Certified SQL Specialist
- Teradata Certified Implementation Specialist
Table of Contents
Chapter 1 The Teradata Architecture
The Parsing Engine
The AMPs
Born to be Parallel
The BYNET
A Scalable Architecture
Logical Modeling - Primary and Foreign Keys
Physical Modeling - The Primary Index
Two Types of Primary Indexes (UPI or NUPI)
Unique Primary Index (UPI)
Non-Unique Primary Index (NUPI)
Multi-Column Primary Indexes
When do you define the Primary Index?
Defining a Non-Unique Primary Index (NUPI)
Defining a Multi-Column Primary Index
How Teradata Distributes and Retrieves Rows
Hashing the Primary Index Value
The Hash Map
An 8-AMP Hash Map Example
Laying a Row onto the Proper AMP
Retrieving a Row by way of the Primary Index
Hashing Non-Unique Primary Indexes (NUPI)
Placing Non-Unique Primary Indexes (NUPI) Rows
Placing (NUPI) Rows Continued
Retrieving (NUPI) Rows
Placing Multi-Column Primary Index Rows
Retrieving Multi-Column Primary Index Rows
Even Distribution with an UPI
Uneven Distribution with a NUPI
Unacceptable Distribution with a NUPI
Review - Parsing Engine's Plan with an UPI
Review - Parsing Engine's Plan with a NUPI
Review - Big Trouble - The Full Table Scan
Big Trouble - A Picture of a Full Table Scan
Test your Teradata Primary Index Knowledge
The Row Hash
The Uniqueness Value
The Row ID
Duplicates and the Uniqueness Value
AMPs Sort Their Rows by the Row ID
Search the Data like a Phone Book
Why is my Phone Book 00000s and 111111s?
Performing a Binary Search
Opening the Phone Book to the Middle
I can Name that Tune in 5 Notes
A Visual for Data Layout
Copyright OSS 2010
Test Your Teradata Access Query Knowledge
UPI Row-ID Test
NUPI Row-ID Test
Secondary Indexes
The Base Table
Creating a Unique Secondary Index (USI)
The Secondary Index Subtable
Inside the Secondary Index Subtable
How Teradata builds the Secondary Index Subtable
How Teradata builds the Secondary Index Subtable
Building the Secondary Index Subtable
USI - Always a Two-AMP Operation
The Parsing Engine's Plan with an USI Query
Retrieving Base Rows using the USI
Picture that USI in Action
USI Summary
USI Pictorial using the Hash Maps
USI Secondary Index Quiz
USI Secondary and Primary Index Quiz Answers
A Full Table Scan Example
The Base Table
Creating a Non-Unique Secondary Index (NUSI)
Columns inside a NUSI Secondary Index Subtable
NUSI Subtable is AMP-Local
A Query using the NUSI Column
A Query using the NUSI Column
NUSI Recap
Secondary Index Summary
Test Your Teradata Access Query Knowledge
An Incredible Quiz Opportunity
An Incredible Quiz Opportunity
A Table used for our Partitioning Example
Range Queries
Why we had to perform a Full Table Scan
A Partitioned Table
A Partitioned Table
One Year of Orders Partitioned
Fundamentals of Partitioning
Add the Partition to the Row-ID for the Row Key
You Partition a Table when you CREATE the Table
RANGE_N Partitioning by Week
RANGE_N Partitioning Older and Newer Data
Case_N Partitioning
Multi-Level Partitioning
Partitioning Rules
See the data
Test Your Teradata Access Knowledge
The most Powerful USER
DBC owns all the Disk Space
DBC Example of 1000 GBs
DBC will first CREATE a USER or a DATABASE
Teradata is Hierarchical
Only two Objects can Receive PERM Space
Only difference between a User and a Database
A Typical approach to Security
Example of a DATABASE and USER Interchanged
PERM and SPOOL Space
Each AMP will have PERM and SPOOL
A Query using both PERM and SPOOL Space
Spool is Deleted when the Query is Done
Getting a better understanding of Spool
Answering the MRKT Spool Query Answer
Spool is like a Speed Limit
All Space is calculated on a Per AMP Basis
Examples of Perm and Spool on a Per AMP Basis
Quiz on Perm and Spool Space
Answers to Quiz on Perm and Spool Space
Collecting Statistics
Parsing Engine uses Statistics for the Plan
Columns and Indexes to Collect Statistics On
Syntax to Collect Statistics
Recollecting Statistics
Random Sample instead of Collected Statistics
V12 Statistics Enhancement - Stale Statistics
Where Statistics are Stored in DBC
A Collect Statistics Example
What Statistics are Really Collected
Loner Values and High Bias Intervals
Teradata Limits
Data Protection
Transaction Concept
Two Modes to Teradata
Differences between ANSI and Teradata Mode
ANSI Mode Commit
Teradata Mode Commit also called BTET
Trick to CREATE a Multi-Statement with BTEQ
Transient Journal
How the Transient Journal Works
The Transient Journal after a Commit
VProcs
Nodes and MPP
RAID 1 - Mirroring
Cliques
VProcs Migrate when a Node Fails
Cliques - An 8-Node Example
Cliques - An 8-Node Example with Migration
Hot Standby Nodes
Hot Standby Nodes in Action
FALLBACK Protection
How Fallback Works
Fallback Clusters Exercise
Fallback Clusters
Fallback Exercises with Clusters
Fallback Exercises with Clusters Answer
More Fallback Exercises
More Fallback Exercises with Answers
Fallback Performance Vs Protection Questions
The Six Rules of Fallback
Cliques and Clusters
Cliques and Clusters Answers
Down AMP Recovery Journal (DARJ)
Permanent Journal
Table create with Fallback and Permanent Journal
Permanent Journal Rules
Some Permanent Journal Possibilities
Creating a Permanent Journal
Create Table Examples with Permanent Journals
Each Permanent Journal is made up of 3 Areas
Permanent Journal Rules
The Four Locks of Teradata
Teradata has 3 levels of Locking
Quiz - Which Level of Locking is Occurring?
Quiz Locking Answers
The Teradata Lock Manager
Locking Modifiers - The Access Lock
Locks and their compatibility
Moving Through the Locking Queue
Quiz - Which Locks Move Up?
Answers to Locking Quiz
A Single AMP Acts as the Locking Gatekeeper
Every AMP performs Locking Gatekeeper Duties
Answers to Which AMP is Waiting on Access
Explains - The Pseudo Table for Locks
The NOWAIT Locking Option
Rules of Teradata Locking
Explains - Pseudo Tables
Explain - Full Table Scan
Explain - Primary Index Reads
Explain - Secondary Index Read
Explain - View DDL of a Partitioned Table
Explain - Partition Elimination
Explain - Joins with Duplication on all AMPs
Explain - Joins with Redistribution
Explain - Bit Mapping with multiple NUSIs
Fundamentals of Teradata Joins
A Join Example
Joins and the Primary Index
Redistributing Rows in Spool
Redistributing Rows of Both Tables
Duplicating the Smaller Table
Quiz - How Many Rows are in Spool?
Quiz Answer - How Many Rows in Spool?
How Duplication Appears on Every AMP
How Many Rows in Spool with Redistribution?
Answer to How Many Rows in Spool
An Example of an AMP with Redistribution
The System Calendar
Columns in the System Calendar Views
How to use the System Calendar with Tables
Teradata Temporary Tables
Derived Tables
A Query Pictorial Example with a Derived Table
Volatile Tables
How to Populate a Volatile Table
Global Temporary Tables
A Pictorial of a Global Temporary Table
What Happens to Global Tables after the Session
Global Temporary Tables and Temp Space
V13 - No Primary Index Tables
NoPI CREATE Statement
NoPI Row-ID Increments the Uniqueness Value
NoPI Row-Hash Different on each AMP
NoPI Options and Facts
NoPI Restrictions
Write Ahead Logging (WAL)
AMPs have FSG Cache for the Memories
An Example of an UPDATE Statements
AMP Local WALs
AMPs UPDATE Rows in FSG Cache
Write to WAL then Write Back to Disk
The WAL Depot
Clearing out the Wal Depot and the Wal Log
V13 - Teradata Virtual Storage (TVS)
AMPs in the 1980s
AMPs in the 1990s
Data Blocks and Cylinders make up a Disk
Cylinders are dedicated to Perm, Spool, etc.
Outside Disk Tracks are much Faster
AMPs assigned Disk Cylinders, not Entire Disks
Hot, Warm, and Cold Data
The old way Teradata had to add Disk Space
Doubling the Disk Capacity
Incremental Disk Growth Is Here
Mixed Disks and Solid State Drives
Solid State Drives are like Giant Flash Drives
Virtual Storage Metrics
The Two Modes of Virtual Storage
What is a Row Hash Lock?
Chapter 6 Loading the Data
FastLoad
Multiload
TPump
FastExport
Mastering the Teradata Architecture
Chapter 1 The Teradata Architecture
Let me once again explain the rules. Teradata rules!

Tera-Tom Coffing
Teradata relies on three architectural components that have set the rules for parallel processing: the Parsing Engine, also called the PE or the Optimizer; the Access Module Processors, referred to as the AMPs; and two BYNETs that carry communication between the PEs and the AMPs. The PE is the boss and tells the AMPs exactly what to do. Each AMP has its own virtual disk, which no other AMP can read, and it merely reads and writes to that disk.
When a user logs on to Teradata, the logon is accepted or rejected by a Parsing Engine. That Parsing Engine takes care of the user for the entire session, which really means until the user logs off. The Parsing Engine accepts each query from that user and comes up with a plan for the AMPs to satisfy the request. The PE's plan is passed to the AMPs via the BYNET. The AMPs retrieve the requested data from their virtual disks and pass it back up the BYNET to the PE. The PE then delivers the data to the user.
The Parsing Engine
Fall seven times, stand up eight.

--Japanese Proverb
The Parsing Engines are perfectly balanced, with each able to handle up to 120 sessions at a time. This could be 120 distinct users or a single user utilizing the power of all 120 sessions for a single application. That is why there are multiple PEs in every Teradata system. Each PE has total command over every AMP. Divided they stand (PEs) and united are the AMPs!
Each PE will take a user's SQL and do three things:
1. The PE checks the user's SQL syntax. If there is a syntax error, the user receives an error. For example, if the user wanted to use the keyword SELECT and instead wrote SLLLECCCT, the PE would reject the SQL, but be kind enough to send the user a message to help correct the error. That's because the PEs are stand-up guys!
2. If the SQL passes the syntax check, the PE checks the user's ACCESS RIGHTS to ensure the user has permission to access the data in that table. If not, the user receives the message ACCESS Denied!
3. If the user passes the security check, the Parsing Engine comes up with a PLAN to satisfy the user's request.
The fastest plan is a Single-AMP retrieve. The second fastest plan is a Two-AMP retrieve. The next fastest plan has all AMPs reading only a portion of the table, and the slowest plan is the full table scan, where each AMP reads every row it holds for a table.
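The three things a PE does with each request can be pictured with a small Python sketch. This is only an illustration: the function name parse_request, the keyword list, and the way access rights and candidate plans are passed in are all assumptions made for this example, not anything Teradata exposes. The syntax check is reduced to looking at the first word of the SQL.

```python
# Illustrative sketch only: these names are hypothetical, not Teradata APIs.
KNOWN_KEYWORDS = {"SELECT", "INSERT", "UPDATE", "DELETE"}

# Plan types from fastest to slowest, in the order given in the text.
PLAN_SPEED_ORDER = [
    "single-AMP retrieve",
    "two-AMP retrieve",
    "all-AMP partial scan",
    "full table scan",
]


def parse_request(sql, user, table, access_rights, candidate_plans):
    """Run the three Parsing Engine checks in order and return a plan."""
    # Step 1: syntax check (reduced here to validating the leading keyword).
    words = sql.split()
    first_word = words[0].upper() if words else ""
    if first_word not in KNOWN_KEYWORDS:
        raise ValueError(f"Syntax error: unrecognized keyword {first_word!r}")

    # Step 2: access-rights check for this user on this table.
    if table not in access_rights.get(user, set()):
        raise PermissionError("ACCESS Denied!")

    # Step 3: pick the fastest plan among those able to answer the query.
    return min(candidate_plans, key=PLAN_SPEED_ORDER.index)


rights = {"tom": {"Employee_Table"}}
plan = parse_request(
    "SELECT * FROM Employee_Table WHERE Emp_No = 2",
    "tom",
    "Employee_Table",
    rights,
    ["full table scan", "single-AMP retrieve"],
)
```

Here plan comes back as the single-AMP retrieve, because it ranks ahead of the full table scan. A misspelled keyword such as SLLLECCCT stops at step 1, and a user without rights on the table stops at step 2.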
The AMPs
Not all who wander are lost.

J. R. R. Tolkien
The AMPs are never lost because the PE always tells them what to do. One PE to rule them all? No! Each PE rules them all because the rows of every table are spread across all the AMPs. The AMPs organize every table in separate blocks just like you might organize your clothes in separate dresser drawers. Organizing their tables and the rows they contain is an obsession with the AMPs. They make organization a hobbit!
The PE passes the PLAN to the AMPs over the BYNET. The AMPs then retrieve the rows they own from their disks and pass them back to the PE over the BYNET.
When a table is first created, each AMP creates a table header on its disk. Even though the table is empty, the AMPs at least know the table name, the columns in the table, and any indexes on the table. When the table is loaded, each AMP receives rows for that table that it and only it owns. It carefully places the rows inside data blocks where they can easily be retrieved. Each AMP now owns its own Table Header for the table as well as the data blocks that hold its rows. Now the AMP is truly Lord of the Disks!
Born to be Parallel
Only he who attempts the ridiculous may achieve the impossible.

Don Quixote
When Teradata was born in the late 1970s it was born to be parallel. That means that multiple processors (AMPs) would split up the work and do it in parallel. At the time this was considered impossible. To make it happen Teradata did something considered ridiculous at the time: it spread the data across all processors and let each processor be responsible for only the data on its disk.
You will never see a Teradata table that sits on only one AMP, because the parallel processing aspect would then be lost. Instead, every Teradata table spreads its rows across all AMPs. Teradata was born to be parallel and the impossible was achieved. The first picture on the opposite page never happens. The second picture below it is exactly the design behind Teradata.
Teradata never lays out data like this!
Teradata lays out data like this!
The BYNET
A Journey of a thousand miles begins with a single step.

-Lao Tzu
The BYNET is the communication network between AMPs and PEs. The PE comes up with a PLAN and passes the plan to the AMPs in steps over the BYNET. A journey to a thousand AMPs begins with a single step. This step and all the steps of the plan travel down the BYNET highway, which guarantees delivery to each AMP. The AMPs then retrieve the data requested by the PE and deliver their portion of the answer set to the PE over the BYNET. The BYNET provides the communications between AMPs and PEs, so no matter how large the data warehouse physically gets, the BYNET makes each AMP and PE think that they are right next to one another.
The BYNET gets its name from the Banyan tree. The Banyan tree has the ability to continually plant new roots to grow forever. Likewise, the BYNET scales as the Teradata system grows in size. The BYNET is scalable.
There are always two BYNETs for redundancy and extra bandwidth. AMPs and PEs can use both BYNETs to send and receive data simultaneously. What a network! It is like having two phone lines to talk on. Each AMP or PE can send a message on one BYNET while simultaneously receiving a message on the other. Both BYNETs can be used to send a message or to receive a message!
Below are the steps to completely satisfy a query:
1. The PE checks the user's SQL syntax.
2. The PE checks the user's security rights.
3. The PE comes up with a plan for the AMPs to follow.
4. The PE passes the plan along to the AMPs over the BYNET.
5. The AMPs follow the plan and retrieve the data requested.
6. The AMPs pass the data to the PE over the BYNET.
7. The PE then passes the final data to the user.
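The later steps of that list, where the plan goes out to the AMPs and each AMP sends back its portion of the answer set, can be sketched in Python. This is a toy model under stated assumptions: plain lists stand in for AMP disks, ordinary function calls stand in for BYNET messages, the sample rows are made up, and the loop visits the AMPs one after another where a real system would run them in parallel.

```python
# Illustrative sketch only: lists stand in for AMP disks, and function
# calls stand in for messages sent over the BYNET.

def amp_execute(amp_rows, predicate):
    """One AMP scans only the rows it owns and returns its matches."""
    return [row for row in amp_rows if predicate(row)]


def pe_run_query(amps, predicate):
    """The PE sends the same step to every AMP and merges what comes back."""
    answer_set = []
    for amp_rows in amps:  # the step travels to each AMP
        # each AMP's portion of the answer set travels back to the PE
        answer_set.extend(amp_execute(amp_rows, predicate))
    return answer_set


# Four AMPs, each owning a different slice of a hypothetical employee table.
amps = [
    [{"emp_no": 1, "dept_no": 10}],
    [{"emp_no": 2, "dept_no": 20}],
    [{"emp_no": 3, "dept_no": 10}],
    [{"emp_no": 4, "dept_no": 30}],
]
dept_10 = pe_run_query(amps, lambda row: row["dept_no"] == 10)
```

Each AMP only ever looks at its own list, which mirrors the rule that no AMP can read another AMP's disk. The PE is the only place where the pieces come together.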
A Scalable Architecture
No wonder nobody comes here. It's too crowded.

Yogi Berra
When it comes to scalability Teradata has put together a team of PEs and AMPs that are guaranteed to hit a home run, while the competition continues to strike out trying to catch them. In Teradata land it never gets too crowded because Teradata can easily scale by adding additional AMPs and PEs. This is called Linear Scalability. That means if you double your AMPs you will double your speed. A 4-AMP system can double its speed by adding 4 more AMPs to become an 8-AMP system. This can theoretically go on forever. Other vendors' systems can double their size and double their speed for a while, but eventually they max out.
Teradata has many customers who start with a small system configuration and grow each year. Some of the largest data warehouses in the world are Teradata systems that have proven their value each year and continue to grow. In the picture on the following page you can see we have a 4,000-AMP system. This system is literally 1,000 times more powerful than a 4-AMP system.
Logical Modeling - Primary and Foreign Keys
I have found the best way to give advice to your children is to find out what they want and then advise them to do it.
--Harry S. Truman
Harry Truman was an American president with great logical skills. Tables are logically created for all database systems. This is called Logical Modeling. A table that is modeled or normalized will always have a Primary Key. The Primary Key is usually the first column in a table, and the Primary Key column(s) will always have three characteristics:
1. Never be Null
2. Never change
3. Never have duplicate values
If a table with a Primary Key has a relationship with another table, the two are joined through a Primary Key Foreign Key relationship. The join takes the Primary Key of one table and matches it with a normal column in another table that holds the same values. This normal column is called a Foreign Key.
Teradata doesn't care about Primary Keys and Foreign Keys when it lays out the data. It only cares about what is called the Primary Index. We will learn about the Primary Index shortly.
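Two of the three Primary Key characteristics can be checked directly against a column of values. The Python sketch below is a hypothetical helper written for this illustration (the name is_valid_primary_key is an assumption). It tests for nulls and duplicates. The "never change" rule describes behavior over time, so it cannot be verified from a single snapshot of the data and is left out.

```python
def is_valid_primary_key(values):
    """Check the two Primary Key rules that a data snapshot can reveal.

    Rule 1 (never null) and rule 3 (no duplicate values) are tested here.
    Rule 2 (never change) is about behavior over time and is not testable
    from one snapshot, so it is not covered.
    """
    has_null = any(value is None for value in values)
    has_duplicates = len(set(values)) != len(values)
    return not has_null and not has_duplicates


good_key = is_valid_primary_key([1, 2, 3])          # unique and non-null
null_key = is_valid_primary_key([1, None, 3])       # breaks rule 1
repeated_key = is_valid_primary_key([1, 2, 2])      # breaks rule 3
```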
Physical Modeling - The Primary Index
Speak in a moment of anger and you'll deliver the greatest speech you'll ever regret.
Anonymous
Every table in Teradata has one and only one Primary Index. Teradata uses the Primary Index of each table to route every row to the proper AMP. This is why each table in Teradata is required to have a Primary Index. A great Teradata Database Design begins with choosing the correct column to be the Primary Index. The Primary Index column's value is the only thing that will determine on which AMP a row will reside. Because this concept is extremely important, let me state again that the Primary Index value for a row is the only thing that will determine on which AMP a row will reside.
Many people new to Teradata assume that the most important concept concerning the Primary Index is data distribution. INCORRECT! The Primary Index does determine data distribution, but even more importantly, the Primary Index provides the fastest physical path to retrieving data. The Primary Index also plays an incredibly important role in how joins are performed. Remember these three important concepts of the Primary Index (distribution, fastest access path, and joins) and you are well on your way to a great Physical Database Design.
Two Types of Primary Indexes (UPI or NUPI)
A man who chases two rabbits catches none.

Roman Proverb
Every table must have at least one column as the Primary Index. The Primary Index is defined when the table is created. There are only two types of Primary Indexes, which are a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI).
A man who chases two rabbits misses both by a HARE! A person who chases two Primary Indexes misses both by an ERR!
Tera-Tom Proverb
Every table must have one and only one Primary Index. Because Teradata distributes the data based on the Primary Index column's value, it is quite obvious that you must have a Primary Index and that there can be only one Primary Index per table. The Primary Index is the physical mechanism used to retrieve and distribute data. A Primary Index is made up of exactly the columns you name in it. It can consist of as few as one column or as many as 64 columns combined.
Most databases use the Primary Key as the physical mechanism. Teradata uses the Primary Index and NOT the Primary Key. There are two reasons you might pick a different Primary Index than your Primary Key. They are (1) performance reasons and (2) known access paths.
Unique Primary Index (UPI)
Always remember that you are unique just like everyone else.
Anonymous
A Unique Primary Index (UPI) is unique and can't have any duplicates. It is as unique as you are. Nobody is like you and you are extremely beautiful and amazing. Not one other person in the history of mankind has ever been exactly like you. You are the creation of your beautiful parents and must realize how important you are to the world. A Unique Primary Index is not as amazing as you are, but it is also very special.
A Unique Primary Index means that the values for the selected column must be unique. If you try to insert a row with a Primary Index value that is already in the table, the row will be rejected. A UPI enforces UNIQUENESS for a column. A Unique Primary Index will always spread the table rows evenly amongst the AMPs. Please don't assume this is always the best thing to do.
The diagram on the next page shows a table that has a Unique Primary Index. We have selected EMP_NO to be our Primary Index. Because we have designated EMP_NO to be a Unique Primary Index, there can be no duplicate employee numbers in the table. UPI access is always a one-AMP operation. You will better understand what I am talking about concerning a one-AMP operation by the end of this chapter.
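The rejection of a duplicate Primary Index value can be pictured with a toy Python class. Everything here is hypothetical and invented for this illustration (the class name, its methods, and the sample rows). It models only the uniqueness rule, not how Teradata stores or distributes the rows.

```python
class UniquePrimaryIndexTable:
    """Toy table keyed by a unique primary index value (hypothetical class)."""

    def __init__(self):
        self._rows = {}  # primary index value -> row

    def insert(self, pi_value, row):
        # A UPI rejects any row whose primary index value already exists.
        if pi_value in self._rows:
            raise ValueError(f"Duplicate primary index value: {pi_value}")
        self._rows[pi_value] = row

    def row_count(self):
        return len(self._rows)


table = UniquePrimaryIndexTable()
table.insert(1, {"last_name": "Jones"})
table.insert(2, {"last_name": "Lacy"})
try:
    table.insert(2, {"last_name": "Smith"})  # EMP_NO 2 is already present
    rejected = False
except ValueError:
    rejected = True
```

After the third insert is refused, the table still holds only the two original rows.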
Non-Unique Primary Index (NUPI)
You miss 100 percent of the shots you never take.

Wayne Gretzky
Take a shot at using a Non-Unique Primary Index in your Teradata tables. A Non-Unique Primary Index (NUPI) means that the values for the selected column can be non-unique. You can have many rows with the same value in the Primary Index, so don't expect any value such as Smith to be a one-timer. Duplicate values can exist. A Non-Unique Primary Index will almost never spread the table rows evenly. Please don't assume this is always a bad thing.
On the following page is a table that has a Non-Unique Primary Index. We have selected LAST_NAME to be our Primary Index. Because we have designated LAST_NAME to be a Non-Unique Primary Index, we are anticipating that there will be individuals in the table with the same last name. An All-AMP operation will take longer if the data is unevenly distributed. You might pick a NUPI over a UPI because the NUPI column may be more effective for query access and joins.
Multi-Column Primary Indexes
Every sunrise is a second chance.

Unknown
A Primary Index can have multiple columns. Teradata allows more than one column to be designated as the Primary Index. It is still only one Primary Index; it is simply made up by combining multiple columns together. Teradata allows up to 64 combined columns to make up the one Primary Index required for a table. On the following page you can see we have designated First_Name and Last_Name combined to make up the Primary Index. This is often done for two reasons:
(1) To get better data distribution among the AMPs
(2) Users consistently query on the same combination of columns
When do you define the Primary Index?
When you go into court you are putting your fate into the hands of twelve people who weren't smart enough to get out of jury duty.
- Norm Crosby
When you go to query Teradata you are putting your fate in the hands of the DBA who created the table's Primary Index. When the table is created it is given a table name, the columns and their data types are defined, and the Primary Index is specified. As you can see on the following page we have created the table called Employee_Table. It contains five columns, which are Emp_No, Dept_No, First_Name, Last_Name and Salary. The Primary Index is Unique and on the column Emp_No. This really means that Emp_No is the most important column in this table. If users query the table and put Emp_No in the WHERE clause it will always be a 1-AMP query. It is as fast as lightning.
If no Primary Index is defined, the system will define one for you. It first checks whether you have a Primary Key defined for referential integrity purposes. If you do, it will choose that column(s) and make it a Unique Primary Index (UPI). If you didn't define a Primary Index or a Primary Key, the system checks whether you defined a Unique Secondary Index (USI) on any column, and if you have, it will make that column a Unique Primary Index (UPI). Otherwise it will most likely pick the first column and make it a Non-Unique Primary Index (NUPI). This is no way to build a system. The Primary Index should always be explicitly defined when the table is first created.
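The fallback rules for a table created without an explicit Primary Index read naturally as a short decision function. The Python below is a sketch of the rules exactly as this page describes them. The function name and its parameters are hypothetical, and it is not a statement of how the Teradata DDL processor is implemented.

```python
def choose_primary_index(columns, primary_index=None, primary_key=None,
                         unique_secondary_index=None):
    """Return (index_type, index_columns) following the rules in the text.

    All parameter names are hypothetical. primary_index, when given, is
    already an (index_type, index_columns) pair and is used unchanged.
    """
    if primary_index is not None:
        return primary_index                      # explicitly defined
    if primary_key:
        return ("UPI", list(primary_key))         # Primary Key becomes a UPI
    if unique_secondary_index:
        return ("UPI", list(unique_secondary_index))  # USI becomes a UPI
    return ("NUPI", [columns[0]])                 # fall back to first column


columns = ["Emp_No", "Dept_No", "First_Name", "Last_Name", "Salary"]
no_hints = choose_primary_index(columns)
with_key = choose_primary_index(columns, primary_key=["Emp_No"])
with_usi = choose_primary_index(columns, unique_secondary_index=["Dept_No"])
explicit = choose_primary_index(columns, primary_index=("NUPI", ["Last_Name"]))
```

The last call shows the recommended practice: when the Primary Index is stated explicitly, none of the fallback rules come into play.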
Defining a Non-Unique Primary Index (NUPI)
I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant.
-Sign on Pentagon office wall
When the table is created it is given a table name, the columns and their data types are defined, and the Primary Index is specified. As you can see on the following page we have created the table called Employee_Table. It contains five columns, which are Emp_No, Dept_No, First_Name, Last_Name and Salary. The Primary Index is Non-Unique and on the column Last_Name. This really means that Last_Name is the most important column in this table. If users query the table and put Last_Name in the WHERE clause it will always be a 1-AMP query. There could be duplicates, but duplicate values will be on the same AMP. I will explain this further.
Remember, if no Primary Index is defined, the system will define one for you. It first checks whether you have a Primary Key defined for referential integrity purposes. If you do, it will choose that column(s) and make it a Unique Primary Index (UPI). If you didn't define a Primary Index or a Primary Key, the system checks whether you defined a Unique Secondary Index (USI) on any column, and if you have, it will make that column a Unique Primary Index (UPI). Otherwise it will most likely pick the first column and make it a Non-Unique Primary Index (NUPI). This is no way to build a system. The Primary Index should always be explicitly defined when the table is first created.
Defining a Multi-Column Primary Index
Some birds aren't meant to be caged, their feathers are just too bright. And when they fly away, the part of you that knows it was a sin to lock them up, does rejoice.
Shawshank Redemption
When the table is created it is given a table name, the columns and their data types are defined, and the Primary Index is specified. The example on the following page shows a multi-column Primary Index on First_Name and Last_Name combined. As you can also see on the following page we have created the table called Employee_Table. It contains five columns, which are Emp_No, Dept_No, First_Name, Last_Name and Salary. The Primary Index is Non-Unique and on the columns First_Name and Last_Name. This really means that First_Name and Last_Name together are the most important columns in this table. If users query the table and put both the First_Name and the Last_Name in the WHERE clause it will always be a 1-AMP query. There could be duplicates, but duplicate values will be on the same AMP. I will explain this further.
Remember, if no Primary Index is defined, the system will define one for you. It first checks whether you have a Primary Key defined for referential integrity purposes. If you do, it will choose that column(s) and make it a Unique Primary Index (UPI). If you didn't define a Primary Index or a Primary Key, the system checks whether you defined a Unique Secondary Index (USI) on any column, and if you have, it will make that column a Unique Primary Index (UPI). Otherwise it will most likely pick the first column and make it a Non-Unique Primary Index (NUPI). This is no way to build a system. The Primary Index should always be explicitly defined when the table is first created.
How Teradata Distributes and Retrieves Rows
I don't know who my grandfather was. I am more interested in who his grandson will become.
Abraham Lincoln, 16th president of the United States
Teradata freed the AMPs from doing everything together by giving each table a Primary Index. The Primary Index is the column(s) that sends each data row to the proper AMP, and the Primary Index column(s) is also the fastest way to retrieve a row from that same AMP. Follow this part closely because this is fundamentally the most important subject you will learn about Teradata.
Teradata takes a table and spreads the rows across the AMPs one row at a time. A Unique Primary Index on the table will spread the data rows perfectly evenly across the AMPs. This is pretty amazing in itself, but the more amazing part is that Teradata knows exactly which rows went to which AMPs, so retrieval is always a 1-AMP operation when users use the Primary Index in the WHERE clause of their SQL.
Here is how that works. The Teradata Parsing Engine takes the Primary Index value of a row and runs a math calculation called the Hash Formula on it. The Hash Formula doesn't change and can be calculated on any value or data type. The result of the Hash Formula calculation on the Primary Index value is a number ranging from one to one million. Teradata then has a Hash Map with one million buckets. Inside the buckets are AMP numbers. So, when the Hash Formula is calculated on the value of the column designated as the Primary Index, and the result is for example 20, Teradata will go to bucket 20 of the Hash Map, look inside bucket 20 and see which AMP it says should get the row. I will give you some visual examples in the next couple of pages to show you exactly what I am talking about.
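The chain from Primary Index value to Hash Formula to bucket to AMP can be sketched in a few lines of Python. Please treat this as a model of the idea only. Teradata's Hash Formula is its own algorithm, so the sketch substitutes zlib.crc32 purely because it returns the same unsigned 32-bit number for the same input every time. The bucket count follows the "one million" simplification used in this book, and the repeating assignment of AMP numbers to buckets follows the Hash Map pictures.

```python
import zlib

NUM_BUCKETS = 1_000_000  # the text's "one million buckets" (simplified)


def row_hash(pi_value):
    """Stand-in for the Hash Formula.

    zlib.crc32 is NOT Teradata's algorithm. It is used only because it is
    deterministic and produces an unsigned 32-bit result.
    """
    return zlib.crc32(str(pi_value).encode("utf-8"))


def bucket_for(pi_value):
    """Map the row hash to a bucket numbered from 1 to NUM_BUCKETS."""
    return row_hash(pi_value) % NUM_BUCKETS + 1


def amp_for(pi_value, num_amps):
    """Read the AMP number from the bucket, with AMP numbers 1..num_amps
    repeating across the buckets."""
    return (bucket_for(pi_value) - 1) % num_amps + 1


first_lookup = amp_for(2, num_amps=4)
second_lookup = amp_for(2, num_amps=4)   # same value, same AMP, every time
```

The property that matters is that the function has no memory. Placing a row and finding it again are the same calculation, which is why nothing about a row's location ever needs to be written down.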
Hashing the Primary Index Value
Measure a thousand times and cut once.

-Turkish Proverb
Teradata doesn't measure a thousand times and cut once as the Turkish proverb for rug makers states. Teradata measures one time and then places the row on the proper AMP. There is only one Hash Formula and the result is always a 32-bit Row Hash, which points to a bucket numbered from one to one million. On the following page you will see that Teradata hashed the Primary Index value of 2 and the result was 000000000101 (shown here in shortened form), which equates to a 5. What you need to understand right now is that if Teradata hashed this value again it would absolutely come up with the same Row Hash of 000000000101 and it would absolutely equate to a 5. If the Hash Formula was run one thousand times against the value 2 it would return the same 000000000101 Row Hash and this would equate to a 5 every time.
That is how Teradata lays the data out and it is also how it retrieves the row. When a user writes SQL using the Primary Index in the WHERE clause, Teradata knows where to get the row. For example:
SELECT * FROM Employee_Table WHERE Emp_No = 2 ;
The Parsing Engine knows Emp_No is the Primary Index, so it hashes the value of 2 with the Hash Formula, comes up with a 000000000101 Row Hash, which equates to a 5, and then goes to bucket 5 in the Hash Map, which tells the PE which AMP holds that row. Ingenious! Stay tuned because this is about to become even more clear.
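If you are wondering how a string of zeros and ones "equates to a 5", the Row Hash is simply being read as a binary (base-2) number. The short Python check below uses the book's illustrative bit string for the value 2. The variable names are made up for this example.

```python
# The book's illustrative Row Hash for the Primary Index value 2
# (leading zeros shortened, as in the text).
row_hash_bits = "000000000101"

# Reading the bits as a base-2 number gives the bucket to look in:
# 1*4 + 0*2 + 1*1 = 5.
bucket = int(row_hash_bits, 2)

# Going the other way, 5 written as a 12-digit binary string.
bits_again = format(bucket, "012b")
```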
The Hash Map
We're going to have the best-educated American people in the world.

Dan Quayle
Every Teradata system has one Hash Map with a million buckets. Inside the buckets are AMP numbers. The AMP numbers don't change inside the Hash Map. They are static. If you have a 4-AMP system then the numbers 1, 2, 3, 4 are repeated until all one million buckets contain a 1, 2, 3 or 4. The following page shows an excellent example of a 4-AMP system. Notice how 1, 2, 3, 4 keep repeating throughout the entire Hash Map. That is because we have a small 4-AMP system.
Do you remember on the previous page when we ran the Hash Formula on the Primary Index value of 2? The result was a Row Hash of 000000000101, which equates to a 5. Teradata would count over 5 buckets and look inside bucket number 5. That would tell the PE to place the row on AMP 1. I have circled bucket 5 in the Hash Map on the following page so you can see exactly how this works. You are soon to be one of the best-educated people in the Teradata world.
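A Hash Map filled with repeating AMP numbers is easy to build in Python. The sketch below is a simplified model of the picture described here, with a hypothetical function name, and it builds only 16 buckets instead of a million so the whole map can be printed.

```python
def build_hash_map(num_amps, num_buckets):
    """Fill buckets 1..num_buckets with AMP numbers 1..num_amps, repeating."""
    # Index 0 is left unused so bucket numbers match the 1-based counting
    # used in the text.
    hash_map = [None] * (num_buckets + 1)
    for bucket in range(1, num_buckets + 1):
        hash_map[bucket] = (bucket - 1) % num_amps + 1
    return hash_map


# A 4-AMP system, with only the first 16 buckets built to keep it readable.
hash_map = build_hash_map(num_amps=4, num_buckets=16)
print(hash_map[1:])  # [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
print(hash_map[5])   # 1
```

Counting over to bucket 5 lands on the second round of 1, 2, 3, 4, which is why the row for Primary Index value 2 goes to AMP 1.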
An 8-AMP Hash Map Example
A true friend is one who walks in when the rest of the world walks out.
Anonymous
The example on the following page is another Hash Map, but this one is for an 8-AMP system. Notice how the numbers 1, 2, 3, 4, 5, 6, 7, 8 keep repeating inside all one million buckets. Every Teradata system has one Hash Map with a million buckets. Inside the buckets are AMP numbers. The AMP numbers don't change inside the Hash Map. They are static.
Do you remember a couple of pages ago when we ran the Hash Formula on the Primary Index value of 2? The result was a Row Hash of 000000000101, which equates to a 5. Teradata would count over 5 buckets and look inside bucket number 5. That would tell the PE to place the row on AMP 5 in this system. I have circled bucket 5 in the Hash Map on the following page so you can see exactly how this works. They say a dog is man's best friend, but the best friend to Teradata is the Hash Map. It is its guide dog.
Laying a Row onto the Proper AMP
To have everything is to possess nothing.

--Buddha
The brilliance of how Teradata lays a data row onto the proper AMP is that by possessing nothing it has everything. Let me explain. On the following page you see that Teradata hashes the Primary Index value of 2 and receives a result called the Row Hash, which is calculated to be 000000000101, which equates to a 5. Teradata counts over 5 buckets in the Hash Map and the system tells Teradata to place this row on AMP 1.
Nothing changes in the Hash Map and no record of what just happened exists, but when Teradata needs to find this row it will run through the same hashing process, then look at the Hash Map and know this row can be found on AMP 1. The same Hash Formula always produces the same answer for a specific value, so this consistency allows Teradata to have everything without writing anything down, keeping pointers, or possessing anything. It just does the math again each time. If 1,000 users all ran a query to find Employee Number 2 the math would be run 1,000 times, with the system each time telling Teradata to look only on AMP 1.
Retrieving a Row by way of the Primary Index
The best way to predict the future is to create it.

- Sophia Bedford-Pierce
The best way to predict the future is to create it, to make things happen yourself. Controlling your own destiny is something all of us can do. Teradata does this by using the same process to retrieve a row as it did for placing the row on the proper AMP. On the following page you see that the user has run a query looking for all columns in the Employee_Table where the Emp_No = 2. The Parsing Engine knows that Emp_No is the Primary Index and comes up with a 1-AMP plan. Teradata hashes the Primary Index value of 2 and receives a result called the Row Hash, which is calculated to be 000000000101, which equates to a 5. Teradata counts over 5 buckets in the Hash Map and the system tells the BYNET to contact AMP 1.
If 1,000 users all ran a query to find Employee Number 2 the math would be run 1,000 times, with the system each time telling Teradata to look only on AMP 1. It just does the math again each time, quite quickly, and only the AMP holding the row needs to be contacted.
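Placing a row and retrieving it really are the same calculation, and a small Python sketch makes that visible. It is a toy under stated assumptions: the row hash for Emp_No 2 is taken from the book's illustration (bucket 5) through a lookup table where a real system would compute it, the system has 4 AMPs, and the last name in the sample row is made up.

```python
# Bucket taken from the book's illustration. A real system computes it
# with the Hash Formula.
ILLUSTRATIVE_BUCKET = {2: 5}
NUM_AMPS = 4


def amp_for(emp_no):
    """Hash the Primary Index value, then read the AMP from the Hash Map."""
    bucket = ILLUSTRATIVE_BUCKET[emp_no]
    return (bucket - 1) % NUM_AMPS + 1  # AMP numbers 1..4 repeat per bucket


# One list of rows per AMP (index 0 unused so AMP numbers are 1-based).
amp_disks = [[] for _ in range(NUM_AMPS + 1)]


def insert_row(row):
    amp_disks[amp_for(row["emp_no"])].append(row)


def select_by_emp_no(emp_no):
    # No directory of row locations is consulted. The AMP is recomputed.
    target_amp = amp_for(emp_no)
    return [row for row in amp_disks[target_amp] if row["emp_no"] == emp_no]


insert_row({"emp_no": 2, "last_name": "Jones"})
found = select_by_emp_no(2)
```

Notice that select_by_emp_no never looks at AMPs 2, 3 or 4. Only the AMP named by the Hash Map is touched.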
Hashing Non-Unique Primary Indexes (NUPI)
Be not afraid of going slowly, be afraid of standing still.

- Chinese Proverb
Teradata can run the same Hash Formula on character data and non-unique values. On the following page you can see that Last_Name is the Primary Index. It is a Non-Unique Primary Index (NUPI). Teradata hashes the Last_Name value of Ratel and once again it comes up with a 32-bit Row Hash. The Row Hash value is 000000011111, which equates to a 31. Teradata will go to the Hash Map, look inside bucket 31 and then know on which AMP to place the row.
Here is what you need to understand about a Non-Unique Primary Index value. It will have duplicates. If there are 5,000 people in the table with the Last_Name of Smith then all 5,000 of these rows will go to the same AMP. Remember, there is only one Hash Formula and only one Hash Map. That is the only problem with a NUPI: it can cause uneven distribution, and often one AMP gets many more rows than the other AMPs. This is called a Hot AMP or Data Spike. You don't need to have perfect distribution in Teradata, so a NUPI is very acceptable, but sometimes the Hot AMP or Data Spike situation is just too big and this can cause problems.
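A Hot AMP is easy to reproduce in a simulation. The Python sketch below uses made-up data (5,000 rows named Smith plus 100 other distinct names) on a 4-AMP system, and it uses CRC32 as a stand-in hash because Teradata's own Hash Formula is not available here. The exact AMP that ends up hot depends on the stand-in hash, so no particular AMP number is claimed.

```python
import zlib
from collections import Counter


def amp_for(pi_value, num_amps):
    """Stand-in hash (CRC32, not Teradata's formula) reduced to an AMP."""
    return zlib.crc32(pi_value.encode("utf-8")) % num_amps + 1


# Hypothetical data: 5,000 rows named Smith plus 100 other distinct names.
last_names = ["Smith"] * 5000 + [f"Name{i}" for i in range(100)]

# Count how many rows each AMP receives.
rows_per_amp = Counter(amp_for(name, 4) for name in last_names)
hot_amp, hot_count = rows_per_amp.most_common(1)[0]
```

Whichever AMP receives Smith holds at least 5,000 of the 5,100 rows, while the other three AMPs share what is left of the 100 remaining names. A full table scan has to wait for that one busy AMP to finish.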
Placing Non-Unique Primary Indexes (NUPI) Rows
If the facts don't fit the theory, change the facts.

-Albert Einstein
Teradata can run the same Hash Formula on character data and non-unique values. On the following page you can see that Last_Name is the Primary Index. It is a Non-Unique Primary Index (NUPI). Teradata hashes the Last_Name value of Lacy and once again it comes up with a 32-bit Row Hash. The Row Hash value is 000000001100, which equates to a 12. Teradata will go to the Hash Map, look inside bucket 12 and then know on which AMP to place the row.
I want you to notice that this table has three people named Lacy and two people named Jones. Can you predict what will happen? Check out the next couple of pages.
Placing (NUPI) Rows Continued
We must use time as a tool, not as a crutch.

John F. Kennedy
I want you to notice that this table has three people named Lacy and two people named Jones. Each of the rows with the name Lacy went to AMP 4 and everyone named Jones went to AMP 2. Duplicate values hash the same and they point to the same bucket in the Hash Map, so they always go to the same AMP. If we had more people in the table named Lacy they would all continue to go to AMP 4. If there were 1,000,000 people named Jones they would all end up on AMP 2.
Retrieving (NUPI) Rows
Every sunrise is a second chance.

Unknown
Please remember that in our example on the next page Last_Name is the Primary Index of the table. It is a Non-Unique Primary Index (NUPI). A query that uses Last_Name in the WHERE clause will always be a 1-AMP operation, as you can see from the example on the following page. Even though there could be hundreds, thousands or millions of Last_Name values of Lacy, they are all on the same AMP. Teradata will often claim that NUPI values are grouped together and that is exactly the truth. All duplicate values go to the same AMP as their duplicate counterparts. The bottom line is that if you use a UPI or a NUPI in your WHERE clause it will always be a 1-AMP operation.
Placing Multi-Column Primary Index Rows
Life is a succession of lessons, which must be lived to be understood.

--Ralph Waldo Emerson
When multiple columns are combined to make up the Primary Index it is called a Multi-Column Primary Index. Teradata places the columns together and then performs the same process of hashing. In the picture on the next page notice that the Primary Index consists of First_Name and Last_Name combined. Teradata will add the two names together, turning the First_Name of Rakish and the Last_Name of Ratel into RakishRatel. It will then hash RakishRatel and get a 32-bit Row Hash answer. In the example we see that RakishRatel has hashed to 000000011010, which equates to a 26. Teradata will go to bucket 26 in the Hash Map and place the row on the AMP named inside the bucket.
Retrieving Multi-Column Primary Index Rows
What lies behind us and what lies before us are tiny matters compared to what lies within us
-Ralph Waldo Emerson
Remember that the Primary Index for this table example was a Multi-Column Primary Index on both First_Name and Last_Name combined. When the user queries using both the First_Name and the Last_Name, Teradata knows this is a 1-AMP operation. Teradata first combines the First_Name of Rakish and the Last_Name of Ratel and it becomes RakishRatel. This produces a Row Hash of 0000000000011010, which equates to a 26. Teradata can go to bucket 26 in the Hash Map and knows this row is on AMP 2. There are a couple of items I want you to think about. First and foremost, the only way Teradata can use a Multi-Column Primary Index is if you use all columns of the Multi-Column Index in the WHERE clause of your query. As you can see in our example we used both the First_Name AND the Last_Name in the WHERE clause of our SQL. If the query had used only one of the columns instead of both, then Teradata would have had to do a Full Table Scan. Partial indexing does not work in Teradata. The positive news about a Multi-Column Index is that it usually spreads the rows fairly evenly across the AMPs.
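The all-or-nothing rule can be sketched in a few lines of Python. This is a hypothetical helper, not anything Teradata exposes. It assumes the WHERE clause uses equality conditions and that the table has no secondary indexes, which matches what we have covered so far.

```python
# The Multi-Column Primary Index from the example.
PRIMARY_INDEX = ("First_Name", "Last_Name")

def access_path(where_columns: set) -> str:
    """Return the access path for equality conditions on the given columns.

    Hypothetical helper: it models only the two access paths covered so far.
    """
    if set(PRIMARY_INDEX) <= where_columns:      # every PI column is supplied
        return "1-AMP Primary Index retrieve"
    return "Full Table Scan"                     # partial indexing does not work

both = access_path({"First_Name", "Last_Name"})
only_last = access_path({"Last_Name"})
print(both)        # 1-AMP Primary Index retrieve
print(only_last)   # Full Table Scan
```

Supplying only Last_Name leaves nothing to hash, because the Row Hash was built from both columns together.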
Even Distribution with an UPI
Nobody forgets where they buried the hatchet

Frank McKinney "Kin" Hubbard
A Unique Primary Index always lays the data out perfectly evenly. Well, I shouldn't say perfectly. If you have 20 AMPs and 42 rows then a couple of extra rows will go to one or two AMPs, but overall it is considered perfectly distributed. Perfect distribution is nice, but it isn't everything. If you decide that users query a Non-Unique column like Last_Name a ton more than Emp_No then you are much better off making Last_Name your Primary Index and making it a NUPI. Uneven distribution is not a problem unless there are huge spikes causing hot AMPs.
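The 42-rows-on-20-AMPs arithmetic can be checked with a short Python sketch. It assumes an idealized layout in which unique values spread as evenly as possible. Real hashing only approximates this, so treat the result as the best case.

```python
def ideal_layout(rows: int, amps: int):
    """Return (rows every AMP gets, number of AMPs that get one extra row).

    Idealized assumption: the rows spread as evenly as possible.
    """
    return divmod(rows, amps)

base, extra = ideal_layout(42, 20)
print(f"{20 - extra} AMPs hold {base} rows; {extra} AMPs hold {base + 1} rows")
# 18 AMPs hold 2 rows; 2 AMPs hold 3 rows
```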
Uneven Distribution with a NUPI
To escape criticism do nothing, say nothing, be nothing.

Elbert Hubbard
A Non-Unique Primary Index will seldom lay the data out with perfect distribution. Perfect distribution is nice, but it isn't everything. If you decide that users query a Non-Unique column like Last_Name a ton more than Emp_No then you are much better off making Last_Name your Primary Index and making it a NUPI. Uneven distribution is not a problem unless there are huge spikes causing hot AMPs. Remember that duplicates always go to the same AMP as their duplicate counterparts. The example on the following page demonstrates this clearly.
Unacceptable Distribution with a NUPI
When I was 14 I thought my parents were the stupidest people in the world. When I was 21 I was amazed at how much they learned in seven years.
Mark Twain
A Non-Unique Primary Index will seldom lay the data out with perfect distribution. Perfect distribution is nice, but it isn't everything. Then again, the example on the next page is awful. You should never choose a column to be your Primary Index if it has fewer distinct values than the number of AMPs. Uneven distribution is not a problem unless there are huge spikes causing hot AMPs, and this example not only causes hot AMPs and huge distribution spikes, but only two AMPs will be used when distributing and retrieving data. Horrible! Remember that duplicates always go to the same AMP as their duplicate counterparts. The example on the following page demonstrates this clearly.
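A quick Python sketch shows why so few distinct values are a disaster. The column here is hypothetical (it holds only "Y" or "N"), the 20-AMP system is an assumption, and CRC-32 again stands in for the real hash formula.

```python
import zlib
from collections import Counter

NUM_AMPS = 20   # toy system size (assumption)

def amp_for(value: str) -> int:
    """Stand-in for Row Hash -> Hash Map -> AMP (CRC-32 and modulo are assumptions)."""
    return (zlib.crc32(value.encode("utf-8")) & 0xFFFFFFFF) % NUM_AMPS

# 1,000 rows whose Primary Index column holds only two distinct values.
values = ["Y" if i % 2 == 0 else "N" for i in range(1000)]
rows_per_amp = Counter(amp_for(v) for v in values)

print(f"{len(rows_per_amp)} of {NUM_AMPS} AMPs hold all {len(values)} rows")
```

With only two distinct values there can never be more than two busy AMPs, no matter how many rows are loaded. The other AMPs sit idle.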
Review - Parsing Engine's Plan with an UPI
When you are courting a nice girl an hour seems like a second. When you sit on a red-hot cinder a second seems like an hour. That's relativity.
Albert Einstein
A Unique Primary Index always lays the data out perfectly evenly. Plus, the Parsing Engine's plan is a 1-AMP operation that can return a maximum of one row. That is because the value the query is seeking is UNIQUE. On the following page you can see the Parsing Engine's plan. This is as sweet as it gets.
Review - Parsing Engine's Plan with a NUPI
If you don't know where you're going, Any road will take you there.
Lewis Carroll
A Non-Unique Primary Index doesn't always lay the data out perfectly evenly, but it is always a 1-AMP operation when used in the WHERE clause. There can be millions of rows returned because the value the query is seeking is NOT UNIQUE. On the following page you can see the Parsing Engine's plan. This is pretty sweet as well.
Review - Big Trouble - The Full Table Scan
The only true wisdom is in knowing you know nothing.

Socrates
Although I will show you additional ways Teradata accesses the data, so far we have learned about only two: a Primary Index Single-AMP (1-AMP) retrieve and the dreaded Full Table Scan. A Full Table Scan in Teradata is pretty fast because of the Parallel Processing. Each AMP reads the rows it owns for the table only once and passes any rows that meet the criteria up the BYNET to the Parsing Engine. The only thing wrong with a Full Table Scan is doing one when it isn't necessary. If a table has a Primary Index of Last_Name then you should attempt to use Last_Name in the WHERE clause. If a table has a Primary Index of Employee_No then you should attempt to use that in the WHERE clause. Imagine that you work for HR and an employee comes in to talk. You want to know about the employee so you run a query to access the Employee_Table. You notice that Last_Name is the Primary Index of the table, and the good news is that you know their last name. It is a waste of time to run the query without a WHERE clause or to put their Employee_No in the WHERE clause. Sometimes a Full Table Scan is needed and that is why Teradata's Parallel Processing is so great. You can ask difficult questions you might not have been able to ask before with systems of less power, but please be aware of the power of knowing the Primary Index of a table.
Big Trouble - A Picture of a Full Table Scan
If I have seen farther than others, it is because I was standing on the shoulders of giants.
- Isaac Newton
As you can see on the next page, a Full Table Scan will cause ALL AMPs to read every row they own. Each row is read only once and the AMP will return rows that match the criteria. There is nothing wrong with doing a Full Table Scan query unless you don't have to do it. There is nothing wrong with walking across the city if you don't have a car and can't afford transportation, but if you can you might want to consider riding, especially when you are on the company's payroll and time and resources are important.
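A Full Table Scan can be sketched in Python as a loop over every row on every AMP. The rows below are hypothetical and the loop runs one AMP at a time, whereas real AMPs work in parallel. The point is only that every row is read exactly once.

```python
# Hypothetical rows (Emp_No, Last_Name) spread across three AMPs.
AMPS = {
    1: [(1, "Jones"), (4, "Lacy")],
    2: [(2, "Zao"), (5, "Vey")],
    3: [(3, "Lacy")],
}

def full_table_scan(amps, predicate):
    """Read every row on every AMP once and keep the rows that match."""
    rows_read = 0
    answer_set = []
    for amp_rows in amps.values():   # real AMPs do this work in parallel
        for row in amp_rows:         # each AMP reads each of its rows once
            rows_read += 1
            if predicate(row):
                answer_set.append(row)
    return answer_set, rows_read

answer, rows_read = full_table_scan(AMPS, lambda row: row[0] == 3)
print(answer, rows_read)   # [(3, 'Lacy')] 5
```

Five rows were read to return one. A Primary Index retrieve would have gone straight to a single AMP.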
Test your Teradata Primary Index Knowledge
Look at life through the windshield, not the rearview mirror.

- Byrd Baggett
The following page allows you to test your knowledge of what you have learned so far. This will actually be an exercise that builds continually after each chapter. Follow the instructions on the page and make your family proud with the right answers.
The Row Hash
Following the light of the sun, we left the Old World.

Christopher Columbus
I have a feeling you are going to sail through this chapter and discover a whole new world. You are about to learn how each AMP keeps its rows organized in shipshape fashion. We already know how Teradata places rows on an AMP. It hashes the Primary Index, which returns a Row Hash value. The Row Hash is equated to a number ranging from one to one million, which identifies a bucket in the Hash Map, and that bucket points to the AMP on which the row will reside. When the row is placed on the AMP, the Row Hash that was derived by hashing the Primary Index will be placed at the front of the row. The AMP will sort the rows that it owns for the table by the Row Hash, so every row is kept in perfect order on the AMP.
The Uniqueness Value
It's not the size of the dog in the fight, but the size of the fight in the dog.
Archie Griffin
We have learned that the Row Hash is placed at the front of every row and that the AMP will sort its rows by the Row Hash, thus keeping things sorted in perfect order. The AMP will also add a Uniqueness Value behind the Row Hash so it can keep track of duplicate values. When a row comes in with its Row Hash the AMP will check to see if it has any other Row Hashes exactly like the one it has just received. If this Row Hash is unique it will put a 1 as the Uniqueness Value. If it already has another Row Hash just like this one it will put a 2 in the Uniqueness Value. If this Row Hash is the third duplicate it will put a Uniqueness Value of 3, and so on. For example, if there are 1,000 duplicate Primary Index values such as the Last_Name of Smith, then they would each have the same Row Hash and go to the same AMP. Their Row Hash would be the same, but their Uniqueness Values would range from 1 to 1,000.
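The counting rule can be sketched in Python with a toy AMP that remembers the last Uniqueness Value it handed out for each Row Hash. The class and its names are hypothetical, and the sketch ignores deletes and other housekeeping.

```python
from collections import defaultdict

class ToyAmp:
    """A toy AMP that assigns Uniqueness Values (hypothetical, simplified)."""

    def __init__(self):
        self._last_uniq = defaultdict(int)   # Row Hash -> last value issued
        self.rows = []                       # (row_hash, uniqueness, data)

    def insert(self, row_hash: int, data: str):
        """Store a row and return its (Row Hash, Uniqueness Value) pair."""
        self._last_uniq[row_hash] += 1
        uniqueness = self._last_uniq[row_hash]
        self.rows.append((row_hash, uniqueness, data))
        self.rows.sort(key=lambda row: (row[0], row[1]))   # keep rows in order
        return (row_hash, uniqueness)

amp = ToyAmp()
row_ids = [amp.insert(26, "Smith") for _ in range(3)]
print(row_ids)   # [(26, 1), (26, 2), (26, 3)]
```

The Row Hash of 26 is just a sample value. Three rows with the same Primary Index value share it and are told apart only by the 1, 2 and 3.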
The Row ID
A good plan, violently executed now, is better than a perfect plan next week.
- George S. Patton
The Row Hash and the Uniqueness Value make up the Row ID. Teradata rows placed on an AMP always have the Row ID at the beginning of every row. Each AMP actually sorts its rows by the Row ID, not just the Row Hash. This not only organizes the rows perfectly, but is really how Teradata AMPs find their data so quickly. Just like George S. Patton the Row ID shall RETURN. An Answer Set! The Row Hash is a 32-bit value and the Uniqueness Value is also a 32-bit value. This means that 64 bits (8 bytes) are placed in front of every Teradata row!
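The 32 + 32 = 64 bit arithmetic can be seen by packing the two values in Python. The big-endian byte layout used here is an assumption chosen for illustration, not a statement about how Teradata stores the Row ID on disk.

```python
import struct

def pack_row_id(row_hash: int, uniqueness: int) -> bytes:
    """Pack two unsigned 32-bit integers into one 8-byte Row ID.

    The big-endian layout is an assumption made for this sketch.
    """
    return struct.pack(">II", row_hash, uniqueness)

row_id = pack_row_id(26, 1)
print(len(row_id), row_id.hex())   # 8 0000001a00000001
```

A handy side effect of this layout is that comparing the packed bytes gives the same order as comparing Row Hash first and Uniqueness Value second.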
Duplicates and the Uniqueness Value
Ambition is a dream with a V8 Engine.

Elvis Presley
Teradata has a lot in common with Elvis Presley because to keep the AMPs' disks from overheating they have some really cool fans! Notice in the picture on the following page that the Primary Index is Last_Name and this AMP happens to have three rows with the Last_Name of Zao. Notice that the Row Hash for all three rows containing Zao is identical, and also notice the Uniqueness Values are 1, 2, and 3. The first row placed on this AMP that had the name Zao was given a Uniqueness Value of 1, the second Zao a 2, and the third Zao a 3. Teradata takes the time to place the data in perfect order because it treats its rows like a King! That is why Teradata disks are often referred to as the King of Block and Roll!
AMPs Sort Their Rows by the Row ID
Don't use a big word where a diminutive one will suffice.

- Unknown
Please just remember that each AMP sorts its rows by the Row ID. Why this matters will become apparent within the next few pages.
Search the Data like a Phone Book
I've learned that you can't have everything and do everything at the same time.
Oprah Winfrey
Everyone has used a Phone Book at some time in their life. If you decide you want to order a Pizza you know you can go to the Phone Book. How do people handle the Phone Book? Because a Phone Book is organized alphabetically from A-Z, people generally open the Phone Book to about the middle. Then they see where they are alphabetically and adjust the search towards the beginning or the end. It doesn't take long to find where you can order a pizza. Can you imagine if every time you used the Phone Book you started on page 1 and then turned a page at a time until you found Pizza Delivery? That wouldn't be a Pizza search, but a serial search! You might starve before you even found the Pizza Delivery place of your choice. Teradata AMPs don't search for their data serially. They search just like you do with a phone book. They go to the middle of the table, see where they are and then adjust.
Why is my Phone Book 00000s and 111111s?
I was walking down the street wearing glasses when the prescription ran out.
- Steven Wright
The AMPs don't like to brag, but they are so fast they often make a spectacle of themselves! The phone book is sorted alphabetically from A-Z, but computers read and write data in Binary, so they sort their values as zeros and ones (000000 to 111111). This is why AMPs sort their rows by the Row ID. This allows the AMP to search for a row with a Binary Search! This gives each AMP 20/20 vision when searching for a row! The next couple of pages will clearly show you a Binary Search.
Performing a Binary Search
Diplomacy is the art of saying "Nice Doggie" until you can find a rock.
Will Rogers
The Row ID is the AMP's best friend and guides the AMP to the proper row. Let's just say the Row ID is both a bloodhound and a retriever when it comes to looking for a row. Unless Teradata is doing a Full Table Scan it will always perform a Binary Search when looking for a row based on the Primary Index. That Binary Search is always done on only the Row ID. That is why AMPs sort their rows by the Row ID. In the picture on the next page you can see that Last_Name is the Primary Index and this AMP has been instructed to find a user named Vey. The AMP is actually instructed to find Row Hash 000011110, and then double-check to make sure the Last_Name is actually Vey. Don't let looking for a row frighten you because the disk's Bark is bigger than its Byte!
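Here is a Python sketch of that search using the standard bisect module. The rows and their Row Hashes are hypothetical, except that Vey uses the Row Hash 000011110 from the example. Treating one AMP's rows as a single sorted list is a simplification of how an AMP really stores its data.

```python
import bisect

# Hypothetical rows on one AMP, sorted by Row ID = (Row Hash, Uniqueness Value).
ROWS = [
    ((0b000001010, 1), "Lacy"),
    ((0b000011110, 1), "Vey"),
    ((0b000110001, 1), "Zao"),
    ((0b000110001, 2), "Zao"),
]
ROW_IDS = [row_id for row_id, _ in ROWS]

def find_rows(row_hash: int, last_name: str):
    """Binary search on the Row ID, then double-check the actual value."""
    start = bisect.bisect_left(ROW_IDS, (row_hash, 0))   # first row with this hash
    found = []
    for row_id, name in ROWS[start:]:
        if row_id[0] != row_hash:    # walked past the matching Row Hashes
            break
        if name == last_name:        # guard against a different value, same hash
            found.append((row_id, name))
    return found

print(find_rows(0b000011110, "Vey"))   # [((30, 1), 'Vey')]
```

The double-check matters because two different values can, in principle, produce the same Row Hash.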
Opening the Phone Book to the Middle
Look at life through the windshield, not the rearview mirror.

- Byrd Baggett
I once saw a funny show where everyone was racing to get to the finish line, but the finish line was in a city hundreds of miles away. The Italian driver ripped off the rearview mirror, and the person sitting in the passenger seat asked why he would tear the rearview mirror off the car! The Italian racer said, "It is Italian driving. What's behind you doesn't matter!" This is not the case when AMPs are driving. The Binary Search takes the AMP to the middle of the rows. It will then move forward looking through the windshield or backward using the rearview mirror to adjust its search. Because the rows are sorted by the Row ID, and the Row ID is in Binary, the AMP can race to the row it wants to retrieve. In the picture on the following page the AMP has gone to the middle of the rows and realizes it needs to go further.
I can Name that Tune in 5 Notes
Where there is no patrol car, there is no speed limit.

- Al Capone
The following page shows how the AMP moves down through the next portion of the rows to find the row. There is no speed limit on the AMP highway. It always moves as fast as possible. You must remember that Teradata was built to hold Terabytes of data. A single AMP may hold billions of rows for a single table. A binary search always cuts the search in half. The first search goes to the middle of the phone book. Then the AMP knows if it should move forward or backward. If it moves forward it goes halfway further and checks where it is again. It continues this brilliant binary search until it finds the row.
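To see how quickly halving pays off, this Python sketch counts how many times a set of rows can be cut in half before only one row is left. It is plain arithmetic on a sorted list and assumes nothing about Teradata internals.

```python
def halvings(rows: int) -> int:
    """Count how many times a sorted range must be halved to reach one row."""
    steps = 0
    while rows > 1:
        rows = (rows + 1) // 2   # keep the larger half, to be safe
        steps += 1
    return steps

print(halvings(32))              # 5
print(halvings(1_000_000_000))   # 30
```

Five notes are enough to name the tune among 32 rows, and even a billion rows take only about 30 comparisons.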
A Visual for Data Layout
I saw the angel in the marble and carved until I set him free.
--Michelangelo
The example on the following page is a logical view of data on AMPs. Each AMP holds a portion of every table. Each AMP keeps its tables in their own separate drawers. The Row ID is used to sort each table on an AMP.
Test Your Teradata Access Query Knowledge

Fill in the chart on the next page. Good luck.
UPI Row-ID Test
Acting is all about honesty. If you can fake that, you've got it made.
- George Burns
To store the data, the value(s) in the PI are hashed through a calculation to determine which AMP will possess the row. The same data values always hash to the same Row Hash and therefore are always associated with the same AMP. The PI is what makes or breaks the system. The PI is responsible for all of the system's data distribution. Our quiz on the next page is designed to show only in theory how Teradata places a row on an AMP. We are going to divide the Primary Index value by two. The output is called the Row-Hash. We will take our Row-Hash answer and it will point to a bucket in the Hash Map. That bucket will tell Teradata which AMP will hold the row. Your mission, if you decide to accept it, is to place the Row ID and the row on the proper AMP. I have already completed the first row for you because I am a nice guy!
NUPI Row-ID Test
Warning: Keyboard Not Attached. Press F10 to Continue.

Actual Computer Error Message
To store the data, the value(s) in the PI are hashed through a calculation to determine which AMP will possess the row. The same data values always hash to the same Row Hash and therefore are always associated with the same AMP. The PI is what makes or breaks the system. The PI is responsible for all of the system's data distribution. Our quiz on the next page is designed to show only in theory how Teradata places a row on an AMP. I have already hashed the Last_Name Primary Index (NUPI) for you. The output is called the Row-Hash. We will take our Row-Hash answer and it will point to a bucket in the Hash Map. That bucket will tell Teradata which AMP will hold the row. Your mission, if you decide to accept it, is to place the Row ID and the row on the proper AMP. I have already completed the first row for you because I am a nice guy!
Secondary Indexes
The afternoon knows what the morning never suspected.

- Swedish Proverb
The secondary index knows what the full table scan never suspected. Secondary Indexes provide an alternate path to the data. So far we have learned that every table has one and only one Primary Index and we have learned that the Primary Index is much faster than the Full Table Scan. Secondary Indexes are not as fast as the Primary Index, but they can be pretty fast, and they can be much faster than a Full Table Scan. There can be up to 32 Secondary Indexes on a table, but there is a price to pay. Every Secondary Index creates a Subtable on every AMP designed to point to the real Row-ID of the base row. I will explain in full detail. You may have wondered why I was so persistent with the explanation of the Primary Index and the actual Row-IDs, but you will soon see exactly why I stressed that information. There are two types of Secondary Index: Unique Secondary Indexes, which are called USIs, and Non-Unique Secondary Indexes, which are called NUSIs. An USI is always a Two-AMP operation, so it is almost as fast as a Primary Index. A NUSI is an All-AMP operation, but it is not a Full Table Scan.
Master the Teradata Architecture
The Base Table
It's deja vu all over again!

-Yogi Berra
We have already discussed a table, but I want to emphasize that for this chapter we will refer to the real tables as Base Tables and the Secondary Index tables as Subtables. Secondary Index Subtables are treated by Teradata as just another table, so I guess you could say "It's deja 2 all over again!" I want you to look at the picture on the next page. Notice that the Primary Index is Last_Name and it is a Non-Unique Primary Index (NUPI). I also want you to pay close attention to the Row-IDs in front of each row. Secondary Index Subtables are designed to point to the real row in the base table and they will do so by pointing to the exact Row-ID of the row they are looking for in the base table.
Creating a Unique Secondary Index (USI)
Beware of the young doctor and the old barber.

- Benjamin Franklin
If Ben Franklin were able to read this book I think he would be shocked at how easy it is to create a secondary index! The slide on the next page shows the syntax for creating a Unique Secondary Index (USI). We have created the USI on the column Emp_No. Once the USI is created no duplicate Employee Number can exist in the table. If a row is added with a duplicate value for Emp_No then Teradata will reject the row and the user will receive an error message. As soon as we create the USI with our SQL statement a Subtable will be created on every AMP. I will show you that in the next pages.
The Secondary Index Subtable
Once the game is over, the king and the pawn go back in the same box.
- Italian Proverb
As soon as the USI is created with the SQL syntax, the next move comes from Teradata, which creates a Subtable on every AMP. This is true for both the USI and the NUSI. Let's say for example the DBA created the maximum of 32 secondary indexes on a table. Then there would be 32 Subtables created, each taking up PERM Space. The entire purpose of the Secondary Index Subtable is to point back to the real row in the base table via the Row-ID.
Inside the Secondary Index Subtable
The absent are always in the wrong.

English Proverb
There are always two columns in the Secondary Index Subtable: the column on which you created the secondary index, which in this case was Emp_No, and the real Row-ID of the row in the Base Table. Think of the Subtable as a confidential informant of the FBI. Without the Secondary Index Subtable there would be only two ways for Teradata to find a particular row: by using the Primary Index value in the query or by doing a Full Table Scan. The Secondary Index Subtable is really a baby table that contains the Secondary Index column, which acts as the Primary Index of the Subtable so Teradata can easily find Emp_No 2 in the Subtable. It can find any Emp_No in the Subtable because Emp_No is the Primary Index of the Subtable. So why is the Subtable like an FBI informant? When a query is written with Emp_No in the WHERE clause the Teradata Parsing Engine (PE) recognizes it is an USI and looks up the Emp_No value in the Subtable with a 1-AMP operation. It then asks, "Can you tell me the Row-ID of the row in the base table?" Once Teradata has the Row-ID it takes the Row Hash portion of the Row-ID, looks at the Hash Map and knows exactly which AMP the Base Table row is on.
How Teradata builds the Secondary Index Subtable
As soon as the DBA uses the SQL to create a secondary index Teradata immediately gets to work. Teradata must build the Secondary Index Subtable before it can become an alternate path to the data. Each AMP hashes the secondary index value for each row it owns with the Hash Formula. The result is a 32-bit Row Hash which points to a bucket in the Hash Map, which tells the secondary index row which AMP's Subtable it will be on. All UNIQUE Secondary Index values are hashed, and the value plus the real Row-ID of the base table row are sent to the proper AMP over the BYNET. Pay close attention to the slide on the next page and let me walk you through it. We created the USI on the column Emp_No. Every Emp_No value will now also have to reside inside the Subtable. The first row's value for Emp_No is a 2. The value of 2 is hashed and the result is a 32-bit Row Hash. That Row Hash corresponds to a bucket in the Hash Map, and the Hash Map says that AMP 1 is the destination AMP. So the Emp_No value of 2 goes to AMP 1's Subtable. It also brings with it the real Row-ID for its row from the Base Table, which is 1,1. Now the first secondary row is perfectly placed.
How Teradata builds the Secondary Index Subtable
The most exciting phrase to hear in science, the one that heralds the most discoveries, is not "Eureka!" but "That's funny"
Isaac Asimov
Your job is to now place the remaining three rows on the proper AMP perfectly. I want you to use the Tera-Tom Hash Formula, which is to divide by 2. This is designed to show you that a consistent formula will produce predictable and repeatable results. Divide each Emp_No by 2 and that will represent the Hash Formula, with a 32-bit Row Hash as the result. You can then point to the corresponding bucket in the Hash Map where you will place the row on the destination AMP. You need to place the USI value and the real Row-ID of the row with it. Good luck!
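If it helps, here is the Tera-Tom Hash Formula as a Python sketch, applied only to the first row that was already placed for you (Emp_No 2 with base Row-ID 1,1). The Hash Map below is hypothetical, so use the one on the quiz page for the remaining rows, and the integer division is my assumption about how "divide by 2" should be read.

```python
# Hypothetical Hash Map for a 4-AMP system: bucket number -> AMP number.
HASH_MAP = {1: 1, 2: 2, 3: 3, 4: 4}

def tera_tom_hash(emp_no: int) -> int:
    """The quiz's stand-in Hash Formula: divide the value by 2 (integer division assumed)."""
    return emp_no // 2

def place_usi_row(emp_no: int, base_row_id: tuple, subtables: dict) -> int:
    """Put (USI value, base Row-ID) in the Subtable of the destination AMP."""
    bucket = tera_tom_hash(emp_no)
    amp = HASH_MAP[bucket]
    subtables.setdefault(amp, []).append((emp_no, base_row_id))
    return amp

subtables = {}
first_amp = place_usi_row(2, (1, 1), subtables)
print(first_amp, subtables)   # 1 {1: [(2, (1, 1))]}
```

Run the same Emp_No through the function as often as you like and it lands on the same AMP every time. That repeatability is the whole point of the exercise.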
Building the Secondary Index Subtable
I don't skate to where the puck is; I skate to where I want the puck to be.
Wayne Gretzky
If you received the answers listed on the next page you are no longer skating on thin ice. You are performing a Power Play. Each Secondary Index Subtable row won't perform a one-timer, but will instead perform a two-timer, because the USI value and Base Row-ID always make an USI query a 2-AMP operation. Now that the Secondary Index Subtable is built, the users can query the base table. If the USI is used in the WHERE clause the Parsing Engine knows it has an alternate route to the data. It can use a 1-AMP operation to find the Subtable row and then use another 1-AMP operation to find the base row. If there are 1,000,000 rows in the base table there will be 1,000,000 rows in the Subtable. The Base Table will be much larger because it probably has many columns, but the Subtable only has two columns.
USI Always a Two-AMP Operation
Measure a thousand times and cut once.

-Turkish Proverb
Secondary Indexes provide an alternate path to the data, and should be used on queries that run thousands of times. Teradata runs extremely well without Secondary Indexes, but since secondary indexes use up space and overhead, they should only be used on KNOWN QUERIES or queries that are run over and over again. Once you know the data warehouse environment you can create Secondary Indexes to enhance its performance.
Measure a thousand query times and create a secondary index.

-Turkish Teradata Certified Professional
Every time the Parsing Engine sees the USI column in the WHERE clause it comes up with a plan that involves only two AMPs. Memorize this if you have to, but always know that an USI query is a two-AMP operation. Read the next couple of pages and you will know why.
The Parsing Engine's Plan with an USI Query
If you do what you've always done, you'll get what you've always got.
Anonymous
The above quote is perfect for the secondary index because the Secondary Index and the Hash Map do what they have always done, and they know they'll get what they always got. The Parsing Engine doesn't always put out the entire plan and wait for the data to return. Sometimes the Parsing Engine gives pieces of the plan and helps guide the AMPs. Take a look at the explanation below and the picture on the next page. The first part of the USI plan will be to find the USI value in the Secondary Index Subtable. Let's say for example the WHERE clause stated: WHERE Emp_No = 2; The Parsing Engine knows that the Subtable's Primary Index is Emp_No. It puts out the first part of the plan by stating: "Hash the value of 2 and then look at the corresponding bucket in the Hash Map. Go to the Destination AMP inside the Hash Map bucket and tell that AMP to find Emp_No 2 in its Subtable. Then have it return the Row-ID of the base row to me (the Parsing Engine)." Once the Parsing Engine receives the Row-ID of the base row it takes the first part of the Row-ID (which is the Row Hash), looks at the corresponding bucket in the Hash Map and now knows the AMP that holds the base row. It sends a second message to that AMP and says, "Find this Row-ID in your Employee_Table and retrieve the row."
Retrieving Base Rows using the USI
Never trust the advice of a man in difficulties

Aesop (620 BC - 560 BC)
Remember the previous slide? The first part of the USI plan was to find the USI value in the Secondary Index Subtable. The PE saw: SELECT * FROM Employee_Table WHERE Emp_No = 2; The Parsing Engine knows that the Subtable's Primary Index is Emp_No. It puts out the first part of the plan by stating: "Hash the value of 2 and then look at the corresponding bucket in the Hash Map. Go to the Destination AMP inside the Hash Map bucket and tell that AMP to find Emp_No 2 in its Subtable. Then have it return the Row-ID of the base row to me (the Parsing Engine)." Now look at the slide on the next page. Once the Parsing Engine receives the Row-ID of the base row it takes the first part of the Row-ID (which is the Row Hash), looks at the corresponding bucket in the Hash Map and now knows the AMP that holds the base row WHERE Emp_No = 2. It sends a second message to the AMP holding the base row and says, "Find this Row-ID in your Employee_Table and retrieve the row." This process is always a 2-AMP operation.
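The two messages can be sketched end to end in Python. Everything here is a toy: the hash function, the modulo Hash Map, the 4-AMP system and the employee rows are all assumptions made so the example is self-contained. Only the two-step shape of the plan comes from the explanation above.

```python
NUM_AMPS = 4   # toy system size (assumption)

def toy_hash(value) -> int:
    """Stand-in Hash Formula (assumption): any repeatable function will do."""
    return sum(str(value).encode("utf-8"))

def amp_of(row_hash: int) -> int:
    """Stand-in Hash Map lookup: Row Hash -> AMP number."""
    return row_hash % NUM_AMPS

base = {amp: {} for amp in range(NUM_AMPS)}       # AMP -> {Row-ID: base row}
subtable = {amp: {} for amp in range(NUM_AMPS)}   # AMP -> {Emp_No: base Row-ID}

def insert(emp_no: int, last_name: str) -> None:
    """Place the base row by its Primary Index, then its USI Subtable row."""
    row_hash = toy_hash(last_name)                # Primary Index is Last_Name
    amp = amp_of(row_hash)
    uniqueness = 1 + sum(1 for (h, _) in base[amp] if h == row_hash)
    row_id = (row_hash, uniqueness)
    base[amp][row_id] = {"Emp_No": emp_no, "Last_Name": last_name}
    subtable[amp_of(toy_hash(emp_no))][emp_no] = row_id

def select_by_usi(emp_no: int):
    """Return the base row and the AMPs that received a message."""
    sub_amp = amp_of(toy_hash(emp_no))    # message 1: hash the USI value
    row_id = subtable[sub_amp][emp_no]    # Subtable hands back the base Row-ID
    base_amp = amp_of(row_id[0])          # message 2: no re-hash, reuse the Row Hash
    return base[base_amp][row_id], [sub_amp, base_amp]

insert(2, "Jones")   # hypothetical employees
insert(7, "Lacy")
row, amps_messaged = select_by_usi(2)
print(row, amps_messaged)
```

Notice that the second step never hashes anything. The Row Hash it needs is already sitting inside the Row-ID that the Subtable returned.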
Picture that USI in Action
You always pass failure on your way to success

Mickey Rooney 1920
The following page shows a great picture of what to expect with an USI query!
USI Summary
Nearly all men can stand adversity, but if you want to test a man's character, give him power.
Abraham Lincoln
The following page provides a summary to show you the power of the USI. This will work if you want to test a row's Characters or Integers!
USI Pictorial using the Hash Maps
Nobody forgets where they buried the hatchet

Frank McKinney "Kin" Hubbard
You won't have an Ax to grind with the next slide. This shows you exactly how Teradata uses the Hash Map twice for an USI query. It also does two binary searches to chop away at finding the result set requested. Notice that the PE hashes the USI value to find out which AMP holds the Subtable row. Then the AMP does a binary search on its Subtable to deliver the real Row-ID of the base table row. The PE doesn't need to hash this. It knows the first part of the Row-ID was created when the Primary Index of the table was hashed to originally place the base row. It takes the first part of the Row-ID, which is the Row Hash, and looks at the Hash Map's corresponding bucket. Now it sends a message to the proper AMP to get the base row using the Row-ID. That AMP will do a Binary Search to quickly find the row. There are always two binary searches when an USI query is used to retrieve a unique row: one Binary Search in the Subtable look-up and one in the Base Table look-up.
USI Secondary Index Quiz
Only dead fish swim with the stream.

Anonymous
Answer the questions on the next page, but be careful. I have a few tricks that you could fall for Hook, Line, and Sinker!
USI Secondary and Primary Index Quiz Answers
Choice, not chance, determines destiny.

Anonymous
Check your answers and see if you made the right choice.
A Full Table Scan Example
Those who dance are considered insane by those who cannot hear the music.
George Carlin
The next page is designed to show that a Full Table Scan will be performed when a NonIndexed Column is used by itself in the WHERE clause. We will soon create a NonUnique Secondary Index on this column, but first perform the Full Table Scan. Please make a note of it!
The Base Table
"He who controls the past commands the future. He who commands the future conquers the past."
George Orwell
The next page is designed to merely remind you that we have two types of tables. Those are the Base Tables that hold the actual data that the users query against and the Secondary Index Subtables designed to point to the real Row-ID of the base table. Once again I want you to notice the Row-IDs at the front of each row in the Base Table. Remember how those were derived? They were derived when the row was originally loaded. The PE hashed the Primary Index Column (Last_Name) value and that came up with a 32-bit Row Hash. It then counted over the appropriate number of buckets in the Hash Map that corresponded to the Row Hash, and inside the bucket was the Destination AMP for that row. Once the row and the Row Hash went to the AMP, the AMP placed a 32-bit Uniqueness Value behind the Row Hash. The Row Hash plus the Uniqueness Value make up a row's Row-ID.
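Here is a small Python sketch of how a 32-bit Row Hash and a 32-bit Uniqueness Value combine into one 64-bit Row-ID. The hash formula is a stand-in (CRC32), the `Amp` class is hypothetical, and starting the Uniqueness Value at 1 is an assumption made for the illustration.

```python
import zlib


def row_hash(pi_value):
    """Stand-in for the Teradata hash formula (assumption: CRC32, 32 bits)."""
    return zlib.crc32(pi_value.encode())


class Amp:
    """One AMP assigning Uniqueness Values to the rows it receives."""

    def __init__(self):
        self.last_uniqueness = {}  # Row Hash -> last Uniqueness Value handed out
        self.rows = []             # (Row-ID, row), kept sorted by Row-ID

    def insert(self, pi_value, row):
        rh = row_hash(pi_value)
        uniqueness = self.last_uniqueness.get(rh, 0) + 1
        self.last_uniqueness[rh] = uniqueness
        row_id = (rh << 32) | uniqueness  # Row Hash in front, Uniqueness behind
        self.rows.append((row_id, row))
        self.rows.sort(key=lambda entry: entry[0])
        return row_id


amp = Amp()
first = amp.insert("Smith", ("Smith", "Ann"))
second = amp.insert("Smith", ("Smith", "Bob"))
print(hex(first), hex(second))
```

Two rows with the same Last_Name share the same Row Hash, and only the Uniqueness Value at the back tells them apart.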
Creating a Non-Unique Secondary Index (NUSI)
Life is not the candle or the wick, it's the burning.

David Joseph Schwartz
Get heated up and get ready to glow because we have just put the SQL Syntax into Teradata to CREATE a Non-Unique Secondary Index. It really isn't apparent that this is a Non-Unique Secondary Index, but it is. The word NON is never used in Teradata syntax, but the word UNIQUE is often used. Once the SQL to CREATE the Secondary Index is successfully completed by Teradata, a Subtable is created on every AMP. Pay close attention to the next couple of pages because a NUSI is handled much differently than a USI, as you will soon understand.
Columns inside a NUSI Secondary Index Subtable
Darkness cannot drive out darkness; only light can do that. Hate cannot drive out hate; only love can do that.
Martin Luther King, Jr.
Inside the NUSI Subtable reside two columns. They are the column value of the NUSI and the real Row-ID of the row in the base table. These are exactly the same two columns that were in the USI Subtable. Remember that the entire purpose of the NUSI or the USI Subtable is to point to the real row in the Base Table. This pointing is done by capturing the row's Row-ID. The big difference between the USI and the NUSI Subtable is that the USI Subtable rows are Hashed and the NUSI Subtable rows are AMP-Local. Read on my friend!
NUSI Subtable is AMP-Local
We never know the worth of water 'til the well is dry.

English Proverb
The NUSI Subtable is always AMP-Local. What does that term mean? Let me first again explain how USIs are Hashed and then how NUSIs are AMP-Local. When a USI Subtable is created, each USI value for each row in the Base Table is Hashed and sent to the AMP the Hash Formula and Hash Map dictate. Most often the Base Row and the Subtable Row end up on different AMPs. The great news is that the Parsing Engine's plan for a USI query is always a Two-AMP operation. This can be done because a Unique Secondary Index is UNIQUE, which obviously means there can only be one row returned. A NUSI is a Non-Unique Secondary Index, obviously again meaning that the value is Non-Unique and there could be thousands, millions or even billions of duplicates. So the Parsing Engine takes on a different strategy when building the NUSI Subtable. Each row in the Subtable only tracks the Base rows on the same AMP. This is what is meant by AMP-Local. On the following page you can see that the AMP labeled "A Typical AMP" holds two base rows of the Employee_Table. The First_Name column, which is the column we created the NUSI on, holds two values on this AMP, which are Rakish and Vu. So in this typical AMP's Subtable there will be two rows tracking Rakish and Vu. A NUSI Subtable is always created on each AMP, but each AMP's Subtable holds only values local to the base rows for that AMP. Now you know what the term AMP-Local means.
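To picture AMP-Local, here is a minimal Python sketch in which every AMP builds its NUSI Subtable from its own base rows only. The Row-IDs, the three-AMP layout, and the function name `build_nusi_subtable` are invented for the illustration; AMP 0 plays the part of the typical AMP holding Rakish and Vu.

```python
from collections import defaultdict

# Base rows per AMP: AMP number -> list of (Row-ID, First_Name).
# Row-IDs are shown as (Row Hash, Uniqueness Value) with made-up numbers.
base = {
    0: [((101, 1), "Rakish"), ((205, 1), "Vu")],
    1: [((330, 1), "Rakish"), ((412, 1), "Maria"), ((412, 2), "Rakish")],
    2: [((578, 1), "Vu")],
}


def build_nusi_subtable(amp_rows):
    """Build one AMP's Subtable: NUSI value -> Row-IDs of base rows on this AMP."""
    subtable = defaultdict(list)
    for row_id, first_name in amp_rows:
        subtable[first_name].append(row_id)
    # Keep the Subtable sorted by NUSI value so it can be searched quickly.
    return sorted(subtable.items())


# No rows move between AMPs: each Subtable is built from local rows only.
nusi = {amp: build_nusi_subtable(rows) for amp, rows in base.items()}

for amp, subtable in nusi.items():
    print(amp, subtable)
```

Compare this with a USI, where the Subtable row is hashed and usually ends up on a different AMP than its base row.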
A Query using the NUSI Column
Everyone is kneaded out of the same dough but not baked in the same oven.
Yiddish Proverb
On the following page you will see we are running SQL and using the First_Name column in our WHERE clause. This will usually cause the Parsing Engine to use the NUSI, but not always. Sometimes the Parsing Engine will decide it is faster to do a Full Table Scan. This is dependent on three things: 1) Whether the NUSI is weakly or strongly selective. An example of something that is weakly selective might be this. Imagine you are in a large room with hundreds of people and you ask, "How many people here usually eat dinner every evening?" The answer would be everyone! Here is an example of a strongly selective index. You now ask the same large room of people, "How many of you were born in Russia, are a twin, and only speak French?" If the NUSI is strongly selective it will be used by the Parsing Engine. 2) Whether the table is small. If it is, it is sometimes faster to just do a Full Table Scan. 3) Whether the DBA collected statistics. We will talk about Statistics in the future, but the short answer is that the DBA will usually collect statistics on all Non-Unique index columns for a table so the Parsing Engine knows if the index is strongly or weakly selective.
A Query using the NUSI Column
If all my possessions were taken from me but one, I would choose to keep the power of speech, for with it I could soon regain all the rest.
Daniel Webster
A NUSI query is always an All-AMP operation, but not a Full Table Scan. In our query example on the previous page we selected all columns WHERE the First_Name was equal to Rakish. There could potentially be one Rakish or multiple people named Rakish on every AMP, one AMP, no AMPs or some AMPs. That is why each NUSI Subtable is AMP-Local. Now the PE can use this two-step strategy. 1) Each AMP searches its own AMP-Local NUSI Subtable and checks whether it has one or more individuals named Rakish. 2) Only the AMPs that found Rakish in their Subtable will retrieve the Rakish rows they own in their Employee_Table Base Table. That is why a NUSI query always involves All-AMPs, but it is not a Full Table Scan. Think about this! Imagine we had a system with 100 AMPs. Now let's say there was only one Rakish found on AMP 99. How many Binary Searches would there be in this query? Well, in step 1 each AMP would perform a Binary Search on its AMP-Local NUSI Subtable, so this would mean 100 Binary Searches, because there are 100 AMPs. Then only AMP 99 would find a Rakish in its Subtable and so only AMP 99 would have to perform a Binary Search in its Base Table, so the final answer is 101.
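The 100 + 1 arithmetic can be checked with a tiny Python function. The function name and the way matches are passed in are made up for this sketch, and it counts one base-table search per AMP that found the value, just as the example above does.

```python
def nusi_binary_searches(num_amps, matches_per_amp):
    """Count Binary Searches for one NUSI query.

    matches_per_amp maps AMP number -> how many Subtable hits that AMP found.
    """
    subtable_searches = num_amps  # step 1: every AMP searches its own Subtable
    base_searches = sum(1 for hits in matches_per_amp.values() if hits > 0)
    return subtable_searches + base_searches  # step 2 adds only the AMPs with hits


# 100 AMPs, and the only Rakish lives on AMP 99.
print(nusi_binary_searches(100, {99: 1}))  # 101
```

If no AMP held a Rakish at all, the answer would drop to 100, because step 1 still runs on every AMP.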
NUSI Recap
He conquers who endures.

Persius 34 AD - 62 AD
A NUSI query is always an All-AMP operation, but not a Full Table Scan. The following page sums all of this up.
Secondary Index Summary
The mind is not a vessel to be filled but a fire to be kindled.

Plutarch 40 - 120 AD
The next page sums up the USI and the NUSI secondary indexes. Remember that USI rows are hashed and NUSI rows are AMP-Local. Also remember that a USI query is always a two-AMP operation. A NUSI query is an All-AMP operation, but not a full table scan. A USI query is much faster than a NUSI query! The Parsing Engine will use a USI at a moment's notice, but it will not always choose to use a NUSI. Sometimes it will choose a Full Table Scan over a NUSI. The Parsing Engine will never choose a Full Table Scan over a USI.
Test Your Teradata Access Query Knowledge

Fill in the chart on the next page. Good luck.
An Incredible Quiz Opportunity

Your mission, if you decide to accept it, is to answer the multiple choice question on the following page. Should you be killed, captured, or get the answer wrong, you will be disavowed. Be careful here because this is trickier than it looks. Use your knowledge of Teradata and think. The answer will really get you thinking and understanding how to tune Teradata for your applications.
An Incredible Quiz Opportunity

Take a look at the answer. It is D. That is because a NUPI is a one-AMP operation and a USI is only a two-AMP operation. What a great combination. Did you fall for the answer of C? That UPI just invites you because of your tendency to want the data to spread completely evenly. You want good distribution, but not if you have to use Full Table Scans or extra AMPs to satisfy your user base. As long as the distribution is reasonable, a NUPI is perfectly acceptable. In this case D is the right way to go! Are your eyes beginning to open on how Teradata works? I am pushing you hard in the right direction. Stick with me 'cause we are going to the mountain top together.
A Table used for our Partitioning Example
The entire sum of existence is the magic of being needed by just one other person.
Vi Putnam
The next page shows a table that will be used to show how Teradata Partitioning works. We will take this table and show it in a Non-Partitioned fashion and then Partition the table and show how Teradata runs certain queries faster. All I want you to notice right now is the column Order_Date. Notice that we have dates in both January and February.
Range Queries
Cowards die many times before their deaths; the valiant never taste of death but once.
William Shakespeare
The next page shows our Order_Table spread across the AMPs. Notice that I have color coded the January and February dates. Also notice that January and February dates are mixed on every AMP in what is a random order. This is because the Primary Index is Order_Number. So, the January dates are most likely on every AMP and so are the February dates! I also want you to take notice of the query. We are looking for all orders in January. Remember that! The query on the next page is called a Range Query because it uses the keyword BETWEEN. The BETWEEN keyword in Teradata means find everything in the range BETWEEN this date and this other date. The BETWEEN keyword is said to be inclusive. If someone said to me, "Tell me what is BETWEEN the numbers 8 and 10," I would normally say, "The number 9." In Teradata land I would be wrong because BETWEEN is inclusive, so it INCLUDES the starting and ending numbers. What is BETWEEN 8 and 10? The numbers 8, 9 and 10! Partitioned tables work very well on Range Queries using the keyword BETWEEN. Turn the next couple of pages and you will soon see WHY! We will next discuss what a Partitioned Table is all about!
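The inclusive behavior of BETWEEN is easy to mimic in Python. The helper `between` and the sample dates are made up for this sketch; the point is only that both end points are included.

```python
from datetime import date


def between(value, low, high):
    """Inclusive range test, like BETWEEN low AND high."""
    return low <= value <= high


print([n for n in range(1, 13) if between(n, 8, 10)])  # [8, 9, 10]

# A January Range Query keeps January 1st and January 31st, but not February 1st.
order_dates = [date(2010, 1, 1), date(2010, 1, 31), date(2010, 2, 1)]
january = [d for d in order_dates
           if between(d, date(2010, 1, 1), date(2010, 1, 31))]
print(january)
```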
Why we had to perform a Full Table Scan
Reality is wrong. Dreams are for real.

Tupac Shakur
The next page shows our Order_Table spread across the AMPs. Notice that I have color coded the January and February dates. Also notice that January and February dates are mixed on every AMP in what is a random order. Because the January data is on all AMPs and because the January dates are randomly mixed, we have to do a Full Table Scan. We had no indexes on Order_Date so it is obvious the PE will command the AMPs to do a Full Table Scan, but soon we will Partition the table and prevent the Full Table Scan. This brings me to a great point I want you to remember. We partition tables so we won't have to do a Full Table Scan on our Range Queries!
A Partitioned Table
A good head and a good heart are always a formidable combination.

Nelson Mandela
Notice the example of AMPs on the top of the page. This table is not partitioned. Now notice the example of AMPs on the bottom of the page. This table is partitioned. This is a very important point so I want to drive it home. The only difference between a Partitioned table and a Non-Partitioned table is how each AMP sorts its rows for a table. We have learned that each AMP always sorts its rows by the Row-ID in order to do a Binary Search on Primary Index queries. Well, a Partitioned Table will have the AMPs first sort their rows by the Partition. Notice that the rows on an AMP don't change AMPs because the table is partitioned. Remember it is the Primary Index alone that will determine which AMP gets a row. If the table is partitioned then the AMP will sort its rows by the partition. What is great about this? The January rows are at the top on each AMP and the February rows are at the bottom. We won't have to do a Full Table Scan on our Range Query now! If we are looking for all orders in January then each AMP only has to read from its January Partition and look at the top of its rows!
A Partitioned Table
A man who views the world at 50 the same as he did at 20 has wasted 30 years of his life.
Muhammad Ali
Notice the example on the next page. We are running our Range Query on our Partitioned Table to demonstrate visually how Teradata has all AMPs participate, but each AMP only reads from one partition. The Parsing Engine no longer has to instruct the AMPs to do a Full Table Scan. It instructs the AMPs to each read from their January Partition. Remember what we said earlier? A Partitioned Table is designed to eliminate a Full Table Scan, especially on Range Queries.
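A quick Python sketch shows how many rows each strategy touches. The three AMPs and their row counts are invented for the illustration; each list holds only the partition number of every row on that AMP, already sorted by partition.

```python
from bisect import bisect_left, bisect_right

# Each AMP keeps its rows sorted by partition number (1 = January, 2 = February).
amps = {
    0: [1, 1, 2, 2, 2],
    1: [1, 2, 2],
    2: [1, 1, 1, 2],
}


def full_table_scan(amps):
    """Every AMP reads every row it owns."""
    return sum(len(rows) for rows in amps.values())


def partition_read(amps, partition):
    """Every AMP still participates, but reads only the requested partition."""
    return sum(bisect_right(rows, partition) - bisect_left(rows, partition)
               for rows in amps.values())


print(full_table_scan(amps))    # 12 rows read
print(partition_read(amps, 1))  # 6 rows read for January
```

All AMPs are still involved in both cases. The savings come from each AMP reading fewer of its own rows.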
One Year of Orders Partitioned
Do not remove a fly from your friend's forehead with a hatchet.

- Chinese Proverb
Notice the example on the next page. This is a great visual picture of exactly how a Partitioned Table might look in a real environment. Notice that each AMP holds dates for the entire year, but each AMP sorts the rows in Month order. For Range Queries or even queries that only query a certain month, Teradata can use what they call Partition Elimination and only read certain partitions to satisfy the query. Get it in your mind that Partitioning is only about each AMP sorting their rows!
Fundamentals of Partitioning
Nobody believes the official spokesman but everybody trusts an unidentified source.
- Ron Nessen
Take a look at the statements on the next page. Take your time and take them in! The points I really want you to take notice of are that it is the Primary Index that determines which AMP gets a particular row and that Partitioning doesn't affect distribution. Partitioning only affects how each AMP sorts the rows it gets!
Add the Partition to the Row-ID for the Row Key
I was a vegetarian until I started leaning towards the sunlight.

- Rita Rudner
It took a little while for you to digest the Row-ID. Well now you need to know that if a table is partitioned, the partition number is placed in front of the Row-ID for each row. This combination of the Partition number, Row-Hash, and Uniqueness value are now called the ROW KEY. Instead of sorting by the Row-ID we are merely first sorting by the Partition Number. We are really just sorting by the Row Key! If a table is NOT partitioned the Partition Number is merely set to ZERO! Notice on the next page that when our Typical AMP sorts by the Row Key (in other words, the Partition first and then the Row-ID) the January dates are at the top and the February dates follow the January dates.
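Sorting by the Row Key can be sketched in Python as sorting by a three-part key. The Row Hash numbers, the monthly partition scheme (January = 1, February = 2), and the function name `row_key` are assumptions made for the illustration.

```python
from datetime import date


def row_key(row_id, order_date, partitioned=True):
    """Row Key = Partition Number, then Row Hash, then Uniqueness Value."""
    partition = order_date.month if partitioned else 0  # 0 when not partitioned
    row_hash, uniqueness = row_id
    return (partition, row_hash, uniqueness)


# Rows on one typical AMP: (Row-ID, Order_Date).
amp_rows = [
    ((17, 1), date(2010, 2, 3)),
    ((42, 1), date(2010, 1, 15)),
    ((58, 1), date(2010, 2, 20)),
    ((93, 1), date(2010, 1, 2)),
]

non_partitioned = sorted(amp_rows, key=lambda r: row_key(r[0], r[1], False))
partitioned = sorted(amp_rows, key=lambda r: row_key(r[0], r[1], True))

print([r[1].month for r in non_partitioned])  # months are mixed
print([r[1].month for r in partitioned])      # January first, then February
```

The same four rows stay on the same AMP in both cases. Only their order on that AMP changes.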
You Partition a Table when you CREATE the Table
Whenever you are asked if you can do a job, tell 'em, "Certainly I can!" Then get busy and find out how to do it.
- Franklin D. Roosevelt
The next page shows the syntax to CREATE a partitioned table. Please don't assume you can only partition a table when you CREATE it. You can actually CREATE a normal table first and later ALTER the table, but generally you partition a table when you first create it. I want you to notice the Primary Index statement. Our Primary Index for this example is a NUPI on Order_No, but we are partitioning on the Month of Order_Date.
RANGE_N Partitioning by Week
Examine what is said, not him who speaks.

- Arab Proverb
In the example on the next page we are showing a partition example of RANGE_N. I want you to notice the last line in the CREATE statement. This is where you tell Teradata whether to partition by day, week, or month. In this example we are partitioning by day. Each day from the starting date range to the ending date range will
RANGE_N Partitioning Older and Newer Data
Only the Spoon knows what is stirring in the pot.

Sicilian Proverb
What a great example on the next page to stir your imagination. This table contains older data and more recent data. We are partitioning the older data by month and the newer data by day. Wow! Now we're cooking!
CASE_N Partitioning
A man who views the world at 50 the same as he did at 20 has wasted 30 years of his life.
Muhammad Ali
We are partitioning by CASE_N in the next example. This is just like any CASE statement in programming or SQL. In the example I want you to notice that if an Order_Total for a row is less than $1,000.00 it will go into the first partition. If it falls between $1,000.00 and $4,999.99 it will go into partition 2. If it is between $5,000.00 and $9,999.99 it will fall into partition 3, and so on. I also need you to pay close attention to the UNKNOWN partition and the NO CASE partition. The UNKNOWN partition is for an Order_Total with a NULL value. The NO CASE partition is for rows that did not meet any of the CASE criteria. For example, if an Order_Total is greater than $20,000.00 it wouldn't fall into any of the partitions, so it goes to the NO CASE partition. Important note: It is an excellent idea to have a NO CASE and an UNKNOWN partition. More important note: You do not want to include the UNKNOWN or NO RANGE partitions with dates in a RANGE_N partition. I will explain later in detail, but it is because when you delete a partition from the RANGE_N partitions, its rows will go to the NO RANGE or UNKNOWN partition. This takes a long time and is usually not wanted anyway.
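The CASE_N logic can be mimicked with an ordinary Python function. This is a sketch only: the text says "and so on" after partition 3, so the fourth range here ($10,000.00 through $20,000.00) is an assumption, and the NO CASE and UNKNOWN partitions are returned as labels rather than partition numbers.

```python
def case_n(order_total):
    """Return the partition an Order_Total would fall into."""
    if order_total is None:
        return "UNKNOWN"   # NULL values cannot be compared to any range
    if order_total < 1000:
        return 1
    if order_total < 5000:
        return 2
    if order_total < 10000:
        return 3
    if order_total <= 20000:
        return 4           # assumed fourth range, not spelled out in the text
    return "NO CASE"       # met none of the CASE criteria


for total in (999.99, 1000.00, 9999.99, 25000.00, None):
    print(total, case_n(total))
```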
Multi-Level Partitioning
Two roads diverged in a wood and I took the one less traveled by, and that has made all the difference.
Robert Frost
Teradata introduced Multi-Level partitioning in Teradata V12. You can have up to 15 levels of partitions within partitions. I want you to remember that a Partitioned Table merely tells each AMP how to sort its rows for the table. So think of Multi-Level partitioning as a table with multiple sort keys. The first partition statement is how the data is sorted first. The second partition statement is the second sort key. Think of a simple sorting of an answer set. Let's imagine we sorted an Employee_Table by Department_Number first. Then we sorted by Last_Name within each Department_Number. That is similar to what we are doing on the next page. We first partition by day. Then within each day we are partitioning by our CASE_N statement. Each AMP will have each day sorted first on its disk and then within each day the data will be sorted with the lower Order_Total values first. This is really getting down to a granular form. The entire purpose of partitioning is to eliminate the Full Table Scan. Instead of reading all rows in a table, each AMP merely has to read one or more of its partitions.
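Thinking of Multi-Level partitioning as multiple sort keys translates directly into a Python sort with a compound key. The sample rows, the three Order_Total ranges, and the single-number Row-IDs are made up for this sketch; it models only the resulting row order on one AMP.

```python
from datetime import date


def total_level(order_total):
    """Hypothetical second-level CASE_N: lower Order_Total values come first."""
    if order_total < 1000:
        return 1
    if order_total < 5000:
        return 2
    return 3


# (Order_Date, Order_Total, Row-ID) as they might sit on one AMP.
rows = [
    (date(2010, 1, 2), 7500.00, 11),
    (date(2010, 1, 1), 1200.00, 42),
    (date(2010, 1, 2), 250.00, 77),
    (date(2010, 1, 1), 300.00, 93),
]

# First sort key: the day. Second sort key: the CASE_N level. Then the Row-ID.
ordered = sorted(rows, key=lambda r: (r[0], total_level(r[1]), r[2]))

for row in ordered:
    print(row)
```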
Partitioning Rules
I find that the harder I work, the more luck I seem to have.
Thomas Jefferson
Check out the fundamental rules of partitioning on the next page.
See the data
He who walks in another's tracks leaves no footprints.

- Joan L. Brannon
Check out the fundamental rules of partitioning on the next page.
Test Your Teradata Access Knowledge
The superior man is modest in his speech, but exceeds in his actions.
- Confucius, 551 BC - 479 BC
Test your knowledge on the next page and make me proud!
The most Powerful USER
It's time for the human race to enter the Solar System.
- Dan Quayle
Dan Quayle, Vice President during George Herbert Bush's presidency, never made it to the top of the human race, because he was never president. Dan Quayle never made it to the top of the Teradata hierarchy because his name wasn't DBC. The first Teradata machine ever built was called the DBC 1012. DBC stood for DataBase Computer and the 1012 represented ten to the 12th power, which happens to be a Terabyte. So, in honor of the first Teradata machine, the one USER that exists when a system first arrives is named DBC. DBC has been the most powerful USER from the beginning of Teradata time (1984). Whoever is assigned to be the USER DBC will have all the power. DBC will create other DATABASES and USERS and the hierarchy begins. It is time for the human race to enter the Teradata System!
DBC owns all the Disk Space
Too much of a good thing is just right.

Mae West
When the system arrives DBC owns all the Disk Space. Each AMP will have one virtual disk, really four physical disks, which that AMP can read and write, but no AMP can read from or write to another AMP's virtual disk. Add up all the AMPs' disks and you will know how much space DBC originally owns. This space is called PERMANENT Space, or PERM SPACE. Think of PERM SPACE like you might think of money. If DBC has 1000 GBs and gives another database or user 100 GBs then DBC only has 900 GBs left. Just like money, too much of a good thing is just right. Remember, "A fool and his PERM Space are soon parted."
DBC Example of 1000 GBs
It's kind of fun to do the impossible.

- Walt Disney
DBC owns all the PERM Space when the system first arrives. DBC is a user who has a logon and password. DBC will calculate how much PERM space is in the system when each AMP reports the size of its virtual disk. DBC will begin creating another USER or DATABASE and the Parent/Child hierarchy is started. Remember that PERM Space is like money. You have 1000 GBs to start, but if you give it away you lose it. DBC will never give all the space away, but it will give away about 80%.
DBC will first CREATE a USER or a DATABASE
If you shoot at mimes, should you use a silencer?

- Steven Wright
In our example on the next page DBC has created a USER named Mary. DBC also created two other DATABASES called Sales and MRKT. DBC originally had 1000 GBs, but in creating Mary, Sales, and MRKT, the user DBC gave each of them 100 GBs of PERM Space. Now DBC only has 700 GBs of PERM Space.
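The "PERM Space is like money" idea can be sketched in Python. The class `SpaceOwner` and its `create` method are invented for the illustration and are not Teradata commands; they only model that whatever a parent gives to a child is subtracted from the parent.

```python
class SpaceOwner:
    """A USER or DATABASE that owns some amount of PERM Space (in GBs)."""

    def __init__(self, name, perm_gb):
        self.name = name
        self.perm_gb = perm_gb

    def create(self, name, perm_gb):
        """Create a child and hand over PERM Space from this owner's own total."""
        if perm_gb > self.perm_gb:
            raise ValueError(f"{self.name} does not own {perm_gb} GBs to give away")
        self.perm_gb -= perm_gb
        return SpaceOwner(name, perm_gb)


dbc = SpaceOwner("DBC", 1000)
mary = dbc.create("Mary", 100)
sales = dbc.create("Sales", 100)
mrkt = dbc.create("MRKT", 100)

print(dbc.perm_gb)  # 700
print(dbc.perm_gb + mary.perm_gb + sales.perm_gb + mrkt.perm_gb)  # still 1000
```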
Teradata is Hierarchical
When they discover the center of the universe, a lot of people will be disappointed to discover they are not in it.
- Bernard Bailey
In this Teradata Universe you can see that DBC is at the top of the Hierarchy. It will always stay that way. Under DBC you should be able to see that Mary, Sales, and MRKT were CREATED by DBC. Three USERS were added to MRKT named Sam, Don, and BO. Sam then went and CREATED three users named VU, Jane and Jusn. Anyone above you in the hierarchy is a parent. For instance, the parents of VU are Sam, MRKT, and DBC. The immediate Parent of VU is Sam.
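The parent chain in this hierarchy can be walked with a few lines of Python. The dictionary below simply records who created whom in the example, and the function name `parents` is made up for the sketch.

```python
# Child -> immediate Parent, taken from the hierarchy described above.
IMMEDIATE_PARENT = {
    "Mary": "DBC", "Sales": "DBC", "MRKT": "DBC",
    "Sam": "MRKT", "Don": "MRKT", "BO": "MRKT",
    "VU": "Sam", "Jane": "Sam", "Jusn": "Sam",
}


def parents(name):
    """Return every parent of name, from the immediate Parent up to DBC."""
    chain = []
    while name in IMMEDIATE_PARENT:
        name = IMMEDIATE_PARENT[name]
        chain.append(name)
    return chain


print(parents("VU"))   # ['Sam', 'MRKT', 'DBC']
print(parents("DBC"))  # [] because nobody is above DBC
```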
Only two Objects can Receive PERM Space
A lot of people approach risk as if it's the enemy when it's really fortune's accomplice.
- Sting
Only a DATABASE or a USER can have PERM Space. When a user or database is created it will be given its PERM Space. Other objects such as tables can be created under a database or user. If a user has 100 GBs of space then they can create tables with data that combined take up a maximum of 100 GBs of space. Once a database or user has used up its PERM space, it cannot add any more data to the tables it owns.
Only difference between a User and a Database
Everyone is trying to accomplish something big, not realizing that life is made up of little things.
-Frank A. Clark
A USER and a DATABASE are considered the same in Teradata except a USER has a LOGON and PASSWORD so they can actually Logon to Teradata and run queries. Other than that they are considered exactly the same. Both can be created with PERM and SPOOL Space. Both can have objects created beneath them.
A Typical approach to Security
Sometimes it is more important to discover what one cannot do than what one can do.
-- Lin Yutang
A USER doesn't even have to know that they don't have access directly to the tables. The following page shows a typical approach to Teradata security. A database will often be set up for USERs. Then another database or user will be set up for Views and Macros. Then another USER or DATABASE will be set up to hold the actual tables. The USER Database is given access to the VIEW and MACRO Database and the VIEW and MACRO Database is given access to the Tables.
Example of a DATABASE and USER Interchanged
Everyone I meet is in some way my superior.

-- William Shakespeare
The following page merely shows our previous typical security example, but we have replaced the DATABASEs with USERs. It doesn't matter! A DATABASE and a USER are the same thing except a USER has a logon and password so they can run queries. A DATABASE is sometimes referred to as a Passive Repository and a USER is referred to as an Active Repository because of the action of logging on and running queries.
PERM and SPOOL Space
Opportunity may only knock once, but temptation leans on the doorbell.
-- Unknown
There are two types of space in Teradata. They are called PERM Space and SPOOL Space. Perm Space is for Permanent Tables and Spool Space is used to temporarily build Answer Sets when users run queries. In actuality Spool Space is unused PERM Space. Most users don't get their own PERM space. All users get Spool Space. Without Spool Space the users couldn't run queries. Although I have listed different things associated with Perm and Spool space on the next page, I want you to simply remember that Perm is for your Tables and Data and that Spool is used as space for Users to run queries. Tables, Join Indexes, Permanent Journals, Hash Indexes, Stored Procedures and User Defined Functions (UDFs) require Perm Space. Views, Macros and Triggers don't require Perm space.
Each AMP will have PERM and SPOOL
We can't win at home. We can't win on the road. I just can't figure out where else to play.
-- Coach Pat Williams
Each AMP will have Perm Space to hold tables and have empty space for Spool. The following picture is designed so you can see exactly what a typical AMP will have on its virtual disk. The AMP will go to PERM space and read or write to the tables and then build the answer sets using the Spool Area.
A Query using both PERM and SPOOL Space
One thing I like about stones in my path is when I cross them they become milestones.
-- Anonymous
The example on the following page shows a user running a query. The query is selecting all columns and rows from the Employee_Table. Each AMP will read the Employee_Table located in their PERM Space. They will then begin building their portion of the Answer Set by placing these rows in the empty area of disk called SPOOL. When each AMP is finished they will inform the Parsing Engine they are done. Each AMP will pass their Spool Answer Set over the BYNET to the Parsing Engine. The Parsing Engine will take the Answer Set and deliver it to the user. Once the Answer Set is delivered to the PE and the User the Answer Set in Spool will be deleted. Spool is only temporarily used for each query and then deleted when the query is over!
Spool is Deleted when the Query is Done
Behold the turtle. He only makes progress when he sticks his neck out.
-- James Bryant Conant
The example on the following page is meant to show that when a query finishes the Spool Answer Set is automatically deleted. What really happens is that spool is deleted as soon as the query no longer needs that portion of spool.
Getting a better understanding of Spool
Genius is nothing but a great aptitude for patience.

-- George-Louis De Buffon
In the example on the following page you can see we have given MRKT 20 GBs of Spool. Then we ask the question: can three users in MRKT simultaneously run a query that is 15 GBs in size? The answer is YES! PERM should be looked at like money. If you give money away you no longer have that money. Spool should be looked at like a speed limit. MRKT has a speed limit of 20 GBs. No user in MRKT can run a query that uses more than 20 GBs of spool, but every person in MRKT can run queries simultaneously. Thousands of users in MRKT could simultaneously run queries. The total sum of this Spool Space could be enormous, but MRKT isn't tied to everyone only using a sum total of 20 GBs. Nobody in MRKT can go over their speed limit of 20 GBs!
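The speed-limit idea can be sketched in Python by checking each query against the limit on its own, never against the sum. The user names come from the MRKT example, and the function name `can_run` is made up for the sketch.

```python
SPOOL_LIMIT_GB = 20  # MRKT's speed limit


def can_run(query_spool_gb, limit_gb=SPOOL_LIMIT_GB):
    """Each query is judged alone against the limit, like a car and a speed limit."""
    return query_spool_gb <= limit_gb


# Three users in MRKT each run a 15 GB query at the same time.
concurrent = {"Sam": 15, "Don": 15, "BO": 15}
results = {user: can_run(gb) for user, gb in concurrent.items()}

print(results)                   # every query is allowed
print(sum(concurrent.values()))  # 45 GBs in total, more than 20, and still fine
print(can_run(20.000001))        # False: over the limit gets aborted
```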
Answering the MRKT Spool Question
An eye for an eye only ends up making the whole world blind.
-- Gandhi
In the example on the following page you can see we have given MRKT 20 GBs of Spool. Then we ask the question: can three users in MRKT simultaneously run a query that is 15 GBs in size? The answer is YES!
Spool is like a Speed Limit
The difference between genius and stupidity is that genius has its limits.
Albert Einstein
Teradata definitely has its limits and these pertain to Spool space. Think of PERM Space like money, but think of Spool Space like a speed limit. If the database MRKT is assigned 20 GBs of Spool then that is MRKT's speed limit. Each user can run queries that travel up to 20 GBs. This goes for all users in MRKT. Imagine you are on the highway and the speed limit is 60 MPH. If you were driving beside another car also going 60 MPH and they pulled off the road, you wouldn't be able to now go 120 MPH. The speed limit is 60 MPH and that is everyone's limit. The Teradata police will abort your query if at any time you go 1 byte over 20 GBs. Here is why you should think of PERM as money. If the system starts with 1000 GBs of Perm (which is actually equal to 1 Terabyte) then the system will always have 1000 GBs unless an upgrade occurs and more hardware is added. So, there is a limited amount of space that always adds up to 1,000 GBs. DBC starts with the entire 1000 GBs, but if DBC gives away 500 GBs then DBC will only own 500 GBs. It is like having 1,000 dollars in a poker game. It may be split up and won or lost among the players, but there is always $1,000 at the table until the game is over. Spool doesn't equate to the 1000 GBs. The DBA could assign every user and database in the system spool, and if you added it all up it could equate to millions of GBs. This is because we are assuming that not everyone will be logged on at the same time. Spool is designed for two purposes. 1) Users have a limit so they can't hog the system resources. 2) If users make a mistake and run a runaway query, the system will abort it after it reaches that user's spool limit.
All Space is calculated on a Per AMP Basis
My son has taken up meditation; at least it's better than sitting doing nothing.
Max Kauffmann
Teradata calculates Perm and Spool space on a per AMP basis. If the system has 10 AMPs and a user or database is assigned 20 GBs of spool then there are actually two limitations: 1) The user cannot run a query that goes over 20 GBs. 2) The user cannot run a query that goes over 2 GBs on any single AMP (20 GBs / 10 AMPs = 2 GBs per AMP). This design assumes that data is spread fairly evenly over the AMPs, which is based solely on the Primary Index choice. It also punishes a hot AMP: if the data is skewed badly, one AMP will hit its 2 GB share and the query will blow the Spool limit long before it reaches 20 GBs in total.
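The two limitations above can be sketched in Python. This is a minimal illustration under the assumptions in the text (10 AMPs, 20 GBs of spool); the function names and the per-AMP usage figures are invented for the example.

```python
# Hypothetical sketch of the per-AMP spool check described above.

def per_amp_limit_gb(total_limit_gb, num_amps):
    """Each AMP gets an equal share of the assigned space."""
    return total_limit_gb / num_amps


def exceeds_spool(spool_per_amp_gb, total_limit_gb):
    """spool_per_amp_gb lists the spool one query is using on each AMP."""
    amp_limit = per_amp_limit_gb(total_limit_gb, len(spool_per_amp_gb))
    over_total = sum(spool_per_amp_gb) > total_limit_gb
    over_one_amp = any(used > amp_limit for used in spool_per_amp_gb)
    # The per-AMP test is the stricter one: if no AMP is over its share,
    # the total cannot be over the limit either.
    return over_total or over_one_amp


even_query = [1.5] * 10             # 15 GBs spread evenly over 10 AMPs
skewed_query = [3.0] + [0.5] * 9    # only 7.5 GBs, but one hot AMP

print(per_amp_limit_gb(20, 10))         # 2.0
print(exceeds_spool(even_query, 20))    # False
print(exceeds_spool(skewed_query, 20))  # True
```

The skewed query uses half the spool of the even one, yet it is the one that fails, because a single AMP went past its 2 GB share.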
Examples of Perm and Spool on a Per AMP Basis
It is the mark of an educated mind to be able to entertain a thought without accepting it.
Aristotle
The following page shows an example of both Perm and Spool being calculated on a Per AMP basis. Notice that we have 10 AMPs in our system. We have 20 GBs of Spool and 100 GBs of Perm. This means that this user or database cannot run a query that goes over 20 GBs or one that goes over 2 GBs per AMP. Also notice that in our 10 AMP system the user or database was assigned 100 GBs of Perm. This means that the user or database cannot contain tables with data that exceeds 100 GBs or that goes over 10 GBs on any AMP. Again the philosophy of this is to ensure reasonable data distribution, which is based solely on the Primary Index choice. In a worst case scenario you choose a column for the Primary Index that has only one value. Let's say, for example, the column is State Code and the only value is California. Then all of the data for that table would be on only 1 AMP. This could cause a premature Full Perm Space message or an abort of a query because it exceeded its per AMP limit.
Special Note: Sometimes when systems are upgraded to a larger number of AMPs the DBA will assign each user or database more space because they don't want the Per AMP limit to cause problems. If a 10 AMP system with 20 GBs of space equals a Per AMP limit of 2 GBs per AMP, then an upgrade to a 100 AMP system would mean that the Per AMP Limit would be 0.2 GBs. That might be considered too low to run some queries if there is any skewing at all, so the DBA will often up the spool limits for everyone.
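The arithmetic in the Special Note can be worked through in Python. Treat this as a sketch: the helper names are made up, and keeping the old 2 GB per-AMP share after the upgrade is just one choice a DBA might make, not a rule from Teradata.

```python
# Hypothetical helpers for the per-AMP arithmetic in the Special Note.

def per_amp_limit_gb(total_gb, num_amps):
    """Space assigned to a user or database, divided evenly across AMPs."""
    return total_gb / num_amps


def space_needed_gb(target_per_amp_gb, num_amps):
    """Total space to assign so each AMP gets the target share."""
    return target_per_amp_gb * num_amps


spool_before = per_amp_limit_gb(20, 10)    # 10 AMPs:  2.0 GBs per AMP
spool_after = per_amp_limit_gb(20, 100)    # 100 AMPs: 0.2 GBs per AMP
perm_before = per_amp_limit_gb(100, 10)    # 10 AMPs:  10.0 GBs of Perm per AMP

# Assumption: the DBA wants to keep the old 2 GB per-AMP spool share.
spool_to_assign = space_needed_gb(spool_before, 100)

print(spool_before, spool_after, perm_before, spool_to_assign)
```

Under that assumption the same user would need 200 GBs of spool on the 100 AMP system to have the per-AMP headroom that 20 GBs gave on 10 AMPs.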
Quiz on Perm and Spool Space
I went to a restaurant that serves breakfast any time. So I ordered French toast during the Renaissance.
Steven Wright
The following page gives you a chance to show how smart you are. Take the quiz and decide how much PERM and SPOOL is in MRKT after the three users Sam, Don, and Bo are created.
Answers to Quiz on Perm and Spool Space
All human actions have one or more of these seven causes: chance, nature, compulsion, habit, reason, passion and desire.
Aristotle
On the next page are the answers to the quiz.
Collecting Statistics
If you are not true to your teeth they will be false to you.
Teradata Certified Dentist
I asked my dentist, "Do I have to floss all my teeth?" He said, "No, just the ones you want to keep." Whether the Parsing Engine (PE) is checking a user's security rights or checking if statistics were collected on a table, the PE will go to user DBC for the answers. The PE uses statistics to help decide what plan to build so the AMPs can satisfy a user's query. Before the PE can come up with a plan it wants to know if a table is large, medium, or small. It wants to know about certain columns or indexes. Does a particular column have a lot of duplicates or nulls, or are the values unique? Is a particular index unique or non-unique, and is it strongly or weakly selective? These questions are often answered by Collect Statistics.
What is Collect Statistics? When a table is created and loaded with data the DBA will run a COLLECT STATISTICS command on certain columns and indexes of that table. That will help the PE answer key questions that will give the PE a better understanding of the table in general. If more data is loaded or deleted the DBA will then Recollect Statistics to ensure that the statistics reflect the true data inside the table. It is not mandatory to collect statistics on a table, just as it is not mandatory that a person brushes their teeth or cleans their clothes. If statistics are not collected on a table then the PE will perform a Random Sample and make an educated guess.
I asked my DBA, "Do I have to Collect Statistics on all the columns and indexes?" The answer was, "No. Only on the important ones, but never the entire table." I hear that is good advice, but I became concerned when I noticed he was missing teeth!
Parsing Engine uses Statistics for the Plan
You cannot depend on your eyes when your imagination is out of focus.
Mark Twain
The following page lists some of the key answers that Collect Statistics offers.
Columns and Indexes to Collect Statistics On
This is a test. It is only a test. Had it been an actual job, you would have received raises, promotions, and other signs of appreciation.
Anonymous
You definitely don't collect statistics on every column and index in a table. These statistics are stored inside DBC and they take up Perm Space. You only want to collect on certain columns and indexes such as:
All Non-Unique Indexes
Columns frequently used in user queries in the WHERE Clause
All Primary Indexes of small tables
Columns used as Join Conditions
Syntax to Collect Statistics
Ninety percent of the game is half mental.

Yogi Berra
The following page shows you the syntax for the Collect Statistics command. It also provides you with some great advice.
Recollecting Statistics
The less their ability, the more their conceit.

Ahad Haam
Whenever the data in a table changes by 10% it is time to recollect statistics. If you have a billion row table and have collected statistics and someone adds one row it is NOT time to recollect statistics. If rows are deleted or added and about 10% or more of the rows have been changed then recollect. It is better to never collect statistics than to let them become stale. Before the PE creates a plan it checks to see if statistics were collected. If they were NOT collected the PE will perform a random AMP sample of the data and make an educated guess. This is not as good as collected statistics, but it is better than statistics that lie! Make sure once you have collected statistics on a table to recollect when the table data changes by 10% or more.
I want you to notice that when we recollect statistics on the following page we merely write the SQL to say, "Collect Statistics on Employee_Table". This doesn't mean we collect statistics on every column and index. It means collect statistics on the same columns and indexes you have collected on in the past. In other words, refresh the statistics you already have.
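The 10% rule of thumb is easy to express as a check. The sketch below is only an illustration in Python; the function name and its arguments are made up, and how you count changed rows on a real system is up to you.

```python
# Hypothetical helper for the 10% recollect rule of thumb described above.

def should_recollect(rows_when_collected, rows_changed, threshold_percent=10):
    """True when added plus deleted rows reach the threshold percentage."""
    if rows_when_collected <= 0:
        raise ValueError("rows_when_collected must be positive")
    # Integer arithmetic avoids floating point surprises right at 10%.
    return rows_changed * 100 >= rows_when_collected * threshold_percent


one_row_added = should_recollect(1_000_000_000, 1)
ten_percent_changed = should_recollect(1_000_000_000, 100_000_000)

print(one_row_added)         # False: one row in a billion is not worth it
print(ten_percent_changed)   # True: 10% of the rows have changed
```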
Random Sample instead of Collected Statistics
Actions lie louder than words.

Carolyn Wells
What is a Random AMP Sample? The PE will poll a single AMP before running a query and ask questions about the data. It will then multiply the statistics it finds on the random AMP by the number of AMPs in the system. For example, if the PE estimates that the random AMP has 1,000 rows and there are 50 AMPs in the system it will assume the table has 50,000 rows! Before Teradata V12 the PE would only perform a random AMP sample if no statistics were collected on a table. In Teradata V12 and beyond Teradata always performs a Random AMP sample.
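Here is the multiplication from the example above as a small Python sketch. The function name is made up, and the second, slightly skewed table is invented only to show why the answer is an educated guess.

```python
# Hypothetical sketch of the Random AMP Sample arithmetic described above.

def random_amp_estimate(rows_on_sampled_amp, num_amps):
    """Scale the row count found on one AMP up to the whole system."""
    return rows_on_sampled_amp * num_amps


# The example from the text: 1,000 rows on the sampled AMP, 50 AMPs.
estimate = random_amp_estimate(1_000, 50)
print(estimate)  # 50000

# Invented 4 AMP table with mild skew: the guess depends on the AMP polled.
rows_per_amp = [900, 1_100, 1_000, 1_000]
guesses = [random_amp_estimate(rows, len(rows_per_amp)) for rows in rows_per_amp]
print(sum(rows_per_amp), guesses)
```

The better the Primary Index spreads the rows, the closer every possible guess is to the true row count.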
V12 Statistics Enhancement - Stale Statistics
Am I not destroying my enemies when I make friends of them?

Abraham Lincoln
Stale statistics will now be hunted down and destroyed! When Teradata came out with Teradata V12 they added a great enhancement. The PE will now perform a quick Random AMP Sample on a single AMP to check if statistics are current or stale. If the statistics are current the PE will use the statistics, but if the statistics appear to be stale the PE will use the random AMP sample and make an educated guess. Prior to Teradata V12 the PE would only perform a random AMP sample if no statistics were collected. Now, it will always perform a random AMP sample and then compare the Random AMP Sample with the real statistics to see if the real statistics are stale and out of date. Of course if no statistics were ever collected on a table the PE will use the Random AMP Sample.
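The decision described above can be sketched in Python. This is a guess at the shape of the logic, not Teradata's actual algorithm: the text does not say how far apart the two numbers must be before statistics count as stale, so the 10% tolerance below is purely an assumption, as are the function and argument names.

```python
# Hypothetical sketch of the V12 stale-statistics check.
# ASSUMPTION: the 10% tolerance is invented for illustration only.

def choose_row_estimate(collected_rows, rows_on_sampled_amp, num_amps,
                        tolerance=0.10):
    """Return (row estimate, where it came from)."""
    sample_estimate = rows_on_sampled_amp * num_amps
    if collected_rows is None:
        return sample_estimate, "random AMP sample (no statistics collected)"
    drift = abs(sample_estimate - collected_rows) / max(collected_rows, 1)
    if drift > tolerance:
        return sample_estimate, "random AMP sample (statistics look stale)"
    return collected_rows, "collected statistics"


current = choose_row_estimate(50_000, 1_000, 50)
stale = choose_row_estimate(50_000, 2_000, 50)
never_collected = choose_row_estimate(None, 1_000, 50)

print(current)
print(stale)
print(never_collected)
```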
Where Statistics are Stored in DBC
We see the brightness of a new page where everything yet can happen.
Rainer Maria Rilke, Book of Hours
You don't want to collect statistics on every column or index inside a table. This takes up space, takes up resources, and just isn't needed. There are important columns and indexes that will really help the PE when coming up with a plan and there are other columns or indexes that will never be considered. Only collect on the important ones.
A Collect Statistics Example
The future belongs to those who believe in the beauty of their dreams.
Eleanor Roosevelt
The following page shows you how expensive the process of collecting statistics can be. In this example we are collecting statistics on the column Last_Name in the Employee_Table. This requires a Full Table Scan! The results are then sorted in alphabetical order on Last_Name from A-Z and then chopped up or divided up into 200 intervals. There is more to it so pay attention to the next couple of pages.
What Statistics are Really Collected
A behaviorist is someone who pulls habits out of rats.

Anonymous
The following page shows you the questions that the PE is trying to answer in each interval. Notice that in each interval the PE wants to know the Maximum Value, the Most Frequent Value, the number of Rows with the Most Frequent Value, the Other Values, and the Other Rows. Let's take a look at the first interval. The Most Frequent Value is Allan. That means that in this interval the name Allan is the most popular. Then look at the Most Freq Value Rows and notice it says 3. That means that there are 3 people with the Last_Name of Allan. Suppose someone wrote the query below:
SELECT * FROM Employee_Table WHERE Last_Name = 'Allan';
The PE would assume that there are 3 people named Allan and would come up with a plan to get the data for the user. I really want you to notice the last two statistics, which are Other Values and Other Rows. Other Values means the number of distinct last names OTHER than Allan. Other Rows means the number of rows that are NOT Allan. If the PE was given the name AFRIM it would divide the Other Rows by the Other Values and make an estimated guess of 1.2 rows. Here would be the formula on this example: (6 / 5 = 1.2). Remember there were 6 Other Rows and 5 Other Values.
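To make the interval statistics concrete, here is a Python sketch that builds them for one interval and then makes the PE's guess. The last names other than Allan are invented so that the interval has 5 Other Values and 6 Other Rows, and the function names are made up for this example. It is not how Teradata stores its statistics.

```python
from collections import Counter

# Hypothetical sketch: summarize one interval of sorted Last_Name values.


def summarize_interval(values):
    """Compute the per-interval numbers described above."""
    counts = Counter(values)
    most_frequent_value, most_frequent_rows = counts.most_common(1)[0]
    return {
        "max_value": max(values),
        "most_frequent_value": most_frequent_value,
        "most_frequent_rows": most_frequent_rows,
        "other_values": len(counts) - 1,
        "other_rows": len(values) - most_frequent_rows,
    }


def estimate_rows(interval, value):
    """Guess how many rows match an equality test on a value in this interval."""
    if value == interval["most_frequent_value"]:
        return interval["most_frequent_rows"]
    if interval["other_values"] == 0:
        return 0
    return interval["other_rows"] / interval["other_values"]


# Invented data: 3 rows for Allan plus 6 rows spread over 5 other names.
first_interval_names = ["Adams", "Adams", "Agnew", "Aiken", "Akers",
                        "Alden", "Allan", "Allan", "Allan"]

interval = summarize_interval(first_interval_names)
print(interval)
print(estimate_rows(interval, "Allan"))   # 3
print(estimate_rows(interval, "Afrim"))   # 6 / 5
```

The guess for any name other than the Most Frequent Value is the same average, whether or not that name actually exists in the table.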
Loner Values and High Bias Intervals
The most exciting phrase to hear in science, the one that heralds the most discoveries, is not "Eureka!" but "That's funny..."
Isaac Asimov
One of the problems the PE had with Collect Statistics in the past was values with a huge number of rows. These often spanned multiple intervals. Teradata came up with a solution. If a value has a very large number of rows, Teradata will make it a Loner Value and store it in a High Bias Interval. Now Teradata will know if there are a million people named Davis because Davis won't span multiple intervals, but will instead receive its own interval. Teradata can actually place up to two Loner Values inside one High Bias Interval.
Teradata Limits
Asking an incumbent member of Congress to vote for term limits is a bit like asking a chicken to vote for Colonel Sanders.
Bob Inglis, 1995
The following page shows some of the limits of Teradata V12 and V13.
Data Protection
"Age does not protect you from love. But love, to some extent, protects you from age."
-Jeanne Moreau, French Actress
As a man was driving down the interstate highway, his cell phone rang. When he answered he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26!" George replied, "I'm on I-26 right now and it's not just one car. It's hundreds of them!" How do you protect your data when things go the wrong way? Murphy's Law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist.
A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go.
System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or, what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata will have many ways to protect your data. Some processes for protection are automatic and some of them are optional. The protection features we will discuss are:
Transaction Concept
Transient Journal
RAID 1 Mirroring
Cliques
Standby Nodes
Fallback
Fallback Clusters
Archive
Permanent Journaling
Transaction Concept
The afternoon knows what the morning never suspected.

- Swedish Proverb
At any time something could go wrong with a transaction. An old proverb suggests, "The afternoon often knows what the morning never suspected." Likewise, the Transient Journal knows what the transaction never suspected. What good would it do if you could gather, store and analyze terabytes of data, but doubted the integrity of the data? Teradata makes every effort to ensure a database doesn't get corrupted. Fundamental to this assurance is the Transaction Concept, which means that an SQL statement is viewed as a transaction. Simply stated, either it works or it fails.
Two Modes to Teradata
A life filled with love may have some thorns, but a life empty of love will have no roses.
- Anonymous
Teradata has two different modes in which it operates. Those modes are called Teradata Mode and ANSI Mode. Both modes handle things a little differently. Every Teradata system will have a default mode set by the DBA when the system first arrives. Although there is a default mode set, users can actually choose the mode they want for their own sessions. Depending on which mode you are using a transaction takes on a whole new meaning.
Differences between ANSI and Teradata Mode
You can tell whether a man is clever by his answers. You can tell whether a man is wise by his questions.
- Naquib Mahfouz
As you can see on the following page there are many differences between Teradata Mode and ANSI Mode. I want you to focus on two main areas. The first is that in Teradata mode you don't need to use the word COMMIT, but in ANSI mode you do. The second area of focus is how statements are rolled back. In Teradata mode if a statement in a transaction fails, EVERY Statement in that transaction is Rolled Back, but in ANSI mode if a statement in a transaction fails, ONLY the FAILED Statement is Rolled Back.
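The rollback difference between the two modes can be modeled in a few lines of Python. This is a toy simulation, not Teradata itself: a row is a dictionary, a statement is a function that changes it, and the function names and the mgr_no column are invented. In the ANSI branch the sketch simply carries on with the remaining statements, assuming the user goes on to COMMIT.

```python
# Toy model of how each mode treats a failed statement inside a transaction.
# All names here are hypothetical; this is not how Teradata is implemented.

def run_transaction(row, statements, mode):
    """Apply statements to row; return the names of statements that failed."""
    if mode not in ("TERADATA", "ANSI"):
        raise ValueError("mode must be 'TERADATA' or 'ANSI'")
    image_at_start = dict(row)
    failed = []
    for statement in statements:
        image_before_statement = dict(row)
        try:
            statement(row)
        except Exception:
            failed.append(statement.__name__)
            if mode == "TERADATA":
                # Teradata mode: EVERY statement in the transaction is undone.
                row.clear()
                row.update(image_at_start)
                return failed
            # ANSI mode: ONLY the failed statement is undone.
            row.clear()
            row.update(image_before_statement)
    return failed


def set_budget(row):
    row["budget"] = 500000


def bad_update(row):
    row["budget"] = -1              # partial change before the failure
    raise RuntimeError("statement failed")


def set_manager(row):
    row["mgr_no"] = 2


statements = [set_budget, bad_update, set_manager]
teradata_row = {"dept_no": 100, "budget": 100000, "mgr_no": 1}
ansi_row = dict(teradata_row)

run_transaction(teradata_row, statements, "TERADATA")
run_transaction(ansi_row, statements, "ANSI")

print(teradata_row)  # back to the way it started
print(ansi_row)      # the two good statements survive
```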
ANSI Mode Commit
You got to be careful if you don't know where you're going, because you might not get there.
- Yogi Berra
With ANSI Mode you must use the words COMMIT WORK, or you can just say COMMIT, but this is mandatory anytime you are changing something in the Teradata database. This includes anytime you use the CREATE statement or any INSERT, UPDATE, or DELETE. You don't need it for queries with SELECT. On the following page you can see a single statement transaction at the top of the slide and a multi-statement transaction at the bottom of the slide. If the single statement transaction failed for any reason then Teradata would Roll Back this UPDATE statement and ensure the database was exactly like it was before the transaction. If a statement in the multi-statement transaction were to fail only the FAILED Statement would be Rolled Back!
Teradata Mode Commit also called BTET
The only thing worse than being talked about is not being talked about.
- Oscar Wilde
With Teradata Mode you never use the words COMMIT WORK or COMMIT. This is implied with each statement. I am sure you are asking, "Then how does Teradata Mode run a Multi-Statement Transaction?" It uses a BT or BEGIN TRANSACTION Statement, then runs the statements, and then follows them with an ET or END TRANSACTION Statement. On the following page you can see a single statement transaction at the top of the slide and a multi-statement transaction at the bottom of the slide. If the single statement transaction failed for any reason then Teradata would Roll Back this UPDATE statement and ensure the database was exactly like it was before the transaction. If a statement in the multi-statement transaction were to fail ALL Statements within the transaction would be Rolled Back!
Trick to CREATE a Multi-Statement with BTEQ
A government that robs Peter to pay Paul can always depend upon the support of Paul.
- George Bernard Shaw
The next page shows an old trick of creating a Multi-Statement request in the Teradata Utility called BTEQ (pronounced "Bee Teek"). BTEQ requires a semi-colon at the end of every SQL Statement. If you put the semi-colon at the front of the next line and then place another SQL Statement immediately following it, then these statements are considered part of the same transaction.
Transient Journal
The Transient Journal knows what the Transaction never suspected.

Swedish Proverb after a rollback
The Transient Journal's job is to ensure that if an insert, update, or delete fails, then the rows affected can be reverted back to their original state. This is called a Rollback. In Teradata, all SQL statements are considered transactions. This applies whether you have one statement or multiple statements executing (MACRO). If any of the SQL statements cannot be performed successfully, the following happens:
The user receives immediate feedback in the form of a failure message
The entire transaction is rolled back, and any changes made to the database are reversed
Locks are released
Spool files are discarded
The Transient Journal is automatic and it takes a before picture of any update or delete for rollback purposes.
How the Transient Journal Works
Beware of the young doctor and the old barber.

- Benjamin Franklin
Wouldn't it be great if every time you got a haircut, the barber or stylist took a picture of your hairdo before they cut a single strand? Then after he or she cut your hair, they asked if you liked it? If you didn't like it, then you could ask to have it restored? Well, that is what the Transient Journal does. If a row is going to change because of an UPDATE or DELETE, it takes a BEFORE picture. If the transaction fails, then the journal restores it to the way it was.
The TRANSIENT JOURNAL is an automatic system function. It is not optional. The BEFORE image is actually stored in the AMP's Transient Journal. Every AMP has a transient journal that is maintained in DBC's PERM space. If the transaction is aborted for any reason, the AMP restores the data to match the before-image stored in the Transient Journal. The data will then revert to its original state. When a transaction is successful, the PE and the AMPs shake hands on it and the Transient Journal is wiped clean. The handshake is called the COMMIT. After a COMMIT, all the AMPs have a party to celebrate, and the user is invited to join in the festivities! In other words, Transient Journal Cleanliness is next to Godliness. If it is clean, then things went well!
The Transient Journal provides two system events that occur automatically to ensure data integrity. An automatic rollback of changed rows occurs in the event of a transaction failure. This is possible because before images are retained on each AMP as changes occur. Data is always returned to its original state after a transaction failure.
In the picture on the next page you can see we are updating the budget of Dept_No 100, which is the Sales Department, from 100000 to 500000. Before the transaction can occur the AMP will take a snapshot of the entire row and store it in its Transient Journal. You can see the actual SQL statement doing the UPDATE at the top of the picture. Inside the Disk you can see the Transient Journal with the before picture of the row being updated. You can also see the table inside the disk.
The Transient Journal after a Commit
Do you know, my son, with what little understanding the world is ruled?
- Pope Julius III
The Transient Journal rows are discarded once the transaction is committed. The only reason that each AMP takes a BEFORE picture and stores it in its Transient Journal is in case of a problem in which a ROLLBACK occurs. If there are no problems and the transaction is committed then the BEFORE picture is discarded. If a ROLLBACK did occur then the AMP can replace the attempted UPDATE with the BEFORE picture and everything is back to the way it was before the UPDATE Statement.
VProcs
The longer I live the more beautiful life becomes.

-Frank Lloyd Wright
Teradata utilizes Parsing Engines (PEs) and Access Module Processors (AMPs), which are called VProcs, or virtual processors. Each AMP and PE lives inside the memory of a Node. There are anywhere between 25 and 35 VProcs inside each node. Think of a Node as a giant Personal Computer, one that has 4 Intel Processors that work and act as if there were 8 Intel Processors. This node also has up to 16 GBs of memory. The VProcs get loaded inside the Node's memory, and then we connect this node via the BYNET with all the other nodes, and now it is part of the Teradata warehouse.
Nodes and MPP
The surprising thing about young fools is how many survive to become old fools.
-Doug Larson
Teradata has taken a simple PC, filled the memory with AMPs and PEs and calls it a node. Connect multiple nodes together with the BYNET and you have a Massively Parallel Processing or MPP system.
RAID 1 - Mirroring
You can only perceive real beauty in a person as they get older.
-Anouk Aimee
RAID 1 is mirroring and Teradata always mirrors its disks. As you can see on the following page every AMP is attached to four physical disks. Two hold actual data and two are for backup. This provides excellent protection. Each AMP is said to have four physical disks, but only one Virtual Disk. This really means that no AMP can get into another AMP's disks. Each AMP is the only thing allowed to read that AMP's disks. So, each AMP is said to have its own virtual disk, which is a set of four physical disks. The great thing about mirroring is that if we lose a disk we already have it mirrored and protected. The DBA just has to remove the failed disk and put in a fresh disk and the mirroring will immediately begin again. Remember the price for Mirroring is double the disk costs. For every disk with data you have another disk mirroring and protecting that data disk.
Cliques
Never advise anyone to go to war or to marry.

-Spanish Proverb
Teradata CLIQUES (pronounced "cleeks") are a method of system protection against the failure of an entire node. Each node holds AMP VPROCs in its memory. Each AMP is attached to one virtual disk (Vdisk) and that AMP is the only Vproc allowed access to its Vdisk. In a Clique, each node also has access to the disks of the other nodes in the clique. If a node fails, its AMP VPROCs can migrate to a node that has this backup access to their virtual disks. The migrating AMP can continue to read and write to its Vdisk while its home node is down. When the home node is fixed and available again the VPROCs return home.
If a Teradata system uses two-node cliques then when one node fails all of its AMP VPROCs migrate to the other node. The system is now about 50% slower. To solve this problem Teradata allows bigger cliques, such as eight nodes. If one node fails, its VPROCs split up and migrate amongst the seven other nodes in the clique without much performance degradation.
In the picture on the following page I want you to notice that we have two nodes. In each Node we have two Parsing Engines (PE) and six AMPs. Each of these nodes directly attaches to its disk farm. Each AMP gets access to four physical disks, which is considered one virtual disk because only this AMP can access its disks. Watch what happens next when one of our nodes fails!
VProcs Migrate when a Node Fails
Many receive advice, few profit by it.

-Publilius Syrus 100 BC
When a node fails Teradata resets and the nodes begin their startup routine, but the failed node's VProcs will now receive instructions to migrate to another node in their clique. A Clique is nothing but extra cables connecting the disk farms of each node together just in case a migration needs to take place. As you can see on the following page a node has failed. The VProcs in that node will now migrate to the memory of a node in their clique. The system is degraded and not up to maximum speed, but at least the system is up and running. This is an example of a 2-node clique. In a 2-node clique if one node fails then all of the VProcs in that node must migrate to the other node. Teradata has been smart to allow for 4-node cliques and even 8-node cliques. When one node fails in an 8-node clique then all the VProcs in the failed node can spread out evenly among the other 7 nodes remaining in the clique. To accomplish this each node is directly attached to its own disk farm, but it is also attached to the other nodes' disk farms within the clique. Now, any AMP within the clique could migrate to any other node in the clique if necessary. Cliques are designed to protect against a NODE Failure.
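Here is a Python sketch of why bigger cliques degrade more gracefully. It only models AMPs moving between nodes; the function name, the node and AMP labels, and the round-robin way of spreading the AMPs are assumptions made for this example, not a description of Teradata's own migration logic. Six AMPs per node matches the two-node picture.

```python
# Hypothetical sketch: move a failed node's AMPs to the rest of its clique.

def migrate_amps(clique, failed_node):
    """clique maps node name -> list of AMPs; returns the layout after failure."""
    survivors = [node for node in clique if node != failed_node]
    if not survivors:
        raise ValueError("no surviving node in the clique")
    layout = {node: list(clique[node]) for node in survivors}
    # ASSUMPTION: spread the migrating AMPs round-robin over the survivors.
    for position, amp in enumerate(clique[failed_node]):
        layout[survivors[position % len(survivors)]].append(amp)
    return layout


def build_clique(num_nodes, amps_per_node=6):
    return {
        f"node_{n}": [f"amp_{n}_{a}" for a in range(amps_per_node)]
        for n in range(1, num_nodes + 1)
    }


two_node = migrate_amps(build_clique(2), "node_1")
eight_node = migrate_amps(build_clique(8), "node_1")

print({node: len(amps) for node, amps in two_node.items()})
print({node: len(amps) for node, amps in eight_node.items()})
```

In the 2-node clique the surviving node ends up running twice as many AMPs, while in the 8-node clique no surviving node picks up more than one extra AMP.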
Cliques - An 8-Node Example
Half the money I spend on advertising is wasted; the trouble is I don't know which half.
-John Wanamaker
In the picture on the following page you see 8 nodes. When we connect each of these nodes to every other node's disk farm we are essentially creating a clique. Now if there is a node failure, Teradata will reset and the AMPs and PEs in the down node will be able to migrate to the memory of another node within the clique. I want you to notice that Node 1 has green AMPs and all the other nodes have purple AMPs. We are about to see what happens when Node 1 crashes. Get ready!
Cliques - An 8-Node Example with Migration
I do not regret one professional enemy I have made. Any actor who doesn't dare to make an enemy should get out of the business.
-Bette Davis
In the picture on the following page you see 8 nodes. When we connect each of these nodes to every other node's disk farm we are essentially creating a clique. Now if there is a node failure, Teradata will reset and the AMPs and PEs in the down node will be able to migrate to the memory of another node within the clique.
Hot Standby Nodes
Conscience is the inner voice which warns us that someone may be looking.
-H. L. Mencken
Teradata actually has hot standby nodes! This is in case of a node failure. Although the AMPs in a down node could migrate to other nodes in the clique, an even better way is to have a hot standby node. This is node hardware that runs nothing until another node goes down. When a node goes down Teradata will reset. When it does the AMPs and PEs in the down node will be instructed to migrate to the hot standby node. Now everything is up and running perfectly. A hot standby node is equivalent to you buying a second car. You would only drive the 2nd car if your other car broke down. Yes it is expensive, but it is great when the first car is down.
Hot Standby Nodes in Action
If you are all wrapped up in yourself, you are overdressed.

-Kate Halverson
Notice in our picture that our first node is down, but that the AMPs and PEs migrated to our Hot Standby Node! Isn't it great when a plan comes together?
FALLBACK Protection
United we stand divided we fall.

-Circular letter, Boston during the American Revolution
FALLBACK is a table protection feature used in case an AMP fails. Fallback is similar to mirroring in that a duplicate copy of a row is created and maintained on another AMP for redundancy purposes. Essentially, anytime you define a table with Fallback you are using twice the space. You can use FALLBACK on all tables, some tables or no tables. You can also create a table with or without FALLBACK and then add or drop the feature at any time.
Divided we stand united we Fallback.

-AMP during the computer revolution
Fallback is similar to mirroring in that it creates and maintains a duplicate copy of each row, but it is designed in a revolutionary manner for performance purposes. With mirroring if one disk goes down another duplicate disk takes over. Fallback, however, will take all the rows that one AMP is responsible for in a fallback protected table and store copies of them on multiple other AMPs. If the AMP fails then multiple AMPs will be responsible for delivering the failed AMP's rows.
We have the right to bear arms.

-2nd amendment of the constitution
Teradata believes its constitution is to protect the data and so a duplicate copy is always maintained on another AMP.
We have no access rights to bare amps.

-2nd amendment of the Teradata constitution
How Fallback Works
It's déjà vu all over again!

-Yogi Berra
Fallback is like déjà vu all over again because when a table is fallback protected the rows are duplicated on other AMPs. Fallback is similar to mirroring, but different. The similarity is that both provide a duplicate copy, but the difference is that Fallback places copies of its rows on multiple AMPs so if a failure occurs Teradata can use the parallelism to help the failed AMP.
On the next page are four AMPs holding a base table. For example's sake, let's assume that the base table is the Employee Table. There are 12 employees with employee numbers ranging from 1 to 12. The data is spread evenly in the table with each AMP responsible for 3 employees. The Employee Table has been created with Fallback, so each row of the base table is duplicated on another AMP in the Fallback Table. Notice three very important features: (1) No base table row is on the same AMP with its Fallback protected duplicate copy. (2) Each AMP spreads its Fallback rows evenly to multiple AMPs. (3) The perm space used for the table is double because of the fallback.
The system can lose any single AMP or Disk in this system. If multiple AMPs or Disks fail in this picture then Teradata won't be able to run queries that ask for all the data.
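The three features above can be checked with a small Python model of the 4 AMP, 12 employee example. Keep in mind this is a sketch: real Teradata places rows by hashing the Primary Index, while the modulo arithmetic and function names below are stand-ins invented to get the same kind of layout.

```python
# Toy layout for the 4 AMP, 12 employee Fallback example (hypothetical logic).

NUM_AMPS = 4


def place_rows(employee_numbers, num_amps=NUM_AMPS):
    """Return (base, fallback): AMP number -> list of employee numbers."""
    base = {amp: [] for amp in range(num_amps)}
    fallback = {amp: [] for amp in range(num_amps)}
    for emp in employee_numbers:
        home = (emp - 1) % num_amps            # stand-in for the row hash
        nth_on_home = len(base[home])          # base rows already on this AMP
        # Rotate through the OTHER AMPs: the offset is never 0, so a copy
        # can never land on the same AMP as its base row.
        offset = 1 + (nth_on_home % (num_amps - 1))
        copy_amp = (home + offset) % num_amps
        base[home].append(emp)
        fallback[copy_amp].append(emp)
    return base, fallback


def rows_available(base, fallback, down_amps):
    """Employee numbers still readable when the given AMPs are down."""
    available = set()
    for amp in base:
        if amp not in down_amps:
            available.update(base[amp])
            available.update(fallback[amp])
    return available


base, fallback = place_rows(range(1, 13))
print(base)
print(fallback)
print(len(rows_available(base, fallback, {0})))      # one AMP down
print(len(rows_available(base, fallback, {0, 1})))   # two AMPs down
```

With one AMP down every employee is still reachable through a Fallback copy; with two AMPs down, the rows whose base copy and Fallback copy both sat on those two AMPs are out of reach.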
Fallback Clusters Exercise
My father taught me to work; he did not teach me to love it.

-Abraham Lincoln
Fallback is always associated with CLUSTERS. Fallback can be specified at the table level. Fallback is worth the price because when an AMP fails users still have access to the data even while the AMP is offline. Any data that changed while the AMP was offline is automatically restored when the AMP comes back online.
If we can lose any one AMP/disk, what happens if we lose two? The chance of losing two AMPs in a four-AMP system is small; however, some systems have nearly 2,000 AMPs. The chance of losing two AMPs in a 2,000-AMP system is much greater than in a four-AMP system. That's why Teradata designed Clustering. With Clustering, Teradata can lose one AMP/disk per cluster.
Let's look at this next example with 8 AMPs in two clusters. Notice that the data in the base table lays out evenly, with 24 records on 8 AMPs. What is key to notice is that the fallback copy remains within the cluster. In other words, the base table rows in cluster one are fallback protected within cluster one. The base table rows in cluster two are fallback protected within cluster two. We can lose one AMP/disk in both cluster one and cluster two and the system is fine.
Fallback cluster sizes are usually set by a Teradata representative through a Teradata Console Utility. They can range from 2 AMPs in a cluster up to 16 AMPs in a cluster. The most often used cluster size is 4 AMPs per cluster. Not all clusters in a system have to be the same size, but this is usually desired.
Fallback Clusters
Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats.
-Howard Aiken
Fallback has been placed perfectly in the picture on the following page. Notice we have two clusters: the top cluster in purple and the bottom cluster in yellow. We laid the data out and it spread evenly among both clusters. Now it is time to lay out the fallback data. Notice that the fallback from the top cluster stays within the top cluster. The same rule goes for the bottom cluster. The Fallback data stays within the cluster. Now we can lose 1 AMP in every cluster and still have our data up and running. Teradata will not use the Fallback data unless an AMP in the cluster goes down.
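The "one AMP per cluster" rule is easy to express as a check. The sketch below is an illustration only: numbering AMPs from 0 and grouping them into clusters by position are assumptions for the example, since real cluster assignments are made through the console utility.

```python
def cluster_of(amp, cluster_size):
    """Return the cluster number for an AMP (AMPs numbered from 0)."""
    return amp // cluster_size

def table_available(down_amps, cluster_size):
    """A fallback table stays fully readable while no cluster
    has lost more than one AMP."""
    lost_per_cluster = {}
    for amp in down_amps:
        c = cluster_of(amp, cluster_size)
        lost_per_cluster[c] = lost_per_cluster.get(c, 0) + 1
    return all(n <= 1 for n in lost_per_cluster.values())

# 8 AMPs in two clusters of four: AMPs 0-3 and AMPs 4-7.
print(table_available({0}, 4))     # True: one AMP down
print(table_available({0, 5}, 4))  # True: one AMP down in each cluster
print(table_available({0, 1}, 4))  # False: two AMPs down in one cluster
```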
Fallback Exercises with Clusters
I have never let my schooling interfere with my education.

-Mark Twain
This is an outstanding exercise that is designed to teach you exactly how Fallback works with clusters. In the example on the following page you will see a 12 AMP system that has 12 Base Rows. In each system we have placed these rows and labeled them 1-12. The 12 records have been spread evenly among the 12 AMPs with each AMP getting one record. I have placed the first Fallback row on the proper AMP. The base row records are on the top of the disk and we have placed the Fallback rows on the bottom of the disk. Your job is to finish the exercise. No looking!
Fallback Exercises with Clusters Answer
Grad school is the snooze button on the clock-radio of life.

-John Rogers, Comedian who holds a graduate degree in physics
Your answers are on the following page.
More Fallback Exercises
Time is a great teacher, but unfortunately it kills all its pupils.

-Hector Louis Berlioz
I have already completed the first example, which is the 12 AMPs in one cluster. In the next example there are two clusters of six AMPs each. In the example after that there are three clusters of four AMPs. In the final example there are four clusters of three AMPs. Your job is to place the Fallback rows in their proper place. Remember, because I am a nice guy I have helped you out with the first system containing one cluster of 12 AMPs. Now you should attempt to place the Fallback records on the proper AMP in the proper cluster for the remaining systems.
More Fallback Exercises with Answers
When you are courting a nice girl an hour seems like a second. When you sit on a red-hot cinder a second seems like an hour. That's relativity.
-Albert Einstein
Check out how the Fallback was laid out in all four systems.
Fallback Performance Vs Protection Questions

You will be asked several questions about the slide you have just seen concerning Fallback. By answering these questions we hope you will be able to further your understanding of exactly how Fallback works. We want you to clearly understand the trade-off between protection and performance and then understand why NCR Teradata usually picks a number of AMPs in a cluster that will maximize both. Answer the questions assuming that there are millions of rows in the table.
1. Which System (A, B, C, D) provides the best protection?
2. Which System provides the best performance should a single AMP go down?
3. How many AMPs could you lose in System A and still have Teradata be able to satisfy a query that was a Full Table Scan?
4. How many AMPs could you potentially lose in System D and still have Teradata satisfy a query that was a Full Table Scan?
5. How many AMPs could you lose in System D (Cluster 1) before Teradata would not be able to satisfy a query that was a Full Table Scan?
6. If none of the systems had Fallback, how many AMPs could any system lose before Teradata would not be able to satisfy a query that was a Full Table Scan?
7. Why does Teradata usually place four AMPs in each cluster?
Fallback Performance Vs Protection (Answers)

Here are the questions again, but with the answers. Remember, you were to answer the questions assuming that there are millions of rows in the table.
1. Which System (A, B, C, D) provides the best protection? D (This is because we could potentially lose one AMP in each cluster, thus allowing us to lose 4 AMPs.)
2. Which System provides the best performance should a single AMP go down? A (This is because if a single AMP went down then the 11 other AMPs, all in the same cluster, would each hold an equal portion of the down AMP's Fallback rows. Therefore, only 1/12th of the system would be affected, with each AMP responsible for a portion of the down AMP's work.)
3. How many AMPs could you lose in System A and still have Teradata be able to satisfy a query that was a Full Table Scan? One (You can only lose one AMP in a cluster. If you lose two AMPs in a cluster the Teradata system can't fulfill requests to the down AMPs.)
4. How many AMPs could you potentially lose in System D and still have Teradata satisfy a query that was a Full Table Scan? Four (You can lose one AMP in each cluster with Fallback, but lose two in any single cluster and the table is in trouble.)
5. How many AMPs could you lose in System D (Cluster 1) before Teradata would not be able to satisfy a query that was a Full Table Scan? One (You can only lose one AMP in a cluster. Cluster 1 of System D holds just three AMPs, so you had better not lose a second one.)
6. If none of the systems had Fallback, how many AMPs could any system lose before Teradata would not be able to satisfy a query that was a Full Table Scan? None (Since the records are not Fallback protected, there is no way to satisfy a query wanting information from the down AMP. That means the system could not perform a Full Table Scan or satisfy any query that involved the down AMP.)
7. Why does Teradata usually place four AMPs in each cluster? Teradata usually places four AMPs in a cluster because of both Performance and Protection. (The Protection with four AMPs is solid because it is not likely that two AMPs out of four will fail. The Performance is solid because if a single AMP goes down then the three other AMPs in the cluster will share responsibility for the down AMP's records. Only 25% of the system performance is gone because the three AMPs will do their work plus what is needed for the failed AMP.)
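To see the trade-off in numbers, here is a rough back-of-the-envelope sketch for the four 12-AMP systems. It assumes what the answers above assume: at most one AMP can be lost per cluster, and a down AMP's Fallback rows are spread evenly over the other AMPs in its cluster. The function name `tradeoff` is mine, not a Teradata term.

```python
from fractions import Fraction

def tradeoff(total_amps, cluster_size):
    """Return (most AMPs the system can lose, extra share of work each
    surviving AMP picks up in a cluster that has lost one AMP)."""
    clusters = total_amps // cluster_size
    max_failures = clusters                      # at most one per cluster
    extra_work = Fraction(1, cluster_size - 1)   # down AMP's rows split evenly
    return max_failures, extra_work

for name, size in [("A", 12), ("B", 6), ("C", 4), ("D", 3)]:
    failures, extra = tradeoff(12, size)
    print(f"System {name}: {12 // size} cluster(s) of {size} AMPs, "
          f"can lose up to {failures} AMP(s), "
          f"each survivor takes on {extra} more work")
```

Bigger clusters spread the extra work thinner (1/11 in System A), while smaller clusters tolerate more failures (4 in System D). Four AMPs per cluster sits between the two extremes.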
The Six Rules of Fallback
Don't worry about the world coming to an end today. It's already tomorrow in Australia.
-Charles Schulz
There are six rules I want you to think about with Fallback.
Rule 1: Fallback doubles the size of your table.
Rule 2: All AMPs are clustered (usually in sets of four).
Rule 3: Fallback rows always reside within the same Cluster.
Rule 4: Two AMPs in the same Cluster never reside inside the same NODE.
Rule 5: Two AMPs in the same Cluster never reside inside the same CLIQUE.
Rule 6: Fallback protects you against a failed AMP.
Cliques and Clusters
Time is at once the most valuable and most perishable of all our possessions.
-John Randolph
On the following page you can see a picture that has four Cliques. In each Clique are four Nodes. Within each Node are 2 PEs and 4 AMPs. Normally, there would be about 4 PEs and 25 AMPs, but this picture is designed to give you knowledge of how Teradata clusters the AMPs inside the cliques. This picture is to set you up. Your job is to put in a clustering scheme that follows three rules:
1) Group your AMPs in Clusters of four.
2) Never have two AMPs in the same Cluster be a part of the same Clique.
3) Never have two AMPs in the same Cluster be a part of the same Node.
If you do this correctly you will understand that we can lose a Node or a Clique and still never have more than one AMP within a Cluster fail.
Cliques and Clusters Answers
Though no one can go back and make a brand new start, anyone can start from now and make a brand new ending.
-Anonymous
On the following page you can see a picture that has four Cliques. In each Clique are four Nodes. Within each Node are 2 PEs and 4 AMPs. Normally, there would be about 4 PEs and 25 AMPs, but this picture is designed to give you knowledge of how Teradata clusters the AMPs inside the cliques. Remember the three rules:
1) Group your AMPs in Clusters of four.
2) Never have two AMPs in the same Cluster be a part of the same Clique.
3) Never have two AMPs in the same Cluster be a part of the same Node.
If you do this correctly you will understand that we can lose a Node or a Clique and still never have more than one AMP within a Cluster fail.
The following page shows the four AMPs in Cluster 1. They are each in a different Clique and a different Node. I have circled the four AMPs and placed the number 1 inside them to represent Cluster 1. Notice that the four AMPs in Cluster 1 are very far apart physically. Notice the four AMPs in Cluster 2. Notice the four AMPs in Cluster 3. Notice we have clusters from Cluster 1 to Cluster G. We don't normally call our Clusters by their numbers, but I want you to also notice the four AMPs in Cluster G. If we lost every node in Clique number 1 we would have actually lost exactly one AMP in every Cluster.
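One way to satisfy all three rules for this 4-clique, 4-node, 4-AMP picture can be sketched in a few lines. The numbering scheme below (identify an AMP by clique, node, and slot, and derive the cluster from node and slot only) is my own assumption for illustration; the slide may group the AMPs differently.

```python
from collections import Counter
from itertools import product

CLIQUES, NODES, AMPS_PER_NODE = 4, 4, 4

def cluster_id(clique, node, slot):
    """The cluster depends only on (node, slot), so the four AMPs that
    share a cluster sit in four different cliques (and so four nodes)."""
    return node * AMPS_PER_NODE + slot

# Every AMP is identified by (clique, node, slot); map each to a cluster.
amps = list(product(range(CLIQUES), range(NODES), range(AMPS_PER_NODE)))
cluster = {amp: cluster_id(*amp) for amp in amps}

def max_lost_per_cluster(failed_amps):
    """Largest number of failed AMPs that land in any single cluster."""
    counts = Counter(cluster[a] for a in failed_amps)
    return max(counts.values(), default=0)

whole_clique = [a for a in amps if a[0] == 0]            # lose clique 0
one_node = [a for a in amps if a[0] == 2 and a[1] == 3]  # lose one node

print(len(set(cluster.values())))          # 16 clusters (1 through G)
print(max_lost_per_cluster(whole_clique))  # 1
print(max_lost_per_cluster(one_node))      # 1
```

Losing a whole clique takes out 16 AMPs, but each one belongs to a different cluster, so every cluster is still down only one AMP.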
Down AMP Recovery Journal (DARJ)
Once the game is over, the king and the pawn go back in the same box.
- Italian Proverb
The Down AMP Recovery Journal (DARJ) is started on all AMPs in the cluster when an AMP is down. This allows three AMPs to check on their mate. Since there are four AMPs in most clusters and all Fallback for a particular AMP remains within the cluster, there are three AMPs that hold Fallback rows for a down AMP.
The DARJ is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ catches up the AMP by completing the missed transactions. Once everything is caught up, the DARJ is dropped.
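The life cycle of the DARJ (start it, record missed changes, replay them, drop it) can be pictured with a toy model. This is a heavily simplified sketch under my own assumptions: the class, its method names, and the single shared journal are invented for illustration and say nothing about how Teradata stores the journal internally.

```python
class Cluster:
    """Toy model of one fallback cluster with a Down AMP Recovery Journal."""

    def __init__(self, amps):
        self.rows = {amp: {} for amp in amps}  # amp -> {row_id: value}
        self.down = None                       # AMP currently offline
        self.darj = None                       # exists only while an AMP is down

    def fail(self, amp):
        self.down = amp
        self.darj = []                         # surviving AMPs start the journal

    def write(self, amp, row_id, value):
        if amp == self.down:
            self.darj.append((row_id, value))  # remember the missed change
        else:
            self.rows[amp][row_id] = value

    def recover(self):
        for row_id, value in self.darj:        # replay missed changes in order
            self.rows[self.down][row_id] = value
        self.down = None
        self.darj = None                       # journal is dropped once caught up

cluster = Cluster([1, 2, 3, 4])
cluster.write(2, "emp7", "v1")   # normal write while AMP 2 is up
cluster.fail(2)
cluster.write(2, "emp7", "v2")   # missed by AMP 2, recorded in the DARJ
cluster.write(2, "emp8", "v1")
cluster.recover()
print(cluster.rows[2])           # {'emp7': 'v2', 'emp8': 'v1'}
```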
Permanent Journal
The absent are always in the wrong.

English Proverb
If a table had five million rows and used FALLBACK protection, then it would have five million FALLBACK rows. However, this would be quite costly because FALLBACK actually stores a duplicate copy of all the rows on other AMPs within the same cluster. FALLBACK is used either because the system is mission critical or because the system is not backed up regularly. For customers who back up data regularly, another option for data restoration is the Permanent Journal. When a company is not severely impacted by waiting a couple of hours for a restoration to be completed, this is a very good option. The Permanent Journal works in conjunction with backup procedures, plus it's a lot more cost effective than FALLBACK.
The absent are always in the write.

Permanent Journal Proverb
The Permanent Journal stores only images of rows that have been changed due to an INSERT, UPDATE, or DELETE command. That is why when data is lost or absent the Permanent Journal can write it back to the disks. The Permanent Journal keeps track of all new, deleted, or modified data since the last Permanent Journal backup. This option is usually less expensive than storing the additional five million FALLBACK rows. Like FALLBACK, the Permanent Journal is optional. It may be used on specific tables of your choosing or on no tables at all. It provides the flexibility to customize a Journal to meet specific needs. The Permanent Journal must be manually purged from time to time. There are five image options for the Permanent Journal:
Before Journal
After Journal
Dual Before Journal
Dual After Journal
Journal
Table create with Fallback and Permanent Journal
A real friend is one who walks in when the rest of the world walks out.
Walter Winchell
The example creates a table called Employee in the Teratom database, and the table is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level the default is NO FALLBACK and NO JOURNALING.
Permanent Journal Rules
If you can find something everyone agrees on, it's wrong.

-Mo Udall
The only time you can create a Permanent Journal is when you CREATE a DATABASE or a USER, or when you MODIFY a database or user. Remember, Teradata considers a database and a user to be basically the same thing. Both databases and users can have PERM space or Spool space assigned to them. So, remember that you can only create a Permanent Journal inside a database or user. You can only have one Permanent Journal per database or user, so theoretically we could have one Permanent Journal in the entire system, or we could go to the other extreme and have one Permanent Journal in every database and every user.
When you create a table you specify whether or not you want journaling. You can also tell the table which journal you want as its journal. You can have many tables in a database, and each table could write to the default journal in the database itself, or each table could write to a journal in another database, or some tables could write to one journal while other tables write to different journals. It is up to you.
Some Permanent Journal Possibilities
Talk low, talk slow, and don't say too much.

-John Wayne, Advice on acting
Notice in the picture below the following scenarios. You could have a scenario where every Database or User has their own Permanent Journal and the tables in that Database or User always select the Permanent Journal in their database or user as their Default Journal. That example is shown in System 1. You might set up one Permanent Journal in a database and have every table in the system choose that one Permanent Journal as their Default Journal. That example is shown in System 2. You will most likely have multiple Permanent Journals and have tables choose one of them as their Default Journal. That example is shown in System 3.
Creating a Permanent Journal
Only the educated are free.

-Epictetus
The following page is an excellent example of creating a Permanent Journal. You can only create a Permanent Journal when you use a CREATE DATABASE or CREATE USER statement. Of course this also applies with a MODIFY DATABASE or MODIFY USER statement. Remember your two most basic Permanent Journal rules: Rule 1: You can only have one Permanent Journal per database or per user. Rule 2: Tables within a database can be assigned to any Permanent Journal in any DATABASE or USER. After this next page we will create tables inside Advertising to see examples of different scenarios.
Create Table Examples with Permanent Journals
I am desperately trying to figure out why Kamikaze pilots wore helmets.

-Dave Edison
In our previous slide we created a database called Advertising. We gave it a journal called Journals! This journal keeps track of our table changes and serves as the default for any table created in Advertising. Below are examples of three different tables being created. Watch and learn what happens.
The first example shows a table called Department_Table. This table makes no reference to any journal, so by default it will write an After Journal to Advertising.Journals. The table took the default AFTER Journal that was created in its database.
The next example shows a table called Employee_Table. This table overrides the database default and explicitly demands that the table not write to a Journal.
The final example shows a table called Department_Table2. This table requests an AFTER Journal, but demands that any changes to its rows be written to a different journal, SJournal, in another database called Sales.
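The three outcomes boil down to a simple decision: say nothing and inherit the database default, say NO and get no journal, or name a journal and use that one. The Python sketch below only models that decision. The function `resolve_journal` and its option strings are hypothetical stand-ins for the CREATE TABLE options on the slide, not Teradata syntax.

```python
def resolve_journal(database_default, table_option=None, journal_table=None):
    """Return the journal a table writes to, or None for no journaling.

    database_default: journal named on the database, e.g. "Advertising.Journals"
    table_option:     None (nothing specified), "NO JOURNAL", or "AFTER JOURNAL"
    journal_table:    explicit journal named on the table, if any
    """
    if table_option == "NO JOURNAL":
        return None              # table overrides the database default
    if journal_table is not None:
        return journal_table     # table points at a specific journal
    return database_default      # otherwise inherit the database default

default = "Advertising.Journals"
print(resolve_journal(default))                                     # Department_Table
print(resolve_journal(default, "NO JOURNAL"))                       # Employee_Table
print(resolve_journal(default, "AFTER JOURNAL", "Sales.SJournal"))  # Department_Table2
```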
Each Permanent Journal is made up of 3 Areas
I became a policeman because I wanted to be in a business where the customer is always wrong.
Anonymous
Every Permanent Journal is comprised of three areas. They are the Active Current Journal, the Saved Current Journal and the Restored Journal. This slide demonstrates the purpose of the Active Current Journal and the Saved Current Journal portions of the Permanent Journal. When a table is defined with a Permanent Journal and a change takes place on a row the changed row is written (appended) to the Active Current Journal. When the Database Administrator (DBA) submits a CHECKPOINT WITH SAVE statement the Active Current Journal appends its rows to the Saved Current Journal. Then, Teradata automatically deletes the Active Current Journal rows so a fresh start to the Active Current Journal can take place. When the DBA submits an ARCHIVE JOURNAL TABLE statement the Saved Current Journal is copied to tape. This is usually done on a daily basis. The DBA must submit a DELETE JOURNAL statement to delete the Saved Current Journal. It is never done automatically.
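The flow between the Active Current Journal and the Saved Current Journal can be sketched as a tiny state model. The class and method names below are my own, chosen to echo the statements mentioned above. It is an illustration of the flow, not of the real journal format.

```python
class PermanentJournal:
    """Toy model of the Active Current and Saved Current areas."""

    def __init__(self):
        self.active = []   # Active Current Journal
        self.saved = []    # Saved Current Journal
        self.tapes = []    # stands in for archived copies on tape

    def record_change(self, image):
        self.active.append(image)           # changed row is appended

    def checkpoint_with_save(self):
        self.saved.extend(self.active)      # append active rows to saved
        self.active.clear()                 # active area gets a fresh start

    def archive_journal(self):
        self.tapes.append(list(self.saved)) # copy only; saved rows remain

    def delete_journal(self):
        self.saved.clear()                  # never happens automatically

journal = PermanentJournal()
journal.record_change("row 1 after image")
journal.record_change("row 2 after image")
journal.checkpoint_with_save()
journal.record_change("row 3 after image")
journal.archive_journal()
print(len(journal.active), len(journal.saved), len(journal.tapes))  # 1 2 1
journal.delete_journal()
print(len(journal.active), len(journal.saved), len(journal.tapes))  # 1 0 1
```

Notice that the archive step leaves the Saved Current Journal untouched. It only empties when the explicit delete is issued.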
Permanent Journal Rules
You miss 100 percent of the shots you never take.

Wayne Gretzky
You miss 100 percent of the Journals you never save! The following page shows the rules for a consistent Permanent Journal.
The Four Locks of Teradata
Some birds aren't meant to be caged, their feathers are just too bright. And when they fly away, the part of you that knows it was a sin to lock them up, does rejoice.
Shawshank Redemption
You don't lock up a bird, but you always lock a query. Teradata uses a lock manager to automatically lock at the database, table, or row hash level. Teradata will lock objects using four types of locks:
Exclusive - Exclusive locks are placed only on a database or table when the object is going through a structural change. An Exclusive lock restricts access to the object by any other user. This lock can also be explicitly placed using the LOCKING modifier.
Write - A Write lock happens on an INSERT, DELETE, or UPDATE request. A Write lock restricts access by other users. The only exception is for users reading data who are not concerned with data consistency and override the applied lock by specifying an Access lock. This lock can also be explicitly placed using the LOCKING modifier.
Read - This is placed in response to a SELECT request. A Read lock restricts access by users who require Exclusive or Write locks. This lock can also be explicitly placed using the LOCKING modifier. Read locks put the word integrity in data integrity. If you have a multi-user environment with updates occurring and you need to keep data consistent, you want a Read lock.
Access - Placed in response to a user-defined LOCKING FOR ACCESS phrase. An Access lock permits the user to READ an object that may already be locked for READ or WRITE. An Access lock does not restrict access by another user except when an Exclusive lock is required. A user requesting an Access lock cannot be concerned with data consistency.
When Teradata locks a resource for a user, the lifespan of the transaction lock is forever, or until the user releases the lock. This is different from a deadlock situation. If two transactions are deadlocked the youngest query is always aborted.
Teradata has 3 levels of Locking
When you go into court you are putting your fate into the hands of twelve people who weren't smart enough to get out of jury duty.
- Norm Crosby
Teradata uses a lock manager to be judge, jury, and executioner of SQL. There are four locks placed on objects at the database, table, or row hash level.
Quiz - Which Level of Locking is Occurring?
In the end we'll remember not the words of our enemies, but the silence of our friends.
Martin Luther King, Jr.
Dr. Martin Luther King Jr. was a great man with a message that will live on forever. Dr. King believed in equality. In dedication to Dr. King we have set up this quiz about lock equality. Which lock level will Teradata use for the SQL on the following page? Will the SQL cause Teradata to place the lock at the Database level, Table level, or Row level?
Quiz - Locking Answers
If you are planning for a year, sow rice; if you are planning for a decade, plant trees; if you are planning for a lifetime, educate people.
- Chinese Proverb
On the following page we have the answers to the quiz.
The Teradata Lock Manager
You can make more friends in two months by becoming interested in other people, than you will in two years by trying to get other people interested in you.
- Dale Carnegie
Someone on a New York street corner was asked, "How do you get to Carnegie Hall?" He replied, "Practice, man, practice!" The following page is designed so you can practice the art of understanding Teradata locks. What I want you to know is that the only lock that users have control over is the Access Lock. If you want to read a table, but don't want to wait on a WRITE Lock, and don't care that the answer set may not be perfect, then you want an ACCESS Lock. This is also called a Dirty Read or a Read without Integrity because the data isn't always perfect.
Let me explain. When someone is updating a table they are given a WRITE Lock. As they perform the UPDATE, a user who wants a READ lock on the table submits their SQL. Teradata makes the READ lock wait until the WRITE lock has completely updated the table. This could take a long time. An Access lock says, "I know someone is already updating the table, but I don't want to wait. I am merely trying to get an average of sales for the week and I don't have to have everything perfect." I also want you to notice that the Exclusive Lock, the Write Lock, and the Read Lock are determined by Teradata based on the SQL that is written.
Locking Modifiers - The Access Lock
We're fools whether we dance or not, so we might as well dance.

- Japanese Proverb
If users don't want to wait on any Write locks they can use the Locking for ACCESS modifier. They won't have to wait on any Write locks that would normally make them wait in the Pseudo Table line, because Access locks are compatible with Write locks. Notice in the picture on the next page that I have highlighted in yellow the actual SQL statement. You will find that most views use the Locking for Access modifier. Here is a trick. Merely put Locking Row for Access in your SQL. This will try to lock just the row for an Access lock, but if Teradata needs to lock the entire table it will do so.
Locks and their compatibility
Frankly, my dear, I don't give a damn.

- Rhett Butler, Gone with the Wind (1939)
Not everyone is compatible and Teradata locks are no exception. Locks that are compatible can lock the same object simultaneously. Clark Gable would have been a great Teradata user because he always used a Rhett Lock and according to Scarlett was almost never Write! Because compatible locks can share access to objects simultaneously, READ locks are great: one or a thousand users can read the same object at the same time. Teradata will not allow a user to change a table while others are reading it. This prevents database corruption. An ACCESS Lock is an excellent way to avoid waiting for a Write lock currently on a particular table. Two statements allow this:
Locking Row for Access
Locking Tablename for Access
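The compatibility rules described for the four locks can be written down as a small lookup table. This Python sketch simply restates those rules: Access gets along with everything except Exclusive, Read gets along with Access and Read, Write gets along only with Access, and Exclusive gets along with nothing. The dictionary and function names are mine.

```python
# For each lock type, the set of lock types that may hold the same
# object at the same time.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},
    "READ":      {"ACCESS", "READ"},
    "WRITE":     {"ACCESS"},
    "EXCLUSIVE": set(),
}

def compatible(held, wanted):
    """True if a lock of type `wanted` can share an object with `held`."""
    return wanted in COMPATIBLE[held]

print(compatible("READ", "READ"))         # True: many readers at once
print(compatible("READ", "WRITE"))        # False: no changes while reading
print(compatible("WRITE", "ACCESS"))      # True: the dirty read
print(compatible("EXCLUSIVE", "ACCESS"))  # False: nothing shares with Exclusive
```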
Moving Through the Locking Queue
If you're falling off a cliff, you may as well try to fly.

- Captain John Sheridan, Babylon 5
Teradata locks can fly through the Queue with amazing speed. How? Teradata locks work a lot like going to a movie theatre. When you decide you want to see a movie, or access a table, you must first get in line. But in the Teradata Movie Line, if you are compatible with the person directly in front of you then you can move up the line. That doesn't mean you can go right to the front of the line if you are compatible with the first person in line. It means that you can move up the line one person at a time as long as you are compatible. The next couple of pages will discuss what locks are compatible with each other.
Quiz - Which Locks Move Up?
Everyone is kneaded out of the same dough but not baked in the same oven.
- Yiddish Proverb
The quiz on the following page will give you an opportunity to understand how locks move up and read rows of a table simultaneously. If everyone in a company only did SELECTs, then there would only be READ locks. Read locks are compatible with each other, so everyone could always immediately read any table they want. However, most of the time a data warehouse environment has users who want to read and analyze data while others are doing updates. This is where there is danger in slowing users down, because they often have to wait to access a table. Remember what is compatible with what, and also remember that you can only move up 1 person at a time, and that is only if you are compatible!
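Here is a small sketch of the movie-line rule as I read it: because a request moves up one place at a time, it reaches the front only if it is compatible with every request ahead of it. The function `running_now` and the sample queues are my own illustration, not the quiz from the slide.

```python
# Lock types that may share an object at the same time.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},
    "READ":      {"ACCESS", "READ"},
    "WRITE":     {"ACCESS"},
    "EXCLUSIVE": set(),
}

def running_now(queue):
    """Return the positions (0 = front) of the requests that can use
    the table right now.

    A request moves up one place at a time, so it gets through only
    if it is compatible with every request ahead of it in the line.
    """
    running = []
    for i, lock in enumerate(queue):
        if all(lock in COMPATIBLE[ahead] for ahead in queue[:i]):
            running.append(i)
    return running

print(running_now(["READ", "READ", "WRITE", "READ"]))  # [0, 1]
print(running_now(["READ", "ACCESS", "WRITE"]))        # [0, 1]
print(running_now(["WRITE", "ACCESS", "READ"]))        # [0, 1]
```

In the first line the last READ is compatible with the two READs at the front, but it is stuck behind the WRITE directly in front of it, so it waits.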
Answers to Locking Quiz
They can conquer who believe they can.

- Virgil 70 BC
You believed you could do it and that is why you probably aced this quiz. Great job! Notice how I put the slide together on the next page. I am showing in the circles which locks were held simultaneously. These locks moved up because they were compatible with the lock directly in front of them. You can keep moving up until you run into a lock you are not compatible with.
A Single AMP Acts as the Locking Gatekeeper
First you imitate, then you innovate.

- Miles Davis
Miles Davis was a musical genius who worked hard at his Jazz. Teradata has struck a chord with its lock abilities too. Please make a note of it. Teradata innovates by making sure that only 1 AMP is responsible for locking a particular table. What is innovative about this is that each AMP is assigned certain tables it is responsible for locking. How does Teradata accomplish this and make sure that each AMP is responsible for an equal number of tables? It hashes the table name and then goes to the hash map. The hash map then points to the AMP responsible.
In our example on the following page you see that we have two users named John and Mary. Both are trying to run SQL on the Order_Table. Teradata will need to assign one of the AMPs the responsibility of locking the table, so it hashes the name Order_Table. As you can see, the Row Hash came out as 00000000001100, which equates to 12. Teradata goes to the 12th bucket in the hash map and the bucket says AMP 4 will be responsible for locking the Order_Table. AMP 4 builds a Pseudo Table. John got there just before Mary because John submitted his query first, so the line has been established. Mary will wait for John to finish unless her lock is compatible with his. If that is the case then both will have access to the Order_Table simultaneously. AMP 4 will send a message over the BYNET to the other AMPs guiding them to lock their tables for John first and then do the same for Mary.
This locking architecture is designed to eliminate deadlocks and ensure that the first query submitted gets the first lock for a particular table. Teradata refers to this as a sequential locking resource.
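The "hash the table name, look in the hash map, find the AMP" steps can be sketched as follows. Everything specific here is an assumption for illustration: Teradata's real hashing algorithm is not shown in this book, so I use `zlib.crc32` as a stand-in, and the 16-bucket hash map over 4 AMPs is made up to keep the example tiny.

```python
import zlib

NUM_AMPS = 4
NUM_BUCKETS = 16  # hypothetical, tiny hash map
# Hypothetical hash map: bucket number -> AMP number (1 through 4).
HASH_MAP = [(bucket % NUM_AMPS) + 1 for bucket in range(NUM_BUCKETS)]

def locking_amp(table_name):
    """Pick the AMP that acts as lock gatekeeper for table_name."""
    # Stand-in hash; upper-casing makes the result case-insensitive.
    row_hash = zlib.crc32(table_name.upper().encode("utf-8"))
    bucket = row_hash % NUM_BUCKETS
    return HASH_MAP[bucket]

# The same table name always lands on the same gatekeeper AMP,
# no matter which user asks.
johns_amp = locking_amp("Order_Table")
marys_amp = locking_amp("Order_Table")
print(johns_amp == marys_amp)

for table in ["Order_Table", "Item_Table", "Sales_Table", "Cust_Table"]:
    print(table, "is locked by AMP", locking_amp(table))
```

Because the hash depends only on the name, John and Mary are always sent to the same AMP for the Order_Table, which is what lets one AMP keep the line in order.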
Every AMP performs Locking Gatekeeper Duties
Fate chooses your relations, you choose your friends.

- Jacques Delille
Notice the picture on the following page. Teradata fate chooses which AMP will be responsible for locking the table(s) your SQL query needs. What I want you to notice is that each AMP is responsible for locking a different table. AMP 1 is responsible for the Order_Table, AMP 2 for the Item_Table, AMP 3 for the Sales table, and AMP 4 for the Cust_Table. Teradata accomplishes this locking by hashing the table name, so each AMP in theory should be responsible for an equal number of tables. Notice that each AMP builds what is called a Pseudo table. This is the line for that table's access. The first person to submit the SQL will be at the front of the line and the last to submit will be at the end of the line in the Pseudo table. If you are compatible with the lock in front of you then you can move up one lock at a time.
Answers to Which AMP is Waiting on Access
Regret for wasted time is more wasted time.

- Mason Cooley
The slide on the following page answers the age-old question, "Which AMP will have a lock that waits?" The answer, as you can see, is AMP 1, and it is because of the last Write in AMP 1's Pseudo table. The Write can move up to the Access directly in front of it, but as the Access also moves up to the Read lock, the Write has to wait for the Reads to finish.
Explains The Pseudo Table for Locks
Money doesn't bring happiness. People with ten million dollars are no happier than people with nine million dollars.
- Hobart Brown
The first thing you will see in an EXPLAIN, which is the PE's plan in English, will usually refer to locking. The first line usually says something like, "Locking a Pseudo Table for Read." This means that you are now in line in the Pseudo Table. The next line states, "We lock the Employee_Table for Read," which means you have moved to the front of the line and the lock has been placed on the table. I like to explain this to users because the EXPLAIN can really give insight into the Parsing Engine's plan, but users are often confused by the term Pseudo Table. Now you know!
The NOWAIT Locking Option
Destiny is not a matter of chance, it is a matter of choice; it is not a thing to be waited for, it is a thing to be achieved
- William Jennings Bryan
Sometimes your SQL is not a thing to be waited for; it is a thing to be aborted. When the NOWAIT option is used and a lock request cannot be granted immediately, the transaction aborts. Use NOWAIT when you do not want a request to wait for resources, or to tie up resources while it waits. It is an excellent way to avoid waiting on conflicting locks. Don't make the mistake of thinking the NOWAIT option gives you a free dash to the front of the lock line. It simply dictates that a transaction is to ABORT immediately if the Lock Manager cannot immediately place the lock. Use a LOCKING modifier with the NOWAIT option when you don't want requests waiting in the queue. A 7423 return code informs the user that the lock could not be placed because of an existing, conflicting lock.
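Here is a hedged sketch of the difference between waiting and NOWAIT. The function `request_lock` and the exception `LockConflict` are invented for illustration; only the 7423 return code and the abort-instead-of-wait behavior come from the text above.

```python
# Which lock types may coexist on the same table.
COMPATIBLE = {
    "ACCESS": {"ACCESS", "READ", "WRITE"},
    "READ": {"ACCESS", "READ"},
    "WRITE": {"ACCESS"},
    "EXCLUSIVE": set(),
}


class LockConflict(Exception):
    """Raised when a NOWAIT request cannot be granted immediately."""

    def __init__(self, code):
        super().__init__(f"lock conflict, return code {code}")
        self.code = code


def request_lock(held, wanted, nowait=False):
    """Return 'GRANTED' or 'QUEUED'; with nowait=True, abort instead of queueing."""
    if all(h in COMPATIBLE[wanted] for h in held):
        return "GRANTED"
    if nowait:
        raise LockConflict(7423)  # return code quoted in the text
    return "QUEUED"


print(request_lock(["READ"], "WRITE"))  # waits in line
try:
    request_lock(["READ"], "WRITE", nowait=True)
except LockConflict as err:
    print(err)  # aborted rather than queued
```

Notice that NOWAIT never changes whether the lock is compatible. It only changes what happens when it is not.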
Rules of Teradata Locking
People can have the Model T in any color so long as it's black.
- Henry Ford
The rules of locking are right there on the slide on the following page.
Explains Pseudo Tables
If we knew what we were doing, it wouldn't be called research, would it?

- Albert Einstein
On most EXPLAINs the very first thing you will read is about the locking of a Pseudo Table. This is very confusing for many users, and they often feel lost immediately when reading an EXPLAIN plan. Let me give you the real scoop. Teradata is a parallel processing system that places a portion of the rows of a table on every AMP. That means each AMP has its own portion of a table. In other words, each AMP thinks it owns that table because, from its point of view, it has a table header and the rows that follow. That is a table to an AMP. This makes locking a little tricky, because if you are trying to query the Employee_Table and you have a 500-AMP system, then you are essentially telling all 500 AMPs to lock their Employee_Table. It is important to keep this locking process coordinated. Teradata accomplishes this by having only one AMP command the locking process for a given table. Because Teradata doesn't want one AMP to be responsible for the locking of every table, it spreads this duty around to every AMP. Let me explain. When a user wants to query the Employee_Table, the Parsing Engine hashes the name Employee_Table and looks to the hash map to see which AMP will be responsible for locking the Employee_Table. Let's just say it is AMP 4 for example's sake. AMP 4 has a Pseudo Table which it dedicates to the Employee_Table. The Pseudo Table keeps track of who wants to query the Employee_Table. It is first come, first served. The first person who enters a query for the Employee_Table will be first in line in the Pseudo Table. Then AMP 4 will communicate with all the other AMPs to lock their Employee_Table for the first user in AMP 4's Pseudo Table. This allows Teradata to control and synchronize the locking system from a single resource. Think of a Pseudo Table as a queue. The first user in the queue gets the lock, and the others wait their turn unless their locks are compatible with the ones ahead of them. So, when you see "We lock a distinct Pseudo table for READ," that means you are in the locking queue waiting your turn to query the table.
When you see the next line of the EXPLAIN say "We lock the Employee_Table for READ," that means it is now your turn to query the table and the locking has taken place. You now know more about EXPLAINs than 90% of the people in the world. Congratulations, friend! Thanks for reading this book!
Explain Full Table Scan
The great tragedy of science, the slaying of a beautiful theory by an ugly fact.
- Thomas Henry Huxley
Teradata is built to perform Full Table Scans quickly. It flies through data because of its parallel processing design. In a Full Table Scan each AMP must read all of its rows for a particular table, so each row of the table is examined once. You can see in the picture on the following page that we use every AMP when you see the words "All-AMPs Retrieve." You can see that every row is read from the words "All-Rows Scan." So, when you see an All-AMPs Retrieve by way of an All-Rows Scan, you will know it is merely a Full Table Scan. There is nothing wrong with doing a Full Table Scan if necessary, but it is silly to do one if you don't need to. Some users won't know the Primary Index and will write their queries to do a Full Table Scan when they could have used the Primary Index or a Secondary Index in the query. This wastes time and money. If you think you are writing a query that should use the Primary Index or a Secondary Index, or should search only certain partitions, and you see a Full Table Scan, you should investigate what is wrong. Never do a Full Table Scan unless it is the only choice! For example, if you wanted to find the average salary of the employees in the Employee_Table, you would have to read every row to get that answer set. Doing a Full Table Scan in that case is 100% acceptable! If, however, you worked in Human Resources and an employee came in for a meeting and you wanted to look up their information in the Employee_Table, you should find out the Primary Index of the Employee_Table. If it is the column Employee_No, you should ask the employee what their Employee_No is and then use that Employee_No in the WHERE clause of the SQL. It will be a one-AMP operation retrieving only one row. It is very quick! Some Human Resources users will leave off the WHERE clause, do a Full Table Scan, and then scroll down through the huge report until they find the right employee. This is a huge waste of resources in Human Resources. The irony, the irony!
Explain Primary Index Reads
Winning is a habit. Unfortunately, so is losing.

- Vince Lombardi
The fastest queries in Teradata use the Primary Index column in the WHERE clause. As you can see in the example on the following page, the Primary Index is Employee_No and it is being used in the WHERE clause of the query. In the EXPLAIN you will see the beautiful words, "We do a Single-AMP Retrieve by way of a UNIQUE PRIMARY INDEX." This is the best way to write queries in Teradata and the fastest way for Teradata to retrieve data! Did you know that Teradata doesn't even use Spool Space for this query? It just retrieves the row and immediately delivers it to the user.
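The contrast between a Single-AMP Retrieve and an All-AMPs Full Table Scan can be sketched with a toy model. Everything here is assumed for illustration: four AMPs, twenty made-up employee rows, and `zlib.crc32` standing in for Teradata's real hashing.

```python
import zlib

NUM_AMPS = 4  # assumed system size


def amp_for(value, num_amps=NUM_AMPS):
    """Stand-in hash: map a Primary Index value to an AMP number."""
    return zlib.crc32(str(value).encode("utf-8")) % num_amps


# Each AMP holds its own rows, keyed by the Primary Index (Employee_No).
amps = [dict() for _ in range(NUM_AMPS)]
for emp_no in range(1, 21):
    row = {"Employee_No": emp_no, "Salary": 40000 + 1000 * emp_no}
    amps[amp_for(emp_no)][emp_no] = row


def primary_index_read(emp_no):
    """Single-AMP retrieve: hash the value, visit one AMP, fetch one row."""
    amp = amp_for(emp_no)
    return amp, amps[amp].get(emp_no)


def full_table_scan(predicate):
    """All-AMPs retrieve: every AMP examines every row it owns."""
    rows_read, hits = 0, []
    for amp in amps:
        for row in amp.values():
            rows_read += 1
            if predicate(row):
                hits.append(row)
    return rows_read, hits


print(primary_index_read(7))
print(full_table_scan(lambda r: r["Employee_No"] == 7)[0], "rows read by the scan")
```

Both calls find the same employee, but the scan reads all twenty rows to do it while the Primary Index read goes straight to one AMP.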
Explain Secondary Index Read
Sometimes a scream is better than a thesis.

- Ralph Waldo Emerson
The fastest queries in Teradata use the Primary Index in the WHERE clause because this is a one-AMP operation, but the second fastest queries use a Unique Secondary Index. This is called a USI (Unique Secondary Index) query. A USI query uses only two AMPs. In the picture on the following page you can see that we first show you the CREATE UNIQUE INDEX statement. Now you know that we have created a USI. Then in the picture we show you the query that uses the secondary index. In the EXPLAIN you can see the statement, "We do a Two-AMP Retrieve by way of a UNIQUE INDEX." This means that the Parsing Engine is going to use the secondary index and that this query will be delivered quickly. A USI query is always a two-AMP operation.
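A rough sketch of why a USI read is a two-step operation follows. It is a simplification under stated assumptions: the `SSN` column is hypothetical, `zlib.crc32` stands in for the real hash, and the index subtable here stores the Primary Index value, whereas a real subtable stores a row identifier.

```python
import zlib

NUM_AMPS = 4  # assumed system size


def amp_for(value):
    """Stand-in hash: map a value to an AMP number."""
    return zlib.crc32(str(value).encode("utf-8")) % NUM_AMPS


base = [dict() for _ in range(NUM_AMPS)]  # base rows, hashed by Employee_No
usi = [dict() for _ in range(NUM_AMPS)]   # index subtable, hashed by SSN


def insert(emp_no, ssn, name):
    base[amp_for(emp_no)][emp_no] = (emp_no, ssn, name)
    usi[amp_for(ssn)][ssn] = emp_no  # index row points back to the base row


def usi_read(ssn):
    """Step 1: visit the AMP holding the index row. Step 2: visit the base row's AMP."""
    first_amp = amp_for(ssn)
    emp_no = usi[first_amp].get(ssn)
    if emp_no is None:
        return None, [first_amp]
    second_amp = amp_for(emp_no)
    return base[second_amp][emp_no], [first_amp, second_amp]


insert(1, "123-45-6789", "Ann")
print(usi_read("123-45-6789"))
```

The two steps can happen to land on the same AMP, so "two-AMP" is best read as two AMP visits, never an all-AMP operation.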
Explain - View DDL of a Partitioned Table
Colleges are places where pebbles are polished and diamonds are dimmed.
- Robert G. Ingersoll
The picture on the following page shouldn't really be part of the EXPLAIN chapter because it merely shows the table definition, which is written in the Data Definition Language (DDL). I want you to see that our table called Sales_Table_PPI has been partitioned. After that we will show you a query and its EXPLAIN that doesn't do a Full Table Scan, but instead reads only certain partitions.
Explain Partition Elimination
To teach is to learn twice.

- Joseph Joubert
On the following page you can see that we are querying the Sales_Table_PPI table, and the EXPLAIN tells us that we are not doing a Full Table Scan, but are instead reading only 3 partitions. A Full Table Scan uses All-AMPs and reads All-Rows. A partitioned query uses All-AMPs, but they don't read All-Rows. Each AMP reads from only 3 partitions in this example, which can speed up the query by orders of magnitude! The entire purpose of a Partitioned Table is to eliminate the Full Table Scan on that table as often as possible. Each AMP sorts the rows it owns for the table into partitions, and this will often eliminate reading all rows. This is called Partition Elimination because some partitions won't have to be read to satisfy the query.
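Partition Elimination on a single AMP can be sketched like this. The partitioning scheme is an assumption (one partition per calendar month, since the table definition picture is not reproduced here), and the `Amp` class and sample rows are invented for illustration.

```python
from collections import defaultdict
from datetime import date


class Amp:
    """One AMP's share of a partitioned table, grouped by (year, month)."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, sale_date, amount):
        key = (sale_date.year, sale_date.month)
        self.partitions[key].append((sale_date, amount))

    def read(self, wanted):
        """Read only the wanted partitions; return (partitions_read, rows)."""
        touched, rows = 0, []
        for key in wanted:
            if key in self.partitions:
                touched += 1
                rows.extend(self.partitions[key])
        return touched, rows


amp = Amp()
for month in range(1, 13):
    amp.insert(date(2010, month, 15), 100.0 * month)

first_quarter = [(2010, 1), (2010, 2), (2010, 3)]
partitions_read, rows = amp.read(first_quarter)
print(partitions_read, "of", len(amp.partitions), "partitions read;", len(rows), "rows")
```

Every AMP still takes part in the query, but each one skips the nine partitions that cannot contain qualifying rows.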
Explain Joins with Duplication on all AMPs
I cannot teach anybody anything; I can only make them think.

- Socrates
Teradata joins up to 128 tables in a single query. That is amazing, but there are two things that most people don't know:
1) Teradata joins only two tables at a time.
2) The rows being joined must reside on the same AMP.
Most of the time the two rows being joined will be on different AMPs, and Teradata must move them to the same AMP for the joining process. Teradata will do this either by redistributing one or both of the tables or by duplicating the smaller table on all AMPs. In the EXPLAIN on the following page the Parsing Engine has decided to duplicate the smaller table (Order_Table) on all AMPs. Now each matching Customer_Table row will be able to join directly to the Order_Table.
Explain Joins with Redistribution
A new idea is delicate. It can be killed by a sneer or a yawn; it can be stabbed to death by a joke or worried to death by a frown on the right person's brow.
- Charles Brower
Teradata joins up to 128 tables in a single query. That is amazing, but there are two things that most people don't know:
1) Teradata joins only two tables at a time.
2) The rows being joined must reside on the same AMP.
Most of the time the two rows being joined will be on different AMPs, and Teradata must move them to the same AMP for the joining process. Teradata will do this either by redistributing one or both of the tables or by duplicating the smaller table on all AMPs. In the EXPLAIN on the following page the Parsing Engine has decided to do a Full Table Scan on the Order_Table and then redistribute the Order_Table by Customer_Number. This will match up each Customer_Number from the Order_Table with its associated Customer_Number rows of the Customer_Table on the same AMP. Whenever a join takes place, Teradata needs to ensure that matching rows are on the same AMP if they are to be joined. Watch for words in your EXPLAIN such as "Duplicated on all AMPs" or "Redistributed by" and you will know data is moving across the BYNET in order to place the matching rows on the same AMP for the join process.
Explain Bit Mapping with multiple NUSIs
There is one thing stronger than all the armies in the world; and that is an idea whose time has come.
- Victor Hugo
One of the most exciting (and rare) things to see in an EXPLAIN is a BMSMS statement. This happens when the columns in the WHERE clause each have a Non-Unique Secondary Index (NUSI). When multiple NUSI columns are ANDed together, the Parsing Engine may decide to build a bit map. This can really speed up a large query because Teradata first reads only the secondary index subtables and builds a bit map of the qualifying rows. This is fast, fast, fast! For a BMSMS to take place, the Parsing Engine wants you to Collect Statistics on any column where there is a NUSI. This gives the Parsing Engine the confidence to perform the bit-map process. The bit-map process takes a little longer to set up, but once it is set up it can speed up the query enormously. Notice that we are using the columns Shipdate and Partkey in our query example. Also notice the AND between these two columns in the query. Both of these columns have a NUSI on them. When multiple NUSI conditions in a query are joined by the AND keyword, they are considered ANDed together and the bit-mapping process can take place.
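The idea behind ANDing two NUSIs with a bit map can be sketched in a few lines. This is a toy model, not Teradata's internal format: the index contents, the row ids, and the helper names are all made up, and a Python integer is used as the bit map.

```python
# Hypothetical NUSI subtables on one AMP: index value -> row ids on that AMP.
shipdate_nusi = {"1996-04-01": {1, 4, 7, 9}, "1996-04-02": {2, 3}}
partkey_nusi = {555: {4, 8, 9}, 777: {1, 2}}


def to_bitmap(row_ids):
    """Set bit r for every row id r."""
    bits = 0
    for r in row_ids:
        bits |= 1 << r
    return bits


def from_bitmap(bits):
    """List the row ids whose bit is set, in ascending order."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


def bitmap_and(shipdate, partkey):
    """Rows that satisfy both conditions, found from the index subtables alone."""
    a = to_bitmap(shipdate_nusi.get(shipdate, ()))
    b = to_bitmap(partkey_nusi.get(partkey, ()))
    return from_bitmap(a & b)


print(bitmap_and("1996-04-01", 555))  # [4, 9]
```

Only the rows that survive the AND need to be fetched from the base table, which is where the savings come from.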
Fundamentals of Teradata Joins
As I would not be a slave, so I would not be a master. This expresses my idea of democracy.
- Abraham Lincoln
For two rows to be joined together they must be physically on the same AMP! Wow! That is probably a surprise, but Teradata is a Massively Parallel Processing (MPP) system. This means that each AMP has its own disk, its own memory, and its own processor. So, as in any system, the rows will be moved into the AMP's memory and joined in memory. Therefore two rows being joined must be physically together on the same AMP. The picture on the following page shows you some fundamentals of this concept that are very important for you to get inside your brain. First of all, rows reside on a particular AMP because of the Primary Index value. It is the Primary Index value that is hashed and checked against the hash map, and the row is then distributed to the proper AMP. Most of the time rows from two joining tables won't match up perfectly on the same AMPs, so Teradata will redistribute or duplicate the data to make that happen. Then the join can take place. In the beginning this can be a tricky and confusing concept, but we are going to take our time and get this down to a science.
A Join Example
It is better to be feared than loved, if you cannot be both.

- Niccolo Machiavelli, The Prince
On the following page notice the arrows pointing to the two columns joined together in the ON clause. This is the only thing that matters to Teradata when joining tables. Notice that we are joining ON Customer_Number = Customer_Number. If Customer_Number is the Primary Index of both tables, then all joining rows will be on the same AMP. Think about it. Let's assume Customer_Number was the Primary Index of the Order_Table and Customer_Number 1 was originally hashed to AMP 1. If Customer_Number was also the Primary Index of the Customer_Table, then Customer_Number 1 would also hash to AMP 1. We learned how consistent the hashing process is at the beginning of this book. As a matter of fact, when these two tables are joined all matching rows will be on the same AMP because both columns in the ON clause are the Primary Indexes of their respective tables. This is Teradata Heaven! Most of the time this won't be the case, and the Parsing Engine will have to plan for one or both of the tables to be either redistributed in spool or duplicated on all AMPs. I will explain further in upcoming slides.
Joins and the Primary Index
Bad officials are elected by good citizens who do not vote.

- George Jean Nathan
The picture on the following page is designed to reinforce the point that when both columns in the ON clause are the Primary Index of their respective tables, no data needs to move because all joining rows reside on the same AMP. Notice the ON clause and notice that we are joining on Cust.Customer_Number = Ord.Customer_Number. Notice that Customer_Number is the Primary Index of both tables. This ensures that all matching rows are on the same AMP. If Customer_Number 1 of the Order_Table is hashed and goes to AMP 1, then Customer_Number 1 of the Customer_Table will also go to AMP 1. If Customer_Number 99 of the Order_Table is hashed and goes to AMP 4, then Customer_Number 99 of the Customer_Table will also go to AMP 4. Every matching row goes to the same AMP because the hashing formula is consistent, the hash map is consistent, and the process is flawless. On the upcoming pages we will soon see examples that are not so flawless!
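The reason no data has to move in this case can be sketched as follows. The model is a toy: `zlib.crc32` stands in for Teradata's hashing, the AMP count and sample rows are invented, and `amp_local_join` is a hypothetical helper that joins only rows sitting on the same AMP.

```python
import zlib

NUM_AMPS = 4  # assumed system size


def amp_for(value):
    """Stand-in hash: map a Primary Index value to an AMP number."""
    return zlib.crc32(str(value).encode("utf-8")) % NUM_AMPS


def distribute(rows, pi_column):
    """Place each row on the AMP chosen by hashing its Primary Index value."""
    amps = [[] for _ in range(NUM_AMPS)]
    for row in rows:
        amps[amp_for(row[pi_column])].append(row)
    return amps


customers = [{"Customer_Number": n, "Customer_Name": f"Cust{n}"} for n in (1, 2, 3, 99)]
orders = [{"Order_Number": 100 + i, "Customer_Number": n}
          for i, n in enumerate((1, 1, 2, 99, 99, 3))]

# Both tables use Customer_Number as their Primary Index.
cust_amps = distribute(customers, "Customer_Number")
ord_amps = distribute(orders, "Customer_Number")


def amp_local_join(cust_amps, ord_amps):
    """Join using only the rows each AMP already holds; nothing is moved."""
    joined = []
    for cust_rows, ord_rows in zip(cust_amps, ord_amps):
        by_number = {c["Customer_Number"]: c for c in cust_rows}
        for o in ord_rows:
            match = by_number.get(o["Customer_Number"])
            if match is not None:
                joined.append((o["Order_Number"], match["Customer_Name"]))
    return joined


print(len(amp_local_join(cust_amps, ord_amps)), "of", len(orders), "orders joined locally")
```

Because the same value always hashes to the same AMP, every order finds its customer without a single row crossing between AMPs.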
Redistributing Rows in Spool
The man with the best job in the country is the Vice President. All he has to do is get up every morning and say, "How's the President?"
- Will Rogers
In the example on the following page notice the ON clause in the SQL join and notice that Customer_Number is the Primary Index of the Customer_Table, but not the Primary Index of the Order_Table. Joining rows will not be on the same AMP. The Parsing Engine will instruct the AMPs to redistribute the Order_Table in spool, rehashed temporarily by Customer_Number. This is literally like making Customer_Number the Primary Index of the Order_Table for just this query. Once the hashing is redone, the rows of both tables being joined will be on the same AMP. We have tricked the system and now the join can take place. When you see the word "Redistribution" in the EXPLAIN plan, you will now know that Teradata is temporarily changing the Primary Index of a table for just one query.
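Redistribution in spool can be sketched as rehashing the same rows by a different column. This is a toy model under assumptions: `zlib.crc32` stands in for Teradata's hashing, and the AMP count and the twenty sample orders are invented.

```python
import zlib

NUM_AMPS = 4  # assumed system size


def amp_for(value):
    """Stand-in hash: map a value to an AMP number."""
    return zlib.crc32(str(value).encode("utf-8")) % NUM_AMPS


orders = [{"Order_Number": 1000 + i, "Customer_Number": i % 5 + 1} for i in range(20)]

# Base table: rows live where their Primary Index (Order_Number) hashes.
base = [[] for _ in range(NUM_AMPS)]
for o in orders:
    base[amp_for(o["Order_Number"])].append(o)


def redistribute(amps, column):
    """Copy every row into spool on the AMP chosen by hashing `column`."""
    spool = [[] for _ in range(NUM_AMPS)]
    for amp in amps:
        for row in amp:
            spool[amp_for(row[column])].append(row)
    return spool


spool = redistribute(base, "Customer_Number")
print([len(a) for a in base], "->", [len(a) for a in spool])
```

The spool copy holds exactly the same rows as the base table. Only their AMP placement changes, so that each order now sits where a Customer_Table row with the same Customer_Number would hash.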
Redistributing Rows of Both Tables
As always, victory finds a hundred fathers but defeat is an orphan.

- Count Galeazzo Ciano, The Ciano Diaries
In the example on the following page I want you to notice the Join example and especially pay attention to the ON Clause. Notice that we are again joining on Cust.Customer_Number = Ord.Customer_Number. I want you to realize in this example that Customer_Number is NOT the Primary Index of either table. So Teradata uses its trick. The Parsing Engine has the AMPs redistribute both tables as if the Primary Index was Customer_Number. Once both tables are rehashed and redistributed temporarily in spool to the AMPs the matching rows will be on the same AMP together. This is expensive in time and resources, but is the only way Teradata can get the matching rows to the proper AMP simultaneously.
Duplicating the Smaller Table
Old soldiers never die, they just fade away.

- General Douglas MacArthur
In the early years of Teradata the Parsing Engine would always redistribute one or both tables if necessary to bring matching rows together on the same AMP for join purposes. This included redistributing a large table when joining to a small table. Sometimes Teradata would redistribute a billion-row table just to join it to a table with four rows. It didn't make sense. Teradata changed the Parsing Engine to include a Big Table Small Table join. Instead of redistributing the big table, Teradata found it faster and more cost effective to copy the smaller table to all AMPs. That also satisfies the requirement to have matching rows on the same AMP. The only difference is that the smaller table is copied in its entirety to all AMPs temporarily for just this one query.
Quiz How Many Rows are in Spool?
A man's feet should be planted in his country, but his eyes should survey the world.
- George Santayana
The picture on the following page shows an example of a Big Table Small Table join. Notice that the Customer_Table has only 4 rows. Notice that the Order_Table has 4,000 rows. Now notice that the rows of both tables are spread evenly across all AMPs. Here is what Teradata is about to do. The Parsing Engine will come up with a plan to join these two tables by first commanding the AMPs to bring back all Customer_Table rows. After this process the Parsing Engine is looking at all four rows of the Customer_Table. The Parsing Engine then copies all 4 rows to every AMP. The following couple of pages will demonstrate this clearly!
Quiz Answer How Many Rows in Spool?
Experience teaches only the teachable.

- Aldous Huxley
Notice that we have 4 customers in spool on every AMP. I have placed these at the bottom of the disks in yellow. Now the join is ready to take place between the Order_Table in green and the Customer_Table, which was duplicated in spool on every AMP. Because we have four customers duplicated on four AMPs, we have 16 rows in spool (4 rows multiplied by 4 AMPs = 16). The question at the bottom asks, "What if we had 100 AMPs?" If that were the case we would have 4 rows copied to 100 AMPs, so we would have 400 rows in spool!
How Duplication Appears on Every AMP
Experience is the worst teacher; it gives the test before presenting the lesson.
- Vernon Law
As you can see in our example on the following page, we didn't move the Order_Table, but duplicated the four rows of the Customer_Table. Now you can see how easy the rows are to join. This shows only one AMP, but you can imagine the same process going on with all AMPs simultaneously because the four rows from the Customer_Table have been duplicated on all AMPs.
How Many Rows in Spool with Redistribution?
Great Spirit, help me never to judge another until I have walked in his moccasins.
- Sioux Indian Prayer
The Parsing Engine will decide whether to duplicate the smaller table or redistribute one or both of the tables to make the matching rows appear on the same AMP. The Parsing Engine is a cost-based optimizer and will attempt to do what is easiest, fastest, and moves the least amount of data. The question to be answered is, "How many rows will be in spool if the Parsing Engine decides to redistribute the Order_Table by Customer_Number in spool?" The answer is on the next couple of pages! Take a guess!
Answer to How Many Rows in Spool
When I give a lecture, I accept that people look at their watches, but what I do not tolerate is when they look at it and raise it to their ear to find out if it stopped.
- Marcel Achard
As you can see on the following page, there were 4,000 rows in the Order_Table, which we can refer to as the Base Table. Then when we redistributed the 4,000 rows by hashing the Customer_Number, there were 4,000 rows in spool. When you redistribute a table, exactly the same number of rows is merely rehashed. The only difference is that the rows will move to a different AMP. The great news is that they are moved to the same AMPs as their matching counterparts in the Customer_Table.
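The two quiz answers come down to simple arithmetic, sketched below. The function name `rows_in_spool` is invented for illustration, and the numbers are the ones used in the quizzes.

```python
def rows_in_spool(strategy: str, table_rows: int, num_amps: int) -> int:
    """Rows written to spool when a table is moved for a join."""
    if strategy == "duplicate":
        # Every AMP receives a full copy of the table.
        return table_rows * num_amps
    if strategy == "redistribute":
        # Each row is rehashed to exactly one AMP.
        return table_rows
    raise ValueError(f"unknown strategy: {strategy}")


print(rows_in_spool("duplicate", 4, 4))         # 16
print(rows_in_spool("duplicate", 4, 100))       # 400
print(rows_in_spool("redistribute", 4000, 4))   # 4000
```

Duplication grows with the number of AMPs, while redistribution always costs exactly one spool row per base row, which is one reason duplication only pays off for a small table.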
An Example of an AMP with Redistribution
It is important that students bring a certain ragamuffin, barefoot, irreverence to their studies; they are not here to worship what is known, but to question it.
- J. Bronowski, The Ascent of Man
The following page is an excellent example of a join and the importance of matching rows being on the same AMP. Notice first the Customer_Table in blue. Notice that customers 1-8 landed on this AMP. Also realize that when we rehashed the Order_Table (in yellow) by Customer_Number, the orders for customers 1-8 landed on this AMP also. Hashing brilliantly places like values together on the same AMP, and this allows Teradata to fly through joins.
The System Calendar
Don't count the days, make the days count.

- Muhammad Ali
Teradata has a built-in calendar called the System Calendar. The System Calendar is a table which has one row for each day from January 1, 1900 to December 31, 2100. I guess Teradata is Y2K compliant! The System Calendar is a fantastic tool, especially when your boss says something like, "I want to know all orders that happened on a Friday in the first full week of the month during the fourth quarter." This is usually when you update your resume rather than risk brain overload. The great news is that this is exactly the type of stuff the System Calendar was designed to handle for you. Notice on the following page that I have written SQL that will show you the System Calendar for the date of June 15, 2010. Skip a couple of pages and you will see the results of this query and understand the System Calendar much better.
Columns in the System Calendar Views
Those who dance are considered insane by those who cannot hear the music
- George Carlin
The following slide shows the type of information you can obtain by using the System Calendar. Notice some of the key entries: Day_of_Week always goes from 1-7 with 1 being a Sunday. Day_of_Calendar counts the days since January 1, 1900. Week_of_Month, Week_of_Year, and Week_of_Calendar will have a zero in them for the first partial week. The first full week will have a 1 and so on. Month_of_Calendar and Quarter_of_Calendar also start from the January 1, 1900 date. The rest are fairly self-explanatory.
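To see how a few of these columns relate to an ordinary date, here is a small Python sketch. It is not the System Calendar itself: the function `calendar_columns` is invented, it assumes that January 1, 1900 counts as day 1, and it assumes weeks run Sunday through Saturday as the Day_of_Week numbering suggests.

```python
from datetime import date

EPOCH = date(1900, 1, 1)  # first day covered by the System Calendar, per the text


def calendar_columns(d: date) -> dict:
    """Recompute a few System Calendar style columns for one date."""
    day_of_week = d.isoweekday() % 7 + 1            # 1 = Sunday ... 7 = Saturday
    day_of_calendar = (d - EPOCH).days + 1          # assumption: Jan 1, 1900 is day 1
    week_of_month = (d.day - day_of_week + 7) // 7  # 0 = partial first week
    return {
        "calendar_date": d,
        "day_of_week": day_of_week,
        "day_of_month": d.day,
        "day_of_calendar": day_of_calendar,
        "week_of_month": week_of_month,
        "month_of_year": d.month,
        "quarter_of_year": (d.month - 1) // 3 + 1,
    }


for name, value in calendar_columns(date(2010, 6, 15)).items():
    print(name, "=", value)
```

June 15, 2010 was a Tuesday in the second full week of June, so the sketch reports a day_of_week of 3 and a week_of_month of 2.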
How to use the System Calendar with Tables
I cannot imagine any condition which would cause this ship to founder. Modern shipbuilding has gone beyond that.
- E. J. Smith, Captain of the Titanic
The System Calendar is great for simple things like finding out whether you were born on a Saturday night or on a Monday, but the real gold is found when you join the System Calendar to another table in a query. On the following page we have joined our Order_Table with the System Calendar where the Order_Date from the Order_Table is equal to the Calendar_Date from the System Calendar. Once the join takes place we can use the WHERE clause to pinpoint the exact calendar information we are looking to query. In our example we wanted all orders placed in January during the first full week of the month that happened on a Friday. How about that for fancy SQL writing?
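The logic of that query can be sketched outside the database. This is only an illustration: the five sample orders are invented, and the two helper functions recompute Day_of_Week and Week_of_Month under the assumption that weeks run Sunday through Saturday with 1 meaning Sunday.

```python
from datetime import date


def day_of_week(d: date) -> int:
    """1 = Sunday ... 7 = Saturday."""
    return d.isoweekday() % 7 + 1


def week_of_month(d: date) -> int:
    """0 for the partial first week, 1 for the first full week, and so on."""
    return (d.day - day_of_week(d) + 7) // 7


orders = [
    {"Order_Number": 1, "Order_Date": date(2010, 1, 1)},   # Friday, partial week
    {"Order_Number": 2, "Order_Date": date(2010, 1, 8)},   # Friday, first full week
    {"Order_Number": 3, "Order_Date": date(2010, 1, 15)},  # Friday, second full week
    {"Order_Number": 4, "Order_Date": date(2010, 2, 5)},   # Friday, but in February
    {"Order_Number": 5, "Order_Date": date(2010, 1, 7)},   # Thursday
]

FRIDAY = 6
wanted = [
    o["Order_Number"]
    for o in orders
    if o["Order_Date"].month == 1
    and week_of_month(o["Order_Date"]) == 1
    and day_of_week(o["Order_Date"]) == FRIDAY
]
print(wanted)  # [2]
```

In the real query the join to the System Calendar supplies these columns for every Order_Date, so the WHERE clause only has to compare them to constants.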
Teradata Temporary Tables
Sure I'm helping the elderly. I'm going to be old myself some day.

- Lillian Carter, in her 80s
Teradata has three basic types of temporary tables. They are the Derived Table, the Volatile Table, and the Global Temporary Table. Each functions a little differently. The most popular is the Derived Table. This is a temporary table that is implemented inside a user's SQL and exists only for the life of the query. It is materialized in the user's spool and automatically deleted at query end. Volatile Tables can be created by any user and are materialized with an INSERT/SELECT statement. A Volatile Table is available only to the user who created it, and it stays available to that user until the user logs off their session. It also takes up the user's spool. Global Temporary Tables are created by a user and are materialized in Temp Space. Any user with Temp Space can create a Global Temporary Table. The table structure will exist until the user DROPs the table. What is interesting about the Global Temporary Table is that after it is populated the data is available until the user logs off the session, but the table definition stays. Many users can use the table definition if they have Temp Space, and each user who performs an INSERT/SELECT on the table will in a sense have their own version of the table that only they can access.
Derived Tables
My best friend is the one who brings out the best in me.
- Henry Ford
On the following page you can see an example of a Derived Table. Derived Tables are created inside a user's query for only the life of that query. Notice that after the FROM clause we have placed parentheses (colored in yellow) around the derived query. We have also named the Derived Table TeraTom and then given a name (AVGSAL) to the single column we have created inside the Derived Table TeraTom. We can use the column AVGSAL in the SELECT list or later on in the WHERE clause. Derived Tables are often used with aggregates and serve as a temporary space to query and hold data for the life of a query. In our example on the following page we were able to compare each employee's salary to the average salary of all employees to see who was making more than the average.
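The logic of the Derived Table example can be sketched in plain Python. The employee names and salaries below are invented, and `derived_avgsal` is a hypothetical stand-in for the derived query that produces the AVGSAL column.

```python
employees = [
    {"Last_Name": "Jones", "Salary": 40000.00},
    {"Last_Name": "Smith", "Salary": 50000.00},
    {"Last_Name": "Chang", "Salary": 60000.00},
    {"Last_Name": "Davis", "Salary": 90000.00},
]


def derived_avgsal(rows):
    """Plays the role of the derived query: one row, one column (AVGSAL)."""
    return sum(r["Salary"] for r in rows) / len(rows)


# The value exists only for the life of this "query".
avgsal = derived_avgsal(employees)

# The outer query compares every employee to the derived value.
above_average = [
    (r["Last_Name"], r["Salary"], avgsal)
    for r in employees
    if r["Salary"] > avgsal
]
print(above_average)
```

The aggregate is computed once, held temporarily, and then compared against every row, which is exactly the job the Derived Table does inside the SQL.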
A Query Pictorial Example with a Derived Table
Never go to a doctor whose office plants have died.

- Erma Bombeck
The example on the next page shows a pictorial of the AMPs and their disks utilizing a Derived Table. The Derived Table was created and holds the average salary for all employees. As you can see in the picture, the Derived Table is called TeraTom and it holds a single column called AVGSAL. The average salary happened to be $59,583.00. Now I want you to notice Spool 1. This is the answer set from each AMP. It comes from comparing each employee's Salary in the Employee_Table rows (seen at the top of each disk) with the average salary stored in the Derived Table. Each employee who is making more than the average salary will be placed inside Spool 1 so it can be returned to the user.
Volatile Tables
If all my possessions were taken from me, with the exception of one, I would choose to keep the power of Speech. For with it, I would soon regain all the rest.
- Daniel Webster
Notice the picture on the following page and the 3 steps in using a Volatile Table. The first step is to CREATE the Volatile Table. This table will now be available until the end of the session or until the user decides to DROP the Volatile Table. The second step is to populate the Volatile Table with an INSERT/SELECT statement. The fun actually starts with the third step, because this table is now available for the user to query. Only the user who created the Volatile Table has access to it. That user can run any number of queries and joins against their Volatile Table until session end. All Volatile Tables use the user's spool space to populate the table.
How to Populate a Volatile Table
The brain is a wonderful organ. It starts working the moment you get up in the morning, and does not stop until you get into the office.
- Robert Frost
The following page shows an excellent example of the first two steps: the CREATE statement for the Volatile Table and the materialization of the Volatile Table with an INSERT/SELECT. Now I want you to notice that our Volatile Table named Aggy is materialized in Spool Space, but appears much like a real table. The rows are spread evenly across all the AMPs and this table is ready for action. It can be queried or joined to other tables.
Global Temporary Tables
In case you're worried about what's going to become of the younger generation, it's going to grow up and start worrying about the younger generation.
- Roger Allen
The example on the following page shows all three steps for using a Global Temporary Table, but the actual CREATE (step 1) only needs to be performed once by the original creating user. After that CREATE statement the table structure will remain in Teradata until the user who created it actually DROPs the table. Global Temporary Tables can be used by any user who has Temp Space. Many users can perform step 2 simultaneously and each will have their own copy of the Global Temporary Table. After their session has ended the data will automatically be deleted, but the table structure will remain.
A Pictorial of a Global Temporary Table
There are three great friends: an old wife, an old dog, and ready money.
- Benjamin Franklin
In the picture on the following page you can see we have done an INSERT/SELECT into our GlobAggy table. This table is a Global Temporary Table that was created previously. The table is now materialized with data inside Temp Space. The rows of the table are spread evenly across the AMPs, and this table can be queried or joined with other tables until session end. If 1,000 users did an INSERT/SELECT on GlobAggy, then all 1,000 users would have their own version of this table and nobody could access another user's copy.
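The split between one shared definition and many private copies of the data can be modeled with a small class. This is a conceptual sketch only: the class `GlobalTemporaryTable`, its methods, the column names, and the session ids are all invented for illustration.

```python
class GlobalTemporaryTable:
    """One shared definition; rows are kept separately for every session."""

    def __init__(self, name, columns):
        self.name = name
        self.columns = tuple(columns)  # the definition, created once
        self._rows_by_session = {}     # each session's private data

    def insert(self, session_id, row):
        if len(row) != len(self.columns):
            raise ValueError("row does not match the table definition")
        self._rows_by_session.setdefault(session_id, []).append(tuple(row))

    def select(self, session_id):
        """A session sees only the rows it inserted itself."""
        return list(self._rows_by_session.get(session_id, []))

    def logoff(self, session_id):
        """Session end: that session's rows vanish, the definition stays."""
        self._rows_by_session.pop(session_id, None)


glob_aggy = GlobalTemporaryTable("GlobAggy", ["Dept_No", "Sum_Sal"])
glob_aggy.insert("session_1", (100, 48850.00))
glob_aggy.insert("session_2", (200, 89888.88))

print(glob_aggy.select("session_1"))  # only session_1's row
glob_aggy.logoff("session_1")
print(glob_aggy.select("session_1"))  # empty again
print(glob_aggy.columns)              # definition is still there
```

Each session works with its own copy of the data, while the single table definition outlives all of them until someone drops it.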
What Happens to Global Tables after the Session
Forgiveness does not change the past, but it does enlarge the future.
- Paul Boese
The following page shows that the user has logged off their session. The data that used to fill the GlobAggy table has been deleted automatically, but the table structure stays available on Teradata. Why? Check out the next couple of pages for the answer.
Global Temporary Tables and Temp Space
You're alive. Do something. The directive in life, the moral imperative was so uncomplicated. It could be expressed in single words, not complete sentences. It sounded like this: Look. Listen. Choose. Act.
- Barbara Hall, A Summons to New Orleans, 2000
The Global Temporary Table structure is not deleted at session end. Only the data inside the table is deleted at session end. Why? So that many users can materialize their own version of the Global Table. This helps in multiple ways: 1) Users populate the table using their Temp Space, which leaves them more Spool Space to actually query the table. 2) Most users won't have PERM Space, so they can't create tables, or they may not know the syntax to create a table. The table structures have been created for them, and all they need to do is perform an INSERT/SELECT to populate their own version of the table.
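The behavior described above (one shared table structure, a private copy of the data per session, data removed at session end) can be pictured with a small Python model. This is only an illustrative sketch, not Teradata code: the class name, the session identifiers, and the column names Dept_No and Sum_Sal are assumptions made up for the example.

```python
class GlobalTempTable:
    """Toy model: one shared definition, one private row set per session."""

    def __init__(self, name, columns):
        self.name = name
        self.columns = tuple(columns)   # the structure persists until the table is dropped
        self._rows_by_session = {}      # session id -> that session's private rows

    def insert_select(self, session_id, rows):
        # Materialize (or extend) this session's own copy of the table.
        self._rows_by_session.setdefault(session_id, []).extend(rows)

    def select(self, session_id):
        # A session only ever sees its own rows.
        return list(self._rows_by_session.get(session_id, []))

    def end_session(self, session_id):
        # Logging off deletes the session's data; the definition stays.
        self._rows_by_session.pop(session_id, None)


# Hypothetical table and data, for illustration only.
glob_aggy = GlobalTempTable("GlobAggy", ["Dept_No", "Sum_Sal"])
glob_aggy.insert_select("session_1", [(10, 75000.50)])
glob_aggy.insert_select("session_2", [(20, 116089.90), (30, 239491.50)])
glob_aggy.end_session("session_1")   # session_1 data is gone, structure remains
```

After session_1 ends, its rows are gone, session_2 still sees its own two rows, and the column definition is unchanged.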
V13 No Primary Index Tables
No one is so generous as he who has nothing to give.

French Proverb
New in Teradata V13, the DBA has the ability to CREATE tables without a Primary Index! These tables are designed to merely spread the rows randomly and evenly. They are called NoPI tables, which stands for No Primary Index tables. A NoPI table is designed for ETL staging, so data can be quickly loaded from flat files taken from operational systems such as Oracle or DB2. This might be data that needs to be massaged or transformed. Once the transformation has been completed, the DBA can write an INSERT/SELECT command and quickly load the data from the staging table into a Teradata table that has a Primary Index. Although you can query a NoPI table or JOIN it with a traditional table containing a Primary Index, NoPI tables are really meant to import data into Teradata quickly and temporarily so it can be transformed inside Teradata and then loaded into the data warehouse tables.
NoPI CREATE Statement
The Constitution only gives people the right to pursue happiness. You have to catch it yourself.
Ben Franklin
On the following page you can see the NoPI CREATE statement. The choice is made when you create the table. This can be done with normal SQL, as seen on the following page, or with the FastLoad or TPump load utilities. The key words to focus on are NO PRIMARY INDEX, highlighted for your convenience.
NoPI Row-ID Increments the Uniqueness Value
It's not the size of the dog in the fight, but the size of the fight in the dog.
Archie Griffin
Each AMP will receive a roughly equal number of rows as the Parsing Engine attempts to spread the data evenly. Notice the picture on the following page. On a given AMP, the Row Hash for every row in the NoPI table is the same. Only the Uniqueness Value is incremented.
NoPI Row-Hash Different on each AMP
When all you have is a hammer, you tend to see every problem as a nail.
- Abraham Maslow
The example on the next page shows that the Row Hash on each AMP is different, but once the Row Hash is established on an AMP, all rows on that AMP carry that exact same Row Hash and the AMP only increments the Uniqueness Value. NoPI tables don't need to be sorted, and that is another main advantage if you want to CREATE a staging table.
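The Row-ID scheme just described can be sketched in a few lines of Python. This is a toy model under stated assumptions: the AMP names, the four-digit row hash strings, and the round-robin dealing of rows are all illustrative choices (the text only says rows are spread randomly and evenly), not Teradata's actual algorithm.

```python
from itertools import cycle


def load_nopi(rows, amp_row_hashes):
    """Deal rows round-robin to AMPs (an assumption). Each AMP reuses its one
    row hash and increments only the uniqueness value."""
    tables = {amp: [] for amp in amp_row_hashes}
    next_uniqueness = {amp: 1 for amp in amp_row_hashes}
    for row, amp in zip(rows, cycle(amp_row_hashes)):
        row_id = (amp_row_hashes[amp], next_uniqueness[amp])
        tables[amp].append((row_id, row))   # appended in arrival order, never sorted
        next_uniqueness[amp] += 1
    return tables


# Hypothetical per-AMP row hashes, for illustration only.
amp_row_hashes = {"AMP1": "0001", "AMP2": "0002", "AMP3": "0003"}
tables = load_nopi(["a", "b", "c", "d", "e", "f", "g"], amp_row_hashes)
```

Within one AMP every Row-ID shares the same row hash and differs only in the uniqueness value, so new rows can simply be appended.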
NoPI Options and Facts
Failure accepts no alibis. Success requires no explanation.

Robert Rose
The example on the next page describes the options and facts about NoPI Tables.
NoPI Restrictions
He who asks a question may be a fool for five minutes, but he who never asks a question remains a fool forever.
Tom Connelly
The example on the next page shows the restrictions of NoPI Tables.
Write Ahead Logging (WAL)
The reputation of a thousand years may be determined by the conduct of one hour.
Japanese Proverb
Teradata has traditionally taken a Before Picture of any row being UPDATED or DELETED. This was always called the Transient Journal, and Data Integrity was its main purpose. If a transaction was going to UPDATE or DELETE a row, the BEFORE PICTURE of the row would be taken and stored in a journal in case a ROLLBACK was done or in case there was a glitch in the system. Now this function is done by the Write Ahead Log, or WAL. There are two main pieces to WAL: the WAL Log and the WAL Depot. The WAL Log takes a BEFORE and AFTER Picture of a row being UPDATED, and each AMP has its own WAL Log to make sure that Teradata can Rollback or Commit the transaction. The WAL Depot stores blocks of UPDATED rows it receives from FSG Cache to provide a backup copy of the changes, at which point the COMMIT is considered done. Teradata can then WRITE the block of data to the actual table on disk when it deems it a good time to do so. Teradata uses the WAL Log and WAL Depot for transaction integrity in order to Commit or Rollback the data.
AMPs have FSG Cache for the Memories
Never insult an alligator until after you have crossed the river.
African Proverb
Memory is a thousand times faster than disk, so AMPs attempt to keep hot tables in memory for fast retrieval. This memory dedicated to each AMP is called the File Segment Cache (FSG Cache). This pool of memory is like each AMP having its own swimming pool that only it can use. Let's imagine that 100 users have been querying the Order_Table. Each AMP will be reading the Order_Table hundreds of times in order to provide answer sets back to the users. All AMPs will attempt to keep their Order_Table rows inside FSG Cache in order to speed up reads and writes. Remember that each AMP has its own FSG Cache memory, its own WAL Log, and its own WAL Depot.
An Example of an UPDATE Statement
Let every nation know, whether it wishes us well or ill, that we shall pay any price, bear any burden, meet any hardship, support any friend, oppose any foe, in order to assure the survival and the success of liberty.
- John F. Kennedy (Inaugural Address, 1961)
I want to run you through the process of UPDATING a row. The picture on the following page shows the UPDATE statement at the top. Notice the Department_Table inside the AMP. We are going to UPDATE this row and change the Dept_Name from Human to HR. Turn the page and see what happens next!
AMP Local WALs
The believer is happy, the doubter wise

Greek Proverb
The example on the next page is designed to introduce the AMP's WAL Log. This WAL Log stores a BEFORE image of the entire row being updated in order to back up the row in case something goes wrong. With a backup copy of the row, Teradata is confident it can Rollback the UPDATE and put things back exactly the way they were BEFORE the transaction. The BEFORE picture inside the WAL Log is called the UNDO record because it is designed to UNDO an attempted change to the row. This is the "Whoops, I made a mistake" button. Once the transaction has been completed, the row in the WAL Log can be erased.
AMPs UPDATE Rows in FSG Cache
I have found the best way to give advice to your children is to find out what they want and then advise them to do it.
Harry S. Truman (1884 - 1972)
When an AMP is commanded to UPDATE a row, that AMP finds the row inside a data block on its virtual disk and transfers the block into FSG Cache inside the node. Now the AMP can process the rows as fast as lightning. Moving data blocks into FSG Cache is how an AMP READs, UPDATEs, or DELETEs a row.
Write to WAL then Write Back to Disk
You are educated when you have the ability to listen to almost anything without losing your temper or self-confidence.
Robert Frost
The WAL Log takes a BEFORE picture so a ROLLBACK can be performed, and it takes an AFTER picture as a temporary backup to ensure integrity until the AMP physically writes the data back to its virtual disk. The AMP will UPDATE the row inside its FSG Cache, then write the AFTER image of the row to the WAL Log. Now the WAL Log contains a BEFORE and AFTER picture of the row. The AMP can send a message to the PE that the row has been updated. The row really hasn't been completely updated because it hasn't physically been written back to the AMP's disk. Only the WAL Log rows were written to the AMP's disk. The AMP is confident that it can write the update back to its physical disk for permanent storage when it deems it most efficient to do so. Plus, the AMP has added insurance because the WAL Log has both the BEFORE and AFTER image. Even in a disaster where the Teradata system goes down, the AMP knows that when Teradata is rebooted it can use the WAL Log to catch up and complete the COMMIT or ROLLBACK.
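The write-ahead idea in this section (change the row in cache, log BEFORE and AFTER images, write to disk later, and use the log to roll back or catch up) can be illustrated with a toy Python model. Everything here is an assumption for illustration: the class and method names are invented, rows are plain dictionary entries, the key 1 stands in for the department row, and the recovery method assumes the logged transaction had been committed before the restart.

```python
class Amp:
    """Toy AMP: a 'disk' dict, an FSG cache dict, and a WAL log list."""

    def __init__(self, disk_rows):
        self.disk = dict(disk_rows)   # rows as stored on the AMP's disk
        self.cache = {}               # rows currently held in FSG cache
        self.wal_log = []             # (row_key, before_image, after_image)

    def update(self, key, new_value):
        before = self.cache.setdefault(key, self.disk[key])  # bring the row into cache
        self.cache[key] = new_value                          # change it in memory only
        self.wal_log.append((key, before, new_value))        # log BEFORE and AFTER images

    def rollback(self):
        # Undo in reverse order using the BEFORE images.
        for key, before, _after in reversed(self.wal_log):
            self.cache[key] = before
        self.wal_log.clear()

    def recover_after_restart(self):
        # The cache is lost in a crash; redo the logged (assumed committed)
        # changes from the AFTER images.
        self.cache = {}
        for key, _before, after in self.wal_log:
            self.disk[key] = after
        self.wal_log.clear()

    def write_back(self):
        # Lazy write of cached rows to disk; the log entries are then obsolete.
        self.disk.update(self.cache)
        self.wal_log.clear()


amp = Amp({1: "Human"})            # hypothetical row key 1 holding Dept_Name
amp.update(1, "HR")
disk_before_write = amp.disk[1]    # the disk still holds the old value here
amp.recover_after_restart()        # simulate a crash before the lazy write
```

The point of the sketch is the ordering: the log is written before the table on disk, so either image can be replayed after a failure.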
The WAL Depot
Even if I knew that tomorrow the world would go to pieces, I would still plant my apple tree.
Dr. Martin Luther King, Jr.
The AMP could have many updates to rows inside a block of data. Before the AMP writes the block back to its physical disk it writes the entire block to the WAL Depot. Now it has a backup of the entire block of data in case something goes wrong. If the AMP writes the data from FSG Cache back to its physical disk successfully the WAL Depot backup copy can be erased. The WAL Depot only serves as a backup copy in case a problem should occur.
Clearing out the WAL Depot and the WAL Log
You don't drown by falling into the water; you drown by staying in the water.
-Edwin Louis Cole
The example on the next page shows a pictorial of the erasing of the WAL Log and WAL Depot. The changes have been made, and there is no more reason to keep a backup of the rows and blocks that were changed. These have been written back to the AMP's physical disk successfully. Think of the WAL Log and WAL Depot as wearing a seat belt when you are driving a car. You take off your seat belt when you get home and leave the car, don't you?
V13 Teradata Virtual Storage (TVS)
To succeed... you need to find something to hold on to, something to motivate you, something to inspire you.
-Tony Dorsett
Teradata Virtual Storage, or TVS for short, is one of the most exciting improvements Teradata has made. TVS changes the way AMPs access their disks. TVS manages the disks for each AMP. This will be explained throughout the chapter, but the following page shows some of the topics and advantages that TVS brings to Teradata.
AMPs in the 1980s
Alone we can do so little; together we can do so much.

Helen Keller
In the 1980s and early 1990s an AMP connected to its physical disks with JBOD technology. JBOD stood for Just a Bunch of Disks. It meant that four disks were available for the AMP to store its data rows. The disks did not provide any protection features such as RAID, so any single disk failure could be a disaster. Back in the early days the disks weren't exactly reliable either, so Teradata tables were almost always FALLBACK protected. FALLBACK means that a backup, or FALLBACK, copy of an AMP's rows is stored on other AMPs within its cluster.
AMPs in the 1990s
Looking to the stars always makes me dream, as simply as I dream over the black dots representing towns and villages on a map. Why, I ask myself, shouldn't the shining dots of the sky be as accessible as the black dots on the map of France?
-Vincent Van Gogh
In the 1990s an AMP still had one Virtual Disk and four Physical Disks, but the disks were mirrored. The AMP would store data on one disk and then mirror that disk in case of a failure. As you can see on the following page, each AMP had two disks for data and two mirrored disks. FALLBACK wasn't a necessity anymore because of the disk protections.
Data Blocks and Cylinders make up a Disk
I never lost a game; time just ran out on me.

Michael Jordan
Each AMP stores its data blocks inside cylinders on the disk. Each disk is made up of thousands of cylinders, and data blocks are stored inside the cylinders.
Cylinders are dedicated to Perm, Spool, etc.
One's dignity may be assaulted, vandalized, and cruelly mocked, but it cannot be taken away unless it is surrendered.
- Michael J. Fox
Different cylinders store different types of data. For example, some cylinders will be used to hold Permanent Data, while entirely different cylinders will be used for Spool files. The following page shows you the types of data stored in cylinders. Cylinders are never shared between types. This means you can't use a single cylinder to store permanent data and spool data simultaneously. Once the first row of a table is written to a cylinder as PERM Space, that cylinder can't also be used to store Spool files. Cylinders are dedicated to PERM, SPOOL, TEMP, Permanent Journals, or WAL Logs.
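The "no cylinder share" rule can be shown with a minimal Python sketch. This is a hypothetical model for illustration only: the class, the type labels, and the error messages are invented, and the real file system is far more involved.

```python
class Cylinder:
    """Toy cylinder: the first write dedicates it to one kind of data."""

    ALLOWED = {"PERM", "SPOOL", "TEMP", "PERM_JOURNAL", "WAL"}

    def __init__(self):
        self.kind = None     # set by the first write
        self.blocks = []

    def write(self, kind, block):
        if kind not in self.ALLOWED:
            raise ValueError(f"unknown data type: {kind}")
        if self.kind is None:
            self.kind = kind                 # first write dedicates the cylinder
        elif self.kind != kind:
            raise ValueError(f"cylinder is dedicated to {self.kind}, not {kind}")
        self.blocks.append(block)


cyl = Cylinder()
cyl.write("PERM", "first row of a table")    # the cylinder is now PERM only
```

A later attempt to write Spool data to `cyl` raises an error in this model, which mirrors the rule that a cylinder holding PERM rows cannot also hold Spool files.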
Outside Disk Tracks are much Faster
Make sure you have finished speaking before your audience has finished listening.
- Dorothy Sarnoff
The example on the next page shows cylinders that sit on top of a disk platter. Notice the outside of the disk and see how many more cylinders there are compared with the inside track of cylinders. This makes the outside track faster, because with one revolution of the spinning disk the system can read many more cylinders on the outside track. For now, simply realize that reads and writes on the outer tracks are considered faster, and the inner tracks, which hold fewer cylinders, are considered slower.
AMPs assigned Disk Cylinders, not Entire Disks
There's no point in being grown up if you can't be childish sometimes.

- Doctor Who
Teradata TVS uses software to manage the disks for the AMPs. This software is called VSS. The VSS software assigns cylinders to the AMPs. In the past, AMPs were assigned entire disks and each AMP owned all the cylinders inside their disks. Now a single disk can be allocated cylinder by cylinder to all AMPs within the Clique. This will prove to be helpful in many ways. Keep reading!
Hot, Warm, and Cold Data
It doesn't make a difference what temperature a room is, it's always room temperature.
- Steven Wright
TVS will place data that is being accessed often on the outer tracks of the disks. This is done so Teradata users can feel the need for speed. TVS will also place the data that is not being accessed very often on the inside tracks of the disk. This is called a Multi-Temperature data warehouse. TVS automatically gathers metrics about how often each cylinder is accessed and moves the data blocks inside the most frequently accessed cylinders to the outer tracks of the disk to improve access speeds.
The old way Teradata had to add Disk Space
They always say time changes things, but you actually have to change them yourself.
- Andy Warhol
In the past you needed to pretty much double your disk space if you needed more space added to your Teradata system. Notice in the picture on the following page that the system on the top of the picture shows each AMP connected to only two physical disks. The upgraded system doubled the disks to four and we doubled our system size. This is considered very expensive.
Doubling the Disk Capacity
The only difference between a rut and a grave is in their dimensions.

- Ellen Glasgow
Teradata also allowed for replacing smaller disks with larger disks. In the picture on the following page you can see the top system has 146 GB disks and the upgraded picture on the bottom of the page shows 300 GB disks. This is again pretty much doubling the space on your system. TVS will allow for much smaller increments of space. Read on!
Incremental Disk Growth Is Here
Where facts are few, experts are many.

- Donald R. Gannon
The example on the next page shows how TVS can add additional space to a Teradata system by adding a mixture of disk sizes and incremental disk additions. TVS can take a single disk and assign each AMP in the clique certain cylinders. Teradata used to assign AMPs physical disks, but TVS assigns AMPs to certain cylinders. How clever! One concept stays the same. Each AMP has its own virtual disk that only it can access. Instead of reading and writing to cylinders on four dedicated disks the AMPs read and write to cylinders managed by TVS.
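To make the shift from "AMPs own disks" to "AMPs own cylinders" concrete, here is a small Python sketch. It is a simplified illustration, not the VSS algorithm: the function name, the disk and AMP names, and the round-robin assignment are assumptions chosen only to show one disk being shared cylinder by cylinder.

```python
def allocate_cylinders(disk_name, cylinder_count, amps):
    """Hand out the cylinders of one disk to the AMPs of a clique, one at a time.

    Round-robin assignment is an assumption made for this sketch.
    """
    assignment = {amp: [] for amp in amps}
    for cyl in range(cylinder_count):
        amp = amps[cyl % len(amps)]
        assignment[amp].append((disk_name, cyl))   # each cylinder has exactly one owner
    return assignment


# One newly added disk with 10 cylinders, shared by a hypothetical 4-AMP clique.
virtual_disks = allocate_cylinders("new_disk", 10, ["AMP1", "AMP2", "AMP3", "AMP4"])
```

Each AMP ends up with its own private set of cylinders (its slice of the virtual disk), even though all of them live on the same physical disk.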
Mixed Disks and Solid State Drives
Just because something doesn't do what you planned it to do doesn't mean it's useless.
- Thomas Edison
This is the most exciting part about TVS. Teradata is mixing Solid State Drives with traditional spinning disk drives. Solid State Drives are faster because they use flash technology. Yes, we are talking about the same kind of flash drives you have used to copy a file from one computer to another. These are called Solid State Drives, or SSDs, and they are 100 times faster than traditional disks. This is really like having memory speed on physical disk. TVS will place the hot data on the Solid State Drives, the warm data on the faster 146 GB disks, and the data that isn't accessed very much, referred to as cold data, on the larger, slower spinning disks. This is referred to as a Multi-Temperature data warehouse.
Solid State Drives are like Giant Flash Drives
It isn't the mountains ahead that wear you out, it's the grain of sand in your shoe.
- Robert Service
The goal for Teradata is to eventually have nothing but Solid State Drives for its storage, but for now the costs are too high. This will happen once the costs come down.
Virtual Storage Metrics
Science is facts; just as houses are made of stones, so is science made of facts; but a pile of stones is not a house and a collection of facts is not necessarily science.
- Henri Poincare
TVS gathers metrics about cylinders so it can determine the hot, warm, and cold data. This is done in the background. TVS will actually move about 10% of the data each week to the appropriate disk types and appropriate tracks on disks. The DBA can also command the system to move the data inside the cylinders to their respective hot, warm, or cold areas.
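A rough Python sketch of metric-driven temperature assignment follows. It is a toy under stated assumptions: the function name, the 20% hot and 20% cold shares, the cylinder names, and the access counts are all invented for illustration, and the real TVS logic is not documented here. The media mapping simply restates the placement described earlier (hot on Solid State Drives, warm on the faster disks, cold on the larger, slower disks).

```python
def classify_cylinders(access_counts, hot_share=0.2, cold_share=0.2):
    """Rank cylinders by access count and label them hot, warm, or cold.

    The share values are illustrative assumptions, not TVS settings.
    """
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    n_hot = int(len(ranked) * hot_share)
    n_cold = int(len(ranked) * cold_share)
    labels = {}
    for position, cyl in enumerate(ranked):
        if position < n_hot:
            labels[cyl] = "hot"
        elif position >= len(ranked) - n_cold:
            labels[cyl] = "cold"
        else:
            labels[cyl] = "warm"
    return labels


# Where each temperature would be placed, per the description in the text.
TIER_MEDIA = {
    "hot": "Solid State Drives",
    "warm": "faster spinning disks",
    "cold": "larger, slower spinning disks",
}

# Hypothetical access counts gathered for five cylinders.
counts = {"c1": 900, "c2": 15, "c3": 420, "c4": 3, "c5": 77}
labels = classify_cylinders(counts)
placement = {cyl: TIER_MEDIA[label] for cyl, label in labels.items()}
```

With these made-up counts, the most accessed cylinder (c1) is labeled hot, the least accessed (c4) is labeled cold, and the rest are warm.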
The Two Modes of Virtual Storage
Good design can't fix broken business models.

- Jeffrey Veen
TVS has two modes: TT (Teradata Traditional) and Intelligent Placement. Current customers upgrading from a previous Teradata version will use TT mode. New customers will use Intelligent Placement. The specifics of both are listed on the following page.
What is a Row Hash Lock?

A Row Hash lock always involves a 1-AMP operation where the Primary Index is utilized in the WHERE clause of the query. Instead of locking the entire table and possibly making other users wait, Teradata will only lock the rows that have the same Row Hash as the value in the WHERE clause. In our example you can see that we want to SELECT * from the Employee_Table WHERE Last = 'Jones'. Since the column Last is the Primary Index of the Employee_Table, the Parsing Engine comes up with a plan that is a 1-AMP operation. It knows which AMP holds all the rows where the last name is Jones. Since it is a SELECT statement, Teradata places a READ lock at the Row Hash level. As you can see in the picture below, all rows with a Row Hash of 0001 are locked and the query can be satisfied without locking the entire table. Even the other rows on AMP 55 are still accessible by other users.
SELECT * FROM Employee_Table WHERE Last = 'Jones' ;

1. The PE knows that Last is the Primary Index. It hashes Jones and the Row Hash is 0001. The PE now knows AMP 55 holds the row(s) in this 1-AMP Operation.
2. A Row Hash Lock is placed on all rows on AMP 55 that have a Row Hash of 0001.

AMP 55 (Last is the Primary Index)
Row_ID   Last     First   Emp#   Dept   Salary
0001,1   Jones    Joe     61     10     50000.00
0001,2   Jones    Mary    65     20     64000.50
0001,3   Jones    Dave    63     30     84000.60
0001,4   Jones    Sandy   3      30     90490.90
0001,5   Jones    Sue     7      10     25000.50
0001,6   Jones    Bill    51     20     26089.40
0101,1   Bjorn    Jill    68     40     85000.40
0110,1   Patel    Ty      69     30     65876.40
0111,1   Noone    Mo      24     20     86900.40
1000,1   Gore     Jay     49     10     58000.50
1001,1   Samson   Mick    22     30     45000.40
1010,1   Ruler    May     8      20     86000.89
1011,1   Baker    Jan     11     30     65000.50
1100,1   Doron    Hanna   12     20     56450.00
1101,1   Mistel   Hans    67     30     98654.00
1110,1   Wan      Tan     23     40     87659.50
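The two numbered steps in the picture (hash the Primary Index value to find the AMP, then lock only that Row Hash) can be mimicked in Python. This is a sketch under loud assumptions: CRC32 is a stand-in for Teradata's hashing algorithm, the AMP count of 4 is arbitrary, and the lock class is invented for illustration.

```python
import zlib

AMP_COUNT = 4   # arbitrary size for this toy system


def row_hash(pi_value):
    # Stand-in hash (CRC32); Teradata's real hashing algorithm is different.
    return zlib.crc32(pi_value.encode("utf-8"))


def amp_for(pi_value):
    # The row hash alone decides which AMP owns the row.
    return row_hash(pi_value) % AMP_COUNT


class RowHashLocks:
    """Tracks READ locks at the (AMP, row hash) level instead of per table."""

    def __init__(self):
        self._read_locked = set()

    def lock_for_read(self, pi_value):
        key = (amp_for(pi_value), row_hash(pi_value))
        self._read_locked.add(key)
        return key

    def is_locked(self, pi_value):
        return (amp_for(pi_value), row_hash(pi_value)) in self._read_locked


locks = RowHashLocks()
locked_amp, locked_hash = locks.lock_for_read("Jones")   # a 1-AMP operation
```

Every row whose Primary Index value hashes to `locked_hash` is covered by the lock, which is why all six Jones rows are locked together. Rows with any other row hash, even on the same AMP, stay available.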
Chapter 6 Loading the Data
My son once told me he did not feel like studying. I said to him, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."
Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But, not Teradata!
Abraham Lincoln will go down as one of the greatest presidents in history, but Teradata is even better because it will not go down when it loads history.
Tom Coffing, 1st president of Coffing Data Warehousing
Teradata is so advanced in the data-loading department that other database vendors can't hold a candle to it. A Teradata data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading of data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on, ASCII stupid question and get a stupid ANSI. Data warehouses fail because customers cannot load the data fast enough once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use good technique to lift the weight over his head. While most database vendors are new to the data warehouse game, Teradata has had 15 years of experience loading the largest data warehouses in the world. The combination of FastLoad, MultiLoad, and TPump can load millions, even billions, of records in record time.
FastLoad is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. How is Teradata's speed and performance accomplished? Once again it's through the power of parallel processing.
Where FastLoad is meant to populate empty tables with INSERTs, MultiLoad is meant to process INSERTs, UPDATEs, and DELETEs on tables that have existing data. MultiLoad is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes nightly during its batch window.
The TPump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata, more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. TPump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with the power of a Decision Support System (DSS). The TPump utility theoretically acts like a water faucet. TPump can be set to full throttle to load millions of transactions during off-peak hours or turned down to trickle small amounts of data during the data warehouse daily rush hour. It can also be preset to load at different levels at certain times during the day, and can be modified at any time. Also, TPump locks at the row level, so users have access to the rest of the rows while the table is being loaded. Another advantage of this load utility is that it allows multiple updates to be conducted on a table simultaneously.
When the utilities start, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers passing sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received and hash the Primary Index value, sending the rows over the BYNET to their destination AMP. Once this is done, each AMP sorts its data by Row ID and the table is ready for business.
FastLoad
If you are all wrapped up in yourself, you are overdressed

Kate Halverson
The Teradata FastLoad utility is wrapped up in your data, and even though it appears underdressed without fancy dressings, it is one of the best utilities ever built. It may not be dressed to kill, but it is designed to thrill! FastLoad is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. Teradata has the quickest time to solution and the most powerful performance in the data warehousing industry. How is Teradata's speed and performance accomplished? It's done through parallel processing.
FastLoad understands one SQL command: INSERT. It inserts rows into an empty table. The process is as follows. A flat file is prepared for loading on a mainframe or LAN. The FastLoad utility needs three pieces of information to process: where the flat file is located, what its file definition is, and which Teradata table the data should be loaded into. When the FastLoad utility starts, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers passing sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received, hash the rows in those blocks (in parallel), and send the rows to the proper AMP over the BYNET. Once this is done, each AMP sorts its data by Row ID and the table is ready for business.
FastLoad Basics: Loads data to Teradata from a Mainframe or LAN flat file; Only one table may be loaded at a time; The table to be loaded must be empty; There can be no secondary indexes, referential integrity, or triggers; It locks at the table level.
FastLoad populates empty tables at the block level. Teradata LOADs using FastLoad.
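The load flow described above (blocks dealt to AMPs, rows hashed and redistributed to the owning AMP, then a sort on each AMP) can be sketched in Python. This is an illustrative simulation only: the function name, CRC32 as a stand-in hash, the use of the hash as a stand-in for the Row ID, and the tiny block size are assumptions, and nothing here is FastLoad script syntax.

```python
import zlib


def fastload(records, amp_count, block_size, pi_of):
    """Simulate the load flow: deal blocks, redistribute by hash, then sort."""
    # Step 1: deal incoming blocks to the AMPs without looking at the rows.
    received = [[] for _ in range(amp_count)]
    blocks = [records[i:i + block_size] for i in range(0, len(records), block_size)]
    for n, block in enumerate(blocks):
        received[n % amp_count].extend(block)

    # Step 2: each AMP hashes the Primary Index value of every row it received
    # and forwards the row to the AMP that owns that hash.
    owned = [[] for _ in range(amp_count)]
    for rows in received:
        for row in rows:
            h = zlib.crc32(str(pi_of(row)).encode("utf-8"))  # stand-in hash
            owned[h % amp_count].append((h, row))

    # Step 3: each AMP sorts its rows by row hash (standing in for Row ID).
    for rows in owned:
        rows.sort(key=lambda pair: pair[0])
    return owned


# Hypothetical input: 100 records whose first field is the Primary Index.
records = [(i, f"name{i}") for i in range(100)]
owned = fastload(records, amp_count=4, block_size=8, pi_of=lambda r: r[0])
```

The first step is fast because no AMP needs to inspect the data it receives. The hashing work in the second step is spread across all AMPs at once.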
FastLoad Picture
[Figure: an input file from a mainframe or LAN is sent in 64K blocks through the PE to the AMPs, each of which holds part of an empty table.]
FastLoad inserts into empty tables at the Block Level. No Secondary Indexes, Referential Integrity or Triggers allowed.
Multiload
No wonder nobody comes here. It's too crowded.

Yogi Berra
Tera-Tom has actually had dinner with Yogi and he was a real pleasure. As an All-American Athlete who placed third in the NCAAs for the University of Arizona in 1979, Tera-Tom got to spend some time with Yogi. Yogi is a lot like Multiload. He is fast on his feet, is extremely versatile, and he knows a little bit about clean-up. Multiload can handle the high heat or the curve when inserting, updating or deleting data. Where FastLoad is meant to populate empty tables with INSERTS, Multiload is meant to process INSERTS, UPDATES, and DELETES on tables that have existing data. Multiload is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes during its nightly batch.
Multiload works similarly to FastLoad. Data originates as a flat file on either a mainframe or LAN. When the Multiload utility is executed, the Parsing Engine creates a plan for the AMPs to follow. The data is then passed to the AMPs, in parallel, in 64K blocks, and the AMPs hash the rows to the proper AMP. Last, the INSERTS, UPDATES, and DELETES are applied. In the previous diagram the mainframe/LAN is talking to the Parsing Engine. The PE passes the data across the BYNET for the AMPs to retrieve. Keep in mind, many systems have hundreds to thousands of AMPs. The load takes place continually, in parallel, as the 64K packets are delivered to the AMPs. Multiload has been designed for users who have a need for speed. Multiload locks at the table level. Therefore, while Multiload is running, the table is unavailable unless users utilize an Access Lock.
Multiload Basics: Loads data to Teradata from a Mainframe or LAN flat file; Up to 20 INSERTS, UPDATES, or DELETES may be executed on up to 5 tables; Receiving tables are usually populated; There can be no Unique Secondary Indexes, referential integrity, or triggers; It locks at the table level.
Multiload loads to populated tables at the block level. Teradata UPDATEs using MULTILOAD.
Multiload Picture
[Figure: an input file from a mainframe or LAN is sent in 64K blocks through the PE to the AMPs, each of which holds part of a populated table.]
Multiload inserts, updates, upserts and deletes rows into populated tables at the Block Level. It does not allow Triggers, Unique Secondary Indexes (USIs) or Referential Integrity.
TPump
You don't drown by falling into the water; you drown by staying in the water.
-Edwin Louis Cole
The TPump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata, more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. TPump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS). If the data is not flowing, a company can drown in it!
The utility is called TPump because it theoretically acts like a water faucet. TPump can be set to full throttle to load millions of transactions during off-peak hours or turned down to trickle small amounts of data during the data warehouse rush hour. It can also be preset to load at different levels at certain times during the day, and can be modified at any time. Also, TPump locks at the row level, so users have access to the rest of the rows while the table is being loaded.
Basics: Loads data to Teradata from a Mainframe or LAN flat file; Processes INSERTS, UPDATES, or DELETES; Tables are usually populated; It can have secondary indexes, triggers, and referential integrity; It locks at the row level.
TPump is used for continuous updates to rows in a table. Teradata STREAMs using TPump.
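The faucet idea, with full throttle off-peak and a trickle during rush hour, can be sketched as a time-of-day schedule. Everything in this Python example is an assumption made for illustration: the hours, the rates, and the names `SCHEDULE`, `rate_for`, and `minutes_to_load` do not come from TPump itself.

```python
# Hypothetical schedule: (start_hour, end_hour, rows_per_minute)
SCHEDULE = [
    (0, 6, 60000),     # off-peak hours: full throttle
    (6, 18, 600),      # data warehouse rush hour: trickle
    (18, 24, 6000),    # evening: somewhere in between
]


def rate_for(hour):
    """Return the load rate that applies at the given hour (0-23)."""
    for start, end, rate in SCHEDULE:
        if start <= hour < end:
            return rate
    raise ValueError(f"hour must be between 0 and 23, got {hour}")


def minutes_to_load(row_count, hour):
    """Whole minutes needed to load row_count rows at that hour's rate."""
    rate = rate_for(hour)
    return -(-row_count // rate)       # ceiling division


overnight = minutes_to_load(1_200_000, hour=2)   # 20 minutes at full throttle
rush_hour = minutes_to_load(1_200_000, hour=9)   # 2000 minutes at a trickle
```

The same 1.2 million rows take 20 minutes overnight and 2000 minutes at the rush-hour rate, which is the trade the faucet setting makes: slower loading in exchange for leaving the system to its users.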
TPump Picture
[Figure: an input file from a mainframe or LAN is sent in packets to the PE in Teradata, which passes them to four AMPs that update a populated table using row-level locks.]
TPump inserts, updates, upserts and deletes rows into populated tables at the Row Level. It supports Triggers, all Secondary Indexes and Referential Integrity.
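The practical difference between Multiload's table-level lock and TPump's row-level lock is what a reader can still see while the load runs. The toy lock table below is a simplified sketch, not how Teradata tracks locks: the class, the table names, and the rule that an Access Lock always reads through are assumptions based only on the statements in this section.

```python
class LockManager:
    """Toy lock table for illustration; not how Teradata tracks locks."""

    def __init__(self):
        self.locked_tables = set()
        self.locked_rows = set()          # (table, key) pairs

    def lock_table(self, table):
        self.locked_tables.add(table)

    def lock_row(self, table, key):
        self.locked_rows.add((table, key))

    def can_read(self, table, key, access_lock=False):
        """Return True if a reader may see the row right now."""
        if access_lock:
            return True                   # simplification: Access Lock reads through
        if table in self.locked_tables:
            return False
        return (table, key) not in self.locked_rows


locks = LockManager()
locks.lock_table("Order_Table")           # Multiload-style load
locks.lock_row("Customer_Table", 42)      # TPump-style load touching one row

multiload_blocked = not locks.can_read("Order_Table", 1)
multiload_access = locks.can_read("Order_Table", 1, access_lock=True)
tpump_other_row = locks.can_read("Customer_Table", 7)
tpump_same_row = locks.can_read("Customer_Table", 42)
```

With the table lock in place, an ordinary read of any row in `Order_Table` is refused unless it asks for an Access Lock. With the row lock, only row 42 of `Customer_Table` is held and the rest of the table stays readable.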
FastExport
The most exciting phrase to hear in science, the one that heralds the most discoveries, is not "Eureka!" but "That's funny..."
-Isaac Asimov
The most exciting words when loading or unloading data are "That's fast." Put a seat belt on before running FastExport because this utility will blow your socks off. FastExport is designed to export Teradata data to a flat file on a mainframe or LAN. FastExport simply takes an SQL SELECT command and sends the output to a host. It can export data from multiple tables, and the result goes to a host file.

Teradata LOADs using FASTLOAD
Teradata UPDATEs using MULTILOAD
Teradata STREAMs using TPump
Teradata EXPORTs using FASTEXPORT
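The four one-line rules above amount to a small decision table. As a rough sketch, the hypothetical Python helper below picks a utility from facts this chapter gives: FastLoad populates empty tables, Multiload applies batch changes to populated tables, TPump handles continuous updates, and FastExport moves data out. The function name and its parameters are invented for illustration.

```python
def choose_utility(task, table_is_empty=False, continuous=False):
    """Suggest a Teradata utility for a job, following the chapter's summary."""
    if task == "export":
        return "FastExport"        # Teradata EXPORTs using FASTEXPORT
    if task != "load":
        raise ValueError(f"task must be 'load' or 'export', got {task!r}")
    if table_is_empty:
        return "FastLoad"          # empty tables are FastLoad's job
    if continuous:
        return "TPump"             # continuous updates with row-level locks
    return "Multiload"             # batch changes to populated tables


nightly_batch = choose_utility("load")
initial_load = choose_utility("load", table_is_empty=True)
trickle_feed = choose_utility("load", continuous=True)
extract = choose_utility("export")
```

Real decisions also depend on restrictions such as the Multiload rule against Unique Secondary Indexes, so treat this as a first cut rather than a complete selection guide.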
FastExport Picture
[Figure: four AMPs, each attached to a populated table, return rows to the PE in Teradata, which sends the output to a host file on a mainframe or LAN.]
FastExport uses a SELECT statement to retrieve rows from one or more tables and exports the result set to a host file on a mainframe or LAN.
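The shape of an export job, a SELECT whose result set lands in a flat file, can be shown without a Teradata system. The Python sketch below deliberately swaps in the standard library's `sqlite3` as a stand-in database and an in-memory buffer as the host file. It is not FastExport and does none of its parallel work. The table names, the rows, and the `export_select` helper are hypothetical.

```python
import csv
import io
import sqlite3


def export_select(conn, select_sql, out):
    """Run a SELECT and write each result row to a delimited file object."""
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    count = 0
    for row in conn.execute(select_sql):
        writer.writerow(row)
        count += 1
    return count


# Stand-in database with two small, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee_Table (Employee_No INTEGER, Dept_No INTEGER)")
conn.execute("CREATE TABLE Department_Table (Dept_No INTEGER, Dept_Name TEXT)")
conn.executemany("INSERT INTO Employee_Table VALUES (?, ?)", [(1, 10), (2, 20)])
conn.executemany(
    "INSERT INTO Department_Table VALUES (?, ?)",
    [(10, "Sales"), (20, "Marketing")],
)

# One SELECT that joins both tables, exported to a pipe-delimited "host file".
host_file = io.StringIO()
rows_exported = export_select(
    conn,
    "SELECT e.Employee_No, d.Dept_Name "
    "FROM Employee_Table e JOIN Department_Table d ON e.Dept_No = d.Dept_No "
    "ORDER BY e.Employee_No",
    host_file,
)
conn.close()
```

Here `host_file` ends up with one pipe-delimited line per joined row, which mirrors the idea that a single SELECT can draw on more than one table and still produce one output file.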
