First Edition, May 2010
Web Page: www.Tera-Tom.com and www.CoffingDW.com
E-Mail address: Tom.Coffing@CoffingDW.Com
Written by W. Coffing
Teradata, NCR, BYNET, V2R3, V2R4, V2R5, V2R6 are registered trademarks of NCR Corporation, Dayton, Ohio, U.S.A. IBM and DB2 are registered trademarks of IBM Corporation. ANSI is a registered trademark of the American National Standards Institute. In addition to these product names, all brands and product names in this document are registered names or trademarks of their respective holders.

Coffing Data Warehousing shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from the information contained in this book or from the use of programs or program segments that are included. The manual is not a publication of NCR Corporation, nor was it produced in conjunction with NCR Corporation.

Copyright May 2010 by Coffing Publishing. All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any means, electronic, mechanical, photocopying, recording, or otherwise, without written permission from the publisher. No patent liability is assumed with respect to the use of information contained herein. Although every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, neither is any liability assumed for damages resulting from the use of information contained herein.

Coffing Publishing
International Standard Book Number: ISBN 0-9704980-8-X
Mr. Coffing has also published over 30 data warehousing articles and has been a contributing columnist to DM Review on the subject of data warehousing. He wrote a monthly column for DM Review entitled "Teradata Territory". He is a nationally known speaker and gives frequent seminars on Data Warehousing. He is also known as "The Speech Doctor" because of his presentation skills and sales seminars.

Tom Coffing has taken his expert speaking and data warehouse knowledge and revolutionized the way technical training and consultant services are delivered. He founded CoffingDW with the same philosophy more than a decade ago. Centered around 20 Teradata Certified Masters, this dynamic and growing company teaches every Teradata class, provides world class Teradata consultants, offers a suite of software products to enhance Teradata data warehouses, and has eight books published on Teradata.

Tom has a bachelor's degree in Speech Communications and over 35 years of business and technical computer experience. Tom is considered by many to be the best technical and business speaker in the United States. He has trained and consulted at so many Teradata sites that students affectionately call him Tera-Tom.

Teradata Certified Master
- Teradata Certified Professional
- Teradata Certified Administrator
- Teradata Certified Developer
- Teradata Certified Designer
- Teradata Certified SQL Specialist
- Teradata Certified Implementation Specialist
Table of Contents

Chapter 1 - The Teradata Architecture
The Parsing Engine
The AMPs
Born to be Parallel
The BYNET
A Scalable Architecture
Logical Modeling - Primary and Foreign Keys
Physical Modeling - The Primary Index
Two Types of Primary Indexes (UPI or NUPI)
Unique Primary Index (UPI)
Non-Unique Primary Index (NUPI)
Multi-Column Primary Indexes
When do you define the Primary Index?
Defining a Non-Unique Primary Index (NUPI)
Defining a Multi-Column Primary Index
How Teradata Distributes and Retrieves Rows
Hashing the Primary Index Value
The Hash Map
An 8-AMP Hash Map Example
Laying a Row onto the Proper AMP
Retrieving a Row by way of the Primary Index
Hashing Non-Unique Primary Indexes (NUPI)
Placing Non-Unique Primary Indexes (NUPI) Rows
Placing (NUPI) Rows Continued
Retrieving (NUPI) Rows
Placing Multi-Column Primary Index Rows
Retrieving Multi-Column Primary Index Rows
Even Distribution with an UPI
Uneven Distribution with a NUPI
Unacceptable Distribution with a NUPI
Review - Parsing Engine's Plan with an UPI
Review - Parsing Engine's Plan with a NUPI
Review - Big Trouble - The Full Table Scan
Big Trouble - A Picture of a Full Table Scan
Test your Teradata Primary Index Knowledge
The Row Hash
The Uniqueness Value
The Row ID
Duplicates and the Uniqueness Value
AMPs Sort Their Rows by the Row ID
Search the Data like a Phone Book
Why is my Phone Book 00000s and 111111s?
Performing a Binary Search
Opening the Phone Book to the Middle
I can Name that Tune in 5 Notes
A Visual for Data Layout
Test Your Teradata Access Query Knowledge
UPI Row-ID Test
NUPI Row-ID Test
Secondary Indexes
The Base Table
Creating a Unique Secondary Index (USI)
The Secondary Index Subtable
Inside the Secondary Index Subtable
How Teradata builds the Secondary Index Subtable
How Teradata builds the Secondary Index Subtable
Building the Secondary Index Subtable
USI - Always a Two-AMP Operation
The Parsing Engine's Plan with an USI Query
Retrieving Base Rows using the USI
Picture that USI in Action
USI Summary
USI Pictorial using the Hash Maps
USI Secondary Index Quiz
USI Secondary and Primary Index Quiz Answers
A Full Table Scan Example
The Base Table
Creating a Non-Unique Secondary Index (NUSI)
Columns inside a NUSI Secondary Index Subtable
NUSI Subtable is AMP-Local
A Query using the NUSI Column
A Query using the NUSI Column
NUSI Recap
Secondary Index Summary
Test Your Teradata Access Query Knowledge
An Incredible Quiz Opportunity
An Incredible Quiz Opportunity
A Table used for our Partitioning Example
Range Queries
Why we had to perform a Full Table Scan
A Partitioned Table
A Partitioned Table
One Year of Orders Partitioned
Fundamentals of Partitioning
Add the Partition to the Row-ID for the Row Key
You Partition a Table when you CREATE the Table
RANGE_N Partitioning by Week
RANGE_N Partitioning Older and Newer Data
Case_N Partitioning
Multi-Level Partitioning
Partitioning Rules
See the data
Test Your Teradata Access Knowledge
The most Powerful USER
DBC owns all the Disk Space
DBC Example of 1000 GBs
DBC will first CREATE a USER or a DATABASE
Teradata is Hierarchical
Only two Objects can Receive PERM Space
Only difference between a User and a Database
A Typical approach to Security
Example of a DATABASE and USER Interchanged
PERM and SPOOL Space
Each AMP will have PERM and SPOOL
A Query using both PERM and SPOOL Space
Spool is Deleted when the Query is Done
Getting a better understanding of Spool
Answering the MRKT Spool Query Answer
Spool is like a Speed Limit
All Space is calculated on a Per AMP Basis
Examples of Perm and Spool on a Per AMP Basis
Quiz on Perm and Spool Space
Answers to Quiz on Perm and Spool Space
Collecting Statistics
Parsing Engine uses Statistics for the Plan
Columns and Indexes to Collect Statistics On
Syntax to Collect Statistics
Recollecting Statistics
Random Sample instead of Collected Statistics
V12 Statistics Enhancement - Stale Statistics
Where Statistics are Stored in DBC
A Collect Statistics Example
What Statistics are Really Collected
Loner Values and High Bias Intervals
Teradata Limits
Data Protection
Transaction Concept
Two Modes to Teradata
Differences between ANSI and Teradata Mode
ANSI Mode Commit
Teradata Mode Commit also called BTET
Trick to CREATE a Multi-Statement with BTEQ
Transient Journal
How the Transient Journal Works
The Transient Journal after a Commit
VProcs
Nodes and MPP
RAID 1 - Mirroring
Cliques
VProcs Migrate when a Node Fails
Cliques - An 8-Node Example
Cliques - An 8-Node Example with Migration
Hot Standby Nodes
Hot Standby Nodes in Action
FALLBACK Protection
How Fallback Works
Fallback Clusters Exercise
Fallback Clusters
Fallback Exercises with Clusters
Fallback Exercises with Clusters Answer
More Fallback Exercises
More Fallback Exercises with Answers
Fallback - Performance Vs Protection Questions
The Six Rules of Fallback
Cliques and Clusters
Cliques and Clusters Answers
Down AMP Recovery Journal (DARJ)
Permanent Journal
Table create with Fallback and Permanent Journal
Permanent Journal Rules
Some Permanent Journal Possibilities
Creating a Permanent Journal
Create Table Examples with Permanent Journals
Each Permanent Journal is made up of 3 Areas
Permanent Journal Rules
The Four Locks of Teradata
Teradata has 3 levels of Locking
Quiz - Which Level of Locking is Occurring?
Quiz Locking Answers
The Teradata Lock Manager
Locking Modifiers - The Access Lock
Locks and their compatibility
Moving Through the Locking Queue
Quiz - Which Locks Move Up?
Answers to Locking Quiz
A Single AMP Acts as the Locking Gatekeeper
Every AMP performs Locking Gatekeeper Duties
Answers to Which AMP is Waiting on Access
Explains - The Pseudo Table for Locks
The NOWAIT Locking Option
Rules of Teradata Locking
Explains - Pseudo Tables
Explain - Full Table Scan
Explain - Primary Index Reads
Explain - Secondary Index Read
Explain - View DDL of a Partitioned Table
Explain - Partition Elimination
Explain - Joins with Duplication on all AMPs
Explain - Joins with Redistribution
Explain - Bit Mapping with multiple NUSIs
Fundamentals of Teradata Joins
A Join Example
Joins and the Primary Index
Redistributing Rows in Spool
Redistributing Rows of Both Tables
Duplicating the Smaller Table
Quiz - How Many Rows are in Spool?
Quiz Answer - How Many Rows in Spool?
How Duplication Appears on Every AMP
How Many Rows in Spool with Redistribution?
Answer to How Many Rows in Spool
An Example of an AMP with Redistribution
The System Calendar
Columns in the System Calendar Views
How to use the System Calendar with Tables
Teradata Temporary Tables
Derived Tables
A Query Pictorial Example with a Derived Table
Volatile Tables
How to Populate a Volatile Table
Global Temporary Tables
A Pictorial of a Global Temporary Table
What Happens to Global Tables after the Session
Global Temporary Tables and Temp Space
V13 - No Primary Index Tables
NoPI CREATE Statement
NoPI Row-ID Increments the Uniqueness Value
NoPI Row-Hash Different on each AMP
NoPI Options and Facts
NoPI Restrictions
Write Ahead Logging (WAL)
AMPs have FSG Cache for the Memories
An Example of an UPDATE Statements
AMP Local WALs
AMPs UPDATE Rows in FSG Cache
Write to WAL then Write Back to Disk
The WAL Depot
Clearing out the Wal Depot and the Wal Log
V13 - Teradata Virtual Storage (TVS)
AMPs in the 1980s
AMPs in the 1990s
Data Blocks and Cylinders make up a Disk
Cylinders are dedicated to Perm, Spool, etc.
Outside Disk Tracks are much Faster
AMPs assigned Disk Cylinders, not Entire Disks
Hot, Warm, and Cold Data
The old way Teradata had to add Disk Space
Doubling the Disk Capacity
Incremental Disk Growth Is Here
Mixed Disks and Solid State Drives
Solid State Drives are like Giant Flash Drives
Virtual Storage Metrics
The Two Modes of Virtual Storage
What is a Row Hash Lock?

Chapter 6 - Loading the Data
FastLoad
Multiload
TPump
FastExport
The AMPs
Born to be Parallel
The BYNET
A Scalable Architecture
I have found the best way to give advice to your children is to find out what they want and then advise them to do it.
--Harry S. Truman
Harry Truman was an American president with great logical skills. Tables are logically created for all database systems. This is called Logical Modeling. A table that is modeled or normalized will always have a Primary Key. The Primary Key is usually the first column in a table, and the Primary Key column(s) will always have three characteristics:
1. Never be Null
2. Never change
3. Never have duplicate values
If a table with a Primary Key has a relationship with another table, the two are joined through a Primary Key / Foreign Key relationship. When two tables are joined, the Primary Key of one table is matched with a normal column in the other table that holds the same values. This normal column is called a Foreign Key. Teradata doesn't care about Primary Keys and Foreign Keys when it lays out the data. It only cares about what is called the Primary Index. We will learn about the Primary Index shortly.
Speak in a moment of anger and you'll deliver the greatest speech you'll ever regret.
Anonymous
Every table in Teradata has one and only one Primary Index. Teradata uses the Primary Index of each table to route every row to its proper AMP. This is why each table in Teradata is required to have a Primary Index. The biggest key to a great Teradata Database Design begins with choosing the correct column to be the Primary Index. The Primary Index column's value is the only thing that will determine on which AMP a row will reside. Because this concept is extremely important, let me state again that the Primary Index value for a row is the only thing that will determine on which AMP a row will reside. Many people new to Teradata assume that the most important concept concerning the Primary Index is data distribution. INCORRECT! The Primary Index does determine data distribution, but even more importantly, the Primary Index provides the fastest physical path to retrieving data. The Primary Index also plays an incredibly important role in how joins are performed. Remember these three important concepts of the Primary Index (distribution, fastest retrieval path, and joins) and you are well on your way to a great Physical Database Design.
A man who chases two rabbits misses both by a HARE! A person who chases two Primary Indexes misses both by an ERR!
Tera-Tom Proverb
Every table must have one and only one Primary Index. Because Teradata distributes the data based on the Primary Index column's value, it is quite obvious that you must have a Primary Index and that there can be only one Primary Index per table. The Primary Index is the Physical Mechanism used to retrieve and distribute data. The Primary Index is made up of exactly the columns named in its definition, no more and no less. You can have as few as one column or as many as 64 columns comprising your Primary Index. Most databases use the Primary Key as the physical mechanism. Teradata uses the Primary Index and NOT the Primary Key. There are two reasons you might pick a different Primary Index than your Primary Key. They are (1) Performance reasons and (2) known access paths.
Always remember that you are unique just like everyone else.
Anonymous
A Unique Primary Index (UPI) is unique and can't have any duplicates. It is as unique as you are. Nobody is like you and you are extremely beautiful and amazing. Not one other person in the history of mankind has ever been exactly like you. You are the creation of your beautiful parents and must realize how important you are to the world. A Unique Primary Index is not as amazing as you are, but it is also very special. A Unique Primary Index means that the values for the selected column must be unique. If you try to insert a row with a Primary Index value that is already in the table, the row will be rejected. A UPI enforces UNIQUENESS for a column. A Unique Primary Index will always spread the table rows evenly amongst the AMPs. Please don't assume this is always the best thing to do. The diagram on the next pages shows a table that has a Unique Primary Index. We have selected EMP_NO to be our Primary Index. Because we have designated EMP_NO to be a Unique Primary Index, there can be no duplicate employee numbers in the table. UPI access is always a one-AMP operation. You will better understand what I am talking about concerning a one-AMP operation by the end of this chapter.
When you go into court you are putting your fate into the hands of twelve people who weren't smart enough to get out of jury duty.
- Norm Crosby
When you go to query Teradata you are putting your fate in the hands of the DBA who created the table's Primary Index. When the table is created it is given a table name, the columns and their data types are defined and the Primary Index is specified. As you can see on the following page we have created the table called Employee_Table. It contains five columns which are Emp_No, Dept_No, First_Name, Last_Name and Salary. The Primary Index is Unique and on the column Emp_No. This really means that Emp_No is the most important column in this table. If users query the table and put Emp_No = a value in the WHERE Clause it will always be a 1-AMP query. It is as fast as lightning. If no Primary Index is defined the system will define one for you. It will most likely pick the first column and make it a Non-Unique Primary Index (NUPI). It will however check to see if you have a Primary Key defined for referential integrity purposes. If you do it will choose that column(s) and make it a Unique Primary Index (UPI). If you didn't define a Primary Index or Primary Key then the system will check to see if you defined a Unique Secondary Index (USI) on any column and if you have it will make that column a Unique Primary Index (UPI). This is no way to build a system. The Primary Index should always be explicitly defined when the table is first created.
I know that you believe that you understand what you think I said, but I am not sure you realize that what you heard is not what I meant.
-Sign on Pentagon office wall
When the table is created it is given a table name, the columns and their data types are defined and the Primary Index is specified. As you can see on the following page we have created the table called Employee_Table. It contains five columns which are Emp_No, Dept_No, First_Name, Last_Name and Salary. The Primary Index is Non-Unique and on the column Last_Name. This really means that Last_Name is the most important column in this table. If users query the table and put Last_Name = a value in the WHERE Clause it will always be a 1-AMP query. There could be duplicates, but duplicate values will be on the same AMP. I will explain this further. Remember, if no Primary Index is defined the system will define one for you. It will most likely pick the first column and make it a Non-Unique Primary Index (NUPI). It will however check to see if you have a Primary Key defined for referential integrity purposes. If you do it will choose that column(s) and make it a Unique Primary Index (UPI). If you didn't define a Primary Index or Primary Key then the system will check to see if you defined a Unique Secondary Index (USI) on any column and if you have it will make that column a Unique Primary Index (UPI). This is no way to build a system. The Primary Index should always be explicitly defined when the table is first created.
Some birds aren't meant to be caged, their feathers are just too bright. And when they fly away, the part of you that knows it was a sin to lock them up, does rejoice.
Shawshank Redemption
When the table is created it is given a table name, the columns and their data types are defined and the Primary Index is specified. The example on the following page shows a multi-column Primary Index on First_Name and Last_Name combined. As you can also see on the following page we have created the table called Employee_Table. It contains five columns which are Emp_No, Dept_No, First_Name, Last_Name and Salary. The Primary Index is Non-Unique and on the columns First_Name and Last_Name. This really means that First_Name and Last_Name together are the most important columns in this table. If users query the table and put both the First_Name and the Last_Name in the WHERE Clause it will always be a 1-AMP query. There could be duplicates, but duplicate values will be on the same AMP. I will explain this further. Remember, if no Primary Index is defined the system will define one for you. It will most likely pick the first column and make it a Non-Unique Primary Index (NUPI). It will however check to see if you have a Primary Key defined for referential integrity purposes. If you do it will choose that column(s) and make it a Unique Primary Index (UPI). If you didn't define a Primary Index or Primary Key then the system will check to see if you defined a Unique Secondary Index (USI) on any column and if you have it will make that column a Unique Primary Index (UPI). This is no way to build a system. The Primary Index should always be explicitly defined when the table is first created.
I don't know who my grandfather was. I am more interested in who his grandson will become.
Abraham Lincoln, 16th president of the United States
Teradata freed the AMPs from doing everything together by giving each table a Primary Index. The Primary Index is the column(s) that lays out the data row to the proper AMP and the Primary Index column(s) is also the fastest way to retrieve a row from that same AMP. Follow this part closely because this is fundamentally the most important subject you will learn about Teradata. Teradata takes a table and spreads the rows across the AMPs one row at a time. A Unique Primary Index on the table will spread the data rows evenly across the AMPs. This is pretty amazing in itself, but the more amazing part is that Teradata knows exactly which rows went to which AMPs so retrieval is always a 1-AMP operation when users use the Primary Index in the WHERE Clause of their SQL. Here is how that works. The Teradata Parsing Engine will take the Primary Index Value of a row and run a math calculation called the Hash Formula on that Primary Index column value. The Hash Formula doesn't change and can be calculated on any value or data type. The results of the Hash Formula calculation on the Primary Index value will result in a number ranging from one to one million. Teradata then has a Hash Map with one million buckets. Inside the buckets are AMP numbers. So, when the Hash Formula is calculated on the value of the column designated as the Primary Index, and the result is for example 20, Teradata will go to bucket 20 of the Hash Map, look inside bucket 20 and see which AMP it says should get the row. I will give you some visual examples in the next couple of pages to show you exactly what I am talking about.
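The hash-then-look-up flow described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `toy_row_hash` uses CRC-32 as a stand-in and is not Teradata's real Hash Formula, the 4-AMP system size is made up, and the function names are hypothetical.

```python
import zlib

NUM_BUCKETS = 1_000_000  # the text's "one million buckets"


def build_hash_map(num_amps):
    """Fill every bucket with an AMP number, cycling 1..num_amps.

    The Python list is 0-based, so list index 0 plays the role of bucket 1.
    """
    return [(bucket % num_amps) + 1 for bucket in range(NUM_BUCKETS)]


def toy_row_hash(pi_value):
    """Toy stand-in for the Hash Formula (NOT Teradata's real algorithm)."""
    return zlib.crc32(str(pi_value).encode("utf-8"))


def amp_for(pi_value, hash_map):
    """Hash the Primary Index value, pick a bucket, read the AMP inside it."""
    bucket = toy_row_hash(pi_value) % len(hash_map)
    return hash_map[bucket]


hash_map = build_hash_map(num_amps=4)  # hypothetical 4-AMP system
for emp_no in (2, 4, 6, 8):
    print(emp_no, "-> AMP", amp_for(emp_no, hash_map))
```

The point to notice is that nothing is remembered per row: the same Primary Index value always produces the same hash, the same bucket, and therefore the same AMP, both when the row is stored and when it is retrieved.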
A true friend is one who walks in when the rest of the world walks out.
Anonymous
The example on the following page is another Hash Map, but this one is for an 8-AMP system. Notice how the numbers 1, 2, 3, 4, 5, 6, 7, 8 keep repeating inside all one million buckets. Every Teradata System has one Hash Map with a million buckets. Inside the buckets are AMP numbers. The AMP numbers don't change inside the Hash Map. They are static. Do you remember a couple of pages ago when we ran the Hash Formula on the Primary Index value of 2? The result was a Row Hash of 000000101, which is binary for 5. Teradata would count over 5 buckets and look inside bucket number 5. That would tell the PE to place the row on AMP 5 in this system. I have circled bucket 5 in the Hash Map on the following page so you can see exactly how this works. They say a dog is man's best friend, but the best friend to Teradata is the Hash Map. It is its guide dog.
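To check the arithmetic in this example, here is a small sketch. It assumes buckets are numbered from 1 and that the AMP numbers 1 through 8 simply repeat in order, as the text describes; the helper name `amp_in_bucket` is hypothetical.

```python
def amp_in_bucket(bucket_number, num_amps):
    """AMP stored in a 1-based bucket when AMP numbers 1..num_amps repeat."""
    return (bucket_number - 1) % num_amps + 1


row_hash_bits = "000000101"        # Row Hash from the text
bucket = int(row_hash_bits, 2)     # binary 101 is decimal 5
print(bucket, amp_in_bucket(bucket, num_amps=8))  # 5 5
```

Under those assumptions bucket 9 would wrap around to AMP 1 again, which is the repeating pattern visible in the Hash Map diagram.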
What lies behind us and what lies before us are tiny matters compared to what lies within us
-Ralph Waldo Emerson
Remember that the Primary Index for this table example was a Multi-Column Primary Index on both First_Name and Last_Name combined. When the user queries using both the First_Name and the Last_Name Teradata knows this is a 1-AMP operation. Teradata first combines the First_Name of Rakish and the Last_Name of Ratel and it becomes RakishRatel. This produces a Row Hash of 0000000000011010, which is binary for 26. Teradata can go to bucket 26 in the Hash Map and knows this is on AMP 2. There are a couple of items I want you to think about. First and foremost, the only way Teradata can use a Multi-Column Primary Index is if you use all columns in the Multi-Column Index in the WHERE Clause of your query. As you can see in our example we used both the First_Name AND the Last_Name in the WHERE clause of our SQL. If the query had used only one of the columns instead of both, then Teradata would have had to do a Full Table Scan, because the hash of one column alone says nothing about where the hash of the combined value landed. Partial Indexing does not work in Teradata. The positive news about a Multi-Column Index is that it usually spreads the rows fairly evenly across the AMPs.
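A short sketch shows why a partial match on a Multi-Column Primary Index cannot find the row. The hash here is CRC-32 used purely as a toy stand-in (an assumption, not Teradata's Hash Formula), and `toy_row_hash` is a hypothetical name.

```python
import zlib


def toy_row_hash(*pi_values):
    """Hash all Primary Index column values together (toy stand-in)."""
    combined = "".join(str(v) for v in pi_values)   # e.g. "RakishRatel"
    return zlib.crc32(combined.encode("utf-8"))


full = toy_row_hash("Rakish", "Ratel")   # both PI columns supplied
first_only = toy_row_hash("Rakish")      # only First_Name supplied
last_only = toy_row_hash("Ratel")        # only Last_Name supplied

print(full == toy_row_hash("Rakish", "Ratel"))   # same inputs, same hash
print(full == first_only, full == last_only)     # partial values hash elsewhere

# The binary Row Hash quoted in the text really is 26:
print(int("0000000000011010", 2))
```

Because the row was placed using the hash of the combined value, a query that supplies only one column has no way to compute that hash, so every AMP has to be searched.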
When I was 14 I thought my parents were the stupidest people in the world. When I was 21 I was amazed at how much they learned in seven years.
Mark Twain
A Non-Unique Primary Index will seldom lay the data out with perfect distribution. Perfect distribution is nice, but it isn't everything. Then again, the example on the next page is awful. You should never choose a column to be your Primary Index if it has fewer UNIQUE values than the number of AMPs. You should never do what the example on the next page shows. Uneven distribution is not a problem unless there are huge spikes causing hot AMPs. The example will not only cause hot AMPs and huge distribution spikes, but only two AMPs will be used when distributing and retrieving data. Horrible! Remember that duplicates always go to the same AMP as their duplicate counterparts. The example on the following page demonstrates this clearly.
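The effect is easy to reproduce with a toy simulation. Everything below is an illustrative assumption: the hash is CRC-32 rather than Teradata's Hash Formula, the system has 4 AMPs, and the two-valued column (written here as "M" and "F") is a made-up example of a Primary Index with fewer distinct values than AMPs.

```python
from collections import Counter
import zlib


def amp_for(pi_value, num_amps):
    """Toy stand-in for hashing a Primary Index value to an AMP number."""
    return zlib.crc32(str(pi_value).encode("utf-8")) % num_amps + 1


def rows_per_amp(pi_values, num_amps):
    """Count how many rows each AMP would receive."""
    return Counter(amp_for(v, num_amps) for v in pi_values)


two_values = ["M", "F"] * 500                     # 1,000 rows, 2 distinct PI values
skewed = rows_per_amp(two_values, num_amps=4)
spread = rows_per_amp(range(1000), num_amps=4)    # 1,000 distinct PI values

print("2 distinct values:", dict(skewed))
print("1,000 distinct values:", dict(spread))
```

With only two distinct values, at most two AMPs can ever hold rows, no matter how many AMPs the system has.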
When you are courting a nice girl an hour seems like a second. When you sit on a red-hot cinder a second seems like an hour. That's relativity.
Albert Einstein
A Unique Primary Index always lays the data out evenly. Plus, the Parsing Engine's plan is a 1-AMP operation that can return a maximum of one row. That is because the value the query is seeking is UNIQUE. On the following page you can see the Parsing Engine's plan. This is as sweet as it gets.
If you don't know where you're going, Any road will take you there.
Lewis Carroll
A Non-Unique Primary Index doesn't always lay the data out perfectly evenly, but it is always a 1-AMP operation when used in the WHERE clause. There can be millions of rows returned because the value the query is seeking is NOT UNIQUE. On the following page you can see the Parsing Engine's plan. This is pretty sweet as well.
If I have seen farther than others, it is because I was standing on the shoulders of giants.
- Isaac Newton
As you can see on the next page a Full Table Scan will cause ALL AMPs to read every row they own. Each row is read only once and each AMP will return the rows that match the criteria. There is nothing wrong with doing a Full Table Scan query unless you don't have to do it. There is nothing wrong with walking across the city if you don't have a car and can't afford transportation, but if you can you might want to consider riding. Especially when you are on the company's payroll and time and resources are important.
It's not the size of the dog in the fight, but the size of the fight in the dog.
Archie Griffin
We have learned that the Row Hash is placed at the front of every row and that each AMP sorts its rows by the Row Hash, thus keeping things in order. The AMP will also add a Uniqueness Value behind the Row Hash so it can keep track of duplicate values. When a row comes in with its Row Hash the AMP will check to see if it has any other Row Hashes exactly like the one it has just received. If this Row Hash is Unique it will put a 1 as the Uniqueness Value. If it already has another Row Hash just like this one it will put a 2 in the Uniqueness Value. If this Row Hash is the third duplicate it will put a Uniqueness Value of 3, and so on. For example, if there are 1,000 duplicate Primary Index values such as the Last_Name of Smith, then they would each have the same Row Hash and go to the same AMP. Their Row Hash would be the same, but their Uniqueness Values would range from 1 to 1,000.
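The counting rule above can be sketched as a tiny class. This is a simplified model of one AMP, not how Teradata is implemented: the `Amp` class, its attribute names, and the Row Hash number 30 used for Smith are all made up for illustration.

```python
from collections import defaultdict


class Amp:
    """Toy model of one AMP handing out Uniqueness Values per Row Hash."""

    def __init__(self):
        self._last_used = defaultdict(int)   # Row Hash -> last Uniqueness Value used
        self.rows = []                       # (row_hash, uniqueness, data)

    def insert(self, row_hash, data):
        self._last_used[row_hash] += 1
        uniqueness = self._last_used[row_hash]
        self.rows.append((row_hash, uniqueness, data))
        # Keep the rows ordered by Row Hash, then Uniqueness Value.
        self.rows.sort(key=lambda r: (r[0], r[1]))
        return uniqueness


amp = Amp()
first = amp.insert(30, "Smith")    # first row with this Row Hash
second = amp.insert(30, "Smith")   # duplicate Row Hash
other = amp.insert(12, "Jones")    # a different Row Hash starts again at 1
print(first, second, other)        # 1 2 1
```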
The Row ID
A good plan, violently executed now, is better than a perfect plan next week.
- George S. Patton
The Row Hash and the Uniqueness Value make up the Row ID. Teradata rows placed on an AMP always have the Row ID at the beginning of every row. Each AMP actually sorts its rows by the Row ID, not just the Row Hash. This not only organizes the rows perfectly, but is really how Teradata AMPs find their data so quickly. Just like George S. Patton the Row ID shall RETURN. An Answer Set! The Row Hash is a 32-bit value and the Uniqueness Value is also a 32-bit value. This means that 64 bits (8 bytes) are placed in front of every Teradata row!
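The 32 + 32 = 64 bit arithmetic can be made concrete with Python's `struct` module. The byte layout below (two unsigned 32-bit integers, big-endian) is an assumption chosen for illustration, not a statement about Teradata's on-disk format, and the function names are hypothetical.

```python
import struct


def pack_row_id(row_hash, uniqueness):
    """Pack a 32-bit Row Hash and a 32-bit Uniqueness Value into 8 bytes."""
    return struct.pack(">II", row_hash, uniqueness)


def unpack_row_id(raw):
    """Split an 8-byte Row ID back into (row_hash, uniqueness)."""
    return struct.unpack(">II", raw)


row_id = pack_row_id(26, 1)
print(len(row_id), unpack_row_id(row_id))   # 8 (26, 1)

# With big-endian packing, comparing the raw bytes gives the same order as
# comparing (row_hash, uniqueness) pairs, which is what sorting by Row ID needs.
print(pack_row_id(26, 1) < pack_row_id(26, 2) < pack_row_id(27, 1))   # True
```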
I've learned that you can't have everything and do everything at the same time.
Oprah Winfrey
Everyone has used a Phone Book at some time in their life. If you decide you want to order a Pizza you know you can go to the Phone Book. How do people handle the Phone Book? Because a Phone Book is organized alphabetically from A-Z people generally open the Phone Book to about the middle. Then they see where they are alphabetically and adjust the search towards the beginning or the end. It doesn't take long to find where you can order a pizza. Can you imagine if every time you used the Phone Book you started on page 1 and then turned a page at a time until you found Pizza Delivery? That wouldn't be a Pizza search, but a serial search! You might starve before you even found the Pizza Delivery place of your choice. Teradata AMPs don't search for their data serially, but they do it just like a phone book. They go to the middle of the table and see where they are and then adjust.
I was walking down the street wearing glasses when the prescription ran out.
- Steven Wright
The AMPs don't like to brag, but they are so fast they often make a spectacle of themselves! The phone book is sorted alphabetically from A-Z, but computers read and write data in Binary so they sort their numbers with zeros and ones (000000 to 111111). This is why AMPs sort their rows by the Row ID. This allows the AMP to search for a row with a Binary Search! This gives each AMP 20/20 vision when searching for a row! The next couple of pages will show you clearly a Binary Search.
Diplomacy is the art of saying "Nice Doggie" until you can find a rock.
Will Rogers
The Row ID is the AMP's best friend and guides the AMP to the proper row. Let's just say the Row ID is both a bloodhound and a retriever when it comes to looking for a row. Unless Teradata is doing a Full Table Scan it will always perform a Binary Search when looking for a row based on the Primary Index. A Binary Search is always done on only the Row ID when looking for a Primary Index value. That is why AMPs sort their rows by the Row ID. In the picture on the next page you can see that Last_Name is the Primary Index and this AMP has been instructed to find a user named Vey. The AMP is actually instructed to find Row Hash 000011110, and then double check to make sure the Last_Name is actually Vey, because two different values can occasionally hash to the same Row Hash. Don't let looking for a row frighten you because the disk's Bark is bigger than its Byte!
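Here is a minimal sketch of that search using Python's standard `bisect` module. The rows, the Row Hash numbers other than 30 (binary 000011110), and the colliding name "Kay" are invented for illustration; `find_rows` is a hypothetical helper.

```python
from bisect import bisect_left

# Rows on one AMP, kept sorted by Row ID = (row_hash, uniqueness).
rows = [
    (5, 1, "Jones"),
    (12, 1, "Smith"),
    (12, 2, "Smith"),
    (30, 1, "Vey"),
    (30, 2, "Kay"),     # hypothetical collision: different name, same Row Hash
    (41, 1, "Brown"),
]


def find_rows(rows, row_hash, pi_value):
    """Binary search to the first Row ID with this hash, then verify the value."""
    start = bisect_left(rows, (row_hash,))
    matches = []
    for rh, uniqueness, last_name in rows[start:]:
        if rh != row_hash:
            break                      # walked past the last duplicate hash
        if last_name == pi_value:      # the "double check" from the text
            matches.append((rh, uniqueness, last_name))
    return matches


target_hash = int("000011110", 2)      # 30
print(find_rows(rows, target_hash, "Vey"))   # [(30, 1, 'Vey')]
```

The binary search jumps straight to the right neighborhood, and the value check filters out any row that merely shares the Row Hash.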
I saw the angel in the marble and carved until I set him free.
--Michelangelo
The example on the following page is a logical view of data on AMPs. Each AMP holds a portion of a table. Each AMP keeps the tables in their own separate drawers. The Row ID is used to sort each table on an AMP.
Each AMP holds a portion of every table. Each AMP keeps their tables in separate drawers. Each table is sorted by Row ID.
Acting is all about honesty. If you can fake that, you've got it made.
- George Burns
To store the data, the value(s) in the PI are hashed through a calculation to determine which AMP will possess the row. The same data values always hash to the same Row Hash and therefore are always associated with the same AMP. The PI is what makes or breaks the system. The PI is responsible for all of the system's data distribution. Our quiz on the next page is designed to show only in theory how Teradata places a row on an AMP. We are going to divide the Primary Index value by two. The output is called the Row Hash. We will take our Row Hash answer and it will point to a bucket in the Hash Map. That bucket will tell Teradata which AMP will hold the row. Your mission, if you decide to accept it, is to place the Row ID and the Row on the proper AMP. I have already completed the first row for you because I am a nice guy!
Secondary Indexes
Once the game is over, the king and the pawn go back in the same box.
- Italian Proverb
As soon as the USI is created with the SQL syntax, Teradata makes the next move by creating a Subtable on every AMP. This is true for both the USI and the NUSI. Let's say for example the DBA created the maximum of 32 secondary indexes on a table. Then there would be 32 Subtables created, each taking up PERM Space. The entire purpose of the Secondary Index Subtable is to point back to the real row in the base table via the Row-ID.
I don't know who my grandfather was. I am more interested in who his grandson will become.
Abraham Lincoln, 16th president of the United States
As soon as the DBA uses the SQL to create a secondary index Teradata immediately gets to work. Teradata must build the secondary index Subtable before it can become an alternate path to the data. Each AMP Hashes the secondary index value for each row it owns with the Hash Formula. The result is a 32-bit Row Hash which points to a bucket in the Hash Map, which tells the secondary index row which AMP's Subtable it will be on. All UNIQUE Secondary Indexes are hashed and the value plus the real Row-ID of the base table are sent to the proper AMP over the BYNET. Pay close attention to the slide on the next page and let me walk you through it. We created the USI on the column Emp_No. Every Emp_No value will now have to also reside inside the Subtable. The first row's value for Emp_No is a 2. The value of 2 is hashed and the result is a 32-bit Row Hash. That Row Hash points to a bucket in the Hash Map, and the Hash Map says that AMP 1 is the destination AMP. So the Emp_No value of 2 goes to AMP 1's Subtable. It also brings with it the real Row-ID for its row from the Base Table, which is 1,1. Now the first secondary row is perfectly placed.
The most exciting phrase to hear in science, the one that heralds the most discoveries, is not "Eureka!", but "That's funny..."
Isaac Asimov
Your job is to now place the remaining three rows on the proper AMP perfectly. I want you to use the Tera-Tom Hash Formula, which is to divide by 2. This is designed to show you that a consistent formula will produce predictable and repeatable results. Divide each Emp_No by 2 and that will represent the Hash Formula, with a 32-bit Row Hash as the result. You can then point to the corresponding bucket in the Hash Map where you will place the row on the destination AMP. You need to place the USI value and the real Row-ID of the row with it. Good luck!
I don't skate to where the puck is; I skate to where I want the puck to be.
Wayne Gretzky
If you received the answers listed on the next page you are no longer skating on thin ice. You are performing a Power Play. Each Secondary Index Subtable row won't perform a one-timer, but instead performs a two-timer because the USI Value and Base Row-ID always make a USI query a 2-AMP operation. Now that the Secondary Index Subtable is built, the users can query the base table. If the USI is used in the WHERE clause the Parsing Engine knows it has an alternate route to the data. It can use a 1-AMP operation to find the Subtable Row and then use another 1-AMP operation to find the base row. If there are 1,000,000 rows in the base table there will be 1,000,000 rows in the Subtable. The Base table will be much larger because it probably has many columns, but the Subtable only has two columns.
If you do what you've always done, you'll get what you've always got.
Anonymous
The above quote is perfect for the secondary index because the Secondary Index and the Hash Map do what they have always done, and they know they'll get what they always got. The Parsing Engine doesn't always put out the entire plan and wait for the data to return. Sometimes the Parsing Engine gives pieces of the plan and helps guide the AMPs. Take a look at the explanation below and the picture on the next page. The first part of the USI plan will be to find the USI value in the Secondary Index Subtable. Let's say for example the WHERE Clause stated: WHERE Emp_No = 2; The Parsing Engine knows that the Subtable's Primary Index is Emp_No. It puts out the first part of the plan by stating: "Hash the value of 2 and then look at the corresponding bucket in the Hash Map. Go to the Destination AMP inside the Hash Map bucket and tell that AMP to find Emp_No 2 in its Subtable. Then have it return the Row-ID of the base row to me (the Parsing Engine)." Once the Parsing Engine receives the Row-ID of the base row it takes the first part of the Row-ID (which is the Row Hash), looks at the corresponding bucket in the Hash Map and now knows the AMP that holds the base row. It sends a second message to that AMP and says, "Find this Row-ID in your Employee_Table and retrieve the row."
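The two messages in that plan can be modeled with a pair of dictionaries per AMP. This is a toy sketch under several assumptions: CRC-32 stands in for the Hash Formula, a simple modulo stands in for the Hash Map, the system has 4 AMPs, the base table's Primary Index is taken to be Last_Name, and the employee rows are invented.

```python
import zlib

NUM_AMPS = 4  # hypothetical system size


def row_hash(value):
    """Toy stand-in for the Hash Formula."""
    return zlib.crc32(str(value).encode("utf-8"))


def amp_of(rh):
    """Toy stand-in for the Hash Map lookup."""
    return rh % NUM_AMPS + 1


base_table = {amp: {} for amp in range(1, NUM_AMPS + 1)}     # Row-ID -> row
usi_subtable = {amp: {} for amp in range(1, NUM_AMPS + 1)}   # Emp_No -> base Row-ID


def insert(emp_no, last_name):
    rh = row_hash(last_name)               # base table PI assumed to be Last_Name
    amp = amp_of(rh)
    uniqueness = 1 + sum(1 for (h, _) in base_table[amp] if h == rh)
    row_id = (rh, uniqueness)
    base_table[amp][row_id] = {"Emp_No": emp_no, "Last_Name": last_name}
    # The subtable row is hashed on the USI value, so it may land on another AMP.
    usi_subtable[amp_of(row_hash(emp_no))][emp_no] = row_id


def select_by_emp_no(emp_no):
    touched = []
    sub_amp = amp_of(row_hash(emp_no))     # message 1: find the Subtable row
    touched.append(sub_amp)
    row_id = usi_subtable[sub_amp].get(emp_no)
    if row_id is None:
        return None, touched
    base_amp = amp_of(row_id[0])           # message 2: Row Hash part of the Row-ID
    touched.append(base_amp)
    return base_table[base_amp][row_id], touched


for emp_no, name in [(2, "Ratel"), (4, "Vey"), (6, "Smith"), (8, "Smith")]:
    insert(emp_no, name)

row, touched = select_by_emp_no(2)
print(row, "AMP messages:", touched)
```

However many AMPs the system has, the lookup never sends more than two messages: one to the AMP holding the Subtable row and one to the AMP holding the base row.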
USI Summary
Nearly all men can stand adversity, but if you want to test a man's character, give him power.
Abraham Lincoln
The following page provides a summary to show you the power of the USI. This will work if you want to test a row's Characters or Integers!
Those who dance are considered insane by those who cannot hear the music.
George Carlin
The next page is designed to show that a Full Table Scan will be performed when a Non-Indexed Column is used by itself in the WHERE clause. We will soon create a Non-Unique Secondary Index on this column, but first perform the Full Table Scan. Please make a note of it!
"He who controls the past commands the future. He who commands the future conquers the past."
George Orwell
The next page is designed to merely remind you that we have two types of tables. Those are the Base Tables that hold the actual data that the users query against and the Secondary Index Subtables designed to point to the real Row-ID of the base table. Once again I want you to notice the Row-IDs at the front of each row in the Base Table. Remember how those were derived? They were derived when the row was originally loaded. The PE hashed the Primary Index Column (Last_Name) Value and came up with a 32-bit Row Hash. It then counted over the appropriate number of buckets in the Hash Map that corresponded to the Row Hash, and inside the bucket was the Destination AMP for that row. Once the row and the Row Hash went to the AMP, the AMP itself placed a 32-bit Uniqueness Value behind the Row Hash. The Row Hash plus the Uniqueness Value make up a row's Row-ID.
Darkness cannot drive out darkness; only light can do that. Hate cannot drive out hate; only love can do that.
Martin Luther King, Jr.
Inside the NUSI Subtable reside two columns. They are the column values of the NUSI and the real Row-ID of the row in the base table. These are exactly the same two columns that were in the USI Subtable. Remember that the entire purpose of the NUSI or the USI Subtable is to point to the real row in the Base Table. This pointing is done by capturing the row's Row-ID. The big difference between the USI and the NUSI Subtable is that the USI Subtable rows are Hashed and the NUSI Subtable rows are AMP-Local. Read on my friend!
Everyone is kneaded out of the same dough but not baked in the same oven.
Yiddish Proverb
On the following page you will see we are running SQL and using the First_Name column in our WHERE clause. This will usually cause the Parsing Engine to use the NUSI, but not always. Sometimes the Parsing Engine will decide it is faster to do a Full Table Scan. This is dependent on three things:
1) Whether the NUSI is weakly or strongly selective. An example of something that is weakly selective might be this. Imagine you are in a large room with hundreds of people and you ask, "How many people here usually eat dinner every evening?" The answer would be everyone! Here is an example of a strongly selective index. You now ask the same large room of people, "How many of you were born in Russia, are a twin, and only speak French?" If the NUSI is strongly selective it will be used by the Parsing Engine.
2) Whether the table is small. If the table is small it is sometimes faster to just do a Full Table Scan.
3) Whether the DBA collected statistics. We will talk about Statistics in the future, but the short answer is that the DBA will usually collect statistics on all Non-Unique columns for a table so the Parsing Engine knows if the Index is strongly or weakly selective.
If all my possessions were taken from me but one, I would choose to keep the power of speech, for with it I could soon regain all the rest.
Daniel Webster
A NUSI query is always an All-AMP operation, but not a Full Table Scan. In our query example on the previous page we selected all columns WHERE the First_Name was equal to Rakish. There could potentially be a Rakish or multiple people named Rakish on every AMP, one AMP, no AMPs or some AMPs. That is why each NUSI Subtable is AMP-Local. Now the PE can use this two-step strategy:
1) Each AMP searches its own AMP-Local NUSI Subtable to check whether it has one or more individuals named Rakish.
2) Only the AMPs that found Rakish in their Subtable retrieve the matching rows from the portion of the Employee_Table Base Table they own.
That is why a NUSI query always involves All-AMPs, but it is not a Full Table Scan. Think about this! Imagine we had a system with 100 AMPs. Now let's say there was only one Rakish, found on AMP 99. How many Binary Searches would there be in this query? Well, in step 1 each AMP would perform a Binary Search on its AMP-Local NUSI Subtable, so this would mean 100 Binary Searches, because there are 100 AMPs. Then only AMP 99 would find a Rakish in its Subtable and so only AMP 99 would have to perform a Binary Search in its Base Table, so the final answer is 101.
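The count of 101 can be verified with a small simulation of the two-step strategy. The data structures are a simplified model (each AMP-Local Subtable is just a sorted Python list), and the base Row-ID (26, 1) for the single Rakish is an invented placeholder.

```python
from bisect import bisect_left

NUM_AMPS = 100

# AMP-Local NUSI Subtables: a sorted list of (first_name, base_row_id) per AMP.
subtables = {amp: [] for amp in range(1, NUM_AMPS + 1)}
subtables[99] = [("Rakish", (26, 1))]      # the only Rakish lives on AMP 99


def nusi_query(first_name):
    """Return the matching (amp, base_row_id) pairs and the Binary Search count."""
    searches = 0
    hits = []
    for amp, subtable in subtables.items():
        searches += 1                      # step 1: search the AMP-Local Subtable
        i = bisect_left(subtable, (first_name,))
        while i < len(subtable) and subtable[i][0] == first_name:
            searches += 1                  # step 2: search the Base Table by Row-ID
            hits.append((amp, subtable[i][1]))
            i += 1
    return hits, searches


hits, searches = nusi_query("Rakish")
print(hits, searches)                      # [(99, (26, 1))] 101
```

All 100 AMPs do a quick Subtable search, but only the AMP that actually found a match goes on to touch its Base Table, which is why this is All-AMP without being a Full Table Scan.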
NUSI Recap
The entire sum of existence is the magic of being needed by just one other person.
Vi Putnam
The next page shows a table that will be used to show how Teradata Partitioning works. We will take this table and show it in a Non-Partitioned fashion and then Partition the table and show how Teradata runs certain queries faster. All I want you to notice right now is the column Order_Date. Notice that we have dates in both January and February.
Range Queries
Cowards die many times before their deaths; the valiant never taste of death but once.
William Shakespeare
The next page shows our Order_Table spread across the AMPs. Notice that I have color coded the January and February dates. Also notice that January and February dates are mixed on every AMP in what is a random order. This is because the Primary Index is Order_Number. So, the January dates are most likely on every AMP and so are the February dates! I also want you to take notice of the query. We are looking for all orders in January. Remember that! The query on the next page is called a Range Query because it uses the keyword BETWEEN. The BETWEEN keyword in Teradata means find everything in the range BETWEEN this date and this other date. The BETWEEN statement is said to be inclusive. If someone said to me, "Tell me what is BETWEEN the numbers 8 and 10," I would normally say, "The number 9." In Teradata land I would be wrong because the BETWEEN statement is inclusive, so it INCLUDES the starting and ending numbers. What is BETWEEN 8 and 10? The numbers 8, 9 and 10! Partitioned tables work very well on Range Queries using the keyword BETWEEN. Turn the next couple of pages and you will soon see WHY! We will next discuss what a Partitioned Table is all about!
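The inclusive behavior of BETWEEN is the same as writing `low <= value <= high`. The sketch below mirrors the 8-to-10 example; the `between` helper is a hypothetical name and the year 2010 in the date example is an arbitrary choice.

```python
from datetime import date


def between(value, low, high):
    """Inclusive range test, mirroring SQL's BETWEEN low AND high."""
    return low <= value <= high


print([n for n in range(1, 13) if between(n, 8, 10)])   # [8, 9, 10]

# The same rule applies to dates: both end points belong to the range.
jan_start, jan_end = date(2010, 1, 1), date(2010, 1, 31)
print(between(date(2010, 1, 31), jan_start, jan_end))   # True
print(between(date(2010, 2, 1), jan_start, jan_end))    # False
```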
The next page shows our Order_Table spread across the AMPs. Notice that I have color coded the January and February dates. Also notice that January and February dates are mixed on every AMP in what is a random order. Because the January data is on all AMPs and because the January dates are randomly mixed, we have to do a Full Table Scan. We have no indexes on Order_Date, so the PE will command the AMPs to do a Full Table Scan, but soon we will Partition the table and prevent the Full Table Scan. This brings me to a great point I want you to remember. We partition tables so we won't have to do a Full Table Scan on our Range Queries!
A Partitioned Table
A man who views the world at 50 the same as he did at 20 has wasted 30 years of his life.
Muhammad Ali
Notice the example on the next page. We are running our Range Query on our Partitioned Table to demonstrate visually how Teradata has all AMPs participate, but each AMP only reads from one partition. The Parsing Engine no longer has to instruct the AMPs to do a Full Table Scan. It instructs the AMPs to each read from their January Partition. Remember what we said earlier? A Partitioned Table is designed to eliminate a Full Table Scan, especially on Range Queries.
Fundamentals of Partitioning
Nobody believes the official spokesman but everybody trusts an unidentified source.
- Ron Nessen
Take a look at the statements on the next page. Take your time and take them in! The points I really want you to notice are that it is the Primary Index that determines which AMP gets a particular row and that Partitioning doesn't affect distribution. Partitioning only affects how each AMP sorts the rows it gets!
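A small Python simulation can show both points at once: the Primary Index alone decides which AMP owns a row, and partitioning only reorders the rows inside each AMP. Everything here is a toy under stated assumptions: four AMPs, a modulo stand-in for the real Primary Index hash, and sixteen made-up orders.

```python
from collections import defaultdict
from datetime import date

NUM_AMPS = 4  # hypothetical system size


def amp_for(order_number):
    # Stand-in for the Primary Index hash; Teradata's real hashing differs.
    return order_number % NUM_AMPS


def load(rows, partitioned):
    """Distribute rows by Primary Index, then order each AMP's rows."""
    amps = defaultdict(list)
    for order_number, order_date in rows:
        amps[amp_for(order_number)].append((order_number, order_date))
    if partitioned:
        # Partitioning only changes how each AMP sorts the rows it already owns.
        for amp_rows in amps.values():
            amp_rows.sort(key=lambda row: (row[1].month, row[0]))
    return amps


def rows_read_for_january(amps, partitioned):
    """Rows the AMPs must examine to answer 'all orders in January'."""
    total = 0
    for amp_rows in amps.values():
        if partitioned:
            # Each AMP reads only its January partition.
            total += sum(1 for _, d in amp_rows if d.month == 1)
        else:
            # No index on Order_Date: Full Table Scan.
            total += len(amp_rows)
    return total


# Made-up orders: half in January, half in February, mixed on every AMP.
ORDERS = [(n, date(2010, 1 if (n // 4) % 2 == 0 else 2, 1 + n % 28))
          for n in range(1, 17)]

plain = load(ORDERS, partitioned=False)
ppi = load(ORDERS, partitioned=True)
print(rows_read_for_january(plain, False), rows_read_for_january(ppi, True))  # 16 8
```

Both loads place exactly the same order numbers on exactly the same AMPs. All four AMPs still take part in the January query, but on the partitioned table each one skips its February rows.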
Whenever you are asked if you can do a job, tell 'em, "Certainly I can!" Then get busy and find out how to do it.
- Franklin D. Roosevelt
The next page shows the syntax to CREATE a partitioned table. Please don't assume you can only partition a table when you CREATE it. You can actually CREATE a normal table first and later ALTER the table, but generally you partition a table when you first create it. I want you to notice the Primary Index statement. Our Primary Index for this example is a NUPI on Order_No, but we are partitioning on Month of Order_Date.
Case_N Partitioning
A man who views the world at 50 the same as he did at 20 has wasted 30 years of his life.
Muhammad Ali
We are partitioning by CASE_N in the next example. This is just like any CASE statement in programming or SQL. In the example I want you to notice that if an Order_Total for a row is less than $1,000.00 it will go into the first partition. If it falls between $1,000.00 and $4,999.99 it will go into partition 2. If it is between $5,000.00 and $9,999.99 it will fall into partition 3 and so on. I also need you to pay close attention to the UNKNOWN partition and the NO CASE partition. The UNKNOWN Partition is for an Order_Total with a NULL value. The NO CASE Partition is for rows that did not meet any of the CASE criteria. For example, if an Order_Total is greater than $20,000.00 it wouldn't fall into any of the partitions, so it goes to the NO CASE partition. Important note: It is an excellent idea to have a NO CASE and an UNKNOWN partition with CASE_N. More important note: You do not want to include the UNKNOWN or NO RANGE partitions with dates in a RANGE_N partition. I will explain later in detail, but it is because when you delete a partition from a RANGE_N table, its rows will go to the NO RANGE or UNKNOWN partitions. This takes a long time and is usually not wanted anyway.
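The routing described above can be sketched as an ordinary Python function. This is only a model of the decision, not Teradata syntax. The first three bounds come from the text; the fourth bound of $20,000 is an assumption, since the text says "and so on" and only tells us that totals greater than $20,000.00 land in NO CASE.

```python
def case_n_partition(order_total):
    """Return the partition an Order_Total would fall into (toy model)."""
    if order_total is None:
        return "UNKNOWN"  # NULL Order_Total
    # Upper bounds for partitions 1, 2, 3 come from the text;
    # the 20000 bound for partition 4 is assumed for illustration.
    upper_bounds = [1000, 5000, 10000, 20000]
    for number, bound in enumerate(upper_bounds, start=1):
        if order_total < bound:
            return number  # first condition that is true wins
    return "NO CASE"  # met none of the CASE criteria


print(case_n_partition(999.99))   # 1
print(case_n_partition(4999.99))  # 2
print(case_n_partition(5000.00))  # 3
print(case_n_partition(25000))    # NO CASE
print(case_n_partition(None))     # UNKNOWN
```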
Multi-Level Partitioning
Two roads diverged in a wood and I took the one less traveled by, and that has made all the difference.
Robert Frost
Teradata introduced Multi-Level partitioning in Teradata V12. You can have up to 15 levels of partitions within partitions. I want you to remember that a Partitioned Table merely tells each AMP how to sort its rows for the table. So think of Multi-Level partitioning as a table with multiple sort keys. The first partition statement is how the data is sorted first. The second partition statement is the second sort key. Think of a simple sorting of an answer set. Let's imagine we sorted an Employee_Table by Department_Number first. Then we sorted by Last_Name within each Department_Number. That is similar to what we are doing on the next page. We first partition by day. Then within each day we are partitioning by our CASE_N statement. Each AMP will have each day sorted first on its disk and then within each day the data will be sorted with the lower Order_Total values first. This is really getting down to a granular form. The entire purpose of partitioning is to eliminate the Full Table Scan. Instead of reading all rows in a table, each AMP merely has to read one or more of its partitions.
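The "multiple sort keys" idea can be seen with a plain Python sort. This sketch orders one AMP's rows by day first and then by an Order_Total band within each day. The band boundaries and the sample rows are made up for illustration, and band 4 stands in for whatever falls outside the listed bands.

```python
from datetime import date


def total_band(order_total):
    # Hypothetical CASE_N-style bands; the bounds are assumptions.
    for number, bound in enumerate((1000, 5000, 10000), start=1):
        if order_total < bound:
            return number
    return 4  # stand-in for rows outside the listed bands


def amp_sort_order(rows):
    """Order one AMP's rows by day (level 1), then by band (level 2)."""
    return sorted(rows, key=lambda row: (row[0], total_band(row[1])))


rows_on_one_amp = [
    (date(2010, 1, 2), 7500.0),
    (date(2010, 1, 1), 6000.0),
    (date(2010, 1, 2), 250.0),
    (date(2010, 1, 1), 120.0),
]

for order_date, order_total in amp_sort_order(rows_on_one_amp):
    print(order_date, total_band(order_total), order_total)
```

The January 1 rows come out first, low band before high band, followed by the January 2 rows in the same band order.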
Partitioning Rules
I find that the harder I work, the more luck I seem to have.
Thomas Jefferson
The superior man is modest in his speech, but exceeds in his actions.
- Confucius, 551 BC - 479 BC
Test your knowledge on the next page and make me proud!
It's time for the human race to enter the Solar System.
- Dan Quayle
Dan Quayle, Vice President during George Herbert Bush's presidency, never made it to the top of the human race, because he was never president. Dan Quayle never made it to the top of the Teradata hierarchy because his name wasn't DBC. The first Teradata machine ever built was called the DBC 1012. DBC stood for DataBase Computer and the 1012 represented ten to the 12th power, which happens to be a Terabyte. So, in honor of the first Teradata machine, every system has only one USER when it first arrives, and that USER's name is DBC. DBC has been the most powerful USER from the beginning of Teradata time (1984). Whoever is assigned to be the USER DBC will have all the power. DBC will create other DATABASES and USERS and the hierarchy begins. It is time for the human race to enter the Teradata System!
Teradata is Hierarchical
When they discover the center of the universe, a lot of people will be disappointed to discover they are not in it.
- Bernard Bailey
In this Teradata Universe you can see that DBC is at the top of the Hierarchy. It will always stay that way. Under DBC you should be able to see that Mary, Sales, and MRKT were CREATED by DBC. Three USERS were added to MRKT named Sam, Don, and Bo. Sam then went and CREATED three users named VU, Jane and Jusn. Anyone above you in the hierarchy is a parent. For instance, the parents of VU are Sam, MRKT, and DBC. The immediate Parent of VU is Sam.
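The hierarchy just described can be written as a simple parent lookup in Python. The names are the ones from the paragraph above; the dictionary and the function are only an illustration of "anyone above you is a parent", not how Teradata stores the hierarchy.

```python
# Each name maps to its immediate parent; DBC sits at the top.
PARENT = {
    "DBC": None,
    "Mary": "DBC", "Sales": "DBC", "MRKT": "DBC",
    "Sam": "MRKT", "Don": "MRKT", "Bo": "MRKT",
    "VU": "Sam", "Jane": "Sam", "Jusn": "Sam",
}


def parents_of(name):
    """Return every parent of name, immediate parent first."""
    chain = []
    current = PARENT[name]
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


print(parents_of("VU"))  # ['Sam', 'MRKT', 'DBC']
```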
A lot of people approach risk as if it's the enemy when it's really fortune's accomplice.
- Sting
Only a DATABASE or a USER can have PERM Space. When a user or database is created they will be given their Perm Space. Other objects such as tables can be created under a database or user. If a user has 100 GBs of space then they can create tables with data that combined take up a maximum of 100 GBs of space. Once a database or user has used up their PERM space, they cannot add any more data to the tables they own.
Everyone is trying to accomplish something big, not realizing that life is made up of little things.
-Frank A. Clark
A USER and a DATABASE are considered the same in Teradata except a USER has a LOGON and PASSWORD so they can actually Logon to Teradata and run queries. Other than that they are considered exactly the same. Both can be created with PERM and SPOOL Space. Both can have objects created beneath them.
Sometimes it is more important to discover what one cannot do than what one can do.
-- Lin Yutang
A USER doesn't even have to know that they don't have direct access to the tables. The following page shows a typical approach to Teradata security. A database will often be set up for USERs. Then another database or user will be set up for Views and Macros. Then another USER or DATABASE will be set up to hold the actual tables. The USER Database is given access to the VIEW and MACRO Database and the VIEW and MACRO Database is given access to the Tables.
Opportunity may only knock once, but temptation leans on the doorbell.
-- Unknown
There are two types of space in Teradata. They are called PERM Space and SPOOL Space. Perm Space is for Permanent Tables and Spool Space is used to temporarily build Answer Sets when users run queries. In actuality Spool Space is unused PERM Space. Most users don't get their own PERM space. All users get Spool Space. Without Spool Space the users couldn't run queries. Although I have listed different things associated with Perm and Spool space on the next page, I want you to simply remember that Perm is for your Tables and Data and that Spool is used as space for Users to run queries. Tables, Join Indexes, Permanent Journals, Hash Indexes, Stored Procedures and User Defined Functions (UDF) require Perm Space. Views, Macros and Triggers don't require Perm space.
We can't win at home. We can't win on the road. I just can't figure out where else to play.
-- Coach Pat Williams
Each AMP will have Perm Space to hold tables and have empty space for Spool. The following picture is designed so you can see exactly what a typical AMP will have on its virtual disk. The AMP will go to PERM space and read or write to the tables and then build the answer sets using the Spool Area.
One thing I like about stones in my path is when I cross them they become milestones.
-- Anonymous
The example on the following page shows a user running a query. The query is selecting all columns and rows from the Employee_Table. Each AMP will read the Employee_Table located in their PERM Space. They will then begin building their portion of the Answer Set by placing these rows in the empty area of disk called SPOOL. When each AMP is finished they will inform the Parsing Engine they are done. Each AMP will pass their Spool Answer Set over the BYNET to the Parsing Engine. The Parsing Engine will take the Answer Set and deliver it to the user. Once the Answer Set is delivered to the PE and the User the Answer Set in Spool will be deleted. Spool is only temporarily used for each query and then deleted when the query is over!
Behold the turtle. He only makes progress when he sticks his neck out.
-- James Bryant Conant
The example on the following page is meant to show that when a query finishes the Spool Answer Set is automatically deleted. What really happens is that spool is deleted as soon as the query no longer needs that portion of spool.
An eye for an eye only ends up making the whole world blind.
-- Gandhi
In the example on the following page you can see we have given MRKT 20 GBs of Spool. Then we ask the question: can three users in MRKT simultaneously each run a query that is 15 GBs in size? The answer is YES!
The difference between genius and stupidity is that genius has its limits.
Albert Einstein
Teradata definitely has its limits and these pertain to Spool space. Think of PERM Space like money, but think of Spool Space like a speed limit. If the database MRKT is assigned 20 GBs of Spool then that is MRKT's speed limit. Each user can run queries that travel up to 20 GBs. This goes for all users in MRKT. Imagine you are on the highway and the speed limit is 60 MPH. If you were driving beside another car also going 60 MPH and they pulled off the road, you wouldn't be able to now go 120 MPH. The speed limit is 60 MPH and that is everyone's limit. The Teradata police will abort your query if at any time you go 1 byte over 20 GBs. Here is why you should think of PERM as money. If the system starts with 1,000 GBs of Perm (which is actually equal to 1 Terabyte) then the system will always have 1,000 GBs unless an upgrade occurs and more hardware is added. So, there is a limited amount of space that always adds up to 1,000 GBs. DBC starts with the entire 1,000 GBs, but if DBC gives away 500 GBs then DBC will only own 500 GBs. It is like having 1,000 dollars in a poker game. It may be split up and won or lost among the players, but there is always $1,000 at the table until the game is over. Spool doesn't have to add up to the 1,000 GBs. The DBA could assign spool to every user and database in the system, and if you added it all up it could equate to millions of GBs. This is because we are assuming that not everyone will be logged on at the same time. Spool limits are designed for two purposes. 1) Users have a limit so they can't hog the system resources. 2) If users make a mistake and run a runaway query, the system will abort it after it reaches that user's spool limit.
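The money versus speed limit contrast can be modeled in a few lines of Python. This is a toy model only: the class and function names are made up, MRKT is picked as the recipient of the 500 GBs purely for illustration, and a GB is taken as 10^9 bytes to match the "1,000 GBs equals 1 Terabyte" statement.

```python
GB = 10**9  # bytes per GB; decimal convention assumed for this sketch


class SpaceOwner:
    """A USER or DATABASE holding PERM space (toy model)."""

    def __init__(self, name, perm_bytes=0):
        self.name = name
        self.perm_bytes = perm_bytes

    def give_perm(self, child, amount):
        # PERM is like money: what the parent gives away, it no longer owns.
        if amount > self.perm_bytes:
            raise ValueError(f"{self.name} does not own {amount} bytes of PERM")
        self.perm_bytes -= amount
        child.perm_bytes += amount


def query_allowed(spool_limit, spool_needed):
    # Spool is like a speed limit: each query is checked against the limit
    # on its own, no matter what other users are doing.
    return spool_needed <= spool_limit


dbc = SpaceOwner("DBC", 1000 * GB)
mrkt = SpaceOwner("MRKT")  # hypothetical recipient of the 500 GBs
dbc.give_perm(mrkt, 500 * GB)
print(dbc.perm_bytes // GB, mrkt.perm_bytes // GB)  # 500 500

mrkt_spool = 20 * GB
print([query_allowed(mrkt_spool, 15 * GB) for _ in range(3)])  # [True, True, True]
print(query_allowed(mrkt_spool, 20 * GB + 1))  # False
```

The PERM totals always add back up to 1,000 GBs, while three simultaneous 15 GB queries each pass the 20 GB spool check because the limit is never shared out between them.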
My son has taken up meditation; at least it's better than sitting doing nothing.
Max Kauffmann
Teradata calculates Perm and Spool space on a per AMP basis. If the system has 10 AMPs and a user or database is assigned 20 GBs of spool then there are actually two limitations: 1) The user cannot run a query that goes over 20 GBs. 2) The user cannot run a query that goes over 2 GBs on any single AMP (20 GBs / 10 AMPs = 2 GBs per AMP). This design is to ensure that data is spread fairly evenly over the AMPs, which is based solely on the Primary Index choice. This will also ensure that no AMP should be a hot AMP, which means that if the data is skewed badly the system will blow the Spool limit.
It is the mark of an educated mind to be able to entertain a thought without accepting it.
Aristotle
The following page shows an example of both Perm and Spool being calculated on a Per AMP basis. Notice that we have 10 AMPs in our system. We have 20 GBs of Spool and 100 GBs of Perm. This means that this user or database cannot run a query that goes over 20 GBs or one that goes over 2 GBs per AMP. Also notice that in our 10 AMP system the user or database was assigned 100 GBs of Perm. This means that the user or database cannot contain tables with data that exceeds 100 GBs or that goes over 10 GBs on any AMP. Again, the philosophy of this is to ensure reasonable data distribution, which is based solely on the Primary Index choice. In a worst case scenario you choose a column for the Primary Index that has only one value. Let's say, for example, the column is State Code and the only value is California. Then all of the data for that table would be on only 1 AMP. This could cause a premature Full Perm Space message or an abort of a query because it exceeded its per AMP limit. Special Note: Sometimes when systems are upgraded to a large number of AMPs the DBA will assign each user or database more space because they don't want the Per AMP limit to cause problems. If a 10 AMP system with 20 GBs of space equals a Per AMP limit of 2 GBs per AMP, then an upgrade to a 100 AMP system would mean that the Per AMP Limit would be 0.2 GBs. That might be considered too low to run some queries if there is any skewing at all, so the DBA will often up the spool limits for everyone.
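The Per AMP arithmetic above is easy to check with a short Python sketch. The function names are made up for illustration, and the usage lists are invented to show an even spread versus a badly skewed one.

```python
def per_amp_limit(total_gb, num_amps):
    """Per AMP limit = total space assigned / number of AMPs."""
    return total_gb / num_amps


def within_limits(total_gb, usage_per_amp_gb):
    """True only if both the overall limit and the Per AMP limit hold."""
    limit = per_amp_limit(total_gb, len(usage_per_amp_gb))
    return sum(usage_per_amp_gb) <= total_gb and max(usage_per_amp_gb) <= limit


print(per_amp_limit(20, 10))   # 2.0  (Spool on the 10 AMP system)
print(per_amp_limit(100, 10))  # 10.0 (Perm on the 10 AMP system)
print(per_amp_limit(20, 100))  # 0.2  (same Spool after a 100 AMP upgrade)

even = [1.5] * 10            # 15 GBs spread evenly over 10 AMPs
skewed = [3.0] + [0.0] * 9   # only 3 GBs, but all of it on one AMP
print(within_limits(20, even), within_limits(20, skewed))  # True False
```

The skewed case uses far less space in total, yet it fails because a single AMP goes over its 2 GB share. That is the one-value Primary Index problem in miniature.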
I went to a restaurant that serves breakfast any time. So I ordered French toast during the Renaissance.
Steven Wright
The following page is giving you a chance to show how smart you are. Answer the quiz and decide how much PERM and SPOOL is in MRKT after they create the three users Sam, Don, and Bo.
All human actions have one or more of these seven causes: chance, nature, compulsion, habit, reason, passion and desire.
Aristotle
On the next page are the answers to the quiz.
Collecting Statistics
If you are not true to your teeth they will be false to you.
Teradata Certified Dentist
I asked my dentist, "Do I have to floss all my teeth?" He said, "No, just the ones you want to keep." Whether the Parsing Engine (PE) is checking a user's security rights or checking whether statistics were collected on a table, the PE will go to user DBC for the answers. The PE uses statistics to help decide what plan to build so the AMPs can satisfy a user's query. Before the PE can come up with a plan it wants to know if a table is large, medium, or small. It wants to know about certain columns or indexes. Does a particular column have a lot of duplicates or nulls, or are the values unique? Is a particular index unique or non-unique, and is the index strongly or weakly selective? These questions are often answered by Collect Statistics. What is Collect Statistics? When a table is created and loaded with data the DBA will run a COLLECT STATISTICS command on certain columns and indexes of that table. That will help the PE answer key questions that will give the PE a better understanding of the table in general. If more data is loaded or deleted the DBA will then Recollect Statistics to ensure that the statistics reflect the true data inside the table. It is not mandatory to collect statistics on a table, just as it is not mandatory that a person brushes their teeth or cleans their clothes. If statistics are not collected on a table then the PE will perform a Random Sample and make an educated guess. I asked my DBA, "Do I have to Collect Statistics on all the columns and indexes?" The answer was, "No. Only on the important ones, but never the entire table." I hear that is good advice, but I became concerned when I noticed he was missing teeth!
You cannot depend on your eyes when your imagination is out of focus.
Mark Twain
The following page lists some of the key answers that Collect Statistics offers.
This is a test. It is only a test. Had it been an actual job, you would have received raises, promotions, and other signs of appreciation.
Anonymous
You definitely don't collect statistics on every column and index in a table. These statistics are stored inside DBC and they take up Perm Space. You only want to collect on certain columns and indexes such as:
All Non-Unique Indexes
Columns frequently used in user queries in the WHERE Clause
All Primary Indexes of small tables
Columns used as Join Conditions
Recollecting Statistics
We see the brightness of a new page where everything yet can happen.
Rainer Maria Rilke, Book of Hours
You don't want to collect statistics on every column or index inside a table. This takes up space, takes up resources, and just isn't needed. There are important columns and indexes that will really help the PE when coming up with a plan and there are other columns or indexes that will never be considered. Only collect on the important ones.
The future belongs to those who believe in the beauty of their dreams.
Eleanor Roosevelt
The following page shows you how expensive the process of collecting statistics can be. In this example we are collecting statistics on the column Last_Name in the Employee_Table. This requires a Full Table Scan! The results are then sorted in alphabetical order on Last_Name from A-Z and then chopped up or divided up into 200 intervals. There is more to it so pay attention to the next couple of pages.
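To get a feel for "sort, then chop into intervals", here is a rough Python sketch. It is a simplified equal-size split under my own assumptions, not Teradata's actual statistics format: the function name, the fields kept per interval, and the sample names are all invented for illustration.

```python
def build_intervals(values, num_intervals=200):
    """Sort every value (the Full Table Scan plus sort), then split the
    sorted list into roughly equal-sized intervals."""
    ordered = sorted(values)
    n = len(ordered)
    num_intervals = min(num_intervals, n)  # never more intervals than rows
    intervals = []
    for i in range(num_intervals):
        start = i * n // num_intervals
        end = (i + 1) * n // num_intervals
        chunk = ordered[start:end]
        intervals.append({
            "max_value": chunk[-1],        # highest value in the interval
            "rows": len(chunk),            # how many rows it covers
            "distinct": len(set(chunk)),   # how many different values
        })
    return intervals


# Made-up Last_Name values, chopped into 5 intervals to keep the output small.
names = ["Davis"] * 5 + ["Coffing", "Smith", "Jones", "Adams", "Baker"]
print([iv["max_value"] for iv in build_intervals(names, 5)])
# ['Baker', 'Davis', 'Davis', 'Davis', 'Smith']
```

Notice that every row has to be read and sorted before a single interval can be built, which is where the cost comes from. Notice also how a very common value such as Davis ends up spread over several intervals in this simple scheme.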
The most exciting phrase to hear in science, the one that heralds the most discoveries, is not "Eureka!", but "That's funny..."
Isaac Asimov
One of the problems the PE had with Collect Statistics in the past was values that occurred a huge number of times. These often spanned multiple intervals. Teradata came up with a solution. If a value occurs that often, Teradata will make it a Loner Value and store it in a High Bias Interval. Now Teradata will know if there are a million people named Davis because Davis won't span multiple intervals, but will instead receive its own interval. Teradata can actually place up to two Loner Values inside one High Bias Interval.
Teradata Limits
Asking an incumbent member of Congress to vote for term limits is a bit like asking a chicken to vote for Colonel Sanders.
Bob Inglis, 1995
The following page shows some of the limits of Teradata V12 and V13.
Data Protection
"Age does not protect you from love. But love, to some extent, protects you from age."
-Jeanne Moreau, French Actress
As a man was driving down the interstate highway, his cell phone rang. When he answered he heard his wife warn him urgently, "George, I just heard on the news that there's a car going the wrong way on I-26!" George replied, "I'm on I-26 right now and it's not just one car. It's hundreds of them!" How do you protect your data when things go the wrong way? Murphy's Law states, "The more mission critical a data warehouse, the more likely the system will crash at the most critical moment of the mission." Ironically, most DBAs think Murphy was an optimist. A database not prepared to defend itself is like an unsigned contract. It is not worth the paper it is written on. However, Teradata is always prepared and it will protect your data better than a wild pit bull. As a matter of fact, the difference between Teradata and a pit bull is that eventually the pit bull will get bored and let go. System and user errors are inevitable in any large system. For example, an associate may accidentally give everyone a 100% raise instead of a 10% raise. Or, what if a million-dollar transaction fails right at the wrong time? Or an AMP or DISK goes down? In any of these cases, Teradata will have many ways to protect your data. Some processes for protection are automatic and some of them are optional. The protection features we will discuss are:
Transaction Concept
Transient Journal
RAID 1 Mirroring
Cliques
Standby Nodes
Fallback
Fallback Clusters
Archive
Permanent Journaling
Transaction Concept
A life filled with love may have some thorns, but a life empty of love will have no roses.
- Anonymous
Teradata has two different modes in which it operates. Those modes are called Teradata Mode and ANSI Mode. Both modes handle things a little differently. Every Teradata system will have a default mode set by the DBA when the system first arrives. Although there is a default mode set, the user can actually change the mode they want during their sessions. Depending on which mode you are using a transaction takes on a whole new meaning.
You can tell whether a man is clever by his answers. You can tell whether a man is wise by his questions.
- Naquib Mahfouz
Teradata has two different modes in which it operates. Those modes are called Teradata Mode and ANSI Mode. Both modes handle things a little differently. As you can see on the following page there are many differences. I want you to focus on two main areas. The first is that in Teradata mode you don't need to use the word COMMIT, but in ANSI mode you do. The second area of focus is how statements are rolled back. In Teradata mode, if a statement in a transaction fails, EVERY Statement in that transaction is Rolled Back, but in ANSI mode, if a statement in a transaction fails, ONLY the FAILED Statement is Rolled Back.
You got to be careful if you don't know where you're going, because you might not get there.
- Yogi Berra
With ANSI Mode you must use the words COMMIT WORK, or you can just say COMMIT, but this is mandatory anytime you are changing something in the Teradata database. This includes anytime you use the CREATE statement or any INSERT, UPDATE, or DELETE. You don't need it for queries with SELECT. On the following page you can see both a single statement transaction at the top of the slide and on the bottom of the slide you can see a multi-statement transaction. If the single statement transaction failed for any reason then Teradata would Roll Back this UPDATE statement and ensure the database was exactly like it was before the transaction. If a statement in the multi-statement transaction were to fail, only the FAILED Statement would be Rolled Back!
The only thing worse than being talked about is not being talked about.
- Oscar Wilde
With Teradata Mode you never use the words COMMIT WORK or COMMIT. This is implied with each statement. I am sure you are asking, "Then how does Teradata Mode run a Multi-Statement Transaction?" It uses a BT or BEGIN TRANSACTION Statement, then runs the statements, and then follows them with an ET or END TRANSACTION Statement. On the following page you can see both a single statement transaction at the top of the slide and on the bottom of the slide you can see a multi-statement transaction. If the single statement transaction failed for any reason then Teradata would Roll Back this UPDATE statement and ensure the database was exactly like it was before the transaction. If a statement in the multi-statement transaction were to fail, ALL Statements within the transaction would be Rolled Back!
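The difference in rollback behavior between the two modes can be mimicked with a small Python toy. This is not how Teradata is implemented: a dictionary stands in for a table, Python functions stand in for SQL statements, and the names and salary figures are made up. The sketch also leaves out COMMIT, which ANSI mode would still need before the surviving changes become permanent.

```python
import copy


def run_transaction(table, statements, mode):
    """Apply statements to table (a dict); return indexes of failed statements.

    mode "TERADATA": a failure rolls back every statement in the transaction.
    mode "ANSI": only the failed statement is undone; the others stand.
    """
    at_start = copy.deepcopy(table)
    failed = []
    for index, statement in enumerate(statements):
        before_statement = copy.deepcopy(table)
        try:
            statement(table)
        except Exception:
            failed.append(index)
            if mode == "TERADATA":
                table.clear()
                table.update(at_start)  # everything is rolled back
                return failed
            table.clear()
            table.update(before_statement)  # only this statement is undone
    return failed


def raise_sam(table):
    table["Sam"] += 100


def broken_update(table):
    table["Don"] = 0  # a partial change ...
    raise ValueError("statement failed part-way through")  # ... then a failure


def raise_don(table):
    table["Don"] += 200


statements = [raise_sam, broken_update, raise_don]
teradata = {"Sam": 1000, "Don": 2000}
ansi = {"Sam": 1000, "Don": 2000}
run_transaction(teradata, statements, "TERADATA")
run_transaction(ansi, statements, "ANSI")
print(teradata)  # {'Sam': 1000, 'Don': 2000}
print(ansi)      # {'Sam': 1100, 'Don': 2200}
```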
A government that robs Peter to pay Paul can always depend upon the support of Paul.
- George Bernard Shaw
The next page shows an old trick of creating a Multi-Statement request in the Teradata Utility called BTEQ (pronounced Bee-Teek). BTEQ requires a semicolon at the end of every SQL Statement. If you put the semicolon at the front of the next line and then place another SQL Statement immediately following, then these statements are considered part of the same transaction.
Transient Journal
The Transient Journal is automatic and it takes a before picture of any update or delete for rollback purposes.
Do you know, my son, with what little understanding the world is ruled?
- Pope Julius III
The Transient Journal rows are discarded once the transaction is committed. The only reason that each AMP takes a BEFORE picture and stores it in its Transient Journal is in case of a problem in which a ROLLBACK occurs. If there are no problems and the transaction is committed then the BEFORE picture is discarded. If a ROLLBACK did occur then the AMP can replace the attempted UPDATE with the BEFORE picture and everything is back to the way it was before the UPDATE Statement.
VProcs
The surprising thing about young fools is how many survive to become old fools.
-Doug Larson
Teradata has taken a simple PC, filled the memory with AMPs and PEs and calls it a node. Connect multiple nodes together with the BYNET and you have a Massively Parallel Processing or MPP system.
RAID 1 - Mirroring
You can only perceive real beauty in a person as they get older.
-Anouk Aimee
RAID 1 is mirroring and Teradata always mirrors its disks. As you can see on the following page every AMP is attached to four physical disks. Two hold actual data and two are for backup. This provides excellent protection. Each AMP is said to have four physical disks, but only one Virtual Disk. This really means that no AMP can get into another AMP's disks. Each AMP is the only thing allowed to read that AMP's disks. So, each AMP is said to have its own virtual disk, which is a set of four physical disks. The great thing about mirroring is that if we lose a disk we already have it mirrored and protected. The DBA just has to remove the failed disk and put in a fresh disk and the mirroring will immediately begin. Remember the price for Mirroring is double the disk costs. Each time you have a disk with data you have another disk mirroring and protecting that data disk.
Cliques
Half the money I spend on advertising is wasted; the trouble is I don't know which half.
-John Wanamaker
In the picture on the following page you see 8 nodes. When we connect each of these nodes to each other node's disk farm we are essentially creating a clique. Now if there is a node failure, Teradata will reset and the AMPs and PEs in the down node will be able to migrate to the memory of another node within the clique. I want you to notice that Clique 1 has Green AMPs and all the other nodes have purple colored AMPs. We are about to see what happens when Node 1 crashes. Get ready!
I do not regret one professional enemy I have made. Any actor who doesn't dare to make an enemy should get out of the business.
-Bette Davis
In the picture on the following page you see 8 nodes. When we connect each of these nodes to each other node's disk farm we are essentially creating a clique. Now if there is a node failure, Teradata will reset and the AMPs and PEs in the down node will be able to migrate to the memory of another node within the clique.
Conscience is the inner voice which warns us that someone may be looking.
-H. L. Mencken
Teradata actually has hot standby nodes! This is in case of a node failure. Although the AMPs in a down node could migrate to other nodes in the clique, an even better way is to have a hot standby node. This is a node's worth of hardware that runs nothing until another node goes down. When a node goes down Teradata will reset. When it does, the AMPs and PEs in the down node will be instructed to migrate to the hot standby node. Now everything is up and running perfectly. A hot standby node is equivalent to you buying a second car. You would only drive the 2nd car if your other car broke down. Yes, it is expensive, but it is great when the first car is down.
FALLBACK Protection
Fallback Clusters
Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats.
-Howard Aiken
Fallback has been placed perfectly in the picture on the following page. Notice we have two clusters: the top cluster and the bottom cluster. The top cluster is in purple and the bottom cluster is in yellow. We laid the data out and it spread evenly among both clusters. Now it is time to lay out the fallback data. Notice that the fallback from the top cluster stays within the top cluster. The same rule goes for the bottom cluster. The Fallback data stays within the cluster. Now we can lose 1 AMP in every cluster and still have our data up and running. Teradata will not use the Fallback data unless an AMP in the cluster goes down.
When you are courting a nice girl an hour seems like a second. When you sit on a red-hot cinder a second seems like an hour. That's relativity.
-Albert Einstein
Check out how the Fallback was laid out in all four systems.
Don't worry about the world coming to an end today. It's already tomorrow in Australia.
-Charles Schulz
There are a couple of rules I want you to think about with Fallback.
Rule 1: Fallback doubles the size of your table.
Rule 2: All AMPs are clustered (usually in sets of four).
Rule 3: Fallback rows always reside within the same Cluster.
Rule 4: Two AMPs in the same Cluster never reside inside the same NODE.
Rule 5: Two AMPs in the same Cluster never reside inside the same CLIQUE.
Rule 6: Fallback protects you against a Failed AMP.
Time is at once the most valuable and most perishable of all our possessions.
-John Randolph
On the following page you can see a picture that has four Cliques. In each Clique are four Nodes. Within each Node are 2 PEs and 4 AMPs. Normally, there would be about 4 PEs and 25 AMPs, but this picture is designed to give you knowledge of how Teradata Clusters the AMPs inside the cliques. This picture is to set you up. Your job is to put in a clustering scheme that follows three rules: 1) Group your AMPs in Clusters of Four. 2) Never have two AMPs in the same Cluster be a part of the same Clique. 3) Never have two AMPs in the same Cluster be a part of the same Node. If you do this correctly you will understand that we can lose a Node or a Clique and still never have two AMPs within the same Cluster fail.
Though no one can go back and make a brand new start, anyone can start from now and make a brand new ending.
-Anonymous
On the following page you can see a picture that has four Cliques. In each Clique are four Nodes. Within each Node are 2 PEs and 4 AMPs. Normally, there would be about 4 PEs and 25 AMPs, but this picture is designed to give you knowledge of how Teradata Clusters the AMPs inside the cliques. Remember the three rules: 1) Group your AMPs in Clusters of Four. 2) Never have two AMPs in the same Cluster be a part of the same Clique. 3) Never have two AMPs in the same Cluster be a part of the same Node. If you do this correctly you will understand that we can lose a Node or a Clique and still never have two AMPs within the same Cluster fail. The following page shows the four AMPs in Cluster 1. They are each in a different Clique and a different Node. I have circled the four AMPs and placed the number 1 inside them to represent Cluster 1. Notice that the four AMPs in Cluster 1 are very far apart physically. Notice the four AMPs in Cluster 2. Notice the four AMPs in Cluster 3. Notice we have clusters from Cluster 1 to Cluster G. We don't normally call our Clusters by their numbers, but I want you to also notice the four AMPs in Cluster G. If we lost every node in Clique number 1 we would have actually lost one AMP in every Cluster.
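One clustering scheme that obeys the three rules can be generated and checked in Python. It is just one possible layout, assumed for illustration: an AMP is identified by (clique, node, slot), and the AMP in the same node position and slot of each clique is grouped into one cluster. The labels 1 through G follow the picture's naming for the 16 clusters; the actual layout in the picture may differ.

```python
from itertools import product

CLIQUES, NODES_PER_CLIQUE, AMPS_PER_NODE = 4, 4, 4
LABELS = "123456789ABCDEFG"  # Cluster 1 through Cluster G (16 clusters)


def cluster_of(clique, node, slot):
    # Assumed scheme: the cluster depends only on node position and slot,
    # so each cluster gets exactly one AMP from every clique.
    return LABELS[node * AMPS_PER_NODE + slot]


def build_clusters():
    clusters = {}
    for clique, node, slot in product(range(CLIQUES),
                                      range(NODES_PER_CLIQUE),
                                      range(AMPS_PER_NODE)):
        clusters.setdefault(cluster_of(clique, node, slot), []).append(
            (clique, node, slot))
    return clusters


def amps_lost_per_cluster(clusters, failed_clique):
    """How many AMPs each cluster loses if every node in a clique is lost."""
    return {label: sum(1 for amp in amps if amp[0] == failed_clique)
            for label, amps in clusters.items()}


clusters = build_clusters()
print(len(clusters), {len(amps) for amps in clusters.values()})  # 16 {4}
print(set(amps_lost_per_cluster(clusters, 0).values()))          # {1}
```

Because a node belongs to exactly one clique, keeping a cluster's AMPs in four different cliques automatically keeps them in four different nodes as well. Losing a whole clique then costs every cluster exactly one AMP.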
Once the game is over, the king and the pawn go back in the same box.
- Italian Proverb
The Down AMP Recovery Journal (DARJ) is started on all AMPs in the cluster when an AMP is down. This allows for three AMPs to check on their mate. Since there are four AMPs in most clusters and all Fallback for a particular AMP remains within the cluster, there are three AMPs that will hold Fallback rows for a down AMP. The Down AMP Recovery Journal (DARJ) is a special journal used only for FALLBACK rows when an AMP is not working. Like the TRANSIENT JOURNAL, the DARJ, also known as the RECOVERY JOURNAL, gets its space from DBC's PERM Space. When an AMP fails, the rest of the AMPs in its cluster initiate a DARJ. The DARJ keeps track of any changes that would have been written to the failed AMP. When the AMP comes back online, the DARJ will catch up the AMP by completing missed transactions. Once everything is caught up the DARJ is dropped.
Permanent Journal
A real friend is one who walks in when the rest of the world walks out.
Walter Winchell
The example created a table called Employee in the Teratom database, and the table is FALLBACK protected. A BEFORE Journal and a DUAL AFTER Journal are specified. Remember that both FALLBACK and JOURNALING have defaults of NO, meaning that if you don't specify this protection at either the table or database level the default is NO FALLBACK and NO JOURNALING.
I became a policeman because I wanted to be in a business where the customer is always wrong.
Anonymous
Every Permanent Journal is comprised of three areas. They are the Active Current Journal, the Saved Current Journal and the Restored Journal. This slide demonstrates the purpose of the Active Current Journal and the Saved Current Journal portions of the Permanent Journal. When a table is defined with a Permanent Journal and a change takes place on a row the changed row is written (appended) to the Active Current Journal. When the Database Administrator (DBA) submits a CHECKPOINT WITH SAVE statement the Active Current Journal appends its rows to the Saved Current Journal. Then, Teradata automatically deletes the Active Current Journal rows so a fresh start to the Active Current Journal can take place. When the DBA submits an ARCHIVE JOURNAL TABLE statement the Saved Current Journal is copied to tape. This is usually done on a daily basis. The DBA must submit a DELETE JOURNAL statement to delete the Saved Current Journal. It is never done automatically.
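The life cycle of the Active and Saved areas can be pictured with a toy Python model. The class and method names are my own, chosen to echo the statements named above; the lists are stand-ins and say nothing about how Teradata physically stores a journal.

```python
class PermanentJournal:
    """Toy model of the Active and Saved areas of a Permanent Journal."""

    def __init__(self):
        self.active = []   # Active Current Journal
        self.saved = []    # Saved Current Journal
        self.tape = []     # stand-in for the archive tape

    def log_change(self, row_image):
        self.active.append(row_image)      # every change is appended

    def checkpoint_with_save(self):
        self.saved.extend(self.active)     # Active rows are appended to Saved
        self.active.clear()                # Active gets a fresh start

    def archive_journal_table(self):
        self.tape.extend(self.saved)       # copy only; Saved is left untouched

    def delete_journal(self):
        self.saved.clear()                 # explicit DBA step, never automatic
```

Notice that archive_journal_table leaves the Saved area in place; only delete_journal empties it.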
Some birds aren't meant to be caged, their feathers are just too bright. And when they fly away, the part of you that knows it was a sin to lock them up does rejoice.
Shawshank Redemption
You don't lock up a bird, but you always lock a query. Teradata uses a lock manager to automatically lock at the database, table or row hash level. Teradata will lock objects using four types of locks:
Exclusive - Exclusive locks are placed only on a database or table when the object is going through a structural change. An Exclusive lock restricts access to the object by any other user. This lock can also be explicitly placed using the LOCKING modifier.
Write - A Write lock happens on an INSERT, DELETE, or UPDATE request. A Write lock restricts access by other users. The only exception is for users reading data who are not concerned with data consistency and override the applied lock by specifying an Access lock. This lock can also be explicitly placed using the LOCKING modifier.
Read - This is placed in response to a SELECT request. A Read lock restricts access by users who require Exclusive or Write locks. This lock can also be explicitly placed using the LOCKING modifier. Read locks put the word integrity in data integrity. If you have a multi-user environment with updates occurring and you need to keep data consistent, you want a Read lock.
Access - Placed in response to a user-defined LOCKING FOR ACCESS phrase. An Access lock permits the user to READ an object that may already be locked for READ or WRITE. An Access lock does not restrict access by another user except when an Exclusive lock is required. A user requesting an Access lock cannot be concerned with data consistency.
When Teradata locks a resource for a user the lifespan of the transaction lock is forever or until the user releases the lock. This is different than a deadlock situation. If two transactions are deadlocked the youngest query is always aborted.
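The four descriptions above boil down to a small compatibility table. Here it is as a Python sketch; the dictionary and the can_grant helper are my own illustration of those rules, not a Teradata interface.

```python
# Which lock types may coexist on the same object, per the descriptions above.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},
    "READ":      {"ACCESS", "READ"},
    "WRITE":     {"ACCESS"},
    "EXCLUSIVE": set(),
}

def can_grant(requested, held):
    """True if `requested` is compatible with every lock already held."""
    return all(h in COMPATIBLE[requested] for h in held)
```

For example, can_grant("READ", ["WRITE"]) is False, while can_grant("ACCESS", ["WRITE"]) is True, which is exactly the dirty-read trade-off.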
When you go into court you are putting your fate into the hands of twelve people who weren't smart enough to get out of jury duty.
- Norm Crosby
Teradata uses a lock manager to be judge, jury, and executioner of SQL. There are four locks placed on objects at the database, table, or row hash level.
In the end we'll remember not the words of our enemies, but the silence of our friends.
Martin Luther King, Jr.
Dr. Martin Luther King Jr. was a great man with a message that will live on forever. Dr. King believed in equality. In dedication to Dr. King we have set up this quiz about lock equality. Which lock level will Teradata use for the SQL on the following page? Will the SQL cause Teradata to place the lock at the Database level, Table level, or Row level?
If you are planning for a year, sow rice; if you are planning for a decade, plant trees; if you are planning for a lifetime, educate people.
- Chinese Proverb
On the following page we have the answers to the quiz.
You can make more friends in two months by becoming interested in other people, than you will in two years by trying to get other people interested in you.
- Dale Carnegie
Someone on a New York street corner was asked, "How do you get to Carnegie Hall?" He replied, "Practice man, practice!" The following page is designed so you can practice the art of understanding Teradata locks. What I want you to know is that the only lock that users have control over is the Access Lock. If you want to read a table, but don't want to wait on a WRITE Lock, and don't care that the answer set may not be perfect, then you want an ACCESS Lock. This is also called a Dirty Read or a Read without Integrity because the data isn't always perfect. Let me explain. When someone is updating a table they are given a WRITE Lock. As they perform the UPDATE a user who wants a READ lock on the table writes their SQL. Teradata makes the READ lock wait until the WRITE lock has completely updated the table. This could take a long time. An Access lock says, "I know someone is already updating the table, but I don't want to wait. I am merely trying to get an average of sales for the week and I don't have to have everything perfect." I also want you to notice that an Exclusive Lock, a Write Lock and a Read Lock are determined by Teradata based on the SQL that is written.
Everyone is kneaded out of the same dough but not baked in the same oven.
- Yiddish Proverb
The quiz on the following page will give you an opportunity to understand how locks move up and read rows of a table simultaneously. If everyone in a company only did SELECTs, then there would only be READ locks. Read locks are compatible so everyone could always immediately read any table they want. However, most of the time a data warehouse environment has users who want to read and analyze data while others are doing updates. This is where there is danger in slowing users down because they often have to wait to access a table. Remember what is compatible with what and also remember that you can only move up 1 person at a time, and that is only if you are compatible!
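If you want to check your quiz answers mechanically, here is a small Python sketch of the "move up one person at a time, only if compatible" rule. It is an assumption on my part that this rule amounts to "a request runs only when it is compatible with every request ahead of it in line"; the function name is made up for the example.

```python
# Lock compatibility as described earlier in the chapter.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},
    "READ":      {"ACCESS", "READ"},
    "WRITE":     {"ACCESS"},
    "EXCLUSIVE": set(),
}

def granted_requests(queue):
    """Given lock requests in arrival order, return the indexes that run now.

    Assumption: a request may pass the one ahead of it only if the two are
    compatible, so it runs only when it is compatible with everything ahead.
    """
    running = []
    for i, lock in enumerate(queue):
        if all(ahead in COMPATIBLE[lock] for ahead in queue[:i]):
            running.append(i)
    return running
```

Try it with a line such as ["READ", "WRITE", "READ", "ACCESS"] and see which requests get through.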
Money doesn't bring happiness. People with ten million dollars are no happier than people with nine million dollars.
- Hobart Brown
The first thing you will see in an EXPLAIN Statement, which is the PE's plan in English, will usually refer to locking. The first line usually says something like, "Locking a Pseudo Table for Read." This means that you are now in line in the Pseudo Table. The next line states "We lock the Employee_Table for Read", which means you have moved to the front of the line and the lock has been placed on the table. I like to explain this to users because the EXPLAIN can really give insight into the Parsing Engine's plan, but users are often confused by the term Pseudo Table. Now you know!
Destiny is not a matter of chance, it is a matter of choice; it is not a thing to be waited for, it is a thing to be achieved
- William Jennings Bryan
Sometimes your SQL is not a thing to be waited for; it is a thing to be aborted. When the NOWAIT option is used, if a lock request cannot be granted immediately the transaction will abort. The NOWAIT option is used when it is not desirable to have a request wait for resources, or cause resources to be tied up while waiting. The NOWAIT option is an excellent way to avoid waiting on conflicting locks. Don't make the mistake of thinking the NOWAIT option means you have a free dash to the front of the lock line. The NOWAIT option dictates that a transaction is to ABORT immediately if the LOCK MANAGER cannot immediately place the lock. Use a LOCKING modifier with the NOWAIT option when you don't want requests waiting in the queue. A 7423 return code informs the user that the lock could not be placed due to an existing, conflicting lock.
People can have the Model T in any color so long as it's black.
- Henry Ford
The rules of locking are right there on the slide on the following page.
The great tragedy of science, the slaying of a beautiful theory by an ugly fact.
- Thomas Henry Huxley
Teradata performs Full Table Scans as fast as any other vendor. They fly through data because of their parallel processing design. In a Full Table Scan each AMP must read all of its rows for a particular table, so each row of the table is examined once. You can see in the picture on the following page that we use every AMP when you see the words All-AMPs Retrieve. You can see that every row is read from the words All-Rows Scan. So, when you see an All-AMPs Retrieve by way of an All-Rows Scan you will know it is merely a Full Table Scan.
There is nothing wrong with doing a Full Table Scan if necessary, but it is silly to do a Full Table Scan if you don't need to do it. Some users won't know the Primary Index and write their queries to do a Full Table Scan when they could have used the Primary Index or a Secondary Index in the query. This wastes time and money. If you think you are writing a query that should use the Primary Index or a Secondary Index or only search certain Partitions and you see the Full Table Scan you should investigate what is wrong. Never do a Full Table Scan unless it is the only choice!
For example, if you wanted to find the Average Salary of the employees in the Employee_Table you would have to read every row to get that answer set. Doing a Full Table Scan in that case is 100% acceptable! If, however, you worked in Human Resources and an employee came in for a meeting and you wanted to look their information up in the Employee_Table you should find out the Primary Index of the Employee_Table. If it is the column Employee_No you should ask the employee what their Employee_No is and then use that Employee_No in the WHERE clause of the SQL. It will be a 1 AMP operation retrieving only 1 row. It is very quick! Some Human Resources users will leave off the WHERE clause and do a Full Table Scan and then scroll down through the huge report until they find the right employee. This is a huge waste of resources in Human Resources. The irony, the irony!
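To see the difference between the two access paths, here is a toy Python model. The 4-AMP system, the CRC-based amp_for function (a stand-in for Teradata's real hashing and hash map) and the salary data are all assumptions made up for the illustration.

```python
import zlib

NUM_AMPS = 4  # hypothetical system size

def amp_for(pi_value):
    """Stand-in for the hash map: pick an AMP from the Primary Index value."""
    return zlib.crc32(str(pi_value).encode()) % NUM_AMPS

def load(rows):
    """Distribute (employee_no, salary) rows to AMPs by hashing employee_no."""
    amps = [dict() for _ in range(NUM_AMPS)]
    for employee_no, salary in rows:
        amps[amp_for(employee_no)][employee_no] = salary
    return amps

def primary_index_lookup(amps, employee_no):
    """One-AMP operation: only the AMP that owns the hash is touched."""
    amp = amp_for(employee_no)
    return amps[amp].get(employee_no), [amp]

def full_table_scan_avg(amps):
    """All-AMPs retrieve: every AMP reads every one of its rows once."""
    salaries = [s for amp in amps for s in amp.values()]
    return sum(salaries) / len(salaries), list(range(NUM_AMPS))
```

The second value each function returns is the list of AMPs it had to touch: one for the Primary Index lookup, all of them for the average.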
Colleges are places where pebbles are polished and diamonds are dimmed.
- Robert G. Ingersoll
The picture on the following page shouldn't be part of the EXPLAIN chapter because it is merely showing the table definition, which is often referred to as the Data Definition Language (DDL). I want you to see that our table called Sales_Table_PPI has been partitioned. On the following page we will show you a query and its EXPLAIN that doesn't do a Full Table Scan, but reads only certain partitions.
A new idea is delicate. It can be killed by a sneer or a yawn; it can be stabbed to death by a joke or worried to death by a frown on the right person's brow.
- Charles Brower
Teradata joins up to 128 tables in a single query. That is amazing, but there are two things that most people don't know:
1) Teradata joins only two tables at a time
2) The rows being joined must reside on the same AMP
Most of the time the two rows being joined will be on different AMPs and Teradata must move them to the same AMP for the joining process. Teradata will do this by either redistributing one or both of the tables or by duplicating the smaller table on all AMPs. In the EXPLAIN on the following page the Parsing Engine has decided to do a Full Table Scan on the Order_Table and then Redistribute the Order_Table by the Customer_Number. This will match up the Customer_Number on each AMP from the Order_Table with its associated Customer_Number rows of the Customer_Table. Whenever a join takes place Teradata will need to ensure that matching rows are on the same AMP if they are to be joined. Watch for the words in your EXPLAIN such as Duplicated on all AMPs or Redistributed by and you will know data is moving across the BYNET in order to place the matching rows on the same AMP for the join process.
There is one thing stronger than all the armies in the world; and that is an idea whose time has come.
- Victor Hugo
One of the most exciting (and rare) things to see in an EXPLAIN is a BMSMS statement. This happens when the columns in the WHERE clause each have a Non-Unique Secondary Index (NUSI). When multiple NUSI columns are ANDed together with the AND Clause the Parsing Engine may decide to build a bit-map. This can really speed up a large query because the PE will only read from the secondary index Subtables and build a bit-map of the process. This is fast, fast, and fast! For a BMSMS to take place the Parsing Engine wants you to Collect Statistics on any column where there is a Non-Unique Secondary Index (NUSI). This gives the Parsing Engine confidence to perform the bit-map process. The bit-map process takes a little longer to set up, but once it is set up it can speed up the querying enormously. Notice that we are using the columns Shipdate and Partkey in our query example. Also notice the AND in between these two columns in the query. Both of these columns have Non-Unique Secondary Indexes (NUSI) on them. When multiple NUSIs in a query are separated by the AND keyword they are considered ANDed together and the bitmapping process can take place.
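The idea behind ANDing two NUSIs with a bit-map can be sketched in Python. The row ids below are invented, and using a plain integer as the bit-map is my own simplification; it only illustrates why ANDing is cheap, not how Teradata builds its bit-maps.

```python
def build_bitmap(row_ids):
    """Set one bit per qualifying row id (row ids are small ints here)."""
    bitmap = 0
    for rid in row_ids:
        bitmap |= 1 << rid
    return bitmap

def rows_from_bitmap(bitmap):
    """List the row ids whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical NUSI subtable contents: row ids that qualify on each column.
shipdate_rows = [1, 4, 7, 9, 12]
partkey_rows = [2, 4, 9, 13]

# ANDing the two bit-maps leaves only the rows that satisfy both conditions,
# so only those rows need to be fetched from the base table.
qualifying = rows_from_bitmap(build_bitmap(shipdate_rows) & build_bitmap(partkey_rows))
```

Only the row ids present in both lists survive the AND, and the base table is never touched until that short list is known.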
As I would not be a slave, so I would not be a master. This expresses my idea of democracy.
- Abraham Lincoln
For two rows to be joined together they must be physically on the same AMP! Wow! That is probably a surprise, but Teradata is an MPP Parallel Processing System. This means that each AMP has its own disk, its own memory, and its own processor. So, like any system, the rows will be moved to the AMP's memory and joined in memory. Therefore two rows being joined must be physically together on the same AMP. The picture on the following page shows you some fundamentals of this concept that are very important for you to get inside your brain. First of all, rows reside on a particular AMP because of the Primary Index Value. It is the Primary Index that is hashed, checked with the hash map, and used to distribute the row to the proper AMP. Most of the time rows from two joining tables won't match up perfectly on the same AMPs so Teradata will redistribute or duplicate the data to make that happen. Then the join can take place. In the beginning this can be a tricky and confusing concept, but we are going to take our time and get this down to a science.
A Join Example
The man with the best job in the country is the Vice President. All he has to do is get up every morning and say, "How's the President?"
- Will Rogers
In the next example on the following page notice the ON Clause in the SQL Join example and notice that Customer_Number is the Primary Index of the Customer_Table, but not the Primary Index of the Order_Table. Joining rows will not be on the same AMP. The Parsing Engine will instruct the Order_Table to be redistributed in spool and rehashed temporarily by Customer_Number. This is literally like making Customer_Number the Primary Index of the Order_Table for just this query. Once the hashing is redone the rows of both tables being joined will be on the same AMP. We have tricked the system and now the join can take place. When you see the word Redistributed in the EXPLAIN plan you will now know that Teradata is temporarily changing the Primary Index of a table for just one query.
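The rehash-in-spool trick can be tried out with a toy Python model. The 4-AMP system, the CRC-based amp_for function, the Order_Number column and the sample rows are all assumptions for illustration; only Customer_Number, Customer_Table and Order_Table come from the example.

```python
import zlib

NUM_AMPS = 4  # hypothetical system size

def amp_for(value):
    """Stand-in for Teradata's hashing: map a column value to an AMP."""
    return zlib.crc32(str(value).encode()) % NUM_AMPS

def distribute(rows, key):
    """Place each row (a dict) on an AMP by hashing the given column."""
    amps = [[] for _ in range(NUM_AMPS)]
    for row in rows:
        amps[amp_for(row[key])].append(row)
    return amps

def local_join(cust_amps, ord_amps):
    """Join on each AMP using only the rows that AMP holds."""
    return [(c["Customer_Number"], o["Order_Number"])
            for cust, ords in zip(cust_amps, ord_amps)
            for c in cust for o in ords
            if c["Customer_Number"] == o["Customer_Number"]]

customers = [{"Customer_Number": c} for c in range(1, 9)]
orders = [{"Order_Number": o, "Customer_Number": o % 8 + 1} for o in range(1, 41)]

customer_amps = distribute(customers, "Customer_Number")  # stored by its Primary Index
order_amps = distribute(orders, "Order_Number")           # stored by its Primary Index
order_spool = distribute(orders, "Customer_Number")       # rehashed in spool for the join
```

Joining customer_amps with order_spool finds every order, because matching Customer_Number values hash to the same AMP. Joining against order_amps, which is still laid out by Order_Number, will generally miss matches that sit on other AMPs.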
A man's feet should be planted in his country, but his eyes should survey the world.
- George Santayana
The picture on the following page shows an example of a big table, small table join. Notice that the Customer_Table only has 4 rows. Notice that the Order_Table has 4,000 rows. Now notice that both tables have spread their rows evenly across all AMPs. Here is what Teradata is about to do. The Parsing Engine will come up with a plan to join these two tables by first commanding the AMPs to bring back any Customer_Table rows. After this process the Parsing Engine is looking at all four rows of the Customer_Table. The Parsing Engine then copies all 4 rows to every AMP. The following couple of pages will demonstrate this clearly!
Experience is the worst teacher; it gives the test before presenting the lesson.
- Vernon Law
As you can see in our example on the following page we didn't move the Order_Table, but duplicated the four rows of the Customer_Table. Now you can see how easy the rows are to join. This shows only one AMP, but you can imagine the same process going on with all AMPs simultaneously because the four rows from the Customer_Table have been duplicated on all AMPs.
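Here is the duplication approach as a toy Python model. The row counts (4 and 4,000) come from the example; the 4-AMP system, the Order_Number column and the round-robin placement of the Order_Table rows are assumptions made for the sketch.

```python
NUM_AMPS = 4  # hypothetical system size

customers = [{"Customer_Number": c} for c in range(1, 5)]    # 4 rows
orders = [{"Order_Number": o, "Customer_Number": o % 4 + 1}
          for o in range(1, 4001)]                           # 4,000 rows

# Order_Table rows stay where they are; round-robin slicing stands in for
# their existing Primary Index distribution.
order_amps = [orders[i::NUM_AMPS] for i in range(NUM_AMPS)]

# Duplicate the small table: every AMP gets its own copy of all 4 rows.
customer_spool = [list(customers) for _ in range(NUM_AMPS)]

rows_moved = sum(len(copy) for copy in customer_spool)       # 4 rows x NUM_AMPS

# Each AMP joins its own Order_Table rows to its local Customer_Table copy.
joined = sum(1 for cust, ords in zip(customer_spool, order_amps)
             for o in ords for c in cust
             if c["Customer_Number"] == o["Customer_Number"])
```

The big table never moves, and every order still finds its customer locally.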
Great Spirit, help me never to judge another until I have walked in his moccasins.
- Sioux Indian Prayer
The Parsing Engine will decide whether to duplicate the smaller table or redistribute one or both of the tables to make the matching rows appear on the same AMP. The Parsing Engine is a cost-based optimizer and will attempt to do what is easiest, fastest, and moves the least amount of data. The question to be answered is, "How many rows will be in spool if the Parsing Engine decides to redistribute the Order_Table by Customer_Number in spool?" The answer is on the next couple of pages! Take a guess!
When I give a lecture, I accept that people look at their watches, but what I do not tolerate is when they look at it and raise it to their ear to find out if it stopped.
- Marcel Achard
As you can see on the following page there were 4,000 rows in the Order_Table that we can refer to as the Base Table. Then when we redistributed the 4,000 rows by hashing the Customer_Number there were 4,000 rows in spool. When you redistribute a table the exact same number of rows are merely rehashed. The only difference is that the rows will move to a different AMP. The great news is that they are moved to the same AMPs as their matching counterparts in the Customer_Table.
It is important that students bring a certain ragamuffin, barefoot, irreverence to their studies; they are not here to worship what is known, but to question it.
- J. Bronowski, The Ascent of Man
The following page is an excellent example of a join and the importance of matching rows being on the same AMP. Notice first the Customer_Table in blue. Notice that customers 1-8 landed on this AMP. Also realize that when we rehashed the Order_Table (in yellow) by Customer_Number the orders for customers 1-8 landed on this AMP also. Hashing brilliantly places like values together on the same AMP and this allows Teradata to fly through joins.
Those who dance are considered insane by those who cannot hear the music
- George Carlin
The following slide shows the type of information you can attain by using the System Calendar. Notice some of the key entries: Day_of_Week always goes from 1-7 with 1 being a Sunday. Day_of_Calendar are the Julian days since January 1, 1900. Week_of_Month, Week_of_Year, and Week_of_Calendar will have a zero in them for the first partial week. The first full week will have a 1 and so on. Month_of_Calendar and Quarter_of_Calendar also start from the January 1, 1900 date. The rest are fairly self explanatory.
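A few of these columns are easy to reproduce in Python if you want to sanity-check a date. This is only a sketch: the function name is made up, and treating January 1, 1900 as day 1, month 1 and quarter 1 is my assumption about where the counting starts. The week columns are left out on purpose.

```python
from datetime import date

CALENDAR_START = date(1900, 1, 1)

def calendar_columns(d):
    """Compute a few System Calendar style columns for a date."""
    return {
        # isoweekday() is 1 = Monday ... 7 = Sunday; shift so that 1 = Sunday.
        "day_of_week": d.isoweekday() % 7 + 1,
        # Assumption: January 1, 1900 counts as day 1.
        "day_of_calendar": (d - CALENDAR_START).days + 1,
        # Assumption: January 1900 is month 1 and Q1 1900 is quarter 1.
        "month_of_calendar": (d.year - 1900) * 12 + d.month,
        "quarter_of_calendar": (d.year - 1900) * 4 + (d.month - 1) // 3 + 1,
    }
```

For instance, calendar_columns(date(1900, 1, 7)) reports a day_of_week of 1, because that date was a Sunday.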
I cannot imagine any condition which would cause this ship to founder. Modern shipbuilding has gone beyond that.
- E. J. Smith, Captain of the Titanic
The System Calendar is great for simple things like finding out whether you were born on a Saturday night or on a Monday, but the real gold is found when you join the System Calendar to another table in a query. On the following page we have joined our Order_Table with the System Calendar where the Order_Date from the Order_Table is equal to the Calendar_Date from the System Calendar. Once the Join takes place we can use the WHERE Clause to pinpoint the exact calendar information we are looking to query. In our example we wanted all orders placed in January during the first full week of the month that happened on a Friday. How about that for fancy SQL writing?
Derived Tables
My best friend is the one who brings out the best in me.
- Henry Ford
On the following page you can see an example of a Derived Table. Derived Tables are created inside a user's query for only the life of that query. Notice that after the FROM Clause we have placed brackets (colored in yellow) around the derived query. We have also named the Derived Table TeraTom and then placed a name for the single column we have created (called AVGSAL) inside the Derived Table TeraTom. We can use the column AVGSAL in the SELECT list or later on in the WHERE Clause. Derived Tables are often used with aggregates and serve as a temporary space to query and hold data for the life of a query. In our example on the following page we were able to compare each employee's salary to the Average Salary of all employees to see who was making more than the average salary.
Volatile Tables
If all my possessions were taken from me, with the exception of one, I would choose to keep the power of Speech. For with it, I would soon regain all the rest.
- Daniel Webster
Notice the picture on the following page and the 3 steps in using a Volatile Table. The first step is to CREATE the Volatile Table. This table will now be available until the end of the session or until the user decides to DROP the Volatile Table. The second step is to populate the Volatile Table with an INSERT/SELECT statement. Now the fun actually starts with the third step because this table is now available for the user to query. Only the user who created the Volatile Table has access to it. That user can run an endless number of queries and joins against their Volatile Table until session end. All Volatile Tables use the user's spool space to populate the table.
The brain is a wonderful organ. It starts working the moment you get up in the morning, and does not stop until you get into the office.
- Robert Frost
The following page shows an excellent example of the first two steps: the CREATE statement of the Volatile Table and the materialization of the Volatile Table with an INSERT/SELECT. Now I want you to notice that our Volatile Table named Aggy is materialized in Spool Space, but appears much like a real table. The rows are spread evenly across all the AMPs and this table is ready for action. It can be queried or joined to other tables.
In case you're worried about what's going to become of the younger generation, it's going to grow up and start worrying about the younger generation.
- Roger Allen
The example on the following page shows all three steps for using a Global Temporary Table, but the actual CREATE (step 1) only needs to be performed once by the original CREATING User. After that CREATE statement the table structure will remain in Teradata until the user who created it actually DROPS the table. Global Temporary Tables can be used by any user who has Temp Space. Many users can perform step 2 simultaneously and each will have their own copy of the Global Temporary Table. After their session has ended the data will automatically be deleted, but the table structure will remain.
There are three great friends: an old wife, an old dog, and ready money.
- Benjamin Franklin
In the picture on the following page you can see we have done an INSERT/SELECT into our GlobAggy table. This table is a Global Temporary Table that was created previously. The table is now materialized with data inside Temp Space. The rows of the table are spread evenly across the AMPs and this table can be queried or joined with other tables until session end. If 1,000 users did an INSERT/SELECT on GlobAggy then all 1,000 users would have their own version of this table and nobody could access another user's copy.
Forgiveness does not change the past, but it does enlarge the future.
- Paul Boese
The following page shows that the user has logged off their session and the Globaggy table that used to be filled with data automatically deletes the data, but the table structure stays available on Teradata. The data is gone, but the table structure stays. Why? Check out the next couple of pages for the answer.
You're alive. Do something. The directive in life, the moral imperative was so uncomplicated. It could be expressed in single words, not complete sentences. It sounded like this: Look. Listen. Choose. Act.
- Barbara Hall, A summons to New Orleans, 2000
The Global Temporary Table structure is not deleted at session end. Only the data inside the table is deleted at session end. Why? So many users can materialize their own version of the Global Table. This helps in multiple ways:
1) Users populate the table using their Temp Space and then have more Spool Space to actually query the table.
2) Most users won't have PERM Space so they can't create tables or may not know the syntax to create a table. The table structures have been created for them and they are now ready to merely perform an INSERT/SELECT to populate their own version of the table.
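The "one structure, many private copies" behavior can be pictured with a toy Python class. The class, its method names, the session ids and the column names are all my own inventions for the illustration; real Global Temporary Tables are of course created and populated with SQL.

```python
class GlobalTemporaryTable:
    """Toy model: one shared definition, one private instance per session."""

    def __init__(self, name, columns):
        self.name = name
        self.columns = columns        # the structure, kept until the table is dropped
        self._instances = {}          # session id -> that session's rows

    def insert_select(self, session_id, rows):
        """Materialize (or add to) the calling session's own copy."""
        self._instances.setdefault(session_id, []).extend(rows)

    def select(self, session_id):
        """A session only ever sees its own copy."""
        return list(self._instances.get(session_id, []))

    def logoff(self, session_id):
        """At session end the data goes away; the structure stays."""
        self._instances.pop(session_id, None)
```

After one session calls logoff, its rows are gone, the other sessions still have theirs, and the columns are still there for the next INSERT/SELECT.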
The Constitution only gives people the right to pursue happiness. You have to catch it yourself.
Ben Franklin
On the following page you can see the NoPI CREATE statement. The choice is made when you create the table. This can be done with normal SQL as seen on the following page or it can be done with a FastLoad or TPump load utility. The key words to focus on are NO PRIMARY INDEX, highlighted for your convenience.
It's not the size of the dog in the fight, but the size of the fight in the dog.
Archie Griffin
Each AMP will receive an equal number of rows in an attempt by the Parsing Engine to spread the data evenly. Notice the picture on the following page. On any one AMP, the Row Hash for every row in the NoPI table is the same. Only the Uniqueness Value is incremented.
When all you have is a hammer, you tend to see every problem as a nail.
- Abraham Maslow
The example on the next page allows you to realize that the Row Hash on each AMP is different, but once the Row Hash is established on each AMP, all rows contain that exact same Row Hash and each AMP only increments the Uniqueness Value. NoPI tables don't need to be sorted and that is another main advantage if you desire to CREATE a staging table.
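A toy Python model makes the Row Hash and Uniqueness Value pattern easy to see. The Row Hash numbers and the round-robin placement are assumptions I made up; the only point carried over from the text is that each AMP keeps one Row Hash and bumps the Uniqueness Value as rows are appended.

```python
from itertools import count

class NoPIAmp:
    """Toy AMP for a NoPI table: one fixed Row Hash, growing Uniqueness Value."""

    def __init__(self, row_hash):
        self.row_hash = row_hash
        self._uniqueness = count(1)
        self.rows = []                    # appended in arrival order, never sorted

    def append(self, row):
        row_id = (self.row_hash, next(self._uniqueness))
        self.rows.append((row_id, row))

def load_nopi(rows, num_amps=4):
    # Hypothetical Row Hash values; all that matters is that they differ per AMP.
    amps = [NoPIAmp(row_hash=1000 + i) for i in range(num_amps)]
    for i, row in enumerate(rows):
        amps[i % num_amps].append(row)    # round-robin as a stand-in for even spreading
    return amps
```

Because rows are simply appended, there is no sort step, which is what makes this layout attractive for a staging table.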
NoPI Restrictions
He who asks a question may be a fool for five minutes, but he who never asks a question remains a fool forever.
Tom Connelly
The example on the next page shows the restrictions of NoPI Tables.
The reputation of a thousand years may be determined by the conduct of one hour.
Japanese Proverb
Teradata has traditionally taken a Before Picture of any row being UPDATED or DELETED. This was always called the Transient Journal and Data Integrity was its main purpose. If a transaction was going to UPDATE or DELETE a row the BEFORE PICTURE would be taken of the row and stored in a journal in case a ROLLBACK was done or in case there was a glitch in the system. Now this function is done by the Write Ahead Log or WAL. There are two main pieces to WAL: the WAL Log and the WAL Depot. The WAL Log takes a BEFORE and AFTER Picture of a row being UPDATED and each AMP has its own WAL Log to make sure that Teradata can Rollback a transaction or Commit the transaction. The WAL Depot stores blocks of UPDATED row(s) it receives from FSG Cache to provide a backup copy of the changes, and the COMMIT is considered done. Teradata can then WRITE the block of data to the actual table on disk when it deems it a good time to do so. Teradata uses the WAL Log and WAL Depot for transaction integrity in order to Commit or Rollback the data.
Never insult an alligator until after you have crossed the river.
African Proverb
Memory is a thousand times faster than disk so AMPs attempt to store hot tables inside memory for fast retrieval. This memory dedicated to each AMP is called File Segment Cache (FSG Cache). This pool of memory is like each AMP having its own swimming pool that only it can use. Let's imagine that 100 users have been querying the Order_Table. Each AMP will be reading the Order_Table hundreds of times in order to provide answer sets back to the users. All AMPs will attempt to keep their Order_Table rows inside the FSG Cache in order to speed up reads and writes thousands of times. Remember that each AMP has its own FSG Cache memory, its own WAL Log and its own WAL Depot.
Let every nation know, whether it wishes us well or ill, that we shall pay any price, bear any burden, meet any hardship, support any friend, oppose any foe, in order to assure the survival and the success of liberty
-John F. Kennedy (Inaugural Address 1961)
I want to run you through the process of UPDATING a row. The picture on the following page shows the UPDATE statement at the top. Notice the Department_Table inside the AMP. We are going to UPDATE this row and change the Dept_Name from Human to HR. Turn the page and see what happens next!
I have found the best way to give advice to your children is to find out what they want and then advise them to do it.
Harry S. Truman (1884 - 1972)
When an AMP is commanded to UPDATE a row that AMP finds the row inside a data block on its virtual disk and transfers the block inside the node into FSG Cache. Now the AMP can process the rows as fast as lightning. This process of moving data inside FSG Cache is how an AMP can READ, UPDATE, or DELETE a row.
You are educated when you have the ability to listen to almost anything without losing your temper or self-confidence.
Robert Frost
The WAL Log takes a BEFORE picture so a ROLLBACK can be performed and it takes an AFTER picture as a backup to ensure integrity temporarily until the AMP physically writes the data back to its virtual disk. The AMP will UPDATE the row inside its FSG Cache, then write the AFTER image of the row to the WAL Log. Now the WAL Log contains a BEFORE and AFTER picture of the row. The AMP can send a message to the PE that the row has been updated. The row really hasn't been completely updated because it hasn't physically been written back to the AMP's disk. Only the WAL Log rows were written to the AMP's disk. The AMP is confident that it can write the update back to its physical disk for permanent storage when the AMP deems it most efficient to write the row back. Plus the AMP knows it has added insurance because the WAL Log has both the BEFORE and AFTER image. Even in a disaster where the Teradata System goes down the AMP knows that when Teradata is rebooted it can complete the transaction of writing to disk by using the WAL Log to catch up and complete the COMMIT or ROLLBACK.
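The BEFORE and AFTER picture idea can be played with in a toy Python model. The Amp class, its dictionaries and its method names are all hypothetical, and the sketch ignores the WAL Depot and everything about real block handling; it only shows why having both images lets an AMP either undo or finish an UPDATE.

```python
class Amp:
    """Toy AMP with a disk, an FSG Cache and a WAL Log (plain dicts and lists)."""

    def __init__(self, disk):
        self.disk = dict(disk)      # permanent storage: row key -> value
        self.cache = {}             # FSG Cache copy of rows being worked on
        self.wal_log = []           # entries of (key, before_image, after_image)

    def update(self, key, new_value):
        self.cache.setdefault(key, self.disk[key])      # bring the row into cache
        before = self.cache[key]
        self.cache[key] = new_value                     # change it in memory only
        self.wal_log.append((key, before, new_value))   # log BEFORE and AFTER images

    def rollback(self):
        for key, before, _ in reversed(self.wal_log):   # undo newest change first
            self.cache[key] = before
        self.wal_log.clear()

    def write_back(self):
        self.disk.update(self.cache)                    # the deferred write to disk
        self.wal_log.clear()                            # insurance no longer needed

    def recover(self):
        """After a restart the cache is gone; replay AFTER images from the log."""
        self.cache = {}
        for key, _, after in self.wal_log:
            self.disk[key] = after
        self.wal_log.clear()
```

Using the Department_Table example, Amp({10: "Human"}).update(10, "HR") changes the cached row and logs ("Human", "HR"), while the disk copy still says Human until write_back or recover runs.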
Even if I knew that tomorrow the world would go to pieces, I would still plant my apple tree.
Dr. Martin Luther King, Jr.
The AMP could have many updates to rows inside a block of data. Before the AMP writes the block back to its physical disk, it writes the entire block to the WAL Depot. Now it has a backup of the entire block of data in case something goes wrong. If the AMP writes the data from FSG Cache back to its physical disk successfully, the WAL Depot backup copy can be erased. The WAL Depot only serves as a backup copy in case a problem should occur.
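Here is a minimal Python sketch of that "copy the whole block first, then write it in place" idea. The function names, the dictionaries standing in for the disk and the depot, and the `fail` switch that simulates an interrupted write are all assumptions made for illustration.

```python
def write_block_back(block_id, cached_block, disk, depot, fail=False):
    """Copy the whole block to the depot, write it in place, then drop the copy."""
    depot[block_id] = list(cached_block)          # full-block backup first
    if fail:
        # Simulate a crash mid-write: the on-disk block is left damaged.
        disk[block_id] = cached_block[:1]
        raise IOError("write interrupted")
    disk[block_id] = list(cached_block)           # in-place write succeeded
    del depot[block_id]                           # backup no longer needed


def repair_from_depot(disk, depot):
    """On restart, restore any block whose depot copy was never erased."""
    for block_id in list(depot):
        disk[block_id] = depot.pop(block_id)
```

A depot entry that still exists after a restart is the signal that a block write may not have finished, so the full copy is put back.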
You don't drown by falling into the water; you drown by staying in the water.
-Edwin Louis Cole
The example on the next page shows the erasing of the WAL Log and WAL Depot. The changes have been made, and there is no more reason to keep a backup of the rows and blocks that were changed. These have been written back to the AMP's physical disk successfully. Think of the WAL Log and WAL Depot as wearing a seat belt when you are driving a car. You take off your seat belt when you get home and leave the car, don't you?
To succeed... you need to find something to hold on to, something to motivate you, something to inspire you.
-Tony Dorsett
Teradata Virtual Storage, or TVS for short, is one of the most exciting improvements Teradata has made. TVS changes the way AMPs access their disks. TVS manages the disks for each AMP. This will be explained throughout the chapter, but the following page shows some of the topics and advantages that TVS brings to Teradata.
Looking to the stars always makes me dream, as simply as I dream over the black dots representing towns and villages on a map. Why, I ask myself, shouldn't the shining dots of the sky be as accessible as the black dots on the map of France?
-Vincent Van Gogh
In the 1990s an AMP still had one Virtual Disk and four Physical Disks, but the disks were mirrored. The AMP would store data on one disk and then mirror that disk in case of a failure. As you can see on the following page, each AMP had two disks for data and two mirrored disks. FALLBACK wasn't a necessity anymore because of the disk protections.
One's dignity may be assaulted, vandalized, and cruelly mocked, but it cannot be taken away unless it is surrendered.
- Michael J. Fox
Different cylinders store different types of data. For example, some cylinders will be used to hold Permanent Data, while entirely different cylinders will be used for Spool files. The following page shows you the types of data stored in cylinders. Cylinders are never shared. This means you can't use a single cylinder to store permanent data and spool data simultaneously. Once the first row of a table is written to a cylinder as PERM Space, that cylinder can't also be used to store Spool Files. Cylinders are dedicated to PERM, SPOOL, TEMP, Permanent Journals, or the WAL Logs.
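The "no sharing" rule can be expressed in a few lines of Python. This is a sketch under simple assumptions: the `Cylinder` class and the short type names in `CYLINDER_TYPES` are hypothetical, chosen only to mirror the five kinds of data listed above.

```python
CYLINDER_TYPES = {"PERM", "SPOOL", "TEMP", "JOURNAL", "WAL"}


class Cylinder:
    """Toy cylinder that is dedicated to the first kind of data written to it."""

    def __init__(self):
        self.kind = None      # unassigned until the first write
        self.rows = []

    def write(self, kind, row):
        if kind not in CYLINDER_TYPES:
            raise ValueError(f"unknown space type: {kind}")
        if self.kind is None:
            self.kind = kind                  # first write dedicates the cylinder
        elif self.kind != kind:
            raise ValueError(f"cylinder is dedicated to {self.kind}, not {kind}")
        self.rows.append(row)
```

Writing a PERM row and then a SPOOL row to the same `Cylinder` raises an error on the second write, which is the behavior the paragraph describes.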
Make sure you have finished speaking before your audience has finished listening.
- Dorothy Sarnoff
The example on the next page shows cylinders laid out on a disk platter. Notice how many more cylinders there are on the outside of the disk than on the inside tracks. This makes the outside tracks faster, because in one revolution of the spinning disk the system can read many more cylinders there. For now, simply realize that reads and writes on the outer tracks are considered faster, and the inner tracks, which hold fewer cylinders, are considered slower.
It doesn't make a difference what temperature a room is, it's always room temperature.
- Steven Wright
TVS will place data that is accessed often on the outer tracks of the disks. This is done so Teradata users can feel the need for speed. TVS will also place data that is not accessed very often on the inside tracks of the disk. This is called a Multi-Temperature data warehouse. TVS automatically gathers metrics about how often each cylinder is accessed and moves the data blocks inside the most frequently accessed cylinders to the outer tracks of the disk to improve access speeds.
They always say time changes things, but you actually have to change them yourself.
- Andy Warhol
In the past you needed to pretty much double your disk space if you needed more space added to your Teradata system. Notice in the picture on the following page that the system on the top of the picture shows each AMP connected to only two physical disks. The upgraded system doubled the disks to four and we doubled our system size. This is considered very expensive.
Just because something doesn't do what you planned it to do doesn't mean it's useless.
- Thomas Edison
This is the most exciting part about TVS. Teradata is mixing Solid State Drives with traditional spinning disk drives. Solid State Drives are faster because they use flash technology. Yes, we are talking about the same technology as the flash drives you have used to copy a file from one computer to another. These are called Solid State Drives, or SSDs, and they are 100 times faster than traditional disks. This is really like having memory speed on physical disk. TVS will place the hot data on the Solid State Drives, the warm data on the faster 146 GB disks, and the data that isn't accessed very much, referred to as cold data, on the larger, slower spinning disks. This is referred to as a Multi-Temperature data warehouse.
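One way to picture the hot, warm, and cold placement is to rank cylinders by how often they are accessed and fill the fastest storage first. The Python sketch below does exactly that. The function name, the tier labels, and the two capacity parameters are assumptions for illustration; the source does not say how TVS actually makes this decision.

```python
def assign_tiers(access_counts, ssd_slots, fast_slots):
    """Rank cylinders by access count and fill the fastest media first.

    access_counts: {cylinder_id: accesses}; ssd_slots and fast_slots are
    hypothetical capacities (in cylinders) of the SSD and fast-disk tiers.
    """
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    placement = {}
    for position, cylinder_id in enumerate(ranked):
        if position < ssd_slots:
            placement[cylinder_id] = "hot/SSD"
        elif position < ssd_slots + fast_slots:
            placement[cylinder_id] = "warm/fast disk"
        else:
            placement[cylinder_id] = "cold/large disk"
    return placement
```

With one SSD slot and two fast-disk slots, the single most accessed cylinder lands on SSD, the next two on the fast disks, and everything else on the large, slower disks.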
It isn't the mountains ahead that wear you out, it's the grain of sand in your shoe.
- Robert Service
The goal for Teradata is to eventually have nothing but Solid State Drives for its storage, but the costs are currently too high. This will happen once the costs come down.
Science is facts; just as houses are made of stones, so is science made of facts; but a pile of stones is not a house and a collection of facts is not necessarily science.
- Henri Poincare
TVS gathers metrics about cylinders so it can determine the hot, warm, and cold data. This is done in the background. TVS will actually move about 10% of the data each week to the appropriate disk types and appropriate tracks on disks. The DBA can also command the system to move the data inside the cylinders to their respective hot, warm, or cold areas.
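The weekly background migration can be thought of as a budgeted job: only so many cylinders get moved each week. The Python sketch below assumes a 10% cap, taken from the figure above, and moves the most heavily accessed misplaced cylinders first. That ordering, the function name, and the parameters are all assumptions for illustration, not documented TVS behavior.

```python
def plan_weekly_moves(current, target, heat, budget_percent=10):
    """Pick which misplaced cylinders to move this week, hottest first.

    current/target: {cylinder_id: tier}; heat: {cylinder_id: accesses}.
    budget_percent caps the moves at a share of all cylinders.
    """
    budget = max(1, len(current) * budget_percent // 100)
    misplaced = [c for c in current if current[c] != target[c]]
    misplaced.sort(key=lambda c: heat[c], reverse=True)
    return [(c, current[c], target[c]) for c in misplaced[:budget]]
```

Cylinders that do not fit in this week's budget simply wait for the next pass. A DBA-commanded move would correspond to calling the same planner with a larger budget.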
Primary Index lookup on AMP 55
1. The PE knows that Last is the Primary Index. It hashes Jones and the Row Hash is 0001. The PE now knows AMP 55 holds the row(s) in this 1-AMP Operation.
2. A Row Hash Lock is placed on all rows on AMP 55 that have a Row Hash of 0001.
AMP 55
Row_ID   Last (Primary Index)   First   Emp#   Dept   Salary
0001,1   Jones                  Joe     61     10     50000.00
0001,2   Jones                  Mary    65     20     64000.50
0001,3   Jones                  Dave    63     30     84000.60
0001,4   Jones                  Sandy   3      30     90490.90
0001,5   Jones                  Sue     7      10     25000.50
0001,6   Jones                  Bill    51     20     26089.40
0101,1   Bjorn                  Jill    68     40     85000.40
0110,1   Patel                  Ty      69     30     65876.40
0111,1   Noone                  Mo      24     20     86900.40
1000,1   Gore                   Jay     49     10     58000.50
1001,1   Samson                 Mick    22     30     45000.40
1010,1   Ruler                  May     8      20     86000.89
1011,1   Baker                  Jan     11     30     65000.50
1100,1   Doron                  Hanna   12     20     56450.00
1101,1   Mistel                 Hans    67     30     98654.00
1110,1   Wan                    Tan     23     40     87659.50
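The two steps above (hash the Primary Index value, find the owning AMP, lock the Row Hash) can be sketched in Python. Please treat this as a toy: `hashlib.md5` is only a stand-in, because the real Teradata hashing algorithm is not reproduced here, and `NUM_AMPS`, `amp_for`, and `AmpLocks` are hypothetical names made up for the example.

```python
import hashlib

NUM_AMPS = 100   # hypothetical system size


def row_hash(primary_index_value):
    """Stand-in hash: NOT Teradata's real algorithm, just a repeatable one."""
    digest = hashlib.md5(str(primary_index_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")      # 32-bit row hash


def amp_for(hash_value, num_amps=NUM_AMPS):
    """Stand-in for the hash map that assigns each row hash to one AMP."""
    return hash_value % num_amps


class AmpLocks:
    """Row hash locks held on one AMP."""

    def __init__(self):
        self.locked = {}          # row hash -> owning session

    def lock(self, hash_value, session):
        owner = self.locked.setdefault(hash_value, session)
        return owner == session   # False if someone else holds the lock
```

Because every row with the same Primary Index value produces the same Row Hash, one lock on that hash covers all of the Jones rows at once, while rows with any other Row Hash on the same AMP stay available.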
I don't know who my grandfather was. I am more interested in who his grandson will become.
Abraham Lincoln, 16th president of the United States
My son once told me he did not feel like studying. I said to him, "When Abraham Lincoln was your age, he studied by candlelight." My son retorted, "When Abraham Lincoln was your age, he was president."
Data within a warehouse environment is often historic in nature, so the sheer volume of data can overwhelm many systems. But not Teradata!
Abraham Lincoln will go down as one of the greatest presidents in history, but Teradata is even better because it will not go down when it loads history.
Tom Coffing, 1st president of Coffing Data Warehousing
Teradata is so advanced in the data-loading department that other database vendors can't hold a candle to it. A Teradata data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading data is simply that: just loading data. Some people actually ask, "Are data loads that critical?" Come on, ASCII stupid question and get a stupid ANSI. Data warehouses fail because customers cannot load the data fast enough once it reaches a certain volume. As one Teradata developer said, "It is not the load that brings them down, but the way they carry it." Even an experienced body builder must use good technique to lift the weight over his head. While most database vendors are new to the data warehouse game, Teradata has had 15 years of experience loading the largest data warehouses in the world. The combination of FastLoad, MultiLoad, and TPump can load millions, even billions, of records in record time.
FastLoad is designed to load flat file data from a mainframe or LAN directly into an empty Teradata table. This is how a Teradata table is populated the first time. I have personally seen Teradata load over one billion large rows in less than 6 hours. Plus, I have seen Teradata load millions of rows in minutes. How is Teradata's speed and performance accomplished? Once again, it's through the power of parallel processing.
Where FastLoad is meant to populate empty tables with INSERTs, MultiLoad is meant to process INSERTs, UPDATEs, and DELETEs on tables that have existing data. MultiLoad is extremely fast. One major Teradata data warehouse company processes 120 million inserts, updates, and deletes nightly during its batch window.
The TPump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata, more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. TPump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with the power of a Decision Support System (DSS). The TPump utility theoretically acts like a water faucet. TPump can be set to full throttle to load millions of transactions during off-peak hours or turned down to trickle small amounts of data during the data warehouse's daily rush hour. It can also be preset to load at different levels at certain times during the day, and can be modified at any time. Also, TPump locks at the row level, so users have access to the rest of the rows while the table is being loaded. Another advantage of this load utility is that it allows multiple updates to be conducted on a table simultaneously.
When the utilities start, the Parsing Engine comes up with a plan for the AMPs. The Parsing Engine then steps back and lets the AMPs do their work. The data is loaded in large 64K blocks. Each AMP is given a 64K block of rows for loading. Like a line of workers passing sand bags to prevent a flood, Teradata passes these blocks from AMP to AMP until all the data is on Teradata. Next, all AMPs take the blocks they received and hash the Primary Index value of each row, sending the rows over the BYNET to their destination AMP. Once this is done, each AMP sorts its data by Row ID and the table is ready for business.
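The three loading steps just described (deal out the blocks, redistribute rows by hash, sort on each AMP) can be simulated in a few lines of Python. This is a sketch under stated assumptions: blocks are measured in rows rather than 64K bytes, `hashlib.md5` stands in for the real row hash, and the sort uses the row hash alone, leaving out the uniqueness value that completes a Row ID.

```python
import hashlib


def stand_in_hash(value):
    # Placeholder for the real row hash; any repeatable hash works here.
    return int.from_bytes(hashlib.md5(str(value).encode()).digest()[:4], "big")


def load(rows, num_amps, block_size, pi_column):
    """Deal blocks to AMPs, redistribute rows by hash, then sort on each AMP."""
    # Step 1: cut the input into blocks and deal them out round-robin.
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
    received = [[] for _ in range(num_amps)]
    for n, block in enumerate(blocks):
        received[n % num_amps].extend(block)

    # Step 2: each AMP hashes the Primary Index and sends the row to its owner.
    owned = [[] for _ in range(num_amps)]
    for amp_rows in received:
        for row in amp_rows:
            h = stand_in_hash(row[pi_column])
            owned[h % num_amps].append((h, row))

    # Step 3: each AMP sorts what it now owns by row hash.
    for amp_rows in owned:
        amp_rows.sort(key=lambda pair: pair[0])
    return owned
```

The point to notice is that the AMP that first receives a block is usually not the AMP that ends up owning its rows. Step 1 is only about getting data onto the system quickly, and step 2 puts each row where its Primary Index says it belongs.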
FastLoad
FastLoad populates empty tables at the block level. Teradata LOADs using FastLoad.
FastLoad Picture
[Picture labels: Input File from Mainframe or LAN; four 64K Blocks; Teradata; PE; four AMPs, each with an Empty Table]
FastLoad inserts into empty tables at the Block Level. No Secondary Indexes, Referential Integrity, or Triggers are allowed.
MultiLoad
MultiLoad loads to populated tables at the block level. Teradata UPDATEs using MULTILOAD.
MultiLoad Picture
[Picture labels: Input File from Mainframe or LAN; four 64K Blocks; Teradata; PE; four AMPs, each with a Populated Table]
MultiLoad inserts, updates, upserts, and deletes rows in populated tables at the Block Level. It does not allow Triggers, Unique Secondary Indexes (USIs), or Referential Integrity.
TPump
You don't drown by falling into the water; you drown by staying in the water.
-Edwin Louis Cole
The TPump utility is designed to allow OLTP transactions to immediately load into a data warehouse. When I started working with Teradata, more than 10 years ago, most companies loaded data on a monthly basis. Suddenly, companies began to load data weekly. Today, most companies load data nightly, and industry leaders are loading data hourly. TPump is the beginning step of an Active Data Warehouse (ADW). ADW combines OLTP transactions with a Decision Support System (DSS). If the data is not flowing, a company can drown in it! The utility is called TPump because it theoretically acts like a water faucet. TPump can be set to full throttle to load millions of transactions during off-peak hours or turned down to trickle small amounts of data during the data warehouse's rush hour. It can also be preset to load at different levels at certain times during the day, and can be modified at any time. Also, TPump locks at the row level, so users have access to the rest of the rows while the table is being loaded.
Basics: TPump loads data to Teradata from a mainframe or LAN flat file; it processes INSERTs, UPDATEs, or DELETEs; its target tables are usually populated; those tables can have secondary indexes, triggers, and referential integrity; and it locks at the row level.
TPump is used for continuous updates to rows in a table. Teradata STREAMs using TPump.
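The water faucet idea is easy to picture in code. The Python sketch below is not TPump and does not use any TPump syntax; it is a hypothetical row-at-a-time loader with a pause between rows, plus a lookup that picks a preset rate for the hour of the day. All function and parameter names are invented for the illustration.

```python
import time


def trickle_load(rows, apply_row, rows_per_second, sleep=time.sleep):
    """Apply rows one at a time, pausing so the rate stays near the target."""
    pause = 1.0 / rows_per_second
    applied = 0
    for row in rows:
        apply_row(row)        # one row-level INSERT/UPDATE/DELETE
        applied += 1
        sleep(pause)          # the "faucet": smaller pause = more flow
    return applied


def rate_for_hour(hour, schedule, default_rate):
    """Look up the preset load rate for an hour of the day (0-23)."""
    for (start, end), rate in schedule.items():
        if start <= hour < end:
            return rate
    return default_rate
```

A schedule such as `{(8, 18): 50, (0, 6): 100000}` would trickle 50 rows per second during the business day and open the faucet wide overnight. Because each row is applied on its own, only that row needs to be locked while it changes.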
TPump Picture
[Picture labels: Teradata; PE; four AMPs]
TPump inserts, updates, upserts, and deletes rows in populated tables at the Row Level. It supports Triggers, all Secondary Indexes, and Referential Integrity.
FastExport
The most exciting phrase to hear in science, the one that heralds the most discoveries, is not "Eureka!" but "That's funny..."
Isaac Asimov
The most exciting words to hear when loading or unloading data are "That Fast." Put a seat belt on before running FastExport, because this utility will blow your socks off. FastExport is designed to export Teradata data to a flat file on a mainframe or LAN. FastExport merely takes an SQL SELECT command and sends the output to a host file. The SELECT can pull data from multiple tables.
Teradata LOADs using FASTLOAD
Teradata UPDATEs using MULTILOAD
Teradata STREAMs using TPump
Teradata EXPORTs using FASTEXPORT
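The "SELECT in, flat file out" idea can be shown with a short Python sketch. It uses Python's built-in `sqlite3` and `csv` modules purely as stand-ins so the example can run anywhere; it is not FastExport and says nothing about FastExport's own script syntax or its parallel speed. The function name `export_select` is hypothetical.

```python
import csv
import sqlite3


def export_select(connection, select_sql, out_file):
    """Run a SELECT and write every result row to a delimited flat file."""
    cursor = connection.execute(select_sql)
    writer = csv.writer(out_file, lineterminator="\n")
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count
```

Any SELECT will do, including a join across several tables. The export side never needs to know how many tables were involved, only what the result set looks like.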
FastExport Picture
[Picture labels: Teradata; PE; Host File; four AMPs, each with a Populated Table]
FastExport uses a SELECT statement to retrieve rows from one or more tables and exports the result set to a host file on a mainframe or LAN.