Teradata Overview

Traditionally, data processing scenarios have been divided into two categories...
Scenario 1: A customer places an order for an iPhone through the online store. Steps involved:
- A new order O1 is created in the Order table
- Customer number C1 and product number P1 are pulled from the master tables and enriched in the Order table row
- An order confirmation number is sent to the customer
- End of process
Key aspects of this transaction:

- Quick-in, quick-out approach to the database
- Only a few of many possible tables were accessed
- None of the accessed tables, some of which might have billions of rows, were scanned
- Very little I/O processing was required to complete the transaction
This data processing scenario is called On-line Transaction Processing (OLTP)
Data Processing Scenarios

Scenario 2: A business analyst wants to identify the customers who bought an iPhone in Q1 but are not iPod owners. Steps involved:
- Product numbers P1 and P2 for iPhone and iPod are fetched from the master table
- The Order Item Shipped table is scanned for P1 with an order date in Q1, giving Result Set 1
- The Order Item Shipped table is scanned for P2, giving Result Set 2
- Result Set 1 minus Result Set 2 gives the required customer segment
- End of process
Key aspects of this transaction:

- Processing-intensive, hence resource-intensive
- A large number of tables had to be accessed to answer it
- Requires massive searches, and sometimes multiple scans
- The fetched data needs further processing using aggregation, joins, sorts, conditional requirements, and so on
This data processing scenario is called Decision Support System (DSS)
Access patterns are different, and hence
- The access patterns of these two approaches are very different, and hence they make very different demands on the underlying database engine
- The basic database architecture has to be different to be optimized for one type of processing
- Teradata is a leader in the DSS and data warehouse space
What is Teradata
- Teradata is a Relational Database Management System (RDBMS) composed of hardware and software
- Designed for the world's largest commercial databases
- Used by customers who are looking for answers to their business questions from more than 1 terabyte of data

- 6 of the top 10 retailers
- 6 of the top 9 communications companies
- Over 40% of the leading manufacturers in the world
- 3 of the top 4 Blue Cross/Blue Shield insurance companies
- Many of the world's leading banks
Teradata a brief history
1979 - Teradata Corp founded in Los Angeles, California. Development begins on a massively parallel database computer
1984 - Teradata sells first DBC/1012
1986 - Product of the Year
1990 - First Terabyte system installed and in production
1992 - Teradata is merged into NCR
1995 - Teradata Version 2 for UNIX operating systems released
Why Teradata
Capacity:
- Scaling from gigabytes to terabytes of detailed data stored in billions of rows
- Scaling to thousands of millions of instructions per second (MIPS) to process data
Performance:
- Shared Nothing Architecture: able to achieve parallelism in each and every stage of query execution
- Makes Teradata Database faster than other relational systems
Single Data Store:
- Can be accessed by network-attached and channel-attached systems
- Supports the requirements of many diverse clients
Fault Tolerance & Availability:
- High fault tolerance, no single point of failure
- Automatically detects and recovers from hardware failures
Data Integrity:
- Ensures that transactions either complete or roll back to a stable state if a fault occurs
Scalability:
- Linearly expandable: as your database grows, additional nodes may be added
- Allows expansion without sacrificing performance
Teradata Architecture, the SMP

Diagram labels: CPU (Processors), Node, Vprocs, AMPs, PEs, Vdisks

Vprocs (Virtual Processors): a set of software processes running on a node. Each Vproc is a separate, independent copy of the processor software, isolated from the other vprocs but sharing some of the physical resources of the node, such as memory and CPUs.

AMPs (Access Module Processors):
- Storing rows to and retrieving rows from the disks
- Lock management
- Sorting rows and aggregating columns
- Join processing
- Output conversion and formatting
- Creating answer sets for clients
- Disk space management and accounting
- Special utility protocols
- Recovery processing

PEs (Parsing Engines):
- Checks the SQL syntax
- Checks resource availability and rights
- Parses the SQL
- Generates AMP steps
- Creates the plan
- Dispatches to the AMPs over BYNET
- EBCDIC-ASCII conversion
- Handles up to 120 user sessions
- This is called SMP (Symmetric Multiprocessor): a multiprocessing node that contains a number of central processing units sharing a single memory pool
- "Shared Nothing Architecture": each AMP has its own disk (data), which it shares with no other AMP, and it is solely responsible for any change or access to that data
And then comes the MPP

BYNET: a dual redundant, fault-tolerant, bi-directional interconnect network that enables:
- Automatic load balancing of message traffic
- Automatic reconfiguration after fault detection
- Scalable bandwidth as nodes are added
The BYNET is responsible for:
- Broadcast, multicast, and point-to-point communications between nodes and virtual processors
- Merging answer sets back to the PE
- Making Teradata parallelism possible
MPP (Massively Parallel Processing) consists of a number of nodes (SMPs) that work on a problem at the same time. Each node (SMP) has one or more CPUs and its own memory, I/O, network connections and disk arrays, and does not share its resources with other nodes.
Important components
SMP: Symmetric Multiprocessing is a single node that contains multiple CPUs sharing a memory pool.
MPP: SMP nodes combined with a communication network (BYNET) form an MPP. An MPP comprises two or more loosely coupled SMP nodes connected by the BYNET, with shared SCSI access to multiple disk arrays.
BYNET: Hardware inter-processor network that links the nodes of an MPP system. It implements point-to-point, multicast, and broadcast communications, depending on the situation. The BYNET is usually used for merging and sorting data from different nodes. The accumulated data is then sent back to the user.
Disk Array: Teradata employs RAID storage technology in which drives are configured logically into one or more logical units (LUNs). Each LUN is further sliced into Pdisks that are assigned to the AMPs. The group of Pdisks assigned to an AMP is called a Vdisk.
More Definitions
PDE: Parallel Database Extension is an interface layer on top of the operating system. It enhances processing by providing parallel processing and priority scheduling capabilities. It executes the Vprocs and takes advantage of the BYNET and shared disk hardware to improve performance.
File System: Teradata File System service calls allow the Teradata RDBMS to store and retrieve data efficiently without being concerned about the underlying operating system interfaces. It divides the disk into logical blocks: MI, CI, CID, DB, DBD.
TPA: Teradata Parallel Application is responsible for the distribution, coordination and balancing of processes/threads across nodes.
TDP: Teradata Director Program is responsible for session balancing across multiple PEs, failure notification, logging, verification, recovery, restart and security.
Logical Processors
VPROCS: Virtual Processors. Vprocs are a set of software processors that run on a node under Teradata PDE within the multitasking environment of the operating system. A single node (SMP) can have as many as 128 Vprocs.
PE: The Parsing Engine performs session control and dispatches tasks to fetch, return and merge data. It communicates with the client system on one side and with the AMPs on the other side (via BYNET).
AMP: The Access Module Processor retrieves and updates data on the virtual disks. It is responsible for locking, joining, sorting, aggregation, data conversion, disk space management, accounting, and journaling.
- A single PE handles one request at a time. The request is parsed and optimized, steps are built, and the steps are then dispatched to the corresponding AMP(s)
- An AMP has 80 worker tasks, which perform the different kinds of work related to the steps. If the request is a SELECT, these worker tasks send the data to the BYNET after finishing the work, where it is merged and sorted
- The PE dispatches the resulting data to the user
Query Lifecycle
Diagram labels: Client, Server, CLI, TDP (Teradata Director Program), Hashmap, PE (1), PE (2), BYNET Merge, AMP (1) to AMP (4), V Disk (1) to V Disk (4)

Single-AMP request: SELECT * FROM t1 WHERE id = 4;
1. Application sends the request to the PE; PE sends back the acknowledgement to the application
2. The SQL is parsed by the PE
3. PE uses the Hashmap to locate the AMP
4. PE sends the request to the particular AMP; AMP sends back the acknowledgement to the PE
5. AMP retrieves the data from its own Vdisk
6. AMP sends the data to the PE
7. Result is sent to the application from the PE; application sends back the acknowledgement to the PE

Multi-AMP request: SELECT * FROM t1 WHERE id IN (2,8);
1. Application sends the request to the PE; PE sends back the acknowledgement to the application
2. The SQL is parsed by the PE
3. PE uses the Hashmap to locate the AMPs
4. PE sends the request to the individual AMPs; each AMP sends back the acknowledgement to the PE
5. Each AMP retrieves the data from its own Vdisk
6. AMP sends the data to the BYNET
7. BYNET merges the data
8. BYNET sends the merged data to the PE
9. Result is sent to the application from the PE; application sends back the acknowledgement to the PE

Rows of t1 on each V Disk (ID is the PI):
    V Disk (1)    V Disk (2)    V Disk (3)    V Disk (4)
    ID  Desc      ID  Desc      ID  Desc      ID  Desc
    3   C         1   A         2   B         7   G
    5   E         4   D         6   F         8   H
Data is distributed across all AMPs based on row-hash of PI
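The two request paths in the query lifecycle can be sketched in a few lines of Python. This is only an illustrative simulation: the row placement is copied from the slide's diagram, while `HASH_MAP`, `amp_retrieve` and `run_query` are hypothetical names, and the BYNET merge is approximated with `heapq.merge`.

```python
import heapq

# Row placement copied from the diagram: AMP number -> {id (PI): desc}
VDISKS = {
    1: {3: "C", 5: "E"},
    2: {1: "A", 4: "D"},
    3: {2: "B", 6: "F"},
    4: {7: "G", 8: "H"},
}

# Hypothetical stand-in for the hash map lookup: PI value -> owning AMP
HASH_MAP = {pi: amp for amp, rows in VDISKS.items() for pi in rows}

def amp_retrieve(amp, ids):
    """An AMP reads only its own Vdisk and returns its rows sorted by id."""
    return sorted((i, VDISKS[amp][i]) for i in ids if i in VDISKS[amp])

def run_query(ids):
    """PE side: locate the AMPs, dispatch, then merge the partial answers."""
    by_amp = {}
    for i in ids:
        amp = HASH_MAP.get(i)
        if amp is not None:
            by_amp.setdefault(amp, []).append(i)
    partials = [amp_retrieve(amp, wanted) for amp, wanted in by_amp.items()]
    merged = list(heapq.merge(*partials))  # stand-in for the BYNET merge
    return sorted(by_amp), merged

print(run_query([4]))     # ([2], [(4, 'D')])
print(run_query([2, 8]))  # ([3, 4], [(2, 'B'), (8, 'H')])
```

The first call touches a single AMP, the second touches two AMPs whose partial answers are merged before being returned.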
Data Distribution and Access Methods

Hashing:
- Teradata uses hashing for data distribution and access
- A data row is hashed based on its primary index value
- Hash maps direct the data row to a particular AMP based on its hash value

PI -> Row Hash -> HashMap
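A minimal sketch of the PI -> Row Hash -> HashMap chain, assuming a 4-AMP system. Teradata's hashing algorithm is proprietary, so SHA-256 truncated to 32 bits is used here purely as a stand-in; `row_hash`, `hash_bucket`, `HASH_MAP` and `amp_for` are hypothetical names, and the round-robin bucket assignment is an assumption.

```python
import hashlib

NUM_BUCKETS = 65536  # hash bucket count stated on the Hashing slide
NUM_AMPS = 4         # assumed system size, for illustration only

def row_hash(pi_value) -> int:
    """Stand-in for the proprietary hash: any value -> 32-bit integer."""
    digest = hashlib.sha256(str(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

def hash_bucket(rh: int) -> int:
    """Use the high-order 16 bits of the row hash as the bucket number."""
    return rh >> 16

# Hypothetical hash map: bucket number -> AMP number, assigned round-robin.
HASH_MAP = [bucket % NUM_AMPS for bucket in range(NUM_BUCKETS)]

def amp_for(pi_value) -> int:
    """PI value -> row hash -> hash bucket -> AMP."""
    return HASH_MAP[hash_bucket(row_hash(pi_value))]

for pi in (1, 2, 3, 4):
    print(pi, hex(row_hash(pi)), hash_bucket(row_hash(pi)), amp_for(pi))
```

The same PI value always lands on the same AMP, which is what lets a primary index retrieval go straight to one AMP.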
Hashing and Indexing

Indexing:
- A data value (or values, if the index is compound) from a row acts as an index key to that row
- The index associates the index key with a relative row address that reports the location of the row on disk
- Index rows are stored in order of their index key values and are said to be value-ordered
Hashing:
- The index key data value is transformed by a mathematical function to produce an abstract value that is not related to the original data value in any obvious way
- Hashed data is assigned to hash buckets, each of which maps a particular hash code to exactly one AMP location
- There is no obvious correspondence between a hash code and the location of the row it refers to
Teradata does not use indexing. What we refer to as indexes are either row hash values or data tables (join indexes).
Tradeoffs Between Hashing and Indexing:
- Hashing is far better suited to the parallel database architecture
- Hashing provides consistently better performance because rows are distributed evenly across the AMPs
- Primary indexes are not stored in an index subtable; they are stored directly as part of the row data
- Rows whose primary index columns are used in frequent join constraints can be co-located on the same AMP
- Value-ordered indexing is better suited than hashing for range queries and for retrievals whose selection criteria involve only part of a multicolumn hash key
Hashing
- Teradata Database hashing algorithms are proprietary mathematical functions that transform an input data value of any length into a 32-bit value
- A 32-bit row hash value provides about 4.2 billion (2^32) possible values
- First 16 bits: the Destination Selection Word, used to define the hash bucket for the hashed row
- The remaining 16 bits are a remainder from the operation of the hash function on the original input value
- Uniqueness Value: an additional 32-bit system-generated value that ensures the uniqueness of any RowID. It is generated at the AMP level
- There are 65,536 hash buckets, distributed as evenly as possible among the AMPs
- The BYNET interface board on each AMP maintains a hash map: an index of which hash buckets are assigned to which AMPs
- Row assignment is performed in a manner that ensures as equal a distribution of table rows as possible among all the AMPs

Row ID layout:
    Row ID   = Row Hash + 32-bit Uniqueness Value
    Row Hash = 16-bit Destination Selection Word + 16-bit Remainder
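The bit layout described above can be checked with plain integer arithmetic. The sketch below assumes the Row ID is simply the 32-bit row hash followed by the 32-bit uniqueness value; `split_row_hash` and `make_row_id` are hypothetical helper names and the sample hash value is arbitrary.

```python
def split_row_hash(row_hash: int):
    """Split a 32-bit row hash into (Destination Selection Word, remainder)."""
    dsw = (row_hash >> 16) & 0xFFFF  # first 16 bits: selects the hash bucket
    remainder = row_hash & 0xFFFF    # remaining 16 bits
    return dsw, remainder

def make_row_id(row_hash: int, uniqueness: int) -> int:
    """Row ID = 32-bit row hash followed by a 32-bit uniqueness value."""
    return (row_hash << 32) | (uniqueness & 0xFFFFFFFF)

rh = 0xA1B2C3D4  # arbitrary example row hash
dsw, remainder = split_row_hash(rh)
row_id = make_row_id(rh, 1)

print(hex(dsw), hex(remainder), hex(row_id))
# 0xa1b2 0xc3d4 0xa1b2c3d400000001
print(2 ** 32, 2 ** 16)
# 4294967296 65536
```

The last line confirms the counts quoted in the text: 2^32 possible row hash values and 2^16 = 65,536 hash buckets.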
Hash-Related Functions
To predict the distribution across AMPs for a chosen PI:

    SELECT HASHAMP (HASHBUCKET (HASHROW (empno))) AS amp_no, COUNT(*)
    FROM employee
    GROUP BY 1
    ORDER BY 2 DESC;

    amp_no  count(*)
    25      3510
    29      3468
    17      3181

To see the selectivity of a PI:

    SELECT HASHROW (empno) AS hash_value, COUNT(*)
    FROM employee
    GROUP BY 1
    ORDER BY 2 DESC;

    hash_value  count(*)
    63524       14
    8069        14
    4191        1

    SELECT (COUNT(*) (FLOAT)) / (COUNT(DISTINCT HASHROW (empno)))
    FROM employee;

If there are no hash collisions, the result ratio is close to 1
Data Distribution Issues

Hash Collisions
- A hash collision is a situation in which the row hash value for different rows is identical, which makes it difficult for the system to discriminate among the hash synonyms when one unique row is requested from a set of hash synonyms
- The system defines about 4.2 billion hash values
- To tell hash synonyms apart, the system adds a system-generated 32-bit Uniqueness Value to the row hash
Skewing of Hash Bucket Distribution

- Caused by a poor choice of PI with few unique values
- It impacts parallel processing of the data
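The effect of a low-cardinality PI can be illustrated with a small simulation. This is a sketch under stated assumptions: 8 AMPs, SHA-256 as a stand-in for the proprietary hash, the row hash mapped to an AMP by a simple modulo, and a hypothetical `skew_factor` defined as the busiest AMP's row count divided by the average.

```python
import hashlib
from collections import Counter

NUM_AMPS = 8  # assumed system size

def amp_of(pi_value) -> int:
    """Stand-in hash: PI value -> AMP number (simplified, no bucket map)."""
    digest = hashlib.sha256(str(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS

def rows_per_amp(pi_values):
    counts = Counter(amp_of(v) for v in pi_values)
    return [counts.get(amp, 0) for amp in range(NUM_AMPS)]

def skew_factor(counts) -> float:
    """Rows on the busiest AMP divided by the average (1.0 = perfectly even)."""
    return max(counts) / (sum(counts) / len(counts))

unique_pi = list(range(10_000))                      # e.g. an order number
low_cardinality_pi = [n % 3 for n in range(10_000)]  # e.g. a 3-valued status

even = rows_per_amp(unique_pi)
skewed = rows_per_amp(low_cardinality_pi)
print("unique PI :", even, round(skew_factor(even), 2))
print("3-value PI:", skewed, round(skew_factor(skewed), 2))
```

With only three distinct PI values, at most three AMPs receive any rows, so the remaining AMPs sit idle while the loaded ones do all the work.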
Data Partitioning
For join columns, a row hash value is recalculated based on the columns involved in the join. If tables are being joined on 3 columns (a,b,c), then a row hash value is computed as if (a,b,c) were the PI. If the rows to be joined are not already on the same AMP according to that row hash, they are redistributed across the AMPs, which adds overhead.
Teradata Indexes
Indexes are a method of storing and retrieving data from Teradata optimally
By default every table has one index, called the Primary Index (PI). In addition, if a query uses columns other than the PI, the user can declare a Secondary Index (SI) on those columns for faster access to the data
Types of indexes:
- Primary Index: unique and non-unique, no subtable, affects data distribution
- Secondary Index: unique and non-unique, avoids full-table scans (FTS), uses a subtable, does not affect data distribution, adds the overhead of updating the subtable whenever an insert/delete/update is done on the table
- Join Index: Single Table, Multi Table and Aggregate Join Index
  - A Single Table JI allows hashing of rows based on some other column. This column might be used in an SQL condition that qualifies the JI for data access
  - A Multi-Table JI on columns from more than one table avoids recalculating join values in a frequently used query
  - An Aggregate JI on columns helps queries that frequently perform aggregation on the same column(s)
- Hash Index: a file structure that shares properties with the single-table join index (STJI) and the SI
Primary Key vs. Primary Index

Teradata uses Primary Index or Secondary Index to enforce a Primary Key
    Primary Key                                  Primary Index
    Important component of logical data model    Not used in logical model
    Used to maintain referential integrity       Used to distribute and retrieve data
    Values can never be changed                  Values can be changed
    Cannot be null                               Can be null
    Does not imply access path                   Defines the most common access paths
    Not required for physical table definition   Mandatory for physical table definition
Primary Index (PI)

The Teradata Database distributes tables horizontally across all AMPs on a system. The system assigns rows to AMPs based on the value of their primary index. The determination of which hash bucket, and hence which AMP the row is to be stored on, is made solely on the row hash value of its primary index.
If there is no explicit definition, a NUPI is created on the first column of the table.
Each Teradata Database table must have a primary index.
Restrictions:
- Only one PI per table
- Not more than 64 columns
- Cannot include columns having BLOB or CLOB data types
Storage:
- No separate physical storage: the PI is stored in-line with the row in the base table
- Rows are hash-ordered within the same AMP
Types of Primary Index: a PI can be defined over two orthogonal dimensions
- Unique (UPI) or non-unique (NUPI)
- Partitioned (PPI) or non-partitioned (NPPI)
Types of PI
- Unique Primary Index
- Non-unique Primary Index
Non-Partitioned Primary Index
- Standard Teradata Database primary index
- Rows are hashed to the appropriate AMPs and stored there in row hash order
Partitioned Primary Index

- Rows are hashed to the appropriate AMPs and then assigned to an appropriate partition based on the value of a partitioning expression
- Rows are stored in row hash order within the same partition
- Designed to optimize range queries
NPPI & PPI Data Storage within AMPs

NPPI: Create Table

    CREATE MULTISET TABLE orders_1, NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL (
      order_nr VARCHAR(10) NOT NULL,
      order_cre_dt DATE FORMAT 'YYYY-MM-DD' NOT NULL
    ) UNIQUE PRIMARY INDEX upi_orders_1 (order_nr);

PPI: Create Table

    CREATE MULTISET TABLE orders_2, NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL (
      order_nr VARCHAR(10) NOT NULL,
      order_cre_dt DATE FORMAT 'YYYY-MM-DD' NOT NULL
    ) UNIQUE PRIMARY INDEX upi_orders_2 (order_nr)
    PARTITION BY RANGE_N(order_cre_dt BETWEEN DATE '0001-01-01' AND DATE '9999-12-31' EACH INTERVAL '1' MONTH);

Insert Data (the same rows are shown for both tables):

    Row Hash  order_nr  order_cre_dt
    A11111    10        2007-01-11
    A22222    20        2007-02-22
    A33333    30        2007-01-12
    A44444    40        2007-02-23
Data Distribution within AMPs
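To make the storage difference concrete, the sketch below orders the four sample rows the way an NPPI table and a PPI table would keep them, assuming for illustration that all four rows hash to the same AMP. `partition_no` is a hypothetical helper that numbers monthly partitions from 0001-01, mirroring the RANGE_N expression with EACH INTERVAL '1' MONTH.

```python
from datetime import date

# Sample rows from the slide: (row_hash, order_nr, order_cre_dt)
rows = [
    ("A22222", "20", date(2007, 2, 22)),
    ("A44444", "40", date(2007, 2, 23)),
    ("A11111", "10", date(2007, 1, 11)),
    ("A33333", "30", date(2007, 1, 12)),
]

def partition_no(d: date) -> int:
    """Monthly partition number, counting 0001-01 as partition 1."""
    return (d.year - 1) * 12 + d.month

# NPPI: rows on an AMP are kept in row hash order.
nppi_order = sorted(rows, key=lambda r: r[0])

# PPI: rows are grouped by partition first, then row hash within a partition.
ppi_order = sorted(rows, key=lambda r: (partition_no(r[2]), r[0]))

print([r[0] for r in nppi_order])  # ['A11111', 'A22222', 'A33333', 'A44444']
print([r[0] for r in ppi_order])   # ['A11111', 'A33333', 'A22222', 'A44444']

# A range query on January 2007 only needs to read one partition.
january = partition_no(date(2007, 1, 1))
print([r[1] for r in ppi_order if partition_no(r[2]) == january])  # ['10', '30']
```

In the PPI ordering the January rows sit next to each other, which is why a date-range query can skip the other partitions instead of scanning every row.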
Selecting a Primary Index

Uniform Data Distribution:
- The more distinct the primary index values, the better
- Rows having the same primary index value are distributed to the same AMP
- Parallel processing is more efficient when table rows are distributed evenly across the AMPs
Optimal Data Access:
- The primary index should be chosen on the most frequently used access path
- Primary index operations must provide the full primary index value
- Primary index retrievals on a single value are always one-AMP operations
Volatility:
- How often the value of the index column changes. The less often it changes, the better a choice of index it is
The Trade-Off:
- Data Distribution vs. Access Path
- Normal Access vs. Range Access
- NPPI vs. PPI
Exercise
Table definition:
- Table1: Order table. geo_cd + order_nr defines the uniqueness of a row
- Table2: Order item table. geo_cd + order_nr + item_nr defines the uniqueness of a row
- upd_ts on both tables captures the last modified timestamp of the data
Frequent access path:

- Atomic selection from Table1 based on geo_cd + order_nr
- Atomic selection from Table2 based on geo_cd + order_nr + item_nr
- Frequent range access to select a part of the data based on upd_ts in both tables
- Header table Table1 is joined frequently with detail table Table2 to get the item-level details
Question: What would be the PI options for Table1 and Table2?
Secondary Index
- Enhances set selection by specifying access paths other than the primary index path
- SI storage: the system maintains a subtable for each SI. Subtables keep the SI row hash, the SI column values, and the RowID of the base table row that contains the actual value
- There is an overhead in maintaining the SI subtable if the table is subject to INSERT/UPDATE/DELETE operations
Restrictions on Secondary Indexes:
- A table can have up to 32 secondary, hash and join indexes
- No more than 64 columns can be included in a secondary index definition
- Cannot include columns having BLOB or CLOB data types
SI Types:
- Unique Secondary Index (USI)
- Non-Unique Secondary Index (NUSI)
- Value-Ordered Secondary Index
- NUSI and Query Covering
- NUSI Bit Mapping
USI Subtable Row Layout
USI access is usually a two-AMP operation
The process for locating a row using a USI is as follows:
1. After checking the syntax and lexicon of the query, the Parser looks up the Table ID for the USI subtable that contains the specified USI value
2. The hashing algorithm hashes the USI value
3. The Generator creates an AMP step message containing the USI Table ID, USI row hash value, and USI data value
4. The Dispatcher uses the USI row hash to send the message across the BYNET to AMP 3, which contains the appropriate USI subtable row
5. The file system on AMP 3 locates the appropriate USI subtable using the USI Table ID
6. The file system on AMP 3 uses the USI row ID to locate the appropriate index row in the subtable
7. This operation might require a search through a number of rows with the same row hash value before the row with the desired value is located
8. AMP 3 reads the base table row ID from the USI row and distributes a message containing the base table ID and the row ID for the requested row across the BYNET to AMP 10, which contains the requested base table row
9. The file system uses the row ID to locate the base table row
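The two-AMP nature of USI access can be sketched as follows. Everything here is illustrative: a 4-AMP system is assumed, SHA-256 stands in for the proprietary hash, and the table (`emp_no` as PI, `badge_no` as USI) and helper names are hypothetical.

```python
import hashlib

NUM_AMPS = 4  # assumed system size

def amp_of(value) -> int:
    """Stand-in hash: value -> AMP number."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS

base_table = {amp: {} for amp in range(NUM_AMPS)}    # rows hashed on the PI
usi_subtable = {amp: {} for amp in range(NUM_AMPS)}  # index rows hashed on the USI value

def insert_row(emp_no, badge_no, name):
    base_amp = amp_of(emp_no)
    base_table[base_amp][emp_no] = (emp_no, badge_no, name)
    # The index row lives on the AMP chosen by the hash of the USI value
    # and points at the base row's location.
    usi_subtable[amp_of(badge_no)][badge_no] = (base_amp, emp_no)

def select_by_usi(badge_no):
    index_amp = amp_of(badge_no)                      # AMP 1: holds the index row
    pointer = usi_subtable[index_amp].get(badge_no)
    if pointer is None:
        return [index_amp], None
    base_amp, emp_no = pointer                        # AMP 2: holds the base row
    return [index_amp, base_amp], base_table[base_amp][emp_no]

for row in [(100, "B-7", "Brown"), (200, "B-3", "Smith"), (300, "B-9", "Jones")]:
    insert_row(*row)

amps_used, row = select_by_usi("B-3")
print(amps_used, row)
```

Whatever the system size, the lookup performs one index-AMP access followed by one base-table-AMP access; the two may happen to be the same physical AMP.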
NUSI - Subtable and access path different from that of USI

- NUSI subtables are created and stored locally on the AMPs: the part of the subtable that corresponds to an AMP's base table rows is stored on that same AMP
- A NUSI subtable stores the RowIDs of base table rows that are located on the same AMP
- NUSI access is always an all-AMPs operation
- Because NUSI subtable rows are not hash-distributed across the AMPs, the subtable on every AMP must be searched in order to locate the relevant pointers to base table rows
NUSI Subtable Row Layout
NUSI access is an all-AMP operation
The process used by this example for locating a row using the NUSI value CA is as follows:
1. After checking the syntax and lexicon of the query, the Parser looks up the Table ID for the NUSI subtable that contains the NUSI value CA
2. The hashing algorithm hashes the NUSI value
3. The Generator creates an AMP steps message containing the NUSI Table ID (734596), NUSI row hash value (53), and NUSI data value (CA), and then the Dispatcher distributes it across the BYNET to all AMPs
4. The file system on a receiving AMP locates the appropriate NUSI subtable using the NUSI Table ID
5. The file system on a receiving AMP uses the NUSI row hash value to locate the appropriate index row in the subtable
6. If there is a NUSI row, its table row ID list is scanned for base table row IDs
7. The file system uses the row IDs to locate the base table rows containing the NUSI value CA
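The all-AMP, AMP-local behaviour of a NUSI can be sketched in the same style. The customer table (`cust_id` as PI, `state` as NUSI column), the 4-AMP system and the SHA-256 stand-in hash are all assumptions made for illustration.

```python
import hashlib

NUM_AMPS = 4  # assumed system size

def amp_of(value) -> int:
    """Stand-in hash: value -> AMP number."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS

# Hypothetical rows: (cust_id, state)
ROWS = [(1, "CA"), (2, "NY"), (3, "CA"), (4, "TX"), (5, "CA"), (6, "NY")]

base = {amp: {} for amp in range(NUM_AMPS)}  # base rows, hashed on cust_id
nusi = {amp: {} for amp in range(NUM_AMPS)}  # AMP-local subtable: state -> local cust_ids

for cust_id, state in ROWS:
    amp = amp_of(cust_id)
    base[amp][cust_id] = (cust_id, state)
    nusi[amp].setdefault(state, []).append(cust_id)  # pointer stays on the same AMP

def select_by_nusi(state):
    amps_visited, result = 0, []
    for amp in range(NUM_AMPS):                   # the step is sent to every AMP
        amps_visited += 1
        for cust_id in nusi[amp].get(state, []):  # local index row, if any
            result.append(base[amp][cust_id])     # local base row
    return amps_visited, sorted(result)

print(select_by_nusi("CA"))  # (4, [(1, 'CA'), (3, 'CA'), (5, 'CA')])
```

Every AMP is visited even when it holds no qualifying rows, but each AMP only ever follows pointers to its own base rows.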
USI and NUSI Examples

USI example:

    CREATE MULTISET TABLE t1, NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL
      (i INTEGER NOT NULL,
       j INTEGER NOT NULL,
       a CHAR(10))
    UNIQUE PRIMARY INDEX upi_t1 (i),
    UNIQUE INDEX usi_t1_01 (j);

    i    j    a
    100  100  a
    200  200  a
    300  300  a
    400  400  a

    EXPLAIN SELECT * FROM t1 WHERE j = 100;

    1) First, we do a two-AMP RETRIEVE step from t1 by way of unique index # 4 "t1.j = 100" with no residual conditions. The estimated time for this step is 0.02 seconds.

NUSI example:

    CREATE MULTISET TABLE t2, NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL
      (i INTEGER NOT NULL,
       j INTEGER NOT NULL,
       a CHAR(10))
    UNIQUE PRIMARY INDEX upi_t2 (i),
    INDEX nusi_t2_01 (j);

    i    j    a
    100  100  a
    200  100  a
    300  300  a
    400  400  a

    EXPLAIN SELECT * FROM t2 WHERE j = 100;

    1) We do an all-AMPs RETRIEVE step from t2 by way of an all-rows scan with a condition of ("t2.j = 100") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 2 rows. The estimated time for this step is 0.03 seconds.
Value-Ordered NUSI
- Value-ordered NUSIs are very efficient for range conditions
- Because the NUSI rows are sorted by data value, it is possible to search only a portion of the index subtable for a given range of key values
Examples:

    CREATE INDEX Idx_Date (o_orderdate) ORDER BY VALUES (o_orderdate) ON Orders;

    SELECT * FROM Orders
    WHERE o_orderdate BETWEEN DATE '1997-10-01' AND DATE '1997-10-07';

Value-ordered NUSIs have the following limitations:

- The sort key is limited to a single numeric or DATE column
- The sort key column cannot exceed four bytes in length
- They count as 2 consecutive indexes against the total of 32 non-primary indexes you can define on a base or join index table. One index represents the column list and the other represents the ordering column
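Why value ordering helps range conditions can be shown with a binary search over a sorted list, which stands in for one AMP's value-ordered index subtable. The index rows and row identifiers below are made up for illustration, and `range_lookup` is a hypothetical helper.

```python
import bisect
from datetime import date

# Hypothetical index rows on one AMP: (o_orderdate, base row id), kept sorted by value
index_rows = sorted([
    (date(1997, 10, 9), "r5"),
    (date(1997, 9, 28), "r1"),
    (date(1997, 10, 3), "r3"),
    (date(1997, 10, 1), "r2"),
    (date(1997, 10, 7), "r4"),
])
keys = [key for key, _ in index_rows]

def range_lookup(low: date, high: date):
    """Return row ids whose key lies in [low, high], reading only that slice."""
    start = bisect.bisect_left(keys, low)
    end = bisect.bisect_right(keys, high)
    return [row_id for _, row_id in index_rows[start:end]]

print(range_lookup(date(1997, 10, 1), date(1997, 10, 7)))  # ['r2', 'r3', 'r4']
```

Only the slice between the two binary-search positions is read; a hash-ordered subtable would offer no such contiguous slice for a BETWEEN condition.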
NUSI Bit-Mapping
- Bit mapping is a technique used by the Optimizer to link several weakly selective indexes in a way that drastically reduces the number of base rows that must be accessed to retrieve the desired data
- Teradata only performs NUSI bit mapping when weakly selective indexed conditions are ANDed and their composite selectivity is strong
- The Optimizer instructs each AMP to construct bit maps to determine which RowIDs its local NUSI rows have in common, and then to access just those rows, applying the conditions to them exclusively
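A minimal sketch of the idea on a single AMP, assuming two hypothetical NUSIs (on `color` and on `size`) and representing each bit map as a Python integer with one bit per local row position. It illustrates the ANDing of bit maps, not Teradata's internal format.

```python
# Hypothetical base rows on one AMP, identified by their local position
rows = [("red", "S"), ("blue", "L"), ("red", "L"),
        ("green", "L"), ("red", "M"), ("red", "L")]

def bitmap(positions) -> int:
    """Build a bit map with one bit set per qualifying row position."""
    bits = 0
    for pos in positions:
        bits |= 1 << pos
    return bits

def positions(bits: int):
    """List the row positions whose bit is set."""
    return [pos for pos in range(bits.bit_length()) if (bits >> pos) & 1]

# Each weakly selective NUSI condition yields its own bit map ...
red_bits = bitmap(i for i, (color, _) in enumerate(rows) if color == "red")
large_bits = bitmap(i for i, (_, size) in enumerate(rows) if size == "L")

# ... and ANDing them leaves only rows that satisfy both conditions.
both = red_bits & large_bits

print(bin(red_bits), bin(large_bits), bin(both))  # 0b110101 0b101110 0b100100
print([rows[pos] for pos in positions(both)])     # [('red', 'L'), ('red', 'L')]
```

Each condition alone qualifies four of the six rows, but the combined bit map leaves only two base rows to read.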
Covering Index
An index is said to be covering if all of the columns requested in a query are also available from an existing index subtable, making it unnecessary to access the base table rows to complete the query. Example:

Simple Query Considered for Index Covering:

    CREATE INDEX IdxOrd (o_orderkey, o_date, o_totalprice) ON ORDERS;

    SELECT o_date, AVG(o_totalprice)
    FROM ORDERS
    WHERE o_orderkey > 1000
    GROUP BY o_date;

Aggregate Query Considered for Index Covering:

    CREATE INDEX IdxEmployee (DeptNo) ON Employee;

    SELECT DeptNo, COUNT(*)
    FROM Employee
    GROUP BY DeptNo;
Secondary Index selection criteria

- Consider creating secondary indexes on columns that are highly selective
- A USI is a good choice when the table does not have a UPI. This helps avoid the duplicate data check when an INSERT/UPDATE operation is performed on the table
- While USI retrievals are always very efficient, the efficiency of NUSI retrievals varies greatly depending on their selectivity
- Consider creating covering indexes wherever possible
- Consider creating secondary indexes on columns frequently operated on by built-in functions such as aggregates
- Consider assigning a uniqueness constraint such as PRIMARY KEY or UNIQUE through a USI
- Consider naming secondary indexes whenever possible, using a standard naming convention
- Avoid assigning secondary indexes to frequently updated column sets
- Avoid creating excessive secondary indexes on a table
Join Index
- Join indexes allow denormalization of the physical database without affecting the normalization of the physical and logical database models
- They can serve the purpose of storing aggregated data, as used in a fact table in dimensional modeling
- Unlike traditional indexes, join indexes do not store pointers to their associated base table rows. Instead, they are generally used as a fast-path final access point that eliminates the need to access and join the base tables they represent. They substitute for, rather than point to, base table rows
- The only exception is the case where an index partially covers a query. If the index is defined using either the ROWID keyword or the UPI of its base table as one of its columns, then it can be joined with the base table to cover the query
- Statistics should be collected on a join index to keep its information up to date
- A join index adds overhead when the table(s) that are part of its definition are updated, because the join index is maintained at the same time
- A user cannot select directly from a join index
Types of Join Index

- Single Table Join Indexes allow hashing of rows based on a column other than the PI. This column might be used in an SQL condition that qualifies the JI for data access. This helps prevent redistribution of the underlying base table based on some other column.
- Multitable Join Indexes are useful for queries where the index structure contains all the columns referenced by one or more joins, thereby allowing the index to cover that part of the query, making it possible to retrieve the requested data from the index rather than accessing its underlying base tables.
- Aggregate Join Indexes make it possible to define a summary table without violating the normalization of the database schema. The join index pre-computes an aggregate value that would otherwise potentially require a full table scan and sort operation.
Examples of different Join Indexes

Single-table Join Index:

    CREATE TABLE t1 (x1 INTEGER, y1 INTEGER, z1 INTEGER) PRIMARY INDEX (x1);
    CREATE TABLE t2 (x2 INTEGER, y2 INTEGER, z2 INTEGER) PRIMARY INDEX (x2);

    CREATE JOIN INDEX j1 AS
      SELECT y1, ROWID
      FROM t1
    PRIMARY INDEX (y1);

Multi-table Join Index:

    CREATE JOIN INDEX order_join_line AS
      SELECT (l_orderkey, o_orderdate, o_custkey, o_totalprice),
             (l_partkey, l_quantity, l_extendedprice, l_shipdate)
      FROM lineitem LEFT JOIN orders ON l_orderkey = o_orderkey
      ORDER BY o_orderdate
    PRIMARY INDEX (l_orderkey);

Aggregated Join Index:

    CREATE JOIN INDEX ord_cust_idx AS
      SELECT c_nationkey, SUM(o_totalprice(FLOAT)) AS price, o_orderdate
      FROM orders, customer
      WHERE o_custkey = c_custkey
      GROUP BY c_nationkey, o_orderdate
      ORDER BY o_orderdate;
Hash Index
- Hash indexes are file structures that share properties with both single-table join indexes and secondary indexes
- Hash indexes can optionally be specified to be distributed in such a way that their rows are AMP-local with their associated base table rows
- They can also provide a transparent direct access path to those base table rows to complete a query only partially covered by the index
Example:

    CREATE TABLE Orders (
      o_orderkey INTEGER NOT NULL,
      o_custkey INTEGER,
      o_orderstatus CHARACTER(1) CASESPECIFIC,
      o_totalprice DECIMAL(13,2) NOT NULL,
      o_orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL,
      o_orderpriority CHARACTER(21),
      o_clerk CHARACTER(16),
      o_shippriority INTEGER,
      o_comment VARCHAR(79))
    UNIQUE PRIMARY INDEX (o_orderkey);

    CREATE HASH INDEX OrdHIdx_1 (o_orderdate) ON orders
    BY (o_orderdate)
    ORDER BY (o_orderdate);
Teradata Joins
Joins available to user:
- Left Outer Join
- Right Outer Join
- Full Outer Join
- Inner Join
- Cross Join
- Self Join
Teradata Internal Joins:
- Product Join
- Merge Join
- Nested Join
- Hash Join
- Self Join
- Correlated Join
Product Join and Merge Join

Product Join: Compares every qualifying row from one table to every qualifying row from the other table and saves the rows that match the WHERE condition. It is time consuming and hence a costly join, and it requires more spool space. Usually used when:
- The join condition is not based on equality
- The join conditions are ORed
- It is less costly than other join forms
Merge Join: Rows are compared based on the hash values of the joining columns. Sorting is performed before the comparison. The comparison involves fewer rows than a Product Join. Different methods of bringing the rows together for the hash value comparison:
- Redistribution of rows based on hash values
- Duplication of rows based on hash values
- Matching Indexes
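The difference in work between the two join forms can be sketched with the Employee and Department sample data used on the following slides. This is a simplified single-process illustration: `product_join` and `merge_join` are hypothetical helpers, and the merge join sorts on the join value itself rather than on its row hash.

```python
# Sample data: (ENum, Name, Dept) and (Dept, Name)
EMPLOYEES = [(1, "Brown", 200), (2, "Smith", 310), (3, "Jones", 310), (4, "Clay", 400),
             (5, "Peters", 150), (6, "Foster", 400), (7, "Gray", 310), (8, "Baker", 310)]
DEPARTMENTS = [(400, "Delivery"), (150, "Payroll"), (200, "Finance"), (310, "Mfg")]

def product_join(left, right, predicate):
    """Compare every left row with every right row."""
    out, comparisons = [], 0
    for l in left:
        for r in right:
            comparisons += 1
            if predicate(l, r):
                out.append((l, r))
    return out, comparisons

def merge_join(left, right, lkey, rkey):
    """Sort both inputs on the join key, then walk them in step."""
    left, right = sorted(left, key=lkey), sorted(right, key=rkey)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = lkey(left[i]), rkey(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            k = j  # emit every right row that shares this key
            while k < len(right) and rkey(right[k]) == lk:
                out.append((left[i], right[k]))
                k += 1
            i += 1
    return out

pj, comparisons = product_join(EMPLOYEES, DEPARTMENTS, lambda e, d: e[2] == d[0])
mj = merge_join(EMPLOYEES, DEPARTMENTS, lambda e: e[2], lambda d: d[0])
print(comparisons, len(pj), len(mj))  # 32 8 8
```

Both forms return the same eight joined rows, but the product join needed 8 x 4 = 32 comparisons while the merge join walks each sorted input once.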
Example of Merge Join based on Hash Redistribution

Employee table:

    ENum (UPI, PK)  Name    Dept (FK)
    1               Brown   200
    2               Smith   310
    3               Jones   310
    4               Clay    400
    5               Peters  150
    6               Foster  400
    7               Gray    310
    8               Baker   310

Department table:

    Dept (UPI, PK)  Name
    400             Delivery
    150             Payroll
    200             Finance
    310             Mfg

    SELECT Name, DeptName, Loc
    FROM Employee, Department
    WHERE Employee.DeptNo = Department.DeptNo;

- DeptNo in the Employee table is not a UPI but a foreign key, so the table is hash redistributed based on DeptNo
- Hash redistribution takes place local to the AMP
- Rows are sorted before the join condition is applied
Example of Merge Join based on Hash Redistribution

Employee rows hash-distributed on Employee.ENum (UPI), one line per AMP:

    6 FOSTER 400, 8 BAKER 310
    4 CLAY 400, 3 JONES 310
    1 BROWN 200, 7 GRAY 310
    5 PETER 150, 2 SMITH 310

Employee rows hash-redistributed on the Employee.Dept row hash, one line per AMP:

    7 GRAY 310, 3 JONES 310, 8 BAKER 310, 2 SMITH 310
    5 PETER 150
    1 BROWN 200
    6 FOSTER 400, 4 CLAY 400

Department rows hash-distributed on Department.Dept (UPI), one line per AMP:

    150 PAYROLL
    310 MFG
    200 FINANCE
    400 DELIVERY

The redistributed Employee rows are then joined with the Department rows on each AMP.
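The redistribution step can be sketched as below, assuming a 4-AMP system and SHA-256 as a stand-in hash, so the actual AMP numbers will differ from the slide. `distribute` is a hypothetical helper, and the AMP-local join is reduced to a dictionary lookup to keep the focus on row placement.

```python
import hashlib

NUM_AMPS = 4  # assumed system size

def amp_of(value) -> int:
    """Stand-in hash: value -> AMP number."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS

EMPLOYEES = [(1, "Brown", 200), (2, "Smith", 310), (3, "Jones", 310), (4, "Clay", 400),
             (5, "Peters", 150), (6, "Foster", 400), (7, "Gray", 310), (8, "Baker", 310)]
DEPARTMENTS = [(150, "Payroll"), (200, "Finance"), (310, "Mfg"), (400, "Delivery")]

def distribute(rows, key):
    """Place each row on the AMP selected by the hash of its key."""
    amps = {amp: [] for amp in range(NUM_AMPS)}
    for row in rows:
        amps[amp_of(key(row))].append(row)
    return amps

dept_by_dept = distribute(DEPARTMENTS, key=lambda d: d[0])  # stored on its UPI (Dept)
emp_spool = distribute(EMPLOYEES, key=lambda e: e[2])       # spool: rehashed on Dept

joined = []
for amp in range(NUM_AMPS):
    local_depts = dict(dept_by_dept[amp])  # Dept -> Name, this AMP only
    for enum, name, dept in sorted(emp_spool[amp], key=lambda e: e[2]):
        # The matching Department row is guaranteed to be on the same AMP,
        # because both sides were placed by the hash of the same value.
        joined.append((name, local_depts[dept]))

print(sorted(joined))
```

No AMP ever needs a Department row held by another AMP, which is the point of redistributing the Employee rows on the join column.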
Example of Merge Join based on Duplication of Table

Department table rows hash-distributed on Department.Dept (UPI), one row per AMP:

    150 PAYROLL
    310 MFG
    200 FINANCE
    400 DELIVERY

Employee table rows hash-distributed on Employee.ENum (UPI), one line per AMP:

    6 FOSTER 400, 8 BAKER 310
    4 CLAY 400, 3 JONES 310
    1 BROWN 200, 7 GRAY 310
    5 PETER 150, 2 SMITH 310

Spool file after duplicating and sorting on the Department.Dept row hash (every AMP holds a full copy):

    150 PAYROLL, 200 FINANCE, 310 MFG, 400 DELIVERY
    150 PAYROLL, 200 FINANCE, 310 MFG, 400 DELIVERY
    150 PAYROLL, 200 FINANCE, 310 MFG, 400 DELIVERY
    150 PAYROLL, 200 FINANCE, 310 MFG, 400 DELIVERY

Spool file after locally copying and sorting on the Employee.Dept row hash, one line per AMP:

    8 BAKER 310, 6 FOSTER 400
    3 JONES 310, 4 CLAY 400
    1 BROWN 200, 7 GRAY 310
    2 SMITH 310, 5 PETER 150

The two spool files are then joined on each AMP.
Example of Merge Join using Matching Indexes

If the primary indexes of the joining tables match, no redistribution is required. Example:

    SELECT *
    FROM Employee, Employee_Phone
    WHERE Employee.Enum = Employee_Phone.Enum;
Nested Join
A nested join is a join for which the WHERE conditions specify a constant value for a unique index in one table, and those conditions also match some column of that single row to the primary or secondary index of the second table. Example:

    SELECT DeptName, Name, YrsExp
    FROM Employee, Department
    WHERE Employee.EmpNo = Department.MgrNo
    AND Department.DeptNo = 100;
Correlated Queries
A correlated query is a subquery whose outer query results are processed a row at a time against the subquery result.
SELECT last_name, department_number AS DEPTNO, salary_amount
FROM employee ee
WHERE salary_amount = (SELECT MAX(salary_amount) FROM employee em WHERE em.department_number = ee.department_number);
Steps of execution:
1. Read an employee row
2. Get the max salary for his/her department from the subquery
3. Compare his/her salary to the max salary
4. If equal, output this row
5. Go to 1
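The five execution steps can be traced with a short Python loop. This is a conceptual sketch only: the salary figures are hypothetical, and the real Optimizer is free to evaluate the query differently.

```python
# Hypothetical rows: (last_name, department_number, salary_amount)
employee = [
    ("Brown", 200, 52000),
    ("Smith", 310, 48000),
    ("Jones", 310, 61000),
    ("Clay", 400, 45000),
    ("Foster", 400, 45000),
]


def max_salary_for(dept, rows):
    """The correlated subquery: MAX(salary_amount) for one department."""
    return max(sal for _, d, sal in rows if d == dept)


answer = []
for last_name, deptno, salary in employee:          # 1. read an employee row
    dept_max = max_salary_for(deptno, employee)     # 2. max salary for that department
    if salary == dept_max:                          # 3. compare salary to the max
        answer.append((last_name, deptno, salary))  # 4. if equal, output the row
    # 5. the loop moves on to the next row

print(answer)
```

Two employees tied on the department maximum are both returned, as they would be by the SQL above.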
Teradata Database Objects

Tables
Base Tables, Global Temporary Tables, Volatile Tables, Derived Tables
Views, Macros, Stored Procedures, Triggers, Join Index, Hash Index
Global Temporary Tables

Global Temporary Tables hold intermediate results of queries.
Any session can materialize its own instance, but data is not shared across sessions
Uses temporary space to store data
A local instance is materialized when data is inserted, an index is defined, or COLLECT STATISTICS is issued
Optionally emptied at the end of each transaction
A materialized instance is valid for the session only; its data is lost at logoff
The definition is stored in the database schema
CREATE GLOBAL TEMPORARY TABLE gt_deptsal
(deptno SMALLINT, avgsal DEC(9,2), maxsal DEC(9,2), minsal DEC(9,2), sumsal DEC(9,2), empcnt SMALLINT)
ON COMMIT PRESERVE ROWS;
INSERT INTO gt_deptsal
SELECT dept, AVG(sal), MAX(sal), MIN(sal), SUM(sal), COUNT(emp)
FROM emp
GROUP BY 1;
Volatile Tables
Volatile Tables hold intermediate results of queries.
Valid for a session only
Not available after the session is restarted during a database restart
No access logging can be done
No indexes or referential integrity can be implemented
The definition is not stored in the database schema
CREATE VOLATILE TABLE vt_deptsal, LOG
(deptno SMALLINT, avgsal DEC(9,2), maxsal DEC(9,2), minsal DEC(9,2), sumsal DEC(9,2), empcnt SMALLINT)
ON COMMIT PRESERVE ROWS;
INSERT INTO vt_deptsal
SELECT dept, AVG(sal), MAX(sal), MIN(sal), SUM(sal), COUNT(emp)
FROM emp
GROUP BY 1;
Derived Tables
Derived tables are temporary tables that are created in spool and dropped when the query completes.
Example: employees whose salary is greater than the company average
SELECT last_name, salary_amount, avgsal
FROM (SELECT AVG(salary_amount) FROM employee) my_temp(avgsal), employee
WHERE salary_amount > avgsal
ORDER BY 2 DESC;
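The derived-table pattern can be tried without a Teradata system by running an equivalent query on SQLite from Python. This is a sketch under assumptions: the four employee rows are hypothetical, and the derived column is named with `AS avgsal` because SQLite does not accept the `my_temp(avgsal)` column-list form used in the Teradata example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (last_name TEXT, salary_amount REAL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Brown", 52000), ("Smith", 48000), ("Jones", 61000), ("Clay", 45000)],
)

# The inner SELECT is the derived table: it exists only for this query.
rows = conn.execute(
    """
    SELECT last_name, salary_amount, avgsal
    FROM (SELECT AVG(salary_amount) AS avgsal FROM employee) my_temp,
         employee
    WHERE salary_amount > avgsal
    ORDER BY 2 DESC
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
```

The single-row derived table is joined to every employee row, so the company average is available as an ordinary column in the WHERE clause.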
Teradata Macro
A macro consists of one or more statements that are executed in a single transaction
A macro behaves like a multi-statement request: either all statements in the request complete successfully, or the entire request is aborted
All statements can be executed in parallel, making use of the parallel processing architecture of Teradata and so reducing processing time
Macros simplify an operation that is complex or must be performed frequently
Can return a multi-row answer set
Typically called from a trigger
Creating a Macro:
CREATE MACRO NewEmpAdd (id INTEGER, name VARCHAR(50)) AS ( INSERT INTO Employee VALUES (:id, :name); );
EXEC NewEmpAdd(25, 'ABC');
Macro vs. Stored Procedure
Locking in Teradata
Default locking mechanism in Teradata:
Readers can simultaneously READ the same database object
A reader must wait while a WRITE operation is in effect on the same database object
A writer must wait while a READ operation is in effect on the same database object
Everybody must wait while there is an EXCLUSIVE lock on the database object
This reduces transaction concurrency. The solution is the ACCESS lock:
Downgrade the severity of the lock by explicit specification: LOCKING t1 FOR ACCESS
This comes at the risk of uncommitted dependencies (dirty reads)
So at times there is a trade-off between transaction concurrency and data integrity

The solution has to be built at the application level
Locking Severity
The available lock severities, from most restrictive to least restrictive, are as follows:
Compatibility Among Locking Severities
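The default rules listed under "Locking in Teradata" (readers share, readers and writers block each other, EXCLUSIVE blocks everyone, ACCESS can read through a WRITE) can be written down as a small compatibility check. This Python sketch assumes the four severities EXCLUSIVE, WRITE, READ, and ACCESS; the `can_grant` helper is hypothetical and only models the rules, not the Lock Manager itself.

```python
# Severities from most to least restrictive (assumed set of four).
SEVERITIES = ["EXCLUSIVE", "WRITE", "READ", "ACCESS"]

# For each lock already held, the requests that can be granted alongside it.
COMPATIBLE = {
    "EXCLUSIVE": set(),
    "WRITE": {"ACCESS"},
    "READ": {"READ", "ACCESS"},
    "ACCESS": {"WRITE", "READ", "ACCESS"},
}


def can_grant(requested, held):
    """Return True if `requested` is compatible with every lock already held."""
    return all(requested in COMPATIBLE[h] for h in held)


# Print the matrix: rows are held locks, columns are requested locks.
print("held \\ requested".ljust(18) + "".join(s.ljust(11) for s in SEVERITIES))
for held in SEVERITIES:
    cells = ("yes" if can_grant(req, [held]) else "wait" for req in SEVERITIES)
    print(held.ljust(18) + "".join(c.ljust(11) for c in cells))
```

The ACCESS row and column show why downgrading a reader to ACCESS removes the wait on writers, at the cost of possibly reading uncommitted data.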
Locking Level
Locking level: the database object on which the lock is placed
Default Lock Assignments

The default lock assignments the Lock Manager applies:
Exercise
SL-to-PL and PL-to-Aggregate processing should run in parallel.
DML in the PL layer caused by SL-to-PL processing should not hold up Aggregate processing, and vice versa.
Data integrity should also be maintained.
Bad user queries can take more restrictive locks and hold up other processes.
What are the design options? Consider realistic application-level design choices:
Maintain a time lag (t) between the SL-to-PL and PL-to-Aggregate layers
Assumption: no data can remain uncommitted for more than t in the PL layer
Downgrade the lock to ACCESS while accessing the PL layer, and read data up to (Max Timestamp - t)
Allow user access through views only:

Handle user access through views in which the lock is downgraded
CREATE VIEW v1 AS LOCKING t1 FOR ACCESS SELECT * FROM t1;
Design and optimize queries to have the least restrictive locking level
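The time-lag option from the exercise can be illustrated with a small filter: the aggregate job reads the PL layer under an ACCESS lock but keeps only rows old enough to be assumed committed. This is a sketch with hypothetical names and values (`LAG`, `pl_rows`, `safe_rows`); in practice the cutoff would be applied as a predicate on a load-timestamp column.

```python
from datetime import datetime, timedelta

LAG = timedelta(minutes=15)  # hypothetical value for the time lag t

# Hypothetical PL-layer rows: (row_id, load_timestamp)
pl_rows = [
    (1, datetime(2024, 1, 1, 10, 0)),
    (2, datetime(2024, 1, 1, 10, 20)),
    (3, datetime(2024, 1, 1, 10, 50)),
    (4, datetime(2024, 1, 1, 11, 0)),
]


def safe_rows(rows, lag):
    """Rows loaded at or before (max timestamp - lag).

    Under the stated assumption that nothing stays uncommitted longer than
    `lag`, these rows cannot be dirty reads.
    """
    cutoff = max(ts for _, ts in rows) - lag
    return [row for row in rows if row[1] <= cutoff]


for row_id, ts in safe_rows(pl_rows, LAG):
    print(row_id, ts.isoformat())
```

Rows newer than the cutoff are simply picked up by a later aggregate run, once they have aged past t.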
Do we have these in Teradata?

SQL Function
UTL_FILE support
Table Function
External Stored Procedure
User Defined Function
Statistics
Statistics on a column or index of a table give the Optimizer details such as:
Total number of rows
Total values for the column
Unique values for the column
Null values of the column
Maximum number of rows per value
Minimum number of rows per value
Minimum value for an interval
Maximum value for an interval
Number of intervals
Using these values, the Optimizer chooses the best plan for executing the query
Statistics should be refreshed regularly so that the Optimizer has current information about the table
Random AMP Samples (RAS): if statistics are not available, the Optimizer uses Random AMP Samples, which is information collected from a single AMP about the table's columns and the data stored in them.
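To make the listed figures concrete, the following Python sketch computes several of them for one column. The column values are hypothetical, and the dictionary keys are illustrative names, not the fields Teradata actually stores; interval-level detail is left out.

```python
from collections import Counter

# Hypothetical column values; None stands for NULL.
column = [100, 200, 100, 200, 100, 200, 200, 300, None, 600]

non_null = [v for v in column if v is not None]
rows_per_value = Counter(non_null)  # value -> number of rows holding it

stats = {
    "total_rows": len(column),
    "unique_values": len(rows_per_value),
    "null_values": column.count(None),
    "max_rows_per_value": max(rows_per_value.values()),
    "min_rows_per_value": min(rows_per_value.values()),
    "min_value": min(non_null),
    "max_value": max(non_null),
}

for name, value in stats.items():
    print(f"{name}: {value}")
```

Figures like rows per value are what allow a row-count estimate for a predicate without scanning the table.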
Collect Statistics
Statistics can be collected on:
A single column
Primary Index
Secondary Indexes
Primary Index of a Join Index
Primary Index of a Hash Index
Columns that are part of a join condition in a query
Collect Statistics Example

CREATE TABLE t1 (i INT, j INT, k INT);

i    j    k
100  100  100
150  150  200
200  200  100
250  250  200
300  400  100
350  450  200
400  500  200
450  550  300
600  700  500
650  750  600

EXPLAIN SEL * FROM t1 WHERE i > 200;
3) We do an all-AMPs RETRIEVE step from t1 by way of an all-rows scan with a condition of ("t1.i > 200") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 1 row. The estimated time for this step is 0.03 seconds.
Collect Statistics Example

COLLECT STATISTICS ON t1 INDEX (i);
EXPLAIN SEL * FROM t1 WHERE i > 200;
3) We do an all-AMPs RETRIEVE step from NS.t1 by way of an all-rows scan with a condition of ("NS.t1.i > 200") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 7 rows. The estimated time for this step is 0.03 seconds.
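The high-confidence estimate of 7 rows can be checked by hand against the sample data. The Python sketch below counts the rows of t1 that satisfy i > 200, assuming the i, j, and k values listed for the example pair up by position.

```python
# Rows of t1 as (i, j, k); pairing the three value lists by position is an assumption.
t1 = [
    (100, 100, 100), (150, 150, 200), (200, 200, 100), (250, 250, 200),
    (300, 400, 100), (350, 450, 200), (400, 500, 200), (450, 550, 300),
    (600, 700, 500), (650, 750, 600),
]

matching = [row for row in t1 if row[0] > 200]
print(len(matching))  # 7, the same as the Optimizer's estimate after COLLECT STATISTICS
```

Without statistics the plan guessed 1 row with no confidence; with statistics the estimate equals the true count.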
Explain
EXPLAIN <query>
EXPLAIN describes the execution plan that the Optimizer has prepared for a query. It reports:
Number of steps involved in executing the query
Tables/views used in the query
Parallel steps
Internal joins to be used
Row estimate for each step
Time estimate for each step
EXPLAIN output can be viewed through BTEQ, SQL Assistant, and Visual Explain.
Visual Explain provides a graphical, more readable version of the EXPLAIN steps. It can also be used to compare the plans of two queries.
Food for Optimizer

The Optimizer requires the following information to build a successful plan for query execution:
Environmental cost parameters: weights of CPU, disk, and network; disk delays; dbscontrol settings; pde control settings
Performance constraints: data transfer rates for each type of storage medium and network interconnection
Statistics: information about the tables and columns used in the query, including total rows, number of unique values, number of rows per unique value, null values, minimum row value, and maximum row value
Based on these costs, the Optimizer decides how to perform joins, how to pull data from the AMPs, and how to redistribute it.
Tips for query optimization

Collect statistics on the join fields
Check that you have included all the necessary join conditions
Isolate the join that is your bottleneck
Avoid data transformation in join conditions
Avoid DISTINCT. Use GROUP BY
Avoid IN and NOT IN. Use EXISTS and NOT EXISTS
Replace Outer Join with UPDATE
Replace IN(..,..,..) by UNION for large queries if possible
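The tips do not say why NOT IN should be avoided. One commonly cited difference, separate from performance, is how the two forms treat NULLs. The Python sketch below demonstrates this on SQLite rather than Teradata, with hypothetical `customers` and `orders` tables, so it illustrates standard SQL semantics only and says nothing about Teradata's query plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (cust_id INTEGER);
    CREATE TABLE orders (cust_id INTEGER);
    INSERT INTO customers VALUES (1), (2), (3);
    INSERT INTO orders VALUES (1), (NULL);
    """
)

# NOT IN: a NULL in the subquery makes the comparison unknown for every row.
not_in = conn.execute(
    "SELECT cust_id FROM customers "
    "WHERE cust_id NOT IN (SELECT cust_id FROM orders) "
    "ORDER BY cust_id"
).fetchall()

# NOT EXISTS: the NULL order row never matches, so it is simply ignored.
not_exists = conn.execute(
    "SELECT c.cust_id FROM customers c "
    "WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.cust_id = c.cust_id) "
    "ORDER BY c.cust_id"
).fetchall()
conn.close()

print("NOT IN:    ", not_in)
print("NOT EXISTS:", not_exists)
```

With a NULL in `orders.cust_id`, the NOT IN form returns no rows while NOT EXISTS returns customers 2 and 3, so the two are not always interchangeable.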
Client Software
SQL Client / Queryman: based on ODBC
Aqua Data Studio: based on JDBC/ODBC
BTEQ: interactive and batch query processor/report generator
Data Load Utilities

MultiLoad utility (MLOAD): loads large quantities of data into unpopulated tables. MultiLoad also supports bulk inserts, updates, and deletions against populated tables
FastLoad utility: loads unpopulated tables only. It is similar to BulkLoad but runs much faster and does not support update and delete operations
TPump: provides continuous update of tables; performs insert, update, and delete operations, or a combination of these, on tables using the same source feed
FastExport utility: provides parallel export of large quantities of data from the Teradata RDBMS to a client; it is the functional complement of the FastLoad and MultiLoad utilities