Introduction to the Teradata Database
Course 25964
Version 2.1
Course Objectives
After completing this course, you should be able to:
Describe the purpose and function of the Teradata Database.
Navigate relational tables using Primary Keys and Foreign Keys.
List the principal components of the Teradata Database and
describe their functions.
Describe the Teradata Database features that provide fault tolerance.
Describe the Primary Index and the Secondary Index.
Explain data distribution and data access mechanics in the Teradata
Database.
Course Audience
Who Should Attend

This course is designed for anyone who will be working with the
Teradata Database, including programmers, administrators, designers,
support personnel and end users.
Class Format
This course consists of:
One day of classroom instruction
Review exercises following each module
A course handbook in facing-page format
Course Modules
This course consists of:
Module 1: Teradata Database Overview
Module 2: Relational Database Concepts
Module 3: Teradata and the Data Warehouse
Module 4: Components and Architecture
Module 5: Databases and Users
Module 6: Data Distribution and Access
Module 7: Secondary Indexes and Full-Table Scans
Module 8: Fault Tolerance and Data Protection
Module 9: Client Tools and Utilities
Course Appendices
This course contains the following appendices:

Appendix A: Review Questions/Solutions
Appendix B: Born to be Parallel
Appendix C: Third Normal Form
Teradata Database Overview

After completing this module, you should be able to:
Describe the purpose of the Teradata Database product.
Identify supported operating systems.
List activities that Teradata Database Administrators
(DBA) never have to perform.
Describe the advantages of the Teradata Database.
What is the Teradata Database?

Relational Database Management System
Built on a Parallel Architecture
Runs on MP-RAS UNIX, Microsoft Windows 2000/2003 Server, and SuSE Linux
[Figure: Teradata Database Server with LAN-attached and channel-attached clients; labels: UNIX, Mainframes, Win 2003, Win 2000, Win XP]
Teradata Parallel Architecture

More warehouse data
Parallel-Aware Optimizer
Linear Scalability (10GB to 100+TB)
Single, Administrative View
Hashing provides for automatic data distribution
Ad hoc queries with ANSI standard SQL
Teradata Database Advantages

Proven Linear Scalability - increased workload without decreased throughput
Most Concurrent Users - multiple complex queries
Unconditional Parallelism - sorts, aggregations and full-table scans are
performed in parallel
Mature Optimizer - robust and parallel aware, handles complex queries,
multiple joins per query, ad hoc processing
Low TCO - ease of setup and maintenance, robust parallel utilities, no re-orgs,
automatic data distribution, low disk to data ratio, robust expansion utility
High Availability - no single point of failure, fault-tolerant architecture
Single View of the Business - single database server for multiple clients
Teradata Database Manageability

Things Teradata Database Administrators never have to do!
Reorganize data or index space
Pre-allocate table or index space
Physically format partitions or disk space
Pre-prepare data for loading (convert, sort, split, etc.)
Ensure that queries run in parallel
Unload/reload data spaces due to expansion
The Administrator knows that if the data is to be doubled, the system can be easily expanded to accommodate it.
The amount of work required to create a table which will
contain 100 rows is the same as that to create a table
which will contain 1,000,000,000 rows.
Teradata Database Features

Designed to process large quantities of detail data
Ideal for data warehouse applications
Parallelism makes easy access to very large tables possible
Open architecture - uses industry standard components
Performance increase is linear as components are added
Runs as a database server to client applications
Runs on multiple hardware platforms (SMP) and Teradata hardware
(MPP)
Module 1 Review Questions

1. Name three operating systems that the Teradata Database runs on:
2. Which of the following describes the scalability of the Teradata Database?
a. Linear b. Parallel c. Exponential d. Shared
3. Which feature allows the Teradata Database to process large amounts of data
quickly?
a. High availability software and hardware components
b. Parallelism
c. Proven scalability
d. High performance servers from Intel
4. The Teradata Database is primarily a:
a. Server
b. Client
5. Which two tasks do Teradata Database Administrators never have to do? (Choose two.)
a. Reorganize data
b. Select primary indexes
c. Restart the system
d. Pre-prepare data for loading
Relational Database Concepts

Define the terms associated with relational theory.
Discuss the function of the Primary Key.
Discuss the function of Foreign Keys.
List the advantages of a relational database.
What is a Database?
A database is a collection of permanently stored data that is:
Logically related - the data relates to other data.
Shared - many users may access the data.
Protected - access to data is controlled.
Managed - the data has integrity and value.
Logical/Relational Modeling
The Logical Model
Should be designed without regard to usage
Accommodates a wide variety of front end tools
Allows the database to be created more quickly
Should be the same regardless of data volume
Data is organized according to what it represents (real world business
data in table (relational) form)
Includes all the data definitions within the scope of the application or
enterprise
Is generic - the logical model is the template for physical
implementation on any RDBMS platform
Normalization
Process of reducing a complex data structure into a simple, stable one
Involves removing redundant attributes, keys, and relationships from the
conceptual data model
Relational Databases
Relational Databases are founded on Set Theory and based on the Relational Model.
A Relational Database consists of a collection of logically related tables.
A table is a two dimensional representation of data consisting of rows and columns.
EMPLOYEE
EMPLOYEE  MANAGER          DEPARTMENT  JOB     LAST      FIRST    HIRE    BIRTH   SALARY
NUMBER    EMPLOYEE NUMBER  NUMBER      CODE    NAME      NAME     DATE    DATE    AMOUNT
1006      1019             301         312101  Stein     John     861015  631015  3945000
1008      1019             301         312102  Kanieski  Carol    870201  680517  3925000
1005      0801             403         431100  Ryan      Loretta  861015  650910  4120000
1004      1003             401         412101  Johnson   Darlene  861015  560423  4630000
1007      (null)           (null)      (null)  Villegas  Arnando  870102  470131  5970000
1003      0801             401         411100  Trader    James    860731  570619  4785000
(In the figure, one vertical slice of the table is labeled Column and one horizontal slice is labeled Row.)
The employee table has:

Nine columns of data
Six rows of data - one per employee
Only one row format for the entire table
Missing data values represented by nulls
Column and row order are arbitrary
Primary Keys
Primary Key (PK) values uniquely identify each row in a table.
EMPLOYEE
EMPLOYEE  MANAGER          DEPARTMENT  JOB     LAST      FIRST    HIRE    BIRTH   SALARY
NUMBER    EMPLOYEE NUMBER  NUMBER      CODE    NAME      NAME     DATE    DATE    AMOUNT
PK
1006      1019             301         312101  Stein     John     861015  631015  3945000
1008      1019             301         312102  Kanieski  Carol    870201  680517  3925000
1005      0801             403         431100  Ryan      Loretta  861015  650910  4120000
1004      1003             401         412101  Johnson   Darlene  861015  560423  4630000
1007      (null)           (null)      (null)  Villegas  Arnando  870102  470131  5970000
1003      0801             401         411100  Trader    James    860731  570619  4785000
Primary Key Rules

A Primary Key is required for every table.
Only one Primary Key is allowed in a table.
Primary Keys may consist of one or more columns.
Primary Keys cannot have duplicate values (ND).
Primary Keys cannot be null (NN).
Primary Keys are considered non-changing values (NC).
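The ND and NN rules above can be tried out with a few lines of Python. This is only a sketch: it uses an in-memory SQLite database as a stand-in (not the Teradata Database), and the helper name try_insert is made up for the illustration.

```python
import sqlite3

# In-memory SQLite database used purely for illustration (not Teradata).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        employee_number INT NOT NULL PRIMARY KEY,  -- NN and ND
        last_name       TEXT,
        first_name      TEXT
    )
    """
)
conn.execute("INSERT INTO employee VALUES (1006, 'Stein', 'John')")
conn.execute("INSERT INTO employee VALUES (1008, 'Kanieski', 'Carol')")


def try_insert(values):
    """Return True if the row is accepted, False if a PK rule rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", values)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert((1006, "Ryan", "Loretta"))  # duplicate PK value
null_ok = try_insert((None, "Johnson", "Darlene"))    # null PK value
new_ok = try_insert((1005, "Ryan", "Loretta"))        # unique, non-null PK value
```

Only the third insert is accepted. The NC rule (non-changing values) is a modeling convention and is not enforced by this sketch.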
Foreign Keys
Foreign Key (FK) values model relationships.
EMPLOYEE (partial listing)
EMPLOYEE  MANAGER          DEPARTMENT  JOB     LAST      FIRST    HIRE    BIRTH   SALARY
NUMBER    EMPLOYEE NUMBER  NUMBER      CODE    NAME      NAME     DATE    DATE    AMOUNT
PK        FK               FK          FK
1006      1019             301         312101  Stein     John     861015  631015  3945000
1008      1019             301         312102  Kanieski  Carol    870201  680517  3925000
1005      0801             403         431100  Ryan      Loretta  861015  650910  4120000
1004      1003             401         412101  Johnson   Darlene  861015  560423  4630000
1007      (null)           (null)      (null)  Villegas  Arnando  870102  470131  5970000
1003      0801             401         411100  Trader    James    860731  570619  4785000
DEPARTMENT
DEPARTMENT  DEPARTMENT                BUDGET    MANAGER
NUMBER      NAME                      AMOUNT    EMPLOYEE NUMBER
PK                                              FK
501         marketing sales           80050000  1017
301         research and development  46560000  1019
302         product planning          22600000  1016
403         education                 93200000  1005
402         software support          30800000  1011
401         customer support          98230000  1003
201         technical operations      29380000  1025
Foreign Keys (FK) are optional.
A table may have more than one FK.
A FK may consist of more than one column.
FK values may be duplicated.
FK values may be null.
FK values may be changed.
FK values must exist elsewhere as a PK (i.e. have referential integrity).
Answering Questions with a Relational Database
EMPLOYEE
EMPLOYEE  MANAGER          DEPARTMENT  JOB     LAST      FIRST    HIRE    BIRTH   SALARY
NUMBER    EMPLOYEE NUMBER  NUMBER      CODE    NAME      NAME     DATE    DATE    AMOUNT
PK        FK               FK          FK
1006      1019             301         312101  Stein     John     861015  631015  3945000
1008      1019             301         312102  Kanieski  Carol    870201  680517  3925000
1005      0801             403         431100  Ryan      Loretta  861015  650910  4120000
1004      1003             401         412101  Johnson   Darlene  861015  560423  4630000
1007      (null)           (null)      432101  Villegas  Arnando  870102  470131  5970000
1003      0801             401         411100  Trader    James    860731  570619  4785000
DEPARTMENT
DEPARTMENT  DEPARTMENT                BUDGET    MANAGER
NUMBER      NAME                      AMOUNT    EMPLOYEE NUMBER
PK                                              FK
501         marketing sales           80050000  1017
301         research and development  46560000  1019
302         product planning          22600000  1016
403         education                 93200000  1005
402         software support          30800000  1011
401         customer support          98230000  1003
201         technical operations      29380000  1025
1. Name the department in which James Trader works.
2. Who manages the Education Department?
3. Identify by name an employee who works for James Trader.
4. James Trader manages which department?
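The four questions above are answered by following FK values to the matching PK values. The sketch below does this with joins in an in-memory SQLite database, which is a stand-in and not the Teradata Database. The column names and the helper first_row are chosen for the illustration, only the columns needed for the questions are loaded, and manager number 0801 is stored as the integer 801.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not Teradata
conn.executescript(
    """
    CREATE TABLE department (
        department_number       INT NOT NULL PRIMARY KEY,
        department_name         TEXT,
        manager_employee_number INT
    );
    CREATE TABLE employee (
        employee_number         INT NOT NULL PRIMARY KEY,
        manager_employee_number INT,
        department_number       INT REFERENCES department (department_number),
        last_name               TEXT,
        first_name              TEXT
    );
    """
)
conn.executemany(
    "INSERT INTO department VALUES (?, ?, ?)",
    [
        (501, "marketing sales", 1017),
        (301, "research and development", 1019),
        (302, "product planning", 1016),
        (403, "education", 1005),
        (402, "software support", 1011),
        (401, "customer support", 1003),
        (201, "technical operations", 1025),
    ],
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [
        (1006, 1019, 301, "Stein", "John"),
        (1008, 1019, 301, "Kanieski", "Carol"),
        (1005, 801, 403, "Ryan", "Loretta"),
        (1004, 1003, 401, "Johnson", "Darlene"),
        (1007, None, None, "Villegas", "Arnando"),
        (1003, 801, 401, "Trader", "James"),
    ],
)


def first_row(sql):
    """Run a query and return its first row (or None if there is none)."""
    return conn.execute(sql).fetchone()


# 1. Employee FK department_number -> department PK.
q1 = first_row(
    "SELECT d.department_name FROM employee e "
    "JOIN department d ON d.department_number = e.department_number "
    "WHERE e.first_name = 'James' AND e.last_name = 'Trader'"
)
# 2. Department FK manager_employee_number -> employee PK.
q2 = first_row(
    "SELECT e.first_name, e.last_name FROM department d "
    "JOIN employee e ON e.employee_number = d.manager_employee_number "
    "WHERE d.department_name = 'education'"
)
# 3. Employee FK manager_employee_number -> employee PK (self join).
q3 = first_row(
    "SELECT w.first_name, w.last_name FROM employee w "
    "JOIN employee m ON m.employee_number = w.manager_employee_number "
    "WHERE m.first_name = 'James' AND m.last_name = 'Trader'"
)
# 4. Same FK as question 2, navigated from the employee side.
q4 = first_row(
    "SELECT d.department_name FROM department d "
    "JOIN employee e ON e.employee_number = d.manager_employee_number "
    "WHERE e.first_name = 'James' AND e.last_name = 'Trader'"
)
```

None of the queries names an access path; each one only states which FK and PK values must match.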
Relational Advantages
Advantages of a Relational Database compared to other database
methodologies include:
More flexible than other types
Allows businesses to quickly respond to changing conditions
Being data-driven vs. application driven
Modeling the business, not the processes
Makes applications easier to build because the data does more
of the work
Supporting trend toward end-user computing
Being easy to understand
No need to know the access path
Solidly founded in set theory

Match each term with its definition:
_f_ 1. Database
_e_ 2. Table
_b_ 3. Relational database
_a_ 4. Primary Key
_d_ 5. Null
_c_ 6. Foreign Key
_g_ 7. Row
a. A set of columns that uniquely identify a row.
b. A set of logically related tables.
c. One or more columns that exist as a PK value in another table in the database.
d. The absence of a value or an unknown value.
e. A two-dimensional array of rows and columns.
f. A collection of permanently stored data.
g. One instance of all columns in a table.
Teradata and the Data Warehouse

Identify the different types of enterprise data processing.
Define a data warehouse and an active data warehouse.
Define the different types of data marts.
Explain the advantages of detail data over summary data.
Evolution of Data Processing

A transaction is a logical unit of work.
The slide groups the processing types under the labels TRADITIONAL and TODAY.

Type: OLTP
Examples: Update a checking account to reflect a deposit. Debit transaction takes place against current balance to reflect amount of money withdrawn at ATM.
Number of Rows Accessed: Small
Response Time: Seconds

Type: DSS
Examples: How many child size blue jeans were sold across all of our Eastern stores in the month of March? What were the monthly sales of shoes for retailer X?
Number of Rows Accessed: Large
Response Time: Seconds or minutes

Type: OLAP
Examples: Show the top ten selling items across all stores for 1997. Show a comparison of sales from this week to last week.
Number of Rows Accessed: Large number of detail rows or moderate number of summary rows
Response Time: Seconds or minutes

Type: Data Mining
Examples: Which customers are most likely to leave? Which customers are most likely to respond to this promotion?
Number of Rows Accessed: Moderate to large number of detailed historical rows
Response Time: Phase 1: Minutes or hours. Phase 2: Seconds or less
The Advantage of Using Detail Data

STORE ITEM DAY (detail data)
Columns: STORE NUMBER, ITEM NUMBER, DATE, QUANTITY SOLD (the headers carry PK and FK markers)
[Table: one row per store, item and day in June; the individual cell values are not recoverable]
QUESTION: How effective was the national advertisement for jeans that ran June 6 through June 8?
STORE ITEM DAY, summarized by week (summary data)
Columns: STORE NUMBER, ITEM NUMBER, WEEK ENDING, QUANTITY SOLD (the headers carry PK and FK markers)
[Table: one row per store, item and week, for the weeks ending June 07 and June 14; the individual cell values are not recoverable]
DETAIL DATA vs. SUMMARY DATA
Detail data gives a more accurate picture.
Correct business decisions result.
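A small sketch of why the weekly summary cannot answer the advertisement question while the daily detail can. The quantities are invented for the illustration (they are not the values from the slide), and the year is arbitrary because the slide gives none. The weeks end on June 7 and June 14, as in the summary table.

```python
from datetime import date

YEAR = 2024  # arbitrary; the slide gives no year

# Invented daily quantities for one store and one item:
# 10 units on ordinary days, 40 units on the advertisement days (June 6-8).
daily = {date(YEAR, 6, day): 10 for day in range(1, 15)}
for day in (6, 7, 8):
    daily[date(YEAR, 6, day)] = 40


def week_ending(d):
    """Map a June date to the week-ending date used by the summary table."""
    return date(YEAR, 6, 7) if d.day <= 7 else date(YEAR, 6, 14)


# Summary data: one total per week.
weekly = {}
for d, qty in daily.items():
    weekly[week_ending(d)] = weekly.get(week_ending(d), 0) + qty

# Detail data: the advertisement days can be isolated exactly.
promo_days = [date(YEAR, 6, day) for day in (6, 7, 8)]
promo_avg = sum(daily[d] for d in promo_days) / len(promo_days)
other_qty = [qty for d, qty in daily.items() if d not in promo_days]
other_avg = sum(other_qty) / len(other_qty)
```

With these numbers the weekly totals are 130 and 100, which hides the fact that the advertisement days sold four times the ordinary daily quantity, because the advertisement period straddles both weeks.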
Data Warehouse Usage Evolution

STAGE 1 - REPORTING: WHAT happened? Primarily Batch, Pre-Defined Reports.
STAGE 2 - ANALYZING: WHY did it happen? Increase in Ad Hoc Queries.
STAGE 3 - PREDICTING: WHAT will happen? Analytical Modeling Grows.
STAGE 4 - OPERATIONALIZING: WHAT is happening? Continuous Update & Time Sensitive Queries Become Important.
STAGE 5 - ACTIVE WAREHOUSING: MAKING it happen! Event Based Triggering Takes Hold.
Workload mix shown in the figure: Batch, Ad Hoc, Analytics, Continuous Update/Short Queries, Event-Based Triggering.
Active Data Warehousing

Performance - response time within seconds
Scalability
large amounts of detailed data
mixed workloads (both tactical and strategic queries) for
mission critical applications
concurrent users
Availability and Reliability - 7 x 24
Data Freshness - accurate, up to the minute data, including
access to operational data store level information
The Data Warehouse

A central, enterprise-wide database that contains information extracted from
operational systems.
Based on enterprise-wide model
Can begin small but may grow large rapidly
Populated by extraction/loading of data from operational systems
Responds to end-user "what if" queries
Minimizes data movement/synchronization
Provides a Single View of the business
[Figure: Operational Systems (Accounts Receivable, Inventory, POS) feed the Data Warehouse (Teradata Database), which End Users reach through Access Tools (Cognos, Microstrategy, Biz Objects)]
Data Marts
A data mart is a special purpose subset of enterprise data for a particular function
or application. It may contain detail or summary data or both.
Data mart types:
Independent - created directly from operational systems to a separate physical
data store
Logical - exists as a subset of existing data warehouse via Views
Dependent - created from data warehouse to a separate physical data store
[Figure: Operational Systems feed the Independent Data Mart and the Data Warehouse; the Data Warehouse feeds the Dependent Data Mart and contains the Logical Data Mart]

1. Name three types of enterprise data processing and give examples.
2. What is the difference between a data warehouse and a data mart?
3. Which type of data mart gets its data directly from the data warehouse?
4. Name the two types of queries that an Active Data Warehouse supports
for mission critical applications.
5. Match the data warehouse usage evolution stage to its description:
___ Stage 1
___ Stage 2
___ Stage 3
___ Stage 4
___ Stage 5
a. Continuous updates and time sensitive queries

b. Event-based triggering takes hold
c. Analytical modeling
d. Increase in ad hoc queries
e. Primarily batch
Components and Architecture

Describe a node.
List the major components of the Teradata Database
architecture and their functions.
Describe the overall Teradata Database parallel
architecture.
Explain how the Teradata Database functions with
channel and network attached clients.
What is a Node?
[Figure: node diagram. Labels: Channel-Attached Systems, CHANNEL, Channel Driver; LAN, W/S, PC, Teradata Gateway; UNIX, WIN 2K, Linux; PDE; two PE VPROCs; BYNET Driver; four AMP VPROCs, each with a VDISK]
Teradata software, gateway software and channel-driver software run as processes

Parsing Engines (PE) and Access Module Processors (AMP) are Virtual Processors
(VPROC) which run under control of Parallel Database Extensions (PDE)
Each AMP is associated with a Virtual Disk (VDISK)
A single node is called a Symmetric Multi-Processor (SMP)
All AMPs and PEs communicate via the BYNET
Major Components of the Teradata Database
[Figure: an SQL Request enters the Node through the Parsing Engine, crosses the BYNET to four AMPs, each with its own VDISK, and the Answer Set Response returns the same way]
The Parsing Engine (PE)

The Parsing Engine is responsible for:
Managing individual sessions (up to 120 sessions per PE)
Parsing and optimizing your SQL requests
Building query plans with the parallel-aware, cost-based, intelligent Optimizer
Dispatching the optimized plan to the AMPs
EBCDIC/ASCII input conversion (if necessary)
Sending the answer set response back to the requesting client
[Figure: the Parsing Engine receives the SQL Request and returns the Answer Set Response. Its stages are Session Control, Parsing, Optimizing and Dispatching, with System Configuration and Data Demographics shown as inputs. Below it, the BYNET connects to four AMPs, each with a VDISK.]
The BYNET
Dual redundant, fault-tolerant, bi-directional interconnect network that enables:

Automatic load balancing of message traffic
Automatic reconfiguration after fault detection
Scalable bandwidth as nodes are added
The BYNET connects and communicates with all the AMPs on the system:
Between nodes, the BYNET hardware carries broadcast and point-to-point
communications
On a node, BYNET software and PDE together control which AMPs receive a
multicast communication
The Access Module Processor (AMP)

The AMP is responsible for:
Storing rows to and retrieving rows from its VDISK
Lock management
Sorting rows and aggregating columns
Join processing
Output conversion and formatting (ASCII, EBCDIC)
Creating answer sets for clients
Disk space management and accounting
Special utility protocols
Recovery processing
AMPs perform all tasks in parallel.
[Figure: the SQL Request and Answer Set Response pass through the Parsing Engine and the BYNET to four AMPs, each with a VDISK]
The MPP System

The BYNET (both software and hardware) connects two or more SMP
Nodes to create a Massively Parallel Processing (MPP) system.
The Teradata Database is linearly expandable by adding nodes.
[Figure: MPP system built from Node Cabinets and Array Cabinets; labels: SMC, BYNET, NODE, Disk Array]
Teradata Database Software

[Figure: a Channel-Attached System (Client Application, CLI, TDP) connects over the Channel to the Channel Driver, and a Network-Attached System (Client Application, CLI, MTDP, MOSI) connects over the LAN to the Teradata Gateway. The Teradata Database Node carries the labels OS, PDE and TPA and contains two Parsing Engines, the BYNET, and four AMPs with their VDISKs.]
Channel-Attached Client Software

Channel-Attached Host
Connection made via HCA, Bus & Tag or ESCON cables, Channel Driver, and PE
[Figure: Client Application -> CLI -> TDP -> Channel -> Host Channel Adapters -> Channel Driver -> Parsing Engine]
CLI (Call-Level Interface)
Request and response control
Buffer allocation and initialization
Lowest level interface to the Teradata Database
Library of routines for blocking/unblocking requests and responses to/from RDBMS
Performs logon and logoff functions
TDP (Teradata Director Program)
Manages session traffic between CLI and the Teradata Database
Session balancing across multiple PEs
Failure notification (application failure, Teradata Database restart)
Logging, verification, recovery, restart, security
Network-Attached Client Software

Network-Attached Host
Connection made via Ethernet or LAN card, cables, Teradata Gateway, and PE.
2 LAN connections for redundancy.
[Figure: Client Application -> CLI or ODBC -> MTDP -> MOSI -> LAN -> Gateway -> Parsing Engine]
ODBC
Call-level interface
Teradata Database ODBC driver is used to connect applications with the Teradata Database
MTDP (Micro Teradata Director Program)
Performs many TDP functions including session management but not session balancing across PEs
MOSI (Micro Operating System Interface)
Provides operating system and network protocol independent interface

1. Name the three major Teradata Database components and state their purpose.
2. Why are there two LANs in a Teradata system?

3. How many sessions can a PE support?
4. What is the communications layer in a Teradata system?
Databases and Users

Define a Database and a User.
Define Perm Space and its purpose.
Define Spool Space and its purpose.
Define Temp Space and its purpose.
Describe the hierarchy of objects in the Teradata Database.
Databases and Users Defined

Databases and Users are the repositories for objects:
Tables - require Perm Space
Views - do not require Perm Space
Macros - do not require Perm Space
Triggers - do not require Perm Space
Stored Procedures - require Perm Space
Space limits are specified for each database and for each user:
Perm Space - maximum amount of space available for permanent tables
Spool Space - maximum amount of work space available for request processing
Temp Space - maximum amount of space available for global temporary tables
A database is created with the CREATE DATABASE command.

A user is created with the CREATE USER command.
The only difference between a database and a user is the user has a
password and may logon to the system.
A database or user with no perm space may not contain permanent tables
but may contain views and macros.
Teradata Database Space Management

[Figure: ownership hierarchy rooted at DBC. Labels: DBC, SYSADMIN, SYSTEMFE, SYSDBA, CRASHDUMPS, TDPUSER, PUBLIC, All, Default, Database 1, Database 2, Database 3, User A, User B, User C, User D. Each box shows Current Permanent Space over Maximum Permanent Space; No Box = No Permanent Space. The figure's arithmetic reads 70 GB - 10 GB - 10 GB - 30 GB = 20 GB.]
A new database or user must be created from an existing database or user.

All Perm Space limits are subtracted from the owner.
Perm Space is a zero-sum game - the total of all Perm Space limits must equal the
total amount of disk space available.
Perm Space currently not being used is available for Spool Space or Temp Space.
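The subtraction rule can be modeled in a few lines. This is a toy sketch of the bookkeeping only, not of how the Teradata Database implements it; the class name Space, its methods, and the object names are hypothetical. The sizes follow the 70 GB, 10 GB, 10 GB and 30 GB arithmetic visible in the figure.

```python
class Space:
    """A database or user with a Perm Space limit (sizes in GB)."""

    def __init__(self, name, perm_gb, owner=None):
        self.name = name
        self.perm_gb = perm_gb
        self.owner = owner
        self.children = []

    def create_child(self, name, perm_gb):
        """Create a database or user; its Perm Space comes out of the owner."""
        if perm_gb > self.perm_gb:
            raise ValueError(f"{self.name} has only {self.perm_gb} GB to give")
        self.perm_gb -= perm_gb
        child = Space(name, perm_gb, owner=self)
        self.children.append(child)
        return child

    def total_gb(self):
        """Perm Space of this object plus everything created beneath it."""
        return self.perm_gb + sum(c.total_gb() for c in self.children)


owner = Space("owner", 70)                 # hypothetical owner with 70 GB
owner.create_child("child_1", 10)
owner.create_child("child_2", 10)
owner.create_child("child_3", 30)

remaining_gb = owner.perm_gb               # what the owner keeps
grand_total_gb = owner.total_gb()          # unchanged by creating children
```

The owner keeps 20 GB, and the total across the hierarchy is still 70 GB, which is the zero-sum property described above.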

Indicate whether a statement is True or False.
____ 1. A database will always have tables.
____ 2. A user will always have a password.
____ 3. A user creating a subordinate user who needs tables must give up some
of its Perm Space.
____ 4. The sum of all user and database Perm Space will equal the total
space on the system.
____ 5. Deleting a view from a database reclaims Perm Space for the database.
Data Distribution and Access

Explain the purpose of the Primary Index.
Distinguish between Primary Index and Primary Key.
State the reasons for selecting a Unique Primary Index
vs. a Non-Unique Primary Index.
Describe how the Teradata Database distributes the
rows in a table.
How Does the Teradata Database Distribute Rows?
The Teradata Database uses a hashing algorithm to randomly distribute table rows across the AMPs.
The Primary Index choice determines whether the rows of a table will
be evenly or unevenly distributed across the AMPs.
Evenly distributed table rows result in evenly distributed workloads.
Each AMP is responsible for its subset of the rows of each table.
The rows are not placed in any particular order.
The benefits of unordered rows include:
No maintenance needed to preserve order.
The order is independent of any query being submitted.
The benefits of hashed distribution include:
The distribution is the same regardless of data volume.
The distribution is based on row content, not data demographics.
Primary Key (PK) vs. Primary Index (PI)

The PK is a relational modeling convention which uniquely identifies each row.
The PI is a Teradata convention which determines row distribution and access.
A well designed database will have tables where the PI is the same as the PK as well
as tables where the PI is defined on columns different from the PK.
Join performance and known access paths might dictate a PI that is different from the
PK.
Primary Key (PK)                       | Primary Index (PI)
Logical concept of data modeling       | Mechanism for row distribution and access
Teradata does not need the PK defined  | A table must have one Primary Index
No limit on the number of columns      | May be from 1 to 64 columns
Documented in the logical data model   | Defined in the CREATE TABLE statement
Value must be unique                   | Value may be unique or non-unique
Uniquely identifies each row           | Used to place a row on an AMP
Value should not change                | Value may be changed (Updated)
May not be NULL                        | May be NULL
Does not imply access path             | Defines the most efficient access path
Chosen for logical correctness         | Chosen for physical performance
Primary Indexes
The physical mechanism used to assign a row to an AMP
A table must have a Primary Index
The Primary Index cannot be changed
UPI
If the index choice of column(s) is unique, we call this a UPI (Unique Primary Index).
A UPI choice will result in even distribution of the rows of the table across all AMPs.
Reasons to Choose a UPI: UPIs guarantee even data distribution, eliminate duplicate row checking, and are always a one-AMP operation.
NUPI
If the index choice of column(s) isn't unique, we call this a NUPI (Non-Unique Primary Index).
A NUPI choice will result in even distribution of the rows of the table proportional to the degree of uniqueness of the index.
NUPIs can cause skewed data.
Why would you choose an Index that is different from the Primary Key?
Join performance
Known access paths
Defining the Primary Index
The Primary Index (PI) is defined at table creation.

Every table must have one Primary Index.
The Primary Index may consist of 1 to 64 columns.
The Primary Index of a table may not be changed.
The Primary Index is the mechanism used to assign a row to an AMP.
The Primary Index may be Unique (UPI) or Non-Unique (NUPI).
Unique Primary Indexes result in even row distribution and eliminate
duplicate row checking.
Non-Unique Primary Indexes result in even row distribution proportional
to the number of duplicate values. This may cause skewed distribution.
UPI Table
CREATE TABLE Table1
( Col1 INTEGER
,Col2 INTEGER
,Col3 INTEGER )
UNIQUE PRIMARY INDEX (Col1);

NUPI Table
CREATE TABLE Table2
( Col1 INTEGER
,Col2 INTEGER
,Col3 INTEGER )
PRIMARY INDEX (Col2);
Row Distribution via Hashing

[Figure: Index value -> Hashing Algorithm -> Row Hash -> Hash Bucket # -> Hash Map -> AMP #]
A row's Primary Index value is passed into the Hashing Algorithm.
The Hashing Algorithm is designed to ensure even distribution of unique values across all AMPs.
The Hashing Algorithm outputs a 32-bit Row Hash value.
The first 16 bits (the Hash Bucket Number) are used as a pointer into the Hash Map.
Hash values are calculated using the hashing algorithm.
The Hash Map is uniquely configured for each system.
The Hash Map is an array which associates the DSW (Destination Selection Word, another name for the Hash Bucket Number) with a specific AMP.
Two systems with the same number of AMPs will have the same Hash Map.
Changing the number of AMPs in a system requires a change to the Hash Map.
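The path from Primary Index value to AMP can be sketched as follows. Two things here are assumptions made only for the illustration: CRC-32 stands in for the hashing algorithm (the real algorithm is not described in this course), and the Hash Map is filled round-robin. The function names are hypothetical.

```python
import zlib

NUM_AMPS = 4
NUM_BUCKETS = 2 ** 16  # one entry per possible 16-bit Hash Bucket Number

# Hash Map: an array indexed by Hash Bucket Number, holding an AMP number.
# The round-robin assignment is an assumption for this sketch.
HASH_MAP = [bucket % NUM_AMPS for bucket in range(NUM_BUCKETS)]


def row_hash(pi_value):
    """Return a 32-bit hash of the PI value (CRC-32 as a stand-in)."""
    return zlib.crc32(repr(pi_value).encode("utf-8")) & 0xFFFFFFFF


def hash_bucket(pi_value):
    """Return the first (high-order) 16 bits of the 32-bit Row Hash."""
    return row_hash(pi_value) >> 16


def amp_for(pi_value):
    """Look the Hash Bucket Number up in the Hash Map to get the AMP."""
    return HASH_MAP[hash_bucket(pi_value)]


amp_of_45 = amp_for(45)
```

The same Primary Index value always produces the same Row Hash and therefore the same AMP, which is what makes a Primary Index lookup a one-AMP operation. Changing NUM_AMPS changes HASH_MAP, mirroring the last statement above.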
Unique Primary Index (UPI) Access

CREATE TABLE Customer
( Cust INTEGER
,Name CHAR(10)
,Phone CHAR(8) )
UNIQUE PRIMARY INDEX (Cust);

CUSTOMER table (Cust is the PK and the UPI)
Cust  Name    Phone
37    White   555-4444
98    Brown   333-9999
74    Smith   555-6666
95    Peters  555-7777
27    Jones   222-8888
56    Smith   555-7777
45    Adams   444-6666
84    Rice    666-5555
49    Smith   111-6666
51    Marsh   888-2222
31    Adams   111-2222
62    Black   444-5555
12    Young   777-4444
77    Jones   777-6666
72    Adams   666-7777
40    Smith   222-3333

SELECT *
FROM Customer
WHERE Cust = 45;

[Figure: the PE passes UPI = 45 through the Hashing Algorithm and sends the request over the BYNET to the one AMP that owns the row. The base table rows are spread four per AMP across AMP 1 to AMP 4, grouped as follows:]
Cust 49, 45, 56, 51
Cust 62, 84, 95, 77
Cust 12, 74, 98, 31
Cust 27, 72, 40, 37
Single AMP access with 0 to 1 rows returned.
Non-Unique Primary Index (NUPI) Access

CREATE TABLE Customer
( Cust INTEGER
,Name CHAR(10)
,Phone CHAR(8) )
PRIMARY INDEX (Phone);

CUSTOMER table (Cust is the PK; Phone is the NUPI)
Cust  Name    Phone
37    White   555-4444
98    Brown   333-9999
74    Smith   555-6666
95    Peters  555-7777
27    Jones   222-8888
56    Smith   555-7777
45    Adams   444-6666
84    Rice    666-5555
49    Smith   111-6666
51    Marsh   888-2222
31    Adams   111-2222
62    Black   444-5555
12    Young   777-4444
77    Jones   777-6666
72    Adams   666-7777
40    Smith   222-3333

SELECT *
FROM Customer
WHERE Phone = '555-7777';

[Figure: the PE passes PI = 555-7777 through the Hashing Algorithm and sends the request over the BYNET to the one AMP that owns that hash value. The base table rows are spread four per AMP across AMP 1 to AMP 4, grouped as follows:]
Cust 37, 84, 31, 40
Cust 45, 98, 72, 74
Cust 49, 12, 27, 62
Cust 77, 95, 56, 51 (both rows with Phone 555-7777, Cust 95 and Cust 56, are in this group)
Single AMP access with 0 to n rows returned.
UPI Row Distribution

Order table (Order Number is the PK and the UPI)
Order   Customer  Order  Order
Number  Number    Date   Status
7325    2         4/13   O
7324    3         4/13   O
7415    1         4/13   C
7103    1         4/10   O
7225    2         4/15   C
7384    1         4/12   C
7402    3         4/16   C
7188    1         4/13   C
7202    2         4/09   C
Order_Number values are unique (UPI).
The rows will distribute evenly across the AMPs.
[Figure: the nine rows spread across AMP 1 to AMP 4]
NUPI Row Distribution

Order table (Order Number is the PK; Customer Number is the NUPI)
Order   Customer  Order  Order
Number  Number    Date   Status
7325    2         4/13   O
7324    3         4/13   O
7415    1         4/13   C
7103    1         4/10   O
7225    2         4/15   C
7384    1         4/12   C
7402    3         4/16   C
7188    1         4/13   C
7202    2         4/09   C
Customer_Number values are non-unique (NUPI).
Rows with the same PI value distribute to the same AMP causing skewed row distribution.
[Figure: orders 7384, 7103, 7415 and 7188 (customer 1) sit together on one AMP, orders 7325, 7202 and 7225 (customer 2) together on another, and orders 7402 and 7324 (customer 3) together on a third]
Highly Non-Unique NUPI Row Distribution

Order table (Order Number is the PK; Order Status is the NUPI)
Order   Customer  Order  Order
Number  Number    Date   Status
7325    2         4/13   O
7324    3         4/13   O
7415    1         4/13   C
7103    1         4/10   O
7225    2         4/15   C
7384    1         4/12   C
7402    3         4/16   C
7188    1         4/13   C
7202    2         4/09   C
Order_Status values are highly non-unique (NUPI).
Only two values exist. The rows will be distributed to two AMPs.
This table will not perform well in parallel operations.
Highly non-unique columns are poor PI choices.
The degree of uniqueness is critical to efficiency.
[Figure: the six rows with status C (7402, 7202, 7225, 7415, 7188, 7384) sit on one AMP and the three rows with status O (7103, 7324, 7325) on another; the other two AMPs hold no rows]
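The three distribution slides can be compared with a short simulation over the nine Order rows. As an assumption for the sketch, CRC-32 stands in for the hashing algorithm and the bucket-to-AMP mapping is a simple modulo, so the AMP numbers it produces will not match the figures; only the pattern is meaningful. The function names are hypothetical.

```python
import zlib
from collections import Counter

NUM_AMPS = 4

# (order_number, customer_number, order_status) as shown on the slides.
ORDERS = [
    (7325, 2, "O"), (7324, 3, "O"), (7415, 1, "C"),
    (7103, 1, "O"), (7225, 2, "C"), (7384, 1, "C"),
    (7402, 3, "C"), (7188, 1, "C"), (7202, 2, "C"),
]


def amp_for(pi_value):
    """Stand-in hash: high-order 16 bits of CRC-32, mapped onto the AMPs."""
    row_hash = zlib.crc32(repr(pi_value).encode("utf-8")) & 0xFFFFFFFF
    return (row_hash >> 16) % NUM_AMPS


def rows_per_amp(column):
    """Count rows per AMP when the given column position is the PI."""
    counts = Counter(amp_for(row[column]) for row in ORDERS)
    return [counts.get(amp, 0) for amp in range(NUM_AMPS)]


by_order_number = rows_per_amp(0)  # UPI: nine distinct values
by_customer = rows_per_amp(1)      # NUPI: three distinct values
by_status = rows_per_amp(2)        # highly non-unique: two distinct values
```

Whatever the hash function, a Primary Index with only two distinct values can never use more than two AMPs, and the AMP that receives status C holds at least six of the nine rows.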
Partitioned Primary Index (PPI)

Partitioned Primary Indexes:
Improve performance on range constraint queries
Use partition elimination to reduce the number of rows accessed
[Figure: the Orders table defined with a Non-Partitioned Primary Index (NPPI) on Order_Number (O_#)]
[Figure: the Orders table defined with a Primary Index on Order_Number (O_#) Partitioned By Order_Date (O_Date) (PPI)]
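Partition elimination can be pictured with a toy model of the rows held by a single AMP. The sketch below is an assumption-laden simplification: it reuses the Order rows from the distribution slides, writes the dates as zero-padded MM-DD strings so that they sort correctly, and treats every distinct Order_Date as its own partition. The function names are hypothetical.

```python
from collections import defaultdict

# (order_number, order_date) pairs; dates rewritten as zero-padded MM-DD.
ORDERS = [
    (7325, "04-13"), (7324, "04-13"), (7415, "04-13"),
    (7103, "04-10"), (7225, "04-15"), (7384, "04-12"),
    (7402, "04-16"), (7188, "04-13"), (7202, "04-09"),
]

# PPI layout: rows are grouped by partition (here, one partition per date).
partitions = defaultdict(list)
for order_number, order_date in ORDERS:
    partitions[order_date].append(order_number)


def range_query_ppi(start, end):
    """Read only the partitions inside the date range."""
    hits, rows_read = [], 0
    for part_date in sorted(partitions):
        if start <= part_date <= end:
            rows_read += len(partitions[part_date])
            hits.extend(partitions[part_date])
    return sorted(hits), rows_read


def range_query_nppi(start, end):
    """Without partitioning, every row must be read and tested."""
    hits = [number for number, d in ORDERS if start <= d <= end]
    return sorted(hits), len(ORDERS)


ppi_hits, ppi_rows_read = range_query_ppi("04-12", "04-13")
nppi_hits, nppi_rows_read = range_query_nppi("04-12", "04-13")
```

Both queries return the same five orders, but the partitioned layout reads five rows while the non-partitioned layout reads all nine.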

1. Indicate whether the following apply to: UPI, NUPI, Either, or Neither
__________A. Specified in CREATE TABLE statement.
__________B. Provides even row distribution via the hashing algorithm.
__________C. May be up to 64 columns.
__________D. Always a one-AMP operation.
__________E. Access will return a single row.
__________F. Used to assign a row to a specific AMP.
__________G. Allows NULL.
__________H. Value cannot be changed.
__________I. Required on every table.
__________J. Permits duplicate values.
__________K. Can never be the Primary Key.
2. Why is the choice of Primary Index important? ____________________________
3. True/False: Tables should be assigned a PI at creation._____________________
Secondary Indexes and Full-Table Scans

Define Secondary Indexes.
List the various types of secondary indexes.
Describe the operation of a full-table scan in a
parallel environment.
Secondary Indexes
A secondary index is an alternate path to the rows of a table.
A table may have from 0 to 32 secondary indexes.
A secondary index:
does not affect table row distribution.
is chosen to improve access performance.
may reference from 1 to 64 table columns.
may be defined at table creation.
may be defined after the table is created.
may be dropped at any time.
uses a sub-table which utilizes Perm Space.
may impact table maintenance performance (row inserts,
row updates and/or row deletes).
Defining a Secondary Index

Unique Secondary Index (USI)
A Unique Secondary Index requires unique column values in each row.
Access to a referenced value requires 2 AMPs (serial operation) and returns 0 or 1 rows.
SQL to create:
CREATE UNIQUE INDEX (social_security) on Employee;
Non-Unique Secondary Index (NUSI)

A Non-Unique Secondary Index (NUSI) allows duplicate column values in the rows.
Access to a referenced value requires all AMPs (parallel operation) and returns 0 to n
rows.
SQL to create:
CREATE INDEX (last_name) on Employee;

CREATE INDEX (last_name, first_name) on Employee;
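The difference between a two-AMP USI access and an all-AMP NUSI access can be sketched in Python. The model below is an assumption-laden simplification: the USI sub-table row is placed by hashing the index value, the NUSI sub-table is kept local to each AMP, MD5 stands in for the real hash, and the `employee_number` primary index and sample rows are hypothetical.

```python
import hashlib

NUM_AMPS = 4  # assumption: a four-AMP system


def amp_for(value):
    """Stand-in hash that maps a value to an AMP (not Teradata's algorithm)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS


# Hypothetical Employee rows: (employee_number [PI], social_security, last_name)
employees = [(1, "111-11-1111", "Smith"), (2, "222-22-2222", "Jones"),
             (3, "333-33-3333", "Smith"), (4, "444-44-4444", "Davis")]

base = [dict() for _ in range(NUM_AMPS)]  # base rows, keyed by PI value
usi = [dict() for _ in range(NUM_AMPS)]   # USI sub-table: ssn -> PI value
nusi = [dict() for _ in range(NUM_AMPS)]  # NUSI sub-table: name -> [PI], AMP-local

for emp_no, ssn, name in employees:
    home = amp_for(emp_no)
    base[home][emp_no] = (emp_no, ssn, name)
    usi[amp_for(ssn)][ssn] = emp_no            # hashed on the index value
    nusi[home].setdefault(name, []).append(emp_no)


def usi_lookup(ssn):
    """Step 1: AMP holding the USI row. Step 2: AMP holding the base row."""
    touched = [amp_for(ssn)]
    emp_no = usi[touched[0]].get(ssn)
    if emp_no is None:
        return [], touched
    touched.append(amp_for(emp_no))
    return [base[touched[1]][emp_no]], touched


def nusi_lookup(name):
    """Every AMP checks its local NUSI sub-table."""
    rows = []
    for amp in range(NUM_AMPS):
        for emp_no in nusi[amp].get(name, []):
            rows.append(base[amp][emp_no])
    return rows, list(range(NUM_AMPS))


print(usi_lookup("222-22-2222"))
print(nusi_lookup("Smith"))
```

The USI lookup performs two dependent steps and returns at most one row, while the NUSI lookup asks every AMP and may return any number of rows.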
Other Types of Secondary Indexes

Join Index
Define a pre-join table on frequently joined columns (with optional aggregation) without
denormalizing the database.
Create a full or partial replication of a base table with a PI on a FK column to
facilitate joins of large tables by hashing their rows to the same AMP.
Define a summary table without denormalizing the database.
Can be defined on one or several tables.
Sparse Index
Any join index, whether simple or aggregate, multi-table or single-table, can be sparse.
Uses a constant expression in the WHERE clause of its definition to narrowly filter its row
population.
Hash Index
Used for the same purposes as single-table join indexes.
Create a full or partial replication of a base table with a PI on a FK column to
facilitate joins of large tables by hashing them to the same AMP.
Can be defined on one table only.
Value-Ordered NUSI
Very efficient for range conditions and conditions with an inequality on the secondary index
column set.
Primary Index vs. Secondary Index

Index Feature | Primary Index | Secondary Index
Required Index | Yes | No
Number per table | 1 | 0 to 32
Maximum number of columns | 64 | 64
Unique or Non-Unique | Both | Both
Affects row distribution | Yes | No
Created/Dropped dynamically | No | Yes
Improves row access | Yes | Yes
Separate Sub-Table | No | Yes
Extra Processing Overhead | No | Yes
Full-Table Scans
CUSTOMER table: Cust_ID (USI), Cust_Name (NUSI), Cust_Phone (NUPI)
Every data block of the table is read once.
All AMPs scan their portion of the table in parallel.
The Primary Index choice will affect parallel scan performance
(UPI is even; NUPI is potentially skewed).
Full-table scans typically occur when:
the index columns are not used in the query
a non-equality or range test is specified for the index columns
SQL requests that result in a full-table scan:
SELECT * FROM Customer WHERE Cust_Phone LIKE '524-____';
SELECT * FROM Customer WHERE Cust_Name <> 'Davis';
SELECT * FROM Customer WHERE Cust_ID > 1000;
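A full-table scan in a parallel environment can be sketched as every AMP reading its own rows once while applying the predicate. The Python below is a rough illustration only: threads stand in for AMPs, the sample Customer rows and their placement are hypothetical, and `scan_amp` and `full_table_scan` are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical Customer rows already spread over four AMPs:
# (cust_id, cust_name, cust_phone)
amps = [
    [(1001, "Davis", "524-1234")],
    [(1002, "Lee", "777-0000"), (999, "Davis", "524-9876")],
    [(1003, "Chan", "524-5555")],
    [],
]


def scan_amp(rows, predicate):
    """Read every row on one AMP exactly once and keep the matches."""
    return [row for row in rows if predicate(row)], len(rows)


def full_table_scan(predicate):
    """Run the scan on all AMPs in parallel and merge the answer sets."""
    with ThreadPoolExecutor(max_workers=len(amps)) as pool:
        results = list(pool.map(lambda rows: scan_amp(rows, predicate), amps))
    matches = [row for found, _ in results for row in found]
    rows_read = sum(count for _, count in results)
    return matches, rows_read


# Equivalent of: SELECT * FROM Customer WHERE Cust_Name <> 'Davis';
not_davis, rows_read = full_table_scan(lambda row: row[1] != "Davis")
print(not_davis, "rows read:", rows_read)
```

The number of rows read equals the table size regardless of how many rows qualify, which is why an uneven distribution makes the slowest AMP determine the elapsed time.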

For each type of access, fill each box with either Yes, No, or the appropriate number.

Question | USI Access | NUSI Access | Full-Table Scan
Number of AMPs accessed? | ____ | ____ | ____
Number of rows returned? | ____ | ____ | ____
A parallel operation? | ____ | ____ | ____
Uses separate sub-table? | ____ | ____ | ____
Reads all data blocks? | ____ | ____ | ____
Indicate whether each statement is True or False.

1. A USI can be used to enforce uniqueness on a PK column.
2. You can create or drop USIs and NUSIs at any time.
3. A full-table scan is not efficient because it accesses rows multiple times.
4. A full-table scan can occur when there is a range of values specified
for columns in a primary index.
Fault Tolerance and Data Protection

Explain how locks protect data integrity.
List the types and levels of locking provided by the Teradata
Database.
Explain the concept of Fallback tables.
Describe the purpose and function of the Down AMP
Recovery Journal and the Transient Journal.
List the utilities available for archive and recovery.
Locks
Types of Locks
Exclusive: prevents any other type of concurrent access
Write: prevents other Read, Write, Exclusive locks
Read: prevents Write and Exclusive locks
Access: prevents Exclusive locks only
Levels of Locks
Locks may be applied at the following levels:
Database: applies to all tables/views in the database
Table/View: applies to all rows in the table/view
Row Hash: applies to all rows with same row hash
SQL
Lock requests are based on the SQL request:
SELECT: requests a Read lock
UPDATE: requests a Write lock
CREATE TABLE: requests an Exclusive lock
Lock requests may be upgraded or downgraded:
LOCKING TABLE Table1 FOR ACCESS . . .
LOCKING TABLE Table1 FOR EXCLUSIVE . . .
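The four lock types listed above amount to a compatibility table, which can be written down as a short Python sketch. The `BLOCKS` mapping simply restates the "prevents" rules from this section; the `can_grant` helper is a hypothetical name, and queuing of blocked requests is not modeled.

```python
# For each lock already held, the lock types it prevents (from the list above)
BLOCKS = {
    "EXCLUSIVE": {"ACCESS", "READ", "WRITE", "EXCLUSIVE"},
    "WRITE": {"READ", "WRITE", "EXCLUSIVE"},
    "READ": {"WRITE", "EXCLUSIVE"},
    "ACCESS": {"EXCLUSIVE"},
}


def can_grant(requested, held_locks):
    """Grant the request only if no held lock prevents it."""
    return all(requested not in BLOCKS[held] for held in held_locks)


print(can_grant("READ", ["READ"]))       # two readers can share an object
print(can_grant("WRITE", ["READ"]))      # a writer must wait for readers
print(can_grant("ACCESS", ["WRITE"]))    # an Access lock reads through a Write lock
```

This also shows why LOCKING ... FOR ACCESS is useful for queries that can tolerate reading data while it is being updated: only an Exclusive lock blocks them.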
Transient Journal
Maintains a copy on each AMP of before images of all rows affected.

Provides rollback of changed rows in the event of TXN failure.
Activities are automatic and transparent to user.
Before images are reapplied to table if TXN fails.
Before images are discarded upon TXN completion.
Successful TXN
BEGIN TRANSACTION
UPDATE Row A
Before image Row A recorded
(Add $100 to checking)
UPDATE Row B
Before image Row B recorded
(Subtract $100 from savings)
END TRANSACTION
Discard before images
Failed TXN
BEGIN TRANSACTION
UPDATE Row A
Before image Row A recorded
UPDATE Row B
Before image Row B recorded
(Failure occurs)
(Rollback occurs) Reapply before images
(Terminate TXN) Discard before images
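The before-image mechanism of the Transient Journal can be sketched with a small Python class. This is a minimal in-memory model, not how the database implements it: the `Table` class, its method names, and the account balances are hypothetical.

```python
class Table:
    """Toy table with a transient journal of before images."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.journal = {}      # key -> before image
        self.in_txn = False

    def begin(self):
        self.in_txn = True
        self.journal = {}

    def update(self, key, new_value):
        # Record the before image the first time a row changes in this TXN
        if self.in_txn and key not in self.journal:
            self.journal[key] = self.rows[key]
        self.rows[key] = new_value

    def end(self):
        self.journal = {}      # successful TXN: discard before images
        self.in_txn = False

    def rollback(self):
        self.rows.update(self.journal)   # failed TXN: reapply before images
        self.journal = {}                # then discard them
        self.in_txn = False


accounts = Table({"checking": 500, "savings": 800})

# Successful TXN: move $100 from savings to checking
accounts.begin()
accounts.update("checking", 600)
accounts.update("savings", 700)
accounts.end()

# Failed TXN: the first update is undone by the rollback
accounts.begin()
accounts.update("checking", 700)
accounts.rollback()
print(accounts.rows)
```

After the rollback the table is back in the state left by the last successful transaction, and the journal is empty in both cases.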
RAID Protection
RAID 1 (Mirroring)
Each physical disk in the array has an exact copy in the same array.
The array controller can read from either disk and write to both.
When one disk of the pair fails, there is no change in performance.
Mirroring reduces available disk space by 50%.
Array controller reconstructs failed disks quickly.
[Figure: a primary disk and its mirror.]
RAID 5 (Parity)
Data and parity striped across rank of 4 disks.
If a disk fails, any missing block may be reconstructed using the other three disks.
Parity reduces available disk space by 25% in a 4-disk rank.
Reconstruction of failed disks takes longer than RAID 1.
[Figure: data blocks 0 through 8 and their parity blocks striped across four disks.]
Summary
RAID-1: Good performance with disk failures; higher cost in terms of disk space
RAID-5: Reduced performance with disk failures; lower cost in terms of disk space
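How a missing RAID 5 block is rebuilt from the other three disks can be shown with byte-wise XOR, the parity function conventionally used for RAID 5 (the course material does not name it, so treat that as an assumption). The block contents and the `xor_blocks` helper are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across a 4-disk rank: three data blocks plus one parity block
data = [b"BLK0", b"BLK1", b"BLK2"]
parity = xor_blocks(data)

# The disk holding data[1] fails: rebuild its block from the other three disks
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)

# Space cost: one block in four holds parity
parity_overhead = 1 / 4
```

Every read of a block on the failed disk needs this extra computation across the surviving disks, which is the reason for the reduced performance noted in the summary.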
Fallback
A Fallback table is fully available in the event of an unavailable AMP.
A Fallback row is a copy of a primary row stored on a different AMP in the same CLUSTER of AMPs.
[Figure: two PEs connected through the BYNET to a cluster of four AMPs; each AMP holds primary rows and the fallback rows of other AMPs.]
Benefits of Fallback:
May be specified at the table or database level
Permits access to table data during AMP off-line period
Adds a level of data protection beyond disk array RAID 1 & 5
Highest level of data protection is RAID 1 and Fallback
Automatically restores data changed during AMP off-line
Critical for high availability applications
Costs of Fallback:
Twice the disk space for table storage is needed
Twice the I/O for INSERTs, UPDATEs and DELETEs is needed
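The availability guarantee of Fallback follows from keeping the second copy on a different AMP of the same cluster, as this Python sketch suggests. The placement rules are assumptions made for the example: a modulo stands in for the hash map, and the fallback copy is simply put on the next AMP, which is not necessarily how the database spreads fallback rows.

```python
NUM_AMPS = 4  # assumption: one cluster of four AMPs


def primary_amp(row_id):
    """Stand-in for the hash map that places the primary row."""
    return row_id % NUM_AMPS


def fallback_amp(row_id):
    """Hypothetical rule: the next AMP in the cluster, never the same AMP."""
    return (primary_amp(row_id) + 1) % NUM_AMPS


def readable_rows(row_ids, down_amp):
    """Rows that still have at least one copy on an available AMP."""
    return [row_id for row_id in row_ids
            if primary_amp(row_id) != down_amp
            or fallback_amp(row_id) != down_amp]


rows = [1, 2, 3, 5, 6, 8, 11, 12]
for down in range(NUM_AMPS):
    print("AMP", down, "down ->", readable_rows(rows, down))
```

Whichever single AMP is taken off-line, every row remains readable, at the cost of storing and writing each row twice.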
Recovery Journal for Down AMPs

Recovery Journal is:
Automatically activated when an AMP is taken off-line
Maintained by other AMPs in the cluster
Totally transparent to users of the system
While AMP is off-line:
Journal is active
Table updates continue as normal
Journal logs Row-IDs of changed rows for down-AMP
When AMP is back on-line:
Restores rows on recovered AMP to current status
Journal discarded when recovery complete
[Figure: a cluster of four AMPs holding primary and fallback rows, with Recovery Journal (RJ) entries listing Row-IDs on the AMPs that remain on-line.]
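The life cycle of the Recovery Journal (activate, log Row-IDs, replay, discard) can be sketched in Python. The model is deliberately simplified and rests on assumptions: one shared journal stands in for the journals kept by the surviving AMPs, row placement uses a modulo instead of the hash map, and the `Cluster` class and its method names are hypothetical.

```python
class Cluster:
    """Toy cluster with primary/fallback copies and a down-AMP journal."""

    def __init__(self, num_amps=4):
        self.num_amps = num_amps
        self.primary = [dict() for _ in range(num_amps)]
        self.fallback = [dict() for _ in range(num_amps)]
        self.down = None
        self.recovery_journal = set()   # Row-IDs changed for the down AMP

    def _amps(self, row_id):
        p = row_id % self.num_amps
        return p, (p + 1) % self.num_amps

    def take_offline(self, amp):
        self.down = amp
        self.recovery_journal = set()   # journal becomes active

    def write(self, row_id, value):
        p, f = self._amps(row_id)
        if p != self.down:
            self.primary[p][row_id] = value
        if f != self.down:
            self.fallback[f][row_id] = value
        if self.down in (p, f):
            self.recovery_journal.add(row_id)   # log the Row-ID only

    def bring_online(self):
        amp, self.down = self.down, None
        for row_id in self.recovery_journal:
            p, f = self._amps(row_id)
            if p == amp:    # refresh the stale primary copy
                self.primary[p][row_id] = self.fallback[f][row_id]
            else:           # refresh the stale fallback copy
                self.fallback[f][row_id] = self.primary[p][row_id]
        self.recovery_journal = set()   # discarded when recovery completes


cluster = Cluster()
cluster.write(41, "v1")          # primary on AMP 1, fallback on AMP 2
cluster.take_offline(1)
cluster.write(41, "v2")          # only the fallback copy can be updated
cluster.write(66, "x")           # does not involve AMP 1, so nothing is logged
logged = set(cluster.recovery_journal)
cluster.bring_online()
print(logged, cluster.primary[1][41])
```

Only the Row-IDs are logged; the current row contents are fetched from the surviving copy at recovery time.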
Cliques
A clique is a defined set of nodes with failover capability.
All nodes in a clique are able to access the vdisks of all AMPs in the clique.
If a node fails, its vprocs will migrate to the remaining nodes in the clique.
Each node can support 128 vprocs.
Disk cabling groups nodes into cliques.
[Figure: nodes grouped into Clique 1, Clique 2 and Clique 3.]
Archiving and Recovering Data

Archive Recovery Utility (ARC)
Runs on IBM, UNIX, Linux and Win2K
Archives data from RDBMS
Restores data from archive media
Permits data recovery to a specified checkpoint
Other Archive Applications
BakBone NetVault
Symantec NetBackup
[Figure: NCR 6476, 6000 slots, 2 - 80 drives.]
Common uses of ARC
Dump database objects for backup or disaster recovery.
Restore non-fallback tables after disk failure.
Restore tables after corruption from failed batch processes.
Recover accidentally dropped tables, views, or macros.
Recover from miscellaneous user errors.
Copy a table and restore it to another Teradata Database.

Match each item with its definition:
____ 1. Database locks
____ 2. Table locks
____ 3. Row Hash locks
____ 4. Fallback
____ 5. Cluster
____ 6. Recovery journal
____ 7. Transient journal
____ 8. ARC
____ 9. Clique

a. Provides for TXN rollback in case of failure
b. Protects all rows of a table
c. Logs changed rows for down AMP
d. Protects from node failure
e. Logical group of AMPs for fault-tolerance
f. Applies to all tables and views within
g. Multi-platform archive utility
h. Lowest level of protection granularity
i. Protects tables from AMP failure
Client Tools and Utilities

List the various load and unload utilities available for use with
the Teradata Database.
List the various support tools available to Teradata Database
Administrators.
List the various query tools available for use with the Teradata
Database.
Query Tools - BTEQ
Query Tools - Teradata SQL Assistant

SQL front-end to Teradata Database and other ODBC compliant databases
FastLoad Utility
Fast batch utility for loading a single empty table

Automatic checkpoint/restart capability
Errors reported and collected in error tables
Supports INMOD routines and Access Modules
Loads data in two phases
MultiLoad Utility
Loads/maintains up to five empty or populated tables
Performs block level operations against target tables
Affected data blocks are written once
Multiple operations with one pass of input files
Uses conditional logic when applying updates
Supports INSERT, UPDATE, DELETE and UPSERT operations
Supports INMOD routines and Access Modules
Errors reported and collected in error tables
Provides automatic checkpoint/restart capability
FastExport Utility
Exports large volumes of formatted data from one or more tables on the
Teradata Database to a host file or user-written application
Supports multiple sessions
Export from multiple tables
Provides automatic checkpoint/restart capability
TPump Utility
Allows near real-time updates from transactional systems into the warehouse
Allows constant loading of data into a table
Performs INSERT, UPDATE, DELETE, and ATOMIC UPSERT operations, or a
combination, to more than 60 tables at a time
High-volume SQL-based continuous update of multiple tables
Allows target tables to:
Have secondary indexes, referential integrity, constraints and enabled triggers
Be MULTISET or SET
Be populated or empty
Allows conditional processing
Supports automatic restarts
No session limit: use as many sessions as necessary
No limit to the number of concurrent instances
Uses row-hash locks, allowing concurrent updates on the same table
Can be stopped at any time, with work committed, with no ill effect
Designed for highest possible throughput
Gives users control over the rate per minute (throttle) at which statements are
sent to the database, either dynamically or by script
Teradata Parallel Transporter
Parallel Extract, Transform and Load (end-to-end parallelism) eliminates
sequential bottlenecks
Data Streams eliminate the overhead of persistent storage
Single SQL-like scripting language
Access to various data sources
Open API enables Third Party and user application integration
Teradata Parallel Transporter Operators

TPT Operator | Teradata Utility | Description
LOAD | FastLoad | A consumer-type operator that uses the Teradata FastLoad protocol. Supports Error limits and Checkpoint/Restart. Both support Multi-Value Compression and PPI.
UPDATE | MultiLoad | Utilizes the Teradata MultiLoad protocol to enable job based table updates. This allows highly scalable and parallel inserts and updates to a pre-existing table.
EXPORT | FastExport | A producer operator that emulates the FastExport utility.
STREAM | TPump | Uses multiple sessions to perform DML transactions in near real-time.
DataConnector | N/A | This operator emulates the Data Connector API. Reads external data files, writes data to external data files, reads an unspecified number of data files.
ODBC | N/A | Reads data from an ODBC Provider.
Teradata Manager
Graphical system management tool - Collects, analyzes, and displays:
Performance information
Teradata Dynamic Workload Manager

Query workload management tool (formerly Teradata Dynamic Query Manager) that:
Restricts (i.e. runs, suspends, schedules later or rejects) queries based on set thresholds
Based on analysis control:
- Too long
- Too many rows
Based on object control:
- User ID
- Table
- Day/time
- Group ID
Based on environmental factors:
- CPU
- Disk utilization
- Network activity
- Number of users
Logs workload performance for analysis
[Figure labels: Users, Accounts, Profiles.]
Analyst Tools - Teradata Visual Explain

Provides the ability to capture and graphically represent the steps of a query
plan and perform comparisons of two or more plans
Stores query plans in a Query Capture Database (QCD)
Analyst Tools - Teradata System Emulation Tool
Emulates a target system by exporting and importing all information necessary to
emulate in a test environment
- Use with Target Level Emulation to generate query plans on a test system as if they were run on the target system
- Verifies queries and reproduces optimizer related issues in a test environment
Analyst Tools - Teradata Index Wizard

Recommends secondary indexes for tables, based on a particular workload
Analyst Tools - Teradata Statistics Wizard

Recommends and automates the Statistics Collection process
Recommends Statistics to be re-collected due to table growth

___1. TPump
___2. MultiLoad
___3. FastLoad
___4. FastExport
___5. Teradata Manager
___6. Teradata Dynamic Query Manager
___7. BTEQ
___8. Teradata SET
___9. Teradata Index Wizard
___10. Teradata Statistics Wizard
___11. Teradata Visual Explain

a. Graphical system management tool.
b. Query workload management tool.
c. Utility that performs block level operations against populated tables.
d. Utility that allows constant loading of data (streaming) into a table.
e. Utility that allows export of data from multiple tables.
f. Utility that performs fast batch loads into unpopulated tables.
g. SQL query front-end that runs on all client platforms.
h. Recommends Secondary Indexes for a workload.
i. Utility that uses a Query Capture Database to store query plans.
j. Utility that recommends and automates the Statistics Collection process.
k. Utility that verifies queries and reproduces optimizer related (query plans) issues on a test environment.
More Information
For more information on topics discussed in this course, see the following
resources:
Documentation: http://www.info.Teradata.com
Practice tests for certification: http://www.Teradata.com/certification
Available courses: Teradata Education Network
http://www.TeradataEducationNetwork.com
Appendix A
Review Questions/Solutions

1. Name three operating systems that the Teradata Database runs on:
MP-RAS UNIX
MS Windows 2003 Server
SuSE LINUX
2. Which of the following describes the scalability of the Teradata Database?
a. Linear b. Parallel c. Exponential d. Shared
3. Which feature allows the Teradata Database to process large amounts of data
quickly?
a. High availability software and hardware components
b. Parallelism
c. Proven scalability
d. High performance servers from Intel
4. The Teradata Database is primarily a:
a. Server
b. Client
5. Which two tasks do Teradata Database Administrators never have to do? (Choose two.)
a. Reorganize data
b. Select primary indexes
c. Restart the system
d. Pre-prepare data for loading

Match each term with its definition:
F 1. Database
2. Table
3. Relational database
4. Primary Key
5. Null
6. Foreign Key
7. Row

a. A set of columns that uniquely identify a row.
b. A set of logically related tables.
c. One or more columns that exist as a PK value in another table in the database.
d. The absence of a value or an unknown value.
e. A two-dimensional array of rows and columns.
f. A collection of permanently stored data.
g. One instance of all columns in a table.

1. Name three types of enterprise data processing and give examples.
OLTP: Withdraw cash from ATM
DSS: How many items sold for a given month?
OLAP: What are the top 10 selling items for a given month?
2. What is the difference between a data warehouse (DW) and a data mart (DM)?
DW: Central enterprise-wide, detail data, historically unlimited
DM: Subset of enterprise data, detail data, summary data, limited history
3. Which type of data mart gets its data directly from the data warehouse?
Dependent
4. Name the two types of queries that an Active Data Warehouse supports
for mission critical applications. Strategic and Tactical
5. Match the data warehouse usage evolution stage to its description:
E Stage 1
D Stage 2
C Stage 3
A Stage 4
B Stage 5
a. Continuous updates and time sensitive queries
b. Event-based triggering takes hold
c. Analytical modeling
d. Increase in ad hoc queries
e. Primarily batch
1. Name the three major Teradata Database components and state their purpose.
PE: Parse, Optimize and Dispatch queries
AMP: Data storage and retrieval
BYNET: Communication between PEs and AMPs
2. Why are there two LANs in a Teradata system? For redundancy
3. How many sessions can a PE support? 120
4. What is the communications layer in a Teradata system? BYNET driver

Indicate whether a statement is True or False.
F 1. A database will always have tables.
2. A user will always have a password.
3. A user creating a subordinate user who needs tables must give up some
of its Perm Space.
4. The sum of all user and database Perm Space will equal the total
space on the system.
5. Deleting a view from a database reclaims Perm Space for the database.

1. Indicate whether the following apply to: UPI, NUPI, Either, or Neither
EITHER A. Specified in CREATE TABLE statement.
UPI B. Provides even row distribution via the hashing algorithm.
EITHER C. May be up to 64 columns.
EITHER D. Always a one-AMP operation.
UPI E. Access will return a single row.
EITHER F. Used to assign a row to a specific AMP.
EITHER G. Allows NULL.
NEITHER H. Value cannot be changed.
EITHER I. Required on every table.
NUPI J. Permits duplicate values.
NUPI K. Can never be the Primary Key.
2. Why is the choice of Primary Index important?
Data distribution and access performance.
3. True/False: Tables should be assigned a PI at creation.
True

For each type of access, fill each box with either Yes, No, or the appropriate number.

Question | USI Access | NUSI Access | Full-Table Scan
Number of AMPs accessed? | 2 | All | All
Number of rows returned? | 0 or 1 | 0 to n | 0 to n
A parallel operation? | No | Yes | Yes
Uses separate sub-table? | Yes | Yes | No
Reads all data blocks? | No | No | Yes

Indicate whether each statement is True or False.

T 1. A USI can be used to enforce uniqueness on a PK column.
T 2. You can create or drop USIs and NUSIs at any time.
F 3. A full-table scan is not efficient because it accesses rows multiple times.
T 4. A full-table scan can occur when there is a range of values specified
for columns in a primary index.

F 1. Database locks
B 2. Table locks
H 3. Row Hash locks
I 4. Fallback
E 5. Cluster
C 6. Recovery journal
A 7. Transient journal
G 8. ARC
D 9. Clique

a. Provides for TXN rollback in case of failure
b. Protects all rows of a table
c. Logs changed rows for down AMP
d. Protects from node failure
e. Logical group of AMPs for fault-tolerance
f. Applies to all tables and views within
g. Multi-platform archive utility
h. Lowest level of protection granularity
i. Protects tables from AMP failure

D 1. TPump
C 2. MultiLoad
F 3. FastLoad
E 4. FastExport
A 5. Teradata Manager
B 6. Teradata Dynamic Query Manager
G 7. BTEQ
K 8. Teradata SET
H 9. Teradata Index Wizard
J 10. Teradata Statistics Wizard
I 11. Teradata Visual Explain

a. Graphical system management tool.
b. Query workload management tool.
c. Utility that performs block level operations against populated tables.
d. Utility that allows constant loading of data (streaming) into a table.
e. Utility that allows export of data from multiple tables.
f. Utility that performs fast batch loads into unpopulated tables.
g. SQL query front-end that runs on all client platforms.
h. Recommends Secondary Indexes for a workload.
i. Utility that uses a Query Capture Database to store query plans.
j. Utility that recommends and automates the Statistics Collection process.
k. Utility that verifies queries and reproduces optimizer related (query plans) issues on a test environment.
