Introduction to the
Teradata Database
Course 25964
Version 2.1
Course Objectives
After completing this course, you should be able to:
Describe the purpose and function of the Teradata Database.
Navigate relational tables using Primary Keys and Foreign Keys.
List the principal components of the Teradata Database and
describe their functions.
Describe the Teradata Database features that provide fault tolerance.
Describe the Primary Index and the Secondary Index.
Explain data distribution and data access mechanics in the Teradata
Database.
Course Audience
Class Format
This course consists of:
One day of classroom instruction
Review exercises following each module
A course handbook in facing-page format
Course Modules
This course consists of:
Module 1:
Module 2:
Module 3:
Module 4:
Module 5:
Module 6:
Module 7:
Module 8:
Module 9:
Course Appendices
[Figure: client platforms (UNIX, mainframes, Windows 2003, Windows 2000, Windows XP) attach to the Teradata Database over a LAN or a channel connection.]
Parallel-Aware Optimizer
a. Server
b. Client
5. Which two tasks do Teradata Database Administrators never have to do? (Choose two.)
a. Reorganize data
b. Select primary indexes
c. Restart the system
d. Pre-prepare data for loading
What is a Database?
A database is a collection of permanently stored data that is:
Logically related - the data relates to other data.
Shared - many users may access the data.
Protected - access to data is controlled.
Managed - the data has integrity and value.
Logical/Relational Modeling
The Logical Model
Should be designed without regard to usage
Accommodates a wide variety of front end tools
Allows the database to be created more quickly
Should be the same regardless of data volume
Data is organized according to what it represents (real world business
data in table (relational) form)
Includes all the data definitions within the scope of the application or
enterprise
Is generic: the logical model is the template for physical
implementation on any RDBMS platform
Normalization
Process of reducing a complex data structure into a simple, stable one
Involves removing redundant attributes, keys, and relationships from the
conceptual data model
Relational Databases
Relational Databases are founded on Set Theory and based on the Relational Model.
A Relational Database consists of a collection of logically related tables.
A table is a two-dimensional representation of data consisting of rows and columns.
EMPLOYEE
EMPLOYEE  MANAGER   DEPARTMENT  JOB     LAST      FIRST    HIRE    BIRTH   SALARY
NUMBER    EMPLOYEE  NUMBER      CODE    NAME      NAME     DATE    DATE    AMOUNT
          NUMBER
1006      1019      301         312101  Stein     John     861015  631015  3945000
1008      1019      301         312102  Kanieski  Carol    870201  680517  3925000
1005      0801      403         431100  Ryan      Loretta  861015  650910  4120000
1004      1003      401         412101  Johnson   Darlene  861015  560423  4630000
1007                                    Villegas  Arnando  870102  470131  5970000
1003      0801      401         411100  Trader    James    860731  570619  4785000
The figure labels one vertical slice of the table as a Column and one horizontal slice as a Row.
Primary Keys
Primary Key (PK) values uniquely identify each row in a table.
EMPLOYEE
EMPLOYEE  MANAGER   DEPARTMENT  JOB     LAST      FIRST    HIRE    BIRTH   SALARY
NUMBER    EMPLOYEE  NUMBER      CODE    NAME      NAME     DATE    DATE    AMOUNT
          NUMBER
PK
1006      1019      301         312101  Stein     John     861015  631015  3945000
1008      1019      301         312102  Kanieski  Carol    870201  680517  3925000
1005      0801      403         431100  Ryan      Loretta  861015  650910  4120000
1004      1003      401         412101  Johnson   Darlene  861015  560423  4630000
1007                                    Villegas  Arnando  870102  470131  5970000
1003      0801      401         411100  Trader    James    860731  570619  4785000
Foreign Keys
EMPLOYEE (partial listing)
EMPLOYEE  MANAGER   DEPARTMENT  JOB     LAST      FIRST    HIRE    BIRTH   SALARY
NUMBER    EMPLOYEE  NUMBER      CODE    NAME      NAME     DATE    DATE    AMOUNT
          NUMBER
PK        FK        FK          FK
1006      1019      301         312101  Stein     John     861015  631015  3945000
1008      1019      301         312102  Kanieski  Carol    870201  680517  3925000
1005      0801      403         431100  Ryan      Loretta  861015  650910  4120000
1004      1003      401         412101  Johnson   Darlene  861015  560423  4630000
1007                            432101  Villegas  Arnando  870102  470131  5970000
1003      0801      401         411100  Trader    James    860731  570619  4785000

DEPARTMENT
DEPARTMENT  DEPARTMENT                BUDGET    MANAGER
NUMBER      NAME                      AMOUNT    EMPLOYEE
                                                NUMBER
PK                                              FK
501         marketing sales           80050000  1017
301         research and development  46560000  1019
302         product planning          22600000  1016
403         education                 93200000  1005
402         software support          30800000  1011
401         customer support          98230000  1003
201         technical operations      29380000  1025
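Navigating from one table to another by following a Foreign Key to the matching Primary Key can be sketched in Python. This is a minimal illustration only: the dictionaries hold a subset of the columns from the EMPLOYEE and DEPARTMENT listings, and the function names are hypothetical, not part of any Teradata interface.

```python
# Subset of the EMPLOYEE and DEPARTMENT rows from the listings.
# Keys are Primary Key values, kept as strings so "0801" keeps its leading zero.
EMPLOYEE = {
    "1006": {"last_name": "Stein", "manager": "1019", "department": "301"},
    "1008": {"last_name": "Kanieski", "manager": "1019", "department": "301"},
    "1005": {"last_name": "Ryan", "manager": "0801", "department": "403"},
    "1004": {"last_name": "Johnson", "manager": "1003", "department": "401"},
    "1007": {"last_name": "Villegas", "manager": None, "department": None},
    "1003": {"last_name": "Trader", "manager": "0801", "department": "401"},
}

DEPARTMENT = {
    "501": {"name": "marketing sales", "manager": "1017"},
    "301": {"name": "research and development", "manager": "1019"},
    "302": {"name": "product planning", "manager": "1016"},
    "403": {"name": "education", "manager": "1005"},
    "402": {"name": "software support", "manager": "1011"},
    "401": {"name": "customer support", "manager": "1003"},
    "201": {"name": "technical operations", "manager": "1025"},
}


def department_name(employee_number):
    """Follow EMPLOYEE.department (FK) to DEPARTMENT (PK) and return the name."""
    fk = EMPLOYEE[employee_number]["department"]
    if fk is None:  # this row has no department value in the listing
        return None
    return DEPARTMENT[fk]["name"]


def department_manager_name(department_number):
    """Follow DEPARTMENT.manager (FK) to EMPLOYEE (PK).

    Returns None when the manager is not in the partial EMPLOYEE listing.
    """
    fk = DEPARTMENT[department_number]["manager"]
    row = EMPLOYEE.get(fk)
    return row["last_name"] if row else None


print(department_name("1005"))         # education
print(department_manager_name("401"))  # Trader
print(department_name("1007"))         # None
```

Because the EMPLOYEE listing is partial, some Foreign Key values (for example manager 1019) point at rows that are not shown, which is why the second function returns None for them.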
Relational Advantages
Advantages of a Relational Database compared to other database
methodologies include:
More flexible than other types
Allows businesses to quickly respond to changing conditions
Being data-driven vs. application driven
Modeling the business, not the processes
Makes applications easier to build because the data does more
of the work
Supporting trend toward end-user computing
Being easy to understand
No need to know the access path
Solidly founded in set theory
Type: OLTP
Example: Update a checking account to reflect a deposit.
Number of rows accessed: Small
Response time: Seconds

Type: DSS
Number of rows accessed: Large
Response time: Seconds or minutes

Type: OLAP
Examples: Show the top ten selling items across all stores for 1997. Show a comparison of sales from this week to last week.
Number of rows accessed: Large number of detail rows or moderate number of summary rows
Response time: Seconds or minutes

Type: Data Mining
Number of rows accessed: Moderate to large number of detailed historical rows
Response time: Phase 1: minutes or hours. Phase 2: seconds or less.

(Figure side labels: TRADITIONAL, TODAY)
[Figure: detail rows of DATE and QUANTITY SOLD for individual days in June, compared with summarized QUANTITY SOLD totals.]
DETAIL DATA vs. SUMMARY DATA
Detail data gives a more accurate picture.
Correct business decisions result.
Stage 1: REPORTING. WHAT happened? Primarily batch, pre-defined reports.
Stage 2: ANALYZING. WHY did it happen? Increase in ad hoc queries.
Stage 3: PREDICTING. WHAT will happen? Analytical modeling grows.
Stage 4: OPERATIONALIZING. WHAT is happening?
Stage 5: ACTIVE WAREHOUSING. MAKING it happen! Event-based triggering takes hold.
Workload types shown in the figure: batch, ad hoc, analytics, event-based triggering.
[Figure: operational systems (Accounts Receivable, Inventory, POS) feed the data warehouse on the Teradata Database; end users reach it through access tools such as Cognos, MicroStrategy, and Business Objects.]
Data Marts
A data mart is a special purpose subset of enterprise data for a particular function
or application. It may contain detail or summary data or both.
Data mart types:
Independent - created directly from operational systems to a separate physical
data store
Logical - exists as a subset of existing data warehouse via Views
Dependent - created from data warehouse to a separate physical data store
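[Figure: operational systems feed the data warehouse and the independent data mart; the dependent data mart is loaded from the data warehouse; the logical data mart exists inside the data warehouse.]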
3. Which type of data mart gets its data directly from the data warehouse?
4. Name the two types of queries that an Active Data Warehouse supports
for mission critical applications.
5. Match the data warehouse usage evolution stage to its description:
___ Stage 1
___ Stage 2
___ Stage 3
___ Stage 4
___ Stage 5
What is a Node?
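[Figure: a node. Channel-attached systems connect through the channel driver; LAN-attached workstations and PCs connect through the Teradata gateway. The node runs an operating system (UNIX, Windows 2000, or Linux) with PDE and hosts PE vprocs and AMP vprocs, each AMP with its own vdisk. A BYNET driver connects the node to the BYNET.]
[Figure: an SQL request arrives at the Parsing Engine on a node, which communicates over the BYNET with four AMPs, each with its own vdisk.]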
[Figure: an SQL request enters the Parsing Engine, which handles session control, parsing, optimizing (using system configuration and data demographics), and dispatching. The Parsing Engine communicates over the BYNET with the AMPs and their vdisks.]
The BYNET
[Figure: an SQL request passes from the Parsing Engine over the BYNET to four AMPs and their vdisks, and the answer set response returns the same way.]
AMPs perform all tasks in parallel.
[Figure: node cabinets and array cabinets. Labels in the figure: SMC, BYNET, NODE, Disk Array, Node Cabinet, Array Cabinet.]
[Figure: channel-attached and network-attached clients connecting to a Teradata Database node that contains Parsing Engines, the BYNET, and AMPs with their vdisks.]
Channel-Attached System: Client Application, CLI, TDP, Channel, Host Channel Adapters, Channel Driver, Parsing Engine
Network-Attached System: Client Application, CLI or ODBC, MTDP, MOSI, LAN, Gateway, Parsing Engine
Space limits are specified for each database and for each user:
Perm Space - maximum amount of space available for permanent tables
Spool Space - maximum amount of work space available for request processing
Temp Space - maximum amount of space available for global temporary tables
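[Figure: Perm Space hierarchy. An owner with 70 GB gives 10 GB, 10 GB, and 30 GB to the databases and users below it, leaving 20 GB. Labels in the figure: Default, SYSADMIN 5GB, SYSTEMFE, SYSDBA, CRASHDUMPS 10GB, Database 2, Database 3, User A 30GB, User B 10GB, User C 10GB, User D.]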
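The arithmetic in the figure (70 GB minus 10, 10, and 30 GB leaves 20 GB) can be sketched as a small Python model. This is only an illustration of Perm Space being handed from an owner to the objects it creates; the class, method, and object names are hypothetical, and which figure label owns which amount is an assumption.

```python
class SpaceOwner:
    """A database or user holding a Perm Space limit (in GB)."""

    def __init__(self, name, perm_gb):
        self.name = name
        self.perm_gb = perm_gb
        self.children = []

    def create_child(self, name, perm_gb):
        """Create a subordinate database or user; its Perm Space comes out of the owner's."""
        if perm_gb > self.perm_gb:
            raise ValueError(f"{self.name} has only {self.perm_gb} GB of Perm Space left")
        self.perm_gb -= perm_gb
        child = SpaceOwner(name, perm_gb)
        self.children.append(child)
        return child

    def total_gb(self):
        """Perm Space held by this owner plus everything below it."""
        return self.perm_gb + sum(child.total_gb() for child in self.children)


owner = SpaceOwner("owner", 70)          # hypothetical 70 GB owner
owner.create_child("child_one", 10)
owner.create_child("child_two", 10)
owner.create_child("child_three", 30)

print(owner.perm_gb)     # 20
print(owner.total_gb())  # 70
```

The total stays at 70 GB throughout: creating a child moves Perm Space down the hierarchy rather than adding any.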
May be NULL
Primary Indexes
The physical mechanism used to assign a row to an AMP
A table must have a Primary Index
The Primary Index cannot be changed
UPI
Why would you choose an Index that is different from the Primary Key?
Join performance
Known access paths
NUPI Table
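[Figure: the Primary Index value of a row is passed through the hashing algorithm to produce a hash; the hash bucket number is looked up in the hash map, which gives the AMP number that owns the row.]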
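The path from a Primary Index value to an AMP number can be sketched as follows. This is a hedged illustration, not the Teradata hashing algorithm: `zlib.crc32` stands in for the real hash function, the bucket count of 65,536 and the round-robin hash map are assumptions made for the sketch, and the function names are hypothetical.

```python
import zlib

HASH_BUCKETS = 65536  # assumed bucket count for this sketch
NUM_AMPS = 4          # the figures in this course show four AMPs

# Hash map: bucket number -> AMP number (1-based). Round-robin is an assumption.
HASH_MAP = [bucket % NUM_AMPS + 1 for bucket in range(HASH_BUCKETS)]


def row_hash(pi_value):
    """Stand-in hashing algorithm: the same input always gives the same hash."""
    return zlib.crc32(str(pi_value).encode("utf-8"))


def amp_for(pi_value):
    """Primary Index value -> hash -> hash bucket number -> hash map -> AMP number."""
    bucket = row_hash(pi_value) % HASH_BUCKETS
    return HASH_MAP[bucket]


# Storing a row and later retrieving it by Primary Index use the same calculation,
# so the lookup goes straight to the AMP that holds the row.
amp_when_stored = amp_for(45)
amp_when_queried = amp_for(45)
print(amp_when_stored == amp_when_queried)  # True
```

Because the mapping is deterministic, a request that supplies the Primary Index value needs to visit only one AMP.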
SELECT *
FROM Customer
WHERE Cust = 45;
[Figure: the PE applies the hashing algorithm to the UPI value (Cust = 45) and sends the request over the BYNET to the one AMP that holds the row (AMP 1 in the figure). The 16 Customer rows are spread over AMP 1 to AMP 4, four rows per AMP.]

Customer (Cust is the PK and the UPI)
Cust  Name    Phone
37    White   555-4444
98    Brown   333-9999
74    Smith   555-6666
95    Peters  555-7777
27    Jones   222-8888
56    Smith   555-7777
45    Adams   444-6666
84    Rice    666-5555
49    Smith   111-6666
51    Marsh   888-2222
31    Adams   111-2222
62    Black   444-5555
12    Young   777-4444
77    Jones   777-6666
72    Adams   666-7777
40    Smith   222-3333

Single-AMP access with 0 to 1 rows returned.
SELECT *
FROM Customer
WHERE Phone = '555-7777';
[Figure: the same 16 Customer rows, with Cust as the PK and Phone as the NUPI. The PE applies the hashing algorithm to the NUPI value (Phone = '555-7777') and sends the request over the BYNET to one AMP. Rows with the same Phone value hash to the same AMP, so both matching rows (Cust 95, Peters, and Cust 56, Smith) are found there.]

Single-AMP access with 0 to n rows returned.
Order table
Order   Customer  Order  Order
Number  Number    Date   Status
7325    2         4/13   O
7324    3         4/13   O
7415    1         4/13   C
7103    1         4/10   O
7225    2         4/15   C
7384    1         4/12   C
7402    3         4/16   C
7188    1         4/13   C
7202    2         4/09   C

[Figures: the same nine rows distributed over AMP 1 to AMP 4 under three Primary Index choices.]
UPI on Order Number: every value is unique, so the nine rows spread across the four AMPs.
NUPI on Customer Number: rows with the same Customer Number go to the same AMP (orders 7415, 7103, 7384, and 7188 for customer 1; orders 7325, 7225, and 7202 for customer 2; orders 7324 and 7402 for customer 3).
NUPI on Order Status: only two values exist, so the rows land on at most two AMPs (the six rows with status C together, the three rows with status O together).
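How the choice of Primary Index column changes row distribution can be tried out with the nine Order rows. The sketch below is an assumption-laden illustration: `zlib.crc32` modulo the number of AMPs stands in for the real hashing algorithm and hash map, so the exact AMP numbers will not match the figures, but the pattern (unique values spread out, repeated values pile up) is the same.

```python
import zlib
from collections import Counter

NUM_AMPS = 4

# (order_number, customer_number, order_date, order_status) from the Order table
ORDERS = [
    (7325, 2, "4/13", "O"),
    (7324, 3, "4/13", "O"),
    (7415, 1, "4/13", "C"),
    (7103, 1, "4/10", "O"),
    (7225, 2, "4/15", "C"),
    (7384, 1, "4/12", "C"),
    (7402, 3, "4/16", "C"),
    (7188, 1, "4/13", "C"),
    (7202, 2, "4/09", "C"),
]


def amp_for(value):
    """Stand-in for hashing algorithm plus hash map: value -> AMP number 1..NUM_AMPS."""
    return zlib.crc32(str(value).encode("utf-8")) % NUM_AMPS + 1


def rows_per_amp(column):
    """Count rows per AMP when the given column position is the Primary Index."""
    counts = Counter(amp_for(row[column]) for row in ORDERS)
    return {amp: counts.get(amp, 0) for amp in range(1, NUM_AMPS + 1)}


by_order_number = rows_per_amp(0)  # unique values
by_customer = rows_per_amp(1)      # three distinct values
by_status = rows_per_amp(3)        # two distinct values

for label, counts in [("Order Number", by_order_number),
                      ("Customer Number", by_customer),
                      ("Order Status", by_status)]:
    print(label, counts)
```

With Order Status as the Primary Index, at least two of the four AMPs hold no rows at all, which is the skew the third figure illustrates.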
Secondary Indexes
A secondary index is an alternate path to the rows of a table.
A table may have from 0 to 32 secondary indexes.
A secondary index:
does not affect table row distribution.
is chosen to improve access performance.
may reference from 1 to 64 table columns.
may be defined at table creation.
may be defined after the table is created.
may be dropped at any time.
uses a sub-table which utilizes Perm Space.
may impact table maintenance performance (row inserts,
row updates and/or row deletes).
Sparse Index
Any join index, whether simple or aggregate, multi-table or single-table, can be sparse.
Uses a constant expression in the WHERE clause of its definition to narrowly filter its row
population.
Hash Index
Used for the same purposes as single-table join indexes.
Create a full or partial replication of a base table with a PI on a FK column to
facilitate joins of large tables by hashing them to the same AMP.
Can be defined on one table only.
Value-Ordered NUSI
Very efficient for range conditions and conditions with an inequality on the secondary index
column set.
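Index Feature                Primary Index  Secondary Index
Required?                    Yes            No
Number per table             1              0 to 32
Maximum number of columns    64             64
Unique or Non-Unique         Both           Both
Affects row distribution     Yes            No
Created/Dropped dynamically  No             Yes
Improves access              Yes            Yes
Separate Sub-Table           No             Yes
Extra processing overhead    No             Yes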
Full-Table Scans
CUSTOMER
Cust_ID (USI)  Cust_Name (NUSI)  Cust_Phone (NUPI)
NUSI Access
Full-Table Scan
Locks
SQL
Transient Journal
Successful TXN
BEGIN TRANSACTION
UPDATE Row A (Add $100 to checking): before image of Row A recorded
UPDATE Row B (Subtract $100 from savings): before image of Row B recorded
END TRANSACTION: before images discarded

Failed TXN
BEGIN TRANSACTION
UPDATE Row A: before image of Row A recorded
UPDATE Row B: before image of Row B recorded
(Failure occurs)
(Rollback occurs): before images reapplied
(Terminate TXN): before images discarded
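The before-image mechanism in the two transaction sequences can be sketched in Python. This is a minimal model under stated assumptions: the `Transaction` class, its method names, and the starting balances are hypothetical, and a real Transient Journal is maintained by the database rather than by application code.

```python
class Transaction:
    """Records a before image of each row it changes so the change can be undone."""

    def __init__(self, table):
        self.table = table
        self.journal = []  # before images, in the order the updates happened

    def update(self, key, new_value):
        self.journal.append((key, self.table[key]))  # record before image
        self.table[key] = new_value

    def commit(self):
        """END TRANSACTION: the before images are no longer needed."""
        self.journal.clear()

    def rollback(self):
        """Reapply before images in reverse order, then discard them."""
        for key, before_image in reversed(self.journal):
            self.table[key] = before_image
        self.journal.clear()


accounts = {"checking": 500, "savings": 800}  # hypothetical starting balances

# Successful TXN: both updates complete, before images are discarded.
txn = Transaction(accounts)
txn.update("checking", accounts["checking"] + 100)
txn.update("savings", accounts["savings"] - 100)
txn.commit()
after_success = dict(accounts)

# Failed TXN: a failure after the updates triggers a rollback.
txn = Transaction(accounts)
txn.update("checking", accounts["checking"] + 100)
txn.update("savings", accounts["savings"] - 100)
txn.rollback()
after_failure = dict(accounts)

print(after_success)  # {'checking': 600, 'savings': 700}
print(after_failure)  # {'checking': 600, 'savings': 700}
```

After the rollback the balances match the state left by the last successful transaction, so the failed transfer leaves no partial change behind.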
RAID Protection
RAID 1 (Mirroring)
Each physical disk in the array has an exact copy in the same array.
The array controller can read from either disk and write to both.
When one disk of the pair fails, there is no change in performance.
Mirroring reduces available disk space by 50%.
Array controller reconstructs failed disks quickly.
RAID 5 (Parity)
Data and parity striped across rank of 4 disks.
If a disk fails, any missing block may be reconstructed using the other three disks.
Parity reduces available disk space by 25% in a 4-disk rank.
Reconstruction of failed disks takes longer than RAID 1.
[Figure: RAID 1 shows a primary disk and its mirror; RAID 5 shows data blocks 0 to 8 and parity blocks striped across a rank of four disks.]
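Reconstruction of a missing RAID 5 block from the other three disks can be illustrated with XOR parity. This is a sketch under stated assumptions: the block contents are made-up byte strings, and real array controllers work on fixed-size disk blocks in hardware rather than Python bytes.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_overhead(rank_size):
    """Fraction of raw space used for parity in a rank of rank_size disks."""
    return 1 / rank_size


# One stripe across a 4-disk rank: three data blocks plus one parity block.
data = [b"BLK0", b"BLK1", b"BLK2"]  # hypothetical block contents
parity = xor_blocks(data)

# The disk holding data[1] fails; rebuild its block from the other three disks.
rebuilt = xor_blocks([data[0], data[2], parity])

print(rebuilt == data[1])   # True
print(parity_overhead(4))   # 0.25
```

The same XOR is applied both to compute parity and to rebuild a lost block, which is why any single missing block in the stripe can be recovered.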
Fallback
A Fallback table is fully available in the event of an unavailable AMP.
A Fallback row is a copy of a primary row stored on a different AMP in the same CLUSTER of AMPs.
[Figure: two PEs connected over the BYNET to AMP 1 through AMP 4; each AMP holds its primary rows plus fallback copies of rows whose primary copy is on another AMP.]
Benefits of Fallback:
Costs of Fallback:
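Why a Fallback table stays available when one AMP is down can be sketched in Python. This is an illustration only: the placement rule below (spread fallback copies over the other AMPs of the cluster) is an assumption made for the sketch, the row numbers are borrowed from the figure, and the function names are hypothetical.

```python
def place_rows(row_ids, cluster):
    """Give every row a primary AMP and a fallback AMP in the same cluster.

    The fallback AMP is always different from the primary AMP.
    """
    size = len(cluster)
    placement = {}
    for i, row_id in enumerate(row_ids):
        primary = cluster[i % size]
        offset = 1 + (i // size) % (size - 1)  # 1..size-1, so never the same AMP
        fallback = cluster[(i + offset) % size]
        placement[row_id] = (primary, fallback)
    return placement


def readable_rows(placement, down_amp):
    """Rows that still have at least one copy on an AMP that is up."""
    return {
        row_id
        for row_id, (primary, fallback) in placement.items()
        if primary != down_amp or fallback != down_amp
    }


cluster = [1, 2, 3, 4]              # AMP 1 to AMP 4 in one cluster
rows = [1, 2, 3, 5, 6, 8, 11, 12]   # row numbers shown in the figure
placement = place_rows(rows, cluster)

for down in cluster:
    print(down, readable_rows(placement, down) == set(rows))  # always True
```

The cost side is visible in the same model: every row is stored twice, so the table needs twice the Perm Space and every change is applied to two AMPs.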
Journal is active
Table updates continue as normal
Journal logs Row-IDs of changed rows for down-AMP
When AMP is
back on-line:
[Figure: AMPs in a cluster holding primary and fallback rows. While one AMP is down, the Recovery Journal (RJ) on the other AMPs logs the Row-IDs of changed rows (Row-ID 7, Row-ID 41, and Row-ID 66 in the figure).]
Cliques
[Figure: nodes grouped into Clique 1, Clique 2, and Clique 3.]
Symantec NetBackup
NCR 6476
6000 Slots
2 - 80 Drives
____ 4. Fallback
____ 5. Cluster
____ 8. ARC
____ 9. Clique
FastLoad Utility
MultiLoad Utility
Loads/maintains up to five empty or populated tables
Performs block level operations against target tables
Affected data blocks are written once
Multiple operations with one pass of input files
Uses conditional logic when applying updates
Supports INSERT, UPDATE, DELETE and UPSERT operations
Supports INMOD routines and Access Modules
Errors reported and collected in error tables
Provides automatic checkpoint/restart capability
FastExport Utility
TPump Utility
Allows near real-time updates from transactional systems into the warehouse
Allows constant loading of data into a table
Performs INSERT, UPDATE, DELETE, and ATOMIC UPSERT operations, or a
combination, to more than 60 tables at a time
High-volume SQL-based continuous update of multiple tables
Allows target tables to:
Have secondary indexes, referential integrity, constraints and enabled triggers
Be MULTISET or SET
Be populated or empty
Allows conditional processing
Supports automatic restarts
No session limit: use as many sessions as necessary
No limit to the number of concurrent instances
Uses row-hash locks, allowing concurrent updates on the same table
Can be stopped at any time with work committed with no ill effect
Designed for highest possible throughput
Gives users control over the rate per minute (throttle) at which statements are
sent to the database, either dynamically or by script
TPT Operator   Teradata Utility  Description
LOAD           FastLoad
UPDATE         MultiLoad
EXPORT         FastExport
STREAM         TPump
DataConnector  N/A
ODBC           N/A
Teradata Manager
Graphical system management tool - Collects, analyzes, and displays:
Performance information
Users
Accounts
Profiles
Analyst Tools
Teradata System Emulation Tool
Emulates a target system by exporting and importing all information necessary to
emulate in a test environment
___2. MultiLoad
a.
b.
c.
___3. FastLoad
d.
___1. TPump
___4. FastExport
e.
f.
g.
___7. BTEQ
___8. Teradata SET
h.
i.
j.
k.
More Information
For more information on topics discussed in this course, see the following
resources:
Documentation: http://www.info.Teradata.com
Practice tests for certification: http://www.Teradata.com/certification
Available courses: Teradata Education Network
http://www.TeradataEducationNetwork.com
Appendix A
Review Questions/Solutions
a. Server
b. Client
5. Which two tasks do Teradata Database Administrators never have to do? (Choose two.)
a. Reorganize data
b. Select primary indexes
c. Restart the system
d. Pre-prepare data for loading
1. Database
2. Table
3. Relational database
4. Primary Key
5. Null
6. Foreign Key
7. Row
f.
A collection of permanently
stored data.
Stage 1
Stage 2
Stage 3
Stage 4
Stage 5
1. Name the three major Teradata Database components and state their purpose.
PE: Parse, optimize, and dispatch queries
AMP: Data storage and retrieval
BYNET: Communication between PEs and AMPs
2. Why are there two LANs in a Teradata system? For redundancy
3. How many sessions can a PE support? 120
4. What is the communications layer in a Teradata system? BYNET driver
3. A user creating a subordinate user who needs tables must give up some
of its Perm Space.
4. The sum of all user and database Perm Space will equal the total
space on the system.
5. Deleting a view from a database reclaims Perm Space for the database.
UPI
EITHER
C. May be up to 64 columns.
EITHER
UPI
EITHER
EITHER
G. Allows NULL.
NEITHER
EITHER
I.
NUPI
NUPI
True
NUSI Access
Full-Table Scan
1. Database locks
B 2. Table locks
4. Fallback
5. Cluster
C 6. Recovery journal
A 7. Transient journal
G 8. ARC
D 9. Clique
C 2. MultiLoad
a.
b.
c.
3. FastLoad
d.
4. FastExport
D 1. TPump
e.
A 5. Teradata Manager
B 6. Teradata Dynamic
Query Manager
f.
g.
G 7. BTEQ
K 8. Teradata SET
h.
i.
j.
k.