
Teradata Overview

What Is Teradata?
Teradata is a relational database management system (RDBMS) that provides the foundation
to give a company the power to grow, to compete in today's dynamic marketplace, and to
evolve the business by getting answers to a new generation of questions. Teradata's scalability
allows the system to grow as the business grows, from gigabytes to terabytes and beyond.
Teradata's unique technology has been proven at customer sites across industries and around
the world.
How Is Teradata Used?
Each Teradata implementation can model a company's business. The ability to keep up with
rapid changes in today's business environment makes Teradata an ideal foundation for many
applications, including:
Enterprise data warehousing
Data warehousing is a process for properly assembling and managing data from various
servers to answer business-critical questions. Teradata is ideal for enterprise data
warehousing, which is commonly characterized by:

1. Multiple subject areas
2. Many concurrent users
3. Many concurrent queries, including ad-hoc queries
4. Large quantity of tables
5. Hundreds of gigabytes (and terabytes) of detail data
6. Historical data stored (months or years)

Without an enterprise data warehouse, a financial institution may be able to identify profitable
customers for separate products such as mortgages or credit cards, but not know the overall
profitability of each customer. An enterprise data warehouse brings together the different
subject areas into a central repository, creating one single version of the truth for a complete
picture of the customer.
An enterprise data warehouse environment built on Teradata simplifies the system
maintenance task, resulting in a lower total cost of ownership. In addition, Teradata's ability to
handle large-scale, decision-support queries against huge volumes of detail data makes it the
obvious choice for companies wanting to start at any level and grow.
Active data warehousing
The active data warehouse extends a company's ability beyond historical data and strategic
decisions to bring the decision-making capability to front-line personnel. Tactical decisions
such as "Who should get the empty seat on this airplane?" or "What should I offer this
customer to keep her from leaving, based on her history with our company?" can be made
more effectively with the right information.
With an active data warehouse, employees who interact directly with customers and suppliers
are empowered with information-based decision making at their fingertips. The Teradata
Warehouse supports active data warehousing with:

1. Capability to handle thousands of additional users and mixed workloads
2. High availability and reliability to support mission-critical applications
3. Scalability to accommodate an increase in the amount of data, the number of data
sources, and the number of applications supported in the data warehouse
environment

Teradata Workshop

Customer relationship management


Customer Relationship Management solutions help companies capture and analyze data to
maximize customer acquisition, retention, and profitability. You can use Teradata's detailed
data and analysis capabilities to identify and optimize business relationships with the highest
potential of profitability and growth. Examples include:

1. A telephone company can conduct and refine marketing programs targeted at a
certain type of profitable customer.
2. A supermarket can create incentives based on specific combinations of products that
customers tend to buy together.
3. A bank can recognize changes in a customer's life circumstances, such as a new
baby or a college-bound son or daughter, and offer timely services such as a new
home loan, mortgage insurance, additional checking account, extra credit card, or
student loan.
4. A retailer can run a department store credit card sales program and filter out those
customers who already have that card.
The NCR CRM solution consists of software, professional and customer services, and the
Teradata RDBMS to create, maintain, and enhance customer relationships.
Internet and E-Business
Teradata provides a single repository for customer information that helps E-Businesses build
and maintain one-to-one customer relationships that are critical to their success on the
Internet. Teradata supports the fast-paced style of E-Business by allowing many concurrent
users to ask complicated questions as they think of them, and get quick answers.
Teradata allows E-Businesses to:

1. Capture massive amounts of click-stream data.
2. Enable multiple users to ask complex questions of the customer click-stream data
with near real-time response.
3. Protect customers' privacy with consumer opt-in/opt-out preferences and the ability for
consumers to check and revise their information stored on the Teradata database
through the Internet or a company call center.

Data marts
A Teradata system may start out small as a data mart, and be easily expanded with linear
scalability to include more subject areas, applications, and users.
A data mart is a special-purpose subset of a company's enterprise data used by a particular
department, function, or application. Often, these single-subject area data marts contain data
that was aggregated or transformed in some way to better handle the requests of a specific
user community. Vendors implement data marts using different architectures:

1. Independent data marts - Created directly from operational systems to an
individual data store.
2. Dependent data marts - Created from detail data in the data warehouse. This approach
still requires movement and transformation of data, but may provide better performance
for some specific user queries.
3. Logical data marts - Existing parts of the data warehouse, not separate physical
structures. Because in theory the data warehouse contains the detail data of the
entire enterprise, a logical data mart would then provide the specific information for
a specific user community. With the proper technology, this can be an ideal way to
remove the need for massive data loading and transforming.

Independent and dependent data marts are architectures endorsed by other
database vendors and tend to be associated with higher maintenance costs for
physically moving and maintaining the data, inconsistent data (and resulting
inconsistent decisions), and indirect ways to get the complete picture of the data.

Teradata is ideal for the logical data mart environment, where different user
communities view subsets of a single repository of enterprise data.
Q: What do you think are some Teradata features that make it so successful in today's business
environment? (Details on the following are coming up next.)
Scalability.
Single data store.
High degree of parallelism.
Ability to model the business.
All of the above.
What Makes Teradata Unique?
We will learn about many features that make the Teradata RDBMS right for business-critical
applications. To start with, this section covers these key features:

Single data store
Scalability
Unconditional parallelism
Ability to model the business

Single Data Store

Teradata acts as a single data store, with multiple client applications making inquiries against
it concurrently.
Instead of replicating a database for different purposes, with Teradata you store the data once
and use it for all clients. Teradata provides the same connectivity for an entry-level system as
it does for a massive enterprise data warehouse.
Scalability

Linear scalability means that as you add components to the system, the performance
increase is linear. Adding components allows the system to accommodate increased workload
without decreased throughput. Teradata is scalable in multiple ways, including hardware,
complexity, and concurrent users.
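
As a rough illustration of what linear scaling implies, the sketch below models an idealized system whose throughput is simply proportional to the node count. The per-node figure `QUERIES_PER_HOUR_PER_NODE` and both function names are hypothetical, chosen only for this example; they are not Teradata measurements or APIs.

```python
import math

# Hypothetical figure for illustration only; not a measured Teradata number.
QUERIES_PER_HOUR_PER_NODE = 500


def ideal_throughput(nodes: int) -> int:
    """Throughput of an idealized system whose capacity grows linearly with nodes."""
    return nodes * QUERIES_PER_HOUR_PER_NODE


def nodes_needed(workload_per_hour: int) -> int:
    """Smallest node count whose ideal throughput covers the workload."""
    return math.ceil(workload_per_hour / QUERIES_PER_HOUR_PER_NODE)


if __name__ == "__main__":
    for nodes in (2, 4, 8):
        print(nodes, "nodes ->", ideal_throughput(nodes), "queries/hour")
    print("nodes for 3,200 queries/hour:", nodes_needed(3200))
```

Under this assumption, doubling the nodes doubles the throughput, which is the sense in which added components absorb added workload.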
Hardware
Growth is a fundamental goal of business. A Teradata MPP system easily accommodates that
growth whenever it happens. The Teradata RDBMS runs on highly optimized NCR servers in the
following configurations:

SMP - Symmetric multiprocessing platforms manage gigabytes of data to support an
entry-level data warehousing system.
MPP - Massively parallel processing systems can manage hundreds of terabytes of data.
You can start small with a couple of nodes, and later expand the system as your business
grows.
With Teradata, you can increase the size of your system without replacing:

Databases - When you expand your system, the data is automatically redistributed
through the reconfiguration process, without manual interventions such as sorting, unloading
and reloading, or partitioning.
Platforms - Teradata's modular structure allows you to add components to your
existing system.
Graphical user interface (GUI) tools - The Teradata Warehouse tool suite is designed
to work on all Teradata systems, small and large.
Data model - The physical and logical data models remain the same regardless of data
volume.
Applications - Applications you develop for Teradata configurations will continue to
work as the system grows, protecting your investment in application development.

Complexity
Teradata is adept at complex data models that satisfy the information needs throughout an
enterprise. Teradata efficiently processes increasingly sophisticated business questions as
users realize the value of the answers they are getting. It has the ability to perform large
aggregations during query run time and can perform up to 64 joins in a single query.
Concurrent Users
Teradata has the proven ability to handle from hundreds to thousands of users on the system
simultaneously. Adding many concurrent users typically reduces system performance.
However, adding more components can enable the system to accommodate the new users with
equal or even better performance.
Unconditional Parallelism

Teradata provides exceptional performance using parallelism to achieve a single answer faster
than a non-parallel system. Parallelism uses multiple processors working together to accomplish
a task quickly.
An example of parallelism can be seen at an amusement park, as guests stand in line for an
attraction such as a roller coaster. As the line approaches the boarding platform, it typically will
split into multiple, parallel lines. That way, groups of people can step into their seats
simultaneously. The line moves faster than if the guests step onto the attraction one at a time.
At the biggest amusement parks, the parallel loading of the rides becomes essential to their
successful operation.
Parallelism is evident throughout a Teradata system, from the architecture to data loading to
complex request processing. Teradata processes requests in parallel without mandatory query
tuning. Teradata's parallelism does not depend on limited data quantity, column range
constraints, or specialized data models: Teradata has unconditional parallelism.
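
The divide-and-combine idea behind parallel processing can be sketched in a few lines of Python. This is only an analogy, not how Teradata is implemented: the helper names `split_rows` and `parallel_total` are invented here, the data is made up, and Python threads are used to show the pattern rather than to demonstrate a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(rows, workers):
    """Deal rows out round-robin so each worker gets a similar share."""
    return [rows[i::workers] for i in range(workers)]


def parallel_total(rows, workers=4):
    """Sum each share independently, then combine the partial results."""
    shares = split_rows(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, shares))
    return sum(partial_sums)


if __name__ == "__main__":
    sales = list(range(1, 1001))  # made-up sale amounts
    print(parallel_total(sales))
```

Each worker handles only its own share, much like the parallel boarding lines in the amusement park example, and the single answer is assembled from the partial results.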
Ability to Model the Business
A data warehouse built on a business model contains information from across the enterprise.
Individual departments can use their own assumptions and views of the data for analysis, yet
these varying perspectives have a common basis for a single version of the truth. Companies
can get a cohesive view of their operations across functional areas to:

Find out which divisions share customers.
Track products throughout the supply chain, from initial manufacture, to inventory, to
sale, to delivery, to maintenance, to customer satisfaction.
Analyze relationships between results of different departments.
Determine if a customer on the phone has used the company's website.
Vary levels of service based on a customer's profitability.
You get consistent answers from the different viewpoints above using a single business model,
not functional models for different departments. In a functional model, data is organized
according to what is done with it. But what happens if users later want to do some analysis
that has never been done before? When a system is optimized for one department's function,
the other departments' needs (and future needs) may not be met.
A Teradata system allows the data to represent a business model, with data organized
according to what it is, not what it does. The data model is the same regardless of data
volume. With Teradata as the enterprise data warehouse, users can ask new questions of the
data that were never anticipated, throughout the business cycle and even through changes in
the business environment.
What Is a Relational Database?
A database is a collection of permanently stored data that is:

Logically related (the data was created for a purpose).
Shared (many users may access the data).
Protected (access to the data is controlled).
Managed (the data integrity and value are maintained).

The Teradata RDBMS is a relational database. Relational databases are based on the relational
model, which has its foundation in the mathematical theory of sets. The relational model uses
and extends many principles of set theory to provide a disciplined approach to data
management.
Relational databases present data as a set of tables. A table is a two-dimensional
representation of data that consists of rows and columns. According to the relational model,
a valid table does not have to be populated with data rows; it just needs to be defined with at
least one column.
Tables are logically related to each other by a common field, so information such as customer
telephone numbers and addresses can exist in one table, yet be accessible for multiple
purposes. The example below shows customer, order, and billing statement data, related by a
common field. The common field of Customer ID lets you look up information such as a
customer name for a particular statement number, even though the data exists in two different
tables.
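
A minimal sketch of this lookup, using Python's built-in `sqlite3` module as a stand-in for a relational database (it is not Teradata). The table names, column names, and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER, name TEXT, phone TEXT);
    CREATE TABLE statement (statement_no INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Brown', '555-0100'), (2, 'Nguyen', '555-0101');
    INSERT INTO statement VALUES (9001, 2, 120.50), (9002, 1, 75.00);
""")


def customer_name_for_statement(statement_no):
    """Follow the shared customer_id from a statement row to the customer row."""
    row = conn.execute(
        """SELECT c.name
           FROM statement AS s
           JOIN customer AS c ON c.customer_id = s.customer_id
           WHERE s.statement_no = ?""",
        (statement_no,),
    ).fetchone()
    return row[0] if row else None


print(customer_name_for_statement(9001))
```

The customer name is stored once, in the customer table, yet it can be reached from any statement through the common `customer_id` field.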

Rows
Each row contains all the columns in the table, so each table has only one row format. The
sequence of rows is arbitrary and does not imply priority, hierarchy, or significance.
Each row represents an occurrence of an entity defined by the table. An entity is a person,
place, or thing about which the table contains information. In this example, the entity is the
employee and each row represents a single employee.

Columns
Each column contains like data, such as only part names, or only supplier names, or only
employee numbers. In the example below, the Last_Name column contains last names only, and
nothing else. The data in the columns is atomic data, that is, data that is indivisible: it
cannot be divided into smaller units that have meaning. For example, a column that holds both
Last_Name and First_Name is not atomic, because it can be divided further into components.
Likewise, a telephone number might be divided into three columns (the area code, the prefix,
and the suffix) so that the customer data can be analyzed by area code, for example. Missing
data values are represented by nulls. Within a table, the column position is arbitrary.
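
The benefit of atomic columns can be sketched with Python's built-in `sqlite3` module (used here only as a convenient relational stand-in, not Teradata). The table layout and phone numbers are made up; the point is that a separate `area_code` column can be grouped on directly, and a missing suffix is stored as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_phone (
        customer_id INTEGER,
        area_code   TEXT,
        prefix      TEXT,
        suffix      TEXT   -- NULL marks a missing value
    );
    INSERT INTO customer_phone VALUES
        (1, '310', '555', '0100'),
        (2, '310', '555', '0101'),
        (3, '212', '555', NULL);
""")


def customers_per_area_code():
    """Count customers in each area code using the atomic area_code column."""
    rows = conn.execute(
        "SELECT area_code, COUNT(*) FROM customer_phone "
        "GROUP BY area_code ORDER BY area_code"
    ).fetchall()
    return dict(rows)


print(customers_per_area_code())
```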

Primary Key
In the relational model, a Primary Key (PK) is used to designate a unique identifier for each
row when you design a database.
The relational model is a set of principles for relational databases, formalized by Dr. E.F.
Codd in the late 1960s. It prescribes how the data should be represented in terms of structure,
integrity, and manipulation. The rules of the relational model are widely accepted in the
information technology industry, though actual implementations may vary.
A Primary Key can be composed of one or more columns. In the example below, the
Primary Key is the employee number.

Primary Key Rules


Rules governing how Primary Keys must be defined and how they function are:

Rule 1: Only one Primary Key per table.
Rule 2: A Primary Key value must be unique.
Rule 3: The Primary Key value cannot be NULL.
Rule 4: The Primary Key value should not be changed.
Rule 5: The Primary Key column should not be changed.
Rule 6: A Primary Key may be any number of columns.

Rule 1: One Primary Key


Each table must have one, and only one, Primary Key. In any given row, the value of the
Primary Key uniquely identifies the row. The Primary Key may span more than one column, but
even then, there is only one Primary Key.

Rule 2: Unique PK
Within the column(s) designated as the Primary Key, the values in each row must be unique. No
duplicate values are allowed. The Primary Key's purpose is to uniquely identify a row. In a
multi-column Primary Key, the combined value of the columns must be unique, even if an
individual column in the Primary Key has duplicate values.

Rule 3: PK Cannot Be NULL


Within the Primary Key column, each row must have a Primary Key value; the value cannot be
NULL (without a value). Because NULL is indeterminate, it cannot identify anything.

Rule 4: PK Value Should Not Change


Primary Key values should not be changed. If you changed a Primary Key, you would lose all
historical tracking of that row.

Rule 5: PK Column Should Not Change


Additionally, the column(s) designated as the Primary Key should not be changed. If you
changed which column(s) make up the Primary Key, you would lose all the information relating
that table to other tables.

Rule 6: No Column Limit


In the relational model, there is no limit to the number of columns that can be designated as
the Primary Key, so it may consist of one or more columns. In the example below, the Primary
Key consists of three columns: EMPLOYEE NUMBER, LAST NAME, and FIRST NAME.
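
Rules 2, 3, and 6 can be tried out with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for a relational database, not Teradata itself; the `employee` table, the `try_insert` helper, and the sample names are all invented for illustration. `NOT NULL` is written out explicitly because SQLite does not always enforce it for Primary Key columns on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite, unlike the relational model,
# tolerates NULLs in most PRIMARY KEY columns unless told otherwise.
conn.execute("""
    CREATE TABLE employee (
        employee_number INTEGER NOT NULL,
        last_name       TEXT    NOT NULL,
        first_name      TEXT    NOT NULL,
        PRIMARY KEY (employee_number, last_name, first_name)
    )
""")


def try_insert(row):
    """Return True if the row is accepted, False if a key rule rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert((1006, "Stein", "John")))      # new key
print(try_insert((1006, "Stein", "Mary")))      # two columns repeat, combination is new
print(try_insert((1006, "Stein", "John")))      # duplicate combination
print(try_insert((None, "Kanieski", "Carol")))  # NULL in a key column
```

The second insert succeeds even though two of its three key columns repeat earlier values, because only the combined value has to be unique.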

Foreign Key
A Foreign Key (FK) is an identifier that links related tables. A Foreign Key defines how two tables
are related to each other. Each Foreign Key references a matching Primary Key in another table
in the database. For example, in the table below, the Department Number column that is a
Foreign Key actually exists in another table as a Primary Key.

Having tables related to each other gives users the flexibility to look at the data in different
ways, without the database administrator having to manage and maintain many tables of
duplicate data for different applications.
Foreign Key Rules
Rules governing how Foreign Keys must be defined and how they operate are:

Rule 1: Foreign Keys are optional.
Rule 2: A Foreign Key value may be non-unique.
Rule 3: The Foreign Key value may be NULL.
Rule 4: The Foreign Key value may be changed.
Rule 5: The Foreign Key column may be changed.
Rule 6: A Foreign Key may be any number of columns.
Rule 7: Each Foreign Key must exist as a Primary Key in the related table.

Rule 1: Optional FKs

Foreign Keys are optional; not all tables have them. Tables that do have them can have multiple
Foreign Keys because a table can relate to multiple other tables. In fact, a table can have an
unlimited number of foreign keys. In the example table below:

The Department Number Foreign Key relates to the Department Number Primary Key in
the Department Table elsewhere in the database.
The Job Code FK relates to the Job Code PK in the Job Code Table, elsewhere in the
database.

Having tables related to each other makes a relational database flexible so that different users
can look up information they need, while simplifying the database administration so the data
doesn't have to be duplicated for each purpose or application.
Rule 2: Unique or Non-Unique FKs
Duplicate Foreign Key values are allowed. More than one employee could be assigned to the
same department.

Rule 3: FKs Can Be NULL


NULL (missing) Foreign Key values are allowed. For example, under special circumstances, an
employee might not be assigned to a department.

Rule 4: FK Value Can Change


Foreign Key values may be changed. For example, if Arnando Villegas moves from Department
403 to Department 587, the Foreign Key value in his row would change.

Rule 5: FK Column Can Change (in Certain Circumstances)


You may add additional Foreign Key columns as needed. If you change an existing Foreign Key
column, however, you may lose the relationship information between that table and the table
containing the Primary Key.

Rule 6: FK Has No Column Limit


The Foreign Key may consist of one or more columns. A multi-column foreign key is used to
relate to a multi-column Primary Key in the related table. In the relational model, there is no
limit to the number of columns that can be designated as a Foreign Key.

Rule 7: FK Must Be PK in Related Table


Each Foreign Key must exist as a Primary Key in the related table. A department number that
does not exist in the Department Table would be invalid as a Foreign Key value in the Employee
Table.
This rule can apply even if the Foreign Key is NULL, or missing. Remember, a missing value is
defined as a non-value; there is no value present. So the rule could be better stated: if a value
exists in the Foreign Key column, it must match a Primary Key value in the related table.
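
The restated rule (a Foreign Key value, if present, must match a Primary Key value in the related table) can be sketched with Python's built-in `sqlite3` module. This is a stand-in for a relational database, not Teradata; the table definitions, the `try_insert` helper, and the department numbers are illustrative. SQLite checks Foreign Keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript("""
    CREATE TABLE department (
        department_number INTEGER NOT NULL PRIMARY KEY
    );
    CREATE TABLE employee (
        employee_number   INTEGER NOT NULL PRIMARY KEY,
        department_number INTEGER REFERENCES department (department_number)
    );
    INSERT INTO department VALUES (403), (587);
""")


def try_insert(employee_number, department_number):
    """Return True if the employee row is accepted, False if it is rejected."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO employee VALUES (?, ?)",
                (employee_number, department_number),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(1, 403))   # matches an existing department
print(try_insert(2, 403))   # duplicate Foreign Key values are allowed
print(try_insert(3, None))  # NULL Foreign Key is allowed
print(try_insert(4, 999))   # no such department, so the row is rejected
```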

To check on your understanding of Primary Keys and Foreign Keys, complete this
sentence. According to the relational model, a single table can have either: (Choose
two.)
A. Multiple primary keys.
B. Multiple foreign keys.
C. No primary keys.
D. No foreign keys.
Teradata: A Proven Product
Teradata was built for data warehousing from the start. Teradata Corporation was founded in
Los Angeles, California, and incorporated on July 13, 1979. The corporate goal was to create a
database computer that could handle billions of rows of data, up to and beyond a terabyte of
data storage. The first product, a Teradata Database Computer (DBC/1012), was shipped to the
first customer with the Teradata RDBMS on a proprietary platform in 1984.
Teradata became an open system in 1996, when the Teradata RDBMS Version 2 was ported to
the general-purpose UNIX platform. Since then, Teradata has been ported to Microsoft Windows
NT, then Microsoft Windows 2000.
Today, Teradata is the brand name of NCR's premier relational database management
system, the foundation for active data warehousing and other NCR solutions.
How Large Is a Trillion?
Teradata was the first commercial database system to support a trillion bytes of data. The origin
of the name Teradata is tera-, which is derived from Greek and means trillion.
The chart below lists the meaning of the prefixes:

Prefix    Exponent    Meaning
kilo-     10^3        1,000 (thousand)
mega-     10^6        1,000,000 (million)
giga-     10^9        1,000,000,000 (billion)
tera-     10^12       1,000,000,000,000 (trillion)
peta-     10^15       1,000,000,000,000,000 (quadrillion)
exa-      10^18       1,000,000,000,000,000,000 (quintillion)

Examples

1 million seconds = 11.57 days
1 billion seconds = about 31.7 years
1 trillion seconds = about 31,688 years
1 million inches = about 15.8 miles
1 trillion inches = about 15,800,000 miles (more than 30 round trips to the moon)
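
These conversions are easy to check. The short sketch below recomputes them, assuming an average year of 365.25 days and the standard 63,360 inches per mile; the constant and function names are chosen only for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY  # assumes an average year of 365.25 days
INCHES_PER_MILE = 12 * 5280

MILLION, BILLION, TRILLION = 10**6, 10**9, 10**12


def seconds_to_days(seconds):
    return seconds / SECONDS_PER_DAY


def seconds_to_years(seconds):
    return seconds / SECONDS_PER_YEAR


def inches_to_miles(inches):
    return inches / INCHES_PER_MILE


if __name__ == "__main__":
    print(f"1 million seconds  = {seconds_to_days(MILLION):.2f} days")
    print(f"1 billion seconds  = {seconds_to_years(BILLION):.1f} years")
    print(f"1 trillion seconds = {seconds_to_years(TRILLION):,.0f} years")
    print(f"1 million inches   = {inches_to_miles(MILLION):.1f} miles")
    print(f"1 trillion inches  = {inches_to_miles(TRILLION):,.0f} miles")
```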

Teradata: Evolving to Meet Customer Requirements


Teradata is a longtime leader in large-scale relational database management. The Teradata
RDBMS has evolved over the years, anticipating customers' future information processing
needs. One of its major evolutions involved being ported from a proprietary platform to an open
environment.

Version 1 and Version 2


Version 1: A Revolution in Its Time
The Teradata RDBMS Version 1 ran on the proprietary Teradata Operating System (TOS).
Version 1 was supported on the following hardware platforms:

Teradata DBC/1012
NCR System 3600

While these systems are older technologies, a few are still in use at some customer sites.
Version 1 has the following characteristics:

Hardware processors are physically cabled to the system (AMPs, IFPs, COPs, and PEPs).
Messages are passed between hardware processors using the Ynet interconnect.
Support for channel-attached and network-attached clients, called hosts.
Runs on the Teradata Operating System (TOS).

Version 2: Significant Improvements in Processing Power and Flexibility


The current version, Teradata RDBMS Version 2, evolved from the TOS-based version, providing
additional processing options, better scalability, better performance, and conformance to ANSI
standards for SQL. Some technology advances include:

Virtual processors called vprocs (AMPs and PEs) allow for better resiliency, more
efficient use of system resources, and more flexibility than the hardware processors.
Messages are passed over the BYNET, which allows Teradata to be a more linearly
expandable system. As the database grows, additional nodes may be added without
performance penalties.
Support for channel-attached and network-attached clients.
Runs on UNIX and Microsoft Windows.

Teradata RDBMS Version 2 hardware platforms include:

NCR 5100S and 5100M
NCR 4300, 4700, and 5150
NCR 4400, 4800, and 5200
NCR 4850 and 5250
NCR 4475, 4950, and 5350

The remainder of this Web-Based Training course covers the characteristics of Teradata RDBMS
Version 2.

Teradata Architecture
A Teradata System
A Teradata system contains one or more nodes. A node is a term for a processing unit under the
control of a single operating system. The node is where the processing occurs for the Teradata
RDBMS. There are two types of Teradata systems:

Symmetric multiprocessing (SMP) - An SMP Teradata system has a single node that
contains multiple CPUs sharing a memory pool.

Massively parallel processing (MPP) - Multiple SMP nodes working together make up a larger,
MPP implementation of Teradata. The nodes are connected using the BYNET, which allows multiple
virtual processors on multiple nodes to communicate with each other.
The BYNET (banyan network) is a combination of hardware and software that provides high
performance networking between the nodes of a Teradata system. A dual-redundant,
bi-directional, multi-staged network, the BYNET enables the nodes to communicate in a high
speed, loosely-coupled fashion. It is based on banyan topology, a mathematically defined
structure that has branches reminiscent of a banyan tree.

To manage a Teradata system, you use:

SMP system: System Console (keyboard and monitor) attached directly to the SMP node
MPP system: Administration Workstation (AWS). The AWS is a specialized workstation for an
MPP system that:

o Provides a single operational and graphical view of the system.
o Provides the environment to configure, monitor, and manage the system.
o Serves as the primary interface to all system management utilities.

To access a Teradata system, a user typically logs on through one of multiple client platforms
(channel-attached mainframes or network-attached workstations). Client access is discussed
in the next module.
Node Components
A node is a basic building block of a Teradata system, and contains a large number of hardware
and software components. A conceptual diagram of a node and its major components is shown
below. Hardware components are shown on the left side of the node and software components
are shown on the right side.

Shared Nothing Architecture


The Teradata vprocs (which are the PEs and AMPs) share the components of the nodes
(memory and CPU). The defining feature of the shared-nothing architecture is that each
AMP manages its own dedicated portion of the system's disk space (called the vdisk), and this
space is not shared with other AMPs. Each AMP uses system resources independently of the
other AMPs so they can all work in parallel for high system performance overall.

Notes
PEs (Parsing Engines) are vprocs that receive SQL requests from the client and break the
requests into steps. The PEs send the steps to the AMPs and subsequently return the answer
to the client.
A vdisk (pronounced, VEE-disk) is the logical disk space that is managed by an AMP.
Depending on the configuration, a vdisk may not be contained on the node; however, it is
managed by an AMP, which is always a part of the node.

The vdisk is made up of 1 to 64 pdisks (user slices in UNIX or partitions in Windows NT, whose
size and configuration vary based on RAID level). The pdisks combine logically to form the
AMP's vdisk. Although an AMP can manage up to 64 pdisks, it controls only one vdisk. An AMP
manages only its own vdisk, not the vdisk of any other AMP.
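
The ownership relationships described above (one vdisk per AMP, up to 64 pdisks per vdisk, nothing shared between AMPs) can be captured in a small data-structure sketch. The class names `Amp` and `Vdisk` and the pdisk labels are hypothetical modelling choices for this example, not Teradata software objects.

```python
from dataclasses import dataclass, field

MAX_PDISKS_PER_VDISK = 64  # limit stated in the text


@dataclass
class Vdisk:
    """Logical disk space built from one or more pdisks."""
    pdisks: list = field(default_factory=list)

    def add_pdisk(self, name):
        if len(self.pdisks) >= MAX_PDISKS_PER_VDISK:
            raise ValueError("a vdisk holds at most 64 pdisks")
        self.pdisks.append(name)


@dataclass
class Amp:
    """An AMP owns exactly one vdisk and never touches another AMP's vdisk."""
    amp_id: int
    vdisk: Vdisk = field(default_factory=Vdisk)

    def owns(self, pdisk_name):
        return pdisk_name in self.vdisk.pdisks


if __name__ == "__main__":
    amp0, amp1 = Amp(0), Amp(1)
    amp0.vdisk.add_pdisk("pdisk-0")
    amp1.vdisk.add_pdisk("pdisk-1")
    print(amp0.owns("pdisk-0"), amp1.owns("pdisk-0"))
```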
Question
Which of the following statements is true?
1. PDE is an application that runs on the Teradata RDBMS software.
2. AMPs manage system disks on the node.
3. The host channel adapter card connects to bus and tag cables through a Teradata
Gateway.
4. An Ethernet card is a hardware component used in the connection between a
network-attached client and the node.

Using the BYNET


The BYNET (pronounced, bye-net) is a high-speed interconnect (network) that enables
multiple nodes in the system to communicate. It has several unique features:

Scalable: As you add more nodes to the system, the overall network bandwidth scales
linearly. This linear scalability means you can increase system size without performance
penalty, and sometimes even increase performance.

High performance: An MPP system typically has two BYNET networks (BYNET 0 and
BYNET 1). Because both networks in a system are active, the system benefits from
having full use of the aggregate bandwidth of both the networks.

Fault tolerant: Each network has multiple connection paths. If the BYNET detects an
unusable path in either network, it will automatically reconfigure that network so all
messages avoid the unusable path. Additionally, in the rare case that BYNET 0 cannot be
reconfigured, hardware on BYNET 0 is disabled and messages are re-routed to BYNET 1.

Load balanced: Traffic is automatically and dynamically distributed between both
BYNETs.
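
The interplay of load balancing and fault tolerance can be pictured with a toy model of two networks. This is a sketch under stated assumptions: the class `DualNetwork` is invented here, and simple round-robin is used as the balancing rule only for illustration, since the text does not say how the BYNET actually distributes traffic.

```python
from itertools import count


class DualNetwork:
    """Toy model of two interconnects that share traffic and cover for each other."""

    def __init__(self):
        self.usable = {"BYNET 0": True, "BYNET 1": True}
        self._ticket = count()

    def disable(self, name):
        """Mark one network as unusable so that traffic is re-routed."""
        self.usable[name] = False

    def pick(self):
        """Choose a network for the next message, skipping disabled ones."""
        candidates = [name for name, ok in self.usable.items() if ok]
        if not candidates:
            raise RuntimeError("no usable network")
        # Round-robin is an assumption made for this sketch.
        return candidates[next(self._ticket) % len(candidates)]


if __name__ == "__main__":
    net = DualNetwork()
    print([net.pick() for _ in range(4)])  # traffic alternates between both networks
    net.disable("BYNET 0")
    print([net.pick() for _ in range(2)])  # all traffic now uses BYNET 1
```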
BYNET Hardware and Software
The BYNET hardware and software handle the communication between the vprocs and the
nodes.

Hardware: The nodes of an MPP system are connected with the BYNET hardware,
consisting of BYNET boards and cables.

Software: The BYNET software is installed on every node. This BYNET driver is an
interface between the PDE software and the BYNET hardware.
SMP systems do not contain BYNET hardware. The PDE and BYNET software
emulate BYNET activity in a single-node environment. The SMP implementation
is sometimes called boardless BYNET.

Communication Between Nodes


The BYNET hardware can carry the following types of messages between nodes:

Broadcast message to all nodes


Point-to-point message from one node to another node

Communication Between Vprocs


On an MPP system, BYNET hardware is used to first send the communication across nodes
(using either the point-to-point or broadcast messaging described previously).
On an SMP system, this first step is unnecessary since there is only one node.
Once a node receives a communication, vproc communication within the node is done by the
PDE and BYNET software using the following types of messaging:

Point-to-point
Multicast
Broadcast

Point-to-Point Messages
With point-to-point messaging between vprocs, a vproc can send a message to another vproc
on:

The same node (using PDE and BYNET software)


A different node using two steps:
1. Send a point-to-point message from the sending node to the node containing
the recipient vproc. This is a communication between nodes using the BYNET
hardware.
2. Within the recipient node, the message is sent to the recipient vproc. This is a
point-to-point communication between vprocs using the PDE and BYNET
software.

Point-to-Point Message on the Same Node

Point-to-Point Message on a Different Node

Multicast Messages
A vproc can send a message to multiple vprocs using two steps:
1. Send a broadcast message from the sending node to all nodes. This is a
communication between nodes using the BYNET hardware.
2. Within each recipient node, the PDE and BYNET software determine which, if any, of
the node's vprocs should receive the message and deliver the message accordingly. This
is a multicast communication between vprocs within the node, using the PDE and BYNET
software.

Broadcast Messages
A vproc can send a message to all the vprocs in the system using two steps:


1. Send a broadcast message from the sending node to all nodes. This is a communication
between nodes using the BYNET hardware.
2. Within each recipient node, the message is sent to all vprocs. This is a broadcast
communication between vprocs using the PDE and BYNET software.
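The two-step delivery described in this section can be sketched as a small simulation. This is an illustrative model only, not Teradata code: the `Node` class and the `send` function are hypothetical names. Step 1 stands in for the BYNET hardware (point-to-point or broadcast between nodes) and step 2 for the PDE and BYNET software that delivers to vprocs inside a node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node; `vprocs` maps a vproc id to the list of messages it received."""
    name: str
    vprocs: dict = field(default_factory=dict)

    def deliver(self, message, targets=None):
        # Step 2: software delivery inside the node.
        # targets=None means "all vprocs on this node" (broadcast).
        for vproc_id, inbox in self.vprocs.items():
            if targets is None or vproc_id in targets:
                inbox.append(message)


def send(nodes, message, targets=None):
    """Route `message` to the vprocs in `targets` (None = every vproc).

    Returns the node-level message type used in step 1.
    """
    if targets is not None and len(targets) == 1:
        # Point-to-point: only the node hosting the recipient is contacted.
        (target,) = targets
        owner = next(node for node in nodes if target in node.vprocs)
        owner.deliver(message, targets)
        return "point-to-point"
    # Multicast and broadcast both begin with a node-level broadcast;
    # each node then decides which of its own vprocs receive the message.
    for node in nodes:
        node.deliver(message, targets)
    return "broadcast"
```

Calling `send(nodes, "m", {"amp2"})` models a point-to-point message, a set with several vproc ids models a multicast, and omitting `targets` models a broadcast to every vproc in the system.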

Questions
1. What types of messages is BYNET hardware capable of sending between nodes on a system?
(Check all that apply.)
A. Broadcast
B. Multicast
C. Point-to-point
D. Simulcast

2. When a message is delivered to a node using BYNET hardware and software, PDE software
on the node has the ability to route the message to: (Check all that apply.)
A. A single vproc on a node
B. A group of vprocs on a node
C. All vprocs on a node
D. None of the above

Cliques
A clique (pronounced "kleek") is a group of nodes that share access to the same disk arrays.
Each multi-node system has at least one clique. The cabling determines which nodes are in
which cliques: the nodes of a clique are connected to the disk array controllers of the same
disk arrays.


Cliques Provide Resiliency


In the rare event of a node failure, cliques provide for data access through vproc migration.
When a node resets, the following happens to the AMPs:
1. When the node fails, the Teradata RDBMS restarts across all remaining nodes in the
system.
2. The vprocs from the failed node migrate to the operational nodes in its clique.
3. Processing continues while the failed node is being repaired.
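The migration step can be illustrated with a short sketch. It assumes a hypothetical layout structure (clique name mapped to nodes, nodes mapped to vproc ids) and spreads the orphaned vprocs round-robin over the surviving nodes of the same clique. The text does not say how Teradata chooses the target node for each vproc, so that policy is an assumption made only for illustration.

```python
def migrate_vprocs(cliques, failed_node):
    """Return a new layout with the failed node's vprocs moved within its clique.

    `cliques` maps a clique name to {node name: [vproc ids]}.
    The input is left unchanged.
    """
    layout = {
        clique: {node: list(vprocs) for node, vprocs in nodes.items()}
        for clique, nodes in cliques.items()
    }
    for nodes in layout.values():
        if failed_node not in nodes:
            continue
        orphans = nodes.pop(failed_node)
        survivors = sorted(nodes)
        if not survivors:
            raise RuntimeError("no operational node left in the clique")
        # Assumption: distribute orphaned vprocs round-robin over survivors.
        for i, vproc in enumerate(orphans):
            nodes[survivors[i % len(survivors)]].append(vproc)
        return layout
    raise KeyError(failed_node)
```

Vprocs never cross a clique boundary in this model, because only nodes cabled to the same disk arrays can reach the failed node's data.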

Cliques in a System
Vprocs are distributed across all nodes in the system. There are two recommendations for
cliques:

Maximum of four nodes per clique.


Multiple cliques in the system should have the same number of nodes.

The diagram below shows three cliques. The nodes in each clique are cabled to the same disk
arrays. The overall system is connected by the BYNET.


Software Components
A Teradata node requires three distinct pieces of software:

For each node in the system, you need both of the following:

Operating system license (UNIX or Microsoft Windows)


Teradata software license

Operating System


The Teradata RDBMS can run on the following operating systems:

UNIX MP-RAS
Microsoft Windows

Parallel Database Extensions (PDE)


The Parallel Database Extensions (PDE) software layer was added to the operating system by
NCR to support the parallel software environment.

Trusted Parallel Application (TPA)


A Trusted Parallel Application (TPA) uses PDE to implement virtual processors (vprocs). The
Teradata RDBMS is classified as a TPA. The four components of the Teradata TPA are:

AMP (Top Right)


PE (Bottom Right)
Channel Driver (Top Left)
Teradata Gateway (Bottom Left)

Teradata Software: PE
A Parsing Engine (PE) is a vproc that manages the dialogue between a client application and
the Teradata RDBMS, once a valid session has been established. Each PE can support a
maximum of 120 sessions. The PE handles an incoming request in the following manner:
1. The Session Control component verifies the request for session authorization (user
names and passwords), and either allows or disallows the request.
2. The Parser does the following:
   o Interprets the SQL statement received from the application.
   o Verifies SQL requests for the proper syntax.
   o Consults the Data Dictionary to ensure that all objects exist and that the user
     has authority to access them.
3. The Optimizer is parallel aware, meaning that it has knowledge of the system
components (how many nodes, vprocs, etc.). The Optimizer develops the least expensive plan
(in terms of time) to return the requested response set by evaluating alternative plans and
choosing the fastest one. The plan is converted into executable steps and passed on to the
Dispatcher.
4. The Dispatcher controls the sequence in which the steps are executed and passes the
steps on to the BYNET for execution by the AMPs.


5. After the AMPs process the steps, the PE receives their responses over the BYNET.
6. The Dispatcher builds a response message and sends the message back to the user.

PE Session Control
When you log on to the Teradata RDBMS through your application, the session control software
on the PE establishes that session. Session control also manages and terminates sessions on
that PE.
PE Parser
The Parser is a component of the PE and performs the following tasks:

Interprets an incoming Teradata SQL request and checks the syntax, evaluating the
request semantically.

Consults the Data Dictionary to ensure that all the objects exist and that the user has
authority to access them.

Decomposes the request into manageable pieces of work called AMP steps.

Sends the optimized steps to the Dispatcher.


PE Optimizer
The Optimizer component of the PE determines how a request is executed by the system. The
Optimizer is parallel aware, meaning that it accounts for the parallel environment in which
the AMP steps are processed. The Optimizer develops the least expensive plan (in terms of
time and system resources) for returning the requested response. Processing alternatives are
evaluated, and the fastest alternative is chosen. The selected alternative is converted to
executable steps that will be performed by the AMPs.
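The idea of evaluating alternatives and keeping the cheapest can be sketched as follows. The `Plan` class, its cost field, and the example plan names are hypothetical; the real Optimizer's cost model is not described in this text, so a single estimated-time number stands in for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    estimated_seconds: float  # hypothetical cost estimate
    steps: tuple              # the AMP steps this plan would run


def choose_plan(candidates):
    """Pick the candidate plan with the lowest estimated cost."""
    if not candidates:
        raise ValueError("no candidate plans")
    return min(candidates, key=lambda plan: plan.estimated_seconds)
```

The steps of the chosen plan are what would be handed to the Dispatcher.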
PE Dispatcher
The Dispatcher is responsible for a number of tasks, depending on the operation it is
performing:

Processing Requests: Controls the sequence in which the steps are executed and
passes the steps to the AMPs through the BYNET.

Processing Responses: After the AMPs process the steps, the Dispatcher builds a
response message and sends the response back to the user.
In Teradata RDBMS V2R5, the Parser and Dispatcher components are combined into the same
module for better performance, but their functionality remains the same.
Teradata Software: AMP
The AMP is a vproc that controls its portion of the data on the system. The AMPs work in
parallel, each AMP managing the data rows stored on its vdisk. AMPs are involved in data
distribution and data access in different ways.


AMPs (Access Module Processors) are virtual processors (vprocs) that receive steps
from PEs (Parsing Engines) and perform database functions to retrieve or update data. Each
AMP is associated with one virtual disk (vdisk), where the data is stored. An AMP manages only
its own vdisk, not the vdisk of any other AMP.
Data Distribution
When data is loaded, inserted, or updated, the AMP receives incoming data from the PE,
formats the rows, and distributes them on its vdisk.
Data Access
When data is accessed, the AMP retrieves the rows requested by the PE in the following
manner:
1. The database management subsystem receives the steps from the Dispatcher over the
BYNET.
2. The database management subsystem processes the steps. The subsystem on the AMP
can:
   o Lock databases and tables
   o Create, modify, or delete definitions of tables
   o Join tables
   o Insert, delete, or modify rows within tables
   o Sort, aggregate, or format data
   o Retrieve information from definitions and rows from tables
3. The database management subsystem returns responses over the BYNET to the
Dispatcher.
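A toy model may help show why AMPs can work in parallel: each one owns a disjoint set of rows and answers a step using only its own vdisk. The `Amp` class, `distribute`, and `dispatch` below are hypothetical names, and assigning rows by hashing a key column is an assumption made for the sketch; threads stand in for the parallel vprocs.

```python
from concurrent.futures import ThreadPoolExecutor


class Amp:
    """Toy AMP that owns the rows on its own vdisk and nothing else."""

    def __init__(self, amp_id):
        self.amp_id = amp_id
        self.vdisk = []  # rows stored by this AMP only

    def run_step(self, predicate):
        # Each AMP scans only its own vdisk.
        return [row for row in self.vdisk if predicate(row)]


def distribute(amps, rows, key):
    # Assumption: a row is assigned to an AMP by hashing a key column.
    for row in rows:
        amps[hash(row[key]) % len(amps)].vdisk.append(row)


def dispatch(amps, predicate):
    """Send the same step to every AMP and merge the partial answers."""
    with ThreadPoolExecutor(max_workers=len(amps)) as pool:
        partials = list(pool.map(lambda amp: amp.run_step(predicate), amps))
    return [row for part in partials for row in part]
```

The merged result has no guaranteed order, which mirrors the point made later that query results are unsorted unless a sort is requested.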

AMP Worker Task Functions


The AWT functions in the AMP perform a number of operations, including:

Locking tables to ensure data consistency.


Executing AMP step operations such as select, insert, update, delete and sort.
Joining tables as required.
Executing end transaction steps as required to support multi-AMP operations.

AMP File System


The file system software accesses the data on the virtual disks. Each AMP uses the file system
software to read from and write to the virtual disks.
AMP Console Utilities
The AMP software includes utilities to perform systems management functions such as:


Configure and reconfigure the system


Rebuild tables
Reveal details about locks and space status

Review Questions
Node Failure causes vprocs to migrate to other nodes.
BYNET Hardware carries the communication between nodes in a system.
Clique is a group of nodes with access to the same disk arrays.
A clique allows processing to continue if an MPP Node has a failed BYNET
A copy of BYNET Hardware is installed on each node in the system.
Client Access
Aim

Explain the relationship between the Teradata RDBMS and its client applications.
Illustrate how the Teradata RDBMS processes a request.
Describe how the clients access the Teradata RDBMS.
Describe the Teradata client utilities and their use.

Users can access data in the Teradata RDBMS through an application on both channel-attached
and network-attached clients. Additionally, the node itself can act as a client. Teradata client
software is installed on each client (channel-attached, network-attached, or node) and
communicates with RDBMS software on the node. You may occasionally hear either type of
client referred to by the legacy term "host", though this term is not typically used in
documentation or product literature.

Channel-Attached Client

Channel-attached clients are IBM-compatible mainframe systems supported by the Teradata
RDBMS. The following software components installed on the mainframe are responsible for
communications between client applications and the Channel Driver on a Teradata node:

Teradata Director Program (TDP) software to manage session traffic, installed on the
channel-attached client.



Call-Level Interface (CLI), a library of routines that are the lowest-level interface to
Teradata.
Communication with the Teradata System
Communication from client applications on the mainframe goes through the mainframe
channel, to the Host Channel Adapter on the node, to the Channel Driver software.

Communication from applications on the mainframe (represented by colored balls) goes
through the Channel Driver software, installed on a node.
Network-Attached Client

The Teradata RDBMS supports network-attached clients connected to the node over a LAN. The
following software components installed on the network-attached client are responsible for
communication between client applications and the Teradata Gateway on a Teradata node:

ODBC
ODBC (Open Database Connectivity) is an application programming standard that defines
common database access mechanisms to simplify the exchange of data between a client and
server. ODBC-compliant applications connect with a database through the use of a driver that
translates the application's ODBC commands into database syntax.

CLIv2
CLIv2 (Call-Level Interface, Version 2) is a library of routines that enable an application
program to access data stored in the Teradata RDBMS. When used with network-attached
clients, CLIv2 contains the following components:

CLI (Call-Level Interface)


MTDP (Micro Teradata Director Program)
MOSI (Micro Operating System Interface)
Node


The node is considered a network-attached client. If you install application software on a node,
it will be treated like an application on a network-attached client. In other words,
communications from applications on the node go through the Teradata Gateway. An application
on a node can be executed through:

System Console that manages an SMP system.


Remote login, such as over a network-attached client connection.

Communication from applications on the node (represented by colored balls) goes through the
Teradata Gateway. The node processes these sessions the same way as network-attached
client sessions.
Question
Which of the following can you use to run an application that is installed on a node? (Check all
that apply.)
A. Mainframe terminal
B. Bus terminal
C. System console
D. Network-attached workstation
Request Processing
Query: How many widgets had more than 15% profit margin for Eastern region in
the last quarter?
A request like the one above is processed a little differently, depending on whether the user is
accessing Teradata through a channel-attached or network-attached client:
1. SQL request is sent from the client to the appropriate component on the node:
   o Channel-attached client: request is sent to Channel Driver (through the TDP).
   o Network-attached client: request is sent to Teradata Gateway (through CLIv2 or
     ODBC).
2. Request is passed to the PE(s).
3. PEs parse the request into AMP steps.
4. PE Dispatcher sends steps to the AMPs over the BYNET.
5. AMPs perform operations on data on the vdisks.
6. Response is sent back to PEs over the BYNET.
7. PE Dispatcher receives response.
8. Response is returned to the client (channel-attached or network-attached).
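The eight steps differ only at the entry and exit points, which the following sketch makes explicit. The `ENTRY_POINTS` table and `request_path` function are hypothetical names used to restate the list as data; they are not part of any Teradata interface.

```python
ENTRY_POINTS = {
    # client type: (client-side software, receiving component on the node)
    "channel-attached": ("TDP", "Channel Driver"),
    "network-attached": ("CLIv2 or ODBC", "Teradata Gateway"),
}


def request_path(client_type):
    """Return the ordered hops for one request from the given client type."""
    try:
        client_software, node_component = ENTRY_POINTS[client_type]
    except KeyError:
        raise ValueError(f"unknown client type: {client_type!r}") from None
    return [
        f"client sends SQL through {client_software} to {node_component}",
        "request is passed to the PE",
        "PE parses the request into AMP steps",
        "Dispatcher sends steps to the AMPs over the BYNET",
        "AMPs operate on data on their vdisks",
        "response returns to the PE over the BYNET",
        "Dispatcher receives the response",
        f"response is returned to the {client_type} client",
    ]
```

Steps 2 through 7 are identical for both client types, so everything after the Channel Driver or the Teradata Gateway is independent of how the client is attached.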
Mainframe Request Flow

1. Request sent through TDP and CLI to Channel Driver
2. Request passed to PE[s]
3. Parsing
4. Dispatcher sending it to AMPs
5. Read/insert/update operations.
6. Sending response back to PE[s]
7. Dispatcher receiving
8. Response returned to mainframe


Workstation Request Flow


1. Request sent via CLIv2 or ODBC to Teradata Gateway
2. Request passed to PE[s]
3. Parsing
4. Dispatcher sending it to AMPs
5. Read/insert/update operations.
6. Sending response back to PE[s]
7. Dispatcher receiving
8. Response returned to workstation

Teradata Client Utilities


Teradata has a robust suite of client utilities that enable users and system administrators to
enjoy optimal response time and system manageability. Various client utilities are available for
tasks from loading data to managing the system.
Teradata utilities leverage Teradata's high-performance capabilities and are fully parallel and
scalable. The same utilities run on smaller entry-level systems, as well as the largest MPP
implementations.
Teradata client utilities include the following, described in this section:

Query Submitting Utilities
   o BTEQ
   o Queryman/Teradata SQL Assistant
Load and Unload Utilities
   o FastLoad
   o MultiLoad
   o TPump
   o FastExport
   o Teradata Warehouse Builder
Administrative Utilities
   o Teradata Manager
   o DBQM/Teradata Dynamic Query Manager
Archive Utilities
   o ARC
   o NetVault
   o NetBackup
   o ASF2
Query Submitting Utilities
Teradata provides a number of tools that are front-end interfaces for submitting SQL queries.
Two mentioned in this section are BTEQ and Queryman.
BTEQ
BTEQ (Basic Teradata Query) -- often pronounced BEE-teek -- is a Teradata tool used for
submitting SQL queries on all platforms. BTEQ provides the following functionality:

Standard report writing and formatting


Basic import and export of small amounts of data to and from the Teradata RDBMS across
all platforms. For tables of more than a few thousand rows, the Teradata load utilities are
recommended for greater efficiency.
Ability to submit SQL requests in the following ways:
o Interactive
o Batch

Queryman
Queryman is an information discovery/query tool that runs on Microsoft Windows. Queryman
enables you to access ODBC-based databases (including Teradata). Some of its features include:

Ability to save data in PC-based formats, such as Microsoft Excel, Microsoft Access, and
text files.
History of submitted SQL syntax, to help you build scripts for data mining and
knowledge discovery.
Help with SQL syntax.
Import and export of small amounts of data to and from ODBC-compliant databases. For
tables of more than a few thousand rows, the Teradata load utilities are recommended for
greater efficiency.


In Teradata V2R5.0, Queryman has been renamed to Teradata SQL Assistant.


Data Load and Unload Utilities
In a data warehouse environment, the database tables are populated from a variety of sources,
such as mainframe applications, operational data marts, or other distributed systems
throughout a company. These systems are the source of data such as daily transaction files,
orders, usage records, ERP (enterprise resource planning) information, and Internet statistics.
Teradata has a suite of data load and unload utilities optimized for use with the Teradata
RDBMS. They run on any of the supported client platforms:

Channel-attached client
Network-attached client
Node

Using Teradata Load and Unload Utilities


Teradata load and unload utilities are fully parallel. Because the utilities are scalable, they
accommodate the size of the system. Performance is not limited by the capacity of the load and
unload tools.
The utilities have full restart capability. If a load or unload job is
interrupted for some reason, it can be restarted from the last checkpoint, without having
to start the job from the beginning.
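Checkpoint restart can be illustrated with a minimal sketch. It is not how the Teradata utilities are implemented: the function name, the `state` dictionary, and the `fail_at` parameter (which simulates an interruption) are all hypothetical. The point is only that work done before the last checkpoint is kept and work after it is redone.

```python
def load_with_checkpoints(rows, target, state, checkpoint_every=2, fail_at=None):
    """Append `rows` to `target`, resuming from state["checkpoint"].

    `fail_at` simulates an interruption just before the row at that index.
    """
    start = state.get("checkpoint", 0)
    # Discard anything written after the last checkpoint of a previous run.
    del target[start:]
    for index in range(start, len(rows)):
        if fail_at is not None and index == fail_at:
            raise RuntimeError(f"interrupted at row {index}")
        target.append(rows[index])
        if (index + 1) % checkpoint_every == 0:
            state["checkpoint"] = index + 1
    state["checkpoint"] = len(rows)
    return target
```

A smaller `checkpoint_every` means less rework after a failure at the price of recording progress more often.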
The load and unload utilities are:

FastLoad
MultiLoad
TPump
FastExport
Teradata Warehouse Builder

By default, you can run up to 15 instances of FastLoad, MultiLoad, and FastExport in any
combination. There is no limit to the number of concurrent TPump jobs.
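The rule reads as a single shared pool of 15 slots for the three block-level utilities, with TPump outside the pool. A small sketch of that reading, using hypothetical names (`can_start` is not a Teradata function):

```python
LIMITED_UTILITIES = {"FastLoad", "MultiLoad", "FastExport"}
UNLIMITED_UTILITIES = {"TPump"}
DEFAULT_LIMIT = 15  # default stated in the text


def can_start(utility, running, limit=DEFAULT_LIMIT):
    """Return True if one more `utility` job may start given `running` jobs."""
    if utility in UNLIMITED_UTILITIES:
        return True
    if utility not in LIMITED_UTILITIES:
        raise ValueError(f"unknown utility: {utility!r}")
    # The three limited utilities share one pool, in any combination.
    in_use = sum(1 for job in running if job in LIMITED_UTILITIES)
    return in_use < limit
```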
FastLoad
Use the FastLoad utility to:

Load data into empty tables


Delete all rows from a populated table


FastLoad loads data into an empty table in parallel, using multiple sessions to transfer blocks of
data. FastLoad achieves high performance by fully exploiting the resources of the system. After
the data load is complete, the table can be made available to users.

MultiLoad
Use the MultiLoad utility to maintain tables by:

Inserting rows into a populated or empty table


Updating rows in a table
Deleting multiple rows from a table
MultiLoad can load multiple input files concurrently and work on up to five tables at a time,
using multiple sessions. MultiLoad is optimized to apply multiple rows in block-level operations.
MultiLoad usually is run during a batch window, and places a lock on the destination
table(s) to prevent user queries from getting inconsistent results before the data load or
update is complete.
Batch Window
A period of time during which a table or database is locked for maintenance (inserting,
updating, or deleting data). This is a traditional and commonly used method for maintaining
data in a relational database with changes such as:

This month's customer service records


This week's sales from all retail stores in the region
Today's transactions from all branches in the bank
During a batch window, users are temporarily unable to access the data in the locked tables
and databases that are being processed. The batch processing usually occurs at regular
intervals, such as every night, at the end of each week, or at the end of each month. In the
time periods between batch windows, the data in the database may not reflect all updates up
to the minute.

TPump
Use TPump to:

Continuously load, update, or delete data in tables


Update lower volumes of data using fewer system resources than other load utilities
Vary the resource consumption and speed of the data loading activity over time


The TPump utility complements MultiLoad as a data loading utility. A major difference is that
TPump uses row hash locks, which eliminates the need for table locks and "batch windows"
typical with MultiLoad. Users can continue to run queries during TPump data loads. In addition,
TPump is designed for smaller volumes of data than MultiLoad, and maintains up to 60 tables
at a time.
TPump has a dynamic throttle that operators can set to specify the percentage of system
resources to be used for an operation. This enables operators to set when TPump should run at
full capacity during low system usage, or within limits when TPump may affect other business
users of Teradata.
FastExport
Use the FastExport utility to transfer either of the following to a file on a client platform:

Table
View

FastExport is a data extract utility. It transfers large amounts of data using block transfers over
multiple sessions to a file on the network-attached or channel-attached client. Typically,
FastExport is run during a batch window, and the tables being exported are locked.

Teradata Warehouse Builder


Teradata Warehouse Builder is a high-performance data warehouse loading tool specifically
optimized for Teradata. It enables data extraction, transformation and loading processes
common to all data warehouses. Using built-in operators, Teradata Warehouse Builder combines
the functionality of the legacy Teradata utilities FastLoad, MultiLoad, FastExport, and TPump in a
single parallel environment. Its extensible environment includes support for Access Modules,
SQL Inserter, SQL Selector, and others. There is also an API (Application Programming Interface)
to add third party or custom data transformation to Teradata Warehouse Builder scripts. Using
multiple, parallel tasks, a single Teradata Warehouse Builder script can load data from disparate
sources into the Teradata RDBMS in the same job.
Teradata Warehouse Builder has a single, SQL-like scripting language, as well as a GUI to make
scripting faster and easier. Script converters are available to help transition any legacy FastLoad,
MultiLoad, FastExport, and TPump scripts on existing systems to Teradata Warehouse Builder
scripts.


A single Teradata Warehouse Builder job can load data from
multiple disparate sources into the Teradata RDBMS, as indicated by the green arrow.

Administrative Utilities
Administrative utilities use a graphical user interface (GUI) to monitor and manage various
aspects of a Teradata system.
Teradata Manager
Teradata Manager is a production and performance monitoring system that helps a DBA or
system manager to monitor, control, and administer one or more Teradata systems through a
GUI. Running on LAN-attached clients, Teradata Manager has a variety of tools and
applications to gather, manipulate, and analyze information about each Teradata RDBMS being
administered.
Some tasks that a Teradata Database Administrator (DBA) or system manager can perform
with the associated Teradata Manager applications (shown in bold text) include:

View the overall system performance (and drill down to identify problem areas) through
a graphical user interface - Alert Viewer
View near real-time resource usage data in chart format - Dynamic Utilization
Charting (DUC) (Using PMPC)
Monitor Teradata system performance and perform related production control functions - PMON
Monitor, identify, and abort sessions on the Teradata RDBMS - Session Information
Monitor usage of vprocs and run node usage macros - Resource History
Compare performance history over different periods of time - Performance Data
Analyzer
Perform database administration tasks on the Teradata RDBMS - WinDDI
Run many of the Teradata RDBMS console utilities from the Teradata Manager PC - Remote Console
Create a log to determine whether an application mix is causing delays because of
database lock contention - Locking Logger
Teradata Manager has a number of other applications that are useful in managing a Teradata
system.
Database Query Manager (DBQM)
Database Query Manager (DBQM) is a query workload management tool that dynamically
tunes the Teradata RDBMS. DBQM can run, suspend, reschedule, or reject a query based on
current workload and set thresholds.


For example, with DBQM a request can be scheduled to run periodically or during a specified
time period without an active system connection. Results can be retrieved any time after the
request has been submitted by DBQM and executed.
DBQM can restrict queries based on factors such as:

Analysis control thresholds


DBQM can restrict requests that will exceed a certain processing time, or whose expected
result set size exceeds a specified number of rows.

Object control thresholds


DBQM can limit access to and use of static criteria such as database objects and other items.
Object controls can control workload requests based on user IDs, tables, views, date, time,
macros, databases, and group IDs.

Environmental factors
DBQM can manage requests based on dynamic environment factors, including database
system CPU and disk utilization, network activity, and number of users.
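The three kinds of restriction can be pictured as checks applied to each incoming query. The sketch below is a simplified illustration with hypothetical names (`Query`, `Rules`, `decide`); which action DBQM actually takes for each kind of rule is configurable and not specified here, so the mapping of rules to "reject" or "reschedule" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Query:
    user: str
    estimated_seconds: float
    estimated_rows: int


@dataclass
class Rules:
    max_seconds: float        # analysis control threshold
    max_rows: int             # analysis control threshold
    blocked_users: frozenset  # object control (user IDs)
    max_cpu_percent: float    # environmental factor


def decide(query, rules, cpu_percent):
    """Return "reject", "reschedule", or "run" for one query."""
    if query.user in rules.blocked_users:
        return "reject"
    if (query.estimated_seconds > rules.max_seconds
            or query.estimated_rows > rules.max_rows):
        return "reject"
    # Assumption: a busy system defers the query instead of rejecting it.
    if cpu_percent > rules.max_cpu_percent:
        return "reschedule"
    return "run"
```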
Archival Utilities
Teradata has utilities specifically designed for data archive and recovery purposes. There are
different utilities for channel-attached clients and network-attached clients.
Archiving on Channel-Attached Clients
The Archive/Recovery (ARC) utility backs up data in a channel-attached (mainframe) client
environment. It supports commands written in Job Control Language (JCL). It is scalable and
parallel, and can run on a channel-attached client or a node.

Archiving on Network-Attached Clients


To back up data in the network-attached client environment, either of the following products is
used:

NetVault (from BakBone Software Inc.)


NetBackup (from VERITAS Software Corporation)


NetVault and NetBackup have modules created for Teradata systems for use in a scalable,
parallel, enterprise environment. They run on network-attached clients or a node (Microsoft
Windows or UNIX MP-RAS).
Some legacy Teradata systems may also use the ASF2 (Archive Storage Facility 2) utility for
backup and restore functions on a UNIX platform.

In Teradata RDBMS V2R5.0, the name of DBQM has been changed to Teradata Dynamic Query
Manager.
Questions
Select the appropriate Teradata load or unload utility from the pull-down menus.
TPump Maintains up to 60 tables at a time.
FastExport Data extract utility.
MultiLoad Updates, inserts, or deletes empty or populated tables.
FastLoad Uses parallel processing to load an empty table.
Which of the following statements are true? (Choose two.)
A. There are multiple Teradata utilities available for archiving data using a channel-attached
client.
B. The two utilities used for Teradata system management are DBQM and Queryman.
C. BTEQ runs on all client platforms to access the Teradata RDBMS.
D. ASF2 is used on legacy systems on a single environment.
E. NetVault and NetBackup are utilities used for network management.
TERADATA SQL
Aim of the module


After completing this module, you should be able to:

Define the role of SQL in accessing a relational database.


List the types of SQL commands.
Describe the use of the SELECT and EXPLAIN statements
Explain how macros are used in Teradata.

Teradata is accessed using SQL (Structured Query Language). SQL is the industry standard
access language for communicating with a relational database. It is a set-oriented language
included in the relational model. A user or application can use SQL statements to perform
operations on the data and define how an answer set should be returned from an RDBMS.
Teradata supports two types of SQL:

Generic SQL: Teradata SQL is compliant with ANSI standards (an industry standard).
Teradata SQL Extensions: NCR has added Teradata SQL extensions above and beyond
standard SQL capabilities, including one-step SQL statements for complex administrative
operations.
Teradata SQL Benefits
Teradata SQL is the set of SQL commands used with the Teradata RDBMS. Some benefits of
Teradata SQL are:

Parallel Execution - The Optimizer breaks up an SQL statement into tasks that can be
executed in parallel to minimize resource contention. The design of the Teradata
RDBMS, along with its automatic data distribution, balances the workload and reduces
bottlenecks.
ANSI Compliant - Teradata SQL is compliant with ANSI standards. If you have
programs already written with ANSI-compliant SQL for a different relational database,
you can run them with Teradata, as well.
High-Performance Extensions - NCR has added Teradata SQL extensions that are
above and beyond the standard SQL capabilities, including one-step SQL statements for
complex administrative operations.
Types of SQL Statements
SQL statements commonly are categorized as follows:

Data Definition Language (DDL)


Data Manipulation Language (DML)
Data Control Language (DCL)

Data Definition Language (DDL)


Data Definition Language (DDL) is used to define and create Users, Databases, and the objects
they contain (tables, views, macros, triggers, and stored procedures).
Examples:
CREATE - Define a new Database, User, database object, or index.
DROP - Remove an existing Database, User, database object, index, or statistics.
ALTER - Change table structure and protection definition, or enable and disable triggers.
Data Manipulation Language (DML)


Data Manipulation Language (DML) is used to work with data, including tasks such as inserting
data into a table, updating an existing record, or performing queries.
Examples:
SELECT - Perform relational query functions (Select, Project, Join, Union, Intersect,
Minus).
INSERT - Place a new row into a table.
UPDATE - Modify values in an existing row.
DELETE - Remove a row from a table.
Data Control Language (DCL)
The SELECT Statement
The SELECT statement is the most commonly used SQL statement. It is a DML
statement that allows you to retrieve data from one or more tables. In its most common form,
you specify certain rows to be returned as shown.
SELECT *
FROM employee
WHERE department_number = 401;
The asterisk, "*", is a "wild card" character. In this example, it specifies that when the result is
displayed, we want to see all the columns of the rows where the department number is 401.
The FROM clause specifies from which table in our database to retrieve the rows. The WHERE
clause acts as a filter that passes only rows meeting the specified condition -- in this case, rows
of employees in department 401.
NOTE: SQL does not require a trailing semicolon to end a statement, but the Basic Teradata
Query (BTEQ) utility that can be used to enter SQL statements does. The semicolon is used in
the examples, as if it were entered in BTEQ.
If you do not specify a WHERE clause, the query would return all columns and all rows from the
employee table, for example:
SELECT * FROM employee;
EMPLOYEE_  MANAGER_EMPLOYEE_  DEPARTMENT_  JOB_    LAST_     FIRST_   HIRE_   BIRTH_  SALARY_
NUMBER     NUMBER             NUMBER       CODE    NAME      NAME     DATE    DATE    AMOUNT
1006       1019               301          312101  Stein     John     761015  531015  2945000
1008       1019               301          312102  Kanieski  Carol    770201  580517  2925000
1005       0801               403          431100  Ryan      Loretta  761015  550910  3120000
1004       1003               401          412101  Johnson   Darlene  761015  460423  3630000
1007       1005               403          432101  Villegas  Arnando  770102  370131  4970000
1003       0801               401          411100  Trader    James    760731  470619  3785000

Returning a Subset of Columns


Instead of using the asterisk symbol to specify all columns, we could name specific columns
separated by a comma:


SELECT
employee_number
, hire_date
, last_name
, first_name
FROM
employee
WHERE
department_number = 401;
Unsorted Results
Results include the columns named in the SQL statement. The results are unsorted unless you
specify that you want them sorted in a certain way. How to retrieve ordered results is covered in
the following section.
employee_number  hire_date  last_name  first_name
1004             76/10/15   Johnson    Darlene
1003             76/07/31   Trader     James
1013             77/04/01   Phillips   Charles
1010             77/03/01   Rogers     Frank
1022             79/03/01   Machado    Albert
1001             76/06/18   Hoover     William
1002             76/07/31   Brown      Alan

The ORDER BY Clause


To have your results displayed in a sorted order, use the ORDER BY clause, for example:
ORDER BY hire_date;

Sort Order
Using this example, results are returned in ascending order. If a sort order is not specified, we
get results in ascending order by default. To specify ascending or descending order, add ASC or
DESC to the end of your ORDER BY clause. The following is an example of specifying the results
in ascending order.
SELECT
employee_number
,last_name
,first_name
,hire_date
FROM
employee
WHERE
department_number = 401
ORDER BY
hire_date ASC;

Output
employee_number  hire_date  last_name  first_name
1001             76/06/18   Hoover     William
1003             76/07/31   Trader     James
1002             76/07/31   Brown      Alan
1004             76/10/15   Johnson    Darlene
1010             77/03/01   Rogers     Frank
1013             77/04/01   Phillips   Charles
1022             79/03/01   Machado    Albert

Naming


Specify the column to sort on by either naming it directly (for example, hire_date) or by naming
its position within the SELECT statement. Since hire_date is the fourth column in the SELECT
clause, the following SQL statement is equivalent to the one in the example above:
ORDER BY 4 ASC;
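
The equivalence of sorting by column name and by column position can be tried out with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Teradata (an assumption made only so the example runs anywhere), and loads the department 401 rows shown above. A second sort key on employee_number is added so that rows with the same hire date come back in a predictable order.

```python
import sqlite3

# In-memory SQLite stands in for Teradata; only ORDER BY behavior is illustrated.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "employee_number INTEGER, last_name TEXT, first_name TEXT, "
    "hire_date TEXT, department_number INTEGER)"
)
rows = [
    (1004, "Johnson", "Darlene", "76/10/15", 401),
    (1003, "Trader", "James", "76/07/31", 401),
    (1013, "Phillips", "Charles", "77/04/01", 401),
    (1010, "Rogers", "Frank", "77/03/01", 401),
    (1022, "Machado", "Albert", "79/03/01", 401),
    (1001, "Hoover", "William", "76/06/18", 401),
    (1002, "Brown", "Alan", "76/07/31", 401),
]
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", rows)

base = (
    "SELECT employee_number, last_name, first_name, hire_date "
    "FROM employee WHERE department_number = 401 "
)

# Sort by column name, then by position (hire_date is the 4th selected column).
by_name = conn.execute(base + "ORDER BY hire_date ASC, employee_number ASC").fetchall()
by_position = conn.execute(base + "ORDER BY 4 ASC, 1 ASC").fetchall()
descending = conn.execute(base + "ORDER BY hire_date DESC, employee_number DESC").fetchall()

for row in by_name:
    print(row)
```

Sorting by position is convenient for quick queries, but naming the column keeps the statement correct if the SELECT list is later reordered.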

Data Control Language (DCL) is used for administrative tasks such as granting and revoking
privileges to database objects or controlling ownership of those objects.
Examples:
GRANT - Give user privileges.
REVOKE - Remove user privileges.
GIVE - Transfer database ownership.
User Assistance Statements and Modifiers
SQL user assistance statements (and modifiers) vary widely from database vendor to
database vendor. Teradata's user assistance statements are commonly called Teradata
extensions. These Teradata extensions are additions to the DDL, DML, and DCL statements in
standard SQL, and make some operations less time consuming.
This page discusses the following Teradata SQL user assistance commands:

HELP
HELP SESSION
SHOW

This page also discusses the statement modifier:

EXPLAIN

The HELP Statement


The HELP statement is used to display information about database objects. You can get
help on the following:
HELP DATABASE
HELP USER
HELP TABLE
HELP VIEW
HELP MACRO
HELP TRIGGER
HELP PROCEDURE
HELP COLUMN
HELP INDEX
HELP STATISTICS
. . . and much more!
Example:
HELP DATABASE databasename


Displays all the objects in the specified database.


Example Output of HELP DATABASE statement.

The HELP SESSION Statement


Use the HELP SESSION statement to see specific information about your SQL session.
Example:
HELP SESSION;
Displays the user name with which you logged in, the log-on date and time, your default
database, and other information related to your current session.
Example Output of HELP SESSION statement.

The SHOW Statement


Use the SHOW statement to display the data definition language (DDL) associated with
database objects (tables, views, macros, triggers, or stored procedures). You can show the DDL
for the following:
SHOW TABLE
SHOW VIEW
SHOW MACRO
SHOW TRIGGER
SHOW PROCEDURE
SHOW JOIN INDEX

Example:
SHOW TABLE tablename
Displays the CREATE TABLE statement that was used to create the specified table.
Example Output of SHOW TABLE statement.

The EXPLAIN Modifier


The EXPLAIN modifier allows you to preview how Teradata will execute an SQL request.
It is a good way to see what database resources will be used in processing the request. Use the
EXPLAIN modifier preceding any SQL statement to see a plan with:


English text describing a plan for how the statement will be processed.
An estimate of the number of rows involved.
A relative cost of the request.

The relative cost is shown in units of time, and should not be used to predict actual response
time for an SQL request. This time estimate can be used to compare the duration of request
processing relative to other plans.
When you execute a request preceded by the EXPLAIN modifier, the request is not executed.
Instead, the system:

Fully parses the request.


Optimizes the request.
Reports the complete plan for executing the request in readable English.

Example:
EXPLAIN SELECT * FROM tablename;

Displays the steps involved in processing the request SELECT * FROM the specified table.

Example Output of using the EXPLAIN modifier.
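
The point that an explained request is planned but not executed can be illustrated with a rough analog. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. This is not Teradata's EXPLAIN, and its output format is different; it is used here only as an assumption-laden stand-in to show the idea. The table and its rows are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_number INTEGER, department_number INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(1001, 401), (1002, 401), (1003, 403)],
)

# SQLite analog of an explain request: the DELETE is planned, not run.
plan = conn.execute(
    "EXPLAIN QUERY PLAN DELETE FROM employee WHERE department_number = 401"
).fetchall()

# The table still holds every row because the DELETE was never executed.
remaining = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

for step in plan:
    print(step)
print("rows still in table:", remaining)
```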

What Is Teradata?

Teradata is a relational database management system (RDBMS) that provides the
foundation to give a company the power to grow, to compete in today's dynamic
marketplace, and to evolve the business by getting answers to a new generation of
questions. Teradata's scalability allows the system to grow as the business grows,
from gigabytes to terabytes and beyond. Teradata's unique technology has been
proven at customer sites across industries and around the world.
How Is Teradata Used?
Each Teradata implementation can model a company's business. The ability to keep up
with rapid changes in today's business environment makes Teradata an ideal
foundation for many applications, including:

Enterprise data warehousing


Enterprise Data Warehouse

Data warehousing is a process for properly assembling and managing data from
various servers to answer business-critical questions. Teradata is ideal for enterprise
data warehousing, which is commonly characterized by:

1. Multiple subject areas
2. Many concurrent users
3. Many concurrent queries, including ad-hoc queries
4. Large quantity of tables
5. Hundreds of gigabytes (and terabytes) of detail data
6. Historical data stored (months or years)


Without an enterprise data warehouse, a financial institution may be able to identify
profitable customers for separate products such as mortgages or credit cards, but not
know the overall profitability of each customer. An enterprise data warehouse brings
together the different subject areas into a central repository, creating one single
version of the truth for a complete picture of the customer.
An enterprise data warehouse environment built on Teradata simplifies the system
maintenance task, resulting in a lower total cost of ownership. In addition, Teradata's
ability to handle large-scale, decision-support queries against huge volumes of detail
data makes it the obvious choice for companies wanting to start at any level and
grow.

Active data warehousing


Active Data Warehouse

The active data warehouse extends a company's ability beyond historical data and
strategic decisions to bring the decision-making capability to front-line personnel.
Tactical decisions such as "Who should get the empty seat on this airplane?" or "What
should I offer this customer to keep her from leaving, based on her history with our
company?" can be made more effectively with the right information.
With an active data warehouse, employees who interact directly with customers and
suppliers are empowered with information-based decision making at their fingertips.
The Teradata Warehouse supports active data warehousing with:

Capability to handle thousands of additional users and mixed workloads
High availability and reliability to support mission-critical applications
Scalability to accommodate an increase in the amount of data, the number of data sources, and the number of applications supported in the data warehouse environment

Customer relationship management

Customer Relationship Management solutions help companies capture and
analyze data to maximize customer acquisition, retention, and profitability. You can
use Teradata's detailed data and analysis capabilities to identify and optimize business
relationships with the highest potential of profitability and growth. Examples include:

A telephone company can conduct and refine marketing programs targeted at a certain type of profitable customer.
A supermarket can create incentives based on specific combinations of products that customers tend to buy together.
A bank can recognize changes in a customer's life circumstances, such as a new baby or a college-bound son or daughter, and offer timely services such as a new home loan, mortgage insurance, additional checking account, extra credit card, or student loan.
A retailer can run a department store credit card sales program and filter out those customers who already have that card.

The NCR CRM solution consists of software, professional and customer services, and
the Teradata RDBMS to create, maintain, and enhance customer relationships.

Internet and E-Business

Teradata provides a single repository for customer information that helps
E-Businesses build and maintain one-to-one customer relationships that are critical to
their success on the Internet. Teradata supports the fast-paced style of E-Business by
allowing many concurrent users to ask complicated questions as they think of them
and get quick answers.
Teradata allows E-Businesses to:

Capture massive amounts of click-stream data.
Enable multiple users to ask complex questions of the customer click-stream data with near real-time response.
Protect customers' privacy with consumer opt-in/opt-out preferences and ability for consumers to check and revise their information stored on the Teradata database through the Internet or a company call center.

Data marts

A Teradata system may start out small as a data mart, and be easily expanded with
linear scalability to include more subject areas, applications, and users.
A data mart is a special purpose subset of a company's enterprise data used by a
particular department, function, or application. Often, these single-subject area data
marts contain data that was aggregated or transformed in some way to better handle
the requests of a specific user community. Vendors implement data marts using
different architectures:

Independent data marts - Created directly from operational
systems to an individual data store.
Dependent data marts - Created from detail data in the data
warehouse. It still requires movement and transformation of data, but
may provide better performance for some specific user queries.
Logical data marts - Existing parts of the data warehouse, not
separate physical structures. Because in theory the data warehouse
contains the detail data of the entire enterprise, a logical data mart
would then provide the specific information for a specific user
community. With the proper technology, this can be an ideal way to
remove the need for massive data loading and transforming.

Independent and dependent data marts are architectures endorsed by other database
vendors and tend to be associated with higher maintenance costs for physically
moving and maintaining the data, inconsistent data (and resulting inconsistent
decisions), and indirect ways to get the complete picture of the data. Teradata is ideal
for the logical data mart environment, where different user communities view subsets
of a single repository of enterprise data.
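
One common way to picture a logical data mart is as a view over the single repository. The sketch below is an illustration under that assumption, not a description of how any particular site implements it. It uses Python's sqlite3 module as a stand-in for Teradata, and the table, view, and data (sales_detail, west_sales_mart) are hypothetical names invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Detail data is stored once, in the warehouse table.
CREATE TABLE sales_detail (sale_id INTEGER, region TEXT, amount INTEGER);
INSERT INTO sales_detail VALUES (1, 'West', 100), (2, 'East', 250), (3, 'West', 50);

-- The "logical data mart" is only a view: no rows are copied or moved.
CREATE VIEW west_sales_mart AS
    SELECT sale_id, amount FROM sales_detail WHERE region = 'West';
""")

before = conn.execute("SELECT SUM(amount) FROM west_sales_mart").fetchone()[0]

# New detail data is visible through the view immediately, with no reload step.
conn.execute("INSERT INTO sales_detail VALUES (4, 'West', 25)")
after = conn.execute("SELECT SUM(amount) FROM west_sales_mart").fetchone()[0]

print(before, after)
```

Because the view reads the detail table directly, there is no second copy of the data to load, transform, or keep consistent.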
Q: What do you think are some Teradata features that make it so successful in today's
business environment? (Details on the following are coming up next.)
Scalability.
Single data store.
High degree of parallelism.
Ability to model the business.
All of the above.

What Makes Teradata Unique?


In this Web-Based Training, you will learn about many features that make the
Teradata RDBMS right for business-critical applications. To start with, this section
covers these key features:

Single data store
Scalability
Unconditional parallelism
Ability to model the business

Single Data Store

Teradata acts as a single data store, with multiple client applications making inquiries
against it concurrently.
Instead of replicating a database for different purposes, with Teradata you store the data
once and use it for all clients. Teradata provides the same connectivity for an entry-level
system as it does for a massive enterprise data warehouse.
Scalability
Unconditional Parallelism

Teradata provides exceptional performance using parallelism to achieve a single
answer faster than a non-parallel system. Parallelism uses multiple processors
working together to accomplish a task quickly.
An example of parallelism can be seen at an amusement park, as guests stand in line
for an attraction such as a roller coaster. As the line approaches the boarding
platform, it typically will split into multiple, parallel lines. That way, groups of people
can step into their seats simultaneously. The line moves faster than if the guests step
onto the attraction one at a time. At the biggest amusement parks, the parallel
loading of the rides becomes essential to their successful operation.
Parallelism is evident throughout a Teradata system, from the architecture to data
loading to complex request processing. Teradata processes requests in parallel without
mandatory query tuning. Teradata's parallelism does not depend on limited data
quantity, column range constraints, or specialized data models: Teradata has
unconditional parallelism.
Ability to Model the Business

A data warehouse built on a business model contains information from across the
enterprise. Individual departments can use their own assumptions and views of the
data for analysis, yet these varying perspectives have a common basis for a single
version of the truth. Companies can get a cohesive view of their operations across
functional areas to:

Find out which divisions share customers.
Track products throughout the supply chain, from initial manufacture, to inventory, to sale, to delivery, to maintenance, to customer satisfaction.
Analyze relationships between results of different departments.
Determine if a customer on the phone has used the company's website.
Vary levels of service based on a customer's profitability.

You get consistent answers from the different viewpoints above using a single
business model, not functional models for different departments. In a functional
model, data is organized according to what is done with it. But what happens if users
later want to do some analysis that has never been done before? When a system is
optimized for one department's function, the other departments' needs (and future
needs) may not be met.
A Teradata system allows the data to represent a business model, with data organized
according to what it is, not what it does. The data model is the same regardless of
data volume. With Teradata as the enterprise data warehouse, users can ask new
questions of the data that were never anticipated, throughout the business cycle and
even through changes in the business environment.

What Is a Relational Database?


A database is a collection of permanently stored data that is:

Logically related (the data was created for a purpose).
Shared (many users may access the data).
Protected (access to the data is controlled).
Managed (the data integrity and value are maintained).

The Teradata RDBMS is a relational database. Relational databases are based on the
relational model, which has its foundation in the mathematical theory of sets. The
relational model uses and extends many principles of set theory to provide a
disciplined approach to data management.
Relational databases present data as a set of tables. A table is a two-dimensional
representation of data that consists of rows and columns. According to the relational
model, a valid table does not have to be populated with data rows; it just needs to be
defined with at least one column.
Tables are logically related to each other by a common field, so information such as
customer telephone numbers and addresses can exist in one table, yet be accessible
for multiple purposes. The example below shows customer, order, and billing
statement data, related by a common field. The common field of Customer ID lets you
look up information such as a customer name for a particular statement number, even
though the data exists in two different tables.
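
The lookup described above, finding a customer name for a statement number through the shared Customer ID, can be sketched as a join. The example uses Python's sqlite3 module as a stand-in for Teradata, and the table names, column names, and rows are hypothetical values chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER, customer_name TEXT);
CREATE TABLE statement (statement_number INTEGER, customer_id INTEGER);
INSERT INTO customer VALUES (1, 'Customer A'), (2, 'Customer B');
INSERT INTO statement VALUES (9001, 2), (9002, 1);
""")

def customer_name_for_statement(statement_number):
    """Follow the common customer_id field from the statement to the customer."""
    row = conn.execute(
        "SELECT c.customer_name "
        "FROM statement s JOIN customer c ON c.customer_id = s.customer_id "
        "WHERE s.statement_number = ?",
        (statement_number,),
    ).fetchone()
    return row[0] if row else None

print(customer_name_for_statement(9001))
```

The customer name is stored once, in the customer table, yet it is reachable from any table that carries the customer_id.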

Rows
Each row contains all the columns in the table, so each table has only one row format.
The sequence of rows is arbitrary and does not imply priority, hierarchy, or
significance.
Each row represents an occurrence of an entity defined by the table. An entity is a
person, place, or thing about which the table contains information. In this example,
the entity is the employee and each row represents a single employee.


Columns
Each column contains like data, such as only part names, or only supplier names, or
only employee numbers. In the example below, the Last_Name column contains last
names only, and nothing else. The data in the columns is atomic data, that is, data that
is indivisible and cannot be divided into smaller units that have meaning. For
example, a column that has both Last_Name and First_Name in it is not atomic,
because it can be divided further into components. Likewise, a telephone number might
be divided into three columns (the area code, the prefix, and the suffix) so the customer
data can be analyzed according to area code, and so on. Missing data values are
represented by nulls. Within a table, the column position is arbitrary.
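
A short sketch shows why atomic columns help with analysis. It uses Python's sqlite3 module as a stand-in for Teradata, and the table name customer_phone and the phone numbers are invented for the example. With the area code in its own column, grouping by area code needs no string parsing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_phone ("
    "customer_id INTEGER, area_code TEXT, prefix TEXT, suffix TEXT)"
)
conn.executemany(
    "INSERT INTO customer_phone VALUES (?, ?, ?, ?)",
    [
        (1, "310", "555", "0101"),
        (2, "310", "555", "0102"),
        (3, "212", "555", "0103"),
    ],
)

# Each (area_code, count) result row becomes one dictionary entry.
customers_per_area_code = dict(
    conn.execute("SELECT area_code, COUNT(*) FROM customer_phone GROUP BY area_code")
)
print(customers_per_area_code)
```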

Primary Key
In the relational model, a Primary Key (PK) is used to designate a unique identifier
for each row when you design a database. A Primary Key can be composed of one or
more columns. In the example below, the Primary Key is the employee number.
The relational model is a set of principles for relational databases, formalized by Dr.
E.F. Codd in the late 1960s. It prescribes how the data should be represented in terms
of structure, integrity, and manipulation. The rules of the relational model are widely
accepted in the information technology industry, though actual implementations may
vary.


Primary Key Rules


Rules governing how Primary Keys must be defined and how they function are:
Rule 1: Only one Primary Key per table.
Rule 2: A Primary Key value must be unique.
Rule 3: The Primary Key value cannot be NULL.
Rule 4: The Primary Key value should not be changed.
Rule 5: The Primary Key column should not be changed.
Rule 6: A Primary Key may be any number of columns.

Rule 1: One Primary Key


Each table must have one, and only one, Primary Key. In any given row, the value of
the Primary Key uniquely identifies the row. The Primary Key may span more than one
column, but even then, there is only one Primary Key.

Rule 2: Unique PK
Within the column(s) designated as the Primary Key, the values in each row must be
unique. No duplicate values are allowed. The Primary Key's purpose is to uniquely
identify a row. In a multi-column Primary Key, the combined value of the columns
must be unique, even if an individual column in the Primary Key has duplicate values.

Rule 3: PK Cannot Be NULL


Within the Primary Key column, each row must have a Primary Key value and cannot
be NULL (without a value). Because NULL is indeterminate, it cannot identify
anything.

Rule 4: PK Value Should Not Change


Primary Key values should not be changed. If you changed a Primary Key, you would
lose all historical tracking of that row.


Rule 5: PK Column Should Not Change


Additionally, the column(s) designated as the Primary Key should not be changed. If
you changed a Primary Key, you would lose all the information relating that table to
other tables.

Rule 6: No Column Limit


In the relational model, there is no limit to the number of columns that can be
designated as the Primary Key, so it may consist of one or more columns. In the
example below, the Primary Key consists of three columns: EMPLOYEE NUMBER, LAST
NAME, and FIRST NAME.
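
Rules 2, 3, and 6 can be tried out in a small sketch. It uses Python's sqlite3 module as a stand-in for Teradata and declares the three-column Primary Key from the example above. NOT NULL is written out explicitly because SQLite, unlike the relational model, can accept NULLs in primary key columns unless told otherwise. The helper try_insert is a hypothetical name introduced for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One Primary Key made of three columns (Rule 6); NOT NULL enforces Rule 3 in SQLite.
conn.execute(
    "CREATE TABLE employee ("
    "employee_number INTEGER NOT NULL, "
    "last_name TEXT NOT NULL, "
    "first_name TEXT NOT NULL, "
    "PRIMARY KEY (employee_number, last_name, first_name))"
)

def try_insert(values):
    """Return True if the row is accepted, False if a key rule rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", values)
        return True
    except sqlite3.IntegrityError:
        return False

first_row = try_insert((1004, "Johnson", "Darlene"))
# One key column repeats, but the combined value is still unique (Rule 2).
same_number_other_name = try_insert((1004, "Trader", "James"))
# The whole combination repeats, so the row is rejected (Rule 2).
duplicate_key = try_insert((1004, "Johnson", "Darlene"))
# A NULL in a key column is rejected (Rule 3).
null_in_key = try_insert((None, "Stein", "John"))

print(first_row, same_number_other_name, duplicate_key, null_in_key)
```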


Foreign Key
A Foreign Key (FK) is an identifier that links related tables. A Foreign Key defines how
two tables are related to each other. Each Foreign Key references a matching Primary
Key in another table in the database. For example, in the table below, the Department
Number column that is a Foreign Key actually exists in another table as a Primary Key.

Having tables related to each other gives users the flexibility to look at the data in
different ways, without the database administrator having to manage and maintain
many tables of duplicate data for different applications.
Foreign Key Rules
Rules governing how Foreign Keys must be defined and how they operate are:
Rule 1: Foreign Keys are optional.
Rule 2: A Foreign Key value may be non-unique.
Rule 3: The Foreign Key value may be NULL.
Rule 4: The Foreign Key value may be changed.
Rule 5: The Foreign Key column may be changed.
Rule 6: A Foreign Key may be any number of columns.
Rule 7: Each Foreign Key must exist as a Primary Key in the related table.

Rule 1: Optional FKs


Foreign Keys are optional; not all tables have them. Tables that do have them can
have multiple Foreign Keys because a table can relate to multiple other tables. In fact,
a table can have an unlimited number of foreign keys. In the example table below:

The Department Number Foreign Key relates to the Department Number Primary Key in the Department Table elsewhere in the database.
The Job Code FK relates to the Job Code PK in the Job Code Table, elsewhere in the database.

Having tables related to each other makes a relational database flexible so that
different users can look up information they need, while simplifying the database
administration so the data doesn't have to be duplicated for each purpose or
application.
Rule 2: Unique or Non-Unique FKs
Duplicate Foreign Key values are allowed. More than one employee could be assigned
to the same department.

Rule 3: FKs Can Be NULL


NULL (missing) Foreign Key values are allowed. For example, under special
circumstances, an employee might not be assigned to a department.

Rule 4: FK Value Can Change


Foreign Key values may be changed. For example, if Arnando Villegas moves from
Department 403 to Department 587, the Foreign Key value in his row would change.

Rule 5: FK Column Can Change (in Certain Circumstances)


You may add additional Foreign Key columns as needed. If you change an existing
Foreign Key column, however, you may lose the relationship information between that
table and the table containing the Primary Key.


Rule 6: FK Has No Column Limit


The Foreign Key may consist of one or more columns. A multi-column foreign key is
used to relate to a multi-column Primary Key in the related table. In the relational
model, there is no limit to the number of columns that can be designated as a Foreign
Key.

Rule 7: FK Must Be PK in Related Table


Each Foreign Key must exist as a Primary Key in the related table. A department number that
does not exist in the Department Table would be invalid as a Foreign Key value in the Employee
Table.
This rule can apply even if the Foreign Key is NULL, or missing. Remember, a missing value is
defined as a non-value; there is no value present. So the rule could be better stated: if a value
exists in the Foreign Key column, it must match a Primary Key value in the related table.
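
Foreign Key Rules 2, 3, and 7 can be checked with a minimal sketch. It uses Python's sqlite3 module as a stand-in for Teradata; the PRAGMA line is specific to SQLite, which enforces foreign keys only when asked. The helper try_insert and department number 999 are hypothetical values chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch for FK enforcement
conn.executescript("""
CREATE TABLE department (department_number INTEGER PRIMARY KEY);
CREATE TABLE employee (
    employee_number INTEGER PRIMARY KEY,
    department_number INTEGER REFERENCES department (department_number)
);
INSERT INTO department VALUES (401), (403);
""")

def try_insert(employee_number, department_number):
    """Return True if the row is accepted, False if a key rule rejects it."""
    try:
        conn.execute(
            "INSERT INTO employee VALUES (?, ?)",
            (employee_number, department_number),
        )
        return True
    except sqlite3.IntegrityError:
        return False

first_in_401 = try_insert(1004, 401)
second_in_401 = try_insert(1003, 401)     # Rule 2: duplicate FK values are allowed
no_department = try_insert(1010, None)    # Rule 3: a NULL FK is allowed
unknown_department = try_insert(1007, 999)  # Rule 7: no matching PK, so rejected

print(first_in_401, second_in_401, no_department, unknown_department)
```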

To check on your understanding of Primary Keys and Foreign Keys, complete this
sentence. According to the relational model, a single table can have either:
(Choose two.)
A. Multiple primary keys.
B. Multiple foreign keys.
C. No primary keys.
D. No foreign keys.
Teradata: A Proven Product
Teradata was built for data warehousing from the start. Teradata Corporation was
founded in Los Angeles, California, and incorporated on July 13, 1979. The corporate
goal was to create a database computer that could handle billions of rows of data, up
to and beyond a terabyte of data storage. The first product, a Teradata Database
Computer (DBC/1012), was shipped to the first customer with the Teradata RDBMS on
a proprietary platform in 1984.
Teradata became an open system in 1996, when the Teradata RDBMS Version 2 was
ported to the general-purpose UNIX platform. Since then, Teradata has been ported to
Microsoft Windows NT, then Microsoft Windows 2000.
Today, Teradata is the brand name of NCR's premier relational database management
system, the foundation for active data warehousing and other NCR solutions.
How Large Is a Trillion?


Teradata was the first commercial database system to support a trillion bytes of data.
The origin of the name Teradata is tera-, which is derived from Greek and means
trillion.
The chart below lists the meaning of the prefixes:

Prefix  Exponent  Meaning
------  --------  ---------------------------------------
kilo-   10^3      1,000 (thousand)
mega-   10^6      1,000,000 (million)
giga-   10^9      1,000,000,000 (billion)
tera-   10^12     1,000,000,000,000 (trillion)
peta-   10^15     1,000,000,000,000,000 (quadrillion)
exa-    10^18     1,000,000,000,000,000,000 (quintillion)

Examples

1 million seconds = 11.57 days
1 billion seconds = 31.6 years
1 trillion seconds = 31,688 years
1 million inches = 15.7 miles
1 trillion inches = 15,700,000 miles (30 round trips to the moon)
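
The conversions behind these examples can be reproduced with a few lines of arithmetic. The sketch assumes a 365.25-day year and 63,360 inches per mile; the figures quoted above appear to be rounded or truncated versions of the computed values.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY  # assumption: average year length
INCHES_PER_MILE = 12 * 5280

million, billion, trillion = 10**6, 10**9, 10**12

million_seconds_in_days = million / SECONDS_PER_DAY
billion_seconds_in_years = billion / SECONDS_PER_YEAR
trillion_seconds_in_years = trillion / SECONDS_PER_YEAR
million_inches_in_miles = million / INCHES_PER_MILE
trillion_inches_in_miles = trillion / INCHES_PER_MILE

print(f"1 million seconds  = {million_seconds_in_days:.2f} days")
print(f"1 billion seconds  = {billion_seconds_in_years:.2f} years")
print(f"1 trillion seconds = {trillion_seconds_in_years:,.0f} years")
print(f"1 million inches   = {million_inches_in_miles:.2f} miles")
print(f"1 trillion inches  = {trillion_inches_in_miles:,.0f} miles")
```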

Teradata: Evolving to Meet Customer Requirements


Teradata is a longtime leader in large-scale relational database management. The
Teradata RDBMS has evolved over the years, anticipating customers' future
information processing needs. One of its major evolutions involved being ported from
a proprietary platform to an open environment.

Version 1 and Version 2


Version 1: A Revolution in Its Time
The Teradata RDBMS Version 1 ran on the proprietary Teradata Operating System
(TOS). Version 1 was supported on the following hardware platforms:

Teradata DBC/1012
NCR System 3600

While these systems are older technologies, a few are still in use at some customer
sites. Version 1 has the following characteristics:

Hardware processors are physically cabled to the system (AMPs, IFPs, COPs,
and PEPs).
Messages are passed between hardware processors using the Ynet
interconnect.
Support for channel-attached and network-attached clients, called hosts.
Runs on the Teradata Operating System (TOS).

Version 2: Significant Improvements in Processing Power and Flexibility


The current version, Teradata RDBMS Version 2, evolved from the TOS-based version,
providing additional processing options, better scalability, better performance, and
conformance to ANSI standards for SQL. Some technology advances include:

Virtual processors called vprocs (AMPs and PEs) allow for better resiliency,
more efficient use of system resources, and more flexibility than the hardware
processors.
Messages are passed over the BYNET, which allows Teradata to be a more
linearly expandable system. As the database grows, additional nodes may be
added without performance penalties.
Support for channel-attached and network-attached clients.
Runs on UNIX and Microsoft Windows.

Teradata RDBMS Version 2 hardware platforms include:

NCR 5100S and 5100M
NCR 4300, 4700, and 5150
NCR 4400, 4800, and 5200
NCR 4850 and 5250
NCR 4475, 4950, and 5350

The remainder of this Web-Based Training course covers the characteristics of
Teradata RDBMS Version 2.

Teradata architecture
A Teradata System
A Teradata system contains one or more nodes. A node is a term for a processing unit
under the control of a single operating system. The node is where the processing
occurs for the Teradata RDBMS. There are two types of Teradata systems:

Symmetric multiprocessing (SMP) - An SMP Teradata system has a single
node that contains multiple CPUs sharing a memory pool.

Massively parallel processing (MPP) - Multiple SMP nodes working together comprise a
larger, MPP implementation of Teradata. The nodes are connected using the BYNET,
which allows multiple virtual processors on multiple nodes to communicate with each
other.

The BYNET (banyan network) is a combination of hardware and software that
provides high performance networking between the nodes of a Teradata system. A
dual-redundant, bi-directional, multi-staged network, the BYNET enables the nodes to
communicate in a high speed, loosely-coupled fashion. It is based on banyan topology,
a mathematically defined structure that has branches reminiscent of a banyan tree.

To manage a Teradata system, you use:

SMP system: System Console (keyboard and monitor) attached directly to the
SMP node

MPP system: Administration Workstation (AWS). The AWS is a specialized
workstation for an MPP system that:

o Provides a single operational and graphical view to the system.
o Provides the environment to configure, monitor, and manage the system.
o Serves as the primary interface to all system management utilities.

To access a Teradata system, a user typically logs on through one of multiple client
platforms (channel-attached mainframes or network-attached workstations). Client
access is discussed in the next module.
Node Components
A node is a basic building block of a Teradata system, and contains a large number of
hardware and software components. A conceptual diagram of a node and its major
components is shown below. Hardware components are shown on the left side of the
node and software components are shown on the right side.

Shared Nothing Architecture

The Teradata vprocs (the PEs and AMPs) share the components of the node
(memory and CPU). The defining characteristic of the shared-nothing architecture is that
each AMP manages its own dedicated portion of the system's disk space (called the
vdisk), and this space is not shared with other AMPs. Each AMP uses system resources
independently of the other AMPs, so they can all work in parallel for high overall
system performance.
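
The shared-nothing idea can be pictured with a toy model in which each AMP owns a private list of rows (its vdisk) and a request is answered by every AMP scanning only its own list at the same time. This is a simplified sketch, not Teradata's implementation: the number of AMPs, the modulo placement rule in owning_amp, and the sample rows are all assumptions made for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_AMPS = 4  # hypothetical configuration

# One private list per AMP stands in for that AMP's vdisk.
vdisks = [[] for _ in range(NUM_AMPS)]

def owning_amp(employee_number):
    # Placeholder placement rule (modulo), assumed for illustration only.
    return employee_number % NUM_AMPS

def store(row):
    # A row lives on exactly one AMP's vdisk.
    vdisks[owning_amp(row["employee_number"])].append(row)

def scan_vdisk(amp_id, department_number):
    # An AMP reads only its own vdisk, never another AMP's.
    return [r for r in vdisks[amp_id] if r["department_number"] == department_number]

sample = [(1001, 401), (1002, 401), (1003, 401), (1004, 401), (1005, 403), (1007, 403)]
for number, dept in sample:
    store({"employee_number": number, "department_number": dept})

# All AMPs scan in parallel; the partial results are then combined into one answer.
with ThreadPoolExecutor(max_workers=NUM_AMPS) as pool:
    partial_results = list(pool.map(lambda amp_id: scan_vdisk(amp_id, 401), range(NUM_AMPS)))

answer = sorted(r["employee_number"] for part in partial_results for r in part)
print(answer)
```

Because no AMP touches another AMP's rows, the scans need no coordination with each other until the partial results are merged.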

Notes
PEs (Parsing Engines) are vprocs that receive SQL requests from the client and break the
requests into steps. The PEs send the steps to the AMPs and subsequently return the answer to
the client.
A vdisk (pronounced, VEE-disk) is the logical disk space that is managed by an AMP.
Depending on the configuration, a vdisk may not be contained on the node; however, it is
managed by an AMP, which is always a part of the node.
The vdisk is made up of 1 to 64 pdisks (user slices in UNIX or partitions in Windows NT, whose
size and configuration vary based on RAID level). The pdisks logically combine to comprise the
AMP's vdisk. Although an AMP can manage up to 64 pdisks, it controls only one vdisk. An AMP
manages only its own vdisk, not the vdisk of any other AMP.
Question
Which of the following statements is true?
A. PDE is an application that runs on the Teradata RDBMS software.
B. AMPs manage system disks on the node.
C. The host channel adapter card connects to bus and tag cables through a Teradata Gateway.
D. An Ethernet card is a hardware component used in the connection between a network-attached client and the node.

Using the BYNET

The BYNET (pronounced "bye-net") is a high-speed interconnect (network) that
enables multiple nodes in the system to communicate. It has several unique features:

Scalable: As you add more nodes to the system, the overall network
bandwidth scales linearly. This linear scalability means you can increase
system size without performance penalty, and sometimes even increase
performance.

High performance: An MPP system typically has two BYNET networks
(BYNET 0 and BYNET 1). Because both networks in a system are active, the
system benefits from having full use of the aggregate bandwidth of both
networks.

Fault tolerant: Each network has multiple connection paths. If the BYNET
detects an unusable path in either network, it will automatically reconfigure
that network so all messages avoid the unusable path. Additionally, in the rare
case that BYNET 0 cannot be reconfigured, hardware on BYNET 0 is disabled
and messages are re-routed to BYNET 1.

Load balanced: Traffic is automatically and dynamically distributed between
both BYNETs.

BYNET Hardware and Software


The BYNET hardware and software handle the communication between the vprocs and
the nodes.

Hardware: The nodes of an MPP system are connected with the BYNET
hardware, consisting of BYNET boards and cables.

Software: The BYNET software is installed on every node. This BYNET driver
is an interface between the PDE software and the BYNET hardware.
SMP systems do not contain BYNET hardware. The PDE and BYNET
software emulate BYNET activity in a single-node environment. The
SMP implementation is sometimes called boardless BYNET.

Communication Between Nodes


The BYNET hardware can carry the following types of messages between nodes:

Broadcast message to all nodes
Point-to-point message from one node to another node

Communication Between Vprocs


On an MPP system, BYNET hardware is used to first send the communication across
nodes (using either the point-to-point or broadcast messaging described previously).
On an SMP system, this first step is unnecessary since there is only one node.
Once a node receives a communication, vproc communication within the node is done
by the PDE and BYNET software using the following types of messaging:

Point-to-point
Multicast
Broadcast

Point-to-Point Messages
With point-to-point messaging between vprocs, a vproc can send a message to
another vproc on:

The same node (using PDE and BYNET software)


A different node using two steps:
1. Send a point-to-point message from the sending node to the node
containing the recipient vproc. This is a communication between
nodes using the BYNET hardware.
2. Within the recipient node, the message is sent to the recipient vproc.
This is a point-to-point communication between vprocs using the PDE
and BYNET software.

Point-to-Point Message on the Same Node

Point-to-Point Message on a Different Node

Multicast Messages
A vproc can send a message to multiple vprocs using two steps:
1. Send a broadcast message from the sending node to all nodes. This is a
communication between nodes using the BYNET hardware.
2. Within each recipient node, the PDE and BYNET software determine which, if
any, of the node's vprocs should receive the message and deliver the message
accordingly. This is a multicast communication between vprocs within the
node, using the PDE and BYNET software.

Broadcast Messages
A vproc can send a message to all the vprocs in the system using two steps:
1. Send a broadcast message from the sending node to all nodes. This is a
communication between nodes using the BYNET hardware.
2. Within each recipient node, the message is sent to all vprocs. This is a
broadcast communication between vprocs using the PDE and BYNET software.
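
The two-step delivery described above can be sketched in a few lines of Python. This is only an illustrative model, not Teradata code: the Node class, the integer vproc ids, and the send function are hypothetical names introduced here, and the sketch does not model where the sender sits (so a same-node point-to-point message is treated like any other).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A node hosting vprocs, identified here by hypothetical integer ids."""
    name: str
    vprocs: set
    inbox: dict = field(default_factory=dict)  # vproc id -> received messages

    def deliver_locally(self, targets: Optional[set], msg: str) -> None:
        # Step 2: within the node, hand the message to the matching vprocs.
        # targets=None stands for "all vprocs on this node" (broadcast).
        recipients = self.vprocs if targets is None else self.vprocs & targets
        for vproc in sorted(recipients):
            self.inbox.setdefault(vproc, []).append(msg)


def send(nodes: list, targets: Optional[set], msg: str) -> list:
    """Deliver msg to the target vprocs and return the names of nodes contacted."""
    if targets is not None and len(targets) == 1:
        # Point-to-point: step 1 goes only to the node owning the recipient.
        contacted = [n for n in nodes if n.vprocs & targets]
    else:
        # Multicast and broadcast: step 1 is a broadcast to all nodes.
        contacted = list(nodes)
    for node in contacted:
        node.deliver_locally(targets, msg)
    return [n.name for n in contacted]


nodes = [Node("node1", {0, 1}), Node("node2", {2, 3})]
print(send(nodes, {3}, "point-to-point"))  # only the node hosting vproc 3
print(send(nodes, {1, 2}, "multicast"))    # all nodes; each filters locally
print(send(nodes, None, "broadcast"))      # all nodes, all vprocs
```

Note that in the multicast case every node is contacted, and the filtering down to the intended vprocs happens inside each node, which mirrors the description above.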

Questions
1. What types of messages is BYNET hardware capable of sending between nodes on a system?
(Check all that apply.)
A. Broadcast
B. Multicast
C. Point-to-point
D. Simulcast

2. When a message is delivered to a node using BYNET hardware and software, PDE software on
the node has the ability to route the message to: (Check all that apply.)
A. A single vproc on a node
B. A group of vprocs on a node
C. All vprocs on a node
D. None of the above

Cliques
A clique (pronounced "kleek") is a group of nodes that share access to the same disk arrays.
Each multi-node system has at least one clique. The cabling determines which nodes are in
which cliques: the nodes of a clique are connected to the disk array controllers of the same disk
arrays.

Cliques Provide Resiliency


In the rare event of a node failure, cliques provide for data access through vproc
migration. When a node resets, the following happens to the AMPs:
1. When the node fails, the Teradata RDBMS restarts across all remaining nodes
in the system.
2. The vprocs from the failed node migrate to the operational nodes in its clique.
3. Processing continues while the failed node is being repaired.
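
As a rough illustration of vproc migration, the sketch below moves the vprocs of a failed node onto the surviving nodes of the same clique. The function name, the node and vproc labels, and the round-robin placement are assumptions made for this example; the text above does not say how migrating vprocs are spread across the remaining nodes.

```python
def migrate_vprocs(clique: dict, failed: str) -> dict:
    """Return a new node -> vprocs map with the failed node's vprocs moved
    to the remaining nodes of the same clique.

    Round-robin placement is an assumption made for this sketch.
    """
    survivors = [node for node in clique if node != failed]
    if not survivors:
        raise RuntimeError("no operational node left in the clique")
    # Copy the lists so the caller's original mapping is left untouched.
    result = {node: list(clique[node]) for node in survivors}
    for i, vproc in enumerate(clique[failed]):
        result[survivors[i % len(survivors)]].append(vproc)
    return result


clique = {
    "node1": ["amp0", "amp1"],
    "node2": ["amp2", "amp3"],
    "node3": ["amp4", "amp5"],
}
print(migrate_vprocs(clique, "node2"))
```

Because the nodes of a clique are cabled to the same disk arrays, a migrated AMP can still reach its data from its new node, which is why the migration stays within the clique.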

Cliques in a System
Vprocs are distributed across all nodes in the system. There are two recommendations
for cliques:

Maximum of four nodes per clique.


Multiple cliques in the system should have the same number of nodes.

The diagram below shows three cliques. The nodes in each clique are cabled to the
same disk arrays. The overall system is connected by the BYNET.

Software Components

A Teradata node requires three distinct pieces of software:

For each node in the system, you need both of the following:

Operating system license (UNIX or Microsoft Windows)

Teradata software license

Operating System

The Teradata RDBMS can run on the following operating systems:

UNIX MP-RAS

Microsoft Windows

Parallel Database Extensions (PDE)

The Parallel Database Extensions (PDE) software layer was added to the operating
system by NCR to support the parallel software environment.

Trusted Parallel Application (TPA)


A Trusted Parallel Application (TPA) uses PDE to implement virtual processors
(vprocs). The Teradata RDBMS is classified as a TPA. The four components of the
Teradata TPA are:

AMP (Top Right)


PE (Bottom Right)
Channel Driver (Top Left)

Teradata Gateway (Bottom Left)

Teradata Software: PE
A Parsing Engine (PE) is a vproc that manages the dialogue between a client application and the
Teradata RDBMS, once a valid session has been established. Each PE can support a maximum of
120 sessions. The PE handles an incoming request in the following manner:
1. The Session Control component verifies the request for session authorization (user
names and passwords), and either allows or disallows the request.
2. The Parser does the following:
o Interprets the SQL statement received from the application.
o Verifies SQL requests for the proper syntax.
o Consults the Data Dictionary to ensure that all objects exist and that the user
has authority to access them.
3. The Optimizer is parallel aware, meaning that it has knowledge of the system
components (how many nodes, vprocs, etc.). The Optimizer develops the least
expensive plan (in terms of time) to return the requested response set by evaluating
alternative plans and choosing the fastest one. The plan is converted into executable
steps and passed on to the Dispatcher.
4. The Dispatcher controls the sequence in which the steps are executed and passes the
steps on to the BYNET for execution by the AMPs.
5. After the AMPs process the steps, the PE receives their responses over the BYNET.
6. The Dispatcher builds a response message and sends the message back to the user.

PE Session Control
When you log on to the Teradata RDBMS through your application, the session control software
on the PE establishes that session. Session control also manages and terminates sessions on
that PE.
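
Since each PE supports at most 120 sessions, a quick calculation gives the minimum number of PEs needed for a given number of concurrent sessions. The function name below is hypothetical, and the sketch ignores any other sizing considerations.

```python
SESSIONS_PER_PE = 120  # maximum sessions per PE, as stated above


def pes_needed(sessions: int, per_pe: int = SESSIONS_PER_PE) -> int:
    """Minimum number of PEs required to hold the given number of sessions."""
    if sessions < 0:
        raise ValueError("sessions must not be negative")
    # Ceiling division without floating point.
    return -(-sessions // per_pe)


for count in (100, 120, 121, 1000):
    print(count, "sessions ->", pes_needed(count), "PE(s)")
```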
PE Parser
The Parser is a component of the PE and performs the following tasks:

Interprets an incoming Teradata SQL request and checks the syntax, evaluating the
request semantically.
Consults the Data Dictionary to ensure that all the objects exist and that the user has
authority to access them.
Decomposes the request into manageable pieces of work called AMP steps.
Sends the optimized steps to the Dispatcher.

PE Optimizer
The Optimizer component of the PE determines how a request is executed by the system. The
Optimizer is parallel aware, meaning that it accounts for the parallel environment in which the
AMP steps are processed. The Optimizer develops the least expensive plan (in terms of time
and system resources) for returning the requested response. Processing alternatives are
evaluated, and the fastest alternative is chosen. The selected alternative is converted to
executable steps that will be performed by the AMPs.
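
The idea of evaluating alternatives and keeping the least expensive one can be sketched as follows. The Plan class, the plan names, and the single estimated_cost number are hypothetical; how the Optimizer actually estimates cost is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A candidate execution plan (hypothetical structure for illustration)."""
    name: str
    steps: tuple
    estimated_cost: float  # assumed single figure for time and resources


def choose_plan(candidates: list) -> Plan:
    """Return the candidate with the lowest estimated cost."""
    if not candidates:
        raise ValueError("at least one candidate plan is required")
    return min(candidates, key=lambda plan: plan.estimated_cost)


candidates = [
    Plan("full scan then join", ("scan A", "scan B", "join"), 42.0),
    Plan("index lookup then join", ("lookup A", "scan B", "join"), 7.5),
    Plan("join then filter", ("join", "filter"), 19.0),
]
best = choose_plan(candidates)
print(best.name, best.steps)
```

The steps of the chosen plan are what would then be handed on for execution by the AMPs.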
PE Dispatcher
The Dispatcher is responsible for a number of tasks, depending on the operation it is
performing:

Processing Requests: Controls the sequence in which the steps are executed and
passes the steps to the AMPs through the BYNET.
Processing Responses: After the AMPs process the steps, the Dispatcher builds a
response message and sends the response back to the user.

In Teradata RDBMS V2R5, the Parser and Dispatcher components are combined into the same
module for better performance, but their functionality remains the same.
Teradata Software: AMP
The AMP is a vproc that controls its portion of the data on the system. The AMPs work in
parallel, each AMP managing the data rows stored on its vdisk. AMPs are involved in data
distribution and data access in different ways.

AMPs (Access Module Processors) are virtual processors (vprocs) that receive steps from
PEs (Parsing Engines) and perform database functions to retrieve or update data. Each AMP is
associated with one virtual disk (vdisk), where the data is stored. An AMP manages only its own
vdisk, not the vdisk of any other AMP.
Data Distribution
When data is loaded, inserted, or updated, the AMP receives incoming data from the PE,
formats the rows, and distributes them on its vdisk.
Data Access
When data is accessed, the AMP retrieves the rows requested by the PE in the following
manner:
1. The database management subsystem receives the steps from the Dispatcher over the
BYNET.
2. The database management subsystem processes the steps. The subsystem on the AMP
can:
o Lock databases and tables
o Create, modify, or delete definitions of tables
o Join tables
o Insert, delete, or modify rows within tables
o Sort, aggregate, or format data
o Retrieve information from definitions and rows from tables
3. The database management subsystem returns responses over the BYNET to the
Dispatcher.

AMP Worker Task Functions


The AWT functions in the AMP perform a number of operations, including:

Locking tables to ensure data consistency.


Executing AMP step operations such as select, insert, update, delete and sort.
Joining tables as required.
Executing end transaction steps as required to support multi-AMP operations.

AMP File System


The file system software accesses the data on the virtual disks. Each AMP uses the file system
software to read from and write to the virtual disks.
AMP Console Utilities
The AMP software includes utilities to perform systems management functions such as:

Configure and reconfigure the system


Rebuild tables
Reveal details about locks and space status

Review Questions
Node failure causes vprocs to migrate to other nodes.
BYNET hardware carries the communication between nodes in a system.
A clique is a group of nodes with access to the same disk arrays.
A clique allows processing to continue if an MPP node fails.
A copy of BYNET software is installed on each node in the system.
Client Access
Aim

Explain the relationship between the Teradata RDBMS and its client applications.

Illustrate how the Teradata RDBMS processes a request.

Describe how the clients access the Teradata RDBMS.

Describe the Teradata client utilities and their use.



Teradata Software: Channel Driver

Channel Driver software is the means of communication between an application and
the PEs assigned to channel-attached clients. There is one Channel Driver per node.
In the diagram below, the blue dots show the communication from the
channel-attached client, to the host channel adapter in the node, to the Channel Driver
software, to the PE, and back to the client.

Teradata Software: Teradata Gateway


Teradata Gateway software is the means of communication between an application
and the PEs assigned to network-attached clients. There is one Teradata Gateway per
node.
In the diagram below, the blue dots show the communication from the
network-attached client, to the Ethernet card in the node, to the Teradata Gateway software,
to the PE, and back to the client.

Processing Environments
An RDBMS is used in two main processing environments:
Decision Support
Transaction Processing
Decision Support
In a decision support environment, users submit requests to analyze historical detail
data stored in the tables. The results are used to establish strategies, reveal trends,
and make projections. A database used as a decision support system (DSS) usually
receives fewer, very complex, ad-hoc queries that may involve numerous tables.

Transaction Processing

In contrast to the DSS environment described above, a transaction processing
environment typically has users accessing current data to update, insert, and delete
rows in the data tables. A database used for on-line transaction processing (OLTP)
generally receives a greater number of simpler queries, with fewer tables involved in a
single request.

Client Connections
Users can access data in the Teradata RDBMS through an application on both
channel-attached and network-attached clients. Additionally, the node itself can act as a client.
Teradata client software is installed on each client (channel-attached,
network-attached, or node) and communicates with RDBMS software on the node. You may
occasionally hear either type of client referred to by the legacy term "host", though
this term is not typically used in documentation or product literature.

Channel-Attached Client

Channel-attached clients are IBM-compatible mainframe systems supported by the
Teradata RDBMS. The following software components installed on the mainframe are
responsible for communications between client applications and the Channel Driver on
a Teradata node:
Teradata Director Program (TDP) software to manage session traffic, installed on the
channel-attached client.
Call-Level Interface (CLI), a library of routines that are the lowest-level interface to
Teradata.

Communication with the Teradata System


Communication from client applications on the mainframe goes through the
mainframe channel, to the Host Channel Adapter on the node, to the Channel Driver
software.

Network Attached Client

The Teradata RDBMS supports network-attached clients connected to the node over a
LAN. The following software components installed on the network-attached client are
responsible for communication between client applications and the Teradata Gateway
on a Teradata node:
ODBC
CLIv2
Communication with the Teradata System
Communication from applications on the network-attached client goes over the LAN,
to the Ethernet card on the node, to the Teradata Gateway software.

Node

The node is considered a network-attached client. If you install application software on
a node, it will be treated like an application on a network-attached client. In other
words, communications from applications on the node go through the Teradata
Gateway. An application on a node can be executed through:
System Console that manages an SMP system.
Remote login, such as over a network-attached client connection.

Request Processing

"How many widgets had more than 15% profit margin in the eastern region last month?"
A request like the one above is processed a little differently, depending on whether
the user is accessing Teradata through a channel-attached or network-attached client:
1. SQL request is sent from the client to the appropriate component on the node:
Channel-attached client: request is sent to Channel Driver (through the TDP).
Network-attached client: request is sent to Teradata Gateway (through CLIv2 or
ODBC).
2. Request is passed to the PE(s).
3. PEs parse the request into AMP steps.
4. PE Dispatcher sends steps to the AMPs over the BYNET.
5. AMPs perform operations on data on the vdisks.
6. Response is sent back to PEs over the BYNET.
7. PE Dispatcher receives response.
8. Response is returned to the client (channel-attached or network-attached).
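
The request flow above can be traced with a small simulation. Everything in it is a simplification for illustration: the function name, the trace strings, and the representation of each vdisk as a plain list of profit-margin values are assumptions, and real parsing and optimization are reduced to a single predicate.

```python
from typing import Callable

ENTRY_COMPONENT = {
    "channel": "Channel Driver",    # reached through the TDP
    "network": "Teradata Gateway",  # reached through CLIv2 or ODBC
}


def process_request(client_type: str, predicate: Callable, vdisks: list) -> tuple:
    """Trace a request through the node; return (trace, combined response)."""
    trace = [f"client -> {ENTRY_COMPONENT[client_type]}", "PE: parse into AMP steps"]
    partial_responses = []
    for amp_id, rows in enumerate(vdisks):
        # Each AMP applies the step only to the rows on its own vdisk.
        partial_responses.append([row for row in rows if predicate(row)])
        trace.append(f"BYNET -> AMP {amp_id}")
    trace.append("BYNET -> PE Dispatcher -> client")
    combined = [row for part in partial_responses for row in part]
    return trace, combined


# Hypothetical profit margins (in percent) held by three AMPs.
vdisks = [[12, 18], [22, 9], [16]]
trace, result = process_request("network", lambda margin: margin > 15, vdisks)
print("\n".join(trace))
print(len(result), "widgets had more than 15% profit margin")
```

The loop runs the AMPs one after another only to keep the sketch short; in the system described here they work in parallel.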

Mainframe Request Flow


Workstation Request Flow

Teradata Client Utilities


Teradata has a robust suite of client utilities that enable users and system
administrators to enjoy optimal response time and system manageability. Various
client utilities are available for tasks from loading data to managing the system.
Teradata utilities leverage Teradata's high-performance capabilities and are fully
parallel and scalable. The same utilities run on smaller entry-level systems, as well as
the largest MPP implementations.


Teradata client utilities include the following, described in this section:
Query Submitting Utilities
  BTEQ
  Queryman/Teradata SQL Assistant
Load and Unload Utilities
  FastLoad
  MultiLoad
  TPump
  FastExport
  Teradata Warehouse Builder
Administrative Utilities
  Teradata Manager
  DBQM/Teradata Dynamic Query Manager
Archive Utilities
  ARC
  NetVault
  NetBackup
  ASF2
Query Submitting Utilities

Teradata provides a number of tools that are front-end interfaces for submitting SQL
queries. Two mentioned in this section are BTEQ and Queryman.
BTEQ
BTEQ (Basic Teradata Query), often pronounced "BEE-teek", is a Teradata tool used
for submitting SQL queries on all platforms. BTEQ provides the following functionality:
Standard report writing and formatting
Basic import and export of small amounts of data to and from the Teradata RDBMS
across all platforms. For tables of more than a few thousand rows, the Teradata load
utilities are recommended for more efficiency.
Ability to submit SQL requests in the following ways:
Interactive
Batch

Queryman
Queryman is an information discovery/query tool that runs on Microsoft Windows.
Queryman enables you to access ODBC-based databases (including Teradata). Some
of its features include:
Ability to save data in PC-based formats, such as Microsoft Excel, Microsoft Access,
and text files.
History of submitted SQL syntax, to help you build scripts for data mining and
knowledge discovery.
Help with SQL syntax.
Import and export of small amounts of data to and from ODBC-compliant databases.
For tables of more than a few thousand rows, the Teradata load utilities are
recommended for more efficiency.

In Teradata V2R5.0, Queryman has been renamed to Teradata SQL Assistant.

Data Load and Unload Utilities


In a data warehouse environment, the database tables are populated from a variety of
sources, such as mainframe applications, operational data marts, or other distributed
systems throughout a company. These systems are the source of data such as daily
transaction files, orders, usage records, ERP (enterprise resource planning)
information, and Internet statistics. Teradata has a suite of data load and unload
utilities optimized for use with the Teradata RDBMS. They run on any of the supported
client platforms:
Channel-attached client
Network-attached client
Node
Using Teradata Load and Unload Utilities
Teradata load and unload utilities are fully parallel. Because the utilities are scalable,
they accommodate the size of the system. Performance is not limited by the capacity
of the load and unload tools.
The utilities have full restart capability. This feature means that if a load or unload job
is interrupted for some reason, it can be restarted from the last
checkpoint, without having to start the job from the beginning.
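
The principle of restarting from the last checkpoint can be shown with a toy loader. This is a generic sketch, not how the Teradata utilities are implemented: the function, the dictionary used as a checkpoint record, the checkpoint interval, and the simulated failure are all assumptions for illustration.

```python
def load_rows(rows, target, checkpoint, interval=2, fail_at=None):
    """Append rows to target, recording progress in checkpoint['next'].

    A restart resumes at the last recorded checkpoint instead of row 0.
    """
    start = checkpoint.get("next", 0)
    # Rows applied after the last checkpoint are discarded and reapplied.
    del target[start:]
    for i in range(start, len(rows)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated interruption")
        target.append(rows[i])
        if (i + 1) % interval == 0:
            checkpoint["next"] = i + 1  # take a checkpoint
    checkpoint["next"] = len(rows)


rows = list("abcdefg")
target, checkpoint = [], {}
try:
    load_rows(rows, target, checkpoint, fail_at=5)
except RuntimeError:
    print("interrupted; last checkpoint at row", checkpoint["next"])
load_rows(rows, target, checkpoint)  # restart resumes from the checkpoint
print(target == rows)
```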
The load and unload utilities are:
FastLoad
MultiLoad
TPump
FastExport
Teradata Warehouse Builder
By default, you can run up to 15 instances of FastLoad, MultiLoad, and FastExport in any
combination. There is no limit to the number of concurrent TPump jobs.
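
The default limit can be expressed as a simple admission check. The function and constant names are hypothetical, and the check only illustrates the counting rule stated above: FastLoad, MultiLoad, and FastExport share one pool of 15, while TPump jobs are not counted.

```python
LIMITED_UTILITIES = {"FastLoad", "MultiLoad", "FastExport"}
DEFAULT_LIMIT = 15  # default combined limit stated above


def can_start(utility: str, running: list, limit: int = DEFAULT_LIMIT) -> bool:
    """Return True if one more instance of the utility may start."""
    if utility not in LIMITED_UTILITIES:
        # TPump (and, in this sketch, anything else) is not counted.
        return True
    in_use = sum(1 for job in running if job in LIMITED_UTILITIES)
    return in_use < limit


running = ["FastLoad"] * 10 + ["MultiLoad"] * 5 + ["TPump"] * 3
print(can_start("FastExport", running))  # the shared pool of 15 is full
print(can_start("TPump", running))       # TPump is not limited
```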
FastLoad
Use the FastLoad utility to:
Load data into empty tables
Delete all rows from a populated table
FastLoad loads data into an empty table in parallel, using multiple sessions to transfer
blocks of data. FastLoad achieves high performance by fully exploiting the resources
of the system. After the data load is complete, the table can be made available to
users.

MultiLoad
Use the MultiLoad utility to maintain tables by:
Inserting rows into a populated or empty table
Updating rows in a table
Deleting multiple rows from a table
MultiLoad can load multiple input files concurrently and work on up to five tables at a
time, using multiple sessions. MultiLoad is optimized to apply multiple rows in
block-level operations. MultiLoad is usually run during a batch window, and places a lock
on the destination table(s) to prevent user queries from getting inconsistent results
before the data load or update is complete.

TPump

Use TPump to:


Continuously load, update, or delete data in tables
Update lower volumes of data using fewer system resources than other load utilities
Vary the resource consumption and speed of the data loading activity over time
The TPump utility complements MultiLoad as a data loading utility. A major difference
is that TPump uses row hash locks, which eliminates the need for table locks and
batch windows typical with MultiLoad. Users can continue to run queries during
TPump data loads. In addition, TPump is designed for smaller volumes of data than
MultiLoad, and maintains up to 60 tables at a time.
TPump has a dynamic throttle that operators can set to specify the percentage of
system resources to be used for an operation. This lets operators run TPump at full
capacity during periods of low system usage, and within set limits when
TPump may affect other business users of Teradata.

FastExport
Use the FastExport utility to transfer either of the following to a file on a client
platform:
Table
View
FastExport is a data extract utility. It transfers large amounts of data using block
transfers over multiple sessions to a file on the network-attached or channel-attached
client. Typically, FastExport is run during a batch window, and the tables being
exported are locked.

Teradata Warehouse Builder

Teradata Warehouse Builder is a high-performance data warehouse loading tool
specifically optimized for Teradata. It enables data extraction, transformation and
loading processes common to all data warehouses. Using built-in operators, Teradata
Warehouse Builder combines the functionality of the legacy Teradata utilities FastLoad,
MultiLoad, FastExport, and TPump in a single parallel environment. Its extensible
environment includes support for Access Modules, SQL Inserter, SQL Selector, and
others. There is also an API (Application Programming Interface) to add third-party or
custom data transformation to Teradata Warehouse Builder scripts. Using multiple,
parallel tasks, a single Teradata Warehouse Builder script can load data from disparate
sources into the Teradata RDBMS in the same job.
Teradata Warehouse Builder has a single, SQL-like scripting language, as well as a GUI
to make scripting faster and easier. Script converters are available to help transition
any legacy FastLoad, MultiLoad, FastExport, and TPump scripts on existing systems to
Teradata Warehouse Builder scripts.

A single Teradata Warehouse Builder job can load data from
multiple disparate sources into the Teradata RDBMS, as indicated by the green arrow.

Administrative Utilities

Administrative utilities use a graphical user interface (GUI) to monitor and manage
various aspects of a Teradata system.
Teradata Manager
Teradata Manager is a production and performance monitoring system that helps a
DBA or system manager to monitor, control, and administer one or more Teradata
systems through a GUI. Running on LAN-attached clients, Teradata Manager has a
variety of tools and applications to gather, manipulate, and analyze information about
each Teradata RDBMS being administered.

Database Query Manager (DBQM)


Database Query Manager (DBQM) is a query workload management tool that
dynamically tunes the Teradata RDBMS. DBQM can run, suspend, reschedule, or reject
a query based on current workload and set thresholds.
For example, with DBQM a request can be scheduled to run periodically or during a
specified time period without an active system connection. Results can be retrieved
any time after the request has been submitted by DBQM and executed.
DBQM can restrict queries based on factors such as:
Analysis control thresholds
Object control thresholds
Environmental factors
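
To make the idea of threshold-based workload decisions concrete, here is a toy decision function. It is not DBQM's rule set: the thresholds, parameter names, and the mapping from conditions to actions are all invented for this sketch, and only the four possible outcomes (run, suspend, reschedule, reject) come from the description above.

```python
from enum import Enum


class Action(Enum):
    RUN = "run"
    SUSPEND = "suspend"
    RESCHEDULE = "reschedule"
    REJECT = "reject"


def decide(estimated_rows: int, active_queries: int, *, max_rows: int = 1_000_000,
           max_active: int = 10, allow_reschedule: bool = True) -> Action:
    """Pick an action for a query from hypothetical thresholds."""
    if estimated_rows > max_rows:
        # Too large for the current window: defer it if allowed, else refuse.
        return Action.RESCHEDULE if allow_reschedule else Action.REJECT
    if active_queries >= max_active:
        # The system is busy: hold the query until capacity frees up.
        return Action.SUSPEND
    return Action.RUN


print(decide(5_000, 3))
print(decide(5_000, 10))
print(decide(2_000_000, 3))
print(decide(2_000_000, 3, allow_reschedule=False))
```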

In Teradata RDBMS V2R5.0, the name of DBQM has been changed to Teradata
Dynamic Query Manager.

Archival Utilities

Teradata has utilities specifically designed for data archive and recovery purposes.
There are different utilities for channel-attached clients and network-attached clients.
Archiving on Channel-Attached Clients
The Archive/Recovery (ARC) utility backs up data in a channel-attached (mainframe)
client environment. It supports commands written in Job Control Language (JCL). It is
scalable and parallel, and can run on a channel-attached client or a node.

Archiving on Network-Attached Clients


To back up data in the network-attached client environment, either of the following
products is used:
NetVault (from BakBone Software Inc.)
NetBackup (from VERITAS Software Corporation)
NetVault and NetBackup have modules created for Teradata systems for use in a
scalable, parallel, enterprise environment. They run on network-attached clients or a
node (Microsoft Windows or UNIX MP-RAS).
Some legacy Teradata systems may also use the ASF2 (Archive Storage Facility 2)
utility for backup and restore functions on a UNIX platform.
