What Is Teradata?
Teradata is a relational database management system (RDBMS) that provides the foundation
to give a company the power to grow, to compete in today's dynamic marketplace, and to
evolve the business by getting answers to a new generation of questions. Teradata's scalability
allows the system to grow as the business grows, from gigabytes to terabytes and beyond.
Teradata's unique technology has been proven at customer sites across industries and around
the world.
How Is Teradata Used?
Each Teradata implementation can model a company's business. The ability to keep up with
rapid changes in today's business environment makes Teradata an ideal foundation for many
applications, including:
Enterprise data warehousing
Data warehousing is a process for properly assembling and managing data from various
servers to answer business-critical questions. Teradata is ideal for enterprise data
warehousing, which is commonly characterized by:
Without an enterprise data warehouse, a financial institution may be able to identify profitable
customers for separate products such as mortgages or credit cards, but not know the overall
profitability of each customer. An enterprise data warehouse brings together the different
subject areas into a central repository, creating one single version of the truth for a complete
picture of the customer.
An enterprise data warehouse environment built on Teradata simplifies the system
maintenance task, resulting in a lower total cost of ownership. In addition, Teradata's ability to
handle large-scale, decision-support queries against huge volumes of detail data makes it the
obvious choice for companies wanting to start at any level and grow.
Active data warehousing
The active data warehouse extends a company's ability beyond historical data and strategic
decisions to bring the decision-making capability to front-line personnel. Tactical decisions,
such as "Who should get the empty seat on this airplane?" or "What should I offer this
customer to keep her from leaving, based on her history with our company?", can be made
more effectively with the right information.
With an active data warehouse, employees who interact directly with customers and suppliers
are empowered with information-based decision making at their fingertips. The Teradata
Warehouse supports active data warehousing with:
Teradata Workshop
3. Protect customers' privacy with consumer opt-in/opt-out preferences and the ability for
consumers to check and revise their information stored on the Teradata database
through the Internet or a company call center.
Data marts
A Teradata system may start out small as a data mart, and be easily expanded with linear
scalability to include more subject areas, applications, and users.
A data mart is a special-purpose subset of a company's enterprise data used by a particular
department, function, or application. Often, these single-subject area data marts contain data
that was aggregated or transformed in some way to better handle the requests of a specific
user community. Vendors implement data marts using different architectures:
2. Dependent data marts - Created from detail data in the data warehouse. This approach still
requires movement and transformation of data, but may provide better performance
for some specific user queries.
3. Logical data marts - Existing parts of the data warehouse, not separate physical
structures. Because in theory the data warehouse contains the detail data of the
entire enterprise, a logical data mart would then provide the specific information for
a specific user community. With the proper technology, this can be an ideal way to
remove the need for massive data loading and transforming.
4. Independent and dependent data marts are architectures endorsed by other
database vendors and tend to be associated with higher maintenance costs for
physically moving and maintaining the data, inconsistent data (and resulting
inconsistent decisions), and indirect ways to get the complete picture of the data.
Teradata is ideal for the logical data mart environment, where different user
communities view subsets of a single repository of enterprise data.
Q: What do you think are some Teradata features that make it so successful in today's business
environment? (Details on the following are coming up next.)
Scalability.
Single data store.
High degree of parallelism.
Ability to model the business.
All of the above.
What Makes Teradata Unique?
We will learn about many features that make the Teradata RDBMS right for business-critical
applications. To start with, this section covers these key features: a single data store,
scalability, unconditional parallelism, and the ability to model the business.
Single Data Store
Teradata acts as a single data store, with multiple client applications making inquiries against
it concurrently.
Instead of replicating a database for different purposes, with Teradata you store the data once
and use it for all clients. Teradata provides the same connectivity for an entry-level system as
it does for a massive enterprise data warehouse.
Scalability
Linear scalability means that as you add components to the system, performance increases in
proportion to the components added. Adding components allows the system to accommodate increased workload
without decreased throughput. Teradata is scalable in multiple ways, including hardware,
complexity, and concurrent users.
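Here is a small arithmetic sketch of ideal linear scaling. It assumes a hypothetical, constant scan rate per node and data spread evenly across the nodes. The `rate` value is illustrative, not a Teradata figure. Under those assumptions, doubling both the data volume and the node count leaves the elapsed time unchanged.

```python
def total_throughput(nodes: int, rows_per_sec_per_node: int) -> int:
    """Ideal linear scaling: each added node contributes the same capacity."""
    return nodes * rows_per_sec_per_node


def seconds_to_scan(total_rows: int, nodes: int, rows_per_sec_per_node: int) -> float:
    """Elapsed time to scan a table whose rows are spread evenly across all nodes."""
    return total_rows / total_throughput(nodes, rows_per_sec_per_node)


rate = 1_000_000  # hypothetical rows per second that one node can scan
print(seconds_to_scan(4_000_000_000, 4, rate))  # 1000.0
print(seconds_to_scan(8_000_000_000, 8, rate))  # 1000.0 (twice the data, twice the nodes)
```

Real workloads only approximate this ideal. The sketch simply makes "no decrease in throughput as components are added" concrete.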
Hardware
Growth is a fundamental goal of business. A Teradata MPP system easily accommodates that
growth whenever it happens. The Teradata RDBMS runs on highly optimized NCR servers in the
following configurations:
Databases - When you expand your system, the data is automatically redistributed
through the reconfiguration process, without manual interventions such as sorting, unloading
and reloading, or partitioning.
Complexity
Teradata is adept at complex data models that satisfy the information needs throughout an
enterprise. Teradata efficiently processes increasingly sophisticated business questions as
users realize the value of the answers they are getting. It has the ability to perform large
aggregations during query run time and can perform up to 64 joins in a single query.
Concurrent Users
Teradata has the proven ability to handle from hundreds to thousands of users on the system
simultaneously. Adding many concurrent users typically reduces system performance.
However, adding more components can enable the system to accommodate the new users with
equal or even better performance.
Unconditional Parallelism
Teradata provides exceptional performance using parallelism to achieve a single answer faster
than a non-parallel system. Parallelism uses multiple processors working together to accomplish
a task quickly.
An example of parallelism can be seen at an amusement park, as guests stand in line for an
attraction such as a roller coaster. As the line approaches the boarding platform, it typically will
split into multiple, parallel lines. That way, groups of people can step into their seats
simultaneously. The line moves faster than if the guests step onto the attraction one at a time.
At the biggest amusement parks, the parallel loading of the rides becomes essential to their
successful operation.
Parallelism is evident throughout a Teradata system, from the architecture to data loading to
complex request processing. Teradata processes requests in parallel without mandatory query
tuning. Teradata's parallelism does not depend on limited data quantity, column range
constraints, or specialized data models; Teradata has unconditional parallelism.
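The amusement-park analogy maps onto a simple split-and-combine pattern. The sketch below deals rows into one "line" per worker. Each worker sums its own line, and the partial results are then combined.

A Python thread pool stands in for independent processors here. In CPython it shows the structure of the work, not a real speedup. The function names are illustrative only and do not reflect how Teradata schedules work internally.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_workers):
    """Deal rows round-robin into n_workers lists, one 'line' per worker."""
    parts = [[] for _ in range(n_workers)]
    for i, row in enumerate(rows):
        parts[i % n_workers].append(row)
    return parts


def parallel_sum(rows, n_workers=4):
    """Each worker sums its own partition; the partial sums are combined at the end."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum, partition(rows, n_workers)))
    return sum(partials)


print(parallel_sum(list(range(1, 101))))  # 5050
```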
Ability to Model the Business
A data warehouse built on a business model contains information from across the enterprise.
Individual departments can use their own assumptions and views of the data for analysis, yet
these varying perspectives have a common basis for a single version of the truth. Companies
can get a cohesive view of their operations across functional areas to:
The Teradata RDBMS is a relational database. Relational databases are based on the relational
model, which has its foundation in the mathematical theory of sets. The relational model uses
and extends many principles of set theory to provide a disciplined approach to data
management.
Relational databases present data as a set of tables. A table is a two-dimensional
representation of data that consists of rows and columns. According to the relational model,
a valid table does not have to be populated with data rows; it just needs to be defined with at
least one column.
Tables are logically related to each other by a common field, so information such as customer
telephone numbers and addresses can exist in one table, yet be accessible for multiple
purposes. The example below shows customer, order, and billing statement data, related by a
common field. The common field of Customer ID lets you look up information such as a
customer name for a particular statement number, even though the data exists in two different
tables.
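In SQL, the lookup described above is a join on the common column. The Python sketch below performs the same lookup by hand over hypothetical in-memory rows. The table contents and column names (`customer_id`, `statement_number`, `customer_name`) are assumptions for illustration.

```python
# Hypothetical sample rows; the column names are illustrative only.
customers = [
    {"customer_id": 1001, "customer_name": "Acme Corp"},
    {"customer_id": 1002, "customer_name": "Globex"},
]
statements = [
    {"statement_number": 5001, "customer_id": 1002, "amount": 250.00},
    {"statement_number": 5002, "customer_id": 1001, "amount": 75.50},
]


def customer_name_for_statement(statement_number):
    """Follow the common Customer ID field from a statement to its customer."""
    by_id = {c["customer_id"]: c for c in customers}
    for s in statements:
        if s["statement_number"] == statement_number:
            return by_id[s["customer_id"]]["customer_name"]
    return None  # no such statement


print(customer_name_for_statement(5001))  # Globex
```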
Rows
Each row contains all the columns in the table, so each table has only one row format. The
sequence of rows is arbitrary and does not imply priority, hierarchy, or significance.
Each row represents an occurrence of an entity defined by the table. An entity is a person,
place, or thing about which the table contains information. In this example, the entity is the
employee and each row represents a single employee.
Columns
Each column contains like data, such as only part names, or only supplier names, or only
employee numbers. In the example below, the Last_Name column contains last names only, and
nothing else. The data in the columns is atomic data: data that is indivisible and cannot be
divided into smaller units that have meaning. For example, a column that has both Last_Name and
First_Name in it is not atomic, because it can be divided further into components. Likewise, a
telephone number might be divided into three columns (the area code, the prefix, and the
suffix) so the customer data can be analyzed according to area code, and so on. Missing data
values are represented by nulls.
Within a table, the column position is arbitrary.
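The following sketch shows the telephone-number example as code. It assumes 10-digit North American numbers, which is an assumption not stated in the text. It splits each raw string into three atomic parts so that grouping by area code becomes a simple key lookup. `PhoneNumber` and `split_phone` are hypothetical names.

```python
from collections import Counter
from typing import NamedTuple


class PhoneNumber(NamedTuple):
    area_code: str
    prefix: str
    suffix: str


def split_phone(raw: str) -> PhoneNumber:
    """Split a 10-digit North American number into three atomic parts.

    Assumes exactly 10 digits; punctuation and spaces are ignored.
    """
    digits = "".join(ch for ch in raw if ch.isdigit())
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {len(digits)}: {raw!r}")
    return PhoneNumber(digits[:3], digits[3:6], digits[6:])


raw_numbers = ["(213) 555-0142", "310-555-0199", "213.555.0107"]
parts = [split_phone(n) for n in raw_numbers]
# Once the area code is its own column, grouping by it is trivial.
print(Counter(p.area_code for p in parts))  # Counter({'213': 2, '310': 1})
```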
Primary Key
The relational model is a set of principles for relational databases, formalized by Dr. E.F. Codd
in the late 1960s. It prescribes how data should be represented in terms of structure,
integrity, and manipulation. Its rules are widely accepted in the information technology
industry, though actual implementations may vary. In the relational model, a Primary Key (PK)
is used to designate a unique identifier for each row when you design a database. A Primary Key
can be composed of one or more columns. In the example below, the Primary Key is the employee
number.
Rule 2: Unique PK
Within the column(s) designated as the Primary Key, the values in each row must be unique. No
duplicate values are allowed. The Primary Key's purpose is to uniquely identify a row. In a
multicolumn Primary Key, the combined value of the columns must be unique, even if an individual
column in the Primary Key has duplicate values.
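The sketch below checks the uniqueness rule on a hypothetical table keyed by two columns. The order-line data and column names are invented for illustration. Individual columns may repeat, but the combined key must not.

```python
def duplicate_keys(rows, key_columns):
    """Return key values that appear more than once across the given columns."""
    seen, dupes = set(), set()
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return dupes


# Hypothetical order-line rows with a multicolumn PK of (order_number, line_number).
order_lines = [
    {"order_number": 1, "line_number": 1, "part": "A"},
    {"order_number": 1, "line_number": 2, "part": "B"},  # order_number repeats: allowed
    {"order_number": 2, "line_number": 1, "part": "A"},  # line_number repeats: allowed
]
print(duplicate_keys(order_lines, ["order_number", "line_number"]))  # set()
print(duplicate_keys(order_lines, ["order_number"]))                 # {(1,)}
```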
Foreign Key
A Foreign Key (FK) is an identifier that links related tables. A Foreign Key defines how two tables
are related to each other. Each Foreign Key references a matching Primary Key in another table
in the database. For example, in the table below, the Department Number column that is a
Foreign Key actually exists in another table as a Primary Key.
Having tables related to each other gives users the flexibility to look at the data in different
ways, without the database administrator having to manage and maintain many tables of
duplicate data for different applications.
Foreign Key Rules
Rules governing how Foreign Keys must be defined and how they operate are:
Foreign Keys are optional; not all tables have them. Tables that do have them can have multiple
Foreign Keys because a table can relate to multiple other tables. In fact, the relational model
places no limit on the number of Foreign Keys a table can have. In the example table below:
The Department Number Foreign Key relates to the Department Number Primary Key in
the Department Table elsewhere in the database.
The Job Code FK relates to the Job Code PK in the Job Code Table, elsewhere in the
database.
Having tables related to each other makes a relational database flexible so that different users
can look up information they need, while simplifying the database administration so the data
doesn't have to be duplicated for each purpose or application.
Rule 2: Unique or Non-Unique FKs
Duplicate Foreign Key values are allowed. More than one employee could be assigned to the
same department.
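The sketch below illustrates the two Foreign Key properties described above. Each FK value must match a PK value in the referenced table, and duplicate FK values are fine. The Employee and Department rows are hypothetical. The function only reports mismatches and does not model how a database enforces the rule.

```python
def orphaned_foreign_keys(child_rows, fk_column, parent_rows, pk_column):
    """Return child rows whose FK value has no matching PK in the parent table.

    Duplicate FK values are allowed; only unmatched values are reported.
    """
    parent_keys = {row[pk_column] for row in parent_rows}
    return [row for row in child_rows if row[fk_column] not in parent_keys]


departments = [{"department_number": 100}, {"department_number": 200}]
employees = [
    {"employee_number": 1, "department_number": 100},
    {"employee_number": 2, "department_number": 100},  # duplicate FK value: allowed
    {"employee_number": 3, "department_number": 300},  # no department 300 exists
]
print(orphaned_foreign_keys(employees, "department_number", departments, "department_number"))
# [{'employee_number': 3, 'department_number': 300}]
```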
To check on your understanding of Primary Keys and Foreign Keys, complete this
sentence. According to the relational model, a single table can have either: (Choose
two.)
A. Multiple primary keys.
B. Multiple foreign keys.
C. No primary keys.
D. No foreign keys.
Teradata: A Proven Product
Teradata was built for data warehousing from the start. Teradata Corporation was founded in
Los Angeles, California, and incorporated on July 13, 1979. The corporate goal was to create a
database computer that could handle billions of rows of data, up to and beyond a terabyte of
data storage. The first product, a Teradata Database Computer (DBC/1012), was shipped to the
first customer with the Teradata RDBMS on a proprietary platform in 1984.
Teradata became an open system in 1996, when the Teradata RDBMS Version 2 was ported to
the general-purpose UNIX platform. Since then, Teradata has been ported to Microsoft Windows
NT, then Microsoft Windows 2000.
Today, Teradata is the brand name of NCR's premier relational database management
system, the foundation for active data warehousing and other NCR solutions.
How Large Is a Trillion?
Teradata was the first commercial database system to support a trillion bytes of data. The origin
of the name Teradata is tera-, which is derived from Greek and means trillion.
The chart below lists the meaning of the prefixes:
Prefix   Exponent   Meaning
kilo-    10^3       1,000 (thousand)
mega-    10^6       1,000,000 (million)
giga-    10^9       1,000,000,000 (billion)
tera-    10^12      1,000,000,000,000 (trillion)
peta-    10^15      1,000,000,000,000,000 (quadrillion)
exa-     10^18      1,000,000,000,000,000,000 (quintillion)
Teradata DBC/1012
NCR System 3600
While these systems are older technologies, a few are still in use at some customer sites.
Version 1 has the following characteristics:
Hardware processors are physically cabled to the system (AMPs, IFPs, COPs, and PEPs).
Messages are passed between hardware processors using the Ynet interconnect.
Support for channel-attached and network-attached clients, called hosts.
Runs on the Teradata Operating System (TOS).
Version 2 has the following characteristics:
Virtual processors called vprocs (AMPs and PEs) allow for better resiliency, more
efficient use of system resources, and more flexibility than the hardware processors.
Messages are passed over the BYNET, which allows Teradata to be a more linearly
expandable system. As the database grows, additional nodes may be added without
performance penalties.
Support for channel-attached and network-attached clients.
Runs on UNIX and Microsoft Windows.
The remainder of this Web-Based Training course covers the characteristics of Teradata RDBMS
Version 2.
Teradata Architecture
A Teradata System
A Teradata system contains one or more nodes. A node is a term for a processing unit under the
control of a single operating system. The node is where the processing occurs for the Teradata
RDBMS. There are two types of Teradata systems:
Symmetric multiprocessing (SMP) - An SMP Teradata system has a single node that
contains multiple CPUs sharing a memory pool.
Massively parallel processing (MPP) - Multiple SMP nodes working together comprise a larger,
MPP implementation of Teradata. The nodes are connected using the BYNET, which allows
multiple virtual processors on multiple nodes to communicate with each other. The BYNET
(banyan network) is a combination of hardware and software that provides high-performance
networking between the nodes of a Teradata system. A dual-redundant, bi-directional,
multi-staged network, the BYNET enables the nodes to communicate in a high-speed,
loosely-coupled fashion. It is based on banyan topology, a mathematically defined structure
that has branches reminiscent of a banyan tree.
SMP system: System Console (keyboard and monitor) attached directly to the SMP node
To access a Teradata system, a user typically logs on through one of multiple client platforms
(channel-attached mainframes or network-attached workstations). Client access is discussed
in the next module.
Node Components
A node is a basic building block of a Teradata system, and contains a large number of hardware
and software components. A conceptual diagram of a node and its major components is shown
below. Hardware components are shown on the left side of the node and software components
are shown on the right side.
Notes
PEs (Parsing Engines) are vprocs that receive SQL requests from the client and break the
requests into steps. The PEs send the steps to the AMPs and subsequently return the answer
to the client.
A vdisk (pronounced, VEE-disk) is the logical disk space that is managed by an AMP.
Depending on the configuration, a vdisk may not be contained on the node; however, it is
managed by an AMP, which is always a part of the node.
The vdisk is made up of 1 to 64 pdisks (user slices in UNIX or partitions in Windows NT, whose
size and configuration vary based on RAID level). The pdisks logically combine to comprise the
AMP's vdisk. Although an AMP can manage up to 64 pdisks, it controls only one vdisk. An AMP
manages only its own vdisk, not the vdisk of any other AMP.
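The ownership rules above can be captured in a small data model. This is a hypothetical sketch: the class names and pdisk labels are illustrative, and it does not reflect Teradata's internal structures. It encodes "1 to 64 pdisks per vdisk", "one vdisk per AMP", and "an AMP manages only its own vdisk".

```python
from dataclasses import dataclass

MAX_PDISKS_PER_VDISK = 64  # limit stated in the text


@dataclass
class VDisk:
    pdisks: list  # hypothetical pdisk names (UNIX slices or Windows partitions)

    def __post_init__(self):
        if not 1 <= len(self.pdisks) <= MAX_PDISKS_PER_VDISK:
            raise ValueError("a vdisk is made up of 1 to 64 pdisks")


@dataclass
class AMP:
    amp_id: int
    vdisk: VDisk  # each AMP controls exactly one vdisk

    def manages(self, vdisk: VDisk) -> bool:
        """An AMP manages only its own vdisk, never another AMP's."""
        return vdisk is self.vdisk


amp0 = AMP(0, VDisk(["pdisk-0a", "pdisk-0b"]))
amp1 = AMP(1, VDisk(["pdisk-1a"]))
print(amp0.manages(amp0.vdisk))  # True
print(amp0.manages(amp1.vdisk))  # False
```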
Question
Which of the following statements is true?
Key characteristics of the BYNET include:
Scalable: As you add more nodes to the system, the overall network bandwidth scales
linearly. This linear scalability means you can increase system size without performance
penalty, and sometimes even increase performance.
High performance: An MPP system typically has two BYNET networks (BYNET 0 and
BYNET 1). Because both networks in a system are active, the system benefits from
having full use of the aggregate bandwidth of both the networks.
Fault tolerant: Each network has multiple connection paths. If the BYNET detects an
unusable path in either network, it will automatically reconfigure that network so all
messages avoid the unusable path. Additionally, in the rare case that BYNET 0 cannot be
reconfigured, hardware on BYNET 0 is disabled and messages are re-routed to BYNET 1.
Hardware: The nodes of an MPP system are connected with the BYNET hardware,
consisting of BYNET boards and cables.
Software: The BYNET software is installed on every node. This BYNET driver is an
interface between the PDE software and the BYNET hardware.
SMP systems do not contain BYNET hardware. The PDE and BYNET software
emulate BYNET activity in a single-node environment. The SMP implementation
is sometimes called boardless BYNET.
The BYNET supports three types of messages between vprocs:
Point-to-point
Multicast
Broadcast
Point-to-Point Messages
With point-to-point messaging between vprocs, a vproc can send a message to another vproc
on:
2. Within the recipient node, the message is sent to the recipient vproc. This is a
point-to-point communication between vprocs using the PDE and BYNET software.
Multicast Messages
A vproc can send a message to multiple vprocs using two steps:
1. Send a broadcast message from the sending node to all nodes. This is a
communication between nodes using the BYNET hardware.
2. Within each recipient node, the PDE and BYNET software determine which, if any, of the
node's vprocs should receive the message and deliver it accordingly. This is a
multicast communication between vprocs within the node, using the PDE and BYNET
software.
Broadcast Messages
A vproc can send a message to all the vprocs in the system using two steps:
1. Send a broadcast message from the sending node to all nodes. This is a communication
between nodes using the BYNET hardware.
2. Within each recipient node, the message is sent to all vprocs. This is a broadcast
communication between vprocs using the PDE and BYNET software.
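The three delivery patterns can be modeled as functions that return which vprocs on which nodes end up receiving a message. This is a minimal sketch assuming a hypothetical two-node layout. It models only the outcome described above, not the BYNET protocol itself.

For multicast, every node receives the broadcast. The returned map lists only the vprocs that each node's software chooses to deliver to.

```python
# Hypothetical layout: node name -> vprocs hosted on that node.
NODES = {
    "node1": ["PE1", "AMP0", "AMP1"],
    "node2": ["AMP2", "AMP3"],
}


def node_of(vproc):
    """Find the node that hosts a vproc."""
    for node, vprocs in NODES.items():
        if vproc in vprocs:
            return node
    raise KeyError(vproc)


def point_to_point(target):
    """Deliver to the target's node, then to that one vproc."""
    return {node_of(target): [target]}


def multicast(targets):
    """Broadcast to every node; each node keeps only the vprocs that should receive it."""
    wanted = set(targets)
    deliveries = {node: [v for v in vprocs if v in wanted] for node, vprocs in NODES.items()}
    return {node: vs for node, vs in deliveries.items() if vs}


def broadcast():
    """Broadcast to every node; each node delivers to all of its vprocs."""
    return {node: list(vprocs) for node, vprocs in NODES.items()}


print(point_to_point("AMP2"))       # {'node2': ['AMP2']}
print(multicast(["AMP1", "AMP3"]))  # {'node1': ['AMP1'], 'node2': ['AMP3']}
```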
Questions
1. What types of messages is BYNET hardware capable of sending between nodes on a system?
(Check all that apply.)
A. Broadcast
B. Multicast
C. Point-to-point
D. Simulcast
2. When a message is delivered to a node using BYNET hardware and software, PDE software
on the node has the ability to route the message to: (Check all that apply.)
Cliques
A clique (pronounced, kleek) is a group of nodes that share access to the same disk arrays.
Each multi-node system has at least one clique. The cabling determines which nodes are in
which cliques: the nodes of a clique are connected to the disk array controllers of the same
disk arrays.
When a node fails, the Teradata RDBMS restarts across all remaining nodes in the
system.
The vprocs from the failed node migrate to the operational nodes in its clique.
Processing continues while the failed node is being repaired.
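Vprocs migrate only within the clique because only the clique's nodes share access to the failed node's disk arrays. A migrated AMP can therefore still reach its vdisk.

The sketch below reassigns a failed node's vprocs to the surviving nodes in its clique. It assumes a hypothetical round-robin placement. The actual placement policy Teradata uses is not described here and may differ.

```python
def migrate_vprocs(clique, vprocs_by_node, failed_node):
    """Move the failed node's vprocs onto the surviving nodes of the same clique.

    Vprocs are dealt round-robin; this placement rule is an assumption.
    """
    survivors = [n for n in clique if n != failed_node]
    if not survivors:
        raise RuntimeError("no operational node left in the clique")
    result = {n: list(vprocs_by_node[n]) for n in survivors}
    for i, vproc in enumerate(vprocs_by_node[failed_node]):
        result[survivors[i % len(survivors)]].append(vproc)
    return result


clique_a = ["node1", "node2", "node3"]
layout = {
    "node1": ["PE1", "AMP0", "AMP1"],
    "node2": ["PE2", "AMP2", "AMP3"],
    "node3": ["AMP4", "AMP5"],
}
print(migrate_vprocs(clique_a, layout, "node2"))
# {'node1': ['PE1', 'AMP0', 'AMP1', 'PE2', 'AMP3'], 'node3': ['AMP4', 'AMP5', 'AMP2']}
```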
Cliques in a System
Vprocs are distributed across all nodes in the system. There are two recommendations for
cliques:
The diagram below shows three cliques. The nodes in each clique are cabled to the same disk
arrays. The overall system is connected by the BYNET.
Software Components
A Teradata node requires three distinct pieces of software:
For each node in the system, you need both of the following:
Operating System
UNIX MP-RAS
Microsoft Windows
Teradata Software: PE
A Parsing Engine (PE) is a vproc that manages the dialogue between a client application and
the Teradata RDBMS, once a valid session has been established. Each PE can support a
maximum of 120 sessions. The PE handles an incoming request in the following manner:
1. The Session Control component verifies the request for session authorization (user
names and passwords), and either allows or disallows the request.
2. The Parser does the following:
   a. Interprets the SQL statement received from the application.
   b. Verifies SQL requests for the proper syntax.
   c. Consults the Data Dictionary to ensure that all objects exist and that the user
      has authority to access them.
3. The Optimizer is parallel aware, meaning that it has knowledge of the system
components (how many nodes, vprocs, etc.). The Optimizer develops the least expensive plan
(in terms of time) to return the requested response set by evaluating alternative plans and
choosing the fastest one. The plan is converted into executable steps and passed on to the
Dispatcher.
4. The Dispatcher controls the sequence in which the steps are executed and passes the
steps on to the BYNET for execution by the AMPs.
5. After the AMPs process the steps, the PE receives their responses over the BYNET.
6. The Dispatcher builds a response message and sends the message back to the user.
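The six steps above can be sketched as a chain of functions. This is a toy model under stated assumptions:

- The logon table, Data Dictionary, and grants are hypothetical in-memory stand-ins.
- The "parser" understands only `SELECT * FROM <table>`.
- The "optimizer" simply emits one retrieve step per AMP.
- The dispatcher runs steps sequentially, whereas real AMPs work in parallel.

It shows the order of responsibilities, not the real PE's logic.

```python
import re

# Hypothetical stand-ins for logon data, the Data Dictionary, and access rights.
PASSWORDS = {"alice": "secret"}
DATA_DICTIONARY = {"sales"}
GRANTS = {("alice", "sales")}


def session_control(user, password):
    """Step 1: allow or disallow the request based on logon credentials."""
    if PASSWORDS.get(user) != password:
        raise PermissionError("session authorization failed")


def parse(user, sql):
    """Step 2: check syntax, then consult the Data Dictionary and access rights."""
    match = re.fullmatch(r"SELECT \* FROM (\w+);?", sql.strip(), re.IGNORECASE)
    if match is None:
        raise SyntaxError("only 'SELECT * FROM <table>' is understood by this sketch")
    table = match.group(1).lower()
    if table not in DATA_DICTIONARY:
        raise LookupError(f"object {table!r} does not exist")
    if (user, table) not in GRANTS:
        raise PermissionError(f"{user} has no access to {table}")
    return table


def optimize(table, amp_count):
    """Step 3: a parallel-aware plan with one retrieve step per AMP."""
    return [("retrieve", table, amp) for amp in range(amp_count)]


def dispatch(steps, amp_data):
    """Steps 4-6: run the steps in sequence and build one response from the results."""
    response = []
    for _kind, _table, amp in steps:
        response.extend(amp_data[amp])  # each AMP returns only its own rows
    return response


def handle_request(user, password, sql, amp_data):
    session_control(user, password)
    table = parse(user, sql)
    steps = optimize(table, len(amp_data))
    return dispatch(steps, amp_data)


amp_data = [[("East", 10)], [("West", 7)], [("East", 3)]]  # rows held by three AMPs
print(handle_request("alice", "secret", "SELECT * FROM sales;", amp_data))
# [('East', 10), ('West', 7), ('East', 3)]
```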
PE Session Control
When you log on to the Teradata RDBMS through your application, the session control software
on the PE establishes that session. Session control also manages and terminates sessions on
that PE.
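The 120-session limit per PE, stated earlier in this section, makes capacity planning a matter of simple arithmetic. The class below is a hypothetical model of session control on one PE: it accepts logons until the limit is reached and frees a slot on logoff. `pes_needed` computes the minimum PE count for a given number of concurrent sessions.

```python
import math

MAX_SESSIONS_PER_PE = 120  # per-PE limit stated in the text


class ParsingEngine:
    """Hypothetical model of session control on a single PE."""

    def __init__(self, name):
        self.name = name
        self.sessions = set()

    def logon(self, session_id):
        if len(self.sessions) >= MAX_SESSIONS_PER_PE:
            raise RuntimeError(f"{self.name} already has {MAX_SESSIONS_PER_PE} sessions")
        self.sessions.add(session_id)

    def logoff(self, session_id):
        self.sessions.discard(session_id)  # terminating a session frees a slot


def pes_needed(concurrent_sessions):
    """Minimum number of PEs required to host this many sessions."""
    return math.ceil(concurrent_sessions / MAX_SESSIONS_PER_PE)


print(pes_needed(1000))  # 9
```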
PE Parser
The Parser is a component of the PE and performs the following tasks:
Interprets an incoming Teradata SQL request and checks the syntax, evaluating the
request semantically.
Consults the Data Dictionary to ensure that all the objects exist and that the user has
authority to access them.
Decomposes the request into manageable pieces of work called AMP steps.
PE Dispatcher
The Dispatcher is a component of the PE and performs the following tasks:
Processing Requests: Controls the sequence in which the steps are executed and
passes the steps to the AMPs through the BYNET.
Processing Responses: After the AMPs process the steps, the Dispatcher builds a
response message and sends the response back to the user.
In Teradata RDBMS V2R5, the Parser and Dispatcher components are combined into the same
module for better performance, but their functionality remains the same.
Teradata Software: AMP
The AMP is a vproc that controls its portion of the data on the system. The AMPs work in
parallel, each AMP managing the data rows stored on its vdisk. AMPs are involved in data
distribution and data access in different ways.
AMPs (Access Module Processors) are virtual processors (vprocs) that receive steps
from PEs (Parsing Engines) and perform database functions to retrieve or update data. Each
AMP is associated with one virtual disk (vdisk), where the data is stored. An AMP manages only
its own vdisk, not the vdisk of any other AMP.
Data Distribution
When data is loaded, inserted, and updated, the AMP receives incoming data from the PE,
formats rows and distributes them on its vdisk.
Data Access
When data is accessed, the AMP retrieves the rows requested by the PE in the following
manner:
1. The database management subsystem receives the steps from the Dispatcher over the
BYNET.
2. The database management subsystem processes the steps. The subsystem on the AMP
can:
   a. Lock databases and tables
   b. Create, modify, or delete definitions of tables
   c. Join tables
   d. Insert, delete, or modify rows within tables
   e. Sort, aggregate, or format data
   f. Retrieve information from definitions and rows from tables
3. The database management subsystem returns responses over the BYNET to the
Dispatcher.
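The sketch below ties data distribution and data access together under stated assumptions. It assumes each row's owning AMP is chosen by hashing the row's primary index value, which is how Teradata distributes rows. Python's `zlib.crc32` is only a deterministic stand-in for Teradata's actual row-hash algorithm.

Each AMP then runs the same step against its own vdisk, and the partial counts are combined. The loop is sequential for clarity. All names and data are hypothetical.

```python
import zlib

AMP_COUNT = 4  # hypothetical system size


def amp_for(primary_index_value):
    """Pick the AMP that owns a row by hashing its primary index value.

    zlib.crc32 is only a deterministic stand-in for Teradata's row hash.
    """
    return zlib.crc32(str(primary_index_value).encode()) % AMP_COUNT


def load(rows, pi_column):
    """Distribute incoming rows: each AMP stores only the rows that hash to it."""
    vdisks = [[] for _ in range(AMP_COUNT)]
    for row in rows:
        vdisks[amp_for(row[pi_column])].append(row)
    return vdisks


def amp_step(vdisk, region):
    """One AMP step: scan only this AMP's own vdisk and count matching rows."""
    return sum(1 for row in vdisk if row["region"] == region)


def count_rows(vdisks, region):
    # Every AMP runs the same step on its own data; the partial counts are combined.
    return sum(amp_step(vdisk, region) for vdisk in vdisks)


orders = [{"order_id": i, "region": "East" if i % 3 else "West"} for i in range(1, 31)]
vdisks = load(orders, "order_id")
print(count_rows(vdisks, "East"))  # 20
```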
Review Questions
Node Failure causes vprocs to migrate to other nodes.
BYNET Hardware carries the communication between nodes in a system.
Clique is a group of nodes with access to the same disk arrays.
A clique allows processing to continue if an MPP Node has a failed BYNET
A copy of BYNET Hardware is installed on each node in the system.
Client Access
Aim
Explain the relationship between the Teradata RDBMS and its client applications.
Illustrate how the Teradata RDBMS processes a request.
Describe how the clients access the Teradata RDBMS.
Describe the Teradata client utilities and their use.
Users can access data in the Teradata RDBMS through an application on both channel-attached
and network-attached clients. Additionally, the node itself can act as a client. Teradata client
software is installed on each client (channel-attached, network-attached, or node) and
communicates with RDBMS software on the node. You may occasionally hear either type of
client referred to by the legacy term of host, though this term is not typically used in
documentation or product literature.
Channel-Attached Client
Teradata Director Program (TDP) software to manage session traffic, installed on the
channel-attached client.
Call-Level Interface (CLI), a library of routines that are the lowest-level interface to
Teradata.
Communication with the Teradata System
Communication from client applications on the mainframe goes through the mainframe
channel, to the Host Channel Adapter on the node, to the Channel Driver software.
The Teradata RDBMS supports network-attached clients connected to the node over a LAN. The
following software components installed on the network-attached client are responsible for
communication between client applications and the Teradata Gateway on a Teradata node:
ODBC
ODBC (Open Database Connectivity) is an application programming standard that defines
common database access mechanisms to simplify the exchange of data between a client and
server. ODBC-compliant applications connect with a database through the use of a driver that
translates the application's ODBC commands into database syntax.
CLIv2
CLIv2 (Call-Level Interface, Version2) is a library of routines that enable an application
program to access data stored in the Teradata RDBMS. When used with network-attached
clients, CLIv2 contains the following components:
The node is considered a network-attached client. If you install application software on a node,
it will be treated like an application on a network-attached client. In other words,
communications from applications on the node go through the Teradata Gateway. An application
on a node can be executed through:
Communication from applications on the node (represented by colored balls) goes through the
Teradata Gateway. The node processes these sessions the same way as network-attached
client sessions.
Question
Which of the following can you use to run an application that is installed on a node? (Check all
that apply.)
A. Mainframe terminal
B. Bus terminal
C. System console
D. Network-attached workstation
Request Processing
Query: How many widgets had more than 15% profit margin for the Eastern region in
the last quarter?
A request like the one above is processed a little differently, depending on whether the user is
accessing Teradata through a channel-attached or network-attached client:
1. SQL request is sent from the client to the appropriate component on the node:
   a. Channel-attached client: request is sent to Channel Driver (through the TDP).
   b. Network-attached client: request is sent to Teradata Gateway (through CLIv2 or
      ODBC).
2. Request is passed to the PE(s).
3. PEs parse the request into AMP steps.
4. PE Dispatcher sends steps to the AMPs over the BYNET.
5. AMPs perform operations on data on the vdisks.
6. Response is sent back to PEs over the BYNET.
7. PE Dispatcher receives response.
8. Response is returned to the client (channel-attached or network-attached).
Archive Utilities
ARC
NetVault
NetBackup
ASF2
Query Submitting Utilities
Teradata provides a number of tools that are front-end interfaces for submitting SQL queries.
Two mentioned in this section are BTEQ and Queryman.
BTEQ
BTEQ (Basic Teradata Query) -- often pronounced BEE-teek -- is a Teradata tool used for
submitting SQL queries on all platforms. BTEQ provides the following functionality:
Queryman
Queryman is an information discovery/query tool that runs on Microsoft Windows. Queryman
enables you to access ODBC-based databases (including Teradata). Some of its features include:
Ability to save data in PC-based formats, such as Microsoft Excel, Microsoft Access, and
text files.
History of submitted SQL syntax, to help you build scripts for data mining and
knowledge discovery.
Help with SQL syntax.
Import and export of small amounts of data to and from ODBC-compliant databases. For
tables with more than a few thousand rows, the Teradata load utilities are recommended for
greater efficiency.
FastLoad
MultiLoad
TPump
FastExport
Teradata Warehouse Builder
By default, you can run up to 15 instances of FastLoad, MultiLoad, and FastExport in any
combination. There is no limit to the number of concurrent TPump jobs.
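The concurrency rule above can be expressed as a hypothetical admission check. FastLoad, MultiLoad, and FastExport jobs share one pool limited to 15 by default, while TPump jobs are never counted. The `limit` parameter is an illustrative assumption. The sketch does not show how the Teradata RDBMS enforces or configures the limit.

```python
DEFAULT_BULK_LIMIT = 15  # default from the text, shared by the three utilities below
BULK_UTILITIES = {"FastLoad", "MultiLoad", "FastExport"}


def can_start(running_jobs, new_job, limit=DEFAULT_BULK_LIMIT):
    """Return True if new_job may start given the currently running jobs.

    TPump (or any non-bulk job) is never counted against the limit.
    """
    if new_job not in BULK_UTILITIES:
        return True
    active_bulk = sum(1 for job in running_jobs if job in BULK_UTILITIES)
    return active_bulk < limit


running = ["FastLoad"] * 10 + ["MultiLoad"] * 5 + ["TPump"] * 40
print(can_start(running, "FastExport"))  # False: 15 bulk jobs already running
print(can_start(running, "TPump"))       # True
```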
FastLoad
Use the FastLoad utility to:
FastLoad loads data into an empty table in parallel, using multiple sessions to transfer blocks of
data. FastLoad achieves high performance by fully exploiting the resources of the system. After
the data load is complete, the table can be made available to users.
MultiLoad
Use the MultiLoad utility to maintain tables by:
TPump
Use TPump to:
The TPump utility complements MultiLoad as a data loading utility. A major difference is that
TPump uses row hash locks, which eliminates the need for table locks and "batch windows"
typical with MultiLoad. Users can continue to run queries during TPump data loads. In addition,
TPump is designed for smaller volumes of data than MultiLoad, and maintains up to 60 tables
at a time.
TPump has a dynamic throttle that operators can set to specify the percentage of system
resources to be used for an operation. This lets operators run TPump at full capacity when
system usage is low, or limit it when TPump might affect other business users of Teradata.
FastExport
Use the FastExport utility to transfer either of the following to a file on a client platform:
Table
View
FastExport is a data extract utility. It transfers large amounts of data using block transfers over
multiple sessions to a file on the network-attached or channel-attached client. Typically,
FastExport is run during a batch window, and the tables being exported are locked.
Administrative Utilities
Administrative utilities use a graphical user interface (GUI) to monitor and manage various
aspects of a Teradata system.
Teradata Manager
Teradata Manager is a production and performance monitoring system that helps a DBA or
system manager to monitor, control, and administer one or more Teradata systems through a
GUI. Running on LAN-attached clients, Teradata Manager has a variety of tools and
applications to gather, manipulate, and analyze information about each Teradata RDBMS being
administered.
Some tasks that a Teradata Database Administrator (DBA) or system manager can perform
with the associated Teradata Manager applications (named after each dash) include:
View the overall system performance (and drill down to identify problem areas) through a graphical user interface - Alert Viewer
View near real-time resource usage data in chart format - Dynamic Utilization Charting (DUC) (Using PMPC)
Monitor Teradata system performance and perform related production control functions - PMON
Monitor, identify, and abort sessions on the Teradata RDBMS - Session Information
Monitor usage of vprocs and run node usage macros - Resource History
Compare performance history over different periods of time - Performance Data Analyzer
Perform database administration tasks on the Teradata RDBMS - WinDDI
Run many of the Teradata RDBMS console utilities from the Teradata Manager PC - Remote Console
Create a log to determine whether an application mix is causing delays because of database lock contention - Locking Logger
Teradata Manager has a number of other applications that are useful in managing a Teradata
system.
Database Query Manager (DBQM)
Database Query Manager (DBQM) is a query workload management tool that dynamically
tunes the Teradata RDBMS. DBQM can run, suspend, reschedule, or reject a query based on
current workload and set thresholds.
For example, with DBQM a request can be scheduled to run periodically or during a specified
time period without an active system connection. Results can be retrieved any time after the
request has been submitted by DBQM and executed.
DBQM can restrict queries based on factors such as:
Environmental factors
DBQM can manage requests based on dynamic environment factors, including database
system CPU and disk utilization, network activity, and number of users.
Archival Utilities
Teradata has utilities specifically designed for data archive and recovery purposes. There are
different utilities for channel-attached clients and network-attached clients.
Archiving on Channel-Attached Clients
The Archive/Recovery (ARC) utility backs up data in a channel-attached (mainframe) client
environment. It supports commands written in Job Control Language (JCL). It is scalable and
parallel, and can run on a channel-attached client or a node.
Archiving on Network-Attached Clients
NetVault and NetBackup have modules created for Teradata systems for use in a scalable,
parallel, enterprise environment. They run on network-attached clients or a node (Microsoft
Windows or UNIX MP-RAS).
Some legacy Teradata systems may also use the ASF2 (Archive Storage Facility 2) utility for
backup and restore functions on a UNIX platform.
In Teradata RDBMS V2R5.0, the name of DBQM has been changed to Teradata Dynamic Query
Manager.
Questions
Match each Teradata load or unload utility to its description:
TPump - Maintains up to 60 tables at a time.
FastExport - Data extract utility.
MultiLoad - Updates, inserts, or deletes rows in empty or populated tables.
FastLoad - Uses parallel processing to load an empty table.
Which of the following statements are true? (Choose two.)
A. There are multiple Teradata utilities available for archiving data using a channel-attached
client.
B. The two utilities used for Teradata system management are DBQM and Queryman.
C. BTEQ runs on all client platforms to access the Teradata RDBMS.
D. ASF2 is used on legacy systems on a single environment.
E. NetVault and NetBackup are utilities used for network management.
TERADATA SQL
Aim of the module
Teradata is accessed using SQL (Structured Query Language). SQL is the industry standard
access language for communicating with a relational database. It is a set-oriented language
based on the relational model. A user or application can use SQL statements to perform
operations on the data and define how an answer set should be returned from an RDBMS.
Teradata supports two types of SQL:
Generic SQL: Teradata SQL is compliant with ANSI standards (an industry standard).
Teradata SQL Extensions: NCR has added Teradata SQL extensions above and beyond
standard SQL capabilities, including one-step SQL statements for complex administrative
operations.
Teradata SQL Benefits
Teradata SQL is the set of SQL commands used with the Teradata RDBMS. Some benefits of
Teradata SQL are:
Parallel Execution - The Optimizer breaks up an SQL statement into tasks that can be
executed in parallel to minimize resource contention. The design of the Teradata
RDBMS, along with its automatic data distribution, balances the workload and reduces
bottlenecks.
ANSI Compliant - Teradata SQL is compliant with ANSI standards. If you have
programs already written with ANSI-compliant SQL for a different relational database,
you can run them with Teradata, as well.
High-Performance Extensions - NCR has added Teradata SQL extensions that are
above and beyond the standard SQL capabilities, including one-step SQL statements for
complex administrative operations.
Types of SQL Statements
SQL statements commonly are categorized as follows:
Data Manipulation Language (DML) is used to work with data, including tasks such as inserting
data into a table, updating an existing record, or performing queries.
Examples:
SELECT - Perform relational query functions (Select, Project, Join, Union, Intersect, Minus).
INSERT - Place a new row into a table.
UPDATE - Modify values in an existing row.
DELETE - Remove a row from a table.
Data Control Language (DCL)
The SELECT Statement
The SELECT statement is the most commonly used SQL statement. It is a DML
statement that allows you to retrieve data from one or more tables. In its most common form,
you specify certain rows to be returned as shown.
SELECT *
FROM employee
WHERE department_number = 401;
The asterisk, "*", is a "wild card" character. In this example, it specifies that when the result is
displayed, we want to see all the columns of the rows where the department number is 401.
The FROM clause specifies from which table in our database to retrieve the rows. The WHERE
clause acts as a filter that passes only rows meeting the specified condition -- in this case, rows
of employees in department 401.
NOTE: SQL does not require a trailing semicolon to end a statement, but the Basic Teradata
Query (BTEQ) utility that can be used to enter SQL statements does. The semicolon is used in
the examples, as if it were entered in BTEQ.
If you do not specify a WHERE clause, the query would return all columns and all rows from the
employee table, for example:
SELECT * FROM employee;
EMPLOYEE_NUMBER  MANAGER_EMPLOYEE_NUMBER  FIRST_NAME  HIRE_DATE  BIRTH_DATE
1006             1019                     John        761015     531015
1008             1019                     Carol       770201     580517
1005             0801                     Loretta     761015     550910
1004             1003                     Darlene     761015     460423
1007             1005                     Arnando     770102     370131
1003             0801                     James       760731     470619
SELECT
employee_number
, hire_date
, last_name
, first_name
FROM
employee
WHERE
department_number = 401;
Unsorted Results
Results include the columns named in the SQL statement. The results are unsorted unless you
specify that you want them sorted in a certain way. How to retrieve ordered results is covered in
the following section.
employee_number  hire_date  last_name  first_name
1004             76/10/15   Johnson    Darlene
1003             76/07/31   Trader     James
1013             77/04/01   Phillips   Charles
1010             77/03/01   Rogers     Frank
1022             79/03/01   Machado    Albert
1001             76/06/18   Hoover     William
1002             76/07/31   Brown      Alan
Sort Order
Adding ORDER BY hire_date; to the end of a query returns the results sorted by hire date in
ascending order, because ascending is the default when no sort order is specified. To specify the
sort direction explicitly, add ASC (ascending) or DESC (descending) to the end of the ORDER BY
clause. The following example explicitly requests ascending order.
SELECT
employee_number
,last_name
,first_name
,hire_date
FROM
employee
WHERE
department_number = 401
ORDER BY
hire_date ASC;
Output
employee_number  last_name  first_name  hire_date
1001             Hoover     William     76/06/18
1003             Trader     James       76/07/31
1002             Brown      Alan        76/07/31
1004             Johnson    Darlene     76/10/15
1010             Rogers     Frank       77/03/01
1013             Phillips   Charles     77/04/01
1022             Machado    Albert      79/03/01
Naming
Specify the column to sort on either by naming it directly (for example, hire_date) or by giving
its position in the SELECT list. Since hire_date is the fourth column in the SELECT clause, the
following ORDER BY clause is equivalent to the one in the example above:
ORDER BY 4 ASC;
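Here is a minimal sketch comparing the three forms of ORDER BY. It again uses sqlite3 as a stand-in for Teradata. Hire dates are stored as text in the YY/MM/DD form shown in the output. This is an assumption of the sketch, and it sorts correctly only because every date falls in the same century; Teradata has a real DATE type. Employees 1002 and 1003 share a hire date, so their relative order is not guaranteed unless a second sort column is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (employee_number INTEGER, department_number INTEGER,"
    " last_name TEXT, first_name TEXT, hire_date TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [
        (1004, 401, "Johnson", "Darlene", "76/10/15"),
        (1003, 401, "Trader", "James", "76/07/31"),
        (1013, 401, "Phillips", "Charles", "77/04/01"),
        (1010, 401, "Rogers", "Frank", "77/03/01"),
        (1022, 401, "Machado", "Albert", "79/03/01"),
        (1001, 401, "Hoover", "William", "76/06/18"),
        (1002, 401, "Brown", "Alan", "76/07/31"),
    ],
)

query = (
    "SELECT employee_number, last_name, first_name, hire_date "
    "FROM employee WHERE department_number = 401 "
)

# Sort column named directly; ASC would also be the default.
by_name = conn.execute(query + "ORDER BY hire_date ASC").fetchall()
# Same sort, naming the column by its position (hire_date is fourth).
by_position = conn.execute(query + "ORDER BY 4 ASC").fetchall()
# DESC reverses the order: most recent hire first.
newest_first = conn.execute(query + "ORDER BY hire_date DESC").fetchall()

for row in by_name:
    print(row)
```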
Data Control Language (DCL) is used for administrative tasks such as granting and revoking
privileges to database objects or controlling ownership of those objects.
Examples:
GRANT - Give user privileges.
REVOKE - Remove user privileges.
GIVE - Transfer database ownership.
User Assistance Statements and Modifiers
SQL user assistance statements (and modifiers) vary widely from database vendor to
database vendor. Teradata's user assistance statements are commonly called Teradata
extensions. These Teradata extensions are additions to the DDL, DML, and DCL statements in
standard SQL, and make some operations less time consuming.
This section discusses the following Teradata SQL user assistance commands:
HELP
HELP SESSION
SHOW
EXPLAIN
TABLE
VIEW
MACRO
TRIGGER
PROCEDURE
JOIN INDEX
Example:
SHOW TABLE tablename
Displays the CREATE TABLE statement that was used to create the specified table.
Example Output of SHOW TABLE statement.
When you execute a request preceded by the EXPLAIN modifier, the request is not executed.
Instead, the system returns:
English text describing a plan for how the statement will be processed.
An estimate of the number of rows involved.
A relative cost of the request.
The relative cost is shown in units of time, and should not be used to predict actual response
time for an SQL request. This time estimate can be used to compare the duration of request
processing relative to other plans.
Example:
EXPLAIN
SELECT *
FROM tablename;
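SQLite has a loosely analogous modifier, EXPLAIN QUERY PLAN, which can show the key property described above: the request is planned but not run. The sketch below uses it only as an illustration. SQLite reports plan steps but no row estimates or relative cost, and its output format has nothing to do with Teradata's EXPLAIN text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_number INTEGER, department_number INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(1001, 401), (1002, 401), (1003, 401)],
)

def row_count() -> int:
    return conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

before = row_count()

# Ask only for the plan of a DELETE; the DELETE itself is not executed.
plan = conn.execute(
    "EXPLAIN QUERY PLAN DELETE FROM employee WHERE department_number = 401"
).fetchall()
for step in plan:
    print(step[-1])  # the last column holds the human-readable plan detail

after = row_count()
print(before, after)  # 3 3
```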
What Is Teradata?
Data warehousing is a process for properly assembling and managing data from
various servers to answer business-critical questions. Teradata is ideal for enterprise
data warehousing, which is commonly characterized by:
The active data warehouse extends a company's ability beyond historical data and
strategic decisions to bring the decision-making capability to front-line personnel.
Tactical decisions, such as "Who should get the empty seat on this airplane?" or "What
should I offer this customer to keep her from leaving, based on her history with our
company?", can be made more effectively with the right information.
With an active data warehouse, employees who interact directly with customers and
suppliers are empowered with information-based decision making at their fingertips.
The Teradata Warehouse supports active data warehousing with:
The NCR CRM solution consists of software, professional and customer services, and
Teradata provides a single repository for customer information that helps E-Businesses build and maintain one-to-one customer relationships that are critical to
their success on the Internet. Teradata supports the fast-paced style of E-Business by
allowing many concurrent users to ask complicated questions as they think of them
and get quick answers.
Teradata allows E-Businesses to:
Data marts
A Teradata system may start out small as a data mart, and be easily expanded with
linear scalability to include more subject areas, applications, and users.
A data mart is a special purpose subset of a company's enterprise data used by a
particular department, function, or application. Often, these single-subject area data
marts contain data that was aggregated or transformed in some way to better handle
the requests of a specific user community. Vendors implement data marts using
different architectures:
Independent and dependent data marts are architectures endorsed by other database
vendors and tend to be associated with higher maintenance costs for physically
moving and maintaining the data, inconsistent data (and resulting inconsistent
decisions), and indirect ways to get the complete picture of the data. Teradata is ideal
for the logical data mart environment, where different user communities view subsets
of a single repository of enterprise data.
Q What do you think are some Teradata features that make it so successful in today's
business environment? (Details on the following are coming up next.)
Scalability.
Single data store.
High degree of parallelism.
Ability to model the business.
All of the above.
Teradata acts as a single data store, with multiple client applications making inquiries
against it concurrently.
Instead of replicating a database for different purposes, with Teradata you store the data
once and use it for all clients. Teradata provides the same connectivity for an entry-level
system as it does for a massive enterprise data warehouse.
Scalability
Unconditional Parallelism
A data warehouse built on a business model contains information from across the
enterprise. Individual departments can use their own assumptions and views of the
data for analysis, yet these varying perspectives have a common basis for a single
version of the truth. Companies can get a cohesive view of their operations across
functional areas to:
You get consistent answers from the different viewpoints above using a single
business model, not functional models for different departments. In a functional
model, data is organized according to what is done with it. But what happens if users
later want to do some analysis that has never been done before? When a system is
optimized for one department's function, the other departments' needs (and future
needs) may not be met.
A Teradata system allows the data to represent a business model, with data organized
according to what it is, not what it does. The data model is the same regardless of
data volume. With Teradata as the enterprise data warehouse, users can ask new
questions of the data that were never anticipated, throughout the business cycle and
even through changes in the business environment.
The Teradata RDBMS is a relational database. Relational databases are based on the
relational model, which has its foundation in the mathematical theory of sets. The
relational model uses and extends many principles of set theory to provide a
disciplined approach to data management.
Relational databases present data as a set of tables. A table is a two-dimensional
representation of data that consists of rows and columns. According to the relational
model, a valid table does not have to be populated with data rows; it just needs to be
defined with at least one column.
Tables are logically related to each other by a common field, so information such as
customer telephone numbers and addresses can exist in one table, yet be accessible
for multiple purposes. The example below shows customer, order, and billing
statement data, related by a common field. The common field of Customer ID lets you
look up information such as a customer name for a particular statement number, even
though the data exists in two different tables.
Rows
Each row contains all the columns in the table, so each table has only one row format.
The sequence of rows is arbitrary and does not imply priority, hierarchy, or
significance.
Each row represents an occurrence of an entity defined by the table. An entity is a
person, place, or thing about which the table contains information. In this example,
the entity is the employee and each row represents a single employee.
Columns
Each column contains like data, such as only part names, or only supplier names, or
only employee numbers. In the example below, the Last_Name column contains last
names only, and nothing else. The data in the columns is atomic data, that is, data that
is indivisible and cannot be divided into smaller units that have meaning. (For example,
a column that holds both Last_Name and First_Name is not atomic, because it can be
divided further into components.) A telephone number, for instance, might be divided
into three columns: the area code, the prefix, and the suffix, so the customer data can
be analyzed according to area code, etc. Missing data values would be represented by
nulls. Within a table, the column position is arbitrary.
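Storing the telephone number as three atomic columns can be sketched as follows. The sketch assumes a 10-digit North American number (area code, prefix, suffix), and all phone numbers in it are hypothetical. Once the area code is a column of its own, grouping customers by it is a simple count.

```python
import re
from collections import Counter
from typing import Tuple

def split_phone(raw: str) -> Tuple[str, str, str]:
    """Split a 10-digit North American number into area code, prefix, suffix."""
    digits = re.sub(r"\D", "", raw)  # drop punctuation and spaces
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {raw!r}")
    return digits[:3], digits[3:6], digits[6:]

# Hypothetical customer phone numbers, each stored as one non-atomic string.
raw_numbers = ["(213) 555-0142", "213-555-0199", "(415) 555-0101"]

# Atomic form: three separate column values per customer.
rows = [split_phone(n) for n in raw_numbers]

# With the area code in its own column, analysis by area code is trivial.
by_area_code = Counter(area for area, _, _ in rows)
print(rows[0])       # ('213', '555', '0142')
print(by_area_code)  # Counter({'213': 2, '415': 1})
```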
Primary Key
The relational model is a set of principles for relational databases, formalized by Dr.
E.F. Codd in the late 1960s. It prescribes how data should be represented in terms of
structure, integrity, and manipulation. Its rules are widely accepted in the information
technology industry, though actual implementations may vary. In the relational model,
a Primary Key (PK) is used to designate a unique identifier for each row when you
design a database. A Primary Key can be composed of one or more columns. In the
example below, the Primary Key is the employee number.
Rule 2: Unique PK
Within the column(s) designated as the Primary Key, the values in each row must be
unique. No duplicate values are allowed. The Primary Key's purpose is to uniquely
identify a row. In a multi-column Primary Key, the combined value of the columns
must be unique, even if an individual column in the Primary Key has duplicate values.
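Rule 2 can be demonstrated with sqlite3 standing in for Teradata. The order_line table and its (order_number, line_number) key are hypothetical, chosen only to show a two-column Primary Key. In that table, each column repeats on its own, but no combination of the two repeats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def try_insert(sql: str, params: tuple) -> bool:
    """Return True if the row was accepted, False if a key rule rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

# Single-column Primary Key: each employee_number may appear only once.
conn.execute("CREATE TABLE employee (employee_number INTEGER PRIMARY KEY, last_name TEXT)")
first = try_insert("INSERT INTO employee VALUES (?, ?)", (1001, "Hoover"))
duplicate = try_insert("INSERT INTO employee VALUES (?, ?)", (1001, "Brown"))

# Hypothetical two-column Primary Key: only the combination must be unique.
conn.execute(
    "CREATE TABLE order_line (order_number INTEGER, line_number INTEGER,"
    " item TEXT, PRIMARY KEY (order_number, line_number))"
)
line_insert = "INSERT INTO order_line VALUES (?, ?, ?)"
a = try_insert(line_insert, (5001, 1, "widget"))
b = try_insert(line_insert, (5001, 2, "gadget"))  # order_number repeats: allowed
c = try_insert(line_insert, (5002, 1, "widget"))  # line_number repeats: allowed
d = try_insert(line_insert, (5001, 1, "gizmo"))   # same combination: rejected

print(first, duplicate, a, b, c, d)  # True False True True True False
```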
Foreign Key
A Foreign Key (FK) is an identifier that links related tables. A Foreign Key defines how
two tables are related to each other. Each Foreign Key references a matching Primary
Key in another table in the database. For example, in the table below, the Department
Number column that is a Foreign Key actually exists in another table as a Primary Key.
Having tables related to each other gives users the flexibility to look at the data in
different ways, without the database administrator having to manage and maintain
many tables of duplicate data for different applications.
Foreign Key Rules
Rules governing how Foreign Keys must be defined and how they operate are:
Having tables related to each other makes a relational database flexible so that
different users can look up information they need, while simplifying the database
administration so the data doesn't have to be duplicated for each purpose or
application.
Rule 2: Unique or Non-Unique FKs
Duplicate Foreign Key values are allowed. More than one employee could be assigned
to the same department.
Each Foreign Key value must exist as a Primary Key value in the related table. A department
number that does not exist in the Department Table would be invalid as a Foreign Key value in
the Employee Table.
A NULL (missing) Foreign Key does not violate this rule. Remember, a missing value is defined as
a non-value; there is no value present. So the rule could be better stated: if a value exists in
the Foreign Key column, it must match a Primary Key value in the related table.
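The Foreign Key rules above can be checked with sqlite3 as a stand-in. SQLite enforces Foreign Keys only after PRAGMA foreign_keys = ON. This sketch makes no claim about how Teradata itself enforces referential integrity. All employee numbers and department 999 are hypothetical. The checks show that a duplicate Foreign Key value and a NULL Foreign Key are both accepted, while a value with no matching Primary Key is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled

conn.execute("CREATE TABLE department (department_number INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE employee ("
    " employee_number INTEGER PRIMARY KEY,"
    " department_number INTEGER REFERENCES department (department_number))"
)
conn.execute("INSERT INTO department VALUES (401)")

def try_insert(params: tuple) -> bool:
    """Return True if the employee row was accepted."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?)", params)
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    "matches a PK": try_insert((1001, 401)),
    "duplicate FK value": try_insert((1002, 401)),  # allowed (Rule 2)
    "no matching PK": try_insert((1003, 999)),      # rejected: 999 is not a PK
    "NULL FK": try_insert((1004, None)),            # allowed: no value to check
}
print(results)
```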
To check on your understanding of Primary Keys and Foreign Keys, complete this
sentence. According to the relational model, a single table can have either:
(Choose two.)
A. Multiple primary keys.
B. Multiple foreign keys.
C. No primary keys.
D. No foreign keys.
Teradata: A Proven Product
Teradata was built for data warehousing from the start. Teradata Corporation was
founded in Los Angeles, California, and incorporated on July 13, 1979. The corporate
goal was to create a database computer that could handle billions of rows of data, up
to and beyond a terabyte of data storage. The first product, a Teradata Database
Computer (DBC/1012), was shipped to the first customer with the Teradata RDBMS on
a proprietary platform in 1984.
Teradata became an open system in 1996, when the Teradata RDBMS Version 2 was
ported to the general-purpose UNIX platform. Since then, Teradata has been ported to
Microsoft Windows NT, then Microsoft Windows 2000.
Today, Teradata is the brand name of NCR's premier relational database management
system, the foundation for active data warehousing and other NCR solutions.
How Large Is a Trillion?
Teradata was the first commercial database system to support a trillion bytes of data.
The origin of the name Teradata is tera-, which is derived from Greek and means
trillion.
The chart below lists the meaning of the prefixes:
Prefix   Exponent   Meaning
kilo-    10^3       1,000 (thousand)
mega-    10^6       1,000,000 (million)
giga-    10^9       1,000,000,000 (billion)
tera-    10^12      1,000,000,000,000 (trillion)
peta-    10^15      1,000,000,000,000,000 (quadrillion)
exa-     10^18      1,000,000,000,000,000,000 (quintillion)
Teradata DBC/1012
NCR System 3600
While these systems are older technologies, a few are still in use at some customer
sites. Version 1 has the following characteristics:
Hardware processors are physically cabled to the system (AMPs, IFPs, COPs,
and PEPs).
Messages are passed between hardware processors using the Ynet
interconnect.
Support for channel-attached and network-attached clients, called hosts.
Runs on the Teradata Operating System (TOS).
Version 2 has the following characteristics:
Virtual processors called vprocs (AMPs and PEs) allow for better resiliency,
more efficient use of system resources, and more flexibility than the hardware
processors.
Messages are passed over the BYNET, which allows Teradata to be a more
linearly expandable system. As the database grows, additional nodes may be
added without performance penalties.
Support for channel-attached and network-attached clients.
Runs on UNIX and Microsoft Windows.
Teradata architecture
A Teradata System
A Teradata system contains one or more nodes. A node is a term for a processing unit
under the control of a single operating system. The node is where the processing
occurs for the Teradata RDBMS. There are two types of Teradata systems:
Massively parallel processing (MPP) - Multiple SMP nodes working together comprise a
larger, MPP implementation of Teradata. The nodes are connected using the BYNET.
SMP system: System Console (keyboard and monitor) attached directly to the
SMP node
To access a Teradata system, a user typically logs on through one of multiple client
platforms (channel-attached mainframes or network-attached workstations). Client
access is discussed in the next module.
Node Components
A node is a basic building block of a Teradata system, and contains a large number of
hardware and software components. A conceptual diagram of a node and its major
components is shown below. Hardware components are shown on the left side of the
node and software components are shown on the right side.
The Teradata vprocs (the PEs and AMPs) share the components of the nodes (memory
and CPU). The key feature of the shared-nothing architecture is that each AMP
manages its own dedicated portion of the system's disk space (called the vdisk), and
this space is not shared with other AMPs. Each AMP uses system resources
independently of the other AMPs, so they can all work in parallel for high overall
system performance.
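The shared-nothing idea can be sketched in a few lines of Python. The AMP-to-row layout and all rows are hypothetical. Threads are used only to show the structure: each worker reads its own slice and no slice is shared. Because of Python's global interpreter lock, this does not reproduce the real CPU parallelism of AMPs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, int]  # (employee_number, department_number)

# Hypothetical layout: each AMP owns a private slice of the rows (its vdisk).
VDISKS: Dict[int, List[Row]] = {
    0: [(2001, 401), (2002, 402)],
    1: [(2003, 401), (2004, 403)],
    2: [(2005, 401), (2006, 402)],
}

def amp_scan(amp_id: int, predicate: Callable[[Row], bool]) -> List[Row]:
    """An AMP reads only its own vdisk; it never touches another AMP's rows."""
    return [row for row in VDISKS[amp_id] if predicate(row)]

def parallel_select(predicate: Callable[[Row], bool]) -> List[Row]:
    # Every AMP scans its own slice at the same time; the partial answers
    # are then combined into one result set, in AMP order.
    with ThreadPoolExecutor(max_workers=len(VDISKS)) as pool:
        futures = [pool.submit(amp_scan, amp_id, predicate) for amp_id in VDISKS]
        return [row for f in futures for row in f.result()]

dept_401 = parallel_select(lambda row: row[1] == 401)
print(dept_401)  # [(2001, 401), (2003, 401), (2005, 401)]
```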
Notes
PEs (Parsing Engines) are vprocs that receive SQL requests from the client and break the
requests into steps. The PEs send the steps to the AMPs and subsequently return the answer to
the client.
A vdisk (pronounced, VEE-disk) is the logical disk space that is managed by an AMP.
Depending on the configuration, a vdisk may not be contained on the node; however, it is
managed by an AMP, which is always a part of the node.
The vdisk is made up of 1 to 64 pdisks (user slices in UNIX or partitions in Windows NT, whose
size and configuration vary based on RAID level). The pdisks logically combine to comprise the
AMP's vdisk. Although an AMP can manage up to 64 pdisks, it controls only one vdisk. An AMP
manages only its own vdisk, not the vdisk of any other AMP.
Question
Which of the following statements is true?
Scalable: As you add more nodes to the system, the overall network
bandwidth scales linearly. This linear scalability means you can increase
system size without performance penalty, and sometimes even increase
performance.
Fault tolerant: Each network has multiple connection paths. If the BYNET
detects an unusable path in either network, it will automatically reconfigure
that network so all messages avoid the unusable path. Additionally, in the rare
case that BYNET 0 cannot be reconfigured, hardware on BYNET 0 is disabled
and messages are re-routed to BYNET 1.
Hardware: The nodes of an MPP system are connected with the BYNET
hardware, consisting of BYNET boards and cables.
Software: The BYNET software is installed on every node. This BYNET driver
is an interface between the PDE software and the BYNET hardware.
SMP systems do not contain BYNET hardware. The PDE and BYNET
software emulate BYNET activity in a single-node environment. The
SMP implementation is sometimes called boardless BYNET.
Point-to-point
Multicast
Broadcast
Point-to-Point Messages
With point-to-point messaging between vprocs, a vproc can send a message to
another vproc on:
Multicast Messages
A vproc can send a message to multiple vprocs using two steps:
1. Send a broadcast message from the sending node to all nodes. This is a
communication between nodes using the BYNET hardware.
2. Within the recipient nodes, the PDE and BYNET software determine which, if
any, of the node's vprocs should receive the message and deliver the message
accordingly. This is a multicast communication between vprocs within the
node, using the PDE and BYNET software.
Broadcast Messages
A vproc can send a message to all the vprocs in the system using two steps:
1. Send a broadcast message from the sending node to all nodes. This is a
communication between nodes using the BYNET hardware.
2. Within each recipient node, the message is sent to all vprocs. This is a
broadcast communication between vprocs using the PDE and BYNET software.
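The two-step multicast and broadcast delivery described above can be modeled as follows. The node and vproc names are hypothetical. The model counts one node-level transmission per node (step 1, BYNET hardware) and then filters delivery locally on each node (step 2, PDE and BYNET software). It does not model point-to-point messages or any real BYNET interface.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical system: node name -> vprocs running on that node.
NODES: Dict[str, List[str]] = {
    "node1": ["PE1", "AMP1", "AMP2"],
    "node2": ["AMP3", "AMP4"],
    "node3": ["PE2", "AMP5", "AMP6"],
}

def send(message: str, targets: Optional[Set[str]] = None) -> Tuple[int, Dict[str, List[str]]]:
    """Deliver a message in two steps.

    targets=None models a broadcast (every vproc receives it); a set of vproc
    names models a multicast. Returns (node-level transmissions, inboxes).
    """
    inbox: Dict[str, List[str]] = defaultdict(list)
    node_transmissions = 0
    for vprocs in NODES.values():
        # Step 1: the BYNET hardware carries one copy to each node.
        node_transmissions += 1
        # Step 2: software on the node decides which local vprocs get it.
        for vproc in vprocs:
            if targets is None or vproc in targets:
                inbox[vproc].append(message)
    return node_transmissions, dict(inbox)

hops, everyone = send("step A")                           # broadcast
hops_mc, some = send("step B", targets={"AMP1", "AMP5"})  # multicast
print(hops, sorted(everyone))
print(hops_mc, sorted(some))
```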
Questions
1. What types of messages is the BYNET hardware capable of sending between nodes on a system?
(Check all that apply.)
A. Broadcast
B. Multicast
C. Point-to-point
D. Simulcast
2. When a message is delivered to a node using BYNET hardware and software, PDE software on
the node has the ability to route the message to: (Check all that apply.)
Cliques
A clique (pronounced, kleek) is a group of nodes that share access to the same disk arrays.
Each multi-node system has at least one clique. The cabling determines which nodes are in
which cliques: the nodes of a clique are connected to the disk array controllers of the same disk
arrays.
When the node fails, the Teradata RDBMS restarts across all remaining nodes
in the system.
The vprocs from the failed node migrate to the operational nodes in its clique.
Processing continues while the failed node is being repaired.
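Vproc migration within a clique can be sketched as follows. The clique layout and vproc names are hypothetical. Assigning the failed node's vprocs round-robin to the surviving nodes is an assumption of this sketch; the text states only that they migrate to the operational nodes in the same clique. The migration is possible because every node in a clique is cabled to the same disk arrays, so a migrated AMP can still reach its vdisk.

```python
from itertools import cycle
from typing import Dict, List

# Hypothetical configuration: two cliques and the vprocs on each node.
CLIQUES: List[List[str]] = [["n1", "n2", "n3"], ["n4", "n5"]]
PLACEMENT: Dict[str, List[str]] = {
    "n1": ["PE1", "AMP1", "AMP2"],
    "n2": ["AMP3", "AMP4"],
    "n3": ["AMP5", "AMP6"],
    "n4": ["PE2", "AMP7"],
    "n5": ["AMP8", "AMP9"],
}

def fail_node(placement: Dict[str, List[str]], cliques: List[List[str]],
              failed: str) -> Dict[str, List[str]]:
    """Return a new placement with the failed node's vprocs moved within its clique."""
    clique = next(c for c in cliques if failed in c)
    survivors = [n for n in clique if n != failed]
    if not survivors:
        raise RuntimeError(f"no operational node left in the clique of {failed}")
    new_placement = {n: list(v) for n, v in placement.items() if n != failed}
    # Round-robin assignment is an assumption made for this sketch.
    for vproc, target in zip(placement[failed], cycle(survivors)):
        new_placement[target].append(vproc)
    return new_placement

after = fail_node(PLACEMENT, CLIQUES, "n1")
for node, vprocs in after.items():
    print(node, vprocs)
```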
Cliques in a System
Vprocs are distributed across all nodes in the system. There are two recommendations
for cliques:
The diagram below shows three cliques. The nodes in each clique are cabled to the
same disk arrays. The overall system is connected by the BYNET.
Software Components
For each node in the system, you need both of the following:
Operating System
UNIX MP-RAS
Microsoft Windows
The Parallel Database Extensions (PDE) software layer was added to the operating
system by NCR to support the parallel software environment.
Teradata Software: PE
A Parsing Engine (PE) is a vproc that manages the dialogue between a client application and the
Teradata RDBMS, once a valid session has been established. Each PE can support a maximum of
120 sessions. The PE handles an incoming request in the following manner:
1. The Session Control component verifies the request for session authorization (user
names and passwords), and either allows or disallows the request.
2. The Parser does the following:
o Interprets the SQL statement received from the application.
o Verifies SQL requests for the proper syntax.
o Consults the Data Dictionary to ensure that all objects exist and that the user
has authority to access them.
3. The Optimizer is parallel aware, meaning that it has knowledge of the system
components (how many nodes, vprocs, etc.). The Optimizer develops the least
expensive plan (in terms of time) to return the requested response set by evaluating
alternative plans and choosing the fastest one. The plan is converted into executable
steps and passed on to the Dispatcher.
4. The Dispatcher controls the sequence in which the steps are executed and passes the
steps on to the BYNET for execution by the AMPs.
5. After the AMPs process the steps, the PE receives their responses over the BYNET.
6. The Dispatcher builds a response message and sends the message back to the user.
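The PE's request handling can be sketched as a small Python class. This is a toy model, not Teradata code. The credential store, the set of known objects standing in for the Data Dictionary, the plan names, and their costs are all hypothetical. Sessions are keyed by user name for simplicity, and the access-rights check is omitted. Only the 120-session limit comes from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

MAX_SESSIONS_PER_PE = 120  # per-PE session limit stated in the text

@dataclass
class Plan:
    name: str
    estimated_cost: float  # hypothetical relative cost
    steps: List[str]       # AMP steps, in execution order

@dataclass
class ParsingEngine:
    passwords: Dict[str, str]  # hypothetical credential store
    known_objects: Set[str]    # stand-in for the Data Dictionary
    sessions: Set[str] = field(default_factory=set)

    def session_control(self, user: str, password: str) -> None:
        """Step 1: authorize the logon and enforce the per-PE session limit."""
        if self.passwords.get(user) != password:
            raise PermissionError(f"logon rejected for {user}")
        if len(self.sessions) >= MAX_SESSIONS_PER_PE:
            raise RuntimeError("this PE already has its maximum number of sessions")
        self.sessions.add(user)

    def parse(self, user: str, objects: List[str]) -> None:
        """Step 2: require a valid session and check that every object exists."""
        if user not in self.sessions:
            raise PermissionError(f"{user} has no session on this PE")
        missing = [o for o in objects if o not in self.known_objects]
        if missing:
            raise LookupError(f"unknown objects: {missing}")

    def optimize(self, alternatives: List[Plan]) -> Plan:
        """Step 3: evaluate alternative plans and keep the least expensive one."""
        return min(alternatives, key=lambda plan: plan.estimated_cost)

    def dispatch(self, plan: Plan, amps: List[Callable[[str], str]]) -> List[str]:
        """Steps 4 to 6: send each step to the AMPs in order and gather responses."""
        responses: List[str] = []
        for step in plan.steps:
            responses.extend(amp(step) for amp in amps)
        return responses

pe = ParsingEngine(passwords={"user1": "pw1"}, known_objects={"employee"})
pe.session_control("user1", "pw1")
pe.parse("user1", ["employee"])
best = pe.optimize([
    Plan("full-table scan", 10.0, ["scan employee"]),
    Plan("index access", 2.5, ["read index", "read rows"]),
])
amps = [lambda step, i=i: f"AMP{i}: {step}" for i in range(2)]
answer = pe.dispatch(best, amps)
print(best.name, answer)
```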
PE Session Control
When you log on to the Teradata RDBMS through your application, the session control software
on the PE establishes that session. Session control also manages and terminates sessions on
that PE.
PE Parser
The Parser is a component of the PE and performs the following tasks:
Interprets an incoming Teradata SQL request and checks the syntax, evaluating the
request semantically.
Consults the Data Dictionary to ensure that all the objects exist and that the user has
authority to access them.
Decomposes the request into manageable pieces of work called AMP steps.
Sends the optimized steps to the Dispatcher.
PE Optimizer
The Optimizer component of the PE determines how a request is executed by the system. The
Optimizer is parallel aware, meaning that it accounts for the parallel environment in which the
AMP steps are processed. The Optimizer develops the least expensive plan (in terms of time
and system resources) for returning the requested response. Processing alternatives are
evaluated, and the fastest alternative is chosen. The selected alternative is converted to
executable steps that will be performed by the AMPs.
PE Dispatcher
The Dispatcher is responsible for a number of tasks, depending on the operation it is
performing:
Processing Requests: Controls the sequence in which the steps are executed and
passes the steps to the AMPs through the BYNET.
Processing Responses: After the AMPs process the steps, the Dispatcher builds a
response message and sends the response back to the user.
In Teradata RDBMS V2R5, the Parser and Dispatcher components are combined into the same
module for better performance, but their functionality remains the same.
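The Dispatcher's two roles, sequencing steps and assembling the response, can be modeled in a few lines. In this sketch a thread pool stands in for the BYNET and the AMPs. Each "step" is a hypothetical Python function applied to one AMP's rows, and the dictionary returned by `dispatch` is an invented response format.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical: an AMP step is a function applied to one AMP's rows.
Step = Callable[[list[int]], list[int]]


def dispatch(steps: list[Step], amp_rows: list[list[int]]) -> dict:
    """Run steps in sequence; each step runs on every AMP in parallel."""
    # A thread pool stands in for the BYNET plus the AMPs.
    with ThreadPoolExecutor(max_workers=len(amp_rows)) as amps:
        for step in steps:
            # Sequence control: list(...) waits until every AMP has
            # finished this step, so the next step never starts early.
            amp_rows = list(amps.map(step, amp_rows))
    # Response processing: merge the AMP responses into one message.
    rows = sorted(row for rows_on_amp in amp_rows for row in rows_on_amp)
    return {"status": "ok", "row_count": len(rows), "rows": rows}


keep_large = lambda rows: [r for r in rows if r > 15]
double = lambda rows: [r * 2 for r in rows]
print(dispatch([keep_large, double], [[5, 20], [16, 3], [40]]))
```

Waiting for all AMPs before moving on is the simplest way to honor the step sequence; it also means the slowest AMP paces every step.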
Teradata Software: AMP
The AMP is a vproc that controls its portion of the data on the system. The AMPs work in
parallel, each AMP managing the data rows stored on its vdisk. AMPs are involved in data
distribution and data access in different ways.
AMPs (Access Module Processors) are virtual processors (vprocs) that receive steps from
PEs (Parsing Engines) and perform database functions to retrieve or update data. Each AMP is
associated with one virtual disk (vdisk), where the data is stored. An AMP manages only its own
vdisk, not the vdisk of any other AMP.
Data Distribution
When data is loaded, inserted, or updated, the AMP receives incoming data from the PE,
formats the rows, and distributes them on its vdisk.
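This section does not say how a given row ends up on one particular AMP. Teradata makes that choice by hashing the row's primary index value. The sketch below models the idea with MD5 and a modulo as a stand-in hash; Teradata's actual row-hash function and hash map differ. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict


def amp_for(primary_index_value, n_amps: int) -> int:
    # Stand-in hash (MD5 + modulo); Teradata's real row hash and hash map differ.
    digest = hashlib.md5(str(primary_index_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_amps


def distribute(rows: list[dict], pi_column: str, n_amps: int) -> dict[int, list[dict]]:
    # Each row lands on the vdisk of the AMP selected by its primary index hash.
    vdisks: dict[int, list[dict]] = defaultdict(list)
    for row in rows:
        vdisks[amp_for(row[pi_column], n_amps)].append(row)
    return dict(vdisks)


vdisks = distribute([{"id": i} for i in range(100)], "id", n_amps=4)
print({amp: len(rows) for amp, rows in sorted(vdisks.items())})
```

Because the hash is deterministic, every row with the same primary index value goes to the same AMP. That property is what lets an AMP manage only its own vdisk.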
Data Access
When data is accessed, the AMP retrieves the rows requested by the PE in the following
manner:
1. The database management subsystem receives the steps from the Dispatcher over the
BYNET.
2. The database management subsystem processes the steps. The subsystem on the AMP
can:
o Lock databases and tables
o Create, modify, or delete definitions of tables
o Join tables
o Insert, delete, or modify rows within tables
o Sort, aggregate, or format data
o Retrieve information from definitions and rows from tables
3. The database management subsystem returns responses over the BYNET to the
Dispatcher.
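The three steps above can be sketched for a question like "how many widgets had a profit margin above 15%, by region." In this hypothetical sketch, threads stand in for AMPs. Each one filters and counts only the rows on its own vdisk, and the partial counts are then combined. The row layout and the local-then-combined aggregation are simplifying assumptions, not a description of Teradata internals.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def amp_step(rows_on_vdisk: list[dict]) -> Counter:
    # Runs on one AMP against its own vdisk only: filter, then aggregate locally.
    return Counter(r["region"] for r in rows_on_vdisk if r["margin"] > 0.15)


def run_query(vdisks: list[list[dict]]) -> dict[str, int]:
    # Every AMP processes its own rows in parallel...
    with ThreadPoolExecutor(max_workers=len(vdisks)) as pool:
        partials = list(pool.map(amp_step, vdisks))
    # ...and the partial responses are combined into the final answer.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


vdisks = [
    [{"region": "east", "margin": 0.20}, {"region": "west", "margin": 0.30}],
    [{"region": "east", "margin": 0.10}, {"region": "east", "margin": 0.18}],
    [{"region": "west", "margin": 0.05}],
]
print(run_query(vdisks))
```

Only the small per-AMP counts travel back, not the detail rows. That is the main payoff of letting each AMP aggregate its own data first.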
Review Questions
Node Failure causes vprocs to migrate to other nodes.
BYNET Hardware carries the communication between nodes in a system.
Clique is a group of nodes with access to the same disk arrays.
A clique allows processing to continue if an MPP Node has a failed BYNET.
A copy of BYNET Hardware is installed on each node in the system.
Client Access
Aim
Explain the relationship between the Teradata RDBMS and its client applications.
Channel-Attached Client
Teradata Director Program (TDP) software to manage session traffic, installed on the
channel-attached client.
Call-Level Interface (CLI), a library of routines that are the lowest-level interface to
Teradata.
Communication from applications on the mainframe (represented by colored balls) goes through
the Channel Driver software, installed on a node.
Network-Attached Client
Client Access
Processing Environments
An RDBMS is used in two main processing environments:
Decision Support
Transaction Processing
Decision Support
In a decision support environment, users submit requests to analyze historical detail
data stored in the tables. The results are used to establish strategies, reveal trends,
and make projections. A database used as a decision support system (DSS) usually
receives fewer, but very complex, ad-hoc queries that may involve numerous tables.
Transaction Processing
Client Connections
Users can access data in the Teradata RDBMS through an application on either channel-attached or network-attached clients. Additionally, the node itself can act as a client.
Teradata client software is installed on each client (channel-attached, network-attached, or node) and communicates with RDBMS software on the node. You may
occasionally hear either type of client referred to by the legacy term "host," though
this term is not typically used in documentation or product literature.
Channel-Attached Client
The Teradata RDBMS supports network-attached clients connected to the node over a
LAN. The following software components installed on the network-attached client are
responsible for communication between client applications and the Teradata Gateway
on a Teradata node:
ODBC
CLIv2
Communication with the Teradata System
Communication from applications on the network-attached client goes over the LAN,
to the Ethernet card on the node, to the Teradata Gateway software.
Request Processing
How many widgets had a profit margin of more than 15% in the eastern region last month?
A request like the one above is processed a little differently, depending on whether
the user is accessing Teradata through a channel-attached or network-attached client:
SQL request is sent from the client to the appropriate component on the node:
Channel-attached client: request is sent to Channel Driver (through the TDP).
Network-attached client: request is sent to Teradata Gateway (through CLIv2 or
ODBC).
Request is passed to the PE(s).
PEs parse the request into AMP steps.
PE Dispatcher sends steps to the AMPs over the BYNET.
AMPs perform operations on data on the vdisks.
Response is sent back to PEs over the BYNET.
PE Dispatcher receives response.
Response is returned to the client (channel-attached or network-attached).
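The two paths listed above differ only at the entry point, which a small lookup table makes explicit. `request_path` and the constant names are hypothetical helpers. The component names come from the steps above, and the return trip follows those steps as written.

```python
# Entry components differ by client type (names taken from the steps above).
ENTRY_POINTS = {
    "channel-attached": ["TDP", "Channel Driver"],
    "network-attached": ["CLIv2 or ODBC", "Teradata Gateway"],
}
# Shared path once the request reaches the node, then the return trip.
SHARED_PATH = ["PE (parse into AMP steps)", "Dispatcher", "BYNET", "AMPs/vdisks"]
RETURN_PATH = ["BYNET", "Dispatcher", "client"]


def request_path(client_type: str) -> list[str]:
    """Return the components a request visits, in order (illustrative only)."""
    if client_type not in ENTRY_POINTS:
        raise ValueError(f"unknown client type: {client_type!r}")
    return ENTRY_POINTS[client_type] + SHARED_PATH + RETURN_PATH


print(" -> ".join(request_path("network-attached")))
```

Keeping the shared portion in one list reflects the point of this section: once a request reaches the PE, channel-attached and network-attached requests are handled identically.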
Teradata provides a number of tools that are front-end interfaces for submitting SQL
queries. Two mentioned in this section are BTEQ and Queryman.
BTEQ
BTEQ (Basic Teradata Query), often pronounced "BEE-teek," is a Teradata tool used
for submitting SQL queries on all platforms. BTEQ provides the following functionality:
Standard report writing and formatting
Basic import and export of small amounts of data to and from the Teradata RDBMS
across all platforms. For tables of more than a few thousand rows, the Teradata load
utilities are recommended for more efficiency.
Queryman
Queryman is an information discovery/query tool that runs on Microsoft Windows.
Queryman enables you to access ODBC-based databases (including Teradata). Some
of its features include:
Ability to save data in PC-based formats, such as Microsoft Excel, Microsoft Access,
and text files.
History of submitted SQL syntax, to help you build scripts for data mining and
knowledge discovery.
Help with SQL syntax.
Import and export of small amounts of data to and from ODBC-compliant databases.
For tables of more than a few thousand rows, the Teradata load utilities are
recommended for more efficiency.
FastExport
Teradata Warehouse Builder
By default, you can run up to 15 instances of FastLoad, MultiLoad, and FastExport in any
combination. There is no limit to the number of concurrent TPump jobs.
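The default concurrency rule can be expressed as a simple admission check. This sketch hard-codes the default of 15 from the text. `can_start`, the constant names and the job-name strings are hypothetical. The sketch models only the counting rule, not how Teradata actually enforces it.

```python
# Default from the text: at most 15 FastLoad/MultiLoad/FastExport jobs combined.
MAX_BULK_JOBS = 15
LIMITED_UTILITIES = {"FastLoad", "MultiLoad", "FastExport"}


def can_start(new_job: str, running: list[str]) -> bool:
    """Hypothetical admission check for a new utility job."""
    # TPump (and any utility outside the limited set) is not counted.
    if new_job not in LIMITED_UTILITIES:
        return True
    active = sum(1 for job in running if job in LIMITED_UTILITIES)
    return active < MAX_BULK_JOBS


running = ["FastLoad"] * 10 + ["MultiLoad"] * 5 + ["TPump"] * 30
print(can_start("FastExport", running), can_start("TPump", running))
```

The 30 running TPump jobs do not count toward the limit, so the check rejects a 16th bulk job while still admitting another TPump job.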
FastLoad
Use the FastLoad utility to:
Load data into empty tables
Delete all rows from a populated table
FastLoad loads data into an empty table in parallel, using multiple sessions to transfer
blocks of data. FastLoad achieves high performance by fully exploiting the resources
of the system. After the data load is complete, the table can be made available to
users.
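How FastLoad transfers blocks of data over multiple sessions can be sketched by grouping input rows into blocks and dealing the blocks out to sessions round-robin. `plan_fastload`, its parameters and the round-robin policy are assumptions made for illustration. The empty-table precondition comes from the text.

```python
def plan_fastload(rows: list, target_row_count: int,
                  block_size: int, sessions: int) -> list[list[list]]:
    """Hypothetical plan: group rows into blocks, deal blocks to sessions round-robin."""
    # FastLoad loads data only into an empty table.
    if target_row_count != 0:
        raise ValueError("FastLoad target table must be empty")
    if block_size < 1 or sessions < 1:
        raise ValueError("block_size and sessions must be positive")
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
    per_session: list[list[list]] = [[] for _ in range(sessions)]
    for n, block in enumerate(blocks):
        per_session[n % sessions].append(block)
    return per_session


print(plan_fastload(list(range(10)), target_row_count=0, block_size=3, sessions=2))
```

Sending whole blocks rather than single rows, and spreading those blocks over several sessions, is what lets the transfer use the system's resources in parallel.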
MultiLoad
Use the MultiLoad utility to maintain tables by:
Inserting rows into a populated or empty table
Updating rows in a table
Deleting multiple rows from a table
MultiLoad can load multiple input files concurrently and work on up to five tables at a
time, using multiple sessions. MultiLoad is optimized to apply multiple rows in block-level operations. MultiLoad is usually run during a batch window and places a lock on
the destination table(s) to prevent user queries from getting inconsistent results
before the data load or update is complete.
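The five-table limit and the lock on destination tables can be modeled with a small registry. `LockManager` and its methods are hypothetical. They illustrate the rules stated above (at most five tables per job, and no user queries against a table until its load finishes), not Teradata's lock manager.

```python
MAX_MULTILOAD_TABLES = 5  # from the text: up to five tables per job


class LockManager:
    """Hypothetical table-lock registry used to keep readers out during a load."""

    def __init__(self) -> None:
        self.locked: set[str] = set()

    def start_multiload(self, tables: list[str]) -> None:
        if len(set(tables)) > MAX_MULTILOAD_TABLES:
            raise ValueError("a MultiLoad job can work on at most five tables")
        busy = self.locked.intersection(tables)
        if busy:
            raise RuntimeError(f"tables already locked: {sorted(busy)}")
        self.locked.update(tables)  # lock every destination table

    def finish_multiload(self, tables: list[str]) -> None:
        self.locked.difference_update(tables)  # release the locks

    def can_query(self, table: str) -> bool:
        # User queries wait until the load releases its lock.
        return table not in self.locked
```

For example, after `start_multiload(["sales", "inventory"])`, `can_query("sales")` returns `False` until `finish_multiload` releases the lock, so no query sees a half-applied update.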
TPump
FastExport
Use the FastExport utility to transfer either of the following to a file on a client
platform:
Table
View
FastExport is a data extract utility. It transfers large amounts of data using block
transfers over multiple sessions to a file on the network-attached or channel-attached
client. Typically, FastExport is run during a batch window, and the tables being
exported are locked.
Administrative Utilities
Administrative utilities use a graphical user interface (GUI) to monitor and manage
various aspects of a Teradata system.
Teradata Manager
Teradata Manager is a production and performance monitoring system that helps a
DBA or system manager to monitor, control, and administer one or more Teradata
systems through a GUI. Running on LAN-attached clients, Teradata Manager has a
variety of tools and applications to gather, manipulate, and analyze information about
each Teradata RDBMS being administered.
In Teradata RDBMS V2R5.0, the name of DBQM has been changed to Teradata
Dynamic Query Manager.
Archival Utilities
Teradata has utilities specifically designed for data archive and recovery purposes.
There are different utilities for channel-attached clients and network-attached clients.
Archiving on Channel-Attached Clients
The Archive/Recovery (ARC) utility backs up data in a channel-attached (mainframe)
client environment. It supports commands written in Job Control Language (JCL). It is
scalable and parallel, and can run on a channel-attached client or a node.