
Introduction to Teradata

Teradata Architecture

LEVEL – LEARNER
Module 1: Teradata basics

Objectives:
After completing this chapter you will be able to answer the questions below:
• What is Teradata?
• What are the unique features of Teradata?
• What are the Teradata components and their functions?
• What is the Teradata architecture?
Introduction to Teradata Database

• Teradata is a relational database management system (RDBMS) that drives a company's data warehouse
• It is compatible with industry standards (ANSI compliant)
• The architecture supports both single-node Symmetric Multiprocessing (SMP) systems and multinode Massively Parallel Processing (MPP) systems
• It is built on a parallel architecture and uses parallelism to manage terabytes of data
• Its scalability ranges from 10 GB to 100+ TB of data
• Teradata has run on UNIX MP-RAS and Windows 2000 Server platforms; the nodes described later in this module run Linux
• It supports many concurrent users from various client platforms, connected over TCP/IP or an IBM channel connection
Unique Features of Teradata

• Parallel processing
– Each AMP holds a portion of the data, and the AMPs process their portions in parallel
• Linear scalability
– Double the AMPs and you double the speed
• Mature optimizer
– The Parsing Engine (PE) contains a mature, cost-based optimizer
• Automatic data distribution
– Each table has a Primary Index whose value is hashed, and the hash distributes the rows to the AMPs automatically
• Shared-nothing architecture
– Each AMP has its own memory, CPU and disk, hence the name "shared nothing"
• Single data store
– Teradata's scalability allows all data to reside on one system, which is the single data store
Teradata – Parallel Processing

• The rows of a Teradata table are spread across the AMPs, so each AMP can
then process in parallel when a USER queries the table.

[Figure labels: Parsing Engine (PE), BYNET]
Teradata – Linear Scalability

Teradata systems can add AMPs for linear scalability.

Linear scalability means that if you double your AMPs and their supporting nodes, the performance doubles.
Teradata Architecture

Teradata Components
• Parsing engine (PE)
• BYNET (BanYan NETwork)
• AMP
• Disk
What is a Node?

• Gateway and channel-driver software run as processes.
• Users connecting via the mainframe access Teradata through the channel, and all other users use the LAN gateway.
• The Parallel Database Extensions (PDE) software controls the Access Module Processors (AMPs) and Parsing Engines (PEs), which are referred to as Virtual Processors (vprocs) and reside in the node's memory.
• The operating system running the node is Linux.
Node

Each node is attached via a network to a disk farm.

• Each Teradata AMP is assigned a virtual disk (Vdisk) on which it stores its portion of each table's rows.
• Only the AMP assigned to a virtual disk can read or write to that disk.
• A node typically holds 40-50 AMPs.
Number of Nodes and AMPs

Query to identify the nodes in a Teradata server (returns one row per node):

SELECT NodeID FROM dbc.ResUsageSPma
GROUP BY 1;

Query to identify the AMPs in a Teradata server (returns one row per AMP):

SELECT Vproc FROM dbc.diskspace
GROUP BY 1;
SMP Node

• SMP stands for symmetric multiprocessing, which means that each CPU performs equally, and all CPUs share a pool of memory and operate under one operating system.
MPP

• Two or more SMP nodes connected via the BYNET form one Massively Parallel Processing (MPP) system.
Teradata Functional Overview

Picture depicts LAN Connections for Network Attached Client


Teradata Functional Overview

Picture depicts Mainframe connection to Teradata


Parsing Engine

• When a user logs into Teradata, a PE logs them in and is responsible for their entire session.
• The PE checks the SQL syntax.
• The PE checks security (the user's access rights) and builds the plan, which EXPLAIN displays, for the AMPs to follow. For this reason the PE is also known as the 'optimizer'.
• The PE converts EBCDIC (from mainframe queries) to ASCII on the way in, and the AMPs are responsible for converting from ASCII to EBCDIC on the way out.
• The PE always delivers the final answer set to the user.

The Parsing Engine's biggest responsibility is building a parallel-aware, cost-based plan for the AMPs to follow to retrieve the data.
Parsing Engine Components

Element            Process
Session Control    • Manages session activities, such as logon, password validation, and logoff.
                   • Recovers sessions following client or server failures.
Parser             • Decomposes SQL into relational data management processing steps.
Optimizer          • Determines the most efficient path to access data.
Dispatcher         • Receives processing steps from the parser and sends them to the appropriate AMPs via the BYNET.
                   • Monitors the completion of steps and handles errors encountered during processing.
How does the PE build the best plan?

The PE uses COLLECTED STATISTICS to build the best (least-cost) plan.

Collected statistics determine how confident the PE can be when it estimates how many rows a query will access. They record, for example, how many unique values and how many NULL values a column has, and they are stored in the data dictionary. When you submit a query, the PE checks whether statistics are available for the requested table. If statistics were collected earlier, the PE generates a plan with "high confidence"; in the absence of collected statistics, the plan is built with "low confidence".
BYNET

• The BYNET connects the PEs and AMPs and carries instructions and the corresponding results between them.
• A Teradata system has two BYNETs, 'BYNET 0' and 'BYNET 1'. If one BYNET fails, the other carries the traffic. Having two also speeds up communication and hence improves query performance.
• Symmetric Multiprocessing (SMP) node: has a boardless BYNET and no physical BYNET.
• Massively Parallel Processing (MPP) system: nodes are connected by the two physical BYNET boards.
• The BYNET is responsible for broadcast, multicast and point-to-point communication between nodes and virtual processors.
AMP

• AMPs are responsible for storing and retrieving rows from their assigned disk (Vdisk).
• AMPs lock the tables and rows.
• AMPs sort rows and do all aggregation.
• AMPs handle all space management and space accounting.
• AMPs convert ASCII to EBCDIC when returning answer sets to the mainframe.
• In Teradata 13, the number of AMP Worker Tasks (AWTs) per AMP was increased for better performance.
All Teradata tables are spread across all AMPs.
Disk Array
• Each AMP Vproc is assigned to a disk
• A Vdisk may contain 119 GB of its disk space
Teradata Components

• The maximum number of vprocs per node can be as high as 128
• Each Parsing Engine (PE) can manage up to 120 individual sessions
• Each node holds up to 40-50 AMPs
• The maximum number of vprocs that can be supported in a single system is 16,384
• Each BYNET supports up to 1024 nodes in a system
Test Your Understanding

Questions:

1. What is the Parsing Engine?
2. What does AMP stand for?
3. What function does the BYNET perform?
4. How many BYNETs are there in a Teradata system? Explain their functions.
5. What is TDP?

Summary

This chapter gives an overview of the following processing steps in Teradata:

• The PE checks the syntax of the query and the security rights of the user submitting it.
• The PE comes up with the best optimized plan for executing the query.
• The PE passes this plan through the BYNET to the AMPs.
• The AMPs follow the plan to retrieve data from their disks.
• The AMPs pass the data to the PE through the BYNET.
• The PE then passes the data to the user.

Module 2: RDBMS Overview

Objectives:
After completing this chapter you will be able to answer the following questions:
• What is an RDBMS?
• What is logical/relational modeling?
• What is the relationship between primary and foreign keys?
• What are the advantages of relational modeling?
Introduction to RDBMS

A database is a collection of permanently stored data that is:

• Logically related – data relates to other data
• Shared – many users may access the data
• Protected – access to the data is controlled
• Managed – the data has integrity and value
• Based on the relational model
Logical/Relational Model

• The logical model
– Should be designed without regard to usage
– Can accommodate a wide variety of front-end tools
– Allows the database to be created more quickly
– Should be the same regardless of data volume
– Represents the real-world business in tabular (relational) form
– Includes all the data definitions within the scope of the enterprise or application
– Is generic: the logical model is the template for physical implementation on any RDBMS platform
• Teradata supports fully normalized logical models
– Ability to perform 64-table joins
– Ability to perform large aggregations
Logical/Relational Model

• A relational database contains a set of logically related tables
• A table is a two-dimensional representation of data consisting of rows and columns
• A column always contains like data
• A row is one instance of all the columns in a table
• In a relational database, a table is defined as a named collection of one or more named columns that can have zero or many rows of related information
• Each row represents an occurrence of the entity defined by the table. An entity is a person, place, thing or event about which the table contains information.
• In relational math, the following hold true:
– Table = a relation
– Row = a tuple
– Column = an attribute
Primary and Foreign keys

Primary Key rules:
• A Primary Key (PK) is required for every table.
• Only one Primary Key is allowed in a table.
• A Primary Key may consist of one or more columns.
• Primary Keys cannot have duplicate values (ND: no duplicates).
• Primary Keys cannot be NULL (NN: no nulls).
• Primary Keys are considered non-changing values (NC: no changes).
Foreign Key rules:
• Foreign Keys (FKs) are optional.
• More than one Foreign Key is allowed in a table.
• A Foreign Key may consist of one or more columns.
• Foreign Keys can have duplicate values.
• Foreign Keys can be NULL.
• Changes to Foreign Keys are allowed.
• Each FK value must exist somewhere as a Primary Key value (referential integrity).
Relational Advantage
Advantages of relational database:
Ease of use: Representing information as tables consisting of rows and columns makes it much easier to understand.

Flexibility: Information from different tables can be linked and extracted with operators such as project and join, and delivered in whatever form is desired.

Security: Security control and authorization are easier to implement, because sensitive attributes in a table can be moved into a separate relation with its own authorization controls. If authorization requirements permit, such an attribute can be joined back with the others to retrieve the full information.

Data independence: Data independence is achieved more easily with the normalized structure used in a relational database than with the more complicated tree or network structures.

Data manipulation language: Queries can be answered with a language based on relational algebra and relational calculus, such as SQL. For data organized in other structures, the query language becomes either complex or extremely limited in its capabilities.

Catering for future requirements: Because data is held in separate tables, it is simple to add records that are not yet needed but may be in the future. For example, a city table could be expanded to include every city and town in the country, even though no other records use them yet. A flat-file database cannot do this.
Module 3: Teradata Index

Objectives:
After completing this chapter you will be able to answer the questions below:
• What is a Primary Index?
• What is a Secondary Index?
• How are data rows stored and retrieved?
Indexing

An index is a physical mechanism used to store and access data.

Index
• Primary Index
– Unique Primary Index (UPI)
– Non-Unique Primary Index (NUPI)
• Secondary Index
– Unique Secondary Index (USI)
– Non-Unique Secondary Index (NUSI)
Primary Keys vs. Primary Indexes

Indexes are conceptually different from keys.

• A Primary Key (PK) is a relational modeling convention that allows each row to be uniquely identified.
• A Primary Index (PI) is a Teradata convention that determines how rows are stored and accessed.
Primary Index

• The Primary Index is defined when the table is created.
• The Primary Index cannot be changed. Changing the PI requires dropping and recreating the table.
• It is the mechanism that assigns a row to an AMP.

When the Primary Index is not specified, Teradata defaults to the first column in the table and defines it as non-unique.
Unique Primary Index (UPI)

• If the column chosen as the index is unique, it is a UPI.
• A UPI results in an even distribution of the table's rows across all AMPs.
Unique Primary Index (UPI)

• When the Primary Index column is used in the SQL WHERE clause, only one AMP is needed to retrieve the row.
• A UPI access is a one-AMP operation and returns at most one row.
Non-Unique Primary Index (NUPI)

• If the column chosen as the index is not unique, it is a NUPI.
• A NUPI distributes the table's rows in proportion to the degree of uniqueness of the index.

• A NUPI groups duplicate values together on the same AMP, so the distribution is never perfectly even (it is skewed). A small skew like the one in the slide's example is reasonable.
Non-Unique Primary Index (NUPI)

• When the Primary Index column is used in the SQL WHERE clause, only one AMP is needed to retrieve the rows.
• A NUPI access is a one-AMP operation and can return multiple rows.
Multi-Column Primary Index

A table can have only one Primary Index, but you can combine up to 64 columns to form one multi-column Primary Index.
Multi-Column Primary Index

• When all of the Primary Index columns are used in the SQL WHERE clause, only one AMP is needed to retrieve the rows.
NO Primary Index

• A table that explicitly states NO PRIMARY INDEX receives no primary index. Its data is distributed evenly but randomly, and it is often used as a staging table.
NO Primary Index

To retrieve a record, Teradata performs a full table scan because there is no primary index.
NO Primary Index

• A NoPI table is generally preferred when records need to be loaded temporarily into a staging table.
• Data can be loaded quickly from the source into the staging table. From the staging table the data can be moved to the production table using an INSERT/SELECT statement.
How Teradata distributes and retrieves data
• The Teradata Parsing Engine takes the Primary Index value of a row and runs a mathematical calculation called the Hash Formula on it.
• The formula produces a 32-bit Row Hash, which equates to an integer.
• The Row Hash points to a bucket in the Hash Map, and that bucket assigns the row to an AMP.
32-bit Row Hash 00000000 00000000 00000000 00001101 = 13

• Every Teradata system has one Hash Map with about a million buckets. Inside the buckets are AMP numbers.
Placing rows on AMP

• In the example, Emp_No 1001 (the Primary Index value) was hashed and the output was a Row Hash of 13. Teradata counted over to bucket 13 in the Hash Map, which holds the number one (1). This means that the row goes to AMP 1.
• Emp_No 1002 (the Primary Index value) was hashed and the output was a Row Hash of 5. Teradata counted over to bucket 5 in the Hash Map, which holds the number two (2). This means that the row goes to AMP 2.
• There is one Hashing Formula in Teradata, and it is consistent: the same value always produces the same Row Hash.

[Figure: rows for Emp_No 1001 and Emp_No 1002 placed on their AMPs]

Review of Hashing process

• Hash the Primary Index Value for a row with the Hash Formula.
• The output of the Hash Formula is a 32-bit Row Hash.
• Take the Row Hash and find its corresponding bucket in the Hash Map.
• Send the row and its Row Hash to the AMP listed in the Hash Map Bucket.
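
The hashing steps above can be sketched in Python. This is a toy model only: Teradata's Hash Formula is proprietary, so `zlib.crc32` stands in for it, and the number of AMPs, the round-robin Hash Map and the use of the high-order bits as the bucket number are assumptions chosen for illustration.

```python
import zlib

NUM_BUCKETS = 2 ** 20   # about a million buckets, as on the slide
NUM_AMPS = 4            # assumed number of AMPs

# Assumed Hash Map: bucket i is owned by AMP (i mod NUM_AMPS).
HASH_MAP = [bucket % NUM_AMPS for bucket in range(NUM_BUCKETS)]

def row_hash(pi_value):
    """Stand-in for the Hash Formula: returns a 32-bit integer."""
    return zlib.crc32(str(pi_value).encode("utf-8")) & 0xFFFFFFFF

def bucket_for(rh):
    """Assumption: the high-order 20 bits of the Row Hash pick the bucket."""
    return rh >> 12

def amp_for(pi_value):
    """Hash the PI value, find its bucket, and read the AMP from the Hash Map."""
    return HASH_MAP[bucket_for(row_hash(pi_value))]

for emp_no in (1001, 1002, 1003, 1004):
    print(emp_no, "-> AMP", amp_for(emp_no))
```

Because the function is deterministic, the same Primary Index value always lands on the same AMP, which is what makes single-AMP retrieval possible later.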
Skew Factor

• Skew refers to how unevenly rows are distributed across the AMPs. If the data is highly skewed, some AMPs hold many more rows than others, i.e. the data is not evenly distributed. This in turn results in poor performance. Indexes should be chosen with great care to avoid skew.

• NULL values in the Primary Index are a common cause of skew. A table with a Unique Primary Index can have only one NULL value, but a NUPI table can have many NULL values, and every NULL value hashes to the same AMP.
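
Skew can be expressed as a number. A minimal sketch, assuming the commonly used formula skew = (1 - average rows per AMP / maximum rows on any AMP) × 100 and made-up row counts:

```python
def skew_factor(rows_per_amp):
    """Return skew as a percentage; 0 means a perfectly even distribution."""
    largest = max(rows_per_amp)
    if largest == 0:
        return 0.0
    average = sum(rows_per_amp) / len(rows_per_amp)
    return (1 - average / largest) * 100

even = [250, 250, 250, 250]      # hypothetical UPI table
skewed = [700, 100, 100, 100]    # hypothetical NUPI table with many NULLs on one AMP

print(skew_factor(even))               # 0.0
print(round(skew_factor(skewed), 1))   # 64.3
```

The busiest AMP determines how long an all-AMP step takes, so the second table would run far slower than the first even though both hold 1,000 rows.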
Uniqueness Value

• Each AMP places a Uniqueness Value after the Row Hash to tell rows with the same Row Hash apart.
• The Hash Formula is consistent, so every Smith has the same Row Hash, and the same goes for each Jones and each Patel. Therefore, duplicate values land on the same AMP.

• The Row-ID consists of the Row Hash of the Primary Index value followed by the Uniqueness Value.
Row ID

UNIQUE PRIMARY INDEX
• The Uniqueness Value on each Row-ID is 1.
• Each AMP sorts its rows by Row-ID.

NON-UNIQUE PRIMARY INDEX
• The Uniqueness Value increases on all duplicate names.
• Each AMP sorts its rows by Row-ID.

AMPs sort rows by Row-ID so that like data is grouped together and so that binary searches can be used.
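
How Row-IDs are built for duplicate values can be sketched as follows. The hash function is again a stand-in (`zlib.crc32`, an assumption), and `assign_row_ids` is a hypothetical helper, not a Teradata API.

```python
import zlib
from collections import defaultdict

def row_hash(value):
    """Stand-in for Teradata's Hash Formula (an assumption for this sketch)."""
    return zlib.crc32(str(value).encode("utf-8")) & 0xFFFFFFFF

def assign_row_ids(pi_values):
    """Return (row_hash, uniqueness_value, value) tuples sorted by Row-ID."""
    next_uniqueness = defaultdict(int)
    row_ids = []
    for value in pi_values:
        rh = row_hash(value)
        next_uniqueness[rh] += 1   # 1 for the first row, 2 for its first duplicate, ...
        row_ids.append((rh, next_uniqueness[rh], value))
    return sorted(row_ids)

for rh, uniq, name in assign_row_ids(["Smith", "Jones", "Smith", "Patel", "Smith"]):
    print(f"{name:6} Row-ID = ({rh}, {uniq})")
```

Every 'Smith' shares one Row Hash and differs only in the Uniqueness Value, so after sorting the three 'Smith' rows sit next to each other.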
Example

SEL * FROM Employee_table WHERE last_name = 'Smith';

Plan:
1. The PE sees that last_name is the Primary Index.
2. It hashes 'Smith' and gets the Row Hash.
3. Row Hash = 7.
4. It counts over to bucket 7 in the Hash Map, which says AMP 1.
5. It passes a message to AMP 1 through the BYNET to retrieve the rows with Row Hash 7.
6. AMP 1 brings back all columns of the rows with Row Hash 7 ('Smith').
Binary Search - Example

SEL * FROM order_table WHERE Order_Number = 50;

Plan:
1. The PE sees that Order_Number is the Primary Index.
2. It hashes 50 and gets the Row Hash.
3. Row Hash = 75.
4. It counts over to bucket 75 in the Hash Map, which says AMP 1.
5. It passes a message to AMP 1 through the BYNET to retrieve the row with Row Hash 75.
6. AMP 1 performs a binary search on its rows, which are sorted by Row-ID.
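
The last step of both plans, a binary search over rows sorted by Row-ID, might look like the sketch below. The rows on AMP 1 are hypothetical, and for brevity the key list is rebuilt on every call, which a real AMP would not need to do.

```python
from bisect import bisect_left

def find_rows(sorted_rows, target_hash):
    """Binary-search rows sorted by Row-ID and return the data for target_hash."""
    # (target_hash, 0) sorts before every real Row-ID with that hash,
    # because Uniqueness Values start at 1.
    keys = [(rh, uniq) for rh, uniq, _ in sorted_rows]
    start = bisect_left(keys, (target_hash, 0))
    matches = []
    for rh, uniq, data in sorted_rows[start:]:
        if rh != target_hash:
            break
        matches.append(data)
    return matches

# Hypothetical rows on AMP 1: (row_hash, uniqueness_value, row data)
amp1_rows = [
    (7, 1, "Smith, Mandee"),
    (7, 2, "Smith, John"),
    (22, 1, "Jones, Squiggy"),
    (75, 1, "Order 50"),
]

print(find_rows(amp1_rows, 7))    # ['Smith, Mandee', 'Smith, John']
print(find_rows(amp1_rows, 75))   # ['Order 50']
```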
Primary Index Example

• A Unique Primary Index spreads the data perfectly evenly.
• A Non-Unique Primary Index does NOT spread the data perfectly evenly.
Primary Index Example

• A multi-column Primary Index is often used to fix a data skew problem.
• With No Primary Index, all AMPs read all of their rows (full table scan) because there is no Primary Index.
Secondary Index

• A Secondary Index can be created and dropped dynamically.
• Syntax

• A Secondary Index requires a separate physical structure (the subtable), but a Primary Index does NOT require a separate physical structure.
• A Unique Secondary Index (USI) subtable contains two columns:
1. Emp_No (the USI column)
2. The Row-ID of the base table row (derived from its real Primary Index)
Primary Index Vs Secondary Index
How the Parsing Engine uses the USI Subtable

• Parsing Engine plan: it is a two-AMP operation

Emp_No is a USI.
The PE hashes 1004 to see which AMP holds the row in the subtable (AMP 3).
The PE has the BYNET contact AMP 3, which retrieves subtable row 1004 (single-AMP retrieve).
AMP 3 passes the Row-ID of the base table row (1,4) back up to the PE.
The PE uses the Row-ID to find the base table row with another single-AMP retrieve.

• A USI access is a two-AMP operation.
• The first AMP reads the subtable and the second reads the base table.
• Two binary searches are performed in total, and one row is returned.
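
The two-AMP USI lookup can be modelled with dictionaries. This is a simplified sketch: the hash is a stand-in (`zlib.crc32`), the Row-ID is reduced to a (value, counter) pair, the subtable stores the base AMP number directly instead of deriving it from the Row Hash, and the employee data is made up.

```python
import zlib

NUM_AMPS = 4  # assumed system size

def amp_for(value):
    """Stand-in for Hash Formula plus Hash Map lookup (an assumption)."""
    return zlib.crc32(str(value).encode("utf-8")) % NUM_AMPS

# Base table: the Primary Index is last_name; each AMP maps Row-ID -> row.
# USI subtable: hashed on Emp_No; each AMP maps Emp_No -> (base AMP, Row-ID).
base = [dict() for _ in range(NUM_AMPS)]
usi_subtable = [dict() for _ in range(NUM_AMPS)]

def insert(emp_no, last_name):
    base_amp = amp_for(last_name)
    row_id = (last_name, len(base[base_amp]) + 1)   # simplified Row-ID
    base[base_amp][row_id] = {"emp_no": emp_no, "last_name": last_name}
    usi_subtable[amp_for(emp_no)][emp_no] = (base_amp, row_id)

def select_by_usi(emp_no):
    """Return (row, amp_reads): one subtable read, then one base-table read."""
    sub_amp = amp_for(emp_no)                  # read 1: the USI subtable
    entry = usi_subtable[sub_amp].get(emp_no)
    if entry is None:
        return None, [sub_amp]
    base_amp, row_id = entry                   # read 2: the base table
    return base[base_amp][row_id], [sub_amp, base_amp]

for emp_no, name in [(1001, "Smith"), (1004, "Jones"), (1006, "Patel")]:
    insert(emp_no, name)

row, amp_reads = select_by_usi(1004)
print(row, "found with AMP reads:", amp_reads)
```

No other AMP is involved, which is why a USI lookup stays cheap however many AMPs the system has.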
Non Unique Secondary Index

• Syntax

• A Non-Unique Secondary Index (NUSI) subtable contains two columns:
1. First_Name (the NUSI column)
2. The Row-IDs of the matching base table rows (derived from the real Primary Index)
• The NUSI subtable rows get their own Row-ID, but they are not hashed to different AMPs; they stay AMP-local.
NUSI are AMP -Local

• Subtable rows reference only base rows on the same AMP, hence a NUSI is AMP-local.
• A NUSI query always searches all AMPs, but the intent is to avoid a full table scan. If there are 50 AMPs, then a minimum of 50 binary searches are done.
How the Parsing Engine uses the NUSI Subtable

• Parsing Engine plan: it is an all-AMP operation

• First_Name is a NUSI.
• The PE orders each AMP to check whether it has 'Kyle' in its NUSI subtable.
• Each AMP simultaneously performs a binary search on its NUSI subtable.
• Each AMP that has 'Kyle' retrieves the matching base rows.
• If there are 50 AMPs, all 50 perform a binary search simultaneously, and those that find 'Kyle' perform another binary search on the base table.

• A NUSI access is an all-AMP operation.
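
The all-AMP NUSI plan can be sketched as below. The three AMPs and their rows are hypothetical, and the loop runs the AMPs one after another, whereas a real system runs them in parallel.

```python
from bisect import bisect_left

# Hypothetical data: each AMP holds its own base rows and a local NUSI
# subtable on first_name, kept sorted so it can be binary-searched.
amps = [
    {"base": {1: "Kyle Stover", 2: "Mandee Chambers"},
     "nusi": [("Kyle", [1]), ("Mandee", [2])]},
    {"base": {1: "Herbert Harrison", 2: "Kyle Larkins"},
     "nusi": [("Herbert", [1]), ("Kyle", [2])]},
    {"base": {1: "Cletus Strickling"},
     "nusi": [("Cletus", [1])]},
]

def search_amp(amp, first_name):
    """One AMP's share: binary-search the local subtable, then fetch base rows."""
    keys = [name for name, _ in amp["nusi"]]
    pos = bisect_left(keys, first_name)
    if pos == len(keys) or keys[pos] != first_name:
        return []                      # this AMP has no matching rows
    return [amp["base"][row_id] for row_id in amp["nusi"][pos][1]]

def select_by_nusi(first_name):
    """All-AMP operation: every AMP searches its own subtable."""
    result = []
    for amp in amps:                   # sequential here, parallel on a real system
        result.extend(search_amp(amp, first_name))
    return result

print(select_by_nusi("Kyle"))          # ['Kyle Stover', 'Kyle Larkins']
```

The third AMP does one subtable search and stops, which is the sense in which a NUSI query touches all AMPs without scanning the whole table.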


Primary Index vs. Secondary Index

Index Feature                  UPI    NUPI   USI        NUSI
Required?                      Yes*   Yes*   No         No
Single-AMP Retrieve            Yes    Yes    No         No
Number of Binary Searches      1      1      2          Many
Number per Table               1      1      0-32       0-32
Max Columns                    64     64     64         64
Unique                         Y      N      Y          N
Affects Row Distribution       Y      Y      N          N
Created/Dropped Dynamically    N      N      Y          Y
Improves Access                Y      Y      Y          Y
Can be multiple data types     Y      Y      Y          Y
Separate physical structure    N      N      Sub-table  Sub-table
Extra Processing Overhead      N      N      Y          Y
May be ordered by value        N      N      N          Y
May be partitioned             Y      Y      N          N
* Teradata has a NoPI table now in V13.10
Full-Table Scans

• Teradata Database always uses a full-table scan to access the data of a table if a query:
– Accesses a NoPI table that has no index defined on it
– Does not specify a WHERE clause
– Does not use the index columns in its conditions
– Uses an index column in a non-equality test
– Specifies a range of values for the primary index
• A full-table scan is always an all-AMP operation and should be avoided when possible.
Summary

• An index is a physical mechanism used to store and access data.
• A PK is a relational modeling convention that allows each row to be uniquely identified.
• The Primary Index is defined when the table is created.
• A table can have only one Primary Index, but you can combine up to 64 columns to form one multi-column Primary Index.
• The Primary Index value of a row is hashed with the Hash Formula.
• The output of the Hash Formula is a 32-bit Row Hash.
• The Row-ID consists of the Row Hash of the Primary Index value followed by the Uniqueness Value.
• A Secondary Index can be created and dropped dynamically.
• A Non-Unique Secondary Index (NUSI) subtable contains two columns:
– First_Name (the NUSI column)
– The Row-IDs of the matching base table rows
• NUSIs are AMP-local.
Test Your Understanding

1. How are both tables sorted?
2. What was the Row-ID when Minal was hashed?
3. Looking in the subtable, what is the Row-ID of the base row for employee 1006?
4. When 1006 was placed in the subtable, which bucket in the hash map was chosen?
5. How many times is the Hash Map consulted on a query using a USI in the WHERE clause?
Module 4: Space

Objectives:
After completing this chapter, you will be able to answer the following
questions
• What are a Teradata database and a Teradata user?
• How is space allocated to Teradata objects?
• What is the hierarchy of objects in a Teradata system?
Space

There are three types of space in Teradata:

PERM space: houses permanent tables, Secondary Indexes, Join Indexes and Permanent Journals.
TEMP space: stores temporary tables.
Spool space: used by each AMP to build the answer set for the user.
A Teradata Database (Example)

A Teradata database is a logical repository for:
• Tables (require PERM space)
• Views (use no PERM space)
• Macros (use no PERM space)
When a system arrives, there is only one user, called DBC.
USER DBC
• System user DBC contains all Teradata Database software components and all system tables.

Syntax:
CREATE DATABASE new_db FROM existing_db
AS
PERMANENT = 20000000
,SPOOL = 50000000
,TEMP = 20000000;

'new_db' is owned by 'existing_db'.

A database is empty until objects are created within it.
A database with no PERM space can hold views and macros but not tables.
A Teradata User

A Teradata user is a database with an assigned password.

A Teradata user may also own tables, views, macros and triggers, but a user with no PERM space may not own tables.
A user may log on to Teradata and access objects within:
• Itself
• Other databases for which it has access rights

Syntax:
CREATE USER new_user FROM existing_user
AS
PERMANENT = 10000000
,PASSWORD = Acdmy
,SPOOL = 50000000
,TEMP = 20000000;

'new_user' is owned by 'existing_user'.

A user is empty until objects are created within it.
The Teradata Hierarchy

• Initially DBC owns 10 TB of PERM space. After DBC creates Spool_Reserve (4 TB), USER Retail (2 TB) and USER Financial (2 TB), DBC has only 2 TB of PERM space left.
• USER Retail and USER Financial can in turn create the databases and users they need.
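
The PERM space arithmetic in this hierarchy can be checked with a small model. The `Owner` class is a hypothetical illustration of how creating a child database or user deducts space from its owner; it is not how Teradata stores this information.

```python
class Owner:
    """A database or user that holds PERM space (sizes in TB for readability)."""

    def __init__(self, name, perm_tb):
        self.name = name
        self.perm_tb = perm_tb
        self.children = []

    def create(self, name, perm_tb):
        """Create a child database/user; its PERM space is deducted from the owner."""
        if perm_tb > self.perm_tb:
            raise ValueError(f"{self.name} has only {self.perm_tb} TB of PERM space")
        self.perm_tb -= perm_tb
        child = Owner(name, perm_tb)
        self.children.append(child)
        return child

dbc = Owner("DBC", 10)
dbc.create("Spool_Reserve", 4)
retail = dbc.create("Retail", 2)
financial = dbc.create("Financial", 2)

print(dbc.perm_tb)   # 2
```

Retail could now call `retail.create(...)` for its own child databases, but only up to the 2 TB it owns.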
Difference between PERM and Spool space

Assume user 'A' has 2 TB of PERM space and 10 GB of spool space, and has 1000 users under it.
• User 'A' can create and load up to 2 TB of table data in its PERM space.
• Each of the 1000 users under 'A' (say A1, A2, A3, ...) can run queries that use up to 10 GB of spool space, all at the same time.
Test Your Understanding

• What is the difference between a Teradata database and a Teradata user?
Module 5: Data Protection

Objectives
After completing this module you will be able to answer:
• How do locks prevent loss of data integrity?
• What are the types of locking provided by Teradata?
• What are FALLBACK tables?
Locks

There are four types of locks:

Exclusive lock: Placed only on a database or table, never on rows, when the object is going through a structural change. It prevents any other type of concurrent access to the database or table.
Write lock: Placed on an INSERT, DELETE, or UPDATE request. It blocks other Read, Write and Exclusive locks.
Read lock: Placed in response to a SELECT request. It blocks users who require Exclusive or Write locks. In a multi-user environment where updates occur and data must stay consistent, you want a Read lock.
Access lock (dirty read or stale read): Permits the user to read an object that may already be locked for Read or Write. An Access lock does not restrict access by other users except when an Exclusive lock is required. It is placed in response to a user-defined LOCKING FOR ACCESS phrase. A user requesting an Access lock must not depend on data consistency.
Locks

• Locks are applied at three levels:
1. Database: applies to all tables/views in the database
2. Table/View: applies to all rows in the table
3. Row Hash: applies to all rows with the same Row Hash
Rule:
Lock requests are queued behind all outstanding incompatible lock requests for the same object.
Row Hash lock syntax:
LOCKING ROW FOR ACCESS SELECT * FROM TABLE_A;
Compatibility between Read Locks

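Read locks are compatible with each other, but Write locks are not.

Assume four SQL statements arrive for Employee_Table: the first two are SELECTs, the third is an INSERT and the fourth is a SELECT. The first two SELECTs share Read locks, the INSERT's Write lock request waits behind them, and the fourth SELECT queues behind the INSERT.

Compatibility:
• A Read lock is compatible with other Read locks and Access locks
• A Write lock is compatible only with Access locks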
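
The compatibility rules and the queuing rule can be put into a small table-driven sketch. It replays the four-statement example above; lock release is not modelled, and `schedule` is a hypothetical helper rather than anything Teradata exposes.

```python
# Locks that may coexist with each requested lock type.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},
    "READ":      {"ACCESS", "READ"},
    "WRITE":     {"ACCESS"},
    "EXCLUSIVE": set(),
}

# Lock requested by each statement type, as described in the lock definitions.
REQUEST_LOCK = {"SELECT": "READ", "INSERT": "WRITE",
                "UPDATE": "WRITE", "DELETE": "WRITE"}

def schedule(statements):
    """Decide, in arrival order, whether each statement is granted or queued."""
    outstanding = []   # locks already held or queued on the object
    outcome = []
    for stmt in statements:
        lock = REQUEST_LOCK[stmt]
        # A request queues behind any outstanding incompatible request.
        conflict = any(prior not in COMPATIBLE[lock] for prior in outstanding)
        outcome.append((stmt, lock, "queued" if conflict else "granted"))
        outstanding.append(lock)
    return outcome

for step in schedule(["SELECT", "SELECT", "INSERT", "SELECT"]):
    print(step)
```

The fourth SELECT is queued even though Read locks are already held, because it must wait behind the incompatible Write request that arrived before it.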
Cliques

• A clique is a defined set of nodes with failover capability.
• A clique protects against a node failure.
• All nodes in a clique must be able to access all Vdisks for all AMPs in the clique.
• If a node fails, all of its AMPs migrate to the remaining nodes in the clique.
• When a node fails:
– Teradata resets
– On the restart, the AMPs of the failed node (Node 1 in the example) migrate
– The system is degraded but still able to function
– The down node is fixed
– Another reset is done and the AMPs return home
• Each node can support up to 128 vprocs
Cliques

• An example of a four-node clique

• Node 1 fails and its AMPs migrate to the other nodes


Fallback

• Fallback protects against an AMP failure.
• Fallback makes a duplicate copy of every row in a table and keeps the copy on a different AMP.
• If an AMP goes down, the system can still process the query because the rows on the failed AMP are also held by another AMP.
• When the AMP comes back online, the rows changed while it was offline are restored automatically.
• It is critical for high-availability applications.
Cost of Fallback:
• A Fallback table is twice as big and uses twice the space.
• Every insert, update and delete must be performed twice.
Table with Fallback and with no Fallback

CREATE TABLE Emp_Intl, FALLBACK
(Emp_No INTEGER
, Dept_No SMALLINT
, First_Name VARCHAR(12)
, Last_Name CHAR(20)
, Salary DECIMAL(10,2))
UNIQUE PRIMARY INDEX ( Emp_No );

CREATE TABLE Emp_Intl, NO FALLBACK
(Emp_No INTEGER
, Dept_No SMALLINT
, First_Name VARCHAR(12)
, Last_Name CHAR(20)
, Salary DECIMAL(10,2))
UNIQUE PRIMARY INDEX ( Emp_No );

Note: The default is NO FALLBACK.
Fallback Clusters

• A cluster is a group of AMPs that act as a single fallback unit.
• Fallback rows for the AMPs in a cluster reside within that cluster.
• Loss of one AMP in a cluster permits continued table access.
• Loss of two AMPs in the same cluster causes the RDBMS to halt.
[Figure: 2 clusters with 2 AMPs each]

System performance can be adversely affected when any AMP has a disproportionate burden.
Fallback Vs. Non-Fallback tables

Fallback tables
• One AMP down
– Data fully available
• Two or more AMPs down
– In different clusters: data fully available
– In the same cluster: system halts

Non-Fallback tables
• One AMP down
– Data partially available
– Queries avoiding the down AMP succeed
• Two or more AMPs down
– In different clusters: data partially available; queries avoiding the down AMPs succeed
– In the same cluster: system halts
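
The availability rules above reduce to a short decision function. A minimal sketch, assuming clusters are given as sets of AMP numbers; `table_status` is a hypothetical helper written for this illustration.

```python
def table_status(clusters, down_amps, fallback):
    """Classify data availability for a set of failed AMPs.

    clusters: list of sets of AMP numbers; down_amps: AMPs that are down.
    """
    down_amps = set(down_amps)
    # Two or more AMPs down in the same cluster halts the system.
    if any(len(cluster & down_amps) >= 2 for cluster in clusters):
        return "system halts"
    if not down_amps:
        return "fully available"
    return "fully available" if fallback else "partially available"

clusters = [{1, 2}, {3, 4}]   # 2 clusters with 2 AMPs each

print(table_status(clusters, {1}, fallback=True))      # fully available
print(table_status(clusters, {1, 3}, fallback=True))   # fully available
print(table_status(clusters, {1, 2}, fallback=True))   # system halts
print(table_status(clusters, {1}, fallback=False))     # partially available
```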
RAID

RAID – Redundant Array of Independent Disks

Two types of disk array protection are used: RAID 1 and RAID 5.

RAID 1 (Mirroring):
• RAID 1 provides each AMP with two disks for storing data and two disks for mirroring.
• A data disk and its mirror disk are called a mirrored pair.
• RAID 1 costs 50% of the disk space, but it ensures 99% uptime for customers.
• If a single disk goes down, it is easily replaced and Teradata isn't even affected.
RAID

RAID 5 (Parity):
• For every 3 blocks of data, there is a parity block on a 4th disk.
• If a disk fails, any missing block may be reconstructed using the other three disks.
• Reconstruction of a failed disk by the array controller takes longer than with RAID 1.

Summary:
• RAID 1: good performance with disk failures; higher cost in terms of disk space
• RAID 5: reduced performance with disk failures; lower cost in terms of disk space
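
Parity reconstruction can be demonstrated in a few lines. RAID 5 parity is conventionally computed with bitwise XOR; the slide does not state this, so treat the XOR choice and the tiny four-byte blocks as assumptions of this sketch.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]   # three data blocks on three disks
parity = xor_blocks(data)            # parity block on the fourth disk

# The disk holding the second block fails: rebuild it from the two
# surviving data blocks plus the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)   # b'BBBB'
```

Rebuilding reads every surviving disk, which is why performance drops during a RAID 5 disk failure, whereas RAID 1 simply reads the mirror.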
Test Your Understanding

1. List the types of locks in Teradata.
2. Which locks are compatible with each other?
3. What is a dirty-read lock?
4. How is the system protected against a node failure?
5. What is RAID?
6. Is it mandatory to have FALLBACK for all tables?
Summary

• An Exclusive lock is placed only on a database or table when the object is going through a structural change.
• A Write lock is placed on an INSERT, DELETE, or UPDATE request.
• A Read lock is placed in response to a SELECT request.
• An Access lock is also known as a dirty read or stale read.
• A clique is a defined set of nodes with failover capability.
• Fallback protects against an AMP failure.
• RAID 1 gives good performance with disk failures.
Source
• Tera Tom e-book
• Teradata Database Design (PDF)
• www.teradataforum.com
• www.teradata.com

Disclaimer: Parts of the content of this course are based on the materials available from the websites and books listed above. The materials that can be accessed from the linked sites are not maintained by Cognizant Academy and we are not responsible for the contents thereof. All trademarks, service marks, and trade names in this course are the marks of the respective owner(s).

Change Log

Version Number   Changes made
V1.0             Initial Version
V1.1             Slide No.: 1-86; Changed By: Bhuvanya.M (221634); Effective Date: 05/05/2015; Changes Effected: Base line content
Introduction to Teradata

You have successfully completed the session on Teradata Architecture.
