1.1.
Primary Index (PI)
==============
Every table must have a Primary Index, made up of one or more columns. The
Primary Index is defined when the table is created. There are two reasons
you might pick a different Primary Index than your Primary Key:
(1) performance and (2) known access paths.
Primary Index Rules
Rule 1: One Primary Index per table.
Rule 2: A Primary Index value can be unique or non-unique.
Rule 3: The Primary Index value can be NULL.
Rule 4: Primary Index values can be modified.
Rule 5: The Primary Index definition of a populated table cannot be changed.
Rule 6: A Primary Index has a limit of 64 columns.
Two Types of Primary Indexes (UPI or NUPI)
Unique Primary Index (UPI)
A Unique Primary Index (UPI) is unique and cannot have any duplicates.
If you try to insert a row with a Primary Index value that is already in the
table, the row will be rejected. A UPI enforces UNIQUENESS for a
column.
A Unique Primary Index (UPI) will always spread the rows of the table
evenly amongst the AMPs. UPI access is always a one-AMP operation.
We have selected EMP_NO to be our Primary Index. Because we have
designated EMP_NO to be a Unique Primary Index, there can be no
duplicate employee numbers in the table.
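The UPI choice above can be sketched in DDL; a minimal sketch, assuming the EMP_TABLE layout shown later in this document:

```sql
-- UNIQUE PRIMARY INDEX enforces uniqueness on EMP_NO
CREATE SET TABLE EMP_TABLE
(
 EMP_NO INTEGER
,DEPT_NO INTEGER
,LAST_NAME CHAR(20)
)
UNIQUE PRIMARY INDEX(EMP_NO);

-- A second row with the same EMP_NO would be rejected by the UPI
INSERT INTO EMP_TABLE VALUES (1001, 10, 'SMITH');
```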
Non-Unique Primary Index (NUPI)
A Non-Unique Primary Index (NUPI) will almost never spread the table rows
evenly.
An All-AMP operation will take longer if the data is unevenly distributed.
You might pick a NUPI over an UPI because the NUPI column may be more
effective for query access and joins.
We have selected LAST_NAME to be our Primary Index. Because we have
designated LAST_NAME to be a Non-Unique Primary Index we are
anticipating that there will be individuals in the table with the same last
name.
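A NUPI version of the same sketch, under the same assumed columns:

```sql
-- PRIMARY INDEX without UNIQUE creates a NUPI; duplicate last names
-- hash to the same AMP, so distribution may be skewed
CREATE SET TABLE EMP_TABLE_NUPI
(
 EMP_NO INTEGER
,LAST_NAME CHAR(20)
)
PRIMARY INDEX(LAST_NAME);
```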
Hashing Process
1. The primary index value goes into the hashing algorithm.
2. The output of the hashing algorithm is the row hash value.
3. The hash map points to the specific AMP where the row resides.
4. The PE sends the request directly to the identified AMP.
5. The AMP locates the row(s) on its vdisk.
6. The data is sent over the BYNET to the PE, and the PE sends the answer
set on to the client application.
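The hashing steps above can be observed directly with Teradata's hash functions; this is a common distribution-check query (the table and column names are assumptions):

```sql
-- HASHROW gives the row hash, HASHBUCKET maps it into the hash map,
-- HASHAMP returns the AMP that owns that bucket
SELECT HASHAMP(HASHBUCKET(HASHROW(EMP_NO))) AS target_amp
      ,COUNT(*) AS row_count
FROM EMP_TABLE
GROUP BY 1
ORDER BY 1;
```

An even row_count across AMPs indicates good distribution for the chosen Primary Index.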
Duplicate Row Hash Values
It is possible for the hashing algorithm to end up with the same row hash
value for two different rows.
There are two ways this could happen:
Duplicate NUPI values: If a Non-Unique Primary Index is used, duplicate
NUPI values will produce the same row hash value.
Hash synonym: Also called a hash collision, this occurs when the
hashing algorithm calculates an identical row hash value for two
different Primary Index values.
When each row is inserted, the AMP adds the row ID, stored as a prefix of
the row.
The first row inserted with a particular row hash value is assigned a
uniqueness value of 1. The uniqueness value is incremented by 1 for any
additional rows inserted with the same row hash value.
The diagrams below show how data resides on the AMPs.
Even Distribution with a UPI
1.2.
Secondary Index (SI)
====================
Before learning about the Secondary Index, I would suggest you first
learn about the Primary Index.
A Secondary Index (SI) is an alternate data access path. It allows you
to access the data without having to do a full-table scan.
You can drop and recreate secondary indexes dynamically, as they are
needed. Secondary Indexes are stored in separate subtables that require
additional disk space and maintenance, which is handled automatically by
the system.
The entire purpose for the Secondary Index Subtable will be to
point back to the real row in the base table via the Row-ID.
Secondary Index Rules
Rule 1: Secondary Indexes are optional.
Rule 2: Secondary Index values can be unique or non-unique.
Rule 3: Secondary Index values can be NULL.
Rule 4: Secondary Index values can be modified.
Rule 5: Secondary Indexes can be changed.
Rule 6: A Secondary Index has a limit of 64 columns.
Each AMP will hold the secondary index values for their rows in the base
table only. In our example, each AMP holds the name column for all
employee rows in the base table on their AMP (AMP local).
Each AMP-local name will have the Base Table Row-ID (pointer) so the AMP
can retrieve it quickly if needed. If an AMP contains duplicate first names,
only one subtable row for that name is built, with multiple Base Row-IDs.
The syntax to create a Non-Unique Secondary Index is:
CREATE INDEX (column_list) ON database_name.table_name;
There can be up to 32 Secondary Indexes on a table.
A USI is always a two-AMP operation, so it is almost as fast as a
Primary Index. A NUSI is an all-AMP operation, but not a Full Table
Scan.
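The syntax above can be made concrete; a sketch assuming an EMP_TABLE with a hypothetical SSN column:

```sql
-- USI: enforces uniqueness, two-AMP access
CREATE UNIQUE INDEX (SSN) ON EMP_TABLE;

-- NUSI: all-AMP access, but avoids a full-table scan
CREATE INDEX (LAST_NAME) ON EMP_TABLE;
```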
Data Access With USI:
The hashing algorithm calculates a row hash value (in this case,
602).
The hash map points to the AMP containing the subtable row
corresponding to the row hash value (in this case, AMP 2).
The subtable indicates where the base row resides (in this case, row
778 on AMP 4).
The message goes back over the BYNET to the AMP with the row
and the AMP accesses the data row (in this case, AMP 4).
The row is sent over the BYNET to the PE, and the PE sends the
answer set on to the client application.
Data Access With NUSI:
The SQL is submitted, specifying a NUSI (in this case, a last name of
"Smith").
The hashing algorithm calculates a row hash value for the NUSI (in
this case, 567).
All AMPs are activated to find the hash value of the NUSI in their
index subtables. The AMPs whose subtables contain that value
become the participating AMPs in this request (in this case, AMP1
and AMP2). The other AMPs discard the message.
Each participating AMP locates the row IDs (row hash value plus
uniqueness value) of the base rows corresponding to the hash value
(in this case, the base rows corresponding to hash value 567 are
640, 222, and 115).
The participating AMPs access the base table rows, which are
located on the same AMP as the NUSI subtable (in this case, one row
from AMP 1 and two rows from AMP 2).
The qualifying rows are sent over the BYNET to the PE, and the PE
sends the answer set on to the client application (in this case, three
qualifying rows are returned).
1.3.
Join Index (JI)
===========
A Join Index can possibly reduce joins and eliminate redistributions.
The following is a wide variety of Join Index types. We will discuss each in
clear detail.
1. Single Table Join Index
2. Multi-Table Join Index
3. Multi-Table Compressed Join Index
4. Aggregate Join Index
An Aggregate Join Index will allow tracking of the aggregates SUM and
COUNT on any table.
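A minimal Aggregate Join Index sketch, assuming the EMP_TABLE columns used elsewhere in this document:

```sql
-- Pre-aggregates SUM and COUNT per department so the optimizer can
-- answer matching GROUP BY queries from the index, not the base table
CREATE JOIN INDEX DEPT_SAL_JI AS
SELECT DEPT_NO
      ,SUM(SALARY) AS SUM_SAL
      ,COUNT(*) AS EMP_CNT
FROM EMP_TABLE
GROUP BY DEPT_NO;
```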
A Sparse Join Index is a Join Index that doesn't use every row because it
has a WHERE clause. This is done to save space and time.
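A Sparse Join Index sketch; the WHERE condition (department 10) is an assumed example:

```sql
-- Only rows satisfying the WHERE clause are stored in the join index
CREATE JOIN INDEX EMP_D10_JI AS
SELECT EMP_NO, DEPT_NO, SALARY
FROM EMP_TABLE
WHERE DEPT_NO = 10
PRIMARY INDEX(EMP_NO);
```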
Hash Indexes are used similar to a Join Index, but Hash Indexes are
maintained in AMP-Local tables and used to quickly find certain key
columns in a base table.
Check whether the PI of the target table being loaded is unique.
D) Use PPI: If there is Partition Primary Index created on a table, try to
use it. If you are not using it in filter condition, it will degrade the
performance.
E) No FUNCTIONS in Conditions:
Try to avoid using functions in join conditions. For example, applying
COALESCE or TRIM causes high CPU consumption.
G) Same column DATA TYPES: Define same data type for the joining
columns.
H) Avoid IN clause in filter conditions: When there can be huge
number of values in where conditions, better option can be to insert
such values in a volatile table and use volatile table with INNER JOIN in
the main query.
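The volatile-table approach described above, sketched with assumed names and values:

```sql
-- Stage the filter values once...
CREATE VOLATILE TABLE FILTER_VALS
(DEPT_NO INTEGER)
ON COMMIT PRESERVE ROWS;

INSERT INTO FILTER_VALS VALUES (10);
INSERT INTO FILTER_VALS VALUES (20);

-- ...then join instead of using a long IN list
SELECT E.*
FROM EMP_TABLE E
INNER JOIN FILTER_VALS F
  ON E.DEPT_NO = F.DEPT_NO;
```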
I) Use Same PI in Source & Target: PI columns can also help in
saving the data to disk. If the Source and Target have the same PI,
the data dump can happen very efficiently from source to target.
J) Collect STATS on VOLATILE tables: Collecting stats on volatile tables
where required can save AMPCPU. Remove stats that are already present
but not being used.
If the volatile table contains UNIQUE PI, then go for sample stats rather
than full stats.
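The stats guidance above in miniature, using the DEPT_REPORT volatile table defined later in this document:

```sql
-- Full stats on a join/filter column
COLLECT STATISTICS ON DEPT_REPORT COLUMN (DEPTNO);

-- Sample stats are usually enough when the PI is unique
COLLECT STATISTICS USING SAMPLE ON DEPT_REPORT COLUMN (DEPTNO);
```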
K) DROP volatile tables explicitly: Once a volatile table is no
longer required, you can drop it. Don't wait for the complete procedure
to be over. This will free some spool space immediately and could
prove very helpful in avoiding the No More Spool Space error.
L) NO LOG for volatile tables: Create volatile tables with NO LOG
option.
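A NO LOG volatile table sketch (inserts skip transient-journal logging, so loads are cheaper but cannot be rolled back):

```sql
CREATE VOLATILE TABLE STAGE_VT, NO LOG
(
 EMP_NO INTEGER
,SALARY DECIMAL(10,2)
)
ON COMMIT PRESERVE ROWS;
```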
M) Check DBQL Stats: Keep your performance stats accessible.
Target the most AMPCPU consuming query first.
N) UPDATE clause: Do not write an UPDATE statement with just a SET
clause and no WHERE condition. Even if the Target/Source has just
one row, add a WHERE clause on the PI column.
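The rule above in miniature; the column names and values are assumptions:

```sql
-- Always qualify by the PI column, even for a single-row table
UPDATE EMP_TABLE
SET SALARY = 5500.00
WHERE EMP_NO = 1001;
```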
3.1.
Transient journal
==============
Is an automatic feature that provides Data Integrity
Automatic rollback of changed rows in the event of transaction
failure
Data is always returned to its original state after a transaction
failure.
Takes a Before Image (BI) of changes for rollback purposes
The BI is stored in the AMP's transient journal
The AMPs' transient journals are maintained in the DBC user's Perm Space.
When the transaction is committed, the BI in the transient journal is
purged automatically
When a transaction fails
User receives failure message
Transaction is rolled back
Locks are released
Spool files are discarded
3.2.
Fallback
=======
3.3.
Down AMP recovery journal (DARJ)
============================
DARJ is started on all AMPs in a cluster when an AMP is down
DARJ keeps track of all changes that would have been written to the
failed AMP.
When the AMP comes back online, the DARJ will catch the AMP up by
applying the missed transactions.
Once everything is caught up, the DARJ is dropped
After the loss of any AMP, a Down-AMP Recovery Journal is started
automatically. Its purpose is to log any changes to rows which
reside on the down AMP. Any inserts, updates, or deletes affecting
rows on the down AMP, are applied to the Fallback copy within the
cluster. The AMP which holds the Fallback copy logs the row-id in its
Recovery Journal.
This process continues until the down AMP is brought
back online. As part of the restart activity, the Recovery Journal is
read and changed rows are applied to the recovered AMP. When the
journal has been exhausted, it is discarded and the AMP is brought
online fully recovered.
3.4.
Cliques
======
3.5.
Permanent journal
===============
The Permanent Journal is an optional, user-specified, system-maintained
journal which is used for recovery of a database to a specified point in time.
The Permanent Journal:
Is used for recovery from unexpected hardware or software
disasters.
May be specified for one or more tables
Permits capture of Before Images for database rollback.
Permits capture of After Images for database roll forward.
Permits archiving change images during table maintenance.
Reduces need for full table backups.
Provides a means of recovering NO FALLBACK tables.
Requires additional disk space for change images.
Requires user intervention for archive and recovery activity.
3.7.
Locks
=====
Locking prevents multiple users who are trying to change the same
data at the same time from violating the data's integrity. This
concurrency control is implemented by locking the desired data.
There are four types of locks:
Exclusive - prevents any other type of concurrent access
Write     - prevents other reads, writes, exclusives
Read      - prevents writes and exclusives
Access    - prevents exclusive only
Locks may be applied at three database levels:
Database   - applies to all tables/views in the database
Table/View - applies to all rows in the table/view
Row Hash   - applies to all rows with the same row hash
Exclusive locks are only applied to databases or tables but never to
rows. They are the most restrictive type of lock; all other users are
locked out. Exclusive locks are used rarely, most often when
structural changes are being made to the database.
Write locks enable users to modify data while locking out all other
users except readers not concerned about data consistency (Access
lock readers). Until a Write lock is released, no new read or write
locks are allowed.
Read locks are used to ensure consistency during read operations.
Several users may hold concurrent read locks on the same data,
during which no modification of the data is permitted.
Access locks can be specified by users who are not concerned about
data consistency. The use of an access lock allows for reading data
while modifications are in process. Access locks are designed for
decision support on large tables that are updated only by small
single-row changes. Access locks are sometimes called stale read
locks, i.e. you may get stale data that hasnt been updated.
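An access lock can be requested explicitly with the LOCKING modifier; a sketch against the assumed EMP_TABLE:

```sql
-- Stale reads are acceptable here, so don't block behind writers
LOCKING ROW FOR ACCESS
SELECT LAST_NAME, SALARY
FROM EMP_TABLE
WHERE DEPT_NO = 10;
```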
4. Types of Tables
=================
A two-dimensional structure of columns and rows of data
Permanent Tables - Require Perm Space
SET : No duplicate rows
CREATE SET TABLE EMP_TABLE
(
EMP_NO INTEGER
,DEPT_NO INTEGER
,FIRST_NAME VARCHAR(20)
,LAST_NAME CHAR(20)
,SALARY DECIMAL(10,2)
,ADDRESS VARCHAR(100)
)
PRIMARY INDEX(EMP_NO);
MULTISET : duplicate rows allowed
CREATE MULTISET TABLE EMP_TABLE
(
EMP_NO INTEGER
,DEPT_NO INTEGER
,FIRST_NAME VARCHAR(20)
,LAST_NAME CHAR(20)
,SALARY DECIMAL(10,2)
,ADDRESS VARCHAR(100)
)
PRIMARY INDEX(EMP_NO);
Temporary Tables
Derived Tables          - Require Spool Space
Volatile Tables         - Require Spool Space
Global Temporary Tables - Require Temp Space
4.1.
Derived Tables
===========
Derived tables are temporary tables that are created within a user's
SQL query and deleted when the query is done.
Spool space is used to materialize the data
No DDL is created, so derived tables are not part of the Data Dictionary (DD)
SELECT ENAME, EMP.DEPTNO, SAL, AVGSAL
FROM EMP,
(SELECT DEPTNO, AVG(SAL) AS AVGSAL FROM EMP GROUP BY 1) AS
DEPT_AVG
WHERE EMP.DEPTNO = DEPT_AVG.DEPTNO
AND SAL > AVGSAL;
ENAME      DEPTNO  SAL      AVGSAL
---------  ------  -------  -------
FORD           20  3000.00  2175.00
BLAKE          30  2850.00  1566.67
4.2.
Global Temporary table
======================
4.3.
Volatile table
===============
Created with CREATE VOLATILE TABLE
DDL is not stored in the DD
Data is populated with INSERT/SELECT and uses Spool space
Dropped automatically after the session is over
The lifetime of the data is until either the transaction or the session ends
Both the definition and the data are private to the session
Statistics could not be collected on volatile tables in older Teradata
releases (newer releases do support it)
CREATE VOLATILE TABLE DEPT_REPORT
(DEPTNO INTEGER,
SUM_SAL DECIMAL(10,2),
AVG_SAL DECIMAL(10,2),
CNT_EMPS INTEGER)
ON COMMIT PRESERVE ROWS;
5. Compression, Data type attributes....
5.1.
Compression
===========
Compression is a feature of Teradata which helps to reduce the
disk storage. It is useful in reducing the disk space required by
FIXED LENGTH columns or NULLs. It is a lossless compression, so no
data will be lost.
We can apply compression to the following data types:
Nulls, Zeros, Blanks (note that these are all different values; never
consider them the same)
DATE
Syntax of Compression
CREATE TABLE EMP_ADDRESS(
Address VARCHAR(50),
City CHAR(20) COMPRESS ('Bangalore'),
StateCode CHAR(2)
);
Guidelines of Compression
6.1.
Subqueries
==========
With a normal nested subquery, the inner SELECT query runs first
and executes once, returning values to be used by the main query. A
correlated subquery, however, executes once for each candidate
row considered by the outer query. In other words, the inner query is
driven by the outer query.
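The correlated behaviour described above, sketched on the EMP table used in the derived-table example:

```sql
-- The inner query re-runs for each candidate row, driven by E.DEPTNO
SELECT ENAME, DEPTNO, SAL
FROM EMP E
WHERE SAL > (SELECT AVG(SAL)
             FROM EMP
             WHERE DEPTNO = E.DEPTNO);
```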
6.2.
Joins
=======
INNER JOIN
All matching rows
LEFT OUTER JOIN
Table to the left is used to qualify, table on the right has nulls
when rows do not match.
RIGHT OUTER JOIN
Table to the right is used to qualify, table on the left has nulls
when rows do not match.
FULL OUTER JOIN
Both tables are used to qualify and extended with nulls.
CROSS JOIN
Each row of one table is matched with each row of the other
table
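A sketch of the outer-join behaviour, assuming a hypothetical DEPT table with a DNAME column:

```sql
-- Every EMP row survives; DNAME is NULL where no department matches
SELECT E.ENAME, D.DNAME
FROM EMP E
LEFT OUTER JOIN DEPT D
  ON E.DEPTNO = D.DEPTNO;
```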
6.3.
Views
=======
6.4.
Macros
======
A macro is a Teradata extension to ANSI SQL that contains
prewritten SQL statements. The actual text of the macro is stored in
a global repository called the Data Dictionary (DD). Macros are
database objects and thus they belong to a specified user or
database. They may contain one or more SQL statements, and are
executed with the EXEC command. For example:
CREATE MACRO DEPT_10 AS
(SELECT EMPNO, ENAME, DEPTNO, SAL
FROM EMP
WHERE DEPTNO=10
ORDER BY EMPNO;
);
EXEC DEPT_10;
EMPNO  ENAME   DEPTNO  SAL
-----  ------  ------  -------
 7782  CLARK       10  2450.00
 7839  KING        10  5000.00
 7934  MILLER      10  1300.00
7.1.
Normal Functions
=============
SELECT DATE;
SELECT CURRENT_DATE;
SELECT TIME;
SELECT CURRENT_TIME;
SELECT CURRENT_TIMESTAMP;
7.2.
Aggregate functions
================
7.3.
Analytical functions
======================
SUM/CSUM     : These functions are used to compute a cumulative
sum over a particular group of rows. SUM can also be used to simply
calculate a group sum. For a moving sum, use MSUM.
COUNT        : To calculate a cumulative or moving count.
AVG/MAVG     : To compute the moving average, use the AVG or MAVG
function.
MDIFF        : To see the difference between the current row
(column) and the preceding nth row (column) value. If you want to see the
sales numbers (increasing or decreasing) on a daily basis, use this
function.
MLINREG      : To project the next value in a series based on the
data pattern present in the series.
QUANTILE     : To divide the result set into partitions with an equal
number of rows present in each partition.
RANK         : This function is used to display the ordered rank of
all rows in a particular group.
PERCENT_RANK : To find the relative rank of a row in a group, use
PERCENT_RANK.
ROW_NUMBER   : To get the sequential row number of the row within
its data subset.
MAX/MIN      : To find the maximum/minimum value in a group.
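A couple of the functions above in a sketch, using Teradata's older OLAP syntax on the assumed EMP_TABLE:

```sql
-- Running total of salaries, ordered by EMP_NO
SELECT EMP_NO, SALARY, CSUM(SALARY, EMP_NO) AS running_total
FROM EMP_TABLE;

-- Ordered rank by salary, highest first
SELECT EMP_NO, SALARY, RANK(SALARY DESC) AS sal_rank
FROM EMP_TABLE;
```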