
1. Indexes (PI,SI,JI,HI) in detail and depth?

1.1.
Primary Index (PI)
==============
Every table must have a Primary Index, made up of one or more columns. The
Primary Index is defined when the table is created. There are two reasons
you might pick a different Primary Index than your Primary Key. They are
(1) for Performance reasons and (2) known access paths.
Primary Index Rules
Rule 1: One Primary Index per table.
Rule 2: A Primary Index value can be unique or non-unique.
Rule 3: The Primary Index value can be NULL.
Rule 4: The Primary Index value can be modified (the values in the PI
columns can be updated).
Rule 5: The Primary Index definition of a populated table cannot be
modified (the table must be empty to change which columns make up the PI).
Rule 6: A Primary Index has a limit of 64 columns.
Two Types of Primary Indexes (UPI or NUPI)
Unique Primary Index(UPI)
A Unique Primary Index (UPI) is unique and cannot have any duplicates.
If you try to insert a row with a Primary Index value that is already in the
table, the row will be rejected. A UPI enforces UNIQUENESS for a
column.
A Unique Primary Index (UPI) will always spread the rows of the table
evenly amongst the AMPs. UPI access is always a one-AMP operation.
We have selected EMP_NO to be our Primary Index. Because we have
designated EMP_NO to be a Unique Primary Index, there can be no
duplicate employee numbers in the table.

Non-Unique Primary Index (NUPI)


A Non-Unique Primary Index (NUPI) means that the values for the selected
column can be non-unique. Duplicate values can exist.

A Non-Unique Primary Index will almost never spread the table rows
evenly.
An All-AMP operation will take longer if the data is unevenly distributed.
You might pick a NUPI over an UPI because the NUPI column may be more
effective for query access and joins.
We have selected LAST_NAME to be our Primary Index. Because we have
designated LAST_NAME to be a Non-Unique Primary Index we are
anticipating that there will be individuals in the table with the same last
name.

Multi-Column Primary Indexes:


Teradata allows more than one column to be designated as the Primary
Index. It is still a single Primary Index; it is simply made up of
multiple columns combined. Teradata allows up to 64 columns to be combined
to make up the one Primary Index required for a table.
In the sketch below, First_Name and Last_Name are combined to make up
the Primary Index.
This is often done for two reasons:
(1) To get better data distribution among the AMPs
(2) Users often use multiple keys consistently to query
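
A minimal DDL sketch of such a multi-column Primary Index (the table and
column names here are hypothetical):

CREATE TABLE emp_sample
(first_name VARCHAR(20)
,last_name VARCHAR(20)
,dept_no INTEGER)
PRIMARY INDEX (first_name, last_name);

Both column values together are hashed to determine the row's AMP, so a
query must supply both values for the access to be a one-AMP operation.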

SQL Syntax for Creating a Primary Index


Creating a Unique Primary Index
The SQL syntax to create a Unique Primary Index is:

CREATE TABLE sample_1
(col_a INT
,col_b INT
,col_c INT)
UNIQUE PRIMARY INDEX (col_b);
Creating a Non-Unique Primary Index
The SQL syntax to create a Non-Unique Primary Index is:
CREATE TABLE sample_2
(col_x INT
,col_y INT
,col_z INT)
PRIMARY INDEX (col_x);
Data distribution using Primary Index
When a user submits an SQL request against a table using a Primary
Index, the request becomes a one-AMP operation, which is the most direct
and efficient way for the system to find a row. The process is explained
below.

Hashing Process
1. The Primary Index value goes into the hashing algorithm.
2. The output of the hashing algorithm is the row hash value.
3. The hash map points to the specific AMP where the row resides.
4. The PE sends the request directly to the identified AMP.
5. The AMP locates the row(s) on its vdisk.
6. The data is sent over the BYNET to the PE, and the PE sends the answer
set on to the client application.
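
Teradata exposes the hashing functions in SQL, so the distribution can be
inspected directly. A small sketch using the built-in HASHROW, HASHBUCKET,
and HASHAMP functions (the table and column names are hypothetical):

SELECT HASHAMP(HASHBUCKET(HASHROW(EMP_NO))) AS amp_no
,COUNT(*) AS row_count
FROM EMP_TABLE
GROUP BY 1
ORDER BY 1;

A roughly equal row_count per AMP indicates good distribution; a heavy
skew toward one AMP suggests a poor Primary Index choice.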
Duplicate Row Hash Values

It is possible for the hashing algorithm to end up with the same row hash
value for two different rows.
There are two ways this could happen:
Duplicate NUPI values: If a Non-Unique Primary Index is used, duplicate
NUPI values will produce the same row hash value.

Hash synonym: Also called a hash collision, this occurs when the
hashing algorithm calculates an identical row hash value for two
different Primary Index values.

To differentiate each row in a table, every row is assigned a unique Row
ID. The Row ID is the combination of the row hash value and a uniqueness
value.
Row ID = Row Hash Value + Uniqueness Value
The uniqueness value is used to differentiate between rows whose Primary
Index values generate identical row hash values. In most cases, only the
row hash value portion of the Row ID is needed to locate the row.

When each row is inserted, the AMP adds the Row ID, stored as a prefix of
the row.
The first row inserted with a particular row hash value is assigned a
uniqueness value of 1. The uniqueness value is incremented by 1 for any
additional rows inserted with the same row hash value. For example, three
rows sharing row hash 1234 would receive Row IDs (1234,1), (1234,2), and
(1234,3).
The diagrams below show how the data resides on the AMPs.
Even Distribution with an UPI

Uneven Distribution with a NUPI

1.2.
Secondary Index (SI)
====================

Before learning about the Secondary Index, I would suggest you first
learn about the Primary Index.
A Secondary Index (SI) is an alternate data access path. It allows you
to access the data without having to do a full-table scan.
You can drop and recreate Secondary Indexes dynamically, as they are
needed. Secondary Indexes are stored in separate subtables that require
additional disk space and maintenance, which is handled automatically by
the system.
The entire purpose for the Secondary Index Subtable will be to
point back to the real row in the base table via the Row-ID.
Secondary Index Rules
Rule 1: Secondary Indexes are optional.
Rule 2: Secondary Index values can be unique or non-unique.
Rule 3: Secondary Index values can be NULL.
Rule 4: Secondary Index values can be modified.
Rule 5: Secondary Indexes can be changed.
Rule 6: A Secondary Index has a limit of 64 columns.

Like Primary Indexes, Secondary Indexes come in two types (USI or NUSI).


A Unique Secondary Index (USI) serves two purposes:
Enforces uniqueness on a column or group of columns. The database will
check USIs to see if the values are unique.
Speeds up access to a row (data retrieval speed).

When a USI is created, Teradata will immediately build a secondary index
subtable on each AMP.
Each AMP will then hash the secondary index value for each of their rows
in the base table.
In the diagram below, each AMP hashes the Emp_no column for all
employee rows they hold.
The output of the Emp_no hash will utilize the hash map to point to a
specific AMP and that AMP will hold the secondary index subtable row for
the secondary index value.
That means the subtable row will hold the base table Row-ID and Teradata
will then find the base row immediately.
The syntax to create a Unique Secondary Index is:
CREATE UNIQUE INDEX (column_list) ON databasename.tablename;
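
For example, assuming an EMP_TABLE in a database called SAMPLE_DB, a USI
on the employee number could be created like this:

CREATE UNIQUE INDEX (EMP_NO) ON SAMPLE_DB.EMP_TABLE;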

A Non-Unique Secondary Index (NUSI) is usually specified to prevent
full-table scans, in which every row of a table is read.
When a NUSI is created, Teradata will immediately build a secondary index
subtable on each AMP.

Each AMP will hold the secondary index values for their rows in the base
table only. In our example, each AMP holds the name column for all
employee rows in the base table on their AMP (AMP local).
Each AMP-local name will have the Base Table Row-ID (pointer) so the AMP
can retrieve it quickly if needed. If an AMP contains duplicate first names,
only one subtable row for that name is built, with multiple Base Row-IDs.
The syntax to create a Non-Unique Secondary Index is:
CREATE INDEX (column_list) ON databasename.tablename;
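
For example, a NUSI on the first name of the hypothetical EMP_TABLE:

CREATE INDEX (FIRST_NAME) ON SAMPLE_DB.EMP_TABLE;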
There can be up to 32 Secondary Indexes on a table.
A USI is always a two-AMP operation, so it is almost as fast as a
Primary Index. A NUSI is an all-AMP operation, but not a full-table
scan.
Data Access With USI:

The SQL is submitted, specifying a USI (in this case, a customer
number of 56).

The hashing algorithm calculates a row hash value (in this case,
602).

The hash map points to the AMP containing the subtable row
corresponding to the row hash value (in this case, AMP 2).

The subtable indicates where the base row resides (in this case, row
778 on AMP 4).

The message goes back over the BYNET to the AMP with the row
and the AMP accesses the data row (in this case, AMP 4).

The row is sent over the BYNET to the PE, and the PE sends the
answer set on to the client application.

As shown in the example above, accessing data with a USI is
typically a two-AMP operation. However, it is possible that the
subtable row and base table row could end up being stored on the
same AMP, because both are hashed separately. If both were on the
same AMP, the USI request would be a one-AMP operation.

Data Access With NUSI:

The SQL is submitted, specifying a NUSI (in this case, a last name of
"Smith").

The hashing algorithm calculates a row hash value for the NUSI (in
this case, 567).

All AMPs are activated to find the hash value of the NUSI in their
index subtables. The AMPs whose subtables contain that value
become the participating AMPs in this request (in this case, AMP1
and AMP2). The other AMPs discard the message.

Each participating AMP locates the row IDs (row hash value plus
uniqueness value) of the base rows corresponding to the hash value
(in this case, the base rows corresponding to hash value 567 are
640, 222, and 115).

The participating AMPs access the base table rows, which are
located on the same AMP as the NUSI subtable (in this case, one row
from AMP 1 and two rows from AMP 2).

The qualifying rows are sent over the BYNET to the PE, and the PE
sends the answer set on to the client application (in this case, three
qualifying rows are returned).

1.3.
Join Index (JI)
===========

A Join Index is an optional index which may be created by a user. Join
Indexes provide additional processing efficiencies:

Eliminate base table access

Eliminate aggregate processing

Reduce joins

Eliminate redistributions

Eliminate Summary processing

The following are the various Join Index types. We will discuss each in
clear detail.
1. Single Table Join Index
2. Multi-Table Join Index
3. Multi-Table Compressed Join Index
4. Aggregate Join Index

5. Sparse Join Index


6. Global Join Index
7. Hash Index
A Single-Table Join Index duplicates a single table, but changes the
Primary Index. Users will only query the base table, but the Parsing Engine
will use the Join Index.

The reason to create a Single-Table Join Index is so joins can be performed
faster, because no redistribution or duplication needs to occur. A sketch
is shown below.
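
As a sketch, suppose a hypothetical EMP_TABLE has its Primary Index on
EMP_NO but is frequently joined on DEPT_NO; a single-table Join Index
hashed on the join column could look like this:

CREATE JOIN INDEX EMP_JI AS
SELECT EMP_NO, DEPT_NO, LAST_NAME
FROM EMP_TABLE
PRIMARY INDEX (DEPT_NO);

Rows of the Join Index are already distributed on DEPT_NO, so a join on
that column needs no redistribution.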
A Multi-Table Join index is a Join Index that involves two or more tables.
Facilitates join operations by possibly eliminating join processing or by
reducing/eliminating join data redistribution.

A Compressed Join Index is designed to save space by not repeating
the repeating values.

An Aggregate Join Index allows tracking of the aggregates SUM and
COUNT on any table.
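
A minimal aggregate Join Index sketch, assuming a hypothetical SALES_TABLE
with PRODUCT_ID and SALE_AMT columns:

CREATE JOIN INDEX SALES_AJI AS
SELECT PRODUCT_ID
,SUM(SALE_AMT) AS TOT_SALES
,COUNT(*) AS SALE_CNT
FROM SALES_TABLE
GROUP BY PRODUCT_ID;

Queries that ask for these sums or counts can then be satisfied from the
Join Index without re-aggregating the base table.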

A Sparse Join Index is a Join Index that doesn't use every row, because it
has a WHERE clause. This is done to save space and time.
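
A sparse Join Index is simply a Join Index with a WHERE clause; a sketch,
assuming a hypothetical ORDERS table where most queries touch only open
orders:

CREATE JOIN INDEX OPEN_ORDERS_JI AS
SELECT ORDER_NO, CUST_NO, ORDER_STATUS
FROM ORDERS
WHERE ORDER_STATUS = 'OPEN'
PRIMARY INDEX (CUST_NO);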

Hash Indexes are used similar to a Join Index, but Hash Indexes are
maintained in AMP-Local tables and used to quickly find certain key
columns in a base table.

Join Index Details:

Max 64 columns per table per Join Index.
BLOB and CLOB types cannot be defined.
Triggers with Join Indexes are allowed as of V2R6.
After restoring a table, drop and recreate the Join Index.
Automatically updated as the table changes.
FastLoad/MultiLoad won't load tables that have them.
Can have a NUPI and NUSIs.
Collect statistics on the Primary and Secondary Indexes.

2. Performance tuning in depth?


A) Explain the EXPLAIN: Check the EXPLAIN plan to see how exactly
Teradata will be executing the query. Try to understand the basic keywords
in the EXPLAIN plan, like the confidence level, the join strategy used, and
whether redistribution is happening or not.
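
Generating the plan is just a matter of prefixing the query with the
EXPLAIN keyword; for example (hypothetical tables):

EXPLAIN
SELECT E.EMP_NO, D.DEPT_NAME
FROM EMP_TABLE E
INNER JOIN DEPT_TABLE D
ON E.DEPT_NO = D.DEPT_NO;

The plan text shows the join strategy, any redistribution or duplication
steps, and the optimizer's confidence levels.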
B) Collect STATS: The statistics of the columns used in join conditions
should be updated. Secondary Indexes without proper STATS can be of little
or no help. Check the STATS status of the table.
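
A sketch of collecting and then checking stats on a hypothetical join
column:

COLLECT STATISTICS ON EMP_TABLE COLUMN (DEPT_NO);

HELP STATISTICS EMP_TABLE;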
C) Use Proper PI: Check whether the Primary Index is properly defined in
each of the tables in the query. Check if the PI of the loaded target table
is unique.
D) Use PPI: If there is a Partitioned Primary Index (PPI) created on a
table, try to use it. If you are not using it in the filter condition, it
will degrade the performance.
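
A sketch of a PPI definition together with a query that uses the
partitioning column in its filter (the table, columns, and date range are
illustrative):

CREATE TABLE SALES_PPI
(STORE_NO INTEGER
,SALE_DATE DATE
,SALE_AMT DECIMAL(10,2))
PRIMARY INDEX (STORE_NO)
PARTITION BY RANGE_N(SALE_DATE BETWEEN DATE '2015-01-01'
AND DATE '2015-12-31'
EACH INTERVAL '1' MONTH);

SELECT STORE_NO, SUM(SALE_AMT)
FROM SALES_PPI
WHERE SALE_DATE BETWEEN DATE '2015-06-01' AND DATE '2015-06-30'
GROUP BY 1;

Partition elimination happens only because the WHERE clause references
SALE_DATE; without that filter, all partitions are scanned.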

E) No FUNCTIONS in Conditions: Try to avoid using functions in join
conditions. For example, applying COALESCE or TRIM causes high CPU
consumption.
F) Use PPI: See point D - use the Partitioned Primary Index in the filter
condition wherever one is defined.
G) Same column DATA TYPES: Define same data type for the joining
columns.
H) Avoid IN clause in filter conditions: When there can be a huge
number of values in the WHERE condition, the better option can be to insert
such values into a volatile table and use the volatile table with an INNER
JOIN in the main query (see the sketch after point L below).
I) Use Same PI in Source & Target: PI columns also help in saving
the data to disk. If the source and target have the same PI, the data
copy can happen very efficiently from source to target.
J) Collect STATS on VOLATILE tables: Collecting stats on volatile tables
where required can save AMPCPU. Remove stats, if already present, where
they are not being used.
If the volatile table contains a UNIQUE PI, then go for sample stats rather
than full stats.
K) DROP volatile tables explicitly: Once a volatile table is no
longer required, you can drop it. Don't wait for the complete procedure
to be over. This will free some spool space immediately and could
prove very helpful in avoiding the "No more spool space" error.
L) NO LOG for volatile tables: Create volatile tables with the NO LOG
option.
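
A minimal sketch pulling points H, J, K, and L together (the table and
column names are hypothetical):

CREATE VOLATILE TABLE KEY_LIST, NO LOG
(CUST_NO INTEGER)
PRIMARY INDEX (CUST_NO)
ON COMMIT PRESERVE ROWS;

INSERT INTO KEY_LIST VALUES (1001);
INSERT INTO KEY_LIST VALUES (1002);

COLLECT STATISTICS ON KEY_LIST COLUMN (CUST_NO);

SELECT C.*
FROM CUSTOMER C
INNER JOIN KEY_LIST K
ON C.CUST_NO = K.CUST_NO;

DROP TABLE KEY_LIST;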
M) Check DBQL Stats: Keep your performance stats accessible.
Target the most AMPCPU consuming query first.
N) UPDATE clause: Do not write an UPDATE statement with just a SET
clause and no WHERE condition. Even if the target/source has just
one row, add a WHERE clause on the PI column.

O) DELETE & INSERT: Sometimes replacing an UPDATE with a DELETE &
INSERT can save a good amount of AMPCPU. Check if this holds good for
your query.
P) Query SPLITS: Split queries into several smaller queries logically
and use volatile tables with proper PI.
Q) Try MSR: If the same target table is loaded multiple times, try a
Multi-Statement Request (MSR) for the several sections. This will speed up
the final MERGE step into the target table and you may see a good CPU gain.
R) Try OLAP Functions: Check if replacing a correlated subquery with an
OLAP function may result in AMPCPU savings.
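
For instance, a correlated max-per-group subquery can often be rewritten
with a window function and Teradata's QUALIFY clause (hypothetical table):

SELECT EMP_NO, DEPT_NO, SALARY
FROM EMP_TABLE E
WHERE SALARY = (SELECT MAX(SALARY)
FROM EMP_TABLE
WHERE DEPT_NO = E.DEPT_NO);

SELECT EMP_NO, DEPT_NO, SALARY
FROM EMP_TABLE
QUALIFY RANK() OVER (PARTITION BY DEPT_NO
ORDER BY SALARY DESC) = 1;

The correlated form executes the inner query once per candidate row; the
QUALIFY form makes a single pass over the table.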
S) Avoid DUPLICATE data: If the join columns in the tables involved
in the query have duplicates, use DISTINCT or GROUP BY, load into a
volatile table, collect stats, and use the volatile table.
T) Use Proper JOINS: If joins are used, don't use RIGHT, LEFT, or FULL
OUTER joins where an INNER join is sufficient.
U) Use proper ALIAS: Check the aliases in the joins. A small mistake
could lead to a product join.
V) Avoid CAST: Avoid unnecessary casting of DATE columns. Once
defined as DATE, you can compare date columns against each other
even when they are in different formats. Internally, DATE is stored as
an INTEGER. CAST is required mainly when you have to compare a VARCHAR
value as a DATE.
W) Avoid UDF: Most of the functions needed for data manipulation are
available in Teradata, so avoid User Defined Functions.
X) Avoid FULL TABLE SCAN: Try to avoid FTS scenarios: an SI should
be defined on the columns which are used as part of joins or as an
alternate access path. Collect stats on SI columns, else there are chances
the optimizer might go for an FTS even when an SI is defined on that
particular column.
Y) Avoid using IN/NOT IN: For a large list of values, avoid using IN
/NOT IN in SQLs. Write the large list of values to a temporary table and
use this table in the query.
Z) Use CONSTANTS: Use constants to specify index column contents
whenever possible, instead of specifying the constant once, and joining
the tables. This may provide a small savings on performance.

3. Data recovery and protection options...

3.1.
Transient journal
==============
Is an automatic feature that provides data integrity
Automatic rollback of changed rows in the event of transaction failure
Data is always returned to its original state after a transaction failure
Takes a Before Image (BI) of changes for rollback purposes
The BI is stored in the AMP's transient journal
AMPs' transient journals are maintained in the DBC user's Perm Space
When the transaction is committed, the BI in the transient journal is
purged automatically
When a transaction fails:
The user receives a failure message
The transaction is rolled back
Locks are released
Spool files are discarded
3.2.
Fallback
=======

A Fallback table is available in the event of an unavailable AMP (single
AMP).
A Fallback row is a copy of a Primary row which is stored on a
different AMP.
Automatic restore of data changed during the AMP off-line period.
Create a table with or without Fallback.
Add/drop the Fallback feature at any time.
Cost of Fallback
Twice the disk space for table storage.
Twice the I/O for Inserts, Updates and Deletes.
A hardware (disk) or software (vproc) failure causes an AMP to be
taken off-line until the problem is corrected. During this period,
Fallback tables are fully available to users. When the AMP is brought
back on-line, the associated vdisk is refreshed to reflect any
changes during the off-line period.
3.2.1.
Fallback cluster
============

Fallback is always associated with clusters


A Fallback cluster is a defined number of AMPs which are treated as
a single fault tolerant unit.
All Fallback rows for AMPs in a cluster must reside within the cluster.
Loss of one AMP in the cluster permits continued table access.
Loss of two AMPs in the cluster causes the RDBMS to halt.

A cluster is a group of AMPs that act as a single fallback unit.


Clustering has no effect on the distribution of the Primary rows
of a table. The Fallback row copy, however, will always go to a
different AMP in the same cluster.
Cluster sizes are set through a Teradata console utility, and may
range from 2 to 16 AMPs per cluster (not all clusters in a system
have to be the same size).

3.3.
Down AMP recovery journal (DARJ)
============================
DARJ is started on all AMPs in a cluster when an AMP is down
DARJ keeps track of all changes that would have been written to the
failed AMP.
When the AMP comes back online, the DARJ will catch up the AMP by
applying the missed transactions.
Once everything is caught up, the DARJ is dropped.
After the loss of any AMP, a Down-AMP Recovery Journal is started
automatically. Its purpose is to log any changes to rows which
reside on the down AMP. Any inserts, updates, or deletes affecting
rows on the down AMP, are applied to the Fallback copy within the
cluster. The AMP which holds the Fallback copy logs the row-id in its
Recovery Journal.
This process continues until such time as the down AMP is brought
back on-line. As a part of the restart activity, the Recovery Journal is
read and changed rows are applied to the recovered AMP. When the
journal has been exhausted, it is discarded and the AMP is brought
on-line fully recovered.
3.4.
Cliques
======

A clique (pronounced "cleek") is a group of nodes that share the
same disk array.
Clique provides protection against the failure of an entire node.
If a node fails, VPROCs migrate to other nodes in the same clique
and still have access to their vdisks.
All AMPs and only LAN-attached PEs migrate. Channel-attached PEs
do not migrate.

3.5.
Permanent journal
===============

The Permanent Journal is an optional, user-specified, system-maintained
journal which is used for recovery of a database to a specified point in
time.
The Permanent Journal:
Is used for recovery from unexpected hardware or software
disasters.
May be specified for one or more tables
Permits capture of Before Images for database rollback.
Permits capture of After Images for database roll forward.
Permits archiving change images during table maintenance.
Reduces need for full table backups.
Provides a means of recovering NO FALLBACK tables.
Requires additional disk space for change images.
Requires user intervention for archive and recovery activity.

There are four image options for the Permanent Journal :


1. Before Journal
2. After Journal
3. Dual Before Journal
4. Dual After Journal
Syntax for creating a table with Fallback and a Permanent Journal:
CREATE TABLE CUSTOMER,
FALLBACK,
BEFORE JOURNAL,
AFTER JOURNAL
(
C1 INTEGER,
...
)
UNIQUE PRIMARY INDEX (C1);
3.6.
RAID
=====

RAID (Redundant Array of Independent Disks) provides protection
against a disk failure.
Teradata uses RAID-1.
RAID 1 - Transparent Mirroring
Provides high data availability and performance, but storage
costs are high.
Characteristics:
Data is fully replicated
Mirrored striping is possible with multiple pairs of disks in a
drive group
Transparent to operating system
Advantages:
Maximum data availability, read performance gains
No performance penalty with write operations
Fast recovery and restoration
Disadvantages:
50% of disk space for mirrored data
FALLBACK and RAID-1 together provide the highest level of protection.

3.7.
Locks
=====

Locking prevents multiple users who are trying to change the same
data at the same time from violating the data's integrity. This
concurrency control is implemented by locking the desired data.
There are four types of locks:
Exclusive - prevents any other type of concurrent access
Write - prevents other reads, writes, and exclusives
Read - prevents writes and exclusives
Access - prevents exclusives only
Locks may be applied at three database levels:
Database - applies to all tables/views in the database
Table/View - applies to all rows in the table/view
Row Hash - applies to all rows with the same row hash
Exclusive locks are only applied to databases or tables but never to
rows. They are the most restrictive type of lock; all other users are
locked out. Exclusive locks are used rarely, most often when
structural changes are being made to the database.
Write locks enable users to modify data while locking out all other
users except readers not concerned about data consistency (Access
lock readers). Until a Write lock is released, no new read or write
locks are allowed.
Read locks are used to ensure consistency during read operations.
Several users may hold concurrent read locks on the same data,
during which no modification of the data is permitted.

Access locks can be specified by users who are not concerned about
data consistency. The use of an access lock allows for reading data
while modifications are in process. Access locks are designed for
decision support on large tables that are updated only by small
single-row changes. Access locks are sometimes called "stale read"
locks, i.e. you may get stale data that hasn't been updated.
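
The lock the optimizer would choose can be downgraded with the LOCKING
modifier; a common sketch for a dirty read against a busy (hypothetical)
table:

LOCKING TABLE EMP_TABLE FOR ACCESS
SELECT DEPT_NO, COUNT(*)
FROM EMP_TABLE
GROUP BY 1;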

4. Types of Tables
=================
A two-dimensional structure of columns and rows of data
Permanent Tables - Require Perm Space
SET: No duplicate rows
CREATE SET TABLE EMP_TABLE
(
EMP_NO INTEGER
,DEPT_NO INTEGER
,FIRST_NAME VARCHAR(20)
,LAST_NAME CHAR(20)
,SALARY DECIMAL(10,2)
,ADDRESS VARCHAR(100)
)
PRIMARY INDEX(EMP_NO);
MULTISET: Duplicate rows allowed
CREATE MULTISET TABLE EMP_TABLE
(
EMP_NO INTEGER
,DEPT_NO INTEGER
,FIRST_NAME VARCHAR(20)
,LAST_NAME CHAR(20)
,SALARY DECIMAL(10,2)
,ADDRESS VARCHAR(100)
)
PRIMARY INDEX(EMP_NO);
Temporary Tables:
Derived Tables - Require Spool Space
Volatile Tables - Require Spool Space
Global Temporary Tables - Require Temp Space

4.1.
Derived Tables
===========

Derived tables are temporary tables that are created within a user's
SQL query and deleted when the query is done.
Spool space is used to materialize the data.
They do not create DDL and hence are not part of the DD.
SELECT ENAME, EMP.DEPTNO, SAL, AVGSAL
FROM EMP,
(SELECT DEPTNO, AVG(SAL) AS AVGSAL FROM EMP GROUP BY 1) AS DEPT_AVG
WHERE EMP.DEPTNO = DEPT_AVG.DEPTNO
AND SAL > AVGSAL;

ENAME   DEPTNO  SAL      AVGSAL
------  ------  -------  -------
FORD    20      3000.00  2175.00
BLAKE   30      2850.00  1566.67

4.2.
Global Temporary table
======================

Created with CREATE GLOBAL TEMPORARY TABLE


DDL is stored in the DD
Data is populated with INSERT/SELECT and uses TEMP space
The life time of the data is till either the transaction or the session ends
The definition is global - shared across sessions/users
The data is private to the session
Statistics can be collected on a GTT
CREATE GLOBAL TEMPORARY TABLE DEPT_REPORT
(DEPTNO INTEGER,
SUM_SAL DECIMAL(10,2),
AVG_SAL DECIMAL(10,2),
CNT_EMPS INTEGER)
ON COMMIT PRESERVE ROWS;

4.3.
Volatile table
===============
Created with CREATE VOLATILE TABLE
DDL is not stored in the DD
Data is populated with INSERT/SELECT and uses Spool space
Drops automatically after the session is over
The life time of the data is till either the transaction or the session ends
Both the definition and the data are private to the session
Statistics cannot be collected on volatile tables
CREATE VOLATILE TABLE DEPT_REPORT
(DEPTNO INTEGER,
SUM_SAL DECIMAL(10,2),
AVG_SAL DECIMAL(10,2),
CNT_EMPS INTEGER)
ON COMMIT PRESERVE ROWS;
5. Compression, Data type attributes....

5.1.

Compression

===========
Compression is a feature of Teradata which helps to reduce the
disk storage. It is useful in reducing the disk space required by
FIXED LENGTH columns or NULLs. It is a lossless compression, so no
data will be lost.
We can do compression on the following data types

Nulls, zeros, and blanks. (Note: these are all different values; never
consider them the same.)

Any numeric data type (INTEGER, DECIMAL, FLOAT, DOUBLE, SMALLINT,
BYTEINT)

DATE

CHARACTER(up to 255 characters)

Syntax of Compression
CREATE TABLE EMP_ADDRESS(
Address VARCHAR(50),
City CHAR(20) COMPRESS ('Bangalore'),
StateCode CHAR(2)
);
Guidelines of Compression

Columns must be FIXED LENGTH (not variable length), and the column must
be less than or equal to 255 characters.

The compressed columns cannot be a part of the PRIMARY INDEX.

We can compress up to 256 constant values, which means that for a
single FIXED LENGTH column we can define a maximum of 256 values to be
compressed.

NULL will be compressed by default if you choose that column for
compression.
NOTE: The maximum of 256 values allowed in Teradata includes NULL,
which means that you can define 255 unique values for that column
(255 unique values + 1 NULL value = 256).

Compression is case sensitive, which means that if you compressed
'TERADATATECH' it will not compress 'teradatatech'.
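
More than one constant can be listed per column; a sketch compressing
several frequent city values (the values are illustrative):

CREATE TABLE EMP_ADDRESS_2(
Address VARCHAR(50),
City CHAR(20) COMPRESS ('Bangalore', 'Chennai', 'Hyderabad'),
StateCode CHAR(2)
);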

Difference between compression and VARCHAR:
VARCHAR will be more efficient when the difference between the maximum
and average field length is high and compressibility is low.
Compression and fixed CHAR will be more efficient when the difference
between the maximum and average field length is low and compressibility
is high.

6. Sub queries, set operations, joins, views, macros and exercises...
6.1.
Subquery
=======
A subquery is a SELECT statement embedded in a clause of another
SQL statement.
SELECT select_list
FROM   table
WHERE  expr operator
       (SELECT select_list
        FROM   table);
The subquery (inner query) executes once before the main query.
The result of the subquery is used by the main query (outer query).
SELECT last_name
FROM employees
WHERE salary >
(SELECT salary
FROM employees
WHERE last_name = 'VIJAYENDRA');
You can build powerful statements out of simple ones by using
subqueries. They can be very useful when you need to select rows from a
table with a condition that depends on the data in the table itself.
You can place the subquery in a number of SQL clauses, including:
The WHERE clause
The HAVING clause
The FROM clause
In the syntax, operator includes a comparison condition such as >, =, or
IN.
Comparison conditions fall into two classes: single-row operators (>, =,
>=, <, <>, <=) and multiple-row operators (IN, ANY, ALL).
The subquery is often referred to as a nested SELECT, sub-SELECT, or
inner SELECT statement. The subquery generally executes first, and its
output is used to complete the query condition for the main or outer
query.
Additionally, subqueries can be placed in the CREATE VIEW statement,
CREATE TABLE statement, UPDATE statement, INTO clause of an INSERT
statement, and SET clause of an UPDATE statement.

Correlated subqueries are used for row-by-row processing. Each
subquery is executed once for every row of the outer query.

The server performs a correlated subquery when the subquery
references a column from a table referred to in the parent
statement. A correlated subquery is evaluated once for each row
processed by the parent statement. The parent statement can be a
SELECT, UPDATE, or DELETE statement.

Nested Subqueries Versus Correlated Subqueries

With a normal nested subquery, the inner SELECT query runs first
and executes once, returning values to be used by the main query. A
correlated subquery, however, executes once for each candidate
row considered by the outer query. In other words, the inner query is
driven by the outer query.
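
A typical correlated sketch, reusing the hypothetical employees table from
above: each row's salary is compared with the average of that row's own
department.

SELECT last_name, salary, deptno
FROM employees e
WHERE salary > (SELECT AVG(salary)
FROM employees
WHERE deptno = e.deptno);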

6.2.
Joins
=======
INNER JOIN
All matching rows
LEFT OUTER JOIN
Table to the left is used to qualify, table on the right has nulls
when rows do not match.
RIGHT OUTER JOIN
Table to the right is used to qualify, table on the left has nulls
when rows do not match.
FULL OUTER JOIN
Both tables are used to qualify and extended with nulls.
CROSS JOIN
Each row of one table is matched with each row of the other
table
6.3.
Views
=======

A view is a window into the data contained in relational tables.
A view is sometimes called a virtual table.
It may define a subset of rows of a table.
It may define a subset of columns of a table.
It may reference more than one table.
Data is neither duplicated nor stored separately for a view.
Data can be accessed directly via a table or indirectly via a view,
based on privileges held.
View definitions are stored in the Data Dictionary, not in the user's
own space.
It provides customized access to base tables by:
Restricting which columns are visible from base tables.
Restricting which rows are visible from base tables.
Combining columns and rows from several base tables.

Create a view of the employees in department 10 to be used for
both read and update purposes. Limit the view to an employee's
number, name, and salary.

CREATE VIEW EMP_10 AS
SELECT EMPNO, ENAME, SAL
FROM EMP
WHERE DEPTNO = 10;

SELECT from view :

SELECT * FROM EMP_10 ORDER BY ENAME;

UPDATE row through the view

UPDATE EMP_10 SET SAL = 5000
WHERE EMPNO = 7788;

6.4.
Macros
======
A macro is a Teradata extension to ANSI SQL that contains
prewritten SQL statements. The actual text of the macro is stored in
a global repository called the Data Dictionary (DD). Macros are
database objects and thus they belong to a specified user or
database. They may contain one or more SQL statements. Macros

have special efficiencies in the Teradata environment and are highly


desirable for
building reusable queries.
A macro allows you to name a set of one or more statements. When
you need to execute those statements, simply execute the named
macro.
Macros provide a convenient shortcut for executing groups of
frequently-run SQL statements.
Characteristics of a macro include the following:
Can contain one or more SQL statements
Can contain certain BTEQ commands
Can contain comments
Are stored in the Data Dictionary

Macros contain one or more prewritten SQL statements.


Macros are a Teradata extension to ANSI SQL.
Macros are stored in the Teradata Data Dictionary.
Macros can be executed from any viable SQL front-end, including:
Teradata SQL Assistant
BTEQ
LOGON Startup
Another macro
To execute a macro requires the user to have the EXEC privilege on
the macro.
Explicit privileges on the tables or views used by the macro are not
needed by the executing user.
Cannot have DDL in a macro unless it is the only statement in the
macro.

CREATE MACRO DEPT_10 AS
(SELECT EMPNO, ENAME, DEPTNO, SAL
FROM EMP
WHERE DEPTNO = 10
ORDER BY EMPNO;
);

EXEC DEPT_10;

EMPNO  ENAME   DEPTNO  SAL
-----  ------  ------  -------
7782   CLARK   10      2450.00
7839   KING    10      5000.00
7934   MILLER  10      1300.00

DROP MACRO DEPT_10;

To change an existing macro:

SHOW MACRO macro_name;

SHOW MACRO DEPT_10;


CREATE MACRO DEPT_10 AS
(SELECT EMPNO, ENAME,DEPTNO,SAL
FROM EMP
WHERE DEPTNO=10
ORDER BY EMPNO;
);

Copy the DDL.
Change CREATE MACRO to REPLACE MACRO and make other required changes.
Paste the modified DDL and run.
EXEC MACRO_NAME; to test the modified macro.

Parameterized macros allow substitutable variables.

Values for these variables are supplied at runtime.

CREATE MACRO DEPT_MACRO (DEPT INTEGER) AS
(SELECT EMPNO, ENAME, DEPTNO, SAL
FROM EMP
WHERE DEPTNO= :DEPT
ORDER BY EMPNO;
);
EXEC DEPT_MACRO(20);

7. Normal functions and analytical functions


=====================================

7.1.

Normal Functions
=============
SELECT DATE;
SELECT CURRENT_DATE;
SELECT TIME;
SELECT CURRENT_TIME;
SELECT CURRENT_TIMESTAMP;
7.2.
Aggregate functions
================

COUNT - Counts the rows
SUM - Sums up the values of the specified column(s)
MAX - Returns the largest value of the specified column
MIN - Returns the minimum value of the specified column
AVG - Returns the average value of the specified column

7.3.

Analytical functions

======================

SUM/CSUM: These functions are used to compute a cumulative sum over a
particular group of rows. SUM can also be used to simply calculate a
group sum. For a moving sum, use MSUM.
COUNT: To calculate a cumulative or moving count.
AVG: Similarly, to compute the moving average use the AVG or MAVG
function.
MDIFF: To see the difference between the current row (column) value and
the preceding nth row (column) value. If you want to see the sales
numbers (increasing or decreasing) on a daily basis, use this function.
MLINREG: To project the next value in a series based on the data pattern
present in the series.
QUANTILE: To divide the result set into partitions with an equal number
of rows present in each partition.
RANK: This function is used to display the ordered rank of all rows in a
particular group.
PERCENT_RANK: To find out the relative rank of a row in a group.
ROW_NUMBER: To get the sequential row number of the row within its data
subset.
MAX/MIN: To calculate the maximum or minimum cumulative value in a group.
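
A sketch combining several of these, assuming a hypothetical DAILY_SALES
table with SALE_DATE and SALE_AMT columns (ANSI window syntax shown; the
older CSUM/MAVG forms are equivalent):

SELECT SALE_DATE
,SALE_AMT
,SUM(SALE_AMT) OVER (ORDER BY SALE_DATE
ROWS UNBOUNDED PRECEDING) AS CUM_SALES
,AVG(SALE_AMT) OVER (ORDER BY SALE_DATE
ROWS 6 PRECEDING) AS MOVING_AVG_7
,RANK() OVER (ORDER BY SALE_AMT DESC) AS AMT_RANK
FROM DAILY_SALES;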
