Document Type: Database Tuning (TD12 Features Guide)
Date Created: December 1, 2008
Current Version: Version 1.1
Date Last Updated: January 19, 2009
Authors: ADMIN COE MS Team
Approved By: Manish Korgaonkar
Approved Date: January 19, 2009
Prepared By: GCC India (Mumbai) ADMIN COE MS Team
Preface
Purpose:
This document will take you through the enhancements in V2R12.0.0 as compared to V2R6.1.X.
Audience:
The primary audience includes database and system administrators and application developers.
Prerequisites:
You should be familiar with TD V2R6.1.X features.
Table of Contents

1 Introduction
2 Performance Enhancements
2.1 Collect Statistics Improvements
2.1.1 Better Cardinality Estimate
2.1.2 Improved AMP-level Statistics
2.2 Parameterized Statement Caching
2.2.1 Where This Feature Applies
2.2.2 Where This Feature Does Not Apply
2.2.3 How This Feature Works
3 Database Enhancements
3.1 Restartable Scandisk
3.1.1 Usage
3.2 Checktable Enhancements
3.2.1 Usage of Checktable
3.2.2 Differences among Checking Levels
3.2.3 New Features in Teradata 12.0
3.2.4 Checktable Checks Compressed Values
3.3 Software Event Log
4 Security Enhancements
4.1 Password Enhancements
4.1.1 Password Enhancements
4.1.2 How Does It Work?
4.1.3 Data Dictionary Modifications
5 Enhancements to Utilities
5.1 Normalized Resusage/DBQL Data and Ampusage Views for Coexistence Systems
5.1.1 Resusage Data
5.1.2 DBQL Data
5.1.3 Ampusage View
5.2 Tdpkgrm: New Option to Remove All Non-current TD Packages
5.3 MultiTool: New DIP Option
7.2.4 Support for UTF-16
7.2.5 TPT API Can Now Be Called from an XSP
8 TASM
8.1 Query Banding
8.1.1 How to Set a Query Band
8.1.2 Query Band and Workload Management
8.1.3 Using Query Banding to Improve Resource Accounting
8.1.4 Using Both the Session and Transaction Query Bands
8.2 State Matrix
8.2.1 Managing the System through a State Matrix
8.2.2 System Conditions
8.2.3 Operating Environments
8.2.4 State
8.2.5 Events
8.2.6 Periods
8.3 Global Exception / Multiple Exception
8.3.1 Global Exception Directive
8.3.2 Multiple Global Exception Directive
8.4 Utility Management
9 Usability Features
9.1 Complex Error Handling
9.2 Multilevel Partitioned Primary Index
9.2.1 Features of MLPPI
9.2.2 MLPPI Table Joins and Optimizer Join Plans
9.3 Schmon Enhancements
9.3.1 Comparison of Options Available in TD6.1 and TD12.0
9.3.2 Delay Modifier (-d)
9.3.3 Display PG Usage
9.4 Enhanced Explain Plan Details
9.5 DBC Indexes Contains Join Index ID
9.6 List All Global Temporary Tables
9.7 ANSI Merge
1 Introduction
Teradata's mission is to provide an integrated, optimized, and extensible enterprise data warehouse solution to power better, faster decisions. Teradata 12.0 is a highly integrated solution that continues to advance Teradata further along in this mission. Teradata 12.0:
Extends its lead in enterprise intelligence by supporting both strategic and operational intelligence.
Continues to be the only true choice for concurrently using detailed data in operational applications, while using business intelligence and deep analytics to direct business.
Strengthens business logic processing capability, high availability, and performance of the EDW and ADW foundations.
Enhances query performance.
Advances its enterprise fit characteristics, including partner friendliness and ease of enterprise integration.
Improves availability, supportability, and security.
2 Performance Enhancements
2.1 Collect Statistics Improvements
Internal enhancements to the way statistics are collected will capture a larger quantity of data demographic information with more accuracy, so that the Optimizer can create better query execution plans.
Description:
This feature provides the following benefits:
Improved decision support (DSS) query performance as a result of improved query execution plans.
Query performance consistency with new releases of Teradata.
Faster results for extremely complex, highly analytical DSS queries.
The statistics collection improvements allow the Optimizer to better estimate the cardinality (number of elements) in the data in the following ways:
Increased number of statistics intervals
Improved statistics collection for multi-column NULL values
Improved AMP-level statistics
Statistics are stored as a histogram (collection of occurrences of values), and the more granular the statistics, the better the query execution plans can be. In Teradata 12, the maximum number of intervals has been increased from 100 to 200, providing the Optimizer with a more detailed picture of the actual column data distribution for estimating purposes.
In TD 6.1, statistics are collected in only 100 intervals, as displayed in the screenshot above.
In TD 12.0, statistics are collected in 200 intervals, as displayed in the screenshot above.
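To observe the interval count on a given system, statistics can be collected and then displayed in detail; a minimal sketch, reusing the retail.item table from the EXPLAIN examples later in this document (the detailed HELP STATISTICS form is assumed here):

COLLECT STATISTICS ON retail.item COLUMN l_receiptdate;
-- The detailed form of HELP STATISTICS lists the interval data,
-- so the interval count (100 in TD6.1 vs. up to 200 in TD12.0) can be seen directly.
HELP STATISTICS retail.item COLUMN l_receiptdate;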
Advantages: Certain types of queries will experience improved performance, specifically:
Queries involving a JOIN operation between tables
Queries with conditions that have many distinct values
Queries with a large number of IN-list values
Example: To see the improved statistics counts, suppose statistics are collected on the table below:
Teradata 6.1: Statistics would indicate four NULL rows and two rows with unique values. Teradata 12.0 and above: The statistics more accurately indicate two all-NULL rows and four rows with unique values.
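A small sketch that reproduces those counts; the table, column names, and data are hypothetical, chosen to match the row counts described above:

-- MULTISET so that the two identical all-NULL rows are both accepted
CREATE MULTISET TABLE demo_stats (c1 INTEGER, c2 INTEGER);

INSERT INTO demo_stats VALUES (NULL, NULL);  -- all-NULL row
INSERT INTO demo_stats VALUES (NULL, NULL);  -- all-NULL row
INSERT INTO demo_stats VALUES (1, NULL);     -- partly NULL: counted as NULL in TD6.1, unique in TD12.0
INSERT INTO demo_stats VALUES (NULL, 2);     -- partly NULL: counted as NULL in TD6.1, unique in TD12.0
INSERT INTO demo_stats VALUES (3, 4);
INSERT INTO demo_stats VALUES (5, 6);

-- Multi-column statistics: TD6.1 reports 4 NULL rows and 2 unique rows;
-- TD12.0 reports 2 all-NULL rows and 4 unique rows.
COLLECT STATISTICS ON demo_stats COLUMN (c1, c2);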
For a comparison between TD6.1 and TD12.0, please use the scripts below:
CollectStats_All_NULLS.txt
CollectStats_UNIQUE_Values.txt
Advantages:
With the more refined count of all-NULL rows, the Optimizer's join plans are improved, especially for large tables where a significant number of rows have NULLs. In addition, any data redistribution effort is more accurately estimated.
This improvement does not change procedures for collecting or dropping statistics or any associated timing for collecting statistics.
Prior to Teradata 12, average rows-per-value (RPV) statistics were obtained using a probability model, which often underestimated the actual rows. In Teradata 12, this measure is calculated exactly using the following formula:
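By the definition of average rows-per-value, the calculation presumably amounts to:

RPV = (number of rows with a non-NULL value) / (number of distinct values)

For example, 1,000,000 qualifying rows spread over 50,000 distinct values gives an RPV of 20.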
Advantages:
The new RPV calculation formula makes the cost estimates for joins much more accurate. This improvement does not change procedures for collecting or dropping statistics or any associated timing for collecting statistics.
For a comparison between TD6.1 and TD12.0, please use the scripts below:
In TD 12.0:
2.2 Parameterized Statement Caching
This internal feature improves the logic that caches optimized plans for parameterized queries (SQL statements that include variables). In previous releases, the Optimizer did not evaluate the value of the parameter (the actual USING, CURRENT_DATE, or DATE value) when creating the query plan for a parameterized request. If the same request was resubmitted with different parameters, the old cached plan was used, often generating suboptimal plans. Query plans that remain the same regardless of the parameter values are called generic plans. With Teradata 12, the Optimizer first determines whether the request would benefit if the parameter values are evaluated. If so, then the Optimizer will include the parameter values when optimizing the
request. Plans in which the Optimizer peeks at parameter values and generates a plan optimized for those values are called specific plans. For example, the Optimizer considers the user-supplied product_code value when generating a plan for the following request:
USING (x INT) SELECT * FROM SalesHistory WHERE product_code = :x OR store_number = 56;
With this feature, performance improvements have been observed in the following situations:
Partition Elimination
Sparse Join Index access
NUSI access
Join plans
These queries are already highly optimized, and any evaluation of the parameter value is redundant and/or would not change the query plan.
The CURRENT_DATE variable will be resolved for all queries, parameterized or otherwise, and replaced with the actual date prior to optimization. This helps generate a more optimal plan in cases of partition elimination, sparse join indexes, or NUSIs that are based on CURRENT_DATE. For a parameterized request that uses CURRENT_DATE, a generic plan with CURRENT_DATE resolved is referred to as a DateSpecific Generic Plan. Similarly, a specific plan with CURRENT_DATE resolved is referred to as a DateSpecific Specific Plan.
In TD 12.0, the EXPLAIN text for queries in which CURRENT_DATE has been resolved shows the resolved date.
In TD 6.1:
Explain select * from retail.item where l_receiptdate= current_date;
Explanation
1) First, we lock a distinct retail."pseudo table" for read on a RowHash to prevent global deadlock for retail.item.
2) Next, we lock retail.item for read.
3) We do an all-AMPs RETRIEVE step from retail.item by way of an all-rows scan with a condition of ("retail.item.L_RECEIPTDATE = DATE") into Spool 1 (group_amps), which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The size of Spool 1 is estimated with no confidence to be 6,018 rows. The estimated time for this step is 0.58 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.58 seconds.
In TD 12.0:
Explain select * from retail.item where l_receiptdate= current_date;
Explanation
1) First, we lock a distinct retail."pseudo table" for read on a RowHash to prevent global deadlock for retail.item.
2) Next, we lock retail.item for read.
3) We do an all-AMPs RETRIEVE step from retail.item by way of an all-rows scan with a condition of ("retail.item.L_RECEIPTDATE = DATE '2008-12-01'") into Spool 1 (group_amps), which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The size of Spool 1 is estimated with no confidence to be 6,018 rows (806,412 bytes). The estimated time for this step is 0.59 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.59 seconds.
3 Database Enhancements
3.1 Restartable Scandisk
3.1.1 Usage
Scandisk is a diagnostic tool designed to check for inconsistencies between the key file system data structures, such as the Master Index, Cylinder Index, and the Data Blocks. As an administrator, you can perform this procedure as preventive maintenance to validate the file system, as part of other maintenance procedures, or when users report file system problems. The SCANDISK command:
Verifies that data block content matches the data descriptor.
Checks that all sectors are allocated to one and only one of the bad sector list, the free sector list, or a data block.
Ensures that the continuation bits are flagged correctly.
In TD12.0, with the Restartable Scandisk feature, the Scandisk utility can be restarted either from a defined point or from the last table scanned. Restartability allows you to halt the Scandisk process during times of extremely heavy system use and then restart it at some later time (e.g. off-peak hours).
Syntax:
SCANDISK TAB[L[E]] starting_tableid [ starting_rowid ] [ TO ending_tableid [ ending_rowid]]
We can compare the options available in the two versions by issuing the command in each (screenshots shown below).
In TD12:
Syntax:
SCANDISK [ /dispopt ] [ { DB | CI | MI | FREECIS | WMI | WCI | WDB } ] [ /dispopt ] [ INQUIRE <interval> ] [ { NOCR | CR } ]
Where:
CI --- Scans the MI and CIs. If the scope is AMP or all tables, rather than selected tables, the free CIs are also scanned.
DB --- Scans the MI, CIs, and DBs. This is the default for the normal file system, which can be overridden by the CI, MI, or FREECIS options. If the scope is AMP or all tables, rather than selected tables, the free CIs are also scanned.
FREECIS --- Scans the free CIs only. This option also detects missing WAL and Depot cylinders.
MI --- Scans the MI only.
WCI --- Scans the WMI and WCIs.
WDB --- Scans the WMI, WCIs, and WDBs. This is the default for the WAL log, which can be overridden by the WCI or WMI options.
WMI --- Scans the WMI only.
INQUIRE --- Reports SCANDISK progress as a percentage of total time to completion and displays the number of errors encountered so far.
interval --- An integer that defines the time interval, in seconds, at which to automatically display SCANDISK progress.
CR --- Specifies to use cylinder reads.
NOCR --- Specifies to use regular data block preloads, instead of cylinder reads.
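As an illustration, here is a hypothetical invocation built from the grammar above; it scans the normal file system (DB is the default), reports progress every 60 seconds, and uses cylinder reads:

scandisk db inquire 60 cr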
CR option:
When we specify the SCANDISK CR option, the utility uses cylinder reads. This option is not supported in TD6.1 (screenshots shown below).
In TD6.1:
In TD12.0:
In TD6.1:
Scandisk wdb:
Scandisk wmi:
3.2 Checktable Enhancements
Check Table is a console-startable utility and a diagnostic tool for Teradata DBS software. Check Table checks for inconsistencies among internal data structures, such as table headers, row identifiers and secondary indexes. Although Check Table identifies and isolates data inconsistencies and corruption, it cannot repair inconsistencies or data corruption.
3.2.4 Checktable Checks Compressed Values
In TD6.1, the screenshot below shows that this check was not available.
In TD12.0:
Checks Compressed Values: Usage
The feature can be invoked from the Teradata Manager Checktable Utility menu item.
3.3 Software Event Log
During log system processing in Teradata Database 12.0, all Teradata messages are captured in the software event log so that all messages are available in one place.
4 Security Enhancements
4.1 Password Enhancements
TD6.1 supported two forms of password encryption from previous releases, namely DES and SHA-256 truncated to 27. (This support will continue.) Teradata Database 12.0 includes the following password enhancements:
Full implementation of 32-byte Secure Hashing Algorithm (SHA) 256 encrypted passwords.
A user-customizable list of words restricted from password use.
All passwords created on old releases will continue to work and will be changed to full SHA-256 encryption when next modified. The restricted passwords feature adds the new column, PasswordRestrictWords, to the table DBC.SysSecDefaults, with the following possible single-character values:
N, n = Do not restrict any words from being contained within a password string. This is the default.
Y, y = Restrict any word (case independent) that is listed in DBC.PasswordRestrictions from being a significant part of a password string.
The parameter PasswordMaxChar in DBC.SysSecDefaults sets the maximum number of characters allowed in a valid password; the default value is 30. The screenshot below shows the table DBC.SysSecDefaults, from which we can see that the PasswordRestrictWords column was missing in TD6.1.
In TD6.1:
In TD12.0:
How to enable this feature: This feature can be enabled in two ways:
System-wide, by setting the new column, PasswordRestrictWords, in the table DBC.SysSecDefaults.
By user profile, using the new CREATE/MODIFY PROFILE syntax clause RESTRICTWORDS = 'Y' | 'N'. The default is 'N'.
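A minimal sketch of the profile-level route; the profile and user names are hypothetical, and we assume RESTRICTWORDS is specified among the PASSWORD attributes of CREATE/MODIFY PROFILE:

-- restrict_prof and etl_user are illustrative names
CREATE PROFILE restrict_prof AS PASSWORD = (RESTRICTWORDS = 'Y');
MODIFY USER etl_user AS PROFILE = restrict_prof;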
SELECT * FROM DBC.PasswordRestrictions WHERE UPPER (RestrictedWords) = UPPER (StrippedPassword)
If there is a match, then the password is rejected.
DBC.PasswordRestrictions: A new system table with a single column, RestrictedWords.
DBC.RestrictedWords: A new view, created via DIP scripts, for access to the system table DBC.PasswordRestrictions. This view will not have PUBLIC access, even for SELECT.
DBC.SysSecDefaults: Contains one new field, PasswordRestrictWords. The default value for this field is N.
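A hedged sketch of the system-wide route using only the dictionary objects listed above; the restricted word is illustrative, direct dictionary updates require appropriate privileges, and a change to DBC.SysSecDefaults may not take effect until the next database restart:

UPDATE DBC.SysSecDefaults SET PasswordRestrictWords = 'Y';
INSERT INTO DBC.PasswordRestrictions (RestrictedWords) VALUES ('TERADATA');
SELECT * FROM DBC.RestrictedWords;  -- view access is not PUBLIC, even for SELECT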
5 Enhancements to Utilities
5.1 Normalized Resusage/DBQL data and Ampusage views for coexistence systems
Teradata Database 12.0 provides normalized CPU time (which was not present in TD6.1) in the Resusage data, DBQL data, and Ampusage view, which provides more accurate performance statistics for mixed-node systems, particularly in the areas of CPU skewing and capacity planning. This feature adds the following fields to the ResUsageSpma table:
CPUIdleNorm: Normalized idle time.
CPUIOWaitNorm: Normalized idle time waiting for I/O completion.
CPUUServNorm: Normalized user service time.
CPUUExecNorm: Normalized user execution time.
NodeNormFactor: Per-node normalization factor.
To compare the columns from the ResUsageSpma table in TD6.1 and TD12.0, please find the documents attached below:
To compare the columns from DBQLogTbl and QRYLOG tables from TD6.1 and TD12.0, please find the documents attached below:
dbqlogtbl_TD61.rtf
dbqlogtblTD12.rtf
qrylogTD61.rtf
qrylogTD12.rtf
This feature adds the following fields to DBC.DBQLStepTbl and the DBC.QRYLOGSTEPS view:
CPUtimeNorm: Normalized AMP CPU time for co-existence systems.
MaxAmpCPUTimeNorm: Normalized maximum CPU time for an AMP.
MaxCPUAmpNumberNorm: Number of the AMP with the maximum normalized CPU time.
MinAmpCPUTimeNorm: Normalized minimum CPU time for an AMP.
To compare the columns from DBQLStepTbl and QRYLOGSteps tables from TD6.1 and TD12.0, please find the documents attached below:
DBQLStepTbl_TD61.rtf
dbqlsteptblTD12.rtf
QRYLogSteps_TD61.rtf
qrylogstepsTD12.rtf
This feature adds the following fields to DBC.DBQLSummaryTbl and the DBC.QRYLOGSUMMARY view:
AMPCPUTimeNorm: Normalized AMP CPU time.
ParserCPUTimeNorm: Normalized CPU time to parse queries.
To compare the columns from the DBQLSummaryTbl and QRYLOGSummary tables from TD6.1 and TD12.0, please find the documents attached below:
ampusage_TD61.rtf
ampusageTD12.rtf
This feature adds the following field to the Acctg table:
CPUNorm: Normalized AMP CPU seconds used by the user and account.
CPUNorm = CPU * Scaling Factor
The scaling factor is gathered from tosgetpma().
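As an illustration, the normalized accounting data lets you compare raw and normalized CPU consumption per user and account; a sketch assuming the DBC.AMPUsage view over the Acctg table exposes a corresponding normalized column (the column name CPUTimeNorm is an assumption):

SEL UserName, AccountName,
    SUM(CPUTime)     AS RawCPU,      -- raw AMP CPU seconds
    SUM(CPUTimeNorm) AS NormCPU      -- normalized for mixed-node systems; name assumed
FROM DBC.AMPUsage
GROUP BY 1, 2
ORDER BY NormCPU DESC;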
5.2 Tdpkgrm: New Option to Remove All Non-current TD Packages

In TD6.1:
In TD12.0:
If there are no non-current packages, then running "tdpkgrm a" gives the message "No Teradata Software non current version available for removal".
5.3 MultiTool: New DIP Option
A new DIP option, DIPPWD (Password Restrictions), has been added to Teradata Database 12.0 as part of the password enhancement feature; it was not available in TD6.1 (screenshots shown below). This feature allows the DBA to create a list of restricted words that are not allowed in new or modified passwords.
In TD6.1:
DIPSQLJ: The SQLJ database and its views are used by the system to manage JAR files that implement Java external stored procedures. The DIP script used to create the SQLJ database and its objects is called DIPSQLJ. This script is run as part of DIPALL.
DIPDEM: Loads tables, stored procedures, and a UDF that enable propagation, backup, and recovery of database extensibility mechanisms (DEMs). DEMs include stored procedures, UDFs, and UDTs that are distributed as packages, which can be used to extend the capabilities of Teradata Database.
Note: DIPSQLJ and DIPDEM were supported from TD6.2.
6 Online Archive

Online archive allows the archival of a running database; that is, a database (or tables within a database) can be archived in conjunction with concurrently executing update transactions for the tables in the database. Transactional consistency is maintained by tracking any changes to a table in a log, such that changes applied to the table during the archive can be rolled back to the transactional consistency point after the restore.
Scenario: How the Online Feature Works
In this scenario we update a column on a table while that table is being backed up, which TD12 allows. A table named b_t was created in the AU database with a large number of records (around 2 million in our scenario). The script to create the AU.b_t table is attached below.
STEP 1: Scripts required for ONLINE backup, update, and restore:
arcTD12data.txt
update_qry.txt
restore.txt
STEP 2: Note the data that you are going to update while taking the backup:
DataBeforeBackup.JPG
STEP 3-A: Start the online backup using the attached script. STEP 3-B: As soon as the archiving of the table has started, run the update script.
arcmain_update.JPG
STEP 4: Wait for the completion of the backup as well as the update. Make sure that the Online Log Info contains the data highlighted in the screenshot below.
OnlineLogInfo.JPG
The three lines displayed indicate that the table was archived online, the consistency point for that table (i.e. when online logging was started), and how many log bytes and rows were archived (indicating the amount of change in the table during the online archive). These lines are also displayed during a restore, copy, or analyze of the archive, as an indication that the archive was done online.
STEP 5: Check the column data after taking the backup in the database; the screenshot follows:
DataAfterBackup.JPG
STEP 6: Delete the data after taking the backup; the screenshot follows:

DeleteDataAfterBackup.JPG
STEP 7: Restore the table using the restore script; the screenshot follows:

Restore_Snapshot.JPG
STEP 8: Check the data after the restore; the screenshot follows:

DataAfterRestore.JPG
6.1.2 Usage
6.1.2.1 DROP and DELETE
Dropping a table on which online archive logging is active is only allowed if the table is not being archived. It is possible to delete a database that already has online archive logging initiated on the database or on some tables in the database. DELETE DATABASE is not possible during the archive.
6.1.2.2 LOGGING ONLINE ARCHIVE ON Statement
The LOGGING ONLINE ARCHIVE ON statement is defined as:
LOGGING ONLINE ARCHIVE ON FOR databasename | databasename.tablename [[, {databasename | databasename.tablename} ] . . .];
To execute the LOGGING ONLINE ARCHIVE ON statement, the user name specified in the logon statement must have one of the following privileges:
The Archive privilege on the database or table that is being logged. The privilege may be granted to a user or an active role for the user.
Ownership of the database or table.
6.1.2.3 LOGGING ONLINE ARCHIVE OFF Statement
The LOGGING ONLINE ARCHIVE OFF statement is defined as:
LOGGING ONLINE ARCHIVE OFF FOR databasename | databasename.tablename [[, {databasename | databasename.tablename} ] . . .][, OVERRIDE];
This command will delete the log subtables associated with that object. The OVERRIDE option allows online archive logging to be stopped by someone other than the user who set it. To execute the LOGGING ONLINE ARCHIVE OFF statement, the user name specified in the logon statement must have one of the following privileges:
The Archive privilege on the database or table that is being logged. The privilege may be granted to a user or an active role for the user.
Ownership of the database or table.
6.1.2.4 Archive Statement
An ARCHIVE statement can start an online archive with the ONLINE option without being initiated first by the LOGGING ONLINE ARCHIVE ON statement. This allows you to start the online archive immediately; in this case, online archive logging is started implicitly: ARCMAIN starts online archive logging on the specified objects before it starts archiving. The ARC statement of ARCHIVE/DUMP has been extended to support the online archive. The SQL DUMP statement doesn't support the new ONLINE option. The online archive feature adds the options ONLINE and KEEP LOGGING to the syntax:
ARCHIVE DATA TABLES . . .
[, ONLINE]
[, KEEP LOGGING]
[, RELEASE LOCK]
[, INDEXES]
[, ABORT]
[, USE [ GROUP ] READ LOCK]
[, NONEMPTY DATABASE[S]]
, FILE = name [, FILE = name];
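Putting the syntax together, a hedged arcmain sketch for the AU.b_t table used in the scenario above; the logon string and archive file name are placeholders:

LOGON tdpid/dbc,dbcpassword;
ARCHIVE DATA TABLES (AU.b_t),
  ONLINE,
  RELEASE LOCK,
  FILE = ARCHIVE1;
LOGOFF;

Because ONLINE is specified, online archive logging is started implicitly; alternatively, it can be started and stopped explicitly with LOGGING ONLINE ARCHIVE ON FOR AU.b_t; beforehand and LOGGING ONLINE ARCHIVE OFF FOR AU.b_t; afterwards.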
6.1.3.1.1 All-Database Backup
all-db_TD12.txt
all-db_TD12_log.txt
6.1.3.1.2 Single-Database Backup
single-db_TD12.txt
single-db_TD12_log.txt
6.1.3.1.3 Single-Database Backup with Exclude
single-exclude_TD12.txt
single-exclude_TD12_log.txt
6.1.3.1.4 Multiple-Database Backup with Exclude
multiple-exlude_TD12.txt
multiple-exlude_TD12_log.txt
6.1.3.2.1 Single-Table Backup

single_table_TD12.txt
single_table_TD12_log.txt
6.1.3.2.2 Multi-Table Backup
multi-table_TD12.txt
multi-table_TD12_log.txt
6.1.3.3 Dictionary Backup
In TD12.0, database-level dictionary backup and table-level dictionary backup are not supported while taking an ONLINE backup.
6.1.3.3.1 Database-Level Dictionary Backup
The backup scripts and the log files are attached below; please have a look at the log file, which tells us that online database-level dictionary backup is not supported.
db_dict_TD12.txt
db_dict_TD12_log.txt
6.1.3.3.2 Table-Level Dictionary Backup
The backup scripts and the log files are attached below. Please have a look at the log file, which tells us that online table-level dictionary backup is not supported.
tbl-dict_TD12.txt
tbl-dict_TD12_log.txt
6.1.3.4 Partition Level Backup
Please find below the document for the partition-level backup:
PartitionBackup.doc
6.1.3.5 Restore
6.1.3.5.1 Table Restore
There is no change in the restore script, but you can see in the log file that the table was archived with ONLINE enabled. Please find the restore script and log below.
restore_t2.txt
restore_t2_log.txt
6.1.3.5.2 Restore of a Dropped Table
rest_drop_tbl.txt
rest_drop_tbl_log.txt
7 Teradata Parallel Transporter (TPT) Enhancements

In TPT 12.0:
UTF16SUPPORT12.txt
Teradata PT 12.0 added a new ArraySupport attribute to the Stream operator that allows the use of the Array Support database feature for DML statements. Array support improves Stream driver performance. By default, this feature is automatically turned on if the database supports it, so no user action is needed to take advantage of it. With the ArraySupport attribute enabled, the Stream operator sends multiple data rows with each execution of a DML statement.
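For explicit control, the attribute can be set in the Stream operator definition; a minimal sketch with hypothetical job variables, assuming the attribute accepts 'On'/'Off' values:

DEFINE OPERATOR STREAM_OPERATOR
TYPE STREAM
SCHEMA *
ATTRIBUTES
(
  VARCHAR TdpId        = @TdpId,        /* job variables are placeholders */
  VARCHAR UserName     = @UserName,
  VARCHAR UserPassword = @UserPassword,
  VARCHAR ArraySupport = 'On'           /* assumed 'On'/'Off' values */
);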
In TPT 8.1:
Attached is the log file, in which we see that the Array support feature does not appear in the attribute definitions:

ArraySupportTD61_ON_log.txt
In TPT 12.0:
Attached is the log file, in which we see that the Array support feature is shown (it was not shown in TPT 8.1):

ArraySupportTD12_ON_Log.txt
In TPT 8.1:
Attached is the log file, in which we see that the Array support feature does not appear in the attribute definitions:

ArraySupportTD61_OFF_log.txt
In TPT 12.0:
Attached is the log file, in which we see that the Array support feature is shown (it was not shown in TPT 8.1):

ArraySupportTD12_OFF_Log.txt
INTERVAL DAY TO HOUR
INTERVAL DAY TO MINUTE
INTERVAL DAY TO SECOND
INTERVAL HOUR TO MINUTE
INTERVAL HOUR TO SECOND
INTERVAL MINUTE TO SECOND
Prior to Teradata PT 12.0, if you wanted to define a column such as MyTimestamp TIMESTAMP(4), you would have to code it as:
MyTimestamp CHAR (24)
This character string was necessary to contain all of the characters of a timestamp with four digits of fractional-second precision. Now, you can simply specify it in the script as:
MyTimestamp TIMESTAMP (4)
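A small illustrative DEFINE SCHEMA fragment; the schema and column names are hypothetical:

DEFINE SCHEMA Trans_Schema
(
  Trans_Id    INTEGER,
  Trans_Stamp TIMESTAMP(4)   /* previously had to be declared as CHAR(24) */
);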
7.1.4 Delimiters
Delimiters are usually used in conjunction with the Data Connector operator. The following attributes imply the use of delimiters in the data file:
VARCHAR FORMAT = Delimited
VARCHAR TextDelimiter = | (this is the default delimiter)
VARCHAR EscapeTextDelimiter = \ (this is the default escape delimiter)
If delimiters are expected to be embedded within delimited data, they must be preceded by the backslash ('\') escape character or an alternative designated escape character. The TextDelimiter attribute is used to specify the delimiter. The EscapeTextDelimiter attribute is used to change the default escape delimiter to something other than the backslash character ('\').
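A hedged sketch of a Data Connector producer definition using these attributes; the file name and schema are placeholders:

DEFINE OPERATOR FILE_READER
TYPE DATACONNECTOR PRODUCER
SCHEMA Trans_Schema
ATTRIBUTES
(
  VARCHAR FileName            = 'trans.txt',   /* placeholder file */
  VARCHAR Format              = 'Delimited',
  VARCHAR TextDelimiter       = '|',
  VARCHAR EscapeTextDelimiter = '\',
  VARCHAR OpenMode            = 'Read'
);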
Example:
In TD12.0:
The attached script has been run with and without the n option:
Load2Stream s.txt
From the screenshots below, we clearly see how the n option continues the job even if an error is encountered.
Location of the checkpoint files:
In UNIX environments, checkpoint files can be found at /usr/tbuild/08.01.00.00/checkpoint, or at $TWB_ROOT/checkpoint if you are on a later version of TPT.
In Windows environments, they can be found at C:\Program Files\NCR\Teradata Parallel Transporter\8.1\checkpoint, or at %TWB_ROOT%\checkpoint if you are on a later version of TPT.
RATE and PERIODICITY values will be used for the Stream operator for this job. Note: These RATE and PERIODICITY values apply to the complete job.
It can be used as:
APPLY <DML> TO OPERATOR (Stream_Oper[2] ATTRIBUTES (OperatorCommandID=rate_step1))
Then we can use the twbcmd utility to assign a RATE and/or PERIODICITY value to a specific Stream operator within a specific job step. When more than one copy of the Stream operator is to be active within a given job step, each can be assigned a separate OperatorCommandID:
APPLY <DML1> TO OPERATOR (Stream_Oper[2] ATTRIBUTES (OperatorCommandID=rate_step1)),
APPLY <DML2> TO OPERATOR (Stream_Oper[2] ATTRIBUTES (OperatorCommandID=rate_step2))
The RATE or PERIODICITY change will apply to all the instances of the Stream operator running within that job step.
The screenshots below show how to change the rate. Step 1: A new attribute has been added to the attached script:
lab7_1.txt
Step 2-b: Run the twbcmd command as soon as the job ID is generated in Step 2-a.
Step 3: After the job executes successfully, check the log using tlogview.
CaseSensitiveTD61.txt
In TPT 8.1:
In TD12.0:
Script attached
CaseSensitiveTD12.txt
7.2 TPT Application Program Interface (API)
The Application Program Interface (API) is a feature of TPT which permits developers to create programs with a direct interface to the load and unload protocols used by the utilities. The following are new features which have been added to the API as a result of Teradata Release 12.
7.2.2 Stream Operator returns Number of Rows Inserted, Updated and Deleted
The TD_Evt_ApplyCount event can now be used with the Stream driver to obtain the number of rows inserted, updated, or deleted for each DML Group statement in the job. Prior to Teradata PT API 12.0, the statistics for the Stream driver were only printed in the log. Now you can obtain these stats via the GetEvent method in your application without having to look at the log. It is often more convenient to make a function call in your application to obtain this information than it is to view or parse a log file.
8 TASM
8.1 Query Banding
A query band is a set of name/value pairs that can be set on a session or transaction to identify the query's originating source, enabling improved workload management and classification.
Transaction Query Band: The query band is set for a transaction using the following syntax (as used in the examples later in this section):
SET QUERY_BAND = 'name1=value1;name2=value2;' FOR TRANSACTION;
8.1.1.1 GetQueryBand
A system user-defined function (UDF) is provided to return the current query band for the session, using the following syntax:
SEL GetQueryBand();
The screenshot below shows that the query band has been successfully set and is active at the session level. (We can activate it at the transaction level by using the FOR TRANSACTION form of the command above.)
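The sequence behind the screenshot can be reproduced with two statements; the name/value pair is the one used in this section's example:

SET QUERY_BAND = 'ExtUserId=CV1;' FOR SESSION;
SEL GetQueryBand();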
8.1.1.2 GetQueryBandValue
A system UDF is also provided to retrieve the value of a specified name in the query band. This can be used to retrieve the name of the end user.
SEL GetQueryBandValue(0,'ExtUserId');
The output below displays the query band value (CV1) associated with the query band name (ExtUserId), as we gave the value CV1 for ExtUserId when setting the query band.
8.1.1.3 GetQueryBandPairs
A system table function returns the name/value pairs in the query band in name and value columns:
SEL QBName (FORMAT 'X(20)'), QBValue (FORMAT 'X(20)') FROM TABLE (GetQueryBandPairs(0)) AS t1 ORDER BY 1;
The output below shows the query band value associated with each query band name set in the query band.
8.1.1.4 MonitorQueryBand
An administrative UDF is provided to retrieve the query band for a specified session. The DBA can use this UDF to track down the originator of a blocking request or one using excessive resources.
Below output shows the query band for the session bearing session number 1207.
Filters
Teradata DWM filter rules allow the administrator to control access to and from specific Teradata Database objects and users. There are two types of filter rules:
Object access filters limit access to all objects associated with the filter during a specified time period. Queries referencing objects associated with a filter during the time the filter applies are rejected.
Query resource filters limit database resource usage for objects associated with the filter. Queries exceeding the resource usage limit (estimated number of rows, estimated processing time, types of joins, table scans) during the time the filter applies are rejected.
Query band name/value pairs can be associated with Teradata DWM filter rules.
The screenshot below shows the additional options (Include QueryBand and Exclude QueryBand).
The screenshots below show, step by step, how to include a query band in the WHO criteria. Select Include QueryBand in the WHO criteria.
Clicking on Choose will pop up the Include QueryBand window, which allows you to load the names and the query band values associated with those names. (The Load Names option will be highlighted, as seen in the screenshot.)
Clicking on the Load Names option will show all QueryBand names.
Selecting a particular QueryBand name will highlight the Load Values option, which will display the values associated with the selected QueryBand name, as shown below:
Selecting the QueryBand value and clicking on the Add option will add the query band to the QueryBand classification criteria.
If a name/value pair in the query band set for the session or transaction matches the query band pair associated with a filter rule, the filter rule will be applied to the request. If we select two name/value sets in the QueryBand classifications, they will be ANDed, as shown below:
In the same way we can use the Exclude QueryBand option in the WHO criteria.
Workload Definitions
Workload Definitions are another component of Teradata DWM. A workload definition (WD) is a type of rule that groups queries for management based on the query's operational properties. The attributes that can be defined for a WD are who (user, account, profile, etc.), what (CPU limits, estimated processing time, row counts, etc.), and where (databases, tables, macros, etc.). The attributes of the WDs are compared to the attributes of each incoming request, and the request is classified into a WD. The WD determines the priority of the request. Query band name/value pairs can be defined as additional who attributes. This enables us to solve the following problems mentioned in the first section:
To set the priority of a request based on the end user when submitted through a connection pool
To assign different priorities to requests from the same application
The following is an example of how to use WDs to accomplish this:
WD Name: Marketing-Online; Priority: Tactical; Classification Criteria: Query Band
WD Name: Marketing-Batch; Priority: Normal; Classification Criteria: Query Band
Note: Query band classification criteria with different names (EXTUserID, EXTGroup, Importance) are ANDed, so that all must be present in the query band to match the WD classification criteria. Query band classification criteria with different values for the same name are ORed.
Given the above WDs:
SET QUERY_BAND='EXTUserId=MG123;EXTGroup=Marketing;Importance=Online;' for session;
SEL * FROM cust_table;
Request will be assigned to WD Marketing-Online and will run in the Tactical Priority.
SET QUERY_BAND='EXTUserId=MG123;EXTGroup=Marketing;Importance=Batch;' for session;
SEL * FROM cust_table;
Request will be assigned to WD Marketing-Batch and will run in the Normal Priority.
Query banding UDFs can be used to extract accounting reports from the DBQL log table:
SEL t1.AMPCPUTime, t1.ParserCPUTime
FROM dbc.dbqlogtbl t1
WHERE GetQueryBandValue(t1.queryband, 0, 'EXTUserId') = 'CV185018'
AND GetQueryBandValue(t1.queryband, 0, 'unitofwork') = 'Fin123'
AND t1.QueryBand IS NOT NULL;
One reason to use both in the same session is to have the transaction query band add additional pairs. For example, using the WD classification criteria in the previous section, if the session query band is set as follows:
SET QUERY_BAND='EXTUserId=MG123;EXTGroup=Marketing;' for session;
Then you would need to set the transaction query band to add the Importance name/value pair to determine which WD to use for the request:
SET QUERY_BAND='Importance=Online;' for transaction;
SEL * FROM cust_table;
A name/value pair in the transaction query band will override the pair in the session query band if the name is the same in both query bands. Using the same example as above, say we want to set the session query band so that the default is Importance=Batch:
SET QUERY_BAND='EXTUserId=MG123;EXTGroup=Marketing;Importance=Batch' for session;
Then, for a request that needs a higher priority, the transaction query band can be used to set the Importance name/value pair to the Online priority:
SET QUERY_BAND='Importance=Online;' for transaction;
SEL * FROM cust_table;
When Teradata DWM searches the query band associated with a request for comparisons with rules and classification criteria, it always searches the transaction query band first and stops when the name in the query band pair matches that of the name in the rule or WD.
8.2 State Matrix
The state matrix is a two-dimensional diagram that can help you visualize how you want the system to behave in different situations. It extends active workload management to automatically detect, notify, and act on planned and unplanned system and enterprise events. TASM then automatically implements a new set of workload management rules specific to the detected events and resulting system condition.
The combination of a SysCon and an OpEnv reference a specific state of the system. Associated with each state are a set of workload management behaviors, such as throttle thresholds, Priority Scheduler weights, and so on. When specific events occur, they can direct a change of SysCon or OpEnv, resulting in a state change and therefore an adjustment of workload management behavior.
Guidelines for Defining the State Matrix: Keep the size of the state matrix and the number of unique states to a minimum. If there are no clear-cut needs for managing with additional states, as may be the case when you first dive into workload management, it is recommended to simply utilize the default state, Base, referenced by the default <Always, Normal> operating environment and system condition pair. As the need for additional states becomes apparent, add them, but keep the total number of states to a minimum, because the state matrix supports gross-level, not granular-level, system management.
Select System Conditions > New SysCon. This will pop up the System Condition window, in which we can give the name of the SysCon and the minimum duration for which the system should remain in the same state.
In the same way, we can add the SysCon Degraded. The screenshot below shows the State Matrix with the System Conditions (Normal, which is the default; Base; and Degraded). (The Operating Environment Base is present by default.)
When a unique system condition is defined, there is an option to associate it with a minimum duration. To see why this matters, consider a system condition of RED associated with degraded health. When an event results in this system condition being activated, the state will be transitioned appropriately. That state may have working values for tighter throttles, more restrictive Priority Scheduler weights, more filters, etc. If, by invoking the state, the system immediately returns to good health (i.e., the events that result in the system condition are no longer valid), the system could conceivably realize another state transition that removes the more restrictive working values. By removing the more restrictive working values, the system could conceivably put itself right back into the RED state, followed by yet another state transition, and so on. A minimum duration can be set for the System Condition; that way, regardless of the associated event status, the system will remain in the same state for at least the minimum duration, giving it a better chance of more fully working itself out of the situations that are putting it into the degraded state. It is recommended that System Conditions that are activated entirely by internal event detections, and not external user-defined event detections, be set to have a minimum duration > 0, perhaps 10 minutes or so.
Click OK and then click on Accept so that the new OpEnv will be added as shown below:
The screenshot below shows the State Matrix with the SysCon and OpEnv defined. (Base is the default state, which can be changed later.)
8.2.4 State
As described above, the combination of a SysCon and an OpEnv references a specific state of the system, with an associated set of workload management behaviors. A state can be defined as shown below: select States and click New State, which will pop up the State window, in which the name of the state can be given.
8.2.5 Events
A System Condition, Operating Environment, or State can change as directed by event directives defined by the DBA. Establishing Event Combinations and Associating Actions to the State Matrix for System Conditions: Once you have defined your state matrix, you will need to define the event combinations that will put you into the particular system conditions or operating environments defined in the state matrix.
First, we need to define an event, as shown below. We create an Event (NodeDown), which will create an Event Combination.
Then we associate a particular action with the Event. As shown below, we select the Event Combination (NodeDown), select Change SysCon, and assign it to Busy.
The screenshot below shows the Event Combination NodeDownAction, defined above, which will change the System Condition to Busy.
Likewise, we create the other events (AWTLimitEvent) and associate the action to be taken with each event, as shown below:
The screenshot below shows the Events which have been created: AWTLimitEvent and NodeDown.
The screenshot below shows the Event Combinations after creating the Events and the associated action to be taken against each event.
8.2.6 Periods:
These are intervals of time during the day, week, or month. TDWM monitors the system time and automatically triggers an event when a period starts; the event lasts until the period ends. The screenshots below show how we can create periods.
Select Periods and then click on New Period, which will open the New Period window, in which we can give the period name.
Uncheck the Everyday and 24 Hours options so that we can select the days of the week and the time, as shown below.
After specifying the days and the time, click Accept so that the period can be saved.
Likewise, we can create other periods too. The screenshot below shows two periods defined (EndOfMonthReporting and LoadWindowPeriod).
Establishing Event Combinations and Associating Actions to the State Matrix for Operating Environments: The screenshot below shows the Operating Environments defined.
Below, we associate a particular action with an Event. In this case, when the Event LoadWindowPeriod is active, it will change the OpEnv to Load Window (9am to 4pm).
Screenshot below shows the Event Combination and what action will be taken against this Event Combination
Likewise we create the other Event Combinations and associate the Actions to be taken.
After defining all the necessary parameters for State Matrix, it will look as shown below:
Guidelines to establish Event Combinations and Associate Actions to the State Matrix: Once you have defined your state matrix, you will need to define the event combinations that will put you into the particular system conditions or operating environments defined in the state matrix. Event combinations are logical combinations of events that you have defined. The current release of Teradata DWM offers the following Event Types:
Components Down Event Types, detected at system startup:
Node Down: Maximum percent of nodes down in a clique.
AMP Fatal: Number of AMPs reported as fatal.
PE Fatal: Number of PEs reported as fatal.
Gateway Fatal: Number of gateways reported as fatal.
AMP Activity Level Event Types. To avoid unnecessary detections, these must persist for a qualified amount of time you specify (default is 180 seconds) on at least the number of AMPs that you also specify:
AWT Limit: Number of AWTs in use for MSGWORKNEW and MSGWORKONE work on an AMP.
Flow Control: Number of AMPs in flow control.
Period Events. These are enabled or disabled depending on the current date/time relative to your defined periods of time. E.g., if daytime is defined as daily 8am-6pm, the daytime period is enabled every day at 8am and disabled every day at 6pm.
User-Defined Events. These are enabled via OpenAPI or PMAPI calls to the database, and are disabled via an expiration timer given by the enable call, or through an explicit disable call.
Here we discuss guidelines for the usage and associated settings of event types meriting additional discussion as well as general event detection considerations.
Guidelines for the Node_Down Event Type: Consider that when a node goes down, its VPROCs migrate, increasing the amount of work required of the nodes that are still up. That translates to performance degradation. When your system is performing in a degraded mode, it is not unusual to want to further throttle back lower-priority requests, reassign Priority Scheduler weights, or enable filters to assure that critical requests can still meet their Service Level Goals. Alternatively, or in addition, you may want to send a notification so that follow-on actions can occur.
The Node_Down Event Type threshold you define is the maximum percent of nodes down in a clique, and is representative of the performance degradation the system will incur. Consider a system configuration with mixed clique types, some with more nodes per clique than others, and some with Hot Standby Nodes (HSN).
Cliques 1 & 2: 8 nodes (5400)
Clique 3: 6 nodes (5400)
Clique 4: 4 nodes (5450)
Clique 5: 2 nodes (5450)
Clique 6: 3 nodes (5500) plus 1 HSN
If a single node were to go down, what is the associated performance degradation? It is roughly synonymous with the maximum percent down in a clique, and depends on which clique bears the down node:
Cliques 1 & 2: 1/8 = 12.5%
Clique 3: 1/6 = 16.7%
Clique 4: 1/4 = 25%
Clique 5: 1/2 = 50%
Clique 6: 0/3 = 0% (because the HSN took the burden of the down node)
In the example above, you probably don't want to take much action, if any, if cliques 1, 2, or 6 were to have a node down; however, if clique 4, or especially clique 5, were to experience a node down, that would be a very serious problem requiring immediate attention to resolve, and drastic workload management controls to assure critical work can still be addressed during the degraded period.
Recommendation: If your system is designed to run with some amount of degradation (for example, many very large systems with hundreds of nodes may be sized expecting that there is always a single node down somewhere in the system), it is suggested to set the threshold such that the Node Down event will activate only when that degradation exceeds what was sized for. For example, if the example system above were sized to meet workload expectations as long as clique 5 did not experience a down node, you might set your Node_Down event to activate at a threshold > 25%; in other words, to activate only if clique 5 experienced a node down. At that time you could change your system condition appropriately. Below that threshold, you could possibly define a second Node_Down event with a lower threshold and an action to Notify only. If you are only interested in sending a notification, you could simply rely on Teradata Manager's alert policy manager to send an alert. However, the Alert Policy Manager cannot notify to a queue. Also, when detecting a node down, the Alert Policy Manager cannot distinguish between the severity of a node going down in a clique with an HSN and a node going down in a 2-node clique. In general, assuming your system is NOT designed to expect nodes down (as is the case with many small to moderate sized systems), a good threshold to set the Down_Nodes Event Type threshold to is roughly 24%.
Guidelines for AMP Activity Level Event Types (AWT Limit and Flow Control):
Consider reserving AWTs for tactical WDs by selecting the expedited option for your tactical AGs. This allows a special reserve pool of AWTs to be set aside. Up to 5 AWTs can be defined into the pool. Expedited message types will not be subject to flow control caused by standard new work and will receive priority over standard new work in the AWT queue.
Consider follow-up correlation analysis to determine what changes, if any, to the affected state's working values should be considered, such as adding or lowering workload throttles and object and/or utility throttles on lower-priority requests across all appropriate states.
Consider reserving AWTs for tactical WDs by selecting the expedited option for your tactical AGs. Further, consider a state change with the unique working values described above as a legitimate option when flow control is persistently detected. This is due to the potential seriousness associated with the loss of priority control, and it is appropriate to act automatically to resolve the situation ASAP.
As an example, consider that a single Teradata system may be part of an enterprise of systems that may include multiple Teradata systems cooperating in a dual-active role, various application servers, and source systems. When one of these other systems in the enterprise is degraded or down, it may in turn affect the anticipated demand on the Teradata system. An external application can convey this information by means of a well-known user-defined event via open APIs to Teradata. Teradata can then act automatically, for example by changing the system condition and therefore the state, and employ different workload management directives appropriate to the situation. The situations tend to boil down to either an increase or decrease in user demand. Via the state matrix directives, you may choose to disable filters and raise throttles of lower-priority work in times of anticipated lower user demand, and do the opposite in times of anticipated higher user demand.
2> To convey business-oriented events
Many businesses have events that impact the way a Teradata system should manage its workloads. For example, there are business calendars, where daily, weekly, monthly, quarterly, or annual information processing increases or changes the demand put on the Teradata system. While period event types provide alignment of a fixed period of time to some of these business events, user-defined events provide the opportunity to de-couple the events from fixed windows of time that often do not align accurately to the actual business event timing.
For example, through the use of a period event defined as 6PM till 6AM daily, you could define an event combination that changes the Operating Environment to LoadWindow when the clock ticks 6PM. However, the actual source data required to begin the load might be delayed, and therefore the actual load may not begin for several hours. Also, it is typical to define the period event to encompass far more hours than the actual business situation will require, just to compensate for these frequently experienced delays. Even then, sometimes the delays are so severe that the period transpires while the load is still executing, leading to workload management issues. If, instead of using a period event, you define a user-defined event called Loading, the load application can activate the event via an OpenAPI call prior to the load commencing and deactivate it upon completion. The end result is that workload management is accurately adjusted for the complete duration of the actual load processing, and not shorter or longer than that duration. Note that period events are not capable of operating on a business calendar that, for example, includes holidays, end-of-quarter dates, etc. However, these can be conveyed to the Teradata system through user-defined events.
3> To enhance workload management capabilities through an external application
3) To enhance workload management capabilities through an external application
The current version of TDWM provides many opportunities to automate based on system-wide events, and we anticipate that subsequent releases of TDWM will continue to enhance those capabilities through the addition of new event types. Until those new event types are available, however, an external application can use PM/API and open API commands, or other means, to monitor the Teradata system for key situations that are useful to act on. Examples of key situations an external application might monitor include a persistent miss of a critical WD's SLG (such as a tactical workload or a heartbeat monitoring workload), persistent high or low CPU usage, arrival rate surges, and the throttle queue depth associated with a workload. The external application could also perform more complex correlated analysis on the situations observed to derive more specific knowledge. Once detected by the external application, the event can be conveyed to Teradata in the form of a user-defined event that can be included in an event combination with actions, for example, to change the System Condition and therefore the State of the system. (Using an action type of notification generally adds limited value here, because the external application could have provided that notification directly without involving TDWM. The real value is in automatically invoking a more appropriate state associated with the detected event.)
8.3 Global Exceptions
Click the New button to add a new Global Exception named BadQuery with an optional description, and then click OK. Screenshot below:
Define one of the available Exception Criteria and the Exception Actions to take effect. Screenshot below:
After defining the Exception Criteria and Exception Actions, select Apply to apply operating environments to your new exception directive. The Exception Apply dialog box displays the operating environments you defined. Screenshot below:
Select the WDs to which you want each operating environment to apply. You can select one or several WDs, or ALL WDs. Then select OK. Screenshot below:
Select Overview to view the operating environments, workloads, and exceptions you applied to the exception directive. Screenshot below:
Create another exception named BadIO with its own Exception Criteria and Exception Actions. Screenshot below:
Teradata DWM follows these guidelines to resolve conflicting exception actions when necessary: local exception actions take precedence over global exception actions, and Teradata DWM orders local and global exception actions according to their defined precedence. This resolves situations similar to the following case:
o if Maximum Rows > 100, Change Workload to WD-M
o if Sum Over All Nodes > 200, Change Workload to WD-N
If Maximum Rows and Sum Over All Nodes both exceed their limits at the same time, the defined precedence determines the WD to which Teradata DWM changes.
o If you did not specify Abort and Log or Abort on Select and Log, and you specified multiple global Change Workload exception actions, the global Change Workload exception action with the highest precedence occurs. Teradata DWM logs all other Change Workload exception actions as overridden.
o If you did not specify Abort and Log or Abort on Select and Log, and you specified multiple local Change Workload exception actions, the local Change Workload exception action with the highest precedence occurs. Teradata DWM logs all other Change Workload exception actions as overridden.
o If you did not specify Abort and Log or Abort on Select and Log, and you specified multiple global and local Change Workload exception actions, the local Change Workload exception action with the highest precedence occurs, since local exception actions take precedence over global exception actions. Teradata DWM logs all other Change Workload exception actions as overridden.
o Aborts take precedence over any Change Workload exception actions. If you specified Abort and Log or Abort on Select and Log, and you specified multiple global and local Change Workload exception actions, Teradata DWM aborts the query and logs all Change Workload exception actions as overridden.
8.4 Utility Management
The Utility Management feature ensures that utilities do not impact higher priority system work; utilities can be controlled when the system state changes, or prioritized when deemed necessary (for instance, during a batch window). Utility Management also helps in capacity planning and system utilization reporting by enabling better management of mixed workloads, allowing critical work to complete.
In TD6.1:
A Load Utility rule rejects a job outright when the load concurrency limit is reached; there is no option to delay it.
In TD12.0:
TD12.0 extends utility management from load and export control to include backup and recovery jobs as well: an Archive/Restore utility type has been added to Utility Throttles. A Delay option is also provided for Utility Throttles, so jobs exceeding the threshold are queued instead of rejected outright.
9 Usability Features
9.1 Complex Error Handling
Teradata Database 12.0 provides complex error handling capabilities during bulk SQL insert operations (MERGE-INTO or INSERT-SELECT) through the use of new SQL-based error tables. Errors such as duplicate rows, CHECK constraint violations, and LOB data truncations arising from a bulk insert operation are logged in an error table while the bulk insert operation continues to run instead of aborting. This feature increases flexibility in developing load strategies by allowing SQL to be used for batch updates that contain errors. It also provides error reporting similar to the current load utilities while overcoming their restrictions on having Unique Secondary Indexes (USIs), join or hash indexes, Referential Indexes (RIs), and triggers resident on target tables.
In TD 6.1:
Teradata Database 6.1 does not provide any error table support for SQL insert operations (MERGE-INTO or INSERT-SELECT).
In TD 12.0:
The following scripts create the table structure:
Scripts.txt
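The attached Scripts.txt is not reproduced here; for the examples that follow, a minimal assumed shape for the source and target tables might look like the sketch below (the UNIQUE INDEX on t4 is what makes USI violations possible):

-- Illustrative assumption only; the actual definitions are in Scripts.txt.
CREATE TABLE test.t3 (a1 INTEGER, b1 INTEGER, c1 INTEGER)
PRIMARY INDEX (a1);

CREATE TABLE test.t4 (a1 INTEGER, b1 INTEGER, c1 INTEGER)
PRIMARY INDEX (a1)
UNIQUE INDEX (b1);  -- USI: rows violating it are logged rather than aborting the insert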
Creating Error Tables:
Before specifying LOGGING ERRORS for a request, an error table for the target table must first be created:
CREATE ERROR TABLE [[<database>.]<error table>] FOR <data table>;
Example:
CREATE ERROR TABLE et1 FOR test.t4;
Insert data into t4 with the following statement:
INSERT t4 SELECT * FROM t3 LOGGING ERRORS;
Now check the USI violation rows:
SELECT a1, b1, c1, ETC_DBQL_QID, ETC_DMLType, ETC_ErrorCode, ETC_ErrSeq,
ETC_IndexNumber, ETC_IdxErrType, ETC_RowId, ETC_TableId, ETC_FieldId,
ETC_RITableId, ETC_RIFieldId, ETC_TimeStamp
FROM et1;
If the query ID was not saved or captured, it may be extracted from DBC.DBQLogTbl if DBQL is enabled:
SELECT querytext, starttime, queryid (FORMAT '-z(17)9')
FROM dbc.dbqlogtbl WHERE username = 'myusername' ORDER BY 1;
If the query ID is not available because the query output was not saved and DBQL is disabled, the ETC_TimeStamp value in the error table may be used to associate error rows with the approximate times of different loads.
Dropping Error Tables:
An error table must be dropped before its data table can be dropped. This restriction prevents orphaned error tables from being left behind in the system. To drop an error table, use either of the following statements:
DROP ERROR TABLE FOR <data table>;
DROP TABLE <error table>;
Example:
DROP ERROR TABLE FOR test.t1;
SHOW and HELP:
Error table structure and column information may be displayed with the following requests, respectively:
SHOW ERROR TABLE FOR <data table>;
SHOW TABLE <error table>;
HELP ERROR TABLE FOR <data table>;
HELP TABLE <error table>;
Example:
HELP DATABASE test;
HELP TABLE et1;
Error Table System Views:
Information on data tables and their error tables may be retrieved by querying two new system views: DBC.ErrorTblsV (all data and error tables) and DBC.ErrorTablesVX (data and error tables accessible to the requesting user).
SELECT * FROM DBC.ErrorTblsV;
9.2 Multi-Level Partitioned Primary Index (MLPPI)
A single-level PPI:
PARTITION BY RANGE_N(claim_date BETWEEN DATE '2000-01-01' AND DATE '2000-12-31'
EACH INTERVAL '1' MONTH);
Successful Message:
In TD6.1:
Create Table script (MPPI):
CREATE TABLE claims
(claim_id INTEGER NOT NULL,
claim_date DATE NOT NULL,
state_id INTEGER NOT NULL,
claim_info VARCHAR(200) NOT NULL)
PRIMARY INDEX (claim_id)
PARTITION BY (RANGE_N(claim_date BETWEEN DATE '2000-01-01' AND DATE '2000-12-31'
EACH INTERVAL '1' MONTH),
RANGE_N(state_id BETWEEN 1 AND 10 EACH 1))
UNIQUE INDEX (claim_id);
In TD12.0:
Success Message:
Create Table script (MPPI):
CREATE TABLE claims
(claim_id INTEGER NOT NULL,
claim_date DATE NOT NULL,
state_id INTEGER NOT NULL,
claim_info VARCHAR(200) NOT NULL)
PRIMARY INDEX (claim_id)
PARTITION BY (RANGE_N(claim_date BETWEEN DATE '2000-01-01' AND DATE '2000-12-31'
EACH INTERVAL '1' MONTH),
RANGE_N(state_id BETWEEN 1 AND 10 EACH 1))
UNIQUE INDEX (claim_id);
Insert.txt
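The attached Insert.txt is not reproduced here; the rows below are illustrative assumptions only, showing the kind of data the EXPLAIN examples that follow would operate on:

-- Illustrative sample data only; the actual rows are in Insert.txt.
INSERT INTO claims VALUES (1, DATE '2000-01-15', 1, 'windshield replacement');
INSERT INTO claims VALUES (2, DATE '2000-03-02', 5, 'hail damage');
INSERT INTO claims VALUES (3, DATE '2000-07-21', 10, 'collision');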
Explain plan:
EXPLAIN SELECT * FROM claims WHERE claim_date BETWEEN '2000/01/01' AND '2000/12/30';
(12 months * 10 StateIds = 120 Partitions)
Explanation
1) First, we lock a distinct TD12."pseudo table" for read on a RowHash to prevent global deadlock for TD12.claims.
2) Next, we lock TD12.claims for read.
3) We do an all-AMPs RETRIEVE step from 120 partitions of TD12.claims with a condition of ("(TD12.claims.claim_date <= DATE '2000-12-30') AND (TD12.claims.claim_date >= DATE '2000-01-01')") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 1 row (101 bytes). The estimated time for this step is 0.03 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.03 seconds.
EXPLAIN SELECT * FROM claims WHERE state_id BETWEEN 1 AND 2;
(36 months * 2 StateIds = 72 Partitions)
Explanation
1) First, we lock a distinct TD12."pseudo table" for read on a RowHash to prevent global deadlock for TD12.claims.
2) Next, we lock TD12.claims for read.
3) We do an all-AMPs RETRIEVE step from 72 partitions of TD12.claims with a condition of ("(TD12.claims.state_id <= 2) AND (TD12.claims.state_id >= 1)") into Spool 1 (group_amps), which is built locally on the AMPs. Then we do a SORT to order Spool 1 by the sort key in spool field1 (TD12.claims.claim_date). The size of Spool 1 is estimated with no confidence to be 1 row (101 bytes). The estimated time for this step is 0.03 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.03 seconds.
EXPLAIN SELECT * FROM claims WHERE state_id = 1 AND claim_date BETWEEN '2000/01/01' AND '2000/12/30' ORDER BY 2;
(12 months * 1 StateId = 12 Partitions)
Explanation
1) First, we lock a distinct TD12."pseudo table" for read on a RowHash to prevent global deadlock for TD12.claims.
2) Next, we lock TD12.claims for read.
3) We do an all-AMPs RETRIEVE step from 12 partitions of TD12.claims with a condition of ("(TD12.claims.state_id = 1) AND ((TD12.claims.claim_date >= DATE '2000-01-01') AND (TD12.claims.claim_date <= DATE '2000-12-30'))") into Spool 1 (group_amps), which is built locally on the AMPs. Then we do a SORT to order Spool 1 by the sort key in spool field1 (TD12.claims.claim_date). The size of Spool 1 is estimated with no confidence to be 1 row (101 bytes). The estimated time for this step is 0.03 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.03 seconds.
EXPLAIN SELECT * FROM claims WHERE state_id = 1 AND claim_date BETWEEN '2000/01/01' AND '2000/01/20';
(1 month * 1 StateId = 1 Partition)
Explanation
1) First, we lock a distinct USER01."pseudo table" for read on a RowHash to prevent global deadlock for USER01.claims.
2) Next, we lock USER01.claims for read.
3) We do an all-AMPs RETRIEVE step from a single partition of USER01.claims with a condition of ("(USER01.claims.state_id = 1) AND ((USER01.claims.claim_date >= DATE '2000-01-01') AND (USER01.claims.claim_date <= DATE '2000-01-20'))") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 1 row (101 bytes). The estimated time for this step is 0.03 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.03 seconds.
Advantage: Using MLPPI on a table gives the Teradata Optimizer a greater opportunity for partition elimination at a more granular level, which in turn yields better query performance.
MLPPI vs. NPPI Join Scenarios:
Case Study 1: Join on MLPPI Top Level Partition Column with NPPI Column
The Optimizer used the sliding-window merge join technique when joining the MLPPI partition column with the NPPI column.
EXPLAIN SELECT * FROM Claim_MLPPI A, Claim_NPPI B
WHERE A.Claim_ID = B.Claim_ID AND A.State_ID = B.State_ID;
(The Optimizer is using the SLIDING-WINDOW merge join technique.)
Explanation
1) First, we lock a distinct AU."pseudo table" for read on a RowHash to prevent global deadlock for AU.b.
2) Next, we lock a distinct AU."pseudo table" for read on a RowHash to prevent global deadlock for AU.a.
3) We lock AU.b for read, and we lock AU.a for read.
4) We do an all-AMPs JOIN step from AU.b by way of a RowHash match scan with no residual conditions, which is joined to AU.a by way of a RowHash match scan with no residual conditions. AU.b and AU.a are joined using a sliding-window merge join (contexts = 1, 7), with a join condition of ("(AU.a.claim_id = AU.b.claim_id) AND (AU.a.state_id = AU.b.state_id)"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 2 rows (362 bytes). The estimated time for this step is 0.05 seconds.
5) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.05 seconds.
Case Study 2: Join on MLPPI Low Level Partition Column with NPPI Column
The Optimizer used an ordinary merge join when joining the MLPPI low level partition column with the NPPI table column.
(AU.A.state_id) to all AMPs. Then we do a SORT to order Spool 3 by row hash. The size of Spool 3 is estimated with low confidence to be 2 rows (186 bytes). The estimated time for this step is 0.01 seconds.
5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a RowHash match scan, which is joined to Spool 3 (Last Use) by way of a RowHash match scan. Spool 2 and Spool 3 are joined using a merge join, with a join condition of ("state_id = state_id"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 3 rows (543 bytes). The estimated time for this step is 0.06 seconds.
6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.07 seconds.
MLPPI vs. MLPPI Join Scenarios:
Case Study 1: Join on MLPPI Primary Index and Partition Columns with another MLPPI
EXPLAIN SELECT * FROM Claim_MLPPI A, Claim_MLPPI_2 B
WHERE A.Claim_ID = B.Claim_ID AND A.State_ID = B.State_ID;
(The Optimizer is using the SLIDING-WINDOW merge join technique.)
Explanation
1) First, we lock a distinct AU."pseudo table" for read on a RowHash to prevent global deadlock for AU.b.
2) Next, we lock a distinct AU."pseudo table" for read on a RowHash to prevent global deadlock for AU.a.
3) We lock AU.b for read, and we lock AU.a for read.
4) We do an all-AMPs RETRIEVE step from AU.b by way of an all-rows scan with no residual conditions into Spool 2 (all_amps), which is built locally on the AMPs. Then we do a SORT to order Spool 2 by the hash code of (AU.b.claim_id). The size of Spool 2 is estimated with low confidence to be 806 rows (74,958 bytes). The estimated time for this step is 0.01 seconds.
5) We do an all-AMPs JOIN step from AU.a by way of a RowHash match scan with no residual conditions, which is joined to Spool 2 (Last Use) by way of a RowHash match scan. AU.a and Spool 2 are joined using a sliding-window merge join (contexts = 7, 1), with a join condition of ("(AU.a.claim_id = claim_id) AND (AU.a.state_id = state_id)"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 3 rows (543 bytes). The estimated time for this step is 0.06 seconds.
6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.07 seconds.
Case Study 2: Join on MLPPI Low Level Partition Columns
The Optimizer used a merge join when joining an MLPPI low level partition column with another MLPPI low level partition column. Note: only low level partition columns are used in the join condition.
(The Optimizer is using a normal merge join.)
Explanation
1) First, we lock a distinct AU."pseudo table" for read on a RowHash to prevent global deadlock for AU.B.
2) Next, we lock a distinct AU."pseudo table" for read on a RowHash to prevent global deadlock for AU.A.
3) We lock AU.B for read, and we lock AU.A for read.
4) We execute the following steps in parallel.
  1) We do an all-AMPs RETRIEVE step from AU.B by way of an all-rows scan with no residual conditions into Spool 2 (all_amps), which is built locally on the AMPs. Then we do a SORT to order Spool 2 by the hash code of (AU.B.state_id). The size of Spool 2 is estimated with low confidence to be 806 rows (74,958 bytes). The estimated time for this step is 0.01 seconds.
  2) We do an all-AMPs RETRIEVE step from AU.A by way of an all-rows scan with no residual conditions into Spool 3 (all_amps), which is duplicated on all AMPs. Then we do a SORT to order Spool 3 by the hash code of (AU.A.state_id). The size of Spool 3 is estimated with low confidence to be 4 rows (372 bytes). The estimated time for this step is 0.01 seconds.
5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a RowHash match scan, which is joined to Spool 3 (Last Use) by way of a RowHash match scan. Spool 2 and Spool 3 are joined using a merge join, with a join condition of ("state_id = state_id"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 57 rows (10,317 bytes). The estimated time for this step is 0.06 seconds.
6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.07 seconds.
9.3 Schmon Enhancements
In TD6.1:
The delay modifier option is not supported in TD6.1 (shown in the screenshot below).
In TD12.0:
The command works as shown in the screenshot below.
Note: In the above screenshot, the command should execute 5 times with a delay of 1 second, but we see that it takes the delay as 5 seconds, because 5 seconds is the minimum delay that can be specified. The following command does not repeat 2 times with a delay of 5 seconds, because it is not preceded by one or more of the id, all, -S, -T, or -P options. Instead it repeats schmon -s 5 indefinitely with a delay of 5 seconds (because the minimum delay allowed is 5 seconds). Here the 5 is not interpreted as the delay option, but rather as the <id> option. Therefore, the following command outputs data related to session id 5, repeats forever, and delays 5 seconds between repetitions. A warning is displayed to tell the user that an invalid interval was entered.
schmon -s 5 2
2:
In TD6.1:
In the above screenshot, we see that TD6.1 gives the message "Invalid Set Division type".
In TD12.0:
9.4 EXPLAIN Enhancements
Teradata Database 12.0 adds information to the EXPLAIN output, including cost estimates, spool size estimates, view names, and actual column names for hashing, sorting, or grouping columns. These enhancements improve readability and understanding, aid in debugging complex queries, and help identify intermediate result spool skew.
1) Adding Spool Size Estimates: Previously, most steps that generate a spool had an estimate of the number of rows the spool contains, but not its size in bytes. The spool size in bytes is now printed alongside the number of rows.
In TD 6.1:
Explain select * from retail.contract;
Explanation
1) First, we lock a distinct retail."pseudo table" for read on a RowHash to prevent global deadlock for retail.contract.
2) Next, we lock retail.contract for read.
3) We do an all-AMPs RETRIEVE step from retail.contract by way of an all-rows scan with no residual conditions into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 15,000 rows. The estimated time for this step is 0.23 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.23 seconds.
In TD 12.0:
Explain select * from retail.contract; Explanation 1) First, we lock a distinct retail."pseudo table" for read on a RowHash to prevent global deadlock for retail.contract. 2) Next, we lock retail.contract for read.
3) We do an all-AMPs RETRIEVE step from retail.contract by way of an all-rows scan with no residual conditions into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 15,000 rows (1,320,000 bytes). The estimated time for this step is 0.24 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.24 seconds.
2) View Names:
View names are now printed along with table names once in every step. If a view is in a different database from the table, the database name is also printed.
Create view emp_v as Select * from employee;
In TD 6.1:
Explain select * from emp_v; Explanation 1) First, we lock a distinct CUSTOMER_SERVICE."pseudo table" for read on a RowHash to prevent global deadlock for CUSTOMER_SERVICE.employee. 2) Next, we lock CUSTOMER_SERVICE.employee for read.
3) We do an all-AMPs RETRIEVE step from CUSTOMER_SERVICE.employee by way of an all-rows scan with no residual conditions into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 26 rows. The estimated time for this step is 0.03 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.03 seconds.
In TD 12.0:
Explain select * from emp_v; Explanation 1) First, we lock a distinct CUSTOMER_SERVICE."pseudo table" for read on a RowHash to prevent global deadlock for CUSTOMER_SERVICE.employee. 2) Next, we lock CUSTOMER_SERVICE.employee in view emp_v for read.
3) We do an all-AMPs RETRIEVE step from CUSTOMER_SERVICE.employee in view emp_v by way of an all-rows scan with no residual conditions into Spool 2 (group_amps), which is built locally on the AMPs.
The size of Spool 2 is estimated with low confidence to be 24 rows (2,040 bytes). The estimated time for this step is 0.03 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 2 are sent back to the user as the result of statement 1. The total estimated time is 0.03 seconds.
3) Hashing/Sorting/Grouping columns
For grouping, hashing, and sorting columns, the sources are traced back and the original base table fields are printed.
In TD 6.1:
Explain Select TRANS_number, sum(TRANS_AMOUNT) from temp_trans Where TRANS_number < 1 Group by TRANS_number Order by 1;
Explanation
1) First, we lock a distinct USER01."pseudo table" for read on a RowHash to prevent global deadlock for USER01.temp_trans.
2) Next, we lock temporary table USER01.temp_trans for read.
3) We do an all-AMPs SUM step to aggregate from temporary table USER01.temp_trans by way of an all-rows scan with a condition of ("USER01.temp_trans.TRANS_NUMBER < 1"), and the grouping identifier in field 1025. Aggregate Intermediate Results are computed locally, then placed in Spool 3. The input table will not be cached in memory, but it is eligible for synchronized scanning. The size of Spool 3 is estimated with no confidence to be 11,090 rows. The estimated time for this step is 0.69 seconds.
4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. Then we do a SORT to order Spool 1 by the sort key in spool field1. The size of Spool 1 is estimated with no confidence to be 11,090 rows. The estimated time for this step is 0.09 seconds.
5) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.78 seconds.
In TD 12.0:
Explain Select TRANS_number, sum(TRANS_AMOUNT) from temp_trans Where TRANS_number < 1 Group by TRANS_number Order by 1;
Explanation
1) First, we lock a distinct CUSTOMER_SERVICE."pseudo table" for read on a RowHash to prevent global deadlock for CUSTOMER_SERVICE.temp_trans.
2) Next, we lock temporary table CUSTOMER_SERVICE.temp_trans for read.
3) We do an all-AMPs SUM step to aggregate from temporary table CUSTOMER_SERVICE.temp_trans by way of an all-rows scan with a condition of ("CUSTOMER_SERVICE.temp_trans.TRANS_NUMBER < 1") grouping by field1 (CUSTOMER_SERVICE.temp_trans.TRANS_NUMBER). Aggregate Intermediate Results are computed locally, then placed in Spool 3. The size of Spool 3 is estimated with no confidence to be 1 row (29 bytes). The estimated time for this step is 0.03 seconds.
4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. Then we do a SORT to order Spool 1 by the sort key in spool field1 (CUSTOMER_SERVICE.temp_trans.TRANS_NUMBER). The size of Spool 1 is estimated with no confidence to be 1 row (33 bytes). The estimated time for this step is 0.04 seconds.
5) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.07 seconds.
4) Cost Estimates:
In TD 6.1:
Explain Insert into emp1 Select * from employee;
Explanation
1) First, we lock a distinct USER01."pseudo table" for write on a RowHash to prevent global deadlock for USER01.emp1.
2) Next, we lock a distinct USER01."pseudo table" for read on a RowHash to prevent global deadlock for USER01.employee.
3) We lock USER01.emp1 for write, and we lock USER01.employee for read.
4) We do an all-AMPs MERGE into USER01.emp1 from USER01.employee.
5) We spoil the parser's dictionary cache for the table.
6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> No rows are returned to the user as the result of statement 1.
In TD 12.0:
Explain Insert into emp1 Select * from employee;
Explanation
1) First, we lock a distinct TEST."pseudo table" for write on a RowHash to prevent global deadlock for TEST.emp1.
2) Next, we lock a distinct user01."pseudo table" for read on a RowHash to prevent global deadlock for user01.employee.
3) We lock TEST.emp1 for write, and we lock user01.employee for read.
4) We do an all-AMPs MERGE into TEST.emp1 from user01.employee. The size is estimated with no confidence to be 28 rows. The estimated time for this step is 1.92 seconds.
5) We spoil the parser's dictionary cache for the table.
6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
9.5 Join Index Identification
Prior to Teradata Database 12.0, it was not possible to use SQL queries to determine the base tables a join index covers. A new column, JoinIndexTableID, has been added to DBC.Indexes in TD12.0 (it is not present in TD6.1; compare the screenshots shown below). Use this feature when you want to determine the tables a join or hash index covers.
In TD6.1:
In TD12.0:
Note: The highlighted column name is the new column in TD12.0. The DBC.Indexes table contains a row for each column that is part of an index. When the IndexType is 'J' (join index) or 'N' (hash index) and the index was created in Teradata Database 12.0, the new column contains the Table ID of the join index. If the IndexType is not 'J' or 'N', or the join index was created prior to Teradata Database 12.0, the JoinIndexTableID is NULL.
The screenshot below shows the JoinIndexTableID for the join index created:
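Beyond the screenshot, a simple way to see the new column is to query DBC.Indexes directly. A minimal sketch, assuming SELECT access on DBC.Indexes (only TableId, IndexType, and JoinIndexTableID are shown; other columns vary by release):

-- List index rows for join ('J') and hash ('N') indexes together with
-- the table ID of the covering join index (NULL for pre-12.0 indexes).
SELECT TableId, IndexType, JoinIndexTableID
FROM DBC.Indexes
WHERE IndexType IN ('J', 'N');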
9.6 Global Temporary Table Identification
Prior to Teradata Database 12.0, to get a list of all global temporary tables you would have to get a list of all the databases and then execute HELP DATABASE for each one. To provide an efficient way to obtain a list of all global temporary tables, both the CommitOpt and TransLog columns from DBC.TVM are now included in DBC.Tables, DBC.TablesX, DBC.TablesV, and DBC.TablesVX.
A CommitOpt value of D or P identifies a global temporary table ('D' for ON COMMIT DELETE ROWS, 'P' for ON COMMIT PRESERVE ROWS). The TransLog column indicates the transaction logging option for a global temporary table (Y for LOG and N for NO LOG).
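With these columns exposed, a single query now replaces the per-database HELP DATABASE loop. A minimal sketch:

-- List all global temporary tables system-wide, with their logging option.
-- CommitOpt: 'D' = ON COMMIT DELETE ROWS, 'P' = ON COMMIT PRESERVE ROWS.
SELECT DatabaseName, TableName, CommitOpt, TransLog
FROM DBC.Tables
WHERE CommitOpt IN ('D', 'P')
ORDER BY DatabaseName, TableName;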
Please find the attached documents, which show the addition of the two columns in DBC.Tables, DBC.TablesX, DBC.TablesV, and DBC.TablesVX in TD12.0 (they are not present in TD6.1):
GTT_TD61.doc
GTT_TD12.doc
Note: The highlighted columns are the ones added in TD12.0 to the above-mentioned views.
9.7 ANSI Merge
Apart from SELECT requests, typical SQL DML operations involve INSERTs, UPDATEs, and DELETEs. ANSI devised a new SQL statement, MERGE INTO, as part of the SQL-2003 standard. MERGE INTO can perform UPDATEs and INSERTs together in a single statement. A rudimentary form of MERGE INTO was implemented in the Teradata Database V2R5.0 release. As part of the 12.0 release, the MERGE INTO statement has been enhanced to remove some of the restrictions imposed by the V2R5.0 implementation. This section describes the enhancements made to the MERGE INTO statement, along with the restrictions, performance implications, and considerations that apply when using the enhanced MERGE INTO statement with complex error handling. The Teradata 12.0 MERGE INTO statement offers the following enhancements over the Teradata Database V2R5.0 version:
1) It allows multiple source rows to be merged into the target table, unlike the V2R5.0 MERGE statement, which enforced a restriction that the source table could not have more than one row. Consequently, if the source is a single table, it is no longer necessary for it to have a UPI or USI defined on it to be used in a MERGE INTO statement.
2) It provides complex error handling support for the MERGE INTO statement (see the sketch below).
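As an illustration of both enhancements, the following minimal sketch performs an upsert from t2 into t1 and logs errors instead of aborting. It assumes the t1 and t2 definitions from the example below, and the error table prerequisite described in section 9.1:

-- Error table must exist before LOGGING ERRORS can be specified.
CREATE ERROR TABLE FOR t1;
-- Upsert: matched rows are updated, unmatched source rows are inserted;
-- rows that raise errors land in the error table instead of aborting.
MERGE INTO t1
USING t2
ON a1 = a2
WHEN MATCHED THEN UPDATE SET b1 = b2, c1 = c2
WHEN NOT MATCHED THEN INSERT (a2, b2, c2)
LOGGING ERRORS;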
1) Using ANSI MERGE Instead of INSERT-SELECT:
An ANSI MERGE request can substitute for an equivalent INSERT-SELECT operation. An INSERT-SELECT operation may perform slightly better than the ANSI MERGE SQL, but using ANSI MERGE might be advantageous if your system is running with restrictions on spool space. This is because the source table might not be spooled for an ANSI MERGE request if it is a single table without a WHERE clause, and the ON clause has an equality constraint between the target table primary index and the source table primary index. For example:
>create table t1 (a1 int, b1 int, c1 int) primary index (a1);
>create table t2 (a2 int, b2 int, c2 int) primary index (a2);
>explain ins into t1 sel a2, b2 + 1, c2 from t2;
Explanation
---------------------------------------------------------------------------
1) First, we lock a distinct USER01."pseudo table" for read on a RowHash to prevent global deadlock for USER01.t2.
2) Next, we lock a distinct USER01."pseudo table" for write on a RowHash to prevent global deadlock for USER01.t1.
3) We lock USER01.t2 for read, and we lock USER01.t1 for write.
4) We do an all-AMPs RETRIEVE step from USER01.t2 by way of an all-rows scan with no residual conditions into Spool 1 (all_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 22 rows (550 bytes). The estimated time for this step is 0.01 seconds.
5) We do an all-AMPs MERGE into USER01.t1 from Spool 1 (Last Use). The size is estimated with low confidence to be 22 rows. The estimated time for this step is 0.23 seconds.
6) We spoil the parser's dictionary cache for the table.
7) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
As you can see, step 4 spools the source table. The same INSERT-SELECT request can be recoded as an ANSI MERGE INSERT request as shown below:
>explain merge into t1
Using (select a2, b2 + 1, c2 from t2) source (a2, b2, c2)
On a1=a2 and 1=0
When not matched then INS (a2, b2, c2);
Explanation
---------------------------------------------------------------------------
1) First, we lock a distinct USER01."pseudo table" for read on a RowHash to prevent global deadlock for USER01.t2.
2) Next, we lock a distinct USER01."pseudo table" for write on a RowHash to prevent global deadlock for USER01.t1.
3) We lock USER01.t2 for read, and we lock USER01.t1 for write.
4) We do an all-AMPs merge with unmatched inserts into USER01.t1 from USER01.t2 with a condition of ("(1=0)"). The number of rows merged is estimated with low confidence to be 22 rows.
5) We spoil the parser's dictionary cache for the table.
6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> No rows are returned to the user as the result of statement 1.
2) Using MERGE to Implement a Conditional INSERT-SELECT:
Situations can occur where you need to insert some rows into a target table based on a condition between source table and target table columns. To do this using an INSERT-SELECT statement, the request would have to be coded in such a way that it determines which rows qualify for insert, and then inserts those rows into the target table.
For example: >explain insert into t1 sel a2, b2, c2 from t2, t1 where not (a1=a2);
Explanation
---------------------------------------------------------------------------
1) First, we lock a distinct USER01."pseudo table" for read on a RowHash to prevent global deadlock for USER01.t2.
2) Next, we lock a distinct USER01."pseudo table" for write on a RowHash to prevent global deadlock for USER01.t1.
3) We lock USER01.t2 for read, and we lock USER01.t1 for write.
4) We do an all-AMPs RETRIEVE step from USER01.t1 by way of an all-rows scan with no residual conditions into Spool 2 (all_amps), which is duplicated on all AMPs. The size of Spool 2 is estimated with low confidence to be 44 rows (748 bytes). The estimated time for this step is 0.03 seconds.
5) We do an all-AMPs JOIN step from USER01.t2 by way of an all-rows scan with no residual conditions, which is joined to Spool 2 (Last Use) by way of an all-rows scan. USER01.t2 and Spool 2 are joined using a product join, with a join condition of ("a1 <> USER01.t2.a2"). The result goes into Spool 1 (all_amps), which is built locally on the AMPs. Then we do a SORT to order Spool 1 by the hash code of (USER01.t2.a2). The size of Spool 1 is estimated with no confidence to be 104 rows (2,600 bytes). The estimated time for this step is 0.03 seconds.
6) We do an all-AMPs MERGE into USER01.t1 from Spool 1 (Last Use). The size is estimated with no confidence to be 104 rows. The estimated time for this step is 0.23 seconds.
7) We spoil the parser's dictionary cache for the table.
8) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> No rows are returned to the user as the result of statement 1.
You can rewrite the same INSERT-SELECT request using ANSI MERGE, as shown below:
>explain merge into t1
Using t2
On a1=a2
When not matched then INS (a2, b2, c2);
Explanation
---------------------------------------------------------------------------
1) First, we lock a distinct USER01."pseudo table" for read on a RowHash to prevent global deadlock for USER01.t2.
2) Next, we lock a distinct USER01."pseudo table" for write on a RowHash to prevent global deadlock for USER01.t1.
3) We lock USER01.t2 for read, and we lock USER01.t1 for write.
4) We do an all-AMPs merge with unmatched inserts into USER01.t1 from USER01.t2 with a condition of ("USER01.t1.a1 = USER01.t2.a2"). The number of rows merged is estimated with low confidence to be 22 rows.
5) We spoil the parser's dictionary cache for the table.
6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> No rows are returned to the user as the result of statement 1.
As you can see, there is no spooling of the source table, and there is no separate join step. Both of these operations are achieved by the single merge with unmatched inserts step, and the request runs very quickly compared to the equivalent conditional INSERT-SELECT request.