Reprinted for KV Satish Kumar, IBM kvskumar@in.ibm.com Reprinted with permission as a subscription benefit of Books24x7, http://www.books24x7.com/
Table of Contents
Chapter 4: Multiload
    Why it is Called "Multi" Load
        Two MultiLoad Modes: IMPORT and DELETE
        Block and Tackle Approach
        MultiLoad Imposes Limits
    Error Tables, Work Tables and Log Tables
    Supported Input Formats
    MultiLoad Has Five IMPORT Phases
        Phase 1: Preliminary Phase
        Phase 2: DML Transaction Phase
        Phase 3: Acquisition Phase
        Phase 4: Application Phase
        Phase 5: Clean Up Phase
    MultiLoad Commands
        Two Types of Commands
    Parameters for .BEGIN IMPORT MLOAD
    Parameters for .BEGIN DELETE MLOAD
    A Simple MultiLoad IMPORT Script
    Building our MultiLoad Script
    Executing MultiLoad
    Another Simple MultiLoad IMPORT Script
    MultiLoad IMPORT Script
    Error Treatment Options for the .DML LABEL Command
    An IMPORT Script with Error Treatment Options
    An IMPORT Script that Uses Two Input Data Files
    Redefining the INPUT
    A Script that Uses Redefining the Input
    DELETE MLOAD Script Using a Hard Coded Value
    A DELETE MLOAD Script Using a Variable
    An UPSERT Sample Script
    What Happens when MultiLoad Finishes
        MultiLoad Statistics
    Troubleshooting MultiLoad Errors
    RESTARTing MultiLoad
    RELEASE MLOAD: When You DON'T Want to Restart MultiLoad
    MultiLoad and INMODs
    How MultiLoad Compares with FastLoad
Chapter 4: Multiload
"In the end we'll remember not the words of our enemies, but the silence of our friends." - Martin Luther King Jr.
Teradata Utilities: BTEQ, FastLoad, MultiLoad, TPump, and FastExport, Second Edition
In the above diagram, monthly data is being stored in a quarterly table. To keep the contents limited to four months, monthly data is rotated in and out. At the end of every month, the oldest month of data is removed and the new month is added. The cycle is "add a month, delete a month, add a month, delete a month." In our illustration, that means that January data must be deleted to make room for May's data. Here is a question for you: What if there was another way to accomplish this same goal without consuming all of these extra resources? To illustrate, let's consider the following scenario: Suppose you have TableA that contains 12 billion rows. You want to delete a range of rows based on a date and then load in fresh data to replace these rows. Normally, the process is to perform a MultiLoad DELETE to DELETE FROM TableA WHERE <date-column> < '2002-02-01'. The final step would be to INSERT the new rows for May using MultiLoad IMPORT.
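The DELETE half of that cycle can be sketched as a MultiLoad DELETE task. The database, logon string and column name below are hypothetical stand-ins (the scenario gives only "TableA" and a date predicate); the command structure follows the scripts shown later in this chapter:

```
.LOGTABLE SQL01.CDW_Log;
.LOGON TDATA/SQL01,SQL01;         /* hypothetical logon string */
.BEGIN DELETE MLOAD TABLES TableA;
DELETE FROM TableA
 WHERE Sale_Date < '2002-02-01';  /* Sale_Date stands in for <date-column> */
.END MLOAD;
.LOGOFF;
```

A matching MultiLoad IMPORT job would then add the new month of rows, as the IMPORT scripts later in this chapter demonstrate.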
Each error table does the following:

- Identifies errors
- Provides some detail about the errors
- Stores the actual offending row for debugging
You have the option to name these tables in the MultiLoad script (shown later). Alternatively, if you do not name them, they default to ET_<target_table_name> and UV_<target_table_name>. In either case, MultiLoad will not accept error table names that are the same as target table names. While it does not matter what you name them, it is recommended that you standardize on a naming convention to make it easier for everyone on your team. For more details on how these error tables can help you, see the subsection in this chapter titled "Troubleshooting MultiLoad Errors."

Log Table: MultiLoad requires a LOGTABLE. This table keeps a record of the results from each phase of the load so that MultiLoad knows the proper point from which to RESTART. There is one LOGTABLE for each run. Since MultiLoad will not resubmit a command that has been run previously, it will use the LOGTABLE to determine the last successfully completed step.

Work Table(s): MultiLoad will automatically create one worktable for each target table. This means that in IMPORT mode you could have one or more worktables. In DELETE mode, you will only have one worktable since that mode only works on one target table. The purpose of worktables is to hold two things:

1. The Data Manipulation Language (DML) tasks
2. The input data that is ready to APPLY to the AMPs
The worktables are created in a database using PERM space, and they can become very large. If the script uses multiple SQL statements for a single data record, the data is sent to the AMP once for each SQL statement. This replication guarantees fast performance and that no SQL statement will ever be done more than once, so it is very important. However, there is no such thing as a free lunch: the cost is space. Later, you will see that using a FILLER field can help reduce this disk space by not sending unneeded data to an AMP. In other words, the efficiency of the MultiLoad run is in your hands.
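The FILLER technique just mentioned looks like this in a layout. This is a sketch; the field names and lengths are illustrative and match the sample script shown later in this chapter:

```
.LAYOUT FILEIN;
.FIELD  Employee_No * CHAR(11);   /* sent to the AMPs */
.FIELD  Last_Name   * CHAR(20);   /* sent to the AMPs */
.FILLER Junk_stuff  * CHAR(100);  /* parsed for positioning only; never
                                     sent, so it consumes no worktable
                                     space on the AMPs */
.FIELD  Dept_No     * CHAR(6);    /* sent to the AMPs */
```

The 100 bytes of Junk_stuff still position the cursor correctly within each input record, but they never travel to Teradata.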
BINARY: Each record is a 2-byte integer, n, that is followed by n bytes of data. A byte is the smallest unit of storage for Teradata.

FASTLOAD: This format is the same as BINARY, plus a marker (X'0A' or X'0D') that specifies the end of the record.

TEXT: Each record has an arbitrary number of bytes and is followed by an end-of-record marker.

UNFORMAT: The format for these input records is defined in the LAYOUT statement of the MultiLoad script using the components FIELD, FILLER and TABLE.

VARTEXT: This is variable-length text RECORD format separated by delimiters such as a comma. For this format you may only use VARCHAR, LONG VARCHAR (IBM) or VARBYTE data formats in your MultiLoad LAYOUT. Note that two delimiter characters in a row will result in a null value between them.

Figure 5-1
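As a sketch of the VARTEXT format described above, here is a comma-delimited layout and import. The file name, layout name and fields are our own illustrations, not from the book's figures; note that every field is VARCHAR, as the VARTEXT rule requires:

```
.LAYOUT COMMAIN;
.FIELD Employee_No * VARCHAR(11);
.FIELD Last_Name   * VARCHAR(20);
.FIELD Dept_No     * VARCHAR(6);

.IMPORT INFILE Employee.csv
 FORMAT VARTEXT ','
 LAYOUT COMMAIN
 APPLY INSERTS;
```

The delimiter character is named on the FORMAT VARTEXT clause; two commas in a row in the input file would produce a null for the field between them.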
Let's take a look at each phase and see what it contributes to the overall load process of this magnificent utility. Should you memorize every detail about each phase? Probably not. But it is important to know the essence of each phase because sometimes a load fails. When it does, you need to know in which phase it broke down since the method for fixing the error to RESTART may vary depending on the phase. And if you can picture what MultiLoad actually does in each phase, you will likely write better scripts that run more efficiently.
These sessions are running on your poor little computer as well as on Teradata. Each session loads the data to Teradata across the network or channel. Every AMP plays an essential role in the MultiLoad process. The AMPs receive the data blocks, hash each row and send the rows to the correct AMP. When the rows come to an AMP, it stores them in worktable blocks on disk. But, lest we get ahead of ourselves, suffice it to say that there is ample reason for multiple sessions to be established.

What about the extra two sessions? Well, the first one is a control session to handle the SQL and logging. The second is a backup or alternate for logging. You may have to use some trial and error to find what works best on your system configuration. If you specify too few sessions it may impair performance and increase the time it takes to complete load jobs. On the other hand, too many sessions will reduce the resources available for other important database activities.

Third, the required support tables are created. They are the following:

ERRORTABLES: MultiLoad requires two error tables per target table. The first error table contains constraint violations, while the second error table stores Unique Primary Index violations.

WORKTABLES: Work Tables hold two things: the DML tasks requested and the input data that is ready to APPLY to the AMPs.

LOGTABLE: The LOGTABLE keeps a record of the results from each phase of the load so that MultiLoad knows the proper point from which to RESTART.
Figure 5-2

The final task of the Preliminary Phase is to apply utility locks to the target tables. Initially, access locks are placed on all target tables, allowing other users to read or write to the table for the time being. However, this lock does prevent a user from requesting an exclusive lock. Although these locks will still allow the MultiLoad user to drop the table, no one else may DROP or ALTER a target table while it is locked for loading. This leads us to Phase 2.
At this point, Teradata does not care about which AMP receives the data block. The blocks are simply sent, one after the other, to the next AMP in line. For their part, each AMP begins to deal with the blocks that they have been dealt. It is like a game of cards - you take the cards that you have received and then play the game. You want to keep some and give some away. Similarly, the AMPs will keep some data rows from the blocks and give some away. The AMP hashes each row on the primary index and sends it over the BYNET to the proper AMP where it will ultimately be used. But the row does not get inserted into its target table, just yet. The receiving AMP must first do some preparation before that happens. Don't you have to get ready before company arrives at your house? The AMP puts all of the hashed rows it has received from other AMPs into the worktables where it assembles them into the SQL. Why? Because once the rows are reblocked, they can be sorted into the proper order for storage in the target table. Now the utility places a load lock on each target table in preparation for the Application Phase. Of course, there is no Acquisition Phase when you perform a MultiLoad DELETE task, since no data is being acquired.
Figure 5-3 Remember, MultiLoad allows for the existence of NUSI processing during a load. Every hash-sequence sorted block from Phase 3 and each block of the base table is read only once to reduce I/O operations to gain speed. Then, all matching rows in the base block are inserted, updated or deleted before the entire block is written back to disk, one time. This is why the match tags are so important. Changes are made based upon corresponding data and DML (SQL) based on the match tags. They guarantee that the correct operation is performed for the rows and blocks with no duplicate operations, a block at a time. And each time a table block is written to disk successfully, a record is inserted into the LOGTABLE. This permits MultiLoad to avoid starting again from the very beginning if a RESTART is needed. What happens when several tables are being updated simultaneously? In this case, all of the updates are scripted as a multi-statement request. That means that Teradata views them as a single transaction. If there is a failure at any point of the load process, MultiLoad will merely need to be RESTARTed from the point where it failed. No rollback is required. Any errors will be written to the proper error table.
At the end of the job, MultiLoad checks the final Error Code (&SYSRC). MultiLoad believes the adage, "All is well that ends well." If the last error code is zero (0), all of the job steps have ended successfully (i.e., all has certainly ended well). This being the case, all empty error tables, worktables and the log table are dropped. All locks, both Teradata and MultiLoad, are released. The statistics for the job are generated for output (SYSPRINT) and the system count variables are set. After this, each MultiLoad session is logged off. So what happens if the final error code is not zero? Stay tuned. Restarting MultiLoad is a topic that will be covered later in this chapter.
MultiLoad Commands
Two Types of Commands
You may see two types of commands in MultiLoad scripts: tasks and support functions. MultiLoad tasks are commands that are used by the MultiLoad utility for specific individual steps as it processes a load. Support functions are those commands that involve the Teradata utility Support Environment (covered in Chapter 9), are used to set parameters, or are helpful for monitoring a load. The chart below lists the key commands, their type, and what they do.

.BEGIN [IMPORT] MLOAD / .BEGIN DELETE MLOAD (Support): This command communicates directly with Teradata to specify whether the MultiLoad mode is going to be IMPORT or DELETE. Note that the word IMPORT is optional in the syntax because it is the DEFAULT, but DELETE is required. We recommend using the word IMPORT to make the coding consistent and easier for others to read. Any parameters for the load, such as error limits or checkpoints, will be included under the .BEGIN command, too. It is important to know which commands or parameters are optional since, if you do not include them, MultiLoad may supply defaults that may impact your load.

.DML LABEL (Task): The DML LABEL defines treatment options and labels for the application (APPLY) of data for the INSERT, UPDATE, UPSERT and DELETE operations. A LABEL is simply a name for a requested SQL activity. The LABEL is defined first, and then referenced later in the APPLY clause.

.END MLOAD (Task): This instructs MultiLoad to finish the APPLY operations with the changes to the designated databases and tables.

.FIELD (Task): This defines a column of the data source record that will be sent to the Teradata database via SQL. When writing the script, you must include a FIELD for each data field you need in SQL. This command is used with the LAYOUT command.

.FILLER (Task): Do not assume that MultiLoad has somehow uncovered much of what you used in your term papers at the university! FILLER defines a field that is accounted for as part of the data source's row format, but is not sent to the Teradata DBS. It is used with the LAYOUT command.

.LAYOUT (Task): LAYOUT defines the format of the INPUT DATA record so Teradata knows what to expect. If one record is not large enough, you can concatenate multiple data records by using the LAYOUT parameter CONTINUEIF to tell MultiLoad which condition triggers the concatenation. Another option is INDICATORS, which is used to represent nulls by using a bitmap (1 bit per field) at the front of the data record.

.LOGON (Support): This specifies the username or LOGON string that will establish sessions for MultiLoad with Teradata.

.LOGTABLE (Support): This support command specifies the name of the Restart Log that will be used for storing CHECKPOINT data pertaining to a load. The LOGTABLE is then used to tell MultiLoad where to RESTART, should that be necessary. It is recommended that this command be placed before the .LOGON command.

.LOGOFF (Support): This command terminates any sessions established by the LOGON command.

.IMPORT (Task): This command defines the INPUT DATA FILE, file type, file usage, the LAYOUT to use and where to APPLY the data to SQL.

.SET (Support): Optionally, you can SET utility variables. An example would be: .SET DBName TO 'CDW_Test';

.SYSTEM (Support): This interrupts the operation of MultiLoad in order to issue commands to the local operating system.

.TABLE (Task): This is a command that may be used with the .LAYOUT command. It identifies a table whose columns (both their order and data types) are to be used as the field names and data descriptions of the data source records.
Figure 5-4
PARAMETER (REQUIRED OR NOT): WHAT IT DOES

AMPCHECK {NONE|APPLY|ALL} (Optional): NONE specifies that MLOAD starts even with one down AMP per cluster if all tables are Fallback. APPLY (the DEFAULT) specifies that MLOAD will not start or finish Phase 4 with a down AMP. ALL specifies not to proceed if any AMPs are down, just like FastLoad.

AXSMOD (Optional): Short for Access Module, this command specifies an input protocol like OLE-DB or reading a tape from REEL Librarian. This parameter is for network-attached systems only. When used, it must precede the DEFINE command in the script.

CHECKPOINT (Optional): You have two options: CHECKPOINT refers to the number of minutes, or frequency, at which you wish a CHECKPOINT to occur if the number is 60 or less. If the number is greater than 60, it designates the number of rows at which you want the CHECKPOINT to occur. This command is NOT valid in DELETE mode.

ERRLIMIT errcount [errpercent] (Optional): You may specify the maximum number of errors, or the percentage, that you will tolerate during the processing of a load job.

ERRORTABLES ET_ERR UV_ERR (Optional): Names the two error tables, two per target table. Note there is no comma separator.

NOTIFY {LOW|MEDIUM|HIGH|OFF} (Optional): If you opt to use NOTIFY for any event during a load, you may designate the priority of that notification: LOW for low-level events, MEDIUM for important events, HIGH for events at operational decision points, and OFF to eliminate any notification at all for a given phase.

SESSIONS <MAX> <MIN> (Optional): This refers to the number of SESSIONS that should be established with Teradata. For MultiLoad, the optimal number of sessions is the number of AMPs in the system, plus two more. You can also use MAX or MIN, which automatically use the maximum or minimum number of sessions to complete the job. If you specify nothing, it will default to MAX.

SLEEP (Optional): Tells MultiLoad how frequently, in minutes, to try logging on to the system.

TABLES Tablename1, Tablename2, Tablename5 (Required): Names up to 5 target tables.

TENACITY (Optional): Tells MultiLoad how many hours to try logging on when its initial effort to do so is rebuffed.

WORKTABLES Tablename1, Tablename2, Tablename5 (Optional): Names the worktable(s), one per target table.
Figure 5-5
Remember, we'll still use the BTEQ utility to create our flat file.
"If you don't know where you're going, any road will take you there." - Lewis Carroll

Creating our MultiLoad script
Executing Multiload
"Ambition is a dream with a V8 engine." - Elvis Presley

You will feel like the King after executing your first MultiLoad script. MultiLoad is the Elvis Presley of data warehousing because nobody knows how to make more records than MultiLoad. If you have the ambition to learn, this book will give you what it takes to steer through these utilities. We initialize the MultiLoad utility as we do with BTEQ, except that the keyword for MultiLoad is mload. Remember that this MultiLoad is going to double the salaries of our employees. Let's execute our MultiLoad script.
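On a network-attached system, the invocation pattern looks like the sketch below; the script and output file names are our own stand-ins, not from the book:

```
mload < CDW_Double_Salary.mld > CDW_Double_Salary.out
```

The mload executable reads the script from standard input and writes its phase-by-phase progress and statistics to the redirected output file.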
This first script example is designed to show MultiLoad IMPORT in its simplest form. It depicts the loading of a three-column Employee table. The script is shown with our comments alongside, and below it is a step-by-step description of how it works.

Step One: Setting up a Logtable and Logging onto Teradata

MultiLoad requires you to specify a log table right at the outset with the .LOGTABLE command. We have called it CDW_Log. Once you name the Logtable, it will be automatically created for you. The Logtable may be placed in the same database as the target table, or it may be placed in another database. Immediately after this you log onto Teradata using the .LOGON command. The order of these two commands is interchangeable, but it is recommended to define the Logtable first and then to log on, second. If you reverse the order, Teradata will give a warning message. Notice that the commands in MultiLoad require a dot in front of the command key word.

Step Two: Identifying the Target, Work and Error tables

In this step of the script you must tell Teradata which tables to use. To do this, you use the .BEGIN IMPORT MLOAD command. Then you preface the names of these tables with the sub-commands TABLES, WORKTABLES and ERRORTABLES. All you must do is name the tables and specify what database they are in. Work tables and error tables are created automatically for you. Keep in mind that you get to name and locate these tables. If you do not do this, Teradata might supply some defaults of its own! At the same time, these names are optional. If the WORKTABLES and ERRORTABLES had not specifically been named, the script would still execute and build these tables. They would have been built in the default database for the user. The name of the worktable would be WT_EMPLOYEE_DEPT1 and the two error tables would be called ET_EMPLOYEE_DEPT1 and UV_EMPLOYEE_DEPT1, respectively.
Sometimes, large Teradata systems have a work database with a lot of extra PERM space. One customer calls this database CORP_WORK. This is where all of the logtables and worktables are normally created. You can use a DATABASE command to point all table creations to it, or qualify the names of these tables individually.

Step Three: Defining the INPUT flat file record structure

MultiLoad is going to need to know the structure of the INPUT flat file. Use the .LAYOUT command to name the layout. Then list the fields and their data types used in your SQL as a .FIELD. Did you notice that an asterisk is placed between the column name and its data type? This means to automatically calculate the next byte in the record. It is used to designate the starting location for this data based on the previous field's length. If you are listing fields in order and need to skip a few bytes in the record, you can either use the .FILLER (like above) to position the cursor to the next field, or the "*" on the Dept_No field could have been replaced with the number 132 (CHAR(11)+CHAR(20)+CHAR(100)+1). Then the .FILLER is not needed. Also, if the input record fields are exactly the same as the table, the .TABLE command can be used to automatically define all the .FIELDs for you. The LAYOUT name will be referenced later in the .IMPORT command. If the input file is created with INDICATORS, it is specified in the LAYOUT.

Step Four: Defining the DML activities to occur

The .DML LABEL names and defines the SQL that is to execute. It is like setting up executable code in a programming language, but using SQL. In our example, MultiLoad is being told to INSERT a row into the SQL01.Employee_Dept table. The VALUES come from the data in each FIELD because each one is preceded by a colon (:). Are you allowed to use multiple labels in a script? Sure! But remember this: every label must be referenced in an APPLY clause of the .IMPORT command.

Step Five: Naming the INPUT file and its format type

This step is vital! Using the .IMPORT command, we have identified the INFILE data as being contained in a file called "CDW_Join_Export.txt". Then we list the FORMAT type as TEXT. Next, we referenced the LAYOUT named FILEIN to describe the fields in the record. Finally, we told MultiLoad to APPLY the
DML LABEL called INSERTS; that is, to INSERT the data rows into the target table. This is still a sub-component of the .IMPORT command. If the script is to run on a mainframe, the INFILE name is actually the name of a JCL Data Definition (DD) statement that contains the real name of the file. Notice that the .IMPORT goes on for 4 lines of information. This is possible because it continues until it finds the semi-colon that defines the end of the command. This is how MultiLoad tells one operation from another. The semi-colon is therefore very important; without it, MultiLoad would have attempted to process the .END MLOAD as part of the .IMPORT, and it wouldn't work.

Step Six: Finishing loading and logging off of Teradata

This is the closing ceremonies for the load. MultiLoad wraps things up, closes the curtains, and logs off of the Teradata system. Important note: Since the script above in Figure 5-6 does not DROP any tables, it is completely capable of being restarted if an error occurs. Compare this to the next script in Figure 5-7. Do you think it is restartable? If you said no, pat yourself on the back.
PARAMETER (REQUIRED OR NOT): WHAT IT DOES

TABLES Tablename1 (Required): Names the Target table.

WORKTABLES Tablename1 (Optional): Names the worktable, one per target table.

ERRORTABLES ET_ERR UV_ERR (Optional): Names the two error tables, two per target table; there is no comma separator between them.

TENACITY (Optional): Tells MultiLoad how many hours to try establishing sessions when its initial effort to do so is rebuffed.
Figure 5-6
/* Simple Mload script */
.LOGTABLE SQL01.CDW_Log;
.LOGON TDATA/SQL01,SQL0;
Begins the load process by naming the Target Table, Work table and error tables; notice NO comma between the error tables:

.BEGIN IMPORT MLOAD TABLES SQL01.Employee_Dept1
  WORKTABLES SQL01.CDW_WT
  ERRORTABLES SQL01.CDW_ET
    SQL01.CDW_UV;

Names the LAYOUT of the INPUT record and defines its structure; notice the dots before FIELD and FILLER and the semi-colons after each definition:

.LAYOUT FILEIN;
.FIELD Employee_No * CHAR(11);
.FIELD Last_Name * CHAR(20);
.FILLER Junk_stuff * CHAR(100);
.FIELD Dept_No * CHAR(6);

Names the DML Label, then tells MultiLoad to INSERT a row into the target table and defines the row format:

.DML LABEL INSERTS;
INSERT INTO SQL01.Employee_Dept1
  (Employee_No
  ,Last_Name
  ,Dept_No)
VALUES
  (:Employee_No
  ,:Last_Name
  ,:Dept_No);

Lists, in order, the VALUES (each one preceded by a colon) to be INSERTed. Names the Import File and its Format type, cites the LAYOUT to use, and tells MultiLoad to APPLY the INSERTs:

.IMPORT INFILE CDW_Join_Export.txt
  FORMAT TEXT
  LAYOUT FILEIN
  APPLY INSERTS;

Ends MultiLoad and logs off all MultiLoad sessions:

.END MLOAD;
.LOGOFF;
Figure 5-7
/* +++++++++++++++++++++++++++++++++++++ */
/* MultiLoad SCRIPT                      */
/* This script is designed to change the */
/* EMPLOYEE_DEPT1 table using the data   */
/* found in IMPORT INFILE                */
/* CDW_Join_Export.txt                   */
/* Version 1.1                           */
/* Created by Coffing Data Warehousing   */
/* +++++++++++++++++++++++++++++++++++++ */
The load runs from a shell script. Any words between /* */ are comments only and are not processed by Teradata; the banner above names and describes the purpose of the script and names the author.

Secures the logon by storing userid and password in a separate file, then reads it:

.LOGTABLE SQL01.CDW_Log;
.RUN FILE LOGON.TXT;

Drops existing error tables and cancels the ability for the script to restart. DON'T ATTEMPT THIS AT HOME! Also, note that SQL does not use a dot (.):

/* Drop Error Tables - caution, this script cannot be
   restarted because these tables would be needed */
DROP TABLE SQL01.CDW_ET;
DROP TABLE SQL01.CDW_UV;

Begins the load process by telling us first the names of the target table, worktable and error tables; note NO comma between the names of the error tables:

/* Begin Import and Define Work and Error Tables */
.BEGIN IMPORT MLOAD TABLES
  SQL01.Employee_Dept1
  WORKTABLES
  SQL01.CDW_WT
  ERRORTABLES
  SQL01.CDW_ET
  SQL01.CDW_UV;

Names the LAYOUT of the INPUT file:

/* Define Layout of Input File */
.LAYOUT FILEIN;
.FIELD Employee_No *
.FIELD First_Name *
.FIELD Last_Name *
.FIELD Dept_No *
.FIELD Dept_Name *

Defines the structure of the INPUT file. Notice the dots before the FIELD command and the semi-colons after each FIELD definition. Names the DML Label.
/* Begin INSERT Process on Table */
.DML LABEL INSERTS;
INSERT INTO SQL01.Employee_Dept1
  ( Employee_No
  ,First_Name
  ,Last_Name
  ,Dept_No
  ,Dept_Name )
VALUES
  ( :Employee_No
  ,:First_Name
  ,:Last_Name
  ,:Dept_No
  ,:Dept_Name );

Tells MultiLoad to INSERT a row into the target table and defines the row format. Note that we place comma separators in front of the following column or value for easier debugging. Lists, in order, the VALUES to be INSERTed.

Names the Import File and states its Format type, names the Layout file to use, and tells MultiLoad to APPLY the INSERTs; then ends MultiLoad and logs off of Teradata:

/* Specify IMPORT File and Apply Parameters */
.IMPORT INFILE CDW_Join_Export.txt
  FORMAT TEXT
  LAYOUT FILEIN
  APPLY INSERTS;
.END MLOAD;
.LOGOFF;
Figure 5-8
Figure 5-9

In IMPORT mode, you may specify as many as five distinct error-treatment options for one .DML statement. For example, if there is more than one instance of a row, do you want MultiLoad to IGNORE the duplicate row, or to MARK it (list it) in an error table? If you do not specify IGNORE, then MultiLoad will MARK, or record, all of the errors. Imagine you have a standard INSERT load that you know will end up recording about 20,000 duplicate row errors. Using the syntax "IGNORE DUPLICATE INSERT ROWS;" will keep them out of the error table. By ignoring those errors, you gain three benefits:

1. You do not need to see all the errors.
2. The error table is not filled up needlessly.
3. MultiLoad runs much faster since it is not conducting a duplicate row check.
MARK is the default for all operations, with one exception: when a .DML LABEL performs an UPSERT, the default becomes IGNORE MISSING UPDATE ROWS. When doing an UPSERT, you anticipate that some rows are missing (otherwise, why do an UPSERT?), so this default keeps those rows out of your error table. The DO INSERT FOR MISSING UPDATE ROWS option is mandatory for an UPSERT. It tells MultiLoad to insert a row from the data source if that row does not exist in the target table because the UPDATE did not find it.
The table that follows shows you, in more detail, how flexible your options are:

ERROR TREATMENT OPTIONS IN DETAIL

.DML LABEL OPTION                     WHAT IT DOES
MARK DUPLICATE INSERT ROWS            Logs an entry for all duplicate INSERT rows in the UV_ERR table. Use this when you want to know about the duplicates.
IGNORE DUPLICATE INSERT ROWS          Tells MultiLoad to IGNORE duplicate INSERT rows because you do not want to see them.
MARK DUPLICATE UPDATE ROWS            Logs the existence of every duplicate UPDATE row.
IGNORE DUPLICATE UPDATE ROWS          Eliminates the listing of duplicate UPDATE row errors.
MARK MISSING UPDATE ROWS              Ensures a listing of data rows that had to be INSERTed since there was no row to UPDATE.
IGNORE MISSING UPDATE ROWS            Tells MultiLoad NOT to list missing UPDATE rows as an error. This is a good option when doing an UPSERT, since UPSERT will INSERT a new row.
MARK MISSING DELETE ROWS              Makes a note in the ET error table that a row to be deleted is missing.
IGNORE MISSING DELETE ROWS            Says, "Do not tell me that a row to be deleted is missing."
DO INSERT FOR MISSING UPDATE ROWS     Required to accomplish an UPSERT. Tells MultiLoad that if the row to be updated does not exist in the target table, then INSERT the entire row from the data source.
Figure 5-10
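Several of these options can be combined on a single .DML LABEL. The following is a hypothetical sketch (the Customer_Table, its columns, and the label name are invented for illustration) of an UPSERT label that suppresses both duplicate and missing UPDATE errors:

```
.DML LABEL UPSERT_CUST
   IGNORE DUPLICATE UPDATE ROWS
   IGNORE MISSING UPDATE ROWS
   DO INSERT FOR MISSING UPDATE ROWS;
UPDATE SQL01.Customer_Table
   SET Cust_Name = :Cust_Name
 WHERE Cust_No   = :Cust_No;
INSERT INTO SQL01.Customer_Table
VALUES ( :Cust_No
        ,:Cust_Name );
```

Since DO INSERT FOR MISSING UPDATE ROWS already makes IGNORE MISSING UPDATE ROWS the default, spelling it out as shown is redundant, but it is harmless and documents the intent.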
The load runs from a shell script. Any words between /* and */ are COMMENTS ONLY and are not processed by Teradata. The comment block names and describes the purpose of the script and names the author. The script then sets up a Logtable, logs on to Teradata, and optionally names a default database to work in.

/* +++++++++++++++++++++++++++++++++++++ */
/* MultiLoad SCRIPT                      */
/* This script is designed to change the */
/* EMPLOYEE_DEPT table using the data    */
/* from the IMPORT INFILE                */
/* CDW_Join_Export.txt                   */
/* Version 1.1                           */
/* Created by Coffing Data Warehousing   */
/* +++++++++++++++++++++++++++++++++++++ */
/* Setup the MultiLoad Logtable, Logon Statements */
.LOGTABLE SQL01.CDW_Log;
.LOGON TDATA/SQL01,SQL01;
DATABASE SQL01;
/* Drop the existing error tables in the work database */
DROP TABLE WORKDB.CDW_ET;
DROP TABLE WORKDB.CDW_UV;
/* Begin Import and Define Work and Error Tables */
.BEGIN IMPORT MLOAD TABLES
   Employee_Dept
WORKTABLES
   WORKDB.CDW_WT
ERRORTABLES
   WORKDB.CDW_ET
   WORKDB.CDW_UV;
/* Define Layout of Input File */
.LAYOUT FILEIN;
.FIELD Employee_No  *  CHAR(11);
.FIELD First_Name   *  CHAR(14);
.FIELD Last_Name    *  CHAR(20);
.FIELD Dept_No      *  CHAR(6);
.FIELD Dept_Name    *  CHAR(20);
/* Begin INSERT Process on Table */
.DML LABEL INSERTS IGNORE DUPLICATE INSERT ROWS;
INSERT INTO SQL01.Employee_Dept
(  Employee_No
  ,First_Name
  ,Last_Name
  ,Dept_No
  ,Dept_Name )
VALUES
(  :Employee_No
  ,:First_Name
  ,:Last_Name
  ,:Dept_No
  ,:Dept_Name );
/* Specify IMPORT File and Apply Parameters */
.IMPORT INFILE CDW_Join_Export.txt
   FORMAT TEXT
   LAYOUT FILEIN
   APPLY INSERTS;
.END MLOAD;
.LOGOFF;

This time, the existing error tables are dropped in a work database. The .BEGIN IMPORT MLOAD command begins the load process by naming first the target table, then the work table and error tables, which are placed in the work database; note there is no comma between the names of the error table pair. The .LAYOUT again defines the structure of the INPUT file. The .DML LABEL names the DML, and the IGNORE DUPLICATE INSERT ROWS option tells MultiLoad NOT TO LIST duplicate INSERT rows in the error table; notice the option is placed AFTER the LABEL identification and immediately BEFORE the DML function. The .IMPORT names the import file, states its format type, names the LAYOUT to use, and tells MultiLoad to APPLY the INSERTs. Finally, .END MLOAD ends MultiLoad and .LOGOFF logs off of Teradata.

Figure 5-11
The next script loads two target tables from two different input files. Again, the load runs from a shell script, and any words between /* and */ are comments only and are not processed by Teradata.

/* !/bin/ksh                                    */
/* MultiLoad IMPORT SCRIPT with two INPUT files */
/* This script INSERTs new rows into the        */
/* Employee_table and UPDATEs the Dept_Name     */
/* in the Department_table.                     */
/* Version 1.1                                  */
/* Created by Coffing Data Warehousing          */
/* ++++++++++++++++++++++++++++++++++++++++++++ */
.LOGTABLE SQL01.EMPDEPT_LOG;
.RUN FILE c:\mydir\logon.txt;
/* the following defines 2 tables for loading */
.BEGIN IMPORT MLOAD TABLES
   SQL01.Employee_Table,
   SQL01.Department_Table
WORKTABLES
   SQL01.EMP_WT,
   SQL01.DEPT_WT
ERRORTABLES
   SQL01.EMP_ET SQL01.EMP_UV,
   SQL01.DEPT_ET SQL01.DEPT_UV;
/* these next 2 LAYOUTs define 2 different records */
.LAYOUT FILEIN1;
.FIELD Emp_No    *  INTEGER;
.FIELD LName     *  CHAR(20);
.FIELD FName     *  VARCHAR(20);
.FIELD Sal       *  DECIMAL(10,2);
.FIELD Dept_Num  *  INTEGER;
.LAYOUT FILEIN2;
.FIELD DeptNo    *  CHAR(6);
.FIELD DeptName  *  CHAR(20);
.DML LABEL EMP_INS IGNORE DUPLICATE INSERT ROWS;
INSERT INTO SQL01.Employee_Table
VALUES ( :Emp_No
        ,:FName
        ,:LName
        ,:Sal
        ,:Dept_Num );
.DML LABEL DEPT_UPD;
UPDATE Department_Table
   SET Dept_Name = :DeptName
 WHERE Dept_No   = :DeptNo;
.IMPORT INFILE Emp_Data
   LAYOUT FILEIN1
   APPLY EMP_INS;
.IMPORT INFILE Dept_Data
   LAYOUT FILEIN2
   APPLY DEPT_UPD;
.END MLOAD;
.LOGOFF;

The script sets up a Logtable and logs on with .RUN; the logon.txt file contains: .logon TDATA/SQL01,SQL01;. (Dropping the worktables and error tables up front, in case they existed from a prior load, is possible, but do NOT include the DROPs if you want to RESTART using CHECKPOINT.) The .BEGIN identifies the two target tables, with a comma between them, and names the worktable and error tables for each target table; note there are NO commas between the pair of error table names for one target, but there is a comma between that pair and the next pair. The two LAYOUTs name and define the two different input records. The first DML label, EMP_INS, tells MultiLoad to IGNORE duplicate INSERT rows because you do not want to see them, and INSERTs a row into the table without naming the columns, so all VALUES are passed IN THE ORDER they are defined in the Employee table. The second DML label, DEPT_UPD, tells MultiLoad to UPDATE the Department_table: when it finds a record whose DeptNo equals the Dept_No in the table, it changes the Dept_Name column to the DeptName from the INPUT file. The two .IMPORT commands name the TWO import files, name the TWO LAYOUTs that define the structure of the INPUT DATA files, and APPLY the INSERTs to target table 1 and the UPDATEs to target table 2. Finally, .END MLOAD ends MultiLoad and .LOGOFF logs off of Teradata.

Figure 5-12
The load runs from a shell script; any words between /* and */ are comments only and are not processed by Teradata.

/* !/bin/ksh                                    */
/* +++++++++++++++++++++++++++++++++++++++++++ */
/* MultiLoad IMPORT SCRIPT with multiple target */
/* tables and DML labels                        */
/* This script INSERTs new rows into the        */
/* Employee_table and UPDATEs the Dept_Name     */
/* in the Department_table                      */
/* Version 1.1                                  */
/* Created by Coffing Data Warehousing          */
/* +++++++++++++++++++++++++++++++++++++++++++ */
.LOGTABLE SQL01.EmpDept_Log;
.LOGON TDATA/SQL01,SQL01;
/* 2 target tables, 2 work tables, 2 error tables
   per target table, defined in pairs */
.BEGIN IMPORT MLOAD TABLES
   SQL01.Employee_Table,
   SQL01.Department_Table
WORKTABLES
   SQL01.EMP_WT,
   SQL01.DEPT_WT
ERRORTABLES
   SQL01.EMP_ET SQL01.EMP_UV,
   SQL01.DEPT_ET SQL01.DEPT_UV;
.LAYOUT FILEIN;
.FILLER Trans    *  CHAR(1);
.FIELD Emp_No    *  INTEGER;
.FIELD Dept_Num  *  INTEGER;
.FIELD LName     *  CHAR(20);
.FIELD FName     *  VARCHAR(20);
.FIELD Sal       *  DECIMAL(10,2);
.FIELD DeptNo    2  INTEGER;
.FIELD DeptName  *  CHAR(20);

The script sets up a Logtable, logs on to Teradata, and identifies the two target tables, then names the worktable and error tables for each target table; note there is no comma between the names of an error table pair, but there is a comma between the two pairs. The single LAYOUT names and defines the INPUT record. The FILLER is for a field that tells what type of record has been read; here that field contains an "E" or a "D". The "E" tells MLOAD to use the Employee data and the "D" is for Department data. The explicit position 2 on DeptNo tells MLOAD to jump backward to byte 2, whereas the * for Emp_No defaulted to byte 2. So Emp_No and DeptNo both start at byte 2, but in different types of records. When Trans (byte position 1) contains a "D", the APPLY uses the department data, and for an "E" the APPLY uses the employee data.
.DML LABEL EMPIN IGNORE DUPLICATE INSERT ROWS;
INSERT INTO SQL01.Employee_Table
VALUES ( :Emp_No
        ,:FName
        ,:LName
        ,:Sal
        ,:Dept_Num );
Names the 1st DML Label; Tells MultiLoad to IGNORE duplicate INSERT rows because you do not want to see them. Tells MultiLoad to INSERT a row into the 1st target table but optionally does NOT define the target table row format. All the VALUES are passed to the columns of the Employee table IN THE ORDER of that table's row format. Names the 2nd DML Label; Tells MultiLoad to UPDATE the 2nd target table but optionally does NOT define that table's row format. When the VALUE of the DeptNo equals that of the Dept_No column of the Department, then update the Dept_Name column with the DeptName from the INPUT file. Ends MultiLoad and logs off of Teradata.
.DML LABEL DEPTIN; UPDATE Department_Table SET Dept_Name = :DeptName WHERE Dept_No = :DeptNo;
.IMPORT INFILE UPLOAD.dat
   LAYOUT FILEIN
   APPLY EMPIN WHERE Trans = 'E'
   APPLY DEPTIN WHERE Trans = 'D';
.END MLOAD;
.LOGOFF;
Figure 5-13
.LOGTABLE RemoveLog;
.LOGON TDATA/SQL01,SQL01;
.BEGIN DELETE MLOAD TABLES Order_Table;
DELETE FROM Order_Table
WHERE Order_Date < '99/12/31';
.END MLOAD;
.LOGOFF;
Identifies the Logtable and logs onto Teradata with a valid logon string. Begins MultiLoad in DELETE mode and Names the target table. SQL DELETE statement does a massive delete of order data for orders placed prior to the hard coded date in the WHERE clause. Notice that this is not the Primary Index. You CANNOT DELETE in DELETE MLOAD mode based upon the Primary Index. Ends loading and logs off of Teradata.
Figure 5-14

How many differences from a MultiLoad IMPORT script readily jump off of the page at you? Here are a few that we saw:
1. At the beginning, you must specify the word "DELETE" in the .BEGIN MLOAD command. You need not specify it in the .END MLOAD command.
2. This mode has no .DML LABEL command. Since it is focused on just one absolute function, no APPLY clause is required, so you see no .DML LABEL.
3. The DELETE with a WHERE clause is an SQL function, not a MultiLoad command, so it has no dot prefix.
4. Since default names are available for worktables (WT_<target_tablename>) and error tables (ET_<target_tablename> and UV_<target_tablename>), they need not be specifically named, but be sure to define the Logtable.
Do not confuse the DELETE MLOAD task with the SQL delete task that may be part of a MultiLoad IMPORT. The IMPORT delete is used to remove small volumes of data rows based upon the Primary Index. On the other hand, the MultiLoad DELETE does global deletes on tables, bypassing the Transient Journal. Because there is no Transient Journal, there are no rollbacks when the job fails for any reason. Instead, it may be RESTARTed from a CHECKPOINT. Also, the MultiLoad DELETE task is never based upon the Primary Index. Because we are not importing any data rows, there is neither a need for worktables nor an Acquisition Phase. One DELETE statement is sent to all the AMPs with a match tag parcel. That statement will be applied to every table row. If the condition is met, then the row is deleted. Using the match tags, each target block is read once and the appropriate rows are deleted.
.LOGTABLE RemoveLog;
.LOGON TDATA/SQL01,SQL01;
.BEGIN DELETE MLOAD TABLES Order_Table;
.LAYOUT OldMonth;
.FIELD OrdDate * DATE;
DELETE FROM Order_Table
WHERE Order_Date < :OrdDate;
.IMPORT INFILE …
   LAYOUT OldMonth;
.END MLOAD;
.LOGOFF;

This version identifies the Logtable and logs onto Teradata with a valid logon string, then begins the DELETE task; it names only one table, but still uses the TABLES option. The .LAYOUT names the layout and defines the column whose value will be passed as a single row to MultiLoad. In this case, all of the order dates in the Order_Table will be tested against this OrdDate value: the condition in the WHERE clause means that data rows with orders placed prior to the date value (:OrdDate) passed from the LAYOUT OldMonth will be DELETEd from the Order_Table. Note that this time there is no dot in front of LAYOUT in the .IMPORT INFILE clause, since the layout is only being referenced.
Figure 5-15
The script sets up a Logtable and then logs on to Teradata. The .BEGIN IMPORT MLOAD command begins the load process by naming first the target table and then the work table and error tables. Next the script names the LAYOUT of the INPUT file.

/* Setup Logtable, Logon Statements */
.LOGTABLE SQL01.CDW_Log;
.LOGON CDW/SQL01,SQL01;
/* Begin Import and Define Work and Error Tables */
.BEGIN IMPORT MLOAD TABLES
   SQL01.Student_Profile
WORKTABLES
   SQL01.SWA_WT
ERRORTABLES
   SQL01.SWA_ET
   SQL01.SWA_UV;
/* Define Layout of Input File */
/* An ALL CHARACTER based flat file */
.LAYOUT FILEIN;
.FIELD Student_ID  *  CHAR(…);
.FIELD Last_Name   *  CHAR(…);
.FIELD First_Name  *  CHAR(…);
.FIELD Class_Code  *  CHAR(…);
.FIELD Grade_Pt    *  CHAR(…);
/* Begin INSERT and UPDATE Process on Table */
.DML LABEL UPSERTER
   DO INSERT FOR MISSING UPDATE ROWS;
/* Without the above DO, one of these statements is
   guaranteed to fail on this same table. If the UPDATE
   fails because the row is missing, MultiLoad corrects
   this by doing the INSERT */
UPDATE SQL01.Student_Profile
   SET Last_Name  = :Last_Name
      ,First_Name = :First_Name
      ,Class_Code = :Class_Code
      ,Grade_Pt   = :Grade_Pt
 WHERE Student_ID = :Student_ID;
INSERT INTO SQL01.Student_Profile
VALUES ( :Student_ID
        ,:Last_Name
        ,:First_Name
        ,:Class_Code
        ,:Grade_Pt );
.IMPORT INFILE …
   LAYOUT FILEIN
   APPLY UPSERTER;
.END MLOAD;
.LOGOFF;

The LAYOUT defines the structure of the INPUT file, an ALL CHARACTER based flat file; notice the dots before each FIELD command and the semi-colons after each FIELD definition. The .DML LABEL tells MultiLoad to INSERT a row if there is not one to be UPDATEd, i.e., an UPSERT: the UPDATE is defined and qualified first, then the INSERT. We recommend placing comma separators in front of the following column or value for easier debugging. The .IMPORT names the import file, names the LAYOUT to use, and tells MultiLoad to APPLY the UPSERTs. Finally, .END MLOAD ends MultiLoad and .LOGOFF logs off of Teradata.

Figure 5-16
**** 08:06:41 UTY1803 Import Processing Statistics
                                         Import 1    Total Thus Far
     Candidate Records considered . . .    70000          70000
     Apply conditions satisfied  . . . .   70000          70000

**** 08:06:38 UTY0818 Statistics for table Employee_Table
     INSERTS: 25000
     UPDATES: 25000
     DELETES: 0
**** 08:06:41 UTY0818 Statistics for table Department_Table
     INSERTS: 0
     UPDATES: 20000
     DELETES: 0
Figure 5-17
THREE COLUMNS SPECIFIC TO THE APPLICATION ERROR TABLE

Uniqueness      Contains a certain value that disallows duplicate row errors in this table; can be ignored, if desired.
DBCErrorCode    System code that identifies the error.
DBCErrorField   Name of the column in the target table where the error happened; is left blank if the offending column cannot be identified. NOTE: A copy of the target table column immediately follows this column.

Figure 5-20
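Because the error tables are ordinary Teradata tables, these columns can be inspected with plain SQL after a load. A minimal sketch, assuming the application error table name SQL01.CDW_UV from the earlier scripts:

```
SELECT DBCErrorCode
      ,DBCErrorField
FROM   SQL01.CDW_UV;
```

Running this from BTEQ (or any SQL client) after the job completes shows which errors occurred and which target column each one involved.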
RESTARTing MultiLoad
Who hasn't experienced a failure at some time when attempting a load? Don't take it personally! Failures can and do occur on the host or Teradata (DBC) for many reasons. MultiLoad has the impressive ability to RESTART from failures in either environment. In fact, it requires almost no effort to continue or resubmit the load job. Here are the factors that determine how it works:

First, MultiLoad will check the Restart Logtable and automatically resume the load process from the last successful CHECKPOINT before the failure occurred. Remember, the Logtable is essential for restarts. MultiLoad uses neither the Transient Journal nor rollbacks during a failure; that is why you must designate a Logtable at the beginning of your script. MultiLoad either restarts by itself or waits for the user to resubmit the job, and then takes over right where it left off.

Second, suppose Teradata experiences a reset while MultiLoad is running. In this case, the host program will restart MultiLoad after Teradata is back up and running. You do not have to do a thing!

Third, if a host mainframe or network client fails during a MultiLoad, or the job is aborted, you may simply resubmit the script without changing a thing. MultiLoad will find out where it stopped and start again from that very spot.

Fourth, if MultiLoad halts during the Application Phase, it must be resubmitted and allowed to run until complete.

Fifth, during the Acquisition Phase the CHECKPOINT (n) you stipulated in the .BEGIN MLOAD clause will be enacted, and the results are stored in the Logtable. During the Application Phase, CHECKPOINTs are logged each time a data block is successfully written to its target table.

HINT: The default CHECKPOINT interval is 15 minutes. If you specify CHECKPOINT as 60 or less, minutes are assumed; if you specify 61 or above, a number of records is assumed.
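To make the HINT concrete, here is a sketch of how the same CHECKPOINT number is read two different ways; the target table name is reused from the earlier examples purely for illustration, and a real script would of course contain only one .BEGIN clause:

```
/* CHECKPOINT 30: 30 is 60 or less, so MultiLoad
   checkpoints every 30 MINUTES */
.BEGIN IMPORT MLOAD TABLES SQL01.Employee_Dept
   CHECKPOINT 30;

/* CHECKPOINT 50000: 50000 is 61 or above, so MultiLoad
   checkpoints every 50,000 RECORDS */
.BEGIN IMPORT MLOAD TABLES SQL01.Employee_Dept
   CHECKPOINT 50000;
```

Note that there is no way to checkpoint every 45 records, say; any value of 61 or above is always taken as a record count, and any value of 60 or below as minutes.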
In DELETE mode, the point of no return is when Teradata receives the DELETE statement. If the job halts in the Application Phase, you will have to RESTART the job.
With and since V2R3: the advent of V2R3 brought new possibilities with regard to using the RELEASE MLOAD command. It can NOW be used in the Application Phase, if:

- You are running Teradata V2R3 or a later version
- You use the correct syntax:

  RELEASE MLOAD <target-table> IN APPLY

- The load script has NOT been modified in any way
- The target tables either:
  - Must be empty, or
  - Must have no Fallback, no NUSIs, and no Permanent Journals
You should be very cautious using the RELEASE command, since it could potentially leave your table half updated. It is handy for a test environment, but please don't become too reliant on it for production runs, which should be allowed to finish to guarantee data integrity.
You will find a more detailed discussion on how to write INMODs for MultiLoad in the chapter of this book titled, "INMOD Processing".
Function                                    FastLoad                 MultiLoad
Error Tables must be defined                Yes                      Optional. 2 error tables have to
                                                                     exist for each target table and will
                                                                     automatically be assigned.
Work Tables must be defined                 No                       Optional. 1 work table has to exist
                                                                     for each target table and will
                                                                     automatically be assigned.
Logtable must be defined                    No                       Yes
Allows Referential Integrity                No                       No
Allows Unique Secondary Indexes             No                       No
Allows Non-Unique Secondary Indexes         No                       Yes
Allows Triggers                             No                       No
Loads a maximum of n number of tables       One                      Five
DML Statements Supported                    INSERT                   INSERT, UPDATE, DELETE, and "UPSERT"
DDL Statements Supported                    CREATE and DROP TABLE    DROP TABLE
Transfers data in 64K blocks                Yes                      Yes
Number of Phases                            Two                      Five
Is RESTARTable                              Yes                      Yes, in all 5 phases (auto CHECKPOINT)
Stores UPI Violation Rows                   Yes                      Yes
Allows use of Aggregated, Arithmetic
  calculations or Conditional
  Exponentiation                            No                       Yes
Allows Data Conversion                      Yes, 1 per column        Yes
NULLIF function                             Yes                      Yes

Figure 5-21