Sie sind auf Seite 1von 4

TeraData Performance Analysis Tips Key Points: Stat Collection Pack Disk Data Skew Analysis Lock monitoring

(Locking Logger) Session tuning (DBS Control parameters) 1. Stat Collection : Statistics on the loaded tables are very important from the performance perspect ive. Statistics helps optimizer to generate the accurate & faster query plans. An old statistics in warehouse can lead to wrong query plans which may take time in query processing. Hence it is much required to refresh the statistics periodically at least in the warehousing environment. We can identify the tables which are frequently accessed & modified with inserts , updates, deletes etc. It is recommended to refresh the stats after every 10% of data change. We can co llect the statistics at column level or at index level. Syntax: Collect statistics on column (column_name 1,.., column_name n); OR Collect statistics on index (column_name 1,.., column_name n);

2.Pack Disk : Pack disk is an utility that free up the cylinder space on the database, this utility must be run periodically as in the warehouse environment large amount of data inserts, updates are happening which causes the physical memory to disorder due to freque nt data manipulation. Pack disk utility allows us to restructure & physically reorder the data, free u p space same as defragmentation. Teradata also run mini CYLPACKs automatically, if cylinder space goes below the prescribed limit. Cylinder space is required for the merge operation while the data Insert, Delete s Updates etc. To run a pack disk we use Ferret utility provided by Teradata can be run throug h Teradata Manager Tool or through telnet on node session. The set of commands that starts packdisk utility are given below one can create a kron job to schedule the same & run it periodically.

Commands to run pack defrag & packdisk utilities : ~ferret defrag Y packdisk fsp=8 Y 3.Skew Analysis : Primary index of a table in Teradata is responsible for the data distribution on all the AMPs. Proper data distribution is required for the paral lel processing in the system. As Teradata system follows shared nothing architecture, all the AMPs works in pa rallel. If data is evenly distributed amongst the AMPs then the amount of the work done by every AMP would be equal & time required for particular job would obviously be lesser. In contrast to this if only one/two AMPs are flooded with the data i.e. data skew then while running that job the two AMPs would be working & othe rs will be idle. In this case we won t be utilizing the parallel processing power of the system. To avoid such data skew need to analyze the primary index of the tables in Tera data database over the period of time it might happen that data is getting accumulate at the few AMPs, which can have a adverse effect on the ETL as well as the system performance. To analyze the data distribution for the table we can use the inbuilt HASH funct ions provided by the Teradata. To check the data distribution for a table one can use a query: SELECT HASHAMP (HASHBUCKET (HASHROW (Column 1,.., column n))) AS AMP_NUM, count(*) From Table_Name Group by 1; This query will provide the distribution of records on each AMP we can also analyze the probable PIs with this query which will predict the data distribution on the AMPs 4.Lock monitoring : Locking Logger is an utility that enables us to monitor the locking on the tables. Using this utility we can create a table that has the ent ries for the locks which have been applied to the tables while processing. This utility a llows us to analyze the regular ETL process, jobs being blocked at particular time when ther e is no one to monitor the locking. By analyzing such locking situations we can modify the jobs & avoid the waiting period due to such situations. To apply this locking loggers First, we need to enable locking logger via the DB S console window or the cnsterm subsystem. The setting does not take effect until the database is restarted.

LockLogger - This Field defines the system default for the locking logger. This allows the DBA to log the delays caused by database locks to help in identi fying lock conflicts. To enable this feature set the field to TRUE. To disable the feature set the fie ld to FALSE. After a database restart with the LockLogger flag set to true, the Locking Logge r will begin to accumulate lock information into a circular memory buffer of 64KB. Depending on how frequently the system encounters lock contention, this buffer will wrap, but it will usually sp an a several day period. Following a period of lock contention, to analyze the lock activity, you need to run the dumplocklog utility which moves the data from the memory buffer to a database table where it can be accessed. 5.Session Tuning: Session tuning is done for the running the load utilities in parallel this requires to analyze some DBScontrol parameters & tune the same to provide t he best parallel processing of the load utilities. There are two parameters MaxLoa dAWT & MaxLoadTasks that enables the parallel job management a short note on the same: The MaxLoadAWT internal field serves two purposes: 1) Enabling a higher limit for the MaxLoadTasks field beyond the default limit o f 15. 2) Specifying the AMP Worker Task (AWT) limit for concurrent FastLoad and MultiL oad jobs when a higher limit is enabled. In effect, this field allows more FastLoad, MultiLoad, and FastExport utilities running concurrently while controlling AWT usage and preventing excessive consumption an d possible AWT exhaustion. The default value is zero: When MaxLoadAWT is zero, concurrency limit operates i n the same manner as prior to V2R6.1 MaxLoadTasks specifies the concurrency limit for all three ut ilities: FastLoad, MultiLoad, and FastExport. The valid range for MaxLoadTasks is from 0 to 15. When MaxLoadAWT is non-zero (higher limit enabled): It specifies the maximum num ber of AWTs that can be used by FastLoads and MultiLoads. Maximum allowable value is 60% of the total AWTs. The valid range for MaxLoadTasks is from 0 to 30. A new FastLoad/MultiLoad job is allowed to start only if BOTH MaxLoadTasks AND M axLoadAWT limits are not reached. Therefore, jobs may be rejected before MaxLoadTasks limit is exceeded. MaxLoadTa sks specifies the concurrency limit for the combination of only two utilities: FastLoad and MultiLoad. FastExp ort is managed differently;

FastExport is no longer controlled by the MaxLoadTasks field. A FastExport job is only rejected if the total number of active utility jobs is 60. At least 30 FastExport jobs can run at any time. A FastExport job may be able to run even when FastLoad and MultiLoad jobs are rejected. When a Teradata Dynamic Workload Manager (TDWM) utility throttle rule is enabled , the MaxLoadAWT field is overridden. TDWM will use the highest allowable value which is 60% of total AWTs. Update to MaxLoadAWT becomes effective after the DBS control record has been written. No DBS restart is required. Note that when the total number of AWTs (specified by the internal field MaxAMPWorkerTasks) has been modified but a DBS restart has not occurred, then there may be a discrepancy between the actual number of AWTs and the DBS c ontrol record. The system may internally reduce the effective value of MaxLoadAWTs to prevent AWT exhaustion. AWT Usage of Load Utilities: All load/unload utilities require and consume AWTs at different rates depending on the execution phase: FastLoad: - Phase 1 (Loading): 3 AWT's - Phase 2 (End Loading): 1 AWT's MultiLoa d*: - Acquisition Phase (and before): 2 AWT's. Application Phase (and after): 1 AWT's FastExport: - All. This description is for the single target table case which is the most common. The above explained parameters can be analyzed & tuned accordingly to achieve the expected performance on the Teradata system. Also need to have some maintenance/ House keeping activities in place to avoid the performance implications due to some physical data parameters like data skew , less cylinder space etc.

Das könnte Ihnen auch gefallen