
GE Medical Systems Teradata Index Diagrams

Data Storage Using PPI

/var/www/apps/conversion/tmp/scratch_1/129646293.doc
TCS Internal


Teradata Terminology

AMP - Access Module Processors (AMPs) are virtual processors (vprocs) that receive steps from PEs and perform database functions to retrieve or update data. Each AMP is associated with one virtual disk (vdisk), where the data is stored. An AMP manages only its own vdisk, not the vdisk of any other AMP. When data is loaded, inserted, or updated, the AMP receives the incoming data from the PE, formats the rows, and distributes them on its vdisk. Each AMP has console utilities that can rebuild tables and configure and reconfigure the system. The subsystem on the AMP can:
    Lock databases and tables
    Create, modify, or delete definitions of tables
    Join tables
    Insert, delete, or modify rows within tables
    Sort, aggregate, or format data
    Retrieve information from definitions and rows from tables

AWS - Administration Workstation. A GUI environment used to configure, monitor, and manage a Teradata system. Management utilities can be run from it.

BTEQ - Basic Teradata Query utility, which can be used to enter SQL statements.

BYNET - A high-speed interconnect network that connects the nodes and allows the virtual processors to communicate. Because there are at least two BYNET networks, if the BYNET detects an unusable path in either network, it automatically reconfigures that network so that all messages avoid the unusable path. In the rare case that BYNET 0 cannot be reconfigured, the hardware on BYNET 0 is disabled and messages are re-routed to BYNET 1.
    Hardware: BYNET boards and cables. Carries broadcast messages to all nodes and messages from one node to another.
    Software: On each node, a BYNET driver that is the interface between the PDE software and the BYNET hardware.

Channel Driver - The means of communication between the PEs and applications running on channel-attached (mainframe) clients. Each node has one channel driver.

Clique - A group of nodes that share access to the same disk arrays. The recommendation is 4 nodes per clique, and the number should be the same for all cliques. Each multi-node system has at least one clique. The cabling determines which nodes are in which cliques: the nodes of a clique are connected to the disk array controllers of the same disk arrays. The overall system is connected via the BYNET hardware.

Database - A defined object that may contain a collection of Teradata objects.

DBQM - Teradata Dynamic Query Manager, which manages requests using the following controls:


    Analysis Control Thresholds can restrict requests that will exceed a certain processing time, or whose expected result set size exceeds a specified number of rows.
    Object Control Thresholds can limit access to and use of static criteria such as database objects and other items. Object controls can control workload requests based on user IDs, tables, views, date, time, macros, databases, and group IDs.
    Environmental Factors can manage requests based on dynamic environment factors, including database system CPU and disk utilization, network activity, and number of users.

EXPLAIN - A modifier that lets you preview how Teradata will execute an SQL request by returning text that describes the plan for processing the statement. The text includes an estimate of the number of rows involved and a relative cost of the request. The relative cost is shown in units of time, but it should not be used to predict the actual response time of an SQL request. It can only be used to compare the duration of request processing against other plans. When you submit a request preceded by the EXPLAIN modifier, the request is not executed. It is fully parsed and optimized, and a complete plan for executing it is returned in readable English.
    Example: EXPLAIN SELECT * FROM tablename;

Fallback - A Teradata feature that protects data against AMP failure by using groups of AMPs that provide data availability and consistency if an AMP is unavailable. Fallback protection can be set up on a table-by-table basis. Each fallback-protected table needs 100% additional storage space on top of the original table, and RAID protection is still needed for fallback-protected tables. There is also twice as much input/output (I/O) for inserts, updates, and deletes of rows in fallback-protected tables.

Gateway - The means of communication between the PEs (on the node) and applications running on network-attached clients or on a node in the system. Each node has one Gateway. ODBC and CLIv2 can be used for communication between client applications and the Teradata Gateway on a Teradata node.

GIVE - Data Control Language statement used to transfer database ownership.

HELP - Used to display information about objects such as DATABASE, USER, TABLE, VIEW, MACRO, TRIGGER, PROCEDURE, COLUMN, INDEX, and STATISTICS.

Locks - Four lock types exist:
    Exclusive locks apply to databases and tables. No one can access the locked object in any way while the lock is held.
    Write locks are for data where read consistency is important. Only access locks on that data are allowed while the lock is held.
    Read locks are for data consistency. They prevent exclusive and write locks from being applied.
    Access locks are for cases where data consistency does not matter. Only exclusive locks are prevented.
Locks may be applied at three levels:
    Database Locks: Apply to all tables and views in the database.
    Table Locks: Apply to all rows in the table.
    Row Hash Locks: Apply to a group of one or more rows in a table.

Macro - A Teradata extension to ANSI SQL that defines a sequence of prewritten SQL statements. To execute the macro, you use one EXECUTE statement, and the statements in the macro are processed as a single transaction. Every time the macro runs, its commands are re-optimized based on the current environment. The privileges needed to work with macros are EXEC, DROP, and CREATE. The commands for macros are resident on the Teradata server, so less I/O (input/output) traffic is needed to execute them.

MPP - A massively parallel processing system that typically has two BYNET networks (BYNET 0 and BYNET 1).

Node - Contains multiple CPUs sharing a memory pool, and multiple virtual processors called vprocs.
    Hardware components: CPU, System Disks, Memory, System Bus, Host Channel Adapter and Channel Connection, Ethernet Cards and LAN Cables
    Software components: Application, Channel Driver, Gateway, AMP, PE, Teradata PDE and OS, BYNET software, Teradata client software

Non-Unique Primary Index (NUPI) - For a given row, the combination of the data values in the columns of a Non-Unique Primary Index can be duplicated in other rows within the table.

Non-Unique Secondary Index (NUSI) - Usually specified to prevent full table scans, in which every row of a table is read. Accessing a row with a NUSI requires all AMPs. Non-Unique Secondary Indexes are stored in subtables on the same AMPs as their data rows.

Optimizer - Has knowledge of the system components, such as how many nodes and vprocs exist (it is "parallel aware"). The Optimizer develops the least expensive plan (in terms of time) to return the requested response set by evaluating alternative plans and choosing the fastest one. The plan is converted into executable steps and passed on to the Dispatcher.

Partitioned Primary Index (PPI) - Used to improve performance for large tables when you submit queries that specify a range constraint. PPI reduces the number of rows to be processed by using a technique called partition elimination. PPI increases performance for incremental data loads, deletes, and data access when working with large tables with range constraints. Using PPI, the rows are stored first by partition and then by row hash.

PDE - Parallel Database Extensions. A software layer that runs on the operating system on each node. This additional layer was created by NCR to support the parallel environment. It translates LUNs into vdisks using slices (pdisks), in conjunction with a Teradata utility called pdeconfig.
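The PPI entry above says that rows are stored first by partition and then by row hash, and that a range constraint lets whole partitions be skipped. The following Python sketch illustrates that idea on a single toy AMP. It is not Teradata code: the monthly partition boundaries, the order-date column, the ToyAmp class, and the CRC32 stand-in for the row hash are all assumptions made for illustration.

```python
import zlib
from bisect import bisect_right
from collections import defaultdict
from datetime import date

# Hypothetical monthly partition boundaries (start dates) for one year.
BOUNDARIES = [date(2024, m, 1) for m in range(1, 13)]


def partition_of(d):
    """Return the 1-based number of the partition whose range contains d."""
    return bisect_right(BOUNDARIES, d)


def toy_row_hash(pi_value):
    """Stand-in for the row hash (assumption: CRC32, not Teradata's algorithm)."""
    return zlib.crc32(repr(pi_value).encode())


class ToyAmp:
    """One AMP's storage: rows grouped by partition, ordered by row hash inside."""

    def __init__(self):
        self.partitions = defaultdict(list)  # partition number -> list of rows

    def insert(self, order_id, order_date):
        p = partition_of(order_date)
        self.partitions[p].append((toy_row_hash(order_id), order_id, order_date))
        self.partitions[p].sort()  # keep rows in row-hash order within the partition

    def range_query(self, start, end):
        """Read only the partitions that can hold dates in [start, end]."""
        scanned, hits = 0, []
        for p in range(partition_of(start), partition_of(end) + 1):
            for _, order_id, order_date in self.partitions.get(p, []):
                scanned += 1
                if start <= order_date <= end:
                    hits.append(order_id)
        return hits, scanned


amp = ToyAmp()
for month in range(1, 13):
    for day in (5, 20):
        amp.insert(month * 100 + day, date(2024, month, day))

# A two-month range constraint touches two of the twelve partitions.
hits, scanned = amp.range_query(date(2024, 3, 1), date(2024, 4, 30))
print(sorted(hits), "rows scanned:", scanned)
```

In this toy data set the query reads 4 of the 24 stored rows, because the ten partitions outside March and April are never opened. Without partitioning, the same range constraint would have to examine every row on the AMP.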


pdisk - User slices in UNIX, or partitions in Windows NT, whose size and configuration vary based on RAID level. Pdisks are assigned to an AMP through software, so no cabling is involved. The pdisks logically combine to make up an AMP's vdisk.

PE - Parsing Engines (PEs) are vprocs that receive SQL requests from the client and break the requests into steps. The PEs send the steps to the AMPs and subsequently return the answer to the client. Each PE can support a maximum of 120 sessions. The PE handles an incoming request in the following manner:
    Verifies the request for session authorization (user names and passwords), and either allows or disallows the request.
    Interprets the SQL statement received from the application.
    Verifies SQL requests for the proper syntax.
    Consults the Data Dictionary to ensure that all objects exist and that the user has authority to access them.
    The Optimizer develops the least expensive plan, which is converted into executable steps and passed on to the Dispatcher.
    The Dispatcher (part of the PE) controls the sequence in which the steps are executed and passes the steps on to the BYNET for execution by the AMPs.
    After the AMPs process the steps, the PE receives their responses over the BYNET.
    The Dispatcher builds a response message and sends the message back to the user.

PERM space - The maximum amount of permanent space assigned and available to a User or Database to store tables. A permanent space limit is defined, and the space is then consumed dynamically as needed.

Permanent Journal - Provides selective or full database recovery to a specific point in time. It can also reduce the need for costly and time-consuming full-table backups. Permanent Journals can be used at the table level. They are tables stored on disk arrays like user data, so they take up additional disk space on the system. Each database can have one permanent journal. When you specify journaling, you must state whether you want Before Images (for rollback), After Images (for rollforward), or both, and whether Fallback protection will be applied. Periodically, the Teradata DBA can dump the Permanent Journal to external media, thus reducing the need for full-table backups.

Primary Key - Needed only when designing a database, as a mechanism for maintaining referential integrity according to relational theory. The Teradata RDBMS itself does not require keys in order to manage the data, and can function fully with no awareness of Primary Keys.

Primary Index - When the Primary Index for a table is well chosen, the table rows are evenly distributed across the AMPs for the best performance. The way to guarantee even distribution of data is to choose a Primary Index whose columns contain unique values.

The even distribution makes each AMP responsible for only a subset of the rows in a table. The work is therefore evenly divided among the AMPs, so they can work in parallel and complete their processing at about the same time.
    Rule 1: Every table must have exactly one Primary Index.
    Rule 2: A Primary Index value can be unique or non-unique.
    Rule 3: The Primary Index value can be NULL.
    Rule 4: The Primary Index value can be modified.
    Rule 5: The Primary Index of a table cannot be modified.
    Rule 6: A Primary Index has a limit of 64 columns.
When the Primary Index value in a row is changed, Teradata re-hashes it and redistributes the row to its new location based on its new index value. If you need a new Primary Index, you must drop the table, recreate it with the new Primary Index, and reload the table. The one exception is that the ALTER TABLE statement allows you to change the PI of a table if the table is empty.

Reconfiguration - During a reconfiguration, no data is accessible to users until the system is operational in its new configuration.

Recovery Journal - Used in the following situations:
    1. For interrupted transactions (Transient Journal). It keeps "before images" of changed rows until the transaction is complete, and removes the images once the transaction has completed or the changes have been rolled back.
    2. For when an AMP fails (Down AMP Recovery Journal). It is used with Fallback-protected tables to maintain a record of write transactions (updates, creates, inserts, deletes, etc.) on the failed AMP while it is unavailable. There can be one Down-AMP Recovery Journal for each cluster on the system. It starts automatically after the loss of an AMP. Any changes to the data on the failed AMP are logged into the Down-AMP Recovery Journal by the other AMPs in the cluster. When the failed AMP is brought back online, the restart process includes applying the changes in the Down-AMP Recovery Journal to the recovered AMP. The journal is discarded once the process is complete, and the AMP is brought online, fully recovered.
    3. At system restarts, where both are used. After a system restart, the Transient Journal entries are merged with the Down-AMP Recovery Journal entries to create a Recovery Journal for that restart. Once recovery is complete, the Recovery Journal is purged automatically.

Row ID - To differentiate each row in a table, every row is assigned a unique Row ID. The Row ID is the combination of the row hash value and a uniqueness value. The uniqueness value is used to differentiate between rows whose Primary Index values generate identical row hash values. In most cases, only the row hash value portion of the Row ID is needed to locate the row.

Secondary Index - An alternate data access path. It allows you to access the data without having to do a full table scan. Secondary indexes do not affect how rows are distributed among the AMPs. You can drop and recreate secondary indexes dynamically, as they are needed. They are stored in separate subtables, which require extra disk space and maintenance that the system handles automatically, so Secondary Indexes do consume some system resources. A table can be assigned 0 to 32 Secondary Indexes for multiple data access paths.
    Rule 1: Secondary Indexes are optional.
    Rule 2: Secondary Index values can be unique or non-unique.
    Rule 3: Secondary Index values can be NULL.
    Rule 4: Secondary Index values can be modified.
    Rule 5: Secondary Indexes can be changed.
    Rule 6: A Secondary Index has a limit of 64 columns.

SHOW - Statement that displays the data definition language (DDL) associated with the following objects: tables, views, macros, triggers, stored procedures, and join indexes.

SPL - Stored Procedure Language (procedural statements).

Spool Space - The amount of space assigned and available to a User or Database to gather answer sets. When a query with a WHERE clause is executed, the qualifying rows are temporarily stored in spool space. Depending on how the system is set up, a single query could temporarily use all available system space to store its result in spool.

Stored Procedure - A combination of procedural and non-procedural statements run using a single CALL statement. In Teradata, procedural statements are allowed only in a stored procedure, so the terms SQL and SPL are used to differentiate between the non-procedural and procedural statements. The commands for stored procedures are resident on the Teradata server, so less I/O (input/output) traffic is needed to execute them.

Subtable - Each Teradata RDBMS table is stored in a set of subtables. There is one subtable for each kind of data, including:
    Table headers
    Primary data
    Fallback data
    Secondary Indexes
    Fallback Secondary Indexes

TDP - Teradata Director Program. Software that manages session traffic, installed on the channel-attached client.

TPA - Trusted Parallel Application. A TPA uses PDE to implement virtual processors (vprocs). The Teradata RDBMS is classified as a TPA. Its four components are AMP, PE, Channel Driver, and Gateway.

Unique Primary Index (UPI) - For a given row, the combination of the data values in the columns of a Unique Primary Index is not duplicated in other rows within the table.
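The Primary Index, Row ID, and UPI entries above describe how the index value decides which AMP holds a row, and how a uniqueness value separates rows that share a row hash. The following Python sketch is a toy model of both ideas, under stated assumptions: the four-AMP system, the CRC32 stand-in for the row hash, the modulo mapping from hash to AMP, and the sample columns are all hypothetical and are not Teradata's actual algorithm.

```python
import zlib
from collections import Counter, defaultdict

NUM_AMPS = 4  # hypothetical system size


def toy_row_hash(pi_value):
    """Stand-in for the row hash (assumption: CRC32, not Teradata's algorithm)."""
    return zlib.crc32(repr(pi_value).encode())


def amp_for(pi_value):
    """Map a Primary Index value to an AMP (assumption: simple modulo)."""
    return toy_row_hash(pi_value) % NUM_AMPS


def assign_row_ids(pi_values):
    """Build (row hash, uniqueness value) pairs, numbering rows that share a hash."""
    seen = defaultdict(int)
    row_ids = []
    for value in pi_values:
        row_hash = toy_row_hash(value)
        seen[row_hash] += 1
        row_ids.append((row_hash, seen[row_hash]))
    return row_ids


unique_pi = list(range(1000))            # e.g. an order number, one row per value
skewed_pi = ["NY"] * 900 + ["LA"] * 100  # e.g. a city column with two values

unique_load = Counter(amp_for(v) for v in unique_pi)
skewed_load = Counter(amp_for(v) for v in skewed_pi)

print("rows per AMP, unique values:    ", dict(unique_load))
print("rows per AMP, non-unique values:", dict(skewed_load))

# Every row still gets a distinct Row ID, even when 900 rows share one hash.
row_ids = assign_row_ids(skewed_pi)
print("distinct Row IDs:", len(set(row_ids)))
```

With the non-unique column, at least 900 of the 1000 rows land on a single AMP because identical index values always hash to the same place, so that AMP does most of the work. The unique column spreads its rows by hash value instead, which is the behaviour the Primary Index entry recommends.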


Unique Secondary Index (USI) - This type of index has two purposes: it enforces uniqueness in a column or group of columns, and it speeds up access to a row, because accessing a row with a USI requires only one or two AMPs. Unique Secondary Indexes are hash distributed separately from the data rows, based on their USI value.

UNIX MP-RAS - One of the operating system platforms that Teradata runs on. NCR made enhancements to the UNIX MP kernel to improve reliability, availability, and serviceability (RAS).

User - A database that has a logon ID and password for logging on to Teradata.
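The USI entry above states that a USI lookup needs only one or two AMPs because the index rows are hash distributed by USI value, separately from the data rows. The Python sketch below models that access path. It is a simplification under stated assumptions: the ToyTable class, the four-AMP system, and the CRC32 hash are hypothetical, the index subtable stores the Primary Index value rather than a real Row ID, and Primary Index values are assumed to be unique.

```python
import zlib

NUM_AMPS = 4  # hypothetical system size


def amp_for(value):
    """Map a value to an AMP (assumption: CRC32 modulo the AMP count)."""
    return zlib.crc32(repr(value).encode()) % NUM_AMPS


class ToyTable:
    """Base rows hashed by Primary Index; USI subtable rows hashed by USI value."""

    def __init__(self):
        self.data = [dict() for _ in range(NUM_AMPS)]  # per AMP: PI value -> row
        self.usi = [dict() for _ in range(NUM_AMPS)]   # per AMP: USI value -> PI value

    def insert(self, pi_value, usi_value, payload):
        index_amp = amp_for(usi_value)
        if usi_value in self.usi[index_amp]:
            # The index enforces uniqueness on the USI column.
            raise ValueError("duplicate USI value: %r" % (usi_value,))
        self.usi[index_amp][usi_value] = pi_value
        self.data[amp_for(pi_value)][pi_value] = (usi_value, payload)

    def lookup_by_usi(self, usi_value):
        """Return the row and the set of AMPs that had to be visited."""
        index_amp = amp_for(usi_value)
        pi_value = self.usi[index_amp].get(usi_value)
        if pi_value is None:
            return None, {index_amp}
        data_amp = amp_for(pi_value)
        return self.data[data_amp][pi_value], {index_amp, data_amp}


table = ToyTable()
for emp in range(1, 21):
    table.insert(emp, "SSN-%04d" % emp, "employee %d" % emp)

row, amps_touched = table.lookup_by_usi("SSN-0007")
print(row, "AMPs touched:", len(amps_touched))
```

The lookup visits the AMP that owns the index row and then the AMP that owns the data row. These are the same AMP only when both values happen to hash to it, which is why the count is one or two and never all AMPs.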

vdisk - The logical disk space that is managed by an AMP. Depending on the configuration, a vdisk may not be contained on the node. However, it is managed by an AMP, which is always a part of the node. The vdisk is made up of 1 to 64 pdisks, which logically combine to make up the AMP's vdisk. Although an AMP can manage up to 64 pdisks, it controls only one vdisk. An AMP manages only its own vdisk, not the vdisk of any other AMP.

vproc - A virtual processor, which is a collection of software processes running under the operating system's multi-tasking environment. The two types of Teradata vprocs are:
    AMP (Access Module Processor)
    PE (Parsing Engine)
Communication within the node is done by the PDE and BYNET software using point-to-point, multicast, or broadcast messaging. The difference between multicast and broadcast messaging is that a multicast message goes only to specific vprocs, not to all of them.


Failure Points and Recovery

Node Failure - After a restart of Teradata, cliques allow for continuous data access through vproc migration. When a node fails, the following happens:
    Teradata restarts across all remaining nodes in the system.
    The vprocs from the failed node migrate to the operational nodes in its clique.
    Processing continues while the failed node is being repaired.
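The vproc migration described above can be sketched in a few lines of Python. This is an illustration only: the migrate_vprocs function, the node and vproc names, and the round-robin placement are assumptions, since the text says only that the vprocs move to the operational nodes of the same clique and does not say how they are divided among them.

```python
def migrate_vprocs(clique, failed_node):
    """Return a new node -> vprocs mapping after failed_node leaves the clique.

    clique maps each node name to the list of vprocs (AMPs and PEs) it runs.
    Assumption: vprocs of the failed node are spread round-robin over the
    surviving nodes of the same clique.
    """
    survivors = sorted(node for node in clique if node != failed_node)
    if not survivors:
        raise RuntimeError("no operational node left in the clique")
    assignment = {node: list(clique[node]) for node in survivors}
    for i, vproc in enumerate(clique[failed_node]):
        assignment[survivors[i % len(survivors)]].append(vproc)
    return assignment


clique = {
    "node1": ["amp0", "amp1", "pe0"],
    "node2": ["amp2", "amp3", "pe1"],
    "node3": ["amp4", "amp5", "pe2"],
}

after_failure = migrate_vprocs(clique, "node2")
for node, vprocs in after_failure.items():
    print(node, vprocs)
```

Every vproc is still running after the migration, so all data remains accessible. The surviving nodes each carry more vprocs than before, which is why the recommendation that all cliques have the same number of nodes matters for balanced performance.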

