
Teradata SQL: Unleash the Power

by Michael Larkins and Tom Coffing. Coffing Data Warehousing (c) 2001. Copying Prohibited.

Chapter 1: Teradata Parallel Architecture

Teradata Introduction

The world's largest data warehouses commonly use the superior technology of NCR's Teradata relational database management system (RDBMS). A data warehouse is normally loaded directly from operational data. The majority, if not all, of this data will be collected on-line as a result of normal business operations. The data warehouse therefore acts as a central repository of the data that reflects the effectiveness of the methodologies used in running a business.

As a result, the data loaded into the warehouse is mostly historic in nature. To get a true representation of the business, normally this data is not changed once it is loaded. Instead, it is interrogated repeatedly to transform data into useful information, to discover trends and the effectiveness of operational procedures. This interrogation is based on business rules to determine such aspects as profitability, return on investment and evaluation of risk.

For example, an airline might load all of its maintenance activity on every aircraft into the database. Subsequent investigation of the data could indicate the frequency at which certain parts tend to fail. Further analysis might show that the parts are failing more often on certain models of aircraft. The first benefit of the new found knowledge regards the ability to plan for the next failure and maybe even the type of airplane on which the part will fail. Therefore, the part can be on hand when and maybe where it is needed, or the part might be proactively changed prior to its failure.

If the information reveals that the part is failing more frequently on a particular model of aircraft, this could be an indication that the aircraft manufacturer has a problem with the design or production of that aircraft. Another possible cause is that the maintenance crew is doing something incorrectly and contributing to the situation. Either way, you cannot fix a problem if you do not know that a problem exists. There is incredible power and savings in this type of knowledge.

Another business area where the Teradata database excels is in retail. It provides an environment that can store billions of sales. This is a critical capability when you are recording and analyzing the sales of every item in every store around the world. Whether it is used for inventory control, marketing research or credit analysis, the data provides an insight into the business. This type of knowledge is not easily attainable without detailed data that records every aspect of the business. Tracking inventory turns, stock replenishment, or predicting the number of goods needed in a particular store yields a priceless perspective into the operation of a retail outlet. This information is what enables one retailer to thrive while others go out of business.

Teradata is flourishing with the realization that detail data is critical to the survival of a business in a competitive, lower margin environment. Continually, businesses are forced to do more with less. Therefore, it is vital to maximize the efforts that work well to improve profit and minimize or correct those that do not work.

One computer vendor used these same techniques to determine that it cost more to sell into the desktop environment than was realized in profit. Prior to this realization, the sales effort had attempted to make up the loss by selling more computers. Unfortunately, increased sales meant increased losses. Today, that company is doing much better and has made a huge step into profitability by discontinuing the small computer line.

Teradata Architecture

The Teradata database currently runs normally on NCR Corporation's WorldMark Systems in the UNIX MP-RAS environment. Some of these systems consist of a single processing node (computer) while others are several hundred nodes working together in a single system. The NCR nodes are based entirely on industry standard CPU processor chips, standard internal and external bus architectures like PCI and SCSI, and standard memory modules with 4-way interleaving for speed.

At the same time, Teradata can run on any hardware server in the single node environment when the system runs Microsoft NT and Windows 2000. This single node may be any computer from a large server to a laptop.

Whether the system consists of a single node or is a massively parallel system with hundreds of nodes, the Teradata RDBMS uses the exact same components executing on all the nodes in parallel. The only difference between small and large systems is the number of processing components.

When these components exist on different nodes, it is essential that the components communicate with each other at high speed. To facilitate the communications, the multi-node systems use the BYNET interconnect. It is a high speed, multi-path, dual redundant communications channel. Another amazing capability of the BYNET is that the bandwidth increases with each consecutive node added into the system. There is more detail on the BYNET later in this chapter.

Teradata Components

As previously mentioned, Teradata is the superior product today because of its parallel operations based on its architectural design. It is the parallel processing by the major components that provides the power to move mountains of data. Teradata works more like the early Egyptians who built the pyramids without heavy equipment using parallel, coordinated human efforts. It uses smaller nodes running several processing components all working together on the same user request. Therefore, a monumental task is completed in record time.

Teradata operates with three major components to achieve the parallel operations. These components are called: Parsing Engine Processors, Access Module Processors and the Message Passing Layer. The role of each component is discussed in the next sections to provide a better understanding of Teradata. Once we understand how Teradata works, we will pursue the SQL that allows storage and access of the data.

Parsing Engine Processor (PEP or PE)

The Parsing Engine Processor (PEP) or Parsing Engine (PE), for short, is one of the two primary types of processing tasks used by Teradata. It provides the entry point into the database for users on mainframe and networked computer systems. It is the primary director task within Teradata.

As users "logon" to the database they establish a Teradata session. Each PE can manage 120 concurrent user sessions. Within each of these sessions users submit SQL as a request for the database server to take an action on their behalf. The PE will then parse the SQL statement to establish which database objects are involved. For now, let's assume that the database object is a table. A table is a two-dimensional array that consists of rows and columns. A row represents an entity stored in a table and it is defined using columns. An example of a row might be the sale of an item and its columns include the UPC, a description and the quantity sold.

Any action a user requests must also go through a security check to validate their privileges as defined by the database administrator. Once their authorization at the object level is verified, the PE will verify that the columns requested actually exist within the objects referenced.

Next, the PE optimizes the SQL to create an execution plan that is as efficient as possible based on the amount of data in each table, the indices defined, the type of indices, the selectivity level of the indices, and the number of processing steps needed to retrieve the data. The PE is responsible for passing the optimized execution plan to other components as the best way to gather the data.

An execution plan might use the primary index column assigned to the table, a secondary index or a full table scan. The use of an index is preferable and will be discussed later in this chapter. For now, it is sufficient to say that a full table scan means that all rows in the table must be read and compared to locate the requested data.

Although a full table scan sounds really bad, within the architecture of Teradata, it is not necessarily a bad thing because the data is divided up and distributed to multiple, parallel components throughout the database. We will look next at the AMPs that perform the parallel disk access using their file system logic. The AMPs manage all data storage on disks. The PE has no disks.

Activities of a PE:

- Convert incoming requests from EBCDIC to ASCII (if from an IBM mainframe)
- Parse the SQL to determine type and validity
- Validate user privileges
- Optimize the access path(s) to retrieve the rows
- Build an execution plan with necessary steps for row access
- Send the plan steps to Access Module Processors (AMP) involved

Access Module Processor (AMP)

The next major component of Teradata's parallel architecture is called an Access Module Processor (AMP). It stores and retrieves the distributed data in parallel. Ideally, the data rows of each table are distributed evenly across all the AMPs.

The AMPs read and write data and are the workhorses of the database. Their job is to receive the optimized plan steps, built by the PE after it completes the optimization, and execute them. The AMPs are designed to work in parallel to complete the request in the shortest possible time.

Optimally, every AMP should contain a subset of all the rows loaded into every table. By dividing up the data, it automatically divides up the work of retrieving the data. Remember, all work comes as a result of a user's SQL request. If the SQL asks for a specific row, that row exists in its entirety (all columns) on a single AMP and other rows exist on the other AMPs.

If the user request asks for all of the rows in a table, every AMP should participate along with all the other AMPs to complete the retrieval of all rows. This type of processing is called an all AMP operation and an all rows scan. However, each AMP is only responsible for its rows, not the rows that belong to a different AMP. As far as each AMP is concerned, it owns all of the rows. Within Teradata, the AMP environment is a "shared nothing" configuration. The AMPs cannot access each other's data rows, and there is no need for them to do so.

Once the rows have been selected, the last step is to return them to the client program that initiated the SQL request. Since the rows are scattered across multiple AMPs, they must be consolidated before reaching the client. This consolidation process is accomplished as a part of the transmission to the client so that a final comprehensive sort of all the rows is never performed. Instead, all AMPs sort only their rows (at the same time in parallel) and the Message Passing Layer is used to merge the rows as they are transmitted from all the AMPs.

Therefore, when a client wishes to sequence the rows of an answer set, this technique causes the sort of all the rows to be done in parallel. Each AMP sorts only its subset of the rows at the same time all the other AMPs sort their rows. Once all of the individual sorts are complete, the BYNET merges the sorted rows. Pretty brilliant!
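The sort-then-merge idea can be illustrated outside the database. The following is a minimal Python sketch, not Teradata code: the row values and the three-AMP layout are hypothetical, the per-AMP sorts run one after another here rather than in parallel, and `heapq.merge` merely stands in for the merge that the Message Passing Layer performs during transmission.

```python
import heapq

def amp_sort(rows):
    """Each AMP sorts only the rows it owns."""
    return sorted(rows)

def merge_sorted_streams(per_amp_sorted):
    """Merge already-sorted streams; no global re-sort is performed."""
    return list(heapq.merge(*per_amp_sorted))

# Hypothetical answer-set rows scattered across three AMPs.
amp_rows = [[42, 7, 19], [3, 88, 21], [15, 1, 64]]

sorted_per_amp = [amp_sort(rows) for rows in amp_rows]
answer_set = merge_sorted_streams(sorted_per_amp)
print(answer_set)  # [1, 3, 7, 15, 19, 21, 42, 64, 88]
```

The merge step only ever compares the next unread row from each stream, which is why the final comprehensive sort described above is unnecessary.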

Activities of the AMP:

- Store and retrieve data rows using the file system
- Aggregate data
- Join processing between multiple tables
- Convert ASCII returned data to EBCDIC (IBM mainframes only)
- Sort and format output data

Message Passing Layer (BYNET)

The Message Passing Layer varies depending on the specific hardware on which the Teradata database is executing. In the latter part of the 20th century, most Teradata database systems executed under the UNIX operating system. However, in 1998, Teradata was released on Microsoft's NT operating system. Today it also executes under Windows 2000. The initial release of Teradata, on the Microsoft systems, is for a single node.

When using the UNIX operating system, Teradata supports up to 512 nodes. This massively parallel system establishes the basis for storing and retrieving data from the largest commercial databases in the world, Teradata. Today, the largest system in the world consists of 176 nodes. There is much room for growth as the databases begin to exceed 40 or 50 terabytes.

For the NCR UNIX systems, the Message Passing Layer is called the BYNET. The amazing thing about the BYNET is its capacity. Instead of a fixed bandwidth that is shared among multiple nodes, the bandwidth of the BYNET increases as the number of nodes increases. This feat is accomplished as a result of using virtual circuits instead of using a single fixed cable or a twisted pair configuration.

To understand the workings of the BYNET, think of a telephone switch used by local and long distance carriers. As more and more people place phone calls, no one needs to speak slower. As one switch becomes saturated, another switch is automatically used. When your phone call is routed through a different switch, you do not need to speak slower. If a natural or other type of disaster occurs and a switch is destroyed, all subsequent calls are routed through other switches. The BYNET is designed to work like a telephone switching network.

An additional aspect of the BYNET is that it is really two connection paths, like having two phone lines for a business. The redundancy allows for two different aspects of its performance. The first aspect is speed. Each path of the BYNET provides a speed of 10 Megabytes (MB) per second with Version 1 and 60 MB per second with Version 2. Therefore the aggregate bandwidth of the two connections is 20MB/second or 120MB/second. However, as mentioned earlier, the bandwidth grows linearly as more nodes are added. Using Version 1 any two nodes communicate at 40MB/second (10MB/second * 2 BYNETs * 2 nodes). Therefore, 10 nodes can utilize 200MB/second and 100 nodes have 2000MB/second available between them. When using the Version 2 BYNET, the same 100 nodes communicate at 12,000MB/second (60MB/second * 2 BYNETs * 100 nodes).

The second and equally important aspect of the BYNET uses the two connections for availability. Regardless of the speed associated with each BYNET connection, if one of the connections should fail, the second is completely independent and can continue to function at its individual speed without the other connection. Therefore, communications continue to pass between all nodes.

Although the BYNET is performing at half the capacity during an outage, it is still operational and SQL is able to complete without failing. In reality, when the BYNET is performing at only 10MB/second per node, it is still a lot faster than many normal networks that typically transfer messages at 10MB per second.

All messages going across the BYNET offer guaranteed delivery. So, any messages not successfully delivered because of a failure on one connection automatically route across the other connection. Since half of the BYNET is not working, the bandwidth reduces by half. However, when the failed connection is returned to service, its topology is automatically configured back into service and it begins transferring messages along with the other connection. Once this occurs, the capacity returns to normal.
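The bandwidth figures quoted for the BYNET all follow from one multiplication: per-path speed times the number of paths times the number of nodes. The short Python sketch below simply reproduces that arithmetic; the function and constant names are illustrative, not part of any Teradata tool.

```python
V1_MB_PER_PATH = 10   # Version 1: 10 MB/second per path, per node
V2_MB_PER_PATH = 60   # Version 2: 60 MB/second per path, per node

def bynet_bandwidth_mb(per_path_mb, nodes, paths=2):
    """Aggregate bandwidth in MB/second for a given node count."""
    return per_path_mb * paths * nodes

print(bynet_bandwidth_mb(V1_MB_PER_PATH, 2))     # 40
print(bynet_bandwidth_mb(V1_MB_PER_PATH, 10))    # 200
print(bynet_bandwidth_mb(V1_MB_PER_PATH, 100))   # 2000
print(bynet_bandwidth_mb(V2_MB_PER_PATH, 100))   # 12000

# With one of the two paths out of service, capacity is halved.
print(bynet_bandwidth_mb(V1_MB_PER_PATH, 100, paths=1))  # 1000
```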

A Teradata Database

Within Teradata, a database is a storage location for database objects (tables, views, macros, and triggers). An administrator can use Data Definition Language (DDL) to establish a database by using a CREATE DATABASE command.

A database may have PERMANENT (PERM) space allocated to it. This PERM space establishes the maximum amount of disk space for storing user data rows in any table located in the database. However, if no tables are stored within a database, it is not required to have PERM space. Although a database without PERM space cannot store tables, it can store views and macros because they are physically stored in the Data Dictionary (DD) tables and require no user storage space. The DD is in a "database" called DBC.

Teradata allocates PERM space to tables, up to the maximum, as rows are inserted. The space is not pre-allocated. Instead, it is allocated as rows are stored in blocks on disk. The maximum block size is defined either at a system level in the DBS Control Record, at the database level or individually for each table. Like PERM, the block size is a maximum size. Yet, it is only a maximum for blocks that contain multiple rows. By nature, the blocks are variable in length. So, disk space is not pre-allocated; instead, it is allocated on an as needed basis, one sector (512 bytes) at a time. Therefore, the largest possible wasted disk space in a block is 511 bytes.
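The 511-byte figure follows from allocating whole 512-byte sectors: a block that spills one byte into a new sector leaves the other 511 bytes of that sector unused. A small Python sketch of the arithmetic, with illustrative function names and an arbitrary range of block sizes:

```python
SECTOR_BYTES = 512

def sectors_needed(block_bytes):
    """Whole sectors required to hold a block (ceiling division)."""
    return -(-block_bytes // SECTOR_BYTES)

def wasted_bytes(block_bytes):
    """Allocated but unused bytes in the block's last sector."""
    return sectors_needed(block_bytes) * SECTOR_BYTES - block_bytes

# Check every block size from 1 byte up to four sectors.
worst_case = max(wasted_bytes(n) for n in range(1, 4 * SECTOR_BYTES + 1))
print(worst_case)  # 511
```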

A database can also have SPOOL space associated with it. All users who run queries need workspace at some point in time. This SPOOL space is workspace used for the temporary storage of rows during the execution of user SQL statements. Like PERM space, SPOOL is defined as a maximum amount that can be used within a database or by a user. Since PERM is not pre-allocated, unused PERM space is automatically available for use as SPOOL. This maximizes the disk space throughout the system.

It is a common practice in Teradata to have some databases with PERM space that contain only tables. Then, other databases contain only views. These view databases require no PERM space and are the only databases that users have privileges to access. The views in these databases control all access to the real tables in other databases. They insulate the actual tables from user access. There will be more on views later in this book.

The newest type of space allocation within Teradata is TEMPORARY (TEMP) space. A database may or may not have TEMP space; however, it is required if Global Temporary Tables are used. The use of temporary tables is also covered in more detail later in the SQL portion of this book.

A database is defined using a series of parameter values at creation time. The majority of the parameters can easily be changed after a database has been created using the MODIFY DATABASE command. However, when attempting to increase PERM or TEMP space maximums, there must be sufficient disk space available even though it is not immediately allocated. There may not be more PERM space defined than actual disk on the system.

A number of additional database parameters are listed below along with the user parameters in the next section. These parameters are tools for the database administrator and other experienced users when establishing databases for tables and views.

CREATE / MODIFY DATABASE Parameters

- PERMANENT
- TEMPORARY
- SPOOL
- ACCOUNT
- FALLBACK
- JOURNAL
- DEFAULT JOURNAL

Teradata Users

In Teradata, a user is the same as a database with one exception. A user is able to logon to the system and a database cannot. Therefore, to authenticate the user, a password must be established. The password is normally established at the same time that the CREATE USER statement is executed. The password can also be changed using a MODIFY USER command.

Like a database, a user area can contain database objects (tables, views, macros and triggers). A user can have PERM and TEMP space and can also have spool space. On the other hand, a user might not have any of these types of space, exactly the same as a database.

The biggest difference between a database and a user is that a user must have a password. This similarity between the two makes administering the system easier and allows for default values that all databases and users can inherit.

The next two lists regard the creation and modification of databases and users.

{ CREATE | MODIFY } DATABASE or USER (in common)

- PERMANENT
- TEMPORARY
- SPOOL
- ACCOUNT
- FALLBACK
- JOURNAL
- DEFAULT JOURNAL

{ CREATE | MODIFY } USER (only)

- PASSWORD
- STARTUP
- DEFAULT DATABASE

By no means are these all of the parameters. It is not the intent of this chapter, nor the intent of this book, to teach database administration. There are reference manuals and courses available to use. Teradata administration warrants a book by itself.

Symbols Used in this Book

Since there are no standard symbols for teaching SQL, it is necessary to understand some of the symbols used in our syntax diagrams throughout this book.

Figure 1-1

DATABASE Command

When users negotiate a successful logon to Teradata, they are automatically positioned in a default database as defined by the database administrator. When an SQL request is executed, by default, it looks in the current database for all referenced objects.

There may be times when the object is not in the current database. When this happens, the user has one of two choices to resolve this situation. One solution is to qualify the name of the object along with the name of the database in which it resides. To do this, the user simply associates the database name to the object name by connecting them with a period (.) or dot as shown below:

<database-name>.<table-name>

The second solution is to use the DATABASE command. It repositions the user to the specified database. After the DATABASE command is executed, there is no longer a need to qualify the objects in that database. Of course, if the SQL statement references additional objects in another database, they will have to be qualified in order for the system to locate them. Normally, you will issue DATABASE for the database that contains most of the objects that you need. Therefore it reduces the number of object names requiring qualification. The following is the syntax for the DATABASE command.

DATABASE <database-name> ;
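The two ways of locating an object, qualifying its name or relying on the current database, amount to a simple resolution rule. The sketch below models that rule in Python purely for illustration; the database and table names are hypothetical, and this is not how Teradata itself is implemented.

```python
def resolve(object_name, current_database):
    """Return (database, object) for a possibly qualified name."""
    if "." in object_name:
        # Qualified: <database-name>.<table-name>
        database, obj = object_name.split(".", 1)
        return database, obj
    # Unqualified: look in the current (default) database.
    return current_database, object_name

current = "Sales_DB"  # hypothetical default database after logon
print(resolve("Employee_Table", current))        # ('Sales_DB', 'Employee_Table')
print(resolve("HR_DB.Employee_Table", current))  # ('HR_DB', 'Employee_Table')

current = "HR_DB"     # the effect of issuing: DATABASE HR_DB ;
print(resolve("Employee_Table", current))        # ('HR_DB', 'Employee_Table')
```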

If you are not sure what database you are in, either the HELP SESSION or SELECT DATABASE command may be used to make that determination. These commands and other HELP functions are covered in the SQL portion of this book.

Use of an Index

Although a relational data model uses Primary Keys and Foreign Keys to establish the relationships between tables, that design is a Logical Model. Each vendor uses specialized techniques to implement a Physical Model. Teradata does not use keys in its physical model. Instead, Teradata is implemented using indices, both primary and secondary.

The Primary Index (PI) is the most important index in all of Teradata. The performance of Teradata can be linked directly to the selection of this index. The data value in the PI column(s) is submitted to the hashing function. The resulting row hash value is used to map the row to a specific AMP for data distribution and storage.

To illustrate this concept, I have on several occasions used two decks of cards. Imagine, if you will, fourteen people in a room. To the largest, most powerful looking man in the room, you give one of the decks of cards. His large hands allow him to hold all fifty-two cards at one time, with some degree of success. The cards are arranged with the ace of spades continuing through the king of spades in ascending order. After the spades are the hearts, then the clubs and last, the diamonds. Each suit is arranged starting with the ace and ascending up to the king. The cards are partitioned by suit.

The other deck of cards is divided among the other thirteen people. Using this procedure, all cards with the same value (i.e. aces) go to the same person. Likewise, all the deuces, treys and subsequent cards each go to one of the thirteen people. Each of the four cards will be in the same order as the suits contained in the single deck that went to the lone man: spades, hearts, clubs and diamonds. Once all the cards have been distributed, each of the thirteen people will be holding four cards of the same value (4*13=52). Now, the game can begin.

The requests in this game come in the form of "give-me," one or more cards. To make it easy for the lone player, we first request: give-me the ace of spades. The person with four aces finds their ace, as does the lone player with all 52 cards, both on the top of their cards. That was easy!

As the difficulty of the give-me requests increases, the level of difficulty dramatically increases for the lone man. For instance, when the give-me request is for all of the twos, one of the thirteen people holds up all four of their cards and they are done. The lone man must locate the 2 of spades between the ace and trey. Then, go and locate the 2 of hearts, thirteen cards later between the ace and trey. Then, find the 2 of clubs, thirteen cards after that, as well as the 2 of diamonds, thirteen cards after that, to finally complete the request.

Another request might be give-me all of the diamonds. For the thirteen people, each person locates and holds up one of their cards and the request is finished. For the lone person with the single deck, the request means finding and holding up the last thirteen cards in their deck of fifty-two. In each of these give-me requests, the lone man had to negotiate all fifty-two cards while the thirteen other people only needed to determine which of the four cards applied to the request, if any.

This is the same procedure used by Teradata. It divides up the data like we divided up the cards. As illustrated, the thirteen people are faster than the lone man. However, the game is not limited to thirteen players. If there were 26 people who wished to play on the same team, the cards simply need to be divided or distributed differently.

When using the value (ace through king) there are only 13 unique values. In order for 26 people to play, we need a way to come up with 26 unique values for 26 people. To make the cards more unique, we might combine the value of the card (i.e. ace) with the color. Therefore, we have two red aces and two black aces as well as two sets for every other card. Now when we distribute the cards, each of the twenty-six people receives only two cards instead of the original four. The distribution is still based on fifty-two cards (2 times 26).

At the same time, 26 people is not the optimum number for the game. Based on what has been discussed so far, what is the optimum number of people? If your answer is 52, then you are absolutely correct. With this many people, each person has one and only one card. Any time a give-me is requested of the participants, their one card either qualifies or it does not. It doesn't get any simpler or faster than this situation.

As easy as this sounds, to accomplish this distribution the value of the card alone is not sufficient to manifest 52 unique values. Neither is using the value and the color. That combination only gives us a distribution of 26 unique values when 52 unique values are desired.

To achieve this distribution we need to establish still more uniqueness. Fortunately, we can use the suit along with the value. Therefore, the ace of spades is different than the ace of hearts, which is different from the ace of clubs and the ace of diamonds. In other words, there are now 52 unique identities to use for distribution.
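The counts in the card analogy (13, 26 and 52 unique values) can be checked by building the deck and counting the distinct keys each choice produces. This Python sketch is only an illustration of the analogy; the names are arbitrary.

```python
from itertools import product

VALUES = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUIT_COLOR = {"spades": "black", "hearts": "red",
              "clubs": "black", "diamonds": "red"}

deck = list(product(VALUES, SUIT_COLOR))  # 52 (value, suit) pairs

# Distinct distribution keys under each choice of "index" columns.
by_value = {value for value, suit in deck}
by_value_and_color = {(value, SUIT_COLOR[suit]) for value, suit in deck}
by_value_and_suit = {(value, suit) for value, suit in deck}

print(len(by_value), len(by_value_and_color), len(by_value_and_suit))
# 13 26 52
```

Only the last key is unique per card, which is the property a unique index needs.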

To relate this distribution to Teradata, one or more columns of a table are chosen to be the Primary Index. The Primary Index can consist of up to sixteen different columns. These columns, when considered together, provide a comprehensive technique to derive a Unique Primary Index (UPI, pronounced as "you-pea") value as we discussed previously regarding the card analogy. That is the good news.

To store the data, the value(s) in the PI are hashed via a calculation to determine which AMP will own the data. The same data values always hash the same row hash and therefore are always associated with the same AMP.

The advantage to using up to sixteen columns is that row distribution is very smooth or evenly based on unique values. This simply means that each AMP contains the same number of rows. At the same time, there is a downside to using several columns for a PI. The PE needs every data value for each column as input to the hashing calculation to directly access a particular row. If a single column value is missing, a full table scan will result because the row hash cannot be recreated. Any row retrieval using the PI column(s) is always an efficient, one AMP operation.
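The mapping from PI values to an owning AMP can be sketched in a few lines of Python. This is a simplified model under stated assumptions: Teradata's real hashing algorithm and its hash-to-AMP mapping are not shown in this chapter, so MD5 and a plain modulo are used here only as stand-ins, and the two-column PI (store number, item number) and the AMP count are hypothetical.

```python
import hashlib

AMP_COUNT = 4  # hypothetical system size

def row_hash(pi_values):
    """Stand-in row hash: needs a value for every PI column."""
    key = "|".join(str(v) for v in pi_values).encode("utf-8")
    return int.from_bytes(hashlib.md5(key).digest()[:4], "big")

def owning_amp(pi_values, amp_count=AMP_COUNT):
    """Map the row hash to an AMP number (simplified as a modulo)."""
    return row_hash(pi_values) % amp_count

# Hypothetical rows keyed by a two-column PI: (store_no, item_no).
rows = [(store, item) for store in range(1, 6) for item in range(100, 110)]

amps = {}
for pi in rows:
    amps.setdefault(owning_amp(pi), []).append(pi)

for amp in sorted(amps):
    print(amp, len(amps[amp]))

# The same PI values always land on the same AMP.
print(owning_amp((3, 104)) == owning_amp((3, 104)))  # True
```

Because the hash input is built from all PI columns, a request that supplies only the store number cannot recompute the row hash, which is the reason such a request falls back to a full table scan.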

Although uniqueness is good in most cases, Teradata does not require that a UPI be used. It also allows for a Non-Unique Primary Index (NUPI, pronounced as "new-pea"). The potential downside of a NUPI is that if several duplicate values (NUPI dups) are stored, they all go to the same AMP. This can cause an uneven distribution that places more rows on some of the AMPs than on others. This means that any time an AMP with a larger number of rows is involved, it has to work harder than the other AMPs. The other AMPs will finish before the slower AMP. The time to process a single user request is always based on the slowest AMP. Therefore, serious consideration should be used when making the decision to use a NUPI.

Every table must have a PI and it is established when the table is created. If the CREATE TABLE statement contains:

UNIQUE PRIMARY INDEX ( <column-list> ), the value in the column(s) will be distributed to an AMP as a UPI. However, if the statement reads: PRIMARY INDEX ( <column-list> ), the value in the column(s) will be distributed as a NUPI and allow duplicate values. Again, all the same values will go to the same AMP.

If the DDL statement does not specify a PI, but it specifies a PRIMARY KEY (PK), the named column(s) are used as the UPI. Although Teradata does not use primary keys, the DDL may be ported from another vendor's database system.

A UPI is used because a primary key must be unique and cannot be null. By default, both UPIs and NUPIs allow a null value to be stored unless the column definition indicates that null values are not allowed using a NOT NULL constraint.

Now, with that being said, when considering JOIN accesses on the tables, sometimes it is advantageous to use a NUPI. This is because the rows being joined between tables must be on the same AMP. If they are not on the same AMP, one of the rows must be moved to the same AMP as the matching row. Teradata will use one of two different strategies to temporarily move rows. It can copy all needed rows to all AMPs or it can redistribute them using the hashing mechanism on the column defined as the join domain that is a PI. However, if neither join column is a PI, it might be necessary to redistribute all participating rows from both tables by hash code to get them together on a single AMP.

Planning data distribution, using access characteristics, can reduce the amount of data movement and therefore improve join performance. This works fine as long as there is a consistent number of duplicate values or only a small number of duplicate values. The logical data model needs to be extended with usage information in order to know the best way to distribute the data rows. This is done during the physical implementation phase before creating tables.

Secondary Index

A Secondary Index (SI) is used in Teradata as a way to directly access rows in the data, sometimes called the base table, without requiring the use of PI values. Unlike the PI, an SI does not affect the distribution of the data rows. Instead, it is an alternate read path and allows for a method to locate the PI value using the SI. Once the PI is obtained, the row can be directly accessed using the PI. Like the PI, an SI can consist of up to 16 columns.

In order for an SI to retrieve the data row by way of the PI, it must store and retrieve an index row. To accomplish this, Teradata creates, maintains and uses a subtable. The PI of the subtable is the value in the column(s) that are defined as the SI. The "data" stored in the subtable row is the previously hashed value of the real PI for the data row or rows in the base table. The SI is a pointer to the real data row desired by the request. An SI can also be unique (USI, pronounced as you-sea) or non-unique (NUSI, pronounced as new-sea).

The rows of the subtable contain the row hashed value of the SI, the actual data value(s) of the SI, and the row hashed value of the PI as the row ID. Once the row ID of the PI is obtained from the subtable row, using the hashed value of the SI, the last step is to get the actual data row from the AMP where it is stored. The action of hashing for an SI is exactly the same as when starting with a PI. When using a USI, the access of the subtable is a one AMP operation and then accessing the data row from the base table is another one AMP operation. Therefore, USI accesses are always a two AMP operation based on two separate row hash operations.

When using a NUSI, the subtable access is always an all AMP operation. Since the data is distributed by the PI, NUSI duplicate values may exist and probably do exist on multiple AMPs. So, the best plan is to go to all AMPs and check for the requested NUSI value.

To make this more efficient, each AMP scans its subtable. These subtable rows contain the row hash of the NUSI, the value of the data that created the NUSI and one or more row IDs for all the PI rows on that AMP. This is still a fast operation because these rows are quite small and several are stored in a single block. If the AMP determines that it contains no rows for the value of the NUSI requested, it is finished with its portion of the request. However, if an AMP has one or more rows with the NUSI value requested, it then goes and retrieves the data rows into spool space using the index.
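As a rough illustration of the two access paths just described, the sketch below defines a USI and a NUSI on a hypothetical employee table. The table, column names, and literal values are assumptions made for the example and do not come from the book. An equality condition on the USI column is the case the text describes as a two AMP operation, while an equality condition on the NUSI column involves every AMP checking its own subtable.

```sql
-- Hypothetical table for illustration only; all names are assumptions.
CREATE TABLE employee
( emp_no    INTEGER NOT NULL
, ssn       INTEGER NOT NULL
, dept_no   INTEGER
, last_name CHAR(20)
)
UNIQUE PRIMARY INDEX (emp_no);

-- USI: every ssn value identifies at most one base table row.
CREATE UNIQUE INDEX (ssn) ON employee;

-- NUSI: many employees can share the same dept_no.
CREATE INDEX (dept_no) ON employee;

-- USI access: one AMP for the subtable row, one AMP for the data row.
SELECT emp_no, last_name
FROM   employee
WHERE  ssn = 123456789;

-- NUSI access: all AMPs scan their subtable for the requested value.
SELECT emp_no, last_name
FROM   employee
WHERE  dept_no = 400;
```

Whether the optimizer actually chooses either index depends on the data and on collected statistics, so the comments describe the intended access path rather than a guaranteed plan.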

With this said, the SQL optimizer may decide that there are too many base table data rows to make index access efficient. When this happens, the AMPs will do a full base table scan to locate the data rows and ignore the NUSI. This situation is called a weakly selective NUSI. Even using old-fashioned indexed sequential files, it has always been more efficient to read the entire file and not use an index if more than 15% of the records were needed. This is compounded with Teradata because the "file" is read in parallel instead of all data from a single file. So, the efficiency percentage is probably closer to being less than 3% of all the rows in order to use the NUSI.

If the SQL does not use a NUSI, you should consider dropping it, due to the fact that the subtable takes up PERM space with no benefit to the users. The Teradata EXPLAIN is covered in this book and it is the easiest way to determine if your SQL is using a NUSI. Furthermore, the optimizer will never use a NUSI without STATISTICS.

There has been another evolution in the use of NUSI processing. It is called NUSI Bitmapping. This means that if a table has two different NUSI indices and individually they are weakly selective, but together they can be bitmapped together to eliminate most of the non-conforming rows, it will use the two different NUSI columns together because they become highly selective. Therefore, many times, it is better to use smaller individual NUSI indices instead of a large composite (more than one column) NUSI.
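The points above about statistics, EXPLAIN, and NUSI Bitmapping can be tried out with a sketch like the following. It assumes a hypothetical claim table with two single-column NUSIs; the names and literal values are invented for illustration. Statistics are collected on both indexes, and EXPLAIN is then used to see whether the optimizer chooses index access, bitmapping, or a full table scan for a query that constrains both columns.

```sql
-- Hypothetical table for illustration only; all names are assumptions.
CREATE TABLE claim
( claim_no  INTEGER NOT NULL
, region_cd INTEGER
, status_cd INTEGER
, claim_amt DECIMAL(10,2)
)
UNIQUE PRIMARY INDEX (claim_no);

-- Two small single-column NUSIs rather than one composite NUSI.
CREATE INDEX (region_cd) ON claim;
CREATE INDEX (status_cd) ON claim;

-- The text states that a NUSI is not used without statistics.
COLLECT STATISTICS ON claim INDEX (region_cd);
COLLECT STATISTICS ON claim INDEX (status_cd);

-- EXPLAIN shows the chosen plan without running the query.
EXPLAIN
SELECT claim_no, claim_amt
FROM   claim
WHERE  region_cd = 12
AND    status_cd = 3;

-- If EXPLAIN shows that an index is never used, it can be dropped to
-- free the PERM space taken by its subtable, for example:
-- DROP INDEX (status_cd) ON claim;
```

The plan text should be read for references to the index or to a bitmap step. On a small or empty table the optimizer may reasonably prefer a full table scan, so the result is only meaningful against realistic data volumes.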

There is another feature related to NUSI processing that can improve access time when a value range comparison is requested. When using hash values, it is impossible to determine any value within the range. This is because large data values can generate small hash values and small data values can produce large hash values. So, to overcome the issue associated with a hashed value, there is a range feature called Value Ordered NUSIs. At this time, it may only be used with a four byte or smaller numeric data column. Based on its functionality, a Value Ordered NUSI is perfect for date processing.

See the DDL chapter in this book for more details on USI and NUSI usage.
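A Value Ordered NUSI on a date column might be defined as in the sketch below. The order_history table and its columns are hypothetical and used only for illustration. The ORDER BY VALUES clause asks for the subtable rows to be kept in value sequence instead of hash sequence, which is what makes a range comparison such as BETWEEN practical.

```sql
-- Hypothetical table for illustration only; all names are assumptions.
CREATE TABLE order_history
( order_no    INTEGER NOT NULL
, order_date  DATE
, order_total DECIMAL(10,2)
)
UNIQUE PRIMARY INDEX (order_no);

-- Value Ordered NUSI: subtable rows are stored in order_date sequence.
-- The ordering column must be a single numeric column of four bytes or
-- less, and a DATE column meets that restriction.
CREATE INDEX (order_date) ORDER BY VALUES (order_date) ON order_history;

-- A range comparison that can take advantage of the value ordering.
SELECT order_no, order_total
FROM   order_history
WHERE  order_date BETWEEN DATE '2001-01-01' AND DATE '2001-03-31';
```

As with any NUSI, collecting statistics and checking the EXPLAIN output is the way to confirm that the index is actually chosen for a given query.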
