Michael J. Larkins and Thomas L. Coffing, Jr.
Third Edition 2003 (Includes V2R5 functionality)

Written by Michael J. Larkins and Thomas L. Coffing
Web Page: www.CoffingDW.com
E-Mail addresses:
Mike: TeraTeach@Consultant.com
Tom: Tom.Coffing@CoffingDW.com

Teradata, NCR, and BYNET are registered trademarks of NCR Corporation, Dayton, Ohio, U.S.A. IBM and DB2 are registered trademarks of IBM Corporation. ANSI is a registered trademark of the American National Standards Institute. The Jeopardy game is a registered trademark of Parker Brothers and Merv Griffin. In addition to these product names, all brands and product names in this document are registered names or trademarks of their respective holders.

Coffing Data Warehousing shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from the information contained in this book or from the use of programs or program segments that are included. This manual is not a publication of NCR Corporation, nor was it produced in conjunction with NCR Corporation.

Copyright 2008 by Coffing Publishing. All rights reserved.

No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any means, electronic, mechanical, photocopying, recording, or otherwise, without written permission from the publisher. No patent liability is assumed with respect to the use of information contained herein. Although every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, nor is any liability assumed for damages resulting from the use of information contained herein.

For information, address:
Coffing Publishing
7810 Kiester Rd.
Middletown, OH 4504

International Standard Book Number: !"#$ 0%&704&80%'%&

Printed in the United States of America

All terms mentioned in this book that are known to be trademarks or service marks have been stated.
Coffing Publishing cannot attest to the accuracy of this information. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark.
Acknowledgements and Special Thanks

Todd Walter, NCR, for providing access to his people regarding V2R4 system information.

Paul Sinclair, NCR, for providing V2R4 information on Stored Procedures and reviewing the stored procedures chapter, for information on the new V2R4 OLAP functionality, and for information regarding the new UPSERT command.

Fred Pluebell, JCPenney Corp., for providing V2R4 system availability while we were teaching in Dallas.

A special thanks to the staff at Nationwide Insurance for letting us teach an early V2R4 update class and for helping finalize some additional syntax when creating stored procedures.

Larry Carter and Paul DeRouin, NCR, for information on changes to triggers in V2R4.

Bill Putnam for assistance in obtaining V2R4 information.

Chris Coffing, Coffing Data Warehousing, for dedication in getting our system up on V2R4 so that we didn't have to "borrow" so much system time.

A very special thank you goes to Loraine Larkins. She is Mike's Mom and an excellent proofreader, and she served as a barometer for how easily the material can be understood. This was especially valuable because she was not SQL literate when this whole thing started.

Last, but far from least, we want to thank God for providing us with the inspiration, dedication and fortitude to finish this book.
Teradata Introduction

The world's largest data warehouses commonly use the superior technology of NCR's Teradata relational database management system (RDBMS). A data warehouse is normally loaded directly from operational data. Most, if not all, of this data is collected online as a result of normal business operations. The data warehouse therefore acts as a central repository of the data that reflects the effectiveness of the methodologies used in running a business.

As a result, the data loaded into the warehouse is mostly historic in nature. To get a true representation of the business, this data is normally not changed once it is loaded. Instead, it is interrogated repeatedly to transform data into useful information and to discover trends and the effectiveness of operational procedures. This interrogation is based on business rules to determine such aspects as profitability, return on investment and evaluation of risk.

For example, an airline might load all of the maintenance activity on every aircraft into the database. Subsequent investigation of the data could indicate the frequency at which certain parts tend to fail. Further analysis might show that the parts are failing more often on certain models of aircraft. The first benefit of this newfound knowledge is the ability to plan for the next failure, and maybe even for the type of airplane on which the part will fail. Therefore, the part can be on hand when, and maybe where, it is needed, or the part might be proactively changed prior to its failure.

If the information reveals that the part is failing more frequently on a particular model of aircraft, this could be an indication that the aircraft manufacturer has a problem with the design or production of that aircraft. Another possible cause is that the maintenance crew is doing something incorrectly and contributing to the situation. Either way, you cannot fix a problem if you do not know that a problem exists.
There is incredible power and savings in this type of knowledge.

Another business area where the Teradata database excels is retail. It provides an environment that can store billions of sales transactions. This is a critical capability when you are recording and analyzing the sales of every item in every store around the world. Whether it is used for inventory control, marketing research or credit analysis, the data provides insight into the business. This type of knowledge is not easily attainable without detailed data that records every aspect of the business. Tracking inventory turns, tracking stock replenishment, or predicting the number of goods needed in a particular store yields a priceless perspective into the operation of a retail outlet. This information is what enables one retailer to thrive while others go out of business.

Teradata is flourishing with the realization that detail data is critical to the survival of a business in a competitive, lower margin environment. Businesses are continually forced to do more with less. Therefore, it is vital to maximize the efforts that work well to improve profit and to minimize or correct those that do not. One computer vendor used these same techniques to determine that selling into the desktop market cost more than it returned in profit. Prior to this realization, the sales effort had attempted to make up the loss by selling more computers. Unfortunately, increased sales meant increased losses. Today, that company is doing much better and has made a huge step toward profitability by discontinuing the small computer line.
Teradata Architecture

The Teradata database normally runs on NCR Corporation's WorldMark Systems in the UNIX MP-RAS environment. Some of these systems consist of a single processing node (computer) while others contain several hundred nodes working together in a single system. The NCR nodes are based entirely on industry standard CPU processor chips, standard internal and external bus architectures like PCI and SCSI, and standard memory modules with 4-way interleaving for speed.

At the same time, Teradata can run on any hardware server in the single node environment when the system runs Microsoft NT or Windows 2000. This single node may be any computer from a large server to a laptop.

Whether the system consists of a single node or is a massively parallel system with hundreds of nodes, the Teradata RDBMS uses the exact same components executing on all the nodes in parallel. The only difference between small and large systems is the number of processing components.

When these components exist on different nodes, it is essential that they communicate with each other at high speed. To facilitate the communications, the multi-node systems use the BYNET interconnect. It is a high speed, multi-path, dual redundant communications channel. Another amazing capability of the BYNET is that the bandwidth increases with each consecutive node added into the system. There is more detail on the BYNET later in this chapter.

Teradata Components

As previously mentioned, Teradata is the superior product today because of the parallel operations made possible by its architectural design. It is the parallel processing by the major components that provides the power to move mountains of data. Teradata works much like the early Egyptians, who built the pyramids without heavy equipment by using parallel, coordinated human efforts. It uses smaller nodes running several processing components, all working together on the same user request.
Therefore, a monumental task is completed in record time.

Teradata operates with three major components to achieve the parallel operations. These components are called: Parsing Engine Processors, Access Module Processors and the Message Passing Layer. The role of each component is discussed in the next sections to provide a better understanding of Teradata. Once we understand how Teradata works, we will pursue the SQL that allows storage and access of the data.

Parsing Engine Processor (PEP or PE)

The Parsing Engine Processor (PEP) or Parsing Engine (PE), for short, is one of the two primary types of processing tasks used by Teradata. It provides the entry point into the database for users on mainframe and networked computer systems. It is the primary director task within Teradata.

As users "log on" to the database, they establish a Teradata session. Each PE can manage 120 concurrent user sessions. Within each of these sessions, users submit SQL as a request for the database server to take an action on their behalf. The PE then parses the SQL statement to establish which database objects are involved. For now, let's assume that the database object is a table. A table is a two-dimensional array that consists of rows and columns. A row represents an entity stored in a table, and it is defined using columns. An example of a row might be the sale of an item, with columns that include the UPC, a description and the quantity sold.

Any action a user requests must also go through a security check to validate the user's privileges as defined by the database administrator. Once authorization at the object level is verified, the PE verifies that the columns requested actually exist within the objects referenced.
Next, the PE optimizes the SQL to create an execution plan that is as efficient as possible based on the amount of data in each table, the indices defined, the type of indices, the selectivity level of the indices, and the number of processing steps needed to retrieve the data. The PE is responsible for passing the optimized execution plan, the best way to gather the data, to the other components.

An execution plan might use the primary index column assigned to the table, a secondary index or a full table scan. The use of an index is preferable and will be discussed later in this chapter. For now, it is sufficient to say that a full table scan means that all rows in the table must be read and compared to locate the requested data.

Although a full table scan sounds really bad, within the architecture of Teradata it is not necessarily a bad thing, because the data is divided up and distributed to multiple, parallel components throughout the database. We will look next at the AMPs that perform the parallel disk access using their file system logic. The AMPs manage all data storage on disks. The PE has no disks.

Activities of a PE:
- Convert incoming requests from EBCDIC to ASCII (if from an IBM mainframe)
- Parse the SQL to determine type and validity
- Validate user privileges
- Optimize the access path(s) to retrieve the rows
- Build an execution plan with necessary steps for row access
- Send the plan steps to the Access Module Processors (AMPs) involved

Access Module Processor (AMP)

The next major component of Teradata's parallel architecture is called an Access Module Processor (AMP). It stores and retrieves the distributed data in parallel. Ideally, the data rows of each table are distributed evenly across all the AMPs. The AMPs read and write data and are the workhorses of the database. Their job is to receive the optimized plan steps, built by the PE after it completes the optimization, and execute them.
The AMPs are designed to work in parallel to complete the request in the shortest possible time.

Optimally, every AMP should contain a subset of all the rows loaded into every table. Dividing up the data automatically divides up the work of retrieving the data. Remember, all work comes as a result of a user's SQL request. If the SQL asks for a specific row, that row exists in its entirety (all columns) on a single AMP, and other rows exist on the other AMPs.

If the user request asks for all of the rows in a table, every AMP participates to complete the retrieval of all rows. This type of processing is called an all AMP operation and an all rows scan. However, each AMP is only responsible for its own rows, not the rows that belong to a different AMP. As far as each AMP is concerned, it owns all of the rows. Within Teradata, the AMP environment is a "shared nothing" configuration. The AMPs cannot access each other's data rows, and there is no need for them to do so.

Once the rows have been selected, the last step is to return them to the client program that initiated the SQL request. Since the rows are scattered across multiple AMPs, they must be consolidated before reaching the client. This consolidation is accomplished as a part of the transmission to the client, so a final comprehensive sort of all the rows is never performed. Instead, all AMPs sort only their own rows (at the same time, in parallel) and the Message Passing Layer is used to merge the rows as they are transmitted from all the AMPs.

Therefore, when a client wishes to sequence the rows of an answer set, this technique causes the sort of all the rows to be done in parallel. Each AMP sorts only its subset of the rows at the same time all the other AMPs sort their rows. Once all of the individual sorts are complete, the BYNET merges the sorted rows.
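The sort-then-merge technique described above can be sketched in Python. This is only an illustration of the general idea (independent local sorts followed by a k-way merge of already-sorted streams), not Teradata's internal code: the AMP contents are hypothetical values, the function name `parallel_sort_merge` is ours, and the local sorts run one after another here instead of in parallel.

```python
import heapq


def parallel_sort_merge(amp_rows):
    """Return one sorted answer set built from per-AMP row lists.

    Each AMP sorts only the rows it owns; the already-sorted streams are
    then merged, so no single sort over the whole answer set is needed.
    """
    # Step 1: local sorts (concurrent on a real system, sequential here).
    sorted_per_amp = [sorted(rows) for rows in amp_rows]
    # Step 2: k-way merge of the sorted streams, standing in for the
    # merge the text attributes to the BYNET.
    return list(heapq.merge(*sorted_per_amp))


# Hypothetical rows owned by three AMPs (no AMP sees another's rows).
amp_rows = [
    [42, 7, 19],
    [3, 88, 25],
    [61, 14, 5],
]

print(parallel_sort_merge(amp_rows))  # [3, 5, 7, 14, 19, 25, 42, 61, 88]
```

`heapq.merge` consumes its inputs lazily, which mirrors the idea that rows can be merged as they stream in rather than after every AMP's rows have been collected in one place.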
Pretty brilliant!

Activities of the AMP:
- Store and retrieve data rows using the file system
- Aggregate data
- Join processing between multiple tables
- Convert returned ASCII data to EBCDIC (IBM mainframes only)
- Sort and format output data

Message Passing Layer (BYNET)

The Message Passing Layer varies depending on the specific hardware on which the Teradata database is executing. In the latter part of the 20th century, most Teradata database systems executed under the UNIX operating system. However, in 1998, Teradata was released on Microsoft's NT operating system. Today it also executes under Windows 2000. The initial release of Teradata on the Microsoft systems is for a single node.

When using the UNIX operating system, Teradata supports up to 512 nodes. This massively parallel architecture is the basis for storing and retrieving data in the largest commercial databases in the world. Today, the largest system in the world consists of 176 nodes. There is much room for growth as databases begin to exceed 40 or 50 terabytes.

For the NCR UNIX systems, the Message Passing Layer is called the BYNET. The amazing thing about the BYNET is its capacity. Instead of a fixed bandwidth that is shared among multiple nodes, the bandwidth of the BYNET increases as the number of nodes increases. This feat is accomplished by using virtual circuits instead of a single fixed cable or a twisted pair configuration.

To understand the workings of the BYNET, think of a telephone switch used by local and long distance carriers. As more and more people place phone calls, no one needs to speak slower. As one switch becomes saturated, another switch is automatically used. When your phone call is routed through a different switch, you do not need to speak slower. If a natural or other type of disaster occurs and a switch is destroyed, all subsequent calls are routed through other switches. The BYNET is designed to work like a telephone switching network.
An additional aspect of the BYNET is that it is really two connection paths, like having two phone lines for a business. The redundancy benefits two different aspects of its performance. The first aspect is speed. Each path of the BYNET provides bandwidth of 10 Megabytes (MB) per second with Version 1 and 60 MB per second with Version 2. Therefore, the aggregate speed of the two connections is 20MB/second or 120MB/second, respectively. However, as mentioned earlier, the bandwidth grows linearly as more nodes are added. Using Version 1, any two nodes communicate at 40MB/second (10MB/second × 2 BYNETs × 2 nodes). Therefore, 10 nodes can utilize 200MB/second and 100 nodes have 2000MB/second available between them. When using the Version 2 BYNET, the same 100 nodes communicate at 12,000MB/second (60MB/second × 2 BYNETs × 100 nodes).

The second and equally important aspect of the BYNET uses the two connections for availability. Regardless of the speed associated with each BYNET connection, if one of the connections should fail, the second is completely independent and can continue to function at its individual speed without the other connection. Therefore, communications continue to pass between all nodes.

Although the BYNET is performing at half capacity during an outage, it is still operational and SQL is able to complete without failing. In reality, when the BYNET is performing at only 10MB/second per node, it is still a lot faster than many normal networks that typically transfer messages at 10 megabits (Mb) per second.

All messages going across the BYNET have guaranteed delivery. So, any messages not successfully delivered because of a failure on one connection are automatically routed across the other connection. Since half of the BYNET is not working, the bandwidth is reduced by half. However, when the failed connection is returned to service, its topology is automatically configured back into service and it begins transferring messages along with the other connection.
Once this occurs, the capacity returns to normal.
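The bandwidth figures quoted above follow one rule of thumb: per-path rate × number of working BYNET paths × number of nodes. The small helper below encodes only that arithmetic and the halving during a single-path outage, as a sketch; the function and constant names are made up for this example, and real throughput depends on much more than this formula.

```python
# Per-path BYNET bandwidth in MB/second, as quoted in the text.
PATH_MB_PER_SEC = {1: 10, 2: 60}
BYNET_PATHS = 2  # the BYNET consists of two independent connection paths


def aggregate_bandwidth(version, nodes, failed_paths=0):
    """Aggregate bandwidth in MB/second using the text's rule of thumb:
    per-path rate x working BYNET paths x number of nodes."""
    if not 0 <= failed_paths <= BYNET_PATHS:
        raise ValueError("failed_paths must be between 0 and 2")
    working_paths = BYNET_PATHS - failed_paths
    return PATH_MB_PER_SEC[version] * working_paths * nodes


print(aggregate_bandwidth(1, 2))    # 40
print(aggregate_bandwidth(1, 10))   # 200
print(aggregate_bandwidth(1, 100))  # 2000
print(aggregate_bandwidth(2, 100))  # 12000
# One path down: traffic reroutes to the survivor at half the capacity.
print(aggregate_bandwidth(2, 100, failed_paths=1))  # 6000
```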
A Teradata Database

Within Teradata, a database is a storage location for database objects (tables, views, macros, and triggers). An administrator can use Data Definition Language (DDL) to establish a database by using a CREATE DATABASE command.

A database may have PERMANENT (PERM) space allocated to it. This PERM space establishes the maximum amount of disk space for storing user data rows in any table located in the database. However, if no tables are stored within a database, it is not required to have PERM space. Although a database without PERM space cannot store tables, it can store views and macros, because they are physically stored in the Data Dictionary (DD) PERM space and require no user storage space. The DD is in a "database" called DBC.

Teradata allocates PERM space to tables, up to the maximum, as rows are inserted. The space is not pre-allocated. Instead, it is allocated as rows are stored in blocks on disk. The maximum block size is defined either at a system level in the DBS Control Record, at the database level, or individually for each table. Like PERM, the block size is a maximum size. However, it is only a maximum for blocks that contain multiple rows. By nature, the blocks are variable in length. So, disk space is not pre-allocated; instead, it is allocated on an as-needed basis, one sector (512 bytes) at a time. Therefore, the largest possible wasted disk space in a block is 511 bytes.

A database can also have SPOOL space associated with it. All users who run queries need workspace at some point in time. This SPOOL space is workspace used for the temporary storage of rows during the execution of user SQL statements. Like PERM space, SPOOL is defined as a maximum amount that can be used within a database or by a user. Since PERM is not pre-allocated, unused PERM space is automatically available for use as SPOOL. This maximizes the use of disk space throughout the system.
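Why is the worst case 511 bytes? Because a block's size is rounded up to the next whole 512-byte sector. The hypothetical helper below illustrates that rounding only; it ignores block headers and the maximum block size setting.

```python
SECTOR_BYTES = 512  # disk space is allocated one 512-byte sector at a time


def sectors_for_block(data_bytes):
    """Return (sectors, allocated_bytes, wasted_bytes) for a block that
    must hold data_bytes of row data, rounding up to whole sectors."""
    if data_bytes <= 0:
        raise ValueError("a block holds at least one byte")
    sectors = (data_bytes + SECTOR_BYTES - 1) // SECTOR_BYTES  # ceiling division
    allocated = sectors * SECTOR_BYTES
    return sectors, allocated, allocated - data_bytes


print(sectors_for_block(512))   # (1, 512, 0)
print(sectors_for_block(513))   # (2, 1024, 511)
print(sectors_for_block(1000))  # (2, 1024, 24)
# The worst case over many block sizes is one byte short of a full sector.
print(max(sectors_for_block(n)[2] for n in range(1, 4097)))  # 511
```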
It is a common practice in Teradata to have some databases with PERM space that contain only tables, while other databases contain only views. These view databases require no PERM space and are the only databases that users have privileges to access. The views in these databases control all access to the real tables in other databases. They insulate the actual tables from user access. There will be more on views later in this book.

The newest type of space allocation within Teradata is TEMPORARY (TEMP) space. A database may or may not have TEMP space; however, it is required if Global Temporary Tables are used. The use of temporary tables is covered in more detail later in the SQL portion of this book.

A database is defined using a series of parameter values at creation time. Most of the parameters can easily be changed after a database has been created by using the MODIFY DATABASE command. However, when attempting to increase PERM or TEMP space maximums, there must be sufficient disk space available, even though it is not immediately allocated. There cannot be more PERM space defined than actual disk on the system.

A number of additional database parameters are listed below, along with the user parameters in the next section. These parameters are tools for the database administrator and other experienced users when establishing databases for tables and views.

CREATE / MODIFY DATABASE Parameters
PERMANENT
TEMPORARY
SPOOL
ACCOUNT
FALLBACK
JOURNAL
DEFAULT JOURNAL
Teradata Users

In Teradata, a user is the same as a database with one exception: a user is able to log on to the system and a database cannot. Therefore, to authenticate the user, a password must be established. The password is normally established at the same time that the CREATE USER statement is executed. The password can also be changed using a MODIFY USER command.

Like a database, a user area can contain database objects (tables, views, macros and triggers). A user can have PERM and TEMP space and can also have spool space. On the other hand, a user might not have any of these types of space, exactly the same as a database.

The biggest difference between a database and a user is that a user must have a password. Otherwise the two are nearly identical, and this similarity makes administering the system easier and allows for default values that all databases and users can inherit.

The next two lists regard the creation and modification of databases and users.

{ CREATE | MODIFY } DATABASE or USER (in common)
PERMANENT
TEMPORARY
SPOOL
ACCOUNT
FALLBACK
JOURNAL
DEFAULT JOURNAL

{ CREATE | MODIFY } USER (only)
PASSWORD
STARTUP
DEFAULT DATABASE

By no means are these all of the parameters. It is not the intent of this chapter, nor the intent of this book, to teach database administration. There are reference manuals and courses available for that purpose. Teradata administration warrants a book by itself.
Symbols Used in this Book

Since there are no standard symbols for teaching SQL, it is necessary to understand some of the symbols used in our syntax diagrams throughout this book. This chart should be used as a reference for the SQL syntax used in the book:

<database-name>          Substitute an actual database name in this location
<table-name>             Substitute an actual table name in this location
<comparison>             Substitute a comparison in this location, i.e. a=1
<column-name>            Substitute an actual column name in this location
<data-value>             Substitute a literal data value in this location
[ optional entry ]       Everything between the [ ] is optional; it is not required to be valid syntax. Use it when needed.
{ use this | or this }   Use one of the keywords or symbols on either side of the "|", but not both. I.e. { LEFT | RIGHT }: use either "LEFT" or "RIGHT" but not both.

Figure 1-1
DATABASE Command

When users negotiate a successful logon to Teradata, they are automatically positioned in a default database as defined by the database administrator. When an SQL request is executed, by default, it looks in the current database for all referenced objects.

There may be times when the object is not in the current database. When this happens, the user has one of two choices to resolve the situation. One solution is to qualify the name of the object with the name of the database in which it resides. To do this, the user simply connects the database name to the object name with a period (.), or dot, as shown below:

<database-name>.<table-name>

The second solution is to use the DATABASE command. It repositions the user to the specified database. After the DATABASE command is executed, there is no longer a need to qualify the objects in that database. Of course, if the SQL statement references additional objects in another database, they will have to be qualified in order for the system to locate them. Normally, you will issue the DATABASE command for the database that contains most of the objects that you need, which reduces the number of object names requiring qualification.

The following is the syntax for the DATABASE command:

DATABASE <database-name> ;

If you are not sure what database you are in, either the HELP SESSION or the SELECT DATABASE command may be used to make that determination. These commands and other HELP functions are covered in the SQL portion of this book.
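The two name-resolution choices described above can be modeled in a few lines. This is a simplified, hypothetical model: the `Session` class and the database and table names are invented for the example, and Teradata's real lookup also consults the Data Dictionary and checks privileges, none of which is shown here.

```python
class Session:
    """Toy model of a session's current database (hypothetical)."""

    def __init__(self, default_database):
        # At logon the user is positioned in the default database.
        self.current_database = default_database

    def database(self, name):
        # Mimics DATABASE <database-name>; by repositioning the session.
        self.current_database = name

    def resolve(self, object_name):
        # <database-name>.<table-name> is used exactly as written;
        # an unqualified name is looked up in the current database.
        database, dot, table = object_name.partition(".")
        if dot:
            return database, table
        return self.current_database, object_name


session = Session("Personnel")
print(session.resolve("Employee_Table"))     # ('Personnel', 'Employee_Table')
print(session.resolve("Sales.Daily_Sales"))  # ('Sales', 'Daily_Sales')

session.database("Sales")
print(session.resolve("Daily_Sales"))               # ('Sales', 'Daily_Sales')
print(session.resolve("Personnel.Employee_Table"))  # ('Personnel', 'Employee_Table')
```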
Use of an Index

Although a relational data model uses Primary Keys and Foreign Keys to establish the relationships between tables, that design is a Logical Model. Each vendor uses specialized techniques to implement a Physical Model. Teradata does not use keys in its physical model. Instead, Teradata is implemented using indices, both primary and secondary.

The Primary Index (PI) is the most important index in all of Teradata. The performance of Teradata can be linked directly to the selection of this index. The data value in the PI column(s) is submitted to the hashing function. The resulting row hash value is used to map the row to a specific AMP for data distribution and storage.

To illustrate this concept, I have on several occasions used two decks of cards. Imagine, if you will, fourteen people in a room. To the largest, most powerful looking man in the room, you give one of the decks of cards. His large hands allow him to hold all fifty-two cards at one time, with some degree of success. The cards are arranged with the ace of spades continuing through the king of spades in ascending order. After the spades come the hearts, then the clubs and, last, the diamonds. Each suit is arranged starting with the ace and ascending up to the king. The cards are partitioned by suit.

The other deck of cards is divided among the other thirteen people. Using this procedure, all cards with the same value (i.e. aces) go to the same person. Likewise, all the deuces, treys and subsequent cards each go to one of the thirteen people. Each person's four cards will be in the same order as the suits in the single deck that went to the lone man: spades, hearts, clubs and diamonds. Once all the cards have been distributed, each of the thirteen people will be holding four cards of the same value (4×13=52). Now, the game can begin.

The requests in this game come in the form of "give-me," one or more cards.
To make it easy for the lone player, we first request: give-me the ace of spades. The person with the four aces finds the ace of spades, as does the lone player with all 52 cards; for both, it is the top card. That was easy!

As the give-me requests become more complex, the work increases dramatically for the lone man. For instance, when the give-me request is for all of the twos, one of the thirteen people holds up all four of their cards and they are done. The lone man must locate the 2 of spades between the ace and trey. Then, he must locate the 2 of hearts, thirteen cards later, between the ace and trey of hearts. Then, he must find the 2 of clubs, thirteen cards after that, as well as the 2 of diamonds, thirteen cards after that, to finally complete the request.

Another request might be give-me all of the diamonds. For the thirteen people, each person locates and holds up one of their cards and the request is finished. For the lone person with the single deck, the request means finding and holding up the last thirteen cards in the deck of fifty-two. In each of these give-me requests, the lone man had to negotiate all fifty-two cards, while the thirteen other people only needed to determine which of their four cards applied to the request, if any. This is the same procedure used by Teradata. It divides up the data like we divided up the cards.

As illustrated, the thirteen people are faster than the lone man. However, the game is not limited to thirteen players. If there were 26 people who wished to play on the same team, the cards simply need to be divided or distributed differently. When using the value (ace through king), there are only 13 unique values. In order for 26 people to play, we need a way to come up with 26 unique values for 26 people. To make the cards more distinct, we might combine the value of the card (i.e. ace) with the color. Therefore, we have two red aces and two black aces, as well as two sets for every other card.
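The dealing rule in this analogy is a grouping by key: every card with the same key goes to the same person. The Python sketch below (the deck encoding and the `deal` helper are our own, not from the book) counts how many cards each player holds when the key is the rank alone versus the rank plus color.

```python
from collections import Counter

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUIT_COLOR = {"spades": "black", "hearts": "red", "clubs": "black", "diamonds": "red"}

# One 52-card deck, ordered by suit and then by rank, like the lone man's deck.
DECK = [(rank, suit) for suit in SUIT_COLOR for rank in RANKS]


def deal(key):
    """Give each card to the player named by key(card).

    Returns a Counter mapping each player (distinct key) to the number of
    cards that player holds.
    """
    return Counter(key(card) for card in DECK)


by_rank = deal(lambda card: card[0])
by_rank_and_color = deal(lambda card: (card[0], SUIT_COLOR[card[1]]))

print(len(by_rank), set(by_rank.values()))                      # 13 {4}
print(len(by_rank_and_color), set(by_rank_and_color.values()))  # 26 {2}
```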
Now when we distribute the cards, each of the twenty-six people receives only two cards instead of the original four. The distribution is still based on fifty-two cards (2 times 26).

At the same time, the optimum number of people for the game is not 26. Based on what has been discussed so far, what is the optimum number of people? If your answer is 52, then you are absolutely correct. With this many people, each person has one and only one card. Any time a give-me is requested of the participants, their one card either qualifies or it does not. It doesn't get any simpler or faster than this situation.

As easy as this sounds, the value of the card alone is not sufficient to produce 52 unique values. Neither is the combination of value and color; that only gives us 26 unique values when 52 are desired. To achieve this distribution, we need to establish still more uniqueness. Fortunately, we can use the suit along with the value. Therefore, the ace of spades is different from the ace of hearts, which is different from the ace of clubs and the ace of diamonds. In other words, there are now 52 unique identities to use for distribution.

To relate this distribution to Teradata, one or more columns of a table are chosen to be the Primary Index.

Primary Index

The Primary Index can consist of up to sixteen different columns. These columns, when considered together, provide a comprehensive technique to derive a Unique Primary Index (UPI, pronounced as "you-pea") value, as we discussed previously regarding the card analogy. That is the good news.

To store the data, the value(s) in the PI are hashed via a calculation to determine which AMP will own the data. The same data values always hash to the same row hash and therefore are always associated with the same AMP. The advantage of using up to sixteen columns is that row distribution based on unique values is very smooth, or even.
This simply means that each AMP contains approximately the same number of rows.

At the same time, there is a downside to using several columns for a PI. The PE needs every data value for each column as input to the hashing calculation to directly access a particular row. If a single column value is missing, a full table scan will result because the row hash cannot be recreated. Any row retrieval that supplies all of the PI column values is always an efficient, one AMP operation.

Although uniqueness is good in most cases, Teradata does not require that a UPI be used. It also allows for a Non-Unique Primary Index (NUPI, pronounced as "new-pea"). The potential downside of a NUPI is that if several duplicate values (NUPI dups) are stored, they all go to the same AMP. This can cause an uneven distribution that places more rows on some of the AMPs than on others. This means that any time an AMP with a larger number of rows is involved, it has to work harder than the other AMPs. The other AMPs will finish before the slower AMP, and the time to process a single user request is always based on the slowest AMP. Therefore, serious consideration should be given to the decision to use a NUPI.

Every table must have a PI, and it is established when the table is created. If the CREATE TABLE statement contains UNIQUE PRIMARY INDEX ( <column-list> ), the value in the column(s) will be distributed to an AMP as a UPI. However, if the statement reads PRIMARY INDEX ( <column-list> ), the value in the column(s) will be distributed as a NUPI and duplicate values are allowed. Again, all the same values will go to the same AMP.

If the DDL statement does not specify a PI, but it specifies a PRIMARY KEY (PK), the named column(s) are used as the UPI. Although Teradata does not use primary keys, the DDL may be ported from another vendor's database system. A UPI is used because a primary key must be unique and cannot be null.
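To make the UPI/NUPI trade-off concrete, here is a small Python simulation, offered as a sketch under stated assumptions: four AMPs, made-up order rows, and SHA-256 as a stand-in for Teradata's own row-hash function and hash map, which are not reproduced here. What carries over is the behavior the text describes: identical PI values always map to the same AMP, so a unique column spreads rows roughly evenly, while a low-cardinality NUPI piles rows onto a few AMPs.

```python
import hashlib

NUM_AMPS = 4  # hypothetical system size


def amp_for(pi_values):
    """Map a row's primary index value(s) to an AMP number.

    Stand-in only: SHA-256 of the PI values modulo the AMP count. Teradata
    uses its own row-hash algorithm and hash map; what this keeps is that
    equal PI values always produce the same hash and thus the same AMP.
    """
    key = "|".join(str(v) for v in pi_values).encode()
    row_hash = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return row_hash % NUM_AMPS


def distribute(rows, pi_columns):
    """Return a list with the number of rows stored on each AMP."""
    counts = [0] * NUM_AMPS
    for row in rows:
        # Every PI column value is required to compute the hash.
        counts[amp_for([row[c] for c in pi_columns])] += 1
    return counts


# Hypothetical table: order_id is unique, store_id has only 3 values.
orders = [{"order_id": i, "store_id": i % 3} for i in range(1000)]

upi = distribute(orders, ["order_id"])   # UPI: rows spread roughly evenly
nupi = distribute(orders, ["store_id"])  # NUPI: each store's rows share one AMP

print("UPI rows per AMP: ", upi)
print("NUPI rows per AMP:", nupi)
# A request finishes only when its busiest (slowest) AMP finishes.
print("Busiest AMP, UPI vs NUPI:", max(upi), max(nupi))
```

With only three distinct NUPI values and four AMPs, at least one AMP holds no rows at all, which is exactly the kind of skew the text warns about.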
By default, both UPIs and NUPIs allow a null value to be stored, unless the column definition disallows null values with a NOT NULL constraint.

Now, with that being said, when considering JOIN accesses on the tables, it is sometimes advantageous to use a NUPI. This is because the rows being joined between tables must be on the same AMP. If they are not on the same AMP, one of the rows must be moved to the AMP of the matching row. Teradata uses one of two different strategies to temporarily move rows. It can copy all needed rows to all AMPs, or it can redistribute the rows of one table by hashing its join column when the other table's join column is that table's PI. However, if neither join column is a PI, it might be necessary to redistribute all participating rows from both tables by hash code so that matching rows end up together on the same AMP. Planning data distribution using access characteristics can reduce the amount of data movement and therefore improve join performance. This works well as long as there is a consistent number of duplicate values or only a small number of duplicate values. The logical data model needs to be extended with usage information in order to know the best way to distribute the data rows. This is done during the physical implementation phase, before creating tables.

Secondary Index

A Secondary Index (SI) is used in Teradata as a way to directly access rows in the data, sometimes called the base table, without requiring the use of PI values. Unlike the PI, an SI does not affect the distribution of the data rows. Instead, it is an alternate read path that provides a way to locate the PI value using the SI. Once the PI is obtained, the row can be directly accessed using the PI. Like the PI, an SI can consist of up to 16 columns. In order for an SI to retrieve the data row by way of the PI, it must store and retrieve an index row. To accomplish this, Teradata creates, maintains and uses a subtable.
The PI of the subtable is the value in the column(s) defined as the SI. The "data" stored in the subtable row is the previously hashed value of the real PI for the data row or rows in the base table. In effect, the SI is a pointer to the real data row desired by the request. An SI can be unique (USI, pronounced "you-sea") or non-unique (NUSI, pronounced "new-sea"). The rows of the subtable contain the row hash value of the SI, the actual data value(s) of the SI, and the row hash value of the PI as the row ID. Once the row ID of the PI is obtained from the subtable row, using the hashed value of the SI, the last step is to get the actual data row from the AMP where it is stored. The action and hashing for an SI are exactly the same as when starting with a PI.

When using a USI, the access of the subtable is a one-AMP operation, and accessing the data row from the base table is another one-AMP operation. Therefore, a USI access is a two-AMP operation based on two separate row hash operations (occasionally both rows happen to reside on the same AMP).

When using a NUSI, the subtable access is always an all-AMP operation. Since the data is distributed by the PI, NUSI duplicate values may exist, and probably do exist, on multiple AMPs. So, the best plan is to go to all AMPs and check for the requested NUSI value. To make this efficient, each AMP scans its own subtable. These subtable rows contain the row hash of the NUSI, the data value that created the NUSI and one or more row IDs for all the PI rows on that AMP. This is still a fast operation because these rows are quite small and several are stored in a single block. If an AMP determines that it contains no rows for the requested NUSI value, it is finished with its portion of the request. However, if an AMP has one or more rows with the requested NUSI value, it then retrieves those data rows into spool space using the index.
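The two access paths can be modeled with plain dictionaries. This is a conceptual sketch only: the stand-in hash, the four AMPs, a USI on Last_Name and a NUSI on Class_Code are assumptions chosen for illustration, and real subtables store row hashes and row IDs rather than Python keys. It counts how many AMPs each lookup touches.

```python
import hashlib
from collections import defaultdict

NUM_AMPS = 4  # hypothetical system size

def amp_for(value):
    """Stand-in hash-to-AMP mapping (not Teradata's hashing algorithm)."""
    digest = hashlib.sha256(repr(value).encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS

# (Student_ID as the PI, Last_Name, Class_Code) from Figure 2-1, minus Johnson.
ROWS = [
    (123250, "Phillips", "SR"), (125634, "Hanson", "FR"), (234121, "Thomas", "FR"),
    (231222, "Wilson", "SO"), (280023, "McRoberts", "JR"), (322133, "Bond", "JR"),
    (324652, "Delaney", "SR"), (333450, "Smith", "SO"), (423400, "Larkins", "FR"),
]

# Base table: each row lives on the AMP chosen by hashing its PI.
base = defaultdict(dict)                       # amp -> {pi: row}
for row in ROWS:
    base[amp_for(row[0])][row[0]] = row

# USI: subtable rows are themselves distributed by hashing the SI value.
usi = defaultdict(dict)                        # amp -> {si_value: pi}
for pi, last_name, _ in ROWS:
    usi[amp_for(last_name)][last_name] = pi

# NUSI: each AMP indexes only the base rows it owns.
nusi = defaultdict(lambda: defaultdict(list))  # amp -> {si_value: [pi, ...]}
for amp, amp_rows in base.items():
    for pi, _, class_code in amp_rows.values():
        nusi[amp][class_code].append(pi)

def usi_lookup(last_name):
    """Hash the SI to one AMP, read the PI there, then hash the PI to one more AMP."""
    si_amp = amp_for(last_name)
    touched = {si_amp}
    pi = usi[si_amp].get(last_name)
    if pi is None:
        return None, touched
    data_amp = amp_for(pi)
    touched.add(data_amp)
    return base[data_amp][pi], touched

def nusi_lookup(class_code):
    """Every AMP checks its own subtable and reads only its own matching rows."""
    found, touched = [], set()
    for amp in range(NUM_AMPS):
        touched.add(amp)
        for pi in nusi[amp].get(class_code, []):
            found.append(base[amp][pi])
    return found, touched

row, usi_amps = usi_lookup("Smith")
rows, nusi_amps = nusi_lookup("FR")
print("USI:", row, "AMPs touched:", len(usi_amps))
print("NUSI:", sorted(r[1] for r in rows), "AMPs touched:", len(nusi_amps))
```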
With this said, the SQL optimizer may decide that there are too many base table data rows to make index access efficient. When this happens, the AMPs do a full base table scan to locate the data rows and ignore the NUSI. This situation is called a weakly selective NUSI. Even with old-fashioned indexed sequential files, it has always been more efficient to read the entire file and not use an index if more than 15% of the records were needed. This is compounded with Teradata because the "file" is read in parallel instead of as a single file. So, for the NUSI to be used, the qualifying rows should probably be fewer than about 3% of all the rows. If the SQL does not use a NUSI, you should consider dropping it, because the subtable takes up PERM space with no benefit to the users. The Teradata EXPLAIN is covered in this book, and it is the easiest way to determine whether your SQL is using a NUSI. Furthermore, the optimizer will never use a NUSI without STATISTICS.

There has been another evolution in the use of NUSI processing, called NUSI Bitmapping. If a table has two different NUSIs that are individually weakly selective, but that can be bitmapped together to eliminate most of the non-conforming rows, the optimizer will use the two NUSI columns together because, combined, they become highly selective. Therefore, many times, it is better to use smaller individual NUSIs instead of a large composite (more than one column) NUSI.

There is another feature related to NUSI processing that can improve access time when a value range comparison is requested. When using hash values, it is impossible to determine which values fall within a range. This is because large data values can generate small hash values and small data values can produce large hash values. So, to overcome the issue associated with hashed values, there is a range feature called Value Ordered NUSIs.
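The idea behind NUSI Bitmapping can be illustrated with Python integers used as bitmaps. This is a conceptual sketch, not Teradata's internal structure: the 10,000-row table and the two predicates are hypothetical, and the 3% figure is the rule of thumb from the paragraph above. Each condition alone qualifies about 10% of the rows (weakly selective), but ANDing the bitmaps leaves under 1%.

```python
ROW_COUNT = 10_000  # hypothetical table size

def bitmap(predicate):
    """Build a bitmap (as a Python int) with bit i set when row i qualifies."""
    bits = 0
    for row_id in range(ROW_COUNT):
        if predicate(row_id):
            bits |= 1 << row_id
    return bits

def popcount(bits):
    """Number of qualifying rows in a bitmap."""
    return bin(bits).count("1")

# Two hypothetical, individually weak NUSI conditions.
nusi_a = bitmap(lambda rid: rid % 10 == 0)   # about 10% of rows
nusi_b = bitmap(lambda rid: rid % 11 == 0)   # about 9% of rows
both = nusi_a & nusi_b                       # AND the two bitmaps together

for name, bits in [("NUSI A", nusi_a), ("NUSI B", nusi_b), ("A AND B", both)]:
    count = popcount(bits)
    print(f"{name}: {count} rows ({count / ROW_COUNT:.2%})")
```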
At this time, it may only be used with a four-byte or smaller numeric data column. Based on its functionality, a Value Ordered NUSI is perfect for date processing. See the DDL chapter in this book for more details on USI and NUSI usage.
Determining the Release of Your Teradata System:

SELECT * FROM DBC.DBCINFO;

InfoKey    InfoData
RELEASE    V2R.04.00.02.26
VERSION    04.00.02.27
Fundamental Structured Query Language (SQL)

The access language for all modern relational database systems (RDBMS) is Structured Query Language (SQL). It has evolved over time to become the standard. The ANSI SQL group defines which commands and functionality all vendors should provide within their RDBMS. There are three levels of compliance within the standard: Entry, Intermediate and Full. The three level definitions are based on specific commands, data types and functionalities. So, it is not that a vendor has incorporated some percentage of the commands; rather, each command is categorized as belonging to one of the three levels. For instance, most data types are Entry level compliant, yet some fall into the Intermediate and Full definitions. Since the standard continues to grow with more options being added, it is difficult to stay fully ANSI compliant. Additionally, all RDBMS vendors provide extra functionality and options that are not part of the standard. These extra functions are called extensions because they extend or offer a benefit beyond those in the standard definition.

At the writing of this book, Teradata was fully ANSI Entry level compliant based on the 1992 Standards document. NCR also provides much of the Intermediate and some of the Full capabilities. This book indicates feature by feature which SQL capabilities are ANSI and which are Teradata specific, or extensions. It is to NCR's benefit to be as compliant as possible in order to make it easier for customers of other RDBMS vendors to port their data warehouse to Teradata.

As indicated earlier, SQL is used to access, store, remove and modify data stored within a relational database, like Teradata. SQL is actually composed of three types of statements: Data Definition Language (DDL), Data Control Language (DCL) and Data Manipulation Language (DML). The primary focus of this book is on DML and DDL.
Both DDL and DCL are, for the most part, used for administering an RDBMS. Since the SELECT statement is used the vast majority of the time, we are concentrating on its functionality, variations and capabilities. Everything in the first part of this chapter describes ANSI standard capabilities of the SELECT command. As the statements become more involved, each capability will be designated as either ANSI or a Teradata Extension.
Basic SELECT Command

Using the SELECT has been described as playing the game Jeopardy. The answer is there; all you have to do is come up with the correct question. The basic structure of the SELECT statement indicates which column values are desired and the tables that contain them. To aid in the learning of SQL, this book capitalizes the SQL keywords. However, when SQL is written for Teradata, the case of the statement is not important. The SQL statements can be written using all uppercase, all lowercase or a combination; it does not matter to the Teradata PE.

The SELECT is used to return the data value(s) stored in the columns named within the SELECT command. The requested columns must be valid names defined in the table(s) listed in the FROM portion of the SELECT. The following shows the format of a basic SELECT statement. In this book, the syntax uses expressions like <column-name> (see Figure 1-1) to represent the location of one or more names required to construct a valid SQL statement:

The structure of the above command places all keywords on the left in uppercase and the variable information, such as column and table names, to the right. Like using capital letters, this positioning is meant to aid in learning SQL. Also, although the use of SEL is acceptable in Teradata, with [ECT] in square brackets being optional, it is not ANSI standard. Lastly, when multiple column names are requested in the SELECT, a comma must separate them. Without the separator, the optimizer cannot determine where one name ends and the next begins.

The following syntax format is also acceptable:

SEL[ECT] <column-name> FROM <table-name> ;

Both formats produce the same output report, but the first style is easier to read and debug for complex queries. The output display might appear as:

3 Rows Returned

<column-name>
aaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbb
cccccccccccccccccc

In the output, the column name becomes the default heading for the report.
Then, the data contained in the selected column is displayed once for each row returned.

The next variation of the SELECT statement returns all of the columns defined in the table indicated in the FROM portion of the SELECT. The output of this request uses each column name as a heading, and the columns are displayed in the same sequence as they are defined in the table. Depending on the tool used to submit the request, care should be taken, because if the returned display is wider than the media (for example, 80 characters on a terminal or 133 on paper), it may be truncated.

At times, it is desirable to select the same column twice. This is permitted; to accomplish it, the column name is simply listed in the SELECT column list more than once. This technique is often used when doing aggregations or calculating a value, both of which are covered in later chapters.

The table below is used to demonstrate the results of various requests. It is a small table with a total of ten rows for easy comparison.

Student Table - contains 10 students
(Key and index markers shown in the figure: PK, FK, UPI, NUSI, NUSI)

Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
123250      Phillips   Martin      SR          3.00
125634      Hanson     Henry       FR          2.88
234121      Thomas     Wendy       FR          4.00
231222      Wilson     Susie       SO          3.80
260000      Johnson    Stanley
280023      McRoberts  Richard     JR          1.90
322133      Bond       Jimmy       JR          3.95
324652      Delaney    Danny       SR          3.35
333450      Smith      Andy        SO          2.00
423400      Larkins    Michael     FR          0.00
Figure 2-1

For example, the next SELECT might be used with Figure 2-1 to display the student number, last name, first name, class code and grade point for all of the students in the Student table:

SELECT * FROM Student_Table ;

10 Rows returned

Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
423400      Larkins    Michael     FR          0.00
125634      Hanson     Henry       FR          2.88
280023      McRoberts  Richard     JR          1.90
260000      Johnson    Stanley     ?           ?
231222      Wilson     Susie       SO          3.80
234121      Thomas     Wendy       FR          4.00
324652      Delaney    Danny       SR          3.35
123250      Phillips   Martin      SR          3.00
322133      Bond       Jimmy       JR          3.95
333450      Smith      Andy        SO          2.00

Notice that Johnson has question marks in the grade point and class code columns. Most client software uses the question mark to represent missing data or an unknown value (NULL). More discussion on this condition appears throughout this book. The other thing to note is that character data is left-aligned, the same way we read it, while numeric data is right-aligned on the decimal point.

This SELECT returns all of the columns except the Student ID from the Student table:

10 Rows returned

First_Name  Last_Name  Class_Code  Grade_Pt
Michael     Larkins    FR          0.00
Henry       Hanson     FR          2.88
Richard     McRoberts  JR          1.90
Stanley     Johnson    ?           ?
Susie       Wilson     SO          3.80
Wendy       Thomas     FR          4.00
Danny       Delaney    SR          3.35
Martin      Phillips   SR          3.00
Jimmy       Bond       JR          3.95
Andy        Smith      SO          2.00

There is no shortcut for selecting all columns except one or two. Also, notice that the columns are displayed in the output in the same sequence they are requested in the SELECT statement.
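To experiment with these requests without a Teradata system, the Student table from Figure 2-1 can be loaded into SQLite through Python's standard `sqlite3` module. This is a sketch under an explicit assumption: SQLite is only a stand-in, so display details (for example, NULL comes back as `None`, not `?`) and some semantics differ from Teradata. The helper name `report` is hypothetical.

```python
import sqlite3

# Figure 2-1 loaded into SQLite, which stands in for Teradata here.
STUDENTS = [
    (123250, "Phillips", "Martin", "SR", 3.00),
    (125634, "Hanson", "Henry", "FR", 2.88),
    (234121, "Thomas", "Wendy", "FR", 4.00),
    (231222, "Wilson", "Susie", "SO", 3.80),
    (260000, "Johnson", "Stanley", None, None),  # NULL class code and grade
    (280023, "McRoberts", "Richard", "JR", 1.90),
    (322133, "Bond", "Jimmy", "JR", 3.95),
    (324652, "Delaney", "Danny", "SR", 3.35),
    (333450, "Smith", "Andy", "SO", 2.00),
    (423400, "Larkins", "Michael", "FR", 0.00),
]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Student_Table (Student_ID INTEGER, Last_Name TEXT, "
            "First_Name TEXT, Class_Code TEXT, Grade_Pt REAL)")
con.executemany("INSERT INTO Student_Table VALUES (?, ?, ?, ?, ?)", STUDENTS)

def report(sql):
    """Return (column headings, rows), the two things a report tool displays."""
    cursor = con.execute(sql)
    headings = [column[0] for column in cursor.description]
    return headings, cursor.fetchall()

all_headings, all_rows = report("SELECT * FROM Student_Table")
some_headings, some_rows = report(
    "SELECT First_Name, Last_Name, Class_Code, Grade_Pt FROM Student_Table")
_, twice_rows = report("SELECT Last_Name, Last_Name FROM Student_Table")

print(" ".join(all_headings))
for row in all_rows:
    # SQLite hands back None for NULL; many Teradata clients display '?'.
    print(" ".join("?" if value is None else str(value) for value in row))
```

The column headings follow the SELECT list order, which is why `some_headings` starts with First_Name even though the table defines Student_ID first.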
WHERE Clause

The previous "unconstrained" SELECT statement returned every row from the table. Since the Teradata database is most often used as a data warehouse, a table might contain millions of rows. So, it is wise to request only certain rows for return. Adding a WHERE clause to the SELECT establishes a constraint that limits the returned rows to those for which the comparison against specific criteria, or a set of conditions, is TRUE. The conditional check in the WHERE can use the following comparison operators (the symbols are ANSI; the alphabetic forms are Teradata Extensions):

Comparison              ANSI   Teradata Extension
Equal                   =      EQ
Not Equal               <>     NE
Less Than               <      LT
Greater Than            >      GT
Less Than or Equal      <=     LE
Greater Than or Equal   >=     GE
Figure 2-2

The following SELECT can be used to return the students with a B (3.0) average or better from the Student table:

5 Rows returned

Student_ID  Last_Name  Grade_Pt
231222      Wilson     3.80
234121      Thomas     4.00
324652      Delaney    3.35
123250      Phillips   3.00
322133      Bond       3.95

Without the WHERE clause, the AMPs return all of the rows in the table to the user. More and more Teradata user systems are reaching the point where they store billions of rows in a single table, and there must be a very good reason for needing to see all of them. More simply put, you will always use a WHERE clause whenever you want to see only a portion of the rows in a table.
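The B-average request can be reproduced in SQLite as a quick check of the 5-row result. This is a sketch assuming SQLite as a stand-in for Teradata; note that SQLite accepts only the symbolic form (`>=`), not the Teradata alphabetic extension `GE`. Johnson is not returned because a comparison against his NULL grade is not TRUE.

```python
import sqlite3

# Figure 2-1 loaded into SQLite, which stands in for Teradata here.
STUDENTS = [
    (123250, "Phillips", "Martin", "SR", 3.00), (125634, "Hanson", "Henry", "FR", 2.88),
    (234121, "Thomas", "Wendy", "FR", 4.00), (231222, "Wilson", "Susie", "SO", 3.80),
    (260000, "Johnson", "Stanley", None, None), (280023, "McRoberts", "Richard", "JR", 1.90),
    (322133, "Bond", "Jimmy", "JR", 3.95), (324652, "Delaney", "Danny", "SR", 3.35),
    (333450, "Smith", "Andy", "SO", 2.00), (423400, "Larkins", "Michael", "FR", 0.00),
]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Student_Table (Student_ID INTEGER, Last_Name TEXT, "
            "First_Name TEXT, Class_Code TEXT, Grade_Pt REAL)")
con.executemany("INSERT INTO Student_Table VALUES (?, ?, ?, ?, ?)", STUDENTS)

# Students with a B (3.0) average or better.
rows = con.execute(
    "SELECT Student_ID, Last_Name, Grade_Pt FROM Student_Table "
    "WHERE Grade_Pt >= 3.0"
).fetchall()
for student_id, last_name, grade in rows:
    print(student_id, last_name, f"{grade:.2f}")
```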
Compound Comparisons ( AND / OR )

Many times a single comparison is not sufficient to specify the desired rows. To add more functionality to the WHERE, it is common to use more than one comparison. Multiple condition checks are not separated by a comma, the way column names are. Instead, they must be connected using a logical operator. The following is the syntax for using the AND logical operator:

Notice that the column name is listed for each comparison, and the comparisons are separated by a logical operator; this is true even when the same column is being compared twice. The AND signifies that each individual comparison on both sides of the AND must be true. The final result of the comparison must be TRUE for a row to be returned. This Truth Table illustrates this point using AND:

First Test Result   AND   Second Test Result   Final Result
True                      True                 True
True                      False                False
False                     True                 False
False                     False                False
Figure 2-3

When using AND with equality comparisons, different columns must be used, because a single column can never contain more than one data value at a time. Therefore, it does not make good sense to issue the next SELECT, which uses an AND between two equality tests on the same column, because no rows will ever be returned:

No rows found

The above SELECT will never return any rows. No student has a 3.0 grade average AND a 4.0 average; a student might have one or the other, but never both at the same time. The AND operator indicates that both must be TRUE, so it should never be used between two equality comparisons on the same column. (An AND on the same column is perfectly sensible for a range, such as Grade_Pt >= 2.0 AND Grade_Pt <= 3.0.)

By substituting an OR logical operator for the previous AND, rows will now be returned. The following is the syntax for using OR:

2 Rows returned

Student_ID  Last_Name  First_Name  Grade_Pt
234121      Thomas     Wendy       4.00
123250      Phillips   Martin      3.00

The OR signifies that at least one of the comparisons on either side of the OR needs to be true for the entire test to result in a true and the row to be selected.
This Truth Table illustrates the results for the OR:

First Test Result   OR   Second Test Result   Final Result
True                     True                 True
True                     False                True
False                    True                 True
False                    False                False
Figure 2-4

When using the OR, the same column or different columns may be used. In this case, it makes sense to use the same column, because a row is returned when a column contains either of the specified values, as opposed to both values as seen with AND.

It is perfectly legal and common practice to combine the AND with the OR in a single SELECT statement. The next SELECT contains both an AND and an OR:

2 Rows returned

Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
234121      Thomas     Wendy       FR          4.00
123250      Phillips   Martin      SR          3.00

At first glance, it appears that the comparison worked correctly. However, upon closer evaluation it is incorrect, because Phillips is a senior and not a freshman. When mixing AND with OR in the same WHERE clause, it is important to know that the AND is evaluated first. The previous SELECT actually returns all rows with a grade point of 3.0; hence, Phillips was returned. The second comparison returned Thomas, with a grade point of 4.0 and a class code of 'FR'.

When it is necessary for the OR to be evaluated before the AND, parentheses change the priority of evaluation, and a different result is seen. Here is how the statement should be written:

1 Row returned

Last_Name  Class_Code  Grade_Pt
Thomas     FR          4.00

Now, only Thomas is returned and the output is correct.
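The AND/OR behavior in this section can be checked against SQLite, which, like Teradata, evaluates AND before OR. This is a sketch assuming SQLite as a stand-in; the exact Teradata statements were not preserved, so the WHERE conditions below are reconstructed from the described results, and `last_names` is a hypothetical helper that builds the SQL from trusted literals only.

```python
import sqlite3

# Figure 2-1 loaded into SQLite, which stands in for Teradata here.
STUDENTS = [
    (123250, "Phillips", "Martin", "SR", 3.00), (125634, "Hanson", "Henry", "FR", 2.88),
    (234121, "Thomas", "Wendy", "FR", 4.00), (231222, "Wilson", "Susie", "SO", 3.80),
    (260000, "Johnson", "Stanley", None, None), (280023, "McRoberts", "Richard", "JR", 1.90),
    (322133, "Bond", "Jimmy", "JR", 3.95), (324652, "Delaney", "Danny", "SR", 3.35),
    (333450, "Smith", "Andy", "SO", 2.00), (423400, "Larkins", "Michael", "FR", 0.00),
]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Student_Table (Student_ID INTEGER, Last_Name TEXT, "
            "First_Name TEXT, Class_Code TEXT, Grade_Pt REAL)")
con.executemany("INSERT INTO Student_Table VALUES (?, ?, ?, ?, ?)", STUDENTS)

def last_names(where):
    """Run SELECT Last_Name ... WHERE <where> and return the names as a set."""
    return {r[0] for r in con.execute(f"SELECT Last_Name FROM Student_Table WHERE {where}")}

same_column_and = last_names("Grade_Pt = 3.0 AND Grade_Pt = 4.0")   # never true
same_column_or = last_names("Grade_Pt = 3.0 OR Grade_Pt = 4.0")
range_and = last_names("Grade_Pt >= 2.0 AND Grade_Pt <= 3.0")      # valid same-column AND
and_first = last_names("Grade_Pt = 3.0 OR Grade_Pt = 4.0 AND Class_Code = 'FR'")
or_first = last_names("(Grade_Pt = 3.0 OR Grade_Pt = 4.0) AND Class_Code = 'FR'")
print(same_column_and, same_column_or, range_and, and_first, or_first, sep="\n")
```

Without parentheses, the condition groups as `Grade_Pt = 3.0 OR (Grade_Pt = 4.0 AND Class_Code = 'FR')`, which is why Phillips slips in.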
Impact of NULL on Compound Comparisons

NULL is an SQL reserved word. It represents missing or unknown data in a column. Since NULL is an unknown value, a normal comparison cannot be used to determine whether it is true or false. All comparisons of any value to a NULL result in an unknown; it is neither true nor false. The only valid test for a null uses the keyword NULL without the normal comparison symbols, and it is explained in this chapter.

When a table is created in Teradata, the default for a column is to allow a NULL value to be stored. So, unless the default is overridden and NULL values are not allowed, it is a good idea to understand how they work. A SHOW TABLE command (chapter 3) can be used to determine whether a NULL is allowed. If the column contains a NOT NULL constraint, you need not be concerned about the presence of a NULL, because it is disallowed.

This AND Truth Table must now be used for compound tests when NULL values are allowed:

First Test Result   AND   Second Test Result   Final Result
True                      Unknown              Unknown
Unknown                   True                 Unknown
False                     Unknown              False
Unknown                   False                False
Unknown                   Unknown              Unknown
Figure 2-5

This OR Truth Table must now be used for compound tests when NULL values are allowed:

First Test Result   OR   Second Test Result   Final Result
True                     Unknown              True
Unknown                  True                 True
False                    Unknown              Unknown
Unknown                  False                Unknown
Unknown                  Unknown              Unknown
Figure 2-6

For most comparisons, an unknown (null) is functionally equivalent to a false, because it is not a true. Therefore, a comparison against a NULL never qualifies a row by itself, whatever comparison symbol is used. For the same reason, the next SELECT does not return Johnson, because all comparisons against a NULL are unknown:

No rows found

V2R5: *** Failure 3731 The user must use IS NULL or IS NOT NULL to test for NULL values.

As seen in the above Truth tables, a comparison test cannot be used to find a NULL. To find a NULL, it becomes necessary to make a slight change in the syntax of the conditional comparison.
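Figures 2-5 and 2-6 describe three-valued logic, which can be written directly in Python with `None` standing for Unknown and then cross-checked against SQLite's own evaluation of `AND`/`OR`. This is a sketch: the helper names (`and3`, `or3`, `not3`, `sql_value`) are hypothetical, and the two-row table is a cut-down assumption. Unlike Teradata V2R5, SQLite accepts `= NULL` and simply matches nothing, rather than failing with error 3731.

```python
import itertools
import sqlite3

T, F, U = True, False, None  # U (None) plays the role of Unknown

def and3(a, b):
    """Three-valued AND as in Figure 2-5."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    """Three-valued OR as in Figure 2-6."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def not3(a):
    """NOT flips True and False but leaves Unknown as Unknown."""
    return None if a is None else not a

con = sqlite3.connect(":memory:")

def sql_value(expr, a, b):
    """Evaluate an SQL expression in SQLite; 1/0/NULL map back to True/False/None."""
    result = con.execute(f"SELECT {expr}", (a, b)).fetchone()[0]
    return None if result is None else bool(result)

# Every combination where SQLite disagrees with the truth tables (expected: none).
mismatches = [
    (a, b)
    for a, b in itertools.product([T, F, U], repeat=2)
    if sql_value("? AND ?", a, b) != and3(a, b) or sql_value("? OR ?", a, b) != or3(a, b)
]

con.execute("CREATE TABLE Student_Table (Last_Name TEXT, Grade_Pt REAL)")
con.executemany("INSERT INTO Student_Table VALUES (?, ?)", [("Johnson", None), ("Smith", 2.0)])
# SQLite accepts "= NULL" (it matches nothing); Teradata V2R5 rejects it with Failure 3731.
equals_null = con.execute("SELECT Last_Name FROM Student_Table WHERE Grade_Pt = NULL").fetchall()
is_null = con.execute("SELECT Last_Name FROM Student_Table WHERE Grade_Pt IS NULL").fetchall()
print(mismatches, equals_null, is_null)
```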
The coding necessary to find a NULL is seen in the next section.
Using NOT in SQL Comparisons

It can be fairly straightforward to request exactly which rows are needed. However, sometimes the rows needed are those that contain any value other than a specific value. When this is the case, it might be easier to write the SELECT to find what is not needed instead of what is needed, and then convert it to return everything else. This might be the situation when there are 100 potential values stored in the database table and 99 of them are needed. It is easier to eliminate the one value than it is to list the 99 desired values individually.

Either of the next two SELECT formats can be used to accomplish the elimination of the one value:

The second version of the SELECT is normally used when compound conditions are required. This is because it is usually easier to code the SELECT to get what is not wanted, enclose the entire set of comparisons in parentheses and put one NOT in front of it. Otherwise, with a single comparison, it is easier to put NOT in front of the comparison operator without requiring the use of parentheses.

The next SELECT uses the NOT with an AND comparison to display seniors, and lower classmen with grade points less than 3.0:

6 Rows returned

Last_Name  First_Name  Class_Code  Grade_Pt
McRoberts  Richard     JR          1.90
Hanson     Henry       FR          2.88
Delaney    Danny       SR          3.35
Larkins    Michael     FR          0.00
Phillips   Martin      SR          3.00
Smith      Andy        SO          2.00

Without the above technique of a single NOT, it is necessary to change every individual comparison. The following SELECT shows this approach; notice the other change necessary below: NOT AND becomes an OR. Since you cannot have conditions like NOT >= and NOT <>, they must be converted to < (neither > nor =) and = (not, not =). It returns the same 6 rows, but also notice that the AND is now an OR:

6 Rows returned

Last_Name  First_Name  Class_Code  Grade_Pt
McRoberts  Richard     JR          1.90
Hanson     Henry       FR          2.88
Delaney    Danny       SR          3.35
Phillips   Martin      SR          3.00
Larkins    Michael     FR          0.00
Smith      Andy        SO          2.00

Chart of individual conditions and NOT:

Condition   Opposite condition   NOT condition
>=          <                    NOT >=
<>          =                    NOT <>
AND         OR                   OR
OR          AND                  AND
Figure 2-7

To maintain the integrity of the statement, all portions of the WHERE must be changed, including AND as well as OR. The following two SELECT statements illustrate the same concept when using an OR:

1 Row returned

Last_Name
Hanson

In the earlier Truth tables, the NULL value returned an unknown when checked with a comparison operator. When looking for specific conditions, an unknown was functionally equivalent to a false, but really it is an unknown. These two Truth tables can be used together as a tool when mixing AND and OR in the WHERE clause along with NOT.

This Truth Table helps to gauge returned rows when using NOT with AND:

First Test Result        AND   Second Test Result       Result
NOT(True) = False              NOT(Unknown) = Unknown   False
NOT(Unknown) = Unknown         NOT(True) = False        False
NOT(False) = True              NOT(Unknown) = Unknown   Unknown
NOT(Unknown) = Unknown         NOT(False) = True        Unknown
NOT(Unknown) = Unknown         NOT(Unknown) = Unknown   Unknown
Figure 2-8

This Truth Table can be used to gauge returned rows when using NOT with OR:

First Test Result        OR    Second Test Result       Result
NOT(True) = False              NOT(Unknown) = Unknown   Unknown
NOT(Unknown) = Unknown         NOT(True) = False        Unknown
NOT(False) = True              NOT(Unknown) = Unknown   True
NOT(Unknown) = Unknown         NOT(False) = True        True
NOT(Unknown) = Unknown         NOT(Unknown) = Unknown   Unknown
Figure 2-9

There is an issue associated with using NOT. When a NOT is done on a true condition, the result is false. Likewise, the NOT of a false is true. However, when a NOT is done on an unknown, the result is still unknown. Whenever a NULL appears in the data for any of the columns being compared, the row can be excluded and the answer set may not be what is expected. Another area where care must be taken is when allowing NULL values to be stored in one or both of the columns.
As mentioned earlier, previous versions of Teradata had no concept of "unknown"; if a compare didn't result in a true, it was false. With the emphasis on ANSI compatibility, the unknown was introduced. If NULL values are allowed and there is potential for a NULL to impact the final outcome of compound tests, additional tests are required to eliminate them. One way to eliminate this concern is to never allow a NULL value in any column. However, this may not be appropriate, and it will require more storage space because a NULL can be compressed. Therefore, when a NULL is allowed, the SQL simply needs to check for it. Using the expression IS NOT NULL is a good technique when NULL is allowed in a column and the NOT is used with a single or a compound comparison. This does require another comparison and could be written as:

7 Rows returned

Last_Name  First_Name  Class_Code  Grade_Pt
Larkins    Michael     FR          0.00
Hanson     Henry       FR          2.88
McRoberts  Richard     JR          1.90
Johnson    Stanley     ?           ?
Delaney    Danny       SR          3.35
Phillips   Martin      SR          3.00
Smith      Andy        SO          2.00

Notice that Johnson came back this time; he did not appear previously because of the NULL values. Later in this book, COALESCE will be explored as another way to eliminate NULL values directly in the SQL instead of in the database.
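The NOT rewrites from this section can be verified in SQLite. This is a sketch with two assumptions: SQLite stands in for Teradata, and because the book's exact statements were not preserved, the conditions below are reconstructions that reproduce the described 6-row and 7-row answer sets. The first two queries show that one NOT around the whole condition equals flipping each comparison and turning AND into OR; the third adds an IS NOT NULL test so that Johnson's unknowns turn into a false that NOT can flip.

```python
import sqlite3

# Figure 2-1 loaded into SQLite, which stands in for Teradata here.
STUDENTS = [
    (123250, "Phillips", "Martin", "SR", 3.00), (125634, "Hanson", "Henry", "FR", 2.88),
    (234121, "Thomas", "Wendy", "FR", 4.00), (231222, "Wilson", "Susie", "SO", 3.80),
    (260000, "Johnson", "Stanley", None, None), (280023, "McRoberts", "Richard", "JR", 1.90),
    (322133, "Bond", "Jimmy", "JR", 3.95), (324652, "Delaney", "Danny", "SR", 3.35),
    (333450, "Smith", "Andy", "SO", 2.00), (423400, "Larkins", "Michael", "FR", 0.00),
]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Student_Table (Student_ID INTEGER, Last_Name TEXT, "
            "First_Name TEXT, Class_Code TEXT, Grade_Pt REAL)")
con.executemany("INSERT INTO Student_Table VALUES (?, ?, ?, ?, ?)", STUDENTS)

def last_names(where):
    """Run SELECT Last_Name ... WHERE <where> and return the names as a set."""
    return {r[0] for r in con.execute(f"SELECT Last_Name FROM Student_Table WHERE {where}")}

# One NOT around the whole compound condition...
single_not = last_names("NOT (Grade_Pt >= 3.0 AND Class_Code <> 'SR')")
# ...equals flipping every comparison and turning the AND into an OR.
rewritten = last_names("Grade_Pt < 3.0 OR Class_Code = 'SR'")
# Adding IS NOT NULL makes Johnson's row evaluate to NOT(False) = True.
null_safe = last_names(
    "NOT (Grade_Pt >= 3.0 AND Class_Code <> 'SR' AND Grade_Pt IS NOT NULL)")
print(sorted(single_not), sorted(rewritten), sorted(null_safe), sep="\n")
```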
Multiple Value Search (IN)

Previously, it was shown that adding a WHERE clause to the SELECT limited the returned rows to those that meet the criteria. The IN comparison is an alternative to using one or more OR comparisons on the same column in the WHERE clause of a SELECT statement, and it also makes the request a bit easier to code:

The value list normally consists of multiple values separated by commas. When the value in the column being compared matches one of the values in the list, the row is returned. The following is an example of the alternative method using IN, when any one of the conditions is enough to satisfy the request:

3 Rows returned

Last_Name  Class_Code  Grade_Pt
Phillips   SR          3.00
Thomas     FR          4.00
Smith      SO          2.00

Multiple conditional checks and the IN can be used in the same SELECT request. Considerations include the use of AND for declaring that multiple conditions must all be true. Earlier, we saw the solution using a compound OR.

Using NOT IN

As seen earlier, sometimes the wanted values are not known, or it is easier to eliminate a few values than to specify all the values needed. When this is the case, it is a common practice to use NOT IN as coded below. The next statement eliminates the rows that match and returns those that do not match:

6 Rows returned

Last_Name  Grade_Pt
McRoberts  1.90
Hanson     2.88
Wilson     3.80
Delaney    3.35
Larkins    0.00
Bond       3.95

The following SELECT is a better way to make sure that all rows are returned when using a NOT IN:

7 Rows returned

Last_Name  Class_Code  Grade_Pt
Larkins    FR          0.00
Hanson     FR          2.88
McRoberts  JR          1.90
Johnson    ?           ?
Wilson     SO          3.80
Delaney    SR          3.35
Bond       JR          3.95

Notice that Johnson came back in this list but not in the previous request using the NOT IN. You may be thinking that if the NULL reserved word is used within the IN list, it will cover the situation. Unfortunately, a comparison against NULL always returns an unknown. Therefore, the next request will NEVER return any rows:

No Rows found

Making this mistake will cause no rows to ever be returned. A NOT IN is evaluated as a series of not-equal comparisons joined by AND. The comparison against the NULL in the list is always unknown, and the AND Truth table shows that True AND Unknown is Unknown, so no row can ever evaluate to true. If you are not sure about this, do an EXPLAIN (chapter 3) of a NOT IN with a subquery to see that the AMP step will actually be skipped when a NULL exists in the list. There are also extra AMP steps to compensate for this condition. It makes the SQL VERY inefficient.
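The IN and NOT IN results, including the NULL-in-the-list trap, can be reproduced in SQLite, which follows the same rule: `x NOT IN (..., NULL)` is never true. This is a sketch assuming SQLite as a stand-in for Teradata; the value list (2.0, 3.0, 4.0) is inferred from the returned rows, since the original statements were not preserved.

```python
import sqlite3

# Figure 2-1 loaded into SQLite, which stands in for Teradata here.
STUDENTS = [
    (123250, "Phillips", "Martin", "SR", 3.00), (125634, "Hanson", "Henry", "FR", 2.88),
    (234121, "Thomas", "Wendy", "FR", 4.00), (231222, "Wilson", "Susie", "SO", 3.80),
    (260000, "Johnson", "Stanley", None, None), (280023, "McRoberts", "Richard", "JR", 1.90),
    (322133, "Bond", "Jimmy", "JR", 3.95), (324652, "Delaney", "Danny", "SR", 3.35),
    (333450, "Smith", "Andy", "SO", 2.00), (423400, "Larkins", "Michael", "FR", 0.00),
]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Student_Table (Student_ID INTEGER, Last_Name TEXT, "
            "First_Name TEXT, Class_Code TEXT, Grade_Pt REAL)")
con.executemany("INSERT INTO Student_Table VALUES (?, ?, ?, ?, ?)", STUDENTS)

def last_names(where):
    """Run SELECT Last_Name ... WHERE <where> and return the names as a set."""
    return {r[0] for r in con.execute(f"SELECT Last_Name FROM Student_Table WHERE {where}")}

in_list = last_names("Grade_Pt IN (2.0, 3.0, 4.0)")
or_chain = last_names("Grade_Pt = 2.0 OR Grade_Pt = 3.0 OR Grade_Pt = 4.0")
not_in = last_names("Grade_Pt NOT IN (2.0, 3.0, 4.0)")
not_in_or_null = last_names("Grade_Pt NOT IN (2.0, 3.0, 4.0) OR Grade_Pt IS NULL")
not_in_with_null = last_names("Grade_Pt NOT IN (2.0, 3.0, 4.0, NULL)")  # the trap
print(in_list, not_in, not_in_or_null, not_in_with_null, sep="\n")
```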
Using Quantifiers Versus IN

There is another alternative to using the IN. Quantifiers allow normal comparison operators to be used without requiring compound conditional checks. The following is equivalent to an IN:

This next request uses ANY instead of IN:

3 Rows returned

Last_Name  Class_Code  Grade_Pt
Phillips   SR          3.00
Thomas     FR          4.00
Smith      SO          2.00

Using a quantifier, the equivalent to a NOT IN is:

Notice that, as when adding a NOT to a compound condition, all elements need to be changed here as well. To reverse the = ANY, it becomes NOT = ALL. This is important, because NOT = ANY selects all the rows except those containing a NULL. The reason is that as soon as a value is not equal to any one of the values in the list, the row is returned. The following SELECT is converted from an earlier NOT IN:

6 Rows returned

Last_Name  Grade_Pt
McRoberts  1.90
Larkins    0.00
Hanson     2.88
Wilson     3.80
Delaney    3.35
Bond       3.95
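SQLite has no `= ANY (list)` syntax, so the quantifier rules are sketched here in plain Python with three-valued logic: ANY behaves like a chain of ORs and ALL like a chain of ANDs, with `None` standing for Unknown. The helper names and the grade dictionary (taken from Figure 2-1) are illustrative assumptions, not Teradata APIs. The sketch shows why `NOT = ALL` matches NOT IN, why `NOT = ANY` returns every non-NULL row, and why a NULL in the list makes `NOT = ALL` return nothing.

```python
import operator

def compare(x, y, op):
    """A single comparison: Unknown (None) if either side is NULL."""
    if x is None or y is None:
        return None
    return op(x, y)

def quantified_any(x, values, op):
    """x <op> ANY (values): an OR-chain of comparisons."""
    result = False
    for v in values:
        c = compare(x, v, op)
        if c is True:
            return True
        if c is None:
            result = None
    return result

def quantified_all(x, values, op):
    """x <op> ALL (values): an AND-chain of comparisons."""
    result = True
    for v in values:
        c = compare(x, v, op)
        if c is False:
            return False
        if c is None:
            result = None
    return result

grades = {"Phillips": 3.00, "Hanson": 2.88, "Thomas": 4.00, "Wilson": 3.80,
          "Johnson": None, "McRoberts": 1.90, "Bond": 3.95, "Delaney": 3.35,
          "Smith": 2.00, "Larkins": 0.00}
LIST = [2.0, 3.0, 4.0]

def names_where(test):
    """Keep only rows whose result is True; Unknown rows are not returned."""
    return {name for name, g in grades.items() if test(g) is True}

eq_any = names_where(lambda g: quantified_any(g, LIST, operator.eq))  # like IN
ne_all = names_where(lambda g: quantified_all(g, LIST, operator.ne))  # like NOT IN
ne_any = names_where(lambda g: quantified_any(g, LIST, operator.ne))  # every non-NULL row
ne_all_with_null = names_where(lambda g: quantified_all(g, LIST + [None], operator.ne))
print(sorted(eq_any), sorted(ne_all), sorted(ne_any), ne_all_with_null, sep="\n")
```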
Multiple Value Range Search (BETWEEN)

The BETWEEN comparison can be used as another technique to request multiple values for a column that are all in a specific range. It is easier than writing a compound OR comparison, or a long value list of sequential numbers when using the IN.

This is a good time to point out that this chapter is incrementally adding new ways to compare values within a WHERE clause. However, all of these techniques can be used together in a single WHERE clause. One method does not eliminate the ability to use one or more of the others, with logical operators between each comparison.

The next SELECT shows the syntax format for using the BETWEEN:

The first and second values specified are inclusive for the purposes of the search. In other words, when these values are found in the data, the rows are included in the output. As an example, the following code returns all students whose grade points are 2.0, 4.0 or any value between them:

7 Rows returned

Grade_Pt
3.00
2.88
4.00
3.80
3.95
3.35
2.00

Notice that, due to the inclusive nature of the BETWEEN, both 2.0 and 4.0 were included in the answer set. The first value of the BETWEEN must be the lower value; otherwise, no rows will be returned. This is because it looks for all values that are greater than or equal to the first value and less than or equal to the second value.

A BETWEEN can also be used to search for character values. When doing this, care must be taken to ensure that the rows received contain the values that are needed. The system can only compare character values that are the same length. So, if one column or value is shorter than the other, the shorter one is automatically padded with spaces out to the length of the longer value. Comparing 'CA' and 'CALIFORNIA' never constitutes a match. In reality, the database is comparing 'CA        ' with 'CALIFORNIA', and they are not equal. Sometimes, it is easier to use the LIKE comparison operator, which is covered in the next section.
Although LIKE is easier to code, it is not always faster to execute. There is always a trade-off to consider. The next SELECT finds all of the students whose last name starts with an L:

1 Row returned

Last_Name
Larkins

In reality, the WHERE could have used BETWEEN 'L' AND 'M', as long as no student's last name was 'M'. The data needs to be understood when using BETWEEN for character comparisons.
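The BETWEEN rules can be checked in SQLite, plus a small Python helper that imitates the blank-padding comparison described above. Two assumptions apply: SQLite stands in for Teradata, and SQLite itself does not pad strings (trailing spaces are significant there), so `padded_equal` is a hypothetical illustration of the padding rule rather than SQLite or Teradata code.

```python
import sqlite3

# Figure 2-1 loaded into SQLite, which stands in for Teradata here.
STUDENTS = [
    (123250, "Phillips", "Martin", "SR", 3.00), (125634, "Hanson", "Henry", "FR", 2.88),
    (234121, "Thomas", "Wendy", "FR", 4.00), (231222, "Wilson", "Susie", "SO", 3.80),
    (260000, "Johnson", "Stanley", None, None), (280023, "McRoberts", "Richard", "JR", 1.90),
    (322133, "Bond", "Jimmy", "JR", 3.95), (324652, "Delaney", "Danny", "SR", 3.35),
    (333450, "Smith", "Andy", "SO", 2.00), (423400, "Larkins", "Michael", "FR", 0.00),
]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Student_Table (Student_ID INTEGER, Last_Name TEXT, "
            "First_Name TEXT, Class_Code TEXT, Grade_Pt REAL)")
con.executemany("INSERT INTO Student_Table VALUES (?, ?, ?, ?, ?)", STUDENTS)

def last_names(where):
    """Run SELECT Last_Name ... WHERE <where> and return the names as a set."""
    return {r[0] for r in con.execute(f"SELECT Last_Name FROM Student_Table WHERE {where}")}

between = last_names("Grade_Pt BETWEEN 2.0 AND 4.0")          # inclusive bounds
reversed_bounds = last_names("Grade_Pt BETWEEN 4.0 AND 2.0")  # lower value must come first
starts_with_l = last_names("Last_Name BETWEEN 'L' AND 'M'")

def padded_equal(a, b):
    """Compare the way a blank-padding engine does: pad the shorter string with spaces."""
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)

print(sorted(between), reversed_bounds, starts_with_l)
print(padded_equal("CA", "CALIFORNIA"), padded_equal("CA", "CA   "))
```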
Character String Search (LIKE)

The LIKE is used exclusively to search for character data strings. The major difference between the LIKE and the BETWEEN is that the BETWEEN looks for specific values within a range. The LIKE is normally used when looking for a string of characters within a column. Also, the LIKE has the capability to use "wildcard" characters. The wildcard characters are:

Wildcard symbol    What it does
_ (underscore)     matches any single character, but a character must be present
% (percent sign)   matches any single character, a series of characters or the absence of characters
Figure 2-10

The next SELECT finds all rows that have a character string that begins with 'Sm':

1 Row returned
Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
333450      Smith      Andy        SO          2.00

The fact that the 's' is in the first position of the pattern dictates its location in the data. Therefore, the 'm' must be in the second position. Then, the '%' indicates that any number of characters (including none) may be in the third and subsequent positions.

So, if the WHERE clause contained LIKE '%sm', it would only look for strings that end in "sm." On the other hand, if it were written as LIKE '%sm%', then all character strings containing "sm" anywhere are returned.

Also, remember that in Teradata mode, the database is not case sensitive. However, in ANSI mode, the case of the letters must match exactly, and the previous request must be written as 'Sm%' to obtain the same result. Care should be taken regarding case when working in ANSI mode. Otherwise, case does not matter.

The '_' wildcard can be used to force a search to a specific location in the character string. Anything in that position is considered a match. However, a character must be in that position. The following SELECT uses a LIKE to find all last names with an "A" in the second position of the last name:

2 Rows returned
Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
423400      Larkins    Michael     FR          0.00
125634      Hanson     Henry       FR          2.88

In the above example, the "_" allows any character in the first position, but requires a character to be there.

The keywords ALL, ANY, or SOME can be used to further define the values being searched. They are the same quantifiers used with the IN. Here, the quantifiers are used to extend the flexibility of the LIKE clause.

Normally, the LIKE will look for a single set of characters within the data. Sometimes, that is not sufficient for the task at hand. There will be times when the characters to search for are not consecutive, nor are they in the same sequence. The next SELECT returns rows with both an 's' and an 'm' because of the ALL:

3 Rows returned
Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
280023      McRoberts  Richard     JR          1.90
234121      Thomas     Wendy       FR          4.00
333450      Smith      Andy        SO          2.00

It does not matter if the 's' appears first or the 'm' appears first, as long as both are contained in the string.

Below, ANSI mode is case sensitive, and only 1 row returns because the 'S' is uppercase. As a result, Thomas and McRoberts are not returned:

1 Row returned
Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
333450      Smith      Andy        SO          2.00

If, in the above statement, the ALL quantifier is changed to ANY (ANSI standard) or SOME (Teradata extension), then a character string containing either of the characters, 's' or 'm', in either order is returned. It uses the OR comparison. This next SELECT returns any row where the last name contains either an 's' or an 'm':

8 Rows returned
Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
423400      Larkins    Michael     FR          0.00
125634      Hanson     Henry       FR          2.88
280023      McRoberts  Richard     JR          1.90
260000      Johnson    Stanley     ?           ?
231222      Wilson     Susie       SO          3.80
234121      Thomas     Wendy       FR          4.00
333450      Smith      Andy        SO          2.00
123250      Phillips   Martin      SR          3.00

Always be aware of the issue regarding case sensitivity when using ANSI mode. It will normally affect the number of rows returned, and it usually reduces that number.
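To make the wildcard rules concrete, the Python sketch below translates a LIKE pattern into a regular expression:

- '%' becomes "any run of characters, including none".
- '_' becomes "exactly one character".

It also models LIKE ALL as an AND across patterns and LIKE ANY/SOME as an OR. The `ansi_mode` flag is an assumption used to imitate case sensitivity (ANSI mode) versus case blindness (Teradata mode). The function names are hypothetical, and the sketch ignores ESCAPE and NULL handling.

```python
import re

LAST_NAMES = ["Larkins", "Hanson", "McRoberts", "Johnson", "Wilson",
              "Thomas", "Smith", "Phillips", "Delaney", "Bond"]


def like(value, pattern, ansi_mode=False):
    """Return True when value matches the LIKE pattern.

    '%' matches any run of characters (including none); '_' matches
    exactly one character; everything else must match literally.
    ansi_mode=True compares case-sensitively; otherwise case is ignored.
    """
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    flags = re.DOTALL if ansi_mode else re.DOTALL | re.IGNORECASE
    # fullmatch: the pattern must describe the whole string, as LIKE does.
    return re.fullmatch(regex, value, flags) is not None


def like_all(value, patterns, ansi_mode=False):
    """LIKE ALL: every pattern must match (an AND of LIKE tests)."""
    return all(like(value, p, ansi_mode) for p in patterns)


def like_any(value, patterns, ansi_mode=False):
    """LIKE ANY / SOME: at least one pattern must match (an OR)."""
    return any(like(value, p, ansi_mode) for p in patterns)


print([n for n in LAST_NAMES if like(n, "sm%")])   # ['Smith']
print([n for n in LAST_NAMES if like(n, "_a%")])   # ['Larkins', 'Hanson']
print([n for n in LAST_NAMES if like_all(n, ["%s%", "%m%"])])
# ['McRoberts', 'Thomas', 'Smith']
print([n for n in LAST_NAMES if like_all(n, ["%S%", "%m%"], ansi_mode=True)])
# ['Smith']
print(len([n for n in LAST_NAMES if like_any(n, ["%s%", "%m%"])]))  # 8
```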
There is a specialty operation that can be performed in conjunction with the LIKE. Since the search uses the "_" and the "%" as wildcard characters, how can you search for actual data that contains a "_" or "%"?

Now that we know how to use the wildcard characters, there is a way to take away their special meaning and literally make the wildcard characters an '_' and a '%'. That is the purpose of ESCAPE. It tells the PE not to treat the character that follows the escape character as a wildcard, but instead to match the actual character '_' or '%'.

The next SELECT uses the ESCAPE to find all table names in the Data Dictionary that have a "_" in the 8th position of the name:

2 Rows returned
Tablename
Student_Table
Student_Course_Table

In the above output, the only thing that matters is the '_' in position eight, because the first seven '_' characters are still wildcards.
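The ESCAPE behaviour can be modelled the same way: when the pattern scanner meets the escape character, it treats the next character literally instead of as a wildcard. The pattern used here is seven '_' wildcards, an escaped '_', and a trailing '%'.

In the Python sketch below, the backslash escape character and the list of table names are hypothetical stand-ins. The chapter's own SELECT and Data Dictionary contents are not reproduced here.

```python
import re

# Hypothetical table names standing in for a Data Dictionary query result.
TABLE_NAMES = ["Student_Table", "Student_Course_Table", "Course_Table",
               "Employee_Table", "Department_Table"]


def like(value, pattern, escape=None):
    """LIKE matching with an optional ESCAPE character (case ignored)."""
    parts, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if escape is not None and ch == escape:
            i += 1  # the character after the escape is taken literally
            if i == len(pattern):
                raise ValueError("pattern ends with the escape character")
            parts.append(re.escape(pattern[i]))
        elif ch == "%":
            parts.append(".*")  # any run of characters, including none
        elif ch == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    regex = "".join(parts)
    return re.fullmatch(regex, value, re.DOTALL | re.IGNORECASE) is not None


# Seven wildcards, then a literal underscore in position 8, then anything.
with_escape = "_" * 7 + "\\_" + "%"
print([t for t in TABLE_NAMES if like(t, with_escape, escape="\\")])
# ['Student_Table', 'Student_Course_Table']

# Without ESCAPE, all eight underscores are wildcards, so any name of
# eight or more characters matches.
print(len([t for t in TABLE_NAMES if like(t, "_" * 8 + "%")]))  # 5
```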
Derived Columns

The majority of the time, columns in the SELECT statement exist within a database table. However, sometimes it is more advantageous to calculate a value than to store it.

An example might be the salary. In the employee table, we store the annual salary. However, a request comes in asking to display the monthly salary. Does the table need to be changed to create a column for storing the monthly salary? Must we go through and update all of the rows (one per employee) and store the monthly salary into the new column just so we can select it for display?

The answer is no; we do not need to do any of this. Instead of storing the monthly salary, we can calculate it from the annual salary using division. If the annual salary is divided by 12 (months per year), we "derive" the monthly salary using mathematics.

Chart of ANSI operators for math operations:

Operator  Operation performed
( )       parentheses (all math operations in parentheses are done first)
**        exponentiation (10**12 derives 1,000,000,000,000, or 1 trillion)
*         multiplication (10*12 derives 120)
/         division (10/12 derives 0; both are integers, so the decimal portion is truncated)
+         addition (10+12 derives 22)
-         subtraction (10-12 derives -2; since 12 is greater than 10, the result is negative, and negative values are allowed)
Figure 2-11

These math operators have a priority associated with their order of execution when mixed in the same formula. The sequence is basically the same as their order in the chart. After anything in parentheses, all exponentiation is performed first. Then, all multiplication and division is performed and, lastly, all addition and subtraction is done. Whenever two different operators are at the same priority, like addition and subtraction, they are performed based on their appearance in the equation, from left to right.

Although the above is the default priority, it can be over-ridden within the SQL. Normally, an equation like 2+4*5 yields 22 as the answer. This is because 4*5 = 20 is done first, and then the 2 is added to it.
However, if it is written as (2+4)*5, the answer becomes 30 (2+4 = 6, and 6*5 = 30). The following SELECT shows these and the results of an assortment of mathematics:

1 Row Returned
2+4*5  (2+4)*5  2+4/5  (2+4)/5  2+4.0/5  (2+4.0)/5  10**9
22     30       2      1        2.8      1.2        1000000000

Note: when starting with integer values, as in the above, the answer is an integer. If decimals are used, the result is a decimal answer. Otherwise, a conversion can be used to change the characteristics of the data before it is used in any calculation. Adding the decimal makes a difference in the precision of the final answer. So, if the SQL is not providing the answer expected from the data, convert the data first (see the CAST function later in this book).

The next SELECT shows how the SQL can be written to implement the earlier example with annual and monthly salaries:

2 Rows returned
salary      salary/12
48,024.00   4,002.00
10,800.00   900.00

Since the column name is the default column heading, the derived column is called salary/12, which is probably not what we wish to see there. The next section covers the usage of an alias to temporarily change the name of a column during the life of the SQL.

Derived data can be used in the WHERE clause as well as the SELECT. The following SQL will only return the columns when the monthly salary is greater than $1,000.00:

1 Row returned
salary      salary/12
48,024.00   4,002.00

Teradata contains several functions that allow a user to derive data for business and engineering. This is a chart of those Teradata arithmetic, trigonometric and hyperbolic math functions:

Operator    Operation performed
MOD x       Modulo returns the remainder from a division. For example, 1 MOD 2 derives 1, because 2 goes into 1 zero times with a remainder of 1. Likewise, 2 MOD 10 derives 2, because 10 goes into 2 zero times with a remainder of 2. For positive values, MOD x always returns 0 through x-1. As such, MOD 2 returns 0 for even numbers and 1 for odd, and MOD 7 can be used to determine the day of the week. MOD 10, MOD 100, MOD 1000, etc. return the rightmost one, two, three, etc. digits of a whole number. These are the digits that would move to the right of the decimal point if the number were divided by 10, 100, 1000, etc.
ABS(x)      Absolute value; the absolute value of a negative number is the same number as a positive value (ABS(10-12) = 2)
EXP(x)      Exponentiation; e raised to a power (EXP(10) derives 2.20264657948067E004)
LOG(x)      Base-10 (common) logarithm (LOG(10) derives the value 1.00000000000000E000)
LN(x)       Natural logarithm (LN(10) derives the value 2.30258509299405E000)
SQRT(x)     Square root (SQRT(10) derives the value 3.16227766016838E000)
COS(x)      Takes an angle in radians (x) and returns the ratio of two sides of a right triangle. The ratio is the length of the side adjacent to the angle divided by the length of the hypotenuse. The result lies in the range -1 to 1 inclusive, where x is any valid number expression that expresses an angle in radians.
SIN(x)      Takes an angle in radians (x) and returns the ratio of two sides of a right triangle. The ratio is the length of the side opposite the angle divided by the length of the hypotenuse. The result lies in the range -1 to 1 inclusive, where x is any valid number expression that expresses an angle in radians.
TAN(x)      Takes an angle in radians (x) and returns the ratio of two sides of a right triangle. The ratio is the length of the side opposite the angle divided by the length of the side adjacent to the angle, where x is any valid number expression that expresses an angle in radians.

Chart of Teradata arithmetic, trigonometric and hyperbolic math functions (continued)

Operator    Operation performed
ACOS(x)     Returns the arccosine of x. The arccosine is the angle whose cosine is x, where x is the cosine of the returned angle. The values of x must be between -1 and 1, inclusive. The returned angle is in the range 0 to π radians, inclusive.
ASIN(x)     Returns the arcsine of x. The arcsine is the angle whose sine is x, where x is the sine of the returned angle. The values of x must be between -1 and 1, inclusive. The returned angle is in the range -π/2 to π/2 radians, inclusive.
ATAN(x)     Returns the arctangent of x. The arctangent is the angle whose tangent is x. The returned angle is between -π/2 and π/2 radians.
ATAN2(x,y)  Returns the arctangent of the specified (x,y) coordinates. The arctangent is the angle from the x-axis to a line containing the origin (0,0) and a point with coordinates (x,y). The returned angle is between -π and π radians, excluding -π. A positive result represents a counterclockwise angle from the x-axis, whereas a negative result represents a clockwise angle. For x greater than 0, ATAN2(x,y) equals ATAN(y/x). Unlike ATAN(y/x), ATAN2(x,y) also accepts x = 0, where ATAN(y/x) would fail with a divide-by-zero error. If both x and y are 0, an error is returned.
COSH(x)     Returns the hyperbolic cosine of x, where x is any real number.
SINH(x)     Returns the hyperbolic sine of x, where x is any real number.
TANH(x)     Returns the hyperbolic tangent of x, where x is any real number.
ACOSH(x)    Returns the inverse hyperbolic cosine of x. The inverse hyperbolic cosine is the value whose hyperbolic cosine is x, where x is any real number equal to or greater than 1.
ASINH(x)    Returns the inverse hyperbolic sine of x. The inverse hyperbolic sine is the value whose hyperbolic sine is x, where x is any real number.
ATANH(x)    Returns the inverse hyperbolic tangent of x. The inverse hyperbolic tangent is the value whose hyperbolic tangent is x, where x is any real number between -1 and 1, excluding -1 and 1.
Figure 2-12

Some of these functions are demonstrated below and throughout this book, where they also use alias names for the columns. Their application will be specific to the type of application being written. It is not the intent of this book to teach their meaning and use in engineering and trigonometry, but rather to make you aware of their existence.
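The integer-versus-decimal behaviour of '/' and the meaning of MOD are easy to get wrong, so the Python sketch below restates them. `td_div` and `td_mod` are hypothetical helpers, not Teradata functions:

- `td_div` drops the fractional part when both operands are integers. It truncates toward zero, which is an assumption for negative operands. When either operand is a decimal, it keeps the fraction, using Python's `decimal` module as a stand-in for Teradata DECIMAL arithmetic.
- `td_mod` is only intended for non-negative operands.

The last lines check the EXP, LOG, LN and SQRT values quoted in Figure 2-12 against Python's `math` module.

```python
import math
from decimal import Decimal


def td_div(a, b):
    """Model '/' as described above: integer / integer truncates the
    fraction; if either operand is a decimal, the fraction is kept."""
    if isinstance(a, int) and isinstance(b, int):
        quotient = abs(a) // abs(b)           # whole part only
        return quotient if (a < 0) == (b < 0) else -quotient
    return Decimal(a) / Decimal(b)


def td_mod(a, b):
    """Model a MOD b for non-negative operands: the remainder of a / b."""
    return a - b * td_div(a, b)


# Operator priority: parentheses, then * and /, then + and -.
print(2 + 4 * 5, (2 + 4) * 5)                  # 22 30
print(2 + td_div(4, 5), td_div(2 + 4, 5))      # 2 1
print(2 + td_div(Decimal("4.0"), 5))           # 2.8
print(td_div(2 + Decimal("4.0"), 5))           # 1.2
print(10 ** 9, td_div(10, 12))                 # 1000000000 0

# MOD examples from the chart, plus 204 MOD 100 and 204 / 100.
print(td_mod(1, 2), td_mod(2, 10), td_mod(204, 100), td_div(204, 100))  # 1 2 4 2

# The function values quoted for EXP, LOG, LN and SQRT of 10.
print(math.exp(10), math.log10(10), math.log(10), math.sqrt(10))
```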
Creating a Column Alias Name

Since the name of the selected column or derived data formula appears as the heading for the column, it makes for strange-looking results. To make the output look better, it is a good idea to use an alias to dress up the heading name used in the output. Besides making the output look better, an alias also makes the SQL easier to write, because the new column name can be used anywhere in the SQL statement.

AS
Compliance: ANSI

The previous SELECT used salary/12, which is probably not what we wish to see in the heading. Therefore, it is preferable to alias the column within the execution of the SQL. This means that a temporary name is assigned to the selected column for use only in this statement. To alias a column, use an AS and any legal Teradata name after the real column name or math formula, using the following technique:

2 Rows returned
Annual_salary  Monthly_salary
48024.00       4002.00
10800.00       900.00

Once the alias name has been assigned, it is literally the name of the column for the life of the SQL statement. The next request is a valid example of using the alias in the WHERE clause:

1 Row returned
annual_salary  monthly_salary
$48,024.00     $4,002.00

The math functions are very helpful for calculating and evaluating characteristics of the data. The following examples incorporate most of the functions to demonstrate how they operate. The next SELECT uses literals and aliases to show the data being input and the results for each of the most common business-applicable operations:

1 Row returned
Div100  Last2  Even  Odd  Was_Positive  Positive_Now  Sq_Root
2       4      0     1    1             1             2.00

The output of the SELECT shows some interesting results. The division is easy; we learned that in elementary school. The MOD 100 results in 4, because the result of the division is 2, but the remainder is 4 (204 - 200 = 4). A MOD 100 can result in any value between 0 and 99. In effect, MOD 100 returns the last two digits of a whole number, which are the digits that would move to the right of the decimal point if the number were divided by 100.
On the other hand, MOD 2 will always be 0 for even numbers and 1 for odd numbers. ABS always returns the positive value of any number and, lastly, 2 is the square root of 4. Many of these will be incorporated into SQL throughout this book to demonstrate additional business applications.

NAMED
Compliance: Teradata Extension

Prior to AS becoming the ANSI standard, Teradata used NAMED as the keyword to establish an alias. Although both currently work, it is strongly suggested that AS be used for compatibility. Also, as hard as it is to believe, I have heard that NAMED may not work in future releases. The following is the same SELECT as seen earlier, but here it uses NAMED instead of AS:

2 Rows returned
Annual_salary  Monthly_salary
48024.00       4002.00
10800.00       900.00

Naming conventions

When creating an alias, only valid Teradata naming characters are allowed. The alias becomes the name of the column for the life of the SQL statement. The only difference is that it is not stored in the Data Dictionary.

The charts below list the valid characters to use and then the rules to follow. The left side of each chart applies when ANSI compliance is desired. The right side lists the more flexible Teradata allowable characters and extended character sets, with their rules.

Chart of Valid Characters for ANSI and Teradata:

ANSI Characters Allowed              Teradata Characters Allowed
(up to 18 in a single name)          (up to 30 in a single name)
A through Z                          A through Z and a through z
0 through 9                          0 through 9
_ (underscore / underline)           _ (underscore / underline)
                                     # (octothorpe / pound sign / number sign)
                                     $ (dollar sign / currency sign)
Figure 2-13

Chart of ANSI and Teradata Naming Conventions

ANSI Rules for column names          Teradata Rules for column names
Must be entirely in upper case       Can be all upper, all lower or a mixture of case,
                                     using any of these characters
Must start with A through Z          Can start with any valid character
Must not end with an underscore (_)  Can end with any valid character
Figure 2-14

Teradata uses all of the ANSI characters as well as the additional ones listed in the above charts.

Breaking Conventions

It is not recommended to break these conventions. However, sometimes it is necessary or desirable to use non-standard characters in a name. Also, sometimes words have been used as table or column names and then, in a later release, the name becomes a reserved word. There needs to be a technique to assist you when either of these requirements becomes necessary.

The technique uses double quotes (") around the name. This tells the PE that the word is not a reserved word and makes it a valid name. This is the only place that Teradata uses a double quote instead of a single quote ('). As an example, the previous SELECT has been modified to use double quotes (") instead of NAMED:

2 Rows returned
Annual Salary  Monthly Salary
10800.00       900.00
48024.00       4002.00

Although it is not obvious due to the underlining, the column heading for the first column is Annual Salary, including the space. A space is not a valid naming character, but this is the column name, and it is valid because of the double quotes. This can be seen in the ORDER BY, where it uses the column name. The next section provides more details on the use of ORDER BY.
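The character and rule charts in Figures 2-13 and 2-14 can be restated as two small checks. The Python sketch below is an approximation built only from those charts. It does not check reserved words or extended character sets, and the helper names are hypothetical.

`quote_if_needed` wraps a name in double quotes when it breaks the Teradata rules, mirroring the "Annual Salary" example.

```python
import re

# Figure 2-13 characters and lengths, Figure 2-14 rules.
ANSI_NAME = re.compile(r"[A-Z][A-Z0-9_]{0,17}")      # upper case, starts A-Z, 1-18 chars
TERADATA_NAME = re.compile(r"[A-Za-z0-9_#$]{1,30}")  # any mix of case, 1-30 chars


def is_ansi_name(name):
    """True when name follows the ANSI column-name rules in the charts."""
    return ANSI_NAME.fullmatch(name) is not None and not name.endswith("_")


def is_teradata_name(name):
    """True when name uses only Teradata naming characters, up to 30."""
    return TERADATA_NAME.fullmatch(name) is not None


def quote_if_needed(name):
    """Wrap a name in double quotes when it is not a valid Teradata name."""
    return name if is_teradata_name(name) else '"' + name + '"'


for candidate in ["MONTHLY_SALARY", "Monthly_salary", "Dept#", "Annual Salary"]:
    print(candidate, is_ansi_name(candidate), is_teradata_name(candidate),
          quote_if_needed(candidate))
```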
ORDER BY

The Teradata AMPs generally bring data back randomly unless the user specifies a sort. The addition of the ORDER BY requests a sort operation to be performed. The sort arranges the rows returned in ascending sequence unless you specifically request descending. One or more columns may be used for the sort operation. The first column listed is the major sort sequence. Any subsequent columns specified are minor sort values, in the order of their appearance in the list.

The syntax for using an ORDER BY:

In Teradata, if the sequence of the rows being displayed is important, then an ORDER BY should be used in the SELECT. Many other databases store their data sequentially by the value of the primary key. As a result, the data will appear in sequence when it is returned.

To be faster, Teradata stores it differently. Teradata organizes data rows in ascending sequence on disk based on a row ID value, not the data value. This is the same value that is calculated to determine which AMP should be responsible for storing and retrieving each data row. When the ORDER BY is not used, the data will appear vaguely in row hash sequence and is not predictable. Therefore, it is recommended to use the ORDER BY in a SELECT, or the data will come back randomly. Remember, everything in Teradata is done in parallel, and this includes the sorting process.

The next SELECT retrieves all columns and sorts by the grade point average:

4 Rows returned
Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
324652      Delaney    Danny       SR          3.35
231222      Wilson     Susie       SO          3.80
322133      Bond       Jimmy       JR          3.95
234121      Thomas     Wendy       FR          4.00

Notice that the default sequence for the ORDER BY is ascending (ASC), lowest value to highest. This can be over-ridden using DESC to indicate a descending sequence, as shown using the following SELECT:

4 Rows returned
Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
234121      Thomas     Wendy       FR          4.00
322133      Bond       Jimmy       JR          3.95
231222      Wilson     Susie       SO          3.80
324652      Delaney    Danny       SR          3.35

As an alternative to using the column name in an ORDER BY, a number can be used. The number reflects the column's position in the SELECT list. The above SELECT could also be written this way to obtain the same result:

In this case, the grade point column is the fifth column, because of its location in the table definition and because the SELECT uses * for all columns. This adds flexibility to the writing of the SELECT. However, always watch out for the "ability" words, like flexibility, because flexibility brings another ability word with it: responsibility. When using the column number, if the column that is used for the sort is moved to another location in the select list, a different column is now used for the sort. Therefore, it is important to be responsible and change both the list and the number in the ORDER BY.

Many times the values in one column need to be sorted within the sequence of a second column. This technique is said to have a major sort column (or key) and one or more minor sort keys. The first column listed in the ORDER BY is the major sort key. Likewise, the last column listed is the most minor sort key within the sequence. The minor keys are referred to as being sorted within the major sort key. Additionally, some columns can ascend while others descend.

This SELECT sorts on two different columns: the last name (minor sort) ascending (ASC), within the class code (major sort) descending (DESC):

10 Rows returned
Last_Name  Class_Code  Grade_Pt
Delaney    SR          3.35
Phillips   SR          3.00
Smith      SO          2.00
Wilson     SO          3.80
Bond       JR          3.95
McRoberts  JR          1.90
Hanson     FR          2.88
Larkins    FR          0.00
Thomas     FR          4.00
Johnson    ?           ?

Notice, in the above statement, the use of relative column numbers instead of column names in the ORDER BY for the sort. The numbers 2 and 1 were used instead of Class_Code and Last_Name. When you select columns and then use numbers in the sort, the numbers relate to the order of the columns after the keyword SELECT.
When you SELECT * (all columns in the table), the sort numbers reflect the order of columns within the table.

An additional capability of Teradata is that a column that is not selected can be used in the ORDER BY. This is possible because the database uses a tag sort for speed and flexibility. In other words, it builds a tag area that consists of all the columns specified in the ORDER BY as well as the columns that are being selected. This diagram shows the layout of a row in SPOOL used with an ORDER BY:

Tagcolumn1  TagcolumnN  AMP#  Selectcolumn1  Selectcolumn2  ...  SelectcolumnN
Figure 2-15

Although it can sort on a column that is not selected, the sequence of the output may appear to be completely random. This is because the sorted value is not seen in the display.

Additionally, within a Teradata session the user can request a Collation Sequence and a Code Set for the system to use. By requesting a Collation Sequence of EBCDIC, the sort puts the data into the proper sequence for the IBM mainframe system. That is why EBCDIC is the automatic default code set when connecting from the mainframe. Likewise, if a user were extracting to a UNIX computer, the normal code set is ASCII. However, if the file is transferred from UNIX to a mainframe and converted there, it is in the wrong sequence.

When it is known ahead of time that the file will be used on a mainframe but extracted to a different computer, the Collation Sequence can be set to EBCDIC. Then, when the file's code set is converted, the file is in the correct sequence for the mainframe without doing another sort. Like the Collation Sequence, the Code Set can also be set. So, a file can be in EBCDIC sequence with the data in ASCII, or sorted in ASCII sequence with the data in EBCDIC. The final use of the file needs to be considered when making this choice.
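The major/minor key idea, descending keys, and relative column numbers can be imitated with Python's stable sort: sort on the most minor key first and the major key last.

The sketch below uses the Last_Name, Class_Code and Grade_Pt values from this chapter's output. The `order_by` helper is hypothetical. Placing NULLs (None) below every other value is an assumption, chosen to match the output above, where the row without a class code sorts last in a descending sort.

```python
# (Last_Name, Class_Code, Grade_Pt) rows from the Student_Table output.
ROWS = [
    ("Larkins", "FR", 0.00), ("Hanson", "FR", 2.88), ("McRoberts", "JR", 1.90),
    ("Johnson", None, None), ("Wilson", "SO", 3.80), ("Thomas", "FR", 4.00),
    ("Smith", "SO", 2.00), ("Phillips", "SR", 3.00), ("Delaney", "SR", 3.35),
    ("Bond", "JR", 3.95),
]


def order_by(rows, keys):
    """Sort rows like ORDER BY.

    keys is a list of (column_number, direction) pairs, major key first,
    where column_number is 1-based like a relative column number and
    direction is "ASC" or "DESC". NULLs (None) are treated as the lowest
    value: first when ascending, last when descending.
    """
    result = list(rows)
    # Python's sort is stable, so sorting from the most minor key to the
    # major key leaves minor keys ordered within each major-key group.
    for column_number, direction in reversed(keys):
        index = column_number - 1

        def sort_key(row, index=index):
            value = row[index]
            return (value is not None, value if value is not None else 0)

        result.sort(key=sort_key, reverse=(direction == "DESC"))
    return result


# Equivalent in spirit to ORDER BY 2 DESC, 1 ASC.
for row in order_by(ROWS, [(2, "DESC"), (1, "ASC")]):
    print(row)
```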
DISTINCT Function

All of the previous operations of the SELECT returned a row from a table based on its existence in a table. As a result, if multiple rows contain the same value, they are all displayed. Sometimes it is only necessary to see one of the values, not all. Instead of contemplating a WHERE clause to accomplish this task, DISTINCT can be added in the SELECT to return unique values by eliminating duplicate values.

The syntax for using DISTINCT:

The next SELECT uses DISTINCT to return only one row for display when a value exists:

5 Rows Returned
Class_code
?
FR
JR
SO
SR

There are a couple of noteworthy situations in the above output. First, although there are three freshmen, two sophomores, two juniors, two seniors and one row without a class code, only one output row is returned for each of these values. Second, the NULL is considered a unique value whether there is one row or multiple rows containing it. So, it is displayed one time.

The main considerations for using DISTINCT are that it must:
1. Appear only once
2. Apply to all columns listed in the SELECT to determine uniqueness
3. Appear before the first column name

The following SELECT uses more than one column with a DISTINCT:

10 Rows Returned
class_code  grade_pt
?           ?
FR          0.00
FR          2.88
FR          4.00
JR          1.90
JR          3.95
SO          2.00
SO          3.80
SR          3.00
SR          3.35

The DISTINCT in this SELECT returned all ten rows of the table. This is because, when the class code and the grade point are combined for comparison, they are all unique. The only potential for a duplicate exists when two students in the same class have the same grade point average. Therefore, as more and more columns are listed in a SELECT with a DISTINCT, more rows are likely to be returned, because combinations of values are more likely to be unique.

If, when using DISTINCT, spool space is exceeded, see chapter 5 and the use of GROUP BY versus DISTINCT for eliminating duplicate rows.
It may solve the problem, and that chapter explains the reason.
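DISTINCT compares whole rows built from every selected column, so adding columns can only keep or increase the number of unique rows. The Python sketch below models that with tuples and a set, using the Class_Code and Grade_Pt values from the output above. The `distinct` helper is hypothetical. Python's None equals None, which mirrors DISTINCT treating all NULLs as a single value.

```python
# (Class_Code, Grade_Pt) for the ten Student_Table rows; None stands for NULL.
ROWS = [
    ("FR", 0.00), ("FR", 2.88), ("JR", 1.90), (None, None), ("SO", 3.80),
    ("FR", 4.00), ("SO", 2.00), ("SR", 3.00), ("SR", 3.35), ("JR", 3.95),
]


def distinct(rows):
    """Return rows with duplicates removed, keeping first occurrences.

    Uniqueness is judged on the whole row, i.e. on every selected column,
    just as DISTINCT applies to all columns in the SELECT list.
    """
    seen = set()
    unique_rows = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique_rows.append(row)
    return unique_rows


class_codes = distinct([(code,) for code, _ in ROWS])
print(len(class_codes), class_codes)   # 5 unique class codes, NULL included once
print(len(distinct(ROWS)))             # 10: every (code, grade) pair is unique
```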
HELP commands

The Teradata Database offers several types of help using an interactive client. For convenience, this reduces or eliminates the need to look information up in a hardcopy manual or on a CD-ROM. Therefore, using the HELP and SHOW operations in this chapter can save you a large amount of time and make you more productive.

Since Teradata allows you to organize database objects into a variety of locations, sometimes you need to determine where certain objects are stored, along with other detailed information about them. This chart is a list of the available HELP commands on objects:

HELP DATABASE <database-name> ;
    Displays the names of all the tables (T), views (V), macros (M), and triggers (G) stored in a database, along with user-written table comments.
HELP USER <user-name> ;
    Displays the names of all the tables (T), views (V), macros (M), and triggers (G) stored in a user area, along with user-written table comments.
HELP TABLE <table-name> ;
    Displays the column names, type identifier, and any user-written comments on the columns within a table.
HELP VOLATILE TABLE ;
    Displays the names of all Volatile temporary tables active for the user session.
HELP VIEW <view-name> ;
    Displays the column names, type identifier, and any user-written comments on the columns within a view.
HELP MACRO <macro-name> ;
    Displays the characteristics of parameters passed to it at execution time.
HELP PROCEDURE <procedure-name> ;
    Displays the characteristics of parameters passed to it at execution time.
HELP TRIGGER <trigger-name> ;
    Displays details created for a trigger, like action time and sequence.
HELP COLUMN <table-name>.* ;
HELP COLUMN <view-name>.* ;
HELP COLUMN <table-name>.<column-name>, ... ;
    Displays detail data describing the column-level characteristics.
Figure 3-1

To see the database objects stored in a Database or User area, either of the following HELP commands may be used:

HELP DATABASE My_DB ;
or
HELP USER My_User ;

4 Rows Returned
Table/View/Macroname  Kind  Comment
employee              T     T=Table with 1 row per employee
employee_v            V     V=View for accessing Employee Table
Employee_m1           M     M=Macro to report on Employee Table
Employee_Trig         G     G=Trigger to update Employee Table

Since Teradata considers a database and a user to be equivalent, both can store the same types of objects and, therefore, the two commands produce similar output.

Now that you have seen the names of the objects in a database or user area, further investigation displays the names and the types of columns contained within the object. For tables and views, use the following commands:

HELP TABLE My_Table ;

7 Rows Returned
Column Name  Type  Comment                           Nullable  Format       Title
Column1      I     This column is an integer         Y         -(10)9       ?
Column2      I2    This column is a smallint         Y         -(5)9        ?
Column3      I1    This column is a byteint          Y         -(3)9        ?
Column4      CF    This column is a fixed length     Y         X(20)        ?
Column5      CV    This column is a variable length  Y         X(20)        ?
Column6      DA    This column is a date             Y         YYYY-MM-DD   ?
Column7      D     This column is a decimal          Y         --------.99  ?
UpperCase  Table/View?  DefaultValue  CharType  IdColType
N          T            ?             ?         ?
N          T            ?             ?         ?
N          T            ?             ?         ?
N          T            ?             1         ?
N          T            ?             1         ?
N          T            ?             ?         ?
N          T            ?             ?         ?

The above output has been wrapped to multiple lines to show all the detailed information available on the columns of a table.

HELP VIEW My_View ;

(Notice that the vast majority of the column data is not available for a view; it comes from the table, not from the SELECT that creates the view.)

7 Rows Returned
Column Name  Type  Comment                        Nullable  Format  Title
Column1      ?     This column is an integer      ?         ?       ?
Column2      ?     This column is a smallint      ?         ?       ?
Column3      ?     This column is a byteint       ?         ?       ?
Column4      ?     This column is a fixed length  ?         ?       ?
UpperCase  Table/View?  DefaultValue  CharType  IdColType
?          ?            ?             ?         ?
?          ?            ?             ?         ?
?          ?            ?             ?         ?
?          ?            ?             1         ?
?          ?            ?             1         ?
?          ?            ?             ?         ?
?          ?            ?             ?         ?

The above output is wrapped to multiple lines. It displays the column name, the type (which identifies the data type) and any comment added to a column. Notice that a view does not know the data type of the columns from a real table.

Teradata provides a COMMENT command to add these comments on tables and columns. The following COMMENT commands add a comment to a table and a view:

This COMMENT command adds a comment to a column:

The above column information is helpful for most of the column types, such as INTEGER (I), SMALLINT (I2) and DATE (DA), because the size and the value range are constant. However, the lengths of the DECIMAL (D) and the character columns (CF, CV) are not shown here. These are the most common of the data types. See chapter 18 (DDL) for more details on data types.

The next HELP COLUMN command provides more details for all of the columns:

The output is not shown again, since it is exactly the same as the newer version of the HELP TABLE command.

The next chart shows HELP commands for information on database tables and sessions, as well as SQL and SPL commands:

Help Commands:
HELP INDEX <table-name> ;
    Displays the indexes and their characteristics, like unique or non-unique, and the column or columns involved in the index. This data is used by the Optimizer to create a plan for SQL.
HELP STATISTICS <table-name> ;
    Displays values associated with the data demographics collected on the table. This data is used by the Optimizer to create a plan for SQL.
HELP CONSTRAINT <table-name>.<constraint-name> ;
    Displays the checks to be made on the data when it is inserted or updated, and the columns involved.
HELP SESSION ;
    Displays the user name, account name, logon date and time, current database name, collation code set and character set being used, transaction semantics, time zone and character set data.
HELP 'SQL' ;
    Displays a list of available SQL commands and functions.
HELP 'SQL <command>' ;
    Displays the basic syntax and options for the actual SQL command inserted in place of <command>.
HELP 'SPL' ;
    Displays a list of available SPL commands.
HELP 'SPL <command>' ;
    Displays the basic syntax and options for the actual SPL command inserted in place of <command>.
Figure 3-2

The above chart does a pretty good job of explaining the HELP functions. These functions only provide additional information if the table object has one of these characteristics defined on it. The INDEX, STATISTICS and CONSTRAINT functions will be further discussed in the Data Definition Language (DDL) chapter because of their relationship to the objects. At this point in learning SQL, and in the interest of getting to other SQL functions, one of the most useful of these HELP functions is HELP SESSION.

The following HELP returns index information on the department_table:

HELP INDEX Department_table ;

3 Rows Returned
Unique?  Primary or Secondary?  Column Names     Index Id  Approximate Count  Index Name  Ordered or Partitioned?
Y        P                      Dept_No          1         8.00               ?           H
N        S                      Department_name  4         8.00               ?           H
N        S                      Mgr_No           8         6.00               ?           H

The following HELP returns information on the session from the PE:

HELP SESSION ;

1 Row Returned (columns wrapped for viewing)
User Name  Account Name  Logon Date  Logon Time  Current Database  Collation  Character Set
DBC        DBC           99/12/12    11:45:13    Personnel         ASCII      ASCII
Transaction Semantics  Current DateForm  Time Zone  Default Character Type  Export Latin
Teradata               IntegerDate       00:00      LATIN                   1
Currency  ISOCurrency  Dual Currency Name  Dual Currency  Dual ISOCurrency
$         USD          US Dollars          $              USD
The above output has been wrapped for easier viewing. Normally, all headings and values are on a single line. The current date form, time zone and everything that follows them in the output are new with the V2R3 release of Teradata. These columns have been added so that referencing them here is easier than digging through the Data Dictionary using SQL.

When using a tool like BTEQ, the line is truncated. So, for easier viewing, the .SIDETITLES and .FOLDLINE commands show the output in a vertical display. The next sequence of commands can be used within BTEQ:

.sidetitles on
.foldline on
HELP SESSION;

1 Row Returned
User Name               MIKEL
Account Name            DBC
Logon Date              00/06/25
Logon Time              01:02:52
Current DataBase        MIKEL
Collation               ASCII
Character Set           ASCII
Transaction Semantics   Teradata
Current DateForm        IntegerDate
Session Time Zone       00:00
Default Character Type  LATIN
Export Latin            1
Export Unicode          1
Export Unicode Adjust   0
Export KanjiSJIS        1
Export Graphic          0

To reset the display to the normal line, use either of the following:

.DEFAULTS

or

.SIDETITLES OFF
.FOLDLINE OFF

In BTEQ, any command that starts with a dot (.) does not have to end with a semicolon (;).

The next HELP command returns a list of the available SQL commands and functions:

HELP 'SQL';

41 Rows Returned
On-Line Help
DBS SQL COMMANDS:
ABORT                ALTER TABLE          BEGIN LOGGING
BEGIN TRANSACTION    CHECKPOINT           COLLECT STATISTICS
COMMIT               COMMENT              CREATE DATABASE
CREATE INDEX         CREATE MACRO         CREATE TABLE
CREATE USER          CREATE VIEW          DATABASE
DELETE               DELETE DATABASE      DELETE USER
DROP DATABASE        DROP INDEX           DROP MACRO
DROP TABLE           DROP VIEW            DROP STATISTICS
ECHO                 END LOGGING          END TRANSACTION
DBS SQL FUNCTIONS:
ABS | ADD_MONTHS | AVERAGE | CHARACTERS
CAST | CHAR2HEXINT | COUNT | CORR
COVAR_POP | CSUM | EXP | EXTRACT
FORMAT | INDEX | HASHAMP | HASHBAKAMP
HASHBUCKET | HASHROW | KURTOSIS | LN
LOG | MAVG | MAXIMUM | MCHARACTERS
MDIFF | MINDEX | MINIMUM | MLINREG
MSUBSTR | MSUM | NAMED | NULLIFZERO
OCTET_LENGTH | QUANTILE | REGR_INTERCEPT | REGR_SLOPE
RANDOM | RANK | SKEW | SQRT
STDDEV_POP | STDDEV_SAMP | SUBSTR | SUM
TITLE | TRIM | TYPE | UPPER
VARGRAPHIC | VAR_POP | VAR_SAMP | ZEROIFNULL

The above output is not a complete list of the commands. The three dots in the center represent the location where commands were omitted so that the list fits onto a single page; all commands are seen when the HELP is performed on a terminal. Once this output has been used to find a command, the following HELP command provides additional information on it:

HELP 'SQL END TRANSACTION' ;

5 Rows Returned

Since the terminal is used most of the time to access the database, take advantage of it and use the terminal for your HELP commands. Tools like Queryman also have a variety of HELP commands and individual menus. Always look for ways to make the task easier.

SET SESSION command
The Teradata Database provides user access only by allocating a session with a Parsing Engine. The Parsing Engine uses default attributes based on the user and the host computer from which the user is connecting. When a different session option is needed, the SET SESSION command is used. It overrides the default for this session only; the next time the user logs into Teradata, the original default is used again.

Syntax for SET SESSION:

The SET SESSION can be abbreviated as: SS.

Collation sequence: ASCII, EBCDIC, MULTINATIONAL (European (diacritical) character or Kanji character), CHARSET_COLL (binary ordering based on the current client character set), JIS_COLL (logical ordering of characters based on the Japanese Industrial Standards collation), HOST (EBCDIC for IBM channel-attached clients and ASCII for all other clients; the default collation).
Account-id: allows for the temporary changing of accounting data for charge back and priority. The account-id specified must be a valid one assigned to the user, and the priority can only be downgraded.

INTEGERDATE: uses the YY/MM/DD format for a date, and ANSIDATE uses the YYYY-MM-DD format.

Database-name: becomes the database to use as the current database for SQL operations during this session.
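To make the DATEFORM choice concrete, here is a small Python model (not Teradata code) of how one stored DATE could appear under each setting. It assumes the integer encoding given in the DATE row of Figure 4-1, (year - 1900) * 10000 + month * 100 + day, and that INTEGERDATE and ANSIDATE change only the display, not the stored value. The function names are hypothetical.

```python
from datetime import date


def encode_date(d: date) -> int:
    """Integer form of a DATE: (year - 1900) * 10000 + month * 100 + day."""
    return (d.year - 1900) * 10000 + d.month * 100 + d.day


def decode_date(value: int) -> date:
    """Inverse of encode_date; floor division also handles pre-1900 (negative) values."""
    year_offset, month_day = divmod(value, 10000)
    month, day = divmod(month_day, 100)
    return date(year_offset + 1900, month, day)


def display_date(value: int, dateform: str) -> str:
    """Render a stored DATE integer the way each session DATEFORM would show it."""
    d = decode_date(value)
    if dateform == "INTEGERDATE":
        return d.strftime("%y/%m/%d")  # YY/MM/DD
    if dateform == "ANSIDATE":
        return d.isoformat()  # YYYY-MM-DD
    raise ValueError(f"unknown DATEFORM: {dateform}")


stored = encode_date(date(1999, 10, 1))
print(stored)                               # 991001
print(display_date(stored, "INTEGERDATE"))  # 99/10/01
print(display_date(stored, "ANSIDATE"))     # 1999-10-01
```

The same stored value, 991001, produces both displays; only the session setting differs.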
SHOW commands
There are times when you need to recreate a table, view, or macro that you already have, or you need to create another object of the same type that is either identical or very similar to an object that is already created. When this is the case, the SHOW command is a way to accomplish what you need. We will be discussing all of these object types and their associated Data Definition Language (DDL) commands later in this course.

The intent of the SHOW command is to output the CREATE statement that could be used to recreate the object of the type specified. This chart shows the commands and their formats:

SHOW TABLE <table-name>; | Displays the CREATE TABLE statement needed to create this table.
SHOW VIEW <view-name>; | Displays the CREATE VIEW statement needed to create this view.
SHOW MACRO <macro-name>; | Displays the CREATE MACRO statement needed to create this macro.
SHOW TRIGGER <trigger-name>; | Displays the CREATE TRIGGER statement needed to create this trigger.
SHOW PROCEDURE <procedure-name>; | Displays the CREATE PROCEDURE statement needed to create this stored procedure.
SHOW <SQL-statement>; | Displays the CREATE TABLE statements for all tables/views referenced by the SQL statement.
Figure 3-3

To see the CREATE TABLE command for the Employee table, we use the command:

SHOW TABLE Employee ;

13 Rows Returned

To see the CREATE VIEW command, we use a command like:

SHOW VIEW TODAY ;

3 Rows Returned

To see the CREATE MACRO command for the macro called MYREPORT, we use a command like:

SHOW MACRO MYREPORT ;

9 Rows Returned

To see the CREATE TRIGGER command for AVG_SAL_T, we use:

SHOW TRIGGER AVG_SAL_T ;

14 Rows Returned

Since the SHOW command returns the DDL, it can be a real time saver. It is a very helpful tool when a database object needs to be recreated, a copy of an existing object is needed, or another object is needed that has similar characteristics to an existing object.
Plus, what a great way to get a reminder of the syntax needed for creating a table, view, macro, or trigger. It is a good idea to save the output of the SHOW command in case it is needed at a later date. However, if the object's structure changes, the SHOW command should be re-executed and the new output saved. For a table, SHOW returns the DDL that can be used to create a new table exactly like the current one; normally, at a minimum, the table name is changed before executing it.
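Because saved SHOW output is plain text, the "change the table name" step can be scripted. The Python sketch below is a hypothetical helper, not part of Teradata or BTEQ. It assumes the DDL starts with CREATE [SET | MULTISET] TABLE followed by an optionally database-qualified name, and it does not handle other forms such as VOLATILE or GLOBAL TEMPORARY tables.

```python
import re

# Matches "CREATE TABLE", "CREATE SET TABLE" or "CREATE MULTISET TABLE" plus the
# (optionally database-qualified) table name that follows it.
_CREATE_TABLE = re.compile(
    r"(CREATE\s+(?:(?:SET|MULTISET)\s+)?TABLE\s+)([\w$#]+(?:\.[\w$#]+)?)",
    re.IGNORECASE,
)


def rename_table_in_ddl(ddl: str, new_name: str) -> str:
    """Return the DDL with the name in its first CREATE TABLE clause replaced."""
    new_ddl, count = _CREATE_TABLE.subn(lambda m: m.group(1) + new_name, ddl, count=1)
    if count == 0:
        raise ValueError("no CREATE TABLE clause found in the DDL")
    return new_ddl


# Hypothetical, shortened SHOW TABLE output used only for illustration.
saved = "CREATE SET TABLE MIKEL.Employee ,NO FALLBACK (Employee_No INTEGER)"
print(rename_table_in_ddl(saved, "MIKEL.Employee_Copy"))
```

A function is used for the replacement so that characters such as `$` or `\` in the new name are inserted literally rather than interpreted by the regular-expression engine.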
EXPLAIN
The EXPLAIN command is a powerful tool provided with the Teradata database. It is designed to provide an English explanation of the steps the AMPs must complete to satisfy the SQL request. The EXPLAIN is based on the PE's execution plan.

The Parsing Engine (PE) does the optimization of the submitted SQL, the creation of the AMP steps and the dispatch to any AMP involved in accessing the data. The EXPLAIN is an SQL modifier; it modifies the way the SQL operates.

When an SQL statement is submitted using the EXPLAIN, the PE still does the same optimization step as normal. However, instead of building the AMP steps, it builds the English explanation and sends it back to the client software, not to the AMPs. This gives users the ability to see resource utilization, use of indices, and row and time estimates. Therefore, it can reveal a Cartesian product join in seconds, instead of hours later when the user gets suspicious that the request should have finished.

The EXPLAIN should be run every time changes to an object's structure occur, when a request is first put into production, and at other key times during the life of an application. Some companies require that the EXPLAIN always be run before execution of any new queries.

The syntax for using the EXPLAIN is simple: just type the EXPLAIN keyword preceding your valid SQL statement. For example:

The EXPLAIN can be used to translate the actions for all valid SQL. It cannot provide a translation when syntax errors are present; the SQL must be able to execute in order to be explained.

Chart for some of the keywords that may be seen in the output of an EXPLAIN:

Keyword or phrase | Explanation
Locking Pseudo Table | Serial lock on a symbolic table. Every table has one. Used to prevent deadlock situations between users.
Locking table for | Indicates that an ACCESS, READ, WRITE, or EXCLUSIVE lock has been placed on the table
Locking rows for <type> | Indicates that an ACCESS, READ, or WRITE lock is placed on rows as they are read or written
Do an ABORT test | Guarantees a transaction is not in progress for this user
All AMPs retrieve | All AMPs are receiving the AMP steps and are involved in providing the answer set
By way of an all rows scan | Rows are read sequentially on all AMPs
By way of primary index | Rows are read using the primary index column(s)
By way of index number | Rows are read using a secondary index; the number is the Index Id shown by HELP INDEX

Chart of EXPLAIN keywords (continued)
BMSMS | BitMap Set Manipulation Step; an alternative direct access technique when multiple NUSI columns are referenced in the WHERE clause
Residual conditions | WHERE clause conditions other than those of a join
Eliminating duplicate rows | Providing unique values; normally the result of DISTINCT, GROUP BY or a subquery
Where unknown comparison will be ignored | Indicates that NULL values will not compare to a TRUE or FALSE. Might be seen in a subquery using NOT IN or NOT = ALL, because no rows will be returned if the comparison is ignored.
Merge join | Rows of one table are matched to the other table on common domain columns after being sorted into the same sequence, normally Row Hash
Product join | Rows of one table are matched to all the rows of the other table without concern for a domain match
Duplicated on all AMPs | Participating rows for the table (normally the smaller table) of a join are duplicated on all AMPs
Hash redistributed on all AMPs | Participating rows of a join are hashed on the join column and sent to the same AMP that stores the matching row of the table to join
SMS | Set Manipulation Step; the result of an INTERSECT, UNION, EXCEPT or MINUS operation
Last use | The SPOOL file is no longer needed after the step, and its space is released
Built locally on the AMPs | As rows are read, they are put into SPOOL on the same AMP
Aggregate Intermediate Results are computed locally | The aggregation values are all on the same AMP, so there is no need to redistribute them to work with rows on other AMPs
Aggregate Intermediate Results are computed globally | The aggregation values are not all on the same AMP and must be redistributed to one AMP, so that each value can be combined with the same value from the other AMPs
Figure 3-4

Once you attain more experience with Teradata and SQL, these terms lead you to a more detailed understanding of the work involved in any SQL request.

The first area to check in the output of the EXPLAIN is the estimated number of rows that will be returned. This number is an educated guess that the PE has made based on information available at the time of the EXPLAIN, and it may or may not be accurate. If there are current STATISTICS on the table, the numbers are more accurate. Otherwise, the PE calculates a guess by asking a random AMP for the number of rows it contains. Then, it multiplies the answer by the number of AMPs to guess a "total row count." At the same time, it lets you know how accurate the number provided might be, using the terms in the next chart.
This chart is for phrases that accompany the estimated number of rows:

Phrase | Meaning
No confidence | The PE has no degree of certainty with the values used. This is normally a result of not collecting STATISTICS and working with multiple steps in SPOOL.
Low confidence | The PE is not sure of the values being used. This is normally a result of processing that involves several steps in SPOOL instead of the actual rows in a table.
High confidence | Normally indicates that STATISTICS have been collected on the columns or indices of a table. Allows the optimizer to be more aggressive in the access plan.
Index Join confidence | Indicates that a join is being done where the join condition uses a unique index.
Figure 3-5

The second area to check in the output of the EXPLAIN is the estimated cost, expressed in time, to complete the SQL request. Although it is expressed in time, do not confuse it with either wall-clock or CPU time. It is strictly a cost factor calculated by the optimizer for comparison purposes only. It does not take the number of users, the current workload or other system-related factors into account. After looking at the potential execution plans, the optimizer selects the plan with the lowest cost value for execution.

Once these two values are checked, the question that should be asked is: Are these values reasonable? For instance, if the table contains one million rows and the estimate is one million rows in 45 seconds, that is probably reasonable if there is no WHERE clause. However, if the table contains a million rows and is being joined to a table with two thousand rows, and the estimate is that two hundred trillion rows will be returned and it will take fifty days, this is not reasonable: even a Cartesian product of the two tables produces only 1,000,000 x 2,000 = two billion rows.

The following EXPLAIN is for a full table scan of the Student Table:

12 Rows Returned

The EXPLAIN estimates 8 rows and .15 seconds. Since there are 10 rows in the table, the EXPLAIN is slightly off in its estimate.
However, this is reasonable based on the contents of the table and the SELECT statement submitted.

The next EXPLAIN is for a join that has an error in it. Can you find it?

The EXPLAIN estimates that nearly 512 rows will be returned and that it will take .39 seconds. Although the time estimate sounds acceptable, these are very small tables: the largest of them contains only 14 rows, so an estimate of 512 rows is not reasonable based on the contents of the tables.

Upon further examination, the product join in step 6 is using (1=1) as the join condition where it should be a merge join. Therefore, this is a Cartesian product join. A careful analysis of the SELECT shows a single join condition in the WHERE clause. However, this is a three-table join and should have two join conditions. The WHERE clause needs to be fixed, and by using the EXPLAIN we have saved valuable time.

If you can get to the point of using the EXPLAIN in this manner, you are way ahead of the game. No one will ever have to slap your hand for writing SQL that runs for days, uses up large amounts of system resources and accomplishes absolutely nothing. You say, "Doctor, it hurts when I do this." The Doctor says, "Don't do that." We are saying, "Don't put extensive SELECT requests into production without doing an EXPLAIN on them."

Remember, always examine the EXPLAIN for reasonable results. Then, save the EXPLAIN output as a benchmark against any future EXPLAIN output. If the SQL later starts executing slower or using more resources, you have a basis for comparison. You might also use the benchmark if you decide to add a secondary index. This prototyping allows you to see exactly what your SQL is doing.

Some users have quit using the EXPLAIN because they have gotten inaccurate results.
From our experience, when the numbers are consistently different from the actual rows being returned and the cost estimate is completely wrong, it is normally an indicator that STATISTICS should be collected or updated on the involved tables.
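Two of the reasonableness checks above can be modeled in a few lines of Python. This is an illustrative sketch, not how the PE is implemented: `random_amp_estimate` mimics the "ask one random AMP, multiply by the AMP count" guess used when no STATISTICS exist, and `unjoined_groups` uses a union-find to show that a join of n tables needs its join conditions to connect all n of them (at least n - 1 conditions), otherwise some pair of groups is product joined. The table names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def random_amp_estimate(rows_per_amp: Sequence[int], rng: random.Random) -> int:
    """Without statistics: count one random AMP's rows and multiply by the AMP count."""
    sampled = rows_per_amp[rng.randrange(len(rows_per_amp))]
    return sampled * len(rows_per_amp)


def unjoined_groups(tables: Sequence[str],
                    join_conditions: Sequence[Tuple[str, str]]) -> List[List[str]]:
    """Group tables that are connected by join conditions (union-find).

    More than one group means some groups have no join condition between
    them, which forces a product (Cartesian) join.
    """
    parent: Dict[str, str] = {t: t for t in tables}

    def find(t: str) -> str:
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving keeps the trees shallow
            t = parent[t]
        return t

    for left, right in join_conditions:
        parent[find(left)] = find(right)

    groups: Dict[str, List[str]] = {}
    for t in tables:
        groups.setdefault(find(t), []).append(t)
    return list(groups.values())


rng = random.Random(0)
print(random_amp_estimate([25, 25, 25, 25], rng))  # 100: evenly spread rows give the true total

tables = ["Student_Table", "Course_Table", "Student_Course_Table"]
print(unjoined_groups(tables, [("Student_Table", "Student_Course_Table")]))
# [['Student_Table', 'Student_Course_Table'], ['Course_Table']]: Course_Table is unjoined
```

With rows spread evenly, any sampled AMP gives the right total; with skewed data such as `[40, 0, 0, 0]`, the guess is either 160 or 0, which is one reason collected STATISTICS produce better estimates.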
Adding Comments
Sometimes it is necessary or desirable to document the logic used in an SQL statement within the query. A comment is not executed and is ignored by the PE at syntax checking and resolution time.

ANSI Comment
To comment a line using the ANSI standard form of a comment:

-- the double dash at the start of a single line denotes a comment is on that line

Each comment line must start with its own two dashes. This is the only technique available for commenting using ANSI compliancy. At the writing of this book, Queryman sometimes gets confused and regards all lines after the -- as part of the comment. So, be careful regarding various client tools.

-- This is an ANSI form of comment that consists of a single line of user explanation or
-- notes added to an SQL command. This is a second line and needs its own dashes.

Teradata Comment
To comment a line using the Teradata form of a comment:

/* the slash asterisk at the start of a line denotes the beginning of a comment
*/ the asterisk slash (reversed from the start of a comment) is used to end a comment.

A comment in this form can fit on a single line or span multiple lines. This is the most common form of comment seen in Teradata SQL, primarily since it was the original technique available.

/* This is the Teradata form of comment that consists of a single line of user explanation or notes added to an SQL command. Several lines of comment can be added within a single notation. This is the end of the comment. */
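Because the PE skips comments entirely, both forms can be removed from a request without changing its meaning. The Python sketch below is an illustration only: it strips -- and /* */ comments that appear outside single-quoted string literals. As simplifying assumptions, it ignores double-quoted identifiers and does not treat /* */ comments as nestable.

```python
def strip_sql_comments(sql: str) -> str:
    """Remove -- and /* */ comments that are outside single-quoted literals."""
    out = []
    i, n = 0, len(sql)
    in_string = False
    while i < n:
        ch = sql[i]
        if in_string:
            out.append(ch)
            if ch == "'":
                in_string = False  # a doubled '' simply re-enters the string on the next char
            i += 1
        elif ch == "'":
            in_string = True
            out.append(ch)
            i += 1
        elif sql.startswith("--", i):
            end = sql.find("\n", i)
            i = n if end == -1 else end  # keep the newline that ends the comment
        elif sql.startswith("/*", i):
            end = sql.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated /* comment")
            i = end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(repr(strip_sql_comments("SELECT 1 -- note\nFROM t")))  # 'SELECT 1 \nFROM t'
```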
User Information Functions
The Teradata RDBMS (Relational DataBase Management System) has incorporated into it functions that provide data regarding a user who has performed a logon connection to the system. The following functions make that data available to a user for display or storage.

ACCOUNT Function
Compatibility: Teradata Extension
A user within the Teradata database has an account number. This number is used to identify the user, provide a basis for charge back (if desired) and establish a basic priority. Previously, this number was used exclusively by the database administrator to control and monitor access to the system. Now, it is available for viewing by the user via SQL.

Syntax for using the ACCOUNT function:

As an example, the following returns the account information for my user:

SELECT ACCOUNT;

1 Row Returned

ACCOUNT
$M13678

If your account starts with $M, you are running at a medium priority; $L is low and $H is high. At the same time, the account does not have to begin with one of these and can be any site-specific value.

DATABASE Function
Compatibility: Teradata Extension
Chapter 1 of this book discussed the concept of a database and user area within the Teradata RDBMS. Knowing the current database within Teradata is sometimes an important piece of information needed by a user. As mentioned above, HELP SESSION is one way to determine it; however, a lot of other information is also presented. Sometimes it is advantageous to have only that single tidbit of data, not only to see but also for storage. When this is the case, the DATABASE function is available.

Syntax for using the DATABASE function:

As an example, the following returns the current database for my user:

SELECT DATABASE;

1 Row Returned

DATABASE
Mikel

SESSION Function
Compatibility: Teradata Extension
Chapter 1 of this book discussed the Parsing Engine (PE) and the concept of a session and its role involving the user's SQL requests.
HELP SESSION provides a wealth of information regarding the individual session established for a user. One of those pieces of data is the session number, which uniquely identifies every user session in existence at any point in time. Teradata now makes the session number available using SQL.

Syntax for using the SESSION function:

As an example, the following returns the session number for my user:

SELECT SESSION;

1 Row Returned

SESSION
1059
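The priority convention described for the ACCOUNT function can be expressed as a simple lookup. The Python sketch below covers only the three prefixes named in the text ($L, $M, $H); any other account string is treated as site specific. The account strings used in the example are hypothetical.

```python
from typing import Optional

# Priority prefixes described in the text; other account strings are site specific.
PRIORITY_BY_PREFIX = {"$L": "low", "$M": "medium", "$H": "high"}


def account_priority(account: str) -> Optional[str]:
    """Return the priority implied by the first two characters of an account, if any."""
    return PRIORITY_BY_PREFIX.get(account[:2])


print(account_priority("$M_SALES"))  # medium
print(account_priority("DBC"))       # None: no priority prefix
```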
Data Conversions
In order for data to be managed and used, it must have characteristics associated with it. These characteristics, called attributes, include a data type and a length. The values that a column can store are directly related to these two attributes.

There are times when the defined data type or length is not convenient for the use or output display needed. For instance, when character data is too long for display, an option might be to reduce its length. At other times, the defined numeric data type is not sufficient to store the result of a mathematical operation. Therefore, conversion to a larger numeric type may be the only way to successfully complete the request.

When one of these situations interrupts the execution of the SQL, it is necessary to use one or more of the conversion techniques. They are covered here in detail to enhance the understanding and the use of these capabilities.

In normal practice, there should be little need to convert from a number to a character on a regular basis; this requirement is one indicator that the table or column design is questionable. However, if a conversion must be performed, it is much safer to use the ANSI Standard CAST (Convert And Store) function when going from numeric to character instead of the older Teradata implied conversion. Both of these techniques are discussed here.

Conversions should be used only when absolutely necessary because they are intensive on system resources. As an example, I saw an SQL statement that converted four columns six different times. There were around a million rows in the table. The SQL did a lot of processing and took about one hour to run. By eliminating these 6 million conversions, the SQL ran in under five minutes. Conversions can have an impact, but sometimes you need them. Use them only when absolutely necessary!
Data Types
Teradata supports many formats for storing data on disk, and most of the data types conform to the ANSI standard. At the same time, there are data types specific to Teradata. Most of these unique data types are provided to save storage space on disk or to support an international code set.

Since Teradata was originally designed to store terabytes worth of data in millions or billions of rows, saving a single byte one million times becomes a space savings of nearly a megabyte. The savings grow as more rows are added and more bytes per row are saved, and can be very significant.

Likewise, the speed advantage associated with smaller rows cannot be ignored. Since data is read from a disk in a block, smaller rows mean that more rows are stored in a single block. Therefore, fewer blocks need to be read and it is faster.

The following charts indicate the data types currently supported by Teradata. The first chart shows the ANSI standard types and the second is for the additional data types that are extensions to the standard.

This chart indicates which ANSI standard data types Teradata currently supports:

Data Type | Description | Data Value Range
INTEGER | Signed whole number | -2,147,483,648 to 2,147,483,647
SMALLINT | Signed smaller whole number | -32,768 to 32,767
DECIMAL(X,Y), where X = 1 thru 18 (total number of digits in the number) and Y = 0 thru 18 (digits to the right of the decimal) | Signed decimal number | 18 digits on either side of the decimal point; largest value DEC(18,0), smallest value DEC(18,18)
NUMERIC(X,Y), same as DECIMAL | Synonym for DECIMAL | Same as DECIMAL
FLOAT | Floating Point Format (IEEE) | <value> x 10^307 to <value> x 10^-308
REAL | Stored internally as FLOAT
PRECISION | Stored internally as FLOAT
DOUBLE PRECISION | Stored internally as FLOAT
CHARACTER(X), CHAR(X), where X = 1 thru 64000 | Fixed length character string, 1 byte of storage per character | 1 to 64,000 characters long; pads to length with spaces
VARCHAR(X), CHARACTER VARYING(X), CHAR VARYING(X), where X = 1 thru 64000 | Variable length character string, 1 byte of storage per character, plus 2 bytes to record the length of the actual data | 1 to 64,000 characters as a maximum. The system only stores the characters presented to it.
DATE | Signed internal representation of YYYMMDD (YYY represents the number of years from 1900, i.e. 100 for year 2000) | Currently to the year 3500 as a positive number and back into AD years as a negative number.
TIME | Identifies a field as a TIME value with Hour, Minutes and Seconds
TIMESTAMP | Identifies a field as a TIMESTAMP value with Year, Month, Day, Hour, Minute, and Seconds
Figure 4-1

This chart indicates which data types Teradata currently supports as extensions:

Data Type | Description | Data Value Range
BYTEINT | Signed whole number | -128 to 127
BYTE(X), where X = 1 thru 64000 | Binary | 1 to 64,000 bytes
VARBYTE(X), where X = 1 thru 64000 | Variable length binary | 1 to 64,000 bytes
LONG VARCHAR | Variable length string | 64,000 characters (maximum data length). The system only stores the characters provided, not trailing spaces.
GRAPHIC(X), where X = 1 thru 32000 | Fixed length string of 16-bit characters (2 bytes per character) | 1 to 32,000 KANJI characters
VARGRAPHIC(X), where X = 1 thru 32000 | Variable length string of 16-bit characters | 1 to 32,000 characters as a maximum. The system only stores the characters provided.
Figure 4-2

These data types are all available for use within Teradata. Notice that there are fixed and variable length data formats. The fixed data types always require the entire defined length on disk for the column. The variable types can be used to maximize data storage within a block by storing only the data provided within a row by the client software.

You should use the appropriate type for the specific data. It is a good idea to use a VAR data type when most of the data is shorter than the maximum size. When values are usually at or near the maximum length, a fixed type is the better choice, because each VAR value carries an extra 2-byte length indicator stored along with the actual data.
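The storage trade-offs above reduce to simple arithmetic. The Python sketch below uses the byte counts from Figure 4-1 (1 byte per character, plus 2 length bytes for VARCHAR) and a hypothetical 65,536-byte block; it deliberately ignores row and block overhead, so treat the numbers as a rough illustration rather than real Teradata sizing.

```python
def char_bytes(defined_length: int, actual_length: int) -> int:
    """CHAR(n) always occupies the full defined length (short values are space padded)."""
    if actual_length > defined_length:
        raise ValueError("value longer than the column")
    return defined_length


def varchar_bytes(defined_length: int, actual_length: int) -> int:
    """VARCHAR(n) stores only the characters supplied plus a 2-byte length indicator."""
    if actual_length > defined_length:
        raise ValueError("value longer than the column")
    return actual_length + 2


def rows_per_block(block_bytes: int, row_bytes: int) -> int:
    """Whole rows that fit in one block (block and row overhead ignored)."""
    return block_bytes // row_bytes


print(char_bytes(30, 10), varchar_bytes(30, 10))              # 30 12: VARCHAR wins for short values
print(char_bytes(30, 30), varchar_bytes(30, 30))              # 30 32: CHAR wins for full-length values
print(1_000_000 / 2**20)                                      # 0.95367431640625 MiB from 1 byte x 1,000,000 rows
print(rows_per_block(65536, 100), rows_per_block(65536, 99))  # 655 661
```

Shaving one byte from a 100-byte row lets six more rows fit in the hypothetical block, which is the "fewer blocks need to be read" effect described earlier.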
CAST
Compatibility: ANSI
Under most conditions, the data types defined and stored in a table should be appropriate. However, sometimes it is neither convenient nor desirable to use the defined type. Data can be converted from one type to another by using the CAST function. As long as the data involved does not break any data rules (e.g. placing alphabetic or special characters into a numeric data type), the conversion works. The name of the CAST function comes from the Convert And STore operation that it performs.

Care must also be taken when converting data to manage any potential length issues. In Teradata mode, truncation occurs if a length is requested that is shorter than the original data. However, in ANSI mode, an SQL error is the result because ANSI says, "Thou shall not truncate data."

The basic syntax of the CAST statement follows:

Examples using CAST:

These are only some of the potential conversions and are primarily here to illustrate how to code a CAST. The CAST could also be used within the WHERE clause to control the length characteristics or the type of the data to compare.

Again, when using the CAST in ANSI mode, any attempt to truncate data causes the SQL to fail because ANSI does not allow truncation.

The next SELECT uses literal values to show the results of conversion:

1 Row Returned

Trunc | OK | Bigger | Whole | Rounder
A | 128 | 127 | 121 | 122

In the above example, the first CAST truncates the five characters (left to right) to form the single character 'A'. In the second CAST, the integer 128 is converted to three characters and left justified in the output. The 127 was initially stored in a SMALLINT (5 digits, up to 32767) and then converted to an INTEGER. Hence, it uses 11 character positions for its display, ten numeric digits and a sign (positive assumed), and is right justified as numeric.

The value of 121.53 is an interesting case for two reasons.
First, it was initially stored as a DECIMAL with 5 total digits, 2 of them to the right of the decimal point. Then it is converted to a SMALLINT using CAST to remove the decimal positions. Therefore, it truncates data by stripping off the decimal portion; it does not round data using this data type. On the other hand, the CAST in the fifth column, called Rounder, converts to a DECIMAL with 3 digits and no digits to the right of the decimal (3,0), so it rounds data values instead of truncating. Since .53 is greater than .5, it is rounded up to 122.
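The truncate-or-fail and truncate-versus-round behavior just described can be modeled with Python's `decimal` module. This is a behavioral sketch, not Teradata: the function names are hypothetical, and the rounding of exact halfway cases (.5) is an assumption, since a real system may be configured to round ties differently. Overflow past DECIMAL(3,0) is not modeled.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def cast_to_char(value: str, length: int, ansi_mode: bool) -> str:
    """CAST(value AS CHAR(length)): pad short values; truncate long ones only in Teradata mode."""
    if len(value) > length:
        if ansi_mode:
            raise ValueError("ANSI mode: truncation is not allowed")
        return value[:length]  # Teradata mode keeps the leftmost characters
    return value.ljust(length)


def cast_to_smallint(value: Decimal) -> int:
    """Integer target: the fractional part is stripped (truncated), not rounded."""
    return int(value.to_integral_value(rounding=ROUND_DOWN))


def cast_to_decimal_3_0(value: Decimal) -> Decimal:
    """DECIMAL(3,0) target: rounds to zero fractional digits.

    Assumption: halfway cases round away from zero in this model.
    """
    return value.quantize(Decimal("1"), rounding=ROUND_HALF_UP)


print(cast_to_char("ABCDE", 1, ansi_mode=False))  # A
print(cast_to_smallint(Decimal("121.53")))        # 121
print(cast_to_decimal_3_0(Decimal("121.53")))     # 122
```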
Implied CAST
Compatibility: Teradata Extension
Although the CAST function is the ANSI standard, it has not always been that way. Prior to the CAST function, Teradata had the ability to convert data from one type to another.

This conversion is requested by placing the "implied" data type conversion in parentheses after the column name. Therefore, it becomes a part of the select list and the column request. The new data type is written as an attribute for the column name.

The following is the format for requesting a conversion:

At first glance, this appears to be the best and shortest technique for doing conversions. However, there is a hidden danger here when converting from numeric to character, as demonstrated in this SELECT that uses the same data as above to do implied CAST conversions:

1 Row Returned

Shortened | OOPS1 | OOPS2 | Bigger | Whole
A |  | - | 128 | 121

What happened in the columns named OOPS1 and OOPS2? The answer is that the value 128 is 1 greater than 127 and therefore too large a value to store in a BYTEINT. So it is automatically stored as a SMALLINT (5 digits plus a sign) before the conversion. The implicit conversion changes it to a character type, with the first 3 characters being returned. As a result, only the first 3 spaces are seen in the report (_ _ _128, where each _ is a space). Likewise, OOPS2 is stored as (_ _ -128), with the first three characters (2 spaces and the -) shown in the output.

Always think about the impact of the sign as a valid part of the data when converting from numeric to character. As mentioned earlier, if you find that conversions of this type are regularly necessary, the table design needs to be re-examined.

As demonstrated in the above output, it is always safer to use CAST when going from numeric to character data types.
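The OOPS1 and OOPS2 results follow from two steps: the literal gets the smallest integer type that holds it, and the implied CHAR(3) keeps the leftmost characters of that type's right-justified display (digits plus one sign position). The Python sketch below models only those two steps. Its assumption, labeled in the code, is that a leading minus is applied after typing the unsigned literal, which matches the text's statement that -128 is displayed as a SMALLINT.

```python
from typing import Tuple

# (type name, smallest value, largest value, digits shown in the display)
INTEGER_TYPES = [
    ("BYTEINT", -128, 127, 3),
    ("SMALLINT", -32768, 32767, 5),
    ("INTEGER", -2147483648, 2147483647, 10),
]


def literal_type(literal: int) -> Tuple[str, int]:
    """Return (type name, display digits) of the smallest type for the literal.

    Assumption: the literal is typed by its unsigned digits and any minus sign
    is applied afterward, so -128 is typed like 128 (SMALLINT).
    """
    magnitude = abs(literal)
    for name, _low, high, digits in INTEGER_TYPES:
        if magnitude <= high:
            return name, digits
    raise OverflowError("literal too large for the integer types modeled here")


def implied_char_cast(literal: int, length: int) -> str:
    """Model literal (CHAR(length)): right-justify in digits + 1 sign position, keep the left part."""
    _name, digits = literal_type(literal)
    display = str(literal).rjust(digits + 1)
    return display[:length]


print(literal_type(128))                # ('SMALLINT', 5)
print(repr(implied_char_cast(128, 3)))  # '   '
print(repr(implied_char_cast(-128, 3))) # '  -'
```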
Formatted Data
Compatibility: Teradata Extension
Remember that truncation works in Teradata mode, but not in ANSI mode. So, another way to make data appear to be truncated is to use the Teradata FORMAT in the SELECT list with one or more columns when using a tool like BTEQ. Since FORMAT does not truncate data, it works in ANSI mode.

The syntax for using FORMAT is:

The next SELECT demonstrates the use of FORMAT:

1 Row Returned

Shorter | Fmt_121 | 121.53 | Fmt_NumDate | Fmt_Date
ABC | 00121 | 121.53 | 10/01/1999 | OCT 01, 1999

There are a couple of things to notice in this output. First, it works in ANSI mode because truncation does not occur. The distinction is that all of the data from the column is in spool; only the output is shortened, not truncated. The character data types use 'X' as the formatting character.

Second, formatting does not round a data value; as with the 121.53, the display is only shortened. The numeric data types use '9' as the basic formatting character. Others are shown in this chapter.

Next, DATE type data uses 'M' for the month, 'D' for the day of the month and 'Y' for the year portion of a valid date. Lastly, the case of the formatting characters does not matter; they can be written in all uppercase, all lowercase, or a mixture of both.

The two following charts show the valid formatting characters for Teradata and explain the impact each one has on the output display when using BTEQ:

Basic Numeric and Character Data Formatting Symbols
Symbol | Mask character and how used
X or x | Character data. Each X represents one character. Can repeat value, i.e. XXXXX or X(5).
9 | Decimal digit. Holds place for a numeric digit for display, 0 through 9. All leading zeroes are shown if the format mask is longer than the data value. Can repeat value, i.e. 99999 or 9(5).
V or v | Implied decimal point. Aligns data on a decimal value. Primarily used on imported data without an actual decimal point.
E or e | Exponential.
Aligns the end of the mantissa and the beginning of the exponent.
G or g | Graphic data. Each G represents one logical (double byte, KANJI or Katakana) character. Can repeat value, i.e. GGGGG or G(5).
Figure 4-3

Advanced Numeric and Character Formatting Symbols
Symbol | Mask character and how used
$ | Fixed or floating dollar sign. Inserts a $, or leaves spaces and moves (floats) over to the first character of a currency value. With the proper keyboard, additional currency signs are available: Cent, Pound and Yen.
, | Comma. Inserted where it appears in the format mask. Used primarily to make large numbers easier to read.
. | Period. Primarily used to align the decimal point position. Also used in dates, and in place of the comma in some currencies.
- | Dash character. Inserted where it appears in the format mask. Used primarily for dates and negative numeric values. Also used for phone numbers, zip codes, and social security numbers (USA).
/ | Slash character. Inserted where it appears in the format mask. Used primarily for dates.
% | Percent character. Inserted where it appears in the format mask. Used primarily for the display of percentages, i.e. 99% vs. .99
Z or z | Zero-suppressed decimal digit. Holds place for a numeric digit, displaying 1 through 9, and 0 when significant. All leading (insignificant) zeroes are shown as spaces since their presence does not change the value of the number being displayed.
B or b | Blank data. Inserts a space where it appears in the format mask.
Figure 4-4

The next chart shows the formatting characters used in conjunction with DATE data:

Date Formatting Symbols
Symbol | Mask character and how used (not case specific)
M or m | Month. Allows the month to be displayed anywhere in the date display. When 'MM' is specified, the numeric (01-12) value is available. When 'MMM' is specified, the three character (JAN-DEC) value is available.
D or d | Day. Allows the day to be displayed anywhere in the date display. When 'DD' is specified, the numeric (01-31) value is available.
When 'DDD' is specified, the three-digit day of the year (001-366) value is available.
Y or y | Year. Allows the year to be displayed anywhere in the date display. The normal 'YY' has been used for many years for the 20th century, with 19YY assumed. However, since we have moved into the 21st century, it is recommended that 'YYYY' be used.
Figure 4-5

There is additional information on date formatting in a later chapter dedicated exclusively to date processing.

The next SELECT demonstrates some of the additional formatting symbols:

1 Row Returned

Fmt_Shorter | Fmt_Phone | Z_Press | Fmt_Julian | Fmt_Pay
ABC | 201-485-9999 | 1021.53 | 99274 | $991,001.00

There are only two things that need to be watched when using the FORMAT function. First, the data type must match the formatting character used, or a syntax error is returned. So, if the data is numeric, use a numeric formatting character, and likewise use a character formatting character for character data. The other concern is making the format mask big enough for the largest data value in the column. If the mask is too short, the SQL command executes; however, the output contains a series of asterisks (*) to indicate a format overflow, as demonstrated by the following SELECT:

1 Row Returned

Fmt_Phone
*********

All of these FORMAT requests work wonderfully if the client software is BTEQ. After all, it is a report writer and these are report writer options. The issue is that the ODBC and Queryman look at the data as data, not as a report. Since many of the formatting symbols are characters, a formatted value cannot be numeric. Therefore, the ODBC strips off the symbols and presents the numeric data to the client software for display.

Tricking the ODBC to Allow Formatted Data
If a tool uses the ODBC, the FORMAT in the SELECT is ignored and the data comes back as data, not as a formatted field. This is especially noticeable with numeric data and dates.
To force tools like Queryman to format the data, the software must be tricked into thinking the data is character type, which it leaves alone. This can be done using the CAST function. The next SELECT uses the CAST operation to trick the software into thinking the formatted data is character:

1 Row Returned
Fmt_CAST_Phone  Fmt_CAST_Date  Fmt_CAST_Pay
485-9999        1999.10.01     $991,001.00

Do not let the presence of AS in the above SELECT confuse you. The first AS, inside the parentheses, goes with the new data type for the CAST. Notice that the parentheses enclose both the data and the FORMAT so that they are treated as a single entity. The second AS is outside the parentheses and is used to name the alias.
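Here is a minimal sketch of the CAST technique. Each value gets its FORMAT inside an inner set of parentheses, and the formatted result is then CAST to a CHAR type wide enough for the mask. The literal values and CHAR lengths are illustrative assumptions.

```sql
/* Format first, then CAST the formatted value to CHAR so that
   ODBC-based tools receive text instead of a number or a date */
SELECT CAST( (4859999           (FORMAT '999-9999'))    AS CHAR(8)  ) AS Fmt_CAST_Phone
      ,CAST( (DATE '1999-10-01' (FORMAT 'YYYY.MM.DD'))  AS CHAR(10) ) AS Fmt_CAST_Date
      ,CAST( (991001.00         (FORMAT '$$$$,$$9.99')) AS CHAR(11) ) AS Fmt_CAST_Pay;
```

Because the result columns are CHAR, an ODBC tool displays the text as-is. The trade-off is that the client can no longer treat these columns as numbers or dates.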
TITLE Attribute for Data Columns
Compatibility: Teradata Extension

As seen earlier, an alias may be used to change the column name. This can be done for ease of reference or to alter the heading for the column in the output. The TITLE is an alternative to using an alias name when a column heading needs to be changed.

There is a big difference between TITLE and an alias. An alias does change the title on a report, but it is normally used to give a column a new name throughout the SQL. The TITLE only changes the column heading. The syntax for using TITLE follows:

SELECT <column-name> (TITLE 'Title Name')
FROM <table-name> ;

Like FORMAT, TITLE changes the attribute of the displayed data, so it is also written in parentheses. Also like FORMAT, TITLE may not work as well in tools using ODBC as it does in BTEQ, the report writer. This is especially true when using the // stacking symbols. In tools like Queryman, the title literally contains //, which is probably not the intent. Also, if you attempt to use TITLE in Queryman and it does not work, check the ODBC configuration: when "Use Column Names" is checked, ODBC will not use the title designation.

The following SELECT uses TITLE to show the result:

1 Row Returned
Character Data  Character       Numeric Data
                Data
Character Data  Character Data  123

Notice that, in BTEQ, the word 'Character' is stacked over the 'Data' portion of the heading for the second column. So a TITLE can be used instead of an alias, and it allows the user to include spaces in the output title.

Another neat trick for TITLE is to use two single quotes together (TITLE ''). This technique creates a zero-length TITLE, or no title at all, as seen in the next SELECT:

1 Row Returned
Character Data
Character Data  Character Data  123

Remember, this TITLE is two separate single quotes, not a single double quote. A double quote by itself does not work because it is unbalanced without a second double quote.
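The sketch below shows both TITLE forms described above. It uses literals in place of table columns, and the literals are illustrative.

```sql
/* TITLE headings, including a stacked heading (//) for BTEQ */
SELECT 'Character Data' (TITLE 'Character Data')
      ,'Character Data' (TITLE 'Character//Data')  /* BTEQ stacks "Character" over "Data" */
      ,123              (TITLE 'Numeric Data');

/* Two single quotes: a zero-length title, so no heading is printed */
SELECT 'Character Data' (TITLE '');
```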
Transaction Modes
Transaction mode is an area where the perspectives of the Teradata RDBMS and ANSI depart. Teradata, by default, is completely non-case specific. ANSI requires just the opposite condition: everything is case specific and, as we saw earlier, it dictates that table and column names be in capital letters. This is probably a little restrictive, and I tend to agree completely with the Teradata implementation.

At the same time, Teradata allows the user to work in either mode for a session when connected to the RDBMS. The choice is up to the user when BTEQ is the client interface software. For instance, within BTEQ either of the following commands can be used before logging onto the database:

.SET SESSION TRANSACTION ANSI
Or
.SET SESSION TRANSACTION BTET

The name BTET is simply an acronym that combines the BEGIN TRANSACTION (BT) and END TRANSACTION (ET) commands. It represents Teradata mode.

The system administrator defines the system default mode for Teradata. A setting in the DBS Control record determines the default session mode. The above commands allow the default to be overridden for each logon session. The SET command must be executed before the logon to establish the transaction mode for the next session(s).

However, not all client software supports the ability to change modes between Teradata and ANSI. When the functionality or processing characteristics of the other mode are desirable, other options are available. These options are presented below. There is more information on transactional processing later in this book.
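The sequence just described, setting the mode and then logging on, can be sketched as a BTEQ script. `tdpid/username` is a placeholder logon string, and BTEQ prompts for the password when it is omitted. The comparison in each SELECT anticipates the case-sensitivity discussion that follows. Per that discussion, the ANSI session is expected to return no rows and the BTET session one row.

```sql
.SET SESSION TRANSACTION ANSI
.LOGON tdpid/username
/* ANSI mode: character comparisons are case specific */
SELECT 'They match' (TITLE 'Do They Match?')
WHERE 'A' = 'a';
.LOGOFF

.SET SESSION TRANSACTION BTET
.LOGON tdpid/username
/* Teradata (BTET) mode: the same comparison is not case specific */
SELECT 'They match' (TITLE 'Do They Match?')
WHERE 'A' = 'a';
.LOGOFF
.QUIT
```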
Case Sensitivity of Data
As discussed earlier, there is no need for concern about lower or upper case characters when coding the SQL. As a matter of fact, upper and lower case letters can be mixed in a single statement. Normally, the Teradata database does not care about case when comparing the stored data either. However, the ANSI mode implementation of the Teradata RDBMS is case sensitive with regard to the data. This means that it knows the difference between a lower case letter like 'a' and an upper case letter 'A'. In Teradata mode, by contrast, the database does not distinguish between upper and lower case letters. It is the mode of the session that dictates the case sensitivity of the data.

The SQL can always execute ANSI standard commands in Teradata mode and, likewise, can always execute Teradata extensions in ANSI mode. The SQL is always the same regardless of the mode being used. The difference appears in the data rows returned, which depend on the mode.

For example, earlier in this chapter, it was stated that ANSI mode does not allow truncation. Therefore, the FORMAT could be used in either mode because it did not truncate data.

To demonstrate this issue, the following uses the different modes in BTEQ:

No Rows Returned

The above SQL execution is case specific because the session is in ANSI mode, and 'A' is different than 'a'.

The same SQL is executed again here. This time, the transaction mode for the session is set to Teradata mode (BTET) prior to the logon:

1 Row Returned
Do They Match?
They match

Now that the defaults have been demonstrated, the following functions can be used to mimic the operation of each mode while executing in the other (ANSI vs. Teradata) where case sensitivity is concerned.
CASESPECIFIC
Compatibility: Teradata Extension

The CASESPECIFIC attribute may be used to request that Teradata compare data values with a distinction made between upper and lower case. The logic behind this designation is that, even in Teradata mode, case sensitivity can be requested to make the SQL work the same as ANSI mode, which is case specific. Therefore, when CASESPECIFIC is used, it normally appears in the WHERE clause. The next two statements execute exactly the same:

WHERE <column-name> (CASESPECIFIC) = '<value>'

Or, it may be abbreviated as CS:

WHERE <column-name> (CS) = '<value>'

Conversely, if ANSI is the current mode and there is a need for it to be non-case specific, NOT can be used to adjust the default operation of the SQL within a mode. The following SQL forces ANSI to be non-case specific:

WHERE <column-name> (NOT CASESPECIFIC) = '<value>'

Or, it may be abbreviated as:

WHERE <column-name> (NOT CS) = '<value>'

The next SELECT demonstrates the functionality of CASESPECIFIC and CS for comparing an equality condition like the one executed above in ANSI mode:

No Rows Returned

No rows are returned, because 'A' is different than 'a' when case sensitivity is used.

At first glance, this seems unnecessary, since the mode can be set to either ANSI or Teradata. However, the dot (.) commands are BTEQ commands, and they do not work in Queryman. If case sensitivity is needed when using other tools, this is one of the options available to mimic ANSI comparisons while in Teradata mode.

These Teradata SQL extensions remove the need to log off, reset the mode, and log back onto Teradata just to get a characteristic like case sensitivity. Instead, Teradata mode can be forced to use a case specific comparison, like ANSI mode, by incorporating CASESPECIFIC (CS) into the SQL. The case specific option is not a statement-level feature. It must be specified for each column needing this type of comparison, in both BTEQ and Queryman.
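The sketch below shows both directions using literals. It assumes the first SELECT runs in a Teradata (BTET) session and the second in an ANSI session. With a column, the attribute follows the column name in the same way, e.g. Last_Name (CS). The sketch puts the attribute on both operands as a precaution, so the outcome does not depend on the default case specificity of the other operand.

```sql
/* Teradata (BTET) session: force a case-specific comparison.
   (CS) is the abbreviation of (CASESPECIFIC). */
SELECT 'They match' (TITLE 'Do They Match?')
WHERE 'A' (CASESPECIFIC) = 'a' (CASESPECIFIC);   /* 'A' <> 'a': no row */

/* ANSI session: force a non-case-specific comparison */
SELECT 'They match' (TITLE 'Do They Match?')
WHERE 'A' (NOT CS) = 'a' (NOT CS);               /* case ignored: one row */
```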
LOWER Function
Compatibility: ANSI

The LOWER case function converts all characters stored in a column to lower case letters for display or comparison. It is a function and therefore requires that the data be passed to it. The syntax for using LOWER:

LOWER(<column-name>)

The following SELECT uses an upper case literal value as input and outputs the same value, but in lower case:

SELECT LOWER('ABCDE') AS Result ;

1 Row Returned
Result
abcde

When LOWER is used in a WHERE clause, the result is a predictable string of all lowercase characters. When that string is compared to a lowercase value, the result is a case-blind comparison. This is true regardless of how the data was originally stored.

SELECT 'They match' (TITLE 'Do They Match?')
WHERE LOWER('aBcDe') = 'abcde' ;

1 Row Returned
Do They Match?
They match
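Applied to a column instead of a literal, the same idea gives a case-blind search. This is a sketch that assumes a table named Student_Table with Last_Name and First_name columns, like the Student table shown later in this chapter.

```sql
/* Case-blind search: lower-case the stored value and compare it to a
   lower-case literal. Behaves the same in ANSI and Teradata mode. */
SELECT Last_Name
      ,First_name
FROM Student_Table
WHERE LOWER(Last_Name) = 'smith';
```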
UPPER Function
Compatibility: ANSI

The UPPER case function converts all characters stored in a column to the same characters in upper case. It is a function and therefore requires that data be passed to it. The syntax for using UPPER:

UPPER(<column-name>)

The next example uses a literal value within UPPER to show the output all in upper case:

SELECT UPPER('aBcDe') AS Result ;

1 Row Returned
Result
ABCDE

Both the LOWER and UPPER case functions can also be used within the WHERE clause. This technique can make ANSI non-case specific, like Teradata, by converting all the data to a known state, regardless of the starting case. The comparison then checks the data after the conversion rather than the original data. The following SELECT uses the UPPER function in the WHERE:

1 Row Returned
Do They Match?
They match

When the data does not meet the requirements of the output format, it is time to convert the data. The UPPER and LOWER functions can change the appearance or characteristics of the data to a known state. When case sensitivity is needed, ANSI mode is one way to accomplish it. If that is not an option, the CASESPECIFIC attribute can be incorporated into the SQL.
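Here is a sketch of the UPPER comparison just described, first with literals and then with a column. Student_Table and Last_Name are assumed names, modeled on the Student table shown later in this chapter.

```sql
/* Literal form: both sides are upper case, so they match in either mode */
SELECT 'They match' (TITLE 'Do They Match?')
WHERE UPPER('aBcDe') = 'ABCDE';

/* Column form: convert both the stored value and the search value */
SELECT Last_Name
FROM Student_Table
WHERE UPPER(Last_Name) = UPPER('Smith');
```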
Aggregate Processing
The aggregate functions summarize column data values stored in rows. Aggregates eliminate the detail information from the rows and only return the answer. The result is one or more aggregated values, either as a single line or as one line per unique value (a group). The other characteristic of these functions is that they all ignore null values in the column data passed to them.

Math Aggregates
The math aggregates are the original functions, used to provide simple arithmetic operations on the data values. Their names describe the operation performed. The functions are listed below, with examples following their descriptions. The newer V2R4 statistical aggregates are covered later in this chapter.

The SUM Function
Accumulates the values for the named column and prints one total from the addition.

The AVG Function
Accumulates the values for the named column and counts the number of values added, for the final division to obtain the average.

The MIN Function
Compares all the values in the named column and returns the smallest value.

The MAX Function
Compares all the values in the named column and returns the largest value.

The COUNT Function
Adds one to the counter each time a value other than null is encountered.

The aggregates can all be used together in a single request on the same column, or individually on different columns, depending on your needs. The following syntax shows all five aggregate functions in a single SELECT to produce a single-line answer set:

SELECT SUM(<column-name>)
      ,AVG(<column-name>)
      ,MIN(<column-name>)
      ,MAX(<column-name>)
      ,COUNT(<column-name>)
FROM <table-name> ;

The following table is used to demonstrate the aggregate functions:

Student Table - contains 10 students
Student_ID  Last_Name  First_name  Class_code  Grade_Pt
Keys: PK  FK    Indexes: UPI  NUSI  NUSI
123250      Phillips   Martin      SR          3.00
125634      Hanson     Henry       FR          2.88
234121      Thomas     Wendy       FR          4.00
231222      Wilson     Susie       SO          3.80
260000      Johnson    Stanley     ?           ?
280023      McRoberts  Richard     JR          1.90
322133      Bond       Jimmy       JR          3.95
324652      Delaney    Danny       SR          3.35
333450      Smith      Andy        SO          2.00
423400      Larkins    Michael     FR          0.00
Figure 5-1

The next SELECT uses the Student table to show all aggregates in one statement working on the same column:

1 Row Returned
SUM(Grade_pt)  AVG(Grade_pt)  MIN(Grade_pt)  MAX(Grade_pt)  COUNT(Grade_pt)
24.88          2.76           0.00           4.00           9

Notice that Stanley's row is not included in the functions, because his grade point average is null. Also notice that no individual grade point data is displayed. The aggregates eliminate this level of column and row detail and only return the summarized result for all included rows. To eliminate rows from the aggregation, use a WHERE clause.

Since the name of the selected column appears as the heading for the column, aggregate names make for funny-looking headings. To make the output look better, it is a good idea to use an alias to dress up the name used in the output. Additionally, the alias can be used elsewhere in the SQL as the column name. The next SELECT demonstrates the use of alias names for the aggregates:

1 Row Returned
Total  Average  Smallest  Largest  Count
24.88  2.76     0.00      4.00     9

Notice that the aliases in the above SELECT appear as the heading for each column. Also, the words Total, Average and Count are in double quotes. As mentioned earlier in this book, double quotes tell the PE that the word is a column name, as opposed to a reserved word. Single quotes, by contrast, identify a literal data value.

Aggregates and Derived Data
The various aggregates can work on any column. However, SUM and AVG only work with numeric data. COUNT is the one most often used on either character or numeric data (MIN and MAX also accept character data). The aggregates can also be used with derived data.
The following table is used to demonstrate derived data and aggregation:

Employee Table - contains 9 employees
Employee_No  Last_Name   First_name  Salary      Dept_No
Keys: PK  FK    Indexes: UPI  NUSI  NUSI
1232578      Chambers    Mandee      $48,850.00  100
1256349      Harrison    Herbert     $54,500.00  400
2341218      Reilly      William     $36,000.00  400
2312225      Larkins     Loraine     $40,200.00  300
2000000      Jones       Squiggy     $32,800.50  ?
1000234      Smythe      Richard     $64,300.00  10
1121334      Strickling  Cletus      $54,500.00  400
1324657      Coffing     Billy       $41,888.88  200
1333454      Smith       John        $48,000.00  200
Figure 5-2

This SELECT totals the salaries for all employees and shows what the total salaries will be if everyone is given a 5% or a 10% raise:

1 Row Returned
Salary Total  5% Raise     10% Raise    Average Salary  Computed Average Salary
$421,039.38   $442,091.35  $463,143.32  $46,782.15      $46,782.15

Notice that since both TITLE and FORMAT require parentheses, they can share the same set. Also, the AVG function and dividing the SUM by the COUNT provide the same answer.
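The sketch below shows a query that would compute the values described above. It assumes the Employee table is named Employee_Table. Derived expressions go inside SUM, and TITLE and FORMAT share one comma-separated set of parentheses. The exact mask is an assumption.

```sql
/* Aggregates over derived (calculated) values; TITLE and FORMAT
   share a single set of parentheses */
SELECT SUM(Salary)        (TITLE 'Salary Total',   FORMAT '$$$$,$$9.99')
      ,SUM(Salary * 1.05) (TITLE '5% Raise',       FORMAT '$$$$,$$9.99')
      ,SUM(Salary * 1.10) (TITLE '10% Raise',      FORMAT '$$$$,$$9.99')
      ,AVG(Salary)        (TITLE 'Average Salary', FORMAT '$$$$,$$9.99')
      ,SUM(Salary) / COUNT(Salary)                 /* same idea as AVG */
                          (TITLE 'Computed Average Salary', FORMAT '$$$$,$$9.99')
FROM Employee_Table;
```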
GROUP BY
It has been shown that aggregates produce one row of output with one value per aggregate. However, the above SELECT is inconvenient if individual aggregates are needed for different values in another column, like the class code. For example, you might want to see each aggregate for freshmen, sophomores, juniors, and seniors.

The following SQL might be run once for each unique class code value specified in the WHERE clause. Here the aggregates only work on the senior class ('SR'):

1 Row Returned
Total  Average  Smallest  Largest  Count
6.35   3.175    3.00      3.35     2

Although this technique works for finding each class, it is not very convenient. First, each unique class value needs to be known ahead of time for each execution. Second, each WHERE clause must be manually modified for the different values needed. Lastly, each time the SELECT is executed, it produces a separate output. In reality, it might be better to have all the results in a single report format.

Since the results of aggregates are incorporated into a single output line, there must be a way to return one line per unique data value. To do this, select a column whose values group various rows together. This column is simply selected and not used in an aggregate, so it is not an aggregated column.

However, when aggregates and "non-aggregates" (normal columns) are selected at the same time, a 3504 error message is returned. The error indicates the mixture and that the non-aggregate is not part of an associated group. Therefore, the GROUP BY is required in the SQL statement to identify every column selected that is not an aggregate.

The resulting output consists of one line of aggregate values for each unique data value stored in the column(s) named in the GROUP BY.
For example, if the department number is used from the Employee table, the output consists of one line per department with at least one employee working in it.

The next SELECT uses the GROUP BY to create one line of output per unique value in the class code column:

5 Rows Returned
Class_code  Total  Average  Smallest  Largest  Count
FR          6.88   2.29     0.00      4.00     3
?           ?      ?        ?         ?        0
JR          5.85   2.925    1.90      3.95     2
SR          6.35   3.175    3.00      3.35     2
SO          5.80   2.9      2.00      3.80     2

Notice that the null value in the class code column is returned. At first, this may seem contrary to the aggregates ignoring nulls. However, class code is not being aggregated and is selected as a "unique value." All the aggregate values on the grade point for this row are null, except for COUNT. The COUNT is zero, which shows that the null value is ignored: the COUNT value starts at zero, so 0 + 0 = 0.

The GROUP BY is only required when a non-aggregate column is selected along with one or more aggregates. Without both a non-aggregate and a GROUP BY clause, the aggregates return only one row. With a non-aggregate and a GROUP BY clause designating the column(s), the aggregates return one row per unique value in the column, as seen above.

Additionally, more than one non-aggregate column can be specified in the SELECT and in the GROUP BY clause. The normal result is that more rows are returned. A new row appears whenever any single column value changes, because each combination of column values constitutes a new value. Remember, all non-aggregates selected with an aggregate must be included in the GROUP BY, or a 3504 error is returned.

As an example, the last name might be added as a second non-aggregate. Then, each combination of last name and class code is compared to other students in the same class. This combination creates more lines of output. As a result, each aggregate value is primarily the aggregation of a single row.
The only time multiple rows are processed together is when multiple students have the same last name and are in the same class. Then they group together, because the values in both columns are equal.

This SELECT demonstrates the correct syntax for using multiple non-aggregates with aggregates. The output is one line for each student:

10 Rows Returned
Last_name  Class_code  Total  Average  Smallest  Largest  Count
Johnson    ?           ?      ?        ?         ?        0
Thomas     FR          4.00   4.00     4.00      4.00     1
Smith      SO          2.00   2.00     2.00      2.00     1
McRoberts  JR          1.90   1.90     1.90      1.90     1
Larkins    FR          0.00   0.00     0.00      0.00     1
Phillips   SR          3.00   3.00     3.00      3.00     1
Delaney    SR          3.35   3.35     3.35      3.35     1
Wilson     SO          3.80   3.80     3.80      3.80     1
Bond       JR          3.95   3.95     3.95      3.95     1
Hanson     FR          2.88   2.88     2.88      2.88     1

Beyond showing the correct syntax for multiple non-aggregates, the above output reveals that it is possible to request too many non-aggregates. As seen above, every output line is a single row, so every aggregated value comes from a single row. The aggregate is therefore meaningless, because it is the same as the original data value.

Also notice that without an ORDER BY, the GROUP BY does not sort the output rows.

As with the ORDER BY, the column's relative position number within the SELECT can also be used in the GROUP BY. In the above example, the two columns are the first ones in the SELECT, so the GROUP BY is written in the shorter format: GROUP BY 1,2.

Caution: The shorter technique can cause problems if a non-aggregate moves to a different position in the SELECT list and the GROUP BY is not changed. The most common problem is a 3504 error message indicating that a non-aggregate is not included in the GROUP BY, so the SELECT does not execute.

As previously shown, the default column heading is the column name. A heading made of the aggregate and column names is not very pretty. Therefore, an alias is suggested in all tools. Optionally, a TITLE can define the heading in BTEQ.
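The two-non-aggregate request can be written with the shorter positional form. This sketch assumes the Student table is named Student_Table, and the aliases follow the headings shown above.

```sql
/* Two non-aggregates, grouped by their positions (1 and 2) in the
   SELECT list. Every non-aggregate must appear in the GROUP BY, or
   error 3504 is returned. */
SELECT Last_Name
      ,Class_code
      ,SUM(Grade_pt)   AS "Total"
      ,AVG(Grade_pt)   AS "Average"
      ,MIN(Grade_pt)   AS Smallest
      ,MAX(Grade_pt)   AS Largest
      ,COUNT(Grade_pt) AS "Count"
FROM Student_Table
GROUP BY 1, 2;
```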
As seen earlier, a COUNT on the grade point for the null class code is zero. This is misleading: 1 row contains a null, not zero rows. Because of the null value, the row is not counted. A better technique might be the use of COUNT(*) for a row count. Although this implies counting all columns, in reality it counts the row. A row is counted whether or not any of its columns contain null values.

Another method to provide the same result is to count any column that is defined as NOT NULL. However, it takes time to find such a column, and its name is longer than an asterisk (*), so it is easier to use COUNT(*).

Again, the GROUP BY clause creates one line of output per unique value, but does not perform a sort. It only creates the distinct grouping for all of the columns specified. Therefore, it is suggested that you always include an ORDER BY to sort the output.

The following might be a better way to code the previous request, using COUNT(*) and an ORDER BY:

5 Rows Returned
Class_code  Total  Average  Smallest  Largest  Count
?           ?      ?        ?         ?        1
FR          6.88   2.29     0.00      4.00     3
JR          5.85   2.925    1.90      3.95     2
SO          5.80   2.9      2.00      3.80     2
SR          6.35   3.175    3.00      3.35     2

Now the output is sorted by the class code, with the null appearing first as the lowest "value." Also notice the count is one for the row containing mostly NULL data, because COUNT(*) counts the row.
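Here is a sketch of the improved request, again assuming the table name Student_Table:

```sql
SELECT Class_code
      ,SUM(Grade_pt) AS "Total"
      ,AVG(Grade_pt) AS "Average"
      ,MIN(Grade_pt) AS Smallest
      ,MAX(Grade_pt) AS Largest
      ,COUNT(*)      AS "Count"   /* counts rows, even when Grade_pt is null */
FROM Student_Table
GROUP BY 1
ORDER BY 1;                       /* GROUP BY does not sort; ORDER BY does */
```

Swapping COUNT(*) back to COUNT(Grade_pt) would return 0 instead of 1 for the null class code, because that group's only row has a null grade point.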
Limiting Output Values Using HAVING
As in any SELECT statement, a WHERE clause can always be used to limit the number or types of rows used in the aggregate processing. However, a WHERE clause cannot evaluate aggregate values, because an aggregate is not finished until all eligible rows have been read. A WHERE clause eliminates rows while the base table rows are being read. To eliminate specific aggregate results, the HAVING clause makes the final comparison before the aggregate results are returned.

The previous SELECT is modified below to compare the aggregates and only return the class codes with a grade point average of B (3.0) or better:

1 Row Returned
Class_code  Total  Average  Count
SR          6.35   3.18     2

Notice that the HAVING clause has eliminated all of the previously seen output with an average value less than 3.00. The WHERE clause eliminates rows. The HAVING provides the last comparison, after the calculation of the aggregate and before the results are returned to the user client.
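Here is a sketch of the HAVING version of the request, assuming the table name Student_Table. A WHERE clause, if one were added, would filter rows before grouping. The HAVING condition is applied to each group's finished AVG.

```sql
SELECT Class_code
      ,SUM(Grade_pt) AS "Total"
      ,AVG(Grade_pt) AS "Average"
      ,COUNT(*)      AS "Count"
FROM Student_Table
GROUP BY 1
HAVING AVG(Grade_pt) >= 3.00   /* evaluated after the aggregates are computed */
ORDER BY 1;
```

On the Figure 5-1 data only the SR group (average 3.175) meets the condition. The null class code group has a null average, and a null never satisfies the comparison.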
Statistical Aggregates
In Teradata Release 4 (V2R4) there are several new aggregates that perform statistical operations. Many of them were already used in other internal functions, and now they are available for use within SQL.

Besides being the newest aggregates, the statistical functions come in two types: unary (single input value) functions and binary (dual input value) functions.

The unary functions look at individual column values for each row included and compare all of the values for trends, similarities and groupings. All the original aggregate functions are unary, in that they accept a single value to perform their processing. The statistical unary functions are:
Kurtosis
Skew
Standard Deviation of a sample
Standard Deviation of a population
Variance of a sample
Variance of a population

The binary functions examine the relationship between two different values. Normally these two values are the X and Y coordinates of a point: one value on the X-axis and one on the Y-axis. The binary functions are:
Correlation
Covariance
Regression Line Intercept
Regression Line Slope

The results from the statistical functions are harder to demonstrate and figure out than those of the original functions, like SUM or AVG. The Stats table in Figure 5-3 is used to demonstrate the statistical functions. Its column values have certain patterns in them. For instance, COL1 increases sequentially from 1 to 30 while COL4 decreases sequentially from 30 to 1. The remaining columns tend to have the same value repeated, and some values repeat more than others. These values are used in both the unary and binary functions to illustrate the types of answers that these statistical functions generate.

The following table demonstrates the operation and output from the new statistical aggregate functions in V2R4.
Stats Table - contains 30 rows
Col1  Col2  Col3  Col4  Col5  Col6
PK
1     1     1     30    1     0
2     1     1     29    2     5
3     3     10    28    3     10
4     3     10    27    4     15
5     3     10    26    5     20
6     4     10    25    6     30
7     5     10    24    7     30
8     5     10    23    8     30
9     5     10    22    9     35
10    5     20    21    10    35
11    7     20    20    11    40
12    7     20    19    12    40
13    9     20    18    13    45
14    9     20    17    14    45
15    9     20    16    15    50
16    9     20    15    14    55
17    10    20    14    13    55
18    10    20    13    12    60
19    10    20    12    11    60
20    10    20    11    9     65
21    10    20    10    8     65
22    10    20    9     7     65
23    13    20    8     6     70
24    13    30    7     5     70
25    13    30    6     4     80
26    14    40    5     3     85
27    15    40    4     2     90
28    15    50    3     1     90
29    16    50    2     1     95
30    16    60    1     1     100
Figure 5-3

The KURTOSIS Function
The KURTOSIS function returns a number that represents the sharpness of the peak on a plotted curve of a probability function for a distribution, compared with the normal distribution.

A high value result is referred to as leptokurtic, a medium result as mesokurtic, and a low result as platykurtic.

A positive value indicates a sharp or peaked distribution, and a negative number represents a flat distribution. A peaked distribution means that one value exists more often than the other values. A flat distribution means that roughly the same quantity of values exists for each number.

Compare this to the row distribution within Teradata: most of the time a flat distribution is best, with the same number of rows stored on each AMP. Skewed data represents more of a lumpy distribution.

Syntax for using KURTOSIS:
KURTOSIS(<column-name>)

The next SELECT uses KURTOSIS to compare the distribution of the Stats table:

1 Row Returned
KofCol1  KofCol2  KofCol3  KofCol4  KofCol5  KofCol6
-1       -1       1        -1       -1       -1

The SKEW Function
The skew indicates that a distribution does not have equal probabilities above and below the mean (average). In a skewed distribution, the median and the mean are not coincident, or equal.
Where:
median value < mean value = a positive skew
median value > mean value = a negative skew
median value = mean value = no skew

Syntax for using SKEW:
SKEW(<column-name>)

The following SELECT uses SKEW to compare the distribution of the Stats table:

1 Row Returned
SkofCol1  SkofCol2  SkofCol3  SkofCol4  SkofCol5  SkofCol6
0         -0        1         0         0         -0

The STDDEV_POP Function
The standard deviation function is a statistical measure of the spread or dispersion of values. It is the square root of the average squared difference from the mean (average), that is, the square root of the variance. This measure compares the amount by which a set of values differs from the arithmetic mean.

The STDDEV_POP function is one of two functions that calculate the standard deviation. The population is all of the rows included by the comparison in the WHERE clause.

Syntax for using STDDEV_POP:
STDDEV_POP(<column-name>)

The next SELECT uses STDDEV_POP to determine the standard deviation on all columns of all rows within the Stats table:

1 Row Returned
SDPofCol1  SDPofCol2  SDPofCol3  SDPofCol4  SDPofCol5  SDPofCol6
9          4          14         9          4          27

The STDDEV_SAMP Function
Like STDDEV_POP, this function measures the spread or dispersion of values around the arithmetic mean. STDDEV_SAMP is the other function that calculates the standard deviation. It treats the rows allowed by the WHERE clause as a sample of a larger population. It divides the sum of the squared differences by one less than the number of values (n - 1) rather than by n. It does not select a random subset of the rows. STDDEV_POP, by contrast, treats those same rows as the entire population.

Syntax for using STDDEV_SAMP:
STDDEV_SAMP(<column-name>)

The following SELECT uses STDDEV_SAMP to determine the sample standard deviation on all columns of all rows within the Stats table:

1 Row Returned
SDSofCol1  SDSofCol2  SDSofCol3  SDSofCol4  SDSofCol5  SDSofCol6
9          4          14         9          5          27

The VAR_POP Function
The variance function is a measure of dispersion (spread of the distribution) as the square of the standard deviation.
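As just stated, the variance is the square of the standard deviation. The sketch below checks this on one column, assuming the Stats table is named Stats_Table. Each pair of columns should agree apart from floating-point rounding. The last expression rebuilds VAR_SAMP from VAR_POP with the n/(n - 1) factor that distinguishes the sample form.

```sql
/* Variance = (standard deviation) squared, for both forms */
SELECT VAR_POP(col1)                                    AS VPofCol1
      ,STDDEV_POP(col1)  * STDDEV_POP(col1)             AS SDPsquared1
      ,VAR_SAMP(col1)                                   AS VSofCol1
      ,STDDEV_SAMP(col1) * STDDEV_SAMP(col1)            AS SDSsquared1
      ,VAR_POP(col1) * COUNT(col1) / (COUNT(col1) - 1)  AS VSfromVP1
FROM Stats_Table;
```

For col1 (the values 1 through 30), the population variance is (30² - 1)/12 ≈ 74.92 and the sample variance is 77.5. The integer-formatted outputs in this section show these as 75 and 78.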
There are two forms of variance in Teradata. VAR_POP is for the entire population of data rows allowed by the WHERE clause.

Although standard deviation and variance are regularly used in statistical calculations, the meaning of variance is not easy to explain. Most often, variance is used in theoretical work where a variance of the sample is needed.

Two methods that apply the idea of variance are the Kruskal-Wallis one-way analysis of variance and the Friedman two-way analysis of variance by rank.

Syntax for using VAR_POP:
VAR_POP(<column-name>)

The following SELECT uses VAR_POP to compare the variance of the distribution on all rows from the Stats table:

1 Row Returned
VPofCol1  VPofCol2  VPofCol3  VPofCol4  VPofCol5  VPofCol6
75        19        191       75        20        723

The VAR_SAMP Function
Like VAR_POP, this is a measure of dispersion (spread of the distribution) as the square of the standard deviation. VAR_SAMP is the second form of variance in Teradata. It treats the data rows allowed through by the WHERE clause as a sample and divides by n - 1 instead of n. It does not pick a random sampling of the rows. Most often, the sample variance is used in theoretical work to look for consistency.

Syntax for using VAR_SAMP:
VAR_SAMP(<column-name>)

The next SELECT uses VAR_SAMP to compare the sample variance of the distribution on all rows from the Stats table:

1 Row Returned
VSofCol1  VSofCol2  VSofCol3  VSofCol4  VSofCol5  VSofCol6
78        20        198       78        20        748

The CORR Function
The CORR function is a binary function, meaning that two variables are used as input to it. It measures the association between 2 random variables. If one variable changes in a related manner whenever the other changes, the variables are correlated.
Independent variables are not correlated, because a change in one is not associated with a change in the other.

The correlation coefficient is a number between -1 and 1. It is calculated from a number of pairs of observations or linear points (X,Y).

Where:
1 = perfect positive correlation
0 = no correlation
-1 = perfect negative correlation

Syntax for using CORR:
CORR(<column-name>, <column-name>)

The following SELECT uses CORR to compare the association of values stored in various columns from the Stats table:

1 Row Returned
CofCol1_2  CofCol1_3  CofCol1_4  CofCol1_5  CofCol1_6
0.986480   0.885155   -1.000000  -0.151877  0.991612

This function takes two column values, and in the first example the first column's values ascend sequentially. The next example uses col4 as the first value instead, because its values descend sequentially. It demonstrates the impact of this sequence change on the result:

1 Row Returned
CofCol4_2  CofCol4_3  CofCol4_1  CofCol4_5  CofCol4_6
-0.986480  -0.885155  -1.000000  0.151877   -0.991612

The COVAR Function
The covariance is a statistical measure of the tendency of two variables to change in conjunction with each other. It is equal to the product of their standard deviations and their correlation coefficient.

The covariance is a statistic used for bivariate samples or bivariate distributions. It is used for working out the equations for regression lines and the product-moment correlation coefficient.

Syntax:
COVAR(<column-name>, <column-name>)

The next SELECT uses COVAR to compare the covariance of values stored in various columns from the Stats table:

1 Row Returned
CVofCol1_2  CVofCol1_3  CVofCol1_4  CVofCol1_5  CVofCol1_6
38          106         -75         -6          231

Again, the first example passes the sequentially ascending col1 as the first value. The next example uses col4 as the first value instead, because its values descend sequentially.
It demonstrates the impact of this sequence change on the result:

1 Row Returned

CVofCol4_2  CVofCol4_3  CVofCol4_1  CVofCol4_5  CVofCol4_6
-37         -106        -75         6           -231

The REGR_INTERCEPT Function

A regression line is a line of best fit, drawn through a set of points on a graph of X and Y coordinates. It uses the Y coordinate as the Dependent Variable and the X value as the Independent Variable. REGR_INTERCEPT returns the point where the regression line of Y on X crosses the Y axis, that is, the value of Y when X is 0. The two regression lines (Y on X and X on Y) always meet at the mean of the data points (AVG(x), AVG(y)), which is not usually one of the original data points.

Syntax for using REGR_INTERCEPT:

REGR_INTERCEPT(dependent-expression, independent-expression)

The following SELECT uses REGR_INTERCEPT to find the intercept for the values stored in various columns from the Stats table:

1 Row Returned

RIofCol1_2  RIofCol1_3  RIofCol1_4  RIofCol1_5  RIofCol1_6
-1          3           31          18          -1

As before, the next example uses the descending col4 as the first value to demonstrate the impact of this sequence change on the result:

1 Row Returned

RIofCol4_2  RIofCol4_3  RIofCol4_1  RIofCol4_5  RIofCol4_6
32          28          0           13          32

The REGR_SLOPE Function

A regression line is a line of best fit, drawn through a set of points on a graph of X and Y coordinates. It uses the Y coordinate as the Dependent Variable and the X value as the Independent Variable. The slope of the line measures how steeply it rises or falls: the change in Y for each one-unit change in X. The vertical slope is Y on X and the horizontal slope is X on Y.
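The formulas behind the statistical aggregates described so far can be checked outside the database. The following Python sketch is not Teradata code and uses small hypothetical columns rather than the Stats table. It assumes that COVAR is the population covariance, which divides by n. That assumption is consistent with COVAR(col1, col4) = -75 and VAR_POP(col1) = 75 in the results above. Like the SQL functions, the regression helpers take the dependent column first.

```python
import math


def mean(values):
    return sum(values) / len(values)


def var_pop(xs):
    """Population variance, as VAR_POP: squared deviations divided by n."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def var_samp(xs):
    """Sample variance, as VAR_SAMP: squared deviations divided by n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def covar_pop(ys, xs):
    """Population covariance (assumed behaviour of COVAR): divides by n."""
    my, mx = mean(ys), mean(xs)
    return sum((y - my) * (x - mx) for y, x in zip(ys, xs)) / len(xs)


def corr(ys, xs):
    """Pearson correlation: covariance over the product of the standard deviations."""
    return covar_pop(ys, xs) / math.sqrt(var_pop(ys) * var_pop(xs))


def regr_slope(ys, xs):
    """Least-squares slope of the dependent ys on the independent xs."""
    return covar_pop(ys, xs) / var_pop(xs)


def regr_intercept(ys, xs):
    """Value of y where the fitted line crosses x = 0; the line passes through the means."""
    return mean(ys) - regr_slope(ys, xs) * mean(xs)


# Hypothetical columns: col1 ascends, col4 is its descending mirror,
# and col2 is an exact linear function of col1.
col1 = [1, 2, 3, 4, 5, 6]
col4 = [31 - x for x in col1]
col2 = [2 * x + 1 for x in col1]

print(var_pop(col1), var_samp(col1))            # n vs. n - 1 denominator
print(corr(col1, col4), covar_pop(col1, col4))  # reversing one column flips the sign
print(regr_slope(col2, col1), regr_intercept(col2, col1))
```

Reversing the order of one column changes only the sign of CORR and COVAR, which matches the paired result rows above.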
Syntax for using REGR_SLOPE:

REGR_SLOPE(dependent-expression, independent-expression)

The next SELECT uses REGR_SLOPE to find the slope for the values stored in various columns from the Stats table:

1 Row Returned

RSofCol1_2  RSofCol1_3  RSofCol1_4  RSofCol1_5  RSofCol1_6
2           1           -1          -0          0

As before, the next example uses the descending col4 as the first value to demonstrate the impact of this sequence change on the result:

1 Row Returned

RSofCol4_2  RSofCol4_3  RSofCol4_1  RSofCol4_5  RSofCol4_6
-2          -1          1           0           -0

Using GROUP BY

Like the original aggregates, the new statistical aggregates may also be combined with non-aggregate columns. The GROUP BY is used to identify and form groups for each unique value in the selected non-aggregate column. Likewise, the new statistical aggregates are compatible with the original aggregates, as seen in the following SELECT:

7 Rows Returned

col3  Cnt  Avg1  Sd1  VP1  Avg4  Sd4  VP4  Avg6  Sd6  VP6
1     2    2     0    0    30    0    0    2     2    6
10    7    6     2    4    25    2    4    24    9    74
20    14   16    4    16   14    4    16   54    11   116
30    2    24    0    0    6     0    0    75    5    25
40    2    26    0    0    4     0    0    88    2    6
50    2    28    0    0    2     0    0    92    2    6
60    1    30    0    0    1     0    0    100   0    0

Use of HAVING

Also like the original aggregates, the HAVING may be used to eliminate specific output lines based on one or more of the final aggregate values. The next SELECT uses the HAVING to perform a compound comparison on both the count and the covariance:

2 Rows Returned

col3  Cnt  Avg1  Sd1  VP1
10    7    6     2    4
20    14   16    4    16
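The grouping and compound HAVING pattern can be tried with Python's built-in sqlite3 module. SQLite has no VAR_POP, so this sketch computes population variance as AVG(x*x) - AVG(x)*AVG(x). It is an illustration rather than Teradata SQL, and the stats table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (col1 INTEGER, col3 INTEGER)")
conn.executemany(
    "INSERT INTO stats (col3, col1) VALUES (?, ?)",
    [(10, 4), (10, 6), (10, 8), (20, 1), (20, 2), (30, 5)],
)

rows = conn.execute(
    """
    SELECT col3,
           COUNT(*)                                  AS cnt,
           AVG(col1)                                 AS avg1,
           AVG(col1 * col1) - AVG(col1) * AVG(col1)  AS vp1  -- population variance
    FROM stats
    GROUP BY col3
    HAVING COUNT(*) > 1
       AND AVG(col1 * col1) - AVG(col1) * AVG(col1) > 1   -- compound HAVING test
    ORDER BY col3
    """
).fetchall()
print(rows)
```

The col3 = 20 group fails the variance test and the col3 = 30 group fails the count test, so HAVING removes both and leaves only one output line.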
Using the DISTINCT Function with Aggregates

At times throughout this book, examples are shown using a function within a function and the power it provides. The COUNT aggregate provides another opportunity to demonstrate a capability that might prove itself useful: it combines DISTINCT with an aggregate function. The following may be used to determine how many different class codes are in use, instead of the number of students that have a valid class code:

1 Row Returned

Unique_Courses  Unique_GPA
4               9

Note: Prior to V2R5, you could only use a single column for all DISTINCT operations inside of aggregates.

Versus using all of the values:

1 Row Returned

Courses  GPAs
9        9

It is allowable to use the DISTINCT in multiple aggregates within a SELECT. However, prior to V2R5 there was a restriction that only allowed the aggregates to use the same column for each DISTINCT function. Now, different columns can be used.
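The difference between counting distinct values and counting all non-NULL values can be seen in this sketch. It uses Python's sqlite3 module, not Teradata, with ten hypothetical student rows, one of which has a NULL class code and grade point. The SELECT also applies DISTINCT to two different columns, which Teradata allows from V2R5 on according to the note above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER, class_code TEXT, grade_pt REAL)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [
        (123250, "SR", 3.00), (125634, "FR", 2.88), (234121, "FR", 4.00),
        (231222, "SO", 3.80), (260000, None, None), (280023, "JR", 1.90),
        (322133, "JR", 3.95), (324652, "SR", 3.35), (333450, "SO", 2.00),
        (423400, "FR", 0.00),
    ],
)

# DISTINCT inside the aggregate: each different value is counted once.
unique_counts = conn.execute(
    "SELECT COUNT(DISTINCT class_code), COUNT(DISTINCT grade_pt) FROM student"
).fetchone()

# Without DISTINCT: every non-NULL value is counted; COUNT(*) counts rows.
all_counts = conn.execute(
    "SELECT COUNT(class_code), COUNT(grade_pt), COUNT(*) FROM student"
).fetchone()
print(unique_counts, all_counts)
```

Both forms ignore the NULL row. Only COUNT(*) counts it.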
Aggregates and Very Large Data Bases (VLDB)

As great as huge databases might be, there are considerations to take into account when processing large numbers of rows. This section enumerates a few of the situations that might be encountered. Read them and think about the requirement or benefit of incorporating them into your SQL.

Potential of Execution Error

Aggregates use the data type of the column they are aggregating. On most databases, this works fine. However, when working on a VLDB, this may cause the SELECT to fail on a numeric overflow condition. An overflow occurs when the value being calculated exceeds the maximum size or value for the data type being used. For example, one billion (1,000,000,000) is a valid value for an integer column because it is less than 2,147,483,647. However, if three rows each have one billion as their value and a SUM operation is performed, it fails on the third row.

Try the following series of commands to demonstrate an overflow and its fix:

Create a table called Overflow_tbl with 2 columns:

CT Overflow_tbl (Ovr_byte BYTEINT, Ovr_int INT);

Insert 3 rows with very large values of 1 billion, where the maximum value is 2,147,483,647:

INS overflow_tbl values (1, 10**9);
INS overflow_tbl values (2, 10**9);
INS overflow_tbl values (3, 10**9);

A SUM aggregate on these values will result in 3 billion:

SEL SUM(ovr_int) AS sum_col FROM overflow_tbl;

***** 2616 numeric overflow

Attempting this SUM, as written, results in a 2616 numeric overflow error. That is because 3 billion is too large to be stored in the default data type of integer. INTEGER is the default because it is the data type of the column being used within the aggregate. To fix it, use either of the following techniques to convert the data column to a different type before performing the aggregation.

1 Row Returned

sum_col
3,000,000,000

Whenever you find yourself in a situation where the SQL is failing due to a numeric overflow, it is most likely due to the inherited data type of the column.
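The overflow, and the reason it occurs on the third row, can be modeled with this Python sketch. It is an illustration only and does not involve Teradata. It imitates a running total limited to the 32-bit INTEGER range, then a wider range standing in for a DECIMAL(18,0) conversion. In Teradata, one such conversion is CAST(ovr_int AS DECIMAL(18,0)) inside the SUM. The helper and exception names below are hypothetical.

```python
INT_MIN, INT_MAX = -2_147_483_648, 2_147_483_647   # 32-bit signed INTEGER range
DEC18_MAX = 10**18 - 1                              # largest DECIMAL(18,0) value


class NumericOverflow(Exception):
    """Hypothetical stand-in for the database's numeric overflow error."""


def sum_in_range(values, low, high):
    """Running SUM that fails as soon as the total leaves [low, high]."""
    total = 0
    for row_number, value in enumerate(values, start=1):
        total += value
        if not low <= total <= high:
            raise NumericOverflow(f"numeric overflow on row {row_number}")
    return total


ovr_int = [10**9, 10**9, 10**9]   # three rows of one billion

try:
    result = sum_in_range(ovr_int, INT_MIN, INT_MAX)   # summed as INTEGER
except NumericOverflow as exc:
    result = str(exc)
print(result)

# Converting first (like CAST(ovr_int AS DECIMAL(18,0))) widens the range.
print(sum_in_range(ovr_int, -DEC18_MAX, DEC18_MAX))
```

The second row brings the total to 2,000,000,000, which still fits. The third row pushes it past 2,147,483,647.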
When this happens, be sure to convert the type before doing the math.

GROUP BY versus DISTINCT

As seen in chapter 2, DISTINCT is used to eliminate duplicate values. In this chapter, the GROUP BY is used to consolidate multiple rows with the same value into the same group. It does the consolidation by eliminating duplicates. On the surface, they provide the same functionality. The next SELECT uses GROUP BY without aggregation to eliminate duplicates:

5 Rows Returned

class_code
?
FR
JR
SO
SR

The GROUP BY without aggregation returns the same rows as the DISTINCT. So the obvious question becomes: which is more efficient? The answer is not a simple one. Instead, something must be known about the characteristics of the data. Generally, with more duplicate data values, GROUP BY is more efficient. However, if only a few duplicates exist, DISTINCT is more efficient. To understand the reason, it is important to know how each of them eliminates the duplicate values.

Technique used to eliminate duplicates (can be seen in EXPLAIN):

DISTINCT
- Reads a row on each AMP
- Hashes the column(s) value identified in the DISTINCT
- Redistributes the row value to the appropriate AMP
- Once all participating rows have been redistributed:
  - Sorts the data to combine duplicates on each AMP
  - Eliminates duplicates on each AMP

GROUP BY
- Reads all the participating rows
- Eliminates duplicates on each AMP using "buckets"
- Hashes the unique values on each AMP
- Redistributes the unique values to the appropriate AMP
- Once all unique values have been redistributed from every AMP:
  - Sorts the unique values to combine duplicates on each AMP
  - Eliminates duplicates on each AMP

Back to the original question: which is more efficient? Since DISTINCT redistributes the rows immediately, more data may move between the AMPs, compared to GROUP BY, which only sends unique values between the AMPs. So, GROUP BY sounds more efficient.
However, if the data is nearly unique, GROUP BY spends time attempting to eliminate duplicates that do not exist. That first check for duplicates is wasted time, and GROUP BY must then redistribute the same amount of data anyway. Therefore, for efficiency, when there are:

Many duplicates: use GROUP BY
Few to no duplicates: use DISTINCT
SPOOL space is exceeded: try GROUP BY
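This sketch checks that the two forms return the same answer set. It uses Python's sqlite3 module with hypothetical class codes. It cannot show the AMP-level redistribution described above, which is specific to Teradata. The NULL class code forms a single group in both forms. Python shows it as None, while the Teradata output shows ?.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER, class_code TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(1, "SR"), (2, "FR"), (3, "FR"), (4, "SO"), (5, None),
     (6, "JR"), (7, "JR"), (8, "SR"), (9, "SO"), (10, "FR")],
)

by_distinct = conn.execute(
    "SELECT DISTINCT class_code FROM student ORDER BY class_code"
).fetchall()
by_group = conn.execute(
    "SELECT class_code FROM student GROUP BY class_code ORDER BY class_code"
).fetchall()
print(by_distinct == by_group, by_group)
```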
Performance Opportunities

The Teradata optimizer has always had options available to it when performing SQL. It always attempts to use the most efficient path to provide the answer set. This is true for aggregation as well. When performing aggregation, the main shortcut available might include the use of a secondary index. The index row is maintained in a subtable. This row contains the row ID (row hash + uniqueness value) and the actual data value stored in the data row. Therefore, an index row is normally much shorter than a data row. Hence, more index rows fit in an index block than data rows fit in a data block. As a result, reading an index block makes more values available than reading a data block. Since I/O is the slowest operation on all computer systems, less I/O equates to faster processing. If the optimizer can obtain all the values it needs for processing by using the secondary index, it will. This is referred to as a "covered query." The creation of a secondary index is covered in this book as part of the Data Definition Language (DDL) chapter.
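SQLite uses the same idea under the name "covering index," and its EXPLAIN QUERY PLAN output shows when an index alone answers a query. This Python sqlite3 sketch is only an analogy: SQLite's index structures differ from Teradata's secondary-index subtables, and the table and index names are hypothetical. Because the GROUP BY query needs only dept_no, the index can satisfy it without reading the data rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (employee_no INTEGER PRIMARY KEY, last_name TEXT, "
    "salary REAL, dept_no INTEGER)"
)
conn.execute("CREATE INDEX idx_emp_dept ON employee (dept_no)")

query = "SELECT dept_no, COUNT(*) FROM employee GROUP BY dept_no"
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
details = [row[-1] for row in plan]   # the last column holds the readable plan step
print(details)
```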
Subquery

The subquery is a commonly used technique and a powerful way to select rows from one table based on values in another table. It is predicated on the use of a SELECT statement within a SELECT and takes advantage of the relationships built into a relational database. The basic concept behind a subquery is that it retrieves a list of values that are used for comparison against one or more columns in the main query. To accomplish the comparison, the subquery is written after the WHERE clause and normally as part of an IN list.

In an earlier chapter, the IN was used to build a value list for comparison against the rows of a table to determine which rows to select. The next example illustrates how this technique can be used to SELECT all the columns for rows containing any of the three different values 10, 20 and 30:

4 Rows Returned

Column1  Column2
10       A row with 10 in column1
30       A row with 30 in column1
10       A row with 10 in column1
20       A row with 20 in column1

As powerful as this is, what if the three values turned into a thousand values? That is too much work and too many opportunities to forget one of the values. Instead of writing the values manually, a subquery can be used to generate the values automatically. The coding technique of a subquery replaces the values previously written in the IN list with a valid SELECT. Then the subquery SELECT dynamically generates the value list. Once the values have been retrieved, it eliminates the duplicates by automatically performing a DISTINCT.

The following is the syntax for a subquery:

Conceptually, the subquery is processed first so that all the values are expanded into the list for comparison with the column specified in the WHERE clause. These values in the subquery SELECT can only be used for comparison against the column or columns referenced in the WHERE. Columns inside the subquery SELECT cannot be returned to the user via the main SELECT.
The only columns available to the client are those in the tables named in the main (first) FROM clause. The query in parentheses is called the subquery, and it is responsible for building the IN list. At the writing of this document, Teradata allows up to 64 tables in a single query. Therefore, if each SELECT accessed only one table, a query might contain 63 subqueries in a single statement.

The next two tables are used to demonstrate the functionality of subqueries:

Customer Table - contains 5 customers

Customer_number  Customer_name        Phone_number
PK
UPI              NUSI                 NUSI
11111111         Billy's Best Choice  555-1234
31313131         Acme Products        555-1111
31323134         ACE Consulting       555-1212
57896883         XYZ Plumbing         347-9854
87323456         Databases N-U        322-1012

Figure 6-1

Order Table - contains 5 orders

Order_number  Customer_number  Order_date  Order_total
PK            FK
UPI           NUSI             NUSI
123456        11111111         980504      12347.53
123512        11111111         990101      08005.91
123552        31323134         991001      05111.47
123585        87323456         991010      15231.62
123777        57896883         990909      23454.84

Figure 6-2

The next SELECT uses a subquery to find all customers that have an order of more than $10,000.00:

3 Rows Returned

Customer_name        Phone_number
Billy's Best Choice  555-1234
XYZ Plumbing         347-9854
Databases N-U        322-1012

This is an appropriate place to mention that the columns being compared between the main query and the subquery must be from the same domain. Otherwise, if no equal condition exists, no rows are returned. The above SELECT uses the customer number (FK) in the Order table to match the customer number (PK) in the Customer table. They are both customer numbers and therefore have the opportunity to compare equal from both tables.

The next subquery swaps the queries to find all the orders by a specific customer:

2 Rows Returned

Order_number  Order_total
123456        12347.53
123512        8005.91

Notice that the Customer table is used in the main query to answer a customer question and the Order table is used in the main query to answer an order question.
However, they both compare on the customer number as the common domain between the two tables.

Both of the previous subqueries work fine for comparing a single column in the main table to a value list in the subquery. Thus, it is possible to answer questions like, "Which customer has placed the largest order?" However, they cannot answer this question: "What is the maximum order for each customer?"

To make subqueries more sophisticated and powerful, they can compare more than one column at a time. The multiple columns are referenced in the WHERE clause of the main query and enclosed in parentheses. The key is this: if multiple columns are named before the IN portion of the WHERE clause, the exact same number of columns must be referenced in the SELECT of the subquery to obtain all the required values for comparison. Furthermore, the corresponding columns (outside and inside the subquery) must respectively be of the same domain. Each of the columns must be equal to a corresponding value in order for the row to be returned. It works like an AND comparison.

The following SELECT uses a subquery to match two columns with two values in the subquery to find the highest dollar orders for each customer:

4 Rows Returned

Customer  Order_number  Order_total
11111111  123456        12347.53
57896883  123777        23454.84
31323134  123552        5111.47
87323456  123585        15231.62

Although this works well for MIN and MAX types of values (equalities), it does not work well for finding values greater than or less than an average. For this type of processing, a Correlated subquery is the best solution and will be demonstrated later in this chapter.

Since 64 tables can be in a single Teradata SQL statement, as mentioned previously, a maximum of 63 subqueries can be written into a single statement. The following shows a 3-table access using two separate subqueries. Additional subqueries simply follow the same pattern.
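Both subquery forms can be run against the data in Figures 6-1 and 6-2 with Python's sqlite3 module. This is a sketch, not Teradata SQL. The order table is named orders here because ORDER is a reserved word. The two-column IN comparison relies on SQLite row values, which need SQLite 3.15 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_number INTEGER, customer_name TEXT, phone_number TEXT)")
conn.execute("CREATE TABLE orders (order_number INTEGER, customer_number INTEGER, order_total REAL)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    (11111111, "Billy's Best Choice", "555-1234"), (31313131, "Acme Products", "555-1111"),
    (31323134, "ACE Consulting", "555-1212"), (57896883, "XYZ Plumbing", "347-9854"),
    (87323456, "Databases N-U", "322-1012"),
])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (123456, 11111111, 12347.53), (123512, 11111111, 8005.91),
    (123552, 31323134, 5111.47), (123585, 87323456, 15231.62),
    (123777, 57896883, 23454.84),
])

# Single-column subquery: the inner SELECT builds the IN list.
big_spenders = conn.execute("""
    SELECT customer_name, phone_number FROM customer
    WHERE customer_number IN
          (SELECT customer_number FROM orders WHERE order_total > 10000)
    ORDER BY customer_name""").fetchall()

# Two-column subquery: both values must match a (customer, MAX total) pair.
top_orders = conn.execute("""
    SELECT customer_number, order_number, order_total FROM orders
    WHERE (customer_number, order_total) IN
          (SELECT customer_number, MAX(order_total) FROM orders GROUP BY customer_number)
    ORDER BY customer_number""").fetchall()
print(big_spenders)
print(top_orders)
```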
From the above tables, it is also possible to find the customer who placed the single highest dollar amount order. To accomplish this, the Order table must be used to determine the maximum order. Then, the Order table is used again to compare the maximum with each order and finally, compared to the Customer table to determine which customer placed the order. The next subquery can be used to find it:

1 Row Returned

Customer_name  Phone_number
XYZ Plumbing   347-9854

It is now known that XYZ Plumbing has the highest dollar order. What is not known is the amount of the order. Since the order total is in the Order table, which is not referenced in the main query, it cannot be part of the SELECT list. In order to see the order total, a join must be used. Joins will be covered in the next chapter.

Using NOT IN

As seen in a previous chapter, when using the IN and a value list, the NOT IN can be used to find all of the rows that do not match. Using this technique, the subquery above can be modified to find the customers without an order. The only changes made are to 1) add the NOT before the IN and 2) eliminate the WHERE clause in the subquery:

1 Row Returned

Customer_name  Phone_number
Acme Products  555-1111

Caution needs to be used regarding the NOT IN when there is a potential for including a NULL in the value list. Comparing any value with a NULL yields unknown, and the NOT of an unknown is still unknown. As a result, a NOT IN list that contains a NULL never evaluates to true, and no rows are returned. Therefore, when there is potential for a NULL in the subquery, it is best to also code a compound comparison, as seen in the following SELECT:

Using Quantifiers

In other RDBMS systems and early Teradata versions, using an equality symbol (=) in a comparison normally proved to be more efficient than using an IN list. The reason was that it allowed for indices, if they existed, to be used instead of a sequential read of all rows. Today, Teradata automatically uses indices whenever they are more efficient.
So, the use of quantifiers is optional and an IN works exactly the same. Another powerful use for quantifiers is with inequalities. It is sometimes necessary to find all rows that are greater than or less than one or more other values. To use quantifiers, replace the IN with a comparison operator (=, <, > and so on) followed by ANY, SOME or ALL, as demonstrated in the following syntax:

Earlier in this chapter, a two-level subquery was used to find the customer who spent the most money on a single order. It used an IN list to find equal values. The next SELECT uses = ANY in a similar query:

2 Rows Returned

Customer_name        Phone_number
Billy's Best Choice  555-1234
XYZ Plumbing         347-9854

In order to accomplish this, the Order table is first used to determine the average order amount. Then, the Order table is used again to compare the average with each order and finally, compared to the Customer table to determine which of the customers qualify.

The quantifiers SOME and ANY are interchangeable, and both are defined in the ANSI SQL standard. The = ANY is functionally equivalent to using an IN list. The = ALL and the plain = are more limited in their scope. For them to return rows, the subquery can only produce a single value for each of the values in the WHERE clause.

Earlier, the NOT IN was explored. When using quantifiers and the NOT, consider the following:

Equivalency Chart
IN      is equivalent to  = ANY
NOT IN  is equivalent to  NOT = ALL

Figure 6-3

Of these, the NOT = ALL takes the most thought. It forces the system to examine every value in the list, so that the value being compared is checked against all the values. Otherwise, as soon as any of the values was different, the row would be returned without looking at the other values (ALL).

Although the above describes the conceptual approach of a subquery, the Teradata optimizer will normally use a join to optimize and locate the rows that are needed from within the database.
This usage may be seen using EXPLAIN. Joins are discussed in the next chapter.
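The NULL caution and the equivalences in Figure 6-3 both follow from SQL's three-valued logic, in which a row is returned only when the WHERE test is TRUE. This Python sketch models that logic directly, with None standing for UNKNOWN. That representation is an assumption of the sketch, not a database feature, and the function names are hypothetical. The example shows why NOT IN, written here as NOT = ALL, rejects every row once the list contains a NULL.

```python
from typing import Optional, Sequence

Truth = Optional[bool]   # None stands for SQL's UNKNOWN


def sql_eq(a, b) -> Truth:
    """SQL '=': any comparison involving NULL (None) is UNKNOWN."""
    if a is None or b is None:
        return None
    return a == b


def sql_not(t: Truth) -> Truth:
    """SQL NOT: NOT UNKNOWN is still UNKNOWN."""
    return None if t is None else not t


def eq_any(x, values: Sequence) -> Truth:
    """x = ANY (values), which is what IN means."""
    results = [sql_eq(x, v) for v in values]
    if True in results:
        return True
    return None if None in results else False


def not_eq_all(x, values: Sequence) -> Truth:
    """x NOT = ALL (values): true only if x differs from every value."""
    results = [sql_not(sql_eq(x, v)) for v in values]
    if False in results:
        return False
    return None if None in results else True


ordered = [11111111, 31323134, 57896883, 87323456]   # customers with orders
acme = 31313131                                        # customer without an order

print(not_eq_all(acme, ordered))            # the row qualifies
print(not_eq_all(acme, ordered + [None]))   # UNKNOWN, so the row is not returned
```

Adding an IS NOT NULL test inside the subquery removes the None from the list. That restores the first result.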
Qualifying Table Names and Creating a Table Alias

This section provides techniques to specifically reference tables and columns throughout all databases and to temporarily rename tables with an alias name. Both of these techniques are necessary to provide specific and unique names to the optimizer at SQL execution time.

Qualifying Column Names

Since column names within a table must be unique, the system knows which data to access simply by using the column name. However, when more than one table is referenced by the FROM in a single SELECT, this may not be the case. The potential exists for columns of the same domain to have the same name in more than one table. When this happens, the system does not guess which column to reference. The SQL must explicitly declare which table to use for accessing the column. This declaration is called qualifying the column name. If the SQL does not qualify a column name that appears in more than one table, the system displays an error message indicating that the column reference is ambiguous.

Correlated subqueries, addressed next, and join processing, in the next chapter, both make use of more than one table at the same time. Therefore, it is often important to make sure the system knows which table's columns to use for all portions of the SQL statement.

To qualify a column name, the table name and column name are connected using a period, sometimes referred to as a dot (.). The dot connects the names without a space to make the two names work as a single reference name. However, if the column has different names in the multiple tables, there is no confusion within the system and therefore, no need to qualify the name.

To illustrate this concept, let's consider people instead of tables. For instance, Mike is a common name. If two Mikes are in different rooms and someone uses the name in either room, there is no confusion.
However, if both Mikes are in the same room and someone uses the name, both Mikes respond and therefore confusion exists. To eliminate the conflict, the use of the first and last names makes the identification unique.

The syntax for using qualification levels follows:

3-level reference: <database-name>.<table-name>.<column-name>
2-level reference: <database-name>.<table-name>
2-level reference: <table-name>.<column-name>

Whenever all 3 levels are used, the first name is always the database, the second is the table and the last is the column. However, when two names appear in a 2-level qualification, the location of the names within the SQL must be examined to know their meaning for sure. Since the FROM names the tables, a qualified name in the FROM has a database name first and a table name second. Since columns are referenced in the SELECT list and WHERE clause, a qualified name there has a table name first and either an * (all columns) or a single column second.

In Teradata, the following is a valid statement, including the abbreviation for SELECT and a missing FROM:

SEL DBC.TABLES.* ;

This technique is not ANSI standard. However, the PE has everything needed to get all columns and rows out of the TABLES table in the DBC database.

Creating an Alias for a Table

Since table names can be up to 30 characters long, a commonly used technique to save typing when the name is used more than once is to provide a temporary name for the table within the SELECT. The new temporary name for a table is called an alias name. Once the alias is created for the table, it is important to use the alias name throughout the request. Otherwise, the system treats the full table name as a reference to another table, which causes undesirable results. To establish an alias for a table, in the FROM, simply follow the name of the table with an AS:

FROM <table-name> AS <table-alias-name>
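The ambiguity error and its fix can be reproduced with Python's sqlite3 module. This is a sketch, not Teradata. SQLite reports the same problem as an "ambiguous column name" error, and the table name orders is an assumption, because ORDER is a reserved word. The second query qualifies every shared column through the table aliases cust and ord.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_number INTEGER, customer_name TEXT)")
conn.execute("CREATE TABLE orders (order_number INTEGER, customer_number INTEGER)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(11111111, "Billy's Best Choice"), (57896883, "XYZ Plumbing")])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(123456, 11111111), (123512, 11111111), (123777, 57896883)])

# Unqualified: customer_number exists in both tables, so the name is ambiguous.
try:
    conn.execute("SELECT customer_number, order_number FROM customer, orders "
                 "WHERE customer_number = customer_number")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)
print(error)

# Qualified through table aliases: every shared column names its table.
rows = conn.execute("""
    SELECT cust.customer_number, cust.customer_name, ord.order_number
    FROM customer AS cust, orders AS ord
    WHERE cust.customer_number = ord.customer_number
    ORDER BY ord.order_number""").fetchall()
print(rows)
```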
Correlated Subquery Processing

The correlated subquery is a very powerful tool. It is an excellent technique to use when there is a need to determine which rows to SELECT based on one or more values from another table. This is especially true when the value for comparison is based on an aggregate. It combines subquery processing and join processing into a single request.

For example, one Teradata user needs to bill their customers and incorporate the latest payment date. Therefore, the latest date needs to be obtained from the table. So, the payment date is found using the MAX aggregate in the subquery. However, it must be the latest payment date for that customer, which might be different for each customer. The processing involves the subquery locating the maximum date only for one customer account.

The correlated subquery is perfect for this processing. It is more efficient and faster than using a normal subquery with multiple values. One reason for its speed is that it can perform some processing steps in parallel, as seen in an EXPLAIN. The other reason is that it only finds the maximum date when a particular account is read for processing, not for all accounts like a normal subquery.

The operation for a correlated subquery differs from that of a normal subquery. Instead of comparing the selected subquery values against all the rows in the main query, the correlated subquery works backward. It first reads a row in the main query, and then goes into the subquery to find all the rows that match the specified column value. Then, it gets the next row in the main query and retrieves all the subquery rows that match the next value in this row. This processing continues until all the qualifying rows from the main SELECT are satisfied.

Although this sounds terribly inefficient, and is inefficient on other databases, it is extremely efficient in Teradata. This is due to the way the AMPs handle this type of request.
The AMPs are smart enough to remember and share each value that is located. Thus, when a second row comes into the comparison that contains the same value as an earlier row, there is no need to re-read the matching rows. That operation has already been done once and the AMPs remember the answer from the first comparison.

The following is the syntax for writing a correlated subquery:

The subquery does not have a semi-colon of its own. The SELECT in the subquery is all part of the same primary query and shares the one semi-colon.

The aggregate value is normally obtained using MIN, MAX or AVG. Then this aggregate value is in turn used to locate the row or rows within a table that compare equal to, less than or greater than this value.

This table is used to demonstrate correlated subqueries:

Employee Table - contains 9 employees

Employee_No  Last_Name   First_name  Salary      Dept_No
PK                                               FK
UPI          NUSI                                NUSI
1232578      Chambers    Mandee      $48,850.00  100
1256349      Harrison    Herbert     $54,500.00  400
2341218      Reilly      William     $36,000.00  400
2312225      Larkins     Loraine     $40,200.00  300
2000000      Jones       Squiggy     $32,800.50  ?
1000234      Smythe      Richard     $64,300.00  10
1121334      Strickling  Cletus      $54,500.00  400
1324657      Coffing     Billy       $41,888.88  200
1333454      Smith       John        $48,000.00  200

Figure 6-4

Using the above table, this Correlated subquery finds the highest paid employee in each department:

6 Rows Returned

Last_name   First_name  Dept_no  Salary
Smythe      Richard     10       $64,300.00
Chambers    Mandee      100      $48,850.00
Smith       John        200      $48,000.00
Larkins     Loraine     300      $40,200.00
Harrison    Herbert     400      $54,500.00
Strickling  Cletus      400      $54,500.00

Notice that both of the tables have been assigned alias names (emp for the main query and emt for the correlated subquery). Because the same Employee table is used in the main query and the subquery, one of them must be assigned an alias. The aliases are used in the subquery to qualify and match the common domain values between the two tables. This coding technique "correlates" the main query table to the one in the subquery.

The following Correlated subquery uses the AVG function to find all employees who earn less than or equal to the average pay in their department:

5 Rows Returned

Last_name  First_name  Dept_no  Salary
Smythe     Richard     10       $64,300.00
Chambers   Mandee      100      $48,850.00
Coffing    Billy       200      $41,888.88
Larkins    Loraine     300      $40,200.00
Reilly     William     400      $36,000.00

Earlier in this chapter, it was indicated that a column from the subquery cannot be referenced in the main query. This is still true. However, nothing is wrong with using one or more column references from the main query within the subquery to create a Correlated subquery.
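Both correlated subqueries can be run against the Employee data in Figure 6-4 with Python's sqlite3 module. This is a sketch, not Teradata SQL. The inner SELECT refers to emp.dept_no from the outer query, so it is evaluated for each department in turn. Using <= in the second query reproduces the five rows shown above. Jones, whose department is NULL, matches no department and is never returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_no INTEGER, last_name TEXT, "
             "first_name TEXT, salary REAL, dept_no INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", [
    (1232578, "Chambers", "Mandee", 48850.00, 100),
    (1256349, "Harrison", "Herbert", 54500.00, 400),
    (2341218, "Reilly", "William", 36000.00, 400),
    (2312225, "Larkins", "Loraine", 40200.00, 300),
    (2000000, "Jones", "Squiggy", 32800.50, None),
    (1000234, "Smythe", "Richard", 64300.00, 10),
    (1121334, "Strickling", "Cletus", 54500.00, 400),
    (1324657, "Coffing", "Billy", 41888.88, 200),
    (1333454, "Smith", "John", 48000.00, 200),
])

# Highest paid per department: the subquery's MAX is computed for emp's dept_no.
highest_paid = conn.execute("""
    SELECT emp.last_name, emp.dept_no FROM employee AS emp
    WHERE emp.salary = (SELECT MAX(emt.salary) FROM employee AS emt
                        WHERE emt.dept_no = emp.dept_no)
    ORDER BY emp.dept_no, emp.last_name""").fetchall()

# At or below the department average, using the same correlation.
at_or_below_avg = conn.execute("""
    SELECT emp.last_name, emp.dept_no FROM employee AS emp
    WHERE emp.salary <= (SELECT AVG(emt.salary) FROM employee AS emt
                         WHERE emt.dept_no = emp.dept_no)
    ORDER BY emp.dept_no, emp.last_name""").fetchall()
print(highest_paid)
print(at_or_below_avg)
```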
EXISTS

Another powerful resource that can be used with a correlated subquery is the EXISTS. It provides a true-false test within the WHERE clause. In the syntax that follows, it is used to test whether or not a single row is returned from the subquery SELECT:

If a row is found, the EXISTS test is true, and conversely, if a row is not found, the result is false. When a true condition is determined, the value in the SELECT is returned from the main query. When the condition is determined to be false, no rows are selected. Since EXISTS stops as soon as it finds one qualifying row, it is a fast way to determine whether or not a condition is present within one or more database tables. The correlated subquery can also be part of a join to add another level of test. It has the potential to be very sophisticated.

As an example, to find all customers that have not placed an order, the NOT IN subquery can be used. Remember, when you use the NOT IN clause, the NULL needs to be considered and eliminated using the IS NOT NULL check in the subquery. When using the NOT EXISTS with a correlated subquery, the same answer is obtained, it is faster than a normal subquery, and there is no concern about a NULL in the subquery.

These next examples show the EXISTS and the NOT EXISTS tests. Notice that the next SELECT is the same correlated subquery as seen earlier, except here it uses the subquery to find all customers with orders:

4 Rows Returned

Customer_name
Ace Consulting
Databases R Us
Billy's Best Choice
XYZ Plumbing

By changing the EXISTS to a NOT EXISTS, the next SELECT finds all customers without orders:

1 Row Returned

Customer_name
Acme Products

Since the Customer and Order tables are used in the above Correlated subquery, the table names did not require an alias. However, aliases were used to shorten the names and ease the equality coding in the subquery. An added benefit of this technique (NOT EXISTS) is that the presence of a NULL does not affect the result.
Notice that in both subqueries, the asterisk (*) is used for the columns. Since it is a true or false test, the columns are not used, and the asterisk is the shortest way to code the SELECT. If the column in the subquery table is a Primary Index or a Unique Secondary Index, the correlated subquery can be very fast.

The examples in this chapter only use a single column for the correlation. However, it is common to use more than one column from the main query in the correlated subquery. Although the techniques presented in this chapter seem relatively simple, they can be very powerful. Understanding subqueries and Correlated subqueries can help you unleash the power.
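The EXISTS and NOT EXISTS tests, and their indifference to NULLs, can be checked with Python's sqlite3 module. This sketch is not Teradata SQL. It uses the customers and orders from Figures 6-1 and 6-2 in a table named orders, plus one hypothetical extra order whose customer number is NULL. With that row present, NOT EXISTS still finds Acme Products, while the equivalent NOT IN returns nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_number INTEGER, customer_name TEXT)")
conn.execute("CREATE TABLE orders (order_number INTEGER, customer_number INTEGER)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [
    (11111111, "Billy's Best Choice"), (31313131, "Acme Products"),
    (31323134, "ACE Consulting"), (57896883, "XYZ Plumbing"),
    (87323456, "Databases N-U"),
])
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    (123456, 11111111), (123512, 11111111), (123552, 31323134),
    (123585, 87323456), (123777, 57896883),
    (123999, None),   # hypothetical order with no customer number
])

with_orders = conn.execute("""
    SELECT customer_name FROM customer AS c
    WHERE EXISTS (SELECT * FROM orders AS o WHERE o.customer_number = c.customer_number)
    ORDER BY customer_name""").fetchall()

without_orders = conn.execute("""
    SELECT customer_name FROM customer AS c
    WHERE NOT EXISTS (SELECT * FROM orders AS o WHERE o.customer_number = c.customer_number)
    """).fetchall()

# The same question asked with NOT IN returns nothing because of the NULL row.
not_in = conn.execute("""
    SELECT customer_name FROM customer
    WHERE customer_number NOT IN (SELECT customer_number FROM orders)""").fetchall()
print(with_orders, without_orders, not_in)
```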
Join Processing

A join is the combination of two or more tables in the same FROM of a single SELECT statement. When writing a join, the key is to locate a column in both tables that is from a common domain. Like the correlated subquery, joins are normally based on an equal comparison between the join columns.

An example of a common domain column might be a customer number. Whether it represents a particular customer, as the primary key in the Customer table, or the customer that placed a specific order, as a foreign key in the Order table, it represents the same entity in both tables. Without a common value, a match cannot be made and therefore, no rows can be selected using a join. An equality join returns matching rows.

Any answer set that a subquery can return, a join can also provide. Unlike the subquery, a join lists all of its tables in the same FROM clause of the SELECT. Therefore, columns from multiple tables are available for return to the user. The desired columns are the main factor in deciding whether to use a join or a subquery. If only columns from a single table are desired, either a subquery or a join works fine. However, if columns from more than one table are needed, a join must be used. In Version 2 Release 3, the number of tables allowed in a single join increased from sixteen (16) to sixty-four (64) tables.
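This sketch illustrates the claim that a join can provide any answer set a subquery returns. It uses Python's sqlite3 module rather than Teradata, with the data from Figures 6-1 and 6-2 in a table named orders. The join needs DISTINCT to match the subquery, because the subquery removes duplicates automatically and Billy's Best Choice has two orders. Only the join can also return columns from the order table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_number INTEGER, customer_name TEXT)")
conn.execute("CREATE TABLE orders (order_number INTEGER, customer_number INTEGER, order_total REAL)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [
    (11111111, "Billy's Best Choice"), (31313131, "Acme Products"),
    (31323134, "ACE Consulting"), (57896883, "XYZ Plumbing"),
    (87323456, "Databases N-U"),
])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (123456, 11111111, 12347.53), (123512, 11111111, 8005.91),
    (123552, 31323134, 5111.47), (123585, 87323456, 15231.62),
    (123777, 57896883, 23454.84),
])

by_subquery = conn.execute("""
    SELECT customer_name FROM customer
    WHERE customer_number IN (SELECT customer_number FROM orders)
    ORDER BY customer_name""").fetchall()

by_join = conn.execute("""
    SELECT DISTINCT c.customer_name FROM customer AS c, orders AS o
    WHERE c.customer_number = o.customer_number
    ORDER BY c.customer_name""").fetchall()

# Only the join can also return columns from the order table.
join_with_totals = conn.execute("""
    SELECT c.customer_name, o.order_number, o.order_total
    FROM customer AS c, orders AS o
    WHERE c.customer_number = o.customer_number
    ORDER BY c.customer_name, o.order_number""").fetchall()
print(by_subquery == by_join, join_with_totals)
```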
Original Join Syntax

The SQL join is a traditional and powerful tool in a relational database. The first difference between a join and a single table SELECT is that multiple tables are listed in the FROM clause. The first technique, shown below, uses a comma between the table names. This is the same technique used when listing multiple columns in the SELECT, the ORDER BY or most other areas that allow for the identification of more than one object.

The following is the original join syntax for a two-table join:

The following tables will be used to demonstrate the join syntax:

Customer Table - contains 5 customers

Customer_number  Customer_name        Phone_number
PK
UPI              NUSI                 NUSI
11111111         Billy's Best Choice  555-1234
31313131         Acme Products        555-1111
31323134         ACE Consulting       555-1212
57896883         XYZ Plumbing         347-9854
87323456         Databases N-U        322-1012

Figure 7-1

Order Table - contains 5 orders

Order_number  Customer_number  Order_date  Order_total
PK            FK
UPI           NUSI             NUSI
123456        11111111         980504      12347.53
123512        11111111         990101      08005.91
123552        31323134         991001      05111.47
123585        87323456         991010      15231.62
123777        57896883         990909      23454.84

Figure 7-2

The common domain between these two tables is the customer number. It is used in the WHERE clause with the equal condition to find all the rows from both tables with matching values. Since the column has exactly the same name in both tables, it becomes mandatory to qualify this column's name so that the PE knows which table to reference for the data. Every appearance of the customer number in the SELECT must be qualified.
The next SELECT uses a join to find all of the orders for each customer and shows the customer's name, order number and order total:

5 Rows Returned
Customer_number  Customer_name        Order_number  Order_total
31323134         ACE Consulting       123552        $5,111.47
11111111         Billy's Best Choice  123456        $12,347.53
11111111         Billy's Best Choice  123512        $8,005.91
87323456         Databases N-U        123585        $15,231.62
57896883         XYZ Plumbing         123777        $23,454.84

In the above output, every customer except one has a single order on file. However, Billy's Best Choice has placed two orders and is displayed twice, once for each order. Notice that the customer number in the SELECT list is qualified and returned from the Customer table. Does it matter, in this join, which table is used to obtain the value for the customer number? Your answer should be no. The WHERE clause of the join checks the values in the two tables for equality, so the value is the same regardless of which table is used. However, as mentioned earlier, you must use the table name to qualify any column name that exists in more than one table. Teradata will not assume which column to use.

The following shows the syntax for a three-table join:

The next three tables are used to demonstrate a three-table join:

Course Table - contains 7 courses
Course_ID  Course_Name               Credits  Seats
PK, UPI    NUSI
100        Teradata Concepts         3        50
200        Introduction to SQL       3        20
210        Advanced SQL              3        22
220        V2R3 SQL Features         2        25
300        Physical Database Design  4        20
400        Database Administration   4        16
500        Logical Database Design   2        24
Figure 7-3

Student Table - contains 10 students
Student_ID  Last_Name  First_name  Class_code  Grade_Pt
PK, UPI     NUSI       NUSI
123250      Phillips   Martin      SR          3.00
125634      Hanson     Henry       FR          2.88
234121      Thomas     Wendy       FR          4.00
231222      Wilson     Susie       SO          3.80
260000      Johnson    Stanley     ?           ?
280023      McRoberts  Richard     JR          1.90
322133      Bond       Jimmy       JR          3.95
324652      Delaney    Danny       SR          3.35
333450      Smith      Andy        SO          2.00
423400      Larkins    Michael     FR          0.00
Figure 7-4

Student_Course Table (associative table)
Student_ID  Course_ID
PK
NUPI        NUSI
123250      100
125634      100
125634      200
125634      220
234121      100
231222      210
231222      220
260000      400
280023      210
322133      220
322133      300
324652      200
333450      400
Figure 7-5

The first two tables represent the students and the courses they can attend. Because a student can take more than one class, the third table, Student_Course, is used to associate the two main tables. It allows one student to take many classes and one class to be taken by many students (a many-to-many relationship). The following SELECT joins these three tables on the common domain columns to find all courses being taken by the students:

13 Rows Returned
Last Name  First Name  Student_ID  Course
McRoberts  Richard     280023      Advanced SQL
Wilson     Susie       231222      Advanced SQL
Johnson    Stanley     260000      Database Administration
Smith      Andy        333450      Database Administration
Delaney    Danny       324652      Introduction to SQL
Hanson     Henry       125634      Introduction to SQL
Bond       Jimmy       322133      Physical Database Design
Hanson     Henry       125634      Teradata Concepts
Phillips   Martin      123250      Teradata Concepts
Thomas     Wendy       234121      Teradata Concepts
Bond       Jimmy       322133      V2R3 SQL Features
Hanson     Henry       125634      V2R3 SQL Features
Wilson     Susie       231222      V2R3 SQL Features

The WHERE must contain one fewer equality test than the number of tables being joined. Here there are three tables and two equalities on common domain columns. If the maximum of 64 tables is used, there must be 63 comparisons connected by 62 AND logical operators. If one comparison is forgotten, the result is not a syntax error; it is a Cartesian product join.

Often the request adds residual conditions to further refine the output. For instance, the need might be to see all the students that have taken the V2R3 SQL class. This is very common because most tables have thousands or millions of rows, and a way is needed to limit the rows returned.
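The three-table join described above can be sketched the same way. This is an illustrative SQLite version, not the book's exact SELECT, which is not reproduced here. The table names student_table, course_table and student_course_table are assumptions, and the rows are taken from Figures 7-3 to 7-5. Note the two equality tests for three tables.

```python
import sqlite3

# SQLite stand-in for the Student, Course and Student_Course tables
# (Figures 7-3 to 7-5); table names are illustrative assumptions.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE student_table (student_id INTEGER, last_name TEXT, first_name TEXT);
CREATE TABLE course_table (course_id INTEGER, course_name TEXT);
CREATE TABLE student_course_table (student_id INTEGER, course_id INTEGER);
""")
con.executemany("INSERT INTO student_table VALUES (?, ?, ?)", [
    (123250, "Phillips", "Martin"), (125634, "Hanson", "Henry"),
    (234121, "Thomas", "Wendy"), (231222, "Wilson", "Susie"),
    (260000, "Johnson", "Stanley"), (280023, "McRoberts", "Richard"),
    (322133, "Bond", "Jimmy"), (324652, "Delaney", "Danny"),
    (333450, "Smith", "Andy"), (423400, "Larkins", "Michael"),
])
con.executemany("INSERT INTO course_table VALUES (?, ?)", [
    (100, "Teradata Concepts"), (200, "Introduction to SQL"),
    (210, "Advanced SQL"), (220, "V2R3 SQL Features"),
    (300, "Physical Database Design"), (400, "Database Administration"),
    (500, "Logical Database Design"),
])
con.executemany("INSERT INTO student_course_table VALUES (?, ?)", [
    (123250, 100), (125634, 100), (125634, 200), (125634, 220),
    (234121, 100), (231222, 210), (231222, 220), (260000, 400),
    (280023, 210), (322133, 220), (322133, 300), (324652, 200),
    (333450, 400),
])

# Three tables, two equality tests (one fewer than the number of tables).
rows = con.execute("""
SELECT s.last_name, s.first_name, s.student_id, c.course_name
FROM student_table AS s, student_course_table AS sc, course_table AS c
WHERE s.student_id = sc.student_id   -- Student <-> Student_Course
  AND sc.course_id = c.course_id     -- Student_Course <-> Course
ORDER BY c.course_name, s.last_name
""").fetchall()
print(len(rows))  # 13
```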
The residual conditions also appear in the WHERE clause. In the next join, the WHERE of the previous SELECT has been modified to add an additional comparison for the course:

3 Rows Returned
Last Name  First Name  Student_ID  Course
Bond       Jimmy       322133      V2R3 SQL Features
Hanson     Henry       125634      V2R3 SQL Features
Wilson     Susie       231222      V2R3 SQL Features

The added residual condition does not replace the join conditions. Instead, it adds a third condition for the course. If one of the join conditions is omitted, the result is a Cartesian product join (explained next).
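To see what a forgotten join condition does, the following sketch runs the V2R3 query three ways: with both join conditions, with one of them left out, and with the whole WHERE clause commented out. It makes the same assumptions as before: SQLite as a stand-in for Teradata, illustrative table names, and data from Figures 7-3 to 7-5.

```python
import sqlite3

# Illustrative SQLite tables with the Figure 7-3 to 7-5 data
# (only the columns used below are created).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE student_table (student_id INTEGER, last_name TEXT);
CREATE TABLE course_table (course_id INTEGER, course_name TEXT);
CREATE TABLE student_course_table (student_id INTEGER, course_id INTEGER);
""")
con.executemany("INSERT INTO student_table VALUES (?, ?)", [
    (123250, "Phillips"), (125634, "Hanson"), (234121, "Thomas"),
    (231222, "Wilson"), (260000, "Johnson"), (280023, "McRoberts"),
    (322133, "Bond"), (324652, "Delaney"), (333450, "Smith"),
    (423400, "Larkins"),
])
con.executemany("INSERT INTO course_table VALUES (?, ?)", [
    (100, "Teradata Concepts"), (200, "Introduction to SQL"),
    (210, "Advanced SQL"), (220, "V2R3 SQL Features"),
    (300, "Physical Database Design"), (400, "Database Administration"),
    (500, "Logical Database Design"),
])
con.executemany("INSERT INTO student_course_table VALUES (?, ?)", [
    (123250, 100), (125634, 100), (125634, 200), (125634, 220),
    (234121, 100), (231222, 210), (231222, 220), (260000, 400),
    (280023, 210), (322133, 220), (322133, 300), (324652, 200),
    (333450, 400),
])

# Both join conditions plus the residual condition on the course.
correct = con.execute("""
SELECT s.last_name, c.course_name
FROM student_table AS s, student_course_table AS sc, course_table AS c
WHERE s.student_id = sc.student_id
  AND sc.course_id = c.course_id
  AND c.course_name = 'V2R3 SQL Features'
ORDER BY s.last_name
""").fetchall()

# Second join condition forgotten: every enrollment row is paired with the
# single V2R3 course row, so all enrollments appear to be V2R3 enrollments.
one_missing = con.execute("""
SELECT s.last_name, c.course_name
FROM student_table AS s, student_course_table AS sc, course_table AS c
WHERE s.student_id = sc.student_id
  AND c.course_name = 'V2R3 SQL Features'
""").fetchall()

# Whole WHERE clause commented out: an unconstrained Cartesian product.
cartesian = con.execute("""
SELECT COUNT(*)
FROM student_table AS s, student_course_table AS sc, course_table AS c
/* WHERE s.student_id = sc.student_id AND sc.course_id = c.course_id */
""").fetchone()[0]

print(len(correct), len(one_missing), cartesian)  # 3 13 910
```

With one condition missing, the output claims that every enrolled student is taking V2R3 SQL Features. With no conditions at all, the row count is simply the product of the three table sizes (10 * 7 * 13).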
Product Join

It is very important to use an equal condition in the WHERE clause. Otherwise you get a product join, which means that one row of a table is joined to multiple rows of another table. The term comes from mathematics, where a product is the result of multiplication. The next join example uses a WHERE clause, but the clause only limits which rows participate in the join and does not provide a join condition:

5 Rows Returned
Customer_name        Order_number  Order_total
Billy's Best Choice  123456        12347.53
Billy's Best Choice  123512        8005.91
Billy's Best Choice  123552        5111.47
Billy's Best Choice  123585        5111.47
Billy's Best Choice  123777        23454.84

The above output resulted from 1 row in the Customer table being joined to all the rows of the Order table. The WHERE limited the customer rows that participated in the join, but did not specify an equal comparison between the join columns. As a result, it looks like Billy placed five orders, which is not correct. So be careful when using product joins, because SQL answers the question as asked, not necessarily as intended.

When all rows of one table are joined to all rows of another table, it is called a Cartesian product join or an unconstrained product join. Think about this: if one table has one million rows and the other table contains one thousand rows, the output is one billion rows (1,000,000 rows * 1,000 rows = 1,000,000,000 rows). As seen above, the vast majority of the time a product join has no meaningful output and is usually a mistake. The mistake is one of the following: the WHERE clause is omitted, a column comparison is omitted for one of the tables joined with AND, or a table is given an alias and the alias is not used (the system thinks it is an additional table without a comparison).
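The "Billy placed five orders" mistake can be reproduced with a small sketch. As before, SQLite via Python's sqlite3 module stands in for Teradata, and the table names are illustrative. The first query restricts only the customer rows. The second adds the missing equality join condition.

```python
import sqlite3

# Illustrative SQLite copy of Figures 7-1 and 7-2.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer_table (customer_number INTEGER, customer_name TEXT);
CREATE TABLE order_table (order_number INTEGER, customer_number INTEGER,
                          order_total REAL);
INSERT INTO customer_table VALUES
  (11111111, 'Billy''s Best Choice'), (31313131, 'Acme Products'),
  (31323134, 'ACE Consulting'), (57896883, 'XYZ Plumbing'),
  (87323456, 'Databases N-U');
INSERT INTO order_table VALUES
  (123456, 11111111, 12347.53), (123512, 11111111, 8005.91),
  (123552, 31323134, 5111.47),  (123585, 87323456, 5111.47),
  (123777, 57896883, 23454.84);
""")

# Product join: WHERE restricts which customer rows participate, but there
# is no comparison of customer_number between the two tables.
product = con.execute("""
SELECT customer_name, order_number, order_total
FROM customer_table, order_table
WHERE customer_name = 'Billy''s Best Choice'
""").fetchall()

# Same restriction plus the equality join condition.
joined = con.execute("""
SELECT c.customer_name, o.order_number, o.order_total
FROM customer_table AS c, order_table AS o
WHERE c.customer_name = 'Billy''s Best Choice'
  AND c.customer_number = o.customer_number
""").fetchall()

print(len(product), len(joined))  # 5 2
```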
The next SELECT is the same as the one above, except that this time the entire WHERE clause has been commented out using /* and */:

Since the join condition is converted into a comment, the output from the SELECT is a Cartesian product that will return 910 rows (10 * 7 * 13 = 910), even with these very small tables. The output is completely meaningless and implies that every student is taking every class. This output does not reflect the correct situation.

Forgetting to include the WHERE clause does not make the join syntax incorrect. Instead, it results in a Cartesian product join. Always use the EXPLAIN to verify that the result of the join is reasonable before executing the actual join. The following shows the output from an EXPLAIN of the above classic Cartesian product join. Notice that steps 6 and 7 indicate a product join on the condition that (1=1). Because 1 is always equal to 1 every time a row is read, all rows are joined with all rows. The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.56 seconds.

As covered in Chapter 3, the EXPLAIN shows immediately that this situation will occur if the SELECT is executed. This is better than waiting, potentially for hours, to discover that the SELECT is running too long, stealing valuable computer cycles, doing data transfer, and interfering with valid SQL from other users. Be a good corporate citizen and database user: EXPLAIN your join syntax before executing! Make sure the estimates are reasonable for the size of the database tables involved.

Newer ANSI Join Syntax

The ANSI committee has created a new form of the join syntax. Like most ANSI-compliant code, it is a bit longer to write. However, I personally believe that it is worth the time and the effort due to better functionality and safeguards. Plus, it is more difficult to get an accidental product join using this form of syntax. This chapter describes and demonstrates the use of the INNER JOIN, the OUTER JOIN, the CROSS JOIN and the Self Join.

INNER JOIN

Although the original syntax still works, there is a newer version of the join using the INNER JOIN syntax. It works exactly the same as the original join, but is written slightly differently. The following syntax is for a two-table INNER JOIN:

There are two primary differences between the new INNER JOIN and the original join syntax. The first difference is that a comma (,) no longer separates the table names. Instead of a comma, the words INNER JOIN are used. As shown in the above syntax format, the word INNER is optional. If only the word JOIN appears, it defaults to an INNER JOIN.

The other difference is that the WHERE clause for the join condition is replaced by an ON, which declares an equal comparison on the common domain columns. If the ON is omitted, a syntax error is reported and the SELECT does not execute. So the result is not a Cartesian product join, as it would be with the original syntax, which makes the INNER JOIN safer to use.

Although the INNER JOIN is a slightly longer SQL statement to code, it does have advantages. The first advantage, mentioned above, is fewer accidental Cartesian product joins because the ON is required. In the original syntax, the statement is still syntactically correct when the WHERE is omitted. However, without a comparison, all rows of both tables are joined with each other, which results in a Cartesian product.

The last and most compelling advantage of the newer syntax is that INNER JOIN and OUTER JOIN operations can easily be combined into a single SQL statement. The OUTER JOIN syntax, explanation and significance are covered in this chapter.

The following is the same join that was performed earlier using the original join syntax. Here, it has been converted to use an INNER JOIN:

5 Rows Returned
Customer_number  Customer_name        Order_number  Order_total
31323134         ACE Consulting       123552        $5,111.47
11111111         Billy's Best Choice  123456        $12,347.53
11111111         Billy's Best Choice  123512        $8,005.91
87323456         Databases N-U        123585        $15,231.62
57896883         XYZ Plumbing         123777        $23,454.84

Like the original syntax, the INNER JOIN can join more than two tables. Each consecutive table name follows an INNER JOIN and has an associated ON clause that specifies which columns to match. Therefore, a ten-table join has nine JOIN and nine ON clauses to identify each table and the columns being compared. There is always one fewer JOIN / ON combination than the number of tables referenced in the FROM.

The following syntax is for an INNER JOIN with more than two tables:

The <table-nameN> reference above is intended to represent a variable number of tables. It might be a 3-table join, a 10-table join, or anything up to a 64-table join. The same approach is used regardless of the number of tables being joined together in a single SELECT.

The other difference between these two join formats is that the original syntax has only a single WHERE clause, regardless of the number of tables. Here, each additional INNER JOIN has its own ON condition. If one ON is omitted from the INNER JOIN, error code 3706 is returned. This error keeps the join from executing. In the original syntax, by contrast, a forgotten join condition in the WHERE is allowed, but it creates an accidental Cartesian product join.

The next INNER JOIN is converted from the 3-table join seen earlier:

3 Rows Returned
Last Name  First Name  Student_ID  Course
Bond       Jimmy       322133      V2R3 SQL Features
Hanson     Henry       125634      V2R3 SQL Features
Wilson     Susie       231222      V2R3 SQL Features

The INNER JOIN syntax can use a WHERE clause instead of a compound ON comparison, in order to add one or more residual conditions. A residual condition is a comparison in addition to the join condition. Its intent is to potentially eliminate rows from one or more of the tables. In other words, as rows are read, the WHERE clause compares each row with a condition to decide whether it should be included in or eliminated from the join processing. The WHERE clause is applied as rows are read, before the ON clause. Eliminated rows do not participate in the join against rows from another table. For more details, read the section on WHERE clauses at the end of this chapter.

The following is the same SELECT using a WHERE to compare the course name as a residual condition instead of a compound (AND) comparison in the ON:

As far as INNER JOIN processing is concerned, the PE will normally optimize both of these last two joins exactly the same way. The EXPLAIN is the best way to determine how the optimizer uses specific Teradata tables in a join operation.

OUTER JOIN

As seen previously, join processing matches rows from multiple tables on a column containing values from a common domain. Most of the time, each row in a table has a matching row in the other table. However, we do not live in a perfect world, and sometimes our data is not perfect. A normal join never returns imperfect data, so the imperfection may go unnoticed.

The distinguishing purpose of an OUTER JOIN is to find and return rows that have no matching row in another table. It is used for "exception" reporting, but at the same time it performs the INNER JOIN processing too. Therefore, the intersecting (matching) common domain rows are returned along with all rows that have no matching value in another table. This non-matching condition might be due to a NULL or an invalid data value in the join column(s).
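The INNER JOIN comparison described earlier in this section, a compound ON versus a residual WHERE condition, can be sketched in SQLite. This is an illustrative stand-in, not Teradata: the table names are assumptions, and the data comes from Figures 7-3 to 7-5. One caution applies: SQLite, unlike the Teradata behavior described above, accepts a JOIN with no ON and treats it as a product join.

```python
import sqlite3

# Illustrative SQLite tables with the Figure 7-3 to 7-5 data.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE student_table (student_id INTEGER, last_name TEXT, first_name TEXT);
CREATE TABLE course_table (course_id INTEGER, course_name TEXT);
CREATE TABLE student_course_table (student_id INTEGER, course_id INTEGER);
""")
con.executemany("INSERT INTO student_table VALUES (?, ?, ?)", [
    (123250, "Phillips", "Martin"), (125634, "Hanson", "Henry"),
    (234121, "Thomas", "Wendy"), (231222, "Wilson", "Susie"),
    (260000, "Johnson", "Stanley"), (280023, "McRoberts", "Richard"),
    (322133, "Bond", "Jimmy"), (324652, "Delaney", "Danny"),
    (333450, "Smith", "Andy"), (423400, "Larkins", "Michael"),
])
con.executemany("INSERT INTO course_table VALUES (?, ?)", [
    (100, "Teradata Concepts"), (200, "Introduction to SQL"),
    (210, "Advanced SQL"), (220, "V2R3 SQL Features"),
    (300, "Physical Database Design"), (400, "Database Administration"),
    (500, "Logical Database Design"),
])
con.executemany("INSERT INTO student_course_table VALUES (?, ?)", [
    (123250, 100), (125634, 100), (125634, 200), (125634, 220),
    (234121, 100), (231222, 210), (231222, 220), (260000, 400),
    (280023, 210), (322133, 220), (322133, 300), (324652, 200),
    (333450, 400),
])

# Residual condition written as part of a compound ON.
on_compound = con.execute("""
SELECT s.last_name, s.first_name, s.student_id, c.course_name
FROM student_table AS s
INNER JOIN student_course_table AS sc ON s.student_id = sc.student_id
INNER JOIN course_table AS c
        ON sc.course_id = c.course_id AND c.course_name = 'V2R3 SQL Features'
ORDER BY s.last_name
""").fetchall()

# Same residual condition written in a WHERE clause.
where_residual = con.execute("""
SELECT s.last_name, s.first_name, s.student_id, c.course_name
FROM student_table AS s
INNER JOIN student_course_table AS sc ON s.student_id = sc.student_id
INNER JOIN course_table AS c ON sc.course_id = c.course_id
WHERE c.course_name = 'V2R3 SQL Features'
ORDER BY s.last_name
""").fetchall()

# SQLite accepts JOIN without ON and returns the product (10 * 7 rows).
no_on = con.execute(
    "SELECT COUNT(*) FROM student_table JOIN course_table").fetchone()[0]

print(on_compound == where_residual, len(on_compound), no_on)  # True 3 70
```

For an INNER JOIN, the two placements give the same answer set. For an OUTER JOIN, moving a condition between ON and WHERE can change the result, so the equivalence should not be assumed there.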
For instance, if the employee and department tables are joined using an INNER JOIN, the result displays all the employees who work in a valid department. Mechanically, this means it returns every employee row whose department number column (a foreign key) contains a value that matches a department number (the primary key) in the department table. It does not display employees without a department number (NULL) or employees with invalid department numbers (which break referential integrity rules). These additional rows can be returned along with the intersecting rows using one of the three OUTER JOIN formats listed below.

The three formats of an OUTER JOIN are:
Left_table LEFT OUTER JOIN Right_table  - the left table is the outer table
Left_table RIGHT OUTER JOIN Right_table - the right table is the outer table
Left_table FULL OUTER JOIN Right_table  - both are outer tables

The OUTER JOIN has an outer table. The outer table is used to direct which exception rows are output. Simply put, it is the controlling table of the OUTER JOIN. As a result, all the rows from the outer table will be returned, both those containing matching domain values and those with non-matching values. The INNER JOIN has only inner tables. To code an OUTER JOIN, it is wise to start with an INNER JOIN. Once the join is working, the next step is to replace the word INNER with the appropriate LEFT, RIGHT or FULL OUTER.

The SELECT list for matching rows can display data from any of the tables in the FROM, because a matching row exists in each of the tables. However, a non-matching row in the outer table (one with NULL or invalid data in the join column) has no matching row in the inner table. The entire inner table row is therefore missing, and none of its columns are available for the SELECT list, which is the equivalent of a NULL. All referenced columns from the missing inner table rows are represented as NULL in the display.
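The NULL-extension described above can be sketched with the customer and order data from Figures 7-1 and 7-2. SQLite via Python's sqlite3 module stands in for Teradata, the table names are illustrative, and a missing inner-table value comes back as Python's None, where the book's reports show "?".

```python
import sqlite3

# Illustrative SQLite copy of Figures 7-1 and 7-2.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer_table (customer_number INTEGER, customer_name TEXT);
CREATE TABLE order_table (order_number INTEGER, customer_number INTEGER,
                          order_total REAL);
INSERT INTO customer_table VALUES
  (11111111, 'Billy''s Best Choice'), (31313131, 'Acme Products'),
  (31323134, 'ACE Consulting'), (57896883, 'XYZ Plumbing'),
  (87323456, 'Databases N-U');
INSERT INTO order_table VALUES
  (123456, 11111111, 12347.53), (123512, 11111111, 8005.91),
  (123552, 31323134, 5111.47),  (123585, 87323456, 5111.47),
  (123777, 57896883, 23454.84);
""")

# INNER JOIN: Acme Products has no order, so it silently disappears.
inner = con.execute("""
SELECT c.customer_name, o.order_number, o.order_total
FROM customer_table AS c
INNER JOIN order_table AS o ON c.customer_number = o.customer_number
""").fetchall()

# LEFT OUTER JOIN: Customer is the outer table, so every customer row is
# kept; the missing Order columns come back as NULL (None in Python).
left_outer = con.execute("""
SELECT c.customer_name, o.order_number, o.order_total
FROM customer_table AS c
LEFT OUTER JOIN order_table AS o ON c.customer_number = o.customer_number
ORDER BY c.customer_name, o.order_number
""").fetchall()

print(len(inner), len(left_outer))  # 5 6
for row in left_outer:
    print(row)
```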
The basic syntax for a two-table OUTER JOIN follows:

Unlike the INNER JOIN, the OUTER JOIN has no equivalent in the original join syntax. The OUTER JOIN produces a unique answer set. The closest functionality to an OUTER JOIN comes from the UNION set operator, which is covered later in this book. Another valuable quality of the newer INNER and OUTER join syntax is that both can be used in the same SELECT with three or more tables.

The next several sections explain and demonstrate all three formats of the OUTER JOIN. The primary issue when using an OUTER JOIN is that only one format can be used in a SELECT between any two tables. The FROM list determines the outer table for processing. It is important to understand this functionality in order to choose the correct outer join.

LEFT OUTER JOIN

The outer table is determined by its location in the FROM clause of the SELECT, as shown here:

<Outer-table> LEFT OUTER JOIN <Inner-table>
Or
<Outer-table> LEFT JOIN <Inner-table>

In this format, the Customer table is the one on the left of the word JOIN. Since this is a LEFT OUTER JOIN, the Customer table is the outer table. This syntax can return all customer rows that match an order (INNER JOIN) as well as the customers that have no matching order (OUTER JOIN).

The next SELECT shows customers with matching orders and those that need to be called because they have not placed an order:

6 Rows Returned
Customer_name        Order_number  Order_total
Ace Consulting       123552        $5,111.47
Acme Products        ?             ?
Billy's Best Choice  123456        $12,347.53
Billy's Best Choice  123512        $8,005.91
Databases N-U        123585        $15,231.62
XYZ Plumbing         123777        $23,454.84

The above output consists of all the rows from the Customer table, because it is the outer table and there are no residual conditions. Unlike in the earlier INNER JOIN, Acme Products is now easily seen as the only customer without an order. Since Acme Products has no order at this time, the order number and the order total are both extended with a "?" to represent a NULL, or missing value, from a non-matching row of the inner table. This is a very important concept.

The result of the SELECT provides the matching rows, like the INNER JOIN, plus the non-matching rows, or exceptions, that the INNER JOIN misses. It is possible to add the order number to an ORDER BY to put all exceptions either at the front (ASC) or the back (DESC) of the output report.

When using an OUTER JOIN, the results of the join are stored in the spool area and contain all of the rows from the outer table, both the rows that matched in the join step and the rows that did not. The only difference is that the non-matching rows carry NULL values in every column that would have come from the missing inner table row.

The concept of a LEFT OUTER JOIN is pretty straightforward with two tables. However, additional thought is required to preserve rows from the first outer table when using more than two tables. Remember that the result of the first join is saved in spool. This same spool is then used to perform all subsequent joins against any additional tables, or other spool areas. So if you join 3 tables using an outer join, the first two tables are joined together and the spooled result becomes the new outer table. That result is then joined with the third table, which becomes the RIGHT table.

Using the Student, Course and Student_Course tables, the following SELECT preserves the exception rows from the Student table, as the outer table, throughout the entire join. Because both joins are written using LEFT OUTER JOIN and the Student table is the table name furthest to the left, it remains the outer table:

14 Rows Returned
Last Name  First Name  Student_ID  Course
Larkins    Michael     423400      ?
McRoberts  Richard     280023      Advanced SQL
Wilson     Susie       231222      Advanced SQL
Johnson    Stanley     260000      Database Administration
Smith      Andy        333450      Database Administration
Delaney    Danny       324652      Introduction to SQL
Hanson     Henry       125634      Introduction to SQL
Bond       Jimmy       322133      Physical Database Design
Hanson     Henry       125634      Teradata Concepts
Phillips   Martin      123250      Teradata Concepts
Thomas     Wendy       234121      Teradata Concepts
Bond       Jimmy       322133      V2R3 SQL Features
Hanson     Henry       125634      V2R3 SQL Features
Wilson     Susie       231222      V2R3 SQL Features

The above output contains all the rows from the Student table, the outer table in this three-table LEFT OUTER JOIN. The OUTER JOIN returns a row for a student named Michael Larkins even though he is not taking a course. Since his course row is missing, no course name is available for display. As a result, the output is extended with a NULL in the course name, but the row still becomes part of the answer set.

Now it is known that one student isn't taking a course. It might also be important to know whether there are any courses without students. The previous join can be converted to determine this fact either by rearranging the table names in the FROM to make the Course table the outer table, or by using a RIGHT OUTER JOIN.

RIGHT OUTER JOIN

As indicated earlier, the outer table is determined by its position in the FROM clause of the SELECT. Consider the following:

<Inner-table> RIGHT OUTER JOIN <Outer-table>
Or
<Inner-table> RIGHT JOIN <Outer-table>

In the next example, the Customer table is still written before the Order table. Since this is now a RIGHT OUTER JOIN and the Order table is on the right of the word JOIN, the Order table is now the outer table. Remember, all rows can be returned from the outer table!

To include the orders without customers, the previously seen LEFT OUTER JOIN has been converted to a RIGHT OUTER JOIN.
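The three-table LEFT OUTER JOIN above can be sketched in SQLite. The table names are illustrative, the data comes from Figures 7-3 to 7-5, and SQLite, not Teradata, is running the query. In SQLite, NULLs sort first in ascending order, so the Larkins exception row comes out on top, as in the book's report. The second query shows the "additional thought" point: making the second join an INNER JOIN throws the preserved row away again.

```python
import sqlite3

# Illustrative SQLite tables with the Figure 7-3 to 7-5 data.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE student_table (student_id INTEGER, last_name TEXT, first_name TEXT);
CREATE TABLE course_table (course_id INTEGER, course_name TEXT);
CREATE TABLE student_course_table (student_id INTEGER, course_id INTEGER);
""")
con.executemany("INSERT INTO student_table VALUES (?, ?, ?)", [
    (123250, "Phillips", "Martin"), (125634, "Hanson", "Henry"),
    (234121, "Thomas", "Wendy"), (231222, "Wilson", "Susie"),
    (260000, "Johnson", "Stanley"), (280023, "McRoberts", "Richard"),
    (322133, "Bond", "Jimmy"), (324652, "Delaney", "Danny"),
    (333450, "Smith", "Andy"), (423400, "Larkins", "Michael"),
])
con.executemany("INSERT INTO course_table VALUES (?, ?)", [
    (100, "Teradata Concepts"), (200, "Introduction to SQL"),
    (210, "Advanced SQL"), (220, "V2R3 SQL Features"),
    (300, "Physical Database Design"), (400, "Database Administration"),
    (500, "Logical Database Design"),
])
con.executemany("INSERT INTO student_course_table VALUES (?, ?)", [
    (123250, 100), (125634, 100), (125634, 200), (125634, 220),
    (234121, 100), (231222, 210), (231222, 220), (260000, 400),
    (280023, 210), (322133, 220), (322133, 300), (324652, 200),
    (333450, 400),
])

# Student stays the outer table through both LEFT OUTER JOINs.
rows = con.execute("""
SELECT s.last_name, s.first_name, s.student_id, c.course_name
FROM student_table AS s
LEFT OUTER JOIN student_course_table AS sc ON s.student_id = sc.student_id
LEFT OUTER JOIN course_table AS c ON sc.course_id = c.course_id
ORDER BY c.course_name, s.last_name
""").fetchall()

# Making the second join INNER discards Larkins again: his intermediate row
# has a NULL course_id, which cannot match any Course row.
mixed = con.execute("""
SELECT s.last_name, c.course_name
FROM student_table AS s
LEFT OUTER JOIN student_course_table AS sc ON s.student_id = sc.student_id
INNER JOIN course_table AS c ON sc.course_id = c.course_id
""").fetchall()

print(len(rows), len(mixed))  # 14 13
print(rows[0])  # ('Larkins', 'Michael', 423400, None)
```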
It can be used to return all of the rows in the Order table, both those that match customer rows and those that do not. The following has been converted to a RIGHT OUTER JOIN to find all orders:

6 Rows Returned
Customer_name        Order_number  Order_total
?                    999999        $1.00-
Ace Consulting       123552        $5,111.47
Billy's Best Choice  123456        $12,347.53
Billy's Best Choice  123512        $8,005.91
Databases N-U        123585        $15,231.62
XYZ Plumbing         123777        $23,454.84

The above output from the SELECT consists of all the rows from the Order table, which is the outer table. In a 2-table OUTER JOIN without a WHERE clause, every outer-table row is returned at least once, so the number of rows returned is at least the number of rows in the outer table. The count is higher only when an outer row matches more than one inner row. In this case, the outer table is the Order table, and each order matches at most one customer, so all 6 of its rows are returned exactly once.

This join returns all orders with a valid customer ID (like the INNER JOIN) and orders with a missing or an invalid customer ID (OUTER JOIN). Either a missing or an invalid customer ID constitutes a critical business problem that needs immediate attention, because it means orders were placed by a buyer who is not known. Since the output was sorted by customer name, the exception row is returned first. This technique makes the exception easy to find, especially in a large report. The customer is not the only thing wrong with this order: the total is negative and the order number is all nines. We can now correct a situation we knew nothing about, or correct the procedure or policy that allowed the error to occur.

Using the same Student and Course tables from the previous 3-table join, the two LEFT OUTER JOIN operations can be converted to two RIGHT OUTER JOIN operations in order to find the students taking courses and also find any courses without students enrolled:

8 Rows Returned
Last Name  First Name  Student_ID  Course
McRoberts  Richard     280023      Advanced SQL
Wilson     Susie       231222      Advanced SQL
Delaney    Danny       324652      Introduction to SQL
Hanson     Henry       125634      Introduction to SQL
?          ?           ?           Logical Database Design
Bond       Jimmy       322133      V2R3 SQL Features
Hanson     Henry       125634      V2R3 SQL Features
Wilson     Susie       231222      V2R3 SQL Features

Now, using the output from the OUTER JOIN on the Course table, it is apparent that no one is enrolled in the Logical Database Design course. The enrollment needs to be increased or the room needs to be freed up for another course. Inner joins are great at finding matches, while outer joins are great at finding both matches and problems.

FULL OUTER JOIN

The last form of the OUTER JOIN is the FULL OUTER JOIN. If both Customer and Order exceptions are to be included in the output report, then the syntax should appear as:

<Outer-table> FULL OUTER JOIN <Outer-table>
Or
<Outer-table> FULL JOIN <Outer-table>

A FULL OUTER JOIN uses both of the tables as outer tables. The exceptions are returned from both tables, and the missing column values from either table are extended with NULL. This puts the LEFT and RIGHT OUTER JOIN output into a single report.

To return the customers with orders, and also include the orders without customers and the customers without orders, the following FULL OUTER JOIN can be used:

7 Rows Returned
Customer_name        Order_number  Order_total
?                    999999        $1.00-
Ace Consulting       123552        $5,111.47
Acme Products        ?             ?
Billy's Best Choice  123512        $8,005.91
Billy's Best Choice  123456        $12,347.53
Databases N-U        123585        $15,231.62
XYZ Plumbing         123777        $23,454.84

The output from the SELECT consists of all the rows from the Order and Customer tables because they are now both outer tables in a FULL OUTER JOIN. The total number of rows returned is more difficult to predict with a FULL OUTER JOIN.
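The RIGHT and FULL OUTER JOIN results above can be reproduced in SQLite, with two stated workarounds. SQLite only added RIGHT and FULL OUTER JOIN in version 3.39.0, so this sketch avoids them. The RIGHT OUTER JOIN is written as a LEFT OUTER JOIN with the tables swapped, which the text notes is equivalent. The FULL OUTER JOIN is emulated as a LEFT OUTER JOIN plus the unmatched rows from the other side, combined with UNION ALL. Order 999999's customer_number is assumed to be NULL, because the text does not show the actual value, and the table names are illustrative.

```python
import sqlite3

# Illustrative SQLite copy of Figures 7-1 and 7-2 plus the exception order.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer_table (customer_number INTEGER, customer_name TEXT);
CREATE TABLE order_table (order_number INTEGER, customer_number INTEGER,
                          order_total REAL);
INSERT INTO customer_table VALUES
  (11111111, 'Billy''s Best Choice'), (31313131, 'Acme Products'),
  (31323134, 'ACE Consulting'), (57896883, 'XYZ Plumbing'),
  (87323456, 'Databases N-U');
-- Order 999999: customer_number assumed NULL (actual value not shown).
INSERT INTO order_table VALUES
  (123456, 11111111, 12347.53), (123512, 11111111, 8005.91),
  (123552, 31323134, 5111.47),  (123585, 87323456, 5111.47),
  (123777, 57896883, 23454.84), (999999, NULL, -1.00);
""")

# customer RIGHT OUTER JOIN order == order LEFT OUTER JOIN customer:
# the Order table is the outer table either way.
right_outer = con.execute("""
SELECT c.customer_name, o.order_number, o.order_total
FROM order_table AS o
LEFT OUTER JOIN customer_table AS c ON c.customer_number = o.customer_number
ORDER BY c.customer_name, o.order_number
""").fetchall()

# FULL OUTER JOIN emulation: every customer with its orders (LEFT JOIN),
# plus the orders that found no customer (anti-join from the other side).
full_outer = con.execute("""
SELECT c.customer_name, o.order_number, o.order_total
FROM customer_table AS c
LEFT OUTER JOIN order_table AS o ON c.customer_number = o.customer_number
UNION ALL
SELECT c.customer_name, o.order_number, o.order_total
FROM order_table AS o
LEFT OUTER JOIN customer_table AS c ON c.customer_number = o.customer_number
WHERE c.customer_number IS NULL
""").fetchall()

print(len(right_outer), len(full_outer))  # 6 7
print(right_outer[0])  # (None, 999999, -1.0)
```

UNION ALL is used rather than UNION so that legitimately duplicate rows are not collapsed. The two halves cannot overlap, because the second half keeps only orders without a customer.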
The answer set contains one row for each matching pair of rows from the tables, plus one row for each row in the left table that has no match, plus one row for each row in the right table that has no match. Since both tables are outer tables, less thought is required when choosing the outer table. However, as mentioned earlier, INNER and OUTER join processing can be combined in a single SELECT, and the INNER JOIN still eliminates all non-matching rows. That is when the most consideration needs to be given to choosing the appropriate outer tables. Like all joins, a FULL OUTER JOIN can join more than two tables, up to 64. The next FULL OUTER JOIN uses the Student and Course tables as outer tables through the entire join process:

15 Rows Returned
Last Name  First Name  Student_ID  Course
Larkins    Michael     423400      ?
McRoberts  Richard     280023      Advanced SQL
Wilson     Susie       231222      Advanced SQL
Johnson    Stanley     260000      Database Administration
Smith      Andy        333450      Database Administration
Delaney    Danny       324652      Introduction to SQL
Hanson     Henry       125634      Introduction to SQL
?          ?           ?           Logical Database Design
Bond       Jimmy       322133      Physical Database Design
Hanson     Henry       125634      Teradata Concepts
Phillips   Martin      123250      Teradata Concepts
Thomas     Wendy       234121      Teradata Concepts
Bond       Jimmy       322133      V2R3 SQL Features
Hanson     Henry       125634      V2R3 SQL Features
Wilson     Susie       231222      V2R3 SQL Features

The above SELECT uses the Student, Course and Student_Course (associative) tables in a FULL OUTER JOIN, and all three tables are outer tables. The output includes one non-matching row from the Student table, with a NULL in the course name, and one non-matching row from the Course table, with NULLs in all three columns from the Student table. Since the Student_Course table is also an outer table, any non-matching rows in it would also be returned, with NULLs in the columns from the other tables.
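The row-count rule above (matching pairs, plus unmatched left rows, plus unmatched right rows) can be made concrete with a small pure-Python sketch. This is a naive nested-loop illustration written for this example, not how Teradata executes joins. The function name full_outer_join and the dict-based row layout are assumptions. Following the spool description, the three tables are joined two at a time, and the first result feeds the second join.

```python
from typing import Any

Row = dict[str, Any]


def full_outer_join(left: list[Row], right: list[Row],
                    left_key: str, right_key: str) -> list[Row]:
    """Nested-loop FULL OUTER JOIN on left[left_key] == right[right_key].

    As in SQL, a None (NULL) key never matches anything.
    """
    left_cols = {col for row in left for col in row}
    right_cols = {col for row in right for col in row}
    result: list[Row] = []
    matched_right: set[int] = set()
    for lrow in left:
        matched = False
        for i, rrow in enumerate(right):
            if lrow[left_key] is not None and lrow[left_key] == rrow[right_key]:
                result.append({**lrow, **rrow})  # matching pair
                matched_right.add(i)
                matched = True
        if not matched:  # left row with no partner: right columns are NULL
            result.append({**dict.fromkeys(right_cols), **lrow})
    for i, rrow in enumerate(right):
        if i not in matched_right:  # right row with no partner
            result.append({**dict.fromkeys(left_cols), **rrow})
    return result


# Data from Figures 7-3 to 7-5 (only the columns needed here).
students = [{"student_id": sid, "last_name": name} for sid, name in [
    (123250, "Phillips"), (125634, "Hanson"), (234121, "Thomas"),
    (231222, "Wilson"), (260000, "Johnson"), (280023, "McRoberts"),
    (322133, "Bond"), (324652, "Delaney"), (333450, "Smith"),
    (423400, "Larkins")]]
courses = [{"course_id": cid, "course_name": name} for cid, name in [
    (100, "Teradata Concepts"), (200, "Introduction to SQL"),
    (210, "Advanced SQL"), (220, "V2R3 SQL Features"),
    (300, "Physical Database Design"), (400, "Database Administration"),
    (500, "Logical Database Design")]]
student_course = [{"student_id": s, "course_id": c} for s, c in [
    (123250, 100), (125634, 100), (125634, 200), (125634, 220),
    (234121, 100), (231222, 210), (231222, 220), (260000, 400),
    (280023, 210), (322133, 220), (322133, 300), (324652, 200),
    (333450, 400)]]

# Two tables at a time: the first result plays the role of the spool
# that is then joined to the third table.
spool = full_outer_join(students, student_course, "student_id", "student_id")
result = full_outer_join(spool, courses, "course_id", "course_id")
print(len(spool), len(result))  # 14 15
```

The second join yields 13 matching pairs, plus 1 unmatched left row (Larkins), plus 1 unmatched right row (Logical Database Design), for 15 rows in total.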
However, since it is an associative table used only to implement a many-to-many relationship between the Student and Course tables, non-matching rows in it would indicate a serious business problem.

As a reminder, the result of the first join step is stored in spool, which is temporary work space that the system uses to complete each step of the SELECT. The spool area is then used for each consecutive JOIN step. This continues until all of the tables have been joined together, two at a time. However, the spool areas are not held until the end of the SELECT. Instead, each spool is released as soon as it is no longer needed. This makes more spool available for another step, or for another user. The release can be seen in the EXPLAIN output as (Last Use) for a spool area.

Also, when using Teradata, do not spend a lot of time worrying about which tables to join first. The optimizer makes this choice. It always looks for the fastest method to obtain the requested rows, and it uses data distribution and index demographics to make its final decision on a methodology. So the tables joined first in the syntax might be the last tables joined in the execution plan.

All databases join tables two at a time, but most databases simply pick which tables to join based on their position in the FROM. Sometimes, when the SQL runs slowly, the user just changes the order of the tables in the join. Otherwise, join schemas must be built to tell the RDBMS how to join specific tables. Teradata is smart enough, using explicit or implicit STATISTICS, to evaluate which tables to join together first. Whenever possible, four tables might be joined at the same time, but this is still done as two two-table joins in parallel. Joins involving millions of rows are considered difficult for most databases, but Teradata joins them with ease.

It is a good idea to use the Teradata EXPLAIN to see what steps the optimizer plans to use to accomplish the request.
Primarily, in the beginning, you are looking for an estimate of the number of rows that will be returned and the time cost to accomplish it. I recommend using the EXPLAIN before each join while you are learning, to make sure that the result is reasonable. If these numbers appear too high for the tables involved, it is probably a Cartesian product, which is not good. The EXPLAIN discovers the product join within seconds instead of hours. If the join were actually running, it would be wasting resources doing all the extra work to accomplish nothing. Use the EXPLAIN to learn this fact the easy way, and fix it.

CROSS JOIN

A CROSS JOIN is the ANSI way to write a product join. This means that it joins one or more participating rows from one table with all the participating rows from the other table. As mentioned earlier in this chapter, there are not many applications for a product join, and even fewer for a Cartesian join.

Although there are not many applications for a CROSS JOIN, consider this: an airline might use one to determine the location and number of routes needed to fly from one hub to all of the other cities it serves. A potential route "joins" every city to the hub. Therefore, the result needs a product join. What should probably still be avoided is flying from every city to every other city (a Cartesian join).

A CROSS JOIN is controlled using a WHERE clause. Unlike the other join syntax, a CROSS JOIN results in a syntax error if an ON clause is used.
The following is the syntax for the CROSS JOIN:

The next SELECT performs a CROSS JOIN (product join) using the Student and Course tables:

10 Rows Returned
Last_name  Course_name
Phillips   Teradata Concepts
Hanson     Teradata Concepts
Thomas     Teradata Concepts
Wilson     Teradata Concepts
Johnson    Teradata Concepts
McRoberts  Teradata Concepts
Bond       Teradata Concepts
Delaney    Teradata Concepts
Smith      Teradata Concepts
Larkins    Teradata Concepts

Since not every student is taking every course, this output has very little meaning from a student and course perspective. However, the same data can be valuable for estimating potential demand or the resources needed, such as maximum room capacities. For example, it helps if the Dean wants to know the maximum number of seats needed in a classroom if every student were to enroll in every SQL class. In that case, the rows are probably counted (COUNT(*)) rather than displayed. This SELECT uses a CROSS JOIN to populate a derived table (discussed later), which is then used to obtain the final count:

1 Row Returned
Total SQL Seats Needed
30

The previous SELECT can also be written so that the WHERE clause of the main SELECT compares the rows of the derived table called DT, instead of restricting the rows while the derived table is built. Compare the previous SELECT with the next one and determine which is more efficient.

Which do you find to be more efficient? At first glance, the first appears more efficient, because the CROSS JOIN inside the parentheses for the derived table is not a Cartesian product. Instead, the CROSS JOIN that populates the derived table is constrained in the WHERE to only SQL courses rather than all courses. However, the PE optimizes them the same way. I told you that Teradata was smart!

Self Join

A Self Join is simply a join that uses the same table more than once in a single join operation. The first requirement for this type of join is that the table must contain two different columns of the same domain.
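The CROSS JOIN and derived-table count above can be sketched in SQLite. Several assumptions apply. The table names are illustrative. "SQL class" is assumed to mean a course name containing "SQL", which picks out Introduction to SQL, Advanced SQL and V2R3 SQL Features. SQLite does not enforce the Teradata rule against ON with CROSS JOIN, so that restriction is not demonstrated here. Both forms of the derived-table query are shown, with the SQL-course filter inside DT and in the outer WHERE.

```python
import sqlite3

# Illustrative SQLite Student and Course tables (Figures 7-3 and 7-4).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE student_table (student_id INTEGER, last_name TEXT);
CREATE TABLE course_table (course_id INTEGER, course_name TEXT);
""")
con.executemany("INSERT INTO student_table VALUES (?, ?)", [
    (123250, "Phillips"), (125634, "Hanson"), (234121, "Thomas"),
    (231222, "Wilson"), (260000, "Johnson"), (280023, "McRoberts"),
    (322133, "Bond"), (324652, "Delaney"), (333450, "Smith"),
    (423400, "Larkins"),
])
con.executemany("INSERT INTO course_table VALUES (?, ?)", [
    (100, "Teradata Concepts"), (200, "Introduction to SQL"),
    (210, "Advanced SQL"), (220, "V2R3 SQL Features"),
    (300, "Physical Database Design"), (400, "Database Administration"),
    (500, "Logical Database Design"),
])

# CROSS JOIN constrained by WHERE: every student paired with course 100.
rows = con.execute("""
SELECT s.last_name, c.course_name
FROM student_table AS s CROSS JOIN course_table AS c
WHERE c.course_id = 100
""").fetchall()

# Derived table DT built by a CROSS JOIN restricted to SQL courses.
seats = con.execute("""
SELECT COUNT(*) AS total_sql_seats_needed
FROM (SELECT s.last_name, c.course_name
      FROM student_table AS s CROSS JOIN course_table AS c
      WHERE c.course_name LIKE '%SQL%') AS dt
""").fetchone()[0]

# Same count with the restriction moved to the outer SELECT.
seats_outer_where = con.execute("""
SELECT COUNT(*)
FROM (SELECT s.last_name, c.course_name
      FROM student_table AS s CROSS JOIN course_table AS c) AS dt
WHERE dt.course_name LIKE '%SQL%'
""").fetchone()[0]

print(len(rows), seats, seats_outer_where)  # 10 30 30
```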
This may involve de-normalized tables. For instance, if the Employee table contained a column for the manager's employee number and the manager is also an employee, these two columns have the same domain. By joining on these two columns in the Employee table, the managers can be joined to the employees. The next SELECT joins the Employee table to itself, once as an employee table and again as a manager table, to find managers. Then, the managers are joined to the Department table to return the first ten characters of the manager's name and their entire department name:
The self join can use the original syntax (table, table), an INNER, OUTER, or CROSS join. Another requirement is that at least one of the table references must be assigned an alias. Since the alias name becomes the table name, the table is now treated as two completely different tables. Normally, a self join requires some degree of de-normalization to allow two columns in the same table to be part of the same domain. Since our Employee table does not contain the manager's employee number, the output cannot be shown. However, the concept is shown here.
Alternative JOIN / ON Coding
There is another format that may be used for coding both INNER and OUTER JOIN processing. Previously, all of the examples and syntax for joins of more than two tables placed an ON immediately after each JOIN table reference. The following demonstrates the other coding technique:
When using this technique, care should be taken to sequence the JOIN and ON portions correctly. There are two primary differences between this style and the earlier syntax. First, the JOIN statements and table names are all together. In one sense, this is more like the syntax of tablename1, tablename2 as seen in the original join. Second, the ON statement sequence is reversed. In the above syntax diagram, the ON reference for tablename2 and tablenameN comes before the ON reference for tablename1 and tablename2.
However, the JOIN for <table-name1> and <table-name2> still comes before the JOIN of <table-name2> and <table-nameN>. In other words, the first ON goes with the last JOIN when they are nested using this technique. The following three-table INNER JOIN seen earlier is converted here to use this reversed form of the ON comparisons:
Personally, we prefer the first technique, in which every JOIN is followed immediately by its ON condition. Here are our reasons:
- It is harder to accidentally forget to code an ON for a JOIN, because they are together. Less debugging time is needed, and when debugging is needed, it is easier.
- Because the join allows 64 tables in a single SELECT, the SQL involving several tables may be longer than a single page can display. Therefore, many of the JOIN clauses will be on a different page than their corresponding ON conditions. It might require paging back and forth multiple times to locate all of the ON conditions for every JOIN clause. This involves too much effort. Using JOIN / ON, they are physically next to each other.
- With the reversed style, adding another table into the join requires careful thought and placement for both the JOIN and the ON. When using JOIN / ON, they can be placed almost anywhere in the FROM clause.
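The manager example described earlier can be sketched with a hypothetical, de-normalized Employee table that does carry a manager's employee number (mgr_emp_no). Here sqlite3 stands in for Teradata, and all table names, column names and rows are invented for illustration. The query references the same table under two aliases and uses the preferred style in which each JOIN is immediately followed by its ON. SUBSTR is SQLite's substring function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical Employee table: mgr_emp_no holds another employee's emp_no,
# so emp_no and mgr_emp_no share the same domain.
cur.executescript("""
CREATE TABLE employee (emp_no INTEGER, last_name TEXT,
                       dept_no INTEGER, mgr_emp_no INTEGER);
CREATE TABLE department (dept_no INTEGER, department_name TEXT);
INSERT INTO employee VALUES
  (1, 'Montgomery-Smithers', 10, NULL),
  (2, 'Chambers', 10, 1),
  (3, 'Jones', 20, 4),
  (4, 'Strickling', 20, NULL);
INSERT INTO department VALUES (10, 'Research and Development'), (20, 'Marketing');
""")

# Self join: emp and mgr are two aliases for the same table.
rows = cur.execute("""
    SELECT DISTINCT SUBSTR(mgr.last_name, 1, 10) AS manager, d.department_name
    FROM employee AS emp
    INNER JOIN employee AS mgr
        ON emp.mgr_emp_no = mgr.emp_no
    INNER JOIN department AS d
        ON mgr.dept_no = d.dept_no
    ORDER BY manager
""").fetchall()

for manager, dept in rows:
    print(manager, dept)
```

Only employees who are someone's manager appear, each truncated to ten characters. DISTINCT keeps a manager with several reports from being listed more than once.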
Adding Residual Conditions to a Join
Most of the examples in this book have included all rows from the tables being joined. However, in the world of Teradata, with millions of rows stored in a single table, additional comparisons are probably needed to reduce the number of rows returned. There are two ways to code residual conditions: a compound condition in the ON, or a WHERE clause used with the new JOIN syntax. These residual conditions are in addition to the join equality in the ON clause. Consideration should be given to the type of join when including the WHERE clause. The following paragraphs discuss the operational aspects of mixing an ON with a WHERE for INNER and OUTER JOIN operations.
INNER JOIN
The WHERE clause works exactly the same when used with the INNER JOIN as it does on all other forms of the SELECT. It eliminates rows at read time based on the condition being checked and any index columns involved in the comparison. Normally, the fewer rows that are read, the faster the SQL will run. It is more efficient because fewer resources such as disk, I/O, cache space, spool space, and CPU are needed. Therefore, whenever possible, it is best to eliminate unneeded rows using a WHERE condition with an INNER JOIN. I like the use of WHERE because all residual conditions are located in one place. The following samples are the same join that was performed earlier in this chapter. Here, one uses a WHERE clause and the other a compound comparison via the ON:
Or
2 Rows Returned
Customer_name          Order_number  Order_total
Billy's Best Choice    123456        $12,347.53
Billy's Best Choice    123512        $8,005.91
The output is exactly the same with both coding methods. This can be verified using the EXPLAIN. We recommend using the WHERE clause with an inner join because it consolidates all residual conditions in a single location that is easy to find when changes are needed. Although there may be multiple ON comparisons, there is only one WHERE clause.
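The sketch below checks that both placements of an inner-join residual condition give the same answer, again using sqlite3 as a stand-in. The order numbers, customer numbers and totals come from the Order table used in this chapter. The other customer names and the LIKE condition are assumptions, because the original SELECT statements are not reproduced here. The table is called orders because ORDER is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customer (customer_number INTEGER, customer_name TEXT);
CREATE TABLE orders (order_number INTEGER, customer_number INTEGER,
                     order_total REAL);
INSERT INTO customer VALUES
  (11111111, 'Billy''s Best Choice'),
  (31323134, 'Customer B'),
  (87323456, 'Customer C'),
  (57896883, 'Customer D');
INSERT INTO orders VALUES
  (123456, 11111111, 12347.53), (123512, 11111111, 8005.91),
  (123552, 31323134, 5111.47), (123585, 87323456, 15231.62),
  (123777, 57896883, 23454.84);
""")

# Residual condition coded in the WHERE clause.
with_where = cur.execute("""
    SELECT c.customer_name, o.order_number, o.order_total
    FROM customer AS c INNER JOIN orders AS o
        ON c.customer_number = o.customer_number
    WHERE c.customer_name LIKE 'Billy%'
    ORDER BY o.order_number
""").fetchall()

# The same residual condition coded as a compound ON comparison.
with_on = cur.execute("""
    SELECT c.customer_name, o.order_number, o.order_total
    FROM customer AS c INNER JOIN orders AS o
        ON c.customer_number = o.customer_number
       AND c.customer_name LIKE 'Billy%'
    ORDER BY o.order_number
""").fetchall()

print(with_where == with_on, len(with_where))  # True 2
```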
OUTER JOIN
Like the INNER JOIN, the OUTER JOIN can also be used with a WHERE clause. However, its processing is the opposite of the technique used with an INNER JOIN and other SQL constructs. If you remember, with the INNER JOIN the intent of the WHERE clause was to eliminate rows from one or all tables referenced by the SELECT. When the WHERE clause is coded with an OUTER JOIN, it is executed last, instead of first. Remember, the OUTER JOIN returns exceptions. The exceptions must be determined using the join (matching and non-matching rows), and therefore rows cannot be eliminated at read time. Instead, they go into the join and into spool. Then, just before the rows are returned to the client, the WHERE checks to see if rows can be eliminated from the spooled join rows. The following demonstrates the difference when using the same two techniques in the OUTER JOIN. Notice that the results are different:
7 Rows Returned
Last_Name   First_Name  Student_ID  Course
McRoberts   Richard     280023      Advanced SQL
Wilson      Susie       231222      Advanced SQL
Delaney     Danny       324652      Introduction to SQL
Hanson      Henry       125634      Introduction to SQL
Bond        Jimmy       322133      V2R3 SQL Features
Hanson      Henry       125634      V2R3 SQL Features
Wilson      Susie       231222      V2R3 SQL Features
Notice that only courses with SQL as part of the name are returned. The next SELECT, which uses the same condition as a compound comparison, has a different result:
11 Rows Returned
Last_Name   First_Name  Student_ID  Course
McRoberts   Richard     280023      Advanced SQL
Wilson      Susie       231222      Advanced SQL
?           ?           ?           Database Administration
Delaney     Danny       324652      Introduction to SQL
Hanson      Henry       125634      Introduction to SQL
?           ?           ?           Logical Database Design
?           ?           ?           Physical Database Design
?           ?           ?           Teradata Concepts
Bond        Jimmy       322133      V2R3 SQL Features
Hanson      Henry       125634      V2R3 SQL Features
Wilson      Susie       231222      V2R3 SQL Features
The reason for the difference makes sense after you think about the functionality of the OUTER JOIN. Remember that an OUTER JOIN retains all rows from the outer table, both those that match and those that do not match the ON comparison. Therefore, a course that fails the compound ON comparison still shows up, but as a non-matching row (with NULL student columns) instead of as a matching row.
There is one last consideration when using a WHERE clause with an OUTER JOIN: always use columns from the outer table in the WHERE. The reason is that if columns of the inner table are referenced in a WHERE, the optimizer will perform an INNER JOIN and not an OUTER JOIN, as coded. It does this because the non-matching rows carry NULL values in the inner table's columns, so a typical comparison on those columns (an equality or a LIKE, for example) is never true for them. Only the matching rows, which are exactly the rows of an inner join, can be returned. Therefore, an INNER JOIN is more efficient. To verify this event, look for the phrase "merge join" instead of "outer join" in the EXPLAIN output. The next SELECT was executed earlier as an inner join and returned 2 rows. Here it has been converted to an outer join. However, the EXPLAIN output shows in step 5 that an inner (merge) join will be used, because customer name is a column from the inner table (Customer table):
EXPLAIN Explanation
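The difference between WHERE and ON in an outer join can be checked the same way. This hypothetical sketch uses sqlite3 as a stand-in and a simplified enrollment table instead of the book's Student and Course tables. In the LEFT OUTER JOIN, course is the outer table, and the course names and enrollments match the output above. The last pair of queries shows why a WHERE on an inner-table column behaves like an inner join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE course (course_name TEXT)")
cur.execute("CREATE TABLE enrollment (last_name TEXT, course_name TEXT)")
cur.executemany("INSERT INTO course VALUES (?)", [(c,) for c in [
    "Advanced SQL", "Database Administration", "Introduction to SQL",
    "Logical Database Design", "Physical Database Design",
    "Teradata Concepts", "V2R3 SQL Features"]])
cur.executemany("INSERT INTO enrollment VALUES (?, ?)", [
    ("McRoberts", "Advanced SQL"), ("Wilson", "Advanced SQL"),
    ("Delaney", "Introduction to SQL"), ("Hanson", "Introduction to SQL"),
    ("Bond", "V2R3 SQL Features"), ("Hanson", "V2R3 SQL Features"),
    ("Wilson", "V2R3 SQL Features")])

base = ("SELECT e.last_name, c.course_name "
        "FROM course AS c LEFT OUTER JOIN enrollment AS e "
        "ON c.course_name = e.course_name")

# Residual condition in the WHERE: applied after the outer join,
# so the non-SQL courses are removed from the final answer.
where_rows = cur.execute(base + " WHERE c.course_name LIKE '%SQL%'").fetchall()

# Same condition added to the ON: it only decides which rows match,
# so every course (outer table) is kept, non-SQL ones with NULL students.
on_rows = cur.execute(base + " AND c.course_name LIKE '%SQL%'").fetchall()

# A WHERE on an inner-table column discards the NULL-extended rows,
# leaving exactly what an INNER JOIN with the same condition returns.
outer_h = cur.execute(base + " WHERE e.last_name LIKE 'H%'").fetchall()
inner_h = cur.execute(
    "SELECT e.last_name, c.course_name "
    "FROM course AS c INNER JOIN enrollment AS e "
    "ON c.course_name = e.course_name "
    "WHERE e.last_name LIKE 'H%'").fetchall()

print(len(where_rows), len(on_rows))        # 7 11
print(sorted(outer_h) == sorted(inner_h))   # True
```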
OUTER JOIN Hints
The easiest way to begin writing an OUTER JOIN is to:
1. Start with an INNER JOIN and convert it to an OUTER JOIN. Once the INNER JOIN is working, change the appropriate INNER descriptors to LEFT OUTER, RIGHT OUTER or FULL OUTER JOIN, depending on which exception rows you want to include. Since INNER and OUTER joins can be used together, one join at a time can be changed to validate the output. Use the join diagram below to convert the INNER JOIN to an OUTER JOIN.
2. For joins with more than two tables, think of it as joining two tables at a time. Concentrating on only two tables instead of all of them makes the entire process easier. The optimizer always joins two tables at a time, whether serially or in parallel, and it is smart enough to do it in the most efficient manner possible.
3. Don't worry about which tables you join first. The optimizer will determine which tables should be joined first for the optimal plan.
4. The WHERE clause, if used in an OUTER JOIN to eliminate rows:
A. Is applied after the join is complete, not when rows are read as with the INNER JOIN.
B. Should reference columns from the outer table. If columns from the inner table are referenced in a WHERE clause, the optimizer will most likely perform a merge join (INNER) for efficiency. This is actually an INNER JOIN operation and can be seen in the EXPLAIN output.
Join Diagram:
Where:
Table I rows
A = that match Table II rows and match Table III rows (INNER join, perfect data)
B = that match Table II rows, but not Table III rows
C = that do not match Table II rows or Table III rows
D = that do not match Table II rows, but do match Table III rows
Table II rows
E = that do not match Table I nor Table III rows
F = that do not match Table I, but do match Table III rows
Table III rows
G = that do not match Table I or Table II rows
Parallel Join Processing
There are four basic types of joins that Teradata can perform, depending on the characteristics of the table definition. When the join domain is the primary index (PI) column, with a unique secondary index (USI), the join is referred to as a nested join and involves, at most, three AMPs. The second type of join is a merge join, which has three different forms depending on the request. The newest type of join in Teradata is the Row Hash join, which uses the pre-sorted Row Hash value instead of a sorted data value match. This is beneficial since the data row is stored based on the row hash value and not the data value. The last type is the product join.
In Teradata, each AMP performs all join processing locally, in parallel with the other AMPs. This means that matching values in the join columns must be on the same AMP to be matched. When the matching rows are not already stored on the same AMP, they must be temporarily moved to the same AMP, in spool. Remember, rows are distributed on the value in the PI column(s). If joins are performed on the PI of both tables, no row movement is necessary, because the rows with the same PI value are on the same AMP. This is easy, but not always practical. Most joins use a primary key, which might be the UPI, and a foreign key, which is probably not the PI. Regardless of the join type, in a parallel environment the rows of at least one table normally have to move. This movement puts all matching rows together on the same AMP. The movement is usually required because of the user's choice of a PI. Remember, it is the PI data value that is used for hashing and row distribution to an AMP. Therefore, since the join columns are mostly columns other than the PI, rows need to be redistributed to another AMP. The redistributed rows will be temporarily stored in spool space and used from there for the join processing. The optimizer will attempt to determine the most efficient path for data row movement.
Its choice will be based on the amount of data involved. The three join strategies available are:
1. Duplicate all rows of one table onto every AMP.
2. Redistribute the rows of one table by hashing the non-PI join column and sending them to the AMP containing the matching PI row.
3. Redistribute both tables by hashed join column value.
The duplication of all rows is a popular approach when the non-PI column is on a small table. In that case, copying all rows is faster than hashing and distributing all rows. This technique is also used when doing a product join and, worse, a Cartesian product join. When both tables are large, the optimizer redistributes the rows of the table with the non-PI join column to the AMPs that hold the matching PI rows, which saves space on each AMP. All participating rows are redistributed so that they are on the same AMP as the rows with the same data value in the PI of the other table. The last choice is the redistribution of all participating rows from both tables by hashing on the join column. This is required when the join is on a column that is not the PI in either table. Using this last type of join strategy will require the most spool space. Still, this technique allows Teradata to quickly join tables together in a parallel environment. By combining the speed of the BYNET, the experience of the PE optimizer, and the hashing capabilities of Teradata, the data can be temporarily moved to meet the demands of the SQL query. Do not underestimate the importance or brilliance of this capability. As queries change and place new demands on the data, Teradata is flexible and powerful enough to move the data temporarily and quickly to the proper location.
Redistribution requires overhead processing. It has nothing to do with the join processing, but everything to do with preparing for the join. This is the primary reason that many tables use a column other than the primary key column as a NUPI.
This way, the join columns used in the WHERE or the ON are used for distribution, and the matching rows are stored on the same AMP. Therefore, the join is performed without the need to redistribute data. However, some redistribution is normally still needed. So, make sure to COLLECT STATISTICS (see the DDL chapter) on the join columns. The strategy that the optimizer chooses can be seen in the output from an EXPLAIN.
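The redistribution idea can be modeled in a few lines of Python. This is a toy model, not Teradata's implementation. In it, zlib.crc32 stands in for Teradata's row hash, and the 4-AMP system and the customer and order rows are hypothetical. Both sides are hashed on the join column, which corresponds to strategy 3 above. Strategy 2 is the case where one side is already stored by that hash. The model shows that once both tables are hashed on the same join value, every matching pair lands on the same AMP, so each AMP can join its own rows without looking at any other AMP.

```python
import zlib
from collections import defaultdict

N_AMPS = 4  # hypothetical system size


def amp_for(value):
    """Stand-in for Teradata's row hash: map a join value to an AMP number."""
    return zlib.crc32(str(value).encode("utf-8")) % N_AMPS


def redistribute(rows, key):
    """Spool each row to the AMP chosen by hashing its join column."""
    spool = defaultdict(list)
    for row in rows:
        spool[amp_for(row[key])].append(row)
    return spool


def local_join(left_spool, right_spool, lkey, rkey):
    """Each AMP joins only the rows in its own spool."""
    result = []
    for amp in range(N_AMPS):
        for left in left_spool.get(amp, []):
            for right in right_spool.get(amp, []):
                if left[lkey] == right[rkey]:
                    result.append((amp, left, right))
    return result


# Hypothetical rows: cust_no is the join column for both tables.
customers = [{"cust_no": n, "name": f"C{n}"} for n in range(1, 9)]
orders = [{"order_no": 100 + i, "cust_no": (i % 8) + 1} for i in range(20)]

cust_spool = redistribute(customers, "cust_no")
order_spool = redistribute(orders, "cust_no")
joined = local_join(cust_spool, order_spool, "cust_no", "cust_no")
print(len(joined))  # 20: every order found its customer on its own AMP
```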
Join Index Processing
Sometimes, regardless of the join plan or indices defined, certain joins cannot be performed quickly enough to satisfy the users. When this is the case, another alternative must be explored. Later chapters in this book discuss temporary tables and summary tables as available techniques. If none of these provides a viable solution, yet another option is needed. The other way to improve join processing is the use of a JOIN INDEX. It is a pre-join that stores the joined rows. Then, when the join index "covers" the user's SELECT columns, the optimizer automatically uses the stored join index rows to retrieve the pre-joined rows from multiple tables instead of doing the join again. The term used here is covers. It means that if all columns requested by the user are present in the join index, it is used. If even one requested column is not in the join index, it cannot be used, and the actual join must be processed to get that extra column. The speed of the join index is its main advantage. To support its ongoing use, whenever a value in a column of a table used within a join index is changed, the corresponding value in the join index row(s) is also changed. This keeps the join index consistent with the rows in the actual tables. For more information on join index usage, see Chapter 18 in this book.
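The all-or-nothing meaning of "covers" can be expressed as a simple set test. This sketch models only the column rule described above. The real optimizer also considers the columns referenced elsewhere in the request, as well as cost. The function and column names are hypothetical.

```python
def covers(join_index_columns, requested_columns):
    """True only when every requested column is stored in the join index."""
    return set(requested_columns) <= set(join_index_columns)


# Hypothetical join index over customer and order columns.
ji_columns = {"customer_name", "order_number", "order_date", "order_total"}

print(covers(ji_columns, ["customer_name", "order_total"]))   # True
print(covers(ji_columns, ["customer_name", "phone_number"]))  # False
```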
DATE, TIME, and TIMESTAMP
Teradata has a date function and a time function built into the database, along with the ability to request this data from the system. In the early releases, DATE was a valid data type for storing the combination of year, month and day, but TIME was not. Now, TIME and TIMESTAMP are both valid data types that can be defined and stored within a table.
The Teradata RDBMS stores the date in YYYMMDD format on disk. The YYY is an offset value from the base year of 1900. The MM is the month value from 1 to 12 and the DD is the day of the month. Using this format, the database can currently work with dates beyond the year 3000. So, it appears that Teradata is Y3K compliant. Teradata always stores a date as a numeric INTEGER value. The following calculation demonstrates how Teradata converts a date to the YYYMMDD date format, for storage of January 1, 1999:
Formula for INTEGERDATE = ((Year - 1900) * 10000) + (Month * 100) + Day
The stored data for the date January 1, 1999 is converted to:
Year  = (1999 - 1900) * 10000 = 0990000 (year portion)
Month = 01 * 100              = +0100   (month portion)
Day   = 01                    = +01     (day portion)
                                0990101 stored on disk
Although years prior to 2000 look fairly "normal" with an implied 20th Century, years after 2000 do not look like the normal concept of a year (100). Fortunately, Teradata automatically does all the conversion and makes it transparent to the user. The remainder of this book will provide SQL examples using both a numeric date and the character formats 'YY/MM/DD' and 'YYYY-MM-DD'.
The next conversion shows the data stored for January 1, 2000 (notice that YYY=100, or 100 years from 1900):
Year  = (2000 - 1900) * 10000 = 1000000 (year portion)
Month = 01 * 100              = +0100   (month portion)
Day   = 01                    = +01     (day portion)
                                1000101 stored on disk
Additionally, since the date is stored as an integer and an integer is a signed value, dates prior to the base year of 1900 can also be stored.
The same formula applies to the date conversion regardless of the century. However, since dates prior to 1900, like 1800, are smaller values, the result of the subtraction is a negative number.
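The storage formula above can be written as a one-line function. This is a sketch of the arithmetic only, not of how Teradata is implemented internally, and the function name is hypothetical. It reproduces the two worked conversions and shows the negative value for a date in 1800.

```python
def teradata_date_int(year, month, day):
    """Encode a date as ((year - 1900) * 10000) + (month * 100) + day."""
    return (year - 1900) * 10000 + month * 100 + day


print(teradata_date_int(1999, 1, 1))  # 990101 (written 0990101 above)
print(teradata_date_int(2000, 1, 1))  # 1000101
print(teradata_date_int(1800, 1, 1))  # -999899 (year offset is -100)
```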
ANSI Standard DATE Reference
CURRENT_DATE is the ANSI Standard name for the date function. All references to the original DATE function continue to work and return the same date information. Furthermore, they both display the date in the same format.
INTEGERDATE
INTEGERDATE is the default display format for most Teradata database client utilities. It is in the form YY/MM/DD. It has nothing to do with the way the data is stored on disk, only with the format of the output display. The current exception to this is Queryman. Since it uses ODBC, it displays only the ANSI date, as seen below. Later in this book, the Teradata FORMAT function is also addressed to demonstrate alternative arrangements of year, month and day for output presentation.
/* Display today's date, this example assumes Oct. 1, 2001 */
Traditional Teradata        ANSI
SELECT DATE;                SELECT CURRENT_DATE;
DATE                        CURRENT_DATE
01/10/01                    01/10/01
Figure 8-1
To change the default output display, see the DATEFORM options in the next section of this chapter.
ANSIDATE
Teradata was updated in release V2R3 to include the ANSI date display and reserved name. The ANSI format is YYYY-MM-DD.
/* Display today's date, this example assumes Oct. 1, 2001 */
Traditional Teradata        ANSI
SELECT DATE;                SELECT CURRENT_DATE;
DATE                        CURRENT_DATE
2001-10-01                  2001-10-01
Figure 8-2
Since we are now beyond the year 1999, it is advisable to use this ANSI format to guarantee that everyone can tell apart years from different centuries, such as 2000, 1900 and 1800. If you regularly use tools via ODBC (Open Database Connectivity), this is the default display format for the date.
DATEFORM
Teradata has traditionally been Y2K compliant. In reality, it is compliant to years beyond 3000. However, the default display format using YY/MM/DD is not ANSI compliant. In Teradata, release V2R3 allows a choice of whether to display the date in the original display format (YY/MM/DD) or the newer ANSI format (YYYY-MM-DD). When installed, Teradata defaults at the system level to the original format, called INTEGERDATE. However, this system default DATEFORM may be overridden by updating the DBS Control record.
The DATEFORM:
- Controls the default display of selected dates
- Controls the expected format for import and export of dates as character strings ('YY/MM/DD' or 'YYYY-MM-DD') in the load utilities
- Can be overridden by USER or within a Session at any time
System Level Definition
MODIFY GENERAL 14 = 0 /* INTEGERDATE (YY/MM/DD) */
MODIFY GENERAL 14 = 1 /* ANSIDATE (YYYY-MM-DD) */
User Level Definition
CREATE USER username ...
DATEFORM = {INTEGERDATE | ANSIDATE} ;
Session Level Declaration
In addition to setting the system default in the control record, a user can request the format for their individual session. The syntax is:
SET SESSION DATEFORM = {ANSIDATE | INTEGERDATE} ;
In the above settings, the "|" represents an OR condition. The setting can be ANSIDATE or INTEGERDATE. Regardless of which DATEFORM is used, ANSIDATE or INTEGERDATE, these settings define load and display characteristics only. Remember, the date is always stored on disk in the YYYMMDD format; the DATEFORM only lets you select the format for display.
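The difference between the two DATEFORM displays, and the ambiguity of two-digit years, can be illustrated with Python's strftime. This is a hypothetical sketch of the display rules only. The function name and format strings are assumptions chosen to match Figures 8-1 and 8-2, not Teradata settings.

```python
from datetime import date


def display(d, dateform):
    """Render a date the way the two DATEFORM settings display it."""
    if dateform == "INTEGERDATE":
        return d.strftime("%y/%m/%d")   # YY/MM/DD
    if dateform == "ANSIDATE":
        return d.strftime("%Y-%m-%d")   # YYYY-MM-DD
    raise ValueError("dateform must be 'INTEGERDATE' or 'ANSIDATE'")


for d in (date(1901, 10, 1), date(2001, 10, 1)):
    print(display(d, "INTEGERDATE"), display(d, "ANSIDATE"))
# 01/10/01 1901-10-01
# 01/10/01 2001-10-01
```

Both dates look identical as YY/MM/DD, while the ANSI form tells them apart.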
DATE Processing
Much of the time spent processing dates is dedicated to storage and reference. Yet there are times when one date yields or derives a second date. For instance, once a bill has been sent to a customer, the expectation is that payment comes 60 days later. The challenge becomes the correct calculation of the exact due date. Since Teradata stores the date as an INTEGER, it allows simple and complex mathematics to calculate new dates from dates. The next SELECT operation uses Teradata date arithmetic and DATEFORM=INTEGERDATE to show the month and day of the payment due date in 60 days:
4 Rows Returned
Due Date  Order_date  Order_total
99/12/09  99/10/10    $15,231.62
99/03/02  99/01/01    $8,005.91
99/11/08  99/09/09    $23,454.84
99/11/30  99/10/01    $5,111.47
Besides a due date, the SQL can also calculate a discount period date 10 days prior to the payment due date using the alias name:
4 Rows Returned
Order_date  Due Date  Order_total  Discount Date  Discounted
99/10/10    99/12/09  $15,231.62   99/11/29       $14,926.99
99/01/01    99/03/02  $8,005.91    99/02/20       $7,845.79
99/09/09    99/11/08  $23,454.84   99/10/29       $22,985.74
99/10/01    99/11/30  $5,111.47    99/11/20       $5,009.24
The above example demonstrates that a DATE plus or minus an INTEGER results in a new date (date {+ | -} integer = date). However, it probably does not make much sense to multiply or divide a date by a number. As seen earlier in this chapter, the stored format of the date is YYYMMDD. Since DD is the lowest component, the 60 added to the order date in the above SELECT is assumed to be days. The system is smart enough to know that it is dealing with a date. Therefore, it knows the length of each month and that a normal year contains 365 days. Basic algebra tells us that equations can be rearranged and still be valid. Therefore, a DATE minus a DATE results in an INTEGER (date - date = integer). This INTEGER represents the number of days between the dates.
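The due-date and discount results above can be checked with Python's datetime module, which performs the same date-plus-days arithmetic. The 2% discount rate is an assumption inferred from the Discounted column, and the function name is hypothetical. The second part shows that subtracting one date from another yields a day count. That count can then be divided by 365, or reduced MOD 7 from Monday, January 1, 1900, as in the age and day-of-week examples in this section.

```python
from datetime import date, timedelta
from decimal import Decimal, ROUND_HALF_UP


def due_and_discount(order_date, total, days=60, discount_days=10, rate="0.98"):
    """Due date = order date + days; discount date = due date - discount_days."""
    due = order_date + timedelta(days=days)
    discount_date = due - timedelta(days=discount_days)
    discounted = (Decimal(total) * Decimal(rate)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP)
    return due, discount_date, discounted


print(due_and_discount(date(1999, 10, 10), "15231.62"))
# (datetime.date(1999, 12, 9), datetime.date(1999, 11, 29), Decimal('14926.99'))

# DATE - DATE yields a whole number of days.
age_days = (date(2000, 10, 1) - date(1952, 10, 1)).days
print(age_days, age_days // 365)                        # 17532 48

# Days since Monday, January 1, 1900, MOD 7: 0 = Monday, 2 = Wednesday.
print((date(1952, 10, 1) - date(1900, 1, 1)).days % 7)  # 2
```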
This chart summarizes the math operations on dates:
Operation              Result
DATE - DATE            Interval (days between dates)
DATE + or - integer    DATE
Figure 8-3
This SELECT uses this principle to display the number of days I was alive on my last birthday:
1 Row Returned
Mike's Age in Days
17532
The above example subtracted my actual birth date in 1952 from one of my birthdays (October 1, 2000). Notice how awful an age looks in days! More importantly, notice how I slipped into the Title the fact that you can use two single quotes to store or display a literal single quote in a character string.
As mentioned above, an age in days looks awful, and that is probably why we do not use that format. I am not ready to tell someone I am just a little over 17,000. Instead, we think about ages in years. To convert the days to years, math can again be used, as seen in the following SELECT:
1 Row Returned
Mike's Age in Years
48
Wow! I feel so much younger now. This is where division begins to make sense, but remember, the INTEGER is not a DATE. The division also assumes that all years have 365 days. It only does the math operations specified in the SQL statement.
Now, what day was he born? The next SELECT uses concatenation, date arithmetic and a blank TITLE to produce the desired output:
1 Row Returned
Mike was born on day 2
The above subtraction results in the number of days between the two dates. Then, MOD 7 divides by 7 to get rid of the number of weeks and returns the remainder. A MOD 7 can only result in values 0 through 6 (always 1 less than the MOD divisor). Since January 1, 1900 (101(date)) is a Monday, Mike was born on a Wednesday. This chart can be used to find the day of the week based on the above formula and 101(date):
Result  Day of the Week
0       Monday
1       Tuesday
2       Wednesday
3       Thursday
4       Friday
5       Saturday
6       Sunday
Figure 8-4
The following SELECT uses a year's worth of days to derive a new date that is 365 days away:
5 Rows Returned
Order_date  Year Later Date  Order_total
98/05/04    99/05/04         $12,347.53
99/01/01    00/01/01         $8,005.91
99/09/09    00/09/08         $23,454.84
99/10/01    00/09/30         $5,111.47
99/10/10    00/10/09         $15,231.62
In the above, the year 1999 was not a leap year, so the value of 365 is used. Likewise, had the beginning year been 2000, 366 would be needed because it is a leap year. Notice, too, that the last three rows land one day short of the anniversary date, because each of those 365-day spans includes February 29, 2000. Remember, the system is simply doing the math that is indicated in the SQL statement. If a full year is always needed, regardless of the number of days, see the ADD_MONTHS function.
ADD_MONTHS
Compatibility: Teradata Extension
The Teradata ADD_MONTHS function can be used to calculate a new date. This date may be in the future (addition) or in the past (subtraction). The calendar intelligence for the number of days in each month, as well as leap year processing, is built in. Since the ANSI CURRENT_DATE and CURRENT_TIME are compatible with the original DATE and TIME functions, ADD_MONTHS works with them as well. Below is the syntax for the ADD_MONTHS function:
The next SELECT uses literals instead of table rows to demonstrate the calendar logic used by the ADD_MONTHS function when beginning with the last day of a month and arriving at the last day of February:
1 Row Returned
FE_Non_Leap  Oct_P120  FE_Leap_Yr  Oct_M240  FE_Leap_Yr2  Oct_4Yrs
2001-02-28   01/02/27  2000-02-29  00/03/04  2004-10-30   04/10/30
Notice that, when using the ADD_MONTHS function, all of the output is displayed in ANSI date form. This is true when using BTEQ or Queryman. Conversely, the date arithmetic uses the default date format. Also, the second ADD_MONTHS uses -8, which equates to subtraction, or going back in time instead of ahead. Additionally, because months have a varying number of days, the output from math is likely to be different from the ADD_MONTHS result.
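The difference between ADD_MONTHS and plain day arithmetic can be sketched in Python. The add_months function below is an assumption. It clamps the day to the last day of the target month, which is consistent with the February results above, but it does not attempt to reproduce every ADD_MONTHS case. The starting literal 2000-10-30 is also an assumption: the original SELECT is not reproduced here, and this date reproduces the output shown.

```python
import calendar
from datetime import date, timedelta


def add_months(d, months):
    """Move d by whole months; if that day does not exist, use the month's last day."""
    index = d.year * 12 + (d.month - 1) + months   # months counted from year 0
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


start = date(2000, 10, 30)
print(add_months(start, 4), start + timedelta(days=120))   # 2001-02-28 2001-02-27
print(add_months(start, -8), start - timedelta(days=240))  # 2000-02-29 2000-03-04
print(add_months(date(1999, 1, 1), 2))                     # 1999-03-01
```

Month arithmetic lands on the end of February in both directions, while the fixed day counts drift because the months in between have different lengths.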
The next SELECT uses the ADD_MONTHS function as an alternative to the previous SELECT operations for showing the month and day of the payment due date in 2 months:
5 Rows Returned
Due Date    Order_date  Order_total
1998-07-04  1998-05-04  $12,347.53
1999-03-01  1999-01-01  $8,005.91
1999-11-09  1999-09-09  $23,454.84
1999-12-01  1999-10-01  $5,111.47
1999-12-10  1999-10-10  $15,231.62
The ADD_MONTHS function also takes into account the last day of each month. The following goes from the last day of one month to the last day of another month:
1 Row Returned
Leap_Ahead_2yrs  Leap_Back_2yrs  With30_31
2000-02-29       2000-02-29      2001-07-31
Whether going forward or backward in time, a leap year is still recognized using ADD_MONTHS.
ANSI TIME
Teradata has also been updated in V2R3 to include the ANSI time display, reserved name and the new TIME data type. Additionally, the clock is now intelligent and can carry seconds over into minutes.
CURRENT_TIME is the ANSI name of the time function. All current SQL references to the original Teradata TIME function continue to work.
/* Display the time, this example assumes 12:15PM */
Traditional Teradata        ANSI
SELECT TIME;                SELECT CURRENT_TIME;
TIME                        CURRENT_TIME
12:15:00                    12:15:00
Figure 8-5
Although the time could be displayed prior to release V2R3, it was converted to a character column type when stored. Now, TIME is also a valid data type, may be defined in a table, and retains the HH:MM:SS properties. As well as creating a TIME data type, intelligence has been added to the clock software. It can increment or decrement TIME, with the result advancing to the next minute or falling back to the previous minute based on the addition or subtraction of seconds.
When storing TIME on disk, this chart indicates the amount of storage required:
TIME(n) as HH:MM:SS.nnnnnn, where n = 0-6 (maximum is 6 digits to the right of the decimal; default = 6)
HH stored as BYTEINT (1 byte)
MM stored as BYTEINT (1 byte)
SS stored as DECIMAL(8,6) (4 bytes)
Figure 8-6
TIME representation character display length:
TIME(0) - 10:14:38         CHAR(8)
TIME(6) - 10:14:38.201163  CHAR(15)
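Carrying seconds into minutes is the same behavior Python shows when a timedelta is added to a time of day. In this sketch the helper name is hypothetical. The time is attached to an arbitrary date because Python's time objects do not support arithmetic directly, and any wraparound past midnight is not meant to model Teradata.

```python
from datetime import date, datetime, time, timedelta


def add_seconds(t, seconds):
    """Add (or subtract) seconds to a time of day, carrying into minutes and hours."""
    anchor = datetime.combine(date(2000, 1, 1), t)   # arbitrary date
    return (anchor + timedelta(seconds=seconds)).time()


print(add_seconds(time(10, 14, 38), 30))   # 10:15:08
print(add_seconds(time(10, 15, 8), -30))   # 10:14:38
```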
EXTRACT
Compatibility: ANSI
Both DATE and TIME data are special in terms of relational design. Each is composed of three parts and is therefore decomposable. Decomposable data is data that is not at its most granular level. For example, you may only want to see the hour. The EXTRACT function is designed to perform the decomposition on these data types. It works with both the DATE and TIME functions, including the original and newer ANSI expressions. The operation pulls a specific portion out of a date or time value. The syntax for EXTRACT:
The next SELECT uses EXTRACT with date and time literals to demonstrate the coding technique and the resulting output:
1 Row Returned
Yr_Part  Mth_Part  Day_Part  Hr_Part  Min_Part  Sec_Part
2000     10        01        10       1         30
The EXTRACT can be very helpful when a single component is needed to control access to data or the presentation of data. For instance, when calculating aggregates, it might be necessary to group the output on a change in the month. Since the data represents daily activity, the month portion needs to be evaluated separately.
The Order table below is used to demonstrate the EXTRACT function in a SELECT:
Order Table - contains 5 orders
Order_number  Customer_number  Order_date  Order_total
PK            FK
UPI           NUSI             NUSI
123456        11111111         980504      12347.53
123512        11111111         990101      8005.91
123552        31323134         991001      5111.47
123585        87323456         991010      15231.62
123777        57896883         990909      23454.84
Figure 8-7
The following SELECT uses EXTRACT to display only the month and also to control the number of aggregates displayed in the GROUP BY:
4 Rows Returned
EXTRACT(MONTH FROM Order_date)  Nbr_of_rows  Average(Order_total)
1                               1            8005.91
5                               1            12347.53
9                               1            23454.84
10                              2            10171.54
The next SELECT operation uses entirely ANSI-compliant code with DATEFORM=ANSIDATE to show the month and day of the payment due date in 2 months and 4 days. The results correspond to adding 64 days to each order date. Notice that it uses double quotes to allow reserved words as alias names, and ANSIDATE in the comparison and display:
4 Rows Returned
            Month  Day  Year  Order_date    Order_total
Due Date:   3      6    1999  Jan 01, 1999  8005.91
Due Date:   11     12   1999  Sep 09, 1999  23454.84
Due Date:   12     4    1999  Oct 01, 1999  5111.47
Due Date:   12     13   1999  Oct 10, 1999  15231.62
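The month grouping can be reproduced with sqlite3 and the five rows of Figure 8-7. SQLite has no EXTRACT function, so strftime('%m', ...) is used as a stand-in. The ISO date strings and lower-case column names are assumptions. The counts match the output above, and SQLite's October average comes out at about 10171.545, which the figure shows as 10171.54.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (order_number INTEGER, customer_number INTEGER,"
            " order_date TEXT, order_total REAL)")
cur.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (123456, 11111111, "1998-05-04", 12347.53),
    (123512, 11111111, "1999-01-01", 8005.91),
    (123552, 31323134, "1999-10-01", 5111.47),
    (123585, 87323456, "1999-10-10", 15231.62),
    (123777, 57896883, "1999-09-09", 23454.84)])

# strftime('%m', ...) plays the role of EXTRACT(MONTH FROM order_date).
rows = cur.execute("""
    SELECT CAST(strftime('%m', order_date) AS INTEGER) AS order_month,
           COUNT(*) AS nbr_of_rows,
           AVG(order_total) AS avg_total
    FROM orders
    GROUP BY order_month
    ORDER BY order_month
""").fetchall()

for month, count, avg in rows:
    print(month, count, round(avg, 3))
```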
Implied Extract of Day, Month and Year
Compatibility: Teradata Extension
Although EXTRACT works great and is ANSI compliant, it is a function. Therefore, it must be executed, with parameters passed to it to identify the desired portion of the data. Then, it must pass back the answer. As a result, using it requires additional overhead processing. It was mentioned earlier that Teradata stores a date as an integer and therefore allows math operations to be performed on a date. The syntax for implied extract:
The following SELECT uses math to extract the three portions of Mike's literal birthday:
1 Row Returned
Day_portion  Month_portion  Year_portion
1            10             2001
Remember that the date is stored as yyymmdd. The literal values are used here to provide a date of Oct. 1, 2001. The day portion is obtained here by taking the dd portion (last 2 digits) as the remainder of MOD 100. The month portion is obtained by dividing by 100 to eliminate the dd, which leaves mm as the new last 2 digits, and then taking the remainder of MOD 100. The year portion is the trickiest. Since it is stored as yyy (yyyy - 1900), we must divide by 10000 and add 1900 to the result to convert it back to the yyyy format. What do you suppose the EXTRACT function does? Same thing.
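The implied extract is plain integer arithmetic, so it can be sketched in Python. The function name is hypothetical. The sketch assumes a date from 1900 onward (a non-negative stored value), as in the example.

```python
def implied_extract(stored):
    """Split a stored YYYMMDD integer into (day, month, year) using MOD and division."""
    if stored < 0:
        raise ValueError("sketch assumes a date from 1900 onward")
    day = stored % 100               # DD: remainder of MOD 100
    month = (stored // 100) % 100    # drop DD, then MOD 100 leaves MM
    year = stored // 10000 + 1900    # YYY is the offset from 1900
    return day, month, year


print(implied_extract(1011001))  # (1, 10, 2001), i.e. Oct. 1, 2001
```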
ANSI TIMESTAMP
Another new data type, added to Teradata in V2R3 to comply with the ANSI standard, is the TIMESTAMP. TIMESTAMP is now a display format, a reserved name and a new data type. It combines the DATE and TIME data types into a single column data type. Since this is entirely new, there is no previous compatibility to contrast.

Teradata: did not previously exist
ANSI:
SELECT CURRENT_TIMESTAMP;

CURRENT_TIMESTAMP
2000-10-01 12:15:00
Figure 8-8

Timestamp      Representation               Character display length
TIMESTAMP(0)   1998-12-07 11:37:58          CHAR(19)
TIMESTAMP(6)   1998-12-07 11:37:58.213000   CHAR(26)

Notice that there is a space between the DATE and TIME portions of a timestamp. This is a required element to delimit, or separate, the day from the hour.
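The two display lengths follow from the layout. The date takes 10 characters, the required space takes 1, and hh:mi:ss takes 8, for a total of 19. Adding a decimal point and six fractional digits gives 26. The Python sketch below builds both representations. The function name is hypothetical, and cutting off extra fractional digits is an assumption made for illustration.

```python
from datetime import datetime


def timestamp_text(ts: datetime, precision: int) -> str:
    """Render a TIMESTAMP(precision) value as 'YYYY-MM-DD HH:MI:SS[.ffffff]'.

    Assumption for illustration: fractional digits beyond `precision` are cut off.
    """
    if not 0 <= precision <= 6:
        raise ValueError("precision must be 0 through 6")
    text = (f"{ts.year:04d}-{ts.month:02d}-{ts.day:02d} "   # the required space
            f"{ts.hour:02d}:{ts.minute:02d}:{ts.second:02d}")
    if precision > 0:
        text += "." + f"{ts.microsecond:06d}"[:precision]
    return text


ts = datetime(1998, 12, 7, 11, 37, 58, 213000)
for p in (0, 6):
    shown = timestamp_text(ts, p)
    print(f"TIMESTAMP({p})", shown, f"CHAR({len(shown)})")
```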
TIME ZONES
In V2R3, Teradata has the ability to access and store both the hours and the minutes reflecting the difference between the user's time zone and the system time zone. From a world perspective, this difference is normally the number of hours between a specific location on Earth and the time at Greenwich, England, historically called Greenwich Mean Time (GMT). Since the Greenwich observatory has been "decommissioned," the new reference to this same time zone is called Coordinated Universal Time (UTC).

A time zone relative to London (UTC) might be:
LA      Miami   Frankfurt  Hong Kong
+08:00  +05:00  -01:00     -08:00

A time zone relative to New York (EST) might be:
LA      Miami   Frankfurt  Hong Kong
+03:00  00:00   -06:00     -13:00

In the second list, the time zones are represented from the perspective of a system located in EST. In both lists, the signs appear to be backward. This is because the time zone is set using the number of hours that the system is from the user.

To show an example of TIME values, we randomly chose a time just after 10:00 AM. Below, the various TIME with time zone values are designated as:

The default, for both TIME and TIMESTAMP, is to display six digits of decimal precision in the seconds portion. Time zones are set at the system level (DBS Control), at the user level (when the user is created or modified), or at the session level as an override.

SETTING TIME ZONES
A Time Zone should be established for the system and for every user in each different time zone.

Setting the system default time zone:
MODIFY GENERAL 16 = n  /* Hours, n = -12 to 13 */
MODIFY GENERAL 17 = n  /* Minutes, n = -59 to 59 */

Setting a user's time zone requires choosing either LOCAL, NULL, or one of a variety of explicit values:

A Teradata session can modify the time zone during normal operations without requiring a logoff and logon.

USING TIME ZONES
A user's time zone is now part of the information maintained by Teradata. The settings can be seen in the extended information available in the HELP SESSION request.
1 Row Returned
User Name                MJL
Account Name             MJL
Logon Date               00/10/15
Logon Time               08:43:45
Current DataBase
Accounting
Collation                ASCII
Character Set            ASCII
Transaction Semantics    Teradata
Current DateForm         IntegerDate
Session Time Zone        00:00
Default Character Type   LATIN
Export Latin             1
Export Unicode           1
Export Unicode Adjust    0
Export KanjiSJIS         1
Export Graphic           0

When a table is created with the WITH TIME ZONE option on a TIME or TIMESTAMP data type, this additional offset is also stored. The following SHOW command displays a table containing one timestamp column WITH TIME ZONE and one timestamp column without TIME ZONE:

SHOW TABLE Tstamp_test;

Text of DDL Statement Returned

As rows were inserted into the table, the time zone of the user's session was automatically captured along with the data for TS_with_zone. Storing the time zone requires an additional 2 bytes of storage beyond the date and time requirements.

The next SELECT shows the data rows currently in the table:

SELECT * FROM Tstamp_test;

4 Rows Returned
TS_zone  TS_with_zone                      TS_without_zone
UTC      2000-10-01 08:12:00.000000+05:00  2000-10-01 08:12:00.000000
EST      2000-10-01 08:12:00.000000+00:00  2000-10-01 08:12:00.000000
PST      2000-10-01 08:12:00.000000-03:00  2000-10-01 08:12:00.000000
HKT      2000-10-01 08:12:00.000000-11:00  2000-10-01 08:12:00.000000

Normalizing TIME ZONES
Teradata has the ability to incorporate time zones into SQL, giving a relative view of the data based on one locality versus another.
This SELECT adjusts the data rows based on their TIME ZONE data in the table:

4 Rows Returned
TS_zone  TS_with_zone                      T_Normal
UTC      2000-10-01 08:12:00.000000+05:00  2000-10-01 03:12:00.000000
EST      2000-10-01 08:12:00.000000+00:00  2000-10-01 08:12:00.000000
PST      2000-10-01 08:12:00.000000-03:00  2000-10-01 11:12:00.000000
HKT      2000-10-01 08:12:00.000000-11:00  2000-10-01 19:12:00.000000

Notice that each row's time zone offset was subtracted from the time portion of the timestamp (subtracting a negative offset adds hours). This adjusts every row to the perspective of the same time zone. As a result, the different time zones have been normalized with respect to the system time at that moment.

For example, when the transaction occurred at 8:12 AM locally in the PST time zone, it was already 11:12 AM in EST, the location of the system. The times in the T_Normal column have been normalized with respect to the time zone of the system.
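The normalization amounts to converting each local timestamp from its stored offset to offset 00:00. The Python sketch below reproduces the T_Normal column from the offsets shown above. The function name is hypothetical. In this sketch, timezone.utc simply stands in for the system's own 00:00 offset, which in this example belongs to a system located in EST; it does not mean the system is in London.

```python
from datetime import datetime, timedelta, timezone


def normalize(local: datetime, offset: timedelta) -> datetime:
    """Express a local timestamp stored with `offset` at the system's 00:00 offset."""
    aware = local.replace(tzinfo=timezone(offset))
    return aware.astimezone(timezone.utc).replace(tzinfo=None)


local = datetime(2000, 10, 1, 8, 12)
offsets = {"UTC": 5, "EST": 0, "PST": -3, "HKT": -11}  # hours, as stored in TS_with_zone
for zone, hours in offsets.items():
    print(zone, normalize(local, timedelta(hours=hours)))
```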
DATE and TIME Intervals
To make Teradata SQL more ANSI compliant and compatible with other RDBMS SQL, NCR has added INTERVAL processing. Intervals are used to perform DATE, TIME and TIMESTAMP arithmetic and conversion.

Although Teradata allowed arithmetic on DATE and TIME, it was not performed in accordance with ANSI standards and was therefore an extension instead of a standard. With INTERVAL being a standard instead of an extension, more SQL can be ported directly from an ANSI compliant database to Teradata without conversion.

Additionally, when a data value was used to perform date or time math, it was always "assumed" to be at the lowest level for the definition (days for DATE and seconds for TIME). Now, any portion of either can be expressed and used.

INTERVAL Chart
The simple intervals are:
YEAR
MONTH
DAY
HOUR
MINUTE
SECOND
The more involved intervals are:
YEAR TO MONTH
DAY TO HOUR
DAY TO MINUTE
DAY TO SECOND
HOUR TO MINUTE
HOUR TO SECOND
MINUTE TO SECOND
Figure 8-9

Using Intervals
To use the ANSI syntax for intervals, the SQL statement must be very specific about what the data values mean and the format in which they are coded. ANSI standards tend to be lengthier to write and more restrictive about which values are allowed and how they may be used.

Simple INTERVAL examples using literals:
INTERVAL '500' DAY(3)
INTERVAL '3' MONTH
INTERVAL -'28' HOUR

Complex INTERVAL examples using literals:
INTERVAL '45 18:30:10' DAY TO SECOND
INTERVAL '12:12' HOUR TO MINUTE
INTERVAL '12:12' MINUTE TO SECOND

For several of the INTERVAL literals, their meaning seems obvious from the non-numeric characters used. However, the HOUR TO MINUTE and MINUTE TO SECOND examples above are not so obvious: the same literal '12:12' means 12 hours and 12 minutes in one and 12 minutes and 12 seconds in the other. Therefore, the declaration of the meaning is important. Notice also that they are coded as character literals. This allows a hyphen (-), colon (:) and space to be used as part of the literal. In addition, a negative time frame requires the "-" sign to be outside of the quotes. The presence of the quotes also denotes that the numeric values are treated as character data for conversion to a span of time.

The format of a timestamp requires a space between the day and hour when using intervals. For example, notice the blank space between the day and hour in the compound DAY TO SECOND interval above. Without the space, it is an error.

INTERVAL Arithmetic with DATE and TIME
To use DATE and TIME arithmetic, it is important to keep in mind the results of the various operations. The chart below shows the Teradata implied arithmetic results.

DATE and TIME arithmetic results prior to intervals:
DATE - DATE = Integer (days)
DATE MOD 100 = Integer (day of month)
DATE / 100 = Integer (year and month)
DATE / 10000 = Integer (year)
TIME - TIME = Integer (hours)
DATE + or - Integer = DATE
Figure 8-10

The chart below shows the ANSI explicit arithmetic results.

DATE and TIME arithmetic results using intervals:
DATE - DATE = Interval
TIME - TIME = Interval
TIMESTAMP - TIMESTAMP = Interval
DATE + or - Interval = DATE
TIME + or - Interval = TIME
TIMESTAMP + or - Interval = TIMESTAMP
INTERVAL + or - Interval = Interval
Figure 8-11

Note: It makes little sense to add two dates together.

Traditionally, the output of the subtraction is an integer, up to 2.147 billion. However, Teradata knows that when an integer is used in a formula with a date, it must represent a number of days. The following uses the ANSI representation for a DATE:

SELECT (DATE '1999-10-01' - DATE '1988-10-01') AS Assumed_Days;

1 Row Returned
Assumed_Days
4017

The next SELECT uses the ANSI explicit DAY interval:

SELECT (DATE '1999-10-01' - DATE '1988-10-01') DAY AS Actual_Days;

**** Failure 7453 Interval Field Overflow.

The above request fails on an overflow of the INTERVAL. Using this ANSI interval, the output of the subtraction is an interval with 4 digits.
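The overflow is a question of digit count. The subtraction yields 4017 days, which needs 4 digits, while a DAY interval holds 2 digits by default and 4 at most. The Python sketch below models that rule. The function name is hypothetical, the error text is only a stand-in for failure 7453, and the lower bound of 1 digit is an assumption.

```python
from datetime import date


def day_interval(later: date, earlier: date, precision: int = 2) -> int:
    """Days between two dates, returned as a DAY(precision) interval value.

    Models the rule in the text: the default precision is 2 digits, 4 at most,
    and a result that needs more digits fails instead of being truncated.
    """
    if not 1 <= precision <= 4:
        raise ValueError("DAY precision must be between 1 and 4")
    days = (later - earlier).days
    if abs(days) >= 10 ** precision:
        raise OverflowError(f"Interval Field Overflow: {days} needs more than {precision} digits")
    return days


print((date(1999, 10, 1) - date(1988, 10, 1)).days)           # 4017, the plain integer result
print(day_interval(date(1999, 10, 1), date(1988, 10, 1), 4))  # 4017 as DAY(4)
try:
    day_interval(date(1999, 10, 1), date(1988, 10, 1))        # default DAY(2)
except OverflowError as err:
    print(err)
```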
The default for all intervals is 2 digits, so the overflow occurs until the SELECT is modified with DAY(4), as shown below:

SELECT (DATE '1999-10-01' - DATE '1988-10-01') DAY(4) AS Actual_Days;

1 Row Returned
Actual_Days
4017

Normally, a date minus a date yields the number of days between them. To see months instead, the following SELECT operations use literals to demonstrate the conversions performed on various DATE and INTERVAL data:

SELECT (DATE '2000-10-01' - DATE '1999-10-01') MONTH (Title 'Months');

1 Row Returned
Months
12

The next SELECT shows INTERVAL operations used with TIME:

1 Row Returned
Actual_hours  Actual_minutes  Actual_seconds  Actual_seconds4
2             155             9300.000000     9300.0000

Although intervals tend to be more accurate, they are more restrictive, so more care is required when coding them into SQL constructs. One miscalculation, as in the overflow example, and the SQL fails. Additionally, 9999 is the largest value for any interval field. Therefore, it might be necessary to use a compound interval, such as YEAR TO MONTH instead of MONTH alone or DAY TO HOUR instead of HOUR alone, to receive an answer without an overflow occurring.

CAST Using Intervals
Compliance: ANSI

The CAST function was seen in an earlier chapter as the ANSI method for converting data from one type to another. It can also be used to convert one INTERVAL to another INTERVAL representation. Although the CAST is normally used in the SELECT list, it also works in the WHERE clause for comparison purposes.

Below is the syntax for using the CAST with a date:

The following converts an INTERVAL of 6 years and 2 months to an INTERVAL number of months:

SELECT CAST( (INTERVAL '6-02' YEAR TO MONTH) AS INTERVAL MONTH );

1 Row Returned
74

Logic seems to dictate that if months can be shown, the years and months should also be available. This request attempts to convert 1300 months to show the number of years and months:

*** Failure 7453 Interval Field Overflow.
The above request failed because the number of years is greater than 99 and therefore takes more than the default two digits of the YEAR field. The fix is to change YEAR to YEAR(3) and rerun:

1 Row Returned
Years & Months
108-04

The biggest advantage of using INTERVAL processing is that SQL written on another system is now compatible with Teradata. At the same time, care must be taken to use a representation that is large enough to contain the answer. The default is 2 digits, and anything larger, up to a maximum of 4 digits, must be explicitly requested. An incorrect size results in an SQL runtime error. The System Calendar section later in this chapter demonstrates another way to convert from one interval of time to another.
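The CAST examples in this section come down to converting between a total month count and a years-months pair, together with a digit check on the leading field. The Python sketch below models that behavior. The function names are hypothetical, only non-negative values are handled, and the check that the MONTH field of YEAR TO MONTH stays between 0 and 11 reflects the standard year-month interval layout rather than anything stated in this chapter.

```python
def parse_year_month(literal: str) -> tuple[int, int]:
    """Parse a non-negative YEAR TO MONTH literal such as '6-02' into (years, months)."""
    years_text, months_text = literal.split("-")
    years, months = int(years_text), int(months_text)
    if not 0 <= months <= 11:
        raise ValueError("the MONTH field of YEAR TO MONTH must be 0 through 11")
    return years, months


def to_interval_month(years: int, months: int, precision: int = 2) -> int:
    """CAST to INTERVAL MONTH(precision): total months, checked against the digit limit."""
    total = years * 12 + months
    if total >= 10 ** precision:
        raise OverflowError("Interval Field Overflow")
    return total


def to_interval_year_month(total_months: int, year_precision: int = 2) -> str:
    """CAST a month count to INTERVAL YEAR(year_precision) TO MONTH, shown as 'y-mm'."""
    years, months = divmod(total_months, 12)
    if years >= 10 ** year_precision:
        raise OverflowError("Interval Field Overflow")
    return f"{years}-{months:02d}"


print(to_interval_month(*parse_year_month("6-02")))    # 74
print(to_interval_year_month(1300, year_precision=3))  # 108-04
```

Calling to_interval_year_month(1300) with the default precision of 2 raises the overflow, because 108 years needs three digits.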
OVERLAPS
Compatibility: Teradata Extension

When working with dates and times, it is sometimes necessary to determine whether two different ranges have points in time in common. Teradata provides a Boolean function to make this test for you. It is called OVERLAPS. It evaluates true if multiple points are in common; otherwise, it returns false.

The syntax of the OVERLAPS is:

The following SELECT tests two literal dates and uses the OVERLAPS to determine whether or not to display the character literal:

1 Row Returned
The dates overlap

The literal is returned because both date ranges have October 15 through November 30 in common.

The next SELECT tests two literal dates and uses the OVERLAPS to determine whether or not to display the character literal:

No Rows Found

The literal was not selected because the ranges do not overlap. The common single date of November 30 does not constitute an overlap. When dates are used, 2 days must be involved, and when time is used, 2 seconds must be contained in both ranges.

The following SELECT tests two literal times and uses the OVERLAPS to determine whether or not to display the character literal:

The times overlap

This is a tricky example, and it is shown to prove a point. At first glance, this answer appears to be incorrect because 02:01:00 looks like it starts 1 second after the first range ends. However, the system works on a 24-hour clock when a date and time (timestamp) are not used together. Therefore, the system considers the earlier time of 2 AM to be the start and the later time of 8 AM to be the end of the range. As a result, not only do they overlap, but the second range is entirely contained in the first range.

The following SELECT tests two literal dates and uses the OVERLAPS to determine whether or not to display the character literal:

No Rows Found

When using the OVERLAPS function, keep a couple of situations in mind:
1. A single point in time, i.e. the same date, does not constitute an overlap. There must be at least one second of time in common for TIME, or one day when using DATE.
2. When a NULL is used as one of the parameters, the other DATE or TIME constitutes a single point in time instead of a range.
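The rules above can be captured in a few comparisons. The Python sketch below is a simplified model of them, not Teradata's implementation. A missing end (None) turns a range into a single point, reversed endpoints are swapped, and two ranges overlap only when they share more than a single point. It may not match Teradata on every edge case, for example when one argument is a single point lying strictly inside the other range. The function name and all dates and times are hypothetical, because the original SELECT statements are not reproduced here.

```python
from datetime import date, time


def overlaps(start1, end1, start2, end2):
    """Simplified model of the OVERLAPS rules described in this section."""
    # Rule 2: a NULL (None) end turns the range into a single point in time.
    end1 = start1 if end1 is None else end1
    end2 = start2 if end2 is None else end2
    # Endpoints given in reverse order are swapped, as in the TIME example.
    start1, end1 = min(start1, end1), max(start1, end1)
    start2, end2 = min(start2, end2), max(start2, end2)
    # Rule 1: sharing only a single point is not an overlap.
    return max(start1, start2) < min(end1, end2)


# Ranges sharing October 15 through November 30: overlap.
print(overlaps(date(2001, 10, 1), date(2001, 11, 30), date(2001, 10, 15), date(2001, 12, 31)))
# Ranges sharing only November 30: no overlap.
print(overlaps(date(2001, 10, 1), date(2001, 11, 30), date(2001, 11, 30), date(2001, 12, 31)))
# 08:00 to 02:00 is treated as 02:00 to 08:00, which contains 02:01 to 05:00.
print(overlaps(time(8, 0), time(2, 0), time(2, 1), time(5, 0)))
```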
System Calendar
Compatibility: Teradata Extension

Also in V2R3, Teradata has a system calendar that is very helpful when date comparisons more complex than month, day and year are needed. For example, most businesses require comparisons from the 1st quarter to the 2nd quarter. It is best used to avoid maintaining your own calendar table or performing your own sophisticated SQL calculations to derive the needed date perspective.

Teradata's calendar is implemented using a base date table named caldates with a single column named CDATES. The base table is never referenced directly. Instead, it is accessed through the view named CALENDAR. The base table contains rows with dates from January 1, 1900 through December 31, 2100. The system calendar table and views are stored in the Sys_calendar database. This is a calendar from January through December and has nothing to do with fiscal calendars.

The purpose of the system calendar is to provide an easy way to compare dates. For example, comparing activities from the first quarter of this year with the same quarter of last year can be quite valuable. The System Calendar makes these comparisons easy compared to working out the complexity of the various dates yourself.

The following list contains the column names, their respective data types, and a brief explanation of the potential values calculated for each when using the CALENDAR view:

Column Name          Data Type  Description
calendar_date        DATE       Standard Teradata date. Equivalency: DATE
day_of_week          BYTEINT    1-7, where 1 is Sunday. Equivalency: (DATE - reference DATE) MOD 7
day_of_month         BYTEINT    1-31, some months have less. Equivalency: DATE MOD 100 or EXTRACT Day
day_of_year          SMALLINT   1-366, Julian day of the year. Equivalency: DATE - (January 1 of the same year) + 1
day_of_calendar      INTEGER    Number of days since 01/01/1900, counting that date as day 1. Equivalency: DATE - 101(date) + 1, where 101(date) is 01/01/1900
weekday_of_month     BYTEINT    The sequence of a day within a month: first Sunday = 1, second Sunday = 2, etc. Equivalency: None known
week_of_month        BYTEINT    0-5, sequential week number within a month; a partial week starts at 0. Equivalency: None known
week_of_year         BYTEINT    0-53, sequential week number within a year; a partial week starts at 0. Equivalency: None known
week_of_calendar     INTEGER    Number of weeks since 01/01/1900. Equivalency: (DATE - 101(date)) / 7
month_of_quarter     BYTEINT    1-3, each quarter has 3 months. Equivalency: CASE on EXTRACT Month
month_of_year        BYTEINT    1-12, up to 12 months per year. Equivalency: DATE / 100 MOD 100 or EXTRACT Month
month_of_calendar    INTEGER    Number of months since 01/01/1900. Equivalency: None needed
quarter_of_year      BYTEINT    1-4, up to 4 quarters per year. Equivalency: CASE on EXTRACT Month
quarter_of_calendar  INTEGER    Number of quarters since 01/01/1900. Equivalency: None needed
year_of_calendar     SMALLINT   Starts at 1900. Equivalency: EXTRACT Year

The least useful of these columns appear to be those whose names end with "_of_calendar." As the descriptions above show, these values are all calculated from the calendar reference date of January 1, 1900. Unless a business transaction occurred on that date, they are meaningless.

The biggest benefit of the System Calendar is in determining the following: Day of the Week, Week of the Month, Week of the Year, Month of the Quarter and Quarter of the Year.

Most of the values are very straightforward. However, the column called Week_of_Month deserves some discussion. The description indicates that a partial week is week number 0. A partial week is any first week of a month that does not start on a Sunday. Therefore, not all months have a week 0, because some do start on a Sunday.

With these column references available, there is less need for compound comparisons in SQL.
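To show how the CALENDAR columns relate to a plain date, the following Python sketch recomputes them for one date. It is a reconstruction based on the descriptions above and has only been checked against the sample values in this chapter; it makes no claim to reproduce every boundary case of the Sys_calendar view. The function names are hypothetical. Weeks are counted by Sundays, so a partial first week is week 0, and week_of_calendar uses the (DATE - 101(date)) / 7 equivalency. Because October is in the fourth quarter, quarter_of_year is 4 for October 1, 2001.

```python
from datetime import date

CALENDAR_START = date(1900, 1, 1)  # reference date for the *_of_calendar columns


def sunday_based_weekday(d: date) -> int:
    """0 for Sunday through 6 for Saturday."""
    return d.isoweekday() % 7


def sundays_up_to(first: date, d: date) -> int:
    """Number of Sundays from `first` through `d`; 0 means d falls in a partial first week."""
    days_to_first_sunday = (7 - sunday_based_weekday(first)) % 7
    elapsed = (d - first).days
    if elapsed < days_to_first_sunday:
        return 0
    return (elapsed - days_to_first_sunday) // 7 + 1


def calendar_row(d: date) -> dict:
    """Recompute CALENDAR-style columns for one date (illustrative sketch)."""
    day_of_year = d.timetuple().tm_yday
    days_since_start = (d - CALENDAR_START).days
    return {
        "day_of_week": sunday_based_weekday(d) + 1,   # 1 = Sunday
        "day_of_month": d.day,
        "day_of_year": day_of_year,
        "day_of_calendar": days_since_start + 1,      # January 1, 1900 is day 1
        "weekday_of_month": (d.day - 1) // 7 + 1,     # n-th occurrence of this weekday
        "week_of_month": sundays_up_to(d.replace(day=1), d),
        "week_of_year": sundays_up_to(d.replace(month=1, day=1), d),
        "week_of_calendar": days_since_start // 7,    # (DATE - 101(date)) / 7
        "month_of_quarter": (d.month - 1) % 3 + 1,
        "month_of_year": d.month,
        "month_of_calendar": (d.year - 1900) * 12 + d.month,
        "quarter_of_year": (d.month - 1) // 3 + 1,
        "quarter_of_calendar": (d.year - 1900) * 4 + (d.month - 1) // 3 + 1,
        "year_of_calendar": d.year,
    }


for name, value in calendar_row(date(2001, 10, 1)).items():
    print(name, value)
```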
For instance, simply determining a quarter requires 3 comparisons, one for each month in that quarter. Worse yet, each quarter of the year has 3 different months. Therefore, the SQL might require modification each time a different quarter is desired.

The next SELECT uses the System Calendar to obtain the various date-related values for October 1, 2001:

1 Row Returned
calendar_date        01/10/01
day_of_week          2
day_of_month         1
day_of_year          274
day_of_calendar      37164
weekday_of_month     1
week_of_month        0
week_of_year         39
week_of_calendar     5309
month_of_quarter     1
month_of_year        10
month_of_calendar    1222
quarter_of_year      4
quarter_of_calendar  408
year_of_calendar     2001

Since the calendar is a view, it is used like any other table, and columns are selected or compared from it. However, not all columns of all rows are needed for every application.

Compared with a user-created calendar, it will be faster. The primary reason is reduced input requirements. Each date is stored as a DATE of only 4 bytes, and the desired column values are materialized from the stored date. Less I/O means a faster response: 4 bytes are read per date instead of the 32 or more bytes per date that a user-created calendar row would need. A table with millions of rows may contain hundreds of different dates. Therefore, using the Teradata system calendar makes good sense.

Since the system calendar is a view, or virtual table, its primary access is via a join to a stored date (i.e. a billing or payment date). Whether the date is the current date or a stored date, it can be joined to the calendar. When a join is performed, a row is materialized in cache to represent the various aspects of that date.

The following examples demonstrate the use of the WHERE clause for these comparisons, using Quarter_of_Year instead of a series of month comparisons (WHERE Month_of_Year = 1 OR Month_of_Year = 2 OR Month_of_Year = 3 vs. WHERE Quarter_of_Year = 1), and the Day_of_week column instead of DATE MOD 7, to simplify coding:

1 Row Returned
Order_date  Order_total  Quarter_of_Year  Week_of_Month
99/09/09    $23,454.84   3                1

As nice as it is to have a number that represents the day of the week, the output could be clearer with a little creativity. This CREATE TABLE builds a table called Week_Days and populates it with the English names of the days of the week:

Once the table is available, it can be incorporated into SQL to make the output easier to read and understand, as in the following:

2 Rows Returned
Order_date  Order_total  Day_of_Week  Wkday_Day
99/09/09    $23,454.84   5            Thursday
99/10/01    $5,111.47    6            Friday

As demonstrated in this chapter, there are many ways to incorporate dates and date logic into SQL. The format of the date can be adjusted using the DATEFORM. The SQL may use ANSI functions or Teradata capabilities and functions. Now you are ready to go back and forth with a date (pun intended).
Transforming Character Data
Most of the time, it is acceptable to display data directly as it is stored in the database. However, there are times when it is not acceptable and the character data must be temporarily transformed. It might need shortening, or something as simple as removing undesired spaces from a value. The tools to make these changes are discussed here.

Earlier, we saw the CAST function as a technique to convert data. It can be used to truncate data unless running in ANSI mode, which does not allow truncation. These functions provide an alternative to using CAST, because they do not truncate data. Instead, they allow a portion of the data to be returned. This is a slight distinction, but it is enough to allow the processing to provide some interesting capabilities.

We will examine the CHARACTERS, TRIM, SUBSTRING, SUBSTR, POSITION and INDEX functions. Each function alone provides a capability that can be useful within SQL. However, when combined, they provide some powerful functionality.

This is an excellent time to remember one of the primary differences between ANSI mode and Teradata mode: ANSI mode is case sensitive and Teradata mode is not. Therefore, the output from most of these functions is shown here in both modes.
CHARACTERS Function
Compatibility: Teradata Extension

The CHARACTERS function is used to count the number of characters stored in a data column. It is easiest to use, and most helpful, when the characters being counted are stored in a variable length VARCHAR column. A VARCHAR stores only the characters that were input; it is not padded with trailing spaces to the defined length.

When referencing a fixed length CHAR column, the CHARACTERS function always returns a number that represents the maximum number of characters defined. This is because the database must store the data padded to the full length with literal spaces. A space is a valid character, so the CHARACTERS function counts every space.

The syntax of the CHARACTERS function:
CHARACTERS ( <column-name> )
or
CHAR ( <column-name> )

To use the CHARACTERS function (which can be abbreviated as CHAR), simply pass it a column name. When referenced in the SELECT list, it displays the number of characters. When written into the WHERE clause, it can be used as a comparison value to decide whether or not the row should be returned.

The Employee table is used to demonstrate the functions in this chapter. The contents of this table are listed below:

Employee Table - contains 9 employees
Employee_No  Last_Name   First_Name  Salary     Dept_No
(Keys and indexes: PK, FK, UPI, NUSI, NUSI)
1232578      Chambers    Mandee      48,850.00  100
1256349      Harrison    Herbert     54,500.00  400
2341218      Reilly      William     36,000.00  400
2312225      Larkins     Loraine     40,200.00  300
2000000      Jones       Squiggy     32,800.50  ?
1000234      Smythe      Richard     64,300.00  10
1121334      Strickling  Cletus      54,500.00  400
1324657      Coffing     Billy       41,888.88  200
1333454      Smith       John        48,000.00  200
Figure 9-1

The next SELECT demonstrates how to code the CHAR function in both the SELECT list and the WHERE clause, along with the answer set:

4 Rows Returned
First_name  C_length
Mandee      6
Cletus      6
Billy       5
John        4

If leading or embedded spaces are stored within the column, the CHAR function counts them as valid, significant data characters.
The answer is exactly the same when using CHAR in the SELECT list and its alias in the WHERE clause instead of repeating the CHAR function:

4 Rows Returned
First_name  C_length
Mandee      6
Cletus      6
Billy       5
John        4

As mentioned earlier, the CHAR function works best on VARCHAR data. The following demonstrates its result on CHAR data by retrieving the last name and the length of the last name where the first name contains fewer than 7 characters:

4 Rows Returned
Last_name   C_length
Chambers    20
Coffing     20
Smith       20
Strickling  20

Again, the space characters are present in the data and are therefore counted. Hence, CHAR returns 20 for every last name, the defined length of the column. The comparison is on the first name, but the display is based entirely on the last name.

The CHAR function is helpful for determining demographic information about the VARCHAR data stored within the Teradata database. However, sometimes this same information is needed for fixed length CHAR data. In that case, the TRIM function is helpful.
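The difference between the two answer sets is caused entirely by padding. The Python sketch below models a fixed-length CHAR(20) value as a string padded with spaces and shows that a plain character count includes every pad space. The function names are hypothetical. The sketch assumes First_Name behaves like VARCHAR and Last_Name like CHAR(20), as the results above imply.

```python
def as_char(value: str, length: int) -> str:
    """Model a CHAR(length) column: the stored value is padded with spaces to full length."""
    if len(value) > length:
        raise ValueError("value is longer than the column definition")
    return value.ljust(length)


def characters(value: str) -> int:
    """Count every character, including leading, embedded and pad spaces."""
    return len(value)


for first, last in [("Mandee", "Chambers"), ("Billy", "Coffing")]:
    print(first, characters(first), last, characters(as_char(last, 20)))
```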
CHARACTER_LENGTH Function
Compatibility: ANSI

The CHARACTER_LENGTH function is used to count the number of characters stored in a data column. It is the ANSI equivalent of the Teradata CHARACTERS function available in V2R4. Like CHARACTERS, it is easiest to use, and most helpful, when the characters being counted are stored in a variable length VARCHAR column. A VARCHAR stores only the characters that were input; it is not padded with trailing spaces to the defined length.

When referencing a fixed length CHAR column, the CHARACTER_LENGTH function always returns a number that represents the maximum number of characters defined. This is because the database must store the data padded to the full length with literal spaces. A space is a valid character, so the CHARACTER_LENGTH function counts every space.

The syntax of the CHARACTER_LENGTH function:
CHARACTER_LENGTH ( <column-name> )

To use the CHARACTER_LENGTH function, simply pass it a column name. When referenced in the SELECT list, it displays the number of characters. When written into the WHERE clause, it can be used as a comparison value to decide whether or not the row should be returned.

The same Employee table shown above is also used to demonstrate the CHARACTER_LENGTH function.

The next SELECT demonstrates how to code the CHARACTER_LENGTH function in both the SELECT list and the WHERE clause, along with the answer set:

4 Rows Returned
First_name  C_length
Mandee      6
Cletus      6
Billy       5
John        4

If leading or embedded spaces are stored within the column, the CHARACTER_LENGTH function counts them as valid, significant data characters.

As mentioned earlier, the CHARACTER_LENGTH function works best on VARCHAR data. The following demonstrates its result on CHAR data by retrieving the last name and the length of the last name where the first name contains fewer than 7 characters:

4 Rows Returned
Last_name   C_length
Chambers    20
Coffing     20
Smith       20
Strickling  20

Again, the space characters are present in the data and are therefore counted. Hence, CHARACTER_LENGTH returns 20 for every last name, the defined length of the column. The comparison is on the first name, but the display is based entirely on the last name.

The CHARACTER_LENGTH function is helpful for determining demographic information about the VARCHAR data stored within the Teradata database. However, sometimes this same information is needed for fixed length CHAR data. In that case, the TRIM function is helpful.
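For fixed-length data, the useful count is the length after the pad spaces are removed. The Python sketch below models that idea by stripping trailing spaces before counting, while keeping leading and embedded spaces. This is the kind of result the TRIM function makes possible. The function name is hypothetical, and the sketch does not assume any particular TRIM syntax.

```python
def character_length_trimmed(value: str) -> int:
    """Length after removing trailing pad spaces; leading and embedded spaces are kept."""
    return len(value.rstrip(" "))


padded = "Chambers".ljust(20)             # what a CHAR(20) column holds
print(len(padded))                        # 20, every pad space is counted
print(character_length_trimmed(padded))   # 8
```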
OCTET_LENGTH Function
Compatibility: ANSI

The OCTET_LENGTH function is used to count the number of octets (bytes) stored in a data column. For single-byte character data, such as LATIN, each character occupies one octet, so the count equals the number of characters. In that case, OCTET_LENGTH acts as another ANSI equivalent of the Teradata CHARACTERS function available in V2R4. Like CHARACTERS, it is easiest to use, and most helpful, when the data being counted is stored in a variable length VARCHAR column. A VARCHAR stores only the characters that were input; it is not padded with trailing spaces to the defined length.

When referencing a fixed length CHAR column, the OCTET_LENGTH function always returns a number that represents the full defined length. This is because the database must store the data padded to the full length with literal spaces. A space is a valid character, so the OCTET_LENGTH function counts every space.

The syntax of the OCTET_LENGTH function:
OCTET_LENGTH ( <column-name> )

To use the OCTET_LENGTH function, simply pass it a column name. When referenced in the SELECT list, it displays the number of octets. When written into the WHERE clause, it can be used as a comparison value to decide whether or not the row should be returned.

The same Employee table shown above is also used to demonstrate the OCTET_LENGTH function.

The next SELECT demonstrates how to code the OCTET_LENGTH function in both the SELECT list and the WHERE clause, along with the answer set:

4 Rows Returned
First_name  C_length
Mandee      6
Cletus      6
Billy       5
John        4

If leading or embedded spaces are stored within the column, the OCTET_LENGTH function counts them as valid, significant data.

As mentioned earlier, the OCTET_LENGTH function works best on VARCHAR data. The following demonstrates its result on CHAR data by retrieving the last name and the length of the last name where the first name contains fewer than 7 characters:

4 Rows Returned
Last_name   C_length
Chambers    20
Coffing     20
Smith       20
Strickling  20

Again, the space characters are present in the data and are therefore counted. Hence, OCTET_LENGTH returns 20 for every last name, the defined length of the column. The comparison is on the first name, but the display is based entirely on the last name.

The OCTET_LENGTH function is helpful for determining demographic information about the VARCHAR data stored within the Teradata database. However, sometimes this same information is needed for fixed length CHAR data. In that case, the TRIM function is helpful.
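Counting octets matches counting characters only when every character occupies one byte. The Python sketch below compares the two counts under two encodings. It uses Python's codec names "latin-1" and "utf-8" purely for illustration; it does not describe how Teradata stores its server character sets. The function name is hypothetical.

```python
def octet_length(value: str, encoding: str) -> int:
    """Number of bytes (octets) the string occupies in the given encoding."""
    return len(value.encode(encoding))


for word in ("Mandee", "José"):
    # characters, bytes in a single-byte encoding, bytes in UTF-8
    print(word, len(word), octet_length(word, "latin-1"), octet_length(word, "utf-8"))
```

For "Mandee", all three counts are 6. For "José", the UTF-8 count is 5 while the character count stays 4, because "é" takes two bytes in UTF-8.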