
SQL Training

Contents:

Basic Structure of SQL
• SELECT & FROM
• Table Aliases
• Column Aliases with and without Spaces
• Types of Column Content
• Concatenating
• ORDER BY

Building Queries to Pull Just the Results You Want
• The WHERE Clause – SQL’s Filter
• Using Comparison Operators
• Using Other Operators
• Handling Cells with No Data – aka NULLs

Aggregate Queries and HAVING Clause
• Aggregate Queries
• Pulling DISTINCT Records
• Aggregate Functions
• Aggregate Functions with DISTINCT
• The GROUP BY Clause
• The HAVING Clause

Joining Tables
• Joining 2 or More Tables – Old School & New School Approaches
• INNER Joins vs. OUTER Joins
• WHERE Clause Conditions with OUTER Joins
• One-to-Many Joins

Dealing with Dates in SQL
• DATE vs. DATETIME columns
• The TO_CHAR() Function with Dates
• The TRUNC() Function with Dates
• The TO_DATE() Function
• Using BETWEEN with Dates
• Other Date functions

Subqueries
• Subqueries
• Avoiding 1-to-many joins

DECODE & CASE
• The DECODE() Function
• The CASE Function
• NVL2 Function
• Data Type Consistency

Basic Structure of SQL
• SELECT & FROM
• Table Aliases
• Column Aliases with and without Spaces
• Types of Column Content
• Concatenating
• ORDER BY

SQL TOPICS
SQL is a language used (for our purposes) to ask a Database for specific information in a specific format. We generally use it to pull
sets of data, known as Result Sets, which we use to create reports and answer business questions, such as “What was the List Price
value of all goods shipped to customers in 2008?” or “When did we receive the first unit of Twilight Book 1 from Hachette?”.

There are two necessary sections, or clauses, to an SQL query: SELECT and FROM

SELECT & FROM


SELECT tells the database a list of one or more Elements that you want to include in your results
FROM tells the database a list of one or more tables (or views) from which you want to pull the information

A basic query with just these necessary clauses might be:

SELECT WAREHOUSE_ID
, NAME
FROM D_WAREHOUSES;

This query would pull two elements, in this case the columns WAREHOUSE_ID and NAME, from the table named
D_WAREHOUSES, producing a Result Set like this one. Note that only part of the result set is shown:

WAREHOUSE_ID  NAME
IMJO          Ingram Micro, Jonestown, PA
TUL1          Coffeyville
GCWP          Granite City Tools – MN
NRT3          Ichikawa
A00L          SED International-Dallas
MSC7          Bemrose Booth
ECEL          Amazon Wireless
TAJ9          Target: Light Source

WAREHOUSE_ID and NAME are the column names that we wanted to pull, so we listed them as elements in the SELECT clause.
Notice that the elements in the SELECT clause are separated by commas in the SQL query. The query also ends with a semicolon,
which lets Oracle know that’s the end of the query. ETL Manager doesn’t require this, but it’s good practice to include it.

You may also notice that I put the comma at the beginning of a new line followed by the next element, which is a little different than
might seem intuitive. This is because the comma is only there because there is a second element. If I wanted to delete the NAME
column, I’d also need to delete the comma; otherwise I’d get an error. I find that by putting it at the start of the new line where the
new element is, I can easily delete that row when editing and won’t miss it and cause an error.

SELECT WAREHOUSE_ID
FROM D_WAREHOUSES;

Table Aliases
For reasons that will become apparent when you start joining multiple tables together, it’s best to use Table Aliases when writing
your SQL. A Table Alias is a shorthand name, like a nickname, that tells the query which table each referenced column comes from.
To alias a table, you simply add your nickname after the table name, with a space separating them (fcs in the example below). You
also put that alias at the start of each column name, separated from the column name by a period, like this:

SELECT WAREHOUSE_ID
, NAME
FROM D_WAREHOUSES;

BECOMES:

SELECT fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs;

The table alias ensures that the system knows exactly where each column is derived. For example, in a more complex query you
might have two tables, each with a WAREHOUSE_ID column, and Oracle needs to know to which table the column is associated.

Column Aliases
Another type of alias is the Column Alias. This is a way to change the name of a column to something that’s more meaningful to you
or your customers, and is what shows up as the Column Headers in your result set. A few examples of Column aliases are below:

SELECT fcs.WAREHOUSE_ID FC_Code
, fcs.REGION_ID AS Region
, fcs.NAME AS "FC Name"
FROM D_WAREHOUSES fcs;

There are two ways you can change the name of a column. The first is to simply put a space after the column name, followed by your
alias, as we’ve done above to alias the first column, WAREHOUSE_ID, to FC_Code. You can also put the word AS
between the column name and your alias, as we’ve done to alias the second column, REGION_ID, to Region. Including AS isn’t
necessary, but arguably makes it clearer that you’ve aliased the column. If you want to alias a column to something that has a
space in it, like we’ve done to alias the column NAME to FC Name, you have to enclose it in double quotes, so the system knows
where the alias starts and ends. I usually avoid including spaces in column aliases, because it can lead to problems in more complex
queries, and the standard is to use underscores, as we did with FC_Code. Also, be sure not to start your Column Aliases with a
number, or make them a SQL keyword (like DATE, CUBE or FROM), as this will cause confusing errors.

Types of Elements
A SELECT clause can include elements beyond just columns from tables. There are a number of different elements that can be
included, depending on your needs, including:

• Literal values, such as numbers (13) or text strings ('Howdy!'), that return exactly what you enter
• Expressions (aka formulas), such as doi.QUANTITY_SHIPPED + 5, which do math or other logical procedures
• Function calls, such as TO_CHAR(ddo.ORDER_DAY,'MM/DD/YYYY'), that transform column information
• Pseudocolumns, such as ROWID, ROWNUM, or LEVEL

(Pseudocolumns aren’t columns that actually exist in any table, but are columns you can include in any query for specific uses.)

An example of a query with each of these types of elements is:

SELECT fcs.WAREHOUSE_ID
, 13
, 'Howdy!'
, fcs.REGION_ID + 5
, SUBSTR(fcs.NAME,0,5)
, ROWNUM
FROM D_WAREHOUSES fcs;

Which yields a result set that includes these rows:

WAREHOUSE_ID  13  'HOWDY!'  FCS.REGION_ID+5  SUBSTR(FCS.NAME,0,5)  ROWNUM
TGC2          13  Howdy!    6                Alen                  1
AAOP          13  Howdy!    6                Trend                 5
SAIN          13  Howdy!    6                Saint                 7
JACK          13  Howdy!    6                Jacks                 8
WCSC          13  Howdy!    6                West                  9
ABE1          13  Howdy!    6                Allen                 757
IBEW          13  Howdy!    6                Ingra                 969
LEX1          13  Howdy!    6                Lexin                 970
SEA1          13  Howdy!    6                Seatt                 971

You’ll notice that the 13 wasn’t enclosed in single quotation marks the way 'Howdy!' was. This is because the system understands the 13 is a
number. Text strings, like 'Howdy!' or 'PHL1' or 'I love SQL', need to be enclosed in single quotes whenever you use them. Note that
the quotes don’t show up in your results. Also, the single quote that MS Word, Excel and Outlook insert is a different (curly) character, so it’s best
not to edit SQL in these programs (stick to Notepad, Notepad++, or the ETL Manager Profile SQL window). Notepad++ is a favorite of
many coders, as you can adjust the ‘Language’ to SQL and get helpful formatting, as well as indent entire sections with tab. It’s
available for download in Advertised Programs as “Open Source Notepad++”.
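
One case the quoting rule does not cover is a text string that itself contains a single quote, such as the warehouse name A'HOMESTEAD SHOPPE INC that shows up in the ORDER BY results later in this section. In Oracle, the usual way to handle this is to double the quote inside the literal. The sketch below is only an illustration: it selects from DUAL, Oracle's built-in one-row table, so it runs without access to any warehouse table, and the alias NAME_WITH_QUOTE is made up for the example.

```sql
-- Two single quotes in a row inside a string literal stand for one quote character.
SELECT 'A''HOMESTEAD SHOPPE INC' AS NAME_WITH_QUOTE
FROM DUAL;
```

The query returns one row containing A'HOMESTEAD SHOPPE INC. The same doubled quote works anywhere a text string is allowed, including conditions that compare a column to a string.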

Concatenating
You can also concatenate information together in your select clause, including columns, numbers, text strings, etc. Unlike Excel,
where you concatenate using the ampersand (&) symbol, SQL uses two pipes ||. To get two pipes, hold down shift and hit the key
just above the Enter key on your keyboard twice. An example of concatenating is in the query below:

SELECT fcs.WAREHOUSE_ID
, 'Howdy! from ' || fcs.NAME
FROM D_WAREHOUSES fcs;

Which yields a result set that includes:

WAREHOUSE_ID  'HOWDY!FROM'||FCS.NAME
SBTK          Howdy! from Softbank BB
DGJP          Howdy! from Digital Goods JP
ABGM          Howdy! from Step2 UK Limited
ABGL          Howdy! from Universal Cycles
LHR2          Howdy! from Plot 8 - Marston Gate

Notice that the space after the word "from" has to be typed inside the quotation marks ('Howdy! from ') in order to show up in the
results. Without it, the results would look like:

WAREHOUSE_ID  'HOWDY!FROM'||FCS.NAME
SBTK          Howdy! fromSoftbank BB
DGJP          Howdy! fromDigital Goods JP
ABGM          Howdy! fromStep2 UK Limited
ABGL          Howdy! fromUniversal Cycles
LHR2          Howdy! fromPlot 8 - Marston Gate
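
Concatenation is not limited to text. Oracle converts a number to text automatically when it is joined with ||, and a column alias replaces the unwieldy default header. The sketch below uses literal values taken from the examples in this document (warehouse NRT1 in region 3) and selects from Oracle's one-row DUAL table so it can be run anywhere; the alias FC_LABEL is invented for illustration.

```sql
-- The number 3 is converted to the text '3' before the pieces are joined.
SELECT 'Region ' || 3 || ': ' || 'NRT1' AS FC_LABEL
FROM DUAL;
```

This returns a single value, Region 3: NRT1, under the header FC_LABEL. Against the real table, the same pattern would presumably be written as 'Region ' || fcs.REGION_ID || ': ' || fcs.WAREHOUSE_ID.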

ORDER BY
Sometimes, the order of the results is important to answering your question or to displaying the results in the most meaningful way,
like ranking the highest units at the top, or alphabetizing a list of vendor codes. To order your results, you add an ORDER BY clause
to the end of the query, and then specify in that clause which element(s) to order your results by, and even which direction. For
example, you might want to see a list of warehouse names in alphabetical order:

SELECT fcs.NAME
FROM D_WAREHOUSES fcs
ORDER BY fcs.NAME;

NAME
2 Red Hens
3 GIRLS DESIGN/KITTY A GO
32 North Corp
A & W Products Co In
A C R Logistics
A Plus Marketing
A'HOMESTEAD SHOPPE INC
A-America Inc
A.S. Diamonds
AAB Gourmet - Garden City
ABM Corp - Mira Loma
Beijing

Notice that numbers, spaces & symbols are ordered before letters, so 2 Red Hens comes before A.S. Diamonds, which comes before
AAB Gourmet. The default ordering is Ascending (0-9,A-Z).

You may also want to sort a list in the other direction. A common use case is when you want the highest number of something at
the top of a list, like the highest number of glance views in a list of ASINs. To do that, you add a space and the word DESC after the
element in your ORDER BY clause, to specify descending order:

SELECT fcs.NAME
FROM D_WAREHOUSES fcs
ORDER BY fcs.NAME DESC;

NAME
Beijing
ABM Corp - Mira Loma
AAB Gourmet - Garden City
A.S. Diamonds
A-America Inc
A'HOMESTEAD SHOPPE INC
A Plus Marketing
A C R Logistics
A & W Products Co In
32 North Corp
3 GIRLS DESIGN/KITTY A GO
2 Red Hens

You may also want to order your results by multiple elements, in which case you include them all in your ORDER BY clause, in order
of importance, separated by commas:

SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
FROM D_WAREHOUSES fcs
ORDER BY fcs.REGION_ID DESC
, fcs.WAREHOUSE_ID;

REGION_ID  WAREHOUSE_ID
3          AARF
3          AARG
3          CAN1
3          DEKN
3          DGJP
3          NRT1
2          GLA1
2          LEJ1
2          LHR1
1          KTKN
1          MYTK
1          RNO1

Each element in your ORDER BY clause can be sorted in a different direction. In the example above, we ordered the
REGION_ID column descending, and ordered the WAREHOUSE_ID column ascending (which is the default). You can also use any
type of element in your ORDER BY clause, just like in your SELECT clause, including function calls and expressions.
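
As a small illustration of ordering by a function call, the sketch below sorts warehouses by the length of their name, longest first, and breaks ties alphabetically. To keep it runnable without access to D_WAREHOUSES, it builds a three-row stand-in table named fcs from DUAL, using warehouse codes and names that appear elsewhere in this document; the stand-in table is an assumption made for the example, not part of the real schema.

```sql
-- Stand-in for D_WAREHOUSES with three rows taken from examples in this document.
WITH fcs AS (
    SELECT 'PHL1' AS WAREHOUSE_ID, 'New Castle' AS NAME FROM DUAL UNION ALL
    SELECT 'PHL2', 'Chambersburg' FROM DUAL UNION ALL
    SELECT 'RNO1', 'Fernley' FROM DUAL
)
SELECT fcs.WAREHOUSE_ID
, fcs.NAME
FROM fcs
ORDER BY LENGTH(fcs.NAME) DESC   -- function call as the first sort element
, fcs.NAME;                      -- ascending tie-breaker
```

The rows come back as PHL2 (12 characters), PHL1 (10), then RNO1 (7). Note that the expression used for sorting does not have to appear in the SELECT clause.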

Building Queries to Pull Just the Results You Want
• The WHERE Clause – SQL’s Filter
• Using Comparison Operators
• Using Other Operators
• Handling Cells with No Data – aka NULLs

SQL TOPICS

The WHERE Clause – SQL’s Filter


Although the SELECT and FROM clauses are the only required sections of a SQL query, they only allow you to pull every record from
a table – not just the ones you want. Imagine querying the D_CUSTOMER_SHIPMENT_ITEMS table, which BI Metadata shows has
over 5 billion rows of data. The output would be too large for Excel, and you’d have a lot of information you don’t really want.
That’s where the WHERE clause comes in.

I think of WHERE as the filters I put on the table, to filter out what I don’t want, and only let what I do want get through to my result
set. Each ‘filter’ in the WHERE clause is a Condition that must be true for a record in order for that record to be returned by the query.

The WHERE clause goes after the FROM clause, but before the ORDER BY clause (if you’re using one). The WHERE clause starts
with the word WHERE, and then is followed by one or more filters, called conditions. For example, if we wanted to pull an
alphabetical list of FCs in Japan & China (REGION_ID 3), we could run the following:

SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.REGION_ID = 3
ORDER BY fcs.WAREHOUSE_ID;

This would return the following dataset, limited to only WHERE the REGION_ID is equal to 3:

REGION_ID  WAREHOUSE_ID  NAME
3          AARF          ¿¿¿¿
3          AARG          Amazon¿¿
3          CAN1          Guangzhou
3          DEKN          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3          DGJP          Digital Goods JP
3          FFSA          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3          FMTT          ¿¿¿¿¿¿¿¿¿
3          FUOS          ¿¿¿¿¿¿¿¿¿¿¿¿¿
3          KCFK          Kenko.com, INC.
3          KTKN          ¿¿¿¿¿¿¿
3          MYTK          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿-
3          NRT1          Narita
3          NRT2          Yachiyo-shi
3          NRT3          Ichikawa
3          OSKF          Osakaya Books
3          OTOS          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3          PEK3          Beijing
3          SBTK          Softbank BB
3          SHA1          Shanghai
3          YYGF          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿

Of course, you’re not limited to just one condition. You may want to filter your results by several criteria, and so would have multiple
conditions in the WHERE clause. For example, we might limit the above query further to only those FCs with WAREHOUSE_IDs that
start with the letter Y. (Don’t worry about what LIKE 'Y%' means exactly right now; we’ll get to that shortly. Just know it means
‘starts with Y’):

SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.REGION_ID = 3
AND fcs.WAREHOUSE_ID LIKE 'Y%'
ORDER BY fcs.WAREHOUSE_ID;

Yielding the following result set:

REGION_ID  WAREHOUSE_ID  NAME
3          YYGF          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿

Notice that we separated the two conditions in the WHERE clause with AND. This means that BOTH the first condition
(fcs.REGION_ID = 3) AND the second condition (fcs.WAREHOUSE_ID LIKE 'Y%') must be true.

You can also separate multiple conditions with OR, in which case only one of the conditions needs to be true. If we change the AND in our query
above to an OR, the results are much different:

SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.REGION_ID = 3
OR fcs.WAREHOUSE_ID LIKE 'Y%'
ORDER BY fcs.WAREHOUSE_ID;

In this case, we pulled all FCs where the REGION_ID is equal to 3 OR where the WAREHOUSE_ID begins with Y, so we got all the
FCs in Region 3 regardless of what letter they start with, plus all the FCs in other regions that start with Y, which happens to
include only YAHA.

REGION_ID  WAREHOUSE_ID  NAME
3          AARF          ¿¿¿¿
3          AARG          Amazon¿¿
3          CAN1          Guangzhou
3          DEKN          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3          DGJP          Digital Goods JP
3          FFSA          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3          FMTT          ¿¿¿¿¿¿¿¿¿
3          FUOS          ¿¿¿¿¿¿¿¿¿¿¿¿¿
3          KCFK          Kenko.com, INC.
3          KTKN          ¿¿¿¿¿¿¿
3          MYTK          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿-
3          NRT1          Narita
3          NRT2          Yachiyo-shi
3          NRT3          Ichikawa
3          OSKF          Osakaya Books
3          OTOS          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3          PEK3          Beijing
3          SBTK          Softbank BB
3          SHA1          Shanghai
1          YAHA          Yamazaki Tableware -- Hackettstown
3          YYGF          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿

And you can even get more complex by using parentheses to change how the AND and OR logic is applied. For example, maybe you
want a list of all FCs where the REGION_ID is 3 and either the WAREHOUSE_ID starts with Y or it starts with D. We could do that
using parentheses:

SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.REGION_ID = 3
AND (fcs.WAREHOUSE_ID LIKE 'Y%'
OR
fcs.WAREHOUSE_ID LIKE 'D%')
ORDER BY fcs.WAREHOUSE_ID;

REGION_ID  WAREHOUSE_ID  NAME
3          DEKN          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3          DGJP          Digital Goods JP
3          YYGF          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿

The results include only FCs where REGION_ID = 3 AND where either the WAREHOUSE_ID starts with Y OR where the
WAREHOUSE_ID starts with D. Pages 20-22 of your textbook have additional examples and some charts on how various logical
combinations of AND and OR, with and without parentheses, are evaluated.
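
To see why the parentheses matter, it may help to run the same conditions without them. SQL evaluates AND before OR, so the query below is read as (REGION_ID = 3 AND WAREHOUSE_ID LIKE 'Y%') OR WAREHOUSE_ID LIKE 'D%'. This is a sketch against a small stand-in table built from DUAL rather than the real D_WAREHOUSES; the region 1 row DXX1 is a hypothetical warehouse code invented purely to show the effect.

```sql
-- Stand-in for D_WAREHOUSES; DXX1 is a made-up region 1 warehouse for this demo only.
WITH fcs AS (
    SELECT 3 AS REGION_ID, 'YYGF' AS WAREHOUSE_ID FROM DUAL UNION ALL
    SELECT 3, 'DGJP' FROM DUAL UNION ALL
    SELECT 3, 'NRT1' FROM DUAL UNION ALL
    SELECT 1, 'DXX1' FROM DUAL
)
SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
FROM fcs
WHERE fcs.REGION_ID = 3
AND fcs.WAREHOUSE_ID LIKE 'Y%'
OR fcs.WAREHOUSE_ID LIKE 'D%'    -- no parentheses: this OR applies to the whole AND pair
ORDER BY fcs.WAREHOUSE_ID;
```

Without parentheses the query returns DGJP, DXX1 and YYGF: the region 1 row slips through because it satisfies the OR branch on its own. Wrapping the two LIKE conditions in parentheses, as in the query in the text, would drop DXX1 and keep only the region 3 rows.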

Using Comparison Operators


In the examples above, we used two different ‘comparison operators’ in our WHERE clauses to limit our results: the equals symbol
(=) and LIKE. There are many more comparison operators available to help us apply conditions in our query.

The equals sign can be used to evaluate if something is equal to something else in a condition, as we did when we put fcs.REGION_ID
= 3 in our WHERE clause in the example above. Equality can also be evaluated for columns that contain text strings (CHAR and
VARCHAR data type columns), in which case you must put that text string in a set of single quotation marks. For example:

SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.WAREHOUSE_ID = 'PHL1';

REGION_ID  WAREHOUSE_ID  NAME
1          PHL1          New Castle

You can also evaluate whether something is NOT equal to something else, using either of two symbols: <> or !=. If we changed the
operator in the query above from = to !=, the query would give you all FCs except for PHL1.

SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.WAREHOUSE_ID != 'PHL1';

If you want records that are greater than or less than a certain value, you can use the > and < symbols, as you do the =. You can also
evaluate if something is greater than or equal to using >=, and evaluate if something is less than or equal to using <=. And just like =
and !=, they can be used on text strings, too.

SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.WAREHOUSE_ID >= 'WYTN';

REGION_ID  WAREHOUSE_ID  NAME
1          YAHA          Yamazaki Tableware -- Hackettstown
1          WYTN          WYNIT, Inc.

The operator IN can also be used in a WHERE clause, when you have a list of things that you want to check for. To pull data from
D_WAREHOUSES where the FC was either PHL1 or RNO1, we could do it with two conditions, this way:

SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.WAREHOUSE_ID = 'PHL1'
OR fcs.WAREHOUSE_ID = 'RNO1';

REGION_ID  WAREHOUSE_ID  NAME
1          PHL1          New Castle
1          RNO1          Fernley

Or we could get the same results with a single condition by using the IN operator:

SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.WAREHOUSE_ID IN ('PHL1','RNO1');

REGION_ID  WAREHOUSE_ID  NAME
1          PHL1          New Castle
1          RNO1          Fernley

When you use the IN operator, you follow it with a list of values inside a set of parentheses, separated by commas. The condition
could be read as WHERE the WAREHOUSE_ID matches any of the values in the list 'PHL1','RNO1', so it returns information for any
record where WAREHOUSE_ID matches any value in the list. With just two values, as in the example, either the OR or IN method
takes about the same amount of time to write. But when you have many more values to evaluate, such as a list of 50 vendor codes,
IN becomes much quicker. (Note that Oracle allows at most 1000 values in the list of an IN condition; a longer list raises error ORA-01795.)

You can also query for records that are NOT IN a list, by putting NOT in front of IN. If we changed the condition in the last example
from IN to NOT IN, we’d get every FC except PHL1 and RNO1.

SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.WAREHOUSE_ID NOT IN ('PHL1','RNO1');

Another great shortcut operator is BETWEEN. To query for warehouses PHL1 and PHL2, we can query:

This way:

SELECT fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE (fcs.WAREHOUSE_ID='PHL1'
OR fcs.WAREHOUSE_ID='PHL2');

Or this way:

SELECT fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.WAREHOUSE_ID IN ('PHL1','PHL2');

Or even this way:

SELECT fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.WAREHOUSE_ID >= 'PHL1'
AND fcs.WAREHOUSE_ID <= 'PHL2';

But imagine that you had a very long range, such as a date range of many weeks, and only wanted to pull a portion of them. You
wouldn’t want to have to list all of them. The quicker way to do this type of query would be to use the BETWEEN operator.

To use the BETWEEN operator, you follow the column you’re evaluating by the word BETWEEN, then the first value in the range,
followed by AND, and finally the last value in the range. It’s important to remember that the BETWEEN operator is Inclusive,
meaning your results will include anything between the two values AND anything that matches either value.

SELECT fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.WAREHOUSE_ID BETWEEN 'PHL1' AND 'PHL3';

WAREHOUSE_ID  NAME
PHL1          New Castle
PHL2          Chambersburg
PHL3          Centerpoint

By using BETWEEN, we can return PHL1 and PHL3 (the ends of the range) and PHL2, which falls between them alphabetically.
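
BETWEEN works the same way on numbers, and it is shorthand for a pair of >= and <= conditions. The sketch below illustrates this on a three-row stand-in table built from DUAL, using region and warehouse pairs that appear in the examples in this document; the stand-in table is an assumption made so the query can be run without access to D_WAREHOUSES.

```sql
-- Stand-in for D_WAREHOUSES with one warehouse per region.
WITH fcs AS (
    SELECT 1 AS REGION_ID, 'PHL1' AS WAREHOUSE_ID FROM DUAL UNION ALL
    SELECT 2, 'LHR1' FROM DUAL UNION ALL
    SELECT 3, 'NRT1' FROM DUAL
)
SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
FROM fcs
WHERE fcs.REGION_ID BETWEEN 1 AND 2;   -- same as REGION_ID >= 1 AND REGION_ID <= 2
```

This returns PHL1 and LHR1, both ends of the range included. The smaller value has to come first: BETWEEN 2 AND 1 asks for values that are at least 2 and at most 1, so it returns no rows.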

Yet another comparison operator that we used earlier is LIKE. The LIKE operator evaluates matching for columns with text strings
(CHAR and VARCHAR columns), and is usually used with a ‘pattern matching character’. The two ‘pattern matching characters’ (aka
wildcards) are % and _. The percent (%) symbol matches to a string of characters of any length, whereas the underscore (_) symbol
matches to any one character. Now our previous example of FCs starting with the letter Y should make more sense:

SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.REGION_ID = 3
AND fcs.WAREHOUSE_ID LIKE 'Y%';

REGION_ID  WAREHOUSE_ID  NAME
3          YYGF          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿

So we are looking for any FC where REGION_ID = 3 and WAREHOUSE_ID is like a text string that starts with a letter Y, and is followed
by any number of characters. If we wanted to be more specific, we could query for any FC that starts with PH and ends with 1 using
the single character wildcard:

SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.WAREHOUSE_ID LIKE 'PH_1';

REGION_ID  WAREHOUSE_ID  NAME
1          PHL1          New Castle
1          PHX1          Phoenix
1          PH01          LaserShip Philly

So we get back the three FCs that start with PH, end with 1, and have a single character in between: PHL1, PHX1 and PH01.
The LIKE operator can also be negated, like IN, with the addition of NOT:

SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.WAREHOUSE_ID NOT LIKE 'PH_1';
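
The two wildcards can also be combined in one pattern. As an illustration, the pattern '_H%' reads as "any single character, then H, then anything". The sketch below runs against a stand-in table built from DUAL with four warehouse codes taken from this document, so the table itself is an assumption for demonstration purposes.

```sql
-- Stand-in for D_WAREHOUSES holding only the WAREHOUSE_ID column.
WITH fcs AS (
    SELECT 'PHL1' AS WAREHOUSE_ID FROM DUAL UNION ALL
    SELECT 'PHX1' FROM DUAL UNION ALL
    SELECT 'SHA1' FROM DUAL UNION ALL
    SELECT 'RNO1' FROM DUAL
)
SELECT fcs.WAREHOUSE_ID
FROM fcs
WHERE fcs.WAREHOUSE_ID LIKE '_H%';   -- second character must be H
```

PHL1, PHX1 and SHA1 are returned because each has H as its second character; RNO1 is not.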

Using Other Operators in the WHERE clause


Just like in the SELECT clause, you can use mathematical operators like +, -, and * in the WHERE clause to evaluate conditions. The
following query would return all FCs in Region 2, given that 2+1 = 3. An odd example, but I promise this is useful when you begin
using dates.

SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.REGION_ID + 1 = 3;
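
As a preview of why this matters for dates: in Oracle, adding a number to a date adds that many days, so the same kind of arithmetic can be used in a date condition. The sketch below is an illustration only. It uses a one-row stand-in table named orders, built from DUAL with the PO number and order day used in the aggregate examples in this document; the table name and the alias WEEK_LATER are invented for the example.

```sql
-- One-row stand-in for an orders table.
WITH orders AS (
    SELECT 'C4811075' AS ORDER_ID
    , TO_DATE('20090319','YYYYMMDD') AS ORDER_DAY
    FROM DUAL
)
SELECT o.ORDER_ID
, o.ORDER_DAY
, o.ORDER_DAY + 7 AS WEEK_LATER      -- date + 7 means seven days later
FROM orders o
WHERE o.ORDER_DAY + 7 >= TO_DATE('20090326','YYYYMMDD');
```

The row is returned because 19 March 2009 plus 7 days is 26 March 2009, which satisfies the condition.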

Handling Cells with No Data – aka NULLs


A NULL is a blank cell. A void. Nothing. Nada. Zilch. But not Zero. Zero is something, which represents nothing. Confused yet?

REGION_ID  WAREHOUSE_ID  NAME
1          PHL4          Carlisle
1          PHL5
0          PHL1          New Castle

In the imaginary table above, the third record has a REGION_ID that is 0. But that second record has a NAME value that is NULL. It’s
empty. Since something can never be equal to nothing, you can’t use many of the usual conditional operators to evaluate whether a
value in a column is NULL. So SQL has special operators for NULLs, and some special functions for dealing with them, too.
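
A quick way to see this is to try the equals sign against NULL. The sketch below rebuilds the imaginary three-row table above from DUAL (so it is a stand-in, not the real D_WAREHOUSES) and looks for the record with no NAME using = NULL.

```sql
-- The imaginary table from the text: PHL5 has no NAME.
WITH fcs AS (
    SELECT 1 AS REGION_ID, 'PHL4' AS WAREHOUSE_ID, 'Carlisle' AS NAME FROM DUAL UNION ALL
    SELECT 1, 'PHL5', NULL FROM DUAL UNION ALL
    SELECT 0, 'PHL1', 'New Castle' FROM DUAL
)
SELECT fcs.WAREHOUSE_ID
FROM fcs
WHERE fcs.NAME = NULL;   -- a comparison with NULL is never true, so no rows come back
```

The query runs without an error but returns no rows, which makes the mistake easy to miss. Changing the condition to fcs.NAME IS NULL returns PHL5 as expected.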

If we wanted to look for any records in D_WAREHOUSES where the IP Address is NULL, we’d use the IS NULL operator:

SELECT fcs.REGION_ID
, fcs.IP_ADDRESS_LIST_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.REGION_ID = 3
AND fcs.IP_ADDRESS_LIST_ID IS NULL;

REGION_ID  IP_ADDRESS_LIST_ID  WAREHOUSE_ID  NAME
3                              NRT1          Narita
3                              AARG          Amazon¿¿
3                              AARF          ¿¿¿¿
3                              OSKF          Osakaya Books
3                              KCFK          Kenko.com, INC.
3                              SBTK          Softbank BB
3                              DGJP          Digital Goods JP
3                              OTOS          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3                              YYGF          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3                              MYTK          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿-
3                              FFSA          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3                              FMTT          ¿¿¿¿¿¿¿¿¿
3                              DEKN          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3                              FUOS          ¿¿¿¿¿¿¿¿¿¿¿¿¿
3                              KTKN          ¿¿¿¿¿¿¿

And just like IN and LIKE, you can negate IS NULL by sticking in a NOT:

SELECT fcs.REGION_ID
, fcs.IP_ADDRESS_LIST_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.REGION_ID = 3
AND fcs.IP_ADDRESS_LIST_ID IS NOT NULL;

The IS NOT NULL query will return the opposite results: all FCs where the IP Address field isn’t blank.

Since some columns have nulls (and we can tell which by the Nullable field in BI Metadata), and since <> or != operators will exclude
NULLs, you have to be careful sometimes if you want all records where a field is EITHER NULL or is not equal to a specified value.
You could write two conditions in your WHERE clause to evaluate the same column, like this:

SELECT fcs.REGION_ID
, fcs.IP_ADDRESS_LIST_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.REGION_ID = 3
AND (fcs.IP_ADDRESS_LIST_ID IS NULL
OR fcs.IP_ADDRESS_LIST_ID != 1035);

But thankfully, Oracle SQL has a handy function called NVL, which translates any NULL values to another value that you specify, so you can
use standard comparison operators to evaluate the column in a single condition, without much extra work.

SELECT fcs.REGION_ID
, fcs.IP_ADDRESS_LIST_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.REGION_ID = 3
AND NVL(fcs.IP_ADDRESS_LIST_ID,0) != 1035;

REGION_ID  IP_ADDRESS_LIST_ID  WAREHOUSE_ID  NAME
3                              AARF          ¿¿¿¿
3                              AARG          Amazon¿¿
3          1040                CAN1          Guangzhou
3                              DEKN          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3          25                  NRT3          Ichikawa
3                              OSKF          Osakaya Books
3                              OTOS          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3          1039                PEK3          Beijing
3                              SBTK          Softbank BB
3          1041                SHA1          Shanghai
3                              YYGF          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿

The format of the NVL function is NVL, followed by a set of parentheses, inside of which are your column name, a comma,
and then what you want nulls to be translated to. In the example above, we translated any nulls in the column
IP_ADDRESS_LIST_ID to the number 0. Then, we evaluate the results for whether they are not equal to 1035. Since the nulls
are converted to 0, they are not equal to 1035, and will appear in the results.

Had we left out the NVL function, the results would be very different:

SELECT fcs.REGION_ID
, fcs.IP_ADDRESS_LIST_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.REGION_ID = 3
AND fcs.IP_ADDRESS_LIST_ID != 1035;

REGION_ID  IP_ADDRESS_LIST_ID  WAREHOUSE_ID  NAME
3          1039                PEK3          Beijing
3          1041                SHA1          Shanghai
3          1040                CAN1          Guangzhou
3          25                  NRT3          Ichikawa

And NVL can also be used in the SELECT clause, to replace NULLs with something more meaningful to the audience of the data. For
example, we might change any NULLS in the IP Address column to a zero, like this:

SELECT fcs.REGION_ID
, NVL(fcs.IP_ADDRESS_LIST_ID,0)
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.REGION_ID = 3
AND fcs.IP_ADDRESS_LIST_ID IS NULL
AND fcs.WAREHOUSE_ID LIKE '___K';

REGION_ID  NVL(FCS.IP_ADDRESS_LIST_ID,0)  WAREHOUSE_ID  NAME
3          0                              SBTK          Softbank BB
3          0                              KCFK          Kenko.com, INC.
3          0                              MYTK          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿-
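
NVL is not limited to numbers. For a text column, the replacement can be a text string, as long as the replacement is the same kind of data as the column. The sketch below reuses the imaginary three-row table from the start of this NULL discussion, rebuilt from DUAL so it runs anywhere; the replacement text 'Unknown' and the alias FC_NAME are choices made for this illustration.

```sql
-- The imaginary table from the NULL discussion: PHL5 has no NAME.
WITH fcs AS (
    SELECT 1 AS REGION_ID, 'PHL4' AS WAREHOUSE_ID, 'Carlisle' AS NAME FROM DUAL UNION ALL
    SELECT 1, 'PHL5', NULL FROM DUAL UNION ALL
    SELECT 0, 'PHL1', 'New Castle' FROM DUAL
)
SELECT fcs.WAREHOUSE_ID
, NVL(fcs.NAME, 'Unknown') AS FC_NAME   -- text column, so the replacement is text
FROM fcs
ORDER BY fcs.WAREHOUSE_ID;
```

The result lists PHL1 with New Castle, PHL4 with Carlisle, and PHL5 with Unknown. To show a text label in place of a NULL in a numeric column, one option is to convert the column to text first, for example NVL(TO_CHAR(fcs.IP_ADDRESS_LIST_ID), 'None').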

Aggregate Queries and HAVING Clause
• Aggregate Queries
• Pulling DISTINCT Records
• Aggregate Functions
• Aggregate Functions with DISTINCT
• The GROUP BY Clause
• The HAVING Clause

SQL TOPICS

Aggregate Queries
So far, we’ve created queries that pull all rows of data from a table using SELECT and FROM and used the WHERE clause to limit
which rows we pull. Now we’re going to aggregate (group together) multiple rows of data into a single row in the result set, using
the DISTINCT keyword, the GROUP BY clause, and some aggregate operators.

DISTINCT
The simplest form of aggregate query is one where you simply want to know all the unique values in a certain column of a table. For
example, you might want a list of all the possible values for the REGION_ID column in D_WAREHOUSES, so you know how to limit
your query properly. There are over 3000 rows of data in D_WAREHOUSES, but you can use DISTINCT to pull only the unique values
for the REGION_ID column.

To do that, write your query as you would to pull all the records for that column, but put the word DISTINCT after the SELECT but
before the column, like this:

SELECT /*+ use_hash(fcs) */
DISTINCT
fcs.REGION_ID
FROM D_WAREHOUSES fcs;

REGION_ID
1
2
3

The query returns 3 rows of data, one for each DISTINCT value in the REGION_ID column. Even though each value is in the table
many times in many records, the addition of the DISTINCT keyword limits the results to only the unique values.

If your SELECT clause has multiple elements, DISTINCT will return all the unique combinations of elements. Now that you know that
the values in REGION_ID are 1,2, and 3, you might want to know whether each Region has Delayed Allocation warehouses or not. To
do this, you again put DISTINCT before your first column:

SELECT /*+ use_hash(fcs) */
DISTINCT
fcs.REGION_ID
, fcs.IS_DELAYED_ALLOCATION
FROM D_WAREHOUSES fcs;

REGION_ID  IS_DELAYED_ALLOCATION
1          Y
1          N
2          N
3          N

Now the results tell us that Region 1 has some warehouses that are Delayed Allocation (Y) and some that are not (N), but the other two
regions only have warehouses that are not Delayed Allocation nodes. There are two rows for REGION_ID 1, because the value in the
IS_DELAYED_ALLOCATION column for each is distinct, and DISTINCT finds all unique combinations of all the elements in the SELECT clause.
(Notice that you only include the DISTINCT keyword once, before the first element, even when there are multiple elements.)

Other examples of using DISTINCT would be to find out what all the unique ORDER_TYPE values are in D_DISTRIBUTOR_ORDERS, or
to find a list of all ASINs we’ve ordered from a specific vendor in the past 6 months, and which of those were ever backordered.

Aggregate Functions

The DISTINCT keyword can help you get lists of unique values, and even answer some business questions, but you’ll also find you
want to count the number of POs placed, or sum the total quantity we ordered on a PO, or find the first date that we received
something from a vendor. All of these require the use of aggregate functions.

The main aggregate functions are:
• COUNT – which counts how many values there are in a column
• MAX – which finds the maximum value in a column
• MIN – which finds the minimum value in a column
• SUM – which adds together the values in a column
• AVG – which averages the values in a column

All of these functions are used in the SELECT clause. The format is to start with the function, and then put the column you want to
aggregate in parentheses after it, like SUM(doi.QUANTITY) or COUNT(fcs.REGION_ID). Make sure to put the table alias inside the
function along with the column name. If we wanted to COUNT how many records there are in the D_WAREHOUSES table, we could
write the following:

SELECT /*+ use_hash(fcs) */
COUNT(fcs.NAME)
FROM D_WAREHOUSES fcs;

COUNT(FCS.NAME)
4960

Using COUNT to count the number of values in the NAME column, we know that there are 4960 records in the D_WAREHOUSES
table, without having to pull all the records and count them ourselves. (COUNT skips NULLs, so this equals the number of records
only if every record has a NAME.)
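
One caveat when counting a column is that COUNT(column) ignores NULLs, whereas COUNT(*) counts every row. The sketch below shows the difference on the imaginary three-row table from the NULL discussion earlier, rebuilt from DUAL so it can be run without access to D_WAREHOUSES; the aliases ALL_ROWS and ROWS_WITH_NAME are invented for the example.

```sql
-- The imaginary table from the NULL discussion: PHL5 has no NAME.
WITH fcs AS (
    SELECT 1 AS REGION_ID, 'PHL4' AS WAREHOUSE_ID, 'Carlisle' AS NAME FROM DUAL UNION ALL
    SELECT 1, 'PHL5', NULL FROM DUAL UNION ALL
    SELECT 0, 'PHL1', 'New Castle' FROM DUAL
)
SELECT COUNT(*) AS ALL_ROWS               -- counts rows, NULLs or not
, COUNT(fcs.NAME) AS ROWS_WITH_NAME       -- counts only non-NULL NAME values
FROM fcs;
```

ALL_ROWS is 3 and ROWS_WITH_NAME is 2, because the PHL5 row has no NAME. When the goal is a count of records, COUNT(*) or a count of a column that is never NULL is the safer choice.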

If we wanted to know the first and last dates that a record was entered into the D_WAREHOUSES table, we could use the MIN
and MAX functions:

SELECT /*+ use_hash(fcs) */
MIN(fcs.DW_CREATION_DATE)
, MAX(fcs.DW_CREATION_DATE)
FROM D_WAREHOUSES fcs;

MIN(FCS.DW_CREATION_DATE)  MAX(FCS.DW_CREATION_DATE)
1/20/2009 17:45            8/8/2011 7:07

From the results, we learn that the first record was created by DataWarehouse on 1/20/2009, and the last record was created on
8/8/2011. Notice that the same column name (DW_CREATION_DATE) was evaluated in both fields of the SELECT clause, but in the
first field we ran the MIN function on that column, and in the second field we ran the MAX function on that column.

The SUM function allows you to add up everything in a column, and get a total. One example might be if you want to know how
many units we submitted on a specific PO. We can find that out using the SUM function:

SELECT /*+ use_hash(doi) */
SUM(doi.QUANTITY_SUBMITTED)
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
WHERE doi.REGION_ID = 1
AND doi.ORDER_DAY = to_date('20090319','YYYYMMDD')
AND doi.ORDER_ID = 'C4811075';

SUM(DOI.QUANTITY_SUBMITTED)
37

If we want to know the average number of units submitted on that PO, we could swap the SUM function for AVG:

SELECT /*+ use_hash(doi) */
AVG(doi.QUANTITY_SUBMITTED)
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
WHERE doi.REGION_ID = 1
AND doi.ORDER_DAY = to_date('20090319','YYYYMMDD')
AND doi.ORDER_ID = 'C4811075';

AVG(DOI.QUANTITY_SUBMITTED)
7.4

This shows us that for the ASINs we ordered on PO C4811075, the average number of units ordered was 7.4.

Putting all these functions together, we could learn a lot about the PO in one query:

SELECT /*+ use_hash(doi) */
COUNT(doi.ISBN)
, MIN(doi.QUANTITY_SUBMITTED)
, MAX(doi.QUANTITY_SUBMITTED)
, SUM(doi.QUANTITY_SUBMITTED)
, AVG(doi.QUANTITY_SUBMITTED)
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
WHERE doi.REGION_ID = 1
AND doi.ORDER_DAY = to_date('20090319','YYYYMMDD')
AND doi.ORDER_ID = 'C4811075';

COUNT(DOI.ISBN)  MIN(DOI.QUANTITY_SUBMITTED)  MAX(DOI.QUANTITY_SUBMITTED)  SUM(DOI.QUANTITY_SUBMITTED)  AVG(DOI.QUANTITY_SUBMITTED)
5                1                            32                           37                           7.4

We learn that we ordered 5 ASINs on this PO, that the minimum units ordered was 1 but the maximum was 32, that we ordered 37
units total, and the average was 7.4.
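If you'd like to try these functions without touching the Data Warehouse, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Oracle, and the table order_items and its five rows are made up so that the totals match the PO above. Only the aggregate functions themselves carry over.

```python
import sqlite3

# Hypothetical stand-in for D_DISTRIBUTOR_ORDER_ITEMS. The rows are invented
# so that the totals line up with the PO discussed in the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_items (order_id TEXT, isbn TEXT, quantity_submitted INTEGER)"
)
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [
        ("C4811075", "ISBN-A", 1),
        ("C4811075", "ISBN-B", 1),
        ("C4811075", "ISBN-C", 1),
        ("C4811075", "ISBN-D", 2),
        ("C4811075", "ISBN-E", 32),
    ],
)

# All five aggregate functions in one pass over the same filtered rows.
summary = conn.execute(
    """
    SELECT COUNT(isbn)
         , MIN(quantity_submitted)
         , MAX(quantity_submitted)
         , SUM(quantity_submitted)
         , AVG(quantity_submitted)
    FROM order_items
    WHERE order_id = 'C4811075'
    """
).fetchone()
print(summary)
```

The query returns a single row no matter how many records match the WHERE clause, because every element in the SELECT clause is an aggregate.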

Aggregate Functions with DISTINCT


Sometimes, particularly with the COUNT function, you’ll want to find out how many unique records are in a table, which might be
different than the count of total records. For example, in the table D_WAREHOUSES, we learned earlier that there are 4960
records, by using COUNT to count the NAME column.

SELECT /*+ use_hash(fcs) */
COUNT(fcs.NAME)
FROM D_WAREHOUSES fcs;

COUNT(FCS.NAME)
4960

However, some of those names might repeat, so there might not be 4960 unique NAME values in the table. To find that out, we
combine DISTINCT with our aggregate function, but this time putting it inside the function, before the column name. In this example,
we put DISTINCT inside the COUNT function, to COUNT the DISTINCT values in the fcs.NAME column:

SELECT /*+ use_hash(fcs) */
COUNT(DISTINCT fcs.NAME)
FROM D_WAREHOUSES fcs;

COUNT(DISTINCTFCS.NAME)
4790

By adding DISTINCT to our COUNT function, we find that although there are 4960 values in the NAME column, there are only 4790
DISTINCT values in that column, so some must repeat.
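The difference is easy to see on a tiny table. The sketch below again uses Python's sqlite3 module in place of Oracle, with a made-up warehouses table. It also shows one thing worth keeping in mind: COUNT on a column skips NULLs, so COUNT(name) only equals the number of records when no name is missing.

```python
import sqlite3

# Hypothetical five-row stand-in for D_WAREHOUSES; the names are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouses (warehouse_id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO warehouses VALUES (?, ?)",
    [
        ("W1", "Seattle"),
        ("W2", "Seattle"),    # repeated name
        ("W3", "Reno"),
        ("W4", "Lexington"),
        ("W5", None),         # NULL name: skipped by COUNT(name)
    ],
)

total_rows, names, distinct_names = conn.execute(
    "SELECT COUNT(*), COUNT(name), COUNT(DISTINCT name) FROM warehouses"
).fetchone()
print(total_rows, names, distinct_names)
```

Here COUNT(*) counts every record, COUNT(name) counts the records with a name, and COUNT(DISTINCT name) counts each name only once.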

The GROUP BY Clause
So far, we’ve aggregated information for a whole table (in the case of D_WAREHOUSES) and for a set of records limited by the
WHERE clause (as in our queries of D_DISTRIBUTOR_ORDER_ITEMS to learn about PO C4811075). Now we’ll talk about how to use
those same aggregate functions to group sets of records together for each unique value in certain columns, while aggregating other
columns. For example, we might want to know how many units we ordered on each PO we placed with Wiley on a given day.

We could query how many units we ordered from Wiley on 3/16/2009, like this:

SELECT /*+ use_hash(doi) */
SUM(doi.QUANTITY_SUBMITTED)
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
WHERE doi.REGION_ID = 1
AND doi.LEGAL_ENTITY_ID = 101
AND doi.ORDER_DAY = to_date('20090316','YYYYMMDD')
AND doi.DISTRIBUTOR_ID = 'WILEY';

SUM(DOI.QUANTITY_SUBMITTED)
57218

But that doesn’t tell us for each PO. Rather than run the query once for each PO, we can add ORDER_ID to the SELECT clause and
add the GROUP BY clause with ORDER_ID, so the query SUMs up the number of units ordered for each PO. The GROUP BY clause
comes after the WHERE clause (but before the ORDER BY clause, if you’re using one), and indicates which columns from your
SELECT clause you want to group the results by. To group our query above by PO, we’d add it to the SELECT clause and to the
GROUP BY clause, like this:

SELECT /*+ use_hash(doi) */
doi.ORDER_ID
, SUM(doi.QUANTITY_SUBMITTED)
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
WHERE doi.REGION_ID = 1
AND doi.LEGAL_ENTITY_ID = 101
AND doi.ORDER_DAY = to_date('20090316','YYYYMMDD')
AND doi.DISTRIBUTOR_ID = 'WILEY'
GROUP BY doi.ORDER_ID;

ORDER_ID  SUM(DOI.QUANTITY_SUBMITTED)
M7444521  1
P4010601  3
R7453213  57203
U1897503  5
Q0625613  6

Now we know how much we ordered on each of the 5 POs we placed with Wiley on that day.

We can add additional columns to our SELECT clause to get more information. If they’re an aggregate column, such as a COUNT or
AVG function, then we don’t need to put them in the GROUP BY clause. But if they aren’t an aggregate column, we’ll need to also
add them to the GROUP BY clause. For example, we could add a COUNT of DISTINCT ASINs on each PO, as well as add the STATUS
column - which is an ASIN level attribute in the table that indicates whether that ASIN was Backordered (BO) or not on that PO. For
each PO, there may be some ASINs that are backordered, and some that aren’t.

SELECT /*+ use_hash(doi) */
doi.ORDER_ID
, doi.STATUS
, SUM(doi.QUANTITY_SUBMITTED)
, COUNT(DISTINCT doi.ISBN)
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
WHERE doi.REGION_ID = 1
AND doi.LEGAL_ENTITY_ID = 101
AND doi.ORDER_DAY = to_date('20090316','YYYYMMDD')
AND doi.DISTRIBUTOR_ID = 'WILEY'
GROUP BY doi.ORDER_ID
, doi.STATUS;

ORDER_ID  STATUS  SUM(DOI.QUANTITY_SUBMITTED)  COUNT(DISTINCTDOI.ISBN)
M7444521          1                            1
P4010601          3                            1
Q0625613  BO      2                            2
Q0625613          4                            3
R7453213  BO      1296                         416
R7453213          55907                        5930
U1897503          5                            1

We don’t need to put COUNT(DISTINCT doi.ISBN) in the GROUP BY clause, because that column includes an aggregate function. But
we do need to put doi.STATUS in the GROUP BY clause, because it doesn’t include an aggregate function. You’ll notice that since we
grouped by two columns, we got some additional rows of data. That’s because POs Q0625613 and R7453213 both had some ASINs
that were backordered, and some that were not. Our SUM and COUNT data is now grouped by both ORDER_ID and STATUS.

For people familiar with Excel Pivot tables, it can be helpful to think of queries using GROUP BY as something like a Pivot table, with
certain fields being grouped and certain columns being summed, counted, averaged, etc. Each time you add in a new level of
grouping, the columns being aggregated change.

A GROUP BY clause is only needed if you have BOTH Aggregate functions and non-Aggregate elements in your SELECT clause. One
easy way to make sure they’re in sync is to copy all the elements in your SELECT clause and paste them in your GROUP BY clause,
then delete any elements with Aggregate functions (SUM, COUNT, MIN, etc). (You also need to delete any Column Aliases from the
GROUP BY clause.)
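To see how each extra grouping column splits the rows further, here is a small sketch. It runs in Python's sqlite3 module rather than Oracle, and the order_items table, the PO numbers and the quantities are all made up for illustration.

```python
import sqlite3

# Hypothetical stand-in for D_DISTRIBUTOR_ORDER_ITEMS with invented rows.
# status is 'BO' for a backordered item and NULL otherwise.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_items ("
    "order_id TEXT, isbn TEXT, status TEXT, quantity_submitted INTEGER)"
)
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?, ?)",
    [
        ("PO-1", "ISBN-A", None, 1),
        ("PO-2", "ISBN-A", "BO", 2),
        ("PO-2", "ISBN-B", None, 3),
        ("PO-2", "ISBN-C", None, 1),
    ],
)

# One grouping column: one row per PO.
per_po = conn.execute(
    "SELECT order_id, SUM(quantity_submitted) "
    "FROM order_items GROUP BY order_id ORDER BY order_id"
).fetchall()

# Two grouping columns: one row per PO and STATUS combination.
per_po_status = conn.execute(
    "SELECT order_id, status, SUM(quantity_submitted), COUNT(DISTINCT isbn) "
    "FROM order_items GROUP BY order_id, status"
).fetchall()
by_group = {(po, status): (units, isbns) for po, status, units, isbns in per_po_status}

print(per_po)
print(by_group)
```

PO-2 appears once in the first result and twice in the second, because its items fall into two different STATUS groups. The per-group sums still add up to the per-PO total.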

The HAVING Clause

Once you begin aggregating, you’ll find that you may want to limit your results to only records where the result of an aggregation
meets certain criteria. For example, we might only want to look at POs where we ordered 1 unit on the entire PO. We can’t do this
in the WHERE clause, because the conditions in the WHERE clause are evaluated before we aggregate.

Going back to our earlier example, where we summed the units submitted on all POs for Wiley on 3/16/2009, if we tried to find all
POs where we only ordered one unit on the entire PO by limiting the WHERE clause, we’d get the wrong results:

SELECT /*+ use_hash(doi) */
doi.ORDER_ID
, SUM(doi.QUANTITY_SUBMITTED)
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
WHERE doi.REGION_ID = 1
AND doi.LEGAL_ENTITY_ID = 101
AND doi.ORDER_DAY = to_date('20090316','YYYYMMDD')
AND doi.DISTRIBUTOR_ID = 'WILEY'
AND doi.QUANTITY_SUBMITTED = 1
GROUP BY doi.ORDER_ID;

ORDER_ID  SUM(DOI.QUANTITY_SUBMITTED)
M7444521  1
Q0625613  4
R7453213  1811

We’re actually looking at all the POs WHERE we only ordered one unit of at least one ASIN on that PO, and then summing the
quantities of those ASINs – which we can see because the SUM of the QUANTITY_SUBMITTED on two POs is greater than one. This is
a totally valid query, but doesn’t answer the question we were asking: Which POs submitted to Wiley on 3/16/2009 only had one unit
submitted on the entire PO? To get the answer to that, we use a HAVING clause.

A HAVING clause is put at the end of an aggregate query, after the GROUP BY, to limit the results AFTER the aggregation is done. It’s
a filter, just like the WHERE clause, but the filtering is done after things are summed and counted and averaged.

SELECT /*+ use_hash(doi) */
doi.ORDER_ID
, SUM(doi.QUANTITY_SUBMITTED)
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
WHERE doi.REGION_ID = 1
AND doi.LEGAL_ENTITY_ID = 101
AND doi.ORDER_DAY = to_date('20090316','YYYYMMDD')
AND doi.DISTRIBUTOR_ID = 'WILEY'
GROUP BY doi.ORDER_ID
HAVING SUM(doi.QUANTITY_SUBMITTED) = 1;

ORDER_ID  SUM(DOI.QUANTITY_SUBMITTED)
M7444521  1

One way to think about HAVING is to imagine the results of the query if we’d run it without the HAVING clause, then filter those by
the conditions in the HAVING clause. We actually ran this query without the HAVING clause in an earlier example, getting:

ORDER_ID  SUM(DOI.QUANTITY_SUBMITTED)
M7444521  1
P4010601  3
R7453213  57203
U1897503  5
Q0625613  6

So we could expect the result we got – only PO M7444521 had just a single unit ordered on the entire PO. It’s worth noting that
since the HAVING clause adds a second round of filtering to the query, it can add a lot of time to the query, too.
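The contrast between the two filters can be reproduced on a few rows. This is only a sketch: it uses Python's sqlite3 module instead of Oracle, and the order_items table and its PO numbers are made up.

```python
import sqlite3

# Hypothetical order lines: PO-1 has a single 1-unit line,
# PO-2 has two 1-unit lines plus a 5-unit line.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (order_id TEXT, quantity_submitted INTEGER)")
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?)",
    [("PO-1", 1), ("PO-2", 1), ("PO-2", 1), ("PO-2", 5)],
)

# WHERE filters individual rows BEFORE they are summed.
where_rows = conn.execute(
    "SELECT order_id, SUM(quantity_submitted) FROM order_items "
    "WHERE quantity_submitted = 1 "
    "GROUP BY order_id ORDER BY order_id"
).fetchall()

# HAVING filters the groups AFTER they are summed.
having_rows = conn.execute(
    "SELECT order_id, SUM(quantity_submitted) FROM order_items "
    "GROUP BY order_id "
    "HAVING SUM(quantity_submitted) = 1 ORDER BY order_id"
).fetchall()

print(where_rows)
print(having_rows)
```

The WHERE version still returns PO-2, with a sum of 2 from its two 1-unit lines, even though 7 units were ordered on that PO in total. The HAVING version returns only PO-1.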

Joining Tables

 Joining 2 or More Tables – Old School & New School Approaches
 INNER Joins vs. OUTER Joins
 WHERE Clause Conditions with OUTER Joins
 One-to-Many Joins

Joining 2 or More Tables – Old School & New School Approaches


Getting data out of one table is great, but ETL gives you the flexibility to join multiple tables in the Data Warehouse together and
pull custom data sets that meet your business needs. With the rollout of version 9i of Oracle SQL, a new method of joining tables was
introduced, which is what our text, and I, will use. But you’ll surely run into code that uses the old syntax, so I recommend reading
the Appendix on page 449 of Mastering Oracle SQL, so you aren’t left confused when you find commas in the FROM clause and (+) in
the WHERE clause. There are several advantages to the new syntax that you can read about in your text, and I feel it’s easier to
understand than the old syntax.

The ‘New School’ approach to joining tables uses the FROM clause to indicate which tables you want information from AND how
they are joined together. For example, if I wanted to join the VENDORS table (which has lots of great Vendor Master data) to the
O_AMAZON_BUSINESS_GROUPS table to translate the AMAZON_BUSINESS_GROUP_ID number into the description of the business
group that I’m familiar with, I’d do the following:

SELECT /*+ use_hash(v,o_abg) */
v.VENDOR_ID
, v.PRIMARY_VENDOR_CODE
, v.VENDOR_NAME
, v.AMAZON_BUSINESS_GROUP_ID
, o_abg.TYPE
FROM VENDORS v
JOIN O_AMAZON_BUSINESS_GROUPS o_abg
ON v.AMAZON_BUSINESS_GROUP_ID = o_abg.ID
WHERE v.PRIMARY_VENDOR_CODE = 'RANDO';

VENDOR_ID  PRIMARY_VENDOR_CODE  VENDOR_NAME   AMAZON_BUSINESS_GROUP_ID  TYPE
3453       RANDO                Random House  1                         US Books

The syntax is to start your FROM clause and enter the name and alias of the first table. Then specify the type of join (in this case a
standard inner JOIN) and the name and alias of the second table. Follow that by the word ON, and then indicate which columns
define the join between your two tables, with an equals sign between them. Above, we joined the VENDORS table to the
O_AMAZON_BUSINESS_GROUPS table

FROM VENDORS v
JOIN O_AMAZON_BUSINESS_GROUPS o_abg

and returned results where the AMAZON_BUSINESS_GROUP_ID in VENDORS is equal to the ID in O_AMAZON_BUSINESS_GROUPS.

ON v.AMAZON_BUSINESS_GROUP_ID = o_abg.ID

You can also join ON multiple columns between two tables, by adding them to the ON clause, separated by AND:

SELECT /*+ use_hash(ddo,doi) */
ddo.ORDER_ID
, doi.ISBN
, doi.QUANTITY_SUBMITTED
FROM D_DISTRIBUTOR_ORDERS ddo
JOIN D_DISTRIBUTOR_ORDER_ITEMS doi
ON ddo.ORDER_ID = doi.ORDER_ID
AND ddo.DISTRIBUTOR_ID = doi.DISTRIBUTOR_ID
WHERE ddo.REGION_ID = 1
AND ddo.ORDER_ID = 'N9161983'
AND doi.REGION_ID = 1
AND doi.ORDER_DAY = TO_DATE('20090312','YYYYMMDD');

ORDER_ID  ISBN        QUANTITY_SUBMITTED
N9161983  0321357973  1

*When you join two tables, and both have Partitioning Schemes, be sure to include conditions in your WHERE clause to ensure
you’re making use of the partitions in both tables.*

You can also join 3 or more tables together, of course, by specifying the JOIN type and JOIN ON condition for each additional table:

SELECT /*+ use_hash(ddo,doi) */
ddo.DISTRIBUTOR_ID
, v.VENDOR_NAME
, ddo.ORDER_ID
, doi.ISBN
, doi.QUANTITY_SUBMITTED
FROM D_DISTRIBUTOR_ORDERS ddo
JOIN D_DISTRIBUTOR_ORDER_ITEMS doi
ON ddo.ORDER_ID = doi.ORDER_ID
AND ddo.DISTRIBUTOR_ID = doi.DISTRIBUTOR_ID
JOIN VENDORS v
ON ddo.DISTRIBUTOR_ID = v.PRIMARY_VENDOR_CODE
WHERE ddo.REGION_ID = 1
AND ddo.ORDER_ID = 'N9161983'
AND doi.REGION_ID = 1
AND doi.ORDER_DAY = TO_DATE('20090312','YYYYMMDD');

DISTRIBUTOR_ID  VENDOR_NAME               ORDER_ID  ISBN        QUANTITY_SUBMITTED
PEAED           Pearson Technology Group  N9161983  0321357973  1

You can see that we joined ddo to doi on two columns, and we joined ddo to v on one column. Each table needs to be joined to at
least one other table to avoid a Cartesian join.

Notice that as you begin joining multiple tables, you can begin including columns from all the tables as elements in your SELECT
clause, and include conditions in your WHERE clause on columns from each of those tables. This is where the need for table aliases
becomes clear – to let Oracle know that you want the DISTRIBUTOR_ID from D_DISTRIBUTOR_ORDERS, not VENDORS.
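To get a feel for why a missing join condition is dangerous, the sketch below counts the rows returned with and without one. It runs in Python's sqlite3 module rather than Oracle, and the orders and vendors tables are made-up miniatures (the vendor name for WILEY is invented).

```python
import sqlite3

# Hypothetical miniature tables: 2 orders and 3 vendors.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, distributor_id TEXT)")
conn.execute("CREATE TABLE vendors (primary_vendor_code TEXT, vendor_name TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("N1", "PEAED"), ("N2", "RANDO")],
)
conn.executemany(
    "INSERT INTO vendors VALUES (?, ?)",
    [
        ("PEAED", "Pearson Technology Group"),
        ("RANDO", "Random House"),
        ("WILEY", "Wiley"),
    ],
)

# No join condition: every order is paired with every vendor.
cartesian = conn.execute(
    "SELECT COUNT(*) FROM orders o, vendors v"
).fetchone()[0]

# With a join condition: each order is paired only with its own vendor.
joined = conn.execute(
    "SELECT COUNT(*) FROM orders o "
    "JOIN vendors v ON o.distributor_id = v.primary_vendor_code"
).fetchone()[0]

print(cartesian, joined)
```

With 2 orders and 3 vendors the unconditioned version returns 2 x 3 = 6 rows, while the proper join returns 2. On Data Warehouse sized tables the same multiplication produces an enormous and meaningless result set.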

INNER Joins vs. OUTER Joins
There are two main types of JOINs used in writing SQL: INNER and OUTER JOINs.

INNER JOINs will likely be what you use most often, and they are the default join type (thus you only need to type JOIN to use one). They
return only results where the condition specified in your JOIN ON section is true. In other words, they return only records where
matching records are found in both tables. In the example above, the INNER JOIN limits the results to records from the table
VENDORS that match records in the table O_AMAZON_BUSINESS_GROUPS, where the join condition
v.AMAZON_BUSINESS_GROUP_ID = o_abg.ID is true. Because INNER JOIN is the default join type, any query where the join type is
simply JOIN is actually an INNER JOIN.

An OUTER JOIN is used when you want to join two tables but you want all the records from one table and any results from the
second table that match. OUTER JOINs can be of two main types, which seem confusing at first, but are really quite simple: LEFT and
RIGHT OUTER JOINs.

One way to think about the differences between INNER and OUTER joins is with a Venn diagram of two overlapping circles, where
each circle represents a table: A is the part that belongs only to the left table, B is the overlap, and C is the part that belongs only
to the right table.

An INNER JOIN (or simply JOIN) selects only those records that have values in common between both tables (the overlap, labeled B).

An OUTER JOIN selects all records from the primary table, and any matching records for the secondary table (where the secondary
table has values in common with the primary table). A LEFT OUTER JOIN (or simply LEFT JOIN) would select A + B, whereas a RIGHT
JOIN would select B + C.

The ON condition(s) specified in the JOIN indicate what values are evaluated for commonality.

To further illustrate the difference between INNER JOINs, LEFT JOINs, and RIGHT JOINs, we’ll use a silly example, joining the tables
O_WAREHOUSES and D_WAREHOUSES in several ways. The results will be meaningless from a business sense, but hopefully
illustrate the differences in these types of joins. First off, we’ll look at the contents of these tables for all WAREHOUSE_ID values
that start with ‘SDF’:

O_WAREHOUSES

SELECT /*+ use_hash(ow) */
ow.WAREHOUSE_ID ow_warehouse_id
FROM O_WAREHOUSES ow
WHERE ow.WAREHOUSE_ID LIKE 'SDF_';

OW_WAREHOUSE_ID
SDF1
SDF2
SDF3
SDF4
SDF6

D_WAREHOUSES

SELECT /*+ use_hash(dw) */
dw.WAREHOUSE_ID dw_warehouse_id
FROM D_WAREHOUSES dw
WHERE dw.WAREHOUSE_ID LIKE 'SDF_';

DW_WAREHOUSE_ID
SDF1
SDF2
SDF4
SDF6

As the results above show, the O_WAREHOUSES table has records for SDF1, SDF2, SDF3, SDF4 and SDF6, while the D_WAREHOUSES table
only has records for SDF1, SDF2, SDF4 and SDF6.

If we do an INNER JOIN of these two tables, we’ll only get results where a match is found between the two tables (as defined by the
columns in our ON condition):

SELECT /*+ use_hash(ow,dw) */
ow.WAREHOUSE_ID ow_warehouse_id
, dw.WAREHOUSE_ID dw_warehouse_id
FROM O_WAREHOUSES ow
JOIN D_WAREHOUSES dw
ON ow.WAREHOUSE_ID = dw.WAREHOUSE_ID
WHERE ow.WAREHOUSE_ID LIKE 'SDF_';

OW_WAREHOUSE_ID  DW_WAREHOUSE_ID
SDF1             SDF1
SDF2             SDF2
SDF4             SDF4
SDF6             SDF6

Since D_WAREHOUSES doesn’t have a record for WAREHOUSE_ID SDF3, no row for SDF3 is returned with an INNER JOIN.

If we change the query to a LEFT JOIN, the results will change:

SELECT /*+ use_hash(ow,dw) */
ow.WAREHOUSE_ID ow_warehouse_id
, dw.WAREHOUSE_ID dw_warehouse_id
FROM O_WAREHOUSES ow
LEFT JOIN D_WAREHOUSES dw
ON ow.WAREHOUSE_ID = dw.WAREHOUSE_ID
WHERE ow.WAREHOUSE_ID LIKE 'SDF_';

OW_WAREHOUSE_ID  DW_WAREHOUSE_ID
SDF1             SDF1
SDF2             SDF2
SDF3
SDF4             SDF4
SDF6             SDF6

This time, we got results for all the records in O_WAREHOUSES, and the matching records (where they existed) in D_WAREHOUSES,
and got a NULL in the second column where it didn’t find a match.
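The same INNER versus LEFT comparison can be run on two throwaway tables. The sketch below uses Python's sqlite3 module as a stand-in for Oracle. The table names o_wh and d_wh are made up, and they hold only the SDF warehouse IDs listed above.

```python
import sqlite3

# Hypothetical stand-ins for O_WAREHOUSES and D_WAREHOUSES, holding only
# the SDF warehouse IDs shown in the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE o_wh (warehouse_id TEXT)")
conn.execute("CREATE TABLE d_wh (warehouse_id TEXT)")
conn.executemany(
    "INSERT INTO o_wh VALUES (?)",
    [("SDF1",), ("SDF2",), ("SDF3",), ("SDF4",), ("SDF6",)],
)
conn.executemany(
    "INSERT INTO d_wh VALUES (?)",
    [("SDF1",), ("SDF2",), ("SDF4",), ("SDF6",)],
)

query = (
    "SELECT ow.warehouse_id, dw.warehouse_id "
    "FROM o_wh ow {join} d_wh dw ON ow.warehouse_id = dw.warehouse_id "
    "WHERE ow.warehouse_id LIKE 'SDF_' "
    "ORDER BY ow.warehouse_id"
)
inner_rows = conn.execute(query.format(join="JOIN")).fetchall()
left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()

print(inner_rows)
print(left_rows)
```

The INNER JOIN returns four rows and drops SDF3. The LEFT JOIN returns five rows, with SDF3 paired with None, which is how Python displays the NULL.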

The difference between a LEFT JOIN and a RIGHT JOIN is simply which tables are listed on the LEFT and RIGHT of the JOIN. In our last
example, O_WAREHOUSES is on the LEFT of the LEFT JOIN and D_WAREHOUSES is on the RIGHT of the LEFT JOIN. In a LEFT JOIN, the
table on the LEFT is given priority, and is the table that will return all results, even if no match is found in the table on the RIGHT of
the JOIN.

The same query could be written as a RIGHT JOIN and get the same results, simply by switching the order of the tables:

SELECT /*+ use_hash(ow,dw) */
ow.WAREHOUSE_ID ow_warehouse_id
, dw.WAREHOUSE_ID dw_warehouse_id
FROM D_WAREHOUSES dw
RIGHT JOIN O_WAREHOUSES ow
ON ow.WAREHOUSE_ID = dw.WAREHOUSE_ID
WHERE ow.WAREHOUSE_ID LIKE 'SDF_';

OW_WAREHOUSE_ID  DW_WAREHOUSE_ID
SDF1             SDF1
SDF2             SDF2
SDF3
SDF4             SDF4
SDF6             SDF6

The difference between RIGHT and LEFT JOINs is strictly placement of table names in the SQL. To keep things simple, I always use
LEFT JOINs. But it’s no better or worse than switching between LEFT and RIGHT JOINs, or using RIGHT JOINs exclusively. I
recommend using whatever works best for you.

There are some additional types of JOINs described in the text, but these are rarely used and often wildly inefficient.

WHERE Clause Conditions with OUTER Joins
Regardless of whether you have an OUTER join specified or not, anything in your WHERE clause will limit your results. If you include
a condition in your WHERE clause that applies to the secondary table on the RIGHT of a LEFT JOIN (or on the LEFT of a RIGHT JOIN),
the query will not act like an OUTER join, because you’ve limited the results with conditions on both tables, making it behave like an
INNER join. You’ve essentially overridden the OUTER JOIN by limiting the results to only records that exist in the secondary table.

For example, if we added a WHERE clause condition that applies to the DESCRIPTION column in O_PAYMENT_ITEM_TYPES – which is
on the RIGHT of a LEFT JOIN, we get only results where that condition is true – making the query behave like an INNER join:

SELECT /*+ use_hash(o_pit,o_pt) */
o_pit.PAYMENT_ITEM_TYPE_ID
, o_pit.DESCRIPTION
, o_pt.PAYMENT_TYPE_ID
, o_pt.DESCRIPTION
FROM O_PAYMENT_TYPES o_pt
LEFT JOIN O_PAYMENT_ITEM_TYPES o_pit
ON o_pit.PAYMENT_ITEM_TYPE_ID = o_pt.PAYMENT_TYPE_ID
WHERE o_pit.DESCRIPTION = 'Refund';

PAYMENT_ITEM_TYPE_ID  DESCRIPTION  PAYMENT_TYPE_ID  DESCRIPTION_1
2                     Refund       2                zShops

One way around this problem is to place those conditions in the JOIN clause, like this:

SELECT /*+ use_hash(o_pit,o_pt) */
o_pit.PAYMENT_ITEM_TYPE_ID
, o_pit.DESCRIPTION
, o_pt.PAYMENT_TYPE_ID
, o_pt.DESCRIPTION
FROM O_PAYMENT_TYPES o_pt
LEFT JOIN O_PAYMENT_ITEM_TYPES o_pit
ON o_pit.PAYMENT_ITEM_TYPE_ID = o_pt.PAYMENT_TYPE_ID
AND o_pit.DESCRIPTION = 'Refund';

PAYMENT_ITEM_TYPE_ID  DESCRIPTION  PAYMENT_TYPE_ID  DESCRIPTION_1
                                   1                Auctions
2                     Refund       2                zShops
                                   5                zMe
                                   6                Marketplace
                                   7                MVP
                                   8                Catalogue

Putting the condition in the JOIN clause no longer limits the full query, as when it was in the WHERE clause, but it does still limit the
results. Think of it as limiting only the JOIN when it’s in the JOIN clause, but limiting the whole query when in the WHERE clause.
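Here is a small sketch of that difference, run in Python's sqlite3 module rather than Oracle. The payment_types and payment_item_types tables are made-up miniatures, and the 'Charge' row is invented to give the condition something to exclude.

```python
import sqlite3

# Hypothetical miniature versions of O_PAYMENT_TYPES and O_PAYMENT_ITEM_TYPES.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment_types (payment_type_id INTEGER, description TEXT)")
conn.execute(
    "CREATE TABLE payment_item_types (payment_item_type_id INTEGER, description TEXT)"
)
conn.executemany(
    "INSERT INTO payment_types VALUES (?, ?)",
    [(1, "Auctions"), (2, "zShops"), (5, "zMe")],
)
conn.executemany(
    "INSERT INTO payment_item_types VALUES (?, ?)",
    [(1, "Charge"), (2, "Refund")],
)

base = (
    "SELECT pit.payment_item_type_id, pit.description, "
    "pt.payment_type_id, pt.description "
    "FROM payment_types pt "
    "LEFT JOIN payment_item_types pit "
    "ON pit.payment_item_type_id = pt.payment_type_id "
)

# Condition in the WHERE clause: filters the whole result after the join.
where_rows = conn.execute(
    base + "WHERE pit.description = 'Refund' ORDER BY pt.payment_type_id"
).fetchall()

# Condition in the ON clause: only limits which rows are allowed to match.
on_rows = conn.execute(
    base + "AND pit.description = 'Refund' ORDER BY pt.payment_type_id"
).fetchall()

print(where_rows)
print(on_rows)
```

The WHERE version returns only the zShops row. The ON version keeps all three payment types and fills the item type columns with NULL (None in Python) wherever no 'Refund' row matched, including for Auctions, whose only match was the 'Charge' row.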

One-to-Many Joins
As the final topic this week, I wanted to end with a warning about JOINs of all kinds, by introducing the concept of the ‘grain’ of a
table. When people talk about the grain of a table, they mean the level of detail in that table. For example,
D_DISTRIBUTOR_ORDERS is at the grain of POs. That means it contains just one row of data for each Purchase Order. The related
table D_DISTRIBUTOR_ORDER_ITEMS is at the grain of the PO and ASIN, so it has one row for each unique combination of PO and
ASIN. The somewhat related table, D_DISTRIBUTOR_SHIPMENT_ITEMS contains all the records of PO items that have been received,
and its grain is PO, ASIN and Shipment – because a single ASIN can be received to a single PO on multiple occasions. Knowing the
grain of a table (usually by looking at some sample data) is important to understanding how to properly join to it.

If I join D_DISTRIBUTOR_ORDER_ITEMS (with a grain of PO/ASIN) to D_DISTRIBUTOR_SHIPMENT_ITEMS (with a grain of
PO/ASIN/Shipment) on the PO and ASIN columns (ORDER_ID and ISBN), the results look straightforward for PO L9549101:

SELECT /*+ use_hash(doi,dsi) */
doi.ORDER_ID
, doi.ISBN
, doi.QUANTITY_SUBMITTED
, doi.QUANTITY
, dsi.QUANTITY_UNPACKED
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
JOIN D_DISTRIBUTOR_SHIPMENT_ITEMS dsi
ON doi.ORDER_ID = dsi.ORDER_ID
AND doi.ISBN = dsi.ISBN
WHERE doi.REGION_ID = 1
AND doi.ORDER_DAY = to_date('20090115','YYYYMMDD')
AND doi.ORDER_ID = 'L9549101'
AND dsi.REGION_ID = 1
AND dsi.RECEIVED_DAY = to_date('20090119','YYYYMMDD');

ORDER_ID  ISBN        QUANTITY_SUBMITTED  QUANTITY  QUANTITY_UNPACKED
L9549101  0316032220  40                  40        40

It shows we submitted 40 units of ASIN 0316032220, 40 units were confirmed (QUANTITY), and 40 units were received
(QUANTITY_UNPACKED).

However, for an ASIN on a PO that was received in multiple shipments, things can look a little odd in the results:

SELECT /*+ use_hash(doi,dsi) */
doi.ORDER_ID
, doi.ISBN
, doi.QUANTITY_SUBMITTED
, doi.QUANTITY
, dsi.QUANTITY_UNPACKED
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
JOIN D_DISTRIBUTOR_SHIPMENT_ITEMS dsi
ON doi.ORDER_ID = dsi.ORDER_ID
AND doi.ISBN = dsi.ISBN
WHERE doi.REGION_ID = 1
AND doi.ORDER_DAY = to_date('20090126','YYYYMMDD')
AND doi.ORDER_ID = 'R1735263'
AND doi.ISBN = '0738210943'
AND dsi.REGION_ID = 1
AND dsi.RECEIVED_DAY BETWEEN to_date('20090205','YYYYMMDD')
AND to_date('20090213','YYYYMMDD');

ORDER_ID  ISBN        QUANTITY_SUBMITTED  QUANTITY  QUANTITY_UNPACKED
R1735263  0738210943  19                  19        6
R1735263  0738210943  19                  19        13

Because there are two records in the table D_DISTRIBUTOR_SHIPMENT_ITEMS that match to the PO and ASIN we are querying in
D_DISTRIBUTOR_ORDER_ITEMS, we get two records back. This is a One-to-Many join. Sometimes that’s just what you want, but in
this case, we might mistakenly think that we ordered 38 units (19+19), which is twice what we actually ordered.

We’ll explore some ways to avoid this issue later, but I wanted you to begin thinking about table granularity and to be aware of how it
can result in one-to-many joins and possible double-counting of records.
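The double-counting is easy to reproduce. The sketch below uses Python's sqlite3 module instead of Oracle, with made-up order_items and shipment_items tables that hold just the one order line and two receipts from the example above. The pre-aggregating subquery at the end is only one possible way around the problem, shown here as an assumption rather than as the recommended fix.

```python
import sqlite3

# Hypothetical miniature tables: one order line, received in two shipments.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_items (order_id TEXT, isbn TEXT, quantity_submitted INTEGER)"
)
conn.execute(
    "CREATE TABLE shipment_items (order_id TEXT, isbn TEXT, quantity_unpacked INTEGER)"
)
conn.execute("INSERT INTO order_items VALUES ('R1735263', '0738210943', 19)")
conn.executemany(
    "INSERT INTO shipment_items VALUES (?, ?, ?)",
    [("R1735263", "0738210943", 6), ("R1735263", "0738210943", 13)],
)

# Summing across the one-to-many join counts the order line once per shipment.
naive_submitted, unpacked = conn.execute(
    "SELECT SUM(doi.quantity_submitted), SUM(dsi.quantity_unpacked) "
    "FROM order_items doi "
    "JOIN shipment_items dsi "
    "ON doi.order_id = dsi.order_id AND doi.isbn = dsi.isbn"
).fetchone()

# Rolling the shipments up to the PO/ISBN grain first keeps the join one-to-one.
safe_rows = conn.execute(
    "SELECT doi.quantity_submitted, dsi.total_unpacked "
    "FROM order_items doi "
    "JOIN (SELECT order_id, isbn, SUM(quantity_unpacked) AS total_unpacked "
    "      FROM shipment_items GROUP BY order_id, isbn) dsi "
    "ON doi.order_id = dsi.order_id AND doi.isbn = dsi.isbn"
).fetchall()

print(naive_submitted, unpacked)
print(safe_rows)
```

The naive sum reports 38 units submitted against 19 received, while the pre-aggregated version reports 19 and 19.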
Dealing with Dates in SQL

 DATE vs. DATETIME columns
 The TO_CHAR() Function with Dates
 The TRUNC() Function with Dates
 The TO_DATE() Function
 Using BETWEEN with Dates
 Other Date functions

DATE vs. DATETIME columns


While working with Data Warehouse tables, you’ll find two types of DATE columns: DATE columns that are truncated to only the
Month, Day, and Year information (e.g. 12/31/2008), and DATE columns that also contain the Hour, Minute, and Seconds (e.g.
12/31/2008 08:13:52) – known as the DATETIME format.

Both types of columns are of the Data Type ‘DATE’, and store full date & time information, but the DATE format columns are
truncated to the beginning of the first second of the day. Although it’s not always obvious from just looking at BI Metadata which
type a column is, most of the DATETIME fields have DATETIME in their name (like the columns ORDER_DATETIME and
CONFIRMATION_DATETIME in D_DISTRIBUTOR_ORDERS), while DATE type columns often use DATE or DAY in their column name
(like ORDER_DAY in D_DISTRIBUTOR_ORDERS). This isn’t a hard and fast rule, however, even within a single table. For example, the
column CREATION_DATE in D_DISTRIBUTOR_ORDERS is actually a DATETIME field, which we see via this query of the various date
fields in D_DISTRIBUTOR_ORDERS for PO M5969483.

SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.CREATION_DATE
, ddo.ORDER_DAY
, ddo.ORDER_DATETIME
, ddo.CONFIRMATION_DATETIME
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.ORDER_ID = 'M5969483';

ORDER_ID  CREATION_DATE    ORDER_DAY  ORDER_DATETIME   CONFIRMATION_DATETIME
M5969483  3/25/2009 18:14  3/25/2009  3/25/2009 11:14  3/25/2009 11:52

How do you know which type of format a Date field is in? Running a simple query, like the one above, is one way, but BI Metadata
will almost always provide the answer.

Above are BI Metadata Column Details for the table D_DISTRIBUTOR_ORDERS, filtered down to some of the columns of the Data
Type ‘DATE’. You’ll notice that the Data Type is the same regardless of whether the column contains DATE or DATETIME data. In
this case, the Description field indicates which are Day and which are Day and Time fields. The column names are also helpful in this
case. But if the Description and Column Names weren’t so clear, another way to tell whether a column contains DATE or DATETIME
data is to click through to sample data, which will reveal all zeros in the hour/minute/seconds section of the date if it doesn’t
include timestamps.

As you can see, the values in the ORDER_DATETIME column include detailed timestamps (e.g. 07:03:48), whereas the ORDER_DAY
column is all 00:00:00 in the timestamp portion of the value.

Each Column’s detail page can also provide some insight. For example, you’ll notice the number of distinct values in the ORDER_DAY
column is just 5,074, whereas the number of distinct ORDER_DATETIME values is over 100 times greater: 657,351.

Why worry about which are which? We’ll see in some examples coming up that it can make a big difference.
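As a first taste of why it matters, consider an equality test against a single day. The sketch below is only an analogy: it runs in Python's sqlite3 module, SQLite has no DATE type so the values are stored as text, and SQLite's date() function stands in for the truncation that Oracle does with TRUNC() (covered later in this section). The orders table is made up, using the timestamps of PO M5969483.

```python
import sqlite3

# Hypothetical one-row table: order_day has its time portion zeroed out,
# order_datetime carries the full timestamp.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id TEXT, order_day TEXT, order_datetime TEXT)"
)
conn.execute(
    "INSERT INTO orders VALUES "
    "('M5969483', '2009-03-25 00:00:00', '2009-03-25 11:14:56')"
)

def count_where(condition):
    """Return how many rows of the toy orders table satisfy the condition."""
    return conn.execute(
        "SELECT COUNT(*) FROM orders WHERE " + condition
    ).fetchone()[0]

# The day column matches midnight of that day.
day_match = count_where("order_day = '2009-03-25 00:00:00'")
# The datetime column does not, because 11:14:56 is not midnight.
datetime_match = count_where("order_datetime = '2009-03-25 00:00:00'")
# Chopping off the time portion first makes the comparison work again.
truncated_match = count_where("date(order_datetime) = '2009-03-25'")

print(day_match, datetime_match, truncated_match)
```

The same PO is found through the day column, missed through the datetime column, and found again once the time portion is removed before comparing.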
The TO_CHAR() Function with Dates

There are many ways to write a date, from the US standard of 03/31/2009 to the UK standard of 31/03/2009, writing them as March
31st, 2009, or combinations of words and numbers, like 31-MAR-09. Some of these formats can be very precise, while others are less
so. For example, if a Book was published on 31-MAR-09, do we know if it was published in 2009 or 1909? Unfortunately, we don’t,
and programs like Excel may make assumptions that could be wrong.

When writing SQL queries, you may find you want to control the format of a date column in your results, so you always know what
format it will be in and so there is never any question of exactly what the date means. To do this, we use the TO_CHAR() function,
which converts the DATE to a character string, in a format specified by you. To use the TO_CHAR() function, you include the column
name followed by a comma and then the format (enclosed in single quotes) within the parentheses. For example, to convert
the ORDER_DATETIME to just the Month, Day, and Year format, we put the ORDER_DATETIME column name in the TO_CHAR()
function and then enter the format MM/DD/YYYY in single quotes, like this:

SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.ORDER_DATETIME
, TO_CHAR(ddo.ORDER_DATETIME,'MM/DD/YYYY')
, ddo.ORDER_DAY
, TO_CHAR(ddo.ORDER_DAY,'MM/DD/YYYY HH24:MI:SS')
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.ORDER_ID = 'M5969483';

ORDER_ID  ORDER_DATETIME         TO_CHAR(DDO.ORDER_DATETIME,'MM  ORDER_DAY  TO_CHAR(DDO.ORDER_DAY,'MM/DD/Y
M5969483  3/25/2009 11:14:56 AM  3/25/2009                       3/25/2009  03/25/2009 00:00:00

The format of the column returned in your results is what we specified, without the time stamp information. Notice that we
also formatted the ORDER_DAY column to include the full DATETIME - hours, minutes and seconds - in column 5 of our results. It
returns 03/25/2009 00:00:00, because the time portion is always stored, but for this column it is stored as the beginning of the
first second of the day.

There are numerous formats you can use to get dates into the style you want, and you can mix-and-match components, as well. A
table begins on page 135 in Mastering Oracle SQL with a detailed list of options and their output, but here are some examples:

SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.ORDER_DATETIME
, TO_CHAR(ddo.ORDER_DATETIME,'YYYYMMDD')
, TO_CHAR(ddo.ORDER_DATETIME,'D')
, TO_CHAR(ddo.ORDER_DATETIME,'DAY')
, TO_CHAR(ddo.ORDER_DATETIME,'CC')
, TO_CHAR(ddo.ORDER_DATETIME,'HH AM" on a "Day", the "DDDTH" day of "YYYY"')
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.ORDER_ID = 'M5969483';

ORDER_ID  ORDER_DATETIME   YYYYMMDD  D  DAY        CC  HH AM" on a "Day", the "DDDTH" day of "YYYY"
M5969483  3/25/2009 11:14  20090325  4  WEDNESDAY  21  11 AM on a Wednesday, the 084TH day of 2009

You can get very simple (like finding the Century with CC) or very complex, such as creating a text string. Think about the format
that will be most meaningful to the people using your data. And don’t take for granted that a date field will output MM/DD/YYYY if
you don’t specify a format – ETL often seems to default to the troublesome DD-MON-YY format (e.g. 31-MAR-09).

The TRUNC() Function with Dates
Another type of conversion you can do to a DATE field is to truncate the date using the TRUNC() function. TRUNC() is used much like
TO_CHAR, but instead of translating the DATE field into a character string, it truncates it to the level you specify, but leaves it in a
DATE format. One common example is to truncate a date to the first day of the week, which can be done like this:

SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.ORDER_DATETIME
, TRUNC(ddo.ORDER_DATETIME,'D')
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.ORDER_ID = 'M5969483';

ORDER_ID  ORDER_DATETIME   TRUNC(DDO.ORDER_DATETIME,'D')
M5969483  3/25/2009 11:14  3/22/2009

You’ll notice that when we used TRUNC with the ‘D’ option, it truncated the ORDER_DATETIME of 3/25/2009 11:14 to the first
second of the first hour of the first day of the week: 3/22/2009. A similar option, ‘DDD’, will truncate a date to the first second of
the first hour of the same day – essentially chopping off the timestamp information from a DATETIME field:

SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.ORDER_DATETIME
, TRUNC(ddo.ORDER_DATETIME,'DDD')
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.ORDER_ID = 'M5969483';

ORDER_ID  ORDER_DATETIME   TRUNC(DDO.ORDER_DATETIME,'DDD'
M5969483  3/25/2009 11:14  3/25/2009

Superficially, this looks like the same result we got from the TO_CHAR() function, but because TRUNC returns its result still in a DATE
format, we can perform math functions on the result, such as adding days, and logical functions like comparing to another date.
Since truncating a DATETIME to the start of that day is probably the most common use of the TRUNC() function, Oracle
made it the default. So you can get the same result as above by leaving off a format, saving yourself some time:

SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.ORDER_DATETIME
, TRUNC(ddo.ORDER_DATETIME)
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.ORDER_ID = 'M5969483';

ORDER_ID  ORDER_DATETIME   TRUNC(DDO.ORDER_DATETIME)
M5969483  3/25/2009 11:14  3/25/2009

Like TO_CHAR(), there are numerous options to choose from when using TRUNC(), which are listed in a table that begins on page
159 of Mastering Oracle SQL. Here are just a few examples, truncating to the beginning of the month, quarter, year, and century:

SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.ORDER_DATETIME
, TRUNC(ddo.ORDER_DATETIME,'MM')
, TRUNC(ddo.ORDER_DATETIME,'Q')
, TRUNC(ddo.ORDER_DATETIME,'Y')
, TRUNC(ddo.ORDER_DATETIME,'CC')
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.ORDER_ID = 'M5969483';

ORDER_ID  ORDER_DATETIME   MM        Q         Y         CC
M5969483  3/25/2009 11:14  3/1/2009  1/1/2009  1/1/2009  1/1/2001
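If it helps to see what these truncation levels do mechanically, here is a rough imitation in plain Python. It is not how Oracle implements TRUNC(), and the function names are made up. The week is assumed to start on Sunday, which matches the 3/22/2009 result above; in Oracle the first day of the week for the 'D' option depends on the session's territory setting.

```python
from datetime import datetime, timedelta

def trunc_day(dt):
    """Imitates TRUNC(dt) or TRUNC(dt,'DDD'): midnight of the same day."""
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)

def trunc_week(dt):
    """Imitates TRUNC(dt,'D'), assuming the week starts on Sunday."""
    day = trunc_day(dt)
    # Python's weekday() is Monday=0 .. Sunday=6, so shift to Sunday=0.
    return day - timedelta(days=(day.weekday() + 1) % 7)

def trunc_month(dt):
    """Imitates TRUNC(dt,'MM'): first day of the month."""
    return trunc_day(dt).replace(day=1)

def trunc_quarter(dt):
    """Imitates TRUNC(dt,'Q'): first day of the quarter."""
    first_month = 3 * ((dt.month - 1) // 3) + 1
    return trunc_day(dt).replace(month=first_month, day=1)

def trunc_year(dt):
    """Imitates TRUNC(dt,'Y'): first day of the year."""
    return trunc_day(dt).replace(month=1, day=1)

order_datetime = datetime(2009, 3, 25, 11, 14, 56)
for label, func in [
    ("DDD", trunc_day),
    ("D", trunc_week),
    ("MM", trunc_month),
    ("Q", trunc_quarter),
    ("Y", trunc_year),
]:
    print(label, func(order_datetime))

# The truncated value is still a date, so date math keeps working.
one_week_later = trunc_day(order_datetime) + timedelta(days=7)
print(one_week_later)
```

Each function returns a datetime rather than a string, which mirrors the point made earlier: a truncated date can still be compared with other dates or have days added to it, while the output of TO_CHAR() cannot.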

The TO_DATE() Function
One frequent use of DATE columns, besides returning them in your results, is to use them in your WHERE clause to limit your results.
In fact, DATE columns are commonly used as partitions on tables, so this use is very common. A function called TO_DATE() comes in
handy when working with DATE columns in your WHERE clause. It’s essentially the opposite of the TO_CHAR() function – turning a
character string into a DATE format. This is vital, because you can’t compare a column that is in a DATE format to a text string – only
to a DATE. So when setting a conditional in your WHERE clause, you use the TO_DATE() function to translate a text string into a
DATE format, and then compare a DATE column to it. For example, if we wanted to see which POs have an ORDER_DAY of
3/25/2009, we’d compare the ORDER_DAY field to the text string 03/25/2009, but we’d convert that text string to a date before
doing the comparison using TO_DATE, like this:

SELECT /*+ use_hash(ddo) */
    ddo.ORDER_ID
  , ddo.ORDER_DAY
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
  AND ddo.LEGAL_ENTITY_ID = 101
  AND ddo.DISTRIBUTOR_ID = 'RANDO'
  AND ddo.ORDER_DAY = TO_DATE('03/25/2009','MM/DD/YYYY');

ORDER_ID   ORDER_DAY
P0618301   3/25/2009
M5969483   3/25/2009

The TO_DATE() function is taking the text string 03/25/2009, and converting it to a date format. The second part of the TO_DATE()
function indicates what format the text string is in, so it knows which numbers are the month, which are the day, and which are the
year. We could get the same results using a different format, as long as we change our text string to match that format:

SELECT /*+ use_hash(ddo) */
    ddo.ORDER_ID
  , ddo.ORDER_DAY
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
  AND ddo.LEGAL_ENTITY_ID = 101
  AND ddo.DISTRIBUTOR_ID = 'RANDO'
  AND ddo.ORDER_DAY = TO_DATE('20090325','YYYYMMDD');

ORDER_ID   ORDER_DAY
P0618301   3/25/2009
M5969483   3/25/2009
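
As a further illustration (not from the original examples), the same calendar date can be written in several layouts, as long as the format describes the text string exactly. The strings and aliases below are made up for the sketch, and DUAL is assumed to be Oracle's one-row dummy table.

```sql
-- Each TO_DATE() call pairs a text string with a format that matches it.
-- TO_CHAR() then prints every result in one common layout for comparison.
SELECT TO_CHAR(TO_DATE('03/25/2009','MM/DD/YYYY'), 'YYYY-MM-DD') AS from_slashes  -- 2009-03-25
     , TO_CHAR(TO_DATE('20090325','YYYYMMDD'),     'YYYY-MM-DD') AS from_compact  -- 2009-03-25
     , TO_CHAR(TO_DATE('25.03.2009','DD.MM.YYYY'), 'YYYY-MM-DD') AS from_dotted   -- 2009-03-25
FROM DUAL;
```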

If your text string does not match the format you specify, however, you’ll get an error. For example, the following would
cause an error, because the layout of the text string (‘20090325’) is not the same as the format indicated in the function
(‘MM/DD/YYYY’):

SELECT /*+ use_hash(ddo) */
    ddo.ORDER_ID
  , ddo.ORDER_DAY
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
  AND ddo.LEGAL_ENTITY_ID = 101
  AND ddo.DISTRIBUTOR_ID = 'RANDO'
  AND ddo.ORDER_DAY = TO_DATE('20090325','MM/DD/YYYY');

ORA-12801: error signaled in parallel query server P054, instance db-dw2-6001.iad6.amazon.com:dw2-1 (1)
ORA-01843: not a valid month

Using BETWEEN with Dates
You can limit a DATE field to a specific date using the equal operator, but you can use other operators to build conditions in your
WHERE clause, too. The BETWEEN operator is commonly used to define a date range that begins with the first date specified and
ends with the last date specified (both ends are included). Below, the query is limited to the date range 3/23/2009 through 3/25/2009:

SELECT /*+ use_hash(ddo) */
    ddo.ORDER_ID
  , ddo.ORDER_DAY
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
  AND ddo.LEGAL_ENTITY_ID = 101
  AND ddo.DISTRIBUTOR_ID = 'RANDO'
  AND ddo.ORDER_DAY BETWEEN TO_DATE('20090323','YYYYMMDD')
                        AND TO_DATE('20090325','YYYYMMDD');

ORDER_ID   ORDER_DAY
M9119427   3/23/2009
M2666981   3/23/2009
U3517863   3/23/2009
R5273263   3/23/2009
N5183001   3/23/2009
T0475345   3/23/2009
M5969483   3/25/2009
P0618301   3/25/2009

It’s important when using BETWEEN with DATETIME fields to remember that the second date listed in the range (03/25/2009 in our
example) is the end of the range, and that a date of 03/25/2009 means the first second of the first minute of the first hour of that
day. It’s actually 03/25/2009 00:00:00. When working with fields that are in the DATE format that isn’t an issue, as the example
above shows.

However, if we change the WHERE condition so that it is on the ORDER_DATETIME field instead of the ORDER_DAY field, we’ll
see a problem:

SELECT /*+ use_hash(ddo) */
    ddo.ORDER_ID
  , ddo.ORDER_DATETIME
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
  AND ddo.LEGAL_ENTITY_ID = 101
  AND ddo.DISTRIBUTOR_ID = 'RANDO'
  AND ddo.ORDER_DATETIME BETWEEN TO_DATE('20090323','YYYYMMDD')
                             AND TO_DATE('20090325','YYYYMMDD');

ORDER_ID   ORDER_DATETIME
M9119427   3/23/2009 19:42
M2666981   3/23/2009 20:26
U3517863   3/23/2009 19:41
R5273263   3/23/2009 19:42
N5183001   3/23/2009 20:27
T0475345   3/23/2009 19:41

Even though our DATE range ends with 03/25/2009, we don’t get any results for that day, even though we know from our previous
example that 2 POs were created that day for RANDO. That’s because the ORDER_DATETIME values for those 2 POs were after
03/25/2009 00:00:00 – the start of 03/25/2009. Another way of saying that is that 03/25/2009 03:03:48 (the order datetime of PO
P0618301) is greater than 03/25/2009 00:00:00, so it is outside the range specified by the BETWEEN clause.

We can solve this problem by using a DATE column for our WHERE clause, if one is available, or by using the TRUNC() function in our
WHERE clause, so that we’re comparing the ORDER_DATETIME value truncated to the start of the day to our BETWEEN range.

SELECT /*+ use_hash(ddo) */
    ddo.ORDER_ID
  , ddo.ORDER_DATETIME
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
  AND ddo.LEGAL_ENTITY_ID = 101
  AND ddo.DISTRIBUTOR_ID = 'RANDO'
  AND TRUNC(ddo.ORDER_DATETIME) BETWEEN TO_DATE('20090323','YYYYMMDD')
                                    AND TO_DATE('20090325','YYYYMMDD');

ORDER_ID   ORDER_DATETIME
U3517863   3/23/2009 19:41
T0475345   3/23/2009 19:41
R5273263   3/23/2009 19:42
M9119427   3/23/2009 19:42
M2666981   3/23/2009 20:26
N5183001   3/23/2009 20:27
P0618301   3/25/2009 3:03
M5969483   3/25/2009 11:14

Now the results show the two POs placed on 3/25/2009, because the truncated version of the ORDER_DATETIME field is within the
date range. You could also change the second date in the range to be one day later (03/26/2009 in our example) without using
the TRUNC() function, but then you’d risk getting results that happened to occur at exactly 03/26/2009 00:00:00, which is a
possibility with some data sets. Using TRUNC() is a cleaner, safer, and easier method. When in doubt, use TRUNC().
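
A third option, shown here only as a sketch and not taken from the original material, is to replace BETWEEN with two comparisons: greater than or equal to the start of the first day, and strictly less than the start of the day after the last day. The strict less-than excludes 03/26/2009 00:00:00 itself, and the ORDER_DATETIME column is left unwrapped.

```sql
-- Sketch only: same filters as the TRUNC() example, written as a half-open range.
SELECT /*+ use_hash(ddo) */
    ddo.ORDER_ID
  , ddo.ORDER_DATETIME
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
  AND ddo.LEGAL_ENTITY_ID = 101
  AND ddo.DISTRIBUTOR_ID = 'RANDO'
  AND ddo.ORDER_DATETIME >= TO_DATE('20090323','YYYYMMDD')  -- from 03/23/2009 00:00:00 (inclusive)
  AND ddo.ORDER_DATETIME <  TO_DATE('20090326','YYYYMMDD'); -- up to, but not including, 03/26/2009 00:00:00
```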
Adding and Subtracting with Dates
Just like numerical fields, you can add to and subtract from DATE column values, both in your SELECT and WHERE clauses. When
adding and subtracting from DATE column values a value of 1 is equal to 1 day and not 1 hour or 1 minute or 1 second. If we add 1
to the ORDER_DATETIME values we returned in our last example, we see it increases the DATE value by 1 full day:

SELECT /*+ use_hash(ddo) */
    ddo.ORDER_ID
  , ddo.ORDER_DATETIME
  , ddo.ORDER_DATETIME + 1
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
  AND ddo.LEGAL_ENTITY_ID = 101
  AND ddo.DISTRIBUTOR_ID = 'RANDO'
  AND TRUNC(ddo.ORDER_DATETIME) BETWEEN TO_DATE('20090323','YYYYMMDD')
                                    AND TO_DATE('20090325','YYYYMMDD');

ORDER_ID   ORDER_DATETIME    DDO.ORDER_DATETIME+1
M9119427   3/23/2009 19:42   3/24/2009 19:42
P0618301   3/25/2009 3:03    3/26/2009 3:03
M2666981   3/23/2009 20:26   3/24/2009 20:26
U3517863   3/23/2009 19:41   3/24/2009 19:41
R5273263   3/23/2009 19:42   3/24/2009 19:42
M5969483   3/25/2009 11:14   3/26/2009 11:14
N5183001   3/23/2009 20:27   3/24/2009 20:27
T0475345   3/23/2009 19:41   3/24/2009 19:41

Thus, the value 3/23/2009 19:42 becomes 3/24/2009 19:42 – one full day later. (To add hours, minutes or seconds to a date, use a
fraction, such as 1/24 to add an hour, or 20/1440 to add twenty minutes.)
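
The fractions mentioned above can be tried out without any warehouse tables. This sketch assumes Oracle's one-row DUAL table and a made-up literal timestamp; the aliases are illustrative only.

```sql
-- 1 = one day, so 1/24 is one hour and 20/1440 is twenty minutes
-- (there are 1440 minutes in a day).
SELECT TO_CHAR(t.dt,           'MM/DD/YYYY HH24:MI') AS original
     , TO_CHAR(t.dt + 1,       'MM/DD/YYYY HH24:MI') AS plus_one_day
     , TO_CHAR(t.dt + 1/24,    'MM/DD/YYYY HH24:MI') AS plus_one_hour
     , TO_CHAR(t.dt + 20/1440, 'MM/DD/YYYY HH24:MI') AS plus_twenty_minutes
FROM (SELECT TO_DATE('20090323 19:42','YYYYMMDD HH24:MI') AS dt
      FROM DUAL) t;
```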

Perhaps a more common use is to add and subtract days from a date value in your WHERE clause. For example, we could rewrite
our query to change the BETWEEN range a bit, like this:

SELECT /*+ use_hash(ddo) */
    ddo.ORDER_ID
  , ddo.ORDER_DATETIME
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
  AND ddo.LEGAL_ENTITY_ID = 101
  AND ddo.DISTRIBUTOR_ID = 'RANDO'
  AND TRUNC(ddo.ORDER_DATETIME) BETWEEN TO_DATE('20090325','YYYYMMDD')-2
                                    AND TO_DATE('20090325','YYYYMMDD');

ORDER_ID   ORDER_DATETIME
U3517863   3/23/2009 19:41
T0475345   3/23/2009 19:41
R5273263   3/23/2009 19:42
M9119427   3/23/2009 19:42
M2666981   3/23/2009 20:26
N5183001   3/23/2009 20:27
P0618301   3/25/2009 3:03
M5969483   3/25/2009 11:14

Instead of the start of the range being 03/23/2009, we’ve made it 2 days prior to the date 03/25/2009. This may seem strange, but
we’ll see how that can be very helpful in just a minute, when we talk about the Run Date Wildcard available in ETL Manager.

Other Date functions
Although TO_CHAR(), TRUNC(), and TO_DATE() are probably the most commonly used DATE functions, SQL includes several more
that you may find useful. These include:

• ROUND( date , format ) – used to round a date up or down to the nearest day, month, year, etc.
• ADD_MONTHS( date , number of months ) – used to add (or subtract) months from a date
• LAST_DAY( date ) – used to determine the last day of the month the date falls in
• NEXT_DAY( date , weekday ) – used to find the date of the first specified weekday that falls after the given date
• MONTHS_BETWEEN( later date , earlier date ) – used to determine how many months are between two dates

Here are some examples of these functions in action:

SELECT /*+ use_hash(ddo) */
    ddo.ORDER_ID
  , ddo.ORDER_DAY
  , ROUND(ddo.ORDER_DAY,'D')
  , ADD_MONTHS(ddo.ORDER_DAY,-5)
  , LAST_DAY(ddo.ORDER_DAY)
  , NEXT_DAY(ddo.ORDER_DAY,'Friday')
  , MONTHS_BETWEEN(ddo.ORDER_DAY,TO_DATE('20090101','YYYYMMDD'))
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
  AND ddo.ORDER_ID = 'M5969483';

ORDER_ID   ORDER_DAY   ROUND()     ADD_MONTHS()   LAST_DAY    NEXT_DAY    MONTHS_BETWEEN
M5969483   3/25/2009   3/22/2009   10/25/2008     3/31/2009   3/27/2009   2.774193548

(We subtracted months using ADD_MONTHS and -5 as our number of months.) There’s more information on using these functions in
your text. You can also use standard aggregate functions such as COUNT(), MAX(), and MIN() on DATE fields.
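
As a sketch of aggregates on a date column, the query below reuses the RANDO filters from the BETWEEN examples to find the earliest and latest order timestamps and the number of orders in the range. The column aliases are made up for illustration.

```sql
-- Sketch only: MIN/MAX return dates; COUNT returns a number.
SELECT /*+ use_hash(ddo) */
    MIN(ddo.ORDER_DATETIME)   AS FIRST_ORDER_DATETIME
  , MAX(ddo.ORDER_DATETIME)   AS LAST_ORDER_DATETIME
  , COUNT(ddo.ORDER_DATETIME) AS ORDER_COUNT
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
  AND ddo.LEGAL_ENTITY_ID = 101
  AND ddo.DISTRIBUTOR_ID = 'RANDO'
  AND TRUNC(ddo.ORDER_DATETIME) BETWEEN TO_DATE('20090323','YYYYMMDD')
                                    AND TO_DATE('20090325','YYYYMMDD');
```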

Subqueries
• Subqueries
• Avoiding 1-to-many joins
Subqueries
A subquery is a whole SQL statement nested within another SQL statement, like a query within a query. Conceptually, the subquery
runs first and its results are held in memory temporarily, like a temporary table, and are then discarded when the full SQL statement
is done running. Subqueries can be placed in the FROM clause and incorporated into a JOIN, or (less commonly, due to efficiency
issues) in the WHERE clause to limit the results of the outer query. Here are examples of each:

FROM Clause JOIN to a Subquery:

SELECT /*+ use_hash(dma,ords) */
    dma.ASIN
  , dma.ITEM_NAME
  , ords.QUANTITY_SUBMITTED
FROM D_MP_ASINS_ESSENTIALS dma
JOIN (SELECT /*+ use_hash(doi) */
          doi.ISBN
        , doi.QUANTITY_SUBMITTED
      FROM d_distributor_order_items doi
      WHERE doi.REGION_ID = 1
        AND doi.ORDER_DAY = to_date('20090406','YYYYMMDD')
        AND doi.ORDER_ID = 'S2236807') ords
  ON dma.ASIN = ords.ISBN
WHERE dma.REGION_ID = 1
  AND dma.MARKETPLACE_ID = 1;

ASIN         ITEM_NAME                                          QUANTITY_SUBMITTED
037584726X   The Big Book of Princesses (Giant Coloring Book)   7
0789399903   Skylines: American Cities Yesterday and Today      1

WHERE Clause limit using a Subquery:

SELECT /*+ use_hash(dma) */
    dma.ASIN
  , dma.ITEM_NAME
FROM D_MP_ASINS_ESSENTIALS dma
WHERE dma.REGION_ID = 1
  AND dma.MARKETPLACE_ID = 1
  AND dma.ASIN IN (SELECT /*+ use_hash(doi) */
                       doi.ISBN
                   FROM d_distributor_order_items doi
                   WHERE doi.REGION_ID = 1
                     AND doi.ORDER_DAY = to_date('20090406','YYYYMMDD')
                     AND doi.ORDER_ID = 'S2236807');

ASIN         ITEM_NAME
037584726X   The Big Book of Princesses (Giant Coloring Book)
0789399903   Skylines: American Cities Yesterday and Today

Subqueries are just like any SQL statement, but are enclosed in parentheses within another query. I think of the results of that
subquery as a table – so when you JOIN to a subquery, you’ll alias it, like you would a table, because you’ll need to define the
columns from each table in the JOIN condition and you may want to return some of the columns from your subquery in your results.

Stepping back to our first example of a subquery in the FROM clause, we see that we’ve inserted a full SELECT/FROM/WHERE query,
enclosed in parentheses in the FROM clause, and inner JOINed to it to effectively limit the ASINs in the table
D_MP_ASINS_ESSENTIALS to only those that match to the ASINs returned by the subquery – namely the ASINs on PO S2236807.

SELECT /*+ use_hash(dma,ords) */
    dma.ASIN
  , dma.ITEM_NAME
  , ords.QUANTITY_SUBMITTED
FROM D_MP_ASINS_ESSENTIALS dma
JOIN (SELECT /*+ use_hash(doi) */
          doi.ISBN
        , doi.QUANTITY_SUBMITTED
      FROM d_distributor_order_items doi
      WHERE doi.REGION_ID = 1
        AND doi.ORDER_DAY = to_date('20090406','YYYYMMDD')
        AND doi.ORDER_ID = 'S2236807') ords
  ON dma.ASIN = ords.ISBN
WHERE dma.REGION_ID = 1
  AND dma.MARKETPLACE_ID = 1;

ASIN         ITEM_NAME                                          QUANTITY_SUBMITTED
037584726X   The Big Book of Princesses (Giant Coloring Book)   7
0789399903   Skylines: American Cities Yesterday and Today      1

The subquery could be run on its own, giving you the list of ASINs – which is the first thing that happens when the SQL statement
runs. It runs the subquery, and then stores the results like a temporary table.

SELECT /*+ use_hash(doi) */
    doi.ISBN
  , doi.QUANTITY_SUBMITTED
FROM d_distributor_order_items doi
WHERE doi.REGION_ID = 1
  AND doi.ORDER_DAY = to_date('20090406','YYYYMMDD')
  AND doi.ORDER_ID = 'S2236807';

ISBN         QUANTITY_SUBMITTED
0789399903   1
037584726X   7

Then the query JOINs the table D_MP_ASINS_ESSENTIALS to that temporary table, limiting the results because it’s an INNER join, but
also returning information from that temporary table – the QUANTITY_SUBMITTED.

Of course, you can also do an OUTER JOIN to a subquery, such as in this example:

SELECT /*+ use_hash(dma,ords) */
    dma.ASIN
  , dma.ITEM_NAME
  , ords.QUANTITY_SUBMITTED
FROM D_MP_ASINS_ESSENTIALS dma
LEFT JOIN (SELECT /*+ use_hash(doi) */
               doi.ISBN
             , doi.QUANTITY_SUBMITTED
           FROM d_distributor_order_items doi
           WHERE doi.REGION_ID = 1
             AND doi.ORDER_DAY = to_date('20090406','YYYYMMDD')
             AND doi.ORDER_ID = 'S2236807') ords
  ON dma.ASIN = ords.ISBN
WHERE dma.REGION_ID = 1
  AND dma.MARKETPLACE_ID = 1
  AND dma.ASIN IN ('037584726X','0789399903','0394873742');

ASIN         ITEM_NAME                                          QUANTITY_SUBMITTED
0789399903   Skylines: American Cities Yesterday and Today      1
037584726X   The Big Book of Princesses (Giant Coloring Book)   7
0394873742   Richard Scarry's Biggest Word Book Ever!

In this case, the LEFT OUTER JOIN returned every qualifying row from the left table (D_MP_ASINS_ESSENTIALS), along with values from
the table on the right (our subquery) where available. The third ASIN has no row in the subquery, so its QUANTITY_SUBMITTED is NULL.

Avoiding 1-to-many joins


One of the many uses of subqueries is to avoid 1-to-many joins – situations where the grain of one table is different than the grain of
another, which can result in errors. Here’s an example of a 1-to-many join that causes a problem, from data tables that hold
Problem Receive information.

In the table O_RECEIVE_PROBLEM_ITEMS, we find one record associated with RECEIVE_PROBLEM_ITEM_ID 5739750, which shows a
QUANTITY of 1 was received into Problem Receive for ASIN B00158THNW.

SELECT /*+ use_hash(rpi) */
    rpi.RECEIVE_PROBLEM_ITEM_ID
  , rpi.ASIN
  , rpi.QUANTITY
FROM O_RECEIVE_PROBLEM_ITEMS rpi
WHERE rpi.RECEIVE_PROBLEM_ITEM_ID IN (5739750);

RECEIVE_PROBLEM_ITEM_ID   ASIN         QUANTITY
5739750                   B00158THNW   1

And in the table O_RPI_PROBLEM_LIST, we find that there are two records associated with that same RECEIVE_PROBLEM_ITEM_ID,
one for each of the two problem types found to have occurred for that item.

SELECT /*+ use_hash(rpl) */
    rpl.RECEIVE_PROBLEM_ITEM_ID
  , rpl.RECEIVE_PROBLEM_TYPE
FROM O_RPI_PROBLEM_LIST rpl
WHERE rpl.RECEIVE_PROBLEM_ITEM_ID IN (5739750);

RECEIVE_PROBLEM_ITEM_ID   RECEIVE_PROBLEM_TYPE
5739750                   OVERAGE
5739750                   WRONG_DC

Based on these two queries, we know that the one unit of ASIN B00158THNW recorded as RPI ID 5739750 had two problems. It was
an OVERAGE on the PO and it was delivered to the WRONG_DC.

If we join the two tables, the one record in the first table is duplicated for each record in the second table, including QUANTITY:

SELECT /*+ use_hash(rpi,rpl) */
    rpi.RECEIVE_PROBLEM_ITEM_ID
  , rpi.ASIN
  , rpl.RECEIVE_PROBLEM_TYPE
  , rpi.QUANTITY
FROM O_RECEIVE_PROBLEM_ITEMS rpi
JOIN O_RPI_PROBLEM_LIST rpl
  ON rpi.RECEIVE_PROBLEM_ITEM_ID = rpl.RECEIVE_PROBLEM_ITEM_ID
 AND rpi.WAREHOUSE_ID = rpl.WAREHOUSE_ID
WHERE rpi.RECEIVE_PROBLEM_ITEM_ID IN (5739750);

RECEIVE_PROBLEM_ITEM_ID   ASIN         RECEIVE_PROBLEM_TYPE   QUANTITY
5739750                   B00158THNW   OVERAGE                1
5739750                   B00158THNW   WRONG_DC               1

Based on this data, one might think that there were 2 units that arrived, not 1. The problem gets even less obvious when we
aggregate the query, counting the RECEIVE_PROBLEM_TYPE and summing the QUANTITY from our results:

SELECT /*+ use_hash(rpi,rpl) */
    rpi.RECEIVE_PROBLEM_ITEM_ID
  , rpi.ASIN
  , COUNT(rpl.RECEIVE_PROBLEM_TYPE)
  , SUM(rpi.QUANTITY)
FROM O_RECEIVE_PROBLEM_ITEMS rpi
JOIN O_RPI_PROBLEM_LIST rpl
  ON rpi.RECEIVE_PROBLEM_ITEM_ID = rpl.RECEIVE_PROBLEM_ITEM_ID
 AND rpi.WAREHOUSE_ID = rpl.WAREHOUSE_ID
WHERE rpi.RECEIVE_PROBLEM_ITEM_ID IN (5739750)
GROUP BY rpi.RECEIVE_PROBLEM_ITEM_ID
       , rpi.ASIN;

RECEIVE_PROBLEM_ITEM_ID   ASIN         COUNT(RPL.RECEIVE_PROBLEM_TYPE)   SUM(RPI.QUANTITY)
5739750                   B00158THNW   2                                 2

One way we could get around this problem is to use a subquery to aggregate the results from the O_RPI_PROBLEM_LIST table first,
then join them to the O_RECEIVE_PROBLEM_ITEMS table:

SELECT /*+ use_hash(rpi,rpl2) */
    rpi.RECEIVE_PROBLEM_ITEM_ID
  , rpi.ASIN
  , rpl2.PROBLEM_COUNT
  , rpi.QUANTITY
FROM O_RECEIVE_PROBLEM_ITEMS rpi
JOIN (SELECT /*+ use_hash(rpl) */
          rpl.RECEIVE_PROBLEM_ITEM_ID
        , rpl.WAREHOUSE_ID
        , COUNT(rpl.RECEIVE_PROBLEM_TYPE) PROBLEM_COUNT
      FROM O_RPI_PROBLEM_LIST rpl
      WHERE rpl.RECEIVE_PROBLEM_ITEM_ID IN (5739750)
      GROUP BY rpl.RECEIVE_PROBLEM_ITEM_ID
             , rpl.WAREHOUSE_ID
     ) rpl2
  ON rpi.RECEIVE_PROBLEM_ITEM_ID = rpl2.RECEIVE_PROBLEM_ITEM_ID
 AND rpi.WAREHOUSE_ID = rpl2.WAREHOUSE_ID
WHERE rpi.RECEIVE_PROBLEM_ITEM_ID IN (5739750);

RECEIVE_PROBLEM_ITEM_ID   ASIN         PROBLEM_COUNT   QUANTITY
5739750                   B00158THNW   2               1

Now we get the proper results, showing the quantity of 1 unit, with 2 problems.

Notice that we aliased the column COUNT(rpl.RECEIVE_PROBLEM_TYPE) to PROBLEM_COUNT in our subquery, then referred to the
column by its alias in the outer query. Because the subquery is executed by Oracle first, and the results are saved as a table that’s
then used for the outer query, any column aliases in the subquery are now the column names of that temporary table, and that’s
how you must refer to them in the outer query.

This type of subquery is something that can be used any time you have two tables with a different grain of data that you need to
join, such as when you want to join PO ASIN information from D_DISTRIBUTOR_ORDER_ITEMS to PO ASIN Shipment information
from D_DISTRIBUTOR_SHIPMENT_ITEMS.
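
The Problem Receive example can be reproduced in miniature without access to the warehouse. The sketch below is a stand-in built on assumptions: it creates two tiny inline tables from Oracle's DUAL table, with shortened, hypothetical column names (ITEM_ID, PROBLEM_TYPE, TOTAL_QUANTITY), then joins them both ways.

```sql
-- Naive 1-to-many join: the single item row repeats once per problem row,
-- so the summed quantity comes back as 2 even though only 1 unit arrived.
SELECT rpi.ITEM_ID
     , COUNT(rpl.PROBLEM_TYPE) AS PROBLEM_COUNT   -- 2
     , SUM(rpi.QUANTITY)       AS TOTAL_QUANTITY  -- 2 (overstated)
FROM (SELECT 5739750 AS ITEM_ID, 1 AS QUANTITY FROM DUAL) rpi
JOIN (SELECT 5739750 AS ITEM_ID, 'OVERAGE' AS PROBLEM_TYPE FROM DUAL
      UNION ALL
      SELECT 5739750, 'WRONG_DC' FROM DUAL) rpl
  ON rpi.ITEM_ID = rpl.ITEM_ID
GROUP BY rpi.ITEM_ID;

-- Aggregating the "many" side first reduces it to one row per item,
-- so the join becomes 1-to-1 and QUANTITY stays at 1.
SELECT rpi.ITEM_ID
     , rpl2.PROBLEM_COUNT   -- 2
     , rpi.QUANTITY         -- 1
FROM (SELECT 5739750 AS ITEM_ID, 1 AS QUANTITY FROM DUAL) rpi
JOIN (SELECT rpl.ITEM_ID
           , COUNT(rpl.PROBLEM_TYPE) AS PROBLEM_COUNT
      FROM (SELECT 5739750 AS ITEM_ID, 'OVERAGE' AS PROBLEM_TYPE FROM DUAL
            UNION ALL
            SELECT 5739750, 'WRONG_DC' FROM DUAL) rpl
      GROUP BY rpl.ITEM_ID) rpl2
  ON rpi.ITEM_ID = rpl2.ITEM_ID;
```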

DECODE & CASE

• The DECODE() Function
• The CASE Function

The DECODE() Function


DECODE() is one of the SQL functions that provides If-Then functionality. It’s essentially a way to translate, or decode, the values
in a column to another value. The format is DECODE(A,B,C,D), which works as: if A is equal to B, then return C; otherwise,
return D. It’s very much like the Excel function =IF(A=B,C,D).

The first value (A) is generally a column in one of the tables you’re querying, and B is a value that would be found in that column of
that table. C is what you want that value translated to in your results, and D is what you want returned in that column of your
results if A doesn’t match B. The B and C spots in the function can be repeated, giving you the ability to translate any of several
values in a single column to new values in your results (e.g. DECODE(A,B1,C1,B2,C2,B3,C3,D) ).
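
The pattern DECODE(A,B1,C1,B2,C2,B3,C3,D) can be tried on literal values before pointing it at a real column. This is a minimal sketch assuming Oracle's DUAL table; the input numbers and aliases are made up, and the codes reuse the Order Type pairs mentioned in this section.

```sql
-- The first argument is compared to each search value in turn.
SELECT DECODE(9, 0,'DS', 9,'LA', 17,'NP', 'Unknown') AS matched_second_pair  -- LA
     , DECODE(5, 0,'DS', 9,'LA', 17,'NP', 'Unknown') AS fell_to_default      -- Unknown
     , DECODE(5, 0,'DS', 9,'LA', 17,'NP')            AS no_default_given     -- NULL
FROM DUAL;
```

When no default (the D spot) is supplied and nothing matches, DECODE() returns NULL.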

One example is the need to translate Order Type numbers to Order Type codes – such as translating the number 17 to NP and 9 to
LA – because PO Order Type is stored in all the key tables (e.g. D_DISTRIBUTOR_ORDERS) as a number. There is a table in the Data
Warehouse that translates the number to text, but it doesn’t translate it to the two-character code folks are familiar with:

SELECT /*+ use_hash(vot) */
    vot.VENDOR_ORDER_TYPE
  , vot.VENDOR_ORDER_TYPE_DESC
FROM VENDOR_ORDER_TYPES vot
WHERE vot.VENDOR_ORDER_TYPE IN (0,4);

VENDOR_ORDER_TYPE   VENDOR_ORDER_TYPE_DESC
0                   None Specified / Distributor O
2                   Special Order

As the sample above shows, the table includes a description, which isn’t always clear. For example, ‘Pubdirect Order’ is the
Advantage Order Type, and ‘None Specified/Distributor O’ is actually DS. Most folks seem to talk about these in terms of Order Type
Codes (like DS and LA), so it can be very useful to translate to those values when you run your queries. You can use DECODE() to do
this:

SELECT /*+ use_hash(vot) */
    vot.VENDOR_ORDER_TYPE
  , DECODE(vot.VENDOR_ORDER_TYPE,0,'DS',NULL)
FROM VENDOR_ORDER_TYPES vot
WHERE vot.VENDOR_ORDER_TYPE IN (0,4);

VENDOR_ORDER_TYPE   DECODE(VOT.VENDOR_ORDER_TYPE,0
0                   DS
4

Here we’ve decoded the VENDOR_ORDER_TYPE column, and anytime the value in that column is 0, we return ‘DS’ as the result,
otherwise it returns NULL. So for 0 we get DS, and for 4 we get a null returned.

Translating one value is useful, but DECODE() can be used for multiple values, allowing you to specify what you want returned for
each. Here’s an example where we are decoding multiple values (0, 2, and 4) to what we want to see returned (DS, SP, and PD):

SELECT /*+ use_hash(vot) */
    vot.VENDOR_ORDER_TYPE
  , DECODE(vot.VENDOR_ORDER_TYPE,0,'DS',2,'SP',4,'PD','Unknown')
FROM VENDOR_ORDER_TYPES vot
WHERE vot.VENDOR_ORDER_TYPE IN (0,2,3,4);

VENDOR_ORDER_TYPE   DECODE(VOT.VENDOR_ORDER_TYPE,0
0                   DS
2                   SP
3                   Unknown
4                   PD

In this example, the DECODE is translating the values in the column vot.VENDOR_ORDER_TYPE. When it finds a value in that column
that’s equal to 0, it returns the text string ‘DS’. When it finds 2, it returns ‘SP’. When it finds 4, it returns ‘PD’, and if it finds anything
else (3 in this example) it returns ‘Unknown’.

The values returned can be text (as in the examples above), a number, or even another column. For example, we could change the
‘Unknown’ value in the above query to the VENDOR_ORDER_TYPE_DESC column:

SELECT /*+ use_hash(vot) */
    vot.VENDOR_ORDER_TYPE
  , DECODE(vot.VENDOR_ORDER_TYPE,0,'DS',2,'SP',4,'PD',vot.VENDOR_ORDER_TYPE_DESC)
FROM VENDOR_ORDER_TYPES vot
WHERE vot.VENDOR_ORDER_TYPE IN (0,2,3,4);

VENDOR_ORDER_TYPE   DECODE(VOT.VENDOR_ORDER_TYPE,0
0                   DS
2                   SP
3                   Publisher Order
4                   PD

Instead of Unknown, we get the value of the column VENDOR_ORDER_TYPE_DESC for any row whose order type doesn’t match one of the
values we’ve already defined in the DECODE() statement – in this case, order type number 3.

You can keep adding pairs of values to translate various values, up to about 125 pairs. For example, here’s the full decode to
translate the numbers to the code for most of the current Order Type values:

SELECT /*+ use_hash(vot) */
    vot.VENDOR_ORDER_TYPE
  , DECODE(vot.VENDOR_ORDER_TYPE
          , 0,'DS', 1,'OP', 2,'SP', 3,'PB', 4,'PD', 6,'SU', 7,'IS', 8,'MS', 9,'LA'
          ,10,'LB',11,'LC',12,'LD',13,'SA',14,'SB',15,'SC',16,'SD',17,'NP',18,'RE',19,'VP',20,'MU'
          ,21,'T1',22,'T2',23,'T3',24,'B1',25,'B2',26,'B3',27,'M1',28,'M2',29,'M3',30,'R1',31,'R2',32,'R3'
          ,33,'PT',34,'DR',35,'MX'
          , vot.VENDOR_ORDER_TYPE) AS ORDER_TYPE
FROM VENDOR_ORDER_TYPES vot;

The CASE Function
The CASE function is similar to DECODE, but with more advanced options. With CASE, you can evaluate not just if a column is equal
to a value, but if an expression is true, and return your results depending on whether or not that expression is true.

Here’s the same example we explored with DECODE above, but using CASE:

SELECT /*+ use_hash(vot) */
    vot.VENDOR_ORDER_TYPE
  , CASE WHEN vot.VENDOR_ORDER_TYPE = 0 THEN 'DS'
         WHEN vot.VENDOR_ORDER_TYPE = 2 THEN 'SP'
         WHEN vot.VENDOR_ORDER_TYPE = 4 THEN 'PD'
         ELSE 'Unknown' END
FROM VENDOR_ORDER_TYPES vot
WHERE vot.VENDOR_ORDER_TYPE IN (0,2,3,4);

VENDOR_ORDER_TYPE   CASEWHENVOT.VENDOR_ORDER_TYPE=
0                   DS
2                   SP
3                   Unknown
4                   PD

In this example, we again evaluated the vot.VENDOR_ORDER_TYPE column, using the equal operator to see if it was equal to various
values. This is functionally identical to what DECODE does, just in a different way.

CASE really shows its value when you use other types of operators (rather than equals), or when it’s evaluating multiple columns.

In the example below, we evaluate the DEAL_CODE column to see if it’s NULL, and if it’s not NULL, return ‘Deal Buy’. If it is NULL,
then we move on to the next WHEN/THEN combo, which checks the ORDER_TYPE column to see if it’s a 9, in which case it returns
‘LA’, and so on.

SELECT /*+ use_hash(ddo) */
    ddo.ORDER_ID
  , ddo.DEAL_CODE
  , ddo.ORDER_TYPE
  , CASE WHEN ddo.DEAL_CODE IS NOT NULL THEN 'Deal Buy'
         WHEN ddo.ORDER_TYPE = 9 THEN 'LA'
         WHEN ddo.ORDER_TYPE = 2 THEN 'SP'
         ELSE NULL END
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
  AND ddo.ORDER_ID IN ('L3937793','B3074533','Q9166581');

ORDER_ID   DEAL_CODE     ORDER_TYPE   CASEWHENDDO.DEAL_CODEISNOTNULL
B3074533                 2            SP
L3937793                 9            LA
Q9166581   D0000001069   9            Deal Buy

It’s important to note that the first WHEN/THEN combination in a CASE statement is the first that’s evaluated, and if it’s true, the
following WHEN/THEN combinations aren’t evaluated, even if they’re true. In the example above, the first evaluation found that the
DEAL_CODE column was NOT NULL for the third record, so it returned ‘Deal Buy’ and stopped evaluating the rest of the CASE
statement. So even though the ORDER_TYPE was 9 (the second WHEN/THEN combination), because the previous WHEN/THEN was
true, the CASE statement stopped. So the order you enter your WHEN/THEN combinations in a CASE statement can impact your
results.
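
Both points above, that CASE can test conditions other than equality and that the first true WHEN wins, can be seen together in a small sketch. It assumes Oracle's DUAL table; the quantities, thresholds, and aliases are invented for illustration.

```sql
-- Illustration only: three made-up quantities built from DUAL.
SELECT t.qty
     , CASE WHEN t.qty >= 100 THEN 'Large'
            WHEN t.qty >= 10  THEN 'Medium'
            ELSE 'Small' END AS size_band
FROM (SELECT 7 AS qty FROM DUAL
      UNION ALL
      SELECT 25 FROM DUAL
      UNION ALL
      SELECT 250 FROM DUAL) t;
-- 7 -> Small, 25 -> Medium, 250 -> Large.
-- If the two WHEN lines were swapped, 250 would come back as 'Medium',
-- because qty >= 10 would be tested first and is also true for 250.
```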

The NVL2 Function
Back in Week X we discussed the NVL() function, which translates any Null values to whatever you specify, and leaves Non-Null
values as is. A related but slightly more powerful function is NVL2(). NVL2() gives you the option of translating the Non-Null values
to something else, too.

The format is NVL2(A,B,C) – where A is the column or element to evaluate, B is what to return if it’s NOT Null, and C is what to return
if it IS Null.

For example, we might want to return a ‘Y’ if we find a Non-Null and an ‘N’ if we find a Null, as when we’re defining which
ASINs are Textbooks:

SELECT /*+ use_hash(dma,dmma) */
    dma.ASIN
  , dma.ITEM_NAME
  , dmma.TEXTBOOK_TYPE
  , NVL2(dmma.TEXTBOOK_TYPE,'Y','N') AS IS_TEXTBOOK
FROM D_MP_ASINS_ESSENTIALS dma
LEFT JOIN D_MP_MEDIA_ASINS dmma
  ON dma.MARKETPLACE_ID = dmma.MARKETPLACE_ID
 AND dma.ASIN = dmma.ASIN
WHERE dma.REGION_ID = 1
  AND dma.MARKETPLACE_ID = 1
  AND dma.ASIN IN ('0596006322','B00167YLVA','B004GEB67C');

ASIN         ITEM_NAME                                                                      TEXTBOOK_TYPE   IS_TEXTBOOK
B004GEB67C   Beginning SQL Joes 2 Pros: (SQL Exam Prep Series 70-433 Volume 1 of 5) (DVD)                   N
B00167YLVA   Fiskars SQL-7312 Squeeze Paper Punch, Large, Comma, Comma, Chameleon                           N
0596006322   Mastering Oracle SQL, 2nd Edition                                              unknown         Y

Notice that a LEFT JOIN was used, because not all ASINs are found in the D_MP_MEDIA_ASINS table.
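
To see NVL() and NVL2() side by side, the sketch below feeds each a Null and a Non-Null literal. It assumes Oracle's DUAL table; the literal 'abc' and the aliases are made up for the example.

```sql
-- NVL2(A,B,C): B when A is not Null, C when A is Null.
-- NVL(A,B):    A when A is not Null, B when A is Null.
SELECT NVL2('abc', 'Y', 'N') AS nvl2_non_null  -- Y
     , NVL2(NULL,  'Y', 'N') AS nvl2_null      -- N
     , NVL('abc', 'N')       AS nvl_non_null   -- abc
     , NVL(NULL,  'N')       AS nvl_null       -- N
FROM DUAL;
```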

Data Type Consistency


When using functions like NVL(), DECODE(), CASE and NVL2() that convert values, it’s important to keep data types in mind (meaning
character strings, dates and numbers). These functions may fail if you mix data types in the outputs. For example, the query below
mixes numerical values (15+2) with text strings (‘N’) in the NVL2() function, resulting in an ORA-01722: invalid number error.

SELECT /*+ use_hash(dma,dmma) */
    dma.ASIN
  , dma.ITEM_NAME
  , dmma.TEXTBOOK_TYPE
  , NVL2(dmma.TEXTBOOK_TYPE,15+2,'N')
FROM D_MP_ASINS_ESSENTIALS dma
LEFT JOIN D_MP_MEDIA_ASINS dmma
  ON dma.MARKETPLACE_ID = dmma.MARKETPLACE_ID
 AND dma.ASIN = dmma.ASIN
WHERE dma.REGION_ID = 1
  AND dma.MARKETPLACE_ID = 1
  AND dma.ASIN IN ('0596006322','B00167YLVA','B004GEB67C');

So when using these helpful functions, be sure to keep their outputs all of the same data type.
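
One way to repair the failing expression, sketched here under the assumption that a text result is what you want, is to convert the number to a character string with TO_CHAR() so that both outputs are text. The example uses Oracle's DUAL table and made-up literals rather than the warehouse tables.

```sql
-- Both possible outputs are now character strings, so no number conversion is attempted.
SELECT NVL2('abc', TO_CHAR(15+2), 'N') AS non_null_input  -- 17
     , NVL2(NULL,  TO_CHAR(15+2), 'N') AS null_input      -- N
FROM DUAL;
```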
