You are on page 1of 180

S2:

Data Structures and SQL


Shawndra Hill Spring 2013 TR 1:30-3pm and 3-4:30


APIs

Phases in the DM Process: CRISP-DM


Business Understanding

Data Understanding Data PreparaIon Modeling EvaluaIon Deployment

www.crisp-dm.org
4

CRoss Industry Standard Process-DM


Business Understanding: Understanding project objecIves and data mining problem idenIcaIon Data Understanding: Capturing, understand, explore your data for quality issues Data Prepara2on: Data cleaning, merge data, derive aUributes etc. Modeling: Select the data mining techniques, build the model Evalua2on: Evaluate the results and approved models Deployment: Put models into pracIce, monitoring and maintenance
5

Data Structures
A data structure is a scheme for organizing data in the memory of a computer. Some of the more commonly used data structures include lists, arrays, stacks, queues, heaps, trees, and graphs.
Binary Tree

Data Structures
The way in which the data is organized aects the performance of a program for dierent tasks. Computer programmers decide which data structures to use based on the nature of the data and the processes that need to be performed on that data.

Binary Tree

Example: A Queue
A queue is an example of commonly used simple data structure. A queue has beginning and end, called the front and back of the queue. Data enters the queue at one end and leaves at the other. Because of this, data exits the queue in the same order in which it enters the queue, like people in a checkout line at a supermarket.

Example: A Binary Tree


A binary tree is another commonly used data structure. It is organized like an upside down tree. Each spot on the tree, called a node, holds an item of data along with a leY pointer and a right pointer.

Binary Tree

Example: A Binary Tree


The pointers are lined up so that the structure forms the upside down tree, with a single node at the top, called the root node, and branches increasing on the leY and right as you go down the tree.
Binary Tree

Choosing Data Structures


By comparing the queue with the binary tree, you can see how the structure of the data aects what can be done eciently with the data.

Choosing Data Structures


A queue is a good data structure to use for storing things that need to be kept in order, such as a set of documents waiIng to be printed on a network printer.

Choosing Data Structures


The jobs will be printed in the order in which they are received. Most network print servers maintain such a print queue. .

Choosing Data Structures


A binary tree is a good data structure to use for searching sorted data. The middle item from the list is stored in the root node, with lesser items to the leY and greater items to the right.

Choosing Data Structures


A search begins at the root. The computer either nd the data, or moves leY or right, depending on the value for which you are searching. Each move down the tree cuts the remaining data in half.

Choosing Data Structures


Items can be located very quickly in a tree. Telephone directory assistance informaIon is stored in a tree, so that a name and phone number can be found quickly.

Lecture Outline
See the Access Database File (Lab1_Database_BBB.accdb) for this lecture on the course website ! Relational Databases
Tables Columns Rows

Queries
Query Design View Structured Query Language (SQL View)
Selecting rows meeting a condition Algebraic operations Text matching (with LIKE) Computing summary figures Joining multiple tables

Inserting, updating, and deleting data


17

Databases: Overview
Overview what databases are, and what a database management system is. Present key concepts and vocabulary. Demonstrate Microsoft Access and the BBB database.
BBB: Bookbinders Book Club We are using a reduced version, of a real database, for class discussion Full version exists

18

Rationale
Why do we have databases?
Because data are valuable and essential for conduct of business Because data are created in huge volumes, and need to be retained (on disk) and efficiently accessed

Why do we have DBMSs?


To manage the huge amounts of data and to facilitate use of these data

Our focus will be on extracting data from databases in useful ways. a.k.a. querying.
19

RelaIonal Databases
Relational databases
Databases that use a series of logically related two-dimensional tables to store their information Tables are comprised of fields/records, which in turn contain field values

Field
Last Name SS#

Student
DOB 06/11/84 1/1/85 12/31/81 3/3/88

Table
Major IS FIN MKT ACC
20

Record

Smith Kim

100201122 200202222 300201232 999132212

Field value

Davis Pat

RelaIonal Databases
Relational Database Tables Records Fields Field values Bytes, bits
Field
Last Name SS# 100201122 200202222 300201232 999132212

Student
DOB 06/11/84 1/1/85 12/31/81 3/3/88

Table
Major IS FIN MKT ACC
21

Record

Smith Kim

Field value

Davis Pat

Why do we need relaIonal databases?


Problems arise when data is stored in one big file

Order# Date

Customer ID Last Name First Name Address

ISBN

Book Name

Author

Price

1 9/1/03 C1001 2 9/2/03 C1004 3 9/3/03 C1002 4 9/4/03 C1003 5 9/5/03 C1003 6 9/6/03 C1001 7 9/7/03 C1002 8 9/8/03 C1001

Bezos Sproull Student Gates Gates Bezos Student Bezos

Jeff Lee Pat Bill Bill Jeff Pat Jeff

1 Amazon Plaza Dean's Office, Stern School, New York

#0465039138 Code and other laws of cyberspace Lessig, Lawrence Digital Copyright: Protecting #1573928895 Intellectual Property on the Internet Litman, Jessica Haag, Stephen Barabasi, AlbertLaszlo Rheingold, Howard Rheingold, Howard Litman, Jessica Rheingold, Howard

$25.00 $55.00 $98.75 $34.95 $29.95 $29.95 $55.00 $29.95

Tisch LC-12, New York #0072952849 MIS in the Information Age Microsoft Corporation, Redmond Microsoft Corporation, Redmond Linked: The New Science of #0738206679 Networks

Smart Mobs: The Next Social #0738206083 Revolution Smart Mobs: The Next Social 1 Amazon Plaza #0738206083 Revolution Digital Copyright: Protecting Tisch LC-12, New York #1573928895 Intellectual Property on the Internet Smart Mobs: The Next Social 1 Amazon Plaza #0738206083 Revolution

22

Insertion anomalies

Anomalies in un-normalized data


Inability to insert a piece of information about an object that can exist independently of another object without having to insert a piece of information about the other object Example: Adding a new customer/book before it is ordered

Deletion anomalies
The loss of a piece of information about one object when a piece of information about a different object is deleted Example: Deleting an order => deleting a customer/book

Update anomalies
A need to change the same piece of information about an object multiple times Example: Changing Bill Gates address
23

A normalized version of Amazon s data


Normalization
The process of assuring that a database can be implemented effectively as a set of two-dimensional tables Prevents insertion, deletion and update anomalies

24

Benefits of Normalization
Greater overall database organizaIon Minimize data redundancies Data consistency within the database A more exible database design Data can be used more producIvely A beUer handle on database security

Disadvantage of Normalization
Reduced database performance because database must locate requested tables and join data - requires additional processing logic A lot of Planning goes in to the design of a database

Primary keys

ConnecIng tables together

A field (or group of fields in some cases) that uniquely describes each record in a table Examples: Customer ID, ISBN, Order#

Foreign keys
A field that is a primary key in one table and appears in a different table (though not as the primary key) Examples: Customer ID in Orders

Integrity constraints
rules (most built in) that help ensure the quality of the information. Not NULL
26

Concepts & Terminology


Database: an organized collection of information about entities Sales, business transactions, personnel, inventory, products, Entities have attributes. e.g. Person entity has attributes: Last Name, First Name, IDnumber, Database management system (DBMS): a computerized recordkeeping system. DBMS: software for managing (creating, modifying, viewing, retrieving information, etc.) from databases. Note: We focus on Relational DBMSs (RDBMs), the kind most widely used in business
27

Let s Look at Microsoft Access


Microsoft Access: A RDBMS, part of the Microsoft Office Professional suite, bundled with Word, Excel, PowerPoint, NOT an industrial strength DBMS Very useful for small applications, data analysis, decision making Works well with Excel; easy to use

28

ACCDB files
Access 2007/2010 files end in .accdb Previous versions ended in .mdb One database = one .accdb file Access handles one database file at a time. Each database consists of a number of tables. In other databases, you might find one file per table, rather than just one file per database. You will notice a temporary .laccdb file created whenever you open your Access database: this is a lock file that prevents problems when there is concurrent access by multiple users.

29

Industrial Strength Databases


Different brands of RDBMSs:
Commercial providers: Oracle, Microsoft SQL Server, DB2, Sybase, Informix, Open Source (free): MySQL, PostgreSql, All share a common view of things: tables All may be interacted with using SQL Knowledge of one directly transfers to all

30

Launch Microsoft Access


Open Lab1_Database_BBB.accdb The database window: your portal into the database

There are 6 tables in this database. Double-click a table to open it.

31

Relationships between tables


To explore the relationships between tables Go to Database Tools tab Click on Relationships You will see an overview of tables and their interconnections (see next slide)

32

Relationships between Tables

Primary key for Product table is ProdNum

33

Concepts & Terminology


Tables and Relationships In an RDBMS, all records (all data) are kept in tables (a.k.a. relations).
A relation = A table A relationship = A cross-reference between tables

The database consists of tables, each table having many rows (a.k.a. records or tuples). The 'Relationships View' of the database shows the tables and their attributes (columns), and the crossreferences between tables.
34

Structure vs Content
of a Database Table To view or edit the structure of a table:
Right-click it, then choose Design View or Table Design

To view or edit the contents of a table:


Double-click it, or Right-click it, then choose Open

35

Structure of a Table
Design View of Customers Table (ACCTNUM is Key)

Primary key for Customer table is AcctNum


36

Concepts & Terminology


Design View of a Single Table (e.g. Customers) Every table is made-up of attributes Every attribute has a data type Every table has a primary key: one or more attributes whose values uniquely identify each table record. Primary keys are shown with a key icon in Design View

37

Contents
of a Database Table If in Design View, and you want to view or edit the contents of the table:
Choose Datasheet View from the Design tab

38

Data in the Table


A Portion of the Customer Table

Records (rows) Attributes (columns) Fields (row elements or "cells").

39

Design View of Purchase


A two-attribute primary key: ACCTNUM and CAMPNUM Details of primary key assignment and database design are beyond the scope of this course. Typically, though, it is best to create a single, system generated (auto-incremented) column, that uniquely identifies each row.

40

Indexes
to improve performance
To improve the performance of your database queries, you should create indexes on any columns that are regularly looked up. Creating an index is easy: simply indicate Indexed as Yes in the settings for the column in Design view.

41

Key Points
Tables: hold all data in a database Rows = records Columns = attributes Attributes: have a data type (number, text, date, etc.) Fields: at the intersection of a row and a column Primary Keys: attributes whose values uniquely refer to any given row in a table

42

Single-Table Queries: Overview


Querying single tables:
Intuitions Microsoft Access Query Design View SQL Microsoft SQL-details

Database design intuitions


(Why split one table into many small ones)

43

Querying single tables


Table structure
Attributes (columns) Records (rows) Column order and row order do not matter

Queries subdividing a table


Pick the table you want to look at Pick the column(s) you wish to see Pick the row(s) you wish to see

44

Querying single tables


BBB Customers: What data do we have about customers
Name (first, middle, last) Address (street num, street, city, state, zip) Demographics (gender, money non-book purchases)

What do we want to know

45

Querying single tables: Intuitions


Names of all male customers
ACCTNUM 13015 13016 13017 13018 13971 14200 17640 17641 17642 17643 17644 17645 17646 19630 19631 19632 19633 19634 19635 23088 23089 23090 23360 23361 FIRSTNAME DENNIS LOIS JANET ANN SAMUEL EARL AUDREY DOROTHY SANDRA GERALD BETTY LYNN MARY CATHERINE EDITH KIMBERLY MARILYN JOSEPH ROBERT JAMES PATRICIA MARY CATHERINE HARRIET MIDDLEN AME FAMILYNAME N SKRYPEK T ENGLANDER SIECK A MARUZZI E HAYES J TUCHMAN S KAUFFMANN VACCHER M POSNOCK C CANNON K DEL HUEBBERS C TALLUTO R CASANO E WISCHHUSEN A SCHOENBERG FLAHERTY L FAIELLA A SANDLER M MILLER WERNER J MILLER J IMPERATORE A FAILLA STREET NUMBER STREET 1260 34th AV 105 9th ST 767 3rd ST 1948 10th ST 382 7th ST 1707 29th ST 1002 RICHMOND HILL AV 1751 17th AV 1016 MOON AV 1841 2nd AV 1794 MARIA ST 1357 14th AV 1861 2nd ST 965 7th ST 1520 1st ST 1081 18th ST 33 24th AV 1322 4th AV 316 18th ST 409 11th AV 1535 MARIS ST 379 5th ST 1671 24th ST 140 JAMES ST CITY PRT WASHINGTN WILLINGBORO FOXBORO TEWKSBURY CENTERVILLE LAKEWOOD WEST ORANGE BROOKLYN RIDGWAY LEXINGTN MULLICA ANDOVER NEPTUNE SCRANTON PITTSBURGH KINGSVILLE WARREN CLEVERDALE PHILADELPHIA SPRING LONG BEACH FRAMINGHAM BROOKLYN REHOBOTH STATE NY NJ MA MA PA NJ NJ NY PA MD NJ MA NJ PA PA MD NJ NY PA NY NY MA NY DE ZIP GENDER MONEY 11050 M 164 8046 F 254 2035 F 169 1876 F 65 16404 M 81 8701 M 135 7052 F 236 11234 M 68 15853 F 180 20653 M 159 8062 F 112 1810 F 41 7753 F 102 18503 F 209 15243 F 89 21087 F 262 7059 F 170 12820 M 200 19148 M 93 10977 M 63 11561 F 137 1701 F 50 11226 F 187 19971 F 190

46

Querying single tables: Intuitions


Names of all male customers

MIDDLE FIRSTNAME NAME DENNIS N SAMUEL E EARL J DOROTHY GERALD C JOSEPH L ROBERT A JAMES M

FAMILYNAME SKRYPEK HAYES TUCHMAN VACCHER CANNON FAIELLA SANDLER MILLER

47

Querying single tables: Intuitions


Query variations on single tables: 1. Pick attributes (columns) 2. Pick records (rows) specifying a single condition
(e.g. Pick all rows where Gender = 'M')

3. 4. 5. 6. 7. 8. 9.

Pick rows using two or more conditions Selecting rows with attributes you do not display Using string matches to pick rows Renaming an attribute (column) name Arithmetic operations in a query Sorting the output (ordering the rows) Mathematical aggregations in a query

48

Querying single tables:


Microsoft Access Query Design View
Go to the Create tab Choose either Query Wizard or Query Design Pick one table (for querying single tables) Pick the columns you want to retrieve, and specify the criteria for the rows you want to retrieve. Once in Query Design view, choose Design tab, then Run (! button on the toolbar) OR Datasheet View to see your query results.

49

Query Wizard

50

Query Wizard

Specify which table your data is in.

Press > button to select columns one at a time. Press >> button to select all columns.
51

Query Wizard

Choose Detail to retrieve individual rows that meet certain criteria. Choose Summary to compute aggregates (e.g. min, max, average, ..)
52

Query Wizard

Name your query, so that you can save it and view / modify it later
53

Query Design View

Select the Table you want to query, then press Add the button. Press Done button when finished.

54

Query Design View


Selecting Columns

Double-click the columns you want to include in your query, or drag-and-drop them from the table into the grid view at the bottom Double-click * to include them all

55

Query Design View


Selecting Columns

The grid view at the bottom shows you whether the column is visible (tick Show row). It also lets you specify filter conditions, using the Criteria row.
56

Query Design View


Selecting Columns

View the query results

Show tables, so you can add more columns to your query results.

Choose whether you want to: View data (SELECT) Insert a table (MAKE) Insert a row (APPEND) Edit a row (UPDATE) Produce aggregate reports (CROSSTAB) Remove rows of data (DELETE)

57

Viewing Saved Queries


Click on the down arrow to the right of the Tables heading in the database view Then choose Queries

58

Query Design View


Query 1: Picking attributes (columns)

Drag and drop

59

Query Design View


Viewing the query results

60

Query Design View


Saving the query
Right-click in the title bar for the Query Then choose Save from the pop-up menu Choose an intuitve and descriptive name (e.g. Names of our customers )

61

Viewing Saved Queries


After you save the SELECT query, it will look like this in the query list on the left

62

Query Results View


Switching back to the query design

63

Query Design View


Making a copy of a query
Right-click in the Query list on the left Choose Copy Right-click again, and choose Paste Choose an intuitive name for your new query

64

Query Design View


Query 2: Picking records (rows)
Drag and drop your criterion into the grid Specify the filter condition (e.g. = PA )

65

Query Design View


Query 2: Picking records (rows)
Now press Run or press DataSheet View to view your results Notice how the results only include customers who live in Pennsylvania (PA)

66

Query Design View


Query 3.1: Picking rows with two conditions - AND

67

Query Design View


Query 3.1: Picking rows with two conditions - AND
Here there are no records satisfying both conditions.

68

Query Design View


Query 3.2: Picking rows with two conditions - AND

69

Query Design View


Query 3.2: Picking rows with two conditions - AND
The ACCTNUM column is numeric, so specifying a string condition causes a data type mismatch error..

70

Query Design View


Query 3.3: Picking rows with two conditions - OR

71

Query Design View


Query 3.3: Picking rows with two conditions - OR

72

Query Design View


Query 4: Picking rows using attributes you do not display
Uncheck the Show box if you don t want to show the column in your results

73

Query Design View


Query 4: Picking rows using attributes you do not display

Notice that the customers Last Name is not shown, because we unchecked the Show box for Last Name in the Query Design View in the previous slide. 74

Query Design View


Query 5: Using string matching to pick rows
LIKE means match wildcards The wildcard * means match any number of characters So LIKE *IL* means find any strings that have IL anywhere inside them. That is, anything before IL and anything after. You can also use ? to match a single character In most other databases: use % instead of * use _ instead of ? You must use the keyword LIKE if you use a wildcard (e.g. * or ?). If you forget LIKE the database will look for an actual * or ? and you won t get the matches you wanted !!

75

Query Design View


Query 5: Using string matching to pick rows

Notice here we have all customers whose familyname has 'IL' in the middle of the string
76

Query Design View


Query 6: Renaming an attribute (column) name

[New column name] : [Old column name]

77

Query Design View


Query 6: Renaming an attribute (column) name

New column name appears in results, in place of old column name.

78

Query Design View


Query 7: Arithmetic operations in a query

Simply type the formula in the Field row. No loops or VBA code are necessary.

79

Query Design View


Query 7: Arithmetic operations in a query
SQL is called a declarative language, because we tell the database what we need, rather than how to get it, and the database gets the result in the most efficient way possible. There is no need to write loops. In contrast, VBA is a procedural language: we have to write loops to tell VBA exactly how to get the data we want. Procedural languages are far more verbose, and require more code to get the same job done.

80

Query Design View


Query 8: Sorting the output (ordering the rows)

81

Query Design View


Query 8: Sorting the output (ordering the rows)
Notice our results are now sorted by Tax, from highest to lowest

82

Query Design View


Query 9.1: Mathematical aggregations Average for all customers

Other aggregate functions: Min, Max, Count, See the Query Design Wizard.
83

Query Design View


Query 9.1: Mathematical aggregations Average for all customers

We can see the average Money for all customers in the data set

84

Query Design View


Query 9.2: Mathematical aggregations Average for a subset of customers

Aggregate function returns one row.

Non-Aggregate criterion returns multiple rows.

85

Query Design View


Query 9.2: Mathematical aggregations Average for a subset of customers

Aggregate function returns one row.

Non-Aggregate criterion returns multiple rows.

86

Query Design View


Query 9.2: Mathematical aggregations Average for a subset of customers

Solution: Do not show the non-aggregate filter condition column. That will fix the error.
87

Query Design View


Query 9.2: Mathematical aggregations Average for a subset of customers

We can see the average Money for all customers in Pennsylvania

88

Query Design View


Query 9.3: Mathematical aggregations Average for each group of customers

Sort by average money spent for the state.

Group customers by state.


89

Query Design View


Query 9.3: Mathematical aggregations Average for each group of customers
We can see the average money spent for customers in each state, and we ve sorted by average money so we can easily see which are most lucrative states are.

90

Query Design View


Building Complex Expressions
Right-click in the Field row. Then choose Build to build a more complex expression.

91

SQL
General query framework SQL is the language underlying Microsoft Access, and is the query language standard used by all major RDBMS

SELECT <pick attribute(s)>


in the order you want to see them

FROM <pick the table> [WHERE] <pick the row(s)>

92

Editing SQL
Just like "Editing a recorded VBA Macro : Write a query using Query Design View and then edit using SQL View or Open the SQL View from a new query and write in SQL from scratch

93

SQL View
Viewing the SQL underlying all your graphical query designs
For any query, you can view the SQL that Microsoft Access wrote for you, by simply choosing View and then SQL View. Editing the SQL directly is quicker and more powerful, so expert users tend to use SQL view, whereas novices tend to use the Graphical Design view.

94

SQL
Query 1: Picking attributes (columns)

SELECT ACCTNUM, FAMILYNAME, STATE FROM CUSTOMER SELECT STATE, ACCTNUM, FAMILYNAME FROM CUSTOMER

Attribute order does not matter

95

SQL
Query 2: Picking records (rows)

SELECT ACCTNUM, FAMILYNAME, STATE FROM CUSTOMER WHERE STATE = 'PA'


ACCTNUM FAMILYNAME 17642 POSNOCK 19630 CASANO 19631 WISCHHUSEN 19635 SANDLER 50670 KEYSER STATE PA PA PA PA PA ACCTNUM FAMILYNAME 19630 CASANO 50670 KEYSER 17642 POSNOCK 19635 SANDLER 19631 WISCHHUSEN STATE PA PA PA PA PA

Order of rows in output does not matter, and is not necessarily predictable
96

SQL
Query 3.1: Picking rows with two conditions - AND
SELECT ACCTNUM, FAMILYNAME, STATE FROM CUSTOMER WHERE STATE = 'PA' AND FAMILYNAME = 'MILLER' SQL is case sensitive within quotes Microsoft Access is NOT SELECT acctnum, familyname, state FROM customer WHERE state = 'PA' and familyname = 'Miller' SELECT acctnum, familyname, state FROM customer WHERE state = 'PA' AND familyname = 'MILLER'
97

SQL
Query 3.2: Picking rows with two conditions - AND Remember that all attributes have a datatype SELECT ACCTNUM, FAMILYNAME, STATE FROM CUSTOMER WHERE STATE = 'PA' AND ACCTNUM = 'MILLER'

ACCTNUM is a NUMBER MILLER is a STRING


98

SQL
Query 3.3: Picking rows with two conditions - OR

SELECT ACCTNUM, FAMILYNAME, STATE FROM CUSTOMER WHERE STATE = 'PA' OR FAMILYNAME = 'MILLER' SELECT ACCTNUM, FAMILYNAME, STATE FROM CUSTOMER WHERE STATE = 'PA' AND (FAMILYNAME = 'MILLER' OR FAMILYNAME = 'SMITH')

99

SQL
Query 4: Picking rows using attributes you do not display SELECT ACCTNUM, STATE FROM CUSTOMER WHERE STATE = 'PA' OR FAMILYNAME = 'MILLER'

100

SQL
Query 5: Using string matching to pick rows*
SELECT ACCTNUM, FAMILYNAME, STATE FROM CUSTOMER WHERE STATE = 'PA' AND FAMILYNAME LIKE '%IL%' SELECT ACCTNUM, FAMILYNAME, STATE FROM CUSTOMER WHERE STATE = 'PA' AND FAMILYNAME LIKE 'IL%' SELECT ACCTNUM, FAMILYNAME, STATE FROM CUSTOMER WHERE STATE = 'PA' AND FAMILYNAME LIKE '_IL'

% matches any number of characters (Use * in Access)

_ matches any one character (Use ? in Access)

Standard SQL notation shown above Microsoft Access notation differs as shown

SQL
Query 6: Renaming an attribute (column) name

SELECT ACCTNUM, FAMILYNAME AS Last_Name, STATE FROM CUSTOMER WHERE STATE = 'PA' AND FAMILYNAME LIKE '*IL*' SELECT ACCTNUM, FAMILYNAME AS [Last Name], STATE FROM CUSTOMER WHERE STATE = 'PA' AND FAMILYNAME LIKE '*IL*' Bracket notation to allow white-space is unique to Microsoft

102

SQL
Query 7: Arithmetic operations in a query

SELECT ACCTNUM, FAMILYNAME AS Last_Name, MONEY * 0.10 AS Taxed_Expense FROM CUSTOMER WHERE FAMILYNAME LIKE '*IL*' SELECT ACCTNUM, FAMILYNAME AS Last_Name, (MONEY + ACCTNUM)*1.09 AS Random_nonsense FROM CUSTOMER WHERE FAMILYNAME LIKE '*IL*'

103

SQL
Query 8: Sorting the output (ordering the rows)
SELECT ACCTNUM, FAMILYNAME AS Last_Name, MONEY * 1.09 AS Taxed_Expense FROM CUSTOMER You must use the original WHERE FAMILYNAME LIKE '%IL%' column name in your ORDER BY clause. ORDER BY FAMILYNAME ASC Use ASC for Ascending order or
DESC for Descending order.

SELECT ACCTNUM, FAMILYNAME AS Last_Name, MONEY * 1.09 AS Taxed_Expense FROM CUSTOMER If you use the new column name, it won t work. WHERE FAMILYNAME LIKE '%IL%' ORDER BY Last_Name DESC

104

SQL
Query 9.1: Mathematical aggregations Average for all customers
SELECT AVG(MONEY) AS Average_Expense FROM CUSTOMER

105

SQL
Query 9.2: Mathematical aggregations Average for a subset of customers

SELECT AVG(MONEY) AS Average_Expense, STATE FROM CUSTOMER WHERE STATE = "PA"

SELECT AVG(MONEY) AS Average_Expense, STATE FROM CUSTOMER

Be careful when selecting both aggregations and ordinary attributes


106

SQL
Query 9.2: Mathematical aggregations Average for a subset of customers

SELECT AVG(MONEY) AS Average_Expense, STATE FROM CUSTOMER WHERE STATE = "PA"

Be careful when selecting both aggregations and ordinary attributes

107

SQL
Query 9.3: Mathematical aggregations Average for each group of customers

SELECT Avg(CUSTOMER.[Money]) AS [Average Money], CUSTOMER.State FROM CUSTOMER GROUP BY CUSTOMER.State ORDER BY Avg(CUSTOMER.[Money]) DESC;

108

Microsoft Flavored SQL


Attribute names in the SELECT clause are prefaced by table name
(e.g. CUSTOMER.ACCTNUM)

Use of the * versus % in matching multiple characters in LIKE expression Use of ? Versus _ in String matching single character in LIKE expression Use of [ ] in renaming to allow white spaces e.g. CUSTOMER.FAMILYNAME AS [Last Name] Case sensitivity (Microsoft Access is case insensitive e.g. State = "PA" vs. State = "pa")
109

WARNING

Case Sensitivity in SQL


Text in Microsoft Access is not case-sensitive. However, many other databases are case-sensitive. To avoid problems with case-sensitivity in SQL queries, use the UPPER() function, to convert the text to uppercase, before using it !
SELECT * FROM CUSTOMER WHERE UPPER(STATE) = PA

110

Key Points
Querying is simple
Pick the table Pick the columns Pick the rows (using some simple or complex criteria)

Create queries just like VBA Macros


Begin in Query Design View; Edit in SQL View Write queries in SQL View from scratch

Nine (9) simple cases were given as examples. Be wary of exceptions (e.g. SELECT AVG(MONEY), STATE)
111

Multi-Table Queries: Overview


Database design intuitions
(Why split one table into many small ones)

Querying two tables


Intuitions SQL Microsoft Access Query Design View Microsoft SQL-details

Querying more than two tables


112

Laying out Data in a Single Spreadsheet Table


Sort by product
PRODNAME CAR MAINTANANCE CATNAME PRICE CAMPDATE CHANNAME DO-IT-YOURSELF 15 12/1/1987 2 MIN TV SPOT ESPN ACCT NUM QTY PURCHDATE 13971 1 1/21/1988 14200 1 1/8/1988 42763 1 1/13/1988 13018 1 1/26/1987 19635 1 1/9/1987 23361 1 1/22/1987 25508 1 1/26/1987 27028 1 1/20/1987 27259 1 1/22/1987 34652 1 1/23/1987 39403 1 1/18/1987 50670 1 1/26/1987 58775 1 1/11/1987 17646 1 3/8/1987 23088 1 3/20/1987 32698 1 3/18/1987 44985 1 3/16/1987 54690 1 3/20/1987

SECRETS OF FRENCH COOKING COOK

15

12/1/1986 "ROCK STARS" - MAG

2/1/1987 2 MIN TV SPOT ESPN

113

Laying out Data in a Single Spreadsheet Table


Sort by channel
PRODNAME CATNAME PRICE CAMPDATE CHANNAME SECRETS OF FRENCH COOKING COOK 15 12/1/1986 "ROCK STARS" - MAG CAR MAINTANANCE DO-IT-YOURSELF 15 12/1/1987 2 MIN TV SPOT ESPN 2/1/1987 2 MIN TV SPOT ESPN ACCT NUM QTY PURCHDATE 13018 1 1/26/1987 13971 1 1/21/1988 17646 1 3/8/1987 14200 1 1/8/1988 42763 1 1/13/1988 19635 1 1/9/1987 23361 1 1/22/1987 25508 1 1/26/1987 27028 1 1/20/1987 27259 1 1/22/1987 34652 1 1/23/1987 39403 1 1/18/1987 50670 1 1/26/1987 58775 1 1/11/1987 23088 1 3/20/1987 32698 1 3/18/1987 44985 1 3/16/1987 54690 1 3/20/1987

114

Laying out Data in a Single Spreadsheet Table


Filling out the table
PRODNAME CAR MAINTANANCE CAR MAINTANANCE CAR MAINTANANCE SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING SECRETS OF FRENCH COOKING CATNAME DO-IT-YOURSELF DO-IT-YOURSELF DO-IT-YOURSELF COOK COOK COOK COOK COOK COOK COOK COOK COOK COOK COOK COOK COOK COOK COOK PRICE CAMPDATE CHANNAME 15 12/1/1987 2 MIN TV SPOT ESPN 15 12/1/1987 2 MIN TV SPOT ESPN 15 12/1/1987 2 MIN TV SPOT ESPN 15 12/1/1986 "ROCK STARS" - MAG 15 12/1/1986 "ROCK STARS" - MAG 15 12/1/1986 "ROCK STARS" - MAG 15 12/1/1986 "ROCK STARS" - MAG 15 12/1/1986 "ROCK STARS" - MAG 15 12/1/1986 "ROCK STARS" - MAG 15 12/1/1986 "ROCK STARS" - MAG 15 12/1/1986 "ROCK STARS" - MAG 15 12/1/1986 "ROCK STARS" - MAG 15 12/1/1986 "ROCK STARS" - MAG 15 2/1/1987 2 MIN TV SPOT ESPN 15 2/1/1987 2 MIN TV SPOT ESPN 15 2/1/1987 2 MIN TV SPOT ESPN 15 2/1/1987 2 MIN TV SPOT ESPN 15 2/1/1987 2 MIN TV SPOT ESPN ACCT NUM QTY PURCHDATE 13971 1 1/21/1988 14200 1 1/8/1988 42763 1 1/13/1988 13018 1 1/26/1987 19635 1 1/9/1987 23361 1 1/22/1987 25508 1 1/26/1987 27028 1 1/20/1987 27259 1 1/22/1987 34652 1 1/23/1987 39403 1 1/18/1987 50670 1 1/26/1987 58775 1 1/11/1987 17646 1 3/8/1987 23088 1 3/20/1987 32698 1 3/18/1987 44985 1 3/16/1987 54690 1 3/20/1987

115

Database Design
Splitting tables Products
PROD CAT NUM PRODNAME NUM 1 ALICE IN WONDERLAND 1 2 PINNOCCHIO 1 21 SECRETS OF FRENCH COOKING 3 31 CAR MAINTANANCE 4 33 GARDENING 4

Campaign
CAMP PROD CHAN NUM NUM PRICE CAMPDATE NUM 21 21 15 12/1/1986 13 221 21 15 2/1/1987 3 201 1 15 2/1/1987 4

Category
CAT NUM 1 2 3 4 5 6 7 CATNAME CHILDREN YOUTH COOK DO-IT-YOURSELF REFERENCE ART GEOGRAPHY

Channels
CHAN NUM 3 4 6 13 CHANNAME 2 MINUTES SPOT - TV - ESPN (SPORT CHANNEL) 2 MINUTES SPOT - TV - MTV (MUSIC CHANNEL) ADVERTISING IN LOCAL NEWSPAPERS "ROCK STARS" - MAGAZINE

116

Database Design
Splitting tables (cont)
ACCT CAMP NUM NUM QTY PURCHDATE 13018 21 1 1/26/1987 19635 21 1 1/9/1987 23361 21 1 1/22/1987 25508 21 1 1/26/1987 27028 21 1 1/20/1987 27259 21 1 1/22/1987 34652 21 1 1/23/1987 39403 21 1 1/18/1987 50670 21 1 1/26/1987 58775 21 1 1/11/1987 17646 221 1 3/8/1987 23088 221 1 3/20/1987 32698 221 1 3/18/1987 44985 221 1 3/16/1987 54690 221 1 3/20/1987 13971 1231 1 1/21/1988 14200 1231 1 1/8/1988 42763 1231 1 1/13/1988
117

Purchase

Database Design
Splitting tables

Go to Database Tools table and click Relationships button to view this


118

Database Design
Splitting tables

Avoid repetition within individual tables Separate attributes that are used independently into separate tables Introduce (foreign) keys to link tables and keep any redundancy minimal/simple

119

Querying two tables


Table structure
Attributes (columns) Records (rows) Column order and row order do not matter

Queries combine and then subdivide


Pick two tables that you want to look at Combine the two tables (Cartesian product) Pick the row(s) you wish to see Pick the column(s) you wish to see

120

Querying two tables

BBB Customers and Purchases What data do we have about customers What data do we have about purchases
Who purchased (Acctnum) In response to what advertising campaign In what quantity When did they purchase

What do we want to know


e.g. Find responses by customers to campaigns

121

Querying two tables


Customer response (purchases) to campaigns
Purchase
ACCT CAMP NUM NUM QTY PURCHDATE 13018 21 1 1/26/1987 19635 21 1 1/9/1987 23361 21 1 1/22/1987

Step 1. Pick two tables

Customer (truncated to fit on the slide)


ACCT NUM FIRSTNAME 13015 DENNIS 19635 ROBERT 23361 HARRIET STATE ZIP GENDER MONEY NY 11050 M 164 PA 19148 M 93 DE 19971 F 190

122

Querying two tables


Customer response (purchases) to campaigns
Purchase
ACCT CAMP NUM NUM QTY PURCHDATE 13018 21 1 1/26/1987 19635 21 1 1/9/1987 23361 21 1 1/22/1987

Step 2. Combine the two tables


(Cartesian Product i.e. every row from first table combined, in turn, with every row from second)

Customer (truncated to fit on the slide)


ACCT NUM FIRSTNAME 13015 DENNIS 19635 ROBERT 23361 HARRIET STATE ZIP GENDER MONEY NY 11050 M 164 PA 19148 M 93 DE 19971 F 190

123

Querying two tables


Customer response (purchases) to campaigns
Purchase
ACCT CAMP NUM NUM QTY PURCHDATE 13018 21 1 1/26/1987 19635 21 1 1/9/1987 23361 21 1 1/22/1987

Customer

ACCT NUM FIRSTNAME 13015 DENNIS 19635 ROBERT 23361 HARRIET

STATE ZIP GENDER MONEY NY 11050 M 164 PA 19148 M 93 DE 19971 F 190

ACCT CAMP NUM NUM QTY PURCHDATE 13018 21 1 1/26/1987

ACCT NUM FIRSTNAME STATE ZIP GENDER MONEY 13015 DENNIS NY 11050 M 164

13018 13018

21 21

1 1

1/26/1987 1/26/1987

19635 ROBERT
23361 HARRIET

PA
DE

19148 M
19971 F

93
190

continue for all permutations of Purchase and Customer records


124

Querying two tables


Customer response (purchases) to campaigns
Purchase
ACCT CAMP NUM NUM QTY PURCHDATE 13018 21 1 1/26/1987 19635 21 1 1/9/1987 23361 21 1 1/22/1987

Customer

ACCT NUM FIRSTNAME 13015 DENNIS 19635 ROBERT 23361 HARRIET

STATE ZIP GENDER MONEY NY 11050 M 164 PA 19148 M 93 DE 19971 F 190

Step 3. Pick some rows:


Purchase.Acctnum = Customer.Acctnum

PURC. CUST. ACCT CAMP ACCT NUM NUM QTY PURCHDATE NUM FIRSTNAME 13018 21 1 1/26/1987 13015 DENNIS 13018 21 1 1/26/1987 19635 ROBERT 13018 21 1 1/26/1987 23361 HARRIET 19635 21 1 1/9/1987 13015 DENNIS 19635 21 1 1/9/1987 19635 ROBERT 19635 21 1 1/9/1987 23361 HARRIET 23361 21 1 1/22/1987 13015 DENNIS 23361 21 1 1/22/1987 19635 ROBERT 23361 21 1 1/22/1987 23361 HARRIET

STATE NY PA DE NY PA DE NY PA DE

ZIP 11050 19148 19971 11050 19148 19971 11050 19148 19971

GENDER MONEY M 164 M 93 F 190 M 164 M 93 F 190 M 164 M 93 F 190

125

Querying two tables


Customer response (purchases) to campaigns
PURC. CUST. ACCT CAMP ACCT NUM NUM QTY PURCHDATE NUM FIRSTNAME 13018 21 1 1/26/1987 13015 DENNIS 13018 21 1 1/26/1987 19635 ROBERT 13018 21 1 1/26/1987 23361 HARRIET 19635 21 1 1/9/1987 13015 DENNIS 19635 21 1 1/9/1987 19635 ROBERT 19635 21 1 1/9/1987 23361 HARRIET 23361 21 1 1/22/1987 13015 DENNIS 23361 21 1 1/22/1987 19635 ROBERT 23361 21 1 1/22/1987 23361 HARRIET

STATE NY PA DE NY PA DE NY PA DE

ZIP 11050 19148 19971 11050 19148 19971 11050 19148 19971

GENDER MONEY M 164 M 93 F 190 M 164 M 93 F 190 M 164 M 93 F 190

Step 4. Pick some columns

PURC. CUST. ACCT CAMP ACCT NUM NUM QTY PURCHDATE NUM FIRSTNAME STATE ZIP GENDER MONEY PA 19148 M 93 19635 21 1 1/9/1987 19635 ROBERT DE 19971 F 190 23361 21 1 1/22/1987 23361 HARRIET

126

Querying two tables: The final result


Customer response (purchases) to campaigns

CAMP NUM QTY 21 1 21 1

ACCT NUM GENDER 19635 M 23361 F

127

SQL
General JOIN framework: two tables

SELECT

<pick attribute(s)>

in the order you want to see them prefix attributes with tablename e.g. table1.attr1

FROM WHERE

<table1>, <table2> <pick the row(s)>

You must cross-reference (i.e. join) the related columns in the tables, using table1.columnA = table2.columnA

128

SQL
Customer responses to campaigns (i.e. customer purchases)
Notice how, unlike VBA, no line-continuation character is used in SQL

SELECT CAMPNUM, QUANTITY, CUSTOMER.ACCTNUM, GENDER FROM PURCHASE, CUSTOMER WHERE PURCHASE.ACCTNUM = CUSTOMER.ACCTNUM

129

SQL
Customer responses to campaigns (i.e. customer purchases)
If you make a spelling mistake in your query, you ll get an (error) message like the one shown below

130

SQL
A nonsense query for example purposes

SELECT CAMPNUM, PRICE, ACCTNUM FROM CAMPAIGN, CUSTOMER WHERE PRICE > MONEY OR FIRSTNAME LIKE 'ROB%'

131

Query Design View


Querying two tables Two cases using Microsoft Access Query Design View: 1. If both tables have a key attribute with the same name (e.g. PURCHASE and CUSTOMER both have an attribute called ACCTNUM), then Microsoft Access will automatically create the join. 2. In all other circumstances, you must manually specify the join:
the tables have attributes with the same name, but one or both are not designated as key attributes the key attributes in different tables do not have the same name the tables have no shared attributes whatsoever
132

Query Design View


Finding all purchases by customers: Picking our tables

Press Shift + Click to select multiple tables, and press the Add button. Press Close button when done.

133

Query Design View


Case 1: Key attribute with same name

Drag and drop your columns into the grid In this case there is no need to explicitly say: Customer.AcctNum = Purchase.Acctnum because Microsoft Access is intelligent enough to infer the relationship between the tables automatically

134

Query Design View: Results of Query


Case 1: Key attribute with same name

This result isn t very readable Campaign number doesn t mean much: Product name (from the Product table) would be more descriptive Account number doesn t mean much: Customer name (from the Customer table) would be more descriptive
135

SQL View
Case 1: Key attribute with same name
Microsoft Access generates some pretty nasty looking SQL:

You can actually simplify it to this:

136

Query Design View


Case 2: No shared attributes Finding all products purchased by customers

The Customer and Product tables are not directly related, so Microsoft Access is unable to infer the relationship.

137

Query Design View


Case 2: No shared attributes Finding all products purchased by customers
To find the relationship between Customers and Products: Go to Show Tables, then Select Purchase and Campaign tables, which link Customer to Product Press Add button Notice the relationships now visible

138

Querying more than two tables


Example 1: What products did each customer purchase?

139

Querying more than two tables


What products did each customer purchase?

140

Querying more than two tables


Following the Yellow Brick Road

141

Querying more than two tables


What products did each customer purchase?

142

Querying more than two tables


What products did each customer purchase?
Microsoft Access generates some pretty nasty looking SQL:

You can actually simplify it to this:

143

Querying more than two tables: Results


What products did each customer purchase?

To confirm you have the correct number of results, check that you have the same number of rows here as there are in the PURCHASES table. There were 19 purchases, so there should be 19 results here !
144

Querying more than two tables: Results What happens if you forget the join / cross-reference?

Notice that you get an impossible number of results if you forget the JOIN in the query: i.e. if you forget to specify which columns cross-reference ! There were 19 purchases, so there should be 19 results here !

145

Querying more than two tables


Example 2: What channels was each product advertised on ?

146

Querying more than two tables


What channels was each product advertised in ?

147

Querying more than two tables


What channels was each product advertised in ?

148

Querying more than two tables


What channels was each product advertised in ?

149

Querying more than two tables


What channels was each product advertised in ?

150

Querying more than two tables


What channels was each product advertised in ?

Microsoft Access generates some pretty nasty looking SQL:

You can actually simplify it to this:

151

Querying more than two tables


What channels was each product advertised in ?

The results of the query :

To confirm you have the correct number of results, check that you have the same number of rows here as there are in the CAMPAIGN table. There were 3 campaigns, so there should be 3 results here !
152

Querying more than two tables


Option 1: Create one giant table Create one giant table (Cartesian Product)
ABCD

Option 2: Combine tables pair-wise Associative: (A B) C = A (B C) Combine in stages. For three or more
A B (A B) C ((A B) C ) D

153

Key Points
Divide tables for ease in managing data Query two (or more) tables like one big table
Combine tables: Cartesian Product Put tables side-by-side: every row permutation Look for attributes that link tables (e.g. Acctnum in CUSTOMER and PURCHASES)

Create queries just like VBA Macros


Begin in Query Design View; Edit in SQL View, or Write queries in SQL View from scratch
154

Inserting, Updating, and Deleting Data


When you open Access, you must click Options on the message bar, and choose Enable this content, if you want SQL INSERTs, UPDATES, and DELETES to work!

155

Inserting Data
Use the INSERT statement to append new data.

Here are the various parts of the INSERT statement: INSERT INTO [table-name] ( [ column1, column2, ] ) VALUES ( [value for column1], [value for column2], );

156

Inserting Data
Use the INSERT statement to append new data.

157

Inserting Data
After you save the INSERT query, it will look like this in the query list on the left

158

Updating Data
Use the UPDATE statement to edit existing data.

Here are the various parts of the UPDATE statement: UPDATE [table-name] SET [column1] = [value], [column2] = [value], WHERE [column] = [value]
DON T FORGET YOUR WHERE CLAUSE !! IF YOU FORGET IT YOU COULD END UP MISTAKENLY UPDATING ALL DATA IN THE TABLE ! Access 2007 will give you a warning if you inadvertently update multiple rows, but, if you execute the UPDATE statement from Excel 2007, you ll get no such warning !!

159

Updating Data
Use the UPDATE statement to edit existing data.

160

Updating Data
After you save the UPDATE query, it will look like this in the query list on the left

NOTE: It is often bad practice to update data in a database. For example, rather than update a customer s address, you can add an event table e.g. a table called MOVE with a date the customer moved. That way you always have an audit log of when the customer moved, and you have not lost any data. You might, for instance, later want to count how many times the customer moved in the past 3 years, so you can compute a credit score for them.
161

Deleting Data
Use the DELETE statement to permanently remove a row of data.

Here are the various parts of the DELETE statement: DELETE FROM [table-name] WHERE [column] = [value]
DON T FORGET YOUR WHERE CLAUSE !! IF YOU FORGET IT YOU COULD END UP MISTAKENLY DELETING ALL DATA IN THE TABLE ! Access 2007 will give you a warning if you inadvertently delete multiple rows, but, if you execute the DELETE statement from Excel 2007, you ll get no such warning, and it will be impossible to recover the data you have deleted !!
162

Deleting Data
Use the DELETE statement to permanently remove a row of data.

163

Deleting Data
After you save the DELETE query, it will look like this in the query list on the left

NOTE: It is generally bad practice to delete data from a database. Rather, you want to add a column, and mark the data as inactive (include a column with the date it was marked inactive and a column with the user-ID of the user who marked it as inactive, if necessary). That way you always have an audit log of when data was deleted , and you can always recover the data later.
164

Obtaining Data via Email


Microsoft Access 2007 contains great new features to allow you to collect data via email. You need to have Microsoft Outlook 2007 installed and working. Click on External Data tab in Access 2007. Then click on Create Email.

165

Obtaining Data via Email


A wizard will help you through the process of composing your electronic data collection form.

166

Obtaining Data via Email


Recipients will need to have an HTML-capable email client, to be able to respond to your emails. Luckily, most people have one nowadays.

167

Obtaining Data via Email


Specify what you would like to do with the data.

168

Obtaining Data via Email


Specify which fields (database columns) you want to include in your email.

169

Obtaining Data via Email


Specify how you want to hand replies.

170

Obtaining Data via Email


Set automatic processing options. You can choose to process replies automatically, or to manually add replies to the database after you have checked each individually in Outlook.

171

Obtaining Data via Email


Select how you want to specify the email addresses of recipients.

172

Obtaining Data via Email


If your email addresses are in your database, specify whether the addresses are in the current table, or in a linked table.

173

Obtaining Data via Email


Specify the contents of the email you are sending.

174

Obtaining Data via Email


Confirm the email addresses of the message recipients.

175

Obtaining Data via Email


After sending your email, wait for some responses, then click Manage Replies to manage your responses.

176

Obtaining Data via Email


Under Manage Replies, you can choose to resend the email, if it wasn t sent successfully the first time.

177

Lab Exercise

Our next session will be a

Databases
lab exercise ROOM JMHH 380

178

CREDITS: Alan Abrahams, VA Tech

179

S2: Data Structures and SQL


Shawndra Hill Spring 2013 TR 1:30-3pm and 3-4:30