Customers Table
CustomerID *
Name
Address
Phone
SSN
Account Table
AccountNumber *
CustomerID
Transaction Table
TransactNumber
AccountNumber
Amount
2
Some Vocabulary
• Database – set of (related) tables
• Table – set of rows and columns
• Column – field, attribute
• Row – Tuple, observation, case
• RDBMS – Relational Database Management
System
3
Spreadsheets vs.
Database
Spreadsheets are best if:
– Data can be stored in a single datasheet without lots of redundancy
– You are doing calculations or making charts
– You don’t need to link several spreadsheets together to get the results you want
Databases are best if:
– Data are readily stored in multiple related tables
– You need multiple user access
– You want to be able to do complex manipulations with the data
– You want to develop data entry tools
4
Big Issue with
Spreadsheets
• Data integrity
– Internal record consistency is not maintained
– Updating more than one record
– Removing information
– Creating incomplete cells
ID  DeptName  DeptAddress     ContactName     ContactTitle    ContactPhone
2   Finance   110 5th Street  Ted Smith       Senior Analyst  555-1112
3   Benefits  118 5th Street  Brian Williams  Manager         555-3333
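The "updating more than one record" problem can be sketched in a few lines of Python. This is only an illustration: the second contact and the new address are hypothetical values added to mimic a flat sheet that repeats department details on every row, and the names `rows` and `addresses` are made up for the example.

```python
# Flat, spreadsheet-style rows: the department address is repeated on
# every row that belongs to the department.
rows = [
    {"ID": 2, "DeptName": "Finance", "DeptAddress": "110 5th Street",
     "ContactName": "Ted Smith"},
    {"ID": 2, "DeptName": "Finance", "DeptAddress": "110 5th Street",
     "ContactName": "Pat Jones"},  # hypothetical second contact
]

# Someone edits the address on one row only, which is easy to do by hand.
rows[0]["DeptAddress"] = "200 Main Street"  # hypothetical new address

# The same department now has two different addresses on record.
addresses = {r["DeptAddress"] for r in rows if r["DeptName"] == "Finance"}
print(len(addresses))  # 2
```

Storing the address once, in a separate departments table, would leave only one place to update.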
5
Relational Database Is…
• Collection of data organized into tables
• Each table contains records
• Each record contains the same set of
fields
• Tables may have relationships with
other tables
• Tools help you manage the table
relationships
6
Database Features
• Set of tables
• Explicit control over data (column)
types in tables
Date          Site         Height               Count
<dates only>  <text only>  <real numbers only>  <integers only>
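As a rough sketch of what explicit column types buy you, the example below uses Python's built-in sqlite3 module. SQLite is loosely typed by default, so this sketch assumes CHECK constraints as the enforcement mechanism; other systems enforce declared types directly. The table name `observations`, the column names `obs_date` and `obs_count`, and all data values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CHECK constraints stand in for strict column types in this sketch.
conn.execute("""
    CREATE TABLE observations (
        -- shape check only: YYYY-MM-DD
        obs_date  TEXT CHECK (obs_date GLOB
                  '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]'),
        site      TEXT    CHECK (typeof(site) = 'text'),
        height    REAL    CHECK (typeof(height) = 'real'),
        obs_count INTEGER CHECK (typeof(obs_count) = 'integer')
    )
""")

insert = "INSERT INTO observations VALUES (?, ?, ?, ?)"
conn.execute(insert, ("2010-06-01", "A", 1.5, 12))  # well-typed row

bad_rows = [
    ("June 1st", "A", 1.5, 12),        # not a date in the expected form
    ("2010-06-01", "A", "tall", 12),   # text in a real-number column
    ("2010-06-01", "A", 1.5, 3.7),     # fraction in an integers-only column
]
rejected = 0
for row in bad_rows:
    try:
        conn.execute(insert, row)
    except sqlite3.IntegrityError:
        rejected += 1

stored = conn.execute("SELECT COUNT(*) FROM observations").fetchone()[0]
print(stored, rejected)  # 1 3
```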
7
Database Features
• Relationships are defined
between tables
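[Figure: a table with columns Date, Site, Species, Height, Diameter, linked through its Site column to a table with columns Site, Latitude, Longitude; site values shown are A, B, C, D]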
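A defined relationship can be sketched with a foreign key in Python's sqlite3 module. The table names `sites` and `measurements`, the column name `obs_date`, and every data value below are assumptions for illustration; only the column layout follows the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce them by default
conn.executescript("""
    CREATE TABLE sites (
        site      TEXT PRIMARY KEY,
        latitude  REAL,
        longitude REAL
    );
    CREATE TABLE measurements (
        obs_date TEXT,
        site     TEXT REFERENCES sites(site),  -- the defined relationship
        species  TEXT,
        height   REAL,
        diameter REAL
    );
""")

conn.execute("INSERT INTO sites VALUES ('A', 42.4, -72.5)")  # made-up coordinates
conn.execute(
    "INSERT INTO measurements VALUES ('2010-06-01', 'A', 'oak', 12.0, 0.3)")

# A measurement at a site that is not in the sites table is refused.
try:
    conn.execute(
        "INSERT INTO measurements VALUES ('2010-06-01', 'Z', 'oak', 9.0, 0.2)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The relationship lets each measurement pick up its site's coordinates.
joined = conn.execute("""
    SELECT m.site, s.latitude
    FROM measurements AS m JOIN sites AS s ON s.site = m.site
""").fetchall()
print(orphan_rejected, joined)  # True [('A', 42.4)]
```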
8
Identify Data Needs
• Data Modeling
– Analyze data needs
– Visualize the data objects
• Questions to Ask
– Why do you want to collect the data?
– What do you want to do with it?
– Who else will be using the data, what do they
want from it?
9
3 Steps to Database
Design
1. Split Data Into Tables
– Normalization
– Each field must contain only one value
– Each field must have a unique name
10
Normalization
• Process of efficiently organizing data in a
database
11
Normalization Process
1st Normal Form
– Eliminate duplicate columns from the same
table
– Create separate tables for each group of
related data
– Identify each row with a unique column or
set of columns (the primary key)
12
First Normal: Eliminate
Duplicate
Columns and Assign Keys
Books
TITLE AUTHOR 1 AUTHOR 2 PUBLISHER ISBN QTY.
Ecology 101 Smith, A.B. Gordon, D.A. Univ. Press 4873895759 4324
Ecology for Dummies Doe, J. Wiley & Sons 0493802020 8998
Ecology and Politics Kim, J.B. McGraw-Hill 7482929292 900
Ecology and Modern Cinema Kim, C.B. Univ. Press 2234849302 1
Books
TITLE                      PUBLISHER     ISBN        QTY.
Ecology 101                Univ. Press   4873895759  4324
Ecology for Dummies        Wiley & Sons  0493802020  8998
Ecology and Politics       McGraw-Hill   7482929292  900
Ecology and Modern Cinema  Univ. Press   2234849302  1
Authors
Author
Smith, A.B.
Gordon, D.A.
Doe, J.
Kim, J.B.
Kim, C.B.
13
First Normal: Eliminate
Duplicate
Columns and Assign Keys
Books
TITLE AUTHOR 1 AUTHOR 2 PUBLISHER ISBN QTY.
Ecology 101 Smith, A.B. Gordon, D.A. Univ. Press 4873895759 4324
Ecology for Dummies Doe, J. Wiley & Sons 0493802020 8998
Ecology and Politics Kim, J.B. McGraw-Hill 7482929292 900
Ecology and Modern Cinema Kim, C.B. Univ. Press 2234849302 1
Primary
Key
Books
TITLE                      PUBLISHER     ISBN        QTY.
Ecology 101                Univ. Press   4873895759  4324
Ecology for Dummies        Wiley & Sons  0493802020  8998
Ecology and Politics       McGraw-Hill   7482929292  900
Ecology and Modern Cinema  Univ. Press   2234849302  1
Authors
Id  Author
0   Smith, A.B.
1   Gordon, D.A.
2   Doe, J.
3   Kim, J.B.
4   Kim, C.B.
14
First Normal: Eliminate
Duplicate
Columns and Assign Keys
Books
TITLE AUTHOR 1 AUTHOR 2 PUBLISHER ISBN QTY.
Ecology 101 Smith, A.B. Gordon, D.A. Univ. Press 4873895759 4324
Ecology for Dummies Doe, J. Wiley & Sons 0493802020 8998
Ecology and Politics Kim, J.B. McGraw-Hill 7482929292 900
Ecology and Modern Cinema Kim, C.B. Univ. Press 2234849302 1
Foreign Key
Books
TITLE                      PUBLISHER     ISBN        QTY.
Ecology 101                Univ. Press   4873895759  4324
Ecology for Dummies        Wiley & Sons  0493802020  8998
Ecology and Politics       McGraw-Hill   7482929292  900
Ecology and Modern Cinema  Univ. Press   2234849302  1
Authors
Id  ISBN        Author
0   4873895759  Smith, A.B.
1   4873895759  Gordon, D.A.
2   0493802020  Doe, J.
3   7482929292  Kim, J.B.
4   2234849302  Kim, C.B.
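The split shown on these slides can be sketched in plain Python. The data comes from the Books table above; the variable names are illustrative, and using `None` for an empty AUTHOR 2 cell is an assumption of this sketch.

```python
# Wide rows: (title, author 1, author 2, publisher, ISBN, quantity).
wide_books = [
    ("Ecology 101", "Smith, A.B.", "Gordon, D.A.", "Univ. Press", "4873895759", 4324),
    ("Ecology for Dummies", "Doe, J.", None, "Wiley & Sons", "0493802020", 8998),
    ("Ecology and Politics", "Kim, J.B.", None, "McGraw-Hill", "7482929292", 900),
    ("Ecology and Modern Cinema", "Kim, C.B.", None, "Univ. Press", "2234849302", 1),
]

books = []    # one row per book, identified by ISBN
authors = []  # one row per (book, author) pair, identified by a running Id

for title, author1, author2, publisher, isbn, qty in wide_books:
    books.append({"TITLE": title, "PUBLISHER": publisher,
                  "ISBN": isbn, "QTY": qty})
    for name in (author1, author2):
        if name is not None:  # skip empty AUTHOR 2 cells
            # ISBN is the foreign key pointing back to the Books table.
            authors.append({"Id": len(authors), "ISBN": isbn, "Author": name})

for row in authors:
    print(row["Id"], row["ISBN"], row["Author"])
```

A book with a third author would simply add one more row to `authors`, with no new column needed.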
15
First Normal Exercise
16
First Normal
Eliminate duplicate columns
Last   First  M.I.  Institution  Sector    Position 1     Position 2
Smith  Ann    A     SDSU         Academic  Professor      Community Liaison
Smith  Ann    Z     Acme Inc.    Private   Administrator  Field Technician
Kim    John   B     SDSU         Academic  P.I.           Data Manager
personnel
Pers_id  Last   First  M.I.  Institution  Sector
0        Smith  Ann    A     SDSU         Academic
1        Smith  Ann    Z     Acme Inc.    Private
2        Kim    John   B     SDSU         Academic
positions (Foreign Key: Pers_id)
id  Pers_id  Position
0   0        Professor
1   0        Community Liaison
2   1        Administrator
3   1        Field Technician
4   2        P.I.
5   2        Data Manager
17
Normalization Process
2nd Normal Form
– Meet all the requirements of the first normal
form
– Remove subsets of data that apply to multiple
rows of a table and place them in separate
tables
– Create relationships between these new tables
and their predecessors through the use of
foreign keys
18
Second Normal: Eliminate
Duplicate
Rows and Assign Keys
Books
TITLE                      PUBLISHER     ISBN        QTY.
Ecology 101                Univ. Press   4873895759  4324
Ecology for Dummies        Wiley & Sons  0493802020  8998
Ecology and Politics       McGraw-Hill   7482929292  900
Ecology and Modern Cinema  Univ. Press   2234849302  1
Authors
Id  ISBN        Author
0   4873895759  Smith, A.B.
1   4873895759  Gordon, D.A.
2   0493802020  Doe, J.
3   7482929292  Kim, J.B.
4   2234849302  Kim, C.B.
Books Publishers
19
Final Tables with Primary
and Foreign Keys
Books
TITLE        PUBLISHER_id  ISBN        QTY.
Ecology 101  0             4873895759  4324
Authors
Id  ISBN  Author
Publishers
Publisher_id  PUBLISHER
0             Univ. Press
1             Wiley & Sons
2             McGraw-Hill
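Moving the repeated publisher names into their own table can be sketched in plain Python. The book data is taken from the earlier slides; the helper dictionary `publisher_ids` is an illustrative name, and numbering publishers in order of first appearance is an assumption that happens to reproduce the ids shown above.

```python
books = [
    {"TITLE": "Ecology 101", "PUBLISHER": "Univ. Press",
     "ISBN": "4873895759", "QTY": 4324},
    {"TITLE": "Ecology for Dummies", "PUBLISHER": "Wiley & Sons",
     "ISBN": "0493802020", "QTY": 8998},
    {"TITLE": "Ecology and Politics", "PUBLISHER": "McGraw-Hill",
     "ISBN": "7482929292", "QTY": 900},
    {"TITLE": "Ecology and Modern Cinema", "PUBLISHER": "Univ. Press",
     "ISBN": "2234849302", "QTY": 1},
]

publisher_ids = {}  # publisher name -> Publisher_id, in order of first appearance

for book in books:
    name = book.pop("PUBLISHER")
    # Reuse the id if the publisher was seen before, otherwise assign the next one.
    book["PUBLISHER_id"] = publisher_ids.setdefault(name, len(publisher_ids))

publishers = [{"Publisher_id": pid, "PUBLISHER": name}
              for name, pid in publisher_ids.items()]

for row in publishers:
    print(row["Publisher_id"], row["PUBLISHER"])
```

"Univ. Press" is now stored once, and both of its books point to it through `PUBLISHER_id` 0.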
22
Second Normal Exercise
Pause for exercise
personnel
Pers_id  Last  First  M.I.  Institution  Sector
positions
id  pers_id  Position
4   2        P.I.
5   2        Data Manager
23
Second Normal
Eliminate duplicate rows
personnel
Pers_id  Last   First  M.I.  Institution  Sector
0        Smith  Ann    A     SDSU         Academic
1        Smith  Ann    Z     Acme Inc.    Private
2        Kim    John   B     SDSU         Academic
positions
id  pers_id  Position
0   0        Professor
1   0        Community Liaison
2   1        Administrator
3   1        Field Technician
4   2        P.I.
5   2        Data Manager
personnel
Pers_id  Last   First  M.I.  Institution_id
0        Smith  Ann    A     0
institutions
Institution_id  Institution  Sector
24
Final Tables with Primary
and Foreign Keys
personnel
Pers_id  Last   First  M.I.  Institution_id
0        Smith  Ann    A     0
1        Smith  Ann    Z     1
2        Kim    John   B     0
positions
id  pers_id  Position
0   0        Professor
1   0        Community Liaison
2   1        Administrator
3   1        Field Technician
4   2        P.I.
5   2        Data Manager
institutions
Institution_id  Institution  Sector
0               SDSU         Academic
25
Normalization Process
3rd Normal Form
– Meet all the requirements of the second
normal form
– Remove columns that are not dependent
upon the primary key
Table:
Schools
id schoolName schoolPhone dean deanEmail
1 Architecture 555-1111 Michelle Myers mm@university.edu
2 Arts & Sciences 555-2222 Ted Smith ts@university.edu
3 Education 555-3333 Bryan Williams bw@university.edu
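One way to read this example is that `deanEmail` describes the dean rather than the school, so it does not depend on the `id` key. Under that assumption, a third normal form split might be sketched as below; the `deans` table and the `dean_id` column are hypothetical names that do not appear on the slide.

```python
schools = [
    {"id": 1, "schoolName": "Architecture", "schoolPhone": "555-1111",
     "dean": "Michelle Myers", "deanEmail": "mm@university.edu"},
    {"id": 2, "schoolName": "Arts & Sciences", "schoolPhone": "555-2222",
     "dean": "Ted Smith", "deanEmail": "ts@university.edu"},
    {"id": 3, "schoolName": "Education", "schoolPhone": "555-3333",
     "dean": "Bryan Williams", "deanEmail": "bw@university.edu"},
]

deans = []     # new table: one row per dean
dean_ids = {}  # (dean, deanEmail) -> dean_id

for school in schools:
    key = (school.pop("dean"), school.pop("deanEmail"))
    if key not in dean_ids:
        dean_ids[key] = len(deans) + 1
        deans.append({"dean_id": dean_ids[key],
                      "dean": key[0], "deanEmail": key[1]})
    # The school keeps only a foreign key to the dean.
    school["dean_id"] = dean_ids[key]

for row in deans:
    print(row["dean_id"], row["dean"], row["deanEmail"])
```

After the split, a dean's e-mail address is stored in exactly one row, however many tables refer to that dean.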
26
Determine Data Types
Data Type Definition
Text 0-255 characters
Memo 0-64000 characters
Number Integer, long integer, single, double
27
Determine Data Types
Books
TITLE         Text(255)
PUBLISHER_id  Number(integer)
ISBN          Text(10)
QTY.          Number(integer)
Authors
Id      Number(integer)
ISBN    Text(10)
Author  Text(255)
Publishers
PUBLISHER_id  Number(integer)
PUBLISHER     Text(255)
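These type choices could be written down as table definitions. The sketch below uses Python's sqlite3 module and assumes SQLite, which does not enforce text lengths on its own, so the length limits are expressed as CHECK constraints. The lowercase column names and the choice of primary keys are assumptions of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publishers (
        publisher_id INTEGER PRIMARY KEY,
        publisher    TEXT CHECK (length(publisher) <= 255)
    );
    CREATE TABLE books (
        title        TEXT CHECK (length(title) <= 255),
        publisher_id INTEGER REFERENCES publishers(publisher_id),
        isbn         TEXT PRIMARY KEY CHECK (length(isbn) <= 10),
        qty          INTEGER
    );
    CREATE TABLE authors (
        id     INTEGER PRIMARY KEY,
        isbn   TEXT REFERENCES books(isbn),
        author TEXT CHECK (length(author) <= 255)
    );
""")

conn.execute("INSERT INTO publishers VALUES (1, 'Wiley & Sons')")
conn.execute(
    "INSERT INTO books VALUES ('Ecology for Dummies', 1, '0493802020', 8998)")

# An ISBN is an identifier, not a quantity: stored as text it keeps its leading zero.
isbn = conn.execute("SELECT isbn FROM books").fetchone()[0]
print(isbn)  # 0493802020
```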
28
Determine Data Types
Exercise
Pause for exercise
29
Determine Data Types
Personnel
Pers_id         Number(integer)
Last            Text(255)
First           Text(255)
M.I.            Text(1)
Institution_id  Number(integer)
Institutions
Institution_id  Number(integer)
Institution     Text(255)
Sector          Text(25)
Positions
id        Number(integer)
pers_id   Number(integer)
position  Text(200)
30
Identify the Relationships
• How the tables are “related” to each other
– One-to-one
– One-to-many
– Many-to-many
31
Identify the Relationships
Books
TITLE                      PUBLISHER_id  ISBN        QTY.
Ecology 101                0             4873895759  4324
Ecology for Dummies        1             0493802020  8998
Ecology and Politics       2             7482929292  900
Ecology and Modern Cinema  0             2234849302  1
Authors
Id  ISBN        Author
0   4873895759  Smith, A.B.
1   4873895759  Gordon, D.A.
2   0493802020  Doe, J.
3   7482929292  Kim, J.B.
4   2234849302  Kim, C.B.
Publishers
PUBLISHER_id  PUBLISHER
0             Harcourt Brace
1             Wiley & Sons
2             McGraw-Hill
Relationships:
Books to Authors: 1 to many
Publishers to Books: 1 to many
32
Identify the Relationships
personnel
pers_id  Last   First  M.I.  Institution_id
0        Smith  Ann    A     0
positions
id  pers_id  Position
0   0        Professor
3   1        Field Technician
4   2        P.I.
5   2        Data Manager
institutions
Institution_id  Institution  Sector
0               SDSU         Academic
1               Acme Inc.    Private
contact_address
pers_id  street        city       state
0        523 Main St.  Amherst    MA
1        1010 Sea St.  San Diego  CA
2        99 Ridge Way  Portland   ME
Pause for Exercise
33
Identify the Relationships
personnel
pers_id  Last   First  M.I.  Institution_id
0        Smith  Ann    A     0
positions
id  pers_id  Position
0   0        Professor
3   1        Field Technician
4   2        P.I.
5   2        Data Manager
institutions
Institution_id  Institution  Sector
0               SDSU         Academic
1               Acme Inc.    Private
contact_address
pers_id  street        city       state
0        523 Main St.  Amherst    MA
1        1010 Sea St.  San Diego  CA
2        99 Ridge Way  Portland   ME
Relationships:
personnel to positions: 1 to many
institutions to personnel: 1 to many
personnel to contact_address: 1 to 1
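The difference between a 1 to many and a 1 to 1 relationship can be sketched as key constraints, here with Python's sqlite3 module. The column names `last_name`, `first_name`, and `mi` are renamed for the sketch, and the second address is a hypothetical value used only to trigger the constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE institutions (
        institution_id INTEGER PRIMARY KEY,
        institution    TEXT,
        sector         TEXT
    );
    CREATE TABLE personnel (
        pers_id        INTEGER PRIMARY KEY,
        last_name      TEXT,
        first_name     TEXT,
        mi             TEXT,
        institution_id INTEGER REFERENCES institutions(institution_id)
    );
    -- 1 to many: pers_id may repeat, so one person can hold several positions.
    CREATE TABLE positions (
        id       INTEGER PRIMARY KEY,
        pers_id  INTEGER REFERENCES personnel(pers_id),
        position TEXT
    );
    -- 1 to 1: pers_id is also the primary key, so each person has at most one row.
    CREATE TABLE contact_address (
        pers_id INTEGER PRIMARY KEY REFERENCES personnel(pers_id),
        street  TEXT,
        city    TEXT,
        state   TEXT
    );
    INSERT INTO institutions VALUES (0, 'SDSU', 'Academic');
    INSERT INTO personnel VALUES (0, 'Smith', 'Ann', 'A', 0);
    INSERT INTO positions VALUES (0, 0, 'Professor'), (1, 0, 'Community Liaison');
    INSERT INTO contact_address VALUES (0, '523 Main St.', 'Amherst', 'MA');
""")

n_positions = conn.execute(
    "SELECT COUNT(*) FROM positions WHERE pers_id = 0").fetchone()[0]

try:
    # Hypothetical second address for the same person.
    conn.execute("INSERT INTO contact_address VALUES (0, '1 Elm St.', 'Boston', 'MA')")
    second_address_allowed = True
except sqlite3.IntegrityError:
    second_address_allowed = False

print(n_positions, second_address_allowed)  # 2 False
```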
34
Identify the Relationships
35
Relational Database
Functions
• Organize data – reduce or eliminate redundancy
• Improve data quality – reject “bad” data
– Wrong type of data
– Only good “codes” allowed
• Retrieve data – query/search/select
• Sort data
• Update data
• Output – link to other software with statistical and
graphical functionality
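Several of these functions can be sketched together with Python's sqlite3 module. The list of allowed sector codes is an assumption (only 'Academic' and 'Private' appear in the earlier examples), and the rejected row and the renamed institution are hypothetical values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE institutions (
        institution_id INTEGER PRIMARY KEY,
        institution    TEXT NOT NULL,
        -- only good "codes" allowed (assumed list)
        sector         TEXT CHECK (sector IN ('Academic', 'Private'))
    )
""")
conn.executemany(
    "INSERT INTO institutions VALUES (?, ?, ?)",
    [(0, "SDSU", "Academic"), (1, "Acme Inc.", "Private")],
)

# Improve data quality: a misspelled code is rejected.
try:
    conn.execute("INSERT INTO institutions VALUES (2, 'Example Org', 'Acad.')")
    bad_code_rejected = False
except sqlite3.IntegrityError:
    bad_code_rejected = True

# Retrieve and sort.
names = [row[0] for row in conn.execute(
    "SELECT institution FROM institutions ORDER BY institution")]

# Update one record in place.
conn.execute(
    "UPDATE institutions SET institution = 'Acme Incorporated' "
    "WHERE institution_id = 1")
updated = conn.execute(
    "SELECT institution FROM institutions WHERE institution_id = 1").fetchone()[0]

print(bad_code_rejected, names, updated)
```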
36
Database Management
Systems
• FileMaker Pro
– Client/Server architecture
• Microsoft Access (SQL Server)
– Good for single users
– Sophisticated user interface
• MySQL
– Client/Server architecture
– Limited user interface (phpMyAdmin)
• Oracle
37
Excel vs. Access
Excel
– Optimized for data analysis & calculations
– Limited sorting
– Good for performing complex calculations, exploring possible outcomes, and producing high quality charts
– Record limit is 1,048,576
Access
– Used to collect, manipulate & sort data
– Sort by different selections
– Maintain data integrity
– Good for managing data
– Easy to create data entry forms
38
Database Systems
Access
• Access is a workstation-based, single-user application
• Platform dependent
• Cannot be accessed concurrently
• No security other than that of the workstation
• Part of the MS Office suite, not free
MySQL
• MySQL is cross-platform and supports multi-user access
• Accessible to more users through the web, a client program, or other
admin tools (via authentication)
• Can be integrated with Web Server (web programming languages)
• Data available remotely
• Free, open-source
39
More Information
• Database Normalization Basics
– http://databases.about.com/od/specificproducts/a/normalization.htm
• Step-by-step Guides to Using Databases
– http://www.geekgirls.com/menu_databases.htm
• Interactive Online SQL Training
– http://www.sqlcourse.com/
• Comparison between Access & Excel
– http://office.microsoft.com/en-us/access/HA102101951033.aspx
40
Mailing List Subscription
41