Sie sind auf Seite 1von 41

Introduction to Databases for

Managing Research Data

CustomerID * AccountNumber*
Name CustomerID TransactNumber
Address AccountNumber Amount
Phone
SSN

Account Table
Transaction Table
Customers Table

* Denotes Primary Key


Sherry Lake Bill Corey
Data Management Consultant Data Management Consultant
University of Virginia Library University of Virginia Library
shLake@virginia.edu wtc2h@virginia.edu

© 2013 by the Rector and Visitors of the University of Virginia.


This work is made available under the terms of the Creative Commons Attribution-ShareAlike 4.0
International license http://creativecommons.org/licenses/by-sa/4.0/
Goals for the Workshop
• Understand differences between Spreadsheets
and Databases
• Learn the features of databases
• Learn the 3 steps of database design
• Gain peer and expert feedback

2
Some Vocabulary
• Database – set of (related) tables
• Table – set of rows and columns
• Column – field, attribute
• Row – Tuple, observation, case
• RDBMS – Relational Database Management
System

3
Spreadsheets vs.
Database
Spreadsheets are best if: Databases are best if:
– Data can be stored in a single – Data are readily stored in
datasheet without lots of multiple related tables
redundancy – You need multiple user
– You are doing calculations or access
making charts – You want to be able to do
– Don’t need to link several complex manipulations with
spreadsheets together to get the data
the results you want – You want to develop data
entry tools

4
Big Issue with
Spreadsheets
• Data integrity
– Internal record consistency is not maintained
– Updating more than one record
– Removing information
– Creating incomplete cells
ID DeptName DeptAddress ContactName ContactTitle ContactPhone

1 Finance 110 5th Street Michael Jones Manager 555-1111

2
2 Finance
Finance 110
110 5th
5th Street
Street Ted
Ted Smith
Smith Senior
Senior Analyst
Analyst 555-1112
555-1112

3 Benefits
3 118 5th Street Brian Williams Manager 555-3333

5
Relational Database Is…
• Collection of data organized into tables
• Each table contains records
• Each record identifies the same set of
fields
• Tables may have relationships with
another table
• Tools help you manage the table
relationships

6
Database Features
• Set of tables
• Explicit control over data (column)
types in tables
Date Site Height Count
<dates only> <text < real numbers only> < integers only>
only>

7
Database Features
• Relationships are defined
between tables
Date Sit Species Height Diamete
e r
A
Site Latitude Longitude
B
A
A
B
C
C
D

8
Identify Data Needs
• Data Modeling
– Analyze data needs
– Visualize the data objects

• Questions to Ask
– Why do you want to collect the data?
– What do you want to do with it?
– Who else will be using the data, what do they
want from it?

9
3 Steps to Database
Design
1. Split Data Into Tables
– Normalization
– Each field must contain only one value
– Each field must have a unique name

2. Determine Data Type for Each Column


3. Identify Relationships Between the Tables
– No two records can be identical

10
Normalization
• Process of efficiently organizing data in a
database

• Goals of the normalization process:


– eliminating redundant data (for example,
storing the same data in more than one table)
– ensuring data dependencies make sense (only
storing related data in a table)

11
Normalization Process
1st Normal Form
– Eliminate duplicate columns from the same
table
– Create separate tables for each group of
related data
– Identify each row with a unique column or
set of columns (the primary key)

12
First Normal: Eliminate
Duplicate
Columns and Assign Keys
Books
TITLE AUTHOR 1 AUTHOR 2 PUBLISHER ISBN QTY.
Ecology 101 Smith, A.B. Gordon, D.A. Univ. Press 4873895759 4324
Ecology for Dummies Doe, J. Wiley & Sons 0493802020 8998
Ecology and Politics Kim, J.B. McGraw-Hill 7482929292 900
Ecology and Modern Cinema Kim, C.B. Univ. Press 2234849302 1

Books Authors
Author
TITLE PUBLISHER ISBN QTY.
Smith, A.B.
Ecology 101 Univ. Press 4873895759 4324
Gordon, D.A.
Ecology for Dummies Wiley & Sons 0493802020 8998
Doe, J.
Ecology and Politics McGraw-Hill 7482929292 900
Kim, J.B.
Ecology and Modern Cinema Univ. Press 2234849302 1
Kim, C.B.

13
First Normal: Eliminate
Duplicate
Columns and Assign Keys
Books
TITLE AUTHOR 1 AUTHOR 2 PUBLISHER ISBN QTY.
Ecology 101 Smith, A.B. Gordon, D.A. Univ. Press 4873895759 4324
Ecology for Dummies Doe, J. Wiley & Sons 0493802020 8998
Ecology and Politics Kim, J.B. McGraw-Hill 7482929292 900
Ecology and Modern Cinema Kim, C.B. Univ. Press 2234849302 1

Primary
Key
Books Authors
TITLE PUBLISHER ISBN QTY. Id Author
Ecology 101 Univ. Press 4873895759 4324 0 Smith, A.B.
Ecology for Dummies Wiley & Sons 0493802020 8998 1 Gordon, D.A.
Ecology and Politics McGraw-Hill 7482929292 900 2 Doe, J.
Ecology and Modern Cinema Univ. Press 2234849302 1 3 Kim, J.B.
4 Kim, C.B.

14
First Normal: Eliminate
Duplicate
Columns and Assign Keys
Books
TITLE AUTHOR 1 AUTHOR 2 PUBLISHER ISBN QTY.
Ecology 101 Smith, A.B. Gordon, D.A. Univ. Press 4873895759 4324
Ecology for Dummies Doe, J. Wiley & Sons 0493802020 8998
Ecology and Politics Kim, J.B. McGraw-Hill 7482929292 900
Ecology and Modern Cinema Kim, C.B. Univ. Press 2234849302 1

Foreign Key
Books Authors
TITLE PUBLISHER ISBN QTY.
Id ISBN Author
Ecology 101 Univ. Press 4873895759 4324
0 4873895759 Smith, A.B.
Ecology for Dummies Wiley & Sons 0493802020 8998 1 4873895759 Gordon, D.A.
Ecology and Politics McGraw-Hill 7482929292 900 2 0493802020 Doe, J.
Ecology and Modern Cinema Univ. Press 2234849302 1 3 7482929292 Kim, J.B.
4 2234849302 Kim, C.B.

15
First Normal Exercise

Pause for exercise


Personnel
Last First M.I Institutio Sector Position 1 Position 2
. n
Smith Ann A SDSU Academi Professor Community
c Liaison
Smith Ann Z Acme Inc. Private Administrator Field Technician
Kim John B SDSU Academi P.I. Data Manager
c

16
First Normal
Eliminate duplicate columns
Last First M.I Institutio Sector Position 1 Position 2
. n
Smith Ann A SDSU Academi Professor Community
c Liaison
Smith Ann Z Acme Inc. Private Administrator Field Technician
Kim John B SDSU Academi P.I. Data Manager
c positions Foreign Key
personnel
id Pers_id Position
Pers Last First M.I. Institution Sector 0 0 Professor
_id
1 0 Community Liaison
0 Smith Ann A SDSU Academi
c 2 1 Administrator
1 Smith Ann Z Acme Inc. Private
3 1 Field Technician
2 Kim John B SDSU Academi
c 4 2 P.I.

5 2 Data Manager

17
Normalization Process
2nd Normal Form
– Meet all the requirements of the first normal
form
– Remove subsets of data that apply to multiple
rows of a table and place them in separate
tables
– Create relationships between these new tables
and their predecessors through the use of
foreign keys

18
Second Normal: Eliminate
Duplicate
Rows and Assign Keys
Books Authors
Id ISBN Author
TITLE PUBLISHER ISBN QTY.
0 4873895759 Smith, A.B.
Ecology 101 Univ. Press 4873895759 4324 1 4873895759 Gordon, D.A.
Ecology for Dummies Wiley & Sons 0493802020 8998 2 0493802020 Doe, J.
Ecology and Politics McGraw-Hill 7482929292 900 3 7482929292 Kim, J.B.
Ecology and Modern Cinema Univ. Press 2234849302 1 4 2234849302 Kim, C.B.

Books Publishers

TITLE ISBN QTY. PUBLISHER


Ecology 101 4873895759 4324 Univ. Press
Ecology for Dummies 0493802020 8998 Wiley & Sons

Ecology and Politics 7482929292 900 McGraw-Hill

Ecology and Modern Cinema 2234849302 1

19
Second Normal: Eliminate
Duplicate
Rows and Assign Keys
Books Authors
Id ISBN Author
TITLE PUBLISHER ISBN QTY.
0 4873895759 Smith, A.B.
Ecology 101 Univ. Press 4873895759 4324 1 4873895759 Gordon, D.A.
Ecology for Dummies Wiley & Sons 0493802020 8998 2 0493802020 Doe, J.
Ecology and Politics McGraw-Hill 7482929292 900 3 7482929292 Kim, J.B.
Ecology and Modern Cinema Univ. Press 2234849302 1 4 2234849302 Kim, C.B.

Books Publishers

TITLE ISBN QTY. Publisher_id PUBLISHER


Ecology 101 4873895759 4324 0 Univ. Press
Ecology for Dummies 0493802020 8998 1 Wiley & Sons

Ecology and Politics 7482929292 900 2 McGraw-Hill

Ecology and Modern Cinema 2234849302 1

20
Second Normal: Eliminate
Duplicate
Rows and Assign Keys
Books Authors
Id ISBN Author
TITLE PUBLISHER ISBN QTY.
0 4873895759 Smith, A.B.
Ecology 101 Univ. Press 4873895759 4324 1 4873895759 Gordon, D.A.
Ecology for Dummies Wiley & Sons 0493802020 8998 2 0493802020 Doe, J.
Ecology and Politics McGraw-Hill 7482929292 900 3 7482929292 Kim, J.B.
Ecology and Modern Cinema Univ. Press 2234849302 1 4 2234849302 Kim, C.B.

Books Publishers

TITLE PUBLISHER_id ISBN QTY. Publisher_id PUBLISHER


Ecology 101 0 4873895759 4324 0 Univ. Press
Ecology for Dummies 1 0493802020 8998 1 Wiley & Sons

Ecology and Politics 2 7482929292 900 2 McGraw-Hill

Ecology and Modern Cinema 0 2234849302 1

21
Final Tables with Primary
and Foreign Keys
Books Authors
TITLE PUBLISHER_id ISBN QTY.
Ecology 101 0 4873895759 4324 Id ISBN Author

Ecology for Dummies 1 0493802020 8998 0 4873895759 Smith, A.B.

Ecology and Politics 2 7482929292 900 1 4873895759 Gordon, D.A.

Ecology and Modern Cinema 0 2234849302 1 2 0493802020 Doe, J.


3 7482929292 Kim, J.B.
4 2234849302 Kim, C.B.

Publishers
Publisher_id PUBLISHER
0 Univ. Press
1 Wiley & Sons
2 McGraw-Hill

22
Second Normal Exercise
Pause for exercise

personnel positions
Pers_id Last First M.I. Institution Sector id pers_id Position

0 Smith Ann A SDSU Academic 0 0 Professor


1 0 Community Liaison
1 Smith Ann Z Acme Inc. Private
2 1 Administrator
2 Kim John B SDSU Academic
3 1 Field Technician

4 2 P.I.

5 2 Data Manager

23
Second Normal
Eliminate duplicate rows
personnel positions
Pers_ Last First M.I. Institution Sector id pers_id Position
id
0 0 Professor
0 Smith Ann A SDSU Academic
1 0 Community Liaison
1 Smith Ann Z Acme Inc. Private
2 1 Administrator
2 Kim John B SDSU Academic
3 1 Field Technician

4 2 P.I.

5 2 Data Manager
personnel
Pers_ Last First M.I. Institution_id
id
institutions
0 Smith Ann A 0 Institution_id Institution Sector

1 Smith Ann Z 1 0 SDSU Academic


2 Kim John B 0 1 Acme Inc. Private

24
Final Tables with Primary
and Foreign Keys
positions
personnel
id pers_id Position
Pers_ Last First M.I. Institution_id
id 0 0 Professor
0 Smith Ann A 0 1 0 Community Liaison
1 Smith Ann Z 1
2 1 Administrator
2 Kim John B 0
3 1 Field Technician

4 2 P.I.

5 2 Data Manager

institutions

Institution_id Institution Sector

0 SDSU Academic

1 Acme Inc. Private

25
Normalization Process
3rd Normal Form
– Meet all the requirements of the second
normal form
– Remove columns that are not dependent
upon the primary key
Table:
Schools
id schoolName schoolPhone dean deanEmail
1 Architecture 555-1111 Michelle Myers mm@university.edu
2 Arts & Sciences 555-2222 Ted Smith ts@university.edu
3 Education 555-3333 Bryan Williams bw@university.edu

26
Determine Data Types
Data Type Definition
Text 0-255 characters
Memo 0-64000 characters
Number Integer, long integer, single, double

Date/Time Dates, times, or both at once

AutoNumber Automatically incremented as


records are added

OLE object Image, sound files


Hyperlink Link to an internet resource

27
Determine Data Types
Books Authors
TITLE Text (255)
Id Number(integer)
Number
PUBLISHER_id (integer) ISBN Text(10)
ISBN Text(10) Author text(255)
QTY. Number(integer)

Publishers
PUBLISHER_i
d integer
PUBLISHER text(255)

28
Determine Data Types
Exercise
Pause for exercise

29
Determine Data Types
Personnel
Pers_id Number Institutions
(integer)
Institution_id Number(integer)
Last Text(255)
Institution text(255)
First Text(255)
Sector Text(25)
M.I. Text(1)
Institution_id Number
(integer)

Positions
id Number(integer)
pers_id Number(integer)
position Text(200)

30
Identify the Relationships
• How the tables are “related” to each other
– One-to-one
– One-to-many
– Many-to-many

• Foreign Keys define the relationships

31
Identify the Relationships
Books
PUBLISHER_i
TITLE d ISBN QTY.
Ecology 101 0 4873895759 4324
Ecology for Dummies 1 0493802020 8998
Ecology and Politics 2 7482929292 900
Ecology and Modern
Cinema 0 2234849302 1
1 to many
1 to many
Authors
Publishers
Id ISBN Author
PUBLISHER_i
487389575 d PUBLISHER
0 9 Smith, A.B.
0 Harcourt Brace
487389575 Gordon,
1 9 D.A. 1 Wiley & Sons
049380202 2 McGraw-Hill
2 0 Doe, J.
748292929
3 2 Kim, J.B.
223484930 32
4 2 Kim, C.B.
Identify the Relationships
personnel positions
pers_id Last First M.I. Institution_id id pers_id Position
0 Smith Ann A 0 0 0 Professor

1 Smith Ann Z 1 1 0 Community Liaison

2 Kim John B 0 2 1 Administrator

3 1 Field Technician

4 2 P.I.
Pause for Exercise 5 2 Data Manager

contact_address
institutions pers_id street city state
Institution_id Institution Sector
0 523 Main St. Amherst MA
0 SDSU Academic
1 1010 Sea St. San Diego CA
1 Acme Inc. Private
2 99 Ridge Way Portland ME

33
Identify the Relationships
personnel positions
pers_id Last First M.I. Institution_id id pers_id Position
0 Smith Ann A 0 0 0 Professor

1 Smith Ann Z 1 1 0 Community Liaison

2 Kim John B 0 2 1 Administrator

3 1 Field Technician

1 to many 4 2 P.I.

5 2 Data Manager

1 to many 1 to 1
contact_address
institutions pers_id street city state
Institution_id Institution Sector
0 523 Main St. Amherst MA
0 SDSU Academic
1 1010 Sea St. San Diego CA
1 Acme Inc. Private
2 99 Ridge Way Portland ME

34
Identify the Relationships

Many-to-many: need to create a 3rd table to relate the two

35
Relational Database
Functions
• Organize data – reduce or eliminate redundancy
• Improve data quality – reject “bad” data
– Wrong type of data
– Only good “codes” allowed
• Retrieve data – query/search/select
• Sort data
• Update data
• Output – link to other software with statistical and
graphical functionality

36
Database Management
Systems
• FileMaker Pro
– Client/Server architecture
• Microsoft Access (SQL Server)
– Good for single users
– Sophisticated user interface
• MySQL
– Client/Server architecture
– Limited User Interface (PHPMyAdmin)
• Oracle

37
Excel vs. Access
– Optimized for data – Used to collect,
analysis & calculations manipulate & sort data
– Limited sorting – Sort by different
– Good for performing selections
complex calculations, – Maintain data integrity
exploring possible – Good for managing data
outcomes, and producing – Easy to create data
high quality charts entry forms
– Record limit is 1,048,576

38
Database Systems
Access
• Access workstation based, single user application
• Platform dependent
• Cannot be accessed concurrently
• No security other than workstation
• Part of MS Office suite, not free
MySQL
• MySQL is cross platform, multi user access
• Accessible to more users thru the web, client program or other
admin tools to access database (via authentication)
• Can be integrated with Web Server (web programming languages)
• Data available remotely
• Free, open-source

39
More Information
• Database Normalization Basics
– http://databases.about.com/od/specificproducts/a/normal
ization.htm
• Step-by-step Guides to Using Databases
– http://www.geekgirls.com/menu_databases.htm
• Interactive Online SQL Training
– http://www.sqlcourse.com/
• Comparison between Access & Excel
– http://office.microsoft.com/en-us/access/HA10210195103
3.aspx

40
Mailing List Subscription

• Please check the box on our sign-in sheet to


receive occasional emails to keep up with our
services, training, and news.

• Please encourage others to subscribe:


http://eepurl.com/CJwYT

41

Das könnte Ihnen auch gefallen