Sie sind auf Seite 1von 61

Chapter 5

Organizing Data and


Information
ITEC 1010 Information and Organizations
Data
 Data
 A necessity for almost any enterprise to carry out its
business. Consists of raw facts, and when organized
may be transformed into information
 Database
 A collection of data organized to meet users’ needs
 Database management system (DBMS)
 A group of programs that manipulate the database and
provide an interface between the database and the user
of the database or other application programs

ITEC 1010 Information and Organizations


DBMS ‘Discussion’ (1)
A collection of programs that enables you to store, modify, and
extract information from a database. There are many different types
of DBMSs, ranging from small systems that run on personal
computers to huge systems that run on mainframes. The following
are examples of database applications:
 computerized library systems
 automated teller machines
 flight reservation systems
 computerized parts inventory systems

From a technical standpoint, DBMSs can differ widely. The terms


ITEC 1010 Information and Organizations
DBMS ‘Discussion’ (2)
relational, network, flat, and hierarchical all refer to the way a
DBMS organizes information internally. The internal organization can
affect how quickly and flexibly you can extract information.

Requests for information from a database are made in the form of a


query, which is a stylized question. For example, the query

SELECT ALL WHERE NAME = "SMITH" AND AGE > 35

requests all records in which the NAME field is SMITH and the AGE
ITEC 1010 Information and Organizations
DBMS ‘Discussion’ (3)
field is greater than 35. The set of rules for constructing queries is
known as a query language. Different DBMSs support different query
languages, although there is a semi-standardized query language
called SQL (structured query language). Sophisticated languages for
managing database systems are called fourth-generation languages,
or 4GLs for short.

The information from a database can be presented in a variety of


formats. Most DBMSs include a report writer program that enables
you to output data in the form of a report. Many DBMSs also include
ITEC 1010 Information and Organizations
DBMS ‘Discussion’ (4)
a graphics component that enables you to output information in the
form of graphs and charts.

ITEC 1010 Information and Organizations


Hierarchy of Data

Schematic

ITEC 1010 Information and Organizations


Hierarchy of data Example
Personel file

Database Department file (Project database)

Payroll file

005-10-6321 Johns Francine 10-7-65


(Personnel file)
Files 549-77-1001 Buckley Bill 2-17-79
098-40-1370 Fiske Steven 1-5-85

098-40-1370 Fiske Steven 1-5-85 598 (Record containing


Records SSN, last name,
first name, date of
hire)
Fields Fiske (Last name field)

Characters
1000100 (Letter ‘F’ in ASCII)
(bytes)

ITEC 1010 Information and Organizations


Terminology
 Database
• A collection of integrated and related files
 File
• A collection of related records
 Record
• A collection of related fields
 Field
• A group of characters
 Character
• Basic building block of information, represented by a byte
ITEC 1010 Information and Organizations
Data Entities, Attributes, and
Keys
 Entity
• A generalized class of people, places, or things (objects) for
which data are collected, stored, and maintained
• E.g., Customer, Employee
 Attribute
• A characteristic of an entity; something the entity is identified by
• E.g., Customer name, Employee name
 Keys
• A field or set of fields in a record that is used to identify the
record
• E.g, A field or set of fields that uniquely identifies the record
ITEC 1010 Information and Organizations
Keys and Attributes

Employee # Last name First name Hire date Dept. #

005-10-6321 Johns Francine 10-7-65 257

549-77-1001 Buckley Bill 2-17-79 650

098-40-1370 Fiske Steven 1-5-85 598

Key field Entities


Attributes (fields)
(records)

ITEC 1010 Information and Organizations


The Traditional Approach
 The traditional approach…
 Separate files are created and stored for each
application program

Schematic

ITEC 1010 Information and Organizations


Application
Data Files programs Users

Payroll Reports
Payroll programs

Invoicing Reports
Invoicing programs

Inventory
Inventory control Reports
control programs

Management
Management inquiries Reports
inquiries programs

ITEC 1010 Information and Organizations


Drawbacks
 Data redundancy
 Duplication of data in separate files
 Lack of data integrity
 The degree to which the data in any one file is accurate
 Program-data dependence
 A situation in which program and data organized for
one application are incompatible with programs and
data organized differently for another application

ITEC 1010 Information and Organizations


Database Approach
 The database approach…
 A pool of related data is shared by multiple
application programs
 Rather than having separate data files, each
application uses a collection of data that is
either joined or related in the database

Schematic

ITEC 1010 Information and Organizations


Payroll Reports
program
Payroll data

Inventory Inventory
program Reports
data Database
management
Invoicing system
Data Invoicing
Reports
program
Other
data
Other Reports
programs

Database Interface Applications programs Users

ITEC 1010 Information and Organizations


Advantages
 Improved strategic use of corporate data
 Reduced data redundancy
 Improved data integrity
 Easier modification and updating
 Data and program independence
 Better access to data and information
 Standardization of data access
 A framework for program development
 Better overall protection of the data
 Shared data and information resources
ITEC 1010 Information and Organizations
Disadvantages
 Relatively high cost of purchasing and
operating a DBMS in a mainframe operating
environment
 Increased cost of specialized staff
 Increased vulnerability

ITEC 1010 Information and Organizations


Data Modeling and
Database Models (1)
 Planned data redundancy
 A way of organizing data in which the logical
database design is altered so that certain data
entities are combined
 Summary totals are carried in the data records
rather than calculated from elemental data
 Some data attributes are repeated in more than
one data entity to improve database
performance

ITEC 1010 Information and Organizations


Data Modeling and
Database Models (2)
 Data model
 A map or diagram of entities and their relationships
 Enterprise data modeling
 Data modeling done at the level of the entire
organization
 Entity-relationship (ER) diagrams
 A data model that uses basic graphical symbols to
show the organization of and relationships between
data

ITEC 1010 Information and Organizations


Example:
Entity Relationship (ER)
Diagram for a Customer
Ordering Database

Schematic

ITEC 1010 Information and Organizations


Last name Attributes
Colour

Entities

First 1 N
Customer Order Product Name
name

1:N one-to-many
relationship

Identification Identification
number number

ITEC 1010 Information and Organizations


Hierarchical Database Model
 Hierarchical database model
 A data model in which data are organized in a
top-down, or inverted tree structure

Schematic

ITEC 1010 Information and Organizations


Project 1

Department A Department B Department C

Employee Employee Employee Employee Employee Employee


1 2 3 4 5 6

ITEC 1010 Information and Organizations


Network Data Model
 Network data model
• An expansion of the hierarchical database model
with an owner-member relationship in which a
member may have many owners

Project 1 Project 2

Department A Department B Department C

ITEC 1010 Information and Organizations


Relational Data Model
 Relational data model
 All data elements are placed in two-
dimensional tables, called relations, that are
the logical equivalent of files

Schematic

ITEC 1010 Information and Organizations


Data Table 1: Project Table Data Table 2: Department Table

Project Number Description Dept. Number Dept. Number Dept. Name Manager SSN

155 Payroll 257 257 Accounting 421-55-99993

498 Widgets 632 632 Manufacturing 765-00-3192

226 Sales manager 598 598 Marketing 098-40-1370

Data Table 3: Manager Table


SSN Last Name First Name Hire Date Dept. Number

005-10-6321 Johns Francine 10-7-65 257

549-77-1001 Buckley Bill 2-17-79 650

098-40-1370 Fiske Steven 1-5-85 598

ITEC 1010 Information and Organizations


Relational Database Terminology
 Selecting
 Data manipulation that eliminates rows according to
certain criteria
 Projecting
 Data manipulation that eliminates columns in a table
 Joining
 Data manipulation that combines two or more tables
 Linked
 Relating tables in a relational database together

ITEC 1010 Information and Organizations


Linking Data Tables to Answer
an Inquiry

Schematic

ITEC 1010 Information and Organizations


Project Number Description Dept. Number
155 Payroll 257
498 Widgets 632
226 Sales manager 598

Dept. Number Dept. Name Manager SSN


257 Accounting 421-55-99993
632 Manufacturing 765-00-3192
598 Marketing 098-40-1370

SSN Last Name First Name Hire Date Dept. Number


005-10-6321 Johns Francine 10-7-65 257
549-77-1001 Buckley Bill 2-17-79 650
098-40-1370 Fiske Steven 1-5-85 598

ITEC 1010 Information and Organizations


Building and Modifying a
Relational Database
 Using Microsoft Access

Screen snap

ITEC 1010 Information and Organizations


ITEC 1010 Information and Organizations
Schemas and Subschemas
 Schema
 A description of the entire database
 Subschema
 A file that contains a description of a subset of
the database and identifies which users can
perform modifications on the data items in that
subset
Schematic

ITEC 1010 Information and Organizations


DBMS

Schema

Subschema Subschema Subschema


A B C

User User User User User


1 2 3 4 5

ITEC 1010 Information and Organizations


Schema ‘Discussion’
Pronounced skee-ma, the structure of a database system, described
in a formal language supported by the database management
system (DBMS). In a relational database, the schema defines the
tables, the fields in each table, and the relationships between fields
and tables.

Schemas are generally stored in a data dictionary. Although a


schema is defined in text database language, the term is often used
to refer to a graphical depiction of the database structure.
ITEC 1010 Information and Organizations
Data Definition Language
 Data Definition Language (DDL)
 A collection of instructions and commands
used to define and describe data and data
relationships in a specific database

Schematic

ITEC 1010 Information and Organizations


SCHEMA DESCRIPTION
SCHEMA NAME IS XXXX
AUTHOR XXXX
DATE XXXX
FILE DESCRIPTION
FILE NAME IS XXXX
ASSIGN XXXX
FILE NAME IS XXXX
ASSIGN XXXX
AREA DESCRIPTION
AREA NAME IS XXXX
RECORD DESCRIPTION
RECORD NAME ISXXXX
RECORD ID IS XXXX
LOCATION MODE ISXXXX
WITHIN XXX AREA FROM XXXX THRU XXXX
SET DESCRIPTION
SET NAME IS XXXX
ORDER IS XXXX
MODE IS XXXX
MEMBER IS XXXX
.
.
.

ITEC 1010 Information and Organizations


Data Dictionary
 Data Dictionary
 A detailed description of all data used in the
database

Schematic

ITEC 1010 Information and Organizations


NORTHWESTERN MANUFACTURING

PREPARED BY: D. BORDWELL


DATE: 04 AUGUST
APPROVED BY: J. EDWARDS DATE: 13 OCTOBER
VERSION: 3.1
PAGE: 1 OF 1

DATA ELEMENT NAME: PARTNO


DESCRIPTION: INVENTORY PART NUMER
OTHER NAMES: PTNO
VALUE RANGE: 100 TO 5000
DATA TYPE: NUMERIC
POSITIONS: 4 POSITIONS OR COLUMNS

ITEC 1010 Information and Organizations


Data Dictionary Features
 Provide a standard definition of terms and data
elements
 Assist programmers in designing and writing
programs
 Simplify database modification
 Reduce data redundancy
 Increase data reliability
 Faster program development
 Easier modification of data and information
ITEC 1010 Information and Organizations
Logical and Physical
Access Paths
 Logical access path (LAP)
 Application requires information from the
DBMS
 Physical access path (PAP)
 DBMS accesses a storage device to retrieve
data

Schematic

ITEC 1010 Information and Organizations


Data
on
storage
devices

Physical access path (PAP)

DBMS

Logical access path (LAP)

Management Other Application


inquiries software programs

ITEC 1010 Information and Organizations


Manipulating Data
 Concurrency control
 A method of dealing with a situation in which two or
more people need to access the same record in a
database at the same time
 Data manipulation language (DML)
 The commands that are used to manipulate the data in
a database
 Structured query language (SQL)
 A standardized data manipulation language
ITEC 1010 Information and Organizations
Structured Query Language (SQL)
 “Invented” at IBM’s Almaden Research
Centre (San Jose, CA) in the 1970s
 E.g.,
SELECT * FROM EMPLOYEE WHERE
JOB_CLASSIFICATION = “C2”

Select all (“*”) columns from the EMPLOYEE


table in which the JOB_CLASSIFICATION field
is equal to “C2”

ITEC 1010 Information and Organizations


SQL ‘Discussion’ (1)
Abbreviation of structured query language, and pronounced either
see-kwell or as separate letters. SQL is a standardized query
language for requesting information from a database. The original
version called SEQUEL (structured English query language) was
designed by an IBM research center in 1974 and 1975. SQL was first
introduced as a commercial database system in 1979 by Oracle
Corporation.

Historically, SQL has been the favorite query language for database
management systems running on minicomputers and mainframes.
ITEC 1010 Information and Organizations
SQL ‘Discussion’ (2)
Increasingly, however, SQL is being supported by PC database
systems because it supports distributed databases (databases that
are spread out over several computer systems). This enables
several users on a local-area network to access the same database
simultaneously.

Although there are different dialects of SQL, it is nevertheless the


closest thing to a standard query language that currently exists. In
1986, ANSI approved a rudimentary version of SQL as the official
standard, but most versions of SQL since then have included many
ITEC 1010 Information and Organizations
SQL ‘Discussion’ (3)
extensions to the ANSI standard. In 1991, ANSI updated the
standard. The new standard is known as SAG SQL.

ITEC 1010 Information and Organizations


Database Output

Screen snap

ITEC 1010 Information and Organizations


ITEC 1010 Information and Organizations
Popular Database Management
Systems for End Users
 Microsoft Access 98
 Lotus Approach 98
 Inprise (formerly Borland) dBASE
 DBMS Selection Criteria
• Database size
• Number of concurrent users
• Performance
• Integration
• Features
• The vendor
• Cost
ITEC 1010 Information and Organizations
Distributed Databases

 Distributed database…
 A database in which the actual data may be
spread across several smaller databases
connected via telecommunications devices

‘Pretty’ picture

ITEC 1010 Information and Organizations


ITEC 1010 Information and Organizations
Data Warehouse
 Data warehouse
 A relational database management system designed
specifically to support management decision making
 Current evolution of Decision Support Systems (DSSs)
 Data mart
 A subset of a data warehouse for small and medium-
size businesses or departments within larger
companies
Schematic

ITEC 1010 Information and Organizations


Relational
databases

Data
Hierarchical extraction
databases process

Network
databases Data
cleanup
process
Flat files

Spreadsheets Data
End user access wharehouse

Query and
analysis
tools

ITEC 1010 Information and Organizations


Designing a Customer Data Warehouse

 Sharply define your goals and objectives


before you build the warehouse
 Choose the software that best fits your goals
 Determine who/what should be in the database
 Develop a plan
 Measure results

ITEC 1010 Information and Organizations


Data Mining Applications
 Data mining
 The automated discovery of patterns and relationships
in a data warehouse
 Data mining applications
• Market segmentation
• Customer queries
• Fraud detection
• Direct marketing
• Market basket analysis
• Trend analysis

ITEC 1010 Information and Organizations


On-Line Analytical
Processing (OLAP)
 On-line analytic processing (OLAP)
• Access to multidimensional databases providing
managerially useful display techniques
• Now used to store and deliver data warehouse
information
 Data warehouse and OLAP
• Provides top-down, query-driven analysis
 Data mining
• Provides bottom-up, discovery-driven analysis
ITEC 1010 Information and Organizations
Open Database
Connectivity (ODBC)
 Open database connectivity (ODBC)
• A set of standards that ensures software written to
comply with these standards can be used with any
ODBC-compliant database

Schematic

ITEC 1010 Information and Organizations


dBASE

ODBC Import

ACCESS Paradox
database database
ODBC Export

ODBC Link

Lotus 1-2-3
spreadsheet

ITEC 1010 Information and Organizations


Object-Relational Database Management
Systems (ORDBMS)
 Object-relational database management system (ORDBMS)
• A DBMS capable of manipulating audio, video, and graphical data.
 Hypertext
• Users can search and manipulate alphanumeric data in an
unstructured way
 Hypermedia
• Allows businesses to search and manipulate multimedia forms of
data
 Spatial data technology
• Use of an object-relational database to store and access data
according to the location it describes and to permit spatial queries
and analysis

ITEC 1010 Information and Organizations


End of Chapter 5

Chapter 6

ITEC 1010 Information and Organizations