THE DATABASE APPROACH PAGE 1

THE DATABASE APPROACH

INTRODUCTION

Databases are now so common that they hold little mystique but remain one of the most important
forms of software of all. They are the basis of most data processing and even the newest
developments in the use of ICTs rely upon them. For example, developments such as internet
shopping or PC banking which are likely to have enormous effects upon the way we live and work
are entirely dependent upon the efficacy, effectiveness and efficiency of the databases that store the
data that they use.
Given the modern day ubiquity of databases it can be hard to think back to a time when they were
not the principal way of storing commercial data. But it is necessary to do so in order to understand
why organisations first moved to using a database approach to storing and processing data, and the
advantages that we gain by doing so.
We can then move to discussing the basic concepts of a database and some of the terminology used.

FILE BASED DATA PROCESSING

The simplest approach to data processing is to organise the storage and retrieval of data through
files. This is sometimes still used for one-off or small-scale data handling. For example, if you were
carrying out simulation or other modelling tasks it may not be worth setting up a formal database
structure to hold the data. All computer-based information-systems used this approach until the
beginning of the 1970s.
In a file-based information-system all the data is stored within separate files. A file consists of a
collection of records, and each record is composed of a number of fields. The structure of a file is
the layout of each record, showing how much space each field occupies and the relative position of
each field within a record.
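
The idea of a record layout can be sketched in a few lines of Python. This is only an illustration: the 80-character record, the field names and the field widths are assumptions chosen for the example, and a real file-based system would define its own layout.

```python
import struct

# Hypothetical layout for illustration: a 5-character customer code, an
# 8-character amount and 67 characters of padding, giving an 80-byte record.
RECORD_LAYOUT = struct.Struct("5s8s67s")


def pack_record(cust_code: str, amount: int) -> bytes:
    """Build one fixed-width record from its field values."""
    return RECORD_LAYOUT.pack(
        cust_code.encode("ascii"),
        f"{amount:08d}".encode("ascii"),
        b" " * 67,
    )


def unpack_record(record: bytes) -> dict:
    """Split a record into fields using each field's position and width."""
    code, amount, _filler = RECORD_LAYOUT.unpack(record)
    return {"cust_code": code.decode("ascii"), "amount": int(amount)}


record = pack_record("01234", 11890)
fields = unpack_record(record)
print(len(record), fields)
```

Every program that reads such a file has to agree on the same positions and widths, which is the source of several of the problems discussed later in these notes.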

Different types of file


Figure 1 shows the schematic of a simple suite of data processing applications, and within this there
are examples of some different types of file.

P J DUNNING-LEWIS (INTRO.DOC)
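[Figure 1: flow diagram linking files and programs]

DAILY SALES FILE -> SORT PROGRAM -> SORTED DAILY SALES FILE -> DAILY UPDATE PROGRAM -> SALES FILE
SALES FILE -> BACK-UP PROGRAM -> COPY OF SALES FILE
SALES FILE -> CLEAN-UP PROGRAM -> PAST SALES FILE
SALES FILE and FREIGHT RATES FILE -> INVOICING PROGRAM -> INVOICES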

FIGURE 1 A SIMPLE DATA PROCESSING APPLICATION

In this application the details of the sales made each day are collected on-line by point-of-sale
terminals and added to a Daily Sales File. This file is used at the close of business to update a
centrally held Sales File. The Sales File is later used for customer invoicing. The central
importance of the Sales File to the business makes it essential that copies are periodically taken for
recovery in case of disaster. Periodically the file is examined for details of sales completed more
than three years ago; these are of no operational interest to the organisation but details must be kept
in case of enquiries by the tax authorities. Such details are therefore removed to a Past Sales File.
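
The batch steps described above can be sketched as follows. This is a minimal illustration, not the original system: the record fields, product codes and dates are hypothetical, and in-memory lists stand in for the files.

```python
from datetime import date


def sort_daily_sales(daily_sales):
    """Sort program: order the day's transactions by product code."""
    return sorted(daily_sales, key=lambda sale: sale["product"])


def daily_update(sales_file, sorted_daily_sales):
    """Daily update program: add the day's transactions to the master file."""
    return sales_file + sorted_daily_sales


def clean_up(sales_file, today, keep_years=3):
    """Clean-up program: move sales older than keep_years to the archive."""
    cutoff = today.replace(year=today.year - keep_years)
    current = [s for s in sales_file if s["completed"] >= cutoff]
    past = [s for s in sales_file if s["completed"] < cutoff]
    return current, past


# Hypothetical data: one old sale on the master file, two sales made today.
sales_file = [{"product": "P1", "qty": 2, "completed": date(2019, 5, 1)}]
daily_sales_file = [
    {"product": "P9", "qty": 1, "completed": date(2024, 6, 28)},
    {"product": "P3", "qty": 4, "completed": date(2024, 6, 28)},
]

sorted_daily_sales_file = sort_daily_sales(daily_sales_file)
sales_file = daily_update(sales_file, sorted_daily_sales_file)
sales_file, past_sales_file = clean_up(sales_file, today=date(2024, 6, 30))
```

Each function plays the part of one program in Figure 1, and each list plays the part of one file; the sorted list exists only to feed the update step.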

Within this application there are examples of many of the different types of file which may be found
in an information-system and are described in Figure 2.

Type of file        Usage                                                 Example in Figure 1

MASTER FILES        are the basis of an organisation's data processing    SALES FILE
                    activities. They contain the information which
                    needs to be kept over an extended period.

TRANSACTION FILES   contain records that relate to events or              DAILY SALES FILE
                    activities. They are used to update, in batch
                    mode, the information held on the master files.

TEMPORARY FILES     are files that are required by the design of the      SORTED DAILY SALES FILE
                    data processing system. For example, the decision
                    to sort the daily sales records into product
                    sequence requires the creation of a temporary
                    file containing the sorted records. Temporary
                    files are also known as Work Files and are
                    usually deleted after use.

PROGRAM FILES       contain the programs which define processing          ALL PROGRAMS
                    instructions for the information-system.

BACKUP FILES        are copies of files used by the system which are      COPY OF SALES FILE
                    kept in case the operational files are lost or
                    corrupted.

ARCHIVE FILES       contain records which are no longer operationally     PAST SALES FILE
                    relevant but which cannot be deleted because of
                    legal requirements or in case there arises an
                    unforeseen need for them in the future.

REFERENCE FILES     contain standard information which may be             FREIGHT RATES FILE
                    referenced by a number of different data
                    processing applications. Common examples are
                    tax rates or standard delivery charges for
                    different locations.

INDEX FILES         contain information about where particular data
                    is physically stored within the system. They are
                    likely to be invisible to the user and maintained
                    by the system itself.

FIGURE 2 DIFFERENT TYPES OF FILE

Application ownership of data


In a file-based system each application has its own set of files. This means that throughout the
organisation the same information may be held in many different files.
For example the personnel department of an organisation may have a file holding employee records
and recording amongst other things each employee's name and job title. This data may also be held,
quite separately in other files held and used by the pay-roll department, as in Figure 3.

PERSONNEL APPLICATIONS use PERSONNEL DATA:

    EMPLOYEE NUMBER
    NAME
    ADDRESS
    DATE JOINED
    CURRENT JOB TITLE
    CURRENT DEPARTMENT
    MARITAL STATUS
    SEX

PAYROLL APPLICATIONS use PAYROLL DATA:

    EMPLOYEE NUMBER
    NAME
    JOB TITLE
    BANK NAME
    BANK ADDRESS
    BANK ACCOUNT NUMBER
    SALARY
    DEPARTMENT
    N.I. NUMBER
    TAX CODE

FIGURE 3 IN A FILE BASED PROCESSING SYSTEM EACH APPLICATION USES ITS OWN DATA FILES

PROBLEMS OF A FILE BASED INFORMATION-SYSTEM

Organising information provision through a file-based approach was the norm for many years, but it
can give rise to a number of problems. These are:

Duplication and redundancy


A file-based approach may lead to data redundancy, where the same information is held in many
different places.
The resultant wastage of space might not be too serious, but data redundancy is undesirable because
of a much more serious risk: that changes will lead to data inconsistency and incorrect processing.
For example, if a customer's address is held in a number of different places then, should the
customer's address change, it is possible that one copy of the data may be overlooked and not
updated.
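
The risk can be shown with a small sketch. The two dictionaries below are hypothetical stand-ins for files owned by two different applications; the customer code, name and addresses are invented for the example.

```python
# Two application-owned "files", each holding its own copy of the address.
sales_customers = {"01106": {"name": "A. Smith", "address": "1 Old Road"}}
invoicing_customers = {"01106": {"name": "A. Smith", "address": "1 Old Road"}}


def change_address(customer_file, cust_code, new_address):
    """Update the address in ONE file only, as a single application would."""
    customer_file[cust_code]["address"] = new_address


def inconsistent_customers(file_a, file_b):
    """Return the codes of customers whose address differs between the files."""
    return sorted(
        code
        for code in file_a.keys() & file_b.keys()
        if file_a[code]["address"] != file_b[code]["address"]
    )


# The Sales application records the move; the Invoicing copy is overlooked.
change_address(sales_customers, "01106", "2 New Street")
conflicts = inconsistent_customers(sales_customers, invoicing_customers)
print(conflicts)
```

After the update the two files disagree about the same customer, and nothing in either application signals that this has happened.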

Data dependent programs


In a file-oriented system each program is dependent upon the physical format of the files which it
accesses, and knowledge of the data structures must be built into the programs themselves. Programs
written in the COBOL programming language, for example, must include definitions of the structure
of the files used by that program, as shown in Figure 4. If the structure of a file changes then all the
programs using that file must be amended to reflect the new file structure.

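       DATA DIVISION.
       FILE SECTION.
      * Each record of INPUT-FILE is one 80-character card image.
       FD  INPUT-FILE
           LABEL RECORDS ARE OMITTED
           DATA RECORD IS CARD-RECORD.
      * CARD-RECORD is a group item, so it takes no PICTURE clause;
      * its length (80) is the sum of the fields below it.
       01  CARD-RECORD.
           03  CUST-CODE   PICTURE 9(5).
           03  AMT-OWED    PICTURE 9(8).
           03  FILLER      PICTURE X(67).

(Coded instructions for data processing follow.)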

FIGURE 4. FILE STRUCTURE DEFINED WITHIN A COBOL PROGRAM

This need to change all the programs at the very least adds to the maintenance burden of an
organisation's DP department. More seriously, if one occurrence of the file definition is
overlooked and left unchanged, data processing jobs may "fall down" in the middle of
operational runs or process data incorrectly.
For example, if an organisation uses a five-digit customer code then every program accessing any file
which holds a customer code field will have within it an explicit definition of that field as five digits
long. One file in which the customer code appears is the invoicing file. This file is used by an
invoicing program, which contains instructions to invoice the customer identified by the
CUSTOMER-CODE field for the amount given in the field AMOUNT-OWED. The file definition
contained within the program specifies where to find within each record the fields CUSTOMER-CODE
and AMOUNT-OWED.
Now, because of a large number of new customers, the organisation decides to increase the length of
the customer code to six digits. All the data files and all the programs are amended to reflect this
change, except for the invoicing program. This program therefore continues to expect that the first
five digits of each record are a customer code and the next eight digits are the amount for which the
customer is to be invoiced. The data is therefore wrongly interpreted. Figure 5 shows how the
invoicing program, instead of invoicing customer number 011062 for 11,890, will invoice customer
number 01106 for 20,001,189.
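
The misreading can be reproduced with a short sketch. The function name and the two trailing characters standing for the other fields are assumptions for the example; the digits are those used in the text.

```python
def read_fields(record: str, code_digits: int, amount_digits: int = 8):
    """Read the customer code and amount from fixed positions in a record."""
    code = record[:code_digits]
    amount = int(record[code_digits:code_digits + amount_digits])
    return code, amount


# Record written with the new six-digit customer code, followed by an
# eight-digit amount and the start of the other fields.
record = "011062" + "00011890" + "01"

as_intended = read_fields(record, code_digits=6)  # amended programs
as_misread = read_fields(record, code_digits=5)   # un-amended invoicing program

print(as_intended)
print(as_misread)
```

The same characters yield two different customers and two very different amounts, depending only on which record definition the reading program carries.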

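Record as written to the file (six-digit customer code):

    CUSTOMER CODE    AMOUNT OWED    Other fields
    011062           00011890       01

File definition still held in the un-amended invoicing program:

       DATA DIVISION.
       FILE SECTION.
       FD  INPUT-FILE
           LABEL RECORDS ARE OMITTED
           DATA RECORD IS CARD-RECORD.
       01  CARD-RECORD.
           03  CUST-CODE   PICTURE 9(5).
           03  AMT-OWED    PICTURE 9(8).
           03  FILLER      PICTURE X(67).

The same record as read by the invoicing program (five-digit customer code):

    CUSTOMER CODE    AMOUNT OWED    Other fields
    01106            20001189       001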

FIGURE 5 PROCESSING ERROR RESULTING FROM INCORRECT FILE DEFINITION

Difficulties in relating separately stored data


The file-oriented approach makes it difficult to share data between different applications and to relate
data held on different files. It might be useful, for example, to analyse whether orders received from
customers with a low credit limit were paid for more quickly than those from customers with a high
credit limit. But if the Sales Department files held details of customers' credit limits whilst the
Invoicing Department files held dates of invoicing and payment, this might not be possible.

Lack of flexibility
This also illustrates another problem, that of lack of flexibility. File-oriented information-systems
generally provide pre-defined information in a pre-defined format. New requirements will require
new programming, which will be expensive and, particularly in the case of ad-hoc requirements, may
be too slow.

Lack of security and control


As individual applications operate individually upon the stored data there is no opportunity for
centralised control over the organisation's data. This means that, because of redundancy, incorrect
and conflicting versions of the same data may exist within the organisation. Maintaining records of
which users use which data is difficult, and access to a file gives the user access to all the data
contained in that file, even if not all of it is relevant to their work.
Also, the same data may be understood in different ways, and even called different things, by
different users. The result is a data aggregate rather than a data system.

DATABASE SYSTEMS

Movement towards use of databases


For these reasons, together with increasing pressure on the DP department (the applications
backlog) and the need for more sophisticated and more responsive information-systems, there has
for many years been a movement towards database systems, and the design and use of such systems
has become a major concern within the field of information-systems.
A database is a set of co-ordinated and integrated data files. Within a database, data is structured,
shared and controlled in a way that removes redundancy.

Figures 6 and 7 illustrate the essential difference between database information-systems and file-
based information-systems; namely that data is shared between different users.

Within the file based system of Figure 6 the same details about employees may be held quite
separately in the files used by the Personnel Department and the Pay-Roll Department. Details of
products may be duplicated in the files used by the Sales Department and Stock Control.
Within the database system of Figure 7 however there is no such duplication. Only one copy of every
piece of data is stored within the integrated database, but shared and concurrent use of that data is
possible for all the individual users.

Personnel Department    having individual and exclusive access to    Personnel files
Payroll Department      having individual and exclusive access to    Pay-roll files
Sales Department        having individual and exclusive access to    Sales files
Stock control           having individual and exclusive access to    Stock control files

FIGURE 6 NON-INTEGRATED FILE SYSTEM WITHIN WHICH THERE MAY BE DUPLICATION OF DATA

Personnel Department  \
Payroll Department     \   Concurrent and controlled access to
Sales Department       /   organizational data base with no redundancy
Stock control         /

FIGURE 7. DATABASE SYSTEM WITH NO DUPLICATION OF DATA
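
The contrast with Figure 3 can be sketched with a single shared table. The notes do not name a particular DBMS; SQLite (through Python's standard sqlite3 module), the table name, the column names and the sample row are all assumptions made for illustration, and a single in-memory connection does not demonstrate concurrent access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One copy of each employee fact, shared by every department.
conn.execute(
    """CREATE TABLE employee (
           employee_number INTEGER PRIMARY KEY,
           name            TEXT NOT NULL,
           job_title       TEXT,
           department      TEXT,
           salary          INTEGER
       )"""
)
conn.execute(
    "INSERT INTO employee VALUES (1, 'A. Smith', 'Clerk', 'Sales', 18000)"
)

# Personnel records a promotion once, in the only copy of the data.
conn.execute(
    "UPDATE employee SET job_title = 'Supervisor' WHERE employee_number = 1"
)

# Personnel and Payroll each read the columns they need from the same row.
personnel_row = conn.execute(
    "SELECT name, job_title, department FROM employee WHERE employee_number = 1"
).fetchone()
payroll_row = conn.execute(
    "SELECT name, job_title, salary FROM employee WHERE employee_number = 1"
).fetchone()

print(personnel_row)
print(payroll_row)
```

Because the job title is stored once, the change made by Personnel is seen by Payroll without any second update.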

COMPONENTS OF THE DATABASE ENVIRONMENT

Database management systems


Storing and accessing data in a database requires a sophisticated piece of software, a database
management system (DBMS). This is a set of programs which sits between the application
programs and the operating system, controlling the storage and manipulation of the stored data, as
shown in Figure 8.

[Figure 8: layers of software, from the user down to the machine]

APPLICATION PROGRAMS / QUERY LANGUAGES
DBMS SOFTWARE
OPERATING SYSTEM
MACHINE LEVEL OPERATIONS

FIGURE 8. DBMS IS AN ADDITIONAL LEVEL OF SOFTWARE

Database systems are distinguished from file systems in that:

i. They allow multiple users to access the data at the same time.
ii. The data is stored in an integrated way.
iii. There is independence between data storage and data use.
iv. There is independence between how the data is physically stored and how it may be logically
viewed.
v. There is use of a central data dictionary or repository to hold definitions of data.
vi. There is a built-in query language allowing ad-hoc access to the data.
vii. There is DBMS software that manages storage and access, provides high levels of security,
monitors performance and reorganises physical storage when required.

Data definition and manipulation languages


As indicated in Figure 8 interaction with a DBMS may take place via programs, for example a
COBOL program may have "calls" to the DBMS embedded within it, or via a specific Query
Language, a high level programming language understood by the DBMS.

In addition to a query language a DBMS uses two other types of language:

A Data Definition Language (DDL) is used to define database structures, for example defining that
CUSTOMER ADDRESS records are related to CUSTOMER records.

A Data Manipulation Language (DML) allows one to process database structures, for example to
delete all CUSTOMER ADDRESS records relating to the CUSTOMER record with a key of 013854.
The DML may have interfaces with languages such as COBOL and will allow basic operations such
as adding, deleting, updating and sorting data.
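
The two examples just given can be sketched in SQL, where the same language serves as both DDL and DML. The notes do not name a particular DBMS; SQLite (through Python's standard sqlite3 module), the table and column names, and all sample rows other than the key 013854 are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Data definition: CUSTOMER ADDRESS records are related to CUSTOMER records.
conn.executescript(
    """
    CREATE TABLE customer (
        customer_code TEXT PRIMARY KEY,
        name          TEXT NOT NULL
    );
    CREATE TABLE customer_address (
        address_id    INTEGER PRIMARY KEY,
        customer_code TEXT NOT NULL REFERENCES customer(customer_code),
        address       TEXT NOT NULL
    );
    """
)

# Data manipulation: add some records...
conn.execute("INSERT INTO customer VALUES ('013854', 'A. Smith')")
conn.execute("INSERT INTO customer VALUES ('011062', 'B. Jones')")
conn.executemany(
    "INSERT INTO customer_address (customer_code, address) VALUES (?, ?)",
    [
        ("013854", "1 Old Road"),
        ("013854", "2 New Street"),
        ("011062", "3 High Street"),
    ],
)

# ...then delete all address records for the customer with key 013854.
deleted = conn.execute(
    "DELETE FROM customer_address WHERE customer_code = ?", ("013854",)
).rowcount
remaining = conn.execute(
    "SELECT customer_code, address FROM customer_address"
).fetchall()

print(deleted, remaining)
```

The CREATE TABLE statements play the part of the DDL, while the INSERT and DELETE statements play the part of the DML.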

Data Dictionaries
An automated data dictionary (often called a repository) is normally supplied as part of the DBMS
package, and in most cases is central to the operation of all the elements of the DBMS. Other data
dictionaries are more general and provide interfaces with the most common DBMSs.
The data dictionary is itself a database, which acts as the repository of definitions of all the data used
by an organisation. The meaning of the data to be held in the database is agreed with the users before
being defined within the data dictionary. Any inconsistencies that are discovered must be resolved
(for example, different users may use the same name to refer to different data items, or different names
for the same item) so that the end result is an organisation-wide set of standard definitions. The data
dictionary is thus used for defining the facts which are associated with every entity relevant to the
organisation's business. The facts to be recorded will include a description, details of the type of the
data, maximum and minimum sizes, and ranges of possible values.
During database design the external and conceptual schemas are recorded in the data dictionary. Data
dictionaries allow the stored definitions to be used to produce the command statements needed to
generate the physical data structures. Language-specific outputs such as DATA DIVISION statements
for COBOL may also be generated automatically.
The data dictionary is a key element in controlling the data resource and is essential in implementing
data administration. It is the basic tool used by the data administrator to maintain control over the
organisation's data. Since the data dictionary controls the structure of the database it allows control
over any changes to the database and helps to ensure consistency in the use of data items throughout
the database.
It may also be used to estimate the impact of any changes to be made; most automated data
dictionaries have the facility to produce cross referenced reports of the use of any data item and so
allow an evaluation to be made of the work which would be necessitated by making a change in the
size or other characteristics of that data item.
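
A toy data dictionary shows both uses: generating language-specific definitions and reporting where an item is used. This is only a sketch. The dictionary structure, the function names and the program names INVOICING and SALES-LEDGER are hypothetical; the item names and sizes follow Figure 4.

```python
# Hypothetical dictionary entries: one definition per data item.
DATA_DICTIONARY = {
    "CUST-CODE": {
        "description": "Customer code",
        "type": "numeric",
        "size": 5,
        "used_by": ["INVOICING", "SALES-LEDGER"],
    },
    "AMT-OWED": {
        "description": "Amount owed by the customer",
        "type": "numeric",
        "size": 8,
        "used_by": ["INVOICING"],
    },
}


def cobol_picture(item: dict) -> str:
    """Translate a dictionary entry into a COBOL PICTURE clause."""
    symbol = "9" if item["type"] == "numeric" else "X"
    return f"PICTURE {symbol}({item['size']})"


def data_division_lines(names, level="03"):
    """Generate record-description lines from the stored definitions."""
    return [
        f"{level} {name} {cobol_picture(DATA_DICTIONARY[name])}."
        for name in names
    ]


def impact_of_change(name: str):
    """Cross-reference report: programs affected by a change to an item."""
    return sorted(DATA_DICTIONARY[name]["used_by"])


generated = data_division_lines(["CUST-CODE", "AMT-OWED"])
affected = impact_of_change("CUST-CODE")
print(generated)
print(affected)
```

If the size of CUST-CODE were changed in the dictionary, the generated definitions would change with it and the cross-reference report would list the programs needing attention.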

DATABASE SCHEMA

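Database management systems employ several levels of schema, or views of the database. Figure 9
shows the three classic levels (external, conceptual and internal), together with a logical schema
that sits between the conceptual and internal levels.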

EXTERNAL SCHEMA (User views)    EXTERNAL SCHEMA (User views)    EXTERNAL SCHEMA (User views)
    provide a limited view, appropriate to a particular requirement, of

CONCEPTUAL SCHEMA
    is a high-level description of the database independent of any particular DBMS

LOGICAL SCHEMA
    gives a logical (not physical) description of a structure that could be implemented on
    target DBMS

INTERNAL SCHEMA (Physical model)
    describes the physical arrangements for storage of

PHYSICAL STORED DATA

FIGURE 9. THREE LEVELS OF DATABASE SCHEMA

The internal schema


The internal schema of a database describes how data is physically stored within the computer. It
contains details of the storage and access methods used, indexing and physical location of data
records. It is tailored to a specific DBMS package.

The logical schema


The logical schema is a description of the structure of the database that can be processed by the
DBMS. It is dependent upon the type of DBMS being used (hierarchical, network, relational or
object-oriented) but not on the specific detail and capabilities of the specific software.

The conceptual schema


The conceptual schema is the logical view of the whole of the stored data and is completely
independent of the DBMS or type of DBMS being used. Contained within it are:

i. Details of the things (entities) which are of importance to the business and about which data are
stored.
ii. Details of the facts (attributes) which are recorded about the different entities.
iii. Details about the nature of those attributes, for example whether an attribute is numeric, the
number of characters used to describe it, or whether only a certain range of values is legitimate.
iv. Definitions of the relationships which can exist between the entities, for example that ORDERs
are always associated with the CUSTOMER that places them.

Separation of the conceptual and internal schemas means that use of the database need not be affected
by any changes to the way in which the data is physically stored. For example, re-organising the
physical storage would have a major impact in a file system, requiring every application program
to be amended. In a database system only the rules which the DBMS uses to map the conceptual
schema onto the internal schema need to be amended. Users of the database can continue to use the
database oblivious to any change having occurred.
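
A small sketch illustrates the point. Here the physical change is the creation of an index, which alters how the data is stored and located but not how it is queried. SQLite (through Python's standard sqlite3 module), the table, the column names and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_no INTEGER PRIMARY KEY, customer_code TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "011062", 11890), (2, "013854", 500), (3, "011062", 250)],
)

# The application's query refers only to logical names, never to storage.
QUERY = (
    "SELECT order_no, amount FROM orders "
    "WHERE customer_code = ? ORDER BY order_no"
)

before = conn.execute(QUERY, ("011062",)).fetchall()

# A change at the internal (physical) level: add an index on customer_code.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_code)")

after = conn.execute(QUERY, ("011062",)).fetchall()

print(before == after)
```

The query text is identical before and after the change; only the DBMS needs to know that a different access path is now available.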

The external schema


A number of different external schemas each present a user or application program with a view of the
database which is sufficient for their processing needs but which is only a subset of the overall
conceptual schema. The use of external schemas means that users need not concern themselves with
parts of the database which are not of interest to them, which makes the database easier to use.
As the users have no knowledge of the remainder of the database they have no way to access data in
other areas, which improves the security of the data.
If changes are made to the internal or the conceptual schema in areas which are not included in the
external schema used by a particular user then that user will be unaffected by those changes and need
not be aware that they have occurred.

Physical and logical views


It is the employment of these levels of schema that allows two separate views of the data to be
taken. We may take a physical view of the data, meaning that we are looking at the way in which the
data is physically stored on the computer or storage device. But most users are not interested in how
the data is actually held; as long as they can store, process and retrieve the data they are not
really concerned about what is going on inside the machine and will happily leave worrying about
that sort of thing to the DBMS. Since the DBMS is capable of translating the user's view of what
data is held onto the physical reality, the user is free to take a logical view of the data;
a logical view is thus a view of the data which is relevant to a particular application.
But different applications may not be interested in all the data held but only a subset of it. For
example, an organisation's database might consist of a set of connected files, holding data about
departments, about employees, about the projects upon which an employee has worked and details
of employees' salaries.
Figure 10 shows the conceptual schema of the database; it may not however be necessary or even
desirable that everyone works with this view of the data.

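[Figure 10: diagram of four linked entities: DEPARTMENT, EMPLOYEE, SALARY PAYMENT and PROJECT.
DEPARTMENT is linked to EMPLOYEE; the labels 1 and N appear on the links running from EMPLOYEE
down to SALARY PAYMENT and to PROJECT.]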

FIGURE 10 CONCEPTUAL SCHEMA OF A DATABASE

If a new application arose, that of using the employees' past project experience as a basis for deciding
future project assignments, then such an application would have no need to know that the files
holding information about salaries or departments even existed. The logical view of the data for that
application would be one which showed the database as consisting of only two files, the
EMPLOYEE file and the PROJECT file. Providing that application with an external schema which
reflects this view, shown in Figure 11, will therefore restrict that application's access to only the
parts of the database that it requires. The application could be programmed based upon such a logical
view and the DBMS would then map this logical view onto the conceptual and internal or physical view.
This ability to work with logical views is very useful in that it improves control and security of an
organisation's data resource.

[Figure 11: diagram showing only two linked entities, EMPLOYEE and PROJECT.]

FIGURE 11 AN EXTERNAL SCHEMA OFFERING ONLY A LIMITED VIEW OF THE DATABASE
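
In a relational DBMS an external schema of this kind is commonly provided as a view. The sketch below assumes SQLite (through Python's standard sqlite3 module); the table names, column names, view name and sample rows are invented for the example. SQLite has no user permissions, so the sketch shows only the restricted view of the data, not the access control a full DBMS would add.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Conceptual level: the four entities of Figure 10.
    CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (
        emp_no  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_no INTEGER REFERENCES department(dept_no)
    );
    CREATE TABLE salary_payment (
        payment_no INTEGER PRIMARY KEY,
        emp_no     INTEGER REFERENCES employee(emp_no),
        amount     INTEGER
    );
    CREATE TABLE project (
        project_no INTEGER PRIMARY KEY,
        emp_no     INTEGER REFERENCES employee(emp_no),
        title      TEXT
    );

    INSERT INTO department VALUES (1, 'Research');
    INSERT INTO employee VALUES (1, 'A. Smith', 1), (2, 'B. Jones', 1);
    INSERT INTO salary_payment VALUES (1, 1, 1500);
    INSERT INTO project VALUES (1, 1, 'Stock audit'), (2, 2, 'Billing rewrite');

    -- External level: the project-assignment application sees only this.
    CREATE VIEW employee_project AS
        SELECT e.emp_no, e.name, p.title
        FROM employee AS e
        JOIN project AS p ON p.emp_no = e.emp_no;
    """
)

experience = conn.execute(
    "SELECT name, title FROM employee_project ORDER BY title"
).fetchall()
view_columns = [
    row[1] for row in conn.execute("PRAGMA table_info(employee_project)")
]

print(experience)
print(view_columns)
```

The view exposes employee and project details only; salary and department data exist in the database but form no part of what this application is given to work with.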

It is the Data Definition Language (DDL) which provides the link between the physical and logical
structures. The DDL defines the schema of the database, specifying the name, type and size of each
field, which fields are in each record and the key fields for each record; in this you may see a
similarity with the definition of files in packages such as dBASE or Access. The DDL also allows the
definition of sub-schemas, each sub-schema corresponding to a logical view. There may be many
different sub-schemas relating to a single schema.

ADVANTAGES AND DISADVANTAGES OF DATABASES

Advantages
Eliminates redundancy
Reduces risk of errors in stored data and provides better integrity
Reduction of storage costs
Reduced data entry costs
Relates associated data
Provides data independence
Greater ease of development
Flexibility
Gives end users easier access and allows ad-hoc enquiry
Provides better security and control

Disadvantages
Cost of software
Transition and set-up costs
Running costs, especially increased machine usage
Skills required of staff
Sharing of data may lead to conflict
Long payoff period
Need for monitoring, tuning and control of database
