INTRODUCTION
Databases are now so common that they hold little mystique, but they remain one of the most
important forms of software. They are the basis of most data processing, and even the newest
developments in the use of ICTs rely upon them. For example, developments such as internet
shopping or PC banking, which are likely to have enormous effects upon the way we live and work,
are entirely dependent upon the efficacy, effectiveness and efficiency of the databases that store the
data that they use.
Given the modern day ubiquity of databases it can be hard to think back to a time when they were
not the principal way of storing commercial data. But it is necessary to do so in order to understand
why organisations first moved to using a database approach to storing and processing data, and the
advantages that we gain by doing so.
We can then move to discussing the basic concepts of a database and some of the terminology used.
The simplest approach to data processing is to organise the storage and retrieval of data through
files. This is sometimes still used for one-off or small-scale data handling. For example, if you were
carrying out simulation or other modelling tasks it may not be worth setting up a formal database
structure to hold the data. All computer-based information-systems used this approach until the
beginning of the 1970s.
In a file-based information-system all the data is stored within separate files. Each file consists of a
collection of records, and each record is composed of a number of fields. The structure of a file is
the layout of each record, showing how much space each field occupies and the relative position of
each field within a record.
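
As a minimal sketch of what a record layout amounts to, the Python below describes an 80-character
record with the standard struct module. The three field widths (5, 8 and 67 characters) and the
sample values are assumptions chosen for illustration only.

```python
import struct

# Hypothetical layout of one 80-character record:
#   positions 0-4   customer code (5 characters)
#   positions 5-12  amount owed   (8 characters)
#   positions 13-79 filler        (67 characters)
RECORD = struct.Struct("5s8s67s")

raw = b"01106" + b"00011890" + b" " * 67   # one record as stored in the file

# Unpacking relies entirely on the widths declared above.
cust_code, amt_owed, _filler = RECORD.unpack(raw)

print(RECORD.size)                       # total space occupied by one record
print(cust_code.decode(), int(amt_owed))
```

Every program that reads the file needs its own copy of this layout, which is the root of the
maintenance problems discussed later in these notes.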
P J DUNNING-LEWIS (INTRO.DOC)
THE DATABASE APPROACH PAGE 2
[Figure: flowchart of a file-based sales processing application. Labels: Daily Sales File, Sort
Program, Sorted Daily Sales, Daily Update Program, Sales File, Back-up Program, Copy of Sales File,
Clean-up Program, Past Sales File, Invoicing Program, Freight Rates File, Invoices.]
In this application the details of the sales made each day are collected on-line by point-of-sales
terminals and added to a Daily Sales File. This file is used at the close of business to update a
centrally held Sales File. The Sales File is then later used for customer invoicing. The central
importance of the Sales File to the business makes it essential that copies are periodically taken for
recovery in case of disaster. Periodically the file is examined for details of sales completed over three
years in the past; these are of no operational interest to the organisation but details must be kept in
case of enquiries by the tax authorities. Such details are therefore removed to a Past Sales File.
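
The clean-up step described above can be sketched roughly as follows. This is an illustration
only: the record fields (invoice, completed) and the function names are hypothetical, and the
files are modelled as in-memory lists rather than real files.

```python
from datetime import date


def years_before(day, years):
    """Return the date the given number of years before day."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; fall back to the 28th.
        return day.replace(year=day.year - years, day=28)


def clean_up(sales_file, today, keep_years=3):
    """Split the Sales File into (operational records, records for the Past Sales File)."""
    cutoff = years_before(today, keep_years)
    current = [r for r in sales_file if r["completed"] >= cutoff]
    past = [r for r in sales_file if r["completed"] < cutoff]
    return current, past


sales_file = [
    {"invoice": 1, "completed": date(2019, 5, 1)},
    {"invoice": 2, "completed": date(2023, 2, 14)},
]
current, past = clean_up(sales_file, today=date(2024, 1, 31))
print([r["invoice"] for r in current], [r["invoice"] for r in past])
```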
Within this application there are examples of many of the different types of file which may be found
in an information-system. These are described in Figure 2.
BACKUP FILES are copies of files used by the system which are kept in case the operational files
are lost or corrupted.
    Example: COPY OF SALES FILE

ARCHIVE FILES contain records which are no longer operationally relevant but which cannot be
deleted because of legal requirements or in case there arises an unforeseen need for them in the
future.
    Example: PAST SALES FILE

REFERENCE FILES contain standard information which may be referenced by a number of different
data processing applications. Common examples are tax rates or standard delivery charges for
different locations.
    Example: FREIGHT RATES FILE

INDEX FILES contain information about where particular data is physically stored within the
system. They are likely to be invisible to the user and maintained by the system itself.
PERSONNEL APPLICATIONS use PERSONNEL DATA:
    EMPLOYEE NUMBER
    NAME
    ADDRESS
    DATE JOINED
    CURRENT JOB TITLE
    CURRENT DEPARTMENT
    MARITAL STATUS
    SEX

PAYROLL APPLICATIONS use PAYROLL DATA:
    EMPLOYEE NUMBER
    NAME
    JOB TITLE
    BANK NAME
    BANK ADDRESS
    BANK ACCOUNT NUMBER
    SALARY
    DEPARTMENT
    N.I. NUMBER
    TAX CODE

FIGURE 3: IN A FILE-BASED PROCESSING SYSTEM EACH APPLICATION USES ITS OWN DATA FILES
Organising information provision through a file-based approach was the norm for many years, but it
can give rise to a number of problems. These are:
       DATA DIVISION.
       FILE SECTION.
       FD  INPUT-FILE
           LABEL RECORDS ARE OMITTED
           DATA RECORD IS CARD-RECORD.
      * 80-character record: the three fields below total 80 characters.
       01  CARD-RECORD.
           03  CUST-CODE  PICTURE 9(5).
           03  AMT-OWED   PICTURE 9(8).
           03  FILLER     PICTURE X(67).
This need to change all the programs at the very least adds to the maintenance burden of an
organisation's DP department. More seriously, if one occurrence of the file definition is
overlooked and left unchanged, data processing jobs can "fall down" in the middle of
operational runs or process data incorrectly.
For example, if an organisation uses a five-digit customer code, then every program accessing any file
which holds a customer code field will have within it an explicit definition of that field as five digits
long. One file in which the customer code appears is the invoicing file. This file is used by an
invoicing program and contains instructions to invoice the customer identified by the CUSTOMER-CODE
field for the amount given in the AMOUNT-OWED field. The file definition contained within
the program specifies where to find the fields CUSTOMER-CODE and AMOUNT-OWED within each
record.
Now, because of a large number of new customers, the organisation decides to increase the length of
the customer code to six digits. All the data files and all the programs are amended to reflect this
change, except for the invoicing program. That program therefore continues to expect that the first
five digits of each record are a customer code and the next eight digits are the amount for which the
customer is to be invoiced. The data is therefore wrongly interpreted. Figure 5 shows how the
invoicing program, instead of invoicing customer number 011062 for 11,890, will invoice customer
number 01106 for 20,001,189.
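Record as written by the amended programs (six-digit customer code, eight-digit amount):

    011062 00011890 01

File definition still used by the invoicing program:

       DATA DIVISION.
       FILE SECTION.
       FD  INPUT-FILE
           LABEL RECORDS ARE OMITTED
           DATA RECORD IS CARD-RECORD.
       01  CARD-RECORD.
           03  CUST-CODE  PICTURE 9(5).
           03  AMT-OWED   PICTURE 9(8).
           03  FILLER     PICTURE X(67).

Record as read by the invoicing program (five-digit customer code, eight-digit amount):

    01106 20001189 001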
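
The misreading can be reproduced in a few lines. The sketch below is not the invoicing program
itself; it is a hypothetical Python stand-in in which OLD_LAYOUT plays the part of the unchanged
program and NEW_LAYOUT the part of every amended one.

```python
def parse(record, layout):
    """Cut a fixed-width record into fields using (name, width) pairs."""
    fields, start = {}, 0
    for name, width in layout:
        fields[name] = record[start:start + width]
        start += width
    return fields


OLD_LAYOUT = [("CUSTOMER-CODE", 5), ("AMOUNT-OWED", 8)]   # invoicing program, not amended
NEW_LAYOUT = [("CUSTOMER-CODE", 6), ("AMOUNT-OWED", 8)]   # all other programs

record = "01106200011890"   # customer 011062 owes 11,890

intended = parse(record, NEW_LAYOUT)
misread = parse(record, OLD_LAYOUT)

print(intended["CUSTOMER-CODE"], int(intended["AMOUNT-OWED"]))   # 011062 11890
print(misread["CUSTOMER-CODE"], int(misread["AMOUNT-OWED"]))     # 01106 20001189
```

Nothing fails visibly: both layouts parse the record without error, which is why an overlooked file
definition can go unnoticed until incorrect invoices appear.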
Lack of flexibility
This also illustrates another problem, that of lack of flexibility. File-oriented information-systems
generally provide pre-defined information in a pre-defined format. New requirements call for
new programming, which is expensive and, particularly in the case of ad-hoc requirements, may
be too slow.
DATABASE SYSTEMS
Figures 6 and 7 illustrate the essential difference between database information-systems and
file-based information-systems, namely that in a database system data is shared between different
users.
Within the file-based system of Figure 6 the same details about employees may be held quite
separately in the files used by the Personnel Department and the Payroll Department. Details of
products may be duplicated in the files used by the Sales Department and Stock Control.
Within the database system of Figure 7, however, there is no such duplication. Only one copy of every
piece of data is stored within the integrated database, but shared and concurrent use of that data is
possible for all the individual users.
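
A small sketch of the difference, assuming Python's built-in sqlite3 module as a stand-in for a
DBMS. The employee number, name and table layout are invented for the example.

```python
import sqlite3

# File-based approach: each department keeps its own copy of the employee's details.
personnel_file = {1001: {"name": "A. Smith", "department": "Sales"}}
payroll_file = {1001: {"name": "A. Smith", "department": "Sales"}}

personnel_file[1001]["department"] = "Marketing"   # only one copy gets updated
copies_agree = personnel_file[1001] == payroll_file[1001]

# Database approach: one stored copy that both departments query.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT, department TEXT)")
db.execute("INSERT INTO employee VALUES (1001, 'A. Smith', 'Sales')")
db.execute("UPDATE employee SET department = 'Marketing' WHERE emp_no = 1001")

query = "SELECT department FROM employee WHERE emp_no = 1001"
seen_by_personnel = db.execute(query).fetchone()
seen_by_payroll = db.execute(query).fetchone()

print(copies_agree, seen_by_personnel, seen_by_payroll)
```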
[Figure 6: file-based system. The Personnel Department, Payroll Department, Sales Department and
Stock Control each have individual and exclusive access to their own files: Personnel files,
Pay-roll files, Sales files and Stock control files.]
[Figure 7: database system. The Personnel Department, Payroll Department, Sales Department and
Stock Control have concurrent and controlled access to an organizational data base with no
redundancy.]
[Figure: components of a database system. Labels: DBMS software, operating system, machine level
operations, application programs, query languages.]
Data Dictionaries
An automated data dictionary (often called a repository) is normally supplied as part of the DBMS
package, and in most cases is central to the operation of all the elements of the DBMS. Other data
dictionaries are more general and provide interfaces to the most common DBMSs.
The data dictionary is itself a database, which acts as the repository of definitions of all the data used
by an organisation. The meaning of the data to be held in the database is agreed with the users before
being defined within the data dictionary. Any inconsistencies that are discovered must be resolved
(for example, different users may use the same name to refer to different data items, or different names
for the same item) so that the end result is an organisation-wide set of standard definitions. The data
dictionary is thus used for defining the facts which are associated with every entity relevant to the
organisation's business. The facts to be recorded will include a description, details of the type of the
data, maximum and minimum sizes, and ranges of possible values.
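
To make the recorded facts concrete, here is one way a dictionary entry might be modelled. This is
a sketch under assumptions, not how any particular DBMS stores its dictionary: the DataItem class,
its attribute names and the two sample entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataItem:
    """One data dictionary entry (attribute names are hypothetical)."""
    name: str
    description: str
    data_type: type
    min_size: int                      # minimum number of characters or digits
    max_size: int                      # maximum number of characters or digits
    allowed: Optional[range] = None    # optional range of legitimate values

    def accepts(self, value):
        """Check a value against the recorded type, size and range."""
        if not isinstance(value, self.data_type):
            return False
        if not self.min_size <= len(str(value)) <= self.max_size:
            return False
        return self.allowed is None or value in self.allowed


dictionary = {
    "CUSTOMER-CODE": DataItem("CUSTOMER-CODE", "Unique customer identifier", str, 5, 5),
    "AMOUNT-OWED": DataItem("AMOUNT-OWED", "Amount to be invoiced", int, 1, 8,
                            range(0, 100_000_000)),
}

print(dictionary["CUSTOMER-CODE"].accepts("01106"))
print(dictionary["CUSTOMER-CODE"].accepts("011062"))
print(dictionary["AMOUNT-OWED"].accepts(11890))
```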
During database design the external and conceptual schemas are recorded in the data dictionary. Data
dictionaries allow the stored definitions to be used to produce the command statements needed to
generate the physical data structures. Language-specific outputs such as DATA DIVISION statements
for COBOL may also be generated automatically.
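
The idea of generating language-specific output can be sketched as below. The entry format
(name, kind, size) and the cobol_record function are assumptions made for this illustration; a real
data dictionary would hold far richer definitions and its generator would be part of the product.

```python
# Hypothetical dictionary entries: (item name, kind, size), in record order.
ENTRIES = [
    ("CUST-CODE", "numeric", 5),
    ("AMT-OWED", "numeric", 8),
    ("FILLER", "text", 67),
]

# COBOL picture symbols: 9 for a numeric digit, X for any character.
PICTURE = {"numeric": "9", "text": "X"}


def cobol_record(record_name, entries):
    """Build COBOL record description lines from dictionary entries."""
    lines = [f"01  {record_name}."]
    for name, kind, size in entries:
        lines.append(f"    03  {name:<10} PICTURE {PICTURE[kind]}({size}).")
    return lines


for line in cobol_record("CARD-RECORD", ENTRIES):
    print(line)
```

Because every program's record description is produced from the same entries, changing a size in
the dictionary and regenerating avoids the overlooked-definition problem described earlier.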
The data dictionary is a key element in controlling the data resource and is essential in implementing
data administration. It is the basic tool used by the data administrator to maintain control over the
organisation's data. Since the data dictionary controls the structure of the database, it allows control
over any changes to the database and ensures total consistency in the use of data items throughout
the database.
It may also be used to estimate the impact of any changes to be made. Most automated data
dictionaries have the facility to produce cross-referenced reports of the use of any data item, and so
allow an evaluation to be made of the work which would be necessitated by a change in the
size or other characteristics of that data item.
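
A cross-reference report of this kind reduces to a simple lookup. In the sketch below the program
names, the data item names and the USES table are invented for illustration.

```python
# Hypothetical cross-reference data: which programs use which data items.
USES = {
    "INVOICING-PROGRAM": {"CUSTOMER-CODE", "AMOUNT-OWED"},
    "UPDATE-PROGRAM": {"CUSTOMER-CODE", "SALE-DATE"},
    "CLEAN-UP-PROGRAM": {"SALE-DATE"},
}


def where_used(item, uses):
    """Return the sorted list of programs that reference a data item."""
    return sorted(program for program, items in uses.items() if item in items)


# Programs that would need attention if CUSTOMER-CODE changed size.
affected = where_used("CUSTOMER-CODE", USES)
print(affected)   # ['INVOICING-PROGRAM', 'UPDATE-PROGRAM']
```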
DATABASE SCHEMA
Database management systems employ three types of schema or views of the database as shown in
Figure 9.
[Figure 9: the three levels of schema. Several EXTERNAL SCHEMA (user views) sit above the
CONCEPTUAL SCHEMA (logical schema), which sits above the INTERNAL SCHEMA (physical model).]
Details of the facts (attributes) which are recorded about the different entities
Details about the nature of those attributes. For example whether an attribute is numeric, the number
of characters used to describe it, or whether only a certain range of values is legitimate.
Definitions of the relationships which can exist between the entities. For example, that ORDERs are
always associated with the CUSTOMER that places them.
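
These three kinds of detail can be illustrated with a small schema. The sketch uses Python's
built-in sqlite3 module as a convenient stand-in for a DBMS; the table names, column names and the
particular size and range rules are assumptions made for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")   # SQLite enforces relationships only when asked
db.executescript("""
    CREATE TABLE customer (
        customer_code TEXT PRIMARY KEY CHECK (length(customer_code) = 5),
        name          TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_no      INTEGER PRIMARY KEY,
        -- every ORDER must belong to an existing CUSTOMER
        customer_code TEXT NOT NULL REFERENCES customer (customer_code),
        -- only a certain range of values is legitimate
        quantity      INTEGER NOT NULL CHECK (quantity BETWEEN 1 AND 999)
    );
""")
db.execute("INSERT INTO customer VALUES ('01106', 'Example Ltd')")
db.execute("INSERT INTO orders VALUES (1, '01106', 10)")


def rejected(sql):
    """Return True if the DBMS refuses the statement as a rule violation."""
    try:
        db.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


orphan_rejected = rejected("INSERT INTO orders VALUES (2, '99999', 10)")
range_rejected = rejected("INSERT INTO orders VALUES (3, '01106', 0)")
print(orphan_rejected, range_rejected)
```

The rules live in the schema rather than in each application program, so every user of the data is
subject to the same checks.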
Separation of the conceptual and internal schema means that use of the database need not be affected
by any changes to the way in which the data is physically stored. For example, re-organising the
physical storage would have a major impact in a file system, requiring every application program
to be amended. In a database system only the rules which the DBMS uses to map the conceptual
schema onto the physical schema need to be amended. Users of the database can continue to use the
database oblivious to any change having occurred.
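
One small, admittedly limited, illustration of this independence: adding an index changes how the
data is physically organised, yet the same query returns the same answer. The sketch assumes
Python's built-in sqlite3 module; the table contents and the index name are invented.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE employee (emp_no INTEGER, name TEXT, department TEXT)")
db.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Smith", "Sales"), (2, "Jones", "Payroll"), (3, "Patel", "Sales")],
)

# The application's query is written against the logical structure only.
QUERY = "SELECT name FROM employee WHERE department = 'Sales' ORDER BY name"

before = db.execute(QUERY).fetchall()

# Physical re-organisation: add an index on the column being searched.
db.execute("CREATE INDEX employee_by_department ON employee (department)")

after = db.execute(QUERY).fetchall()
print(before == after)
```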
[Figure: entity-relationship diagram showing the entities DEPARTMENT, EMPLOYEE, SALARY PAYMENT
and PROJECT, with relationships marked 1 and N.]
If a new application arose, that of using the employees' past project experience as a basis for deciding
future project assignments, then such an application would have no need to know that the files
holding information about salaries or departments even existed. The logical view of the data for that
application would be one which showed the database as consisting of only two files, the
EMPLOYEE file and the PROJECT file. Providing that application with an external schema which
reflects this view, shown in Figure 11, will therefore restrict that application's access to only the
parts of the database that it requires. The application could be programmed based upon such a logical
view and the DBMS would then map this logical view onto the conceptual and internal or physical view.
This ability to work with logical views is very useful in that it improves control and security of an
organisation's data resource.
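
A relational view gives a rough feel for such an external schema. The sketch below assumes Python's
built-in sqlite3 module; the assignment table, the view name employee_projects and all column names
are hypothetical. SQLite has no user permissions, so this shows only the restricted view of the
data, not the enforcement of access that a full DBMS would add.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT,
                           department TEXT, salary INTEGER);
    CREATE TABLE project (project_no INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE assignment (emp_no INTEGER, project_no INTEGER);

    INSERT INTO employee VALUES (1, 'Smith', 'Sales', 30000);
    INSERT INTO project VALUES (10, 'Stock control upgrade');
    INSERT INTO assignment VALUES (1, 10);

    -- Hypothetical external schema for the project assignment application:
    -- no department or salary details are exposed.
    CREATE VIEW employee_projects AS
        SELECT e.emp_no AS emp_no, e.name AS name, p.title AS title
        FROM employee e
        JOIN assignment a ON a.emp_no = e.emp_no
        JOIN project p ON p.project_no = a.project_no;
""")

cursor = db.execute("SELECT * FROM employee_projects")
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(columns, rows)
```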
[Figure 11: external schema for the project assignment application, showing only EMPLOYEE and
PROJECT.]
It is the Data Definition Language (DDL) which provides the link between the physical and logical
structures. The DDL defines the schema of the database, specifying the name, type and size of each
field, what fields are in each record and the key fields for each record; in this you may see a similarity
with the definition of files in packages such as dBASE or Access. The DDL also allows the definition
of sub-schemas, each sub-schema corresponding to a logical view. There may be many different
sub-schemas relating to a single schema.
Advantages
Eliminates redundancy
Reduces risk of errors in stored data and provides better integrity
Reduction of storage costs
Reduced data entry costs
Relates associated data
Provides data independence
Greater ease of development
Flexibility
Gives end users easier access and allows ad-hoc enquiry
Provides better security and control
Disadvantages
Cost of software
Transition and set-up costs
Running costs, especially increased machine usage
Skills required of staff
Sharing of data may lead to conflict
Long payoff period
Need for monitoring, tuning and control of database