There is less danger of damaging the surface of the disks than there is of
breaking a tape.
Indexed Sequential Access Method
To explain the indexed sequential access method (ISAM), let's go back to the
example of the cassette tape. A cassette tape label has a printed list of the songs
contained on it which gives you a general idea of where to go on the tape to find
a particular tune. The same is true of computer records stored in sequence and
indexed on the key field: the index gives the computer a fairly accurate idea of
where a particular record is located. That is why it is so important to have a
unique ID as the key field.
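The idea can be sketched in a few lines of Python. This is only an illustration, assuming the records are kept sorted by key and grouped into fixed-size blocks; the names `build_index`, `isam_lookup` and `BLOCK_SIZE` are hypothetical. The index holds just the first key of each block, much like the song list on the cassette label, so a lookup jumps to the right block and then reads sequentially within it.

```python
from bisect import bisect_right

BLOCK_SIZE = 2  # assumed number of records per block

# Records kept in key order (IDs and names follow the sample student data).
records = [(84, "John Dube"), (100, "Rudo Masuku"),
           (182, "John Dube"), (219, "Mark Ruvende")]

def build_index(recs, block_size=BLOCK_SIZE):
    """Return a sparse index: the first key of every block."""
    return [recs[i][0] for i in range(0, len(recs), block_size)]

def isam_lookup(recs, index, key, block_size=BLOCK_SIZE):
    """Use the index to choose a block, then scan that block sequentially."""
    block = bisect_right(index, key) - 1
    if block < 0:
        return None  # key is smaller than every indexed key
    start = block * block_size
    for rec_key, value in recs[start:start + block_size]:
        if rec_key == key:
            return value
    return None

index = build_index(records)
```

Calling `isam_lookup(records, index, 219)` consults the two-entry index first and then reads only the second block, rather than scanning the file from the start.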
Direct File Access Method
This access method also uses key fields, in combination with mathematical
calculations, to determine the location of a record. If you order something by
phone from a mail order catalogue, the person taking your order does not have to
wait while the computer searches through the file for your record; using the
direct file access method, the computer can find you very quickly.
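A minimal Python sketch of such a calculation follows. It assumes the division-remainder method (key modulo the number of slots) and resolves collisions by moving forward to the next free slot; both choices, and the names `slot_for`, `store` and `fetch`, are assumptions made for illustration rather than anything the notes prescribe.

```python
NUM_SLOTS = 7  # assumed size of the file, in record slots

def slot_for(key, num_slots=NUM_SLOTS):
    """Compute a record's home location directly from its key."""
    return key % num_slots

def store(slots, key, value):
    """Place a record in its computed slot, probing forward on a collision."""
    home = slot_for(key, len(slots))
    for step in range(len(slots)):
        probe = (home + step) % len(slots)
        if slots[probe] is None or slots[probe][0] == key:
            slots[probe] = (key, value)
            return probe
    raise RuntimeError("file is full")

def fetch(slots, key):
    """Go straight to the computed slot; probe forward only if needed."""
    home = slot_for(key, len(slots))
    for step in range(len(slots)):
        probe = (home + step) % len(slots)
        if slots[probe] is None:
            return None
        if slots[probe][0] == key:
            return slots[probe][1]
    return None

slots = [None] * NUM_SLOTS
for key, name in [(84, "John Dube"), (100, "Rudo Masuku"),
                  (182, "John Dube"), (219, "Mark Ruvende")]:
    store(slots, key, name)
```

Keys 84 and 182 both compute to slot 0, so the second one is stored in the next free slot; `fetch` follows the same path when reading it back.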
Limitations of File-based Approach
1. File processing systems store groups of records in separate files
2. Separation and isolation of data
Each program maintains its own set of data.
Users of one program may be unaware of potentially useful data held by other
programs.
3. Duplication of data (redundancy)
Same data is held by different programs.
Wasted space, and potentially different values and/or different formats for the
same item. This is a data integrity problem and produces inconsistent results.
4. Data dependence / application program dependency
File structures, formats and records are defined in the program/application code,
so changing them is a time-consuming and error-prone task.
5. Incompatible file formats
Programs are written in different languages, and so cannot easily access each
other's files; files produced by programs in different languages cannot readily
be combined or compared.
6. Fixed Queries/Proliferation of application programs
Programs are written to satisfy particular functions. Any new requirement needs a
new program.
5.2 THE DATABASE ENVIRONMENT
Database Approach
Integrated data
A database management system (DBMS) is software that enables users to define,
create, and maintain the database, and that provides controlled access to it.
A DBMS:
Allows the storage, retrieval, and manipulation of information in a
prescribed format.
Interfaces with application programs that access the database data.
Allows users to deal with the data in abstract terms, rather than as the
computer stores the data.
Links the physical database, the computer and the operating system on the
one hand with the users on the other.
Examples of DBMS:
Microsoft Works
SQL Server
INNOPAC
Oracle
CDS/ISIS
Dbase I,II,III,IV
Microsoft Access
Lotus Approach
Paradox
Components Of A DBMS Used By Systems Personnel
Data Dictionary: contains the names and descriptions of every data element in a
database. It also contains a description of how data elements relate to one
another. Through the use of a data dictionary the DBMS stores the data in a
consistent manner, thus reducing redundancy.
Data Languages: To place data in the database, a special language is used to
describe the characteristics of the data elements. This language is called the
Data Definition Language (DDL). The Data Manipulation Language (DML) is used to
retrieve and process data from a database.
Teleprocessing Monitor: Communication software that manages
communication between the Database and remote terminals.
Application Development: This is a set of programs designed to help
programmers develop application programs that use the database.
Security Software: This provides a variety of tools to shield the database from
unauthorised access.
Archiving and recovery system: Provides the database manager with tools to
make copies of the database, which can be used in case the original database
records are damaged. Restart and recovery systems are tools used to restart the
database and to recover lost data in the event of a failure.
the stored data by application programs. A DBMS has three components, all of
them important for the long-term success of the system.
Data definition language. Marketing looks at customer addresses differently
from Shipping. So you must make sure that all users of the database are
speaking the same language. Think of it this way: Marketing is speaking French,
Production is speaking German, and Human Resources is speaking Japanese.
They are all saying the same thing, but it's very difficult for them to understand
each other. Defining the data definition language sometimes gets shortchanged, yet it is critical to involve users in the development of the data definition.
Data manipulation language. This is a formal language used by programmers
to manipulate the data in the database and turn it into useful information.
The goal of this language should be to make access to the data easy for users.
The basic idea is to establish a single data element that can serve multiple users
in different departments depending on the situation. Otherwise, you'll be tying up
programmers to get information from the database that users should be able to
get on their own.
Data dictionary. Each data element or field should be carefully analysed to
determine what it will be used for, who will be the primary user, and how it fits into
the overall scheme of things. Then write it all down and make it easily available
to all users. This is one of the most important steps in creating a database.
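The three components can be seen side by side in a short sketch using Python's built-in sqlite3 module. The `customer` table, its columns and the sample row are hypothetical; SQLite is used only because it ships with Python, not because the notes refer to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition language: describe the data elements once, for all users.
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        address     TEXT
    )
""")

# Data manipulation language: store and retrieve the data.
conn.execute(
    "INSERT INTO customer (customer_id, name, address) VALUES (?, ?, ?)",
    (1, "Sample Customer", "Harare"),
)
rows = conn.execute(
    "SELECT name, address FROM customer WHERE address = ?", ("Harare",)
).fetchall()

# Data dictionary: the DBMS keeps the definitions in its own catalogue,
# which can be queried like any other data.
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
```

Here `rows` holds the retrieved customer and `columns` lists the field names recorded in the catalogue.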
Logical And Physical Views Of Data
Physical views of items are often different from the logical views of the same
items when they are actually being used.
The physical view of data cares about where the data are actually stored in the
record or in a file. The physical view is important to programmers who must
manipulate the data as they are physically stored in the database.
Does it really matter to the user that the customer address is physically stored on
the disk before the customer name? Probably not. When users create a report of
customers located in Harare, they will generally list the customer name first and
then the address. So it is more important to the end user to bring the data from
their physical location on the storage device into a logical view on the output
device, whether screen or paper.
Users' view of the database: Describes that part of the database that is relevant
to a particular user. This is the level at which users interact with the system via
application programs, a host language or a data sublanguage.
Within these records the user may need access to only a few selected fields in
order to perform the specified tasks. The external schema supplies the user with
this limited window on the conceptual schema. Different views may have different
representations of the same data. E.g., user1 views dates as (day,month,year)
whereas user2 may view them as (year, month, day). Some views may include
some derived or calculated data, data not actually stored in the database as
such. E.g., ages of employees may be included in a view on an employee
relation but are unlikely to be stored. Instead, their dates of birth would be stored
and their ages calculated from them by the DBMS. The external schema also
contains the method of deriving the objects in the external view from the objects
in the conceptual view. The objects include entities, attributes and relationships.
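The two examples above, differing date formats and a derived age, can be sketched as SQL views using Python's sqlite3 module. The table, the sample employee and the fixed reference date `2024-06-30` are assumptions chosen to keep the example repeatable; a real view would normally compare against the current date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, date_of_birth TEXT)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Sample Employee', '1990-09-15')")

# External view for user 1: dates as day/month/year, plus a derived age.
# Only the date of birth is stored; the age is calculated when the view is read.
conn.execute("""
    CREATE VIEW user1_view AS
    SELECT name,
           strftime('%d/%m/%Y', date_of_birth) AS born,
           CAST(strftime('%Y', '2024-06-30') AS INTEGER)
             - CAST(strftime('%Y', date_of_birth) AS INTEGER)
             - (strftime('%m-%d', '2024-06-30') < strftime('%m-%d', date_of_birth))
             AS age
    FROM employee
""")

# External view for user 2: the same stored data, dates as year/month/day.
conn.execute("""
    CREATE VIEW user2_view AS
    SELECT name, strftime('%Y/%m/%d', date_of_birth) AS born
    FROM employee
""")

user1 = conn.execute("SELECT * FROM user1_view").fetchall()
user2 = conn.execute("SELECT * FROM user2_view").fetchall()
```

Both views read the same stored record; neither the formatted dates nor the age exist in the underlying table.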
Conceptual Level
The conceptual view is a representation of the entire information content of the
database. This level describes what data is stored in the database and the
relationships among the data. This level contains the logical structure of the
entire database as seen by the database administrator (DBA). The conceptual
schema hides the details of physical storage structures and concentrates on
describing entities, data types, relationships, user operations, and constraints.
This level mainly represents: all entities, their attributes and their relationships,
security and integrity information.
This level must not contain any storage-dependent details (e.g., storage structure
and access technique).
The schema can be regarded as derived from a model of the organization and
should be designed with care, as it is usual for its structure to remain relatively
unchanged in the life of the database.
Internal Level
The internal view is a low-level representation of the entire database. This level
describes how the data is stored in the database and the access paths for the
database. The internal view is described by means of the internal schema which
defines the various stored record types, how stored fields are represented, what
indexes exist, what physical sequence the stored records are in, and so on. It is
concerned with storage details that are not part of a logical view of the database.
DBMS Users/ Roles in the DBMS
The classification of DBMS users can be done depending on their degree of
expertise or the mode of their interactions with the DBMS.
Naïve users: These are users who need not be aware of the presence of a
database system or of any other system supporting their usage, e.g. a user of an
automatic teller machine.
Online users: These are aware of the presence of the database system and they
may be communicating with the database directly via an online terminal or
indirectly via a user interface or application program.
Application programmers: These are responsible for developing application
programs or user interfaces used by the naïve and online users.
DBA (Database Administrator)
The database will be able to meet the demands of various users in the
organization effectively only if it is maintained and managed properly. Usually a
person (or a group of persons) centrally located, with an overall view of the
database, is needed to keep the database running smoothly. The DBA is the
custodian of the data and controls the database structure; he administers the
three levels of the database.
The DBA would normally have a large number of tasks related to maintaining and
managing the database. These tasks would include the following:
Deciding and Loading the Database Contents - The DBA in consultation with
senior management is normally responsible for defining the conceptual schema
of the database. The DBA would also be responsible for making changes to the
conceptual schema of the database if and when necessary.
Assisting and Approving Applications and Access - The DBA would normally
provide assistance to end-users interested in writing application programs to
access the database. The DBA would also approve or disapprove access to the
various parts of the database by different users.
Deciding Data Structures - Once the database contents have been decided, the
DBA would normally make decisions regarding how data is to be stored and what
indexes need to be maintained. In addition, a DBA normally monitors the
performance of the DBMS and makes changes to data structures if the
performance justifies them. In some cases, radical changes to the data structures
may be called for.
Backup and Recovery - Since the database is such a valuable asset, the DBA
must make all the efforts possible to ensure that the asset is not damaged or lost.
This normally requires a DBA to ensure that regular backups of a database are
carried out and in case of failure (or some other disaster like fire or flood),
suitable recovery procedures are used to bring the database up with as little
down time as possible.
Monitor Actual Usage - The DBA monitors actual usage to ensure that policies
laid down regarding use of the database are being followed. The usage
information is also used for performance tuning.
5.3 DESIGNING DATABASES
Student         Activity1   Cost1   Activity2   Cost2
John Dube       Tennis      $3600   Swimming    $1700
Rudo Masuku     Squash      $4000   Swimming    $1700
John Dube       Tennis      $3600
Mark Ruvende    Golf        $4700   Swimming    $1500
Table 1 Activity
Step 3: Analyse the data. In this case, Table 1 above contains two John
Dubes, and there's no way to differentiate them. Hence the need for a
unique identifier.
Uniquely identify records
Step 4: Modify the design. Each student is identified uniquely by being given
a unique ID (the primary key). This field can be used to retrieve any specific
record.
The table structure is now: ID, Activity1, Cost1, Activity2, Cost2.
While it's easy for the computer to keep track of ID codes, they are not so useful
for humans. Therefore there is a need to introduce a second table that lists each ID
and the student it belongs to. Using a database program, both table structures
can be linked by the common field, ID. The initial flat-file design has been
converted into a relational database: a database containing multiple tables linked
together by key fields.
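A rough sketch of this two-table design, using Python's sqlite3 module, is given below. The table and column names are assumptions based on the structure described above, and the rows follow the sample data used in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table maps each ID to a student; the other holds the activities per ID.
conn.execute("CREATE TABLE students (id TEXT PRIMARY KEY, student TEXT)")
conn.execute("""
    CREATE TABLE activities (
        id        TEXT PRIMARY KEY REFERENCES students(id),
        activity1 TEXT, cost1 INTEGER,
        activity2 TEXT, cost2 INTEGER
    )
""")
conn.executemany("INSERT INTO students VALUES (?, ?)", [
    ("084", "John Dube"), ("100", "Rudo Masuku"),
    ("182", "John Dube"), ("219", "Mark Ruvende"),
])
conn.executemany("INSERT INTO activities VALUES (?, ?, ?, ?, ?)", [
    ("084", "Tennis", 3600, "Swimming", 1700),
    ("100", "Squash", 4000, "Swimming", 1700),
    ("182", "Tennis", 3600, None, None),
    ("219", "Golf", 4700, "Swimming", 1500),
])

# The common field ID links the tables, so the two John Dubes are distinct.
record = conn.execute("""
    SELECT s.student, a.activity1, a.activity2
    FROM students AS s
    JOIN activities AS a ON a.id = s.id
    WHERE s.id = '182'
""").fetchone()
```

The join on `id` returns the record for student 182 only, even though student 084 has the same name.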
Step 5: Test the table with sample data.
Student Table
Student         ID*
John Dube       084
Rudo Masuku     100
John Dube       182
Mark Ruvende    219
Activity Table
ID*   Activity1   Cost1   Activity2   Cost2
084   Tennis      $3600   Swimming    $1700
100   Squash      $4000   Swimming    $1700
182   Tennis      $3600
219   Golf        $4700   Swimming    $1500
Figure 1
Step 6: Analyse the data.
There's still a lot wrong with the Activities table:
Wasted space. Some students don't take a second activity, so space is wasted
when the data is stored. It doesn't seem much of a bother in this sample, but what
if millions of records are involved? The waste would be significant.
Addition anomalies. What if number 219 (Mark Ruvende) wants to do a third
activity? University rules allow it, but there's no space in this structure for another
activity. Another record cannot be added for Mark, as that would violate the unique
key field ID, and it would also make it difficult to see all his information at once.
Redundant data entry. If the tennis fees go up to $3900, the database designer
has to go through every record containing tennis and modify the cost.
Querying difficulties. It's difficult to find all people doing swimming: a search has
to be made through both activities (Activity 1 and Activity 2) to make sure all are
caught.
Redundant information. If 50 students take swimming, both the activity and its
cost have to be typed in each time.
Inconsistent data. Note that there are conflicting prices for swimming.
Should it be $1500 or $1700? This happens when one record is updated and
another is not.
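Some of these problems can be made concrete with a short Python sketch. The rows follow the sample data used in these notes; the helper names `students_doing`, `prices_on_record` and `wasted_cells` are hypothetical.

```python
# Rows of the two-activity design: (id, activity1, cost1, activity2, cost2).
activities = [
    ("084", "Tennis", 3600, "Swimming", 1700),
    ("100", "Squash", 4000, "Swimming", 1700),
    ("182", "Tennis", 3600, None, None),
    ("219", "Golf", 4700, "Swimming", 1500),
]

def students_doing(rows, activity):
    """Querying difficulty: both activity columns have to be checked."""
    return [r[0] for r in rows if activity in (r[1], r[3])]

def prices_on_record(rows, activity):
    """Inconsistent data: collect every price stored for one activity."""
    prices = set()
    for _id, act1, cost1, act2, cost2 in rows:
        if act1 == activity:
            prices.add(cost1)
        if act2 == activity:
            prices.add(cost2)
    return prices

def wasted_cells(rows):
    """Wasted space: count the empty second-activity fields."""
    return sum(1 for r in rows for cell in r[3:] if cell is None)
```

For the sample rows, `prices_on_record(activities, "Swimming")` returns two different prices, which is exactly the inconsistency described above.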
Eliminate recurring fields
The Students table is fine, but the Activities table still has many errors to correct.
Step 7: Modify the design.
The first four database design errors can be fixed by creating a separate record
for each activity a student takes, instead of one record for all the activities a
student takes.
Eliminate the Activity 2 and Cost 2 fields. The table structure must then be
adjusted to accommodate multiple records for each student. The key is refined
so that it consists of two fields, ID and Activity. As each student can only take
an activity once, this combination gives us a unique key for each record.
The Activities table has now been simplified to: ID, Activity, Cost. Note how the
new structure lets students take any number of activities; they are no longer
limited to two.
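The composite key can be sketched with Python's sqlite3 module as follows. The table and column names are assumptions based on the structure described above; the rows follow the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE activities (
        id       TEXT NOT NULL,
        activity TEXT NOT NULL,
        cost     INTEGER,
        PRIMARY KEY (id, activity)  -- composite key: ID plus Activity
    )
""")
conn.executemany("INSERT INTO activities VALUES (?, ?, ?)",
                 [("219", "Golf", 4700), ("219", "Swimming", 1500)])

# A third activity for student 219 is now simply another record.
conn.execute("INSERT INTO activities VALUES ('219', 'Squash', 4000)")
count_219 = conn.execute(
    "SELECT COUNT(*) FROM activities WHERE id = '219'"
).fetchone()[0]

# Recording the same activity twice for one student violates the key.
try:
    conn.execute("INSERT INTO activities VALUES ('219', 'Golf', 4700)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The same ID may now appear in several records, but each combination of ID and Activity remains unique.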
Step 8: Test sample data.
Student Table
Student         ID*
John Dube       084
Rudo Masuku     100
John Dube       182
Mark Ruvende    219
Activity Table
ID*   Activity    Cost
084   Swimming    $1700
084   Tennis      $3600
100   Squash      $4000
100   Swimming    $1700
182   Tennis      $3600
219   Golf        $4700
219   Swimming    $1500
219   Squash      $4000
Figure 2
The final design will thus contain three tables: the Students table (Student, ID), a
Participants table (ID, Activity), and a modified Activities table (Activity, Cost).
It can be observed that each non-key value depends on the whole key: the
student name is entirely dependent on the ID; the activity cost is entirely
dependent on the activity. The new Participants table essentially forms a union of
information drawn from the other two tables, and each of its fields is part of the
key. The tables are linked by key fields: the Students table: ID corresponds to the
Participants table: ID; the Activities table: Activity corresponds to the Participants
table: Activity.
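The three-table design and its links can be sketched with Python's sqlite3 module. The lower-case table and column names are assumptions; the rows follow the sample data used in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the links
conn.executescript("""
    CREATE TABLE students   (id TEXT PRIMARY KEY, student TEXT NOT NULL);
    CREATE TABLE activities (activity TEXT PRIMARY KEY, cost INTEGER NOT NULL);
    CREATE TABLE participants (
        id       TEXT NOT NULL REFERENCES students(id),
        activity TEXT NOT NULL REFERENCES activities(activity),
        PRIMARY KEY (id, activity)
    );
""")
conn.executemany("INSERT INTO students VALUES (?, ?)", [
    ("084", "John Dube"), ("100", "Rudo Masuku"),
    ("182", "John Dube"), ("219", "Mark Ruvende"),
])
conn.executemany("INSERT INTO activities VALUES (?, ?)", [
    ("Golf", 4700), ("Chess", 5000), ("Squash", 4000),
    ("Swimming", 1500), ("Tennis", 3600),
])
conn.executemany("INSERT INTO participants VALUES (?, ?)", [
    ("084", "Swimming"), ("084", "Tennis"), ("100", "Squash"),
    ("100", "Swimming"), ("182", "Tennis"), ("219", "Golf"),
    ("219", "Swimming"), ("219", "Squash"),
])

# Follow the key fields from Students through Participants to Activities.
mark = conn.execute("""
    SELECT s.student, p.activity, a.cost
    FROM students AS s
    JOIN participants AS p ON p.id = s.id
    JOIN activities   AS a ON a.activity = p.activity
    WHERE s.id = '219'
    ORDER BY p.activity
""").fetchall()
```

Each cost is stored once, in the Activities table, and is picked up through the join whenever it is needed.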
Step 11: Test sample data.
Student Table
Student         ID*
John Dube       084
Rudo Masuku     100
John Dube       182
Mark Ruvende    219
Participant Table
ID*   Activity*
084   Swimming
084   Tennis
100   Squash
100   Swimming
182   Tennis
219   Golf
219   Swimming
219   Squash
Activity Table
Activity*   Cost
Golf        $4700
Chess       $5000
Squash      $4000
Swimming    $1500
Tennis      $3600
Figure 3
Step 12: Analyse the results.
No redundant information.
No inconsistent data. There's only one place where the user can enter the price
of each activity, so there's no chance of creating inconsistent data. Also, if there
is a fee rise, all that is necessary is to update the cost in one place.
No insertion anomalies. A new activity can be added to the Activities table without
a student signing up for it.
No deletion anomalies. If Mark Ruvende (number 219) leaves, the details about
the golfing activity are retained.
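These three results can be checked with a small sqlite3 sketch in Python. Only a cut-down subset of the sample data is loaded, and the table names are assumptions; the $3900 tennis fee is the fee rise mentioned earlier in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students     (id TEXT PRIMARY KEY, student TEXT);
    CREATE TABLE activities   (activity TEXT PRIMARY KEY, cost INTEGER);
    CREATE TABLE participants (id TEXT, activity TEXT, PRIMARY KEY (id, activity));
    INSERT INTO students     VALUES ('182', 'John Dube'), ('219', 'Mark Ruvende');
    INSERT INTO activities   VALUES ('Golf', 4700), ('Tennis', 3600);
    INSERT INTO participants VALUES ('182', 'Tennis'), ('219', 'Golf');
""")

# Fee rise: a single UPDATE changes the price for every participant.
conn.execute("UPDATE activities SET cost = 3900 WHERE activity = 'Tennis'")

# Insertion: a new activity can be added before any student signs up for it.
conn.execute("INSERT INTO activities VALUES ('Chess', 5000)")

# Deletion: removing student 219 leaves the golf details in place.
conn.execute("DELETE FROM participants WHERE id = '219'")
conn.execute("DELETE FROM students WHERE id = '219'")

tennis_cost = conn.execute(
    "SELECT cost FROM activities WHERE activity = 'Tennis'"
).fetchone()[0]
golf = conn.execute(
    "SELECT activity, cost FROM activities WHERE activity = 'Golf'"
).fetchone()
activity_names = [
    row[0] for row in conn.execute("SELECT activity FROM activities ORDER BY activity")
]
```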
It should be noted that, in order to simplify the process and focus on the relational
aspects of designing the database structure, the student's name was stored in a
single field. This is not what normally happens: the name is usually divided into
first name, surname (and initials) fields. Similarly, other fields that you would
normally store in a student table, such as date of birth, address, parents' names
and so on, were excluded.
Although the ultimate design will depend on the complexity of data, the following
steps are important:
Break composite fields down into constituent parts. Example: Name becomes
surname and first name.
Create a key field, which uniquely identifies each record, or use a composite key.
Eliminate repeating groups of fields.
Eliminate record modification problems (such as redundant or inconsistent data)
and record deletion and addition problems by ensuring each non-key field
depends on the entire key.
Create a separate table for any information that is used in multiple records, and
then use a key to link these tables to one another.
SUMMARY ON CREATING A DATABASE
Gather the requirements and identify the entities
How is the information currently organized, stored, and used? How could this
information be organized better and used more easily throughout the
organization? What parts of the current system are you going to get rid of, and
what would you add? Involve as many users in this planning stage as possible.
They are the ones who will prosper or suffer because of the decisions you make
at this point.
Determine the relationships between each data element that you currently have
(entity-relationship diagram). The data don't necessarily have to be in a computer
for you to consider the impact. Determine which data elements work best together
and how you will organize them in tables. Break your groups of data into as small
a unit as possible (normalization). Even when you say it is as small as it can get,
go back again. Avoid redundancy between tables. Decide what the key identifier
will be for each record.
Give it your best shot in the beginning: it costs a lot of time, money, and
frustration to go back and make changes or corrections. Even so, that is better
than living with a poorly designed database.
NB: It should be noted that you can use a commercial DBMS like Microsoft
Access and the result can still be a file system.
Query
A view of data showing information from one or more tables. For instance, using
the sample database used when describing normalisation, a query could be
made to the Students database asking: "Show the first and last names of the
students who take both Tennis and Golf and have Dube as their surname." Such a
query displays information from the Students table (firstname, lastname), the
Participant table and the Activity table.
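One way such a query might be written in SQL is sketched below, run through Python's sqlite3 module. It assumes the name is split into `firstname` and `lastname` columns, and it adds one hypothetical row (Golf for student 182) so that the query has something to find; in the sample data nobody takes both activities. Because the activity names are held in the Participants table, this sketch does not need to join the Activities table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id TEXT PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE participants (id TEXT, activity TEXT, PRIMARY KEY (id, activity));
    INSERT INTO students VALUES
        ('084', 'John', 'Dube'), ('100', 'Rudo', 'Masuku'),
        ('182', 'John', 'Dube'), ('219', 'Mark', 'Ruvende');
    INSERT INTO participants VALUES
        ('084', 'Swimming'), ('084', 'Tennis'), ('100', 'Squash'),
        ('182', 'Tennis'), ('182', 'Golf'),  -- Golf for 182 is hypothetical
        ('219', 'Golf'), ('219', 'Swimming');
""")

# Keep only Tennis/Golf rows for students named Dube, then require both.
QUERY = """
    SELECT s.id, s.firstname, s.lastname
    FROM students AS s
    JOIN participants AS p ON p.id = s.id
    WHERE s.lastname = 'Dube'
      AND p.activity IN ('Tennis', 'Golf')
    GROUP BY s.id, s.firstname, s.lastname
    HAVING COUNT(DISTINCT p.activity) = 2
"""
result = conn.execute(QUERY).fetchall()
```

The ID is included in the output only to show which of the two John Dubes matched.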
SQL: Structured Query Language (pronounced sequel in the US; ess-queue-ell
elsewhere). A computer language designed to organise and simplify the process
of getting information out of a database in a usable form, and also used to
reorganise data within databases. SQL is most often used on larger databases
on minicomputers, mainframes and corporate servers.
The whole point of using a database is to turn data into information. Data are
facts that have no inherent meaning; information is data put into context to
convey meaning. Think of a student database containing data such as
student names, addresses, ID numbers and telephone numbers. Put a question
to the database such as "What percentage of students do Computer Science
Fundamentals?" The resulting answer is useful, meaningful information.
HIERARCHICAL DATABASES
The hierarchical data model presents data to users in a treelike structure.
Think of a mother and her children. A child only has one mother and inherits
some of her characteristics, such as eye colour or hair colour. A mother might
have one or more children to whom she passes some of her characteristics but
usually not exact ones. The child then goes on to develop its own characteristics
separate from the mother.
In a hierarchical database, characteristics from the parent are passed to the child
by a pointer just as a human mother will have a genetic connection to each
human child.
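The parent and child pointers can be pictured with a small Python sketch. The `Node` class and the sample hierarchy are hypothetical; the point is only that every record has exactly one parent and that records are reached by following pointers.

```python
class Node:
    """One record in a hierarchy: exactly one parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent   # pointer to the single parent (None for the root)
        self.children = []     # pointers to the child records
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Follow parent pointers up to the root; return the path from the root."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


# Hypothetical sample hierarchy.
root = Node("University")
dept = Node("Department", root)
course = Node("Course", dept)
student = Node("Student", dept)
```

Reaching `course` from the root means following the pointers through `dept`; there is no other route to it, which is what makes the structure rigid.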
NETWORK DATABASE
A network data model is a variation of the hierarchical model. As with
hierarchical structures, each relationship in a network database must have a
pointer from all the parents to all the children and back.
These two types of databases, the hierarchical and the network, work well
together since they can easily pass data back and forth. But because these
database structures use pointers, which are actually additional data elements,
the size of the database can grow very quickly and cause maintenance and
operation problems.
Type of database   User friendliness   Programming complexity
Hierarchical       Low                 High
Network            Low-moderate        High
Relational         High                Low
What you should remember is that none of these databases is very good if you
do not keep the end user in mind. If you're not careful, you'll wind up with lots of
information that no one can use.
There are three types of databases: hierarchical, network, and relational.
Relational databases are becoming the most popular of the three because they
are easier to work with, easier to change, and can serve a wider range of needs
throughout the organization.
8.4 DISCUSSION QUESTIONS:
Why do relational database management systems appear to be better than
hierarchical or network database management systems?
What should managers focus on when building a database?