Sie sind auf Seite 1von 67

What is Data and Information?

Data is the term, that may be new to beginners, but it is very


interesting and simple to understand. It can be anything like
name of a person or a place or a number etc. Data is the name
given to basic facts and entities such as names and numbers.
The main examples of data are weights, prices, costs, numbers
of items sold, employee names, product names, addresses, tax
codes, registration marks etc.
Information: Information is data that has been converted
into a more useful or intelligible form. It helps human beings
in their decision making process. Examples are: Time Table,
Merit List, Report card, Headed tables, printed documents,
pay slips, receipts, reports etc. The information is obtained by
assembling items of data into a meaningful form.

For example, marks obtained by students and their roll


numbers form data, the report card/sheet is the information.
Difference between Data and
Information
Database
The related information when placed is an organized form makes a
database. The organization of data/information is necessary because
unorganized information has no meaning.
In dictionary, the words are arranged in alphabetic order along with
their meanings.
There are so many operations like:
♦ To add new information (e.g. to add the address of a new friend in
your address book)
To view or retrieve the stored information (e.g. you have to find
the address of one of your old friends)
♦ To modify or edit the existing information (e.g. your friend has
shifted to a new place so his address would get changed)
♦ To remove or delete the unwanted information (e.g. your friend
has changed his/her mobile number, so his/her mobile number would
have to be removed from list)
♦ Arranging the information in a desired order etc.
Manual database and its
problems
Consider an example of accounts department of an organization. To
make the salary calculations of the employees every month they are to
keep the record of every employee and do a number of calculations such
as addition of allowances like DA, HRA to the basic salary and to make
several deductions as loan recoveries, income tax and insurance etc and
at the end, to make the pay slips of the net pay. This whole procedure is
repeated every month and is very tedious and laborious job. It’s a mere
calculation job and does not require any logic or intelligence. So to
waste the skills and intelligence of human beings on such repetitive
calculations is not a wise decision.
Database and Computers
♦ Computer has a large storage capacity. It can store thousands of
records at a time.
♦ It has high speed, within no time it searches any desired
information, arrange the data in alphabetical order, do calculations on
the data and make repetitions and so on.
♦ Computer is more accurate.
♦ Data in computers can be stored in the form of a file, records and
fields.
♦ There are two approaches for storing data in computers such as
File based approach and Database approach.
File Based Approach
File Based system: File-based systems were an early attempt to
computerize the manual filing system that we are all familiar with.

A file system is a method for storing and organizing computer files and
the data they contain to make it easy to find and access them.

File systems may use a storage device such as a hard disk or CD-ROM
and involve maintaining the physical location of the files
The manual filing system works well when the number of items to be
stored is small. It even works quite adequately when there are large
numbers of items and we have only to store and retrieve them. However,
the manual filing system breaks down when we have to cross-reference
or process the information in the files. For example, a typical real estate
agent’s office might have a separate file for each property for sale or
rent, each potential buyer and renter, and each member of staff.
.
Each file called a flat file, contained and processed information for one
specific function, such as accounting or inventory. Programmers used
programming languages such as COBOL, C++ to write applications
that directly accessed flat files to perform data management services
and provide information for users.
In addition, early programmers focused on physical implementation and
access procedures when designing a database. These physical procedures
were written into database application; therefore, physical changes
resulted in intensive rework on the part of the programmer. As systems
became more complex, file processing systems offered little flexibility,
presented many limitations, and were difficult to maintain.
Limitations of the File-Based Approach

There are following problems associated with the File Based Approach:
Separated and Isolated Data
Duplication of data
Data Dependence
In file processing systems, files and records were described by specific
physical formats that were coded into the application program by
programmers. If the format of a certain record was changed, the code in
each file containing that format must be updated. Furthermore,
instructions for data storage and access were written into the
application’s code.
Therefore, changes in storage structure or access methods could greatly
affect the processing or results of an application.
Difficulty in representing data from the user’s view

Data Inflexibility

Incompatible file formats


Database Approach
In order to remove all the above limitations of the File Based Approach,
a new approach was required that must be more effective known as
Database approach.
A database is a computer based record keeping system whose over all
purpose is to record and maintain information. The database is a single,
large repository of data, which can be used simultaneously by many
departments and users.

.
The description of the data is known as the Data Dictionary or Meta Data
(the ‘data about data’). It is the self-describing nature of a database that
provides program-data independence.
A database implies separation of physical storage from use of the data by
an application program to achieve program/data independence.
Using a database system, the user or programmer or application
specialist need not know the details of how the data are stored and such
details are “transparent to the user”. Changes (or updating) can be made
to data without affecting other components of the system. These changes
include, for example, change of data format or file structure or relocation
from one device to another.
Characteristics of data in a database
The data in a database should have the following features:
Shared:
Validity/Integrity/Correctness:
Security
Consistency: Whenever more than one data element in a database
represents related real world values, the values should be consistent with
respect to the relationship.
Non-redundancy
Independence: Data at different levels should be independent of each
other so that the changes in one level should not affect the other levels.

To create, manage and manipulate data in databases, a management


system known as database management system was developed.
The Database Management System (DBMS)
DBMS A database management system is the software system
that allows users to define, create and maintain a database and provides
controlled access to the data.

A database management system (DBMS) is basically a collection of


programs that enables users to store, modify, and extract information
from a database as per the requirements. DBMS is an intermediate layer
between programs and the data. Programs access the DBMS, which then
accesses the data. There are different types of DBMS ranging from
small systems that run on personal computers to huge systems that run
on mainframes. The following are main examples of database
applications:
Computerized library systems
♦ Automated teller machines
♦ Flight reservation systems
♦ Computerized parts inventory systems

Commercially available Database management systems in the market


are dbase, Foxpro, IMS and Oracle.

These systems allow users to create, update, and extract information


from their databases. Compared to a manual filing system, the biggest
advantages to a computerized database system are speed, accuracy, and
accessibility.
DBMS provides the following languages to provide the facilities to its
users:
Data Definition Language (DDL):
It is a language that allows the users to define data and their relationship
to other types of data. It is mainly used to create files, databases, data
dictionary and tables within databases.

It is also used to specify the structure of each table, set of associated


values with each attribute, integrity constraints, security and
authorization information for each table and physical storage structure of
each table on disk.
Data Manipulation Language (DML)
It is a language that provides a set of operations to support the basic data
manipulation operations on the data held in the databases. It allows users
to insert, update, delete and retrieve data from the database. The part of
DML that involves data retrieval is called a query language.
DML can be distinguished between two types: procedural and non-
procedural.
The procedural language specify how the output of a DML statement
must be obtained, while the non-procedural DMLs describe only what
output is to be obtained. Typically procedural language treats records
individually, while non-procedural languages operate on sets of records.
Components of the DBMS Environment
Hardware
♦ Software
♦ Data
♦ Users
Procedures
Hardware

The hardware is the actual computer system used for keeping and
accessing the database. Conventional DBMS hardware consists of
secondary storage devices, usually hard disks, on which the database
physically resides, together with the associated
Software
The software is the actual DBMS. Between the physical database itself
(i.e. the data as actually stored) and the users of the system is a layer of
software, usually called the Database Management System or DBMS.
All requests from users for access to the database are handled by the
DBMS.
One general function provided by the DBMS is thus the shielding of
database users from complex hardware-level detail.
Data
It is the most important component of DBMS environment from the end
users point of view. As shown in above figure 1.2 observe that data acts
as a bridge between the machine components and the user components.
The database contains both the operational data and the meta-data, the
‘data about data’.
The database should contain all the data needed by the organization. One
of the major features of databases is that the actual data are separated
from the programs that use the data. A database should always be
designed, built and populated for a particular audience and for a specific
purpose.
Users
There are a number of users who can access or retrieve data on demand
using the applications and interfaces provided by the DBMS. Each type
of user needs different software capabilities. The users of a database
system can be classified in the following groups, depending on their
degrees of expertise or the mode of their interactions with the DBMS.
The users can be:

♦ Naive Users
♦ Online Users
♦ Application Programmers
♦ Sophisticated Users
♦ DBA
Naive Users

Naive Users are those users who need not be aware of the presence of the
database system or any other system supporting their usage. Naive users
are end users of the database who work through a menu driven
application program where the type and range of response is always
indicated to the user.
A user of an Automatic Teller Machine (ATM) falls in this category. The
user is instructed through each step of a transaction. He or she then
responds by pressing a coded key or entering a numeric value.

The operations that can be performed by naïve users are very limited and
affect only a precise portion of the database. For example, in the case of
the user of the Automatic Teller Machine, user’s action affects only one
or more of his/her own accounts.
Online Users
Online users are those who may communicate with the database directly
via an online terminal or indirectly via a user interface and application
program. These users are aware of the presence of the database system
and may have acquired a certain amount of expertise with in the limited
interaction permitted with a database.

Application Programmers
Professional programmers are those who are responsible for developing
application programs or user interface. The application programs could
be written using general purpose programming language or the
commands available to manipulate a database.
Sophisticated Users
Such users interact with the system without writing programs. Instead,
they form their requests in database query language.

Specialized Users

Such users are those who write specialized database application that do
not fit into the fractional data-processing framework. For example:
Computer-aided design systems, knowledge base and expert system,
systems that store data with complex data types (for example, graphics
data and audio data).
Database Administrator
The database administrator (DBA) is the person or group in charge for
implementing the database system within an organization. The DBA
has all the system privileges allowed by the DBMS and can assign
(grant) and remove (revoke) levels of access (privileges) to and form
other users. DBA is also responsible for the evaluation, selection and
implementation of DBMS package.
Procedures
Procedures refer to the instructions and rules that govern the design and
use of the database. The users of the system and the staff that manage the
database require documented procedures on how to use or run the
system.
These may consist of instructions on how to:
♦ Log on to the DBMS.
♦ Use a particular DBMS facility or application program.
♦ Start and stop the DBMS.
♦ Make backup copies of the database.
♦ Handle hardware or software failures.
♦ Change the structure of a table, reorganize the database across
multiple disks, improve performance, or archive data to secondary
storage.
Advantages of DBMS
Controlling Redundancy
Integrity can be enforced
Integrity of data means that data in database is always accurate, such
that incorrect information cannot be stored in database. In order to
maintain the integrity of data, some integrity constraints are enforced on
the database. A DBMS should provide capabilities for defining and
enforcing the constraints.
Inconsistency can be avoided
When the same data is duplicated and changes are made at one site,
which is not propagated to the other site, it gives rise to inconsistency
and the two entries regarding the same data will not agree. At such times
the data is said to be inconsistent. So if the redundancy is removed
chances of having inconsistent data is also removed.
Let us again consider the college system and suppose that in case of
General_Office file it is indicated that Roll_Number 5 lives in Amritsar
but in library file it is indicated that Roll_Number 5 lives in Jalandhar.
Then this is a state at which the two entries of the same object do not
agree with each other (that is one is updated and other is not). At such
time the database is said to be inconsistent.
In case of DBMS, Roll number and address occurs together only single
time in General_Office table. So, it needs single updation and then all
other application retrieve the address information from General_Office
which is updated so, all application will get the current and latest
information by providing single update operation and this single update
operation is propagated to the whole database or all other application
automatically, this property is called as Propagation of Update.
Data can be shared

Providing Backup and Recovery

Standards can be enforced


Restricting unauthorized access
When multiple users share a database, it is likely that some users will
not be authorized to access all information in the database. For example
account office data is often considered confidential, and hence only
authorized persons are allowed to access such data. In addition, some
users may be permitted only to retrieve data, whereas other are allowed
both to retrieve and to update. Hence the type of access operation
retrieval or update must also be controlled. Typically, users or user
groups are given account numbers protected by passwords, which they
can use to gain access to the database. A DBMS should provide a
security and authorization subsystem, which the DBA uses to create
accounts and to specify account restrictions. The DBMS should then
enforce these restrictions automatically.
Note: Without such checks the security of the data may actually be more
at risk in database system than in a traditional file system.
Solving enterprise requirement than individual requirement
Since many types of users with varying level of technical knowledge
use a database, a DBMS should provide a variety of user interface. The
overall requirements of the enterprise are more important than the
individual user requirements. So the DBA can structure the database
system to provide an overall service that is “best for the enterprise”.
For example: A representation can be chosen for the data in storage that
gives fast access for the most important application at the cost of poor
performance in some other application. But the file system favors the
individual requirements than the enterprise requirements.
Disadvantages of DBMS

Complexity
Size
Performance
Typically, a File Based system is written for a specific application, such
as invoicing. As result, performance is generally very good. However,
the DBMS is written to be more general, to cater for many applications
rather than just one. The effect is that some applications may not run as
fast as they used to.
Higher impact of a failure
The centralization of resources increases the vulnerability of the system.
Since all users and applications rely on the availability of the DBMS, the
failure of any component can bring operations to a halt.
Cost of DBMS
The cost of DBMS varies significantly, depending on the environment
and functionality provided. There is also the recurrent annual
maintenance cost.

Additional Hardware costs


The disk storage requirements for the DBMS and the database may
necessitate the purchase of additional storage space. Furthermore, to
achieve the required performance it may be necessary to purchase a
larger machine, perhaps even a machine dedicated to running the DBMS.
The procurement of additional hardware results in further expenditure.
Cost of Conversion
In some situations, the cost of the DBMS and extra hardware may be
insignificant compared with the cost of converting existing applications
to run on the new DBMS and hardware. This cost also includes the cost
of training staff to use these new systems and possibly the employment
of specialist staff to help with conversion and running of the system.
This cost is one of the main reasons why some organizations feel tied to
their current systems and cannot switch to modern database technology.
When not to Use a DBMS
In spite of the advantages of using a DBMS, there are a few situations in
which such a system may involve unnecessary overhead costs, as that
would not be incurred in traditional file processing.
The overhead costs of using a DBMS are due to the following:
♦ High initial investment in hardware, software, and training
♦ Generality that a DBMS provides for defining and processing data.
♦ Overhead for providing security, concurrency control, recovery,
and integrity functions.
Additional problems may arise if the database designers and DBA do
not properly design the database or if the database systems applications
are not implemented properly.
Comparison of File Management System with Database
Management System
Three Level Architecture

An early proposal for a standard terminology and general


architecture for database systems was produced in 1971 by the
DBTG (Data Base Task Group) appointed by the Conference on
Data Systems and Languages (CODASYL, 1971). The DBTG
recognized the need for a two level approach with a system view
called the schema and user views called subschema. The American
National Standards Institute (ANSI) recognized the need for a three
level approach with a system catalog.
There are following three levels or layers of DBMS architecture:

♦ External Level

♦ Conceptual Level

♦ Internal Level
Objective of the Three Level Architecture

♦ Each user should be able to access the same data, but have a
different customized view of the data.
♦ User’s interaction with the database should be independent of
storage considerations.
♦ The Database Administrator (DBA) should be able to change the
database storage structures without affecting the user’s views.

♦ The internal structure of the database should be unaffected by


changes to the physical aspects of storage, such as the changeover to a
new storage device.

♦ The DBA should be able to change the conceptual structure of the


database without affecting all users.
External Level or View level

It is the users’ view of the database. This level describes that part of the
database that is relevant to each user. External level is the one, which is
closest to the end users. This level deals with the way in which
individual users view data. Individual users are given different views
according to the user’s requirement.
External level is also known as the view level. In addition different
views may have different representations of the same data. For example,
one user may view dates in the form (day, month, year), while another
may view dates as (year, month, day).
Conceptual Level or Logical level
It is the community view of the database. This level describes what data
is stored in the database and the relationships among the data. The
middle level in the three level architecture is the conceptual level.
This level contains the logical structure of the entire database as seen by
the DBA. It is a complete view of the data requirements of the
organization that is independent of any storage considerations. The
conceptual level represents:
♦ All entities, their attributes, and their relationships;
♦ The constraints on the data;
♦ Security and integrity information.
.
Internal Level or Storage level
It is the physical representation of the database on the computer. This
level describes how the data is stored in the database. The internal level
is the one that concerns the way the data are physically stored on the
hardware.
The internal level covers the physical implementation of the database to
achieve optimal runtime performance and storage space utilization. It
covers the data structures and file organizations used to store data on
storage devices. It interfaces with the operating system access methods
to place the data on the storage devices, build the indexes, retrieve the
data, and so on.
The internal level is concerned with such things as:
♦ Storage space allocation for data and indexes;
♦ Record descriptions for storage (with stored sizes for data items);
♦ Record placement;
♦ Data compression and data encryption techniques.

There will be only one conceptual view, consisting of the abstract


representation of the database in its entirely. Similarly there will be only
one internal or physical view, representing the total database, as it is
physically stored.
Schema
The overall description of the database is called the Database Schema.
There are three different types of schema in the database corresponding
to each data view of database. In other words, the data views at each of
three levels are described by schema.
A schema is defined as an outline or a plan that describes the
records and relationships existing at the particular level. The External
view is described by means of a schema called external schema that
correspond to different views of the data. Similarly the Conceptual view
is defined by conceptual schema, which describes all the entities,
attributes, and relationship together with integrity constraints. Internal
View is defined by internal schema, which is a complete description of
the internal model, containing definition of stored records, the methods
of representation, the data fields, and the indexes used.
There is only one conceptual schema and one internal schema per
database. The schema also describes the way in which data elements at
one level can be mapped to the corresponding data elements in the next
level.
Thus, we can say that schema establishes correspondence between the
records and relationships in the two levels. In a relational database, the
schema defines the tables, the fields in each table, and the relationships
between fields and tables. Schema are generally stored in a data
dictionary.
The schema is specified during the database design process and is not
expected to change frequently. However the actual data in the database
may change frequently; The data in the database at any particular point
in time is called a database instance. Therefore, many database instances
can correspond to the same database schema. The schema is sometimes
called the intension of the database, while an instance is called an
extension (or state) of the database.
Mapping between Views
External/Conceptual Mapping: Each external schema is related to the
conceptual schema by the external/conceptual mapping. A mapping
between the external and conceptual views gives the correspondence
among the records and the relationships of the external and conceptual
views.. There is a mapping from a particular logical record in the
external view to one (or more) conceptual record(s) in the conceptual
view.
Names of the fields and records, for instance, may be different. A
number of conceptual fields can be combined into a single external field,
for example, Last_Name and First_Name at the conceptual level but
Name at the external level. A given external record could be derived
from a number of conceptual records.
Conceptual/Internal Mapping: Conceptual schema is related to the
internal schema by the conceptual/internal mapping. This enables the
DBMS to find the actual record or combination of records in physical
storage that constitute a logical record in conceptual schema. Mapping
between the conceptual and the internal levels specifies the method of
deriving the conceptual record from the physical database.

Differences between Conceptual/Internal Views


♦ Representation of numeric values could be different in the two
views.
♦ Representation of string data can be considered by the two views to
be coded differently.

Data Independence-Achievement of Layered Architecture of DBMS
There are two kinds of data independence:
♦ Logical data independence
♦ Physical data independence
Logical data independence
Logical data independence indicates that the conceptual schema can be
changed without affecting the existing external schemas. The change
would be absorbed by the mapping between the external and conceptual
levels.
Physical data independence
Physical data independence indicates that the physical storage structures
or devices could be changed without affecting conceptual schema. The
change would be absorbed by the mapping between the conceptual and
internal levels. Physical data independence is achieved by the presence
of the internal level of the database and the mapping or transformation
from the conceptual level of the database to the internal level.
If there is a need to change the file organization or the type of physical
device used as a result of growth in the database or new technology, a
change is required in the conceptual/internal mapping between the
conceptual and internal levels.
The physical data independence criterion requires that the conceptual
level does not specify storage structures or the access methods (indexing,
hashing etc.) used to retrieve the data from the physical storage medium.
The Logical data independence is difficult to achieve than physical data
independence as it requires the flexibility in the design of database and
programmer has to foresee the future requirements or modifications in
the design.
The Procedure for Database Access
A user’s request for data is received by the data manager, which
determines the physical record required. The decision as to which
physical record is needed may require some preliminary consultation of
the database and/or the data dictionary prior to the access of the actual
data itself.

The data manager sends the request for a specific physical record to the
file manager. The file manager decides which physical block of
secondary storage devices contains the required record and sends the
request for the appropriate block to the disk manager. A block is a unit of
physical input/output operations between primary and secondary storage.
The disk manager retrieves the block and sends it to the file manager,
which sends the required record to the data manager.

Das könnte Ihnen auch gefallen