File Design in Information Systems

Information Processing and File Systems
Objectives
Our major objective in this course is to present concepts
about file and information processing.
In doing so we will cover two broad areas.
The first is the application of data structure techniques
to secondary memory.
The second is the evaluation of alternate solutions to
the problem of mapping user views of data onto
physical storage.
We will examine this problem in the file environment
and the database environment.
File Systems
Files are used pervasively in both computer system
programming and application programming.
File creation, deletion, and maintenance are major tasks
for operating systems.
Application programs use files to store the source
program, the executable program, and data that the
program may require.
Databases, on the other hand, are used primarily by
application programs to maintain large amounts of
information.
Categories of Data Processing Software
In this section we categorize data processing software
according to the ways in which the software accesses
data on secondary storage devices.
Computer hardware is often classified as belonging to a
particular generation.
Software developments are not so easily classified.
In our scheme we categorize software broadly
according to its independence from details of physical
data storage.
Primitive software has embedded in it much device-
specific information.
More sophisticated software, on the other hand, allows
the user to write programs that manipulate abstract
data objects.
The mapping of the abstract objects onto actual storage
devices is carried out in a manner transparent to the
user.
As the level of sophistication increases, there is a
migration of functions from application programs into
system software.
Understand the Trend of Software
Design in Modern Information Systems
In our categorization we distinguish between the logical
organization and the physical organization of data.
The logical organization is the user's view of data. It is
concerned with data elements and their relationships.
The physical organization is the way in which data is
stored in storage devices.
By studying the categories of data processing software,
we can better understand the trends of
software design in modern information systems.
Category 1: No Data Independence
A category 1 program typically accesses data by
specifying absolute store addresses.
If the referenced data changes location, then the
program has to be modified.
Examples of category 1 programs are found at low
levels in operating systems.
Category 1 programs are characterized by their intimate
involvement with details of secondary storage, typically
direct-access storage devices (DASD).
A typical operation for a category 1 program is to
initiate the transfer of a particular sector of a particular
disc into main memory.
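The sector transfer described above can be sketched as follows. This is a minimal illustration, not real device-driver code: the disc is simulated with an in-memory byte stream, and the names `read_sector`, `SECTOR_SIZE`, and `CUSTOMER_SECTOR`, as well as the 512-byte sector size, are assumptions made for the example.

```python
import io

SECTOR_SIZE = 512  # assumed sector size in bytes

def read_sector(device, sector_number, sector_size=SECTOR_SIZE):
    """Transfer one sector into main memory using its absolute address."""
    device.seek(sector_number * sector_size)
    data = device.read(sector_size)
    if len(data) != sector_size:
        raise OSError(f"sector {sector_number} is beyond the end of the device")
    return data

# Simulated disc: 4 sectors, each filled with its own sector number.
disc = io.BytesIO(b"".join(bytes([n]) * SECTOR_SIZE for n in range(4)))

# The absolute location of the data is written into the program itself.
CUSTOMER_SECTOR = 2
buffer = read_sector(disc, CUSTOMER_SECTOR)
```

If the data moves to another sector, `CUSTOMER_SECTOR` must be edited and the program redeployed, which is exactly the lack of data independence that defines category 1.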
Category 2: Physical Data Independence
Whereas a category 1 program accesses data by
specifying absolute store addresses,
in category 2 software there is physical data
independence but no logical independence.
Data processing programs in category 2 typically run
under an operating system.
The operating system is interposed between application
programs and storage devices.
The operating system takes care of low-level I/O
functions performed by a category 1 program.
Typically, areas of storage can be referred to by name.
The operating system maps the file name onto an
address. (File-Name -> File Location)
However, to operate successfully on a file, a category 2
program must know how the data in the file is
organized. (File Format, File Structure)
For example, it must know about record lengths and the
order of fields. Thus, although the program need not be
aware of physical organization, it must know about the
logical organization. This knowledge is typically
compiled as part of the program.
A typical operation for a category 2 program is to
request the operating system to read record number r
from file f.
Using directories, the operating system can compute
the address of the storage area holding the required
information.
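A category 2 read of record number r from file f might look like the sketch below. The file is referred to by name only, so the operating system resolves its location, but the record layout is fixed in the program text. The layout (a 4-byte id, a 20-byte name, an 8-byte balance), the file name `accounts.dat`, and the helper names are assumptions chosen for illustration.

```python
import os
import struct
import tempfile

# Record layout compiled into the program: 4-byte id, 20-byte name, 8-byte balance.
RECORD = struct.Struct("<i20sd")

def write_records(path, rows):
    """Write fixed-length records to the named file."""
    with open(path, "wb") as f:
        for rec_id, name, balance in rows:
            f.write(RECORD.pack(rec_id, name.encode("ascii"), balance))

def read_record(path, r):
    """Read record number r (0-based) from the named file."""
    with open(path, "rb") as f:          # the OS maps the file name to a location
        f.seek(r * RECORD.size)          # the program supplies the record length
        raw = f.read(RECORD.size)
    if len(raw) != RECORD.size:
        raise IndexError(f"no record {r} in {path}")
    rec_id, name, balance = RECORD.unpack(raw)
    return rec_id, name.rstrip(b"\x00").decode("ascii"), balance

path = os.path.join(tempfile.mkdtemp(), "accounts.dat")
write_records(path, [(1, "Ada", 10.5), (2, "Grace", 99.0), (3, "Edsger", 0.25)])
record = read_record(path, 1)
```

The file can be moved to another disc without touching the program, but changing a field length or the field order means changing `RECORD` and rebuilding every program that uses the file.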
Category 4: Logical and Physical
Data Independence (1)
In category 4 software there is both physical
independence and logical independence.
In a category 4 data processing system, data is stored
together with a description of its format.
The data definition (the names of elements and their
relationships to other elements) is no longer part of
application programs.
Category 4: Logical and Physical
Data Independence (2)
In category 4, data is fully integrated into databases,
and the data definition is available to users of the
database.
A database is a fully integrated collection of files
brought together to serve multiple applications.
While the database is typically managed by a database
administrator, ownership of the data is often communal.
Data is accessed via a database management system
(DBMS), which allows the same data to be viewed in a
number of different ways.
Category 4: Logical and Physical
Data Independence (3)
A typical operation for a category 4 program is to
request the reading of the next record of a particular
type.
Category 4: Logical and Physical
Data Independence (4)
The DBMS identifies the logical components making up
the record and requests the operating system to
retrieve the appropriate information from disc.
Category 4: Logical and Physical
Data Independence (5)
The record is assembled and returned to the application
program. A DBMS provides logical independence.
Record format independence is achieved.
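Record format independence can be illustrated with a small sketch. SQLite (through Python's standard `sqlite3` module) stands in here for a DBMS; the `employee` table, its columns, and the function `employee_names` are assumptions made for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row  # rows can be accessed by element name

db.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
db.executemany("INSERT INTO employee (name, dept) VALUES (?, ?)",
               [("Ada", "R&D"), ("Grace", "Ops")])

def employee_names(conn):
    """Application code: asks for elements by name, not by position or length."""
    rows = conn.execute("SELECT name FROM employee ORDER BY id")
    return [row["name"] for row in rows]

before = employee_names(db)

# The stored record format changes; the application function is left untouched.
db.execute("ALTER TABLE employee ADD COLUMN salary REAL")
after = employee_names(db)

# The data definition is stored with the data and can be queried by users.
columns = [row["name"] for row in db.execute("PRAGMA table_info(employee)")]
```

The application returns the same result before and after the format change, and the data definition is obtained from the database rather than from the program source.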
Category 3: Partial Logical Data
Independence
In category 3 software there is physical independence
and partial logical independence.
A category 3 program can operate on a variety of files
without being modified.
Consider a sort utility. To be useful, such a utility must
be able to operate on data in a number of different
formats.
The distinction between categories 2 and 3 is whether
the information required to access a particular file is
given at compile time or at run time.
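A sort utility of this kind can be sketched as follows. The record format is passed in as an argument at run time instead of being fixed in the program. The function name `sort_records`, its parameters, and the two sample formats are assumptions for illustration; the format strings use the notation of Python's `struct` module.

```python
import struct

def sort_records(data, record_format, key_field):
    """Sort packed fixed-length records; the layout arrives at run time."""
    layout = struct.Struct(record_format)
    records = [layout.unpack_from(data, offset)
               for offset in range(0, len(data), layout.size)]
    records.sort(key=lambda rec: rec[key_field])
    return b"".join(layout.pack(*rec) for rec in records)

# Two "files" with different formats, handled by the same unmodified utility.
ids = struct.pack("<3i", 30, 10, 20)                     # one 4-byte integer per record
pairs = b"".join(struct.pack("<2sH", k, v)
                 for k, v in [(b"zz", 1), (b"aa", 2)])   # 2-byte key, 2-byte value

sorted_ids = sort_records(ids, "<i", key_field=0)
sorted_pairs = sort_records(pairs, "<2sH", key_field=0)
```

The utility still has to be told the format by its caller, which is why the logical independence is only partial.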
Category 5: Geographical Data
Independence
In a category 5 system, there is geographic
independence. A database may be distributed over a
number of locations.
Category 5: ODBC (1)
Open Database Connectivity (ODBC): An ODBC data source
stores information about how to connect to the indicated
data provider.
Category 5: ODBC (2)
Creating an ODBC data source name lets users access all
kinds of database providers with the same program.
Category 5: ODBC (3)
ODBC locates the data source automatically and
maps the data definition to the right database types.
Category 5: ODBC (4)
ODBC also provides an extra layer of data security.
Category 5: ODBC (5)
ODBC hides the network layer from users.
Summary of the Categorization
The following table summarizes some of the properties
of the five categories of data processing software.
File Overview and Terminology
A file is typically a collection of records of the same
type.
Although files can exist in both primary and secondary
memory, they are almost always held in secondary
memory, such as disks.
Data files have two salient characteristics.
First, they are often voluminous; the capacity of a data
file can easily exceed the capacity of primary memory.
Second, data files typically have long lifetimes. A data
file often exists before and after the execution of an
application program that accesses it.
Three Issues within the File Processing Environment
Three issues within the file processing environment are
important:
1. File access techniques,
2. The inter-dependence of application programs and
data files,
3. Data redundancy across data files.
File Access Techniques
Typically, some information about the required record
has to be known before it can be accessed. That
information (a key) is used to facilitate and verify the
file access process. There are two broad approaches:
1. Solutions that are computationally based. Hashing is the
major computational technique.
2. The use of additional data structures as techniques for
file access: static and dynamic indexing.
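The two approaches can be contrasted in a short sketch. Both the "hashed file" and the "data file" are simulated with in-memory lists, the hash function (CRC-32 modulo the number of buckets) is just one possible choice, and all names and the bucket count are assumptions for illustration. The index shown is a simple static one: a sorted key list searched by binary search.

```python
import bisect
import zlib

NUM_BUCKETS = 8  # assumed number of buckets in the hashed file

def bucket_for(key):
    """Computational access: derive the bucket address from the key itself."""
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS

# Hashed file simulated as a list of buckets holding (key, record) pairs.
hashed_file = [[] for _ in range(NUM_BUCKETS)]

def hash_insert(key, record):
    hashed_file[bucket_for(key)].append((key, record))

def hash_lookup(key):
    for stored_key, record in hashed_file[bucket_for(key)]:
        if stored_key == key:   # the key also verifies that the right record was found
            return record
    return None

# Index-based access: an additional structure maps each key to a record position.
data_file = ["rec-for-smith", "rec-for-jones", "rec-for-brown"]
index_keys = ["brown", "jones", "smith"]   # kept in sorted order
index_positions = [2, 1, 0]                # position of each key's record

def index_lookup(key):
    i = bisect.bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return data_file[index_positions[i]]
    return None

hash_insert("smith", "rec-for-smith")
hash_insert("jones", "rec-for-jones")
```

Hashing needs no extra structure but gives no useful ordering of keys; the index costs extra space and maintenance but keeps the keys in sorted order.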
File and Program Inter-dependence
A second problem with file processing is the lack of
logical and physical data independence between the file
and the application program.
Logical data dependence means that an application
program must know the logical structure of records in a
file (their format) and how they are organized within
the file in order to access and modify them.
Physical data dependence implies that the application
program needs to know where a file is located and how
the file is mapped onto secondary storage.
Data Redundancy
A third problem that plagues file processing is
maintaining data consistency with redundant instances
of the same data.
With many files and with many application programs
processing the same files, it is easy for data to become
duplicated and for multiple copies of a data item to
have inconsistent values.
This occurs when different application programs share
part of the data stored in a file.
Thus there is usually a trade-off between optimizing
application processing and maintaining file consistency.
This problem can be solved by a DBMS.
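The inconsistency described above can be shown with a toy example. Two applications each keep their own file containing a copy of the same customer address; the files are simulated with dictionaries, and the customer id, addresses, and function names are invented for illustration.

```python
# Two applications keep their own file, each with a copy of the customer's address.
billing_file = {"C042": {"name": "Ada", "address": "1 Old Road"}}
shipping_file = {"C042": {"name": "Ada", "address": "1 Old Road"}}

def billing_change_address(customer_id, new_address):
    """The billing application updates only the file it owns."""
    billing_file[customer_id]["address"] = new_address

def inconsistent_customers():
    """Report customers whose duplicated address differs between the two files."""
    return [cid for cid in billing_file
            if cid in shipping_file
            and billing_file[cid]["address"] != shipping_file[cid]["address"]]

before = inconsistent_customers()
billing_change_address("C042", "2 New Street")
after = inconsistent_customers()
```

After the update, the two copies disagree and neither application can tell which value is current. Storing the address once, in a shared database, removes the duplicate that makes this possible.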
