
FILE STRUCTURES

MJ FOLK, B ZOELLICK, G RICCARDI


CHAPTER 5

Managing Files of Records


Record Access: Record Keys

When looking for an individual record, it is convenient to
identify the record with a key based on the record's
content (e.g., the Ames record).
A key is an expression derived from one or more of the fields
within a record that can be used to locate that record.
Canonical form is a standard form for a key that can be
derived by the application of well-defined rules.
A primary key is a key that uniquely identifies each record
and should be unchanging.
Records can also be searched based on a secondary key.
Secondary keys do not typically identify a record uniquely.
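The idea of a canonical form can be sketched in a few lines of Python. This is only an illustration: the rule set (uppercase, no leading, trailing, or repeated blanks), the function name canonical_key, and the sample name values are assumptions, not something the slides prescribe.

```python
def canonical_key(last_name: str, first_name: str) -> str:
    """Build a key in an assumed canonical form: uppercase, single blanks only."""
    parts = [last_name, first_name]
    # Collapse runs of whitespace, trim both ends, and uppercase each part.
    cleaned = [" ".join(p.split()).upper() for p in parts]
    return " ".join(cleaned)


# Different spellings of the same name map to the same key.
print(canonical_key("Ames", "Mary"))
print(canonical_key("  ames ", "mary  "))
```

Because every variant is reduced to one form before comparison, a search for "ames" and a search for "Ames" locate the same record.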
Record Access: Sequential Search

Evaluating Performance of Sequential Search.


Sequential search is a method of searching a file by
reading the file from the beginning and continuing until
the desired record has been found.
Sequential access means reading the file from the
beginning and continuing until you have read in
everything that you need.
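A minimal sketch of sequential search, assuming newline-delimited records with "|" as the field delimiter and the key in the first field. The record layout and the sample data are made up for illustration. The function also reports how many records were read, since that count is the usual measure of the work done.

```python
import io


def sequential_search(stream, key, delimiter="|"):
    """Return (fields, records_read) for the first record whose first field is key."""
    records_read = 0
    for line in stream:
        records_read += 1
        fields = line.rstrip("\n").split(delimiter)
        if fields[0] == key:
            return fields, records_read
    # Key not present: the whole file had to be read.
    return None, records_read


# Hypothetical sample file held in memory.
DATA = "Ames|Stillwater\nMason|Ada\nMorrison|Norman\n"

record, reads = sequential_search(io.StringIO(DATA), "Mason")
print(record, reads)
```

An unsuccessful search reads every record, so for a file of n records the cost grows in proportion to n.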
Record Access: Sequential Search

Improving Sequential Search Performance with
Record Blocking.
A block is a collection of records stored as a physically
contiguous unit on secondary storage.
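The effect of blocking can be sketched by counting read calls. The record size, blocking factor, and the CountingReader helper below are all assumptions chosen for illustration, and an in-memory stream stands in for a disk file.

```python
import io

RECORD_SIZE = 32       # assumed fixed record length in bytes
BLOCKING_FACTOR = 16   # assumed number of records per block


class CountingReader:
    """Wraps a binary stream and counts how many read() calls are made."""

    def __init__(self, raw):
        self.raw = raw
        self.calls = 0

    def read(self, n):
        self.calls += 1
        return self.raw.read(n)


def scan(reader, records_per_read):
    """Visit every record, fetching records_per_read records per read() call."""
    count = 0
    while True:
        block = reader.read(RECORD_SIZE * records_per_read)
        if not block:
            break
        # Step through the individual records of the block in memory.
        for _start in range(0, len(block), RECORD_SIZE):
            count += 1
    return count


data = bytes(RECORD_SIZE * 1000)  # 1000 empty records

unblocked = CountingReader(io.BytesIO(data))
blocked = CountingReader(io.BytesIO(data))
scan(unblocked, 1)
scan(blocked, BLOCKING_FACTOR)
print(unblocked.calls, blocked.calls)
```

Both scans visit the same 1000 records, but the blocked scan issues far fewer read calls. On a real disk, each avoided call can mean an avoided seek, which is where the time is saved. The number of comparisons made in memory does not change.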
Record Access: Sequential Search

When Sequential Searching is Good


ASCII files in which you are searching for some pattern
(wc and grep);
Files with few records;
Files that hardly ever need to be searched (tape files);
and
Files in which you want all records with a certain
secondary key value, where a large number of matches is
expected.
Unix Tools for Sequential Processing

In Unix, the most common file structure is an ASCII file
with the new-line character as the record delimiter and,
when possible, white space as the field delimiter.
Sample Unix Tools
cat
wc (word count)
grep (global regular expression print)
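To make the sequential nature of these tools concrete, here is a rough Python analogue of what wc and grep compute on a newline-delimited text. It is a sketch only: the real tools have many options not modeled here, and Python's re syntax differs from grep's default pattern syntax. The function names and sample text are assumptions.

```python
import re


def wc(text):
    """Return (lines, words, bytes), roughly as wc reports them."""
    lines = text.count("\n")            # wc counts new-line characters
    words = len(text.split())           # whitespace-separated words
    size = len(text.encode("utf-8"))    # size in bytes
    return lines, words, size


def grep(pattern, text):
    """Return the lines of text that contain a match for pattern."""
    rx = re.compile(pattern)
    return [line for line in text.splitlines() if rx.search(line)]


sample = "Ames Stillwater\nMason Ada\nMorrison Norman\n"
print(wc(sample))
print(grep(r"^M", sample))
```

Both functions make a single pass from the beginning of the text to the end, which is exactly the access pattern sequential processing is good at.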
Record Access: Direct Access

Direct access is jumping to the exact location of a
record.
How do we know where the beginning of the required
record is?
It may be stored in an index.
We may know the relative record number (RRN).
Record Access: Direct Access

RRN is an index giving the position of a record relative
to the beginning of its file.
Direct access to a fixed-length record is usually
accomplished by using its relative record number (RRN),
computing its byte offset, and then seeking to the first
byte of the record.
RRNs are not useful when working with variable-length
records: the access is still sequential. However, they are
useful with fixed-length records.
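The RRN-to-offset computation can be sketched as follows, assuming RRNs start at 0, a record length of 16 bytes, and an optional header of known size at the front of the file. The constants, function names, and sample records are illustrative assumptions.

```python
import io

RECORD_SIZE = 16   # assumed fixed record length in bytes
HEADER_SIZE = 0    # assumed size of any header record preceding the data


def byte_offset(rrn, record_size=RECORD_SIZE, header_size=HEADER_SIZE):
    """Byte offset of the record with the given RRN (first record is RRN 0)."""
    return header_size + rrn * record_size


def read_by_rrn(f, rrn):
    """Seek directly to the record and read exactly one record."""
    f.seek(byte_offset(rrn))
    return f.read(RECORD_SIZE)


# Build a small in-memory file of blank-padded fixed-length records.
records = [b"Ames", b"Mason", b"Morrison"]
f = io.BytesIO(b"".join(r.ljust(RECORD_SIZE, b" ") for r in records))

print(byte_offset(2))
print(read_by_rrn(f, 2).rstrip())
```

The multiplication only works because every record has the same length. With variable-length records there is no way to compute where record n starts without reading the records before it, or consulting an index.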
More Record Structures

Choosing a Record Structure and Record Length
Two approaches to organizing fields within a fixed-length record:
Fixed-length fields in the record
Varying field boundaries within the fixed-length
record.
Header records are often used at the beginning of
the file to hold some general information about the file
to assist in future use of the file.
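A header record can be sketched like this. The layout is an assumption made for illustration (a 4-byte tag, the record count, and the record size, all little-endian); the slides do not specify what a header must contain.

```python
import io
import struct

# Assumed header layout: 4-byte tag, record count, record size (little-endian).
HEADER = struct.Struct("<4sII")
TAG = b"RECS"  # hypothetical file-type tag


def write_file(f, records, record_size):
    """Write a header record followed by blank-padded fixed-length records."""
    f.write(HEADER.pack(TAG, len(records), record_size))
    for r in records:
        # Truncate or pad so every record is exactly record_size bytes.
        f.write(r[:record_size].ljust(record_size, b" "))


def read_header(f):
    """Read the header and return (record_count, record_size)."""
    f.seek(0)
    tag, count, record_size = HEADER.unpack(f.read(HEADER.size))
    if tag != TAG:
        raise ValueError("not a record file")
    return count, record_size


f = io.BytesIO()
write_file(f, [b"Ames", b"Mason"], 16)
print(read_header(f))
```

A program that opens this file later can learn the record length and record count from the header alone, so neither value has to be hard-coded in the reader.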
File Access and File Organization

File organization depends on what use you want to
make of the file.
Since using a file implies accessing it, file access and
file organization are intimately linked.
Example: though using fixed-length records makes
direct access easier, if the documents have very
variable lengths, fixed-length records are not a good
solution: the application determines our choice of
both access and organization.
File Access and File Organization

A file access method is the approach used to locate
information in a file. The two main alternatives are
sequential access and direct access.
A file organization method is the combination of
conceptual and physical structures used to distinguish
one record from another and one field from another.
Beyond Record Structure

Abstract Data Models for File Access


Headers and Self-Describing Files
Metadata
Color Raster Images
Mixing Object Types in One File
Representation-Independent File Access
Extensibility
Portability and Standardization

Portability is the characteristic of files that describes
how amenable they are to access on a variety of
different machines.

Factors Affecting Portability


Differences among Operating Systems
Differences among Languages
Differences in Machine Architectures
Portability and Standardization

Guidelines in Achieving Portability


Agree on a Standard Physical Record Format and
Stay with it
Agree on a Standard Binary Encoding for Data
Elements
Number and Text Conversion
File Structure Conversion
File System Differences
Unix and Portability
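The guideline about a standard binary encoding can be illustrated with byte order. The sketch below assumes a file format that stores 32-bit integers in big-endian order; that choice and the function names are assumptions made for the example, not a standard the slides name.

```python
import struct


def encode_int32(value):
    """Encode a 32-bit signed integer in the agreed (assumed big-endian) order."""
    return struct.pack(">i", value)


def decode_int32(data):
    """Decode a 32-bit signed integer written by encode_int32."""
    return struct.unpack(">i", data)[0]


encoded = encode_int32(1)
print(encoded)                # same bytes on every machine
print(decode_int32(encoded))  # round-trips to the original value

# Reading the same bytes with the other byte order gives a different number.
misread = struct.unpack("<i", encoded)[0]
print(misread)
```

Stating the byte order explicitly, instead of relying on the machine's native order, means the bytes in the file do not depend on the architecture that wrote them.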
The End

Prepared by: Gemlyn S. Inocencio


S.Y. 2017-2018