
Fundamental File Structure Concepts & Managing Files of Records

Outline I: Fundamental File Structure Concepts
Stream Files
Field Structures
Reading a Stream of Fields
Record Structures
Record Structures that Use a Length Indicator
Outline II: Managing Files of Records
Record Access
More About Record Structures
File Access and File Organization
More Complex File Organization and Access
Portability and Standardization
Field and Record Organization: Overview
The basic logical unit of data is the field, which contains a single data value.
Fields are organized into aggregates, either as many copies of a single field (an array) or as a list of different fields (a record).
When a record is stored in memory, we refer to it as an object and refer to its fields as members.
In this lecture, we will investigate the many ways that objects can be represented as records in files.
Stream Files
Mary Ames
123 Maple
Stillwater, OK 74075
Alan Mason
90 Eastgate
Ada, OK 74820
In a stream file, the information is written as a stream of bytes containing no added information:
AmesMary123 MapleStillwaterOK74075MasonAlan90 EastgateAdaOK74820
Problem: There is no way to get the information back into the organized record format, because nothing marks where one field ends and the next begins.
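The problem can be reproduced in a few lines. The following is a minimal Python sketch, not code from the lecture; the function name `write_stream` and the list layout are assumptions made for illustration.

```python
# The two name-and-address entries, with fields in the order used by the stream above.
people = [
    ["Ames", "Mary", "123 Maple", "Stillwater", "OK", "74075"],
    ["Mason", "Alan", "90 Eastgate", "Ada", "OK", "74820"],
]

def write_stream(records):
    """Concatenate every field with nothing between them (hypothetical helper)."""
    return "".join(field for record in records for field in record)

stream = write_stream(people)
print(stream)
```

Once the fields are joined this way, the boundaries between them cannot be recovered from the bytes alone.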
Field Structures
There are many ways of adding structure to files to maintain the identity of fields:
Force the field into a predictable length.
Begin each field with a length indicator.
Place a delimiter at the end of each field to separate it from the next field.
Use a keyword = value expression to identify each field and its content.
Method 1: Fix the Length of Fields
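As a rough illustration of fixed-length fields, the sketch below pads each field to a set width. The widths in `WIDTHS` are hypothetical (the slides do not state them), and the helper names are assumptions.

```python
# Hypothetical field widths in bytes: last, first, address, city, state, zip.
WIDTHS = [10, 10, 15, 15, 2, 9]

def pack_fixed(fields, widths=WIDTHS):
    """Pad each field with spaces to its fixed width (truncating if too long)."""
    return "".join(f[:w].ljust(w) for f, w in zip(fields, widths))

def unpack_fixed(data, widths=WIDTHS):
    """Slice the data at the known widths and strip the padding."""
    fields, pos = [], 0
    for w in widths:
        fields.append(data[pos:pos + w].rstrip())
        pos += w
    return fields

record = pack_fixed(["Ames", "Mary", "123 Maple", "Stillwater", "OK", "74075"])
print(len(record), unpack_fixed(record))
```

The cost of this simplicity is wasted space for short values and truncation of long ones.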
Method 2: Begin Each Field with a Length Indicator
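One possible way to sketch length-prefixed fields in Python is shown below. It assumes a one-byte binary length (so a field holds at most 255 bytes); the slides do not say how the length is represented, and the function names are illustrative.

```python
def pack_length_prefixed(fields):
    """Prefix each field with a one-byte length (assumed representation)."""
    out = bytearray()
    for field in fields:
        data = field.encode("ascii")
        if len(data) > 255:
            raise ValueError("field too long for a one-byte length indicator")
        out.append(len(data))
        out += data
    return bytes(out)

def unpack_length_prefixed(data):
    """Read a length byte, then that many bytes, until the data is exhausted."""
    fields, pos = [], 0
    while pos < len(data):
        length = data[pos]
        fields.append(data[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    return fields

packed = pack_length_prefixed(["Ames", "Mary", "123 Maple"])
print(unpack_length_prefixed(packed))
```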
Method 3: Separate the Fields with Delimiters
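A delimiter-based layout might look like the following sketch. The choice of `|` as the delimiter is an assumption; any character works as long as it never occurs inside a field value.

```python
DELIM = "|"  # assumed delimiter; must never occur inside a field value

def pack_delimited(fields):
    """Write each field followed by the delimiter."""
    for f in fields:
        if DELIM in f:
            raise ValueError("field contains the delimiter")
    return "".join(f + DELIM for f in fields)

def unpack_delimited(data):
    """Split on the delimiter and drop the empty piece after the last one."""
    return data.split(DELIM)[:-1]

text = pack_delimited(["Ames", "Mary", "123 Maple", "Stillwater", "OK", "74075"])
print(text)
```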
Method 4: Use a Keyword = Value Expression to Identify Each Field
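The keyword = value idea can be sketched as below. The keywords (`last`, `first`, `zip`) and the `|` delimiter between pairs are assumptions for illustration only.

```python
DELIM = "|"  # assumed delimiter between keyword=value pairs

def pack_keyword(record):
    """Write each field as keyword=value followed by the delimiter."""
    return "".join(f"{key}={value}{DELIM}" for key, value in record.items())

def unpack_keyword(data):
    """Rebuild the dictionary; fields may appear in any order or be missing."""
    record = {}
    for pair in data.split(DELIM)[:-1]:
        key, _, value = pair.partition("=")
        record[key] = value
    return record

text = pack_keyword({"last": "Ames", "first": "Mary", "zip": "74075"})
print(text)
```

Because every field names itself, the file is easier to read and tolerates missing fields, at the price of storing the keywords over and over.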
Reading a Stream of Fields
A program can easily read a stream of fields and output:
Last Name: Ames
First Name: Mary
Address: 123 Maple
City: Stillwater
State: OK
Zip Code: 74075
Last Name: Mason
First Name: Alan
Address: 90 Eastgate
City: Ada
State: OK
Zip Code: 74820
This time, we do preserve the notion of fields, but something is missing: rather than a stream of fields, these should be two records.
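A reader for such a stream could be sketched as follows, assuming `|`-delimited fields. The labels are cycled only to reproduce the listing above; the function names and the delimiter are assumptions, not the lecture's program.

```python
import io

LABELS = ["Last Name", "First Name", "Address", "City", "State", "Zip Code"]

def read_field(stream, delim="|"):
    """Read characters up to the next delimiter; return None at end of stream."""
    chars = []
    while True:
        ch = stream.read(1)
        if ch == "":
            return "".join(chars) if chars else None
        if ch == delim:
            return "".join(chars)
        chars.append(ch)

def list_fields(stream):
    """Label every field in turn; nothing here groups fields into records."""
    lines, count = [], 0
    while (field := read_field(stream)) is not None:
        lines.append(f"{LABELS[count % len(LABELS)]}: {field}")
        count += 1
    return lines

data = io.StringIO(
    "Ames|Mary|123 Maple|Stillwater|OK|74075|Mason|Alan|90 Eastgate|Ada|OK|74820|"
)
for line in list_fields(data):
    print(line)
```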
Record Structures I
A record can be defined as a set of fields that belong together when the file is viewed in terms of a higher level of organization.
Like the notion of a field, a record is another conceptual tool that need not exist in the file in any physical sense.
Yet records are an important logical notion included in the file's structure.
Record Structures II
Methods for organizing the records of a file include:
Requiring that the records be a predictable number of bytes in length.
Requiring that the records be a predictable number of fields in length.
Beginning each record with a length indicator consisting of a count of the number of bytes that the record contains.
Using a second file to keep track of the beginning byte address for each record.
Placing a delimiter at the end of each record to separate it from the next record.

Method 1: Make Records a Predictable Number of Bytes (Fixed-Length Records)
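Fixed-length records can be sketched with Python's standard `struct` module. The 61-byte layout below is an assumption chosen for illustration; `struct` pads short `s` fields with null bytes.

```python
import struct

# Assumed layout: six ASCII fields, 61 bytes per record ("=" disables alignment padding).
RECORD = struct.Struct("=10s10s15s15s2s9s")

def pack_record(fields):
    """Pack six text fields into one fixed-length record."""
    return RECORD.pack(*(f.encode("ascii") for f in fields))

def unpack_record(data):
    """Unpack one record and strip the null padding from each field."""
    return [b.rstrip(b"\x00").decode("ascii") for b in RECORD.unpack(data)]

def nth_record(data, n):
    """Every record has the same size, so record n starts at n * RECORD.size."""
    start = n * RECORD.size
    return unpack_record(data[start:start + RECORD.size])

people = [
    ["Ames", "Mary", "123 Maple", "Stillwater", "OK", "74075"],
    ["Mason", "Alan", "90 Eastgate", "Ada", "OK", "74820"],
]
data = b"".join(pack_record(p) for p in people)
print(nth_record(data, 1))
```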
Method 2: Make Records a Predictable Number of Fields
Specify that each record contains a fixed number of fields.
This is a good way to organize the records in the name and address file.
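For the name and address file, this amounts to counting fields: every sixth field closes a record. A minimal sketch, assuming `|`-delimited fields and a hypothetical helper name:

```python
FIELDS_PER_RECORD = 6  # last, first, address, city, state, zip

def split_records(stream, delim="|", n=FIELDS_PER_RECORD):
    """Group a delimited stream of fields into records of n fields each."""
    fields = stream.split(delim)[:-1]  # drop the empty piece after the last delimiter
    if len(fields) % n != 0:
        raise ValueError("stream ends with an incomplete record")
    return [fields[i:i + n] for i in range(0, len(fields), n)]

records = split_records(
    "Ames|Mary|123 Maple|Stillwater|OK|74075|Mason|Alan|90 Eastgate|Ada|OK|74820|"
)
print(records)
```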
Method 3: Begin Each Record with a Length Indicator
This is a commonly used method for handling variable-length records.
Method 4: Use an Index to Keep Track of Addresses
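The index idea can be sketched as follows. Here the index is kept as an in-memory list of byte addresses; in the scheme described above it would be stored in a second file. The names and the `|` delimiter are assumptions.

```python
def build_file_and_index(records, delim="|"):
    """Return the data bytes and an index of each record's starting byte address."""
    data = bytearray()
    index = []  # index[i] = byte address where record i begins
    for fields in records:
        index.append(len(data))
        data += "".join(f + delim for f in fields).encode("ascii")
    return bytes(data), index

def read_record(data, index, i, delim="|"):
    """Use the index to find where record i starts and where the next one begins."""
    start = index[i]
    end = index[i + 1] if i + 1 < len(index) else len(data)
    return data[start:end].decode("ascii").split(delim)[:-1]

people = [
    ["Ames", "Mary", "123 Maple", "Stillwater", "OK", "74075"],
    ["Mason", "Alan", "90 Eastgate", "Ada", "OK", "74820"],
]
data, index = build_file_and_index(people)
print(index, read_record(data, index, 1))
```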



Method 5: Place a Delimiter at the End of Each Record
Record Structures that Use a Length Indicator
The notion of records implemented so far lacks something: the length of a record is not predictable. A length indicator at the start of each record supplies it.
Implementation:
Writing the variable-length records to the file
Representing the record length
Reading the variable-length records from the file
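The three implementation steps can be sketched as below. The record length is represented here as a 2-byte little-endian unsigned integer, which is only one possible choice (an ASCII count would also work); the function names are assumptions.

```python
import io
import struct

def write_record(f, fields, delim="|"):
    """Write one record preceded by its length as a 2-byte unsigned integer."""
    body = "".join(x + delim for x in fields).encode("ascii")
    f.write(struct.pack("<H", len(body)))
    f.write(body)

def read_record(f, delim="|"):
    """Return the next record's fields, or None at end of file."""
    header = f.read(2)
    if len(header) < 2:
        return None
    (length,) = struct.unpack("<H", header)
    body = f.read(length)
    return body.decode("ascii").split(delim)[:-1]

buf = io.BytesIO()
write_record(buf, ["Ames", "Mary", "123 Maple", "Stillwater", "OK", "74075"])
write_record(buf, ["Mason", "Alan", "90 Eastgate", "Ada", "OK", "74820"])
buf.seek(0)
first = read_record(buf)
second = read_record(buf)
print(first, second)
```

A binary length keeps the indicator compact but makes the file harder to inspect in a text editor; that trade-off is part of "representing the record length".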
Using Classes to Manipulate Buffers
Buffer Class for Delimited Text Fields
Extending Class Person with Buffer Operations
Buffer Classes for Length-Based and Fixed-Length Fields
A Class Hierarchy for Record Buffer Objects
Record Access: Keys
When looking for an individual record, it is convenient to identify the record with a key based on the record's content (e.g., the Ames record).
A primary key should uniquely identify a record and should not change.
Records can also be searched based on a secondary key, which typically does not uniquely identify a record.
Sequential Search
Evaluating the performance of sequential search: count the number of comparisons.
Improving sequential search performance with record blocking: read in a block of several records all at once and process it in memory.
When is sequential search useful?
ASCII files
Files with few records
Files that hardly ever need to be searched
Searches for all records with a certain secondary key value
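Counting comparisons is straightforward to sketch. The example below assumes the key is the first field (the last name); that choice and the function name are illustrative.

```python
def sequential_search(records, key):
    """Return (record, comparisons); record is None when the key is absent."""
    comparisons = 0
    for record in records:
        comparisons += 1
        if record[0] == key:  # assumption: the key is the first field
            return record, comparisons
    return None, comparisons

people = [
    ["Ames", "Mary", "123 Maple", "Stillwater", "OK", "74075"],
    ["Mason", "Alan", "90 Eastgate", "Ada", "OK", "74820"],
]
print(sequential_search(people, "Mason"))
```

For a file of n records, an unsuccessful search makes n comparisons and a successful one makes about n/2 on average, which is why sequential search is O(n).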
Unix Tools for Sequential Processing
Unix files are typically ASCII files:
New-line character as record delimiter.
White space as field delimiter.
Tools: cat, wc, grep
cat
Writes the contents of a file to standard output.
wc
Counts the lines, words, and characters in a file.
grep
Named for the ed command g/re/p (globally search for a regular expression and print).
Searches through a file for a particular pattern.
Returns all the lines in the file containing the pattern.
Combining grep with wc gives the number of lines containing Ada and the number of words and bytes in those lines.
Direct Access
Sequential search ==> O(n)
Direct access ==> O(1)
How do we know where the beginning of the required record is?
It may be recorded in an index file.
We may know the relative record number (RRN).
RRNs are not useful when working with variable-length records: the access is still sequential.
With fixed-length records, however, they are useful: the byte offset of a record is its RRN multiplied by the record length.
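With fixed-length records, the RRN translates directly into a seek position. A minimal sketch, assuming a 61-byte record and RRNs counted from 0 (both are assumptions):

```python
import io

RECORD_SIZE = 61  # assumed fixed record length in bytes

def read_by_rrn(f, rrn, record_size=RECORD_SIZE):
    """Seek straight to byte offset rrn * record_size and read one record."""
    f.seek(rrn * record_size)
    data = f.read(record_size)
    if len(data) < record_size:
        raise IndexError("no record with that RRN")
    return data

# Two placeholder records, each padded with spaces to the fixed size.
f = io.BytesIO(b"".join(name.ljust(RECORD_SIZE) for name in (b"Ames", b"Mason")))
record = read_by_rrn(f, 1)
print(record.rstrip())
```

No earlier record is read on the way, which is what makes the access O(1).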
More about Record Structure
Choosing a record structure and record length within a fixed-length record.
Two approaches:
Fixed-length fields in the record (simple but problematic).
Varying field boundaries within the fixed-length record.





Header records are often used at the beginning of the file to hold some general information about the file to assist in future use of the file.
File Access and File Organization: A Summary
File organization: variable-length records or fixed-length records.
File access: sequential access or direct access.
File organization depends on what use you want to make of the file.
Since using a file implies accessing it, file access and file organization are closely linked.
Example: although fixed-length records make direct access easier, they are not a good solution if the documents have highly variable lengths: the application determines our choice of both access and organization.
Beyond Record Structure
Abstract Data Models for File Access
Headers and Self-Describing Files
Metadata
Color Raster Images
Mixing Object Types in One File
Representation-Independent File Access
Extensibility
Abstract Data Models for File Access
Can computers process more than just fields and records?
An abstract data model takes an application-oriented view of data rather than a medium-oriented one.
Headers and Self-Describing Files
A header record can keep track of how many records are in a file.
If the file header also contains the following sort of information, we say the file is self-describing:
A name for each field.
The width of each field.
The number of fields per record.
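A self-describing file might be sketched as follows. The header layout (record count, field count, then `name:width` pairs on the first line) is entirely hypothetical, as are the function names; it only illustrates that a reader needs no outside knowledge of the fields.

```python
def write_file(names, widths, records):
    """First line is the header; the body holds fixed-width fields."""
    header = " ".join(
        [str(len(records)), str(len(names))]
        + [f"{name}:{width}" for name, width in zip(names, widths)]
    )
    body = "".join(v[:w].ljust(w) for r in records for v, w in zip(r, widths))
    return header + "\n" + body

def read_file(text):
    """Recover field names, widths, and records from the header alone."""
    header, body = text.split("\n", 1)
    n_records, n_fields, *specs = header.split(" ")
    fields = [(s.split(":")[0], int(s.split(":")[1])) for s in specs[: int(n_fields)]]
    size = sum(width for _, width in fields)
    records = []
    for i in range(int(n_records)):
        pos, record = i * size, {}
        for name, width in fields:
            record[name] = body[pos:pos + width].rstrip()
            pos += width
        records.append(record)
    return records

text = write_file(
    ["last", "first", "zip"],
    [10, 10, 5],
    [["Ames", "Mary", "74075"], ["Mason", "Alan", "74820"]],
)
print(read_file(text))
```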
Metadata
Metadata is data about data: it describes properties of the original data.
Suppose you are an astronomer interested in studying images generated by telescopes that scan the sky, and you want to design a file structure for the digital representation of these images.
You need information about each image:
Where in the sky is the image from?
When was it made?
What telescope was used?
What other images are related? And so on.
This kind of information is called metadata: data that describes the primary data in a file.
A community of users of a particular kind of data often agrees on a standard format for holding metadata.
One such standard format, FITS (Flexible Image Transport System), has been developed by the International Astronomical Union for storing astronomical data.
Color Raster Image
A modern computer is as much a graphical device as it is a data processor.
Let us examine one type of image, the color raster image.
A color raster image is a rectangular array of colored dots, or pixels, that are displayed on the screen.
A FITS (Flexible Image Transport System) image is a raster image in the sense that the numbers that make up a FITS image can be converted into colors and then displayed on the screen.
Metadata with a color raster image:
The dimensions of the image (the number of rows and columns of pixels).
The number of bits used to describe each pixel:
1 bit displays 2 colors
2 bits display 4 colors
8 bits display 256 colors
A color lookup table, which indicates which color is to be assigned to each pixel value in the image:
A 2-bit image has 4 colors in its color lookup table
An 8-bit image has 256 colors in its color lookup table
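The relationship between bits per pixel and table size is that n bits give 2^n distinct pixel values. The sketch below shows that arithmetic and a lookup for a tiny 2-bit image; the RGB triples and names are arbitrary illustration values.

```python
def colors_for(bits_per_pixel):
    """n bits can encode 2**n distinct pixel values, hence 2**n colors."""
    return 2 ** bits_per_pixel

# A 2-bit image: each pixel value (0 to 3) indexes a 4-entry table of RGB triples.
lookup = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (255, 255, 255)]
pixels = [[0, 1], [2, 3]]  # 2 rows by 2 columns of pixel values

def to_rgb(pixels, lookup):
    """Replace every pixel value by the color the lookup table assigns to it."""
    return [[lookup[p] for p in row] for row in pixels]

print(colors_for(8), to_rgb(pixels, lookup))
```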
Mixing Object Types in One File
Keywords
Keyword = value format
Tags
Extensibility
To access a mixture of objects in a file, software must have methods for reading and writing each type of object.
Once we build into our software a mechanism for choosing the appropriate methods for a given type of object, it is easy to imagine extending the types of objects that our software can support.
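One way to picture such a mechanism is a table that maps a tag to the method that reads that object type. In the sketch below, each object is stored as a 4-byte tag, a 4-byte length, and the payload; the tag names (`TEXT`, `INTS`), the layout, and the function names are all assumptions.

```python
import io
import struct

READERS = {}  # tag -> function that decodes that object type's payload

def register(tag):
    """Decorator that adds a reader for a new object type."""
    def decorator(fn):
        READERS[tag] = fn
        return fn
    return decorator

@register(b"TEXT")
def read_text(payload):
    return payload.decode("ascii")

@register(b"INTS")
def read_ints(payload):
    return list(struct.unpack(f"<{len(payload) // 4}i", payload))

def write_object(f, tag, payload):
    """Store tag, payload length, and payload."""
    f.write(tag + struct.pack("<I", len(payload)) + payload)

def read_objects(f):
    """Dispatch on each tag; objects with unknown tags are skipped over."""
    objects = []
    while len(head := f.read(8)) == 8:
        tag, (length,) = head[:4], struct.unpack("<I", head[4:])
        payload = f.read(length)
        reader = READERS.get(tag)
        objects.append((tag, reader(payload) if reader else None))
    return objects

buf = io.BytesIO()
write_object(buf, b"TEXT", b"hello")
write_object(buf, b"INTS", struct.pack("<2i", 7, -1))
write_object(buf, b"UNKN", b"??")
buf.seek(0)
result = read_objects(buf)
print(result)
```

Supporting a new object type then means registering one more reader; the loop that walks the file does not change.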
Portability and Standardization
Factors Affecting Portability
Differences among Operating Systems
Differences among Languages
Differences in Machine Architectures
Achieving Portability
Agree on a Standard Physical Record Format and Stay with It
Agree on a Standard Binary Encoding for Data Elements
ASCII & EBCDIC
Number and Text Conversion
File Structure Conversion
File System Differences
Unix and Portability
