Concepts
- Stream Files
- Field Structures
- Reading a Stream of Fields
- Record Structures
- Record Structures That Use a Length Indicator

Outline II: Managing Files of Records
- Record Access
- More About Record Structures
- File Access and File Organization
- More Complex File Organization and Access
- Portability and Standardization

Field and Record Organization: Overview
- The basic logical unit of data is the field, which contains a single data value.
- Fields are organized into aggregates, either as many copies of a single field (an array) or as a list of different fields (a record).
- When a record is stored in memory, we refer to it as an object and to its fields as members.
- In this lecture, we investigate the many ways that objects can be represented as records in files.

Stream Files
Mary Ames
123 Maple
Stillwater, OK 74075

Alan Mason
90 Eastgate
Ada, OK 74820

In stream files, the information is written as a stream of bytes containing no added information:
AmesMary123 MapleStillwaterOK74075MasonAlan90 EastgateAdaOK74820
Problem: There is no way to get the information back in the organized record format.

Field Structures
There are many ways of adding structure to files to maintain the identity of fields:
- Force the field into a predictable length.
- Begin each field with a length indicator.
- Place a delimiter at the end of each field to separate it from the next field.
- Use a keyword = value expression to identify each field and its content.
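The four field structures listed above can be compared by writing the same record each way. The following is a minimal Python sketch, not code from the lecture: the field widths, the two-digit ASCII length prefix, the `|` delimiter, and the keyword names are all assumptions chosen for illustration.

```python
# Hypothetical sample record and layout choices (not taken from the slides).
fields = ["Ames", "Mary", "123 Maple", "Stillwater", "OK", "74075"]
names = ["last", "first", "address", "city", "state", "zip"]
widths = [10, 10, 15, 15, 2, 9]  # assumed fixed field widths


def fixed_length(values, widths):
    """Method 1: pad (or truncate) every field to a fixed width."""
    return "".join(v.ljust(w)[:w] for v, w in zip(values, widths))


def length_prefixed(values):
    """Method 2: precede each field with a two-digit length (fields under 100 chars)."""
    return "".join(f"{len(v):02d}{v}" for v in values)


def delimited(values, delim="|"):
    """Method 3: end each field with a delimiter that cannot occur in the data."""
    return "".join(v + delim for v in values)


def keyword_value(names, values, delim="|"):
    """Method 4: write each field as keyword=value, followed by a delimiter."""
    return "".join(f"{n}={v}{delim}" for n, v in zip(names, values))


for line in (
    fixed_length(fields, widths),
    length_prefixed(fields),
    delimited(fields),
    keyword_value(names, fields),
):
    print(repr(line))
```

Fixed-length fields waste space on padding, while the other three methods trade that space for the extra work of parsing lengths, delimiters, or keywords.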
Method 1: Fix the Length of Fields

Method 2: Begin Each Field with a Length Indicator

Method 3: Separate the Fields with Delimiters

Method 4: Use a keyword = value Expression to Identify Each Field

Reading a Stream of Fields
A program can easily read a stream of fields and output:
Last Name: Ames
First Name: Mary
Address: 123 Maple
City: Stillwater
State: OK
Zip Code: 74075
Last Name: Mason
First Name: Alan
Address: 90 Eastgate
City: Ada
State: OK
Zip Code: 74820
This time we do preserve the notion of fields, but something is missing: rather than a stream of fields, these should be two records.

Record Structures I
- A record can be defined as a set of fields that belong together when the file is viewed in terms of a higher level of organization.
- Like the notion of a field, a record is a conceptual tool that need not exist in the file in any physical sense.
- Yet records are an important logical notion in the file's structure.

Record Structures II
Methods for organizing the records of a file include:
- Requiring that the records be a predictable number of bytes in length.
- Requiring that the records be a predictable number of fields in length.
- Beginning each record with a length indicator consisting of a count of the number of bytes that the record contains.
- Using a second file to keep track of the beginning byte address for each record.
- Placing a delimiter at the end of each record to separate it from the next record.
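As a rough illustration of how a stream of fields becomes records, the sketch below reads delimiter-terminated fields and then groups them using the "predictable number of fields" rule. The `|` delimiter, the function names, and the six-fields-per-record count are assumptions made for this example.

```python
def read_fields(stream, delim="|"):
    """Yield each delimiter-terminated field found in the stream."""
    field = []
    for ch in stream:
        if ch == delim:
            yield "".join(field)
            field = []
        else:
            field.append(ch)


def group_records(fields, fields_per_record=6):
    """Collect a fixed number of consecutive fields into each record."""
    record = []
    for value in fields:
        record.append(value)
        if len(record) == fields_per_record:
            yield record
            record = []


# Hypothetical delimited stream holding the two sample addresses.
stream = (
    "Ames|Mary|123 Maple|Stillwater|OK|74075|"
    "Mason|Alan|90 Eastgate|Ada|OK|74820|"
)
records = list(group_records(read_fields(stream)))
for rec in records:
    print(rec)
```

Without `group_records`, the program sees only twelve fields in a row; the fixed field count is what restores the two-record view.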
Method 1: Make Records a Predictable Number of Bytes (Fixed-Length Records)
Method 2: Make Records a Predictable Number of Fields
- Specify that each record will contain a fixed number of fields.
- This is a good way to organize the records in the name and address file.

Method 3: Begin Each Record with a Length Indicator
- This is a commonly used method for handling variable-length records.

Method 4: Use an Index to Keep Track of Addresses
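Methods 3 and 4 combine naturally, which the following minimal sketch tries to show: each record is written with a length indicator, and the byte offset of each record is collected into an index. The 2-byte little-endian length prefix, the in-memory file, and the function names are assumptions for illustration only. In a real system the index would be kept in a second file.

```python
import io
import struct


def write_records(f, records):
    """Write each record with a 2-byte length prefix; return the byte offsets."""
    offsets = []
    for rec in records:
        data = rec.encode("ascii")
        offsets.append(f.tell())               # index entry: where this record starts
        f.write(struct.pack("<H", len(data)))  # length indicator
        f.write(data)
    return offsets


def read_record_at(f, offset):
    """Seek to a record's starting address and read exactly its length."""
    f.seek(offset)
    (length,) = struct.unpack("<H", f.read(2))
    return f.read(length).decode("ascii")


# Hypothetical sample records.
records = [
    "Ames|Mary|123 Maple|Stillwater|OK|74075|",
    "Mason|Alan|90 Eastgate|Ada|OK|74820|",
]
f = io.BytesIO()  # stands in for a file opened in binary mode
index = write_records(f, records)
print(index)
print(read_record_at(f, index[1]))
```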
Method 5: Place a Delimiter at the End of Each Record

Record Structures That Use a Length Indicator
- The records we have implemented so far lack something: there is no predictability in the length of a record.
- Implementation:
  - Writing the variable-length records to the file
  - Representing the record length
  - Reading the variable-length records from the file

Using Classes to Manipulate Buffers
- Buffer Class for Delimited Text Fields
- Extending Class Person with Buffer Operations
- Buffer Classes for Length-Based and Fixed-Length Fields

A Class Hierarchy for Record Buffer Objects

Record Access: Keys
- When looking for an individual record, it is convenient to identify the record with a key based on the record's content (e.g., the Ames record).
- Keys should uniquely identify a record and be unchanging.
- Records can also be searched based on a secondary key. Secondary keys do not typically identify a record uniquely.

Sequential Search
- Evaluating the performance of sequential search: the number of comparisons.
- Improving sequential search performance with record blocking: reading in a block of several records all at once and processing it in memory.
- When is sequential search useful?
  - ASCII files
  - Files with few records
  - Files that hardly ever need to be searched
  - Files searched for all records with a certain secondary key value

Unix Tools for Sequential Processing
- Unix tools work on ASCII files, with the newline character as the record delimiter and white space as the field delimiter.
- cat
- wc: counts lines, words, and characters.
- grep (the name comes from the ed command g/re/p, "global regular expression print"): searches through a file for a particular pattern and returns all the lines in the file containing the pattern.
Example output: the number of lines containing Ada, and the number of words and bytes in those lines.

Direct Access
- Sequential search: O(n)
- Direct access: O(1)
- How do we know where the beginning of the required record is?
  - It may be in an index file.
  - We may know the relative record number (RRN).
- RRNs are not useful when working with variable-length records: the access is still sequential.
- With fixed-length records, however, they are useful.
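With fixed-length records, the byte offset of a record is simply RRN times the record size, so a single seek reaches it. The sketch below assumes a 64-byte record padded with spaces and an in-memory file. Both are illustrative choices, not part of the lecture.

```python
import io

RECORD_SIZE = 64  # assumed fixed record length in bytes


def write_fixed(f, records):
    """Write each record padded with spaces to exactly RECORD_SIZE bytes."""
    for rec in records:
        data = rec.encode("ascii")
        if len(data) > RECORD_SIZE:
            raise ValueError("record does not fit in RECORD_SIZE bytes")
        f.write(data.ljust(RECORD_SIZE, b" "))


def read_by_rrn(f, rrn):
    """Jump directly to record number rrn (counting from 0) and read it."""
    f.seek(rrn * RECORD_SIZE)  # byte offset = RRN * record size
    data = f.read(RECORD_SIZE)
    if len(data) < RECORD_SIZE:
        raise IndexError("no record with that RRN")
    return data.decode("ascii").rstrip(" ")


# Hypothetical sample records.
records = [
    "Ames|Mary|123 Maple|Stillwater|OK|74075|",
    "Mason|Alan|90 Eastgate|Ada|OK|74820|",
]
f = io.BytesIO()  # stands in for a file opened in binary mode
write_fixed(f, records)
print(read_by_rrn(f, 1))
```

The cost of `read_by_rrn` does not depend on how many records precede the one requested, which is the O(1) behavior noted above.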
More About Record Structures
Choosing a record structure and record length within a fixed-length record. Two approaches:
- Fixed-length fields in the record (simple but problematic).
- Varying field boundaries within the fixed-length record.
Header records are often used at the beginning of a file to hold general information about the file that assists in its future use.
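A header record might, for instance, store the record size and the record count so that a reader does not need to know them in advance. The sketch below is one possible layout under stated assumptions: a 6-byte little-endian header (2-byte record size, 4-byte record count) followed by space-padded fixed-length records. The layout and function names are hypothetical.

```python
import io
import struct

# Assumed header layout: record size (2 bytes), record count (4 bytes).
HEADER = struct.Struct("<HI")


def write_file(f, records, record_size):
    """Write the header record, then the fixed-length data records."""
    f.write(HEADER.pack(record_size, len(records)))
    for rec in records:
        data = rec.encode("ascii")
        if len(data) > record_size:
            raise ValueError("record does not fit in record_size bytes")
        f.write(data.ljust(record_size, b" "))


def read_header(f):
    """Return (record_size, record_count) from the start of the file."""
    f.seek(0)
    return HEADER.unpack(f.read(HEADER.size))


def read_record(f, rrn):
    """Use the header to locate record number rrn, skipping the header bytes."""
    record_size, count = read_header(f)
    if not 0 <= rrn < count:
        raise IndexError("no record with that RRN")
    f.seek(HEADER.size + rrn * record_size)
    return f.read(record_size).decode("ascii").rstrip(" ")


# Hypothetical sample records.
records = [
    "Ames|Mary|123 Maple|Stillwater|OK|74075|",
    "Mason|Alan|90 Eastgate|Ada|OK|74820|",
]
f = io.BytesIO()  # stands in for a file opened in binary mode
write_file(f, records, record_size=64)
print(read_header(f))
print(read_record(f, 1))
```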
File Access and File Organization: A Summary
File organization: variable-length records, fixed-length records
File access: sequential access, direct access

- File organization depends on what use you want to make of the file.
- Since using a file implies accessing it, file access and file organization are closely linked.
- Example: although fixed-length records make direct access easier, they are not a good solution if the documents vary widely in length. The application determines our choice of both access and organization.

Beyond Record Structure
- Abstract Data Models for File Access
- Headers and Self-Describing Files
- Metadata
- Color Raster Images
- Mixing Object Types in One File
- Representation-Independent File Access
- Extensibility

Abstract Data Models for File Access
- Can the computer process more than just fields and records?
- Take an application-oriented view of data rather than a medium-oriented one.

Headers and Self-Describing Files
- A header record can keep track of how many records are in a file.
- If the file header contains this sort of information, we say the file is self-describing. For example, the header may hold:
  - A name for each field.
  - The width of each field.
  - The number of fields per record.

Metadata
- Metadata is data about data: it describes properties of the original data.
- Suppose you are an astronomer interested in studying images generated by telescopes that scan the sky, and you want to design a file structure for the digital representation of these images.
- You need information about each image: Where in the sky is the image from? When was it made? What telescope was used? What other images are related? And so on.
- This kind of information is called metadata: data that describes the primary data in a file.
- A community of users of a particular kind of data agrees on a standard format for holding metadata.
- The standard format FITS (Flexible Image Transport System) has been developed by the International Astronomical Union for storing astronomical data.

Color Raster Images
- A modern computer is as much a graphical device as it is a data processor.
- Let us examine one type of image, the color raster image.
- A color raster image is a rectangular array of colored dots, or pixels, that are displayed on the screen.
- A FITS image is a raster image in the sense that the numbers that make up a FITS image can be converted into colors and then displayed on the screen.

Metadata for a color raster image:
- The dimensions of the image (pixels per row and per column).
- The number of bits used to describe each pixel:
  - 1 bit gives 2 colors
  - 2 bits give 4 colors
  - 8 bits give 256 colors
- A color lookup table, which indicates which color is to be assigned to each pixel value in the image:
  - a 2-bit image has 4 colors in its color lookup table
  - an 8-bit image has 256 colors in its color lookup table
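To make the role of the raster metadata concrete, the sketch below decodes a tiny 2-bit image: the bits-per-pixel value tells the reader how to split bytes into pixel values, and the color lookup table maps each value to a color. The most-significant-bit-first packing order, the RGB tuples, and the 2 x 2 image are assumptions for illustration, not a description of FITS or of any particular image format.

```python
def unpack_pixels(data, bits_per_pixel, count):
    """Extract `count` pixel values packed most-significant-bit first."""
    if bits_per_pixel not in (1, 2, 4, 8):
        raise ValueError("this sketch handles only 1, 2, 4, or 8 bits per pixel")
    mask = (1 << bits_per_pixel) - 1
    per_byte = 8 // bits_per_pixel
    pixels = []
    for i in range(count):
        byte = data[i // per_byte]
        shift = 8 - bits_per_pixel * (i % per_byte + 1)
        pixels.append((byte >> shift) & mask)
    return pixels


# Hypothetical metadata for a 2 x 2 image with 2 bits per pixel.
width, height, bits = 2, 2, 2

# A 2-bit image needs 2**2 = 4 entries in its color lookup table (assumed RGB).
lookup = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]

data = bytes([0b00011011])  # four 2-bit pixel values: 0, 1, 2, 3
pixels = unpack_pixels(data, bits, width * height)
colors = [lookup[p] for p in pixels]
print(pixels)
print(colors)
```

Without the dimensions and the bit depth, the single data byte could not be interpreted at all, which is why this information belongs in the file's metadata.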
Mixing Object Types in One File
- Keywords
- keyword = value format
- Tags
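One way to picture tags is a file in which every object is preceded by a short tag naming its type, so that a reader can pick the right method for each object. The sketch below is a hypothetical layout, not the lecture's: a 4-byte tag and a 4-byte length precede each payload, and the tag names `TEXT`, `NUMS`, and `XTRA` are invented for the example.

```python
import io
import struct

TAG_HEADER = struct.Struct("<4sI")  # assumed: 4-byte tag, 4-byte payload length


def write_object(f, tag, payload):
    """Write one tagged object: tag, payload length, then the payload bytes."""
    f.write(TAG_HEADER.pack(tag, len(payload)))
    f.write(payload)


def read_objects(f, readers):
    """Read every object, dispatching on its tag; skip tags with no reader."""
    results = []
    while True:
        head = f.read(TAG_HEADER.size)
        if len(head) < TAG_HEADER.size:
            break  # end of file
        tag, length = TAG_HEADER.unpack(head)
        payload = f.read(length)
        reader = readers.get(tag)
        if reader is not None:
            results.append((tag, reader(payload)))
    return results


# One reading method per object type; supporting a new type means adding an entry.
readers = {
    b"TEXT": lambda p: p.decode("ascii"),
    b"NUMS": lambda p: list(struct.unpack(f"<{len(p) // 2}H", p)),
}

f = io.BytesIO()  # stands in for a file opened in binary mode
write_object(f, b"TEXT", b"notes on the image")
write_object(f, b"NUMS", struct.pack("<3H", 1, 2, 3))
write_object(f, b"XTRA", b"ignored")  # unknown tag: skipped by this reader
f.seek(0)
objects = read_objects(f, readers)
print(objects)
```

Because each object carries its own length, a reader can step over object types it does not understand instead of failing.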
Extensibility
- To access a mixture of objects in a file, the software must have methods for reading and writing each type of object.
- Once we build into our software a mechanism for choosing the appropriate methods for a given type of object, it is easy to imagine extending the types of objects that our software can support.

Portability and Standardization
Factors Affecting Portability
- Differences among operating systems
- Differences among languages
- Differences in machine architectures

Achieving Portability
- Agree on a standard physical record format and stay with it
- Agree on a standard binary encoding for data elements
- ASCII and EBCDIC
- Number and text conversion
- File structure conversion
- File system differences
- Unix and portability
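The need for a standard binary encoding can be seen with a single integer: the same value has a different byte layout on machines with different byte orders. The sketch below uses Python's `struct` module with explicit byte-order markers. The choice of a 4-byte unsigned integer and of the sample value is an assumption for illustration.

```python
import struct

value = 74075  # hypothetical sample value (a zip code stored as a number)

little = struct.pack("<I", value)  # 4-byte unsigned, little-endian
big = struct.pack(">I", value)     # 4-byte unsigned, big-endian

print(little.hex())  # bytes in little-endian order
print(big.hex())     # the same four bytes, reversed

# Reading with the agreed byte order recovers the value on any machine.
assert struct.unpack(">I", big)[0] == value

# Reading with the wrong byte order silently yields a different number.
misread = struct.unpack("<I", big)[0]
print(misread)

# Converting the number to text avoids byte-order issues at the cost of space
# and conversion time.
as_text = str(value).encode("ascii")
print(as_text)
```

Fixing one byte order in the file format, rather than writing whatever the local machine uses, is one small instance of agreeing on a standard binary encoding for data elements.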