
IGNOU BCA CS-06 SOLVED ASSIGNMENT 2012

Course Code: CS-06
Course Name: Introduction to Database Management System
Assignment No: BCA (4)-06/Assignment/2012
Maximum Marks: 100
Last Date of Submission: 30th April, 2012 / 30th October, 2012

Q1.) What is file organization? List all the file organization techniques. Make a detailed comparison among all the file organization techniques.

Ans:- File organization refers to the relationship of the key of a record to the physical location of that record in the computer file. A file may be organized as a physical file or as a logical file. A physical file is a physical unit, such as a magnetic tape or a disk. A logical file, on the other hand, is a complete set of records for a specific application or purpose; it may occupy part of a physical file or extend over more than one physical file. The objectives of computer-based file organization are:

1. Ease of file creation and maintenance
2. Efficient means of storing and retrieving information

Techniques of File Organization


The three techniques of file organization are:

1. Heap (unordered)
2. Sorted: Sequential (SAM), Line Sequential (LSAM), Indexed Sequential (ISAM)
3. Hashed or Direct

In addition to these three techniques, there are five methods of organizing files: sequential, line-sequential, indexed-sequential, inverted list, and direct or hashed access organization.


Sequential Organization

A sequential file contains records organized in the order in which they were entered. The order of the records is fixed. The records are stored and sorted in physical, contiguous blocks; within each block the records are in sequence.

Records in these files can only be read or written sequentially.

Once stored in the file, a record cannot be made shorter or longer, or deleted. However, it can be updated if its length does not change. (In practice this is done by rewriting the records into a new file.) New records will always appear at the end of the file.

If the order of the records in a file is not important, sequential organization will suffice, no matter how many records you may have. Sequential output is also useful for report printing or for the sequential reads that some programs prefer to do.
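To make the idea concrete, here is a minimal Python sketch of a sequential file of fixed-length records; the record layout, field sizes and file name are illustrative assumptions, not part of any particular file system. New records are appended at the end, and reads proceed in the order the records were written.

import struct

# Hypothetical fixed-length record: 4-byte integer id + 20-byte name field.
RECORD = struct.Struct("i20s")

def append_record(path, rec_id, name):
    """New records always go at the end of a sequential file."""
    with open(path, "ab") as f:
        f.write(RECORD.pack(rec_id, name.encode().ljust(20)))

def scan_records(path):
    """Records can only be read back in the order they were written."""
    with open(path, "rb") as f:
        while chunk := f.read(RECORD.size):
            rec_id, name = RECORD.unpack(chunk)
            yield rec_id, name.decode().rstrip()

append_record("emp.dat", 1, "Alice")
append_record("emp.dat", 2, "Bob")
print(list(scan_records("emp.dat")))  # [(1, 'Alice'), (2, 'Bob')]

Note that an in-place update is possible here only because every record has the same length; deleting or lengthening a record would require rewriting the file, exactly as described above.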

Line-Sequential Organization

Line-sequential files are like sequential files, except that the records can contain only characters as data. Line-sequential files are maintained by the native byte stream files of the operating system.

In the COBOL environment, line-sequential files that are created with WRITE statements with the ADVANCING phrase can be directed to a printer as well as to a disk.

Indexed-Sequential Organization

Key searches are improved by this organization. The single-level indexing structure is the simplest: an index file, whose records are key-pointer pairs, maps each indexed key to the position in the data file of the record with that key. Only a subset of the data records, evenly spaced along the data file, is indexed, in order to mark intervals of data records.

This is how a key search is performed: the search key is compared with the index keys to find the highest index key that comes before the search key; then a linear search is performed from the record that this index key points to, until the search key is matched or the record pointed to by the next index entry is reached. Despite the double file access (index + data) required by this sort of search, the reduction in access time is significant compared with sequential file searches.

Let's examine, for the sake of example, a simple linear search on a 1,000-record sequentially organized file. An average of 500 key comparisons is needed (assuming the search keys are uniformly distributed among the data keys). However, using an evenly spaced index with 100 entries, the total number of comparisons is reduced to 50 in the index file plus 50 in the data file: a five-to-one reduction in the operations count!
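A small Python sketch of the single-level scheme just described may help; the index spacing of one entry per ten data records is an illustrative assumption. The index is searched first, then a bounded linear scan of the data file completes the lookup.

# Sorted "data file", modelled as a list of (key, payload) records.
data = [(k, f"record-{k}") for k in range(1000)]

# Single-level index: one (key, position) entry per SPACING data records.
SPACING = 10
index = [(data[i][0], i) for i in range(0, len(data), SPACING)]

def isam_search(search_key):
    # Scan the index for the highest index key not exceeding the search key.
    start = None
    for key, pos in index:
        if key > search_key:
            break
        start = pos
    if start is None:
        return None            # search key precedes every indexed key
    # Linear scan of the data file from the record the index entry points to.
    for key, payload in data[start:start + SPACING]:
        if key == search_key:
            return payload
    return None

print(isam_search(137))  # record-137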

Hierarchical extension of this scheme is possible, since an index is itself a sequential file that can in turn be indexed by a second-level index, and so on. Exploiting this hierarchical decomposition of the search further and further decreases the access time, paying increasing dividends in reduced processing time. There is, however, a point where this advantage starts to be offset by the increased cost of storage, which in turn increases the index access time.

Hardware for indexed-sequential organization is usually disk-based rather than tape. Records are physically ordered by primary key, and the index gives the physical location of each record. Records can be accessed sequentially or directly, via the index. The index is stored in a file and read into memory when the file is opened. Indexes must also be maintained.

Like sequential organization, the data is stored in physically contiguous blocks. However, the difference is in the use of indexes. There are three areas in the disk storage:

Primary Area:- Contains file records stored by key or ID numbers.
Overflow Area:- Contains records that cannot be placed in the primary area.
Index Area:- Contains the keys of records and their locations on the disk.

Inverted List

In file organization, an inverted list is a file that is indexed on many of the attributes of the data itself. The inverted list method has a single index for each key type. The records are not necessarily stored in a sequence; they are placed in the data storage area, but the indexes are updated with the record keys and locations.

Here's an example: in a company file, an index could be maintained for all products, and another might be maintained for product types. Thus, it is faster to search the indexes than every record. These types of files are also known as "inverted indexes." Nevertheless, inverted list files use more media space, and storage devices fill up quickly with this type of organization. The benefit is apparent immediately, because searching is fast; updating, however, is much slower.

Content-based queries in text retrieval systems use inverted indexes as their preferred mechanism. Data items in these systems are usually stored compressed, which would normally slow the retrieval process, but the compression algorithm is chosen to support this technique.

When querying a file, there are circumstances where the query is designed to be modal, meaning that rules are set which require different information to be held in the index. Here's an example of this modality: when phrase querying is undertaken, the particular algorithm requires that offsets to word classifications be held in addition to document numbers.
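As a sketch of the basic mechanism, assuming an in-memory list of records with illustrative attribute names, one inverted index per key type maps an attribute value to the locations of matching records:

from collections import defaultdict

products = [
    {"id": 1, "name": "Widget", "type": "hardware"},
    {"id": 2, "name": "Gadget", "type": "hardware"},
    {"id": 3, "name": "Suite",  "type": "software"},
]

# One inverted index for the "type" key: value -> record positions.
by_type = defaultdict(list)
for pos, rec in enumerate(products):
    by_type[rec["type"]].append(pos)

# Searching the index is faster than examining every record...
print([products[i]["name"] for i in by_type["hardware"]])  # ['Widget', 'Gadget']

# ...but every insert must also maintain the index, which slows updates.
products.append({"id": 4, "name": "Driver", "type": "software"})
by_type["software"].append(len(products) - 1)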

Direct or Hashed Access


With direct or hashed access, a portion of disk space is reserved and a hashing algorithm computes the record address, so additional space is required for this kind of file in the store. Records are placed randomly throughout the file and are accessed by addresses that specify their disk location. This type of file organization requires disk storage rather than tape. It has excellent search and retrieval performance, but care must be taken to maintain the indexes: if the indexes become corrupt, what is left may as well go to the bit-bucket, so it is as well to have regular backups of this kind of file, just as for all valuable stored data!
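A minimal sketch of hashed record addressing, with an assumed slot count and record size: the hash function turns a key directly into a byte offset within the reserved file space, so a lookup is a single seek rather than a scan.

NUM_SLOTS = 1024    # size of the reserved disk area, fixed at creation
RECORD_SIZE = 64    # bytes per record slot (illustrative)

def slot_address(key: str) -> int:
    """Hash the key into a slot number, then into a byte offset."""
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % NUM_SLOTS
    return h * RECORD_SIZE

def read_record(f, key):
    """Seek straight to the computed address; no sequential scan needed."""
    f.seek(slot_address(key))
    return f.read(RECORD_SIZE)

A real implementation would also need a collision policy (overflow chains or probing) for keys that hash to the same slot.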

External File Structure and File Extensions


Microsoft Windows and MS-DOS File Systems

The external structure of a file depends on whether it is created on a FAT or an NTFS partition. The maximum filename length on an NTFS partition is 256 characters, versus 11 characters on FAT (8-character name + "." + 3-character extension). NTFS filenames keep their case, whereas FAT filenames have no concept of case (case is, however, ignored when performing a search under NTFS). There is also the newer VFAT, which permits 256-character filenames.

UNIX and Apple Macintosh File Systems


The concept of directories and files is fundamental to the UNIX operating system. On Microsoft Windows-based operating systems, directories are depicted as folders, and moving about is accomplished by clicking on the different icons. In UNIX, the directories are arranged as a hierarchy, with the root directory at the top of the tree. The root directory is always depicted as /. Within the / directory there are subdirectories (e.g. etc and sys). Files can be written to any directory, depending on the permissions. Files can be readable, writable and/or executable.

Q2.) What are the functions associated with the role of a database administrator? How does the data dictionary help a database administrator?

Ans:- Functions associated with the role of a database administrator:

Data Definition

The DBMS provides functions to define the structure of the data in the application. These include defining and modifying the record structure, the type and size of fields, and the various constraints/conditions to be satisfied by the data in each field.

Data Manipulation

Once the data structure is defined, data needs to be inserted, modified or deleted. The functions which perform these operations are also part of the DBMS. These functions can handle planned and unplanned data manipulation needs. Planned queries are those which form part of the application. Unplanned queries are ad-hoc queries which are performed on a need basis.
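The following sketch illustrates both function groups using SQLite through Python's standard sqlite3 module; the table and field names are invented for the example.

import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: record structure, field types and constraints.
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        salary REAL CHECK (salary >= 0)
    )
""")

# Data manipulation: insert, modify and delete records.
conn.execute("INSERT INTO employee VALUES (1, 'Alice', 50000)")
conn.execute("UPDATE employee SET salary = 55000 WHERE emp_id = 1")
conn.execute("DELETE FROM employee WHERE emp_id = 1")

# An unplanned (ad-hoc) query, issued on a need basis.
print(conn.execute("SELECT COUNT(*) FROM employee").fetchone())  # (0,)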

Data Security & Integrity


The DBMS contains functions which handle the security and integrity of data in the application. These can be easily invoked by the application and hence the application programmer need not code these functions in his/her programs.

Data Recovery & Concurrency

Recovery of data after a system failure and concurrent access of records by multiple users are also handled by the DBMS.

The data dictionary helps a database administrator in the following way: maintaining the data dictionary, which contains the data definitions of the application, is itself one of the functions of a DBMS. The DBA uses it as a central catalogue of record structures, fields and constraints when administering the database.

Performance

Optimizing the performance of the queries is one of the important functions of a DBMS. Hence the DBMS has a set of programs forming the Query Optimizer which evaluates the different implementations of a query and chooses the best among them. Thus the DBMS provides an environment that is both convenient and efficient to use when there is a large volume of data and many transactions to be processed.

Q3.) Define the following terms:

(i) Inverted list
(ii) Referential integrity
(iii) Foreign key
(iv) Transaction
(v) Candidate key

Ans:-

(i) Inverted list: In data management, this is a file that is indexed on many of the attributes of the data itself. For example, in an employee file, an index could be maintained for all secretaries, another for managers. It is faster to search the indexes than every record. Also known as "inverted file indexes," these use a lot of disk space; searching is fast, but updating is slower.

(ii) Referential integrity: Referential integrity is a property of data which, when satisfied, requires every value of one attribute (column) of a relation (table) to exist as a value of another attribute in a different (or the same) relation. For referential integrity to hold in a relational database, any field in a table that is declared a foreign key can contain only values from a parent table's primary key or a candidate key. For instance, deleting a record that contains a value referred to by a foreign key in another table would break referential integrity. Some relational database management systems (RDBMS) can enforce referential integrity, normally either by deleting the foreign key rows as well to maintain integrity, or by returning an error and not performing the delete. Which method is used may be determined by a referential integrity constraint defined in a data dictionary.

(iii) Foreign key: In the context of relational databases, a foreign key is a referential constraint between two tables. A foreign key is a field in a relational table that matches a candidate key of another table, and it can be used to cross-reference tables. For example, say we have two tables: a CUSTOMER table that includes all customer data, and an ORDERS table that includes all customer orders. The intention here is that all orders must be associated with a customer that is already in the CUSTOMER table. To do this, we place a foreign key in the ORDERS table and have it relate to the primary key of the CUSTOMER table. The foreign key identifies a column or set of columns in one (referencing) table that refers to a column or set of columns in another (referenced) table. The columns in the referencing table must reference the columns of the primary key or another superkey in the referenced table, and the values in one row of the referencing columns must occur in a single row in the referenced table. Thus, a row in the referencing table cannot contain values that don't exist in the referenced table (except potentially NULL). This way references can be made to link information together, which is an essential part of database normalization. Multiple rows in the referencing table may refer to the same row in the referenced table; most of the time this reflects a one (parent, or referenced, table) to many (child, or referencing, table) relationship. The referencing and referenced table may be the same table, i.e. the foreign key refers back to the same table; such a foreign key is known in SQL:2003 as a self-referencing or recursive foreign key. A table may have multiple foreign keys, and each foreign key can have a different referenced table. Each foreign key is enforced independently by the database system. Therefore, cascading relationships between tables can be established using foreign keys.
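A small sketch of the CUSTOMER/ORDERS example, using SQLite via Python's sqlite3 module (SQLite enforces foreign keys only when the pragma is switched on); it shows the RDBMS rejecting both an orphan order and the deletion of a referenced customer:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # enable FK enforcement in SQLite

conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        cust_id  INTEGER REFERENCES customer(cust_id)   -- the foreign key
    )
""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO orders VALUES (100, 1)")       # parent row exists: OK

try:
    conn.execute("INSERT INTO orders VALUES (101, 99)")  # no such customer
except sqlite3.IntegrityError as e:
    print("rejected:", e)   # FOREIGN KEY constraint failed

try:
    conn.execute("DELETE FROM customer WHERE cust_id = 1")  # still referenced
except sqlite3.IntegrityError as e:
    print("rejected:", e)   # referential integrity preserved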

(iv) Transaction: A transaction comprises a unit of work performed within a database management system (or similar system) against a database, treated in a coherent and reliable way independent of other transactions. Transactions in a database environment have two main purposes:

1. To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure, when execution stops (completely or partially) and many operations upon a database remain uncompleted, with unclear status.
2. To provide isolation between programs accessing a database concurrently. If this isolation is not provided, the programs' outcomes may be erroneous.

A database transaction, by definition, must be atomic, consistent, isolated and durable. Database practitioners often refer to these properties of database transactions using the acronym ACID. Transactions provide an "all-or-nothing" proposition: each unit of work performed in a database must either complete in its entirety or have no effect whatsoever. Further, the system must isolate each transaction from other transactions, results must conform to existing constraints in the database, and transactions that complete successfully must be written to durable storage.
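The all-or-nothing property can be demonstrated with the same sqlite3 module; the account table and the transfer rule are invented for the example. Either both updates commit, or the rollback leaves the database exactly as it was:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 0)")
conn.commit()

try:
    # A transfer is one unit of work spanning two updates.
    conn.execute("UPDATE account SET balance = balance - 500 WHERE id = 1")
    conn.execute("UPDATE account SET balance = balance + 500 WHERE id = 2")
    (bal,) = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()
    if bal < 0:
        raise ValueError("insufficient funds")  # violates a business rule
    conn.commit()
except Exception:
    conn.rollback()   # the half-finished transfer leaves no trace

print(conn.execute("SELECT balance FROM account ORDER BY id").fetchall())
# [(100.0,), (0.0,)] -- unchanged, as if the transaction never ran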

(v) Candidate key: In the relational model of databases, a candidate key of a relation is a minimal superkey for that relation; that is, a set of attributes such that:

1. the relation does not have two distinct tuples (i.e. rows or records, in common database language) with the same values for these attributes (which means that the set of attributes is a superkey), and
2. there is no proper subset of these attributes for which (1) holds (which means that the set is minimal).

The constituent attributes are called prime attributes; conversely, an attribute that does not occur in any candidate key is called a non-prime attribute. Since a relation contains no duplicate tuples, the set of all its attributes is a superkey if NULL values are not used. It follows that every relation has at least one candidate key. The candidate keys of a relation tell us all the possible ways we can identify its tuples; as such, they are an important concept for the design of database schemas. For practical reasons, RDBMSs usually require that one of a relation's candidate keys be declared as the primary key, meaning it is the preferred way to identify individual tuples. Foreign keys, for example, are usually required to reference such a primary key and not any of the other candidate keys.
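As a sketch of the two conditions, here is a brute-force candidate-key finder in Python; note it only tests uniqueness over the tuples it is given, whereas real candidate keys are a property of the intended schema, not of one sample of data. The relation and attribute names are invented.

from itertools import combinations

def candidate_keys(attrs, tuples):
    """Minimal attribute sets whose projected values are unique: conditions (1) and (2)."""
    def is_superkey(subset):
        projections = [tuple(t[a] for a in subset) for t in tuples]
        return len(set(projections)) == len(projections)   # condition (1)

    keys = []
    for r in range(1, len(attrs) + 1):
        for subset in combinations(attrs, r):
            # Condition (2): no proper subset may already be a key.
            if is_superkey(subset) and not any(set(k) < set(subset) for k in keys):
                keys.append(subset)
    return keys

rows = [
    {"emp_id": 1, "email": "a@x.com", "dept": "HR"},
    {"emp_id": 2, "email": "b@x.com", "dept": "HR"},
]
print(candidate_keys(["emp_id", "email", "dept"], rows))
# [('emp_id',), ('email',)] -- two candidate keys; 'dept' is a non-prime attribute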

Q4.) Compare and contrast the following:

(i) Primary indices and secondary indices
(ii) Centralized DBMS and distributed DBMS
(iii) B tree and B+ tree
(iv) Data replication and data fragmentation
(v) Procedural and non-procedural DMLs

Ans:- (i) Primary and Secondary Indices

1. Secondary indices have to be dense.
2. Indices offer substantial benefits when searching for records.
3. When a file is modified, every index on the file must be updated; updating indices imposes overhead on database modification.
4. A sequential scan using the primary index is efficient, but a sequential scan using a secondary index is expensive: each record access may fetch a new block from disk.


(ii) Centralized and decentralized design constitute variations on the bottom-up and top-down approaches. Basically, the centralized approach is best suited to relatively small and simple databases that lend themselves well to a bird's-eye view of the entire database. Such databases may be designed by a single person or by a small, informally constituted design team. The company operations and the scope of its problems are sufficiently limited to enable the designer(s) to perform all of the necessary database design tasks:

1. Define the problem(s).

2. Create the conceptual design.

3. Verify the conceptual design with all user views.

4. Define all system processes and data constraints.

5. Assure that the database design will comply with all achievable end user requirements.

[Figure: The Centralized Design Procedure]

In contrast, when company operations are spread across multiple operational sites or when the database has multiple entities that are subject to complex relations, the best approach is often based on the decentralized design.

Typically, a decentralized design requires that the design task be divided into multiple modules, each one of which is assigned to a design team. The design team activities are coordinated by the lead designer, who must aggregate the design teams' efforts.

Since each team focuses on modeling a subset of the system, the definition of boundaries and the interrelation between data subsets must be very precise. Each team creates a conceptual data model corresponding to the subset being modeled. Each conceptual model is then verified individually against the user views, processes, and constraints for each of the modules. After the verification process has been completed, all modules are integrated in one conceptual model.

Since the data dictionary describes the characteristics of all the objects within the conceptual data model, it plays a vital role in the integration process. Naturally, after the subsets have been aggregated into a larger conceptual model, the lead designer must verify that the combined conceptual model is still able to support all the required transactions.

A centralized database has all its data in one place, so there may occur problems of data availability, and a system crash may lead to loss of all data. In a distributed database, the database is stored on several computers which communicate over networks. Users at any location can issue commands to retrieve data without affecting the working of the database, and failure of one of the sites will not make the whole system useless.


(iii) In a B-tree you can store both keys and data in the internal and leaf nodes, whereas in a B+ tree the data is stored in the leaf nodes only, with keys replicated in the internal nodes. Why not use B-trees instead of B+ trees everywhere, since intuitively they seem faster, and why replicate the keys in a B+ tree?

The principal advantage of B+ trees over B-trees is that they allow you to pack in more pointers to other nodes by removing pointers to data, thus increasing the fanout and potentially decreasing the depth of the tree. The disadvantage is that there are no early outs when you might have found a match in an internal node. But since both data structures have huge fanouts, the vast majority of matches will be on leaf nodes anyway, making the B+ tree more efficient on average.

B+ trees are much easier and faster to scan in full (that is, to look at every piece of data the tree indexes), since the terminal nodes form a linked list; to do a full scan of a B-tree you need a full tree traversal to find all the data. B-trees, on the other hand, can be faster for a seek (looking for a specific piece of data by key), especially when the tree resides in RAM or other non-block storage, since commonly used nodes can be elevated in the tree, requiring fewer comparisons to get to the data.
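A toy sketch of why full scans favour the B+ tree: its terminal (leaf) nodes form a linked list, so visiting every indexed item needs no tree traversal at all. The node structure below is an illustrative assumption, not a complete B+ tree implementation.

class Leaf:
    """B+ tree terminal node: sorted (key, value) pairs plus a next-leaf link."""
    def __init__(self, pairs, nxt=None):
        self.pairs = pairs
        self.next = nxt

# The bottom level of a small B+ tree: three chained leaves.
l3 = Leaf([(7, "g"), (9, "i")])
l2 = Leaf([(4, "d"), (6, "f")], l3)
l1 = Leaf([(1, "a"), (2, "b")], l2)

def full_scan(first_leaf):
    """Every indexed item, in key order, by walking the leaf chain."""
    leaf = first_leaf
    while leaf:
        yield from leaf.pairs
        leaf = leaf.next

print(list(full_scan(l1)))
# [(1, 'a'), (2, 'b'), (4, 'd'), (6, 'f'), (7, 'g'), (9, 'i')]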

(iv) Database replication is the frequent electronic copying of data from a database on one computer or server to a database on another, so that all users share the same level of information. The result is a distributed database in which users can access data relevant to their tasks without interfering with the work of others. The implementation of database replication for the purpose of eliminating data ambiguity or inconsistency among users is known as normalization.

Database replication can be done in at least three different ways:

Snapshot replication: Data on one server is simply copied to another server, or to another database on the same server.
Merging replication: Data from two or more databases is combined into a single database.
Transactional replication: Users receive full initial copies of the database and then receive periodic updates as data changes.

Data fragmentation occurs when a piece of data in memory or on disk is broken up into many pieces; a de-fragmentation tool rearranges the blocks on disk so that the blocks of each file are contiguous. There are two types of fragmentation: external fragmentation and internal fragmentation. In external fragmentation, the free space available for storage is divided into many small pieces of different sizes. In dynamic memory allocation, a block might be requested, but no contiguous block of free space is large enough: if there are ten blocks of 300 bytes of free space, separated by allocated regions, one still cannot allocate a requested block of 1000 bytes. External fragmentation also occurs in file systems as many files of different sizes are created, change size, and are deleted. The effect is even worse if a file which is divided into many small pieces is deleted, because this leaves similarly small regions of free space. To remedy external fragmentation, the allocated blocks are moved into one large adjacent block, leaving all of the remaining free space in one large block. Garbage collectors move blocks in order to improve dynamic memory allocation performance, and defragmentation tools perform the same task for disk drives.

(v) Non-Procedural DML: A high-level or non-procedural DML allows the user to specify what data is required without specifying how it is to be obtained. Many DBMSs allow high-level DML statements either to be entered interactively from a terminal or to be embedded in a general-purpose programming language. End-users use a high-level query language to specify their requests to the DBMS to retrieve data; usually a single statement is given to the DBMS to retrieve or update multiple records, and the DBMS translates the DML statement into a procedure that manipulates the set of records. Examples of non-procedural DMLs are SQL and QBE (Query-By-Example), used by relational database systems. These languages are easier to learn and use. The part of a non-procedural DML which is related to data retrieval from the database is known as a query language.

Procedural DML: A low-level or procedural DML allows the user, i.e. the programmer, to specify what data is needed and how to obtain it. This type of DML typically retrieves individual records from the database and processes each separately, using looping, branching and similar statements to retrieve and process each record from a set of records. Programmers use the low-level DML embedded in a general-purpose (host) programming language.

There are two types of data manipulation language (DML): one is known as non-procedural DML and the other as procedural DML. Non-procedural DML is also known as high-level DML. It is used to specify complex database operations. These high-level DML statements can be entered from a display monitor with the help of the DBMS, entered through a terminal, or embedded in a programming language. Procedural DML is also known as low-level DML. It is used to get data or objects from the database, and it processes each record separately; that is why it has to use programming language constructs to retrieve and process each record from a set of records. Because of this property, low-level DMLs are also called record-at-a-time DMLs, while high-level DMLs are called set-at-a-time or set-oriented DMLs. Low-level and high-level DMLs are both considered part of the query language, because both may be used interactively. Normally, casual database (end) users use a non-procedural language.
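The contrast can be sketched in Python with sqlite3: the set-oriented statement says what to change in one line, while the record-at-a-time version loops, branches and processes each record separately. Table and column names are invented for the example.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, salary REAL)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [(1, 100.0), (2, 200.0)])
conn.commit()

# Non-procedural (set-at-a-time): one statement says WHAT, not HOW.
conn.execute("UPDATE emp SET salary = salary * 1.1 WHERE salary > 150")
conn.rollback()   # undo, so the procedural version starts from the same data

# Procedural (record-at-a-time): fetch, test and update each record in turn.
for emp_id, salary in conn.execute("SELECT id, salary FROM emp").fetchall():
    if salary > 150:
        conn.execute("UPDATE emp SET salary = ? WHERE id = ?",
                     (salary * 1.1, emp_id))
conn.commit()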

Ans 5:- Follow this link for more info:

http://bronzeacademy.com/AI/CS06.pdf

Ans 6:- Follow this link for more info:

http://books.google.co.in/books?id=8PNCKe2SpRwC&pg=PA272&lpg=PA272&dq=A+project+handling+organization+has+persons+identified+by+PER+%E2%80%93+ID+and+aLAST+%E2%80%93+NAME.+Persons+are+assigned+to+departments+identified+by+DEP+%E2%80%93+NAME.+Persons+work+on+projects+and+each+project+has+a+PROJ+%E2%80%93+ID+and+a+PROJ+%E2%80%93+BUDGET.+Each+project+is+managed+by+one+department+and+a+department+may+manage+many+projects.+But+a+person+may+work+on+only+some+(or+none)+of+the+projects+in+his+or+her+departments.+Identify+the+entities+and+relationship+for+this+organization+and+constructan+E+%E2%80%93+R+diagram.&source=bl&ots=_b03R1ga-1&sig=VTAxAF0Tr8_1MHRAbK9TFqEeICk&hl=en&sa=X&ei=KW8xT6ugAsfJrAeI05SeBA&ved=0CCEQ6AEwAA#v=onepage&q&f=false


Ans 7:-
Hashing is performed on arbitrary data by a hash function. A hash function is any function that can convert data to either a number or an alphanumeric code; how precisely it works depends on what data it is meant to generate a hash code from. There are possibly as many types of hashing as there are kinds of data.

Hashing is used for a variety of things. For example, a hash table is a data structure used for storing data in memory: instead of iterating through the structure to find a specific item, we associate a key (hash code) with a particular item (data). A hash code can also be generated from a file or disk image; if the data does not match the code, the data is assumed to be corrupted.

Hashing has the advantage of taking a larger amount of data and representing it as a smaller amount of data (the hash code). The code generated is, for practical purposes, unique to the data it came from, although different inputs can in principle collide on the same code. Generating a hash code can take time, however, depending on the function and the data. Some hash functions include the Bernstein hash, Fowler-Noll-Vo hash, Jenkins hash, MurmurHash, Pearson hashing and Zobrist hashing.
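A minimal sketch tying the pieces together: the Bernstein (djb2) hash named above, used to drive a tiny chained hash table. The bucket count and keys are illustrative assumptions.

def bernstein_hash(data: bytes) -> int:
    """Dan Bernstein's djb2 hash: h = h*33 + byte, truncated to 32 bits."""
    h = 5381
    for byte in data:
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h

BUCKETS = 8
table = [[] for _ in range(BUCKETS)]   # chaining: each bucket is a list

def put(key: str, value):
    bucket = table[bernstein_hash(key.encode()) % BUCKETS]
    for i, (k, _) in enumerate(bucket):
        if k == key:
            bucket[i] = (key, value)   # overwrite an existing key
            return
    bucket.append((key, value))

def get(key: str):
    bucket = table[bernstein_hash(key.encode()) % BUCKETS]
    return next((v for k, v in bucket if k == key), None)

put("alice", 42)
put("bob", 7)
print(get("alice"), get("bob"))   # 42 7

The chains are what make collisions harmless here: two keys hashing to the same bucket simply share it.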
