
Course Code: MCS-023
Course Title: Introduction to Database Management Systems
Assignment Number: MCA (2)/023/Assign/11
Maximum Marks: 100
Weightage: 25%
Last Date of Submission: 15th October, 2011 (for July, 2011 session); 15th April, 2012 (for January, 2012 session)

This assignment has four questions. Answer all the questions, which carry a total of 80 marks; the remaining 20 marks are for the viva voce. You may use illustrations and diagrams to enhance explanations. Please go through the guidelines regarding assignments given in the Programme Guide for the format of presentation. The answer to each part of a question should be confined to about 300 words.
Question 1: 20 Marks

(i) What are the possible applications of database system in a Bank? What are the advantages of using database system for banking applications?

Sol:

Database systems are used in almost all areas of the banking industry, and more and more banks are moving towards core banking systems, which use a database system as their backbone. Applications of a database in a bank include:
(1) Storing records of customers
(2) Daily transactions
(3) Accounting information
(4) Asset and liability records
(5) Personnel management
(6) Loan records
(7) Inter-bank transactions
(8) ATM facility
(9) Credit/debit cards
(10) Internet banking
(11) DEMAT accounts for share trading
(12) All other facilities related to core banking

Advantages of Database Management System:

Reduction of Redundancies: In a file processing system, each user group maintains its own files, resulting in a considerable amount of redundancy in the stored data. This wastes storage space but, more importantly, may result in data inconsistencies. Also, the same data has to be updated more than once, resulting in duplication of effort. The files that represent the same data may become inconsistent, as some may be updated whereas others may not be. In the database approach, data can be stored at a single place, or with controlled redundancy under the DBMS, which saves space and does not permit inconsistency.

Shared Data: A DBMS allows the sharing of the database under its control by any number of application programs or users. A database belongs to the entire organisation and is shared by all authorised users. This scheme can be best explained with the help of a logical diagram (Figure 2). New applications can be built and added to the current system, and data not currently stored can be stored.

Data Independence: In a file-based system, the descriptions of data and the logic for accessing the data are built into each application program, making the program more dependent on the data. A change in the structure of data may require alterations to programs. A DBMS separates data descriptions from the data, so programs are not affected by such changes. This is called data independence: the details of the data are not exposed, because the DBMS provides an abstract view and hides those details. For example, the interface or window to the data provided by the DBMS to a user may remain the same although the internal structure of the data has changed (refer to Figure 2).

Improved Integrity: Data integrity refers to the validity and consistency of data; the data should be accurate and consistent. This is achieved by providing checks or constraints, which are consistency rules that the database is not permitted to violate. Constraints may apply to data items within a record or to relationships between records. For example, the age of an employee can be between 18 and 70 years only; while the age of an employee is being entered, the database should check this. However, if a student's grade is erroneously entered as Grade C instead of Grade A, the DBMS will not be able to provide any check, as both A and C are of the same data type and are valid values.

Efficient Data Access: A DBMS utilises techniques to store and retrieve data efficiently, even for unforeseen queries. A complex DBMS should be able to provide services to end users, who can retrieve the data almost immediately.

Multiple User Interfaces: Since many users with varying levels of technical knowledge use a database, a DBMS should be able to provide a variety of interfaces. These include:
a. Query language for casual users,
b. Programming language interfaces for application programmers,
c. Forms and codes for parametric users,
d. Menu-driven interfaces, and
e. Natural language interfaces for standalone users; these interfaces are still not available in standard form with commercial databases.

[Figure 2: User interaction to DBMS. Application programs and users work with the shared data under DBMS control; a user can either access the data window through the DBMS or use an application program.]

Representing Complex Relationships among Data: A database may include a variety of data interrelated to each other in many ways. A DBMS must have the capability to represent a variety of relationships among the data, as well as to retrieve and update related data easily and efficiently.

Improved Security: Data is vital to any organisation and is often confidential. In a shared system where multiple users share the data, not all information should be shared by all users. For example, the salaries of employees should not be visible to anyone other than the department dealing with them. Hence, the database should be protected from unauthorised users. The Database Administrator (DBA) does this by providing usernames and passwords only to authorised users and by granting privileges, that is, the types of operation allowed, through the security and authorisation subsystem. Only authorised users may use the database, and their access can be restricted to retrieval, insert, update, delete or any combination of these. For example, the Branch Manager of any company may have access to all data, whereas the Sales Assistant may not have access to salary details.

Improved Backup and Recovery: A file-based system may fail to provide measures to protect data from system failures; this responsibility lies solely with the user, who must take backups periodically. A DBMS provides facilities for recovering from hardware and software failures, and a backup and recovery subsystem is responsible for this. In case a program fails, it restores the database to the state it was in before the execution of the program.

Support for Concurrent Transactions: A transaction is defined as a unit of work. For example, a bank may be involved in a transaction where an amount of Rs. 5000/- is transferred from account X to account Y. A DBMS also allows multiple transactions to occur simultaneously.
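The transfer of Rs. 5000/- from account X to account Y can be sketched as a single unit of work. This is a minimal illustration using Python's built-in sqlite3 module rather than a banking DBMS; the account table, its columns and the opening balances are hypothetical. The connection's context manager commits when the block finishes and rolls back if an exception escapes, so a transfer that would overdraw X leaves both balances untouched.

```python
import sqlite3

# Hypothetical ACCOUNT table; names and opening balances are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("X", 8000), ("Y", 1000)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` as one transaction; return True on commit."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?", (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE acc_no = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # triggers rollback of the debit
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acc_no = ?", (amount, dst))
        return True
    except ValueError:
        return False


first = transfer(conn, "X", "Y", 5000)   # X: 8000 -> 3000, Y: 1000 -> 6000
second = transfer(conn, "X", "Y", 5000)  # would overdraw X, so it is rolled back
balances = dict(conn.execute("SELECT acc_no, balance FROM account"))
print(first, second, balances)
```

The sketch shows only the all-or-nothing behaviour of one transaction; the locking that a DBMS uses to keep simultaneous transactions from interfering is not modelled here.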

(ii)

Explain the three level DBMS Architecture in the context of an application of a database system in a University like IGNOU.

Sol:

Three Level Architecture of DBMS (Logical DBMS Architecture): The logical architecture describes how data in the database is perceived by users. It is not concerned with how the data is handled and processed by the DBMS, but only with how it looks. The method of data storage on the underlying file system is not revealed, and users can manipulate the data without worrying about where it is located or how it is actually stored. This results in the database having different levels of abstraction.
The majority of commercial Database Management Systems available today are based on the ANSI/SPARC generalised DBMS architecture, as proposed by the ANSI/SPARC Study Group on Data Base Management Systems. Hence this is also called the ANSI/SPARC model. It divides the system into three levels of abstraction: the internal or physical level, the conceptual level, and the external or view level. The diagram below shows the logical architecture for a typical DBMS.

The External or View Level: The external or view level is the highest level of abstraction of the database. It provides a window on the conceptual view, which allows users to see only the data of interest to them. The user can be either an application program or an end user. There can be many external views, as any number of external schemas can be defined and they can overlap each other. This level consists of the definitions of logical records and relationships in the external view. It also contains the methods for deriving the objects, such as entities, attributes and relationships, in the external view from the conceptual view.

The Conceptual Level or Global Level: The conceptual level presents a logical view of the entire database as a unified whole. It allows the user to bring all the data in the database together and see it in a consistent manner. Hence, there is only one conceptual schema per database. The first stage in the design of a database is to define the conceptual view, and a DBMS provides a data definition language for this purpose. It describes all the records and relationships included in the database. The data definition language used to create the conceptual level must not specify any physical storage considerations; those are handled by the physical level. It does not provide any storage or access details, but defines the information content only.

The Internal or Physical Level: The collection of files permanently stored on secondary storage devices is known as the physical database. The physical or internal level is the one closest to physical storage. It provides a low-level description of the physical database and an interface between the operating system's file system and the record structures used in higher levels of abstraction. It is at this level that record types and methods of storage are defined, as well as how stored fields are represented, what physical sequence the stored records are in, and what other physical structures exist.
Mappings between Levels and Data Independence: The three levels of abstraction in the database do not exist independently of each other. There must be some correspondence, or mapping, between the levels. There are two types of mappings: the conceptual/internal mapping and the external/conceptual mapping.

The conceptual/internal mapping lies between the conceptual and internal levels, and defines the correspondence between the records and fields of the conceptual view and the files and data structures of the internal view. If the structure of the stored database is changed, the conceptual/internal mapping must be changed accordingly so that the view from the conceptual level remains constant. It is this mapping that provides physical data independence for the database. For example, we may change the internal view of the student relation by breaking the student file into two files, one containing enrolment, name and address and the other containing enrolment and programme. However, the mapping will ensure that the conceptual view is restored as the original. Such storage decisions are primarily taken for optimisation purposes.

The external/conceptual mapping lies between the external and conceptual levels, and defines the correspondence between a particular external view and the conceptual view. Although these two levels are similar, some elements found in a particular external view may differ from the conceptual view. For example, several fields can be combined into a single (virtual) field, which can also have a different name from the original fields. If the structure of the database at the conceptual level is changed, the external/conceptual mapping must change accordingly so that the view from the external level remains constant. It is this mapping that provides logical data independence for the database. For example, we may change the student relation to have more fields at the conceptual level, yet this will not change the user views at all.

It is also possible to have another mapping, where one external view is expressed in terms of other external views (this could be called an external/external mapping). This is useful if several external views are closely related to one another, as it allows you to avoid mapping each of the similar external views directly to the conceptual level.
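The student-file example above can be imitated with ordinary views. This is a minimal sketch using Python's sqlite3 module, assuming hypothetical table names (stu_personal, stu_programme, mca_students), enrolment numbers and addresses; the names and programmes are borrowed from the sample data in part (iii)(g). A real DBMS performs the conceptual/internal mapping inside its storage engine rather than through user-visible views, so the views here only stand in for the two mappings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Internal level (hypothetical storage decision): the student file is split
-- into two stored tables that share the enrolment number.
CREATE TABLE stu_personal  (enrolment TEXT PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE stu_programme (enrolment TEXT PRIMARY KEY, programme TEXT);

-- Conceptual level: one STUDENT relation rebuilt from the stored pieces
-- (stands in for the conceptual/internal mapping).
CREATE VIEW student AS
    SELECT p.enrolment, p.name, p.address, g.programme
    FROM stu_personal AS p JOIN stu_programme AS g ON p.enrolment = g.enrolment;

-- External level: what an MCA programme coordinator might see
-- (stands in for the external/conceptual mapping).
CREATE VIEW mca_students AS
    SELECT enrolment, name FROM student WHERE programme = 'MCA';
""")

conn.executemany("INSERT INTO stu_personal VALUES (?, ?, ?)", [
    ("E001", "Rebecca Stevens", "Delhi"),
    ("E002", "Samit Arora", "Mumbai"),
    ("E003", "Jackie Chan", "Delhi"),
])
conn.executemany("INSERT INTO stu_programme VALUES (?, ?)", [
    ("E001", "MCA"), ("E002", "BCA"), ("E003", "MCA"),
])

# The external-level query never mentions how the data is physically split.
mca = conn.execute("SELECT name FROM mca_students ORDER BY enrolment").fetchall()
print(mca)
```

If the two stored tables were later merged back into one, only the definition of the student view would need to change; the query against mca_students would stay the same, which is the physical data independence described above.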

The Need for the Three Level Architecture: The objective of the three level architecture is to separate each user's view of the database from the way the database is physically represented.

Support of multiple user views: Each user is able to access the same data but have a different, customised view of the data. Each user should be able to change the way he or she views the data, and this change should not affect other users.

Insulation between user programs and data that does not concern them: Users should not deal directly with physical storage details, such as indexing or hashing. The users' interactions with the database should be independent of storage considerations.

Insulation between conceptual and physical structures: This can be defined as follows:
1. The Database Administrator should be able to change the storage structures without affecting users' views.
2. The internal structure of the database should be unaffected by changes to the physical aspects of the storage, such as a change to a new storage device.
3. The DBA should be able to change the conceptual structure of the database without affecting all users.

(iii)

Consider the following schema of a student database:
Student (st_ID, st_name, st_programme)
Subject (su_ID, su_name, su_credits)
Marks (st_ID, su_ID, ma_marks)
Perform the following tasks for the database:

a. Define the domain of each attribute.

Sol:

Domains: Each simple attribute of an entity type has a set of possible values that can be attached to it. This is called the domain of the attribute, and an attribute cannot contain a value outside its domain. For example, for a PERSON entity, PERSON_ID has a specific domain: integer values, say from 1 to 100. The domains for the tables Student (st_ID, st_name, st_programme), Subject (su_ID, su_name, su_credits) and Marks (st_ID, su_ID, ma_marks) are:

Table     Attribute      Domain      Range
Student   st_ID          Number      1 to 9999
Student   st_name        Character   a-z, A-Z, '.' and space
Student   st_programme   Character   A-Z
Subject   su_ID          Number      1 to 100
Subject   su_name        Character   a-z, A-Z, '.' and space
Subject   su_credits     Number      2 to 8
Marks     st_ID          Number      1 to 9999
Marks     su_ID          Number      1 to 100
Marks     ma_marks       Number      0 to 100

b. List domain constraints on each of the domains.

Sol:

Domain constraints are primarily created for defining the logically correct values for an attribute of a relation. They allow the attributes of a relation to be confined to a range of values (for example, the values of an attribute age can be restricted to 0 to 150) or to a specific type, such as integers. Typical constraints include NOT NULL, PRIMARY KEY, UNIQUE and FOREIGN KEY.

Table     Attribute      Constraint
Student   st_ID          PRIMARY KEY
Student   st_name        NOT NULL
Student   st_programme   NOT NULL
Subject   su_ID          PRIMARY KEY
Subject   su_name        UNIQUE
Subject   su_credits     NOT NULL
Marks     st_ID          FOREIGN KEY
Marks     su_ID          FOREIGN KEY
Marks     ma_marks       NOT NULL
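Several of these domains can be enforced directly by the DBMS. The sketch below uses Python's sqlite3 module and writes the Subject domains as CHECK, NOT NULL and UNIQUE constraints; the subject names other than DBMS are hypothetical test values. Rows that fall outside a domain are rejected with sqlite3.IntegrityError.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Subject domains from part (a) written as column constraints (SQLite syntax).
conn.execute("""
CREATE TABLE subject (
    su_id      INTEGER PRIMARY KEY CHECK (su_id BETWEEN 1 AND 100),
    su_name    TEXT NOT NULL UNIQUE,
    su_credits INTEGER NOT NULL CHECK (su_credits BETWEEN 2 AND 8)
)""")


def try_insert(row):
    """Return True if the row satisfies every constraint, False if it is rejected."""
    try:
        with conn:
            conn.execute("INSERT INTO subject VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert((23, "DBMS", 4)),        # accepted
    try_insert((24, "NETWORKS", 12)),   # rejected: credits outside 2 to 8
    try_insert((250, "GRAPHICS", 4)),   # rejected: su_id outside 1 to 100
    try_insert((25, None, 4)),          # rejected: su_name is NOT NULL
    try_insert((26, "DBMS", 2)),        # rejected: su_name must be UNIQUE
]
print(results)
```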


c. List the Super Key and Candidate keys for each relation.

Sol:

A super key is an attribute or set of attributes used to identify the records in a relation uniquely. For example, in the relation PERSON described earlier, PERSON_ID is a super key since PERSON_ID is unique for each person. Similarly, (PERSON_ID, AGE) and (PERSON_ID, NAME) are also super keys of the relation PERSON, since their combinations are also unique for each record.

Candidate keys: Super keys of a relation can contain extra attributes. Candidate keys are minimal super keys, i.e. they contain no extraneous attribute. An attribute is called extraneous if, after removing it from the key, the remaining attributes still have the properties of a key. The following properties must be satisfied by candidate keys:
1. A candidate key must be unique.
2. A candidate key's value must exist; it cannot be null. (This is also called the entity integrity rule.)
3. A candidate key is a minimal set of attributes.
4. The value of a candidate key must be stable; its value cannot change outside the control of the system.
A relation can have more than one candidate key, and one of them can be chosen as the primary key.

Table     Super Keys                                              Candidate Keys
Student   st_ID; (st_ID, st_name); (st_ID, st_programme)          st_ID
Subject   su_ID; su_name; (su_ID, su_name); (su_ID, su_credits)   su_ID; su_name
Marks     (st_ID, su_ID); (st_ID, su_ID, ma_marks)                (st_ID, su_ID)

In Marks, neither st_ID nor su_ID alone is a super key, because a student has marks in several subjects and a subject has marks for several students; only the pair identifies a tuple.
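Super keys and candidate keys follow mechanically from the functional dependencies of a relation: a set of attributes is a super key if its closure contains every attribute, and a candidate key if, in addition, no proper subset is a super key. The Python sketch below computes attribute closures and searches for minimal super keys. The dependencies passed in are assumptions drawn from the keys and UNIQUE constraints used in this answer (su_name is treated as unique, as in part b).

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under the FDs `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # a super key, but it contains a smaller key, so not minimal
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys


# Assumed FDs for the three relations of this schema.
student_keys = candidate_keys({"st_ID", "st_name", "st_programme"},
                              [({"st_ID"}, {"st_name", "st_programme"})])
subject_keys = candidate_keys({"su_ID", "su_name", "su_credits"},
                              [({"su_ID"}, {"su_name", "su_credits"}),
                               ({"su_name"}, {"su_ID"})])
marks_keys = candidate_keys({"st_ID", "su_ID", "ma_marks"},
                            [({"st_ID", "su_ID"}, {"ma_marks"})])
print(student_keys, subject_keys, marks_keys)
```

The search tries attribute sets in increasing size and skips any set that already contains a key found earlier, so every set it keeps is minimal. It is exponential in the number of attributes, which is harmless for three-attribute relations like these.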

d. List all the Primary keys for the database.

Sol:

Table     Primary Key
Student   st_ID
Subject   su_ID
Marks     (st_ID, su_ID)

e. List all the Entity integrity constraints.

Sol:

Entity Integrity Constraint: It states that no primary key value can be null. This is because the primary key is used to identify individual tuples in the relation, so records with null values in primary key attributes could not be identified uniquely. This constraint is specified on one individual relation.

Table     Field          Integrity Constraint
Student   st_ID          NOT NULL, BETWEEN 1 AND 9999
Student   st_name        NOT NULL
Student   st_programme   NOT NULL
Subject   su_ID          NOT NULL, BETWEEN 1 AND 100
Subject   su_name        NOT NULL
Subject   su_credits     NOT NULL, BETWEEN 2 AND 8
Marks     st_ID          NOT NULL, BETWEEN 1 AND 9999
Marks     su_ID          NOT NULL, BETWEEN 1 AND 100
Marks     ma_marks       NOT NULL, BETWEEN 0 AND 100

f. List all the Referential integrity constraints.

Sol:

Referential Integrity Constraint: It states that a tuple in one relation that refers to another relation must refer to an existing tuple in that relation. This constraint is specified on two relations (not necessarily distinct). Here, every value of Marks.st_ID must match an existing Student.st_ID, and every value of Marks.su_ID must match an existing Subject.su_ID.

Table   Field   Referential Integrity Constraint
Marks   st_ID   FOREIGN KEY REFERENCES Student (st_ID)
Marks   su_ID   FOREIGN KEY REFERENCES Subject (su_ID)
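The entity and referential integrity rules in parts (e) and (f) can be checked against the instance entered in part (g). This is a plain-Python sketch, not a DBMS feature: it treats (st_ID, su_ID) as the primary key of Marks, and the two extra rows in the usage example are hypothetical violations added for illustration.

```python
# Instance from part (g): key values of Student and Subject, and the Marks rows.
students = {1, 56, 269, 99, 45}
subjects = {23, 11, 15, 73, 63}
marks = [(1, 11, 77), (99, 15, 56.64), (45, 63, 40.64), (56, 23, 60.45), (269, 23, 39.67)]


def integrity_violations(marks, students, subjects):
    """List entity and referential integrity violations found in the Marks rows."""
    problems = []
    seen = set()
    for st_id, su_id, _ in marks:
        key = (st_id, su_id)
        if st_id is None or su_id is None:
            problems.append(("entity: null key component", key))
        elif key in seen:
            problems.append(("entity: duplicate primary key", key))
        seen.add(key)
        if st_id is not None and st_id not in students:
            problems.append(("referential: unknown st_ID", key))
        if su_id is not None and su_id not in subjects:
            problems.append(("referential: unknown su_ID", key))
    return problems


print(integrity_violations(marks, students, subjects))  # []
bad = marks + [(500, 11, 45.0), (None, 23, 50.0)]  # hypothetical invalid rows
print(integrity_violations(bad, students, subjects))
```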

g. Enter at least 4-5 tuples in each database to create a valid relational instance.

Sol:

SQL> create table student(st_id number(4) primary key, st_name varchar2(32) unique not null, st_programme varchar2(16) not null);
SQL> create table subject(su_id number(3) primary key, su_name varchar2(32) not null unique, su_credits number(1) not null);
SQL> create table marks(st_id number(4) not null, su_id number(3) not null, ma_marks number(5,2) not null, constraint mapk primary key(st_id, su_id), constraint stfk foreign key(st_id) references student(st_id), constraint sufk foreign key(su_id) references subject(su_id));

/* Inserting values */
insert into student(st_id, st_name, st_programme) values(1, 'Rebecca Stevens', 'MCA');
insert into student(st_id, st_name, st_programme) values(56, 'Samit Arora', 'BCA');
insert into student(st_id, st_name, st_programme) values(269, 'Kumar Kailash', 'MEG');
insert into student(st_id, st_name, st_programme) values(99, 'Jackie Chan', 'MCA');
insert into student(st_id, st_name, st_programme) values(45, 'Wasim Anwar', 'CIC');

insert into subject(su_id, su_name, su_credits) values(23, 'DBMS', 4);
insert into subject(su_id, su_name, su_credits) values(11, 'C PROGRAMMING', 4);
insert into subject(su_id, su_name, su_credits) values(15, 'COMMUNICATION SKILLS', 2);
insert into subject(su_id, su_name, su_credits) values(73, 'THEORY OF COMPUTER', 4);
insert into subject(su_id, su_name, su_credits) values(63, 'UNIX OPERATING SYSTEM', 8);

insert into marks(st_id, su_id, ma_marks) values(1, 11, 77);
insert into marks(st_id, su_id, ma_marks) values(99, 15, 56.64);
insert into marks(st_id, su_id, ma_marks) values(45, 63, 40.64);
insert into marks(st_id, su_id, ma_marks) values(56, 23, 60.45);
insert into marks(st_id, su_id, ma_marks) values(269, 23, 39.67);

(iv)

Consider the following schema:
Customer (cu_ID, cu_name, cu_type)
Purchase (pu_ID, cu_ID, it_ID, pu_dateofsale, pu_quantity)
Item (it_ID, it_name, it_supplier, it_quantity)
Write about 10 queries covering all the relational algebraic operations (UNION, INTERSECTION, SET DIFFERENCE, CARTESIAN PRODUCT, SELECTION, PROJECTION, JOIN and DIVISION) in at least one of the queries. Please note that you must first write the query in English and then represent it using relational algebra. You may use more than one operator in a query.

Sol:

UNION Operation: If R1 and R2 are two union-compatible relations, then $R_3 = R_1 \cup R_2$ is the relation containing the tuples that are in R1, in R2, or in both. In other words, $R_3 = \{t \mid t \in R_1 \lor t \in R_2\}$.

Query: Find the IDs of all customers who are registered or have made a purchase.
Relational algebra: $\pi_{cu\_ID}(Customer) \cup \pi_{cu\_ID}(Purchase)$

SQL> select cu_id from customer union select cu_id from purchase;

R1 = $\pi_{cu\_ID}(Customer)$: CU_ID 1, 2, 7
R2: the cu_ID column of the ten Purchase tuples is 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, so $\pi_{cu\_ID}(Purchase) = \{1, 2\}$
$R_3 = R_1 \cup R_2$: CU_ID 1, 2, 7

UNION removes duplicate rows, which matches the set semantics of relational algebra; UNION ALL would instead return all 13 values.

INTERSECTION: If R1 and R2 are two union-compatible relations, then $R_3 = R_1 \cap R_2$ is the relation that includes all tuples that are in both relations. In other words, $R_3 = \{t \mid t \in R_1 \land t \in R_2\}$.

Query: Find the IDs of customers who have made at least one purchase.
Relational algebra: $\pi_{cu\_ID}(Customer) \cap \pi_{cu\_ID}(Purchase)$

SQL> select cu_id from customer intersect select cu_id from purchase;

R1: CU_ID 1, 2, 7
R2: CU_ID 1, 1, 2, 1, 1, 1, 1, 2, 1, 1
$R_3 = R_1 \cap R_2$: CU_ID 1, 2

SET DIFFERENCE: If R1 and R2 are two union-compatible relations, then $R_3 = R_1 - R_2$ is the relation that includes only those tuples that are in R1 but not in R2. In other words, $R_3 = \{t \mid t \in R_1 \land t \notin R_2\}$.

Query: Find the IDs of customers who have never made a purchase.
Relational algebra: $\pi_{cu\_ID}(Customer) - \pi_{cu\_ID}(Purchase)$

SQL> select cu_id from customer minus select cu_id from purchase;

R1: CU_ID 1, 2, 7
R2: CU_ID 1, 1, 2, 1, 1, 1, 1, 2, 1, 1
$R_3 = R_1 - R_2$: CU_ID 7
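Because relations are sets, the three set operations above map directly onto Python's set operators. A minimal sketch using the cu_ID values from the tables in this answer; the list-concatenation line is included only to contrast set union with SQL's UNION ALL.

```python
# cu_ID values taken from the Customer and Purchase tables used in this answer.
customer_ids = [1, 2, 7]                        # pi_cu_ID(Customer)
purchase_ids = [1, 1, 2, 1, 1, 1, 1, 2, 1, 1]   # cu_ID column of the ten Purchase rows

R1, R2 = set(customer_ids), set(purchase_ids)   # relations are sets: duplicates vanish

union = R1 | R2                                 # UNION: {1, 2, 7}
intersection = R1 & R2                          # INTERSECTION: {1, 2}
difference = R1 - R2                            # SET DIFFERENCE: {7}
bag_union = customer_ids + purchase_ids         # like SQL's UNION ALL: 13 values

print(sorted(union), sorted(intersection), sorted(difference), len(bag_union))
```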

CARTESIAN PRODUCT: If R1 and R2 are two relations, then $R_3 = R_1 \times R_2$ combines every tuple of R1 with every tuple of R2. The product is commutative and associative, and $\text{Degree}(R_3) = \text{Degree}(R_1) + \text{Degree}(R_2)$. In other words, $R_3 = \{t_1 \,\|\, t_2 \mid t_1 \in R_1 \land t_2 \in R_2\}$.

Query: Combine every customer with every purchase and every item.
Relational algebra: $Customer \times Purchase \times Item$

SQL> select * from customer, purchase, item; /* produces 3 x 10 x 3 = 90 rows */

Restricting the product with a selection keeps only the matching combinations:

SQL> select * from customer, purchase, item where customer.cu_id = purchase.cu_id and purchase.it_id = item.it_id; /* produces 10 rows */

SELECTION: The select operation is used to select specific records from the database based on some criteria. This is a unary operation, mathematically denoted as $\sigma$.

Syntax: $\sigma_{\langle\text{selection condition}\rangle}(\text{Relation})$. The Boolean expression specified in <selection condition> is made up of a number of clauses of the form <attribute name> <comparison operator> <constant value> or <attribute name> <comparison operator> <attribute name>. Comparison operators in the set $\{=, <, \le, >, \ge, \ne\}$ apply to attributes whose domains are ordered values, such as integers.

Query: Find the details of the customer whose ID is 1.
Relational algebra: $\sigma_{cu\_ID = 1}(Customer)$

SQL> select * from customer where cu_id = 1;

CU_ID  CU_NAME       CU_TYPE   CU_CREDIT_LIMIT
1      Andy Robbins  Reseller  10000
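The row counts quoted for the Cartesian product can be checked with a small Python sketch. It keeps only the columns needed (cu_ID for Customer; pu_ID, cu_ID and it_ID for Purchase; it_ID and it_name for Item) and uses lists rather than sets so that duplicate rows are kept, as in the SQL output. The select helper is a hypothetical name for the $\sigma$ operator.

```python
from itertools import product

# Column subsets of the three relations, taken from the tables in this answer.
customer = [(1,), (2,), (7,)]                                             # (cu_ID,)
item = [(1, "Complan"), (2, "Cricket Bat"), (3, "Logitech Optical Mouse")]  # (it_ID, it_name)
purchase = [                                                              # (pu_ID, cu_ID, it_ID)
    ("p001", 1, 2), ("p002", 1, 3), ("p004", 2, 1), ("p003", 1, 1), ("p003", 1, 3),
    ("p003", 1, 2), ("p003", 1, 1), ("p005", 2, 2), ("p006", 1, 2), ("p006", 1, 3),
]


def select(relation, predicate):
    """sigma_predicate(relation): keep the tuples for which predicate(t) is true."""
    return [t for t in relation if predicate(t)]


# Customer x Purchase x Item; each tuple is (cu_ID, pu_ID, cu_ID, it_ID, it_ID, it_name)
cross = [c + p + i for c, p, i in product(customer, purchase, item)]
# sigma_{Customer.cu_ID = Purchase.cu_ID AND Purchase.it_ID = Item.it_ID}(cross)
matched = select(cross, lambda t: t[0] == t[2] and t[3] == t[4])
# sigma_{cu_ID = 1}(Customer)
customer_1 = select(customer, lambda t: t[0] == 1)

print(len(cross), len(matched), customer_1)  # 90 10 [(1,)]
```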

PROJECTION: A table that is built from columns of one or more tables is called a projection table. The project operation selects the specified attributes (columns) of a relation and discards the others. It is denoted as $\pi_{\langle\text{attribute list}\rangle}(\text{Relation})$.

Query: For each purchase, list the customer's ID and name together with the purchase ID, item ID, date of sale and quantity.
Relational algebra: $\pi_{cu\_ID,\, cu\_name,\, pu\_ID,\, it\_ID,\, pu\_dateofsale,\, pu\_quantity}(Customer \bowtie_{Customer.cu\_ID = Purchase.cu\_ID} Purchase)$

SQL> select customer.cu_id, customer.cu_name, purchase.pu_id, purchase.it_id, purchase.pu_dateofsale, purchase.pu_quantity from customer, purchase where customer.cu_id = purchase.cu_id;

CU_ID  CU_NAME       PU_ID  IT_ID  PU_DATEOFSALE  PU_QUANTITY
1      Andy Robbins  p001   2      22-MAR-11      2
1      Andy Robbins  p002   3      11-SEP-11      5
2      Rajesh Roy    p003   1      20-SEP-11      10

JOIN: The JOIN operation is applied to two relations when we want to select related tuples from them. It is denoted as $\bowtie$. The join operation requires that both joined relations have at least one domain-compatible attribute. Syntax: $R_1 \bowtie_{\langle\text{join condition}\rangle} R_2$ combines related tuples from the two relations R1 and R2 into single tuples, where <join condition> is of the form <condition> AND <condition> AND ... AND <condition>. Degree of the relation: $\text{Degree}(R_1 \bowtie_{\langle\text{join condition}\rangle} R_2) \le \text{Degree}(R_1) + \text{Degree}(R_2)$.

There are three types of joins:
a) Theta join: each condition is of the form $A \,\theta\, B$, where A is an attribute of R1, B is an attribute of R2, A and B have the same domain, and $\theta$ is one of the comparison operators $\{=, <, \le, >, \ge, \ne\}$.
b) Equijoin: every condition uses equality (=) only.
c) Natural join (denoted by R * S): the join attributes have the same name in both relations (such an attribute is called the join attribute), and only one of the two attributes is retained in the join relation. The join condition in such a case is = on the join attribute, and the condition is not shown in the natural join.

Query: For each purchase, list the item's ID, name and cost per unit along with the purchase ID.
Relational algebra: $\pi_{it\_ID,\, it\_name,\, it\_costperunit,\, pu\_ID}(Item * Purchase)$

SQL> select item.it_id, item.it_name, item.it_costperunit, purchase.pu_id from item, purchase where purchase.it_id = item.it_id;

Item Table
IT_ID  IT_NAME                 IT_COSTPERUNIT  IT_TYPE
1      Complan                 150             Health Drink
2      Cricket Bat             990             Sports Accessory
3      Logitech Optical Mouse  590             Usb Mouse

Purchase Table
CU_ID  IT_ID  PU_DATEOFSALE  PU_QUANTITY  PU_ID
1      2      22-MAR-11      2            p001
1      3      11-SEP-11      5            p002
2      1      20-SEP-11      10           p004
1      1      19-SEP-11      10           p003
1      3      19-SEP-11      10           p003
1      2      19-SEP-11      10           p003
1      1      19-SEP-11      10           p003
2      2      01-APR-11      32           p005
1      2      30-AUG-11      32           p006
1      3      30-AUG-11      20           p006

Item * Purchase
IT_ID  IT_NAME                 IT_COSTPERUNIT  PU_ID
2      Cricket Bat             990             p001
3      Logitech Optical Mouse  590             p002
1      Complan                 150             p004
1      Complan                 150             p003
3      Logitech Optical Mouse  590             p003
2      Cricket Bat             990             p003
1      Complan                 150             p003
2      Cricket Bat             990             p005
2      Cricket Bat             990             p006
3      Logitech Optical Mouse  590             p006
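The natural join Item * Purchase followed by a projection can be sketched in plain Python with relations as lists of dictionaries. The natural_join and project helpers are illustrative names, not a library API, and only the pu_ID and it_ID columns of Purchase are loaded. As in the SQL result above, duplicates are kept, whereas a strict relational-algebra projection would remove them.

```python
def natural_join(r, s):
    """R * S: pair tuples that agree on every attribute name the relations share."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**a, **b} for a in r for b in s if all(a[k] == b[k] for k in common)]


def project(relation, attrs):
    """pi_attrs(R), keeping duplicates as the SQL output above does."""
    return [tuple(t[a] for a in attrs) for t in relation]


item = [
    {"it_ID": 1, "it_name": "Complan", "it_costperunit": 150},
    {"it_ID": 2, "it_name": "Cricket Bat", "it_costperunit": 990},
    {"it_ID": 3, "it_name": "Logitech Optical Mouse", "it_costperunit": 590},
]
# pu_ID and it_ID of the ten Purchase rows, in table order.
purchase = [{"pu_ID": p, "it_ID": i} for p, i in [
    ("p001", 2), ("p002", 3), ("p004", 1), ("p003", 1), ("p003", 3),
    ("p003", 2), ("p003", 1), ("p005", 2), ("p006", 2), ("p006", 3)]]

joined = natural_join(purchase, item)  # iterate Purchase first to keep its row order
result = project(joined, ["it_ID", "it_name", "it_costperunit", "pu_ID"])
for row in result:
    print(row)
```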

DIVISION: To perform the division operation $R_1 \div R_2$, the attributes of R2 should be a proper subset of the attributes of R1. In the following example, R1 contains attributes A and B and R2 contains only attribute B, so the attributes of R2 form a proper subset of those of R1. If we perform $R_1 \div R_2$, the resultant relation will contain those values of A from R1 that are related to all values of B present in R2. The division operator is useful for expressing certain kinds of queries, for example: "Find the names of sailors who have reserved all boats."

Query: Find the IDs of customers who have purchased every item.
Relational algebra: $\pi_{cu\_ID,\, it\_ID}(Purchase) \div \pi_{it\_ID}(Item)$

Table A: $\pi_{cu\_ID,\, it\_ID}(Purchase)$
CU_ID  IT_ID
1      1
1      2
1      3
2      1
2      2

Table B: $\pi_{it\_ID}(Item)$
IT_ID
1
2
3

Table C: A ÷ B
CU_ID
1

Customer 1 is paired with all three items, whereas customer 2 never purchased item 3. SQL has no division operator, so the query can be written with a double NOT EXISTS:

SQL> select c.cu_id from customer c where not exists (select i.it_id from item i where not exists (select * from purchase p where p.cu_id = c.cu_id and p.it_id = i.it_id));
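Division can be expressed in Python by keeping each A-value whose B-values cover the whole divisor. A minimal sketch using the pairs from the Purchase table and the item IDs from the Item table; the divide helper is a hypothetical name.

```python
def divide(r, s):
    """r(A, B) / s(B): the A-values that appear in r with every B-value of s."""
    a_values = {a for a, _ in r}
    return {a for a in a_values if all((a, b) in r for b in s)}


# Table A: pi_{cu_ID, it_ID}(Purchase); the set literal drops the repeated pairs.
A = {(1, 2), (1, 3), (2, 1), (1, 1), (1, 3), (1, 2), (1, 1), (2, 2), (1, 2), (1, 3)}
# Table B: pi_{it_ID}(Item)
B = {1, 2, 3}

print(divide(A, B))  # {1}
```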


(v)

A University maintains the list of its programmes and students. A programme consists of a number of courses. Each programme may have compulsory courses and elective courses. All the courses of the University have 4 credits. A student is expected to take four courses in each semester. Duration of different programmes may vary from 2 semesters to 8 semesters. A course is taught by one teacher in a semester. Teachers have expertise in few areas and normally teach courses in that area. Identify the entities for the University as above. List all the attributes for all the entities. Identify all the relationships among entities. Draw the E-R diagram for the University. You should identify the keys, relationship cardinality etc. Make and state suitable assumptions.

[E-R diagram for the University. Entities and attributes: Student (RollNo., Name, Address), Program (Pname, Duration, Fees), Course (C_Code, Cname, Credit, Ctype, Semester), Guardian (Name, Address, Relation), Department (D_no, D_name) and Faculty (ID, Name, Address, Basic Salary, Expertise). Relationships: Enroll, Taught by, Works in and Headof (with attribute Date from).]

Question 2: 20 Marks

(i) What are different referential actions that may be required in order to maintain referential integrity constraints for the schema given in problem 1 (iii) when database modifications are being performed?

Sol:

Referential Integrity: It can be simply defined as follows: the database must not contain any unmatched foreign key values. The term unmatched foreign key value means a foreign key value for which there is no matching value of the relevant candidate key in the relevant target (referenced) relation. For example, any value existing in the EMPID attribute of the ASSIGNMENT relation must exist in the EMPLOYEE relation. That is, the only EMPIDs that can appear in the ASSIGNMENT relation are those in the EMPLOYEE relation, namely 101, 102 and 103, for the present state/instance of the database given in Figure 2. If we want to add a tuple with EMPID value 104 to the ASSIGNMENT relation, it will cause a violation of the referential integrity constraint. Logically this is obvious: employee 104 does not exist, so how can s/he be assigned any work? Database modifications can cause violations of referential integrity. We list here the tests we must make for each type of database modification to preserve the referential integrity constraint:

Delete: During the deletion of a tuple, two cases can occur:

Deletion of a tuple in the relation having the foreign key: In such a case, simply delete the desired tuple. For example, in the ASSIGNMENT relation we can easily delete the first tuple.

Deletion of the target of a foreign key reference: For example, consider an attempt to delete the employee tuple in the EMPLOYEE relation whose EMPID is 101. This employee appears not only in EMPLOYEE but also in the ASSIGNMENT relation. Can this tuple be deleted? If we delete the tuple in the EMPLOYEE relation, two unmatched tuples are left in the ASSIGNMENT relation, causing a violation of the referential integrity constraint. Thus, the following two choices exist for such a deletion:

RESTRICT: The delete operation is restricted to the case where there are no matching tuples. For example, we can delete the EMPLOYEE record of EMPID 103, as there is no matching tuple in ASSIGNMENT, but not the record of EMPID 101.

CASCADE: The delete operation cascades to delete the matching tuples as well. For example, if the delete mode is CASCADE, deleting the employee having EMPID 101 from the EMPLOYEE relation will also delete 2 more tuples from the ASSIGNMENT relation.

Insert: The insertion of a tuple in the target of a reference does not cause any violation. However, when a tuple is inserted into the relation that has the foreign key, for example the ASSIGNMENT relation, it must be ensured that the matching target candidate key exists; otherwise the insert operation can be rejected. For example, one of the possible ASSIGNMENT insert operations would be (103, LG, 3000).

Modify: A modify or update operation changes existing values. If it changes the value of the foreign key, the only check required is the same as that for the Insert operation.

What should happen to an attempt to update a candidate key that is the target of a foreign key reference? For example, consider an attempt to update the PROJID LG, for which there exists at least one matching ASSIGNMENT tuple. In general, there are the same possibilities as for the DELETE operation:
RESTRICT: The update operation is restricted to the case where there are no matching ASSIGNMENT tuples (it is rejected otherwise).
CASCADE: The update operation cascades to update the foreign key in the matching ASSIGNMENT tuples as well.
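For the schema of problem 1 (iii), these referential actions can be declared on the Marks foreign keys. The sketch below uses Python's sqlite3 module (SQLite checks foreign keys only after PRAGMA foreign_keys = ON) and loads only a few of the part (g) rows. Choosing RESTRICT for student deletions and CASCADE for subject deletions and key updates is an assumption made to show both actions, not a requirement of the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE student (st_id INTEGER PRIMARY KEY, st_name TEXT NOT NULL);
CREATE TABLE subject (su_id INTEGER PRIMARY KEY, su_name TEXT NOT NULL);
CREATE TABLE marks (
    st_id    INTEGER NOT NULL REFERENCES student (st_id)
                 ON DELETE RESTRICT ON UPDATE CASCADE,
    su_id    INTEGER NOT NULL REFERENCES subject (su_id)
                 ON DELETE CASCADE ON UPDATE CASCADE,
    ma_marks REAL NOT NULL,
    PRIMARY KEY (st_id, su_id)
);
INSERT INTO student VALUES (1, 'Rebecca Stevens'), (99, 'Jackie Chan');
INSERT INTO subject VALUES (11, 'C PROGRAMMING'), (15, 'COMMUNICATION SKILLS');
INSERT INTO marks VALUES (1, 11, 77), (99, 15, 56.64);
""")


def rejected(sql):
    """Run one statement; return True if SQLite refuses it as an integrity violation."""
    try:
        with conn:
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


insert_orphan = rejected("INSERT INTO marks VALUES (500, 11, 45)")  # no student 500
delete_student = rejected("DELETE FROM student WHERE st_id = 1")     # RESTRICT
rejected("DELETE FROM subject WHERE su_id = 15")                     # CASCADE removes (99, 15)
rejected("UPDATE student SET st_id = 2 WHERE st_id = 1")             # CASCADE renumbers (1, 11)
rows = conn.execute("SELECT st_id, su_id FROM marks").fetchall()
print(insert_orphan, delete_student, rows)
```

Oracle, used for the SQL elsewhere in this answer, supports ON DELETE CASCADE and ON DELETE SET NULL on foreign keys but has no ON UPDATE clause, so there the update rules would have to be enforced another way, for example with triggers.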

Entity Integrity: The Entity Integrity Rule states that no component of the primary key of a relation is allowed to accept NULL. In other words, the definition of every attribute involved in the primary key of any basic relation must explicitly or implicitly include the specification NULL NOT ALLOWED.
Foreign Keys and NULL: Let us consider the relations:

DEPT
DEPTID  DNAME        BUDGET
D1      Marketing    10M
D2      Development  12M
D3      Research     5M

EMP
EMPID  ENAME     DEPTID  SALARY
E1     Rahul     D1      40K
E2     Aparna    D1      42K
E3     Ankit     D2      30K
E4     Sangeeta  NULL    35K

Suppose that Sangeeta is not assigned any Department. In the EMP tuple corresponding to Sangeeta, therefore, there is no genuine department number that can serve as the appropriate value for the DEPTID foreign key. Thus, one cannot determine DNAME and BUDGET for Sangeetas department as those values are NULL. This may be a real situation where the person has newly joined and is undergoing training and will be allocated to a department only on completion of the training. Thus, NULL in foreign key values may not be a logical error. So, the foreign key definition may be redefined to include NULL as an acceptable value in the foreign key for which there is no need to find a matching tuple. Are there any other constraints that may be applicable on the attribute values of the entities? Yes, these constraints are basically related to the domain and termed as the domain constraints. Domain Constraints Domain constraints are primarily created for defining the logically correct values for an attribute of a relation. The relation allows attributes of a relation to be confined to a range of values, for example, values of an attribute age can be restricted as Zero to 150 or a specific type such as integers, etc


(ii)

Consider the following relation for a Bank: Customer_Record (Account Number, Holder Name, date of birth, age, address, Account Type, balance in account, Loan Amount, EMI of Loan, start date of loan, end date of loan). An account holder can open only one account in the Bank. However, an account may be a joint account. An account holder may take more than one loan from the bank. Identify the functional dependencies in the relation given above. Normalise the relation up to BCNF. Make suitable assumptions, if any.

Sol:

The given table is in First Normal Form.

Customer Record (containing redundancy):

Account  Holder    Date of     Age  Address  Account  Balance     Loan      EMI of   Start date  End date
Number   Name      birth                    Type     in account  Amount    Loan     of loan     of loan
111      Abhishek  01-01-1980  31   Delhi    Savings  50,000      10,000    1,000    01-01-11    01-01-12
111      Abhishek  01-01-1980  31   Delhi    Savings  50,000      20,000    2,000    01-06-11    01-06-12
111      Abhishek  01-01-1980  31   Delhi    Savings  50,000      30,000    3,000    01-07-11    01-07-12
222      Rajeev    25-06-1992  19   Mumbai   Current  1,00,000    50,000    5,000    25-06-11    25-06-12
333      Abhishek  01-01-1980  31   Delhi    Joint    5,00,000    1,00,000  10,000   02-09-11    02-09-12
333      Rajeev    25-06-1992  19   Mumbai   Joint    5,00,000    1,00,000  10,000   02-09-11    02-09-12

Normalisation Process

Second Normal Form: Eliminating redundant data

In the above table, each account number has several values of loan amount, EMI, start date and end date. The loan attributes are not functionally dependent on the Account Number (primary key), so the relation is not in second normal form. The customer details are therefore separated out:

Customer:
Customer ID  Holder Name  Date of birth  Age  Address
1            Abhishek     01-01-1980     31   Delhi
2            Rajeev       25-06-1992     19   Mumbai


Accounts:
Account  Customer  Type     Balance   Loan      EMI of  Start date  End date
Number   ID                           Amount    Loan    of loan     of loan
111      1         Savings  50,000    10,000    1,000   01-01-11    01-01-12
111      1         Savings  50,000    20,000    2,000   01-06-11    01-06-12
111      1         Savings  50,000    30,000    3,000   01-07-11    01-07-12
222      2         Current  1,00,000  50,000    5,000   25-06-11    25-06-12
333      1         Joint    5,00,000  1,00,000  10,000  02-09-11    02-09-12
333      2         Joint    5,00,000  1,00,000  10,000  02-09-11    02-09-12

Third Normal Form: Eliminate data not dependent on the key

In the Accounts table, the loan amount, EMI, start date and end date are functionally dependent on the account number attribute, but they do not depend on the customer ID. The solution is to move these attributes out of the Accounts table into a Loans table, as shown below:

Customer:
Customer ID  Holder Name  Date of birth  Age  Address
1            Abhishek     01-01-1980     31   Delhi
2            Rajeev       25-06-1992     19   Mumbai

Accounts:
Account Number  Customer ID  Type     Balance
111             1            Savings  50,000
222             2            Current  1,00,000
333             1            Joint    5,00,000
333             2            Joint    5,00,000


Loans:
Account Number  Loan Amount  EMI of Loan  Start date of loan  End date of loan
111             10,000       1,000        01-01-11            01-01-12
111             20,000       2,000        01-06-11            01-06-12
111             30,000       3,000        01-07-11            01-07-12
222             50,000       5,000        25-06-11            25-06-12
333             1,00,000     10,000       02-09-11            02-09-12
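The decomposition above is a set of projections of the original table, so it can be checked mechanically. The sketch below projects the six rows of the first table onto Customer, Accounts and Loans, then joins them back together. It assumes, as the solution does, that the holder name identifies the customer (the CUSTOMER_ID mapping). The join recovering the original rows holds for this sample data; it is not a general proof of a lossless join.

```python
# Rows of the 1NF Customer_Record table:
# (account, holder, dob, age, address, acc_type, balance, loan, emi, start, end)
RECORDS = [
    (111, "Abhishek", "01-01-1980", 31, "Delhi",  "Savings", 50000,  10000,  1000,  "01-01-11", "01-01-12"),
    (111, "Abhishek", "01-01-1980", 31, "Delhi",  "Savings", 50000,  20000,  2000,  "01-06-11", "01-06-12"),
    (111, "Abhishek", "01-01-1980", 31, "Delhi",  "Savings", 50000,  30000,  3000,  "01-07-11", "01-07-12"),
    (222, "Rajeev",   "25-06-1992", 19, "Mumbai", "Current", 100000, 50000,  5000,  "25-06-11", "25-06-12"),
    (333, "Abhishek", "01-01-1980", 31, "Delhi",  "Joint",   500000, 100000, 10000, "02-09-11", "02-09-12"),
    (333, "Rajeev",   "25-06-1992", 19, "Mumbai", "Joint",   500000, 100000, 10000, "02-09-11", "02-09-12"),
]
CUSTOMER_ID = {"Abhishek": 1, "Rajeev": 2}   # assumption: the name identifies the holder

# Projections (sets drop the duplicate rows that caused the redundancy).
customers = {(CUSTOMER_ID[r[1]],) + r[1:5] for r in RECORDS}
accounts = {(r[0], CUSTOMER_ID[r[1]], r[5], r[6]) for r in RECORDS}
loans = {(r[0],) + r[7:] for r in RECORDS}

# Natural join of the three tables, rebuilt in the original column order.
rejoined = {
    (acc, name, dob, age, addr, acc_type, bal) + loan[1:]
    for (acc, cid, acc_type, bal) in accounts
    for (cid2, name, dob, age, addr) in customers if cid2 == cid
    for loan in loans if loan[0] == acc
}
print(len(RECORDS), len(customers), len(accounts), len(loans))   # 6 2 4 5
print(rejoined == set(RECORDS))                                  # True
```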

Boyce-Codd Normal Form (BCNF): A relation is in BCNF if it is in 3NF and every determinant is a candidate key. A determinant is the left side of a functional dependency. Most relations that are in 3NF are also in BCNF. A 3NF relation is not in BCNF if all the following conditions apply:
o The candidate keys in the relation are composite keys.
o There is more than one candidate key in the relation, and the keys overlap: some attributes are shared between keys and some are not.
o There is a functional dependency from the non-overlapping attribute(s) of one candidate key to the non-overlapping attribute(s) of another candidate key.

Thus the following tables are in BCNF:
Customer (Customer ID, Holder Name, date of birth, age, address)
Accounts (Account Number, Customer ID, Type, Balance)
Loans (Account Number, Loan Amount, EMI of Loan, start date of loan, end date of loan)
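The BCNF test described above ("every determinant is a candidate key") can be automated with attribute closures. The sketch below is a minimal illustration. It does not use the bank relation; it uses the classic hypothetical Student/Course/Teacher relation, which meets all three conditions listed above. Here a student takes each course from one teacher, and each teacher teaches exactly one course.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            c = set(combo)
            if closure(c, fds) == schema and not any(k <= c for k in keys):
                keys.append(c)
    return keys

def bcnf_violations(schema, fds):
    """Non-trivial FDs whose determinant (left side) is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != schema]

# Hypothetical example relation and its functional dependencies.
R = {"Student", "Course", "Teacher"}
FDS = [({"Student", "Course"}, {"Teacher"}),
       ({"Teacher"}, {"Course"})]

print([sorted(k) for k in candidate_keys(R, FDS)])   # [['Course', 'Student'], ['Student', 'Teacher']]
print(bcnf_violations(R, FDS))                        # [({'Teacher'}, {'Course'})]
```

The relation is in 3NF (Course is a prime attribute) but not in BCNF, because the determinant Teacher is not a candidate key. Decomposing it into (Teacher, Course) and (Student, Teacher) removes the violation.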


(iii)

Compare and contrast the following file organisations:

a. Heap files versus sequential file organisation

Ans:

Heap Files (Unordered Files)

Heap files are unordered files. This is the simplest and most basic type of file organisation. These files consist of randomly ordered records, so the records have no particular order. The operations we can perform on the records are insert, retrieve and delete. The features of the heap file (or pile file) organisation are:
o New records can be inserted in any empty space that can accommodate them.
o When old records are deleted, the space they occupied becomes empty and is available for any new insertion.
o If updated records grow, they may need to be relocated (moved) to a new empty space. This requires keeping a list of empty spaces.
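The heap-file behaviour listed above can be sketched in a few lines. This is a simplified in-memory model, not a real disk layout. Records are assumed to be (key, data) tuples, and the sample records are hypothetical. The free list plays the role of the "list of empty space".

```python
class HeapFile:
    """Unordered (heap/pile) file: a list of record slots plus a free-space list."""

    def __init__(self):
        self.slots = []   # record slots; None marks an empty slot
        self.free = []    # indexes of empty slots available for reuse

    def insert(self, record):
        """Place the record in any empty slot, else append it at the end."""
        if self.free:
            slot = self.free.pop()
            self.slots[slot] = record
        else:
            slot = len(self.slots)
            self.slots.append(record)
        return slot

    def delete(self, key):
        """Linear search for the key, then free its slot for later inserts."""
        for i, rec in enumerate(self.slots):
            if rec is not None and rec[0] == key:
                self.slots[i] = None
                self.free.append(i)
                return True
        return False

    def find(self, key):
        """Retrieval has no order to exploit, so it scans every slot."""
        for rec in self.slots:
            if rec is not None and rec[0] == key:
                return rec
        return None

heap = HeapFile()
for rec in [(800, "A"), (500, "B"), (900, "C")]:
    heap.insert(rec)
heap.delete(500)                 # slot 1 becomes free
print(heap.insert((600, "E")))   # 1: the freed slot is reused
print(heap.find(900))            # (900, 'C')
```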

Advantages of heap files
1. This is a simple file organisation method.
2. Insertion is reasonably efficient.
3. It is good for bulk-loading data into a table.
4. It works best if file scans are common or insertions are frequent.

Disadvantages of heap files
1. Retrieval requires a linear search and is inefficient.
2. Deletion can leave unused space, which may require reorganisation.

Sequential File Organisation

The most basic way to organise a collection of records in a file is sequential organisation. The records of the file are stored in sequence by the values of the primary key field. They are accessible only in the order stored, i.e., in primary key order. This kind of file organisation works well for tasks that need to access nearly every record in a file, e.g., payroll. Let us see its advantages and disadvantages. In a sequentially organised file, records are written consecutively when the file is created and must be accessed consecutively when the file is later used for input. A sequential file maintains the records in the logical sequence of its primary key values. Sequential files are inefficient for random access but are suitable for sequential access. A sequential file can be stored on devices such as magnetic tape that allow only sequential access. On average, searching for a record in a sequential file requires looking at half of the records in the file. However, if a sequential file is stored on a disk (remember that disks support direct access to their blocks) with the keys stored separately from the rest of the record, then only the disk blocks that contain the desired record or records need to be read. This type of storage allows binary search on the blocks of a sequential file, which speeds up access. Updating a sequential file usually creates a new file so that the record sequence on the primary key is maintained. The update operation first copies into the new file all the records up to the point where the update is required, then writes the updated record, followed by the remaining records. This method of updating a sequential file automatically creates a backup copy.
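The gain from binary search on disk blocks, as opposed to tape-style sequential reading, can be made concrete by counting block reads. The sketch below makes simplifying assumptions: each block holds only sorted keys (the "keys stored separately" case), and the 5,000-record, 100-block file is hypothetical.

```python
def make_blocks(sorted_keys, block_size):
    """Lay out sorted keys in fixed-size disk blocks."""
    return [sorted_keys[i:i + block_size]
            for i in range(0, len(sorted_keys), block_size)]

def sequential_reads(blocks, key):
    """Tape-style access: read blocks front to back until the key's block turns up."""
    for reads, block in enumerate(blocks, start=1):
        if block[0] <= key <= block[-1]:
            return reads
    return len(blocks)

def binary_reads(blocks, key):
    """Disk access: binary search over blocks by their key range, counting reads."""
    lo, hi, reads = 0, len(blocks) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        reads += 1                      # one block transfer
        if key < blocks[mid][0]:
            hi = mid - 1
        elif key > blocks[mid][-1]:
            lo = mid + 1
        else:
            break                       # the key, if present, is in this block
    return reads

blocks = make_blocks(list(range(0, 10000, 2)), 50)    # 5,000 records in 100 blocks
print(sequential_reads(blocks, 9998), binary_reads(blocks, 9998))   # 100 7
```

For this file, the binary search never needs more than about log2(100), i.e. 7, block reads. A sequential scan may need all 100.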


Additions to a sequential file are handled in a similar manner to updates. Adding a record requires shifting all records from the point of insertion to the end of the file to create space for the new record. Deletion of a record, on the other hand, requires compression of the file space. The basic advantage of a sequential file is sequential processing, as the next record is easily accessible without any additional data structure. However, simple queries are time consuming for large files. A single update is expensive because a new file must be created. To reduce the cost per update, all update requests are therefore sorted in the order of the sequential file, and this update file is then used to update the sequential file in a single pass. The file containing the updates is sometimes called a transaction file, and this process is called the batch mode of updating. In this mode, each record of the master sequential file is checked for one or more possible updates by comparing it with the update information in the transaction file. The records are written to the new master file in sequential order. A record that requires multiple updates is written only after all of its updates have been applied. A record that is to be deleted is not written to the new master file. In this way, a new updated master file is created from the transaction file and the old master file.

Thus, update, insertion and deletion of records in a sequential file all require creating a new file. Can we avoid creating this new file? Yes, easily, if the original sequential file is created with holes, which are empty record spaces. Reorganisation can then be restricted to a single block, which can be done very easily in main memory. Holes therefore improve the performance of insertion and deletion in a sequential file. This organisation also supports the concept of an overflow area, which can store spilled-over records when a block is full.
This technique is also used in index sequential file organisation. A detailed discussion on it can be found in the further readings.
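The batch mode of updating described above is a merge of two sorted files. The sketch below is a minimal in-memory version. It assumes the master file holds (key, value) records and the transaction file holds (key, op, value) entries, with op one of "add", "update" or "delete". These layouts and the sample data are hypothetical. As a simplification, an update for a missing key is treated like an add.

```python
def batch_update(master, transactions):
    """Create a new master file from a sorted old master and a sorted transaction file.

    Each input is read once, front to back, as in batch mode.
    """
    new_master, m, t = [], 0, 0
    while m < len(master) or t < len(transactions):
        # The next key to process is the smaller of the two current keys.
        if t == len(transactions) or (m < len(master) and master[m][0] <= transactions[t][0]):
            key, value = master[m]
            m += 1
        else:
            key, value = transactions[t][0], None   # key only in transactions: a new record
        # Apply every transaction for this key before writing the record out.
        while t < len(transactions) and transactions[t][0] == key:
            _, op, new_value = transactions[t]
            value = None if op == "delete" else new_value
            t += 1
        if value is not None:                      # deleted records are not written
            new_master.append((key, value))
    return new_master

old_master = [(500, "B"), (600, "E"), (700, "D"), (800, "A")]
txn_file = [(550, "add", "F"), (600, "update", "E2"), (700, "delete", None),
            (800, "update", "A1"), (800, "update", "A2")]
print(batch_update(old_master, txn_file))
# [(500, 'B'), (550, 'F'), (600, 'E2'), (800, 'A2')]
```

The old master list is left untouched, which mirrors the point that batch updating leaves the previous master file available as a backup.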

Advantages of Sequential File Organisation
o It is fast and efficient when dealing with large volumes of data that need to be processed periodically (batch systems).

Disadvantages of Sequential File Organisation
o It requires all new transactions to be sorted into the proper sequence for sequential access processing.
o Locating, storing, modifying, deleting, or adding records in the file requires rearranging the file.
o This method is too slow to handle applications requiring immediate updating or responses.


b. B-Tree indexes versus BST indexes

Ans:

BST (Binary Search Tree): A BST is a data structure with the property that all keys to the left of a node are smaller than the key value of the node, and all keys to the right are larger. To search for a key value, you start from the root and move left or right depending on the value of the key being searched for. Since an index is a <value, address> pair, a BST index uses the value as the key, and the address field must also be stored so that the records can be located in the file on secondary storage. The following figure demonstrates the use of a BST index for a University, where a dense index exists on the enrolment number field. A record consists of the key value and other information fields. However, we don't store these information fields in the binary search tree, as that would make the tree very large. To speed up searches and reduce the tree size, the information fields of records are therefore commonly stored in files on secondary storage devices. The connection between a key value in the BST and its corresponding record in the file is established with the help of a pointer, as shown in Figure 11. Please note that the BST structure is a <key value, address> pair.

A BST is a very suitable data structure for an index if the index can be held completely in primary memory. However, indexes are usually large and require a combination of primary and secondary storage. A BST might be stored level by level on secondary storage. This adds the problem of finding the correct sub-tree, and it may require a number of transfers: in the worst case, one block transfer for each level of the tree being searched. This situation can be remedied drastically if we use a B-Tree as the data structure. A B-Tree index has two advantages:
o It is completely balanced.
o Each node of a B-Tree can hold a number of keys. The ideal node size is approximately equal to the block size of secondary storage.

The question that needs to be answered here is what the order of a B-Tree for an index should be. It ranges from 80 to 200, depending on the index structure and the block size. Let us recollect some basic facts about B-Tree indexes. The basic B-tree structure was discovered by R. Bayer and E. McCreight (1970) of Boeing Scientific Research Labs and has become one of the most popular structures for organising an index. Many variations on the basic B-tree structure have been developed. The B-tree is a useful balanced sort-tree for external sorting. D. Comer (1979) pointed out the strong use of B-trees in database systems: "While no single scheme can be optimum for all applications, the technique of organising a file and its index called the B-tree is the standard organisation for indexes in a database system."

A B-tree of order N is a tree in which:
o Each node has a maximum of N children and a minimum of the ceiling of N/2 children. However, the root node of the tree can have 2 to N children.
o Each node has one fewer keys than it has children, so a node can store a maximum of N-1 keys.


o The keys are normally arranged in increasing order. All keys in the sub-tree to the left of a key are less than that key, and all keys in the sub-tree to the right of a key are greater than it.
o If a new key is inserted into a full node, the node is split into two nodes, and the key with the median value is inserted in the parent node. If the split propagates up to the root, the root itself is split and a new root node is created.
o All the leaves of a B-tree are on the same level, and there is no empty sub-tree above the level of the leaves. Thus a B-tree is completely balanced.

c. Indexed file organisation versus Hashed file organisation

Ans:

Indexed (Indexed Sequential) File Organisation

This organisation arranges the file like a large dictionary: records are stored in key order, but an index is also kept, which permits a form of direct access. The records are stored sequentially by primary key value, and an index is built over the primary key field. Retrieving a record from a sequential file requires, on average, accessing half the records in the file, which makes such enquiries not only inefficient but very time consuming for large files. To improve the query response time of a sequential file, an indexing technique can be added. An index is a set of <index value, address> pairs. Indexing associates a set of objects with a set of orderable quantities (or their properties) that are usually smaller in number. Thus, an index is a mechanism for faster search. Although the indices and the data blocks are kept together physically, they are logically distinct. Let us use the term index file for the indexes and refer to the data files as data records. An index can be small enough to be read into main memory. A sequential (i.e., sorted on primary key) file that is indexed on its primary key is called an index sequential file. The index allows random access to records, while the sequential storage of the records provides easy access to the records in sequence. An additional feature of this organisation is the overflow area, which provides extra space for adding records without the need to create a new file.
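The combination described above (sorted data blocks, a small index that fits in memory, and an overflow area) can be sketched as follows. This is a simplified model under stated assumptions. The index is sparse, holding the first key of each block. Each block keeps its own overflow list instead of chaining records on disk. The sample records are hypothetical.

```python
from bisect import bisect_right

class IndexedSequentialFile:
    """Sorted data blocks + sparse in-memory index (first key per block) + overflow area."""

    def __init__(self, sorted_records, block_size=3):
        self.block_size = block_size
        self.blocks = [list(sorted_records[i:i + block_size])
                       for i in range(0, len(sorted_records), block_size)]
        self.index = [block[0][0] for block in self.blocks]   # <key, block number> pairs
        self.overflow = {}   # block number -> records that did not fit in that block

    def _block_for(self, key):
        """Binary search the index for the last block whose first key <= key."""
        return max(bisect_right(self.index, key) - 1, 0)

    def insert(self, record):
        b = self._block_for(record[0])
        if len(self.blocks[b]) < self.block_size:
            self.blocks[b].append(record)
            self.blocks[b].sort()                            # keep the block in key order
        else:
            self.overflow.setdefault(b, []).append(record)   # no new file is created

    def find(self, key):
        """Direct access: one index lookup, then one block plus its overflow."""
        b = self._block_for(key)
        for rec in self.blocks[b] + self.overflow.get(b, []):
            if rec[0] == key:
                return rec
        return None

    def scan(self):
        """Sequential access: all records in key order, overflow merged in."""
        return sorted(rec for b, block in enumerate(self.blocks)
                      for rec in block + self.overflow.get(b, []))

isf = IndexedSequentialFile([(500, "B"), (600, "E"), (700, "D"), (800, "A"), (900, "C")])
isf.insert((650, "F"))                 # block 0 is full: goes to the overflow area
isf.insert((850, "G"))                 # block 1 still has room
print(isf.find(650), isf.overflow)     # (650, 'F') {0: [(650, 'F')]}
print([k for k, _ in isf.scan()])      # [500, 600, 650, 700, 800, 850, 900]
```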
Hashed File Organisation

Hashing is the most common form of purely random access to a file or database. It is also used as an optimisation technique to access columns that do not have an index. A hash function calculates the address of the page in which a record is to be stored, based on one or more fields of the record. The records in a hash file therefore appear randomly distributed across the available space. This organisation requires a hashing algorithm and a technique for handling overflow. The hashing algorithm converts a primary key value into a record address. The most popular form of hashing is division hashing with chained overflow.

Advantages of Hashed file Organisation
1. Insertion or search on the hash key is fast.
2. It is best when equality search is needed on the hash key.

Disadvantages of Hashed file Organisation
1. It is a complex file organisation method.
2. Search on fields other than the hash key is slow.
3. It suffers from disk space overhead.
4. Unbalanced buckets degrade performance.
5. Range search is slow.
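Division hashing with chained overflow, named above as the most popular form, can be sketched as follows. This is a simplified model. The choices of 7 buckets and a bucket capacity of 2 are arbitrary assumptions, overflow chains are modelled as Python lists rather than linked disk pages, and the sample keys are hypothetical.

```python
class HashFile:
    """Division hashing (bucket = key mod M) with a chained overflow area per bucket."""

    def __init__(self, num_buckets=7, bucket_capacity=2):
        self.m = num_buckets
        self.capacity = bucket_capacity
        self.buckets = [[] for _ in range(num_buckets)]    # primary bucket pages
        self.overflow = [[] for _ in range(num_buckets)]   # overflow chain per bucket

    def address(self, key):
        """Division method: the remainder picks the bucket."""
        return key % self.m

    def insert(self, record):
        b = self.address(record[0])
        if len(self.buckets[b]) < self.capacity:
            self.buckets[b].append(record)
        else:
            self.overflow[b].append(record)   # bucket full: chain into overflow

    def find(self, key):
        """Equality search touches one bucket and its overflow chain only."""
        b = self.address(key)
        for rec in self.buckets[b] + self.overflow[b]:
            if rec[0] == key:
                return rec
        return None

    def range_search(self, lo, hi):
        """Hashing scatters key order, so a range query must scan every bucket."""
        return sorted(rec for b in range(self.m)
                      for rec in self.buckets[b] + self.overflow[b]
                      if lo <= rec[0] <= hi)

hf = HashFile()
for key in [500, 600, 700, 800, 900, 507, 514]:
    hf.insert((key, f"emp{key}"))
print(hf.address(514), hf.overflow[3])               # 3 [(514, 'emp514')]
print(hf.find(514))                                  # (514, 'emp514')
print([k for k, _ in hf.range_search(500, 700)])     # [500, 507, 514, 600, 700]
```

The equality search reads one bucket, while the range search has to visit all seven. This is the trade-off behind advantage 2 and disadvantage 5 above.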


d. Multi-list file organisation versus inverted file organisation

Ans: Numerous techniques have been used to implement multi-key file organisation. Most of them build indexes to provide direct access by key value. Two of the commonest techniques for this organisation are:
o Multi-list file organisation
o Inverted file organisation

Multi-list File Organisation

Multi-list file organisation is a multi-index linked file organisation. A linked file organisation is a logical organisation in which the physical ordering of records is not of concern. In a linked organisation, the sequence of records is governed by links that determine the next record in sequence. Records can be linked in no particular order, but such linking makes searching for information in the file very expensive. It is therefore a good idea to link records in order of increasing primary key. This simplifies the insertion and deletion algorithms and greatly helps search performance. Besides ordering the links, searching a file can be made faster by creating primary and secondary indexes. The multi-list file organisation supports all these concepts. Let us explain them further with the help of an example. Consider the employee data given in Figure 13, where the record numbers are given as letters for clarity. Assume that Empid is the key field of the data records. Let us now explain the multi-list file organisation for this data file.

Since the primary key of the file is Empid, the linked order of records should be B (500), E (600), D (700), A (800), C (900). However, as the file grows, its search performance would deteriorate. We can therefore create a primary index on the file. (Please note that in this file the records are kept in logical sequence and tied together by links, not by physical placement. The primary index will therefore be a linked index file rather than a block index.) Let us create a primary index for this file with the following Empid ranges:
o >= 500 but < 700
o >= 700 but < 900
o >= 900 but < 1100

Inverted File Organisation

Inverted file organisation is a file organisation in which the index structure is the most important part, and the basic structure of the file records matters little. It is somewhat similar to the multi-list file organisation. The key difference is that in the multi-list file organisation an index points to a list, whereas in the inverted file organisation the index itself contains the list. Maintaining the index through suitable structures is thus an important issue in the design of an inverted file organisation. Please note the following points about the inverted file organisation:
o The index entries are of variable length, since the number of records with the same key value varies. Index maintenance is therefore more complex than in the multi-list file organisation.
o Queries involving Boolean expressions require accesses only to the records that satisfy the query, in addition to the block accesses needed for the indices. For example, a query for Female MCA employees can be answered with the Gender and Qualification indexes: you just need the intersection of the record numbers in the two indices. Thus, any complex query involving a Boolean expression can be handled easily with the help of the indices.

Similarities: Both organisations can support:
o An index for primary and secondary keys.
o Pointers to data records, which may be direct or indirect and may or may not be sorted.

Differences: The indexes in the two organisations differ as follows:
o In a multi-list organisation, an index entry points to the first data record in the list, whereas in an inverted index file, an index entry holds address pointers to all the data records related to it.
o A multi-list index has fixed-length records, whereas an inverted index contains variable-length records.
The data records also differ: they do not change in an inverted file organisation, whereas in the multi-list file organisation each record contains the links, one per created index.

Some of the implications of these differences are:
o An index in a multi-list organisation can be managed easily, as it is of fixed length.
o The query response of the inverted file approach is better than that of the multi-list approach, as the query can be answered from the index alone. This also helps reduce block accesses.
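The difference between the two index styles can be seen by answering the Female AND MCA query both ways. Figure 13 is not reproduced here, so the Gender and Qualification values in the sketch below are hypothetical. Only the record letters and Empid values (B 500, E 600, D 700, A 800, C 900) come from the text. For simplicity, each multi-list index keeps its links in a separate dictionary. A real multi-list file stores one link per indexed field inside each record.

```python
# Hypothetical employee records: record id -> (Empid, Gender, Qualification)
EMPLOYEES = {
    "A": (800, "Male",   "MCA"),
    "B": (500, "Female", "MCA"),
    "C": (900, "Female", "MBA"),
    "D": (700, "Female", "MCA"),
    "E": (600, "Male",   "MBA"),
}
FIELDS = {"Gender": 1, "Qualification": 2}
IN_EMPID_ORDER = sorted(EMPLOYEES, key=lambda r: EMPLOYEES[r][0])   # B, E, D, A, C

def inverted_index(field):
    """Inverted file: each index entry holds the full (variable-length) list of records."""
    index = {}
    for rid in IN_EMPID_ORDER:
        index.setdefault(EMPLOYEES[rid][FIELDS[field]], []).append(rid)
    return index

def multilist(field):
    """Multi-list file: the index holds only the head of each list (fixed length);
    the records themselves carry the 'next' links."""
    heads, links, tail = {}, {}, {}
    for rid in IN_EMPID_ORDER:
        value = EMPLOYEES[rid][FIELDS[field]]
        if value in tail:
            links[tail[value]] = rid     # link the previous record to this one
        else:
            heads[value] = rid           # index entry: just the first record
        links[rid] = None
        tail[value] = rid
    return heads, links

def follow(heads, links, value):
    """Walk a multi-list chain; every hop is a data-record access."""
    rid, out = heads.get(value), []
    while rid is not None:
        out.append(rid)
        rid = links[rid]
    return out

# Inverted file: intersect the two index lists, no data records read.
inv_gender, inv_qual = inverted_index("Gender"), inverted_index("Qualification")
print(sorted(set(inv_gender["Female"]) & set(inv_qual["MCA"])))    # ['B', 'D']

# Multi-list: follow the Female chain, reading each record to test Qualification.
g_heads, g_links = multilist("Gender")
print([r for r in follow(g_heads, g_links, "Female")
       if EMPLOYEES[r][2] == "MCA"])                                 # ['B', 'D']
```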


(iv)

Given the University system in problem 1 (v), create a suitable relational design for the ER diagram so created. Identify all the constraints on the various attributes and tables. The tables should be normalised and properly structured, with field names and constraints. You must also identify the set of possible queries and reports for the database.

Sol:

For every ER diagram, we can construct a relational database, which is a collection of tables. The following steps are used to convert an ER diagram into a relational database.

Conversion of entity sets:

I) For each strong entity type E in the ER diagram, we create a relation R containing all the simple attributes of E. The primary key of R will be one of the key attributes of E.

Student (Roll No {Primary Key}, Name, Address)

Course (C_Code {Primary Key}, Cname, Credit, Ctype, Semester)

Faculty (ID {Primary Key}, Name, Address, Basic Salary, Expertise)

Department (D_no {Primary Key}, D_name)


II) For each weak entity type W in the ER diagram, we create another relation R that contains all the simple attributes of W. If E is the owner entity of W, the key attribute of E is also included in R and is set as a foreign key attribute of R. The combination of the primary key attribute of the owner entity type and the partial key of the weak entity type forms the key of the weak entity type. The table below shows the weak entity GUARDIAN, to which the key field RollNo of the STUDENT entity has been added.

Guardian (RollNo, Name, Address), Primary Key: (RollNo, Name)

Conversion of relationship sets:

Binary Relationships:


I) One-to-one relationship: For each 1:1 relationship type R in the ER diagram involving two entities E1 and E2, we choose one of the entities (say E1), preferably one with total participation. We add the primary key attribute of the other entity, E2, as a foreign key attribute in the table of E1. We also include in E1 all the simple attributes of relationship type R, if any. For example, Head_of is a 1:1 relationship between FACULTY and DEPARTMENT. We choose the DEPARTMENT entity, which has total participation, and add the primary key attribute ID of the FACULTY entity to DEPARTMENT as a foreign key named Head_ID. The DEPARTMENT relation is also extended with Date-from, the attribute of the relationship that records the date from which s/he is the head. Please note that this table keeps information about the current head only. The DEPARTMENT table will now be as follows:

Department (D_No {Primary Key}, D_Name, Head_ID {Foreign Key referencing Faculty}, Date-from)

II) One-to-many relationship: For each 1:n relationship type R involving two entities E1 and E2, we identify the entity type on the n-side of R (say E1). We then include the primary key of the entity on the other side of the relationship (say E2) as a foreign key attribute in the table of E1. We also include in the table of E1 all simple attributes of R (or simple components of composite attributes of R), if any.

Faculty (contains the Works In relationship): (ID {Primary Key}, Name, Address, Basic Salary, D_No {Foreign Key referencing Department})


Course (contains the Program relationship): (RollNo, Pname, Fees, Duration, C_Code, Cname, Credit, Ctype, Semester)

Department (contains the Enroll relationship): (D_no, Semester, RollNo, Name, Address)

III) Many-to-many relationship: For each m:n relationship type R, we create a new table (say S) to represent R. We include the primary key attributes of both participating entity types as foreign key attributes in S. Any simple attributes of the m:n relationship type (or simple components of a composite attribute) are also included as attributes of S. For example, the m:n relationship taught-by between the entities COURSE and FACULTY should be represented as a new table. The structure of this table will include the primary keys of the COURSE and FACULTY entities.

Taught_by (ID {Primary Key of Faculty Table}, C_Code {Primary Key of Course Table})
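The mapping steps above can be collected into one schema with the constraints written out. The sketch below uses SQLite through Python's sqlite3 module, not the course DBMS, and several details are assumptions. Column names with spaces or hyphens become Basic_Salary and Date_from. The CHECK constraints and the ON DELETE CASCADE on the weak entity are suggested choices. The entity-only Course columns are used. All sample rows are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Student (
    RollNo  TEXT PRIMARY KEY,
    Name    TEXT NOT NULL,
    Address TEXT
);
CREATE TABLE Guardian (                        -- weak entity: key borrows RollNo
    RollNo  TEXT NOT NULL REFERENCES Student(RollNo) ON DELETE CASCADE,
    Name    TEXT NOT NULL,
    Address TEXT,
    PRIMARY KEY (RollNo, Name)
);
CREATE TABLE Department (
    D_No      TEXT PRIMARY KEY,
    D_Name    TEXT NOT NULL,
    Head_ID   TEXT REFERENCES Faculty(ID),     -- 1:1 Head_of, current head only
    Date_from TEXT
);
CREATE TABLE Faculty (
    ID           TEXT PRIMARY KEY,
    Name         TEXT NOT NULL,
    Address      TEXT,
    Basic_Salary REAL CHECK (Basic_Salary > 0),
    Expertise    TEXT,
    D_No         TEXT REFERENCES Department(D_No)  -- 1:n Works In
);
CREATE TABLE Course (
    C_Code   TEXT PRIMARY KEY,
    Cname    TEXT NOT NULL,
    Credit   INTEGER CHECK (Credit > 0),
    Ctype    TEXT,
    Semester INTEGER
);
CREATE TABLE Taught_by (                       -- m:n relationship table
    ID     TEXT REFERENCES Faculty(ID),
    C_Code TEXT REFERENCES Course(C_Code),
    PRIMARY KEY (ID, C_Code)
);
"""

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript(SCHEMA)
con.execute("INSERT INTO Department VALUES ('D1', 'Computer Science', NULL, NULL)")
con.execute("INSERT INTO Faculty VALUES ('F1', 'Dr Mehta', 'Delhi', 50000, 'DBMS', 'D1')")
con.execute("UPDATE Department SET Head_ID = 'F1', Date_from = '2011-07-01' WHERE D_No = 'D1'")
con.execute("INSERT INTO Course VALUES ('C101', 'DBMS', 4, 'Core', 2)")
con.execute("INSERT INTO Taught_by VALUES ('F1', 'C101')")
con.execute("INSERT INTO Student VALUES ('S1', 'Asha', 'Delhi')")
con.execute("INSERT INTO Guardian VALUES ('S1', 'Mr Rao', 'Delhi')")
con.execute("DELETE FROM Student WHERE RollNo = 'S1'")    # the weak-entity row goes too
print(con.execute("SELECT COUNT(*) FROM Guardian").fetchone()[0])   # 0
```

Department and Faculty reference each other, so the department is inserted first with a NULL head, and the head is filled in once the faculty row exists.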


Question 3:

(i) Consider the following schema:
Customer (cu_ID, cu_name, cu_type, cu_credit_limit)
Purchase (cu_ID, it_ID, pu_dateofsale, pu_quantity)
Item (it_ID, it_name, it_costperUnit)
Perform the following operations on these tables using SQL:

20 marks

a. Create the three tables giving suitable domains and constraints including referential actions.

Sol:

SQL> create table Customer(
       cu_ID varchar2(16),
       cu_name varchar2(32) not null,
       cu_type varchar2(16),
       cu_credit_limit number(10,2),
       constraint cu_pk primary key(cu_ID)
     );

SQL> create table Item(
       it_ID varchar2(16),
       it_name varchar2(32) not null,
       it_costperUnit number(10,2) check (it_costperUnit > 0),
       constraint it_pk primary key(it_ID)
     );

SQL> create table Purchase(
       cu_ID varchar2(16),
       it_ID varchar2(16),
       pu_dateofsale date,
       pu_quantity number(3) check (pu_quantity > 0),
       constraint cufk foreign key(cu_ID) references Customer(cu_ID) on delete cascade,
       constraint itfk foreign key(it_ID) references Item(it_ID) on delete cascade
     );

/* Inserting values */
SQL> insert into Customer (cu_ID, cu_name, cu_type, cu_credit_limit) values ('c001', 'Andy Robinson', 'Retailer', 10000);
SQL> insert into Customer (cu_ID, cu_name, cu_type, cu_credit_limit) values ('c002', 'Rajesh Roberts', 'Customer', 9000);
SQL> insert into Item (it_ID, it_name, it_costperUnit) values ('1', 'Complan', 150);
SQL> insert into Item (it_ID, it_name, it_costperUnit) values ('2', 'Cricket Bat', 990);
SQL> insert into Item (it_ID, it_name, it_costperUnit) values ('3', 'Logitech Optical Mouse', 590);

SQL> select * from customer;

CU_ID  CU_NAME         CU_TYPE   CU_CREDIT_LIMIT
-----  --------------  --------  ---------------
C001   Andy Robinson   Retailer            10000
C002   Rajesh Roberts  Customer             9000

SQL> insert into purchase (cu_id, it_id, pu_dateofsale, pu_quantity) values ('c001', '2', to_date('22-MAR-11', 'DD-MON-RR'), 2);



SQL> insert into purchase (cu_id, it_id, pu_dateofsale, pu_quantity) values ('c001', '3', to_date('11-SEP-11', 'DD-MON-RR'), 5);
SQL> insert into purchase (cu_id, it_id, pu_dateofsale, pu_quantity) values ('c002', '1', to_date('20-SEP-11', 'DD-MON-RR'), 10);
SQL> insert into purchase (cu_id, it_id, pu_dateofsale, pu_quantity) values ('c001', '1', to_date('19-SEP-11', 'DD-MON-RR'), 10);
SQL> insert into purchase (cu_id, it_id, pu_dateofsale, pu_quantity) values ('c001', '3', to_date('19-SEP-11', 'DD-MON-RR'), 10);
SQL> insert into purchase (cu_id, it_id, pu_dateofsale, pu_quantity) values ('c001', '2', to_date('19-SEP-11', 'DD-MON-RR'), 10);
SQL> insert into purchase (cu_id, it_id, pu_dateofsale, pu_quantity) values ('c001', '1', to_date('19-SEP-11', 'DD-MON-RR'), 10);
SQL> insert into purchase (cu_id, it_id, pu_dateofsale, pu_quantity) values ('c002', '2', to_date('01-APR-11', 'DD-MON-RR'), 32);
SQL> insert into purchase (cu_id, it_id, pu_dateofsale, pu_quantity) values ('c001', '2', to_date('30-AUG-11', 'DD-MON-RR'), 32);
SQL> insert into purchase (cu_id, it_id, pu_dateofsale, pu_quantity) values ('c001', '3', to_date('30-AUG-11', 'DD-MON-RR'), 20);

SQL> select * from purchase;

CU_ID  IT_ID  PU_DATEOF  PU_QUANTITY
-----  -----  ---------  -----------
c001   2      22-MAR-11            2
c001   3      11-SEP-11            5
c002   1      20-SEP-11           10
c001   1      19-SEP-11           10
c001   3      19-SEP-11           10
c001   2      19-SEP-11           10
c001   1      19-SEP-11           10
c002   2      01-APR-11           32
c001   2      30-AUG-11           32
c001   3      30-AUG-11           20

10 rows selected.

b. Add one additional field it_type in the item table, create a secondary index on it_name and drop any one constraint that you have created in step (a).

Ans:

/* Add one additional field it_type in the item table.
   Item already holds rows, so a NOT NULL column needs a default; 'General' is an assumed value */
SQL> alter table Item add it_type varchar2(16) default 'General' not null;

/* Create a secondary index on it_name */
SQL> create index item_name_idx on Item(it_name);

/* Drop any one constraint that you have created in step (a) */
SQL> alter table Purchase drop constraint cufk;


c. Create a view named SingleCustomer that shows a customer all the purchases made by him/her only.

Ans:

Syntax:
CREATE VIEW view [(field1[, field2[, ...]])] AS selectstatement

The CREATE VIEW statement has these parts:
Part              Description
view              The name of the view to be created.
field1, field2    The name of the field or fields for the corresponding fields specified in selectstatement.
selectstatement   A SQL SELECT statement. For more information, see SELECT Statement.

SQL> create view singlecustomer (custid, item, sold_on, qty, name, unitprice, amount) as
       select purchase.cu_id, purchase.it_id, purchase.pu_dateofsale, purchase.pu_quantity,
              item.it_name, item.it_costperunit, item.it_costperunit * purchase.pu_quantity
       from purchase, item
       where item.it_id = purchase.it_id;

SQL> select * from singlecustomer where custid = 'c001';

CUSTID  ITEM  SOLD_ON    QTY  NAME                    UNITPRICE  AMOUNT
------  ----  ---------  ---  ----------------------  ---------  ------
C001    2     22-MAR-11    2  Cricket Bat                   990    1980
C001    3     11-SEP-11    5  Logitech Optical Mouse        590    2950
C001    1     19-SEP-11   10  Complan                       150    1500
C001    3     19-SEP-11   10  Logitech Optical Mouse        590    5900
C001    2     19-SEP-11   10  Cricket Bat                   990    9900
C001    1     19-SEP-11   10  Complan                       150    1500
C001    2     30-AUG-11   32  Cricket Bat                   990   31680
C001    3     30-AUG-11   20  Logitech Optical Mouse        590   11800

d. Find the list of the customer names and types of those customers who have purchased an item named Cricket Bat.

SQL> select cu_name, cu_type from customer
     where cu_id in (select cu_id from purchase
                     where it_id in (select it_id from item where it_name = 'Cricket Bat'));

CU_NAME         CU_TYPE
--------------  --------
Andy Robinson   Retailer
Rajesh Roberts  Customer


e. List the customer names and credit limits of those customers who have bought more than five items.

SQL> select cu_name, cu_credit_limit from customer
     where cu_id in (select cu_id from purchase group by cu_id having count(*) > 5);

CU_NAME        CU_CREDIT_LIMIT
-------------  ---------------
Andy Robinson            10000

f. Create the list of items purchased by a customer whose ID is C001 in the decreasing order of cost per unit of those items.

SQL> select it_name, it_costperunit from item
     where it_id in (select it_id from purchase where cu_id = 'c001')
     order by it_costperunit desc;

IT_NAME                 IT_COSTPERUNIT
----------------------  --------------
Cricket Bat                        990
Logitech Optical Mouse             590
Complan                            150

g. Calculate the total amount that is to be paid by customer C001 on all the items purchased on 30th August 2011 by him/her.

SQL> select sum(item.it_costperunit * purchase.pu_quantity) as amount
     from item, purchase
     where purchase.pu_dateofsale = to_date('30-AUG-11', 'DD-MON-RR')
       and item.it_id = purchase.it_id
       and purchase.cu_id = 'c001';

    AMOUNT
----------
     43480


(ii)

Consider the following transactions in a Bank:
o Update all the bank accounts to add monthly interest @6% per annum. You may assume that the interest is calculated on the balance in that account at the time of calculation of interest.
o Mr X withdraws an amount of Rs 1,00,000/- from the account A001.
o Mr Z deposits an amount of Rs 50,000/- in the account A001.
Write the pseudo code for all three transactions. Also explain the ACID properties in the context of any one of these transactions. What are the possible problems that may be encountered if these transactions are executed concurrently? Show one non-serialisable schedule for concurrent execution of these transactions. Use the two phase locking protocol and rewrite the pseudo codes of the transactions. Show a serialisable schedule using these pseudo codes. Draw the precedence graph for at least one schedule.

Sol:

PSEUDO CODES:

o Update all the bank accounts to add monthly interest @6% per annum. You may assume that the interest is calculated on the balance in that account at the time of calculation of interest.

; Assume that the account numbers run serially from x to y
; Monthly interest @6% per annum is 0.06 / 12 (i.e. 0.5%) of the balance
TRANSACTION ADDINTEREST (x, y)
Begin transaction
    i = x
    Do {
        IF account i exists THEN
            READ i.balance
            i.balance = i.balance + i.balance * 0.06 / 12
            WRITE i.balance
        ELSE
            DISPLAY "Transaction cannot be processed for account i"
        i = i + 1
    } While (i <= y);
    COMMIT
End transaction;

o Mr X withdraws an amount of Rs 1,00,000/- from the account A001.

; Assume the transaction is called with withdrawal_amount = 100000 and X = A001
TRANSACTION WITHDRAWAL (withdrawal_amount, X)
Begin transaction
    IF X exists THEN
        READ X.balance
        IF X.balance > withdrawal_amount THEN
            X.balance = X.balance - withdrawal_amount
            WRITE X.balance
            COMMIT
        ELSE
            DISPLAY "Transaction cannot be processed"
            ROLLBACK
    ELSE
        DISPLAY "Account X does not exist"
        ROLLBACK
End transaction;

o Mr Z deposits an amount of Rs 50,000/- in the account A001.

; Assume the transaction is called with amount = 50000 and x = A001
TRANSACTION DEPOSIT (amount, x)
Begin transaction
  IF account x exists THEN
    READ x.balance
    x.balance = x.balance + amount
    WRITE x.balance
    COMMIT
  ELSE
    DISPLAY "ACCOUNT x DOES NOT EXIST"
  ENDIF
End transaction;
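The three pseudo codes above can be sketched against a real SQL engine. The following is a minimal sketch, assuming Python's built-in sqlite3 module, a hypothetical table account(acc_no, balance) and made-up opening balances (Rs 2,00,000 in A001 and Rs 5,000 in A002). Each transaction is bracketed by BEGIN ... COMMIT, and the withdrawal is rolled back when it cannot be processed.

```python
import sqlite3
from contextlib import contextmanager

MONTHLY_RATE = 0.06 / 12  # 6% per annum credited for one month = 0.5%

@contextmanager
def transaction(conn):
    """Begin transaction ... COMMIT, or ROLLBACK if anything inside fails."""
    conn.execute("BEGIN IMMEDIATE")  # SQLite: take the write lock before the first read
    try:
        yield conn
    except Exception:
        conn.execute("ROLLBACK")
        raise
    else:
        conn.execute("COMMIT")

def add_interest(conn):
    # One statement credits every account inside a single transaction.
    with transaction(conn):
        conn.execute("UPDATE account SET balance = ROUND(balance * (1 + ?), 2)",
                     (MONTHLY_RATE,))

def withdraw(conn, acc_no, amount):
    with transaction(conn):
        row = conn.execute("SELECT balance FROM account WHERE acc_no = ?",
                           (acc_no,)).fetchone()
        if row is None:
            raise LookupError(f"account {acc_no} does not exist")
        if row[0] < amount:
            raise ValueError("insufficient balance: transaction cannot be processed")
        conn.execute("UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                     (amount, acc_no))

def deposit(conn, acc_no, amount):
    with transaction(conn):
        cur = conn.execute("UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                           (amount, acc_no))
        if cur.rowcount == 0:
            raise LookupError(f"account {acc_no} does not exist")

def balance(conn, acc_no):
    return conn.execute("SELECT balance FROM account WHERE acc_no = ?",
                        (acc_no,)).fetchone()[0]

conn = sqlite3.connect(":memory:", isolation_level=None)  # we issue BEGIN/COMMIT ourselves
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance REAL NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A001", 200000.0), ("A002", 5000.0)])

add_interest(conn)               # A001: 200000 -> 201000
withdraw(conn, "A001", 100000)   # A001: 201000 -> 101000
deposit(conn, "A001", 50000)     # A001: 101000 -> 151000
print(balance(conn, "A001"), balance(conn, "A002"))
```

BEGIN IMMEDIATE is SQLite-specific; it makes each transaction hold the write lock from its first read, which is the same concern that the two phase locking version of these transactions addresses.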

A transaction is a unit of program execution that accesses and possibly updates various data items. Usually, a transaction is initiated by a user program written in a high-level data-manipulation language or programming language (for example, SQL, COBOL, C, C++, or Java), where it is delimited by statements (or function calls) of the form begin transaction and end transaction. The transaction consists of all operations executed between the begin transaction and end transaction.

To ensure integrity of the data, we require that the database system maintain the following properties of the transactions:
o Atomicity. Either all operations of the transaction are reflected properly in the database, or none are.
o Consistency. Execution of a transaction in isolation (that is, with no other transaction executing concurrently) preserves the consistency of the database.
o Isolation. Even though multiple transactions may execute concurrently, the system guarantees that, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished. Thus, each transaction is unaware of other transactions executing concurrently in the system.
o Durability. After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.

These properties are often called the ACID properties; the acronym is derived from the first letter of each of the four properties.

To gain a better understanding of ACID properties and the need for them, consider a simplified banking system consisting of several accounts and a set of transactions that access and update those accounts. For the time being, we assume that the database permanently resides on disk, but that some portion of it is temporarily residing in main memory. Transactions access data using two operations:
o read(X), which transfers the data item X from the database to a local buffer belonging to the transaction that executed the read operation.
o write(X), which transfers the data item X from the local buffer of the transaction that executed the write back to the database.

In a real database system, the write operation does not necessarily result in the immediate update of the data on the disk; the write operation may be temporarily stored in memory and executed on the disk later. For now, however, we shall assume that the write operation updates the database immediately.

Let Ti be a transaction that transfers $50 from account A to account B. This transaction can be defined as:

Ti: read(A);
    A := A - 50;
    write(A);
    read(B);
    B := B + 50;
    write(B).

Let us now consider each of the ACID requirements. (For ease of presentation, we consider them in an order different from the order A-C-I-D.)

Consistency: The consistency requirement here is that the sum of A and B be unchanged by the execution of the transaction. Without the consistency requirement, money could be created or destroyed by the transaction! It can be verified easily that, if the database is consistent before an execution of the transaction, the database remains consistent after the execution of the transaction. Ensuring consistency for an individual transaction is the responsibility of the application programmer who codes the transaction. This task may be facilitated by automatic testing of integrity constraints.

Atomicity: Suppose that, just before the execution of transaction Ti, the values of accounts A and B are $1000 and $2000, respectively. Now suppose that, during the execution of transaction Ti, a failure occurs that prevents Ti from completing its execution successfully. Examples of such failures include power failures, hardware failures, and software errors. Further, suppose that the failure happened after the write(A) operation but before the write(B) operation. In this case, the values of accounts A and B reflected in the database are $950 and $2000. The system destroyed $50 as a result of this failure. In particular, we note that the sum A + B is no longer preserved.

Thus, because of the failure, the state of the system no longer reflects a real state of the world that the database is supposed to capture. We term such a state an inconsistent state. We must ensure that such inconsistencies are not visible in a database system. Note, however, that the system must at some point be in an inconsistent state. Even if transaction Ti is executed to completion, there exists a point at which the value of account A is $950 and the value of account B is $2000, which is clearly an inconsistent state. This state, however, is eventually replaced by the consistent state where the value of account A is $950, and the value of account B is $2050. Thus, if the transaction never started or was guaranteed to complete, such an inconsistent state would not be visible except during the execution of the transaction. That is the reason for the atomicity requirement: if the atomicity property is present, all actions of the transaction are reflected in the database, or none are.

The basic idea behind ensuring atomicity is this: the database system keeps track (on disk) of the old values of any data on which a transaction performs a write, and, if the transaction does not complete its execution, the database system restores the old values to make it appear as though the transaction never executed. Ensuring atomicity is the responsibility of the database system itself; specifically, it is handled by a component called the transaction-management component.

Durability: Once the execution of the transaction completes successfully, and the user who initiated the transaction has been notified that the transfer of funds has taken place, it must be the case that no system failure will result in a loss of data corresponding to this transfer of funds. The durability property guarantees that, once a transaction completes successfully, all the updates that it carried out on the database persist, even if there is a system failure after the transaction completes execution.

We assume for now that a failure of the computer system may result in loss of data in main memory, but data written to disk are never lost. We can guarantee durability by ensuring that either:
a. The updates carried out by the transaction have been written to disk before the transaction completes, or
b. Information about the updates carried out by the transaction and written to disk is sufficient to enable the database to reconstruct the updates when the database system is restarted after the failure.
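The atomicity argument can be reproduced with a small experiment. This sketch, assuming Python's sqlite3 module and a hypothetical account(name, balance) table, simulates a failure between write(A) and write(B) by raising an exception (SimulatedCrash is a made-up name). The rollback makes the database look as though Ti never ran, so the $950/$2000 state never survives.

```python
import sqlite3

class SimulatedCrash(Exception):
    """Stands in for a power failure, hardware failure or software error."""

def transfer(conn, src, dst, amount, crash_after_write_a=False):
    """Ti: each UPDATE performs read(X), the arithmetic and write(X) in one step."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, src))                      # write(A)
        if crash_after_write_a:
            raise SimulatedCrash("failure after write(A), before write(B)")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))                      # write(B)
        conn.commit()
        return True
    except SimulatedCrash:
        conn.rollback()  # restore the old values: as if Ti never executed
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 2000)])
conn.commit()

transfer(conn, "A", "B", 50, crash_after_write_a=True)
after_crash = balances(conn)   # {'A': 1000, 'B': 2000}
transfer(conn, "A", "B", 50)
after_commit = balances(conn)  # {'A': 950, 'B': 2050}
print(after_crash, after_commit)
```

In both outcomes A + B stays 3000, which is exactly the consistency requirement stated above.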

Ensuring durability is the responsibility of a component of the database system called the recovery-management component. The transaction-management component and the recovery-management component are closely related.

Isolation: Even if the consistency and atomicity properties are ensured for each transaction, if several transactions are executed concurrently, their operations may interleave in some undesirable way, resulting in an inconsistent state. For example, as we saw earlier, the database is temporarily inconsistent while the transaction to transfer funds from A to B is executing, with the deducted total written to A and the increased total yet to be written to B. If a second concurrently running transaction reads A and B at this intermediate point and computes A+B, it will observe an inconsistent value. Furthermore, if this second transaction then performs updates on A and B based on the inconsistent values that it read, the database may be left in an inconsistent state even after both transactions have completed.

Problems of Concurrent Transactions

1. Lost Updates: Suppose the two transactions T3 and T4 run concurrently and they happen to be interleaved in the following way (assume the initial value of X as 10000):

After the execution of both the transactions the value of X is 13000, while the semantically correct value should be 8000. The problem occurred because the update made by T3 has been overwritten by T4. The root cause of the problem was the fact that both the transactions had read the value of X as 10000. Thus one of the two updates has been lost and we say that a lost update has occurred. There is one more way in which lost updates can arise. Consider the following part of some transactions:
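The lost update can be replayed with a tiny simulator. The figure with the interleaving of T3 and T4 is not reproduced here, so the amounts below are an assumption: T3 subtracts 5000 and T4 adds 3000, which are the only changes consistent with the quoted values (initial 10000, correct result 8000, observed 13000 when T4 writes last). The function name run_schedule is hypothetical.

```python
def run_schedule(initial_x, schedule):
    """Execute (transaction, operation, amount) steps in the given order.

    Each transaction works on a private local copy of X, as with read(X)/write(X).
    """
    database = {"X": initial_x}
    local = {}
    for txn, op, amount in schedule:
        if op == "read":
            local[txn] = database["X"]
        elif op == "add":
            local[txn] += amount
        elif op == "write":
            database["X"] = local[txn]
        else:
            raise ValueError(f"unknown operation {op!r}")
    return database["X"]

T3 = [("T3", "read", 0), ("T3", "add", -5000), ("T3", "write", 0)]
T4 = [("T4", "read", 0), ("T4", "add", 3000), ("T4", "write", 0)]

serial = T3 + T4
# Both transactions read X before either of them writes it.
interleaved = [T3[0], T4[0], T3[1], T3[2], T4[1], T4[2]]

print(run_schedule(10000, serial))       # 8000
print(run_schedule(10000, interleaved))  # 13000: T3's update is lost
```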

Here T5 and T6 update the same item X. Thereafter T5 decides to undo its action and rolls back, causing the value of X to go back to the original value, which was 2000. In this case also the update performed by T6 is lost, and a lost update is said to have occurred.

2. Unrepeatable Reads: Suppose T7 reads X twice during its execution. If it did not update X itself, it could be very disturbing to see a different value of X in its next read. But this could occur if, between the two read operations, another transaction modifies X.
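An unrepeatable read can be traced step by step in the same style. In this hypothetical sketch T8 is the other transaction, and the starting value of X and the change made by T8 are made up; T7's two reads of X return different values although T7 itself never writes X.

```python
def unrepeatable_read(initial_x, t8_change):
    database = {"X": initial_x}

    first_read = database["X"]     # T7: read(X)

    t8_local = database["X"]       # T8: read(X)
    t8_local += t8_change          # T8: X := X + change
    database["X"] = t8_local       # T8: write(X) and commit

    second_read = database["X"]    # T7: read(X) again, with no write by T7 in between
    return first_read, second_read

print(unrepeatable_read(2000, 500))  # (2000, 2500)
```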

Thus, inconsistent values are read and the results of the transaction may be in error.

3. Dirty Reads: T10 reads a value which has been updated by T9. This update has not been committed and T9 aborts.

Here T10 reads a value that has been updated by transaction T9, which has been aborted. Thus T10 has read a value that would never exist in the database, and hence the problem. Here the problem is primarily one of isolation of transactions.

4. Inconsistent Analysis: The problem shown with transactions T1 and T2, where two transactions interleave to produce an incorrect result during an analysis by Audit, is an example of such a problem. This problem occurs when more than one data item is being used for analysis, while another transaction has modified some of those values and some are yet to be modified. Thus, an analysis transaction reads values from an inconsistent state of the database, which results in an inconsistent analysis.

Thus, we can conclude that the prime reason for the problems of concurrent transactions is that a transaction reads an inconsistent state of the database that has been created by another transaction.

NON-SERIALISABLE SCHEDULE:

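Let T1 be the WITHDRAWAL of Rs 1,00,000 and T2 the DEPOSIT of Rs 50,000, both on account A001. In the following interleaving both transactions read the same (old) balance, so T2's write overwrites T1's:

Step  T1 (WITHDRAWAL)          T2 (DEPOSIT)
1     Read A001
2                              Read A001
3     A001 = A001 - 100000
4     Write A001
5     Display Result
6                              A001 = A001 + 50000
7                              Write A001
8                              Display Result

The final balance is the original balance + 50000, so the withdrawal is lost. Both serial orders (T1, T2 and T2, T1) give the original balance - 50000 (assuming the balance is at least Rs 1,00,000), so this schedule is not equivalent to any serial schedule. In its precedence graph, T1 reads A001 before T2 writes it (edge T1 -> T2) and T2 reads A001 before T1 writes it (edge T2 -> T1); the cycle T1 -> T2 -> T1 confirms that it is not conflict serialisable.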

Using Two Phase Locking Protocol


PSEUDO CODES:

o Update all the Bank accounts to add monthly interest @6% per annum (0.5% of the balance per month), calculated on the balance in each account at the time of calculation of interest.

; Assume that the account numbers run serially from x to y (both inclusive)
; Growing phase: an exclusive lock is taken on each account before it is read.
; Shrinking phase: all locks are released together, only after COMMIT.
TRANSACTION ADDINTEREST (x, y)
Begin transaction
  i = x
  WHILE i <= y DO
    IF account i exists THEN
      LOCK-X (i)
      READ i.balance
      i.balance = i.balance + i.balance * 0.06 / 12
      WRITE i.balance
    ELSE
      DISPLAY "ACCOUNT i DOES NOT EXIST"
    ENDIF
    i = i + 1
  ENDWHILE
  COMMIT
  UNLOCK all accounts locked above
End transaction;

o Mr X withdraws from the account A001 an amount of Rs 1,00,000/-.

; Assume the transaction is called with withdrawal_amount = 100000 and x = A001
TRANSACTION WITHDRAWAL (withdrawal_amount, x)
Begin transaction
  IF account x exists THEN
    LOCK-X (x)                ; growing phase: lock before the first read
    READ x.balance
    IF x.balance >= withdrawal_amount THEN
      x.balance = x.balance - withdrawal_amount
      WRITE x.balance
      COMMIT
    ELSE
      DISPLAY "TRANSACTION CANNOT BE PROCESSED"
    ENDIF
    UNLOCK (x)                ; shrinking phase: released on both paths
  ELSE
    DISPLAY "ACCOUNT x DOES NOT EXIST"
  ENDIF
End transaction;

o Mr Z deposits an amount of Rs 50,000/- in the account A001.

; Assume the transaction is called with amount = 50000 and x = A001
TRANSACTION DEPOSIT (amount, x)
Begin transaction
  IF account x exists THEN
    LOCK-X (x)                ; growing phase: lock before the first read
    READ x.balance
    x.balance = x.balance + amount
    WRITE x.balance
    COMMIT
    UNLOCK (x)                ; shrinking phase
  ELSE
    DISPLAY "ACCOUNT x DOES NOT EXIST"
  ENDIF
End transaction;

SERIALISABLE SCHEDULE:

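Step  T1 (WITHDRAWAL)          T2 (DEPOSIT)
1     Lock A001: granted
2     Read A001
3                              Lock A001: requested, waits (held by T1)
4     A001 = A001 - 100000
5     Write A001
6     Display Result
7     Unlock A001
8                              Lock A001: granted
9                              Read A001
10                             A001 = A001 + 50000
11                             Write A001
12                             Display Result
13                             Unlock A001

Because T2 cannot read A001 until T1 has released its lock, the schedule is equivalent to the serial schedule T1 followed by T2, and the final balance is the original balance - 100000 + 50000 (assuming a sufficient balance).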
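The effect of two phase locking on the WITHDRAWAL and DEPOSIT transactions can be sketched with threads. This is a minimal illustration, not a DBMS lock manager: LockTable and TwoPhaseTransaction are hypothetical names, every lock is exclusive, and all locks are released only at commit (strict 2PL). The opening balance of Rs 2,00,000 is an assumption.

```python
import threading
import time

class LockTable:
    """Hypothetical lock manager: one exclusive lock per data item."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def lock_for(self, item):
        with self._guard:  # protect the dictionary itself
            return self._locks.setdefault(item, threading.Lock())


class TwoPhaseTransaction:
    """Enforces the two phases: no lock may be acquired after any lock is released."""

    def __init__(self, lock_table):
        self.lock_table = lock_table
        self.held = []
        self.shrinking = False

    def lock_x(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violated: lock requested in the shrinking phase")
        self.lock_table.lock_for(item).acquire()  # waits while another transaction holds it
        self.held.append(item)

    def commit_and_unlock(self):
        self.shrinking = True
        while self.held:
            self.lock_table.lock_for(self.held.pop()).release()


accounts = {"A001": 200000}
locks = LockTable()

def withdrawal(amount):
    t = TwoPhaseTransaction(locks)
    t.lock_x("A001")                # growing phase
    balance = accounts["A001"]      # read A001
    time.sleep(0.01)                # widen the window in which T2 could interfere
    if balance >= amount:
        accounts["A001"] = balance - amount  # write A001
    t.commit_and_unlock()           # shrinking phase

def deposit(amount):
    t = TwoPhaseTransaction(locks)
    t.lock_x("A001")
    balance = accounts["A001"]
    time.sleep(0.01)
    accounts["A001"] = balance + amount
    t.commit_and_unlock()

workers = [threading.Thread(target=withdrawal, args=(100000,)),
           threading.Thread(target=deposit, args=(50000,))]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(accounts["A001"])  # 150000, whichever transaction obtains the lock first
```

Without the lock_x calls, the sleep between the read and the write would usually let both threads read the same balance and reproduce the lost update described earlier.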

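Precedence Graph for Deposit Schedule:

[Figure: states of the DEPOSIT transaction: Start, Lock A001, Read A001 / Add 50000, Commit or Abort/Rollback, Unlock A001]

The figure traces the states of a single transaction. A precedence graph proper has one node per transaction and an edge Ti -> Tj whenever an operation of Ti precedes a conflicting operation of Tj (same data item, at least one of the two operations a write). For the serialisable schedule above, every operation of T1 on A001 comes before every operation of T2 on A001, so the graph is:

T1 -> T2

It has no cycle, so the schedule is conflict serialisable and equivalent to the serial order T1, T2.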
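A precedence graph can be built mechanically from a schedule. This sketch (function names hypothetical) takes a schedule as a list of (transaction, operation, item) steps, adds an edge Ti -> Tj for each pair of conflicting operations in which Ti's operation comes first, and tests the graph for a cycle with a depth-first search. A schedule is conflict serialisable exactly when its graph has no cycle. The two sample schedules keep only the read and write steps on A001 of WITHDRAWAL (T1) and DEPOSIT (T2).

```python
from itertools import combinations

def precedence_graph(schedule):
    """schedule: list of (transaction, op, item) in execution order; op is 'R' or 'W'.

    combinations() keeps the original order, so in each pair the first step ran first.
    """
    edges = set()
    for (ti, op_i, item_i), (tj, op_j, item_j) in combinations(schedule, 2):
        if ti != tj and item_i == item_j and "W" in (op_i, op_j):
            edges.add((ti, tj))
    return edges

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {node: "new" for node in graph}  # new -> on_stack -> done

    def visit(node):
        state[node] = "on_stack"
        for nxt in graph[node]:
            if state[nxt] == "on_stack":
                return True                  # back edge: cycle found
            if state[nxt] == "new" and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(state[n] == "new" and visit(n) for n in graph)

serialisable = [("T1", "R", "A001"), ("T1", "W", "A001"),
                ("T2", "R", "A001"), ("T2", "W", "A001")]
non_serialisable = [("T1", "R", "A001"), ("T2", "R", "A001"),
                    ("T1", "W", "A001"), ("T2", "W", "A001")]

g = precedence_graph(serialisable)
print(g, has_cycle(g))  # {('T1', 'T2')} False
print(has_cycle(precedence_graph(non_serialisable)))  # True
```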

(iii)

You have designed the relations, query and reports for the University database in question 2 (v). Now, implement your design using SQL in a suitable RDBMS. Enter meaningful data and test your queries and reports.

Sol:

/* Table Creation in Oracle Database */
/* Entity Creation */
create table studentIgn(RollNo varchar2(6) primary key, Name varchar2(32) not null, address varchar2(32) not null);
create table courseIgn(c_code varchar2(6) primary key, cname varchar2(32) not null, credit number(1) not null, ctype varchar2(32) not null, semester number(1) not null);
create table faculty(id varchar2(6) primary key, name varchar2(32) not null, address varchar2(32) not null, basicSalary number(10,2) not null, expertise varchar2(32) not null);
create table departmentIgn(d_no varchar2(6) primary key, d_name varchar2(32) not null unique);
create table guardianIgn(RollNo varchar2(6) primary key, name varchar2(32) not null, address varchar2(32) not null, relation varchar2(32) not null, constraint rnfk foreign key(RollNo) references studentIgn(RollNo));

/* Relationship Tables */
create table programIgn(RollNo varchar2(6) not null, c_code varchar2(6) not null, pname varchar2(32) not null, fees number(10,2) not null, duration varchar2(32) not null, constraint stu_rnfk foreign key(RollNo) references studentIgn(RollNo), constraint C_codefk foreign key(c_code) references courseIgn(c_code));
create table enrollIgn(RollNo varchar2(6) not null, d_no varchar2(6) not null, semester number(1) not null, constraint en_rnfk foreign key(RollNo) references studentIgn(RollNo), constraint en_dnfk foreign key(d_no) references departmentIgn(d_no));
create table taughtbyIgn(c_code varchar2(6) not null, id varchar2(6) not null, constraint taught_ccodefk foreign key(c_code) references courseIgn(c_code), constraint taught_idfk foreign key(id) references faculty(id));
create table worksinIgn(d_no varchar2(6) not null, id varchar2(6) not null, constraint works_dnfk foreign key(d_no) references departmentIgn(d_no), constraint works_idfk foreign key(id) references faculty(id));
create table headofIgn(d_no varchar2(6) not null, id varchar2(6) not null, dateFrom date not null, constraint hod_dnfk foreign key(d_no) references departmentIgn(d_no), constraint hod_idfk foreign key(id) references faculty(id));

/* Data Insertion into Student table */
insert into studentIgn(RollNo,Name,address) values('M2323','Ravindra Manish','Main Road, Porbander,Gujarat');
insert into studentIgn(RollNo,Name,address) values('B9125','Sunil Agarwal','Srinagar');
insert into studentIgn(RollNo,Name,address) values('M4145','Arvind Sekhar','Navi Mumbai');
insert into studentIgn(RollNo,Name,address) values('B8976','Ramesh Sinha','24,Rajendra Nagar, Patna');

/* Data Insertion into Course table*/

insert into courseIgn(c_code,cname,credit,ctype,semester) values('MCS-11','C Programming',4,'Computer Programming',1);
insert into courseIgn(c_code,cname,credit,ctype,semester) values('MCS-23','Introduction to DBMS',4,'Computer Programming',2);
insert into courseIgn(c_code,cname,credit,ctype,semester) values('CS-73','Theory of Computer Science',4,'Computer Programming',5);
insert into courseIgn(c_code,cname,credit,ctype,semester) values('MCS-12','Computer Organisation',4,'Computer Programming',2);

/* Data Insertion into Guardian table */
insert into guardianIgn(RollNo,name,address,relation) values('B8976','Sri. Jagat sinha','24,Rajendra Nagar, Patna','Father');
insert into guardianIgn(RollNo,name,address,relation) values('M2323','Sri. Sohan Singh','Main Road,Porbander','Father');
insert into guardianIgn(RollNo,name,address,relation) values('B9125','Sri. M. K. Agarwal','Srinagar','Uncle');
insert into guardianIgn(RollNo,name,address,relation) values('M4145','Mr. Sashi Sekhar','Navi Mumbai','Father');

/* Data Insertion into Department table */
insert into departmentIgn(d_no,d_name) values('1','MCA');
insert into departmentIgn(d_no,d_name) values('2','BCA');

/* Data Insertion into Faculty table */
insert into faculty(id,name,address,basicSalary,expertise) values('M001','Dr. A. Kumar','GTB Nagar, Mumbai',70000,'C Programming');
insert into faculty(id,name,address,basicSalary,expertise) values('B007','Dr. K. Shekhar','Mahrauli, New Delhi',65000,'Theory of Computer Science');
insert into faculty(id,name,address,basicSalary,expertise) values('M032','Prof. H. Srinivas','T. Nagar, Chennai',85000,'Computer Organisation');
insert into faculty(id,name,address,basicSalary,expertise) values('M011','Mr. Ramakrishna K. Naidu','Science City, Banglore',70000,'DBMS');

/* Data Insertion into Program table*/

insert into programIgn(RollNo,c_code,pname,fees,duration) values('B8976','CS-73','BCA',4000,'3 years');
insert into programIgn(RollNo,c_code,pname,fees,duration) values('B9125','CS-73','BCA',4000,'3 years');
insert into programIgn(RollNo,c_code,pname,fees,duration) values('M2323','MCS-12','MCA',8000,'3 years');
insert into programIgn(RollNo,c_code,pname,fees,duration) values('M4145','MCS-23','MCA',8000,'3 years');

/* Data Insertion into Enroll table*/

insert into enrollIgn(RollNo,d_no,semester) values('B8976','2',5);
insert into enrollIgn(RollNo,d_no,semester) values('B9125','2',5);
insert into enrollIgn(RollNo,d_no,semester) values('M4145','1',2);
insert into enrollIgn(RollNo,d_no,semester) values('M2323','1',1);

/* Data Insertion into Taughtby table*/

insert into taughtbyIgn(c_code,id) values('MCS-11','M001');
insert into taughtbyIgn(c_code,id) values('MCS-23','M011');
insert into taughtbyIgn(c_code,id) values('CS-73','B007');
insert into taughtbyIgn(c_code,id) values('MCS-12','M032');

/* Data Insertion into Works in table*/

insert into worksinIgn(d_no,id) values('1','M001');
insert into worksinIgn(d_no,id) values('1','M011');
insert into worksinIgn(d_no,id) values('2','M032');
insert into worksinIgn(d_no,id) values('2','B007');

/* Data Insertion into Head of Department table */
insert into headofIgn(d_no,id,dateFrom) values('1','M032',to_date('01-jan-2011','DD-MON-YYYY'));
insert into headofIgn(d_no,id,dateFrom) values('2','B007',to_date('01-apr-2011','DD-MON-YYYY'));

/* Search Queries */
/* List Student and guardian details */
select * from studentIgn, guardianIgn where studentIgn.RollNo = guardianIgn.RollNo;
/* List details of all those enrolled in the MCA department */
select * from enrollIgn where d_no = '1';
/* List details of all HODs */
select * from faculty where id in (select id from headofIgn);
/* List all faculty and the courses taught by them */
select faculty.name, courseIgn.cname from faculty, taughtbyIgn, courseIgn where faculty.id = taughtbyIgn.id and taughtbyIgn.c_code = courseIgn.c_code;
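The design can also be tried without an Oracle installation. The following sketch is a substitute under stated assumptions, not the Oracle script: it uses Python's sqlite3 module, replaces varchar2/number with SQLite's TEXT/INTEGER, creates only four of the tables, loads the rows from the insert statements above, and runs two of the search queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FOREIGN KEY clauses only when enabled
conn.executescript("""
CREATE TABLE studentIgn(RollNo TEXT PRIMARY KEY, Name TEXT NOT NULL, address TEXT NOT NULL);
CREATE TABLE guardianIgn(RollNo TEXT PRIMARY KEY REFERENCES studentIgn(RollNo),
                         name TEXT NOT NULL, address TEXT NOT NULL, relation TEXT NOT NULL);
CREATE TABLE departmentIgn(d_no TEXT PRIMARY KEY, d_name TEXT NOT NULL UNIQUE);
CREATE TABLE enrollIgn(RollNo TEXT NOT NULL REFERENCES studentIgn(RollNo),
                       d_no TEXT NOT NULL REFERENCES departmentIgn(d_no),
                       semester INTEGER NOT NULL);
""")

# Rows copied from the insert statements above.
conn.executemany("INSERT INTO studentIgn VALUES (?, ?, ?)", [
    ("M2323", "Ravindra Manish", "Main Road, Porbander,Gujarat"),
    ("B9125", "Sunil Agarwal", "Srinagar"),
    ("M4145", "Arvind Sekhar", "Navi Mumbai"),
    ("B8976", "Ramesh Sinha", "24,Rajendra Nagar, Patna")])
conn.executemany("INSERT INTO guardianIgn VALUES (?, ?, ?, ?)", [
    ("B8976", "Sri. Jagat sinha", "24,Rajendra Nagar, Patna", "Father"),
    ("M2323", "Sri. Sohan Singh", "Main Road,Porbander", "Father"),
    ("B9125", "Sri. M. K. Agarwal", "Srinagar", "Uncle"),
    ("M4145", "Mr. Sashi Sekhar", "Navi Mumbai", "Father")])
conn.executemany("INSERT INTO departmentIgn VALUES (?, ?)", [("1", "MCA"), ("2", "BCA")])
conn.executemany("INSERT INTO enrollIgn VALUES (?, ?, ?)", [
    ("B8976", "2", 5), ("B9125", "2", 5), ("M4145", "1", 2), ("M2323", "1", 1)])
conn.commit()

# List student and guardian details
student_guardian = conn.execute("""
    SELECT s.RollNo, s.Name, g.name, g.relation
    FROM studentIgn s JOIN guardianIgn g ON s.RollNo = g.RollNo
    ORDER BY s.RollNo""").fetchall()

# List details of all those enrolled in the MCA department (d_no is text, so it is quoted)
mca = conn.execute("SELECT RollNo, semester FROM enrollIgn "
                   "WHERE d_no = '1' ORDER BY RollNo").fetchall()

# The foreign key rejects a guardian for a student who does not exist.
try:
    conn.execute("INSERT INTO guardianIgn VALUES ('X0000', 'Nobody', 'Nowhere', 'Father')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

for row in student_guardian:
    print(row)
print(mca)  # [('M2323', 1), ('M4145', 2)]
```

Unlike Oracle, SQLite ignores FOREIGN KEY clauses unless PRAGMA foreign_keys = ON has been issued on the connection.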

Question 4:

20 marks

For the following questions use the Student schema given in Question 1 (iii).

(i) In the student database the relation Marks stores the latest marks of the student. Assume that a student S001, in the subject MCS011 had obtained 40 marks. These marks were to be upgraded to 70 using an updating transaction. What would be the various redo and undo entries in the database log for the update operation? Explain when redo and undo would be required and how they can be performed in the context of the transaction given above. Explain the concept of checkpoint when many such update transactions are being performed.

Sol:

A transaction log is a record in a DBMS that keeps track of all the transactions of a database system that update any data values in the database. A log contains the following information about a transaction:
o A transaction begin marker.
o The transaction identification: the transaction id, terminal id or user id etc.
o The operations being performed by the transaction, such as update, delete, insert.
o The data items or objects that are affected by the transaction, including the name of the table, row number and column number.
o The before or previous values (also called UNDO values) and after or changed values (also called REDO values) of the data items that have been updated.
o A pointer to the next transaction log record, if needed.
o The COMMIT marker of the transaction.

In a database system several transactions run concurrently. When a transaction commits, the data buffers used by it need not be written back to the physical database stored on secondary storage, as these buffers may be used by several other transactions that have not yet committed. On the other hand, some of the data buffers that have updates by several uncommitted transactions might be forced back to the physical database, as they are no longer being used by the database. So the transaction log helps in remembering which transaction made which changes.
Thus the system knows exactly how to separate the changes made by transactions that have already committed from those changes that are made by the transactions that did not yet commit. Any operation such as begin transaction, insert/delete/update and end transaction (commit) adds information to the log containing the transaction identifier and enough information to undo or redo the changes.

Transaction T1 (upgrades the marks of S001 in MCS011):
  Read Marks
  Marks = 70
  Write Marks

Transaction Log:

Transaction Id  Begin Marker  Operation on Marks table   UNDO Value  REDO Value  Commit Marker
T1              Y             Update (S001, MCS011)      40          70          Y

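Recovery using Transaction Log:

Case                      Initial Value  Value just before the failure                            Operation Required for Recovery  Recovered Database Value
Marks, T1 not committed   40             70 (update already written to the physical database)     UNDO                             40
Marks, T1 committed       40             40 (update not yet written to the physical database)     REDO                             70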

The selection of REDO or UNDO for a transaction for the recovery is done on the basis of the state of the transactions. This state is determined in two steps:
o Look into the log file and find all the transactions that have started.
o Find those transactions that have committed. REDO these transactions. All other transactions have not committed, so they should be rolled back; UNDO them.
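The REDO/UNDO rule above can be sketched in a few lines. This assumes an immediate-update scheme in which each update log record carries both the before (UNDO) and after (REDO) value; LogRecord and recover are hypothetical names, and the two calls replay the marks example for S001 in MCS011.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

@dataclass
class LogRecord:
    txn: str
    kind: str                     # "BEGIN", "UPDATE" or "COMMIT"
    item: Optional[Tuple] = None  # e.g. ("Marks", "S001", "MCS011")
    undo: Any = None              # before value
    redo: Any = None              # after value

def recover(database, log):
    """UNDO the updates of uncommitted transactions, then REDO those of committed ones."""
    committed = {r.txn for r in log if r.kind == "COMMIT"}
    for r in reversed(log):       # undo backwards, so the oldest before-value is restored last
        if r.kind == "UPDATE" and r.txn not in committed:
            database[r.item] = r.undo
    for r in log:                 # redo forwards, in the original order
        if r.kind == "UPDATE" and r.txn in committed:
            database[r.item] = r.redo
    return database

MARK = ("Marks", "S001", "MCS011")
full_log = [LogRecord("T1", "BEGIN"),
            LogRecord("T1", "UPDATE", MARK, undo=40, redo=70),
            LogRecord("T1", "COMMIT")]

# Failure after COMMIT was logged, but before 70 reached the disk: REDO.
print(recover({MARK: 40}, full_log)[MARK])      # 70
# Failure before COMMIT, after 70 had already been forced to disk: UNDO.
print(recover({MARK: 70}, full_log[:2])[MARK])  # 40
```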

In the figure above, four transactions are executing concurrently. On encountering a failure at time t2, transactions T1 and T2 are to be REDONE and T3 and T4 UNDONE. But in a system with thousands of parallel transactions, all the transactions that have committed may have to be redone and all uncommitted transactions undone. That is not a very good choice, as it requires redoing even those transactions that might have committed hours earlier. We use checkpoints to counter this issue.

Check Point:

A checkpoint is taken at time t1 and a failure occurs at time t2. A checkpoint transfers all the committed changes to the database and all the system logs to stable storage (storage that would not be lost). At restart time after the failure, the checkpointed state is restored. Thus, we need to REDO or UNDO only those transactions that were active at the checkpoint or started after it. The possible disadvantages of this scheme are that the database is not available while the checkpoint is being taken, and that some uncommitted values may be put in the physical database. To overcome the first problem, checkpoints should be taken at times when the system load is low. To avoid the second problem, some systems allow some time for the ongoing transactions to complete without restarting
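The saving that a checkpoint brings can be illustrated by deciding which transactions need attention at restart. The figure is not reproduced here, so the log below is hypothetical: T0 committed before the checkpoint at t1, T1 and T3 were active at the checkpoint, and T2 and T4 started after it. Only the log after the last checkpoint is scanned; the function name transactions_to_recover is an assumption.

```python
def transactions_to_recover(log):
    """log: records ("BEGIN", txn), ("COMMIT", txn) or ("CHECKPOINT", [active txns]).

    Transactions that committed before the last checkpoint are already on disk
    and are skipped. Returns (redo, undo) as sets of transaction ids.
    """
    last_cp = max((i for i, (kind, _) in enumerate(log) if kind == "CHECKPOINT"),
                  default=-1)
    started = set(log[last_cp][1]) if last_cp >= 0 else set()  # active at the checkpoint
    committed = set()
    for kind, txn in log[last_cp + 1:]:
        if kind == "BEGIN":
            started.add(txn)
        elif kind == "COMMIT":
            committed.add(txn)
    return committed, started - committed

log = [("BEGIN", "T0"), ("COMMIT", "T0"),
       ("BEGIN", "T1"), ("BEGIN", "T3"),
       ("CHECKPOINT", ["T1", "T3"]),        # checkpoint at time t1
       ("COMMIT", "T1"), ("BEGIN", "T2"), ("COMMIT", "T2"),
       ("BEGIN", "T4")]                     # failure at time t2

redo, undo = transactions_to_recover(log)
print(sorted(redo), sorted(undo))  # ['T1', 'T2'] ['T3', 'T4']; T0 is not touched
```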

(ii) Explain the possible security threats to the student database. How will you make the database more secure? You may create an authorisation matrix for the database. Make suitable assumptions.

Sol:

Top Ten Database Security Threats
1. Excessive Privilege Abuse
2. Legitimate Privilege Abuse
3. Privilege Elevation
4. Database Platform Vulnerabilities
5. SQL Injection
6. Weak Audit Trail
7. Denial of Service
8. Database Communication Protocol Vulnerabilities
9. Weak Authentication
10. Backup Data Exposure

Excessive Privilege Abuse


When users (or applications) are granted database access privileges that exceed the requirements of their job function, these privileges may be abused for malicious purpose. For example, a university administrator whose job requires only the ability to change student contact information may take advantage of excessive database update privileges to change grades.

Legitimate Privilege Abuse


Users may also abuse legitimate database privileges for unauthorized purposes. Consider a hypothetical rogue healthcare worker with privileges to view individual patient records via a custom Web application. The structure of the Web application normally limits users to viewing an individual patient's healthcare history; multiple records cannot be viewed simultaneously and electronic copies are not allowed. However, the rogue worker may circumvent these limitations by connecting to the database using an alternative client such as MS-Excel. Using MS-Excel and his legitimate login credentials, the worker may retrieve and save all patient records.

Privilege Elevation

Attackers may take advantage of database platform software vulnerabilities to convert access privileges from those of an ordinary user to those of an administrator. Vulnerabilities may be found in stored procedures, built-in functions, protocol implementations, and even SQL statements. For example, a software developer at a financial institution might take advantage of a vulnerable function to gain the database administrative privilege. With administrative privilege, the rogue developer may turn off audit mechanisms, create bogus accounts, transfer funds, etc.

Database Platform Vulnerabilities

Vulnerabilities in underlying operating systems (Windows 2000, UNIX, etc.) and additional services installed on a database server may lead to unauthorized access, data corruption, or denial of service. The Blaster Worm, for example, took advantage of a Windows 2000 vulnerability to create denial of service conditions.

SQL Injection

In a SQL injection attack, a perpetrator typically inserts (or injects) unauthorized database statements into a vulnerable SQL data channel. Typically targeted data channels include stored procedures and Web application input parameters. These injected statements are then passed to the database where they are executed. Using SQL injection, attackers may gain unrestricted access to an entire database.

Weak Audit Trail

Automated recording of all sensitive and/or unusual database transactions should be part of the foundation underlying any database deployment. A weak database audit policy represents a serious organizational risk on many levels.
o Regulatory Risk - Organizations with weak (or sometimes non-existent) database audit mechanisms will increasingly find that they are at odds with government regulatory requirements. Sarbanes-Oxley (SOX) in the financial services sector and the Health Insurance Portability and Accountability Act (HIPAA) in the healthcare sector are just two examples of government regulation with clear database audit requirements.
o Deterrence - Like video cameras recording the faces of individuals entering a bank, database audit mechanisms deter attackers who know that database audit tracking provides investigators with forensics that link intruders to a crime.
o Detection and Recovery - Audit mechanisms represent the last line of database defense. If an attacker manages to circumvent other defenses, audit data can identify the existence of a violation after the fact. Audit data may then be used to link a violation to a particular user and/or repair the system.
o Lack of User Accountability - When users access the database via Web applications (such as SAP, Oracle E-Business Suite, or PeopleSoft), native audit mechanisms have no awareness of specific user identities. In this case, all user activity is associated with the Web application account name. Therefore, when native audit logs reveal fraudulent database transactions, there is no link to the responsible user.
o Performance Degradation - Native database audit mechanisms are notorious for consuming CPU and disk resources. The performance decline experienced when audit features are enabled forces many organizations to scale back or altogether eliminate auditing.
o Separation of Duties - Users with administrative access (either legitimately or maliciously obtained; see Privilege Elevation) to the database server can simply turn off auditing to hide fraudulent activity. Audit duties should ideally be separate from both database administrators and the database server platform.
o Limited Granularity - Many native audit mechanisms do not record the details necessary to support attack detection, forensics and recovery. For example, the database client application, source IP addresses, query response attributes, and failed queries (an important attack reconnaissance indicator) are not recorded by many native mechanisms.
o Proprietary - Audit mechanisms are unique to each database server platform: Oracle logs are different from MS-SQL logs, MS-SQL logs are different from Sybase logs, etc. For organizations with mixed database environments, this virtually eliminates implementation of uniform, scalable audit processes across the enterprise.

Denial of Service
Denial of Service (DOS) is a general attack category in which access to network applications or data is denied to intended users. Denial of service (DOS) conditions may be created via many techniques, many of which are related to the vulnerabilities mentioned above. For example, DOS may be achieved by taking advantage of a database platform vulnerability to crash a server. Other common DOS techniques include data corruption, network flooding, and server resource overload (memory, CPU, etc.). Resource overload is particularly common in database environments.

Database Communications Protocol Vulnerabilities


A growing number of security vulnerabilities are being identified in the database communication protocols of all database vendors. At the time of writing, four out of seven security fixes in the two most recent IBM DB2 FixPacks addressed protocol vulnerabilities. Similarly, 11 out of 23 database vulnerabilities fixed in the most recent Oracle quarterly patch related to protocols. Fraudulent activity targeting these vulnerabilities can range from unauthorized data access, to data corruption, to denial of service. The SQL Slammer worm, for example, took advantage of a flaw in the Microsoft SQL Server protocol to force denial of service. To make matters worse, no record of these fraud vectors will exist in the native audit trail, since protocol operations are not covered by native database audit mechanisms.

Weak Authentication

Weak authentication schemes allow attackers to assume the identity of legitimate database users by stealing or otherwise obtaining login credentials. An attacker may employ any number of strategies to obtain credentials.
o Brute Force - The attacker repeatedly enters username/password combinations until he finds one that works. The brute force process may involve simple guesswork or systematic enumeration of all possible username/password combinations. Often an attacker will use automated programs to accelerate the brute force process.
o Social Engineering - A scheme in which the attacker takes advantage of the natural human tendency to trust in order to convince others to provide their login credentials. For example, an attacker may present himself via phone as an IT manager and request login credentials for system maintenance purposes.
o Direct Credential Theft - An attacker may steal login credentials by copying post-it notes, password files, etc.

Backup Data Exposure


Backup database storage media is often completely unprotected from attack. As a result, several high-profile security breaches have involved theft of database backup tapes and hard disks.

Steps to make a database more secure:
1. Physical: The site or sites containing the computer system must be physically secured against illegal entry by unauthorised persons.
2. Human: Users are given authorisation in a way that reduces the chance of information leakage and unwanted manipulation.
3. Operating System: Even if foolproof security measures are taken to secure the database system, weaknesses in operating system security may serve as a means of unauthorised access to the database.
4. Network: Since databases allow distributed or remote access through terminals or networks, software-level security within the network software is an important issue.
5. Database system: The data items in a database need a fine level of access control. For example, a user may be allowed to read a data item and to issue queries, but not to modify the data. It is the responsibility of the database system to ensure that these access restrictions are not violated. Creating database views is a very useful mechanism for ensuring database security, since views can be used to restrict read access.

Ensuring database security requires implementing security at all of the levels above. The Database Administrator (DBA) is responsible for implementing the database security policies in a database system; the organisation or data owners create these policies. The DBA creates or cancels user accounts, assigning appropriate security rights to each account, including the power to grant and revoke certain privileges further to other users.

Authorisation

Authorisation is the culmination of the administrative policies of the organisation. As the name specifies, authorisation is a set of rules that can be used to determine which user has what type of access to which portion of the database.
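A minimal sketch of restricting read access with a view, using Python's built-in sqlite3 module as a stand-in DBMS. The table contents are hypothetical. SQLite has no GRANT statement, so the privilege step is only described in a comment; in a multi-user DBMS the DBA would grant SELECT on the view rather than on the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (st_id TEXT PRIMARY KEY, st_name TEXT, st_programme TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("S1", "Asha", "MCA"), ("S2", "Ravi", "BCA")],  # hypothetical rows
)

# The view exposes only st_id and st_name; st_programme stays hidden.
# In a multi-user DBMS the DBA would then run, for example:
#   GRANT SELECT ON student_public TO guest;   (not supported by SQLite)
conn.execute("CREATE VIEW student_public AS SELECT st_id, st_name FROM student")

rows = conn.execute("SELECT * FROM student_public ORDER BY st_id").fetchall()
columns = [d[0] for d in conn.execute("SELECT * FROM student_public").description]
print(columns, rows)  # ['st_id', 'st_name'] [('S1', 'Asha'), ('S2', 'Ravi')]

# SQLite views are read-only unless INSTEAD OF triggers are defined,
# so an attempt to write through the view is rejected.
try:
    conn.execute("INSERT INTO student_public VALUES ('S3', 'Meena')")
    write_rejected = False
except sqlite3.OperationalError:
    write_rejected = True
print("write rejected:", write_rejected)  # write rejected: True
```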
The following forms of authorisation are permitted on database items:
1) READ: allows reading of a data object, but not modification, deletion or insertion of the data object.
2) INSERT: allows insertion of new data, but not modification of existing data, e.g., insertion of a tuple in a relation.
3) UPDATE: allows modification of data, but not its deletion. Items like primary key attributes may not be modified.
4) DELETE: allows deletion of data only.

A user may be assigned all, none or a combination of these types of authorisation, which are broadly called access authorisations. Consider a student database with the following relations:

Student (st_ID, st_name, st_programme)
Subject (su_ID, su_name, su_credits)
Marks (st_ID, su_ID, ma_marks)

A sample authorisation matrix for the student database (rows are subjects, i.e. users; columns are objects, i.e. attributes; "All" means all four access authorisations):

Subject \ Object  st_ID  st_name  st_programme    su_ID  su_name  su_credits      ma_marks
Manager           Read   All      All             Read   All      All             All
General User      Read   Read     Read            Read   Read     Read            Read, Insert
Guest             Read   Read     Not Accessible  Read   Read     Not Accessible  Not Accessible
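An authorisation matrix can be held as a lookup table that the DBMS consults before every operation. The Python sketch below encodes the sample matrix as read here, treating "All" as all four access authorisations and "Not Accessible" as none; the names MATRIX and is_authorised are illustrative, not part of any particular DBMS.

```python
READ, INSERT, UPDATE, DELETE = "READ", "INSERT", "UPDATE", "DELETE"
ALL = frozenset({READ, INSERT, UPDATE, DELETE})
R = frozenset({READ})
NONE = frozenset()  # "Not Accessible"

# subject -> object (attribute) -> set of permitted operations
MATRIX = {
    "Manager": {"st_ID": R, "st_name": ALL, "st_programme": ALL,
                "su_ID": R, "su_name": ALL, "su_credits": ALL, "ma_marks": ALL},
    "General User": {"st_ID": R, "st_name": R, "st_programme": R,
                     "su_ID": R, "su_name": R, "su_credits": R,
                     "ma_marks": frozenset({READ, INSERT})},
    "Guest": {"st_ID": R, "st_name": R, "st_programme": NONE,
              "su_ID": R, "su_name": R, "su_credits": NONE, "ma_marks": NONE},
}


def is_authorised(subject: str, obj: str, operation: str) -> bool:
    """Deny by default: unknown subjects or objects get no access."""
    return operation in MATRIX.get(subject, {}).get(obj, NONE)


print(is_authorised("Manager", "st_name", UPDATE))        # True
print(is_authorised("Manager", "st_ID", UPDATE))          # False: key attributes are read-only
print(is_authorised("General User", "ma_marks", INSERT))  # True
print(is_authorised("Guest", "ma_marks", READ))           # False
```

Denying by default means a subject or attribute missing from the matrix gets no access, which matches the cautious stance of the authorisation rules.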

(ii)

Sol:

Assume that the student schema is to be implemented as a distributed database. The student data needs to be kept as follows: a. Each study centre stores data of all those students who belong to that centre. b. The complete data of all the students is to be stored at the University headquarters. Write SQL queries that will fragment the data as per the need. Also explain whether the proposed fragmentation is vertical or horizontal, and whether any data replication is required.

Assuming the Student relation carries a study-centre code sc_id, each study centre's fragment is defined by a selection on that code, for example for study centre sc001:

SQL> SELECT * FROM student WHERE sc_id = 'sc001';

This is horizontal fragmentation, which groups together the tuples of a relation that are collectively used by the important transactions (here, the tuples of the students of one study centre). A horizontal fragment is produced by specifying a WHERE clause condition that performs a restriction on the tuples in the relation. It can also be defined using the selection operation of relational algebra, σ sc_id = 'sc001' (Student).

Data replication is required: each study centre's fragment is also held at the headquarters, which keeps the complete student relation. This model therefore uses selective replication, where only some data is replicated at the study centres while the rest remains at the head office. Selective replication combines creating small fragments of a relation and replicating them, rather than replicating the whole relation. The data should be fragmented according to the needs of the various sites and the frequency of use; otherwise it is kept at a centralised site. The objective of this strategy is to gain the advantages of the other approaches (full replication and no replication) without their disadvantages. It is the most commonly used strategy, as it provides flexibility.
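The fragmentation and the headquarters copy can be tried out with Python's sqlite3 module, with one database standing in for all sites. This is a sketch under assumptions: the Student relation is given an sc_id column, the rows are invented, and each fragment is materialised as a table named student_<sc_id>. The final check confirms that the fragments are complete, i.e. their union reconstructs the full relation held at headquarters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Full relation, as kept at the University headquarters (assumed sc_id column).
conn.execute(
    "CREATE TABLE student (st_id TEXT PRIMARY KEY, st_name TEXT, "
    "st_programme TEXT, sc_id TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [("S1", "Asha", "MCA", "sc001"),      # hypothetical rows
     ("S2", "Ravi", "BCA", "sc002"),
     ("S3", "Meena", "MCA", "sc001")],
)

centres = [r[0] for r in conn.execute("SELECT DISTINCT sc_id FROM student ORDER BY sc_id")]
for c in centres:
    # Table names cannot be bound as parameters, so the name is built from the
    # centre code (assumed to be a safe identifier such as 'sc001').
    conn.execute(f"CREATE TABLE student_{c} AS SELECT * FROM student WHERE 0")
    # Horizontal fragment: the selection sc_id = c on the full relation.
    conn.execute(f"INSERT INTO student_{c} SELECT * FROM student WHERE sc_id = ?", (c,))

sizes = {c: conn.execute(f"SELECT COUNT(*) FROM student_{c}").fetchone()[0] for c in centres}
print(sizes)  # {'sc001': 2, 'sc002': 1}

# Reconstruction: the union of all fragments equals the HQ relation.
union_sql = " UNION ALL ".join(f"SELECT * FROM student_{c}" for c in centres)
rebuilt = sorted(conn.execute(union_sql).fetchall())
full = sorted(conn.execute("SELECT * FROM student").fetchall())
print(rebuilt == full)  # True
```

Every row now exists twice, once in its centre's fragment and once in the headquarters relation, which is exactly the selective replication described above.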

(iii)

Sol:

You are asked to design a two-tier client server system for the student database. What features/functions will be made available on the client side and what on the server side? How will you distribute the functions/responsibilities of the student database in the 3-tier model?

Structure of Client Server Systems

In client/server architecture, clients represent users who need services while servers provide services. Both client and server are a combination of hardware and software. Servers are separate logical objects that communicate with clients over a network to perform tasks together. A client makes a request for a service and receives a reply to that request. A server receives and processes a request, and sends back the required response. Client/server systems may follow one of two architectures: 2-tier and 3-tier.

2-Tier and 3-Tier Client/Server Architectures

Every client/server application contains three functional units:

1. Presentation logic, which provides the human/machine interaction (the user interface). The presentation layer handles input from the keyboard, mouse, or other input devices and provides output in the form of screen displays. For example, the ATM machine of a bank provides such interfaces.
2. Business logic, which is the functionality provided to an application program. For example, software that enables a customer to operate on the balance of his/her account with the bank is business logic. It includes rules for withdrawal, minimum balance, etc. It is called business logic because it contains the business rules that drive a given enterprise.
3. Database services: the bottom layer provides the generalised services needed by the other layers, including file services, print services, communications services and database services. One example of such a service may be to provide the records of customer accounts.

2-Tier Client/Server Models

Initial two-tier (client/server) applications were developed to access large databases available on the server side, and incorporated the rules used to manipulate the data, together with the user interface, into the client application. The primary task of the server was simply to process as many requests for data storage and retrieval as possible. Two-tier client/server provides the user system interface, usually in a desktop environment, to its users. The database management services are usually on the server, which is a more powerful machine that services many clients. Thus, 2-tier client/server architecture splits the processing between the user system interface environment and the database management server. The database management server also provides stored procedures and triggers. In 2-tier client/server applications, the business logic is put inside the user interface on the client, or within the database on the server in the form of stored procedures. This results in division of the business logic between the client and server. There are a number of software vendors that provide tools to simplify the development of applications for the two-tier client/server architecture. File servers and database servers with stored procedures are examples of 2-tier architecture.

The two-tier client/server architecture is a good solution for distributed computing. Please note the use of the words distributed computing and not distributed databases: a 2-tier client server system may have a centralised database management system, or a distributed database system, at the server or servers. A group of clients on a LAN can consist of a dozen to 100 people interacting simultaneously.

A two-tier client server system does have a number of limitations. When the number of users exceeds 100, performance begins to deteriorate. This limitation is a result of the server maintaining a connection via communication messages with each client, even when no work is being done. A second limitation is that implementing processing management services using vendor-proprietary database procedures restricts flexibility and the choice of DBMS for applications. The two-tier architecture also provides limited flexibility in moving program functionality from one server to another without manually regenerating procedural code.

Some of the major functions performed by the client of a two-tier application are: present a user interface, gather and process user input, perform the requested processing, and report the status of the request. This sequence of commands can be repeated as many times as necessary. Because servers provide only access to the data, the client uses its local resources to perform most of the processing. The client application must contain information about where the data resides and how it is organised in the server database. Once the data has been retrieved, the client is responsible for formatting and displaying it to the user.

Features to be made available on the client side:
1. User login
2. Data entry using front-end forms
3. Searching and sorting
4. Report generation
5. Presentation logic
6. Business logic
7. Database access (SQL)

Features to be made available on the server side:
1. Master database
2. Integrity checks on data entry
3. Backup and restore
4. User accounts
5. Table creation/modification
6. Business logic in the form of stored procedures
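A minimal two-tier sketch in Python, assuming an in-memory SQLite database as a stand-in for the database server. The client function holds presentation, business logic and SQL, as in the client-side list, while the server enforces an integrity check on data entry. SQLite has no stored procedures, so a CHECK constraint plays the server-side role here; the pass mark of 40 and all names are hypothetical.

```python
import sqlite3

# Server side: master data plus an integrity check on data entry.
server = sqlite3.connect(":memory:")
server.execute(
    """CREATE TABLE marks (
           st_id TEXT,
           su_id TEXT,
           ma_marks INTEGER CHECK (ma_marks BETWEEN 0 AND 100),
           PRIMARY KEY (st_id, su_id))"""
)


def client_enter_marks(conn, st_id, su_id, raw_value):
    """Client side: gathers input, applies business logic, sends SQL directly."""
    marks = int(raw_value)  # process user input
    # Hypothetical business rule coded in the client.
    status = "recorded" if marks >= 40 else "recorded (below pass mark)"
    try:
        conn.execute("INSERT INTO marks VALUES (?, ?, ?)", (st_id, su_id, marks))
        conn.commit()
    except sqlite3.IntegrityError as exc:  # server-side integrity check fired
        status = f"rejected: {exc}"
    return status  # report the status of the request


print(client_enter_marks(server, "S1", "MCS-023", "75"))   # recorded
print(client_enter_marks(server, "S2", "MCS-023", "30"))   # recorded (below pass mark)
print(client_enter_marks(server, "S3", "MCS-023", "120"))  # rejected by the CHECK constraint
```

Because the pass-mark rule lives in the client, changing it means redeploying every client, which is the maintenance problem the 3-tier model addresses.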

3-Tier Architecture

As the number of clients increases, the server becomes flooded with client requests. Also, because much of the processing logic was tied to applications, changes in business rules led to expensive and time-consuming alterations to source code. Although the ease and flexibility of two-tier architecture tools continue to drive many small-scale business applications, the need for faster data access and rapid development and maintenance timelines has persuaded systems developers to seek out a new way of creating distributed applications.
The three-tier architecture (also referred to as the multi-tier architecture) emerged to overcome the limitations of the two-tier architecture. In the three-tier architecture, a middle tier is added between the user system interface client environment and the database management server environment. The middle tier may consist of transaction processing monitors, message servers, or application servers. The middle tier can perform queuing, application execution, and database staging. For example, with a middle tier that provides queuing, the client can deliver its request to the middle layer and simply disconnect, because the middle tier will access the data and return the answer to the client. In addition, the middle layer adds scheduling and prioritisation for the various tasks currently being performed. The three-tier client/server architecture improves performance for client groups with a large number of users (in the thousands) and improves flexibility compared to the two-tier approach. Flexibility here means that an application can simply be moved on to different computers; in three-tier architecture this has become almost as simple as a drag-and-drop operation. Recently, mainframes have found a new use as servers in three-tier architectures. In 3-tier client/server applications, the business logic resides in the middle tier, separate from the data and user interface. In this way, processes can be managed and deployed separately from the user interface and the database. Also, 3-tier systems can integrate data from multiple sources.
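The queuing and prioritisation role of the middle tier can be sketched with Python's heapq module. This is an illustrative model, not a transaction-processing monitor: the class name, the priority numbers (lower runs first) and the handler that stands in for database access are all assumptions.

```python
import heapq
import itertools


class QueuingMiddleTier:
    """Illustrative middle tier that queues requests and runs them by priority."""

    def __init__(self):
        self._queue = []                   # heap of (priority, ticket, request)
        self._tickets = itertools.count()
        self._results = {}                 # ticket -> answer awaiting collection

    def submit(self, request, priority=5):
        """Client hands over a request and may then disconnect, keeping only a ticket."""
        ticket = next(self._tickets)       # increasing ticket keeps FIFO order per priority
        heapq.heappush(self._queue, (priority, ticket, request))
        return ticket

    def run(self, handler):
        """Process every queued request, lowest priority number first."""
        processed = []
        while self._queue:
            _, ticket, request = heapq.heappop(self._queue)
            self._results[ticket] = handler(request)  # handler stands in for data access
            processed.append(request)
        return processed

    def collect(self, ticket):
        """Client reconnects later to fetch its answer (None if not available)."""
        return self._results.pop(ticket, None)


mt = QueuingMiddleTier()
report = mt.submit("monthly report", priority=9)
mt.submit("fetch marks of S1", priority=1)
mt.submit("fetch marks of S2", priority=1)
order = mt.run(lambda request: f"done: {request}")
print(order)   # ['fetch marks of S1', 'fetch marks of S2', 'monthly report']
answer = mt.collect(report)
print(answer)  # done: monthly report
```

The ticket doubles as a tie-breaker, so requests of equal priority are served in arrival order and the request objects themselves never need to be compared.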
3-tier client/server supports:
1. Multiple operating systems
2. One or more programming languages
3. Local and remote databases
4. Inter-application communications through the network
5. Message routing

Distribution of functions/responsibilities of the student database in the 3-tier model:

Client (running on multiple operating systems):
1. Data entry using forms
2. Presentation logic

Middleware:
1. Business logic
2. Data access (SQL)

Server:
1. Master database
2. Backup/restore
3. Integrity constraints
4. Table creation/modification
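The distribution of responsibilities across the three tiers can be sketched as three Python layers, again with an in-memory SQLite database standing in for the server. Only the middle tier issues SQL; the client sees form fields and messages. The class names and the rule that a programme must be MCA or BCA are hypothetical.

```python
import sqlite3


class DataTier:
    """Server: master database, table creation and integrity constraints."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE student (st_id TEXT PRIMARY KEY, "
            "st_name TEXT NOT NULL, st_programme TEXT NOT NULL)"
        )


class StudentService:
    """Middleware: business logic and all data access (SQL)."""

    VALID_PROGRAMMES = {"MCA", "BCA"}  # hypothetical business rule

    def __init__(self, data_tier):
        self._conn = data_tier.conn

    def register(self, st_id, name, programme):
        if programme not in self.VALID_PROGRAMMES:
            raise ValueError(f"unknown programme {programme!r}")
        with self._conn:  # commit on success, roll back on error
            self._conn.execute(
                "INSERT INTO student VALUES (?, ?, ?)", (st_id, name, programme)
            )

    def list_students(self):
        return self._conn.execute(
            "SELECT st_id, st_name, st_programme FROM student ORDER BY st_id"
        ).fetchall()


def client_form(service, form):
    """Client: collects form fields and turns the reply into a message for display."""
    try:
        service.register(form["id"], form["name"], form["programme"])
        return "Saved"
    except (ValueError, sqlite3.IntegrityError) as exc:
        return f"Error: {exc}"


service = StudentService(DataTier())
print(client_form(service, {"id": "S1", "name": "Asha", "programme": "MCA"}))  # Saved
print(client_form(service, {"id": "S2", "name": "Ravi", "programme": "MBA"}))  # Error: unknown programme 'MBA'
print(service.list_students())  # [('S1', 'Asha', 'MCA')]
```

Changing the programme rule now touches only StudentService; neither the client forms nor the database schema need to be redeployed.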
