
Assignment Set - 1

(MI0034)

25

Q.1 Differentiate between a Traditional File System and a Modern Database System. Describe the properties of a database and the advantages of a database.
Differences between Traditional File system and Modern Database
Traditional File System Traditional File system is the system that was followed before the advent of DBMS i.e., it is the older way. In Traditional file processing, data definition is part of the application program and works with only specific application. File systems are Design Driven; they require design/coding change when new kind of data occurs. E.g.: In a traditional employee the master file has Emp_name, Emp_id, Emp_addr, Emp_design, Emp_dept, Emp_sal, if we want to insert one more column Emp_Mob number then it requires a complete restructuring of the file or redesign of the application code, even though basically all the data except that in one column is the same. Traditional File system keeps redundant [duplicate] information in many locations. This might result in the loss of Data Consistency. For e.g.: Employee names might exist in separate files like Payroll Master File and also in Employee Benefit Master File etc. Now if an employee changes his or her last name, the name might be changed in the pay roll master file but not be changed in Employee Benefit Master File etc. This might result in the loss of Data Consistency. In a File system data is scattered in various files, and each of these files may be in different formats, making it difficult to write new application programs to retrieve the appropriate data. Security features are to be coded in the application Program itself Modern Database Management Systems This is the Modern way which has replaced the older concept of File system. --> Data definition is part of the DBMS --> Application is independent and can be used with any application. --> One extra column (Attribute) can be added without any difficulty --> Minor coding changes in the Application program may be required.

Redundancy is eliminated to the maximum extent in DBMS if properly defined.

This problem is completely solved here.

Coding for security requirements is not needed, as most of them are taken care of by the DBMS.


The following are the important properties of a database:

1. A database is a logical collection of data having some implicit meaning. If the data are not related, then it is not a proper database. E.g. a student's record:

Stud_name   Class     Rank obtained
Rakhi       Class V   2nd

2. A database consists of both data and a description of the database structure and constraints. E.g.:

Field Name   Type           Description
Stud_name    Character      It is the student's name
Class        Alphanumeric   It is the class of the student

3. A database can be of any size and of varying complexity. If we consider the employee database above, the name and address of each employee may consist of very few fields, each with a simple structure. E.g.:

Emp_name   Emp_id   Emp_addr                                Emp_desig           Emp_sal
Prasad     100      Shubhodaya, Near Katariguppe Big        Project Leader      40000
                    Bazaar, BSK II stage, Bangalore
Usha       101      #165, 4th main, Chamrajpet, Bangalore   Software engineer   10000
Nupur      102      #12, Manipal Towers, Bangalore          Lecturer            30000
Peter      103      Syndicate house, Manipal                IT executive        15000

Like this there may be any number of records.


4. The DBMS is general-purpose software that facilitates the process of defining, constructing and manipulating databases for various applications.

5. A database provides insulation between programs and data (data abstraction). Data abstraction lets programs work with a logical view of the data regardless of how the data is physically structured.

6. The data in the database is used by a variety of users for a variety of purposes. E.g.: in a hospital database management system, the patient's view of the patient database is different from the doctor's view of the same data. The data appear to be stored separately for the different users, but in fact they are stored in a single database. This property is known as multiple views of the database.

7. A multi-user DBMS must allow the data to be shared by multiple users simultaneously. For this purpose the DBMS includes concurrency control software to ensure that updates made to the database by several users at the same time are applied correctly. This property supports multi-user transaction processing.

Advantages of a Database
1. Redundancy is reduced.
2. Data located on a server can be shared by clients.
3. Integrity (accuracy) can be maintained.
4. Security features protect the data from unauthorized access.
5. A modern DBMS supports internet-based applications.
6. In a DBMS the application program and the structure of the data are independent.
7. Consistency of data is maintained.
8. A DBMS supports multiple views: since a DBMS has many users, each of whom may use it for a different purpose, each user may need to view and manipulate only a portion of the database, depending on requirements.


Q.2 What is the disadvantage of sequential file organization? How do you overcome it? What are the advantages & disadvantages of Dynamic Hashing?
The disadvantage of sequential file organization is that we must use a linear search or a binary search to locate the desired record, which results in more I/O operations and a number of unnecessary comparisons. In the hashing technique, or direct file organization, the key value is converted into an address by performing some arithmetic manipulation on the key value, which provides very fast access to records.

Let us consider a hash function h that maps the key value k to the value h(k). The value h(k) is used as an address. The basic terms associated with hashing techniques are:

1) Hash table: an array that holds the addresses of records.
2) Hash function: the transformation of a key into the corresponding location or address in the hash table (it can be defined as a function that takes a key as input and transforms it into a hash table index).
3) Hash key: if R is a record, its key hashes into a value called the hash key.

Internal Hashing
For internal files, the hash table is an array of records, with indexes in the range 0 to M-1. Let us consider a hash function H(K) such that

H(K) = K mod M, which produces a remainder between 0 and M-1 depending on the value of the key. This value is then used as the record address. The problem with most hash functions is that they do not guarantee that distinct key values will hash to distinct addresses; a collision occurs when two non-identical keys are hashed to the same location. For example, assume two non-identical keys k1 = 342 and k2 = 352 and a simple hashing function:

h(k) = k mod 10

Here h(k) produces a bucket address. To insert a record we first compute the hash of its key: h(k1) = 342 mod 10 = 2, so the record with key value 342 is placed at location 2. The record with key value 352 produces the same hash address, i.e. h(k1) = h(k2). When we try to place the second record at the location where the record with key k1 is already stored, a collision occurs. The process of finding another position is called collision resolution. There are numerous methods of collision resolution:

1) Open addressing: we resolve the hash clash by inserting the record in the next available free or empty location in the table.
2) Chaining: various overflow locations are kept; a pointer field is added to each record and set to the address of the overflow location.

External Hashing for Disk Files
Hashing for disk files is called external hashing. Disk storage is divided into buckets, each of which holds multiple records. A bucket is either one disk block or a cluster of contiguous blocks. The hashing function maps a key to a relative bucket number, and a table maintained in the file header converts the bucket number into the corresponding disk block address. The collision problem is less severe with buckets, because many records can fit in the same bucket.
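The chaining approach described above can be sketched in Python. This is an illustrative toy, not the author's code; M = 10 and the keys 342 and 352 come from the example, and both hash to slot 2, so they end up in the same chain:

```python
# Minimal sketch of hashing with collision resolution by chaining.
# M and the sample keys follow the example above; records are (key, payload).
M = 10

def h(k):
    """Hash function: key mod M gives a bucket address in 0..M-1."""
    return k % M

# The hash table is an array of M buckets; each bucket is a chain (list).
table = [[] for _ in range(M)]

def insert(key, payload):
    table[h(key)].append((key, payload))  # a collision just extends the chain

def lookup(key):
    for k, payload in table[h(key)]:      # scan only one chain, not the whole file
        if k == key:
            return payload
    return None

insert(342, "record A")
insert(352, "record B")   # h(352) == h(342) == 2: a collision, resolved by chaining
print(h(342), h(352))     # 2 2
print(lookup(352))        # record B
```

Open addressing would instead probe forward from slot 2 until a free slot is found; chaining is shown here because it maps directly onto the bucket-with-overflow-pointer scheme described for disk files.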
When a bucket is filled to capacity and we try to insert a new record into it, a collision occurs. However, we can maintain a pointer in each bucket to address overflow records. The hashing scheme described so far is called static hashing, because a fixed number of buckets M is allocated. This can be a serious drawback for dynamic files. If M is the number of buckets and m is the maximum number of records that can fit in one bucket, then at most m*M records will fit in the allocated space. If there are more than m*M records, overflows will occur and retrieval will be slowed down.

Dynamic Hashing Technique
A major drawback of static hashing is that the address space is fixed, so it is difficult to expand or shrink the file dynamically. In dynamic hashing, the access structure is built on the binary representation of the hash value. Here the number of buckets is not fixed [as in regular hashing] but grows or diminishes as needed. The file can start with a single bucket; once that bucket is full and a new record is inserted, the bucket overflows and is split into two buckets. The records are distributed among the two buckets based on the value of the first [leftmost] bit of their hash values: records whose hash values start with a 0 bit are stored in one bucket, and those whose hash values start with a 1 bit are stored in the other. At this point, a binary tree structure called a directory is built. The directory has two types of nodes:

1. Internal nodes: guide the search; each has a left pointer corresponding to a 0 bit and a right pointer corresponding to a 1 bit.
2. Leaf nodes: each leaf node holds a bucket address.

If a bucket overflows, for example because a new record inserted into the bucket for records whose hash values start with 10 causes an overflow, then all records whose hash value starts with 100 are placed in the first split bucket, and the second bucket contains those whose hash value starts with 101. The levels of the binary tree can be expanded dynamically.

Extendible Hashing
In extendible hashing the stored file has a directory (also called an index table or hash table) associated with it.
The index table consists of a header containing a value d called the global depth of the table, and a list of 2^d pointers [pointers to data blocks]. Here d is the number of leftmost bits currently being used to address the index table. The leftmost d bits of a key, when interpreted as a number, give the bucket address in which the desired records are stored. Each bucket also has a header giving the local depth d1 of that bucket, which specifies the number of bits on which the bucket contents are based. Suppose d = 3 and the first pointer in the table [the 000 pointer] points to a bucket whose local depth d1 is 2. A local depth of 2 means that the bucket contains all the records whose search keys start with 00, so both the 000 and 001 pointers lead to it [because the first two bits are 00].

To insert a record with search key value k: if there is room in the bucket, we insert the record in the bucket. If the bucket is full, we must split the bucket and redistribute the current records plus the new one. To illustrate the operation of insertion, consider the following account file. We assume that a bucket can hold only two records.

Branch-name   Account
Bangalore     100
Mysore        200
Mysore        300
Mangalore     400
Hassan        500
Hassan        600
Hassan        700

Hash function for branch name:

Branch-name   H(branch-name)
Bangalore     0010 1101
Mysore        1010 0011
Mangalore     1100 0001
Hassan        1111 0001

Let us insert the record (Bangalore, 100). The hash table (address table) contains a pointer to the one bucket, and the record is inserted. The second record, (Mysore, 200), is also placed in the same bucket (the bucket size is 2). When we attempt to insert the next record, (Mysore, 300), the bucket is full. We need to increase the number of bits that we use from the hash value, i.e. d = 1, giving 2^1 = 2 buckets. This increases the number of entries in the hash address table. Now the hash table contains two entries, i.e. it points to two buckets. The first bucket contains the records whose search key has a hash value beginning with 0, and the second bucket contains the records whose search key has a hash value beginning with 1. The local depth of each bucket is now 1. Next we insert (Mangalore, 400). Since the first bit of h(Mangalore) is 1, the new record should be placed in the second bucket, but we find that the bucket is full. We increase the number of bits used from the hash to 2 (d = 2). This increases the number of entries in the hash table to 4 (2^2 = 4). The records of the overflowing bucket are redistributed between two buckets. Since the bucket that has prefix 0 was not split, hash prefixes 00 and 01 both point to it.

Next, the (Hassan, 500) record is inserted into the same bucket as Mangalore. The next insertion, (Hassan, 600), results in a bucket overflow, which causes an increase in the number of bits (the global depth d increases by 1, i.e. d = 3) and thus increases the hash table entries (the hash table now has 2^3 = 8 entries). The records of the overflowing bucket are redistributed between two buckets; the first contains all records whose hash value starts with 110, and the second all those whose hash value starts with 111.

Advantages of dynamic hashing:
1. The main advantage is that splitting causes minor reorganization, since only the records in one bucket are redistributed to the two new buckets.
2. The space overhead of the directory table is negligible.
3. The main advantage of extendible hashing is that performance does not degrade as the file grows. The main space saving is that no buckets need to be reserved for future growth; buckets can be allocated dynamically.

Disadvantages of dynamic hashing:
1. The index table can grow rapidly and become too large to fit in main memory. When part of the index table is stored on secondary storage, extra accesses are required.
2. The directory must be searched before accessing the bucket itself, resulting in two block accesses instead of one as in static hashing.
3. Extendible hashing involves an additional level of indirection.
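The directory-doubling behaviour walked through above can be sketched in Python. This is an illustrative toy (bucket capacity 2, hash values given as fixed-width bit strings taken from the branch-name table), not production code or the textbook's algorithm verbatim:

```python
# Toy sketch of extendible hashing: a directory of 2^d pointers indexed by
# the leftmost d bits of the hash, and buckets of capacity 2.
BUCKET_SIZE = 2

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth   # number of hash bits this bucket is based on
        self.records = []                # list of (hash_bits, key) pairs

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _index(self, hash_bits):
        # The leftmost global_depth bits give the directory slot.
        return int(hash_bits[:self.global_depth], 2)

    def insert(self, hash_bits, key):
        bucket = self.directory[self._index(hash_bits)]
        if len(bucket.records) < BUCKET_SIZE:
            bucket.records.append((hash_bits, key))
            return
        # Overflow: double the directory if this bucket is already at global depth.
        if bucket.local_depth == self.global_depth:
            self.global_depth += 1
            self.directory = [self.directory[i >> 1]
                              for i in range(2 ** self.global_depth)]
        # Split the bucket on one more bit and re-point half of its directory slots.
        bucket.local_depth += 1
        new_bucket = Bucket(bucket.local_depth)
        mask = 1 << (self.global_depth - bucket.local_depth)
        for i in range(2 ** self.global_depth):
            if self.directory[i] is bucket and i & mask:
                self.directory[i] = new_bucket
        # Redistribute the old records plus the new one (recursion handles cascades).
        pending, bucket.records = bucket.records + [(hash_bits, key)], []
        for hb, k in pending:
            self.insert(hb, k)

eh = ExtendibleHash()
for hash_bits, record in [("0010", "Bangalore-100"), ("1010", "Mysore-200"),
                          ("1010", "Mysore-300"), ("1100", "Mangalore-400"),
                          ("1111", "Hassan-500"), ("1111", "Hassan-600")]:
    eh.insert(hash_bits, record)

print(eh.global_depth)                                      # 3, as in the walk-through
print([k for _, k in eh.directory[int("110", 2)].records])  # ['Mangalore-400']
print([k for _, k in eh.directory[int("111", 2)].records])  # ['Hassan-500', 'Hassan-600']
```

After the six insertions the global depth has grown to 3, with the 110 bucket holding Mangalore and the 111 bucket holding the two Hassan records, matching the walk-through above. (A seventh record with hash 1111 would need an overflow chain, since splitting can never separate identical hash values.)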

Q. 3 What is a relationship type? Explain the difference among a relationship instance, a relationship type and a relationship set.
A relationship type is a meaningful association among entity types. A relationship is an association of entities where the association includes one entity from each participating entity type. Each uniquely identifiable occurrence of a relationship type is referred to as a relationship. A relationship indicates the particular entities that are related to each other in some way.

In the real world, items have relationships to one another. E.g.: a book is published by a particular publisher. The association or relationship that exists between the entities relates data items to each other in a meaningful way. A relationship is an association between entities. A collection of relationships of the same type is called a relationship set. A relationship type R is a set of associations between entity types E1, E2, ..., En; mathematically, R is a set of relationship instances ri. E.g.: consider a relationship type WORKS_FOR between two entity types, employee and department, which associates each employee with the department the employee works for. Each relationship instance ri in WORKS_FOR associates one employee entity and one department entity that participate in ri. For example, employees e1, e3 and e6 work for department d1, e2 and e4 work for d2, and e5 and e7 work for d3. The relationship type R is the set of all relationship instances.

Whenever we want to form a relationship between two objects, we must use a relationship type. A relationship type defines the roles that an object can play in a relationship. For example, placing a contains relationship type between two objects creates a hierarchical relationship between them: one object plays a child role, and the other plays a parent role. The Information Catalog Center contains a set of predefined relationship types that are ready for you to use in your organization. These relationship types are already associated with the predefined object types in the Information Catalog Center. Each relationship type is based on a category, which determines the roles that object types can play in it. You can create your own relationship types, but you must select a predefined category, which determines the roles used within each new relationship type.

Degree of relationship type: The number of entity sets that participate in a relationship set. A unary relationship exists when an association is maintained with a single entity.

A binary relationship exists when two entities are associated.

A ternary relationship exists when three entities are associated.

Role Names and Recursive Relationships
Each entity type that participates in a relationship type plays a particular role in the relationship. The role name signifies the role that a participating entity from the entity type plays in each relationship instance. E.g.: in the WORKS_FOR relationship type, the employee plays the role of employee or worker and the department plays the role of department or employer. However, in some cases the same entity type participates more than once in a relationship type, in different roles. Such relationship types are called recursive. E.g.: the employee entity type participates twice in SUPERVISION, once in the role of supervisor and once in the role of supervisee.

Constraints on Relationship Types

Relationship types usually have certain constraints that limit the possible combinations of entities that may participate in relationship instances. E.g.: a company may have a rule that each employee must work for exactly one department. The two main types of constraints are cardinality ratios and participation constraints. The cardinality ratio specifies the number of entities to which another entity can be associated through a relationship set. A mapping cardinality must be one of the following.

One-to-One: An entity in A is associated with at most one entity in B, and vice versa.

An employee can manage only one department, and a department has only one manager.

One-to-Many: An entity in A is associated with any number of entities in B. An entity in B, however, can be associated with at most one entity in A.

Each department can be related to numerous employees, but an employee can be related to only one department.

Many-to-One: An entity in A is associated with at most one entity in B. An entity in B, however, can be associated with any number of entities in A. E.g.: many depositors deposit into a single account.

Many-to-Many: An entity in A is associated with any number of entities in B, and an entity in B is associated with any number of entities in A.


An employee can work on several projects, and several employees can work on a project.

Participation Constraints: There are two ways an entity can participate in a relationship.

1. Total: The participation of an entity set E in a relationship set R is said to be total if every entity in E participates in at least one relationship in R. E.g.: every employee must work for a department, so the participation of employee in WORKS_FOR is total.

Total participation is sometimes called existence dependency. 2. Partial: If only some entities in E participate in relationship in R, the participation of entity set E in relationship R is said to be partial.

We do not expect every employee to manage a department, so the participation of employee in the MANAGES relationship type is partial.
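The cardinality and participation constraints above can be expressed directly as SQL table definitions. The sketch below (table and column names are illustrative, run here through SQLite's in-memory engine) shows total participation in WORKS_FOR as a NOT NULL foreign key and the one-to-one, partial MANAGES relationship as a separate table with a UNIQUE foreign key:

```python
# Sketch: the constraints above as SQL DDL, checked with SQLite.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dname TEXT)")

# One-to-many WORKS_FOR with total participation of employee:
# the NOT NULL foreign key forces every employee to work for exactly one department.
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        ename   TEXT,
        dept_no INTEGER NOT NULL REFERENCES department(dept_no)
    )""")

# One-to-one MANAGES with partial participation of employee:
# UNIQUE means a department has at most one manager, and departments
# without a manager simply have no row here.
conn.execute("""
    CREATE TABLE manages (
        dept_no INTEGER PRIMARY KEY REFERENCES department(dept_no),
        emp_id  INTEGER UNIQUE NOT NULL REFERENCES employee(emp_id)
    )""")

conn.execute("INSERT INTO department VALUES (1, 'Accounts')")
conn.execute("INSERT INTO employee VALUES (100, 'Prasad', 1)")

# Violating total participation (an employee with no department) is rejected:
error = None
try:
    conn.execute("INSERT INTO employee VALUES (101, 'Usha', NULL)")
except sqlite3.IntegrityError as e:
    error = e
    print("rejected:", e)
```

A many-to-many relationship such as WORKS_ON would instead get its own table with a composite key (emp_id, project_id), since neither side is limited to one partner.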

Weak Entity: Some entity types may not have any key attributes of their own; these are called weak entity types. An entity type that has a primary key is termed a strong entity type. A weak entity type always has total participation [existence dependence] with respect to a strong entity, i.e. a weak entity type is dependent on the existence of another entity. Weak entities are also referred to as child, dependent or subordinate entities, and strong entities as parent, owner or dominant entities. E.g.: in the following relationship, PARENT is a weak entity as it needs the entity EMPLOYEE for its existence; the entities EMPLOYEE, COMPANY etc. are strong entities. Weak entities are represented by a double-lined rectangle.

Q. 4 What is SQL? Discuss.


SQL, the Structured Query Language, is the language used for programming the database. The history of SQL began in an IBM laboratory in San Jose, California, where SQL was developed in the late 1970s. It is a non-procedural language, meaning that SQL describes what data to retrieve, delete or insert, rather than how to perform the operation. It is the standard command set used to communicate with an RDBMS. An SQL query is not necessarily a question to the database; it can be a command to do one of the following:

Create or delete a table.
Insert, modify or delete rows.
Search several rows for specified information and return the results in order.
Modify security information.

SQL statements can be grouped into the following categories:
1. DDL (Data Definition Language)
2. DML (Data Manipulation Language)
3. DCL (Data Control Language)
4. TCL (Transaction Control Language)

DDL (Data Definition Language): DDL statements provide commands for defining relation schemas, i.e. for creating tables, indexes, sequences etc., and commands for dropping, altering and renaming objects.

DML (Data Manipulation Language): DML statements are used to alter the database tables in some way. The UPDATE, INSERT and DELETE statements alter existing rows in a database table, insert new records into a database table, or remove one or more records from a database table.

DCL (Data Control Language): DCL statements are used to grant permissions to a user, revoke permissions from a user, and lock certain permissions for a user.

SQL DBA> Revoke Import from Akash;
SQL DBA> Grant all on EMP to public;
SQL DBA> Grant select, update on EMP to L.Suresh;
SQL DBA> Grant all on EMP to Akash with Grant option;

Revoke: Revoke takes a privilege away from one or more tables or views.

SQL DBA> Revoke update, delete from L.Suresh;
SQL DBA> Revoke all on EMP from Akash;

TCL (Transaction Control Language): TCL is used to control transactions.

E.g.: Commit. Rollback: discards/cancels the changes up to the previous commit point.

SQL is used to communicate with a database. According to ANSI (American National Standards Institute), it is the standard language for relational database management systems. SQL statements are used to perform tasks such as updating data in a database or retrieving data from a database. Some common relational database management systems that use SQL are Oracle, Sybase, Microsoft SQL Server, Access, Ingres, etc. Although most database systems use SQL, most of them also have their own additional proprietary extensions that are usually only used on their system. However, the standard SQL commands such as "Select", "Insert", "Update", "Delete", "Create" and "Drop" can be used to accomplish almost everything that one needs to do with a database. This tutorial will provide you with instruction on the basics of each of these commands as well as allow you to put them to practice using the SQL Interpreter.

A relational database system contains one or more objects called tables. The data or information for the database is stored in these tables. Tables are uniquely identified by their names and are comprised of columns and rows. Columns contain the column name, data type, and any other attributes for the column. Rows contain the records or data for the columns. Here is a sample table called "weather"; city, state, high and low are the columns, and the rows contain the data for this table:

Weather
city          state        high   low
Phoenix       Arizona      105    90
Tucson        Arizona      101    92
Flagstaff     Arizona      88     69
San Diego     California   77     60
Albuquerque   New Mexico   80     72

The select statement is used to query the database and retrieve selected data that match the criteria that you specify. Here is the format of a simple select statement:

select "column1" [,"column2", etc.]
from "tablename"
[where "condition"];

[] = optional

The column names that follow the select keyword determine which columns will be returned in the results. You can select as many column names as you'd like, or you can use a "*" to select all columns. The table name that follows the keyword from specifies the table that will be queried to retrieve the desired results. The where clause (optional) specifies which data values or rows will be returned or displayed, based on the criteria described after the keyword where. Conditional selections used in the where clause:

=      Equal
>      Greater than
<      Less than
>=     Greater than or equal
<=     Less than or equal
<>     Not equal to
LIKE   Pattern match

Structured Query Language (SQL) is a specialized language for updating, deleting, and requesting information from databases. SQL is an ANSI and ISO standard, and is the de facto standard database query language. A variety of established database products support SQL, including products from Oracle and Microsoft SQL Server. It is widely used in both industry and academia, often for enormous, complex databases.

In a distributed database system, a program often referred to as the database's "back end" runs constantly on a server, interpreting data files on the server as a standard relational database. Programs on client computers allow users to manipulate that data, using tables, columns, rows, and fields. To do this, client programs send SQL statements to the server. The server then processes these statements and returns replies to the client program.

Examples

To illustrate, consider a simple SQL command, SELECT. SELECT retrieves a set of data from the database according to some criteria, using the syntax:

SELECT list_of_column_names
from list_of_relation_names
where conditional_expression_that_identifies_specific_rows

The list_of_relation_names may be one or more comma-separated table names or an expression operating on whole tables. The conditional_expression will contain assertions about the values of individual columns within individual rows in a table, and only those rows meeting the assertions will be selected. Conditional expressions within SQL are very similar to conditional expressions found in most programming languages.

For example, to retrieve from a table called Customers all columns (designated by the asterisk) with a value of Smith for the column Last_Name, a client program would prepare and send this SQL statement to the server back end:

SELECT * FROM Customers WHERE Last_Name='Smith';

The server back end may then reply with data such as this:

+---------+-----------+------------+
| Cust_No | Last_Name | First_Name |
+---------+-----------+------------+
| 1001    | Smith     | John       |
| 2039    | Smith     | David      |
| 2098    | Smith     | Matthew    |
+---------+-----------+------------+
3 rows in set (0.05 sec)

The following SQL command displays only two columns, column_name_1 and column_name_3, from the table myTable:

SELECT column_name_1, column_name_3 from myTable

Below is a SELECT statement displaying all the columns of the table myTable2 for each row whose column_name_3 value includes the string "brain":

SELECT * from myTable2 where column_name_3 like '%brain%'
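The select syntax above can be tried end-to-end against the "weather" sample table using SQLite's in-memory engine (any SQL database would behave the same way for these queries):

```python
# The weather sample table and a select with a where clause, run in SQLite.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (city TEXT, state TEXT, high INTEGER, low INTEGER)")
conn.executemany("INSERT INTO weather VALUES (?, ?, ?, ?)", [
    ("Phoenix", "Arizona", 105, 90),
    ("Tucson", "Arizona", 101, 92),
    ("Flagstaff", "Arizona", 88, 69),
    ("San Diego", "California", 77, 60),
    ("Albuquerque", "New Mexico", 80, 72),
])

# select "column1" [,"column2"] from "tablename" [where "condition"];
rows = conn.execute(
    "SELECT city, high FROM weather WHERE state = 'Arizona' AND high >= 100"
).fetchall()
print(rows)  # [('Phoenix', 105), ('Tucson', 101)]
```

Only the two requested columns come back, and only the rows matching the where clause; dropping the where clause would return all five rows.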

SQL, short for Structured Query Language, is pronounced "ess-cue-el" and is a simple non-procedural language that lets you store and retrieve data in a relational database. This is a quick introduction to SQL.

Two Classes of SQL
SQL falls into two classes:
1. Data Manipulation Language (DML) - SQL for retrieving and storing data.
2. Data Definition Language (DDL) - SQL for creating, altering and dropping tables.
Most of the time, the SQL you use is for manipulating the data, but occasionally you'll need to create new tables, alter existing ones or add an index. One of the best things about SQL is that you can do all of these operations with just simple SQL commands.

Flavors of SQL
The standards in use today are ANSI-89 and ANSI-92, though three more have been released since (1999, 2003 and 2006). It's the flavor supported by the database server you're using that matters; all modern servers support ANSI-92. Each database publisher has its own slightly different version of SQL. If you use proprietary SQL features, your SQL becomes non-portable and needs rewriting if you move to a different database server.

Third-Party Tools
Often the tools you use, such as IDEs for designing and running SQL, provide table design, creation and management. Here are a few I've used:
EMS MySQL Manager
SQLYog
SQL Server Enterprise Manager
DbArtisan
MySQLAdmin
Most are standalone applications, but the last one is an open source web application.

Comments in SQL
Use two dashes to make the rest of the line a comment:
-- Don't mangle the furdwinder!
Most SQL dialects support C-style comments as well:
/* Like this */

Data Storage
Data is stored in tables made up of individual rows of data. Each row has the same number and type of columns, defined when you created the table. Database data is held in each row, much like fields or members in a struct or class. A typical payroll record might have these columns:

EmployeeID int

EmployeeName varchar(30)
EmployeeGradeID int -- a number indicating some level
EmployeeDOB datetime
TotalGrossPayYTD float -- (YTD means Year To Date)
TotalTaxDeductedYTD float
TotalGrossPayM float -- (M for Month)
TotalTaxDeductedM float
AnnualSalary float
TaxBand varchar(8) -- special string
DateLastPaid datetime

There would probably be other administrative columns, such as the date of the last payroll run. You use data definition SQL to create this table with the Create Table command. Another table might hold the employee's details in the firm, such as the department they work in and the total days of vacation allowed (and taken). These aren't relevant to the payroll, so they wouldn't be in that table, but it would probably have the EmployeeID and EmployeeName columns. These are needed for indexing.

Indexes
Tables can have millions of rows, but usually only a subset of those rows is needed to work on. This is where the concept of an index comes from. The database designer tells the database server to add an index to a particular column. With 100,000 rows in the Payroll table, fetching the row for employee id 78965 would require a lot of reads to find that particular row without an index. With an index, the server reads the index table, finds where that row is held and then fetches it - much faster!

The Four Main SQL Statements
1. Select - fetches data from one or more tables
2. Insert - inserts a row of data into a table
3. Update - changes a value in one or more rows
4. Delete - deletes one or more rows of data from a table

This is the SQL to fetch an employee record from the payroll table:

select * from Payroll where EmployeeId = 78965

The * means fetch all columns. You could also fetch just a couple of columns, with this query for all employees in tax band 'XYZ':

select EmployeeID, AnnualSalary from Payroll where TaxBand = 'XYZ'

If you leave the where clause off then you get all rows. The select statement lets you fetch any combination of columns and rows from one (or more) tables.
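The effect of an index described above can be made visible with SQLite's EXPLAIN QUERY PLAN, which reports whether a query will scan the whole table or search via an index (the table and index names here are illustrative, and the exact plan wording varies between SQLite versions):

```python
# Sketch: showing a full table scan turning into an index search.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Payroll (EmployeeID INTEGER, AnnualSalary REAL)")
conn.executemany("INSERT INTO Payroll VALUES (?, ?)",
                 [(i, 20000.0) for i in range(10000)])

query = "SELECT * FROM Payroll WHERE EmployeeID = 78965"

# Without an index the engine must examine every row...
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[-1]
print(plan_before)   # e.g. "SCAN Payroll"

conn.execute("CREATE INDEX idx_emp ON Payroll (EmployeeID)")

# ...with one it looks the row up in the index and fetches it directly.
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[-1]
print(plan_after)    # e.g. "SEARCH Payroll USING INDEX idx_emp (EmployeeID=?)"
```

The trade-off, not shown here, is that every insert, update or delete must also maintain the index, so indexes are added only to columns that are frequently searched on.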

Inserting a New Row
Insert adds a row. You specify which columns you wish to add and then provide values for them.

insert into Payroll ( EmployeeID, EmployeeName, EmployeeGrade, EmployeeDOB, TotalGrossPayYTD, TotalTaxDeductedYTD, TotalGrossPayM, TotalTaxDeductedM, AnnualSalary, TaxBand, DateLastPaid )
values (4567, 'David Bolton', 3, '1958-09-18', 0.0, 0.0, 0.0, 0.0, 80000, 'ABG', NULL)

There has to be one value for each column listed. If any column is excluded then it has to have a default value or the value Null (see shortly).

Updating Rows
Updating lets you modify one or more columns in one or more rows.

update Payroll
set DateLastPaid = GetDate(), -- a built-in function that returns today's date
    TotalGrossPayYTD = TotalGrossPayYTD + (AnnualSalary/12),
    TotalGrossPayM = AnnualSalary/12
where TaxBand = 'XYZ'

This uses the where clause to select the rows where TaxBand = 'XYZ', then sets the DateLastPaid, TotalGrossPayYTD and TotalGrossPayM columns for each of those rows. In practice, payroll is a lot more complicated than this!

Deleting Rows
Delete uses the where clause to specify which rows you wish to remove. If you leave it off, you delete all rows!

delete Payroll where EmployeeID = 4567 -- I got fired!

What is NULL?
NULL is a special value that means no data exists. If a database table is a bit like a spreadsheet, then a null value is an empty cell. If you do much with SQL you'll come across Null values.

Joins
The power of SQL really comes into its own when you use join. This lets you retrieve data from two or more tables that have related columns. For example, say we had a grade table with two columns,

1. GradeID int
2. GradeDescription varchar(20)

that has this data:

GradeID  GradeDescription
0        Graduate
1        Programmer
2        Analyst
3        Architect
4        System Designer

Then we could do a select like this:

select EmployeeName, GradeDescription
from Payroll, Grade
where EmployeeGradeID = GradeID and EmployeeID = 4567
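Running that join through sqlite3 shows how rows from the two tables are matched on the grade columns (the grade data follows the example above; the employee row is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Payroll (EmployeeID INT, EmployeeName VARCHAR(30), "
            "EmployeeGradeID INT)")
cur.execute("CREATE TABLE Grade (GradeID INT, GradeDescription VARCHAR(20))")
cur.executemany("INSERT INTO Grade VALUES (?, ?)",
                [(0, 'Graduate'), (1, 'Programmer'), (2, 'Analyst'),
                 (3, 'Architect'), (4, 'System Designer')])
cur.execute("INSERT INTO Payroll VALUES (4567, 'David Bolton', 3)")

# The join: Payroll rows and Grade rows are combined where
# EmployeeGradeID equals GradeID.
cur.execute("""
    SELECT EmployeeName, GradeDescription
    FROM Payroll, Grade
    WHERE EmployeeGradeID = GradeID AND EmployeeID = 4567
""")
result = cur.fetchone()
print(result)  # ('David Bolton', 'Architect')
```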

Q. 5 What is Normalization? Discuss various types of Normal Forms?


Normalization is the process of designing database structures to store data well, because any application ultimately depends on its data structures. If the data structures are poorly designed, the application starts from a poor foundation, and a lot more work is needed to create a useful and efficient application.

Normalization is the formal process for deciding which attributes should be grouped together in a relation. It serves as a tool for validating and improving the logical design, so that the logical design avoids unnecessary duplication of data, i.e. it eliminates redundancy and promotes integrity. In the normalization process we analyze and decompose complex relations into smaller, simpler and well-structured relations.

Database normalization is a technique for designing relational database tables to minimize duplication of information and, in so doing, to safeguard the database against certain types of logical or structural problems, namely data anomalies. For example, when multiple instances of a given piece of information occur in a table, the possibility exists that these instances will not be kept consistent when the data within the table is updated, leading to a loss of data integrity. A table that is sufficiently normalized is less vulnerable to problems of this kind, because its structure reflects the basic assumptions for when multiple instances of the same information should be represented by a single instance only.

Types of Normal Forms

The normal forms below are based on primary keys.

First Normal Form (1NF)

A relation schema R is in first normal form if every attribute of R takes only single atomic values. We can also define it as the intersection of each row and column containing one and only one value. To transform an unnormalized table (a table that contains one or more repeating groups) to first normal form, we identify and remove the repeating groups within the table. E.g.:

Dept
D.Name  D.No  D.location
R&D     5     {England, London, Delhi}
HRD     4     {Bangalore}

Consider the figure: each department can have a number of locations. This is not in first normal form because D.location is not an atomic attribute; the domain of D.location contains multiple values. The technique to achieve first normal form is to remove the attribute D.location that violates it and place it into a separate relation, Dept_location.
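A tiny illustration of this 1NF decomposition in Python, using the department numbers and locations from the example above:

```python
# The unnormalized Dept data: D.location holds a *set* of values,
# which violates 1NF.
dept = {"R&D": {"no": 5, "locations": ["England", "London", "Delhi"]},
        "HRD": {"no": 4, "locations": ["Bangalore"]}}

# To reach 1NF, move the multivalued attribute into its own relation,
# Dept_location, with one atomic value per row.
dept_rows = [(info["no"], name) for name, info in dept.items()]
dept_location_rows = [(info["no"], loc)
                      for name, info in dept.items()
                      for loc in info["locations"]]

print(sorted(dept_rows))
# [(4, 'HRD'), (5, 'R&D')]
print(sorted(dept_location_rows))
# [(4, 'Bangalore'), (5, 'Delhi'), (5, 'England'), (5, 'London')]
```

Every cell in both resulting relations now holds exactly one value, so both are in 1NF.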

Functional dependency: The concept of functional dependency was introduced by Prof. Codd in 1970 during the emergence of the definitions of the three normal forms. A functional dependency is a constraint between two sets of attributes in a relation from a database. Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y in R (X -> Y) if and only if each value of X is associated with exactly one value of Y. X is called the determinant set and Y the dependent attribute.

Second Normal Form (2NF)

Second normal form is based on the concept of full functional dependency. A relation is in second normal form if every non-prime attribute A in R is fully functionally dependent on the primary key of R.

(Figure: (a) Normalizing EMP_PROJ into 2NF relations)
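The definition of functional dependency can be checked mechanically: X -> Y holds in a relation state if no two tuples agree on X but differ on Y. A small sketch, with EMP_PROJ-style rows invented for illustration (the key is (SSN, Pnumber); Ename depends on SSN alone, making it only partially dependent on the key, so the relation is not in 2NF):

```python
# X -> Y holds iff no two rows agree on X but differ on Y.
def holds(rows, x_cols, y_cols):
    seen = {}
    for row in rows:
        x = tuple(row[c] for c in x_cols)
        y = tuple(row[c] for c in y_cols)
        if x in seen and seen[x] != y:
            return False  # same X, different Y: dependency violated
        seen[x] = y
    return True

rows = [
    {"SSN": 1, "Pnumber": 10, "Hours": 32, "Ename": "Smith"},
    {"SSN": 1, "Pnumber": 20, "Hours": 8,  "Ename": "Smith"},
    {"SSN": 2, "Pnumber": 10, "Hours": 40, "Ename": "Wong"},
]
print(holds(rows, ["SSN"], ["Ename"]))            # True: SSN -> Ename (partial)
print(holds(rows, ["SSN", "Pnumber"], ["Hours"])) # True: full dependency on the key
print(holds(rows, ["Pnumber"], ["Hours"]))        # False: 10 maps to both 32 and 40
```

Strictly, a functional dependency is a constraint on all possible relation states, not just one sample, so a check like this can only refute a dependency, never prove it.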


(Figure: (b) Normalizing EMP_DEPT into 3NF relations)

A partial functional dependency is a functional dependency in which one or more non-key attributes are functionally dependent on part of the primary key. It creates redundancy in the relation, which results in anomalies when the table is updated.

Third Normal Form (3NF)

This is based on the concept of transitive dependency. We should design relational schemas in such a way that there are no transitive dependencies, because they lead to update anomalies. A functional dependency X -> Y in a relation schema R is a transitive dependency if there is a set of attributes Z such that X -> Z and Z -> Y hold, where Z is neither a key nor a subset (part) of any key. For example, the dependency SSN -> Dmgr is transitive through Dnum in the Emp_dept relation, because SSN -> Dnum and Dnum -> Dmgr hold and Dnum is neither a key nor a subset of the key.


According to Codd's definition, a relation schema R is in 3NF if it satisfies 2NF and no non-prime attribute is transitively dependent on the primary key. The Emp_dept relation is not in 3NF; we can normalize it by decomposing it into E1 and E2.

Boyce-Codd Normal Form (BCNF)

Database relations are designed so that they have neither partial dependencies nor transitive dependencies, because these types of dependencies result in update anomalies. A functional dependency describes the relationship between attributes in a relation. For example, if A and B are attributes in relation R, B is functionally dependent on A (A -> B) if each value of A is associated with exactly one value of B. The left-hand side and the right-hand side of a functional dependency are sometimes called the determinant and the dependent, respectively. A relation is in BCNF if and only if every determinant is a candidate key.

The difference between third normal form and BCNF is that, for a functional dependency A -> B, third normal form allows the dependency in a relation if B is a primary-key attribute and A is not a candidate key, whereas in BCNF A must be a candidate key. Therefore BCNF is a stronger form of third normal form.
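The decomposition of Emp_dept into E1 and E2 mentioned above can be sketched as follows (the employee and manager names are invented for illustration):

```python
# Emp_dept rows: SSN -> Dnum and Dnum -> Dmgr, so SSN -> Dmgr is transitive.
emp_dept = [
    (101, "Smith", 5, "Wong"),    # (SSN, Ename, Dnum, Dmgr)
    (102, "Jones", 5, "Wong"),    # "Wong" repeated: redundancy
    (103, "Patel", 4, "Zelaya"),
]

# Decompose into E1(SSN, Ename, Dnum) and E2(Dnum, Dmgr).
e1 = [(ssn, ename, dnum) for ssn, ename, dnum, _ in emp_dept]
e2 = sorted({(dnum, dmgr) for _, _, dnum, dmgr in emp_dept})
print(e2)  # [(4, 'Zelaya'), (5, 'Wong')] -- each manager stored once

# The original relation can be recovered with a natural join on Dnum,
# so the decomposition is lossless.
rejoined = sorted((ssn, en, dn, dm) for ssn, en, dn in e1
                  for dn2, dm in e2 if dn == dn2)
print(rejoined == sorted(emp_dept))  # True
```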

The PRODUCT schema is in BCNF, since prd# is a candidate key; similarly, the CUSTOMER schema is also in BCNF. The ORDER schema, however, is not in BCNF, because ord# is not a superkey for ORDER, i.e. we could have a pair of tuples representing a single ord#.

For example, the tuples (1234, 145, 13, 789) and (1234, 123, 53, 455) show that ord# is not a candidate key. However, the FD ord# -> amt is not trivial; therefore ORDER does not satisfy the definition of BCNF. It suffers from the problem of repetition of information. This redundancy can be eliminated by decomposing ORDER into ORDER1 and ORDER2:

ORDER1 (ord#, cust#)
ORDER2 (prd#, qty, amt)

Fourth Normal Form (4NF)

Multivalued dependencies are based on the concept of first normal form, which prohibits attributes having a set of values. If we have two or more multivalued independent attributes in the same relation, we get into a situation where we have to repeat every value of one of the attributes with every value of the other attributes, to keep the relation state consistent and to maintain independence among the attributes involved. This constraint is specified by a multivalued dependency.

Normalization using join dependencies

Join dependency: The 5NF is also called "Project-Join Normal Form". It is important to note that normalization into 5NF is considered very rarely in practice. Definition: a relation R is in 5NF if, for every join dependency JD(R1, R2, ..., Rn) that holds over R, at least one of the following holds: the join dependency is trivial, or every Ri is a candidate key for R.

Q. 6 What do you mean by Shared Lock & Exclusive lock? Describe briefly two phase locking protocol?
A lock is a restriction on access to data in a multi-user environment. It prevents multiple users from changing the same data simultaneously. If locking is not used, data within the database may become logically incorrect and may produce unexpected results.

Shared Locks

Shared locks are used for read-only operations, i.e. operations that do not change or update data, such as the SELECT statement. Shared locks allow concurrent transactions to read (SELECT) the same data. No other transaction can modify the data while shared locks exist on it. Shared locks are released as soon as the data has been read.

Exclusive Locks

Exclusive locks are used for data modification operations, such as UPDATE, DELETE and INSERT. An exclusive lock ensures that multiple updates cannot be made to the same resource simultaneously. No other transaction can read or modify data locked by an exclusive lock. Exclusive locks are held until the transaction commits or rolls back, since they are used for write operations.

There are three locking operations: read_lock(X), write_lock(X) and unlock(X). A lock associated with an item X, LOCK(X), has three possible states: "read-locked", "write-locked" or "unlocked". A read-locked item is also called share-locked, because other transactions are allowed to read the item, whereas a write-locked item is called exclusive-locked, because a single transaction exclusively holds the lock on the item. Each record in the lock table has four fields: <data item name, LOCK, no_of_reads, locking_transaction(s)>. The value (state) of LOCK is either read-locked or write-locked.

The Two-Phase Locking Protocol

The two-phase locking protocol is a process by which transactions access shared resources in a controlled way; it guarantees serializable schedules (although, by itself, it does not rule out deadlocks, as discussed below). The protocol consists of two phases:

1. Growing phase: In this phase the transaction may acquire locks, but may not release any locks. This phase is therefore also called the resource acquisition phase.

2. Shrinking phase: In this phase the transaction may release locks, but may not acquire any new locks. The modification of data and the release of locks are grouped together to form this second phase.

In the beginning, the transaction is in the growing phase: whenever a lock is needed, the transaction acquires it.
As soon as the transaction releases its first lock, it enters the shrinking phase and can no longer make new lock requests.
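A minimal sketch of the two phases using Python threads. This is an illustration, not a real lock manager; the transactions, items and fixed (sorted) locking order are all invented for the example, and the sorted order has the side benefit of preventing deadlock:

```python
import threading

locks = {"A": threading.Lock(), "B": threading.Lock()}
log = []

def transaction(items):
    # Growing phase: acquire every needed lock before releasing any.
    # Sorting fixes the acquisition order, so the two threads cannot
    # each hold one lock while waiting for the other.
    for name in sorted(items):
        locks[name].acquire()
        log.append(f"lock {name}")
    # ... read/write the locked items here ...
    # Shrinking phase: release locks; no new locks may be acquired now.
    for name in sorted(items):
        locks[name].release()
        log.append(f"unlock {name}")

t1 = threading.Thread(target=transaction, args=(["A", "B"],))
t2 = threading.Thread(target=transaction, args=(["B", "A"],))
t1.start(); t2.start()
t1.join(); t2.join()
print(log)  # 8 entries; within each thread, all locks precede all unlocks
```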

Strict Two-Phase Locking

In the two-phase locking protocol, cascading rollbacks are not avoided. To avoid them, a slight modification is made to two-phase locking, called strict two-phase locking. In this variant, all the locks acquired by a transaction are held until the transaction commits.

Deadlock and Starvation

In a deadlock state there exists a set of transactions in which every transaction in the set is waiting for another transaction in the set. Suppose there exists a set of waiting transactions {T1, T2, T3, ..., Tn} such that T1 is waiting for a data item held by T2, T2 for T3, and so on, and Tn is waiting for T1. In this state none of the transactions can make progress.
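One common way a DBMS detects the circular waiting just described is to maintain a wait-for graph and look for a cycle. A small illustrative sketch (the transaction names are hypothetical):

```python
# Wait-for graph: an edge T1 -> T2 means T1 is waiting for a lock held by T2.
# A cycle in this graph means deadlock.
def has_cycle(graph):
    visited, on_stack = set(), set()
    def dfs(node):
        visited.add(node); on_stack.add(node)
        for nxt in graph.get(node, []):
            if nxt in on_stack or (nxt not in visited and dfs(nxt)):
                return True
        on_stack.discard(node)
        return False
    return any(dfs(n) for n in graph if n not in visited)

# T1 waits for T2, T2 for T3, T3 for T1: a deadlock.
print(has_cycle({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}))  # True
# T1 waits for T2 only: no cycle, the transactions can proceed in turn.
print(has_cycle({"T1": ["T2"], "T2": []}))  # False
```

When a cycle is found, the DBMS typically breaks the deadlock by rolling back one of the transactions in the cycle (the "victim").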

