
Relational Database Design

The design of a relational database may be performed using two approaches: (a) bottom-up or (b) top-down. A bottom-up design methodology considers the basic relationships among individual attributes as the starting point and uses those to build up relations. This approach is not very popular in practice because it requires collecting a large number of attribute relationships. It is also called design by synthesis. The top-down design methodology starts with a number of groupings of attributes into relations that have already been obtained from conceptual design and mapping activities. The relations are then analyzed individually and collectively, leading to further decomposition of relations. This approach is also called design by analysis.

There are four informal guidelines that help the designer arrive at a quality relation schema design. These guidelines are listed here and explained in detail in the subsections below:

a. Semantics of the attributes
b. Reducing the redundant values in tuples and update anomalies
c. Reducing the null values in tuples
d. Disallowing the possibility of generating spurious tuples

4.1.1 Design Guidelines

Guideline 1 Semantics of the attributes: Design a relation schema so that it is easy to explain its meaning. Do not combine attributes from multiple entity types and relationship types into a single relation. If a relation schema corresponds to one entity type or one relationship type, its meaning and use become very clear. Otherwise, the relation corresponds to a mixture of multiple entities and relationships and hence becomes semantically unclear.

For example, consider the real-world entities EMPLOYEE and DEPARTMENT, where each employee works in a department. It is logically possible to combine the attributes of EMPLOYEE and DEPARTMENT into a single relation called EMP_DEPT, with the attributes DEPT_ID, DEPT_NAME and DEPT_MGR of the DEPARTMENT entity added to each employee tuple. However, EMP_DEPT is not considered a good design, as it violates guideline 1 by mixing the attributes of two real-world entities. A relation such as EMP_DEPT may be created as a view, but it causes problems when used as a base relation.

Guideline 2 Reducing the redundant values in tuples and update anomalies: Design the base relation schemas so that no insertion, deletion, or modification anomalies are present in the relations. If any anomalies are present, note them clearly and make sure that the programs that update the database will operate correctly.

Again, consider the EMP_DEPT relation discussed above. Assume there are 5 employees in department 1. The department information (id, name and manager) will be repeated in all the tuples corresponding to these 5 employees in the EMP_DEPT relation. This means that a lot of redundant information is stored in EMP_DEPT, which wastes storage space. The schema of EMP_DEPT also suffers from update anomalies. These can be classified into insertion anomalies, deletion anomalies and modification anomalies.

Insertion Anomalies: Whenever a new employee tuple is inserted into EMP_DEPT, the user needs to provide either the information about the department in which the employee works or nulls (if the employee is not assigned to any department yet). If the user enters the department information for a new employee, it must be entered so that it is consistent with the values for the same department in other tuples of EMP_DEPT. Otherwise, the data in the EMP_DEPT relation will become inconsistent. Also, it is not possible to insert into EMP_DEPT a new department that has no employees yet. Inserting a tuple with values for the department attributes only and nulls for the employee attributes may not be possible because of the primary key constraint on the employee id. Even if the primary key constraint did not exist and we inserted such a tuple, it would no longer be needed once the first employee is assigned to the department.

12/6/2011

Deletion Anomalies: If we delete from EMP_DEPT an employee tuple that happens to represent the last employee working for a particular department, the information about that department is lost from the database.

Modification Anomalies: In EMP_DEPT, if we change the value of one of the attributes of a particular department (say, the manager of department 1), we must update the tuples of all employees who work in that department; otherwise, the database will become inconsistent. If we fail to update some tuples, the same department will be shown to have two different values for manager in different employee tuples, which should not be the case.

Guideline 3 Reducing the null values in tuples: As far as possible, avoid placing attributes in a base relation whose values may frequently be null. If nulls are unavoidable, make sure that they apply in exceptional cases only and do not apply to a majority of tuples in the relation.

If many of the attributes do not apply to all tuples in the relation, we end up with many nulls in those tuples. This can waste storage space and may also lead to problems when specifying JOIN operations with other relations (because nulls produce different results for inner and outer joins, which can confuse ordinary users). Another problem with nulls is how to account for them when aggregate operations such as COUNT or SUM are applied. Moreover, nulls can have multiple interpretations: (a) the attribute does not apply to this tuple; (b) the attribute value for this tuple is unknown; (c) the value is known but absent, that is, it has not been recorded yet.

For example, if only 10 percent of employees have individual offices, there is little justification for including an attribute OFFICE_NUMBER in the EMPLOYEE relation; rather, a relation EMP_OFFICES(EMP_ID, OFFICE_NUMBER) can be created to include tuples for only the employees with individual offices.

Guideline 4 Disallowing the possibility of generating spurious tuples: Design relation schemas so that they can be JOINed with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated. Do not have relations that contain matching attributes other than foreign key-primary key combinations. If such relations are unavoidable, do not join them on such attributes, because the join may produce spurious tuples.

For example, consider the relation EMP_PROJECT, which stores the information regarding all projects and the staff members working on each project.

EMP_PROJECT
EMP_ID  PROJ_CODE  EMP_NAME   PROJ_NAME    HOURS  PROJ_LOCATION
1       1          Raju A     Project ABC  3      Kochi
1       2          Raju A     Project XYZ  5      Trivandrum
2       1          Kavitha K  Project ABC  4      Kochi
2       2          Kavitha K  Project XYZ  4      Trivandrum

Suppose it has been decided to move the employee's project location into a separate relation called EMP_LOCATION containing the attributes EMP_NAME and PROJ_LOCATION, resulting in the two relations given below.

EMP_PROJECT_1
EMP_ID  PROJ_CODE  PROJ_NAME    HOURS  PROJ_LOCATION
1       1          Project ABC  3      Kochi
1       2          Project XYZ  5      Trivandrum
2       1          Project ABC  4      Kochi
2       2          Project XYZ  4      Trivandrum

EMP_LOCATION
EMP_NAME   PROJ_LOCATION
Raju A     Kochi
Raju A     Trivandrum
Kavitha K  Kochi
Kavitha K  Trivandrum

If the two tables above have to be joined, this can be done only through the common attribute PROJ_LOCATION, which is neither a primary key nor a foreign key. The tuples resulting from the JOIN are given below:


EMP_ID  PROJ_CODE  PROJ_NAME    HOURS  PROJ_LOCATION  EMP_NAME
1       1          Project ABC  3      Kochi          Raju A
1       1          Project ABC  3      Kochi          Kavitha K  *
1       2          Project XYZ  5      Trivandrum     Raju A
1       2          Project XYZ  5      Trivandrum     Kavitha K  *
2       1          Project ABC  4      Kochi          Raju A     *
2       1          Project ABC  4      Kochi          Kavitha K
2       2          Project XYZ  4      Trivandrum     Raju A     *
2       2          Project XYZ  4      Trivandrum     Kavitha K

(* marks a tuple whose EMP_ID and EMP_NAME combination does not occur in the original EMP_PROJECT relation.)
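The join result can be reproduced with a few lines of Python. This is only a minimal sketch: the helper names project, natural_join and as_set are introduced here for illustration and are not part of the original notes, and relations are modelled as plain lists of dictionaries.

```python
# Relation state of EMP_PROJECT from the example above.
COLUMNS = ("EMP_ID", "PROJ_CODE", "EMP_NAME", "PROJ_NAME", "HOURS", "PROJ_LOCATION")
EMP_PROJECT = [dict(zip(COLUMNS, t)) for t in [
    (1, 1, "Raju A", "Project ABC", 3, "Kochi"),
    (1, 2, "Raju A", "Project XYZ", 5, "Trivandrum"),
    (2, 1, "Kavitha K", "Project ABC", 4, "Kochi"),
    (2, 2, "Kavitha K", "Project XYZ", 4, "Trivandrum"),
]]

def project(rows, attrs):
    """Projection onto attrs with duplicate elimination (hypothetical helper)."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def natural_join(left, right):
    """Join on every attribute name the two relations share."""
    common = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]

def as_set(rows):
    """Order-independent view of a relation, so relations can be compared."""
    return {frozenset(row.items()) for row in rows}

EMP_PROJECT_1 = project(EMP_PROJECT, ["EMP_ID", "PROJ_CODE", "PROJ_NAME", "HOURS", "PROJ_LOCATION"])
EMP_LOCATION = project(EMP_PROJECT, ["EMP_NAME", "PROJ_LOCATION"])

joined = natural_join(EMP_PROJECT_1, EMP_LOCATION)   # joins on PROJ_LOCATION only
spurious = as_set(joined) - as_set(EMP_PROJECT)      # tuples not in the original
print(len(joined), "tuples after the join,", len(spurious), "of them spurious")
```

Every original tuple is still present in the join, but the extra tuples cannot be told apart from the genuine ones without going back to EMP_PROJECT.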

As can be seen, the join produces spurious (invalid) tuples, in which the combination of EMP_ID and EMP_NAME does not match any tuple of the original relation EMP_PROJECT (for example, EMP_ID 1 paired with Kavitha K). So, decomposing EMP_PROJECT into EMP_LOCATION and EMP_PROJECT_1 is undesirable because, when we JOIN them back using NATURAL JOIN, we do not get the correct original information. This is because the attribute PROJ_LOCATION used to relate EMP_LOCATION and EMP_PROJECT_1 is neither a primary key nor a foreign key in either relation.

Integrity Constraints

Integrity constraints can be generally defined as restrictions on data that are specified on a relational database schema and enforced by a relational DBMS. The need for such restrictions arises from business rules or conditions on business data, a few examples of which are listed below:

Every employee should be identified by a unique employee id.
Employee id should always be numeric.
Every employee's age should be between 19 and 60.
Employee status should be one of the following: Active, Inactive, On Leave.
An employee should belong to one (and only one) department.

The integrity constraints can be classified as Domain Constraints and Key Constraints.

Domain Constraints: Domain constraints specify that the value of each attribute A must be an atomic value from the domain dom(A). The data types associated with domains typically include standard numeric data types for integers and real or floating point numbers, characters, fixed length and variable length strings, date, time, money data types etc. Other possible domains may be described by a subrange of values from a data type or as an enumerated data type where all possible values are explicitly listed. For example, in SQL we can specify a CHECK constraint during table creation as follows:

    CREATE TABLE STUDENT (
        STUDENT_ID INTEGER,
        NAME       CHAR(20),
        MARKS      INTEGER,
        CHECK (MARKS >= 0 AND MARKS <= 100)
    )

This domain constraint is used to check and enforce that the mark values stored are from 0 to 100 only.

Key Constraints: As discussed above, domain constraints put restrictions on attribute values, whereas key constraints specify conditions on the tuples of a relational database. Key constraints can be broadly classified as (a) entity integrity constraints and (b) referential integrity constraints. Let us define a few terms before we define entity and referential integrity constraints.

Keys: A key specifies the uniqueness of tuples in a relation. The keys in the relational model are the super key, the candidate key and the primary key.

Super key: A super key is a set of attributes that, taken collectively, uniquely identifies a tuple in the relation. For example, the social_security_no attribute of the relation employee is sufficient to distinguish one employee from another. Thus social_security_no is a super key for the relation employee.

Candidate key: A candidate key is a minimal super key, that is, a super key from which no attribute can be removed without losing the uniqueness property. For example, the attributes employee_id and organization_name can be combined to form a super key. But social_security_no alone is sufficient to distinguish two employees, and it has no proper subset that does so. Thus social_security_no is a candidate key.

Primary key: The primary key is the candidate key that is chosen by the database designer to uniquely identify a tuple in a relation.
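How a DBMS enforces a domain constraint and a key constraint can be tried out with Python's built-in sqlite3 module. The sketch below is an illustration under assumptions: it uses an in-memory SQLite database, declares STUDENT_ID as the primary key (the original statement does not), assumes the range 0 to 100 inclusive, and introduces a hypothetical helper try_insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""
    CREATE TABLE STUDENT (
        STUDENT_ID INTEGER PRIMARY KEY,   -- key constraint (assumed here)
        NAME       CHAR(20),
        MARKS      INTEGER CHECK (MARKS >= 0 AND MARKS <= 100)  -- domain constraint
    )
""")

def try_insert(student_id, name, marks):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO STUDENT VALUES (?, ?, ?)",
                         (student_id, name, marks))
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert(1, "Raju A", 85),      # valid row
    try_insert(2, "Kavitha K", 101),  # rejected: MARKS outside 0..100
    try_insert(1, "Raju A", 50),      # rejected: duplicate primary key value
]
print(results)
```

Only the first row is stored; the other two inserts are refused by the DBMS itself, so the rules do not have to be re-implemented in every application program.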


Prime attribute: An attribute of relation schema R is called a prime attribute of R if it is a member of some candidate key of R. An attribute is called nonprime if it is not a prime attribute, that is, if it is not a member of any candidate key.

Foreign key: A foreign key is an attribute or set of attributes of a relation R whose values, taken together, match the primary key value of some tuple in a relation S.

Entity Integrity: This is concerned with the primary key values of individual relations. It specifies that instances of entities are distinguishable, so no primary key attribute value may be NULL.

Referential Integrity: Referential integrity is specified between two relations and is used to maintain consistency among the tuples of the two relations. The referential integrity constraint states that a tuple in one relation that refers to another relation must refer to an existing tuple in that relation. The attributes in a relation that provide the reference to a tuple in another relation are called a foreign key.

Functional Dependency

Functional dependency is a very important concept in relational schema design. A functional dependency is a constraint between two sets of attributes from the database. Suppose that a relational database schema has n attributes A1, A2, ..., An; let us think of the whole database as being described by a single universal relation schema R = {A1, A2, ..., An}. A functional dependency, denoted by X->Y, between two sets of attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R. The constraint is that, for any two tuples t1 and t2 in r that have t1[X] = t2[X], we must also have t1[Y] = t2[Y].

This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component; alternatively, the values of the X component of a tuple uniquely (or functionally) determine the values of the Y component. We also say that there is a functional dependency from X to Y, or that Y is functionally dependent on X. The abbreviation for functional dependency is FD or f.d. The set of attributes X is called the left-hand side of the FD, and Y is called the right-hand side.

If X->Y in R, this does not say whether or not Y->X in R.
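The definition can be turned directly into a check on one relation state. The following minimal sketch uses the EMP_PROJECT tuples from the spurious-tuple example; the function name fd_holds is hypothetical. Note that such a check can only refute an FD: an FD is a property of the schema, so a state that happens to satisfy X->Y does not prove that the FD holds in general.

```python
COLUMNS = ("EMP_ID", "PROJ_CODE", "EMP_NAME", "PROJ_NAME", "HOURS", "PROJ_LOCATION")
EMP_PROJECT = [dict(zip(COLUMNS, t)) for t in [
    (1, 1, "Raju A", "Project ABC", 3, "Kochi"),
    (1, 2, "Raju A", "Project XYZ", 5, "Trivandrum"),
    (2, 1, "Kavitha K", "Project ABC", 4, "Kochi"),
    (2, 2, "Kavitha K", "Project XYZ", 4, "Trivandrum"),
]]

def fd_holds(rows, lhs, rhs):
    """True if every pair of tuples that agrees on lhs also agrees on rhs."""
    seen = {}  # X value -> the Y value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # two tuples agree on X but differ on Y
    return True

id_to_name = fd_holds(EMP_PROJECT, ["EMP_ID"], ["EMP_NAME"])
id_to_hours = fd_holds(EMP_PROJECT, ["EMP_ID"], ["HOURS"])
pair_to_hours = fd_holds(EMP_PROJECT, ["EMP_ID", "PROJ_CODE"], ["HOURS"])
name_to_id = fd_holds(EMP_PROJECT, ["EMP_NAME"], ["EMP_ID"])
print(id_to_name, id_to_hours, pair_to_hours, name_to_id)
```

In this small state EMP_NAME->EMP_ID is not violated either, simply because no two employees share a name yet; whether it is a valid FD depends on the meaning of the attributes, not on the sample data.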


Thus X functionally determines Y in a relation schema R if and only if, whenever two tuples of r(R) agree on their X value, they must necessarily agree on their Y value. For example, consider the relation schema EMP_PROJECT shown below:

EMP_PROJECT(EMP_ID, PROJ_CODE, EMP_NAME, PROJ_NAME, HOURS, PROJ_LOCATION)

The value of an employee id uniquely determines the employee name. But the reverse may not always be true: there could be many employees with the same name, so the value of an employee name need not uniquely identify an employee id. So, there exists a functional dependency from EMP_ID to EMP_NAME, represented by:

FD1 = EMP_ID->EMP_NAME

The reverse, EMP_NAME->EMP_ID, is not a valid FD. Similarly, the value of a PROJ_CODE uniquely determines the PROJ_NAME and PROJ_LOCATION. This functional dependency is represented by:

FD2 = PROJ_CODE->{PROJ_NAME, PROJ_LOCATION}

The third obvious functional dependency in the EMP_PROJECT relation schema is:

FD3 = {EMP_ID, PROJ_CODE}->HOURS

This means that a combination of EMP_ID and PROJ_CODE values uniquely determines the number of hours the employee works on the project per week (HOURS). The set of all the obvious functional dependencies that are specified on a relation schema R is represented by F, as shown below:

F = {EMP_ID->EMP_NAME, PROJ_CODE->{PROJ_NAME, PROJ_LOCATION}, {EMP_ID, PROJ_CODE}->HOURS}
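Whether some other dependency X->Y follows from F can be checked mechanically by computing the set of all attributes that X determines under F (the attribute closure of X). The sketch below is one common way to do this, not a method prescribed by these notes; the names closure and implies are illustrative, and FDs are modelled as pairs of frozensets.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:                      # repeat until no FD adds anything new
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# F for EMP_PROJECT, written as (left-hand side, right-hand side) pairs.
F = [
    (frozenset({"EMP_ID"}), frozenset({"EMP_NAME"})),
    (frozenset({"PROJ_CODE"}), frozenset({"PROJ_NAME", "PROJ_LOCATION"})),
    (frozenset({"EMP_ID", "PROJ_CODE"}), frozenset({"HOURS"})),
]

def implies(fds, lhs, rhs):
    """True if lhs->rhs can be inferred from fds."""
    return set(rhs) <= closure(lhs, fds)

key_closure = closure({"EMP_ID", "PROJ_CODE"}, F)
print(sorted(key_closure))
print(implies(F, {"PROJ_CODE"}, {"PROJ_NAME"}))   # follows from FD2
print(implies(F, {"EMP_NAME"}, {"EMP_ID"}))       # does not follow from F
```

Because the closure of {EMP_ID, PROJ_CODE} contains every attribute of EMP_PROJECT, that pair of attributes is a super key of the relation.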

However, numerous other functional dependencies that are less obvious may also hold in the relation schema R. These can be inferred or deduced from the FDs in F, although it is practically impossible to specify all of them explicitly. The set of all dependencies that can be inferred from F, together with F itself, is called the closure of F and is denoted by F+. For example, we can infer the following additional functional dependencies from the F given above:

PROJ_CODE->PROJ_NAME
PROJ_CODE->PROJ_LOCATION
{EMP_ID, PROJ_CODE}->EMP_NAME

Inference Rules for Functional Dependencies

There are six inference rules that can be used to systematically infer new dependencies from a given set of dependencies. They are:

IR1 Reflexive Rule: If the set of attributes X is a superset of Y, then X->Y. The reflexive rule states that a set of attributes always determines itself or any of its subsets, which is obvious. Because IR1 generates dependencies that are always true, such dependencies are called trivial. As a special case, X->X, that is, any set of attributes functionally determines itself.

IR2 Augmentation Rule: {X->Y} infers that the dependency XZ->YZ is valid. The augmentation rule says that adding the same set of attributes to both the left and right hand sides of a dependency results in another valid dependency. The rule can also be stated as {X->Y} infers XZ->Y; that is, augmenting the left hand side attributes of an FD produces another valid FD.

IR3 Transitive Rule: {X->Y, Y->Z} infers that the dependency X->Z is valid. As per this rule, functional dependencies are transitive.

IR4 Decomposition or Projective Rule: {X->YZ} infers that the dependencies X->Y and X->Z are valid. The decomposition rule says that we can remove attributes from the right hand side of a dependency; applying this rule repeatedly can decompose the FD X->{A1, A2, ..., An} into the set of dependencies {X->A1, X->A2, ..., X->An}.

IR5 Union or Additive Rule: {X->Y, X->Z} infers that the dependency X->YZ is valid. The union rule is the opposite of the decomposition rule IR4; we can combine a set of dependencies {X->A1, X->A2, ..., X->An} into the single FD X->{A1, A2, ..., An}.

IR6 Pseudotransitive Rule: {X->Y, WY->Z} infers that the dependency WX->Z is valid.

4.1.2 Normalization using Functional Dependencies

Normalization is a process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of a good relational database design: (1) minimizing redundancy of data and (2) minimizing the insertion, deletion, and update anomalies described in section 4.1. The normalization process was first proposed by Codd in 1972. The process consists of a series of tests to check whether a relation schema satisfies a certain normal form. It proceeds in a top-down fashion by evaluating each relation against the criteria for normal forms and decomposing relations as necessary.

Initially, Codd proposed three normal forms called first (1NF), second (2NF) and third (3NF) normal forms. A stronger definition of 3NF, called Boyce Codd normal form (BCNF), was proposed later by Boyce and Codd. All these normal forms are based on the functional dependencies among the attributes of a relation. Later, a fourth normal form (4NF) and a fifth normal form (5NF) were proposed, based on the concepts of multivalued dependencies and join dependencies, respectively.

The normal form of a relation refers to the highest normal form condition that it meets and hence indicates the degree to which it has been normalized. It is not sufficient to check that each relation schema in the database conforms to a normal form (say BCNF or 3NF) in order to guarantee a good database design. Rather, the process of normalization through decomposition must also confirm the existence of additional properties that the relational schemas, taken together, should possess. These would include two properties:


The lossless join or nonadditive join property, which guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.

The dependency preservation property, which ensures that each functional dependency is represented in some individual relation resulting after decomposition.

The nonadditive join property is extremely critical and must be achieved at any cost, whereas the dependency preservation property, although desirable, is sometimes sacrificed.

Database designers need not always normalize to the highest possible normal form. Relations may be left in a lower normalization status for performance reasons. The process of storing the join of higher normal form relations as a base relation, which is in a lower normal form, is known as denormalization.

4.1.3 Normal forms based on Primary Keys

The first (1NF), second (2NF) and third (3NF) normal forms are based on primary keys of relations. They are defined and explained in the sections below:

First Normal Form (1NF): A relation schema is said to be in first normal form (1NF) if the values in the domain of each attribute of the relation are atomic. In other words, only one value is associated with each attribute, and the value is not a set of values or a list of values. A database schema is in first normal form if every relation schema included in the database schema is in 1NF.

The first normal form states that the domain of an attribute must include only atomic (simple, indivisible) values and that the value of any attribute in a tuple must be a single value from the domain of that attribute. Hence, 1NF disallows having a set of values, a tuple of values, or a combination of both as an attribute value for a single tuple. In other words, 1NF disallows relations within relations or relations as attributes of tuples. The only attribute values permitted by 1NF are single atomic (or indivisible) values.


Example: Consider the DEPARTMENT relation shown below, whose primary key is DEPT_ID.

DEPARTMENT
DEPT_ID  DEPT_NAME       DEPT_MGR       DEPT_LOCATIONS
1        Administration  Sunil Abraham  Kochi, Trivandrum, Bangalore
2        Finance         Vijayakumar K  Trivandrum
3        HR              Aswathy Nair   Kochi, Trivandrum

Assume that each department can have a number of locations. A list of example tuples with attribute values is shown above. As per the definition of 1NF, the above relation is not in 1NF, because DEPT_LOCATIONS is not an atomic attribute: it holds a list of values. There are three main techniques to achieve first normal form for such a relation:

i. Remove the attribute DEPT_LOCATIONS that violates 1NF and place it in a separate relation DEPT_LOCATION along with the primary key DEPT_ID of DEPARTMENT. The primary key of this relation is the combination (DEPT_ID, DEPT_LOCATION), as shown below:

DEPARTMENT
DEPT_ID  DEPT_NAME       DEPT_MGR
1        Administration  Sunil Abraham
2        Finance         Vijayakumar K
3        HR              Aswathy Nair

DEPT_LOCATION
DEPT_ID  DEPT_LOCATION
1        Kochi
1        Trivandrum
1        Bangalore
2        Trivandrum
3        Kochi
3        Trivandrum

A distinct tuple in DEPT_LOCATION exists for each location of a department. This decomposes the non-1NF relation into two 1NF relations.

ii. Expand the key so that there will be a separate tuple in the original DEPARTMENT relation for each location of a department, as shown below. In this case, the primary key becomes the combination {DEPT_ID, DEPT_LOCATION}. But this solution has the disadvantage of introducing redundancy in the relation.

DEPARTMENT
DEPT_ID  DEPT_NAME       DEPT_MGR       DEPT_LOCATION
1        Administration  Sunil Abraham  Kochi
1        Administration  Sunil Abraham  Trivandrum
1        Administration  Sunil Abraham  Bangalore
2        Finance         Vijayakumar K  Trivandrum
3        HR              Aswathy Nair   Kochi
3        HR              Aswathy Nair   Trivandrum

iii. If a maximum number of values is known for the attribute (for example, if it is known that at most three locations can exist for a department), replace the DEPT_LOCATIONS attribute by three atomic attributes: DEPT_LOCATION1, DEPT_LOCATION2 and DEPT_LOCATION3. This solution has the disadvantage of introducing null values if most departments have fewer than three locations.

Of the three solutions above, the first is superior because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values. The first normal form also disallows multivalued attributes that are themselves composite. These are called nested relations because each tuple can have a relation within it. An example of a relation EMP_PROJECT with the nested relation PROJECTS within it is shown below:

EMP_PROJECT(EMP_ID, EMP_NAME, PROJECTS(PROJECT_ID, HOURS))

The above non-1NF relation could be decomposed into two 1NF relations, EMPLOYEE and EMP_PROJECT, by propagating the primary key as shown below.

EMPLOYEE(EMP_ID, EMP_NAME)

EMP_PROJECT(EMP_ID, PROJECT_ID, HOURS)
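The first technique for the DEPARTMENT example (moving the multivalued attribute into its own relation together with the primary key) can be sketched in Python as follows. The function to_1nf and its parameter names are hypothetical, and the multivalued attribute is modelled as a Python list.

```python
# Non-1NF state: DEPT_LOCATIONS holds a list of values per tuple.
DEPARTMENT = [
    {"DEPT_ID": 1, "DEPT_NAME": "Administration", "DEPT_MGR": "Sunil Abraham",
     "DEPT_LOCATIONS": ["Kochi", "Trivandrum", "Bangalore"]},
    {"DEPT_ID": 2, "DEPT_NAME": "Finance", "DEPT_MGR": "Vijayakumar K",
     "DEPT_LOCATIONS": ["Trivandrum"]},
    {"DEPT_ID": 3, "DEPT_NAME": "HR", "DEPT_MGR": "Aswathy Nair",
     "DEPT_LOCATIONS": ["Kochi", "Trivandrum"]},
]

def to_1nf(rows, key, multivalued, single_name):
    """Split rows into a base relation and a (key, single value) relation."""
    base, detail = [], []
    for row in rows:
        # Base relation: every attribute except the multivalued one.
        base.append({a: v for a, v in row.items() if a != multivalued})
        # Detail relation: one tuple per value, carrying the primary key along.
        for value in row[multivalued]:
            detail.append({key: row[key], single_name: value})
    return base, detail

department, dept_location = to_1nf(
    DEPARTMENT, "DEPT_ID", "DEPT_LOCATIONS", "DEPT_LOCATION")
for row in dept_location:
    print(row["DEPT_ID"], row["DEPT_LOCATION"])
```

The same function applied to a nested attribute would need one output column per nested attribute; the sketch covers only the simple multivalued case.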

Second Normal Form (2NF): A relation schema R is in second normal form (2NF) if it is in 1NF and if all nonprime attributes are fully functionally dependent on the primary key of R. A database schema is in second normal form if every relation schema included in the database schema is in second normal form.

Second normal form is based on the concept of full functional dependency. A functional dependency X->Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more; that is, for any attribute A ∈ X, (X-{A}) does not functionally determine Y. A functional dependency X->Y is a partial dependency if some attribute A ∈ X can be removed from X and the dependency still holds; that is, for some A ∈ X, (X-{A})->Y.

The test for 2NF involves testing for functional dependencies whose left-hand side attributes are part of the primary key. If the primary key contains a single attribute, the test need not be applied at all. If a relation schema is not in 2NF, it can be second normalized (2NF normalized) into a number of 2NF relations in which nonprime attributes are associated only with the part of the primary key on which they are fully functionally dependent.

The functional dependencies FD1, FD2 and FD3 in the relation EMP_PROJECT shown below lead to the decomposition of EMP_PROJECT into the three relation schemas EMP_PROJECT1, EMP_PROJECT2 and EMP_PROJECT3, each of which is in 2NF.

EMP_PROJECT(EMP_ID, PROJ_CODE, EMP_NAME, PROJ_NAME, HOURS, PROJ_LOCATION)
FD1: EMP_ID->EMP_NAME
FD2: PROJ_CODE->{PROJ_NAME, PROJ_LOCATION}
FD3: {EMP_ID, PROJ_CODE}->HOURS

The 2NF normalized EMP_PROJECT tables are given below:

EMP_PROJECT1(EMP_ID, PROJ_CODE, HOURS)

EMP_PROJECT2(EMP_ID, EMP_NAME)

EMP_PROJECT3(PROJ_CODE, PROJ_NAME, PROJ_LOCATION)
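The 2NF test for EMP_PROJECT can be sketched as a search for partial dependencies. The code below is an illustration under assumptions: the primary key {EMP_ID, PROJ_CODE} is taken to be the only candidate key (so every attribute outside it is nonprime), and the helper names closure and partial_dependencies are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

ATTRIBUTES = {"EMP_ID", "PROJ_CODE", "EMP_NAME", "PROJ_NAME", "HOURS", "PROJ_LOCATION"}
PRIMARY_KEY = {"EMP_ID", "PROJ_CODE"}
F = [
    ({"EMP_ID"}, {"EMP_NAME"}),                       # FD1
    ({"PROJ_CODE"}, {"PROJ_NAME", "PROJ_LOCATION"}),  # FD2
    ({"EMP_ID", "PROJ_CODE"}, {"HOURS"}),             # FD3
]

def partial_dependencies(attributes, key, fds):
    """Map each proper subset of the key to the nonprime attributes it determines."""
    nonprime = set(attributes) - set(key)  # assumes key is the only candidate key
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & nonprime
            if determined:
                found[frozenset(subset)] = determined
    return found

violations = partial_dependencies(ATTRIBUTES, PRIMARY_KEY, F)
for part, determined in violations.items():
    print(sorted(part), "->", sorted(determined))
```

Each reported part of the key, together with the attributes it determines, corresponds to one of the relations EMP_PROJECT2 and EMP_PROJECT3; the attributes that depend on the whole key remain in EMP_PROJECT1.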

Third Normal Form (3NF): A relation schema R is in third normal form (3NF) if it satisfies 2NF and no nonprime attribute of R is transitively dependent on the primary key. (That is, 3NF allows a prime attribute to be transitively dependent on the primary key.) A database schema is in third normal form if every relation schema included in the database schema is in third normal form.

Third normal form is based on the concept of transitive dependency. A functional dependency X->Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X->Z and Z->Y hold.

A relation can be tested for 3NF compliance as follows: the relation should not have a non-key attribute functionally determined by another non-key attribute (or by a set of non-key attributes). That is, there should be no transitive dependency of a non-key attribute on the primary key. Consider the EMP_DEPT relation shown below:

EMP_DEPT(EMP_ID, EMP_NAME, BIRTH_DATE, ADDRESS, DEPT_ID, DEPT_NAME, DEPT_MGR)

The relation schema EMP_DEPT is in 2NF, since no partial dependencies on a key exist. However, EMP_DEPT is not in 3NF because of the transitive dependency of DEPT_MGR (and also DEPT_NAME) on EMP_ID via DEPT_ID. We can normalize EMP_DEPT by decomposing it into the two 3NF relation schemas EMPLOYEE and DEPARTMENT as shown below:

EMPLOYEE(EMP_ID, EMP_NAME, BIRTH_DATE, ADDRESS, DEPT_ID)

DEPARTMENT(DEPT_ID, DEPT_NAME, DEPT_MGR)

The normalized relations EMPLOYEE and DEPARTMENT represent independent entity facts about employees and departments. A NATURAL JOIN operation on these relations will recover the original relation EMP_DEPT without generating spurious tuples.

Boyce Codd Normal Form (BCNF): A relation R is in Boyce Codd normal form if, whenever a nontrivial functional dependency X->A holds in R, X is a super key of R.

Boyce Codd normal form (BCNF) was proposed as a simpler form of 3NF, but it was found to be stricter than 3NF: every relation in BCNF is also in 3NF, whereas a relation in 3NF is not necessarily in BCNF. The difference shows up in a relation schema in third normal form that has a number of overlapping composite candidate keys. For example, consider the relation TEACH shown below.

TEACH(STUD_ID, COURSE, INSTRUCTOR)

Here, the functional dependencies are {{STUD_ID, COURSE}->INSTRUCTOR, INSTRUCTOR->COURSE}, that is:

FD1: {STUD_ID, COURSE}->INSTRUCTOR
FD2: INSTRUCTOR->COURSE

The relation has two candidate keys, {STUD_ID, COURSE} and {STUD_ID, INSTRUCTOR}. {STUD_ID, COURSE} is chosen as the primary key. FD2 does not violate 3NF, since COURSE is a prime attribute. But FD2 violates BCNF, since INSTRUCTOR is not a super key of TEACH. This relation could be decomposed into one of the three following possible pairs:

i) {STUD_ID, INSTRUCTOR} and {STUD_ID, COURSE}
ii) {COURSE, INSTRUCTOR} and {COURSE, STUD_ID}
iii) {INSTRUCTOR, COURSE} and {INSTRUCTOR, STUD_ID}

All three decompositions lose the functional dependency FD1, and the first two decompositions also generate spurious tuples. So, the desirable decomposition is the third one, as it does not generate spurious tuples, though it does not preserve the dependency FD1 of the original relation. So, the relation TEACH can be decomposed into the following two BCNF relations, INSTRUCTOR_STUD and INSTRUCTOR_COURSE, as shown below:

INSTRUCTOR_STUD(INSTRUCTOR, STUD_ID)

INSTRUCTOR_COURSE(INSTRUCTOR, COURSE)
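Both claims about TEACH can be checked from the FDs alone. The sketch below assumes two standard results that these notes do not state explicitly: a relation is in BCNF if every nontrivial FD in the given set F has a super key on its left-hand side, and a decomposition into two relations R1 and R2 is lossless if the common attributes R1 ∩ R2 functionally determine either R1 - R2 or R2 - R1. The function names are hypothetical.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

TEACH = {"STUD_ID", "COURSE", "INSTRUCTOR"}
F = [
    ({"STUD_ID", "COURSE"}, {"INSTRUCTOR"}),  # FD1
    ({"INSTRUCTOR"}, {"COURSE"}),             # FD2
]

def bcnf_violations(attributes, fds):
    """Nontrivial FDs in fds whose left-hand side is not a super key."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != set(attributes)]

def lossless(r1, r2, fds):
    """Binary lossless (nonadditive) join test based on the common attributes."""
    determined = closure(r1 & r2, fds)
    return (r1 - r2) <= determined or (r2 - r1) <= determined

violations = bcnf_violations(TEACH, F)
pairs = {
    "i": ({"STUD_ID", "INSTRUCTOR"}, {"STUD_ID", "COURSE"}),
    "ii": ({"COURSE", "INSTRUCTOR"}, {"COURSE", "STUD_ID"}),
    "iii": ({"INSTRUCTOR", "COURSE"}, {"INSTRUCTOR", "STUD_ID"}),
}
verdicts = {name: lossless(r1, r2, F) for name, (r1, r2) in pairs.items()}
print(violations)
print(verdicts)
```

Only FD2 is reported as a violation, and only the third pair passes the lossless test, which agrees with the choice made above.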

Note that, if we designate {STUD_ID, INSTRUCTOR} as the primary key of the relation TEACH, the FD INSTRUCTOR->COURSE becomes a partial (not fully functional) dependency of COURSE, because INSTRUCTOR is a part of the primary key. This would violate 2NF. To obtain 2NF relations, we may decompose TEACH into the two relations (STUD_ID, INSTRUCTOR) and (INSTRUCTOR, COURSE). This leads to the same ultimate BCNF relations. An all-key relation is always in BCNF by default, as there are no nontrivial FDs.

Multivalued Dependency

Multivalued dependencies are a consequence of 1NF, which disallows an attribute in a tuple from having a set of values. If we have two or more multivalued independent attributes in the same relation schema, we have to repeat every value of one of the attributes with every value of the other attribute to keep the relation state consistent and to maintain the independence among the attributes involved. This cannot be specified by an FD; rather, this constraint is specified by a multivalued dependency. A multivalued dependency (MVD) is a constraint according to which the presence of certain rows in a table implies the presence of certain other rows.

Example: Consider the relation EMP(Ename, Pname, Dname).

A tuple in this relation EMP represents the fact that an employee whose name is Ename works on the project whose name is Pname and has a dependent whose name is Dname. An employee may work on several projects and may have several dependents, and the employee's projects and dependents are independent of one another. These constraints are specified as multivalued dependencies on the EMP relation. The MVDs are Ename->>Pname and Ename->>Dname (the double arrow ->> is used here to distinguish an MVD from an FD). Informally, whenever two independent 1:N relationships A:B and A:C are mixed in the same relation R(A, B, C), an MVD may arise.

An MVD X->>Y in R is called a trivial MVD if either (a) Y is a subset of X, or (b) X U Y = R. An MVD that satisfies neither (a) nor (b) is called a nontrivial MVD. For example, the two MVDs on the EMP table specified above are nontrivial, since neither Pname nor Dname is a subset of Ename, and neither Ename U Pname nor Ename U Dname equals the full attribute set of EMP.

Fourth Normal Form (4NF): A relation schema R is in 4NF with respect to a set of dependencies F (that includes functional dependencies and multivalued dependencies) if, for every nontrivial multivalued dependency X->>Y in F+, X is a super key for R.

Example: Consider again the relation EMP(Ename, Pname, Dname). The EMP relation has no nontrivial FDs, since it is an all-key relation. Because BCNF constraints are stated in terms of FDs only, an all-key relation is always in BCNF by default. Hence EMP is in BCNF. But EMP is not in 4NF because, in the nontrivial MVDs Ename->>Pname and Ename->>Dname, Ename is not a super key of EMP. We can decompose EMP into the two 4NF relation schemas EMP_PROJECTS(Ename, Pname) and EMP_DEPENDENTS(Ename, Dname). Both EMP_PROJECTS and EMP_DEPENDENTS are in 4NF, because the MVDs Ename->>Pname in EMP_PROJECTS and Ename->>Dname in EMP_DEPENDENTS are trivial MVDs. No other nontrivial MVDs hold in either EMP_PROJECTS or EMP_DEPENDENTS. No FDs hold in these relation schemas either.

Join Dependency

A relation R satisfies the join dependency *(R1, R2, ..., Rn) if and only if R is equal to the join of R1, R2, ..., Rn, where each Ri is a subset of the set of attributes of R. In other words, R satisfies a join dependency if R can always be recreated by joining multiple relations, each having a subset of the attributes of R. If one of the relations in the join has all the attributes of R, the join dependency is called a trivial JD. The join dependency is a generalization of the MVD: an MVD is the special case of a JD where n=2. A relation R(A, B, C) satisfies the JD *(AB, AC) if and only if it satisfies the MVDs A->>B | C.
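The equivalence between the JD *(AB, AC) and the MVDs can be observed on a concrete relation state by projecting and rejoining. In the minimal sketch below the employee, project and dependent values are made up for illustration (the original figure for EMP is not reproduced in these notes), and satisfies_jd_ab_ac is a hypothetical helper for a three-attribute relation stored as a set of tuples.

```python
def project(rows, positions):
    """Projection of a set of tuples onto the given column positions."""
    return {tuple(row[i] for i in positions) for row in rows}

def satisfies_jd_ab_ac(rows):
    """Does r(A, B, C) equal the join of its projections on (A, B) and (A, C)?"""
    ab = project(rows, (0, 1))
    ac = project(rows, (0, 2))
    rejoined = {(a, b, c) for (a, b) in ab for (a2, c) in ac if a == a2}
    return rejoined == set(rows)

# Hypothetical state of EMP(Ename, Pname, Dname): one employee, two projects,
# two dependents, and every project paired with every dependent.
EMP = {
    ("Smith", "X", "John"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
    ("Smith", "Y", "Anna"),
}

full_state_ok = satisfies_jd_ab_ac(EMP)

# Dropping one combination breaks the independence of projects and dependents.
incomplete = EMP - {("Smith", "Y", "Anna")}
incomplete_ok = satisfies_jd_ab_ac(incomplete)
print(full_state_ok, incomplete_ok)
```

The second state fails the check because rejoining the projections regenerates the removed tuple; this is the "presence of certain rows implies the presence of certain other rows" behaviour of an MVD.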


Fifth Normal Form (5NF): A relation R is in 5NF (or Project-Join Normal Form, PJNF) if, for every join dependency *(R1, R2, ..., Rn) that holds in R, at least one of the following is true: (a) *(R1, R2, ..., Rn) is a trivial join dependency (that is, one of the Ri is R); (b) every Ri is a super key of R. The aim of fifth normal form is to have relations that cannot be usefully decomposed any further: the only nontrivial join dependencies that hold are those whose components are all super keys.

Example: Consider the relation R(S_id, S_name, Status, City), in which S_id and S_name are both candidate keys.

*({S_id, S_name, Status}, {S_id, City}) is a nontrivial JD; it holds because S_id is a candidate key of R.
*({S_id, S_name}, {S_id, Status}, {S_name, City}) is a nontrivial JD; it holds because S_id and S_name are both candidate keys of R.
*({S_id, S_name}, {S_id, S_name, Status, City}) is a trivial JD, because one of its components is R itself.

For the first two JDs above, every Ri is a super key of R, and the third JD is trivial. Hence, R is in 5NF.

Pitfalls of Relational Database Design

Programmers with previous experience in non-relational databases may tend to design databases that resemble hierarchical or network databases, or even flat files and spreadsheet designs. More often, you face the task of redesigning a database to fit new business requirements, improve performance, and so on. In that case, do not simply reuse the existing database structures as the basis for the new database. An existing structure is wrong for your particular task if it cannot accommodate the new features you are trying to implement. Take a fresh look, without the limits imposed by the design already in place. Redesigning databases while preserving legacy data is not a small task and should be approached with caution.

Another common problem arises from the tendency to utilize every single feature offered by a particular RDBMS vendor. While this improves performance most of the time, it can lock you into that vendor's product, and moving your database to a different vendor then costs you dearly in terms of both time and money.
