
UNIT-3

FUNCTIONAL DEPENDENCE

DEFINITION: Let r be a relation and let X and Y be arbitrary subsets of the set of attributes of r. Then we say that Y is functionally dependent on X, written X->Y (X functionally determines Y), if and only if each X value in r has associated with it precisely one Y value in r. In other words, whenever two tuples of r agree on their X value, they also agree on their Y value.

Example: Consider the parts relation given below:

Supplier no.   City      Part no   Quantity
S1             Bombay    P1        100
S1             Bombay    P2        100
S2             Chennai   P1        200
S2             Chennai   P2        200
S3             Chennai   P2        300
S4             Bombay    P2        400
S4             Bombay    P4        400
S4             Bombay    P5        400

The above relation satisfies the FD

{supplier-no}->{city}

because all tuples of the relation with a given supplier-no value also have the same city value. The relation also satisfies several more FDs, as given below:

{supplier-no, part-no}->{qty}
{supplier-no, part-no}->{city}
{supplier-no, part-no}->{city, qty}
{supplier-no, part-no}->{supplier-no}
{supplier-no, part-no}->{supplier-no, part-no, city, qty}

The left side of an FD is the determinant and the right side is the dependent. The determinant and dependent are both sets of attributes. When such a set contains just one attribute, i.e. when it is a singleton set, we often drop the set braces and write, for example, supplier-no->city.

TRIVIAL AND NONTRIVIAL DEPENDENCIES

One way to reduce the size of the set of FDs is to eliminate the trivial dependencies. An FD is trivial if and only if the right side is a subset of the left side.

EX: {supplier-no, part-no}->supplier-no

Trivial dependencies hold in every relation and so carry no information; in practice only the nontrivial ones are of interest.

CLOSURE OF A SET OF DEPENDENCIES

Some FDs imply others. For example, the FD

{supplier-no, part-no}->{city, qty}

implies both of the following:

{supplier-no, part-no}->{city}
{supplier-no, part-no}->{qty}
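The definition of functional dependence and the test for a trivial FD can be checked mechanically against a sample relation. The following is a minimal Python sketch, not part of the original notes: the function names satisfies_fd and is_trivial, the dictionary-per-tuple representation and the attribute names (supplier_no, part_no, qty) are assumptions made for illustration. Note that a check on sample data can only show that an FD is not violated by that sample; whether the FD holds for every legal value of the relation is a matter of the intended constraints.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if any two rows that agree on lhs also agree on rhs."""
    seen = {}  # maps each X value to the single Y value seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X value, different Y value
    return True


def is_trivial(lhs, rhs):
    """An FD is trivial iff its right side is a subset of its left side."""
    return set(rhs) <= set(lhs)


# Sample values of the parts relation (hypothetical attribute names).
PARTS = [
    {"supplier_no": "S1", "city": "Bombay", "part_no": "P1", "qty": 100},
    {"supplier_no": "S1", "city": "Bombay", "part_no": "P2", "qty": 100},
    {"supplier_no": "S2", "city": "Chennai", "part_no": "P1", "qty": 200},
    {"supplier_no": "S2", "city": "Chennai", "part_no": "P2", "qty": 200},
    {"supplier_no": "S3", "city": "Chennai", "part_no": "P2", "qty": 300},
    {"supplier_no": "S4", "city": "Bombay", "part_no": "P2", "qty": 400},
    {"supplier_no": "S4", "city": "Bombay", "part_no": "P4", "qty": 400},
    {"supplier_no": "S4", "city": "Bombay", "part_no": "P5", "qty": 400},
]

print(satisfies_fd(PARTS, ["supplier_no"], ["city"]))            # True
print(satisfies_fd(PARTS, ["city"], ["supplier_no"]))            # False
print(satisfies_fd(PARTS, ["supplier_no", "part_no"], ["qty"]))  # True
print(is_trivial(["supplier_no", "part_no"], ["supplier_no"]))   # True
```

The check makes one pass over the tuples and remembers the first Y value seen for each X value, which mirrors the wording "each X value has associated with it precisely one Y value".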

As another example, consider the relation R with attributes A, B and C, such that the FDs A->B and B->C both hold for R. Then it is easy to see that the FD A->C also holds for R. The FD A->C is an example of a transitive FD, i.e. C is said to depend on A transitively via B.

The set of all FDs that are implied by a given set S of FDs is called the closure of S, written S+. S+ can be computed from S by means of the following rules. Let A, B and C be arbitrary subsets of the set of attributes of the given relation R, and let AB mean the union of A and B. Then we have:

1. Reflexivity: If B is a subset of A, then A->B.
2. Augmentation: If A->B, then AC->BC.
3. Transitivity: If A->B and B->C, then A->C.
4. Self-determination: A->A.
5. Decomposition: If A->BC, then A->B and A->C.
6. Union: If A->B and A->C, then A->BC.
7. Composition: If A->B and C->D, then AC->BD.

Example: Let R be the relation with attributes A,B,C,D,E,F and let the FDs be:

A->BC
B->E
CD->EF

We now show that the FD AD->F holds for R and is thus a member of the closure of the given set:

1. A->BC (given)
2. A->C (1, decomposition)
3. AD->CD (2, augmentation)
4. CD->EF (given)
5. AD->EF (3 and 4, transitivity)
6. AD->F (5, decomposition)

CLOSURE OF A SET OF ATTRIBUTES

In principle, the closure S+ of a given set S of FDs can be computed by an algorithm that says: repeatedly apply the rules from the previous section until they stop producing new FDs. In practice it is usually enough to compute something smaller. Let R be a relation, Z be a set of attributes of R and S be the set of FDs that hold for R. From these we can determine the set of all attributes of R that are functionally dependent on Z, i.e. the closure Z+ of Z under S.
A simple algorithm for computing this closure is given in the pseudo code below:

    CLOSURE[Z,S] = Z;
    do forever;
        for each FD X->Y in S do;
            if X ⊆ CLOSURE[Z,S] then
                CLOSURE[Z,S] = CLOSURE[Z,S] ∪ Y;
        end;
        if CLOSURE[Z,S] did not change on this iteration then
            leave the loop;
    end;

Example: Suppose we are given a relation R with attributes A,B,C,D,E,F and the FDs:

A->BC
E->CF
B->E
CD->EF

We now compute the closure {A,B}+ of the set of attributes {A,B} under this set of FDs.

1. We initialize the result CLOSURE[Z,S] to {A,B}.
2. We now go round the inner loop four times, once for each of the given FDs. On the first iteration (for the FD A->BC), we find that the left side is a subset of CLOSURE[Z,S], so we add the attributes B and C to the result. CLOSURE[Z,S] is now the set {A,B,C}.
3. On the second iteration (for the FD E->CF), we find that the left side is not a subset of the result, which thus remains unchanged.
4. On the third iteration (for the FD B->E), we add E to CLOSURE[Z,S], which now has the value {A,B,C,E}.
5. On the fourth iteration (for the FD CD->EF), CLOSURE[Z,S] remains unchanged.
6. Now we go round the inner loop four times again. On the first iteration, the result does not change; on the second, it expands to {A,B,C,E,F}; on the third and fourth it does not change.
7. Now we go round the inner loop four times again. CLOSURE[Z,S] does not change, and so the whole process terminates with {A,B}+ = {A,B,C,E,F}.

Thus if Z is a set of attributes of relation R and S is a set of FDs that hold for R, then the set of FDs that hold for R with Z as the left side is the set consisting of all FDs of the form Z->Z', where Z' is some subset of the closure Z+ of Z under S. The closure S+ of the original set S of FDs is then the union of all such sets of FDs, taken over all possible attribute sets Z.

IRREDUCIBLE SETS OF DEPENDENCIES

Let S1 and S2 be two sets of FDs. If every FD implied by S1 is implied by S2, i.e. if S1+ is a subset of S2+, we say that S2 is a cover of S1. This means that if the DBMS enforces the FDs in S2, then it will automatically be enforcing the FDs in S1.

If S2 is a cover of S1 and S1 is a cover of S2, i.e. if S1+ = S2+, we say that S1 and S2 are equivalent. In this case, if the DBMS enforces the FDs in S2 it will automatically be enforcing the FDs in S1, and vice versa.
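The attribute-closure algorithm and the cover and equivalence tests described above can be sketched in Python as follows. This is an illustrative sketch only: the function names (closure, implies, covers, equivalent) are not from the notes, and it assumes that every attribute name is a single letter, so that a string such as "BC" can stand for the set {B,C}. It relies on the fact that X->Y is implied by S exactly when Y is a subset of the closure of X under S, which avoids ever materialising S+.

```python
def closure(attrs, fds):
    """Closure of the attribute set attrs under fds (a list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:                      # the "do forever" loop
        changed = False
        for lhs, rhs in fds:            # once round for each FD X->Y
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)      # X is contained in the result: add Y
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs->rhs is a member of the closure of fds."""
    return set(rhs) <= closure(lhs, fds)


def covers(s2, s1):
    """True if s2 is a cover of s1, i.e. every FD in s1 is implied by s2."""
    return all(implies(s2, lhs, rhs) for lhs, rhs in s1)


def equivalent(s1, s2):
    """True if each set of FDs is a cover of the other."""
    return covers(s1, s2) and covers(s2, s1)


# The worked example: A->BC, E->CF, B->E, CD->EF.
FDS = [("A", "BC"), ("E", "CF"), ("B", "E"), ("CD", "EF")]
print(sorted(closure("AB", FDS)))       # ['A', 'B', 'C', 'E', 'F']

# The earlier derivation of AD->F from A->BC, B->E, CD->EF.
print(implies([("A", "BC"), ("B", "E"), ("CD", "EF")], "AD", "F"))  # True

# {A->BC} and {A->B, A->C} are equivalent sets of FDs.
print(equivalent([("A", "BC")], [("A", "B"), ("A", "C")]))          # True
```

The loop stops as soon as a full pass over the FDs adds nothing, exactly as in the pseudo code; the extra test on the right side only avoids flagging a change when Y is already present.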
A set S of FDs is said to be irreducible if and only if it satisfies the following three properties:

1. The right side of every FD in S involves just one attribute.
2. The left side of every FD in S is irreducible in turn, meaning that no attribute can be discarded from the determinant without changing the closure S+. An FD of this type is called left irreducible.
3. No FD in S can be discarded from S without changing the closure S+.

Example: Consider the relation PARTS for which the following FDs hold:

PART-NO->PART-NAME
PART-NO->COLOUR
PART-NO->WEIGHT
PART-NO->CITY

This set of FDs is easily seen to be irreducible. The right side is a single attribute in each case, the left side is irreducible in turn, and none of the FDs can be discarded without changing the closure.

The following sets of FDs are not irreducible:

1. PART-NO->{PART-NAME,COLOUR}
   PART-NO->WEIGHT
   PART-NO->CITY
   The right side of the first FD is not a singleton set.

2. {PART-NO,PART-NAME}->COLOUR
   PART-NO->PART-NAME
   PART-NO->WEIGHT
   PART-NO->CITY
   The first FD here can be simplified by dropping PART-NAME from the left side without changing the closure.

3. PART-NO->PART-NO
   PART-NO->PART-NAME
   PART-NO->COLOUR
   PART-NO->WEIGHT
   PART-NO->CITY
   The first FD can be discarded without changing the closure.

For every set of FDs there exists at least one equivalent set that is irreducible.

Example: Consider the relation R with attributes A,B,C,D and the FDs:

A->BC
B->C
A->B
AB->C
AC->D

We now compute an irreducible set of FDs that is equivalent to the given set:

1. The first step is to rewrite the FDs so that each has a singleton right side:
   A->B
   A->C
   B->C
   A->B
   AB->C
   AC->D
   The FD A->B occurs twice, so one occurrence can be eliminated.
2. Next, attribute C can be eliminated from the left side of the FD AC->D. We have A->C (given), so A->AC (augmenting both sides with A, since AA is just A). Together with AC->D (given), this yields A->D by transitivity. Thus C on the left side of AC->D is redundant.
3. Next, AB->C can be eliminated. We have A->C (given), so AB->CB (by augmentation), and hence AB->C (by decomposition).
4. Finally, the FD A->C is implied by the FDs A->B and B->C, so it can also be eliminated.

The final irreducible set of FDs is:

A->B
B->C
A->D

NOTE: An irreducible set is also called a minimal set, a minimal cover or a canonical cover.
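The three-step reduction worked through above (singleton right sides, left-irreducible determinants, no redundant FDs) can be sketched in Python. This is one possible procedure under stated assumptions, not the only one: the names minimal_cover and show are hypothetical, attribute names are assumed to be single letters, and the result depends on the order in which attributes and FDs are tried, since a set of FDs can have more than one irreducible equivalent.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, given as (frozenset, frozenset) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def dedupe(fds):
    """Drop repeated FDs while keeping the original order."""
    unique = []
    for fd in fds:
        if fd not in unique:
            unique.append(fd)
    return unique


def minimal_cover(fds):
    # Step 1: rewrite every FD with a singleton right side.
    work = dedupe([(frozenset(lhs), frozenset(a))
                   for lhs, rhs in fds for a in rhs])
    # Step 2: discard attributes from a left side when the remaining
    # attributes still determine the right side.
    for i in range(len(work)):
        lhs, rhs = work[i]
        for attr in sorted(lhs):
            reduced = lhs - {attr}
            if rhs <= closure(reduced, work):
                lhs = reduced
                work[i] = (lhs, rhs)
    work = dedupe(work)
    # Step 3: discard any FD that is implied by the others.
    result = list(work)
    for fd in work:
        rest = [g for g in result if g != fd]
        if fd[1] <= closure(fd[0], rest):
            result = rest
    return result


def show(fd):
    """Format an FD such as ({'A'}, {'B'}) as 'A->B'."""
    lhs, rhs = fd
    return "".join(sorted(lhs)) + "->" + "".join(sorted(rhs))


GIVEN = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C"), ("AC", "D")]
print([show(fd) for fd in minimal_cover(GIVEN)])  # ['A->B', 'B->C', 'A->D']
```

On this input the sketch reduces AB->C to B->C (a duplicate) rather than discarding it via A->C as in the worked steps; both routes are valid and arrive at the same final set.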

NONLOSS DECOMPOSITION AND FUNCTIONAL DEPENDENCIES

Consider the relation suppliers with attributes supplier-no, status and city. The figure below shows sample values for this relation.

SUPPLIERS:
Supplier-no   Status   City
S3            30       Chennai
S5            30       Delhi

Two possible decompositions of the above relation are given below:

a) SST:
   Supplier-no   Status
   S3            30
   S5            30

   SC:
   Supplier-no   City
   S3            Chennai
   S5            Delhi

b) SST:
   Supplier-no   Status
   S3            30
   S5            30

   STC:
   Status   City
   30       Chennai
   30       Delhi

Examining the two decompositions, we observe that:

1. In case a, no information is lost; the SST and SC values still tell us that supplier S3 has status 30 and city Chennai, and supplier S5 has status 30 and city Delhi. Thus, the first decomposition is nonloss.
2. In case b, by contrast, information is definitely lost. It is possible to tell that both suppliers have status 30, but we cannot tell which supplier has which city. Thus, the second decomposition is lossy.

Case a is lossless because if we join SST and SC, the original relation suppliers is obtained. Case b is lossy because the join of SST and STC does not give back the original relation suppliers.

HEATH'S THEOREM: Let R{A,B,C} be a relation, where A, B and C are sets of attributes. If R satisfies the FD A->B, then R is equal to the join of its projections on {A,B} and {A,C}.

In summary, the decomposition of relation R into projections R1,R2,...,Rn is nonloss if R is equal to the join of R1,R2,...,Rn.

NORMALIZATION

Normalization is the process of validating and improving the logical design of the database, so that the design satisfies certain constraints and avoids unnecessary duplication of data. The main goal of the normalization process is to analyse complex relations and decompose them into smaller, simpler and well structured relations.
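Returning to the two decompositions of the suppliers relation examined earlier, the nonloss test (join the projections and compare the result with the original relation) can be sketched in Python. The helper names project, natural_join, same_relation and is_nonloss are hypothetical, and relations are represented as lists of dictionaries purely for illustration.

```python
from functools import reduce


def project(rows, attrs):
    """Projection on attrs, with duplicate tuples removed."""
    values = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, v)) for v in values]


def natural_join(r1, r2):
    """Join tuples that agree on every attribute the two relations share."""
    joined = []
    for t1 in r1:
        for t2 in r2:
            common = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in common):
                joined.append({**t1, **t2})
    return joined


def same_relation(r1, r2):
    """Compare two relations as sets of tuples, ignoring order."""
    def canon(rows):
        return {frozenset(row.items()) for row in rows}
    return canon(r1) == canon(r2)


def is_nonloss(rows, schemas):
    """True if joining the projections on schemas gives back rows."""
    projections = [project(rows, attrs) for attrs in schemas]
    return same_relation(reduce(natural_join, projections), rows)


SUPPLIERS = [
    {"supplier_no": "S3", "status": 30, "city": "Chennai"},
    {"supplier_no": "S5", "status": 30, "city": "Delhi"},
]

# Case a: SST{supplier_no, status} and SC{supplier_no, city}.
print(is_nonloss(SUPPLIERS, [["supplier_no", "status"],
                             ["supplier_no", "city"]]))   # True
# Case b: SST{supplier_no, status} and STC{status, city}.
print(is_nonloss(SUPPLIERS, [["supplier_no", "status"],
                             ["status", "city"]]))        # False
```

In case b the join on status pairs each supplier with both cities, producing four tuples instead of two, which is exactly the loss of information described in the text. Case a is the situation covered by Heath's theorem, since supplier-no->status holds.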

FIRST NORMAL FORM

DEFN: A relation R is in 1NF if and only if, in every legal value of that relation, every tuple contains exactly one value for each attribute.

Consider the relation supp-ship with attributes as given below:

supp-ship {supp-no, status, city, part no, qty}
Primary key {supp-no, part no}

Here supp-no and part no together form the primary key. In the relation supp-ship the following constraint also holds:

city -> status

i.e., the status of a supplier is determined by the location of that supplier. For ex, all Bombay suppliers must have a status of 20. The diagram below shows the FDs associated with the relation.

[FD diagram for supp-ship: {supp-no, part no} -> qty; supp-no -> city; supp-no -> status; city -> status]

The table below shows sample values for the relation supp-ship:

Supp-no   Status   City      Part no   Qty
S1        20       Bombay    P1        300
S1        20       Bombay    P2        200
S1        20       Bombay    P3        400
S1        20       Bombay    P4        200
S2        10       Chennai   P1        300
S2        10       Chennai   P2        400
S3        10       Chennai   P2        200
S4        20       Bombay    P4        300
S4        20       Bombay    P5        400

The values in the supp-ship relation are redundant. For example, every tuple for city Bombay shows the status 20. The redundancies in the relation lead to a variety of anomalies called update anomalies, i.e. difficulties with the update operations INSERT, DELETE and UPDATE. Consider the supplier-city redundancy corresponding to the FD supp-no->city. Problems occur with each of the three update operations.

INSERT

It is not possible to insert the fact that a particular supplier is located in a particular city until that supplier supplies at least one part. For ex, in the relation supp-ship it is not possible to show that supplier S5 is located in Delhi. The reason is that until the supplier supplies some part, we do not have a complete primary key value.

DELETE

In the supp-ship relation, if the only tuple for a particular supplier is deleted, the deletion removes not only the shipment connecting that supplier to a particular part but also the information that the supplier is located in a particular city. For ex, if the tuple with supp-no value S3 and part no value P2 is deleted, then the information that S3 is located in Chennai is also lost.

UPDATE

The city value for a given supplier appears in supp-ship many times. This redundancy causes update problems. For ex, if supplier S1 moves from Bombay to Ahmadabad, then every tuple for S1 must be changed; otherwise the database is left in an inconsistent state.

The solution to these problems is to replace the relation supp-ship with two relations as given below:

(a) Suppliers {supp-no, status, city}
(b) Shipments {supp-no, part no, qty}

The FD diagrams and sample values for these relations are as follows:

(a) [FD diagram for suppliers: supp-no -> city; supp-no -> status; city -> status]

(b) [FD diagram for shipments: {supp-no, part no} -> qty]

SUPPLIER
Supp-no   Status   City
S1        20       Bombay
S2        10       Chennai
S3        10       Chennai
S4        20       Bombay
S5        30       Delhi

SHIPMENTS
Supp-no   Part no   Qty
S1        P1        300
S1        P2        200
S1        P3        400
S1        P4        200
S2        P1        300
S2        P2        400
S3        P2        200
S4        P4        300
S4        P5        400

The revised structure overcomes all of the update problems. For example, the information that S5 is located in Delhi can be inserted in the supplier relation, even though S5 does not currently supply any parts. Even if the shipment connecting S3 and P2 is deleted from the shipments relation, we do not lose the information that S3 is located in Chennai. In the revised structure, the city for a given supplier appears just once, not many times. Thus we can change the city for S1 from Bombay to Ahmadabad by changing it once and for all in the relevant supplier tuple.

SECOND NORMAL FORM

DEFN: A relation is in 2NF if and only if it is in 1NF and every non key attribute is fully functionally dependent on the primary key, i.e. depends on the whole key and not on just a part of it.

In our example the relation supp-ship is not in second normal form because, although qty depends on the whole key {supp-no, part no}, status and city depend on supp-no alone, which is only a part of the key. The revised relations supplier and shipments are in second normal form, because in each the non key attributes depend on the whole of the key (supp-no and {supp-no, part no} respectively).

There are still problems in this structure, as given below.

INSERT

We cannot insert the fact that a particular city has a particular status. For ex, we cannot state that any supplier in Nellai must have a status of 50 until we have some supplier actually located in that city.

DELETE

If a particular tuple in supplier is deleted, we delete not only the information for the supplier concerned but also the information that the city has that particular status. For ex, if we delete the tuple for S5 from supplier, we lose the information that the status for Delhi is 30.

UPDATE

The status for a given city appears in supplier many times in general. Thus, if we need to change the status for Bombay from 20 to 30, we must find every tuple for Bombay and change it; otherwise the database is left in an inconsistent state. The solution to this problem is to decompose the relation further, into third normal form.

THIRD NORMAL FORM

A relation is in 3NF if and only if it is in 2NF and every non key attribute is non transitively dependent on the primary key.

In the revised structure the supplier relation has the attributes {supp-no, city, status}. Here the supp-no value determines a city value, and that city value in turn determines the status value. This is called a transitive dependency. According to 3NF, transitive dependencies should be avoided. So the supplier relation is decomposed into two new relations, supp-city and supp-status, as below:

supp-city {supp-no, city}
supp-status {city, status}

The FD diagrams and sample values for these relations are given below:

FDS: [supp-city: supp-no -> city; supp-status: city -> status]

SUPP-CITY:
Supp-no   City
S1        Bombay
S2        Chennai
S3        Chennai
S4        Bombay
S5        Delhi

SUPP-STATUS:
City      Status
Delhi     30
Bombay    20
Chennai   10
Nellai    50

Now Nellai can be included in supp-status with status 50 although it is not in supp-city. If S5 is deleted from supp-city, only the fact that S5 is located in Delhi is removed; the status for Delhi is retained in supp-status. Finally, the status for Bombay can be changed from 20 to 30 easily, as each city value appears only once in supp-status.

BOYCE/CODD NORMAL FORM

Third normal form does not deal satisfactorily with the situation in which a relation has two or more candidate keys such that:

1. The candidate keys are composite, and
2. They overlap.

DEFN: A relation is in BCNF if and only if every determinant is a candidate key.

Example: Consider the relation SJT {student, subject, teacher}. The following constraints apply to the relation:

1. For each subject, each student of that subject is taught by only one teacher.
2. Each teacher teaches only one subject (but each subject is taught by several teachers).

The table below shows sample values for SJT:

Student   Subject   Teacher
Sarala    Maths     Prof. Raj
Sarala    Physics   Prof. Arul
Uma       Maths     Prof. Raj
Uma       Physics   Prof. Elakkia

From the first constraint, we have the FD {student, subject}->teacher.
From the second constraint, we have the FD teacher->subject.

The FD diagram for SJT is shown below:

[FD diagram for SJT: {student, subject} -> teacher; teacher -> subject]

The relation has two overlapping candidate keys, {student, subject} and {student, teacher}. The relation SJT is in 3NF but not in BCNF, and it suffers from certain update anomalies. For ex, if we wish to delete the information that Uma is studying Physics, then we also lose the information that Prof. Elakkia teaches Physics. This difficulty is caused by the fact that the attribute teacher is a determinant but not a candidate key. The solution to the problem is to decompose the original relation into two BCNF projections as below:

ST {student, teacher}
TJ {teacher, subject}

DEPENDENCY PRESERVATION

Let R be a relation and let F be the given set of dependencies on R. The projection of F on Ri, denoted by Ri(F) (where Ri is a subset of the attributes of R), is the set of dependencies X->Y in F+ (the closure of F) such that the attributes in X U Y are all contained in Ri. Hence the projection of F on each relation Ri in the decomposition D is the set of functional dependencies in F+ whose left and right side attributes are all in Ri.

A decomposition D = {R1, R2,...,Rm} of R is dependency preserving with respect to F if the union of the projections of F on each Ri in D is equivalent to F, i.e.

((R1(F)) U ... U (Rm(F)))+ = F+

If a decomposition is not dependency preserving, some dependency is lost in the decomposition.

ALGORITHM: To decompose R into 3NF relations with the dependency preservation and lossless join properties

1. Find a minimal cover G for F.
2. For each LHS X of an FD that appears in G, create a relation schema in D with attributes {X U {A1} U {A2} ... U {Ak}}, where X->A1, X->A2, ..., X->Ak are the only dependencies in G with X as LHS (X is the key of this relation).
3. If none of the relation schemas in D contains a key of R, then create one more relation schema in D that contains attributes that form a key of R.

MULTIVALUED DEPENDENCIES AND FOURTH NORMAL FORM

MULTIVALUED DEPENDENCIES

A multivalued dependency (MVD) represents a dependency between attributes (for ex X, Y and Z) in a relation, such that for each value of X there is a set of values for Y and a set of values for Z, and the set of values for Y is independent of the set of values for Z. An MVD is written X-->>Y (X multidetermines Y). By symmetry, whenever X-->>Y holds in R, so does X-->>Z; hence the pair can be written as X-->>Y|Z.

Multivalued dependencies are a consequence of first normal form (1NF), which does not allow an attribute in a tuple to have a set of values. If two or more independent multivalued attributes are present in the same relation, we have to repeat every value of one of the attributes with every value of the other attribute to keep the relation state consistent and to maintain the independence among the attributes involved.

For example, consider the relation emp shown below:

Emp
Ename   Pname   Dname
Smith   X       john
Smith   Y       Anna
Smith   X       Anna
Smith   Y       john

A tuple in this emp relation represents the fact that an employee whose name is Ename works on the project whose name is Pname and has a dependent whose name is Dname. An employee may work on several projects and may have several dependents, and the employee's projects and dependents are independent of one another. To keep the relation state consistent, we must have a separate tuple for every combination of an employee's dependent and an employee's project. This constraint is specified as a multivalued dependency on the emp relation. In the ex, the MVDs are Ename-->>Pname and Ename-->>Dname, or Ename-->>Pname|Dname.

The employee with Ename Smith works on the projects with Pname X and Y and has 2 dependents with Dname john and Anna. If we stored only the first two tuples in emp (<Smith, X, john> and <Smith, Y, Anna>), we would incorrectly show associations between project X and john and between project Y and Anna. These should not be conveyed, because no such meaning is intended in this relation. Hence we must also store the other 2 tuples (<Smith, X, Anna> and <Smith, Y, john>) to show that {X,Y} and {john,Anna} are associated only with Smith, i.e. there is no association between Pname and Dname, which means that the two attributes are independent.
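The requirement that every project value be repeated with every dependent value can be checked mechanically. The following Python sketch uses the standard tuple-based reading of an MVD, which is an assumption here since the notes describe the dependency only informally: X-->>Y holds if, for each X value, the (Y, Z) combinations present are exactly the cross product of the Y values and the Z values seen with that X value. The function name satisfies_mvd and the lower-case attribute names are hypothetical.

```python
from itertools import product


def satisfies_mvd(rows, x, y):
    """True if x -->> y holds in rows; z is taken to be all other attributes."""
    if not rows:
        return True
    z = [a for a in rows[0] if a not in x and a not in y]
    groups = {}  # X value -> (Y values, Z values, (Y, Z) pairs seen together)
    for row in rows:
        key = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        z_val = tuple(row[a] for a in z)
        ys, zs, pairs = groups.setdefault(key, (set(), set(), set()))
        ys.add(y_val)
        zs.add(z_val)
        pairs.add((y_val, z_val))
    # Independence: every Y value must appear with every Z value.
    return all(pairs == set(product(ys, zs))
               for ys, zs, pairs in groups.values())


EMP = [
    {"ename": "Smith", "pname": "X", "dname": "john"},
    {"ename": "Smith", "pname": "Y", "dname": "Anna"},
    {"ename": "Smith", "pname": "X", "dname": "Anna"},
    {"ename": "Smith", "pname": "Y", "dname": "john"},
]

print(satisfies_mvd(EMP, ["ename"], ["pname"]))      # True
print(satisfies_mvd(EMP, ["ename"], ["dname"]))      # True
# With only the first two tuples, X is tied to john and Y to Anna.
print(satisfies_mvd(EMP[:2], ["ename"], ["pname"]))  # False
```

The symmetry noted in the text shows up directly: the check treats Y and Z alike, so Ename-->>Pname holds exactly when Ename-->>Dname does.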
