
CMPT-354-98.2 Lecture Notes

June 23, 1998

Chapter 7

Relational Database Design


The goal of relational database design is to generate a set of schemas that allow us to:

- Store information without unnecessary redundancy.
- Retrieve information easily and accurately.

7.1 Pitfalls in Relational DB Design


A bad design may have several properties, including:

- Repetition of information.
- Inability to represent certain information.
- Loss of information.

7.1.1 Representation of Information


1. Suppose we have a schema, Lending-schema,

   Lending-schema = (bname, bcity, assets, cname, loan, amount)

   and suppose an instance of the relation is as shown in Figure 7.1.

   bname     bcity      assets  cname  loan  amount
   SFU       Burnaby    2M      Tom    L-10  10K
   SFU       Burnaby    2M      Mary   L-20  15K
   Downtown  Vancouver  8M      Tom    L-50  50K

   Figure 7.1: Sample lending relation.

2. A tuple t in the new relation has the following attributes:
   - t[assets] is the assets for t[bname].
   - t[bcity] is the city for t[bname].
   - t[loan] is the loan number made by branch t[bname] to t[cname].
   - t[amount] is the amount of the loan for t[loan].
3. If we wish to add a loan to our database, the original design would require adding a tuple to borrow:

   (SFU, L-31, Turner, 1K)

4. In our new design, we need a tuple with all the attributes required for Lending-schema. Thus we need to insert

   (SFU, Burnaby, 2M, Turner, L-31, 1K)

5. We are now repeating the assets and branch city information for every loan.
   - Repetition of information wastes space.
   - Repetition of information complicates updating.
6. Under the new design, we need to change many tuples if the branch's assets change.
7. Let's analyze this problem:
   - We know that a branch is located in exactly one city.
   - We also know that a branch may make many loans.
   - The functional dependency bname $\to$ bcity holds on Lending-schema.
   - The functional dependency bname $\to$ loan does not.
   - These two facts are best represented in separate relations.
8. Another problem is that we cannot represent the information for a branch (assets and city) unless we have a tuple for a loan at that branch.
9. Unless we use nulls, we can only have this information when there are loans, and must delete it when the last loan is paid off.

   bname     bcity      assets  cname
   SFU       Burnaby    2M      Tom
   SFU       Burnaby    2M      Mary
   Downtown  Vancouver  8M      Tom

   cname  loan  amount
   Tom    L-10  10K
   Mary   L-20  15K
   Tom    L-50  50K

   Figure 7.2: The decomposed lending relation (used in Section 7.2).

7.2 Decomposition
1. The previous example might seem to suggest that we should decompose schemas as much as possible. Careless decomposition, however, may lead to another form of bad design.
2. Consider a design where Lending-schema is decomposed into two schemas:

   Branch-customer-schema = (bname, bcity, assets, cname)
   Customer-loan-schema = (cname, loan, amount)

3. We construct our new relations from lending by:

   branch-customer = $\Pi_{bname,\, bcity,\, assets,\, cname}(lending)$
   customer-loan = $\Pi_{cname,\, loan,\, amount}(lending)$

4. It appears that we can reconstruct the lending relation by performing a natural join on the two new relations.

   bname     bcity      assets  cname  loan  amount
   SFU       Burnaby    2M      Tom    L-10  10K
   SFU       Burnaby    2M      Tom    L-50  50K
   SFU       Burnaby    2M      Mary   L-20  15K
   Downtown  Vancouver  8M      Tom    L-10  10K
   Downtown  Vancouver  8M      Tom    L-50  50K

   Figure 7.3: Join of the decomposed relations.

5. Figure 7.3 shows what we get by computing branch-customer $\bowtie$ customer-loan.
6. We notice that there are tuples in branch-customer $\bowtie$ customer-loan that are not in lending.
7. How did this happen?
   - The intersection of the two schemas is cname, so the natural join is made on the basis of equality in cname.
   - If two loans are for the same customer, there will be four tuples in the natural join.
   - Two of these tuples will be spurious: they will not appear in the original lending relation, and should not appear in the database.
   - Although we have more tuples in the join, we have less information.
   - Because of this, we call this a lossy or lossy-join decomposition.
   - A decomposition that is not lossy-join is called a lossless-join decomposition.
   - The only way we could make a connection between branch-customer and customer-loan was through cname.
8. When we decompose Lending-schema into Branch-schema and Loan-info-schema, we do not have a similar problem. Why not?

   Branch-schema = (bname, bcity, assets)
   Loan-info-schema = (bname, cname, loan, amount)

   - The only way we could represent a relationship between tuples in the two relations is through bname.
   - This will not cause problems.
   - For a given branch name, there is exactly one assets value and branch city.
9. For a given branch name, there is exactly one assets value and exactly one bcity, so joining on bname cannot pair a branch with the wrong data. No similar statement holds for cname: a customer name does not determine a unique loan number or loan amount.
10. We'll make a more formal definition of lossless-join:
    - Let R be a relation schema.
    - A set of relation schemas $\{R_1, R_2, \ldots, R_n\}$ is a decomposition of R if
      $$R = R_1 \cup R_2 \cup \cdots \cup R_n$$
    - That is, every attribute in R appears in at least one $R_i$ for $1 \le i \le n$.
    - Let r be a relation on R, and let
      $$r_i = \Pi_{R_i}(r) \quad \text{for } 1 \le i \le n$$
    - That is, $\{r_1, r_2, \ldots, r_n\}$ is the database that results from decomposing R into $\{R_1, R_2, \ldots, R_n\}$.
    - It is always the case that:
      $$r \subseteq r_1 \bowtie r_2 \bowtie \cdots \bowtie r_n$$



    - To see why this is, consider a tuple $t \in r$.
    - When we compute the relations $\{r_1, r_2, \ldots, r_n\}$, the tuple t gives rise to one tuple $t_i$ in each $r_i$.
    - These n tuples combine together to regenerate t when we compute the natural join of the $r_i$.
    - Thus every tuple in r appears in $\bowtie_{i=1}^{n} r_i$.
    - However, in general,
      $$r \ne r_1 \bowtie r_2 \bowtie \cdots \bowtie r_n$$
    - We saw an example of this inequality in our decomposition of lending into branch-customer and customer-loan.
    - In order to have a lossless-join decomposition, we need to impose some constraints on the set of possible relations.
    - Let C represent a set of constraints on the database.
    - A decomposition $\{R_1, R_2, \ldots, R_n\}$ of a relation schema R is a lossless-join decomposition for R if, for all relations r on schema R that are legal under C:
      $$r = \Pi_{R_1}(r) \bowtie \Pi_{R_2}(r) \bowtie \cdots \bowtie \Pi_{R_n}(r)$$

11. In other words, a lossless-join decomposition is one in which, for any legal relation r, if we decompose r and then "recompose" r, we get what we started with: no more and no less.
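The lossy and lossless decompositions of the sample lending relation can be replayed in a few lines of Python. This is only a sketch: relations are modelled as lists of dictionaries, and the helper names `project` and `natural_join` are illustrative choices, not part of the notes.

```python
def project(rel, attrs):
    """Projection: keep only attrs and drop duplicate tuples."""
    out = []
    for row in rel:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(r1, r2):
    """Natural join: combine tuples that agree on every shared attribute."""
    out = []
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in set(t1) & set(t2)):
                merged = {**t1, **t2}
                if merged not in out:
                    out.append(merged)
    return out


ATTRS = ["bname", "bcity", "assets", "cname", "loan", "amount"]
lending = [dict(zip(ATTRS, row)) for row in [
    ("SFU", "Burnaby", "2M", "Tom", "L-10", "10K"),
    ("SFU", "Burnaby", "2M", "Mary", "L-20", "15K"),
    ("Downtown", "Vancouver", "8M", "Tom", "L-50", "50K"),
]]

# Lossy decomposition: the only shared attribute is cname.
branch_customer = project(lending, ["bname", "bcity", "assets", "cname"])
customer_loan = project(lending, ["cname", "loan", "amount"])
lossy = natural_join(branch_customer, customer_loan)
spurious = [t for t in lossy if t not in lending]

# Lossless decomposition: the shared attribute bname determines Branch-schema.
branch = project(lending, ["bname", "bcity", "assets"])
loan_info = project(lending, ["bname", "cname", "loan", "amount"])
rejoined = natural_join(branch, loan_info)

print(len(lossy), "tuples in the lossy join,", len(spurious), "of them spurious")
print(len(rejoined), "tuples in the lossless join")
```

The lossy join reproduces the five tuples of Figure 7.3, two of which are not in lending, while the join through bname returns exactly the original three tuples.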

7.3 Normalization Using Functional Dependencies


We can use functional dependencies to design a relational database in which most of the problems we have seen do not occur. Using functional dependencies, we can define several normal forms which represent "good" database designs.

7.3.1 Desirable Properties of Decomposition


1. We'll take another look at the schema

   Lending-schema = (bname, assets, bcity, loan, cname, amount)

   which we saw was a bad design.
2. The set of functional dependencies we required to hold on this schema was:

   bname $\to$ assets bcity
   loan $\to$ amount bname

3. If we decompose it into

   Branch-schema = (bname, assets, bcity)
   Loan-schema = (bname, loan, amount)
   Borrow-schema = (cname, loan)

   we claim this decomposition has several desirable properties.

Lossless-Join Decomposition
1. We claim the above decomposition is lossless. How can we decide whether a decomposition is lossless?
   - Let R be a relation schema.
   - Let F be a set of functional dependencies on R.
   - Let $R_1$ and $R_2$ form a decomposition of R.
   - The decomposition is a lossless-join decomposition of R if at least one of the following functional dependencies is in $F^+$:
     (a) $R_1 \cap R_2 \to R_1$
     (b) $R_1 \cap R_2 \to R_2$

   Why is this true? Simply put, it ensures that the attributes involved in the natural join ($R_1 \cap R_2$) form a superkey for at least one of the two relations. This ensures that we can never get the situation where spurious tuples are generated, as for any value on the join attributes there will be at most one tuple in one of the relations.
2. We'll now show our decomposition is lossless-join by showing a set of steps that generate the decomposition:
   - First we decompose Lending-schema into

     Branch-schema = (bname, bcity, assets)
     Loan-info-schema = (bname, cname, loan, amount)

   - Since bname $\to$ assets bcity, the augmentation rule for functional dependencies implies that

     bname $\to$ bname assets bcity

   - Since Branch-schema $\cap$ Loan-info-schema = bname, our decomposition is lossless-join.
   - Next we decompose Loan-info-schema into

     Loan-schema = (bname, loan, amount)
     Borrow-schema = (cname, loan)

   - As loan is the common attribute, and

     loan $\to$ amount bname

     this is also a lossless-join decomposition.
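The two-schema test above can be sketched in Python without computing all of $F^+$. The sketch relies on the standard attribute-closure computation, which is an assumption brought in here rather than something these notes define in this chapter: $R_1 \cap R_2 \to R_1$ is in $F^+$ exactly when $R_1$ is contained in the closure of $R_1 \cap R_2$ under F. The helper names `fd`, `closure` and `is_lossless_binary` are illustrative.

```python
def fd(lhs, rhs):
    """Build a functional dependency from two space-separated attribute lists."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """True if (R1 intersect R2) -> R1 or (R1 intersect R2) -> R2 follows from fds."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure


F = [fd("bname", "assets bcity"), fd("loan", "amount bname")]

branch = {"bname", "bcity", "assets"}
loan_info = {"bname", "cname", "loan", "amount"}
loan = {"bname", "loan", "amount"}
borrow = {"cname", "loan"}
branch_customer = {"bname", "bcity", "assets", "cname"}
customer_loan = {"cname", "loan", "amount"}

print(is_lossless_binary(branch, loan_info, F))            # first step of item 2
print(is_lossless_binary(loan, borrow, F))                 # second step of item 2
print(is_lossless_binary(branch_customer, customer_loan, F))  # the lossy split of 7.2
```

Both steps of the decomposition pass the test, while the split through cname from Section 7.2 does not, because cname determines nothing but itself.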

Dependency Preservation
1. Another desirable property in database design is dependency preservation.
   - We would like to check easily that updates to the database do not result in illegal relations being created.
   - It would be nice if our design allowed us to check updates without having to compute natural joins.
   - To know whether joins must be computed, we need to determine what functional dependencies may be tested by checking each relation individually.
   - Let F be a set of functional dependencies on schema R.
   - Let $\{R_1, R_2, \ldots, R_n\}$ be a decomposition of R.
   - The restriction of F to $R_i$ is the set of all functional dependencies in $F^+$ that include only attributes of $R_i$.
   - Functional dependencies in a restriction can be tested in one relation, as they involve attributes in one relation schema.
   - The set of restrictions $F_1, F_2, \ldots, F_n$ is the set of dependencies that can be checked efficiently.
   - We need to know whether testing only the restrictions is sufficient.
   - Let $F' = F_1 \cup F_2 \cup \cdots \cup F_n$.
   - $F'$ is a set of functional dependencies on schema R, but in general, $F' \ne F$.
   - However, it may be that $F'^+ = F^+$.
   - If this is so, then every functional dependency in F is implied by $F'$, and if $F'$ is satisfied, then F must also be satisfied.
   - A decomposition having the property that $F'^+ = F^+$ is a dependency-preserving decomposition.



2. The algorithm for testing dependency preservation follows this method:

   compute $F^+$;
   for each schema $R_i$ in D do
   begin
       $F_i$ := the restriction of $F^+$ to $R_i$;
   end
   $F'$ := $\emptyset$;
   for each restriction $F_i$ do
   begin
       $F'$ := $F' \cup F_i$
   end
   compute $F'^+$;
   if ($F'^+ = F^+$) then return (true)
   else return (false);

3. We can now show that our decomposition of Lending-schema is dependency preserving.
   - The functional dependency

     bname $\to$ assets bcity

     can be tested in one relation on Branch-schema.
   - The functional dependency

     loan $\to$ amount bname

     can be tested in Loan-schema.
4. As the above example shows, it is often easier not to apply the algorithm shown to test dependency preservation, as computing $F^+$ takes exponential time.
5. An Easier Way To Test For Dependency Preservation

   Really we only need to know whether the functional dependencies in F and not in $F'$ are implied by those in $F'$. In other words, are the functional dependencies not easily checkable logically implied by those that are? Rather than compute $F^+$ and $F'^+$, and see whether they are equal, we can do this:
   - Find $F - F'$, the functional dependencies not checkable in one relation.
   - See whether this set is obtainable from $F'$ by using Armstrong's Axioms.

   This should take a great deal less work, as we have usually just a few functional dependencies to work on.

Use this simpler method on exams and assignments (unless you have exponential time available to you).
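For checking by machine, a commonly used polynomial-time alternative (not the algorithm given in these notes, and named here only as a substitute technique) tests each dependency $\alpha \to \beta$ of F separately: start from $\alpha$ and repeatedly add whatever can be derived inside a single schema $R_i$, that is, $(\mathit{result} \cap R_i)^+ \cap R_i$. The dependency is preserved if $\beta$ ends up in the result. A minimal sketch, with illustrative helper names:

```python
def fd(lhs, rhs):
    """Build a functional dependency from two space-separated attribute lists."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_preserved(dep, decomposition, fds):
    """Check one dependency using only inferences local to a single schema."""
    lhs, rhs = dep
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


F = [fd("bname", "assets bcity"), fd("loan", "amount bname")]
decomposition = [
    {"bname", "assets", "bcity"},   # Branch-schema
    {"bname", "loan", "amount"},    # Loan-schema
    {"cname", "loan"},              # Borrow-schema
]

for dep in F:
    print(sorted(dep[0]), "->", sorted(dep[1]), is_preserved(dep, decomposition, F))
```

Both dependencies of Lending-schema come out as preserved, in agreement with item 3.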

Repetition of Information
1. Our decomposition does not suffer from the repetition of information problem.
   - Branch and loan data are separated into distinct relations.
   - Thus we do not have to repeat branch data for each loan.
   - If a single loan is made to several customers, we do not have to repeat the loan amount for each customer.
   - This lack of redundancy is obviously desirable.
   - We will see how this may be achieved through the use of normal forms.


7.3.2 Boyce-Codd Normal Form

1. A relation schema R is in Boyce-Codd Normal Form (BCNF) with respect to a set F of functional dependencies if for all functional dependencies in $F^+$ of the form $\alpha \to \beta$, where $\alpha \subseteq R$ and $\beta \subseteq R$, at least one of the following holds:
   - $\alpha \to \beta$ is a trivial functional dependency (i.e. $\beta \subseteq \alpha$).
   - $\alpha$ is a superkey for schema R.
2. A database design is in BCNF if each member of the set of relation schemas is in BCNF.
3. Let's assess our example banking design:

   Customer-schema = (cname, street, ccity)
       cname $\to$ street ccity
   Branch-schema = (bname, assets, bcity)
       bname $\to$ assets bcity
   Loan-info-schema = (bname, cname, loan, amount)
       loan $\to$ amount bname

   Customer-schema and Branch-schema are in BCNF.
4. Let's look at Loan-info-schema:
   - We have the non-trivial functional dependency loan $\to$ amount, and loan is not a superkey.
   - Thus Loan-info-schema is not in BCNF.
   - We also have the repetition of information problem. For each customer associated with a loan, we must repeat the branch name and amount of the loan.
   - We can eliminate this redundancy by decomposing into schemas that are all in BCNF.
5. If we decompose into

   Loan-schema = (bname, loan, amount)
   Borrow-schema = (cname, loan)

   we have a lossless-join decomposition. (Remember why?)

   To see whether these schemas are in BCNF, we need to know what functional dependencies apply to them.
   - For Loan-schema, we have loan $\to$ amount bname applying.
   - Only trivial functional dependencies apply to Borrow-schema.
   - Thus both schemas are in BCNF.

   We also no longer have the repetition of information problem. Branch name and loan amount information are not repeated for each customer in this design.
6. Now we can give a general method to generate a collection of BCNF schemas.

   result := $\{R\}$;
   done := false;
   compute $F^+$;
   while (not done) do
       if (there is a schema $R_i$ in result that is not in BCNF)
       then begin
           let $\alpha \to \beta$ be a nontrivial functional dependency that holds on $R_i$
               such that $\alpha \to R_i$ is not in $F^+$, and $\alpha \cap \beta = \emptyset$;
           result := (result $- R_i$) $\cup$ ($R_i - \beta$) $\cup$ ($\alpha, \beta$);
       end
       else done := true;

7. This algorithm generates a lossless-join BCNF decomposition. Why?
   - We replace a schema $R_i$ with ($R_i - \beta$) and ($\alpha, \beta$).
   - The dependency $\alpha \to \beta$ holds on $R_i$.
   - $(R_i - \beta) \cap (\alpha, \beta) = \alpha$.
   - So we have $R_1 \cap R_2 \to R_2$, and thus a lossless join.
8. Let's apply this algorithm to our earlier example of poor database design:

   Lending-schema = (bname, assets, bcity, loan, cname, amount)

   The set of functional dependencies we require to hold on this schema is

   bname $\to$ assets bcity
   loan $\to$ amount bname

   A candidate key for this schema is {loan, cname}.

   We will now proceed to decompose:
   - The functional dependency

     bname $\to$ assets bcity

     holds on Lending-schema, but bname is not a superkey.
   - We replace Lending-schema with

     Branch-schema = (bname, assets, bcity)
     Loan-info-schema = (bname, loan, cname, amount)

   - Branch-schema is now in BCNF.
   - The functional dependency

     loan $\to$ amount bname

     holds on Loan-info-schema, but loan is not a superkey.
   - We replace Loan-info-schema with

     Loan-schema = (bname, loan, amount)
     Borrow-schema = (cname, loan)

   - These are both now in BCNF.
   - We saw earlier that this decomposition is both lossless-join and dependency-preserving.
9. Not every decomposition is dependency-preserving.
   - Consider the relation schema

     Banker-schema = (bname, cname, banker-name)

   - The set F of functional dependencies is

     banker-name $\to$ bname
     cname bname $\to$ banker-name

   - The schema is not in BCNF as banker-name is not a superkey.
   - If we apply our algorithm, we may obtain the decomposition

     Banker-branch-schema = (bname, banker-name)
     Cust-banker-schema = (cname, banker-name)

   - The decomposed schemas preserve only the first (and trivial) functional dependencies.
   - The closure of this dependency does not include the second one.
   - Thus a violation of cname bname $\to$ banker-name cannot be detected unless a join is computed.

   This shows us that not every BCNF decomposition is dependency-preserving.
10. It is not always possible to satisfy all three design goals:
    - BCNF.
    - Lossless join.
    - Dependency preservation.
11. We can see that any BCNF decomposition of Banker-schema must fail to preserve

    cname bname $\to$ banker-name

12. Some Things To Note About BCNF
    - There is sometimes more than one BCNF decomposition of a given schema.
    - The algorithm given produces only one of these possible decompositions.
    - Some of the BCNF decompositions may also yield dependency preservation, while others may not.
    - Changing the order in which the functional dependencies are considered by the algorithm may change the decomposition.
    - For example, try running the BCNF algorithm on

      R = (A, B, C, D)
      A $\to$ B, C
      B $\to$ D
      D $\to$ B

    - Then change the order of the last two functional dependencies and run the algorithm again. Check the two decompositions for dependency preservation.
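The BCNF decomposition loop of item 6 can be sketched in Python. Instead of materialising $F^+$, this sketch finds a violating dependency on $R_i$ by brute force: it tries every left side $\alpha \subset R_i$ and takes $\beta = (\alpha^+ \cap R_i) - \alpha$. That search, the order in which left sides are tried, and the helper names are assumptions of the sketch, and it is exponential in the size of the schema, so it is only suitable for small examples.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build a functional dependency from two space-separated attribute lists."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(ri, fds):
    """Return (alpha, beta) with alpha -> beta nontrivial on ri and alpha not a superkey."""
    for size in range(1, len(ri)):
        for alpha in map(frozenset, combinations(sorted(ri), size)):
            cl = closure(alpha, fds)
            beta = frozenset((cl & ri) - alpha)
            if beta and not ri <= cl:
                return alpha, beta
    return None


def bcnf_decompose(r, fds):
    result = [frozenset(r)]
    done = False
    while not done:
        for ri in result:
            violation = find_violation(ri, fds)
            if violation:
                alpha, beta = violation
                result.remove(ri)
                result += [ri - beta, alpha | beta]   # (Ri - beta) and (alpha, beta)
                break
        else:
            done = True
    return set(result)


lending = {"bname", "assets", "bcity", "loan", "cname", "amount"}
F = [fd("bname", "assets bcity"), fd("loan", "amount bname")]
for schema in bcnf_decompose(lending, F):
    print(sorted(schema))
```

On Lending-schema the sketch arrives at Branch-schema, Loan-schema and Borrow-schema, as in item 8. A different search order can produce a different decomposition for other inputs, which is the point made in item 12.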

7.3.3 Third Normal Form

1. When we cannot meet all three design criteria, we abandon BCNF and accept a weaker form called third normal form (3NF).
2. It is always possible to find a dependency-preserving lossless-join decomposition that is in 3NF.
3. A relation schema R is in 3NF with respect to a set F of functional dependencies if for all functional dependencies in $F^+$ of the form $\alpha \to \beta$, where $\alpha \subseteq R$ and $\beta \subseteq R$, at least one of the following holds:
   - $\alpha \to \beta$ is a trivial functional dependency.
   - $\alpha$ is a superkey for schema R.
   - Each attribute A in $\beta - \alpha$ is contained in a candidate key for R.
4. A database design is in 3NF if each member of the set of relation schemas is in 3NF.
5. We now allow functional dependencies satisfying only the third condition. These dependencies are called transitive dependencies, and are not allowed in BCNF.
6. As all relation schemas in BCNF satisfy the first two conditions only, a schema in BCNF is also in 3NF.
7. BCNF is a more restrictive constraint than 3NF.
8. Our Banker-schema did not have a dependency-preserving lossless-join decomposition into BCNF. The schema was already in 3NF though (check it out).
9. We now present an algorithm for finding a dependency-preserving lossless-join decomposition into 3NF.
10. Note that we require the set F of functional dependencies to be in canonical form.

    let $F_c$ be a canonical cover for F;
    i := 0;
    for each functional dependency $\alpha \to \beta \in F_c$ do
        if none of the schemas $R_j$, $1 \le j \le i$ contains $\alpha\beta$
        then begin
            i := i + 1;
            $R_i$ := $\alpha\beta$
        end
    if none of the schemas $R_j$, $1 \le j \le i$ contains a candidate key for R
    then begin
        i := i + 1;
        $R_i$ := any candidate key for R
    end
    return ($R_1, R_2, \ldots, R_i$)

11. Each relation schema is in 3NF. Why? (A proof is given in [Ullman 1988].)
12. The design is dependency-preserving as a schema is built for each given dependency. Lossless-join is guaranteed by the requirement that a candidate key for R be in at least one of the schemas.
13. To review our Banker-schema, consider an extension to our example:

    Banker-info-schema = (bname, cname, banker-name, office)

    The set F of functional dependencies is

    banker-name $\to$ bname office
    cname bname $\to$ banker-name

    The for loop in the algorithm gives us the following decomposition:

    Banker-office-schema = (banker-name, bname, office)
    Banker-schema = (cname, bname, banker-name)

    Since Banker-schema contains a candidate key for Banker-info-schema, the process is finished.
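The 3NF synthesis algorithm of item 10 translates almost line for line into Python. This sketch assumes its input is already a canonical cover (it does not compute one), and it finds candidate keys by brute force over attribute subsets, which is only reasonable for small schemas. The helper names are illustrative.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build a functional dependency from two space-separated attribute lists."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(r, fds):
    """All minimal attribute sets whose closure covers r (smallest sets first)."""
    keys = []
    for size in range(1, len(r) + 1):
        for attrs in map(frozenset, combinations(sorted(r), size)):
            if not any(k <= attrs for k in keys) and closure(attrs, fds) >= r:
                keys.append(attrs)
    return keys


def synthesize_3nf(r, fc):
    """fc is assumed to be a canonical cover already."""
    schemas = []
    for lhs, rhs in fc:
        if not any(lhs | rhs <= s for s in schemas):
            schemas.append(lhs | rhs)
    keys = candidate_keys(r, fc)
    if not any(k <= s for k in keys for s in schemas):
        schemas.append(keys[0])   # any candidate key will do
    return schemas


banker_info = frozenset({"bname", "cname", "banker-name", "office"})
Fc = [fd("banker-name", "bname office"), fd("cname bname", "banker-name")]
for schema in synthesize_3nf(banker_info, Fc):
    print(sorted(schema))
```

For Banker-info-schema the loop creates the two schemas of item 13, and no extra key schema is added because (cname, bname) is a candidate key contained in Banker-schema.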

7.3.4 Comparison of BCNF and 3NF


1. We have seen BCNF and 3NF.
   - It is always possible to obtain a 3NF design without sacrificing lossless-join or dependency-preservation.
   - If we do not eliminate all transitive dependencies, we may need to use null values to represent some of the meaningful relationships.
   - Repetition of information occurs.
2. These problems can be illustrated with Banker-schema.
   - As banker-name $\to$ bname, we may want to express relationships between a banker and his or her branch.
   - Figure 7.4 shows how we must either have a corresponding value for customer name, or include a null.
   - Repetition of information also occurs.
   - Every occurrence of the banker's name must be accompanied by the branch name.

   cname  banker-name  bname
   Bill   John         SFU
   Tom    John         SFU
   Mary   John         SFU
   null   Tim          Austin

   Figure 7.4: An instance of Banker-schema.

3. If we must choose between BCNF and dependency preservation, it is generally better to opt for 3NF.
   - If we cannot check for dependency preservation efficiently, we either pay a high price in system performance or risk the integrity of the data.
   - The limited amount of redundancy in 3NF is then a lesser evil.
4. To summarize, our goal for a relational database design is
   - BCNF.
   - Lossless-join.
   - Dependency-preservation.
5. If we cannot achieve this, we accept
   - 3NF.
   - Lossless-join.
   - Dependency-preservation.
6. A final point: there is a price to pay for decomposition. When we decompose a relation, we have to use natural joins or Cartesian products to put the pieces back together. This takes computational time.
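The claim that Banker-schema is in 3NF but not in BCNF can be checked by brute force. The sketch below is an illustration rather than a method from the notes: for every left side $\alpha$ that is not a superkey it looks at $\beta = \alpha^+ - \alpha$. Any non-empty $\beta$ is a BCNF violation, and it is also a 3NF violation if $\beta$ contains an attribute that belongs to no candidate key.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build a functional dependency from two space-separated attribute lists."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def nonempty_subsets(attrs):
    items = sorted(attrs)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def candidate_keys(r, fds):
    keys = []
    for attrs in nonempty_subsets(r):   # smallest first, so keys are minimal
        if not any(k <= attrs for k in keys) and closure(attrs, fds) >= r:
            keys.append(attrs)
    return keys


def violations(r, fds):
    """Return (BCNF violations, 3NF violations) as lists of (alpha, beta)."""
    prime = set().union(*candidate_keys(r, fds))   # attributes in some candidate key
    bcnf, third = [], []
    for alpha in nonempty_subsets(r):
        cl = closure(alpha, fds)
        beta = (cl & r) - alpha
        if cl >= r or not beta:
            continue                     # superkey, or only trivial dependencies
        bcnf.append((alpha, beta))
        if not beta <= prime:
            third.append((alpha, beta - prime))
    return bcnf, third


banker = frozenset({"bname", "cname", "banker-name"})
F = [fd("banker-name", "bname"), fd("cname bname", "banker-name")]
bcnf_bad, third_bad = violations(banker, F)
print("BCNF violations:", [(sorted(a), sorted(b)) for a, b in bcnf_bad])
print("3NF violations:", third_bad)
```

The only BCNF violation found is banker-name $\to$ bname, and it is tolerated by 3NF because bname is part of the candidate key (cname, bname).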

7.4 Normalization Using Multivalued Dependencies (not to be covered)


1. Suppose that in our banking example, we had an alternative design including the schema:

   BC-schema = (loan, cname, street, ccity)

   We can see this is not BCNF, as the functional dependency

   cname $\to$ street ccity

   holds on this schema, and cname is not a superkey.
2. If we have customers who have several addresses, though, then we no longer wish to enforce this functional dependency, and the schema is in BCNF.
3. However, we now have the repetition of information problem. For each address, we must repeat the loan numbers for a customer, and vice versa.

7.4.1 Multivalued Dependencies

1. Functional dependencies rule out certain tuples from appearing in a relation. If A $\to$ B, then we cannot have two tuples with the same A value but different B values.
2. Multivalued dependencies do not rule out the existence of certain tuples. Instead, they require that other tuples of a certain form be present in the relation.
3. Let R be a relation schema, and let $\alpha \subseteq R$ and $\beta \subseteq R$. The multivalued dependency

   $$\alpha \twoheadrightarrow \beta$$

   holds on R if in any legal relation r(R), for all pairs of tuples $t_1$ and $t_2$ in r such that $t_1[\alpha] = t_2[\alpha]$, there exist tuples $t_3$ and $t_4$ in r such that:

   $$t_1[\alpha] = t_2[\alpha] = t_3[\alpha] = t_4[\alpha]$$
   $$t_3[\beta] = t_1[\beta]$$
   $$t_3[R - \beta] = t_2[R - \beta]$$
   $$t_4[\beta] = t_2[\beta]$$
   $$t_4[R - \beta] = t_1[R - \beta]$$

4. Figure 7.5 (textbook 6.10) shows a tabular representation of this. It looks horrendously complicated, but is really rather simple. A simple example is a table with the schema (name, address, car), as shown in Figure 7.6.

         $\alpha$            $\beta$                   $R - \alpha - \beta$
   t1    a_1 ... a_i    a_{i+1} ... a_j    a_{j+1} ... a_n
   t2    a_1 ... a_i    b_{i+1} ... b_j    b_{j+1} ... b_n
   t3    a_1 ... a_i    a_{i+1} ... a_j    b_{j+1} ... b_n
   t4    a_1 ... a_i    b_{i+1} ... b_j    a_{j+1} ... a_n

   Figure 7.5: Tabular representation of $\alpha \twoheadrightarrow \beta$.

   name  address    car
   Tom   North Rd.  Toyota
   Tom   Oak St.    Honda
   Tom   North Rd.  Honda
   Tom   Oak St.    Toyota

   Figure 7.6: (name, address, car) where name $\twoheadrightarrow$ address and name $\twoheadrightarrow$ car.

   - Intuitively, $\alpha \twoheadrightarrow \beta$ says that the relationship between $\alpha$ and $\beta$ is independent of the relationship between $\alpha$ and $R - \beta$.
   - If the multivalued dependency $\alpha \twoheadrightarrow \beta$ is satisfied by all relations on schema R, then we say it is a trivial multivalued dependency on schema R.
   - Thus $\alpha \twoheadrightarrow \beta$ is trivial if $\beta \subseteq \alpha$ or $\beta \cup \alpha = R$.
5. Look at the example relation bc in Figure 7.7 (textbook 6.11).
   - We must repeat the loan number once for each address a customer has.
   - We must repeat the address once for each loan the customer has.
   - This repetition is pointless, as the relationship between a customer and a loan is independent of the relationship between a customer and his or her address.

   loan  cname  street  ccity
   23    Smith  North   Rye
   23    Smith  Main    Manchester
   93    Curry  Lake    Horseneck

   Figure 7.7: Relation bc, an example of redundancy in a BCNF relation.



   loan  cname  street  ccity
   23    Smith  North   Rye
   27    Smith  Main    Manchester

   Figure 7.8: An illegal bc relation.

   - If a customer, say "Smith", has loan number 23, we want all of Smith's addresses to be associated with that loan.
   - Thus the relation of Figure 7.8 (textbook 6.12) is illegal.
   - If we look at our definition of multivalued dependency, we see that we want the multivalued dependency

     cname $\twoheadrightarrow$ street ccity

     to hold on BC-schema.

6. Note that if a relation r fails to satisfy a given multivalued dependency, we can construct a relation r' that does satisfy the multivalued dependency by adding tuples to r.
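The definition of a multivalued dependency, and the remark that a violating relation can be repaired by adding tuples, can both be tried out in Python. In this sketch a relation is a set of tuples plus a list of attribute names; the function names `required_tuple`, `satisfies_mvd` and `add_required_tuples` are illustrative and not from the notes. Because every ordered pair $(t_1, t_2)$ is examined, checking for $t_3$ alone is enough: $t_4$ for a pair is $t_3$ for the reversed pair.

```python
def required_tuple(t1, t2, attrs, beta):
    """The tuple t3 of the definition: beta values from t1, everything else from t2."""
    return tuple(t1[i] if a in beta else t2[i] for i, a in enumerate(attrs))


def pairs_agreeing(rel, attrs, alpha):
    """All ordered pairs of tuples with the same values on alpha."""
    idx = [attrs.index(a) for a in alpha]
    return [(t1, t2) for t1 in rel for t2 in rel
            if all(t1[i] == t2[i] for i in idx)]


def satisfies_mvd(rel, attrs, alpha, beta):
    return all(required_tuple(t1, t2, attrs, beta) in rel
               for t1, t2 in pairs_agreeing(rel, attrs, alpha))


def add_required_tuples(rel, attrs, alpha, beta):
    """Build r' from r by adding the tuples the dependency demands."""
    rel = set(rel)
    while True:
        new = {required_tuple(t1, t2, attrs, beta)
               for t1, t2 in pairs_agreeing(rel, attrs, alpha)} - rel
        if not new:
            return rel
        rel |= new


BC = ["loan", "cname", "street", "ccity"]
fig_7_7 = {(23, "Smith", "North", "Rye"),
           (23, "Smith", "Main", "Manchester"),
           (93, "Curry", "Lake", "Horseneck")}
fig_7_8 = {(23, "Smith", "North", "Rye"),
           (27, "Smith", "Main", "Manchester")}

print(satisfies_mvd(fig_7_7, BC, ["cname"], ["street", "ccity"]))
print(satisfies_mvd(fig_7_8, BC, ["cname"], ["street", "ccity"]))
repaired = add_required_tuples(fig_7_8, BC, ["cname"], ["street", "ccity"])
print(sorted(repaired))
```

Figure 7.7 satisfies cname $\twoheadrightarrow$ street ccity, Figure 7.8 does not, and repairing Figure 7.8 pairs each of Smith's two loans with each of Smith's two addresses.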

7.4.2 Theory of Multivalued Dependencies


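1. We will need to compute all the multivalued dependencies that are logically implied by a given set of multivalued dependencies.
   - Let D denote a set of functional and multivalued dependencies.
   - The closure $D^+$ of D is the set of all functional and multivalued dependencies logically implied by D.
   - We can compute $D^+$ from D using the formal definitions, but it is easier to use a set of inference rules.
2. The following set of inference rules is sound and complete. The first three rules are Armstrong's axioms from Chapter 5.
   (a) Reflexivity rule: if $\alpha$ is a set of attributes and $\beta \subseteq \alpha$, then $\alpha \to \beta$ holds.
   (b) Augmentation rule: if $\alpha \to \beta$ holds, and $\gamma$ is a set of attributes, then $\gamma\alpha \to \gamma\beta$ holds.
   (c) Transitivity rule: if $\alpha \to \beta$ holds, and $\beta \to \gamma$ holds, then $\alpha \to \gamma$ holds.
   (d) Complementation rule: if $\alpha \twoheadrightarrow \beta$ holds, then $\alpha \twoheadrightarrow R - \beta - \alpha$ holds.
   (e) Multivalued augmentation rule: if $\alpha \twoheadrightarrow \beta$ holds, and $\gamma \subseteq R$ and $\delta \subseteq \gamma$, then $\gamma\alpha \twoheadrightarrow \delta\beta$ holds.
   (f) Multivalued transitivity rule: if $\alpha \twoheadrightarrow \beta$ holds, and $\beta \twoheadrightarrow \gamma$ holds, then $\alpha \twoheadrightarrow \gamma - \beta$ holds.
   (g) Replication rule: if $\alpha \to \beta$ holds, then $\alpha \twoheadrightarrow \beta$.
   (h) Coalescence rule: if $\alpha \twoheadrightarrow \beta$ holds, and $\gamma \subseteq \beta$, and there is a $\delta$ such that $\delta \subseteq R$ and $\delta \cap \beta = \emptyset$ and $\delta \to \gamma$, then $\alpha \to \gamma$ holds.
3. An example of the multivalued transitivity rule is as follows. loan $\twoheadrightarrow$ cname and cname $\twoheadrightarrow$ {cname, caddress}. Thus we have loan $\twoheadrightarrow$ caddress, where caddress = {cname, caddress} $-$ cname.

   An example of the coalescence rule is as follows. If we have student_name $\twoheadrightarrow$ {bank, account}, and student_id $\to$ bank, then we have student_name $\to$ bank.
4. Let's do an example:
   - Let R = (A, B, C, G, H, I) be a relation schema.
   - Suppose A $\twoheadrightarrow$ BC holds.
   - The definition of multivalued dependencies implies that if $t_1[A] = t_2[A]$, then there exist tuples $t_3$ and $t_4$ such that:

     $$t_1[A] = t_2[A] = t_3[A] = t_4[A]$$
     $$t_3[BC] = t_1[BC]$$
     $$t_3[GHI] = t_2[GHI]$$
     $$t_4[GHI] = t_1[GHI]$$
     $$t_4[BC] = t_2[BC]$$

   - The complementation rule states that if A $\twoheadrightarrow$ BC then A $\twoheadrightarrow$ GHI.
   - Tuples $t_3$ and $t_4$ satisfy A $\twoheadrightarrow$ GHI if we simply change the subscripts.
5. We can simplify calculating $D^+$, the closure of D, by using the following rules, derivable from the previous ones:
   - Multivalued union rule: if $\alpha \twoheadrightarrow \beta$ holds and $\alpha \twoheadrightarrow \gamma$ holds, then $\alpha \twoheadrightarrow \beta\gamma$ holds.
   - Intersection rule: if $\alpha \twoheadrightarrow \beta$ holds and $\alpha \twoheadrightarrow \gamma$ holds, then $\alpha \twoheadrightarrow \beta \cap \gamma$ holds.
   - Difference rule: if $\alpha \twoheadrightarrow \beta$ holds and $\alpha \twoheadrightarrow \gamma$ holds, then $\alpha \twoheadrightarrow \beta - \gamma$ holds and $\alpha \twoheadrightarrow \gamma - \beta$ holds.
6. An example will help: Let R = (A, B, C, G, H, I) with the set of dependencies:

   A $\twoheadrightarrow$ B
   B $\twoheadrightarrow$ HI
   CG $\to$ H

   We list some members of $D^+$:
   - A $\twoheadrightarrow$ CGHI: since A $\twoheadrightarrow$ B, the complementation rule implies that A $\twoheadrightarrow$ R $-$ B $-$ A, and R $-$ B $-$ A = CGHI.
   - A $\twoheadrightarrow$ HI: since A $\twoheadrightarrow$ B and B $\twoheadrightarrow$ HI, the multivalued transitivity rule implies that A $\twoheadrightarrow$ HI $-$ B, and HI $-$ B = HI.
   - B $\to$ H: the coalescence rule can be applied. B $\twoheadrightarrow$ HI holds, H $\subseteq$ HI, CG $\to$ H and CG $\cap$ HI = $\emptyset$, so we can satisfy the coalescence rule with $\alpha$ being B, $\beta$ being HI, $\delta$ being CG, and $\gamma$ being H. We conclude that B $\to$ H.
   - A $\twoheadrightarrow$ CG: now we know that A $\twoheadrightarrow$ CGHI and A $\twoheadrightarrow$ HI. By the difference rule, A $\twoheadrightarrow$ CGHI $-$ HI = CG.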

7.4.3 Fourth Normal Form (4NF)


1. We saw that BC-schema was in BCNF, but still was not an ideal design as it suffered from repetition of information. We had the multivalued dependency cname $\twoheadrightarrow$ street ccity, but no non-trivial functional dependencies.
2. We can use the given multivalued dependencies to improve the database design by decomposing it into fourth normal form.
3. A relation schema R is in 4NF with respect to a set D of functional and multivalued dependencies if for all multivalued dependencies in $D^+$ of the form $\alpha \twoheadrightarrow \beta$, where $\alpha \subseteq R$ and $\beta \subseteq R$, at least one of the following holds:
   - $\alpha \twoheadrightarrow \beta$ is a trivial multivalued dependency.
   - $\alpha$ is a superkey for schema R.
4. A database design is in 4NF if each member of the set of relation schemas is in 4NF.
5. The definition of 4NF differs from the BCNF definition only in the use of multivalued dependencies.
   - Every 4NF schema is also in BCNF.
   - To see why, note that if a schema is not in BCNF, there is a non-trivial functional dependency $\alpha \to \beta$ holding on R, where $\alpha$ is not a superkey.
   - Since $\alpha \to \beta$ implies $\alpha \twoheadrightarrow \beta$, by the replication rule, R cannot be in 4NF.
6. We have an algorithm similar to the BCNF algorithm for decomposing a schema into 4NF:

   result := $\{R\}$;
   done := false;
   compute $D^+$;
   while (not done) do
       if (there is a schema $R_i$ in result that is not in 4NF)
       then begin
           let $\alpha \twoheadrightarrow \beta$ be a nontrivial multivalued dependency that holds on $R_i$
               such that $\alpha \to R_i$ is not in $D^+$, and $\alpha \cap \beta = \emptyset$;
           result := (result $- R_i$) $\cup$ ($R_i - \beta$) $\cup$ ($\alpha, \beta$);
       end
       else done := true;

7. If we apply this algorithm to BC-schema:
   - cname $\twoheadrightarrow$ loan is a nontrivial multivalued dependency and cname is not a superkey for the schema.
   - We then replace BC-schema by two schemas:

     Cust-loan-schema = (cname, loan)
     Customer-schema = (cname, street, ccity)

   - These two schemas are in 4NF.
8. We can show that our algorithm generates only lossless-join decompositions.
   - Let R be a relation schema and D a set of functional and multivalued dependencies on R.
   - Let $R_1$ and $R_2$ form a decomposition of R.
   - This decomposition is lossless-join if and only if at least one of the following multivalued dependencies is in $D^+$:

     $$R_1 \cap R_2 \twoheadrightarrow R_1$$
     $$R_1 \cap R_2 \twoheadrightarrow R_2$$

   - We saw similar criteria for functional dependencies.
   - This says that for every lossless-join decomposition of R into two schemas $R_1$ and $R_2$, one of the two above dependencies must hold.
   - You can see, by inspecting the algorithm, that this must be the case for every decomposition.
9. Dependency preservation is not as simple to determine as with functional dependencies.
   - Let R be a relation schema.
   - Let $R_1, R_2, \ldots, R_n$ be a decomposition of R.
   - Let D be the set of functional and multivalued dependencies holding on R.
   - The restriction of D to $R_i$ is the set $D_i$ consisting of:
     - All functional dependencies in $D^+$ that include only attributes of $R_i$.
     - All multivalued dependencies of the form $\alpha \twoheadrightarrow \beta \cap R_i$ where $\alpha \subseteq R_i$ and $\alpha \twoheadrightarrow \beta$ is in $D^+$.
   - A decomposition of schema R is dependency preserving with respect to a set D of functional and multivalued dependencies if for every set of relations $r_1(R_1), r_2(R_2), \ldots, r_n(R_n)$ such that for all i, $r_i$ satisfies $D_i$, there exists a relation r(R) that satisfies D and for which $r_i = \Pi_{R_i}(r)$ for all i.

   r1:  A   B        r2:  C   G   H
        a1  b1            c1  g1  h1
        a2  b1            c2  g2  h2

   r3:  A   I        r4:  A   C   G
        a1  i1            a1  c1  g1
        a2  i2            a2  c2  g2

   Figure 7.9: Projection of relation r onto a 4NF decomposition of R.

10. What does this formal statement say? It says that a decomposition is dependency preserving if for every set of relations on the decomposition schema satisfying only the restrictions on D there exists a relation r on the entire schema R that the decomposed schemas can be derived from, and that r also satisfies the functional and multivalued dependencies.
11. We'll do an example using our decomposition algorithm and check the result for dependency preservation.
    - Let R = (A, B, C, G, H, I).
    - Let D be

      A $\twoheadrightarrow$ B
      B $\twoheadrightarrow$ HI
      CG $\to$ H

    - R is not in 4NF, as we have A $\twoheadrightarrow$ B and A is not a superkey.
    - The algorithm causes us to decompose using this dependency into

      $R_1$ = (A, B)
      $R_2$ = (A, C, G, H, I)

    - $R_1$ is now in 4NF, but $R_2$ is not.
    - Applying the multivalued dependency CG $\twoheadrightarrow$ H (how did we get this?), our algorithm then decomposes $R_2$ into

      $R_3$ = (C, G, H)
      $R_4$ = (A, C, G, I)

    - $R_3$ is now in 4NF, but $R_4$ is not. Why? As A $\twoheadrightarrow$ HI is in $D^+$ (why?), the restriction of this dependency to $R_4$ gives us A $\twoheadrightarrow$ I.
    - Applying this dependency in our algorithm finally decomposes $R_4$ into

      $R_5$ = (A, I)
      $R_6$ = (A, C, G)

    - The algorithm terminates, and our decomposition is $R_1$, $R_3$, $R_5$ and $R_6$.
12. Let's analyze the result.
    - This decomposition is not dependency preserving as it fails to preserve B $\twoheadrightarrow$ HI.
    - Figure 7.9 (textbook 6.14) shows four relations that may result from projecting a relation onto the four schemas of our decomposition.
    - The restriction of D to (A, B) is A $\twoheadrightarrow$ B and some trivial dependencies.
    - We can see that $r_1$ satisfies A $\twoheadrightarrow$ B as there are no pairs with the same A value.
    - Also, $r_2$ satisfies all functional and multivalued dependencies since no two tuples have the same value on any attribute.
    - We can say the same for $r_3$ and $r_4$.
    - So our decomposed version satisfies all the dependencies in the restriction of D.
    - However, there is no relation r on (A, B, C, G, H, I) that satisfies D and decomposes into $r_1$, $r_2$, $r_3$ and $r_4$.
    - Figure 7.10 (textbook 6.15) shows $r = r_1 \bowtie r_2 \bowtie r_3 \bowtie r_4$.



    A   B   C   G   H   I
    a1  b1  c1  g1  h1  i1
    a2  b1  c2  g2  h2  i2

    Figure 7.10: A relation r(R) that does not satisfy B $\twoheadrightarrow$ HI.

    - Relation r does not satisfy B $\twoheadrightarrow$ HI.
    - Any relation s containing r and satisfying B $\twoheadrightarrow$ HI must include the tuple ($a_2, b_1, c_2, g_2, h_1, i_1$).
    - However, $\Pi_{CGH}(s)$ includes a tuple ($c_2, g_2, h_1$) that is not in $r_2$.
    - Thus our decomposition fails to detect a violation of B $\twoheadrightarrow$ HI.

13. We have seen that if we are given a set of functional and multivalued dependencies, it is best to find a database design that meets the three criteria:
    - 4NF.
    - Dependency preservation.
    - Lossless-join.
14. If we only have functional dependencies, the first criterion is just BCNF.
15. We cannot always meet all three criteria. When this occurs, we compromise on 4NF, and accept BCNF, or even 3NF if necessary, to ensure dependency preservation.
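The counterexample of Figures 7.9 and 7.10 can be reproduced mechanically. The sketch below is illustrative only: relations are lists of dictionaries, the helper names are made up for this example, and $r_3$ is taken to be {($a_1$, $i_1$), ($a_2$, $i_2$)}, the values that yield the relation of Figure 7.10. It joins the four relations and then lists the tuples that B $\twoheadrightarrow$ HI would require but that are missing from the join.

```python
def natural_join(r1, r2):
    """Natural join: combine tuples that agree on every shared attribute."""
    out = []
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in set(t1) & set(t2)):
                merged = {**t1, **t2}
                if merged not in out:
                    out.append(merged)
    return out


def missing_for_mvd(rel, alpha, beta):
    """Tuples demanded by alpha ->> beta that are absent from rel."""
    missing = []
    for t1 in rel:
        for t2 in rel:
            if all(t1[a] == t2[a] for a in alpha):
                # beta values from t1, all remaining values from t2
                t3 = {a: (t1[a] if a in beta else t2[a]) for a in t1}
                if t3 not in rel and t3 not in missing:
                    missing.append(t3)
    return missing


r1 = [{"A": "a1", "B": "b1"}, {"A": "a2", "B": "b1"}]
r2 = [{"C": "c1", "G": "g1", "H": "h1"}, {"C": "c2", "G": "g2", "H": "h2"}]
r3 = [{"A": "a1", "I": "i1"}, {"A": "a2", "I": "i2"}]
r4 = [{"A": "a1", "C": "c1", "G": "g1"}, {"A": "a2", "C": "c2", "G": "g2"}]

r = natural_join(natural_join(natural_join(r1, r2), r3), r4)
missing = missing_for_mvd(r, ["B"], ["H", "I"])

print(len(r), "tuples in the join")
for t in missing:
    cgh = {a: t[a] for a in ("C", "G", "H")}
    print("required:", t, "| CGH projection in r2:", cgh in r2)
```

Each required tuple projects onto a (C, G, H) combination that is not in $r_2$, so no relation containing r can both satisfy B $\twoheadrightarrow$ HI and project back onto $r_2$.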

7.5 Normalization Using Join Dependencies (not to be covered)


We will omit this section.

7.6 Domain-Key Normal Form (not to be covered)


We will omit this section.

7.7 Alternative Approaches to Database Design (not to be covered)


1. We have taken the approach of starting with a single relation schema and decomposing it.
   - One goal was lossless-join decomposition.
   - For that, we decided we needed to talk about the join of all relations on the decomposed database.
   - Figure 6.20 shows a borrow relation decomposed in PJNF, where the loan amount is not yet determined.
   - If we compute the natural join, we find that all tuples referring to loan number 58 disappear.
   - In other words, there is no borrow relation corresponding to the relations of Figure 6.20.
   - We call the tuples that disappear when the join is computed dangling tuples.
   - Formally, if $r_1(R_1), r_2(R_2), \ldots, r_n(R_n)$ are a set of relations, a tuple t of relation $r_i$ is a dangling tuple if t is not in the relation

     $$\Pi_{R_i}(r_1 \bowtie r_2 \bowtie \cdots \bowtie r_n)$$

   - Dangling tuples may occur in practical applications.
   - They represent incomplete information.



   - The relation $r_1 \bowtie r_2 \bowtie \cdots \bowtie r_n$ is called a universal relation since it involves all the attributes in the universe defined by $R_1 \cup R_2 \cup \cdots \cup R_n$.
   - The only way to write a universal relation for our example is to include null values.
   - Because of the difficulty in managing null values, it may be desirable to view the decomposed relations as representing the database rather than the universal relation.
   - We still might need null values if we tried to enter a loan number without a customer name, branch name, or amount.
   - In this case, a particular decomposition defines a restricted form of incomplete information that is acceptable in our database.
2. The normal forms we have defined generate good database design from the point of view of representation of incomplete information.
   - We need a loan number to represent any information in our example.
   - We do not want to store data for which the key attributes are unknown.
   - The normal forms we have defined do not allow us to do this unless we use null values.
   - Thus our normal forms allow representation of acceptable incomplete information via dangling tuples while prohibiting the storage of undesirable incomplete information.
3. Another point in our method of design is that attribute names must be unique in the universal relation.
   - We call this the unique role assumption.
   - If we defined the relations

     branch-loan(name, number)
     loan-customer(number, name)
     loan(number, amount)

     expressions like branch-loan $\bowtie$ loan-customer are possible but meaningless.
   - SQL queries typically state their join conditions explicitly rather than relying on a natural join, and so references to names are disambiguated by prefixing relation names.
   - In this case, non-uniqueness might be both convenient and allowed.
   - The unique role assumption is generally preferable, and if it is not made, special care must be taken when constructing a normalized design.
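Returning to the formal definition of dangling tuples in item 1, a small Python sketch can list them for a decomposed database. Figure 6.20 is not reproduced in these notes, so the three relations below are hypothetical data invented for illustration: loan L-58 has a branch and a customer but no amount yet. The helper names are likewise illustrative.

```python
from functools import reduce


def natural_join(r1, r2):
    """Natural join: combine tuples that agree on every shared attribute."""
    out = []
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in set(t1) & set(t2)):
                merged = {**t1, **t2}
                if merged not in out:
                    out.append(merged)
    return out


def dangling_tuples(ri, relations):
    """Tuples of ri that are not in the projection onto Ri of the join of all relations."""
    joined = reduce(natural_join, relations)
    attrs = list(ri[0])                      # schema of ri, read off its first tuple
    surviving = [{a: t[a] for a in attrs} for t in joined]
    return [t for t in ri if t not in surviving]


# Hypothetical decomposed borrow data (not the textbook's Figure 6.20).
branch_loan = [{"bname": "SFU", "loan": "L-10"}, {"bname": "SFU", "loan": "L-58"}]
loan_customer = [{"loan": "L-10", "cname": "Tom"}, {"loan": "L-58", "cname": "Mary"}]
loan_amount = [{"loan": "L-10", "amount": "10K"}]   # amount of L-58 not yet known

database = [branch_loan, loan_customer, loan_amount]
for rel in database:
    print(dangling_tuples(rel, database))
```

The tuples mentioning L-58 in branch_loan and loan_customer are reported as dangling: they vanish from the join, yet the decomposed database can still store them without null values.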