
Database Design

Requirements Analysis - user needs; what must the database do?
Conceptual Design - high-level description (often done with the ER model)
Logical Design - translate the ER model into the DBMS data model (relational model)
Schema Refinement (NOW) - consistency, normalization
Physical Design - indexes, disk layout
Security Design - who accesses what

Good Database Design
- no redundancy of FACT (!)
- no inconsistency
- no insertion, deletion or update anomalies
- no information loss
- no dependency loss

Informal Design Guidelines for Relational Databases
1. Semantics of the Relation Attributes
2. Redundant Information in Tuples and Update Anomalies
3. Null Values in Tuples
4. Spurious Tuples

1: Semantics of the Relation Attributes
GUIDELINE 1: Informally, each tuple in a relation should represent one entity or relationship instance. (Applies to individual relations and their attributes.)
o Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation.
o Only foreign keys should be used to refer to other entities.
o Entity and relationship attributes should be kept apart as much as possible.
Design a schema that can be explained easily, relation by relation. The semantics of attributes should be easy to interpret.

2: Redundant Information in Tuples and Update Anomalies
Information that is stored redundantly
o wastes storage
o causes problems with update anomalies: insertion anomalies, deletion anomalies, modification anomalies

Consider the relation: EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
Insertion anomalies: Cannot insert a project unless an employee is assigned to it.
Deletion anomalies:
a. When a project is deleted, all the employees who work on that project are deleted with it.
b. Alternatively, if an employee is the sole employee on a project, deleting that employee also deletes the corresponding project.
Modification anomalies: Changing the name of project number P1 from Billing to Customer-Accounting requires the update to be made in the tuples of all 100 employees working on project P1.
GUIDELINE 2: Design a schema that does not suffer from the insertion, deletion and update anomalies. If there are any anomalies present, then note them so that applications can be made to take them into account.

3: Null Values in Tuples
GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible. Attributes that are NULL frequently could be placed in separate relations (with the primary key).
Reasons for nulls:
o Attribute not applicable or invalid
o Attribute value unknown (may exist)
o Value known to exist, but unavailable

4: Spurious Tuples
Bad designs for a relational database may result in erroneous results for certain JOIN operations. The "lossless join" property is used to guarantee meaningful results for join operations.
GUIDELINE 4: The relations should be designed to satisfy the lossless join condition. No spurious tuples should be generated by doing a natural join of any relations.

Normalization: The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations.
Normalization is used to design a set of relation schemas that is optimal from the point of view of database updating.
Normalization starts from a universal relation schema.
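As an illustration of Guideline 4, the following Python sketch shows how a careless decomposition of EMP_PROJ produces spurious tuples when the parts are joined back. The relation state, the lower-case attribute names and the helper functions project and natural_join are assumptions made for this example; they are not part of the original notes.

```python
def project(rel, attrs):
    """Projection onto attrs, with duplicate tuples eliminated."""
    out = []
    for t in rel:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out

def natural_join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    common = set(r[0]) & set(s[0])
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in common)]

# Hypothetical state of EMP_PROJ (Pname omitted for brevity)
emp_proj = [
    {"emp": "E1", "ename": "Swetha", "proj": "P1", "hours": 10},
    {"emp": "E2", "ename": "Prasanna", "proj": "P2", "hours": 10},
]

# Bad design: the two parts share only `hours`, which identifies nothing
bad_join = natural_join(project(emp_proj, ["emp", "ename", "hours"]),
                        project(emp_proj, ["proj", "hours"]))
spurious = [t for t in bad_join if t not in emp_proj]

# Better design: the parts share `emp`, which determines `ename`
good_join = natural_join(project(emp_proj, ["emp", "ename"]),
                         project(emp_proj, ["emp", "proj", "hours"]))

print(len(bad_join), len(spurious))  # 4 joined tuples, 2 of them spurious
print(good_join == emp_proj)         # True: the original state is recovered
```

The bad join pairs every employee with every project that has the same number of hours, so it asserts facts that were never stored. The second decomposition joins on an attribute that is a key of one of the parts, which is why it gives back exactly the original tuples.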

1NF
Attributes must be atomic: they can be chars, ints or strings. They cannot be
1. tuples
2. sets
3. relations
4. composite
5. multivalued
Atomicity is considered to be part of the definition of a relation.

Unnormalised Relations

Name       PaperList
SWETHA     EENADU, HINDU, DC
PRASANNA   EENADU, VAARTHA, HINDU

This is not ideal. Each person is associated with an unspecified number of papers, and the items in the PaperList column do not have a consistent form. Generally, an RDBMS cannot cope with relations like this: each entry in a table needs to have a single data item in it.

This is an unnormalised relation. All RDBMSs require relations not to be like this, that is, not to have multiple values in any column (no repeating groups).

Name       PaperList
SWETHA     EENADU
SWETHA     HINDU
SWETHA     DC
PRASANNA   HINDU
PRASANNA   EENADU
PRASANNA   VAARTHA

This clearly contains the same information, and it has the property that we sought: it is in First Normal Form (1NF). A relation is in 1NF if no entry consists of more than one value (i.e. it does not have repeating groups). So this is the first requirement in designing our databases: every relation must be in 1NF.
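The conversion of the Name/PaperList table can be sketched in a few lines of Python. The function name to_1nf and the list-of-pairs representation are assumptions chosen for illustration.

```python
# Unnormalised rows: the second component is a repeating group
unnormalised = [
    ("SWETHA", ["EENADU", "HINDU", "DC"]),
    ("PRASANNA", ["EENADU", "VAARTHA", "HINDU"]),
]

def to_1nf(rows):
    """Emit one (name, paper) tuple per element of the repeating group."""
    return [(name, paper) for name, papers in rows for paper in papers]

first_nf = to_1nf(unnormalised)
for row in first_nf:
    print(row)
```

Every resulting tuple holds exactly one name and one paper. The order of the tuples differs from the table in the notes, which does not matter because a relation is a set of tuples.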

Obtaining 1NF
1NF is obtained by
- splitting composite attributes
- splitting the relation and propagating the primary key to remove multivalued attributes

There are three approaches to removing repeating groups from unnormalized tables:
1. Remove the repeating groups by entering appropriate data in the empty columns of rows containing the repeating data.
2. Remove the repeating group by placing the repeating data, along with a copy of the original key attribute(s), in a separate relation. A primary key is identified for the new relation.
3. Find the maximum possible number of values for the multivalued attribute and add that many attributes to the relation.

Example:

The DEPARTMENT schema is not in 1NF because DLOCATION is not a single-valued attribute. The relation should be split into two relations. A new relation DEPT_LOCATIONS is created, and the primary key of DEPARTMENT, DNUMBER, becomes an attribute of the new relation. The primary key of this new relation is {DNUMBER, DLOCATION}.

Alternative solution: Keep the DLOCATION attribute in DEPARTMENT, but store one tuple for each location of a department. Then the relation is in 1NF, but redundancy exists.
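A minimal sketch of the first solution (a separate DEPT_LOCATIONS relation) in Python. The department names and locations are invented sample data, and split_multivalued is a hypothetical helper written for this example.

```python
# Hypothetical DEPARTMENT state; DLOCATION holds a set of values, so it is not in 1NF
department = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATION": {"Bellaire", "Sugarland", "Houston"}},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATION": {"Stafford"}},
]

def split_multivalued(rel, key, attr):
    """Move the multivalued attribute, with a copy of the key, into a new relation."""
    base = [{a: v for a, v in t.items() if a != attr} for t in rel]
    moved = sorted((t[key], value) for t in rel for value in t[attr])
    return base, moved

dept, dept_locations = split_multivalued(department, "DNUMBER", "DLOCATION")
print(dept)            # DEPARTMENT without DLOCATION
print(dept_locations)  # DEPT_LOCATIONS as (DNUMBER, DLOCATION) pairs
```

Each pair in dept_locations is unique, which matches the statement that {DNUMBER, DLOCATION} is the primary key of the new relation, and DNUMBER acts as the foreign key back to DEPARTMENT.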

A super key of a relation schema R = {A1, A2, ..., An} is a set of attributes S, a subset of R, with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a super key with the additional property that removal of any attribute from K will cause K not to be a super key any more.
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute must be a member of some candidate key.
A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.
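The definitions of super key and key can be tried out on a single relation state. The sketch below is only an illustration: a key is a property of the schema (it must hold in every legal state), so one state can refute a proposed key but never prove it. The sample tuples and the function names are assumptions.

```python
from itertools import combinations

def is_superkey(state, attrs):
    """True if no two tuples of this state agree on all attributes in attrs."""
    seen = set()
    for t in state:
        value = tuple(t[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def minimal_keys(state, schema):
    """Superkeys of this state from which no attribute can be removed."""
    keys = []
    for size in range(1, len(schema) + 1):
        for attrs in combinations(schema, size):
            contains_smaller_key = any(set(k) <= set(attrs) for k in keys)
            if is_superkey(state, attrs) and not contains_smaller_key:
                keys.append(attrs)
    return keys

# Hypothetical state of EMP_PROJ restricted to three attributes
schema = ("Emp#", "Proj#", "No_hours")
state = [
    {"Emp#": "E1", "Proj#": "P1", "No_hours": 10},
    {"Emp#": "E1", "Proj#": "P2", "No_hours": 5},
    {"Emp#": "E2", "Proj#": "P1", "No_hours": 10},
    {"Emp#": "E1", "Proj#": "P3", "No_hours": 10},
]

print(is_superkey(state, ("Emp#",)))  # False: E1 appears more than once
print(minimal_keys(state, schema))    # [('Emp#', 'Proj#')]
```

The whole schema is always a super key of a relation state, since a relation has no duplicate tuples; the search simply looks for the smallest attribute sets that still distinguish every tuple.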

Functional Dependencies (FDs)


- Definition of FD
- Inference Rules for FDs
- Equivalence of Sets of FDs
- Minimal Sets of FDs

Functional dependency describes the relationship between attributes in a relation. For example, if A and B are attributes of relation R, then B is functionally dependent on A (denoted A → B) if each value of A is associated with exactly one value of B. (A and B may each consist of one or more attributes.)

A trivial functional dependency is one whose right-hand side is a subset (not necessarily a proper subset) of the left-hand side.

Main characteristics of functional dependencies in normalization:
- each value of the left-hand side (the determinant) is associated with exactly one value of the right-hand side;
- they hold for all time;
- they are nontrivial.

The set of all functional dependencies that are implied by a given set of functional dependencies X is called the closure of X, written X+. A set of inference rules is needed to compute X+ from X.

Inference Rules (RATPUP)
1. Reflexivity: If B is a subset of A, then A → B
2. Augmentation: If A → B, then A,C → B,C
3. Transitivity: If A → B and B → C, then A → C
4. Projection: If A → B,C, then A → B and A → C
5. Union: If A → B and A → C, then A → B,C
6. Pseudotransitivity: If A → B and B,C → D, then A,C → D

Example:
F = {SSN → {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER → {DNAME, DMGRSSN}}
From F we can infer: SSN → {DNAME, DMGRSSN}, SSN → SSN, DNUMBER → DNAME

Full functional dependency: if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A.
A functional dependency A → B is a partial dependency if some attribute can be removed from A and the dependency still holds.
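The inferences drawn from F in the example can be checked mechanically. The notes do not describe an algorithm, so the sketch below assumes the standard attribute-closure technique: compute every attribute determined by a given left-hand side, then test whether the right-hand side is contained in it. The function names closure and implies are choices made for this example.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left side is covered and it adds something new
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)

# F from the example, as (left-hand side, right-hand side) pairs
F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))  # True (transitivity)
print(implies(F, {"SSN"}, {"SSN"}))               # True (reflexivity)
print(implies(F, {"DNUMBER"}, {"DNAME"}))         # True (projection)
print(implies(F, {"DNUMBER"}, {"SSN"}))           # False
```

Testing one dependency this way avoids enumerating the whole closure of F, which can be exponentially large.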

2NF
A relation is in second normal form (2NF) if it is in first normal form and every non-key attribute is fully functionally dependent on the key. The normalization of 1NF relations to 2NF involves the removal of partial dependencies. If a partial dependency exists, we remove the functionally dependent attributes from the relation by placing them in a new relation along with a copy of their determinant.
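As a hedged illustration, partial dependencies can be found by computing the attribute closure of every proper subset of the key. The FDs listed for EMP_PROJ are assumptions read off the attribute names (Emp# determines Ename, Proj# determines Pname); the notes do not state them explicitly.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, nonprime, fds):
    """(subset, attribute) pairs where a proper subset of key determines a nonprime attribute."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) & set(nonprime)):
                found.append((subset, attr))
    return found

# EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours) with assumed FDs
key = {"Emp#", "Proj#"}
nonprime = {"Ename", "Pname", "No_hours"}
fds = [
    ({"Emp#", "Proj#"}, {"No_hours"}),
    ({"Emp#"}, {"Ename"}),
    ({"Proj#"}, {"Pname"}),
]

print(partial_dependencies(key, nonprime, fds))
# [(('Emp#',), 'Ename'), (('Proj#',), 'Pname')]
```

Under these assumptions the 2NF design would keep (Emp#, Proj#, No_hours) in the original relation and move (Emp#, Ename) and (Proj#, Pname) into two new relations.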

Obtaining 2NF
- If a nonprime attribute is dependent only on a proper part of a key, then we take the given attribute as well as the key attributes that determine it and move them all to a new relation.
- We can bundle all attributes determined by the same subset of the key as a unit.

Transitive dependency
A condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).

Third normal form (3NF)
A relation that is in first and second normal form, and in which no non-primary-key attribute is transitively dependent on the primary key. The normalization of 2NF relations to 3NF involves the removal of transitive dependencies by placing the attribute(s) in a new relation along with a copy of the determinant.

3NF
R is in 3NF if and only if, for every nontrivial FD X → A that holds in R, either
- X is a superkey of R, or
- A is a key (prime) attribute of R.

3NF: Alternative Definition
R is in 3NF if every nonprime attribute of R is fully functionally dependent on every key of R, and nontransitively dependent on every key of R.

Obtaining 3NF
- Split off the attributes in the FD that causes trouble and move them to a new relation, so there are two relations for each such FD.
- The determinant of the FD remains in the original relation.
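The 3NF condition (X is a superkey, or A is a prime attribute) can be tested directly once the candidate keys are known. The sketch below assumes a relation that holds all attributes of the set F used in the functional-dependency example; the helper names and the brute-force key search are illustrative choices, and only the FDs listed explicitly are examined.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure covers the schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for attrs in combinations(sorted(schema), size):
            if closure(attrs, fds) >= set(schema) and not any(k <= set(attrs) for k in keys):
                keys.append(set(attrs))
    return keys

def violations_3nf(schema, fds):
    """Listed FDs X -> A where X is not a superkey and A is not prime."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(schema):
            continue  # X is a superkey, so the FD is allowed
        for attr in sorted(rhs - lhs):
            if attr not in prime:
                bad.append((sorted(lhs), attr))
    return bad

F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
emp_dept = {"SSN", "ENAME", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}

print(violations_3nf(emp_dept, F))
# [(['DNUMBER'], 'DMGRSSN'), (['DNUMBER'], 'DNAME')]
```

Following the rule above, DNAME and DMGRSSN move to a new relation together with a copy of DNUMBER, while DNUMBER itself stays in the original relation; neither of the two resulting relations reports a violation.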

Boyce-Codd normal form (BCNF)
A relation is in BCNF if and only if every determinant is a candidate key. The difference between 3NF and BCNF is that, for a functional dependency A → B, 3NF allows this dependency in a relation if B is a key attribute and A is not a super key, whereas BCNF insists that for this dependency to remain in a relation, A must be a super key.

BCNF
R is in Boyce-Codd Normal Form iff, for every nontrivial FD X → A that holds in R, X is a superkey of R.
- more restrictive than 3NF
- preferable: has fewer anomalies

Obtaining BCNF
As usual, split the schema to move the attributes of the troublesome FD to another relation, leaving its determinant in the original so they remain connected
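A single BCNF decomposition step can be sketched as follows. The relation (Student, Course, Instructor) and its two FDs are a textbook-style example assumed here for illustration; it is in 3NF, because Course is a prime attribute, but not in BCNF. The function names are hypothetical, and only the FDs listed explicitly are examined, which is sufficient for the original relation but not in general for the relations produced by a split.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violation(schema, fds):
    """First listed FD inside schema whose determinant is not a superkey, else None."""
    for lhs, rhs in fds:
        moved = (rhs - lhs) & schema
        if lhs <= schema and moved and not closure(lhs, fds) >= schema:
            return lhs, moved
    return None

def split_on(schema, lhs, moved):
    """New relation gets determinant + dependents; the determinant also stays behind."""
    return lhs | moved, schema - moved

schema = {"Student", "Course", "Instructor"}
fds = [
    ({"Student", "Course"}, {"Instructor"}),
    ({"Instructor"}, {"Course"}),
]

lhs, moved = bcnf_violation(schema, fds)
r1, r2 = split_on(schema, lhs, moved)
print(sorted(r1), sorted(r2))
# ['Course', 'Instructor'] ['Instructor', 'Student']
```

In this example the dependency {Student, Course} → Instructor can no longer be checked within a single relation after the split, which shows that a BCNF decomposition is not always dependency-preserving.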


Decomposition:
The process of decomposing the universal relation schema R into a set of relation schemas D = {R1, R2, ..., Rm} that will become the relational database schema, by using the functional dependencies.

Attribute preservation condition:
Each attribute in R will appear in at least one relation schema Ri in the decomposition, so that no attributes are lost.

Dependency Preservation Property of a Decomposition:
Definition: Given a set of dependencies F on R, the projection of F on Ri, denoted by $\pi_{R_i}(F)$ where Ri is a subset of R, is the set of dependencies X → Y in F+ such that the attributes in X ∪ Y are all contained in Ri. Hence, the projection of F on each relation schema Ri in the decomposition D is the set of functional dependencies in F+, the closure of F, such that all their left- and right-hand-side attributes are in Ri.
Dependency Preservation Property: A decomposition D = {R1, R2, ..., Rm} of R is dependency-preserving with respect to F if the union of the projections of F on each Ri in D is equivalent to F; that is,

$$(\pi_{R_1}(F) \cup \dots \cup \pi_{R_m}(F))^+ = F^+$$

Lossless (Non-additive) Join Property of a Decomposition:
Definition: A decomposition D = {R1, R2, ..., Rm} of R has the lossless (nonadditive) join property with respect to the set of dependencies F on R if, for every relation state r of R that satisfies F, the following holds, where * is the natural join of all the relations in D:

$$*(\pi_{R_1}(r), \dots, \pi_{R_m}(r)) = r$$

Multi-valued dependency (MVD)
Represents a dependency between attributes (for example, A, B and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multi-valued dependency can be further defined as being trivial or nontrivial. An MVD A →→ B in relation R is trivial if B is a subset of A or A ∪ B = R. An MVD is nontrivial if neither of these two conditions is satisfied.

Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies.
A relation schema R is in 4NF with respect to a set of dependencies F (that includes functional dependencies and multivalued dependencies) if, for every nontrivial multivalued dependency X →→ Y in F+, X is a superkey for R.

Join dependency (JD)
Definition: A join dependency, denoted by JD(R1, R2, ..., Rn), specified on relation schema R, specifies a constraint on the states r of R. The constraint states that every legal state r of R should have a nonadditive join decomposition into R1, R2, ..., Rn; that is, for every such r we have

$$*(\pi_{R_1}(r), \pi_{R_2}(r), \dots, \pi_{R_n}(r)) = r$$

Note: an MVD is a special case of a JD where n = 2.
A join dependency JD(R1, R2, ..., Rn), specified on relation schema R, is a trivial JD if one of the relation schemas Ri in JD(R1, R2, ..., Rn) is equal to R.

Fifth normal form (5NF)
Definition: A relation schema R is in fifth normal form (5NF) (or Project-Join Normal Form (PJNF)) with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R.
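For a decomposition into exactly two relations, the lossless join property can be tested from the FDs alone using a standard result that is not stated in these notes: {R1, R2} is lossless with respect to F if and only if the common attributes R1 ∩ R2 functionally determine either R1 − R2 or R2 − R1. The sketch below applies that test; the FDs for EMP_PROJ are assumed for illustration, and is_lossless_binary is a hypothetical name.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """Binary lossless-join test: the shared attributes must determine one side."""
    determined = closure(r1 & r2, fds)
    return (r1 - r2) <= determined or (r2 - r1) <= determined

# Assumed FDs on EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
fds = [
    ({"Emp#", "Proj#"}, {"No_hours"}),
    ({"Emp#"}, {"Ename"}),
    ({"Proj#"}, {"Pname"}),
]

good = is_lossless_binary({"Emp#", "Ename"},
                          {"Emp#", "Proj#", "Pname", "No_hours"}, fds)
bad = is_lossless_binary({"Emp#", "Ename", "No_hours"},
                         {"Proj#", "Pname", "No_hours"}, fds)
print(good, bad)  # True False
```

The test covers only functional dependencies and only two-way splits; decompositions into more relations, or designs constrained by multivalued and join dependencies, need the general definitions given above.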

Each normal form is strictly stronger than the previous one:
- Every 2NF relation is in 1NF
- Every 3NF relation is in 2NF
- Every BCNF relation is in 3NF
- Every 4NF relation is in BCNF
- Every 5NF relation is in 4NF

[Figure: diagrammatic notation of normal forms]

Normalization
A technique for producing a set of relations with desirable properties, given the data requirements of an enterprise.

UNF is a table that contains one or more repeating groups.
1NF is a relation in which the intersection of each row and column contains one and only one value.
2NF is a relation that is in 1NF and in which every non-primary-key attribute is fully functionally dependent on the primary key.
3NF is a relation that is in 1NF and 2NF and in which no non-primary-key attribute is transitively dependent on the primary key.
BCNF is a relation in which every determinant is a candidate key.
4NF is a relation that is in BCNF and contains no nontrivial multi-valued dependency.
5NF is a relation in which every nontrivial join dependency is implied by the candidate keys.



DBMS ARCHITECTURES:
Centralized DBMS: Combines everything into a single system, including DBMS software, hardware, application programs, and user interface processing software. Users can still connect through a remote terminal; however, all processing is done at the centralized site.


Basic 2-tier Client-Server Architectures
Specialized servers with specialized functions:
- Print server
- File server
- DBMS server
- Web server
- Email server
Clients can access the specialized servers as needed.

Clients
- Provide appropriate interfaces through a client software module to access and utilize the various server resources.
- May be diskless machines, or PCs or workstations with disks, with only the client software installed.
- Are connected to the servers via some form of a network (LAN: local area network, wireless network, etc.).

DBMS Server
- Provides database query and transaction services to the clients.
- Relational DBMS servers are often called SQL servers, query servers, or transaction servers.
- Applications running on clients utilize an Application Program Interface (API) to access server databases via a standard interface such as:
  o ODBC: Open Database Connectivity standard
  o JDBC: for Java programming access
- Client and server must install appropriate client module and server module software for ODBC or JDBC.

1. A client program may connect to several DBMSs, sometimes called the data sources.
2. In general, data sources can be files or other non-DBMS software that manages data.
3. Other variations of clients are possible: e.g., in some object DBMSs, more functionality is transferred to clients, including data dictionary functions, optimization and recovery across multiple servers, etc.

Three Tier Client-Server Architecture
- Common for Web applications.
- Intermediate layer called Application Server or Web Server:
  o Stores the web connectivity software and the business logic part of the application used to access the corresponding data from the database server.
  o Acts like a conduit for sending partially processed data between the database server and the client.
- Three-tier architecture can enhance security:
  o Database server is only accessible via the middle tier.
  o Clients cannot directly access the database server.
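The separation of the three tiers can be sketched in a single Python file. This is only a toy model under stated assumptions: an in-memory SQLite database, reached through Python's standard sqlite3 module, stands in for the database server (playing the role that ODBC or JDBC would play in other environments), and the ApplicationServer class, the department table and its rows are hypothetical.

```python
import sqlite3

def open_database():
    """Data tier: an in-memory SQLite database stands in for the database server."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT)")
    conn.executemany("INSERT INTO department VALUES (?, ?)",
                     [(4, "Administration"), (5, "Research")])
    conn.commit()
    return conn

class ApplicationServer:
    """Middle tier: holds the only database connection and the business logic."""

    def __init__(self, conn):
        self._conn = conn

    def department_name(self, dnumber):
        row = self._conn.execute(
            "SELECT dname FROM department WHERE dnumber = ?", (dnumber,)
        ).fetchone()
        return row[0] if row else None

def client(server, dnumber):
    """Presentation tier: formats results; it never sees SQL or the connection."""
    return f"Department {dnumber}: {server.department_name(dnumber)}"

server = ApplicationServer(open_database())
print(client(server, 5))  # Department 5: Research
```

Because the client only calls methods of the middle tier, it has no direct path to the database, which mirrors the security argument made for the three-tier architecture.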

