Conceptual Data Models

The Enhanced Entity-Relationship Model


Enhanced Entity-Relationship Model
• Since the 1980s, new database applications with more demanding requirements have emerged.

• The basic concepts of ER modeling are not sufficient to represent the requirements of newer, more complex applications.

• In response, additional ‘semantic’ modeling concepts were developed.

• These semantic concepts were incorporated into the original ER model; the result is called the Enhanced Entity-Relationship (EER) model, abbreviated EERM in these slides.

• Examples of additional concepts in the EER model are:
  - specialization / generalization;
  - aggregation;
  - composition.
EERM - Specialization / Generalization

• Superclass
An entity type that includes one or more distinct subgroupings of its occurrences.

• Subclass
A distinct subgrouping of occurrences of an entity type.
EERM - Specialization / Generalization

• The superclass/subclass relationship is one-to-one (1:1) and is also called an ‘is-a’ relationship.

• A superclass may contain overlapping or distinct subclasses.

• Not all members of a superclass need be members of a subclass.

• Attribute Inheritance
An entity in a subclass represents the same ‘real world’ object as in the superclass, and may possess subclass-specific attributes as well as those associated with the superclass.
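Attribute inheritance behaves much like class inheritance in a programming language. The following is a minimal sketch in Python, not part of the original slides; the attribute names (staffNo, name, salary, bonus) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class Staff:
    """Superclass: attributes shared by every member of staff."""
    staffNo: str
    name: str
    salary: float


@dataclass
class Manager(Staff):
    """Subclass: every Manager 'is a' Staff and adds its own attribute."""
    bonus: float


# One real-world person, seen as a member of the subclass.
m = Manager(staffNo="S21", name="Ann", salary=30000.0, bonus=2000.0)

# The subclass carries the inherited attributes plus its specific one.
manager_attrs = [f.name for f in fields(Manager)]
print(manager_attrs)
print(isinstance(m, Staff))
```

The subclass instance answers to both types, which mirrors the 1:1 ‘is-a’ relationship between a subclass occurrence and its superclass occurrence.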
EERM - Specialization / Generalization

• Specialization
The process of maximizing differences between members of an entity by identifying their distinguishing characteristics.

• Generalization
The process of minimizing differences between entities by identifying their common characteristics.
EERM - Specialization / Generalization (examples)

AllStaff table holding details of all Staff


EERM - Specialization / Generalization (examples)

Specialization / Generalization of Staff Entity with subclasses for Job Roles
EERM - Specialization / Generalization (examples)
Specialization / Generalization of Staff Entity with subclasses for
Job Roles and Contracts of Employment
EERM - Specialization / Generalization (examples)
Specialization / Generalization of Staff Entity with a shared subclass and a subclass with its own subclass
EERM - Specialization / Generalization (Constraints)

Constraints on Specialization / Generalization:

• Participation constraints
  - determine whether every member of the superclass must be a member of at least one subclass
  - a specialization/generalization may be mandatory or optional

• Disjoint constraints
  - describe the relationship between members of the subclasses and indicate whether a member of a superclass can be a member of one, or more than one, subclass
  - a specialization/generalization may be disjoint or non-disjoint
EERM - Specialization / Generalization (Constraints)

Based on the above constraints, a specialization / generalization relationship falls into one of the following four types:

                          | Participation: mandatory  | Participation: optional
Disjoint: disjoint        | mandatory & disjoint      | optional & disjoint
Disjoint: non-disjoint    | mandatory & non-disjoint  | optional & non-disjoint
EERM - Specialization / Generalization (Constraints - Examples)

Staff Superclass with Supervisor and Manager Subclasses


EERM - Specialization / Generalization (Constraints - Examples)

Owner Superclass with PrivateOwner and BusinessOwner Subclasses
EERM - Specialization / Generalization (Constraints - Examples)

Person Superclass with Staff, PrivateOwner, and Client Subclasses


EERM - Specialization / Generalization (Constraints - Examples)

An EERD of Branch View with Specialization/Generalization


EERM - Aggregation
Aggregation
represents a ‘has-a’ or ‘is-part-of’ relationship between entity
types, where one represents the ‘whole’ and the other ‘the part’
EERM - Composition
Composition
a specific form of aggregation that represents an association
between entities, where there is a strong ownership and
coincidental lifetime between the ‘whole’ and the ‘part’
Design Principles

• Faithfulness
A design should be faithful to the specifications of the application, i.e. it should reflect reality.
• Avoid redundancy
Say everything once; otherwise you may end up with a confusing and inconsistent design.
• Simplicity
Avoid introducing more elements into your design than are absolutely necessary.
• Choose the correct relationships
Adding every possible relationship is not always a good idea. It can lead to redundancy, wasted storage, and complex updates, and it may still fail to represent users’ perception of the relationships faithfully (connection traps).
To overcome the problem, check the validity of any assumptions you make and find out which queries will be asked.
Design Principles
• Picking the Right Kind of Element
Sometimes there is a choice about which type of design element to use to represent reality. In general, an attribute is simpler to implement than either an entity or a relationship. However, making everything an attribute is not wise either.
In general, the following rule can be applied:
Let E be an entity
  - whose attributes collectively identify the entity, i.e. if E has more than one attribute, then no attribute may depend on the other attributes, and
  - that is involved only in one-to-many relationships, with E always on the ‘one’ side of the relationship, and
  - that is not involved in a relationship with another entity more than once.
Then E could be removed and its attributes (suitably renamed, if necessary) should become attributes of each entity it is related to. If E participates in a multi-way relationship, then its attributes should be made attributes of the multi-way relationship instead.
The Relational Data Model
Relational Model - Instances of Branch and Staff Relations
Relational Model - Examples of Attribute Domains
Relational Model (Terminology)

Relation
(conceptually) a table with columns and rows
Attribute
a named column of a relation
Domain
the set of allowable values for one or more attributes
Tuple
a row of a relation
Degree
the number of attributes in a relation
Cardinality
the number of tuples in a relation
Relational Database
a collection of normalized relations with distinct relation names
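To make the terminology concrete, here is a small sketch in Python that treats a relation as a set of tuples over a fixed list of attributes. The Branch attributes and sample values are hypothetical, not taken from the slides' figures.

```python
# Attributes (named columns) of a hypothetical Branch relation.
attributes = ("branchNo", "street", "city")

# The relation body: a set of tuples (rows). A set has no duplicates and no order.
branch = {
    ("B005", "22 Deer Rd", "London"),
    ("B007", "16 Argyll St", "Aberdeen"),
    ("B003", "163 Main St", "Glasgow"),
}

degree = len(attributes)     # number of attributes in the relation
cardinality = len(branch)    # number of tuples in the relation

print(degree, cardinality)
```

Adding or removing a row changes the cardinality; the degree changes only if the relation's definition changes.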
Alternative Terminology for Relational Model
Data Redundancy

A major aim of relational database design is to group attributes into relations so as to minimize data redundancy and reduce the file storage space required by base relations.
Data Redundancy

• The StaffBranch relation has redundant data: the details of a branch are repeated for every member of staff.

• In contrast, branch information appears only once for each branch in the Branch relation, and only branchNo is repeated in the Staff relation, to represent where each member of staff works.
Update Anomalies

• Relations that contain redundant information may suffer from update anomalies.

• Types of update anomalies include:
  - Insertion,
  - Deletion,
  - Modification.
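The following sketch illustrates a modification anomaly and a deletion anomaly on a redundant StaffBranch-style relation. It is written in Python with hypothetical sample rows; the attribute name bAddress is an assumption made for this example.

```python
# Redundant design: the branch address is repeated for every member of staff.
staff_branch = [
    {"staffNo": "S1", "name": "Ann", "branchNo": "B1", "bAddress": "1 High St"},
    {"staffNo": "S2", "name": "Bob", "branchNo": "B1", "bAddress": "1 High St"},
    {"staffNo": "S3", "name": "Cy", "branchNo": "B2", "bAddress": "9 Low Rd"},
]

# Modification anomaly: branch B1 moves, but only one copy of the address is updated.
staff_branch[0]["bAddress"] = "5 New Ave"
addresses_b1 = {r["bAddress"] for r in staff_branch if r["branchNo"] == "B1"}
inconsistent = len(addresses_b1) > 1

# Deletion anomaly: deleting the last member of staff at B2 also loses B2's details.
remaining = [r for r in staff_branch if r["staffNo"] != "S3"]
b2_known = any(r["branchNo"] == "B2" for r in remaining)

print(inconsistent, b2_known)
```

An insertion anomaly is the mirror image: a new branch with no staff cannot be recorded in this design without inventing a placeholder staff row.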
Database Relations

Relation schema
A named relation defined by a set of attribute and domain name pairs.

Relational database schema
A set of relation schemas, each with a distinct name.
Properties of Relations

• The relation name is distinct from all other relation names in the relational schema.

• Each cell of the relation contains exactly one atomic (single) value (1st Normal Form / 1NF assumption).

• Each attribute has a distinct name.

• Values of an attribute are all from the same domain.

• Each tuple is distinct; there are no duplicate tuples (Why?).

• The order of attributes has no significance (Why?).

• The order of tuples has no significance, theoretically (Why?).


Relational Keys

• Superkey
An attribute, or a set of attributes, that uniquely identifies a tuple within a relation.

• Candidate Key
  - A superkey (K) such that no proper subset is a superkey within the relation.
  - In each tuple of R, the values of K uniquely identify that tuple (uniqueness).
  - No proper subset of K has the uniqueness property (irreducibility).
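The uniqueness and irreducibility properties can be checked mechanically against a given set of rows. The sketch below is in Python with hypothetical sample data. Note the limitation: a key is a property of the relation schema, so a check on one instance can only refute a proposed key, never prove it.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if the values of attrs are unique across the given (distinct) rows."""
    seen = {tuple(r[a] for a in attrs) for r in rows}
    return len(seen) == len(rows)


def candidate_keys(rows, all_attrs):
    """Superkeys of this instance that contain no smaller superkey (irreducible)."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            contains_smaller_key = any(set(k) <= set(attrs) for k in keys)
            if is_superkey(rows, attrs) and not contains_smaller_key:
                keys.append(attrs)
    return keys


# Hypothetical Staff instance; nin stands for a national insurance number.
staff = [
    {"staffNo": "S1", "nin": "N1", "branchNo": "B1"},
    {"staffNo": "S2", "nin": "N2", "branchNo": "B1"},
    {"staffNo": "S3", "nin": "N3", "branchNo": "B2"},
]

keys = candidate_keys(staff, ("staffNo", "nin", "branchNo"))
print(keys)
```

Here (staffNo, branchNo) is a superkey but not a candidate key, because its proper subset (staffNo) already identifies each tuple.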
Relational Keys

• Primary Key
The candidate key selected to identify tuples uniquely within a relation.

• Alternate Keys
Candidate keys that are not selected to be the primary key.

• Foreign Key
An attribute, or set of attributes, within one relation that matches the candidate key of some (possibly the same) relation.
Relational Integrity

Null
• Represents a value for an attribute that is currently unknown or not applicable for the tuple.
• Deals with incomplete or exceptional data.
• Represents the absence of a value and is not the same as zero or spaces, which are values.
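As an illustration of how a null differs from zero, the sketch below uses SQL through Python's built-in sqlite3 module. SQL is not introduced at this point in the slides, so treat it as one concrete system's behaviour rather than part of the formal model.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Comparing anything with NULL yields NULL (unknown), not true or false.
null_eq_zero, null_eq_null, null_is_null = con.execute(
    "SELECT NULL = 0, NULL = NULL, NULL IS NULL"
).fetchone()

# sqlite3 maps SQL NULL to Python None; IS NULL is the way to test for it.
print(null_eq_zero, null_eq_null, null_is_null)
```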
Relational Integrity

Entity Integrity
In a base relation, no attribute of a primary key can be null

Referential Integrity
If foreign key exists in a relation, either foreign key value must
match a candidate key value of some tuple in its home relation or
foreign key value must be wholly null

Enterprise Constraints
Additional rules specified by users or database administrators
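Entity and referential integrity can be seen in action in a small database. This is a sketch using Python's sqlite3 module with hypothetical Branch and Staff tables. Two SQLite-specific points matter here: foreign keys are only enforced after `PRAGMA foreign_keys = ON`, and primary key columns are declared NOT NULL explicitly.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
con.execute("CREATE TABLE Branch (branchNo TEXT NOT NULL PRIMARY KEY, city TEXT)")
con.execute(
    """CREATE TABLE Staff (
           staffNo  TEXT NOT NULL PRIMARY KEY,
           branchNo TEXT REFERENCES Branch(branchNo)
       )"""
)
con.execute("INSERT INTO Branch VALUES ('B1', 'London')")


def try_insert(sql):
    """Run one INSERT and report whether the integrity rules accepted it."""
    try:
        con.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert("INSERT INTO Staff VALUES ('S1', 'B1')"),  # FK matches an existing branch
    try_insert("INSERT INTO Staff VALUES ('S2', NULL)"),  # wholly null FK is allowed
    try_insert("INSERT INTO Staff VALUES ('S3', 'B9')"),  # no such branch
    try_insert("INSERT INTO Staff VALUES (NULL, 'B1')"),  # null primary key
]
print(results)
```

The third insert violates referential integrity and the fourth violates entity integrity; the first two satisfy both rules.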
Mathematical definition of relation

• Consider two sets, D1 and D2, where D1 = {2, 4} and D2 = {1, 3, 5}.

• The Cartesian product, D1 × D2, is the set of all ordered pairs in which the first element is a member of D1 and the second element is a member of D2.

D1 × D2 = {(2, 1), (2, 3), (2, 5), (4, 1), (4, 3), (4, 5)}

• An alternative way of representing the Cartesian product D1 × D2 is to describe its elements, i.e. all combinations of elements of D1 and D2 such that in each combination (or pair) the first component is an element of D1 and the second is an element of D2.

D1 × D2 = {(x, y) | x ∈ D1, y ∈ D2}


Mathematical definition of relation

• Any subset of a Cartesian product is a relation; e.g. let R = {(2, 1), (4, 1)} ⊂ D1 × D2; then R is a relation.

• We may specify which pairs are in the relation using some condition for selection; e.g.
second element is 1:
R = {(x, y) | x ∈ D1, y ∈ D2, and y = 1}
first element is always twice the second:
S = {(x, y) | x ∈ D1, y ∈ D2, and x = 2y}
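The definitions above translate almost directly into set comprehensions. A minimal sketch in Python, using the same D1 and D2 as the slides:

```python
from itertools import product

D1 = {2, 4}
D2 = {1, 3, 5}

# Cartesian product: all ordered pairs (x, y) with x in D1 and y in D2.
cartesian = set(product(D1, D2))

# Relations are subsets of the product, picked out by a condition.
R = {(x, y) for (x, y) in cartesian if y == 1}       # second element is 1
S = {(x, y) for (x, y) in cartesian if x == 2 * y}   # first element is twice the second

print(sorted(cartesian))
print(sorted(R), sorted(S))
```

With these particular domains, S contains only the pair (2, 1), since 4 = 2y has no solution in D2.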
Mathematical definition of relation

• Consider three sets D1, D2, D3 with Cartesian product D1 × D2 × D3; e.g.

D1 = {1, 3}   D2 = {2, 4}   D3 = {5, 6}

D1 × D2 × D3 =
{(1,2,5), (1,2,6), (1,4,5), (1,4,6), (3,2,5), (3,2,6), (3,4,5), (3,4,6)}

• Any subset of these ordered triples is a relation.


Mathematical definition of relation

• The Cartesian product of n sets (D1, D2, ..., Dn) is a set of tuples:

D1 × D2 × ... × Dn = {(d1, d2, ..., dn) | d1 ∈ D1, d2 ∈ D2, ..., dn ∈ Dn}

• Usually we write ×_{i=1}^{n} Di instead of D1 × D2 × ... × Dn.

• Any set of n-tuples from this Cartesian product is a relation on the n sets.
Views

Base Relation
Named relation corresponding to an entity in conceptual schema,
whose tuples are physically stored in database.

View
Dynamic result of one or more relational operations operating on
base relations to produce another relation.
Views

• A view is a virtual relation that does not necessarily exist in the database but is produced upon request, at the time of request.

• The contents of a view are defined as a query on one or more base relations.

• Views are dynamic, meaning that changes made to base relations that affect view attributes are immediately reflected in the view.
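The dynamic nature of a view can be demonstrated with a small example. This is a sketch using Python's sqlite3 module; the Staff table, the view name StaffB1, and the sample rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, name TEXT, salary REAL, branchNo TEXT)"
)
con.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?, ?)",
    [("S1", "Ann", 30000, "B1"), ("S2", "Bob", 12000, "B1"), ("S3", "Cy", 18000, "B2")],
)

# The view is stored as a query, not as data; it also hides the salary column.
con.execute(
    "CREATE VIEW StaffB1 AS SELECT staffNo, name FROM Staff WHERE branchNo = 'B1'"
)

before = con.execute("SELECT staffNo FROM StaffB1 ORDER BY staffNo").fetchall()

# A change to the base relation is visible through the view immediately.
con.execute("UPDATE Staff SET branchNo = 'B1' WHERE staffNo = 'S3'")
after = con.execute("SELECT staffNo FROM StaffB1 ORDER BY staffNo").fetchall()

print(before, after)
```

Because the view exposes only staffNo and name, a user restricted to StaffB1 never sees salaries, which is the security use of views.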
Purpose of Views

• Provides a powerful and flexible security mechanism by hiding parts of the database from certain users.

• Permits users to access data in a customized way, so that the same data can be seen by different users in different ways, at the same time.

• Can simplify complex operations on base relations.


Updating Views
• All updates to a base relation should be immediately reflected in all views that reference that base relation.
• If a view is updated, the underlying base relation should reflect the change.
• There are restrictions on the types of modification that can be made through views:
  - Updates are allowed if the query involves a single base relation and contains a candidate key of the base relation.
  - Updates are not allowed if the view involves multiple base relations.
  - Updates are not allowed if the view involves aggregation or grouping operations.

• Classes of views are defined as:
  - theoretically not updateable
  - theoretically updateable
  - partially updateable
Relational Languages

• Relational algebra and relational calculus are formal languages associated with the relational model.

• Informally,
  - relational algebra is a (high-level) procedural language (how data is to be manipulated), whereas
  - relational calculus is a non-procedural language (what data is required).
Relational Algebra

• Relational algebra operations work on one or more relations to define another relation without changing the original relations.

• Both operands and results are relations, so the output from one operation can become the input to another operation. This property is called closure.

• Closure allows expressions to be nested, just as in arithmetic.


Relational Algebra

• There are 5 basic operations in relational algebra:
  - Selection,
  - Projection,
  - Cartesian product,
  - Union, and
  - Set Difference.

• These perform most of the data retrieval operations needed.

• There are also Join, Intersection, and Division operations, which can be expressed in terms of the 5 basic operations.
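The five basic operations, and two of the derived ones, can be sketched in a few lines. The following is an illustrative Python sketch, not an efficient implementation: each tuple is a frozenset of (attribute, value) pairs, each relation is a set of such tuples, and the Cartesian product assumes the two relations have no attribute names in common. All names and sample data are hypothetical.

```python
def row(**attrs):
    """Build one tuple as an immutable set of (attribute, value) pairs."""
    return frozenset(attrs.items())


def select(rel, pred):
    """Selection: keep the tuples for which the predicate holds."""
    return {t for t in rel if pred(dict(t))}


def project(rel, attrs):
    """Projection: keep only the named attributes; duplicates collapse."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def cartesian(r, s):
    """Cartesian product; assumes r and s share no attribute names."""
    return {t | u for t in r for u in s}


def union(r, s):
    return r | s


def difference(r, s):
    return r - s


def intersection(r, s):
    """Derived operation: R ∩ S = R − (R − S)."""
    return difference(r, difference(r, s))


staff = {
    row(staffNo="S1", branchNo="B1", salary=30000),
    row(staffNo="S2", branchNo="B1", salary=12000),
    row(staffNo="S3", branchNo="B2", salary=18000),
}
branch = {row(bNo="B1", city="London"), row(bNo="B2", city="Glasgow")}

well_paid = select(staff, lambda t: t["salary"] > 15000)
branch_nos = project(staff, {"branchNo"})
# Derived operation: a join is a selection over a Cartesian product.
works_at = select(cartesian(staff, branch), lambda t: t["branchNo"] == t["bNo"])
print(len(well_paid), len(branch_nos), len(works_at))
```

Every function takes relations and returns a relation, so calls can be nested freely; that is the closure property in practice.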
Relational Algebra Operations
Relational Algebra Operations
Relational Calculus

• A relational calculus query specifies what is to be retrieved rather than how to retrieve it.
  - There is no description of how to evaluate a query.

• We are interested in finding tuples for which a predicate is true. This is based on the use of tuple variables.

• In first-order logic (or predicate calculus), a predicate is a truth-valued function with arguments.
Tuple Relational Calculus

• When we substitute values for the arguments, the function yields an expression, called a proposition, which can be either true or false.
• A tuple variable is a variable that ‘ranges over’ a named relation, i.e. a variable whose only permitted values are tuples of the relation.

• Specify the range of a tuple variable S as the Staff relation as:
  - Staff(S)
• To find the set of all tuples S such that P(S) is true:
  - {S | P(S)}
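The expression {S | P(S)} has a close analogue in a set comprehension, which likewise states what is wanted rather than how to compute it. A small sketch in Python; the Staff rows and the predicate (salary greater than 10000) are hypothetical choices for illustration.

```python
from collections import namedtuple

StaffRow = namedtuple("StaffRow", "staffNo name salary")

# The Staff relation: the range of the tuple variable S.
Staff = {
    StaffRow("S1", "Ann", 30000),
    StaffRow("S2", "Bob", 9000),
    StaffRow("S3", "Cy", 18000),
}


def P(S):
    """Predicate: true for tuples whose salary exceeds 10000."""
    return S.salary > 10000


# {S | Staff(S) and P(S)}: S ranges over Staff; keep the tuples for which P holds.
result = {S for S in Staff if P(S)}
print(sorted(s.staffNo for s in result))
```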
Relational Calculus

• If a predicate contains a variable (e.g. ‘x is a member of staff’), there must be a range for x.

• When we substitute some values of this range for x, the proposition may be true; for other values, it may be false.

• When applied to databases, relational calculus takes two forms: tuple and domain.
Relational Completeness

• Relational algebra and relational calculus are equivalent to one another, i.e. for every relational algebra expression we can write a relational calculus expression that produces the same result (and vice versa).

• A language that can produce any relation derivable using relational calculus is relationally complete.
Other Languages

• Transform-oriented languages are non-procedural languages that use relations to transform input data into required outputs (e.g. SQL).

• Graphical languages provide the user with a picture of the structure of the relation. The user fills in an example of what is wanted and the system returns the required data in that format (e.g. QBE).

• 4GLs can create a complete customized application using a limited set of commands in a user-friendly, often menu-driven environment.

• Some systems accept a form of natural language, sometimes called a 5GL, although this development is still at an early stage.
The road from the EER model to the Relational Model
Handling strong entity types

For each strong entity type E in an EER schema:

  - create a relation R (preferably with the same name)

  - include all the simple attributes of E

  - include only the simple component attributes of any composite attribute

  - choose one of the key attributes of E as the primary key for R (if the chosen key for E is composite, then the set of simple attributes that form it will together form the primary key of R).
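As a sketch of these steps, the following Python snippet (using the built-in sqlite3 module) maps a hypothetical strong entity type Staff with a composite attribute name (fName, lName) to a table. The entity and its attributes are assumptions made for this example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    """
    CREATE TABLE Staff (
        staffNo TEXT NOT NULL PRIMARY KEY,  -- chosen key attribute of the entity
        fName   TEXT,                       -- simple components of the composite
        lName   TEXT,                       --   attribute 'name'
        salary  REAL                        -- simple attribute
    )
    """
)

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [info[1] for info in con.execute("PRAGMA table_info(Staff)")]
print(columns)
```

The composite attribute itself does not appear as a column; only its simple components do.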
Handling weak entity types
For each weak entity type, say W, in the EER schema, with owner entity type E:
  - create a relation R
  - include all the simple attributes (or simple components of composite attributes) of W
  - include also, as foreign key attributes of R, the primary key attribute(s) of the relation(s) that correspond to the owner entity type(s)
  - choose as the primary key of R the combination of the primary key(s) of the owner(s) and the partial key of the weak entity type W.

Note: The identifying relationship of W does not need to be created, because the attributes of such a relationship are always a subset of the attributes of the weak entity type itself, and thus it does not provide any additional information.
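A sketch of the weak-entity mapping, again with Python's sqlite3 module. The owner Staff, the weak entity Dependent, and the partial key depName are hypothetical names chosen for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
con.executescript(
    """
    CREATE TABLE Staff (staffNo TEXT NOT NULL PRIMARY KEY);

    CREATE TABLE Dependent (
        staffNo   TEXT NOT NULL REFERENCES Staff(staffNo),  -- owner's primary key
        depName   TEXT NOT NULL,                            -- partial key of the weak entity
        birthDate TEXT,
        PRIMARY KEY (staffNo, depName)
    );

    INSERT INTO Staff VALUES ('S1'), ('S2');
    -- The same dependent name may occur under different owners.
    INSERT INTO Dependent VALUES ('S1', 'Sam', '2010-04-01'), ('S2', 'Sam', '2012-09-15');
    """
)

# Index 5 of PRAGMA table_info is the column's position in the primary key (0 if none).
pk_cols = [info[1] for info in con.execute("PRAGMA table_info(Dependent)") if info[5] > 0]
dependents = con.execute("SELECT COUNT(*) FROM Dependent").fetchone()[0]
print(pk_cols, dependents)
```

No separate table is created for the identifying relationship; the foreign key inside Dependent already records it.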
Handling relationship types
For every relationship R in an EERD, identify the relations that correspond to the entity types participating in R, and create a relation that:
  - includes the primary key of each relation corresponding to an entity type involved in the relationship R.
  - includes all the simple attributes of R, if any.
  - includes only the simple component attributes of any composite attributes of R.
  - renames the attributes of any entity type that is involved several times in the relationship.
  - has as its primary key the combination of the primary key(s) of all the relations corresponding to entity types involved in the relationship R.
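A sketch of a relationship mapped to its own relation, using Python's sqlite3 module. The entity types Client and Property, the relationship relation Viewing, and its attribute viewDate are hypothetical names for this example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
con.executescript(
    """
    CREATE TABLE Client   (clientNo   TEXT NOT NULL PRIMARY KEY);
    CREATE TABLE Property (propertyNo TEXT NOT NULL PRIMARY KEY);

    -- Relation for the relationship: the keys of both participants plus its own attribute.
    CREATE TABLE Viewing (
        clientNo   TEXT NOT NULL REFERENCES Client(clientNo),
        propertyNo TEXT NOT NULL REFERENCES Property(propertyNo),
        viewDate   TEXT,
        PRIMARY KEY (clientNo, propertyNo)
    );

    INSERT INTO Client   VALUES ('C1'), ('C2');
    INSERT INTO Property VALUES ('P1'), ('P2');
    INSERT INTO Viewing  VALUES ('C1', 'P1', '2024-01-05'),
                                ('C1', 'P2', '2024-01-06'),
                                ('C2', 'P1', '2024-01-07');
    """
)

pairs = con.execute("SELECT COUNT(*) FROM Viewing").fetchone()[0]
print(pairs)
```

The combined primary key lets one client view many properties and one property be viewed by many clients, but records each pairing only once.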
Handling binary 1:1 relationship types

• If the relationship R is a binary 1:1 relationship, then
  - the schema of the relation corresponding to either of the two entity types participating in R can be amended to include all the attributes of the corresponding relation R.
  - It is preferable to choose the relation that corresponds to an entity type with mandatory participation in R.

Handling binary 1 : M relationship types

• If the relationship R is a binary 1:* relationship, then the schema of the relation corresponding to the *-side entity type of the relationship can be amended to include:
  - all the simple attributes of the relationship R, and also,
  - as a foreign key, the primary key attributes of the relation corresponding to the other entity type of the relationship.
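A sketch of the 1:* rule with Python's sqlite3 module: a hypothetical Branch (1) has many Staff (*), so the foreign key goes into the Staff relation and no separate relation is created for the relationship.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
con.executescript(
    """
    CREATE TABLE Branch (branchNo TEXT NOT NULL PRIMARY KEY, city TEXT);

    CREATE TABLE Staff (
        staffNo  TEXT NOT NULL PRIMARY KEY,
        name     TEXT,
        branchNo TEXT REFERENCES Branch(branchNo)  -- foreign key on the *-side
    );

    INSERT INTO Branch VALUES ('B1', 'London'), ('B2', 'Glasgow');
    INSERT INTO Staff  VALUES ('S1', 'Ann', 'B1'), ('S2', 'Bob', 'B1'), ('S3', 'Cy', 'B2');
    """
)

staff_per_branch = con.execute(
    """SELECT b.branchNo, COUNT(s.staffNo)
       FROM Branch b LEFT JOIN Staff s ON s.branchNo = b.branchNo
       GROUP BY b.branchNo
       ORDER BY b.branchNo"""
).fetchall()
print(staff_per_branch)
```

Placing the key on the 1-side instead would require a multi-valued column, which the 1NF assumption rules out.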

Handling Multi-valued Attributes

For each multi-valued attribute A:
  - create a new relation R that includes an attribute corresponding to A plus the primary key attribute K (as a foreign key in R) of the relation that represents the entity type or relationship type that has A as an attribute.
  - The primary key of R is the combination of A and K.
  - If the multi-valued attribute is composite, then we include its simple components.
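A sketch of this rule with Python's sqlite3 module, assuming a hypothetical Branch entity with a multi-valued attribute telNo. The telephone numbers are made-up placeholders.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
con.executescript(
    """
    CREATE TABLE Branch (branchNo TEXT NOT NULL PRIMARY KEY);

    -- One row per (branch, telephone number) pair.
    CREATE TABLE BranchTel (
        branchNo TEXT NOT NULL REFERENCES Branch(branchNo),  -- K, as a foreign key
        telNo    TEXT NOT NULL,                              -- the multi-valued attribute A
        PRIMARY KEY (branchNo, telNo)
    );

    INSERT INTO Branch    VALUES ('B1');
    INSERT INTO BranchTel VALUES ('B1', '555-0101'), ('B1', '555-0102');
    """
)

tels = [
    r[0]
    for r in con.execute(
        "SELECT telNo FROM BranchTel WHERE branchNo = 'B1' ORDER BY telNo"
    )
]
print(tels)
```

Each cell still holds a single atomic value, so the 1NF assumption is preserved.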

Handling specialization relationship types

Convert each specialization with m subclasses {S1, S2, …, Sm} and (generalized) superclass C, where the attributes of C are {key, a1, a2, …, an}, into relation schemas using one of the following options:
  - Create a relation L for C with attributes attr(L) = {key, a1, a2, …, an}, where key is the primary key. Create a relation Li for each subclass Si, with attributes attr(Li) = {key} ∪ attr(Si), where key is again the primary key.
  - Create a relation Li for each subclass Si, with attributes

    attr(Li) = {key, a1, a2, …, an} ∪ attr(Si),

    where key is again the primary key.
Handling specialization relationship types (cont.)

  - Create a single relation L with attributes

    attr(L) = {key, a1, a2, …, an} ∪ attr(S1) ∪ attr(S2) ∪ … ∪ attr(Sm) ∪ {t},

    where key is again the primary key. This option is for a specialization whose subclasses are disjoint, and t is a type attribute that indicates the subclass to which each tuple belongs, if any. This option can generate many null values, because each tuple holds nulls for the attributes of every subclass it does not belong to.
  - Create a single relation L with attributes

    attr(L) = {key, a1, a2, …, an} ∪ attr(S1) ∪ attr(S2) ∪ … ∪ attr(Sm) ∪ {t1, t2, …, tm},

    where key is again the primary key. This option is for a specialization whose subclasses are overlapping, and each ti is a Boolean attribute that indicates whether a tuple belongs to subclass Si.
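To compare two of these options side by side, the sketch below builds both the ‘superclass plus subclass relations’ design and the ‘single relation with a type attribute’ design for the same data, using Python's sqlite3 module. The superclass Staff, the subclasses Manager and SalesPerson, and all attribute names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    -- Superclass relation plus one relation per subclass, sharing the key.
    CREATE TABLE Staff       (staffNo TEXT NOT NULL PRIMARY KEY, name TEXT);
    CREATE TABLE Manager     (staffNo TEXT NOT NULL PRIMARY KEY REFERENCES Staff(staffNo),
                              bonus REAL);
    CREATE TABLE SalesPerson (staffNo TEXT NOT NULL PRIMARY KEY REFERENCES Staff(staffNo),
                              salesArea TEXT);

    -- Single relation with a type attribute t (here called staffType).
    CREATE TABLE AllStaff (staffNo TEXT NOT NULL PRIMARY KEY, name TEXT,
                           bonus REAL, salesArea TEXT, staffType TEXT);

    INSERT INTO Staff       VALUES ('S1', 'Ann'), ('S2', 'Bob'), ('S3', 'Cy');
    INSERT INTO Manager     VALUES ('S1', 2000);
    INSERT INTO SalesPerson VALUES ('S2', 'North');

    INSERT INTO AllStaff VALUES ('S1', 'Ann', 2000, NULL,    'Manager'),
                                ('S2', 'Bob', NULL, 'North', 'SalesPerson'),
                                ('S3', 'Cy',  NULL, NULL,    NULL);
    """
)

# Multi-relation design: a join on the shared key rebuilds a manager's full attribute set.
managers = con.execute(
    "SELECT s.staffNo, s.name, m.bonus FROM Staff s JOIN Manager m ON m.staffNo = s.staffNo"
).fetchall()

# Single-relation design: count the null cells it needs for the same three people.
null_cells = con.execute(
    "SELECT SUM((bonus IS NULL) + (salesArea IS NULL) + (staffType IS NULL)) FROM AllStaff"
).fetchone()[0]
print(managers, null_cells)
```

The trade-off is visible even at this size: the multi-relation design needs a join to see a manager's full details, while the single-relation design avoids the join at the cost of null values.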
