
Practical Database Design
By
Liyakathunisa
Dept of CS & E,
SJCE, Mysore
Introduction
 In this chapter we move from the
theory to the practice of DB design.
 The overall DB design activity has to undergo a systematic process called a design methodology.
Information System Life Cycle
 Resources that enable the collection, management, control, and dissemination of information throughout an organization constitute an information system (IS).

 The database is the main part of an IS.

 Here we examine the typical life cycle of an IS and how the DB fits into this life cycle.

 The IS life cycle is often called the macro life cycle, and the DB life cycle is referred to as the micro life cycle.
Typical Phases of a Macro Life Cycle
1. Feasibility analysis: Analyzing potential application areas,
costing, and setting priorities among applications.
2. Requirements collection and analysis: Gathering detailed
requirements and specifications from users of the system.
3. Design: Design of the DB system and its associated applications.
4. Implementation: The IS is implemented, the DB is loaded, and transactions are performed.
5. Validation and testing: The system is tested against
performance criteria and behavior specifications.
6. Deployment, operation, and maintenance: The system is deployed in real life. As new requirements crop up, they pass through all the previous phases before being incorporated into the system.
Typical Phases of a Macro Life Cycle (overview)
Feasibility analysis → Requirements collection and analysis → Design → Implementation → Validation and testing → Deployment, operation, and maintenance
Typical Phases of a Micro Life Cycle
1. System definition: The scope of the DB system, its users, and its applications are defined.
2. DB design: At the end of this phase, the complete logical and physical design of the database on the chosen DBMS is ready.
3. DB implementation: The conceptual, external, and internal database definitions are written and empty DB files are created.
4. Loading or data conversion: Existing data is loaded or converted into the DB system format.
5. Application conversion: Any software applications from previous systems are converted to run on the new system.
6. Testing and validation: The new system is tested and validated.
7. Operation: The DB system and its applications are put into operation.
8. Monitoring and maintenance: The system is constantly monitored and maintained.
Typical Phases of a Micro Life Cycle (overview)
System definition → DB design → DB implementation → Loading or data conversion → Application conversion → Testing and validation → Operation → Monitoring and maintenance
Database Design Process
 We will now focus on step 2 of the database application life cycle, which is DB design.
 The problem of DB design can be stated as follows:
 Design the logical and physical structures of one or more DBs to accommodate the information needs of the users in an organization for a defined set of applications.
 Goals of DB design:
 Provide a natural and easy-to-understand structuring of the information.
 Satisfy the data requirements of the users and applications.
 Support the processing requirements and any performance objectives such as response time, processing time, and storage space.
Six Phases of the DB Design Process
The design proceeds along two parallel tracks, data content and structure versus database applications:
Phase 1: Requirements collection and analysis (data requirements; processing requirements)
Phase 2: Conceptual DB design, DBMS-independent (conceptual schema design; transaction and application design)
Phase 3: Choice of DBMS
Phase 4: Data model mapping, DBMS-dependent (logical schema and view design, guided by frequencies and performance constraints)
Phase 5: Physical design (internal schema design)
Phase 6: System implementation and tuning (DDL and SDL statements; transaction and application implementation)
Phase 1: Requirements Collection and Analysis
 Before we can effectively design a database, we must know and analyze the expectations of the users and the intended uses of the DB in as much detail as possible.
 This process is called requirements collection and analysis.
Phase 2: Conceptual Database Design
 The goal of this phase is to produce a conceptual schema for the database that is independent of a specific DBMS.
 A high-level data model such as the ER or EER model is used during this phase.
Phase 3: Choice of DBMS
 Factors affecting the choice of DBMS:
 Technical
 Type of DBMS (RDBMS, ORDBMS, etc.)
 Storage structures
 User and programmer interfaces available
 Access paths that the DBMS supports
 Client-server environment, etc.
 Cost
Phase 4: Data Model Mapping
 Also called logical database design.
 We map (or transform) the conceptual schema from the high-level data model used in Phase 2 into the data model of the chosen DBMS.
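As an illustration of data model mapping, here is a minimal sketch in Python with SQLite, assuming a hypothetical EER design with a DEPARTMENT entity, an EMPLOYEE entity, and a 1:N works-for relationship; the entity and attribute names are invented for the example. The 1:N relationship is mapped to a foreign key on the N side.

```python
import sqlite3

# Hypothetical example: an EER design with a DEPARTMENT entity, an EMPLOYEE
# entity, and a 1:N "works_for" relationship is mapped to relational tables.
# The 1:N relationship becomes a foreign key on the N-side (EMPLOYEE).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        dept_no   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    );
    CREATE TABLE employee (
        ssn     TEXT PRIMARY KEY,
        name    TEXT NOT NULL,
        salary  REAL,
        dept_no INTEGER REFERENCES department(dept_no)  -- 1:N works_for
    );
""")
conn.close()
```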
Phase 5: Physical Database Design
 During this phase, we design the specifications for the stored database in terms of physical storage structures, record placement, and indexes.
 This corresponds to designing the internal schema in the three-level DBMS architecture.
Phase 6: Database System Implementation and Tuning
 During this phase, the database and application programs are implemented, tested, and eventually deployed for service.
 Various transactions and applications are tested individually and then in conjunction with each other.
Phase 6: Database System Implementation and Tuning (cont.)
 This typically reveals opportunities for changes to the physical design, data indexing, reorganization, and different placement of data, an activity referred to as database tuning.
 Tuning is an ongoing activity.
 It is a part of system maintenance that continues for the life cycle of the database, as long as the database and applications keep evolving and performance problems are detected.
Physical Database Design
in
Relational Databases
Factors that Influence Physical Database Design
A. Analyzing the database queries and transactions
 Before undertaking physical database design, we must have a good idea of the intended use of the database by defining, in a high-level form, the queries and transactions that we expect to run on the DB.
Factors that Influence Physical Database Design (cont.)
 For each query, the following information is needed:
1. The files that will be accessed by the query;
2. The attributes on which any selection conditions for the query are specified;
3. The attributes on which any join conditions, or conditions to link multiple tables or objects, are specified;
4. The attributes whose values will be retrieved by the query.
Note: the attributes listed in items 2 and 3 above are candidates for the definition of access structures.
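The four items above can be recorded per query as a simple profile. The following is a minimal sketch in Python, with hypothetical file and attribute names, showing how items 2 and 3 feed the list of index candidates.

```python
# Hypothetical profile of one expected query, recording the four items above.
# File and attribute names (employee, dept_no, salary, ...) are illustrative.
query_profile = {
    "files_accessed": ["employee", "department"],           # item 1
    "selection_attributes": [("employee", "salary")],       # item 2
    "join_attributes": [("employee", "dept_no"),
                        ("department", "dept_no")],          # item 3
    "retrieved_attributes": [("employee", "name")],          # item 4
}

# Items 2 and 3 are the candidates for access structures (indexes).
index_candidates = set(query_profile["selection_attributes"]
                       + query_profile["join_attributes"])
print(index_candidates)
```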
Factors that Influence Physical Database Design (cont.)
 For each update transaction or operation, the following information is needed:
1. The files that will be updated;
2. The type of operation on each file (insert, update, or delete);
3. The attributes on which selection conditions for a delete or update operation are specified;
4. The attributes whose values will be changed by an update operation.
Note: the attributes listed in item 3 above are candidates for the definition of access structures. However, the attributes listed in item 4 are candidates for avoiding an access structure.
Factors that Influence Physical Database Design (cont.)
B. Analyzing the expected frequency of invocation of queries and transactions
 Besides identifying the characteristics of expected queries and transactions, we must consider their expected rates of invocation.
 This frequency information, along with the attribute information collected on each query and transaction, is used to compile a cumulative list of the expected frequency of use for all the queries and transactions.
 It is expressed as the expected frequency of using each attribute in each file as a selection attribute or join attribute, over all the queries and transactions (a small sketch of this aggregation follows below).
 For large volumes of processing, the informal "80-20 rule" applies: approximately 80 percent of the processing is accounted for by only 20 percent of the queries and transactions.
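A minimal sketch of this aggregation in Python, assuming an invented workload of three queries with made-up invocation frequencies; the (file, attribute) pairs are the selection or join attributes each query uses.

```python
from collections import Counter

# Hypothetical workload: each query's invocation frequency and the
# (file, attribute) pairs it uses as selection or join attributes.
workload = [
    {"freq": 500, "attrs": [("employee", "ssn")]},
    {"freq": 200, "attrs": [("employee", "dept_no"), ("department", "dept_no")]},
    {"freq": 10,  "attrs": [("employee", "salary")]},
]

# Cumulative expected frequency of use per (file, attribute), over all
# queries and transactions -- the input to index selection.
usage = Counter()
for q in workload:
    for attr in q["attrs"]:
        usage[attr] += q["freq"]

for attr, freq in usage.most_common():
    print(attr, freq)
```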
Factors that Influence Physical Database Design (cont.)
C. Analyzing the time constraints of queries and transactions
 Some queries and transactions may have stringent performance constraints.
 For example, a transaction may have the constraint that it should terminate within 5 seconds on 95 percent of the occasions when it is invoked and that it should never take more than 20 seconds.
 Such performance constraints place further priorities on the attributes that are candidates for access paths.
 The selection attributes used by queries and transactions with time constraints become higher-priority candidates for primary access structures.
Factors that Influence Physical Database Design (cont.)
D. Analyzing the expected frequencies of update operations
 A minimum number of access paths should be specified for a file that is updated frequently.
E. Analyzing the uniqueness constraints on attributes
 Access paths should be specified on all candidate key attributes, or sets of attributes, that are either the primary key or constrained to be unique.
Physical Database Design in Relational Databases
Physical Database Design Decisions
 Most relational systems represent each base relation as a physical database file.
 The access path options include specifying the type of file for each relation and the attributes on which indexes should be defined.
 At most one of the indexes on each file may be a primary or clustering index.
Physical Database Design in
Relational Databases
 Design decisions about indexing
1. Whether to index an attribute?
2. What attribute or attributes to index on?
3. Whether to set up a clustered index?
4. Whether to use a hash index over a tree
index?
5. Whether to use dynamic hashing for the
file?
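A hedged sketch of decisions 1 and 2 using SQLite from Python; the table and index names are illustrative. SQLite provides only B-tree indexes, so the clustered/hash/dynamic-hashing choices (decisions 3 to 5) are DBMS-specific and only noted in the comments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT, "
             "dept_no INTEGER, salary REAL)")

# Decisions 1 and 2: index the attributes used in selections and joins.
conn.execute("CREATE INDEX idx_emp_dept ON employee (dept_no)")

# A composite (multi-attribute) index for queries that filter on both columns.
conn.execute("CREATE INDEX idx_emp_dept_salary ON employee (dept_no, salary)")

# Decisions 3-5 (clustered vs. non-clustered, hash vs. tree, dynamic hashing)
# depend on the chosen DBMS; SQLite offers only B-tree indexes, so they are
# not demonstrated here.
conn.close()
```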
Physical Database Design Decisions (cont.)
 Denormalization as a design decision for speeding up queries
 The goal of normalization is to separate logically related attributes into tables to minimize redundancy and thereby avoid the update anomalies that cause extra processing overhead to maintain consistency of the database.
 These ideals are sometimes sacrificed in favor of faster execution of frequently occurring queries and transactions.
 The process of storing the logical database design in a weaker normal form, say 2NF or 1NF, is called denormalization.
 The goal of denormalization is to improve the performance of frequently occurring queries and transactions. (Typically, the designer adds to a table attributes that are needed for answering queries or producing reports, so that a join with the table that originally held those attributes is avoided.)
 This is a trade-off between update and query performance.
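A minimal denormalization sketch using SQLite from Python, assuming hypothetical customer and orders tables: the customer name is copied into orders so that frequent order-listing queries avoid a join, at the cost of keeping the copy consistent whenever a customer name changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, cust_name TEXT);
    -- Denormalized: cust_name is copied into orders so that frequent
    -- order-listing reports avoid a join with customer.  The redundant
    -- copy must now be kept consistent on every customer-name update.
    CREATE TABLE orders (
        order_id  INTEGER PRIMARY KEY,
        cust_id   INTEGER REFERENCES customer(cust_id),
        cust_name TEXT,            -- redundant copy (denormalization)
        amount    REAL
    );
""")
conn.close()
```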
2. An Overview of Database Tuning in Relational Systems (1)
 Tuning:
 After a DB is deployed and is in operation, actual use of the applications, transactions, queries, and views reveals factors and problem areas that may not have been accounted for during the initial physical design.
 The inputs to physical design can be revised by gathering actual statistics about usage patterns.
 Resource utilization as well as internal DBMS processing, such as query optimization, can be monitored to reveal bottlenecks, such as contention for the same data or devices.
 Volumes of activity and sizes of data can be better estimated.
 It is therefore necessary to monitor and revise the physical DB design continually.
An Overview of Database Tuning in Relational Systems
 Goals of tuning:
 To make applications run faster
 To lower the response time of queries and transactions
 To improve the overall throughput of transactions
An Overview of Database Tuning in Relational Systems (2)
Statistics internally collected in DBMSs:
 Sizes of individual tables
 Number of distinct values in a column
 The number of times a particular query or transaction is submitted/executed in an interval of time
 The times required for different phases of query and transaction processing
Statistics obtained from monitoring:
 Storage statistics
 I/O and device performance statistics
 Query/transaction processing statistics
 Locking/logging related statistics
 Index statistics
An Overview of Database Tuning in
Relational Systems (3)
Problems to be considered in tuning:
 How to avoid excessive lock contention?
 How to minimize overhead of logging and
unnecessary dumping of data?
 How to optimize buffer size and scheduling of
processes?
 How to allocate resources such as disks, RAM
and processes for most efficient utilization?
An Overview of Database Tuning in Relational Systems (4)
Tuning Indexes
 Reasons for tuning indexes:
 Certain queries may take too long to run for lack of an index;
 Certain indexes may not get utilized at all;
 Certain indexes may cause excessive overhead because the index is on an attribute that undergoes frequent changes.
 Options for tuning indexes:
 Drop and/or build new indexes
 Change a non-clustered index to a clustered index (and vice versa)
 Rebuild the index
An Overview of Database Tuning in Relational Systems (5)
Tuning the Database Design
 Dynamically changing processing requirements need to be addressed by making changes to the conceptual schema if necessary and by reflecting those changes in the logical schema and physical design.
An Overview of Database Tuning in Relational Systems (6)
Tuning the Database Design (cont.)
 Possible changes to the database design:
 Existing tables may be joined (denormalized) because certain attributes from two or more tables are frequently needed together.
 This reduces the normalization level, for example from BCNF to 3NF or from 2NF to 1NF.
 For a given set of tables, there may be alternative design choices, all of which achieve 3NF or BCNF. One design may be replaced by another.
 A relation of the form R(K, A, B, C, D, …) that is in BCNF can be stored in multiple tables that are also in BCNF by replicating the key K in each table.
 For example, the table
 Employee (SSN, Name, Phone, Grade, Salary)
 may be split into two tables
 Emp1(SSN, Name, Phone) and Emp2(SSN, Grade, Salary).
 If the original table has a very large number of rows (say 100,000) and queries about phone numbers and salary information are largely distinct, this separation of tables may work better.
 This is also called vertical partitioning.
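A sketch of this vertical partitioning using SQLite from Python, following the Employee example above; the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Vertical partitioning of Employee(SSN, Name, Phone, Grade, Salary):
    -- the key SSN is replicated in both tables, each of which is still in BCNF.
    CREATE TABLE emp1 (ssn TEXT PRIMARY KEY, name TEXT, phone TEXT);
    CREATE TABLE emp2 (ssn TEXT PRIMARY KEY, grade TEXT, salary REAL);
""")
# Phone-number queries touch only emp1; salary queries touch only emp2,
# so each reads narrower rows than the original five-column table would.
conn.close()
```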
 Attributes from one table may be repeated in another even though this creates redundancy and potential anomalies.
 E.g., Part_name may be replicated in tables wherever Part# appears.
 Just as vertical partitioning splits a table vertically into multiple tables, horizontal partitioning takes horizontal slices of a table and stores them as distinct tables.
 For example, product sales data may be separated into ten tables based on ten products (see the sketch below).
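A minimal sketch of horizontal partitioning in Python with SQLite, assuming invented product codes: each per-product table has the same schema and holds one horizontal slice of the sales rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical horizontal partitioning: one sales table per product line,
# all with the same schema; each table holds a horizontal slice of the rows.
products = ["p1", "p2", "p3"]          # illustrative product codes
for p in products:
    conn.execute(f"CREATE TABLE sales_{p} "
                 "(sale_id INTEGER PRIMARY KEY, sale_date TEXT, amount REAL)")
conn.close()
```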
An Overview of Database Tuning in Relational Systems (7)
Tuning Queries
 There are two indications that query tuning may be needed:
 A query issues too many disk accesses;
 The query plan shows that relevant indexes are not being used (a query-plan inspection sketch follows below).
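One way to check the second indication is to inspect the query plan. Here is a minimal sketch with SQLite's EXPLAIN QUERY PLAN from Python; the table, index, and predicate are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT, salary REAL)")
conn.execute("CREATE INDEX idx_emp_salary ON employee (salary)")

# Inspect the query plan to see whether the relevant index is actually used:
# a plan row whose detail mentions idx_emp_salary means the index is chosen.
for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT name FROM employee WHERE salary > 50000"):
    print(row)
conn.close()
```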
Tuning Queries (cont.)
 Typical instances for query tuning:
1. Many query optimizers do not use indexes in the presence of arithmetic expressions, numerical comparisons of attributes of different sizes and precision, NULL comparisons, and sub-string comparisons.
2. Indexes are often not used for nested queries using IN (a join rewrite is sketched after this list).
Tuning Queries (cont.)
 Typical instances for query tuning (cont.)
3. Some DISTINCTs may be redundant and can be
avoided without changing the result.
4. Unnecessary use of temporary result tables can be
avoided by collapsing multiple queries into a single
query unless the temporary relation is needed for
some intermediate processing.
5. In some situations involving the use of correlated queries, temporary tables are useful.
6. If multiple options for a join condition are possible, choose one that uses a clustering index and avoid those that contain string comparisons.
Tuning Queries (cont.)
 Typical instances for query tuning (cont.)
7. The order of tables in the FROM clause may affect
the join processing.
8. Some query optimizers perform worse on nested
queries compared to their equivalent un-nested
counterparts.
9. Many applications are based on views that define the
data of interest to those applications. Sometimes
these views become an overkill.
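A sketch of the nested-versus-un-nested point (items 2 and 8): the same hypothetical query written with an IN subquery and as an equivalent join. Table and column names are assumptions, not from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_no INTEGER PRIMARY KEY, location TEXT);
    CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT,
                           dept_no INTEGER REFERENCES department(dept_no));
""")

# Nested form (item 2): optimizers often do not use an index for IN subqueries.
nested = ("SELECT name FROM employee WHERE dept_no IN "
          "(SELECT dept_no FROM department WHERE location = 'HQ')")

# Equivalent un-nested form (item 8): the same result expressed as a join.
unnested = ("SELECT e.name FROM employee e "
            "JOIN department d ON e.dept_no = d.dept_no "
            "WHERE d.location = 'HQ'")

assert conn.execute(nested).fetchall() == conn.execute(unnested).fetchall()
conn.close()
```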
Additional Query Tuning Guidelines
1. A query with multiple selection conditions connected by OR may not prompt the query optimizer to use any index. Such a query may be split up and expressed as a union of queries, each with a condition on an attribute that causes an index to be used (see the sketch after these guidelines).
2. Apply the following transformations:
 A NOT condition may be transformed into a positive expression.
 Embedded SELECT blocks may be replaced by joins.
 If an equality join is set up between two tables, a range predicate on the joining attribute set up in one table may be repeated for the other table.
3. WHERE conditions may be rewritten to utilize indexes on multiple columns.
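A sketch of guideline 1 in Python with SQLite, rewriting an OR query as a union of single-condition queries so that each branch can use its own index; the schema and constants are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT, "
             "dept_no INTEGER, salary REAL)")
conn.execute("CREATE INDEX idx_emp_dept ON employee (dept_no)")
conn.execute("CREATE INDEX idx_emp_salary ON employee (salary)")

# Guideline 1: a single query with OR-ed conditions may bypass both indexes...
with_or = "SELECT ssn, name FROM employee WHERE dept_no = 5 OR salary > 90000"

# ...while the equivalent UNION lets each branch use its own index.
as_union = ("SELECT ssn, name FROM employee WHERE dept_no = 5 "
            "UNION "
            "SELECT ssn, name FROM employee WHERE salary > 90000")

# Both forms return the same set of rows (UNION removes duplicates).
assert set(conn.execute(with_or)) == set(conn.execute(as_union))
conn.close()
```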
End of Chapter!
