What is a Distributed Database System?

Distributed Database
A logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.

Distributed DBMS
Software system that permits the management of the distributed database and makes the distribution transparent to users.

What is not a DDBS?


A timesharing computer system
A loosely or tightly coupled multiprocessor system
A database system which resides at one of the nodes of a network of computers - this is a centralized database on a network node

The Fundamental Principle of Distributed Database


To the user, a distributed system should look exactly like a nondistributed system.

A typical distributed database system:

[Figure: sites in New York, Shanghai, London, and San Francisco connected by a communication network]

What are the 12 objectives?

Local autonomy
No reliance on a central site
Continuous operation
Location independence
Fragmentation independence
Replication independence
Distributed query processing
Distributed transaction management
Hardware independence
Operating system independence
Network independence
DBMS independence

Types Of Distributed Databases


In a homogeneous distributed database
All sites have identical software
Sites are aware of each other and agree to cooperate in processing user requests
Each site surrenders part of its autonomy in terms of right to change schemas or software
The system appears to the user as a single system

In a heterogeneous distributed database
Different sites may use different schemas and software
Difference in schema is a major problem for query processing
Difference in software is a major problem for transaction processing
Sites may not be aware of each other and may provide only limited facilities for cooperation in transaction processing

Why use a DDBMS? (!)


Advantages:
Reflects organizational structure
Improved shareability and local autonomy
Improved availability
Improved reliability
Improved performance
Economics
Modular growth

Disadvantages:
Complexity
Cost
Security
Integrity control more difficult
Lack of standards
Lack of experience
Database design more complex

Distributed Database Design


DATA FRAGMENTATION, REPLICATION, AND ALLOCATION TECHNIQUES FOR DISTRIBUTED DATABASE DESIGN
Fragmentation: Breaking up the database into logical units called fragments, which may be assigned for storage at various sites.
Data replication: The process of storing fragments at more than one site.
Data allocation: The process of assigning a particular fragment to a particular site in a distributed system.
The information concerning data fragmentation, allocation, and replication is stored in a global directory.
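The role of the global directory can be sketched in a few lines of Python. This is only an illustration: the fragment names, the dictionary layout, and the helper functions are hypothetical and not taken from any particular DDBMS.

```python
# Hypothetical global directory: fragment name -> parent relation,
# fragmentation type, and the sites that hold a copy of the fragment.
GLOBAL_DIRECTORY = {
    "EMP_NY": {"relation": "EMP", "kind": "horizontal", "sites": ["New York"]},
    "EMP_LON": {"relation": "EMP", "kind": "horizontal", "sites": ["London", "New York"]},
}


def sites_for(fragment):
    """Allocation lookup: return the sites that store the given fragment."""
    return GLOBAL_DIRECTORY[fragment]["sites"]


def fragments_of(relation):
    """Fragmentation lookup: return the names of all fragments of a relation."""
    return sorted(
        name for name, entry in GLOBAL_DIRECTORY.items()
        if entry["relation"] == relation
    )


def is_replicated(fragment):
    """Replication lookup: a fragment is replicated if it lives at several sites."""
    return len(sites_for(fragment)) > 1
```

A query processor would consult such a structure to decide which sites to contact for a given relation.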

12.5 Distributed Relational Database Design

Fragmentation !
Four types of fragmentation:

1. Horizontal: Consists of a subset of the tuples of a relation.
- Defined using Selection operation
- Determined by looking at predicates used by transactions.
- Involves finding set of minimal (complete and relevant) predicates.
- Set of predicates is complete if and only if any two tuples in same fragment are referenced with same probability by any application.
- Predicate is relevant if there is at least one application that accesses fragments differently.

2. Vertical: Consists of a subset of the attributes of a relation.
- Defined using Projection operation
- Determined by establishing affinity of one attribute to another.

3. Mixed: horizontal fragment that is vertically fragmented, or a vertical fragment that is horizontally fragmented.
- Defined using Selection and Projection operations

4. Derived: horizontal fragment that is based on horizontal fragmentation of a parent relation.
- Ensures fragments frequently joined together are at same site.
- Defined using Semijoin operation

Other possibility is no fragmentation:
- If relation is small and not updated frequently, may be better not to fragment.
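The four fragmentation types map onto relational operations, which the following Python sketch illustrates. It assumes relations are plain lists of dicts; the EMPLOYEES and ASSIGNMENTS data and all helper names are hypothetical. Repeating the key emp_id in each vertical fragment is an assumption added here so that the fragments could be rejoined.

```python
EMPLOYEES = [
    {"emp_id": 1, "name": "Ann", "city": "London", "salary": 50},
    {"emp_id": 2, "name": "Bob", "city": "Shanghai", "salary": 60},
    {"emp_id": 3, "name": "Cy", "city": "London", "salary": 70},
]
ASSIGNMENTS = [
    {"emp_id": 1, "project": "P1"},
    {"emp_id": 2, "project": "P2"},
    {"emp_id": 3, "project": "P1"},
]


def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the named attributes of every tuple.

    No duplicate elimination is done; the key is always kept below,
    so no duplicates can arise in this example.
    """
    return [{a: row[a] for a in attributes} for row in relation]


def semijoin(child, parent_fragment, key):
    """Semijoin: keep child tuples whose key value occurs in the parent fragment."""
    keys = {row[key] for row in parent_fragment}
    return [row for row in child if row[key] in keys]


# 1. Horizontal: subsets of tuples, defined by selection predicates.
emp_london = select(EMPLOYEES, lambda r: r["city"] == "London")
emp_other = select(EMPLOYEES, lambda r: r["city"] != "London")

# 2. Vertical: subsets of attributes, defined by projection.
emp_public = project(EMPLOYEES, ["emp_id", "name", "city"])
emp_pay = project(EMPLOYEES, ["emp_id", "salary"])

# 3. Mixed: a horizontal fragment that is then vertically fragmented.
emp_london_pay = project(emp_london, ["emp_id", "salary"])

# 4. Derived: ASSIGNMENTS fragmented to follow the London fragment of EMPLOYEES.
assign_london = semijoin(ASSIGNMENTS, emp_london, "emp_id")
```

Storing assign_london at the same site as emp_london keeps the frequently joined fragments together, which is the stated purpose of derived fragmentation.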

Data Allocation !
Four alternative strategies regarding placement of data:

Centralized: single database and DBMS stored at one site with users distributed across the network.
Partitioned: Database partitioned into disjoint fragments, each fragment assigned to one site.
Complete Replication: Consists of maintaining complete copy of database at each site.
Selective Replication: Combination of partitioning, replication, and centralization.

Data Allocation

DATA REPLICATION
Fully replicated database:
* Stores multiple copies of each database fragment at multiple sites
* Can be impractical due to amount of overhead
Partially replicated database:
* Stores multiple copies of some database fragments at multiple sites
* Most DDBMSs are able to handle the partially replicated database well
Unreplicated database:
* Stores each database fragment at a single site
* No duplicate database fragments

Advantages of Replication
Availability: failure of site containing relation r does not result in unavailability of r if replicas exist.
Parallelism: queries on r may be processed by several nodes in parallel.
Reduced data transfer: relation r is available locally at each site containing a replica of r.

Disadvantages of Replication
Increased cost of updates: each replica of relation r must be updated.
Increased complexity of concurrency control: concurrent updates to distinct replicas may lead to inconsistent data unless special concurrency control mechanisms are implemented.
One solution: choose one copy as primary copy and apply concurrency control operations on primary copy.
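The primary-copy idea can be sketched as follows. This is a single-process illustration under simplifying assumptions: a thread lock stands in for concurrency control at the primary site, propagation to replicas is synchronous, and the class and method names are hypothetical.

```python
import threading


class ReplicatedItem:
    """One data item with a primary copy and secondary replicas (hypothetical)."""

    def __init__(self, value, replica_sites):
        # Concurrency control is applied at the primary copy only.
        self.primary_lock = threading.Lock()
        self.primary = value
        self.replicas = {site: value for site in replica_sites}

    def update(self, fn):
        """Apply fn to the item under the primary's lock, then propagate."""
        with self.primary_lock:
            self.primary = fn(self.primary)
            # Every replica must be updated: the "increased cost of updates".
            for site in self.replicas:
                self.replicas[site] = self.primary
            return self.primary

    def read(self, site):
        """Read the local replica stored at the given site."""
        return self.replicas[site]
```

Because every update passes through the one lock, concurrent updates cannot leave distinct replicas with diverging values. A real system would also have to handle failure of the primary site, which this sketch ignores.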

Transparency in a DDBMS
Transparency hides implementation details from users.
Overall objective: equivalence to user of DDBMS to centralised DBMS
- FULL transparency not universally accepted objective

Transparency types:
1. Distribution/Network Transparency
   a. Location Transparency
   b. Naming Transparency
2. Replication Transparency
3. Fragmentation Transparency
4. Design Transparency
5. Execution Transparency

Distributed DBMS Issues


Query Processing
convert user transactions to data manipulation instructions
optimization problem: min{cost = data transmission + local processing}
general formulation is NP-hard

Concurrency Control
synchronization of concurrent accesses
consistency and isolation of transactions' effects
deadlock management

Reliability
how to make the system resilient to failures
atomicity and durability
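The query processing cost formula, cost = data transmission + local processing, can be illustrated with a tiny plan chooser. The candidate plans, row counts, and per-row cost weights below are all made-up values in abstract units, chosen only to show the arithmetic.

```python
def plan_cost(plan, transfer_cost_per_row, local_cost_per_row):
    """cost = data transmission + local processing, in abstract units."""
    transmission = plan["rows_shipped"] * transfer_cost_per_row
    local = plan["rows_processed"] * local_cost_per_row
    return transmission + local


def cheapest_plan(plans, transfer_cost_per_row=10.0, local_cost_per_row=1.0):
    """Pick the plan with the minimum total cost by exhaustive comparison."""
    return min(
        plans,
        key=lambda p: plan_cost(p, transfer_cost_per_row, local_cost_per_row),
    )


# Hypothetical plans for joining EMP (stored in New York) with DEPT (stored in London).
PLANS = [
    {"name": "ship EMP to London", "rows_shipped": 10_000, "rows_processed": 10_500},
    {"name": "ship DEPT to New York", "rows_shipped": 500, "rows_processed": 10_500},
    {"name": "semijoin, then ship matches", "rows_shipped": 700, "rows_processed": 11_000},
]
```

With these numbers the three plans cost 110500, 15500, and 18000 units, so shipping the smaller relation wins. Comparing every plan is only feasible for a handful of candidates; since the general formulation is NP-hard, practical optimizers rely on heuristics.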

Relationship Between Issues


[Figure: diagram relating directory management, query processing, distribution design, reliability, concurrency control, and deadlock management]

Concurrency Control and Recovery


Distributed databases encounter a number of concurrency control and recovery problems which are not present in centralized databases. Some of them are listed below.
Dealing with multiple copies of data items
Failure of individual sites
Communication link failure
Distributed commit
Distributed deadlock
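For the distributed commit problem, a widely used approach is two-phase commit. The slides do not name a protocol, so the sketch below is an illustrative choice rather than the method the slides prescribe. It runs in one process, omits logging, timeouts, and failure recovery, and the Participant class is hypothetical.

```python
class Participant:
    """A site taking part in a distributed transaction (hypothetical interface)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if every site committed, False if the transaction aborted."""
    # Phase 1: the coordinator collects a vote from every site.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2: unanimous yes, so tell every site to commit.
        for p in participants:
            p.commit()
        return True
    # Phase 2: at least one no vote, so tell every site to abort.
    for p in participants:
        p.abort()
    return False
```

The point of the two phases is atomicity across sites: either every site commits or every site aborts.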


System Failure Modes


Failures unique to distributed systems:
Failure of a site.
Loss of messages
  Handled by network transmission control protocols such as TCP/IP
Failure of a communication link
  Handled by network protocols, by routing messages via alternative links
Network partition
  A network is said to be partitioned when it has been split into two or more subsystems that lack any connection between them
  Note: a subsystem may consist of a single node
Network partitioning and site failures are generally indistinguishable.
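The definition of a network partition corresponds to the connected components of the graph of sites and working links. A minimal sketch, assuming a global view of which links are up (something no real site has, which is part of why partitions and site failures are hard to tell apart); the function name and inputs are hypothetical.

```python
from collections import deque


def partitions(sites, links):
    """Group sites into subsystems whose members can still reach each other."""
    neighbours = {s: set() for s in sites}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, groups = set(), []
    for start in sites:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        group, queue = [], deque([start])
        seen.add(start)
        while queue:
            site = queue.popleft()
            group.append(site)
            for nxt in sorted(neighbours[site]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        groups.append(sorted(group))
    return groups
```

The network is partitioned whenever the function returns more than one group. A group with a single member matches the note that a subsystem may consist of a single node.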

Client-Server Database Architecture


It consists of clients running client software, a set of servers which provide all database functionalities, and a reliable communication infrastructure.

[Figure: clients 1 to n connected to servers 1 to n]

Conclusion
Today's business environment has an increasing need for distributed database and client/server applications as the desire for reliable, scalable, and accessible information steadily rises. Distributed database systems improve communication and data processing by distributing data across different network sites. Not only is data access faster, but a single point of failure is less likely, and users keep local control of their data. A distributed database allows faster local queries and can reduce network traffic. These benefits come with added complexity in managing and controlling the system and with the problem of maintaining data integrity. A single big server could hardly meet the requirements of high availability, data warehousing, and fast data storage simultaneously; a distributed database satisfies them by separating functions at low cost. Grid computing is becoming mainstream in information technology, and we expect database grids, not only computational grids, to be a key technology in the future.

THANK YOU
