
Pachaiyappa's College For Men, Kanchipuram
Department of Computer Science
M.Sc. Computer Science & Technology


Distributed Database

Outline
Introduction
- What is a distributed DBMS
- Problems
- Current state of affairs
Distributed Database Design
- Fragmentation
- Data Location
Distributed Query Processing
- Query Processing Methodology
- Distributed Query Optimization
Distributed Transaction Management
- Transaction Concepts and Models
- Distributed Concurrency Control
- Distributed Reliability

What is distributed?
- Processing logic
- Functions
- Data
- Control

What is a Distributed Database System?
A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.
A distributed database management system (DDBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.
Distributed database system (DDBS) = DDB + DDBMS

What is not a DDBS?
- A timesharing computer system
- A loosely or tightly coupled multiprocessor system
- A database system which resides at one of the nodes of a network of computers; this is a centralized database on a network node

Applications
- Manufacturing, especially multi-plant manufacturing
- Military command and control
- Airlines
- Hotel chains
- Any organization with a decentralized organizational structure

Distributed DBMS Promises
- Transparent management of distributed, fragmented, and replicated data
- Improved reliability/availability through distributed transactions
- Improved performance
- Easier and more economical system expansion

Transparency
Transparency is the separation of the higher-level semantics of a system from the lower-level implementation issues.
The fundamental issue is to provide data independence in the distributed environment:
- Network (distribution) transparency
- Replication transparency
- Fragmentation transparency
  - horizontal fragmentation: selection
  - vertical fragmentation: projection
  - hybrid

Potentially Improved Performance
- Proximity of data to its points of use; requires some support for fragmentation and replication
- Parallelism in execution: inter-query parallelism and intra-query parallelism

Distributed DBMS Issues
Distributed Database Design
- how to distribute the database
- replicated and non-replicated database distribution
- a related problem: directory management
Query Processing
- convert user transactions to data manipulation instructions
- optimization problem: min{cost = data transmission + local processing}
- the general formulation is NP-hard
Concurrency Control
- synchronization of concurrent accesses
- consistency and isolation of transactions' effects
- deadlock management
Reliability
- how to make the system resilient to failures
- atomicity and durability

Operating System Support
- an operating system with proper support for database operations
- dichotomy between general-purpose processing requirements and database processing requirements
Open Systems and Interoperability
Distributed Multidatabase Systems
- the more probable scenario
Parallel issues

Design Problem
In the general setting: making decisions about the placement of data and programs across the sites of a computer network, as well as possibly designing the network itself.
In a distributed DBMS, the placement of applications entails:
- placement of the distributed DBMS software
- placement of the applications that run on the database

Distribution Design
Top-down
- mostly in designing systems from scratch
- mostly in homogeneous systems
Bottom-up
- when the databases already exist at a number of sites

Distribution Design Issues
- Why fragment at all?
- How to fragment?
- How much to fragment?
- How to test correctness?
- How to allocate?
- Information requirements?

Fragmentation
Can't we just distribute relations? What is a reasonable unit of distribution?
Relation as the unit
- views are subsets of relations, which favors locality
- but whole relations mean extra communication
Fragments of relations (sub-relations) as the unit
- allow concurrent execution of a number of transactions that access different portions of a relation
- but views that cannot be defined on a single fragment will require extra processing
- and semantic data control (especially integrity enforcement) becomes more difficult

Correctness of Fragmentation
Completeness
- Decomposition of relation R into fragments R1, R2, ..., Rn is complete if and only if each data item in R can also be found in some Ri.
Reconstruction
- If relation R is decomposed into fragments R1, R2, ..., Rn, then there should exist some relational operator ∇ such that R = ∇ Ri, 1 ≤ i ≤ n.
Disjointness
- If relation R is decomposed into fragments R1, R2, ..., Rn, and data item di is in Rj, then di should not be in any other fragment Rk (k ≠ j).

Allocation Alternatives
Non-replicated
- partitioned: each fragment resides at only one site
Replicated
- fully replicated: each fragment at each site
- partially replicated: each fragment at some of the sites

Information Requirements
Four categories:
- Database information
- Application information
- Communication network information
- Computer system information

Fragmentation
- Horizontal Fragmentation (HF)
  - Primary Horizontal Fragmentation (PHF)
  - Derived Horizontal Fragmentation (DHF)
- Vertical Fragmentation (VF)
- Hybrid Fragmentation (HF)

Primary Horizontal Fragmentation
Definition: Rj = σFj(R), 1 ≤ j ≤ w, where Fj is a selection formula, which is (preferably) a minterm predicate.
Therefore, a horizontal fragment Ri of relation R consists of all the tuples of R which satisfy a minterm predicate mi.
Given a set of minterm predicates M, there are as many horizontal fragments of relation R as there are minterm predicates.
The set of horizontal fragments is also referred to as the set of minterm fragments.
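Stated compactly in LaTeX notation (a restatement of the definitions above; the note on which reconstruction operator applies in each case is standard background, not taken from these slides):

\text{Completeness: } \forall d \in R\ \exists i\,(1 \le i \le n): d \in R_i
\text{Reconstruction: } \exists \nabla : R = \nabla_{1 \le i \le n} R_i \quad (\nabla = \cup \text{ for horizontal fragments, join on the key for vertical fragments})
\text{Disjointness: } d \in R_j \Rightarrow d \notin R_k \text{ for all } k \ne j
\text{PHF: } R_j = \sigma_{F_j}(R),\ 1 \le j \le w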

PHF Algorithm
Given: a relation R and the set of simple predicates Pr
Output: the set of fragments of R = {R1, R2, ..., Rw} which obey the fragmentation rules
Preliminaries:
- Pr should be complete
- Pr should be minimal

PHF Example
Two candidate relations: PAY and PROJ.
Fragmentation of relation PAY
Application: check the salary info and determine raise.
Employee records are kept at two sites, so the application runs at two sites.
Simple predicates
- p1: SAL ≤ 30000
- p2: SAL > 30000
Pr = {p1, p2}, which is complete and minimal; Pr' = Pr
Minterm predicates
- m1: (SAL ≤ 30000)
- m2: NOT(SAL ≤ 30000) = (SAL > 30000)

Derived Horizontal Fragmentation
- Defined on a member relation of a link according to a selection operation specified on its owner.
- Each link is an equijoin.
- An equijoin can be implemented by means of semijoins.

DHF Definition
Given a link L where owner(L) = S and member(L) = R, the derived horizontal fragments of R are defined as
Ri = R ⋉F Si, 1 ≤ i ≤ w
where w is the maximum number of fragments that will be defined on R, and Si = σFi(S), where Fi is the formula according to which the primary horizontal fragment Si is defined.

DHF Example
Given link L1 where owner(L1) = SKILL and member(L1) = EMP:
EMP1 = EMP ⋉ SKILL1
EMP2 = EMP ⋉ SKILL2
where
SKILL1 = σSAL≤30000(SKILL)
SKILL2 = σSAL>30000(SKILL)
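A minimal, runnable sketch of the PHF and DHF example above, using plain Python lists of dictionaries as relations. The sample tuples and the TITLE join attribute between EMP and SKILL are assumptions made for illustration, not part of the slides.

# Illustrative sketch of primary and derived horizontal fragmentation.
# Sample data and the TITLE join attribute are assumptions for the example.

SKILL = [
    {"TITLE": "Programmer", "SAL": 24000},
    {"TITLE": "Analyst",    "SAL": 34000},
]
EMP = [
    {"ENO": "E1", "ENAME": "J. Doe",   "TITLE": "Programmer"},
    {"ENO": "E2", "ENAME": "M. Smith", "TITLE": "Analyst"},
]

# Primary horizontal fragmentation of SKILL by the minterm predicates
# m1: SAL <= 30000 and m2: SAL > 30000.
SKILL1 = [t for t in SKILL if t["SAL"] <= 30000]
SKILL2 = [t for t in SKILL if t["SAL"] > 30000]

def semijoin(member, owner_fragment, attr):
    """R semijoin S on attr: keep member tuples that match some owner tuple."""
    keys = {t[attr] for t in owner_fragment}
    return [t for t in member if t[attr] in keys]

# Derived horizontal fragmentation of EMP from the fragments of SKILL.
EMP1 = semijoin(EMP, SKILL1, "TITLE")
EMP2 = semijoin(EMP, SKILL2, "TITLE")

# Completeness/disjointness check for this toy instance.
assert len(EMP1) + len(EMP2) == len(EMP)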

Vertical Fragmentation
Has been studied within the centralized context:
- design methodology
- physical clustering
More difficult than horizontal fragmentation, because more alternatives exist.
Two approaches:
- grouping attributes into fragments: produces overlapping fragments
- splitting the relation into fragments: produces non-overlapping fragments
We do not consider the replicated key attributes to be overlapping.
Advantage: easier to enforce functional dependencies (for integrity checking etc.)
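A small sketch of the splitting approach with the key replicated in every fragment, so that the relation can be reconstructed by a join on the key. The PROJ relation and the attribute grouping are illustrative assumptions.

# Vertical fragmentation by splitting, with the key replicated in each fragment.
# Relation contents and the attribute grouping are assumptions for the example.

PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation",   "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000, "LOC": "New York"},
]

def project(relation, attrs):
    """Vertical fragment: keep only the listed attributes (key included)."""
    return [{a: t[a] for a in attrs} for t in relation]

PROJ1 = project(PROJ, ["PNO", "BUDGET"])        # key + BUDGET
PROJ2 = project(PROJ, ["PNO", "PNAME", "LOC"])  # key + the remaining attributes

def join_on_key(r, s, key):
    """Reconstruct the original relation: natural join on the replicated key."""
    index = {t[key]: t for t in s}
    return [{**t, **index[t[key]]} for t in r if t[key] in index]

assert sorted(join_on_key(PROJ1, PROJ2, "PNO"), key=lambda t: t["PNO"]) == PROJ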

Query Processing Components
- Query language that is used (SQL: "intergalactic dataspeak")
- Query execution methodology: the steps that one goes through in executing high-level (declarative) user queries
- Query optimization: how do we determine the "best" execution plan?

Selecting Alternatives
SELECT ENAME
FROM   EMP, ASG
WHERE  EMP.ENO = ASG.ENO
AND    DUR > 37

Strategy 1
ΠENAME(σDUR>37 ∧ EMP.ENO=ASG.ENO(EMP × ASG))

Strategy 2
ΠENAME(EMP ⋈ENO σDUR>37(ASG))

Strategy 2 avoids the Cartesian product, so it is "better".

Cost of Alternatives
Assume: size(EMP) = 400, size(ASG) = 1000
tuple access cost = 1 unit; tuple transfer cost = 10 units

Strategy 1
- produce ASG': (10+10) * tuple access cost = 20
- transfer ASG' to the sites of EMP: (10+10) * tuple transfer cost = 200
- produce EMP': (10+10) * tuple access cost * 2 = 40
- transfer EMP' to the result site: (10+10) * tuple transfer cost = 200
Total cost = 460

Strategy 2
- transfer EMP to site 5: 400 * tuple transfer cost = 4,000
- transfer ASG to site 5: 1000 * tuple transfer cost = 10,000
- produce ASG': 1000 * tuple access cost = 1,000
- join EMP and ASG': 400 * 20 * tuple access cost = 8,000
Total cost = 23,000

Query Optimization Objectives
Minimize a cost function:
I/O cost + CPU cost + communication cost
These might have different weights in different distributed environments.
Wide area networks
- communication cost will dominate (low bandwidth, low speed, high protocol overhead)
- most algorithms ignore all other cost components
Local area networks
- communication cost not that dominant
- total cost function should be considered
Can also maximize throughput.

Query Optimization Issues: Types of Optimizers
Exhaustive search
- cost-based
- optimal
- combinatorial complexity in the number of relations
Heuristics
- not optimal
- regroup common sub-expressions
- perform selection and projection first
- replace a join by a series of semijoins
- reorder operations to reduce intermediate relation size
- optimize individual operations
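A small arithmetic sketch that reproduces the two cost totals above under the stated unit costs. The reading that ASG' and EMP' contribute 10 tuples at each of two sites, and that the join touches 20 ASG' tuples per EMP tuple, is taken from the (10+10) and 400*20 terms in the slide and is otherwise an assumption.

# Cost model sketch for the two strategies above.

TUPLE_ACCESS = 1     # cost units per tuple accessed locally
TUPLE_TRANSFER = 10  # cost units per tuple shipped over the network

size_EMP, size_ASG = 400, 1000

# Strategy 1: select/semijoin at the data sites, ship only the small results.
strategy1 = (
    (10 + 10) * TUPLE_ACCESS         # produce ASG' (10 tuples at each of 2 sites)
    + (10 + 10) * TUPLE_TRANSFER     # ship ASG' to the EMP sites
    + (10 + 10) * TUPLE_ACCESS * 2   # produce EMP' (join) at the EMP sites
    + (10 + 10) * TUPLE_TRANSFER     # ship EMP' to the result site
)

# Strategy 2: ship both base relations to the result site and do all work there.
strategy2 = (
    size_EMP * TUPLE_TRANSFER        # ship EMP
    + size_ASG * TUPLE_TRANSFER      # ship ASG
    + size_ASG * TUPLE_ACCESS        # produce ASG' (selection on DUR > 37)
    + size_EMP * 20 * TUPLE_ACCESS   # join EMP with ASG' (the slide's 400*20 term)
)

print(strategy1, strategy2)  # 460 23000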

Query Optimization Issues: Optimization Granularity
Single query at a time
- cannot use common intermediate results
Multiple queries at a time
- efficient if there are many similar queries
- decision space is much larger

Query Optimization Issues: Optimization Timing
Static
- compile time: optimize prior to execution
- difficult to estimate the size of the intermediate results; error propagation
- can amortize over many executions
- R*
Dynamic
- run-time optimization
- exact information on the intermediate relation sizes
- have to reoptimize for multiple executions
- Distributed INGRES
Hybrid
- compile using a static algorithm
- if the error in estimated sizes > threshold, reoptimize at run time
- MERMAID

Query Optimization Issues: Statistics
Relation
- cardinality
- size of a tuple
- fraction of tuples participating in a join with another relation
Attribute
- cardinality of domain
- actual number of distinct values
Common assumptions
- independence between different attribute values
- uniform distribution of attribute values within their domain

Query Optimization Issues: Decision Sites
Centralized
- a single site determines the "best" schedule
- simple
- needs knowledge about the entire distributed database

Distributed
- cooperation among sites to determine the schedule
- needs only local information
- cost of cooperation
Hybrid
- one site determines the global schedule
- each site optimizes the local subqueries

Query Optimization Issues: Network Topology
Wide area networks (WAN), point-to-point
- characteristics: low bandwidth, low speed, high protocol overhead
- communication cost will dominate; ignore all other cost factors
- global schedule to minimize communication cost
- local schedules according to centralized query optimization
Local area networks (LAN)
- communication cost not that dominant
- total cost function should be considered
- broadcasting can be exploited (joins)
- special algorithms exist for star networks

Cost-Based Optimization
Solution space
- the set of equivalent algebra expressions (query trees)
Cost function (in terms of time)
- I/O cost + CPU cost + communication cost
- these might have different weights in different distributed environments (LAN vs WAN)
- can also maximize throughput
Search algorithm
- how do we move inside the solution space?
- exhaustive search, heuristic algorithms (iterative improvement, simulated annealing, genetic, ...)

Join Ordering in Fragment Queries
- ordering joins: Distributed INGRES, System R*
- semijoin ordering: SDD-1
A cardinality-estimation sketch based on the statistics assumptions above is given below.
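The uniformity and independence assumptions listed under Statistics translate into the classic textbook estimation formulas. The sketch below is a generic illustration under those assumptions, not an algorithm taken from these slides; the numbers are made up.

# Selection and join cardinality estimates under uniformity and independence.

def selection_size(card, distinct_values):
    """|sigma_{A = value}(R)| estimated as card(R) / distinct(A) (uniformity)."""
    return card / distinct_values

def join_size(card_r, card_s, distinct_r, distinct_s):
    """|R join S on A| estimated as card(R) * card(S) / max(distinct values of A)."""
    return card_r * card_s / max(distinct_r, distinct_s)

def conjunctive_selectivity(*selectivities):
    """Independence: the selectivity of a conjunction is the product of the factors."""
    result = 1.0
    for s in selectivities:
        result *= s
    return result

# Example: EMP (400 tuples, 400 distinct ENO) joined with ASG (1000 tuples, 400 distinct ENO).
print(join_size(400, 1000, 400, 400))      # 1000.0 expected result tuples
print(selection_size(1000, 50))            # 20.0 tuples for an equality selection
print(conjunctive_selectivity(0.1, 0.5))   # 0.05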

Join Ordering
Consider two relations only: R ⋈ S
- ship R to the site of S if size(R) < size(S)
- ship S to the site of R if size(R) > size(S)
Multiple relations are more difficult because there are too many alternatives.
- Compute the cost of all alternatives and select the best one.
- It is necessary to compute the size of intermediate relations, which is difficult.
- Use heuristics.

Join Ordering Example
(EMP is stored at Site 1, ASG at Site 2, and PROJ at Site 3.)
Execution alternatives:
1. EMP to Site 2; Site 2 computes EMP' = EMP ⋈ ASG; EMP' to Site 3; Site 3 computes EMP' ⋈ PROJ
2. ASG to Site 1; Site 1 computes EMP' = EMP ⋈ ASG; EMP' to Site 3; Site 3 computes EMP' ⋈ PROJ
3. ASG to Site 3; Site 3 computes ASG' = ASG ⋈ PROJ; ASG' to Site 1; Site 1 computes ASG' ⋈ EMP
4. PROJ to Site 2; Site 2 computes PROJ' = PROJ ⋈ ASG; PROJ' to Site 1; Site 1 computes PROJ' ⋈ EMP
5. EMP to Site 2 and PROJ to Site 2; Site 2 computes EMP ⋈ PROJ ⋈ ASG

Semijoin Algorithms
Consider the join of two relations:
- R[A] (located at site 1)
- S[A] (located at site 2)
Alternatives:
1. Do the join R ⋈A S directly
2. Perform one of the semijoin equivalents first, e.g. compute R ⋉A S
3. Perform the join on the reduced relation
(See the sketch after the SDD-1 overview below.)

SDD-1 Algorithm
- based on the Hill Climbing Algorithm
- uses semijoins
- no replication, no fragmentation
- the cost of transferring the result from the final result site to the user site is not considered
- can minimize either total time or response time
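A minimal sketch of semijoin alternatives 2 and 3 above: ship only the join-attribute values of S to R's site, reduce R, ship the reduced R back, then join. The relation contents are illustrative only.

# Semijoin-based join of R[A] (site 1) and S[A] (site 2).

R = [{"A": 1, "X": "r1"}, {"A": 2, "X": "r2"}, {"A": 9, "X": "r9"}]  # at site 1
S = [{"A": 1, "Y": "s1"}, {"A": 2, "Y": "s2"}]                        # at site 2

# Site 2 -> Site 1: project S on the join attribute (a small transfer).
S_A = {t["A"] for t in S}

# Site 1: semijoin R with pi_A(S); only matching R tuples survive.
R_reduced = [t for t in R if t["A"] in S_A]

# Site 1 -> Site 2: ship the reduced R (smaller than all of R), then join locally.
result = [{**r, **s} for r in R_reduced for s in S if r["A"] == s["A"]]

print(result)  # tuples with A in {1, 2}; the non-matching A=9 tuple was never shipped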

Hill Climbing Algorithm
Assume the join is between three relations.
Step 1: Do initial processing
Step 2: Select the initial feasible solution (ES0)
2.1 Determine the candidate result sites: sites where a relation referenced in the query exists
2.2 Compute the cost of transferring all the other referenced relations to each candidate site
2.3 ES0 = candidate site with minimum cost
Step 3: Determine candidate splits of ES0 into {ES1, ES2}
3.1 ES1 consists of sending one of the relations to the other relation's site
3.2 ES2 consists of sending the join of the relations to the final result site
Step 4: Replace ES0 with the split schedule which gives
cost(ES1) + cost(local join) + cost(ES2) < cost(ES0)
Step 5: Recursively apply steps 3-4 on ES1 and ES2 until no such plans can be found
Step 6: Check for redundant transmissions in the final plan and eliminate them

SDD-1 Algorithm: Initialization
Step 1: In the execution strategy (call it ES), include all the local processing
Step 2: Reflect the effects of local processing on the database profile
Step 3: Construct a set of beneficial semijoin operations (BS) as follows:
BS = empty set
For each semijoin SJi: add SJi to BS if cost(SJi) < benefit(SJi)

SDD-1 Algorithm: Example
Consider the following query:
SELECT R3.C
FROM   R1, R2, R3
WHERE  R1.A = R2.A
AND    R2.B = R3.B

Iterative Process
Step 4: Remove the most beneficial SJi from BS and append it to ES
Step 5: Modify the database profile accordingly
Step 6: Modify BS appropriately
- compute new benefit/cost values
- check if any new semijoin needs to be included in BS
Step 7: If BS is not empty, go back to Step 4

Assembly Site Selection
Step 8: Find the site where the largest amount of data resides and select it as the assembly site
Example: amount of data stored at the sites:
Site 1: 360
Site 2: 360
Site 3: 2000
Therefore, Site 3 will be chosen as the assembly site.
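A sketch of the SDD-1 beneficial-semijoin loop (Steps 3 to 7 above). The cost/benefit interpretation (cost = size of the shipped join-attribute projection, benefit = expected size reduction of the reduced relation) follows the usual SDD-1 description; the candidate semijoins and their values are invented for illustration, and the profile is not re-estimated after each step.

# SDD-1 beneficial-semijoin selection loop (illustrative values).

candidates = {
    "SJ1: R2 semijoin R1": {"cost": 100, "benefit": 1800},
    "SJ2: R2 semijoin R3": {"cost": 200, "benefit": 300},
    "SJ3: R3 semijoin R2": {"cost": 300, "benefit": 250},
}

ES = []  # execution strategy (local processing assumed already included)

# Step 3: beneficial semijoins are those whose cost is less than their benefit.
BS = {name: v for name, v in candidates.items() if v["cost"] < v["benefit"]}

# Steps 4-7: repeatedly move the most beneficial semijoin into ES.
while BS:
    best = max(BS, key=lambda n: BS[n]["benefit"] - BS[n]["cost"])
    ES.append(best)
    del BS[best]
    # Steps 5-6 would update the database profile and recompute cost/benefit here;
    # this sketch keeps the remaining values fixed for simplicity.

print(ES)  # order in which semijoins were appended to the strategy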

Postprocessing
Step 9: For each Ri at the assembly site, find the semijoins of the type Ri ⋉ Rj where the total cost of ES without this semijoin is smaller than the cost with it, and remove the semijoin from ES.
Note: there might be indirect benefits.
Example: no semijoins are removed.
Step 10: Permute the order of semijoins if doing so would improve the total cost of ES.
Example final strategy:
- Send (R2 ⋉ R1) ⋉ R3 to Site 3
- Send R1 ⋉ R2 to Site 3

Distributed Query Optimization Problems
Cost model
- multiple query optimization
- heuristics to cut down on alternatives
Larger set of queries
- optimization only on select-project-join queries
- also need to handle complex queries (e.g., unions, disjunctions, aggregations and sorting)
Optimization cost vs execution cost trade-off
- heuristics to cut down on alternatives
- controllable search strategies
Optimization/reoptimization interval
- extent of changes in the database profile before reoptimization is necessary

Transaction
A transaction is a collection of actions that make consistent transformations of system states while preserving system consistency.
- concurrency transparency
- failure transparency

Transaction Example: A Simple SQL Query
Transaction BUDGET_UPDATE
begin
  EXEC SQL UPDATE PROJ
           SET    BUDGET = BUDGET * 1.1
           WHERE  PNAME = "CAD/CAM"
end.

Example Database
Consider an airline reservation example with the relations:
FLIGHT(FNO, DATE, SRC, DEST, STSOLD, CAP)
CUST(CNAME, ADDR, BAL)
FC(FNO, DATE, CNAME, SPECIAL)

Example Transaction: SQL Version
Begin_transaction Reservation
begin
  input(flight_no, date, customer_name);
  EXEC SQL UPDATE FLIGHT
           SET    STSOLD = STSOLD + 1
           WHERE  FNO = flight_no AND DATE = date;
  EXEC SQL INSERT INTO FC(FNO, DATE, CNAME, SPECIAL)
           VALUES (flight_no, date, customer_name, null);
  output("reservation completed")
end. {Reservation}

Termination of Transactions
Begin_transaction Reservation
begin
  input(flight_no, date, customer_name);
  EXEC SQL SELECT STSOLD, CAP
           INTO   temp1, temp2
           FROM   FLIGHT
           WHERE  FNO = flight_no AND DATE = date;
  if temp1 = temp2 then
    output("no free seats");
    Abort
  else
    EXEC SQL UPDATE FLIGHT
             SET    STSOLD = STSOLD + 1
             WHERE  FNO = flight_no AND DATE = date;
    EXEC SQL INSERT INTO FC(FNO, DATE, CNAME, SPECIAL)
             VALUES (flight_no, date, customer_name, null);
    Commit
    output("reservation completed")
  endif
end. {Reservation}

Example Transaction: Reads & Writes
Begin_transaction Reservation
begin
  input(flight_no, date, customer_name);
  temp := Read(flight(date).stsold);
  if temp = flight(date).cap then
  begin
    output("no free seats");
    Abort
  end
  else begin
    Write(flight(date).stsold, temp + 1);

    Write(flight(date).cname, customer_name);
    Write(flight(date).special, null);
    Commit;
    output("reservation completed")
  end
end. {Reservation}

Properties of Transactions
ATOMICITY: all or nothing
CONSISTENCY: no violation of integrity constraints
ISOLATION: concurrent changes invisible, i.e. serializable
DURABILITY: committed updates persist

Atomicity
- Either all or none of the transaction's operations are performed.
- Atomicity requires that if a transaction is interrupted by a failure, its partial results must be undone.
- The activity of preserving the transaction's atomicity in the presence of transaction aborts due to input errors, system overloads, or deadlocks is called transaction recovery.
- The activity of ensuring atomicity in the presence of system crashes is called crash recovery.

Consistency
Internal consistency
- A transaction which executes alone against a consistent database leaves it in a consistent state.
- Transactions do not violate database integrity constraints.
- Transactions are correct programs.

Consistency Degrees
Degree 0
- Transaction T does not overwrite dirty data of other transactions.
- Dirty data refers to data values that have been updated by a transaction prior to its commitment.
Degree 1
- T does not overwrite dirty data of other transactions.
- T does not commit any writes before EOT.
Degree 2
- T does not overwrite dirty data of other transactions.
- T does not commit any writes before EOT.
- T does not read dirty data from other transactions.

Degree 3
- T does not overwrite dirty data of other transactions.
- T does not commit any writes before EOT.
- T does not read dirty data from other transactions.
- Other transactions do not dirty any data read by T before T completes.

Isolation
Serializability
- If several transactions are executed concurrently, the results must be the same as if they were executed serially in some order.
Incomplete results
- An incomplete transaction cannot reveal its results to other transactions before its commitment.
- Necessary to avoid cascading aborts.

Durability
Once a transaction commits, the system must guarantee that the results of its operations will never be lost, in spite of subsequent failures.
Database recovery

Characterization of Transactions
Based on
- Application areas: non-distributed vs. distributed, compensating transactions, heterogeneous transactions
- Timing: on-line (short-life) vs. batch (long-life)
- Organization of read and write actions: two-step, restricted, action model
- Structure: flat (or simple) transactions, nested transactions, workflows

Transaction Structure
Flat transaction
- consists of a sequence of primitive operations embraced between begin and end markers
Begin_transaction Reservation
  ...
end.

Nested transaction
- The operations of a transaction may themselves be transactions.
Begin_transaction Reservation
  ...
  Begin_transaction Airline
    ...
  end. {Airline}
  Begin_transaction Hotel
    ...
  end. {Hotel}
end. {Reservation}

Transaction Processing Issues
Transaction structure (usually called transaction model)
- flat (simple), nested
Internal database consistency
- semantic data control (integrity enforcement) algorithms
Reliability protocols
- atomicity & durability
- local recovery protocols
- global commit protocols
Concurrency control algorithms
- how to synchronize concurrent transaction executions (correctness criterion)
- intra-transaction consistency, isolation
Replica control protocols
- how to control the mutual consistency of replicated data
- one-copy equivalence and ROWA

Concurrency Control
The problem of synchronizing concurrent transactions such that the consistency of the database is maintained while, at the same time, the maximum degree of concurrency is achieved.
Anomalies:
- Lost updates: the effects of some transactions are not reflected on the database.
- Inconsistent retrievals: a transaction, if it reads the same data item more than once, should always read the same value.
A small sketch of the lost-update anomaly follows.
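This sketch shows two interleaved read-modify-write transactions on the same item with no concurrency control; one of the two updates is silently lost. The item and values are illustrative.

# Lost-update anomaly: both transactions read before either writes.

balance = {"x": 100}

def read(item):
    return balance[item]

def write(item, value):
    balance[item] = value

t1_local = read("x")       # T1 reads 100
t2_local = read("x")       # T2 reads 100
write("x", t1_local + 10)  # T1 writes 110
write("x", t2_local + 10)  # T2 writes 110, overwriting T1's update

print(balance["x"])  # 110, although two +10 updates ran: T1's update is lost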

Serializability in Distributed DBMS
Somewhat more involved; two kinds of histories have to be considered:
- local histories
- global history
For global transactions (i.e., the global history) to be serializable, two conditions are necessary:
- Each local history should be serializable.
- Two conflicting operations should be in the same relative order in all of the local histories where they appear together.

Concurrency Control Algorithms
Pessimistic
- Two-Phase Locking based (2PL): centralized (primary site) 2PL, primary copy 2PL, distributed 2PL
- Timestamp Ordering (TO): basic TO, multiversion TO, conservative TO
- Hybrid
Optimistic
- locking-based
- timestamp-ordering based

Two-Phase Locking (2PL)
- A transaction locks an object before using it.
- When an object is locked by another transaction, the requesting transaction must wait.
- When a transaction releases a lock, it may not request another lock.

Centralized 2PL
- There is only one 2PL scheduler in the distributed system.
- Lock requests are issued to the central scheduler.

Distributed 2PL
- 2PL schedulers are placed at each site. Each scheduler handles lock requests for data at that site.
- A transaction may read any of the replicated copies of item x by obtaining a read lock on one of the copies of x. Writing into x requires obtaining write locks for all copies of x.

Timestamp Ordering
- Transaction Ti is assigned a globally unique timestamp ts(Ti).
- The transaction manager attaches the timestamp to all operations issued by the transaction.
- Each data item is assigned a write timestamp (wts) and a read timestamp (rts):
  rts(x) = largest timestamp of any read on x

  wts(x) = largest timestamp of any write on x
- Conflicting operations are resolved by timestamp order.
Basic T/O rules (a runnable sketch appears after the deadlock discussion below):
for Ri(x):
  if ts(Ti) < wts(x) then reject Ri(x)
  else accept Ri(x); rts(x) := max(rts(x), ts(Ti))
for Wi(x):
  if ts(Ti) < rts(x) or ts(Ti) < wts(x) then reject Wi(x)
  else accept Wi(x); wts(x) := ts(Ti)

Deadlock
A transaction is deadlocked if it is blocked and will remain blocked until there is intervention.
- Locking-based CC algorithms may cause deadlocks.
- TO-based algorithms that involve waiting may cause deadlocks.
Wait-for graph (WFG)
- If transaction Ti waits for another transaction Tj to release a lock on an entity, then Ti -> Tj in the WFG.

Deadlock Management
Ignore
- Let the application programmer deal with it, or restart the system.
Prevention
- Guaranteeing that deadlocks can never occur in the first place.
- Check the transaction when it is initiated. Requires no run-time support.
Avoidance
- Detecting potential deadlocks in advance and taking action to ensure that deadlock will not occur. Requires run-time support.
Detection and Recovery
- Allowing deadlocks to form and then finding and breaking them. As in the avoidance scheme, this requires run-time support.

Deadlock Prevention
- All resources which may be needed by a transaction must be predeclared.
  - The system must guarantee that none of the resources will be needed by an ongoing transaction.
  - Resources must only be reserved, but not necessarily allocated a priori.
  - Unsuitable in a database environment.
  - Suitable for systems that have no provisions for undoing processes.
- Evaluation:
  - Reduced concurrency due to preallocation.
  - Evaluating whether an allocation is safe leads to added overhead.
  - Difficult to determine (partial order).
  - No transaction rollback or restart is involved.

Deadlock Avoidance
- Transactions are not required to request resources a priori.
- Transactions are allowed to proceed unless a requested resource is unavailable.
- In case of conflict, transactions may be allowed to wait for a fixed time interval.
- Order either the data items or the sites and always request locks in that order.
- More attractive than prevention in a database environment.
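A minimal single-site sketch of the basic T/O accept/reject rules above. In a real scheduler a rejected operation would abort its transaction and restart it with a new timestamp; here rejection is just reported.

# Basic timestamp ordering: accept/reject reads and writes by timestamp.

rts = {}  # largest timestamp of any accepted read on each item
wts = {}  # largest timestamp of any accepted write on each item

def read(ts, x):
    if ts < wts.get(x, 0):
        return False                 # reject Ri(x): a younger write already happened
    rts[x] = max(rts.get(x, 0), ts)  # accept Ri(x), advance rts(x)
    return True

def write(ts, x):
    if ts < rts.get(x, 0) or ts < wts.get(x, 0):
        return False                 # reject Wi(x): too late w.r.t. a younger operation
    wts[x] = ts                      # accept Wi(x), advance wts(x)
    return True

# T1 has timestamp 1, T2 has timestamp 2.
print(read(2, "x"))   # True: T2 reads x, rts(x) becomes 2
print(write(1, "x"))  # False: T1's write is rejected (ts 1 < rts 2), T1 must restart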

Distributed Deadlock Detection
Sites cooperate in the detection of deadlocks. One example:
- The local WFGs are formed at each site and passed on to the other sites.
- Each local WFG is modified as follows:
  - Since each site receives the potential deadlock cycles from other sites, these edges are added to the local WFG.
  - The edges in the local WFG which show that local transactions are waiting for transactions at other sites are joined with the edges which show that remote transactions are waiting for local ones.
- Each local deadlock detector:
  - looks for a cycle that does not involve the external edge. If it exists, there is a local deadlock which can be handled locally.
  - looks for a cycle involving the external edge. If it exists, it indicates a potential global deadlock; pass the information on to the next site.
A cycle-detection sketch on a merged WFG follows.

Reliability
Problem: how to maintain the atomicity and durability properties of transactions.

Fundamental Definitions
Reliability
- A measure of success with which a system conforms to some authoritative specification of its behavior.
- Probability that the system has not experienced any failures within a given time period.
- Typically used to describe systems that cannot be repaired or where the continuous operation of the system is critical.
Availability
- The fraction of the time that a system meets its specification.
- The probability that the system is operational at a given time t.
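A sketch of deadlock detection on a wait-for graph: two local WFGs are merged (as the cooperating sites would do by exchanging them) and searched for a cycle. Transaction names and the site split are illustrative.

# Merge local wait-for graphs and detect a cycle (cycle = deadlock).

site1_wfg = {"T1": ["T2"]}  # at site 1: T1 waits for T2 (which runs at site 2)
site2_wfg = {"T2": ["T1"]}  # at site 2: T2 waits for T1 (which runs at site 1)

# Global (merged) WFG built by exchanging the local graphs.
wfg = {}
for local in (site1_wfg, site2_wfg):
    for t, waits_for in local.items():
        wfg.setdefault(t, []).extend(waits_for)

def has_cycle(graph):
    """Depth-first search with an on-stack set to detect a cycle."""
    visited, on_stack = set(), set()

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in graph.get(node, []):
            if nxt in on_stack or (nxt not in visited and dfs(nxt)):
                return True
        on_stack.discard(node)
        return False

    return any(dfs(n) for n in graph if n not in visited)

print(has_cycle(wfg))  # True: T1 -> T2 -> T1 is a global deadlock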

All The Best...
