Distributed Databases
Josh McCord
Overview
• Definition
• Why Choose a Distributed Database
• Distributed Database Architecture and Design
• Transaction Processing
• Conclusion
Definition
What is a Distributed Database?
• “A distributed database stores logically related data in two or more physically independent sites connected via a computer network” (Coronel & Rob, 2009)
By Your Powers Combined…
What is a Distributed Database?
(Coronel & Rob 2009)
11/12/2009
Why Choose a Distributed Database?
Brief History
• Centralized Databases until the 1970s
• Switch to Distributed Databases
– Decentralized business operations
– Global competition
– Low-cost, powerful computers
– Internet
Distributed Database Advantages
• Data dispersed to match organization
• Faster access
• Faster processing
• Growth
• Lower operating costs
• User-friendly GUIs are easy to use on PCs and workstations
• Single point failure less of a risk
Distributed Database Disadvantages
• Complexity
• Technical Difficulty
• No universal standards
• Costs
Distributed Architecture and Design
C.J. Date’s Twelve Commandments
1. Local Site Independence
2. Central Site Independence
3. Failure Independence
4. Location Transparency
5. Fragmentation Transparency
6. Replication Transparency
7. Distributed Query Processing
8. Distributed Transaction Processing
9. Hardware Independence
10. Operating System Independence
11. Network Independence
12. Database Independence
(Date 1987)
Major Components
• Distributed Database Management System (DDBMS)
• Transaction Processor (TP)
• Data Processor (DP)
• Distributed Processing
(Coronel & Rob 2009)
Distributed Database Design
• ANSI/SPARC Architecture
– Three levels of data abstraction: external, conceptual, and internal
• Distributed ANSI/SPARC Architecture
– Extension of the ANSI/SPARC 3 schema model
• Global External Schemas
• Global Conceptual Schema
• Fragmentation and Allocation Schema
• Schemas for every local DBMS
ANSI/SPARC
[Figure: schema levels, top to bottom: GES1, GES2, …, GESn; LES11 … LES1n, GCS, LESn1 … LESnm; LCS1, LCS2, …, LCSn; LIS1, LIS2, …, LISn]
• GES: Global External Schema
• LCS: Local Conceptual Schema
• LES: Local External Schema
• LIS: Local Internal Schema
Distributed Database Design
• Database design, whether centralized or distributed, must follow the normal design principles: relational model design, ER modeling, and normalization
• Distributed Databases also must address:
– Fragmentation
– Replication
– Allocation
Data Fragmentation
• Data Fragmentation allows you to break a database or table into two or more fragments
– Horizontal Fragmentation
– Vertical Fragmentation
– Mixed Fragmentation
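The three fragmentation styles can be illustrated with a small in-memory sketch. This is only an illustration under stated assumptions: the CUSTOMERS table, its columns, and the helper names (horizontal_fragments, vertical_fragment, rejoin) are hypothetical and do not come from any DDBMS.

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical CUSTOMER table; column names and values are illustrative only.
CUSTOMERS: List[Row] = [
    {"id": 1, "name": "Ada", "state": "TN", "balance": 120.0},
    {"id": 2, "name": "Bo", "state": "FL", "balance": 0.0},
    {"id": 3, "name": "Cy", "state": "TN", "balance": 45.5},
]


def horizontal_fragments(rows: List[Row], column: str) -> Dict[object, List[Row]]:
    """Horizontal fragmentation: split rows into groups by one column's value."""
    fragments: Dict[object, List[Row]] = {}
    for row in rows:
        fragments.setdefault(row[column], []).append(row)
    return fragments


def vertical_fragment(rows: List[Row], key: str, columns: List[str]) -> List[Row]:
    """Vertical fragmentation: keep a subset of columns, always carrying the
    key so the fragments can be joined back together."""
    return [{c: row[c] for c in [key, *columns]} for row in rows]


def rejoin(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Reassemble two vertical fragments by joining on the shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


by_state = horizontal_fragments(CUSTOMERS, "state")          # one fragment per state
names = vertical_fragment(CUSTOMERS, "id", ["name", "state"])
balances = vertical_fragment(CUSTOMERS, "id", ["balance"])
# Mixed fragmentation: split horizontally first, then vertically.
tn_names = vertical_fragment(by_state["TN"], "id", ["name"])
```

Carrying the key in every vertical fragment is the design choice that makes reassembly possible; without it the fragments could not be matched row for row.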
Data Replication
• Data replication is the storage of data copies at multiple sites served by a computer network
– Fully Replicated Database
– Partially Replicated Database
– Unreplicated Database
• Mutual Consistency: all replicated data must be identical
Data Allocation
• Data Allocation is the process of choosing where to locate data
– Centralized data allocation
– Partitioned data allocation
– Replicated data allocation
• No universally accepted algorithm
• Related to data fragmentation
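As a rough sketch of the replication scenarios and the mutual consistency rule, the snippet below classifies an allocation map and checks whether all copies of one data item agree. The fragment names, site names, and both function names are assumptions made for illustration.

```python
from typing import Dict, Set

# Hypothetical allocation: fragment name -> set of sites that hold a copy.
ALL_SITES: Set[str] = {"site_a", "site_b", "site_c"}
ALLOCATION: Dict[str, Set[str]] = {
    "customers_tn": {"site_a", "site_b"},
    "customers_fl": {"site_b"},
    "invoices": {"site_a", "site_b", "site_c"},
}


def replication_scenario(allocation: Dict[str, Set[str]], all_sites: Set[str]) -> str:
    """Name the replication scenario that an allocation map corresponds to."""
    if all(sites == all_sites for sites in allocation.values()):
        return "fully replicated"
    if all(len(sites) == 1 for sites in allocation.values()):
        return "unreplicated"
    return "partially replicated"


def mutually_consistent(copies: Dict[str, object]) -> bool:
    """copies maps site -> that site's value of the same data item.
    Mutual consistency holds when every copy is identical."""
    values = list(copies.values())
    return all(value == values[0] for value in values)
```

For example, `replication_scenario(ALLOCATION, ALL_SITES)` classifies the map above as partially replicated, because some fragments have several copies and others only one.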
Process Distribution
• Multiple‐site processing, multiple‐site data (MPMD)
– Fully distributed DBMS supports multiple data processors and transaction processors
– Homogeneous: single type of DBMS (e.g., Oracle)
– Heterogeneous: multiple types of different DBMSs
Transparency
• Transparency groups common features with the goal of letting users feel like they are working with a centralized database, hiding the complexities of a distributed database
• Transparency Features:
– Distribution Transparency
– Transaction Transparency
– Failure Transparency
– Performance Transparency
– Heterogeneity Transparency
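Distribution transparency can be pictured as a thin layer that accepts a query against one logical table and fans it out to the sites holding the fragments. The sketch below is a toy under stated assumptions: Site and DistributedTable are hypothetical classes, and a real DDBMS would also handle query optimization, failures, and transactions.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class Site:
    """Stands in for one data processor holding a fragment of a table."""

    def __init__(self, name: str, rows: List[Row]) -> None:
        self.name = name
        self.rows = rows

    def select(self, predicate: Callable[[Row], bool]) -> List[Row]:
        return [row for row in self.rows if predicate(row)]


class DistributedTable:
    """Presents fragments stored at several sites as one logical table,
    so the caller never names a site."""

    def __init__(self, sites: Iterable[Site]) -> None:
        self._sites = list(sites)

    def select(self, predicate: Callable[[Row], bool]) -> List[Row]:
        result: List[Row] = []
        for site in self._sites:  # fan the query out to every fragment
            result.extend(site.select(predicate))
        return result


customers = DistributedTable([
    Site("site_a", [{"id": 1, "state": "TN"}]),
    Site("site_b", [{"id": 2, "state": "FL"}, {"id": 3, "state": "FL"}]),
])
florida = customers.select(lambda row: row["state"] == "FL")
```

The caller writes the same `select` it would write against a single table; which site answered is hidden behind the `DistributedTable` object.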
Transaction Processing
• Database Links
• Remote requests and transactions
• Distributed requests and transactions
• Concurrency Control
Database Links
• Central Concept in Distributed Databases
– Allow users to access remote databases without being a “user” on that database
• Each database must have a unique “Global Database Name”
• Communication path between Databases
– Links must be defined in data dictionaries
(Baylis 2003)
Global Database Names
Remote Request/Transaction
• Remote Request
– Single SQL statement references data at one remote site
• Remote Transaction
– Several SQL statements access data at a single remote site
(Baylis 2003)
Distributed Request/Transaction
• Distributed Request
– Single SQL statement references data located at multiple sites
• Distributed Transaction
– Several SQL statements reference multiple sites
Concurrency Controls
• Coordinates simultaneous transactions in a database system while maintaining data integrity
• Especially important in a DDBMS
– Multisite, multi‐process transactions more likely to produce irregularities
• Two‐phase commit protocol
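The remote and distributed request and transaction categories above differ only in how many SQL statements are issued and how many sites they touch. A minimal sketch of that classification follows; the function name classify and the representation of a statement as a set of site names are assumptions, and the rule is a simplification of the slide definitions rather than any DBMS's actual logic.

```python
from typing import List, Set


def classify(statements: List[Set[str]]) -> str:
    """Classify a unit of work.

    statements holds one entry per SQL statement; each entry is the set of
    remote site names that statement references (hypothetical representation).
    """
    if not statements or any(not sites for sites in statements):
        raise ValueError("every statement must reference at least one site")
    touched = set().union(*statements)   # all sites referenced overall
    single_statement = len(statements) == 1
    single_site = len(touched) == 1
    if single_statement:
        return "remote request" if single_site else "distributed request"
    return "remote transaction" if single_site else "distributed transaction"
```

For instance, `classify([{"site_a"}, {"site_b"}])` describes two statements that each reference a different site, which falls under the distributed transaction category.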
Two‐phase Commit Protocol
• Guarantees that if a transaction cannot be performed, it will be rolled back at all participating nodes
– Phase 1: Preparation
• Prepare-to-commit message sent to all data processors
• Each data processor uses the write‐ahead protocol to write to its transaction log and replies back to the initiator
• Transaction proceeds if all are ready to commit, else aborts
– Phase 2: Final COMMIT
• Initiator broadcasts COMMIT message and awaits replies
• Data processors update database using DO protocol
• Data processors reply (if one or more do not commit, UNDO protocol is used to undo changes)
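The two phases can be sketched in a few lines. This is a simplified, single-process illustration under stated assumptions: the DataProcessor class, its can_commit flag, and the in-memory log list are hypothetical stand-ins, and the sketch does not model network messages, timeouts, or a failure during Phase 2 (the case the UNDO protocol covers).

```python
from typing import List, Optional


class DataProcessor:
    """Hypothetical participant in a two-phase commit."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit      # simulates a site that must vote "no"
        self.log: List[str] = []          # stands in for the write-ahead log
        self.value = 0                    # stands in for the local database
        self._pending: Optional[int] = None

    def prepare(self, new_value: int) -> bool:
        """Phase 1: log the intended change before applying it, then vote."""
        if not self.can_commit:
            self.log.append("ABORT")
            return False
        self.log.append(f"PREPARED {new_value}")
        self._pending = new_value
        return True

    def commit(self) -> None:
        """Phase 2: apply the change that was logged during preparation."""
        assert self._pending is not None, "commit called without prepare"
        self.value = self._pending
        self._pending = None
        self.log.append("COMMIT")

    def rollback(self) -> None:
        """Discard any pending change."""
        self._pending = None
        self.log.append("ROLLBACK")


def two_phase_commit(processors: List[DataProcessor], new_value: int) -> bool:
    # Phase 1: ask every data processor to prepare and collect the votes.
    votes = [dp.prepare(new_value) for dp in processors]
    if not all(votes):
        # Any "no" vote aborts the transaction at all participating nodes.
        for dp in processors:
            dp.rollback()
        return False
    # Phase 2: every vote was "yes", so broadcast COMMIT.
    for dp in processors:
        dp.commit()
    return True
```

Usage: with two healthy processors, `two_phase_commit([a, b], 42)` applies the value at both; if any processor is created with `can_commit=False`, the call returns False and no processor's value changes.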
Conclusion
• Distributed Databases grew from the growth of more powerful computer systems and the need of organizations to distribute data
• Much more complex than centralized databases
• Rely heavily on DDBMSs to perform functions
References
• Coronel, Carlos & Rob, Peter (2009). Database Systems: Design, Implementation and Management. Eighth Edition. Boston, MA. Course Technology.
• Hoffer, Jeffrey, George, Joey, & Valacich, Joseph (2008). Modern Systems Analysis and Design. Fifth Edition. Upper Saddle River, NJ. Pearson Education, Inc.
• Baylis, Ruth (2003). Oracle Database Administrators Guide. Internet Resource. URL: http://www.comp.hkbu.edu.hk/docs/o/oracle10g/server.101/b10739/title.htm.
• Date, C.J. “Twelve Rules for a Distributed Database”. Computer World. June 8, 1987. 2(23). Pp. 77‐81.
• Laudon, Kenneth & Laudon, Jane (2007). Management Information Systems: Managing the Digital Firm. Tenth Edition. Upper Saddle River, NJ. Pearson Education, Inc.
• Client/Server Database and Distributed Database. Internet Resource. URL: http://media.wiley.com/product_data/excerpt/78/04712629/0471262978.pdf
• Olamendy, John Charles. “Distributed Database Management Systems”. C# Corner. Internet Resource. URL: http://www.c-sharpcorner.com/UploadFile/john_charles/DistributedDatabaseManagementSystems12172008141339PM/DistributedDatabaseManagementSystems.aspx
• Valduriez, Patrick & Ozsu, Tamar (1998). Distributed DBMS. Internet Resource. URL: http://www.cs.purdue.edu/homes/bb/cs542-06Spr/week2_lecture2.ppt.