
TRANSACTION MANAGEMENT

Programs (clients) access databases (servers).

Programs issue SQL commands to the DB. The DB must handle concurrent access (on the order of 100 connections at a time), and every connection consumes significant resources.

Transactions
A transaction (TA) is a sequence of operations that forms a logical unit, for example a bank transaction.

Example
- check funds: balance > 100 in acc 123
- withdraw 100 from acc 123
Issued by calling withdraw(account, amount), here withdraw(123, 100). Withdrawal after a check is a very common pattern.

Example: conflict scenario
Suppose acc 123 holds 120 and two withdrawals arrive concurrently: withdraw(123,100) as TA1 and withdraw(123,110) as TA2. The operations arrive at the DB in this order:
TA1: check funds: balance > 100?
TA2: check funds: balance > 100?
TA1: withdraw 100
TA2: withdraw 110
Both checks pass, so the account ends up overdrawn (see the sketch below).

Distinguish the program from the performed operations
A transaction is the sequence of commands, not the program. The DB does not see the program, only the sequence of commands.

DB access as r/w operations
Example: a transfer between accounts: withdraw(123,100); deposit(321,100). Operations performed:
- read a123 into variable d
- write d-100 to a123
- read a321 into variable b
- write b+100 to a321

Basic TA model
The model allows for TA scheduling. A TA consists of the operations
- read: r[x]
- write: w[x]
- commit: c
- abort: a
Example (from above): TA1: r1[x],w1[x],r1[y],w1[y],c1
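To make the conflict concrete, here is a minimal Python sketch (all names hypothetical) that replays the interleaving deterministically; without concurrency control both checks see the old balance:

balance = {123: 120}  # acc 123 holds 120

def run_interleaving():
    ok1 = balance[123] > 100   # TA1: check funds (True, balance is 120)
    ok2 = balance[123] > 100   # TA2: check funds (True, still 120)
    if ok1:
        balance[123] -= 100    # TA1: withdraw 100 -> balance 20
    if ok2:
        balance[123] -= 110    # TA2: withdraw 110 -> balance -90, overdrawn

run_interleaving()
print(balance[123])  # -90: both checks passed against the same old value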

With SQL
- SELECT is one or many read operations
- UPDATE is one or many read/write operations
- INSERT can cause the phantom phenomenon
- autocommit must be turned off
- ABORT: a user- or DB-initiated rollback
- COMMIT: if the DB confirms the commit, the TA is final and durable

Lock-based transaction management
We tag the data accessed by a TA; this tag is a lock. A latecomer must wait for the item. This lets us resolve conflicts by scheduling.

Contracts and transaction service
The DB offers virtually exclusive access, but the DB client (application server) must provide transaction demarcation and accept that a transaction may be aborted by the DB at any time.

Dirty read
Happens if TA2 reads an uncommitted write of TA1, e.g. ...w1[x],r2[x]. If TA1 aborts, this is problematic.

Schedules
s: r1[x],r1[y],r3[z],w1[x],c1,w2[x],c2,r3[x],c3
TA1: r1[x],r1[y],w1[x],c1
TA2: w2[x],c2
TA3: r3[z],r3[x],c3
A schedule orders the operations of a set of TAs in time; scheduling happens by making TAs wait.

Example
s: r1[x],r2[x],r1[y],w1[x],r3[x],r3[y],c3,w2[x],a2,c1
TA1: r1[x],r1[y],w1[x],c1
The schedule respects each TA's order of operations.

Schedulers
The simple scheduler ensures isolation with only one type of lock. It is pessimistic, often delays TAs, and is not practical.
The common scheduler uses more complex locking: read locks and exclusive locks. Fewer delays.

Simple scheduler
An X-lock on a data object has an owner TA; only the owner can access the data object. Each data object has a queue. A waiting TA is blocked and cannot execute any other operation; a TA can be in only one queue. If a TA accesses an unlocked data object, it acquires the lock; if the object is locked, the TA joins the waiting queue (FIFO).

Example
s: r1[x],r1[y],r3[z],w1[x],c1,r2[x],c2,w3[x],c3
TA1: r1[x],r1[y],w1[x],c1
TA2: r2[x],c2
TA3: r3[z],w3[x],c3
TA1 acquires the X-lock on x early on. TA2 and TA3 must wait in the queue for the lock; TA2 gets it first, then TA3.
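A minimal sketch of the simple scheduler's data structures (design assumed from the description above, not the notes' literal algorithm): one exclusive lock and one FIFO wait queue per data object.

from collections import deque

class SimpleScheduler:
    def __init__(self):
        self.owner = {}   # data object -> TA holding the X-lock
        self.queue = {}   # data object -> FIFO queue of waiting TAs

    def access(self, ta, obj):
        """TA requests obj; True if granted, False if the TA must wait."""
        if self.owner.get(obj) in (None, ta):
            self.owner[obj] = ta   # unlocked (or already owned): grant
            return True
        self.queue.setdefault(obj, deque()).append(ta)   # blocked, FIFO
        return False

    def release_all(self, ta):
        """At commit/abort release every lock; hand each to the next waiter."""
        for obj, owner in list(self.owner.items()):
            if owner == ta:
                waiters = self.queue.get(obj)
                self.owner[obj] = waiters.popleft() if waiters else None

sched = SimpleScheduler()
sched.access("TA1", "x")    # True: TA1 takes the X-lock on x
sched.access("TA2", "x")    # False: TA2 queues behind TA1
sched.access("TA3", "x")    # False: TA3 queues behind TA2
sched.release_all("TA1")    # c1: TA2 now owns x, TA3 after TA2 releases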

The scheduler and locks are one unit: locks and queues are the scheduler's data structures, the scheduler is the algorithm.

Strict two-phase locking (S2PL)
Used by the simple scheduler. First phase: TAs acquire locks during their lifetime. Second phase: locks are released upon commit/abort, no earlier.

The simple scheduler ignores r/w differences: see above, TA2 waits to read x although there is no write conflict.

Summary so far: transactions are sequences of commands (not programs) that group operations into a logical unit; concurrency causes conflicts and inconsistencies; S2PL; the simple scheduler delays TAs.

ACID properties
- Atomicity: all or none of a TA's operations are durable
- Consistency: the DB is consistent after a TA
- Isolation: TA operations appear isolated
- Durability: after commit a TA is final

Atomicity
Example: withdraw(123,100); deposit(321,100). The transaction must not stop after the first step and make it durable; this would violate a consistency constraint (account balances). The client requests commit; the DB handles rollback.

Consistency
A high-level DB feature. During a TA some integrity constraint might be violated; such a TA will be aborted.

Isolation
Operations in a TA appear isolated from the operations of other TAs.
Example: check balance > 100 in acc 123 at 11:23:34; withdraw 100 from acc 123 at 11:23:34+e. If another TA withdraws between these operations, there is a problem. TAs must have a virtually serial view of the system. Isolation is expensive and can be relaxed: isolation levels.

Durability
Once the user is notified of a TA's success, that TA cannot be aborted. The effects are crash resistant: the TA is written to persistent storage and survives an OS crash or system outage. Protection against loss of the persistent storage itself (catastrophes) is also preferred.
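Atomicity and durability surface directly in client code. A minimal sqlite3 sketch (table and overdraft check made up for illustration): commit makes both writes durable together, rollback removes both.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (123, 120), (321, 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                     (amount, src))
        (bal,) = conn.execute("SELECT balance FROM account WHERE id = ?",
                              (src,)).fetchone()
        if bal < 0:                  # stand-in for an integrity constraint
            raise ValueError("overdraft")
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                     (amount, dst))
        conn.commit()                # both writes become durable together
    except Exception:
        conn.rollback()              # neither write survives

transfer(conn, 123, 321, 100)  # succeeds: 123 -> 20, 321 -> 100
transfer(conn, 123, 321, 100)  # violates the check: rolled back entirely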

Serial schedules
One TA starts only after all others have finished.
Example:
r1[x],r1[y],w1[x],c1,r2[x],w2[y],c2 is serial
r1[x],r1[y],r2[x]... is not serial
r2[x],w2[x],c2,r1[x],r1[y],w1[x],c1 is serial
Mutually exclusive TA access to the DB (a single global lock) fulfils ACID isolation.

Serialisable schedules
A schedule is serialisable if it is equivalent to a serial schedule in effect (a mechanical test is sketched at the end of this section). ACID isolation: only serialisable schedules are allowed.

Data-disjoint transactions
No data object is accessed by more than one TA. This implies that every schedule is serialisable.
Example: r1[x],r1[y],w1[x],c1,r2[z],w2[z],c2 fulfils isolation; the TA operations may be reordered without issue. No scheduling necessary.

Conflict objects
Objects accessed by at least two TAs. If at least one access is a write, we have a write conflict object, otherwise a read conflict object. Data disjoint means no conflict objects.

Lock-based scheduling for isolation
Simple scheduler algorithm: if two TAs are data disjoint, do nothing. If there is a conflict object: define a logical order in time and ensure the order is serialisable, i.e. one TA does all its operations on conflict objects before the other. A TA sees only the results of TAs earlier in the order and influences only TAs later in the order.

Write-disjoint transactions
No write conflict objects.
Example: r1[x],r1[y],r2[y],w1[x],c1,w2[z],c2 is write disjoint. No scheduling necessary. But what if a TA tries to write a read conflict object?

Scheduling is online
The scheduler is an online algorithm: it cannot know commands in advance and takes them as they come. The simple scheduler causes write-disjoint TAs to wait for each other unnecessarily. It is pessimistic: it expects that every read may be followed by a write.

Example
s: r1[x],r1[y],w1[x],c1,r2[x],w2[x],c2,r3[y],w3[z],c3
TA1: r1[x],r1[y],w1[x],c1
TA2: r2[x],w2[x],c2
TA3: r3[y],w3[z],c3
TA2's wait turns out to be justified, but TA3 waits unnecessarily.
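Serialisability can be tested mechanically with a precedence graph; this is the standard conflict-serializability test, added here as an assumption since the notes do not spell it out. Draw an edge TAi -> TAj for every pair of conflicting operations (same object, different TAs, at least one write) where TAi's operation comes first; the schedule is serialisable iff the graph is acyclic.

def conflict_serializable(schedule):
    """schedule: list of (ta, op, obj) tuples with op in {'r', 'w'}."""
    graph = {}
    for i, (ta1, op1, x1) in enumerate(schedule):
        for ta2, op2, x2 in schedule[i + 1:]:
            if x1 == x2 and ta1 != ta2 and "w" in (op1, op2):
                graph.setdefault(ta1, set()).add(ta2)  # precedence edge

    def cyclic(node, stack):   # depth-first search for a cycle
        if node in stack:
            return True
        return any(cyclic(nxt, stack | {node}) for nxt in graph.get(node, ()))

    return not any(cyclic(n, frozenset()) for n in graph)

# r1[x],r2[x],w1[x],w2[x]: edges in both directions, not serialisable
print(conflict_serializable(
    [(1, "r", "x"), (2, "r", "x"), (1, "w", "x"), (2, "w", "x")]))  # False
# the write-disjoint example above has no conflict edges at all
print(conflict_serializable(
    [(1, "r", "x"), (1, "r", "y"), (2, "r", "y"), (1, "w", "x"), (2, "w", "z")]))  # True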

Schedulers with read locks: the common scheduler
S-locks are shared: objects with an S-lock can only be read, not written.
X-lock: no other lock can be set on x; the owner alone can read and write.
A lone S-lock can be upgraded.

Example
s: r1[x],r2[x],c2,r1[y],r3[z],w1[y],c1,w3[x],c3
TA1: r1[x],r1[y],w1[y],c1
TA2: r2[x],c2
TA3: r3[z],w3[x],c3
TA2 does not need to wait. TA3 waits for the X-lock on x.

Problems
If several TAs hold an S-lock on x, no TA can get an X-lock. A writing TA might never get the X-lock if TAs continuously read x: x is in a live lock.

Update locks
The first writing TA gets a U-lock: no new reader is allowed before the write is done. When all readers are done, the writing TA gets the X-lock.
s: r1[x],r2[x],r3[x],c2,c3,w1[x],c1,r4[x]
TA1: r1[x],w1[x],c1
TA4: r4[x]
TA1 waits for x, acquires the U-lock, then the X-lock. TA4 cannot acquire an S-lock until TA1 is done. Only one TA can hold the U-lock.

Deadlocks
Deadlocks arise while attempting a lock upgrade. TAs often read x and then write x, acquiring the S-lock, then the U-lock, then the X-lock. Deadlock:
s: r1[x],r2[x]
TA1: r1[x],w1[x]
TA2: r2[x],w2[x]
Solution: SQL: SELECT ... FOR UPDATE, a read operation that expresses the intent to write. R[x] requests the U-lock and then the X-lock.
s: R1[x],w1[x],c1,R2[x],w2[x],c2
TA1: R1[x],w1[x],c1
TA2: R2[x],w2[x],c2
(With the simple scheduler, all reads are effectively FOR UPDATE.)

Finding deadlocks: the queue graph
A digraph whose nodes are the TAs. If TAn enters the queue on x and will get the lock from TAm, draw an edge from TAm to TAn with label x. Cycles are deadlocks.
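A sketch of deadlock detection on the queue graph (representation assumed): run a topological sort; any node left with incoming edges lies on a cycle, i.e. belongs to a deadlock.

def find_deadlocked(edges):
    """edges: set of (TAm, TAn) pairs, TAn waits for a lock held by TAm."""
    nodes = {n for edge in edges for n in edge}
    indeg = {n: 0 for n in nodes}
    succ = {n: [] for n in nodes}
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    while ready:                       # Kahn's topological sort
        for m in succ[ready.pop()]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    return {n for n in nodes if indeg[n] > 0}   # TAs caught in a cycle

# the lock-upgrade deadlock above: each TA waits for the other's S-lock
print(find_deadlocked({("TA1", "TA2"), ("TA2", "TA1")}))  # {'TA1', 'TA2'}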

Deadlock detection can run continuously or periodically. A strategy to reduce deadlocks: have all TAs access different data items in the same order.

Relaxed isolation, isolation levels
TA tuning with relaxed isolation: scheduling reduces throughput. Ways to increase throughput:
- isolation levels
- TA chopping
- optimistic locking

Isolation levels

Isolation level     Dirty  Fuzzy  Phantom
READ UNCOMMITTED    Yes    Yes    Yes
READ COMMITTED      No     Yes    Yes
REPEATABLE READ     No     No     Yes
SERIALIZABLE        No     No     No
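In standard SQL the level is chosen per transaction with SET TRANSACTION. A sketch against a generic DB-API connection (how a driver exposes this varies; SQLite, for one, does not accept the statement):

def with_isolation(conn, level, work):
    """Run work(cursor) as one TA at the given isolation level."""
    cur = conn.cursor()
    try:
        cur.execute(f"SET TRANSACTION ISOLATION LEVEL {level}")
        work(cur)
        conn.commit()
    except Exception:
        conn.rollback()
        raise

# Rereading a row inside one REPEATABLE READ TA must yield the same value
# (no fuzzy reads), though phantoms remain possible, per the table above:
# with_isolation(conn, "REPEATABLE READ",
#                lambda cur: cur.execute("SELECT * FROM account"))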

Dirty read
TA2 reads an uncommitted write of TA1, i.e. TA2 does not observe the X-lock: ...w1[x],r2[x]. READ UNCOMMITTED allows dirty reads, but such TAs may only read.

Fuzzy read
TA1 reads different committed values for x because TA2 does not observe TA1's S-lock: r1[x],w2[x],c2,r1[x].
Fuzzy reads can lead to lost updates: a committed TA might miss an update: r1[x],r2[x],w2[x],c2,(fuzzy-read zone),w1[x],c1.

Example (replayed in the sketch below)
a123 = 99; TA1 withdraws 17, TA2 withdraws 23.
r1[x]: d1 = 99
r2[x]: d2 = 99
w2[x]: a123 = d2 - 23 = 76
w1[x]: a123 = d1 - 17 = 82
TA2's committed update is lost: the final balance is 82, not the serial result 59.

Phantom
Caused by inserts, not updates.
TA1: SELECT * FROM mytable
TA2: inserts a row into mytable
TA1: SELECT * FROM mytable gives a different result containing the phantom row.

A dirty read is worse than a fuzzy read, which is worse than a phantom row. If a read is both dirty and fuzzy, we call it dirty.

Dirty write following dirty read: isolation level NONE
Every write in a TA that follows a dirty read is called a dirty write, as it may be influenced by that read.
Example: r1[x],w1[x],r2[x],w2[x],c2,a1
NONE is similar to Autocommit=TRUE.
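The lost-update interleaving from the example above, replayed as a deterministic sketch (names made up):

account = {"a123": 99}

def lost_update():
    d1 = account["a123"]        # r1[x]: TA1 reads 99
    d2 = account["a123"]        # r2[x]: TA2 reads 99
    account["a123"] = d2 - 23   # w2[x], c2: TA2 writes 76 and commits
    account["a123"] = d1 - 17   # w1[x], c1: TA1 overwrites with 82
    return account["a123"]

print(lost_update())  # 82, not the serial result 99 - 17 - 23 = 59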

A dirty write is worse than a dirty read.
Fuzzy read:  r1[x],r2[x],w1[x],c1,r2[x],w2[z],c2
Lost update: r1[x],r2[x],w1[x],c1,w2[x],c2
Dirty read:  r1[x],r2[x],w1[x],r2[x],r2[z],c2,c1
Dirty write: r1[x],r2[x],w1[x],r2[x],w2[z],c2,c1
We speak of the conflict object (x above) and the conflict-writing TA (TA1 above).

Lost updates cannot be avoided by more reads
A: TA1: r1[x],w1[x],c1
B: TA1: r1[x],r1[x],w1[x],c1
If the second read returns a different value? Rollback.

Lost updates, case 1: a second read by TA2 prevents the lost update
s: r1[x],r2[x],r1[x],w1[x],c1,r2[x],a2
TA1: r1[x],r1[x],w1[x],c1
TA2: r2[x],r2[x],a2
TA2 rolls back because the value changed. But:
s: r1[x],r2[x],r1[x],r2[x],w1[x],c1,w2[x],c2
Here the second read happens too early: still a lost update.

Preventing lost updates in READ COMMITTED
SELECT FOR UPDATE can prevent lost updates in READ COMMITTED.
s: r1[x],r2[x],w1[x],c1,w2[x],c2
TA1: r1[x],w1[x],c1
TA2: r2[x],w2[x],c2
becomes
s: R1[x],w1[x],c1,R2[x],w2[x],c2
TA1: R1[x],w1[x],c1
TA2: R2[x],w2[x],c2

Summary: bad phenomena of reduced isolation, from strictest to weakest level: SERIALIZABLE, REPEATABLE READ, READ COMMITTED, READ UNCOMMITTED, NONE.

Atomicity
Requires a rollback log: a list of records for the write operations in order of execution; it corresponds to the schedule without reads. Important for recovery, and for durability as well.
Architecture: DB clients -> TA manager -> ((DB buffer -> stable DB), (log buffer -> stable log))

The database log
Central to DB management. The log contains the list of write operations and the TA demarcations (BOT, a, c). Undo/redo logs: for each write operation, a log entry contains
- log sequence number: nr
- TA ID: ta
- object ID: obj
- before image: b
- after image: a

Example
[nr:110,ta:22,obj:x,b:2,a:53]
[nr:111,ta:22,commit]
[nr:112,ta:23,obj:x,b:53,a:46]
[nr:113,ta:24,obj:y,b:34,a:87]
[nr:114,ta:23,obj:z,b:23,a:54]
[nr:115,ta:23,obj:x,b:46,a:78]
At this point x=78, z=54, y=87.

Rollback based on the log
Aborts are recorded in the log. For all write operations up to this point the TA still holds the X-lock, so to roll back:
- restore the before images
- release all write locks

Example (continued)
[nr:116,ta:23,abort]
Rollback of ta 23: x = 78 -> 46 -> 53, z = 54 -> 23. (y = 87 was written by ta 24, which is still running; rolling it back too would restore y = 34.)

Rollback vs compensating TA
If a TA is aborted, the database does a rollback, as if the TA had never happened. A compensating TA undoes the effect of an earlier committed TA, for example the cancellation of a purchase. It can fail, because isolation has ended and other operations may have happened since.

Durability: crash recovery, write-ahead logging
The log buffer lives in main memory, the stable log on persistent storage. A system crash loses the db buffer and the log buffer; the stable log and the stable db are used to recover. A TA is committed only after its commit record is in the persistent log file: write-ahead logging. The stable log is the boss.
WAL: committed TAs have a commit record in the stable log; they are winners and considered committed. All others are losers and considered aborted.

Goal of recovery: a clean stable db
The stable db is clean iff it is consistent with the winner/loser decision of the stable log: all winner writes, and only winner writes, are reflected in the stable db; log entries of uncommitted TAs are without effect. Recovery: ensure the stable db is clean, based on the stable log.

Media recovery
The clean stable db can be reconstructed from the complete stable log at any time (redo all TAs), or from a historic clean stable db copy plus the stable log: media recovery.

Database structure
The db buffer and stable db content are partitioned into pages. Every buffer page has one image page on the stable db. Pages are read/written from/to the stable db as a whole. The buffer is fast, the stable db is vast.
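A sketch of log-based rollback (entry layout as above; the code itself is hypothetical): scan the TA's entries backwards and restore the before images.

db = {"x": 78, "y": 87, "z": 54}
log = [
    {"nr": 110, "ta": 22, "obj": "x", "b": 2,  "a": 53},
    {"nr": 111, "ta": 22, "commit": True},
    {"nr": 112, "ta": 23, "obj": "x", "b": 53, "a": 46},
    {"nr": 113, "ta": 24, "obj": "y", "b": 34, "a": 87},
    {"nr": 114, "ta": 23, "obj": "z", "b": 23, "a": 54},
    {"nr": 115, "ta": 23, "obj": "x", "b": 46, "a": 78},
]

def rollback(ta, log, db):
    """Restore the before images of one TA in reverse log order;
    afterwards the TA's write locks would be released."""
    for entry in reversed(log):
        if entry.get("ta") == ta and "obj" in entry:
            db[entry["obj"]] = entry["b"]

rollback(23, log, db)
print(db)  # {'x': 53, 'y': 87, 'z': 23}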

DB buffer management
The buffer is a write cache: changes to data in the buffer are not immediately written to the stable db. Distinguish committed from uncommitted changes.

Easy crash recovery
Try to always keep the stable db clean. Not always practical. Instead, the stable db may contain pages with uncommitted writes as well as old data not yet reflecting committed writes; both can appear on the same page.
Pagewise write locks: to maintain a clean stable db, only one TA could be allowed to write a page at a time. Suppose TA1 and TA2 both write a page, TA1 commits and TA2 does not. If we write the page back, we have a "steal" page; if not, an outdated page. Either way the stable db is unclean.

Buffer management policy alternatives
Buffer pages with committed writes: force (at commit, the pages are written back to the stable db; leads to performance bottlenecks) or no force (leads to redos).
Buffer pages with uncommitted writes: no steal (such pages must not be written to the stable db; can lead to buffer bottlenecks) or steal (leads to undos).
Force + no steal gives easy crash recovery and keeps the stable db always clean, but is impractical. The no-force + steal policy avoids the bottlenecks but is harder to implement: with no force, some committed writes are not yet in the stable db, making a redo necessary after a crash; with steal, some uncommitted writes are in the stable db, requiring an undo after a crash.

Write-ahead logging 2
Enabling crash recovery for the steal policy: stable db pages may have been changed by losers, and the information to undo them must be in the log. So before a buffer page is written to the stable db, all log entries for that page are written to the stable log.

Example
[nr:110,ta:22,obj:x,b:91,a:78]
[nr:111,ta:23,obj:z,b:23,a:54]
[nr:112,ta:22,obj:x,b:78,a:53]
[nr:113,ta:22,obj:y,b:87,a:64]
[nr:114,ta:22,commit]
[nr:115,ta:23,obj:y,b:64,a:85]
CRASH!
Stable database: x=78, z=54, y=87
db buffer (lost): x = 91 -> 78 -> 53, y = 87 -> 64 -> 85, z = 23 -> 54

Crash recovery: redo winners, undo losers
Go through the log.
Redo, in positive time direction: is the write's TA committed? Then write its after image.
Undo, in negative time direction: is the write's TA uncommitted? Then write its before image.
First we redo, then we undo.
Stable db: redo: x = 78 -> 53, y = 87 -> 64; undo: z = 54 -> 23.
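A sketch of the redo-then-undo pass over this example log (entry format from the notes; the code itself is hypothetical):

stable_db = {"x": 78, "z": 54, "y": 87}   # state found after the crash
log = [
    {"nr": 110, "ta": 22, "obj": "x", "b": 91, "a": 78},
    {"nr": 111, "ta": 23, "obj": "z", "b": 23, "a": 54},
    {"nr": 112, "ta": 22, "obj": "x", "b": 78, "a": 53},
    {"nr": 113, "ta": 22, "obj": "y", "b": 87, "a": 64},
    {"nr": 114, "ta": 22, "commit": True},
    {"nr": 115, "ta": 23, "obj": "y", "b": 64, "a": 85},
]

def recover(log, db):
    winners = {e["ta"] for e in log if e.get("commit")}
    for e in log:                          # redo pass, forward in time
        if "obj" in e and e["ta"] in winners:
            db[e["obj"]] = e["a"]          # reapply the after image
    for e in reversed(log):                # undo pass, backward in time
        if "obj" in e and e["ta"] not in winners:
            db[e["obj"]] = e["b"]          # restore the before image
    return db

print(recover(log, stable_db))  # {'x': 53, 'z': 23, 'y': 64}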

Log truncation
Redo only a reasonable section of the stable log, relying on the stable db; for this, pages must regularly be written out to the stable db. Start the redo from the earliest write that is in the db buffer but not yet in the stable db; stop the undo at the earliest write of an uncommitted TA.

Write-behind daemon and checkpoints for log truncation
The write-behind daemon goes through the buffer pages and writes them out. If it starts at time t and has gone through all buffer pages by t+s, then all writes committed before t have been written out by t+s. We remember the last log sequence number L on the stable log at time t; by t+s all committed writes before L are written out. The db then writes a checkpoint log entry containing the last safe L: from then on we only have to redo from L+1. The write-behind daemon runs at low priority (it takes time).

Optimisations, and truncating the undo
We want to write out only the necessary pages. Each buffer page remembers its earliest unsaved committed write, the RedoLSN; the write-behind daemon writes out pages with RedoLSN < L. Instead of L, we write to the checkpoint entry the oldest RedoLSN encountered, likely > L. We also have to truncate the undo: remember the first LSN of the earliest uncommitted TA as the UndoLSN; the checkpoint record contains this as well.
A checkpoint record limits the number of log entries we have to consider: the RedoLSN tells us how far to go back for the redo, the UndoLSN how far back for the undo.

[nr:110,ta:22,obj:x,b:91,a:78]
[nr:111,ta:23,obj:z,b:23,a:54]
[nr:112,ta:22,obj:x,b:78,a:53]
[nr:113,ta:22,obj:y,b:87,a:64]
[nr:114,checkpt:redo:110,undo:111]
[nr:114,ta:22,commit]
[nr:115,ta:23,obj:y,b:64,a:85]

So we start at 110 for the redo, and go backwards to 111 for the undo.
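A sketch of how a checkpoint record could be assembled (structure assumed from the description above):

def checkpoint(buffer_pages, active_tas, last_lsn):
    """Redo starts at the oldest RedoLSN of any dirty buffer page,
    undo stops at the oldest UndoLSN of any still-running TA."""
    redo = min((p["redo_lsn"] for p in buffer_pages if p["dirty"]),
               default=last_lsn + 1)
    undo = min((ta["undo_lsn"] for ta in active_tas),
               default=last_lsn + 1)
    return {"checkpt": True, "redo": redo, "undo": undo}

# matching the example log: page x dirty since 110, ta 23 started at 111
pages = [{"dirty": True, "redo_lsn": 110}, {"dirty": True, "redo_lsn": 113}]
tas = [{"undo_lsn": 111}]
print(checkpoint(pages, tas, last_lsn=113))
# {'checkpt': True, 'redo': 110, 'undo': 111}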

Multiple system crashes
A crash may happen while recovery is in progress; the crash recovery algorithm copes by writing checkpoints as it goes.

Media recovery
Loss of the log is unrepairable, so a highly reliable store is used. The log is archived and never discarded. From time to time db backups are made.

Summary
Atomicity and durability are achieved using the log. The stable log and stable db live on persistent storage; a crash loses main memory. WAL: the stable log is authoritative; a clean stable db can always be reconstructed from it. Policies: force/no force, steal/no steal. We use steal + no force, so after a crash we redo the winners and undo the losers.

Multiversion Concurrency
Multiple versions of a data object: a database replication protocol. Replication means a master copy plus working copies (for the clients), giving faster read access. The DBs should behave as one and maintain the ACID properties.
No locks, no delays for TAs. If there is a conflict, one TA is aborted: the first committer wins.

Clients run local transactions. The replication middleware coordinates with the master: it logs the client commands, creates read sets and write sets, listens during the TA, and acts at commit.
Each working copy serves a single TA. It takes a fresh copy of a recent clean state of the master, dated with its commit timestamp (how up to date it is). The local TA works on a snapshot of the database: the state at timestamp s.

We have the clean master at time s and the working copy at time s+e; TA1 requests commit at time s+e+d. Suppose other TAs have committed in between; their effects are now durable. ACID durability says TA1 may commit only if the master is, with respect to TA1, in the same state as at time s+e. TA1 is then restamped with timestamp s+e+d, the commit timestamp. For the simple cases, TA1 and each interleaved TAx must be write and data disjoint.

Commit through the replication middleware
At commit the middleware steps in and communicates with the master: it checks whether restamping is OK, then carries out the changes.

Read sets, write sets
Object sets with values attached: at most one read (before image) and at most one write (after image) per object.
Read set: the objects read. Write set: the objects written. RW set: the objects both read and written.
At commit the replication middleware uses the read and write sets to check write disjointness (sketched below):
- Snapshot isolation: check only the rw set. For each object, is the read value still current in the master?
- Optimistic locking: check the whole read set. Is each read value still current in the master?
The replication middleware executes the write set: check and write happen in a single TA on the master. This is fast, since everything is ready and these TAs all have the same simple structure.

Multiversion as linear schedules
s:    r4[x],r4[y],  w3,c3,  r5[x],r5[y],  w4[x],c4
time: s2+e          s3      s3+e          s4
Such a schedule can violate lock-based scheduling rules.
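A sketch of the first-committer-wins check at commit (snapshot isolation variant; the data structures are assumed):

master_version = {"x": 7, "y": 3}   # object -> timestamp of its last commit

def can_commit(rw_set, snapshot):
    """The TA may commit iff nothing it read and wrote was re-versioned
    after its snapshot timestamp; otherwise the first committer has won."""
    return all(master_version[obj] <= snapshot for obj in rw_set)

# TA took its snapshot at time 5; meanwhile x was committed again at time 7
print(can_commit({"x"}, snapshot=5))  # False: abort this TA
print(can_commit({"y"}, snapshot=5))  # True: y unchanged since the snapshot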

Snapshot isolation allows write skew: a "double almost-lost-update" on different objects.
s: r1[x],r1[y],r2[x],r2[y],w1[x],c1,w2[y],c2
passes the snapshot isolation test but violates serializability. Rare, and mostly harmless.
Suppose the bank allows one account to go negative as long as the overall balance across the accounts stays positive:
s: r1[x],r1[y],(compute balance),w1[x],c1
Now w1[x] makes x negative. Consider
s: r1[x],r1[y],r2[x],r2[y],w1[x],c1,w2[y],c2
Now both accounts are negative, although each TA alone respected the constraint.

Replication: local copies are kept up to date with incremental changes using the write sets. Each client has an instance of the replication middleware, which communicates the write sets.

Queues: transaction chopping
Split one business TA into several; keep TAs short. This requires communication between the TAs via message queues and may require compensating TAs if a business TA cannot complete.
Example: 1. withdraw $x from y (TA1); 2. put $x in z (TA2). What if the client crashes after step 1? TA1 leaves a note in a message queue, a table with schema TransferOngoing(id, fromAcc, toAcc, amount). TA1 enqueues a row; TA2 dequeues it and acts accordingly.
Processing is not FIFO but best effort: try to process older messages first, but expect aborts, concurrent access, etc.

Local message queues
A table whose rows are messages, handled by database clients. Producers put messages into the queue. Consumers dequeue: they either delete the message or mark it as processed, and might then enqueue a further message for subsequent processing.
Consumers (dequeue workers) are activated by messages and must take the action the message calls for.

Transactional dequeue pattern
Performed by a dequeue worker, a database client. A transactional dequeue is a single local TA that does two things: dequeue the message and act on it. Atomicity: the message is dequeued iff the action succeeds (the action may include enqueueing new messages). Transactional dequeue is what makes chopping safe.
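A minimal sqlite3 sketch of the transactional dequeue on the TransferOngoing queue (table and column names adapted; single worker, so no row locking shown):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE transfer_ongoing
               (id INTEGER PRIMARY KEY, from_acc INTEGER,
                to_acc INTEGER, amount INTEGER)""")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (321, 0)")
conn.execute("INSERT INTO transfer_ongoing VALUES (1, 123, 321, 100)")
conn.commit()

def transactional_dequeue(conn):
    """One local TA: dequeue a message and act on it -- both or neither."""
    try:
        row = conn.execute(
            "SELECT id, to_acc, amount FROM transfer_ongoing LIMIT 1").fetchone()
        if row is None:
            return
        msg_id, to_acc, amount = row
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                     (amount, to_acc))                          # the action
        conn.execute("DELETE FROM transfer_ongoing WHERE id = ?",
                     (msg_id,))                                 # the dequeue
        conn.commit()    # dequeued iff the deposit succeeded
    except Exception:
        conn.rollback()

transactional_dequeue(conn)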

Transaction chaining
A transactional dequeue may work on two message queues:
- dequeue a message from incoming queue 1
- execute the action
- enqueue a message in outgoing queue 2

Conditional response and declining
An action might be to decline a transaction; that too is a correct response.

Concurrent access to queues
"Read past": skip to the next unlocked item. This is tricky: if a TA looks at an object that is locked, the TA gets blocked.
Simulating read past: queue management and the tracking of unprocessed messages are handled by a dispatcher using READ UNCOMMITTED. The dispatcher calls dequeue workers, which do the transactional dequeue, and passes on the message id.

Distributed messaging scenario
Consider DB1 and DB2. Applications send messages from 1 to 2 for processing; there is a queue on each DB:
DB1: outbox1(messageID, status, message)
DB2: inbox2(messageID, status, message)
One worker W works on outbox1, moving messages to inbox2, where other workers process them. W is a message consumer for outbox1 and a producer for inbox2.

Distributed messaging, secure delivery
W runs two ACID TAs. On outbox1/DB1, W is the message consumer and does a transactional dequeue of a message:
1. enqueue the message in inbox2 on DB2
2. commit the write on DB2
3. commit the dequeue on DB1
If W crashes between 2 and 3, the dequeue on outbox1 is not committed and W redoes it. Do we get multiple copies in DB2?
An operation g is idempotent if applying g twice has the same effect as applying it once. The enqueue of a message into inbox2 is idempotent: if a message with the same id is already in inbox2, the enqueue is skipped.
Idempotence is crucial: it allows us to maintain the ACID properties with two separate ACID TAs. The databases do not need to know that they are part of a distributed, transactional communication. Distributed ACID TAs are heavyweight; distributed messaging is much lighter.

Connection to service-oriented architecture
In SOA, components communicate over service interfaces, e.g. web services, which makes components reusable. Messaging requires idempotent service interfaces, which is equivalent to using a message id.

Message queues as load buffer
A user places a request to a TA service into a message queue; once queued, the message will be processed: reliable. The application server continuously processes the pending requests in the queue and can be 100% utilized. Several application servers can work on the same queue.
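The idempotent enqueue can be keyed on messageID; INSERT OR IGNORE is SQLite's spelling of it (PostgreSQL has ON CONFLICT DO NOTHING). A sketch:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE inbox2
               (messageID INTEGER PRIMARY KEY, status TEXT, message TEXT)""")

def idempotent_enqueue(conn, msg_id, message):
    """A duplicate id is silently skipped, so worker W may safely
    repeat the enqueue after a crash between its two commits."""
    conn.execute("INSERT OR IGNORE INTO inbox2 VALUES (?, 'pending', ?)",
                 (msg_id, message))
    conn.commit()

idempotent_enqueue(conn, 1, "transfer 100 to acc 321")
idempotent_enqueue(conn, 1, "transfer 100 to acc 321")  # redelivery: no effect
print(conn.execute("SELECT COUNT(*) FROM inbox2").fetchone()[0])  # 1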
