
Transaction Management and Concurrency control

Definition:
Transaction – an action or series of actions, carried out by a single user or application program, which accesses or changes the contents of the database.
It is a logical unit of work on the database.
A transaction can have one of two outcomes:
1. If it completes successfully, it is said to have committed and the database reaches a new consistent state.
2. If it does not execute successfully, it is said to have aborted and the database is restored to the consistent state it was in before the transaction started.
An aborted transaction is said to have been rolled back or undone. A committed transaction cannot be aborted.
If a transaction is committed by mistake, another transaction has to be initiated to reverse its effects. This new transaction is sometimes called a compensating transaction.
Most DBMSs do not have an inbuilt way of determining which actions are grouped together to form a single transaction.
To get around this, the boundaries are marked explicitly with keywords such as BEGIN, END, COMMIT and ROLLBACK (or their equivalents).
If these delimiters are not used, the entire program is usually regarded as a single transaction, with the DBMS performing an automatic COMMIT when the program terminates correctly and a ROLLBACK if it does not.
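The effect of explicit transaction boundaries can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the `account` table, the `transfer` helper and the amounts are hypothetical and not part of these notes, and other DBMSs spell the delimiters slightly differently.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# explicit BEGIN / COMMIT / ROLLBACK statements below mark the boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 10000), (2, 0)")


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction (hypothetical helper)."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute("COMMIT")      # new consistent state
        return True
    except ValueError:
        conn.execute("ROLLBACK")    # restore the state before BEGIN
        return False


def balances(conn):
    """Return the balances of all accounts, ordered by id."""
    return [row[0] for row in conn.execute("SELECT balance FROM account ORDER BY id")]


ok = transfer(conn, 1, 2, 1000)       # commits
failed = transfer(conn, 1, 2, 50000)  # aborts and is rolled back
```

After both calls the balances are those left by the committed transfer only; the partial debit made by the aborted transfer is undone.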
Properties of transactions
1. Atomicity (the “all or nothing” property)
A transaction is an indivisible unit that is either performed in its entirety or not performed at all.
2. Consistency
A transaction must transform the database from one consistent state to another consistent state.
3. Isolation
Transactions execute independently of one another. This means that the partial effects of an incomplete transaction are not “visible” to other transactions.
4. Durability
The effects of a successfully completed (committed) transaction are permanently recorded in the database and must not be lost as a result of a subsequent failure.
If a failure occurs during a transaction, the database could be left inconsistent. Different DBMSs provide different ways of avoiding this. Oracle, for instance, has the following provisions:
The RECOVERY MANAGER (RMAN) is used to ensure that the database is restored to the state it was in before the start of the transaction, and therefore to a consistent state.
The BUFFER MANAGER is responsible for the transfer of data between disk storage and main memory.
Concurrency Control
Concurrency control is the process of managing simultaneous operations on the database without having them interfere with each other.
A key objective in developing a database is to enable many users to access shared data concurrently.
Concurrent access is relatively easy if all users are only reading, as there is no way that they can interfere with one another.
However, when two or more transactions access the database simultaneously and at least one of them is updating data, there may be interference that can result in inconsistencies.
Although two transactions may be perfectly correct in themselves, the interleaving of their operations may produce incorrect results, thus compromising the integrity and consistency of the database.
There are three such problems:
1. Lost update problem
This occurs when an apparently successfully completed update by one user/application is overwritten by another user/application.
Example
Assume we have two transactions T1 and T2.
T1 is withdrawing Ksh. 1,000 from an account with a balance of Ksh. 10,000 and T2 is depositing Ksh. 5,000 into the same account.
If these transactions were executed serially, one after the other without interleaving of operations, the final balance would be Ksh. 14,000 no matter which transaction is performed first.
Problem:
Transactions T1 and T2 start at nearly the same time, and both read the balance as Ksh. 10,000. T2 increases the balance by Ksh. 5,000 to Ksh. 15,000 and stores the update in the database. Meanwhile, transaction T1 reduces its copy of the balance by Ksh. 1,000 to Ksh. 9,000 and stores this value in the database, overwriting the previous update by T2 and thereby “losing” the Ksh. 5,000 previously added to the balance.
Illustration (Problem).
Time  T1                         T2                         Balance
t1                               Begin transaction          10,000
t2    Begin transaction          Read (Balance)             10,000
t3    Read (Balance)             Balance = Balance + 5,000  10,000
t4    Balance = Balance - 1,000  Write (Balance)            15,000
t5    Write (Balance)            Commit                     9,000
t6    Commit                                                9,000
The loss of T2’s update can be avoided by preventing T1 from reading the value of the Balance until T2’s update has been completed.
Solution:
T2 first requests a write_lock on the Balance. It can then proceed to read the value of the balance from the database, increase it by Ksh. 5,000 and write the new value back to the database. When T1 starts, it also requests a write_lock on the Balance. However, since the balance is already write-locked by T2, the request is not immediately granted and T1 has to wait until T2 releases the lock. This happens only after T2 commits.
Illustration - Solution
Time  T1                         T2                         Balance
t1                               Begin transaction          10,000
t2    Begin transaction          Write_lock (Balance)       10,000
t3    Write_lock (Balance)       Read (Balance)             10,000
t4    WAIT                       Balance = Balance + 5,000  10,000
t5    WAIT                       Write (Balance)            15,000
t6    WAIT                       Commit/unlock (Balance)    15,000
t7    Read (Balance)                                        15,000
t8    Balance = Balance - 1,000                             15,000
t9    Write (Balance)                                       14,000
t10   Commit/unlock (Balance)                               14,000
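The two illustrations above can be replayed in a few lines of Python. This is a minimal sketch, not a DBMS: the dictionary `db` stands in for the database, a `threading.Lock` stands in for the write_lock on the Balance, and both function names are made up for the example.

```python
import threading


def lost_update_interleaving(balance):
    """Replay the problem table: both transactions read before either writes."""
    db = {"balance": balance}
    t2_copy = db["balance"]      # t2: T2 reads 10,000
    t1_copy = db["balance"]      # t3: T1 reads 10,000
    t2_copy += 5000              # t3: T2 computes 15,000
    t1_copy -= 1000              # t4: T1 computes 9,000
    db["balance"] = t2_copy      # t4: T2 writes 15,000
    db["balance"] = t1_copy      # t5: T1 overwrites it with 9,000
    return db["balance"]


def locked_execution(balance):
    """Run the same two updates as threads that hold a lock from read to write."""
    db = {"balance": balance}
    write_lock = threading.Lock()    # stands in for Write_lock (Balance)

    def update(delta):
        with write_lock:             # the other thread WAITs here
            copy = db["balance"]
            db["balance"] = copy + delta

    threads = [threading.Thread(target=update, args=(d,)) for d in (5000, -1000)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db["balance"]
```

Whichever thread obtains the lock first, the locked version ends at 14,000, the same result as either serial order.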
2. The uncommitted dependency problem
This occurs when one transaction is allowed to see the intermediate results of another transaction before that transaction has committed.
Assume two transactions T3 and T4.
Transaction T4 reads the balance as Ksh. 10,000, increases it by Ksh. 5,000 and updates the figure to Ksh. 15,000, but then aborts, so that the balance is restored to its original value of Ksh. 10,000.
However, by this time, transaction T3 has read the new value of the balance (Ksh. 15,000) and is using it as the basis of a Ksh. 1,000 withdrawal, giving an incorrect new balance of Ksh. 14,000 instead of Ksh. 9,000.
The reason for the abort is immaterial; its effect is that T3 assumes T4’s update completed successfully, even though it was rolled back.
Illustration (Problem).
Time  T3                         T4                         Balance
t1                               Begin transaction          10,000
t2                               Read (Balance)             10,000
t3                               Balance = Balance + 5,000  10,000
t4    Begin transaction          Write (Balance)            15,000
t5    Read (Balance)                                        15,000
t6    Balance = Balance - 1,000  Rollback                   10,000
t7    Write (Balance)                                       14,000
t8    Commit                                                14,000
The solution to the problem is to prevent T3 from reading the Balance until after T4 is through (whether it completes successfully or not).
T4 first requests a write_lock on the balance. It then proceeds to read the value of the balance, increments it by Ksh. 5,000 and writes the value back to the database, but it does not commit. Instead, a rollback is issued. When the rollback is executed, the updates of T4 are undone and the Balance is returned to its original value of Ksh. 10,000. Only then is T3 granted its lock, so it reads Ksh. 10,000.
Illustration - solution.
Time  T3                         T4                         Balance
t1                               Begin transaction          10,000
t2    Begin transaction          Write_lock (Balance)       10,000
t3    Write_lock (Balance)       Read (Balance)             10,000
t4    WAIT                       Balance = Balance + 5,000  10,000
t5    WAIT                       Write (Balance)            15,000
t6    WAIT                       Rollback/unlock (Balance)  10,000
t7    Read (Balance)                                        10,000
t8    Balance = Balance - 1,000                             10,000
t9    Write (Balance)                                       9,000
t10   Commit/unlock (Balance)                               9,000
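A small replay of the T3/T4 example makes the difference between the two tables concrete. It is a sketch under simplifying assumptions: the "database" is a dictionary, rollback is modelled by restoring a saved before-image, and the function names are hypothetical.

```python
def dirty_read_interleaving(balance):
    """T3 reads T4's uncommitted write, then T4 rolls back (problem table)."""
    db = {"balance": balance}
    before_image = db["balance"]           # kept so that T4 can roll back
    db["balance"] = before_image + 5000    # t4: T4 writes 15,000, uncommitted
    t3_copy = db["balance"]                # t5: T3 reads the uncommitted 15,000
    db["balance"] = before_image           # t6: T4 rolls back to 10,000
    db["balance"] = t3_copy - 1000         # t7: T3 writes 14,000
    return db["balance"]


def isolated_execution(balance):
    """T3 waits for T4's lock, so it only sees the restored value (solution table)."""
    db = {"balance": balance}
    before_image = db["balance"]
    db["balance"] = before_image + 5000    # T4 writes while holding the lock
    db["balance"] = before_image           # T4 rolls back and unlocks
    t3_copy = db["balance"]                # T3 reads only now: 10,000
    db["balance"] = t3_copy - 1000         # T3 writes 9,000
    return db["balance"]
```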

3. The inconsistent analysis problem


Introduction
The lost update problem and the uncommitted dependency problem concern transactions that are updating the database, where interference may corrupt the database.
However, transactions that only read the database can also produce inaccurate results if they are allowed to read partial results of incomplete transactions that are simultaneously updating the database, a situation referred to as a dirty read or unrepeatable read.
The inconsistent analysis problem occurs when a transaction reads several values from the database but another transaction updates one or more of them during the execution of the first.
Assume two transactions T5 and T6.
T6 is calculating the total of the balances in three accounts X (1,000), Y (700) and Z (500). In the meantime, transaction T5 transfers Ksh. 100 from account X to account Z, so that T6 ends up with the wrong result (2,300 instead of 2,200).
Illustration (Problem).
Time  T5                    T6                       X      Y    Z    Total
t1                          Begin transaction        1,000  700  500  -
t2    Begin transaction     Total = 0                1,000  700  500  0
t3    Read (X)              Read (X)                 1,000  700  500  0
t4    X = X - 100           Total = Total + X        1,000  700  500  1,000
t5    Write (X)             Read (Y)                 900    700  500  1,000
t6    Read (Z)              Total = Total + Y        900    700  500  1,700
t7    Z = Z + 100                                    900    700  500  1,700
t8    Write (Z)                                      900    700  600  1,700
t9    Commit                Read (Z)                 900    700  600  1,700
t10                         Total = Total + Z        900    700  600  2,300
The solution to this problem is to prevent transaction T6 from reading balances X and Z until T5 has completed its updates.

Illustration (solution).
Time  T5                    T6                       X      Y    Z    Total
t1                          Begin transaction        1,000  700  500  -
t2    Begin transaction     Total = 0                1,000  700  500  0
t3    Write_lock (X)        Read_lock (X)            1,000  700  500  0
t4    Read (X)              WAIT                     1,000  700  500  0
t5    X = X - 100           WAIT                     1,000  700  500  0
t6    Write (X)             WAIT                     900    700  500  0
t7    Write_lock (Z)        WAIT                     900    700  500  0
t8    Read (Z)              WAIT                     900    700  500  0
t9    Z = Z + 100           WAIT                     900    700  500  0
t10   Write (Z)             WAIT                     900    700  600  0
t11   Commit/unlock (X, Z)  WAIT                     900    700  600  0
t12                         Read (X)                 900    700  600  0
t13                         Total = Total + X        900    700  600  900
t14                         Read_lock (Y)            900    700  600  900
t15                         Read (Y)                 900    700  600  900
t16                         Total = Total + Y        900    700  600  1,600
t17                         Read_lock (Z)            900    700  600  1,600
t18                         Read (Z)                 900    700  600  1,600
t19                         Total = Total + Z        900    700  600  2,200
t20                         Commit/unlock (X, Y, Z)  900    700  600  2,200
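The arithmetic in the two tables can be checked with a short replay. This is only a sketch: the accounts are held in a plain dictionary and the two function names are invented for the illustration.

```python
ACCOUNTS = {"X": 1000, "Y": 700, "Z": 500}


def interleaved_total(accounts):
    """T6 sums X, Y and Z while T5 moves 100 from X to Z (problem table)."""
    db = dict(accounts)
    total = 0
    total += db["X"]     # t3-t4: T6 reads X = 1,000
    db["X"] -= 100       # t5: T5 writes X = 900
    total += db["Y"]     # t5-t6: T6 reads Y = 700
    db["Z"] += 100       # t8: T5 writes Z = 600
    total += db["Z"]     # t9-t10: T6 reads Z = 600
    return total


def total_after_transfer(accounts):
    """T6 waits until T5 has committed, then sums (solution table)."""
    db = dict(accounts)
    db["X"] -= 100
    db["Z"] += 100
    return db["X"] + db["Y"] + db["Z"]
```

The interleaved run counts the transferred Ksh. 100 twice: once in the old value of X and once in the new value of Z.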
Serializability and recoverability
We have seen the problems associated with allowing transactions to execute concurrently.
The aim of concurrency control is to schedule transactions in such a way as to avoid interference.
One way of doing this is to allow only one transaction to execute at a time: one transaction is committed before the next is allowed to begin. However, the aim of a multi-user DBMS is also to maximize the degree of concurrency or parallelism in the system, so that transactions that can execute without interfering with each other are allowed to run in parallel.
Definition:
Schedule – a sequence of operations by a set of concurrent transactions that preserves the order of the operations in each of the individual transactions.
A transaction is made up of a sequence of operations consisting of reads and writes to the database, followed by a commit or abort action.
A schedule S consists of a sequence of operations from a set of n transactions T1, T2, T3, …, Tn, subject to the constraint that the order of operations for each transaction is preserved in the schedule. Therefore, for each transaction Ti in schedule S, the order of the operations in Ti must be the same in S.
Serial schedule – a schedule where the operations of each transaction are executed consecutively, without any interleaved operations from other transactions.
In a serial schedule, the transactions are performed in serial order. For instance, if we have two transactions T1 and T2, the serial order would be T1 then T2, or T2 then T1. In serial execution there is no interference between transactions, since only one transaction is executing at any given time.
There is no guarantee that all serial executions of a given set of transactions will produce identical results. For instance, in a bank, it matters whether interest is calculated before or after a large deposit is made.
Non-serial schedule – a schedule where the operations from a set of concurrent transactions are interleaved.
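These definitions translate directly into two small checks. The sketch below assumes a schedule is represented as a list of (transaction, operation) pairs; that representation, the operation labels and the function names are choices made for this example only.

```python
def preserves_order(schedule, transactions):
    """True if each transaction's operations appear in the schedule in their own order."""
    for name, ops in transactions.items():
        seen = [op for txn, op in schedule if txn == name]
        if seen != ops:
            return False
    return True


def is_serial(schedule):
    """True if no transaction resumes after another transaction has started."""
    finished, current = set(), None
    for txn, _ in schedule:
        if txn != current:
            if txn in finished:      # txn was interrupted and is now resuming
                return False
            if current is not None:
                finished.add(current)
            current = txn
    return True


OPS = ["R(X)", "W(X)", "R(Y)", "W(Y)"]
transactions = {"TA": OPS, "TB": OPS}

serial = [("TA", op) for op in OPS] + [("TB", op) for op in OPS]
non_serial = [
    ("TA", "R(X)"), ("TA", "W(X)"), ("TB", "R(X)"), ("TB", "W(X)"),
    ("TA", "R(Y)"), ("TA", "W(Y)"), ("TB", "R(Y)"), ("TB", "W(Y)"),
]
not_a_schedule = [("TA", "W(X)"), ("TA", "R(X)"), ("TA", "R(Y)"), ("TA", "W(Y)")] + [
    ("TB", op) for op in OPS
]
```

Here `serial` and `non_serial` are both valid schedules, but only the first is serial; `not_a_schedule` reorders TA's own operations and so is not a schedule at all.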
The three problems described earlier arise from the mismanagement of concurrency, which left the database in an inconsistent state in the first two problems and presented the user with a wrong result in the last (inconsistent analysis).
Serial execution prevents such problems. Whichever serial schedule is chosen, serial execution never leaves the database in an inconsistent state. Thus every serial schedule is considered correct, even though different serial schedules may produce different results.
The objective of serializability is to find non-serial schedules that allow transactions to execute concurrently without interfering with one another, and thus produce a database state that could be produced by a serial execution.
If a set of transactions executes concurrently, the non-serial schedule is correct if it produces the same results as some serial execution. Such a schedule is said to be serializable.
To prevent problems such as inconsistent analysis, it is important to guarantee serializability of concurrent transactions.
In serializability, the ordering of reads and writes is important:
If two transactions only read a data item, they do not conflict and order is not important.
If two transactions either read or write completely separate data items, they do not conflict and order is not important.
If one transaction writes a data item and another either reads or writes the same data item, the order of execution is important.

Consider the following schedule S1 containing operations from two concurrently executing transactions T7 and T8.
Time  T7                 T8
t1    Begin_transaction
t2    Read (Balance X)
t3    Write (Balance X)
t4                       Begin_transaction
t5                       Read (Balance X)
t6                       Write (Balance X)
t7    Read (Balance Y)
t8    Write (Balance Y)
t9    Commit
t10                      Read (Balance Y)
t11                      Write (Balance Y)
t12                      Commit
Since the write operation on Balance X in T8 does not conflict with the subsequent read operation on Balance Y in T7, the order of these two operations can be changed to produce an equivalent schedule S2, shown below.
Time  T7                 T8
t1    Begin_transaction
t2    Read (Balance X)
t3    Write (Balance X)
t4                       Begin_transaction
t5                       Read (Balance X)
t6    Read (Balance Y)
t7                       Write (Balance X)
t8    Write (Balance Y)
t9    Commit
t10                      Read (Balance Y)
t11                      Write (Balance Y)
t12                      Commit

If we also change the order of the following non-conflicting operations, we produce an equivalent schedule S3 as follows:
Time  T7                 T8
t1    Begin_transaction
t2    Read (Balance X)
t3    Write (Balance X)
t4    Read (Balance Y)
t5    Write (Balance Y)
t6    Commit
t7                       Begin_transaction
t8                       Read (Balance X)
t9                       Write (Balance X)
t10                      Read (Balance Y)
t11                      Write (Balance Y)
t12                      Commit
We have simply:
Changed the order of the Write (Balance X) of T8 with the Write (Balance Y) of T7.
Changed the order of the Read (Balance X) of T8 with the Read (Balance Y) of T7.
Changed the order of the Read (Balance X) of T8 with the Write (Balance Y) of T7.
The schedule S3 is a serial schedule and, since S1 and S2 are equivalent to S3, S1 and S2 are serializable schedules.
This type of serializability is known as conflict serializability. A conflict serializable schedule orders any conflicting operations in the same way as some serial execution.
Under the constrained write rule (i.e. a transaction updates a data item based on its old value, which is first read by the transaction), a precedence graph can be produced to test for conflict serializability.
A precedence graph consists of:
A node for each transaction.
A directed edge Ti → Tj, if Tj reads the value of an item written by Ti.
A directed edge Ti → Tj, if Tj writes a value into an item after it has been read by Ti.
If the precedence graph contains a cycle, then the schedule is not conflict serializable.
Non-conflict serializable schedule.
Consider two transactions T9 and T10. Transaction T9 is transferring Ksh. 1,000 from one account with balance X to another account with balance Y, whilst T10 is increasing the balance of these two accounts by 10%. The schedule is as follows:
Time  T9                             T10
t1    Begin transaction
t2    Read (Balance X)
t3    Balance X = Balance X - 1,000
t4    Write (Balance X)              Begin transaction
t5                                   Read (Balance X)
t6                                   Balance X = Balance X * 1.1
t7                                   Write (Balance X)
t8                                   Read (Balance Y)
t9                                   Balance Y = Balance Y * 1.1
t10                                  Write (Balance Y)
t11   Read (Balance Y)               Commit
t12   Balance Y = Balance Y + 1,000
t13   Write (Balance Y)
t14   Commit
The precedence graph is as follows:
T9 → T10 (T10 reads the Balance X written by T9)
T10 → T9 (T9 reads the Balance Y written by T10)
As the precedence graph has a cycle, this schedule is not conflict serializable.
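The precedence-graph test can be sketched as follows. The representation of a schedule as (transaction, action, item) triples and the function names are assumptions made for this example. The sketch adds an edge for every pair of conflicting operations, which also covers the write-after-write case that the two rules above do not list.

```python
def precedence_graph(schedule):
    """Return the set of edges (Ti, Tj) for conflicting operations, Ti's first."""
    edges = set()
    for i, (ti, act_i, item_i) in enumerate(schedule):
        for tj, act_j, item_j in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and item_i == item_j and "W" in (act_i, act_j):
                edges.add((ti, tj))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by `edges`."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:         # back edge: a cycle exists
            return True
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph[node])
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in graph)


# Schedule S1 (T7 and T8) and the T9/T10 schedule, reads and writes only.
s1 = [("T7", "R", "X"), ("T7", "W", "X"), ("T8", "R", "X"), ("T8", "W", "X"),
      ("T7", "R", "Y"), ("T7", "W", "Y"), ("T8", "R", "Y"), ("T8", "W", "Y")]
s_t9_t10 = [("T9", "R", "X"), ("T9", "W", "X"), ("T10", "R", "X"), ("T10", "W", "X"),
            ("T10", "R", "Y"), ("T10", "W", "Y"), ("T9", "R", "Y"), ("T9", "W", "Y")]
```

For S1 the only edge is T7 → T8, so the graph is acyclic; for the T9/T10 schedule both T9 → T10 and T10 → T9 appear, giving the cycle.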
View serializability.
This is another type of serializability, offering a less stringent definition of schedule equivalence than conflict serializability.
Two schedules S1 and S2 consisting of the same operations from n transactions T1, T2, T3, …, Tn are view equivalent if the following three conditions hold:
For each data item x, if transaction Ti reads the initial value of x in schedule S1, then Ti must also read the initial value of x in schedule S2.
For each read operation on data item x by transaction Ti in schedule S1, if the value read was written by transaction Tj, then Ti must also read the value of x produced by Tj in schedule S2.
For each data item x, if the last write operation on x was performed by transaction Ti in schedule S1, the same transaction must perform the final write on x in schedule S2.
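The three conditions can be checked mechanically by recording, for each schedule, who every read takes its value from and who writes each item last. The sketch below assumes both schedules contain the same operations, represented as (transaction, action, item) triples; the names `view_profile` and `view_equivalent` are made up for this illustration, and repeated reads of one item by the same transaction are compared only as a multiset.

```python
def view_profile(schedule):
    """Return (reads-from triples, final writer per item) for a schedule."""
    last_writer = {}
    reads_from = []
    for txn, action, item in schedule:
        if action == "R":
            # None as the source means the read sees the initial value.
            reads_from.append((txn, item, last_writer.get(item)))
        else:
            last_writer[item] = txn
    return sorted(reads_from, key=repr), last_writer


def view_equivalent(s1, s2):
    """True if both schedules have the same reads-from relation and final writes."""
    return view_profile(s1) == view_profile(s2)


interleaved = [("T7", "R", "X"), ("T7", "W", "X"), ("T8", "R", "X"), ("T8", "W", "X"),
               ("T7", "R", "Y"), ("T7", "W", "Y"), ("T8", "R", "Y"), ("T8", "W", "Y")]
serial_t7_t8 = [("T7", "R", "X"), ("T7", "W", "X"), ("T7", "R", "Y"), ("T7", "W", "Y"),
                ("T8", "R", "X"), ("T8", "W", "X"), ("T8", "R", "Y"), ("T8", "W", "Y")]
serial_t8_t7 = serial_t7_t8[4:] + serial_t7_t8[:4]
```

The interleaved schedule has the same profile as the serial order T7 then T8, but not as T8 then T7, where T8 would read the initial values.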
Concurrency control techniques
Serializability is achievable in several ways.
The most basic techniques allow transactions to proceed safely subject to certain constraints: locking and timestamping methods.
These two methods are conservative (or pessimistic) in that they delay transactions in case they conflict with other transactions at some future time.
Alternative methods, called optimistic methods, are based on the premise that conflict is rare, so transactions are allowed to proceed unsynchronized and conflicts are checked for only at the end, when a transaction reaches the “commit” stage.
Locking
This is a procedure used to control concurrent access to data. When one transaction is accessing the database, a lock may deny access to other transactions to prevent incorrect results.
There are two types of lock:
Read lock: if a transaction has a read lock on a data item, it can read the item but not update it. A read lock is shared, i.e. many transactions can hold a read lock on the same item at the same time without an adverse effect on the database.
Write lock: if a transaction has a write lock on a data item, it can both read and update the item. A write lock is exclusive, i.e. only one transaction can hold a write lock on an item at any one time.
Locks work as follows:
Any transaction that requires access to a data item must first lock the item, requesting a read lock for read-only access or a write lock for both read and write access.
If the item is not already locked by another transaction, the lock is granted.
If the item is currently locked, the DBMS determines whether the request is compatible with the existing lock. If a read lock is requested on an item that already has a read lock on it, the request is granted. Otherwise (a write lock is requested on an item that is already locked, or a read lock is requested on an item that is write-locked), the transaction must WAIT until the existing lock is released.
A transaction continues to hold a lock until it explicitly releases it, either during execution or when it terminates (aborts or commits). It is only when the write lock is released that the effects of the write operation are made visible to other transactions.
Some systems permit a transaction to issue a read lock on an item and later upgrade the lock to a write lock. This allows a transaction to examine data first and then decide whether to update it. If upgrades are not supported, a transaction must hold write locks on all data items that it may update at some time during its execution, thereby potentially reducing the level of concurrency in the system.
For similar reasons, some systems also permit a transaction to issue a write lock and later downgrade it to a read lock.
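The grant/WAIT decision and lock upgrading described above can be sketched as a small lock table. This is an illustrative model only: the class name `LockTable`, its methods and the "read"/"write" mode strings are assumptions, and a real lock manager would also queue waiting transactions rather than just report that they must wait.

```python
class LockTable:
    """Toy lock table: maps each item to its lock mode and the set of holders."""

    def __init__(self):
        self.locks = {}   # item -> (mode, set of transaction names)

    def request(self, txn, item, mode):
        """Return True if the lock is granted, False if the caller must WAIT."""
        held = self.locks.get(item)
        if held is None:                       # item not locked: grant
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if holders == {txn}:                   # sole holder: keep or upgrade
            if mode == "write":
                self.locks[item] = ("write", holders)
            return True
        if mode == "read" and held_mode == "read":
            holders.add(txn)                   # read locks are shared
            return True
        return False                           # incompatible: WAIT

    def release_all(self, txn):
        """Release every lock held by txn, e.g. at commit or abort."""
        for item in list(self.locks):
            _, holders = self.locks[item]
            holders.discard(txn)
            if not holders:
                del self.locks[item]


table = LockTable()
both_read = table.request("T1", "X", "read") and table.request("T2", "X", "read")
writer_waits = not table.request("T3", "X", "write")
upgrade_waits = not table.request("T1", "X", "write")   # T2 still reads X
table.release_all("T2")
upgrade_granted = table.request("T1", "X", "write")     # T1 is now sole holder
reader_waits = not table.request("T2", "X", "read")
```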
Two-phase locking (2PL)
A transaction follows the two-phase locking protocol if all locking operations precede the first unlock operation in the transaction.
Under this protocol, every transaction can be divided into two phases: first a growing phase, in which it acquires all the locks it requires but cannot release any, and then a shrinking phase, in which it releases its locks but cannot acquire any new ones. The locks need not all be acquired simultaneously: a transaction will normally acquire some locks, do some processing and go on to acquire additional locks as needed. However, no locks are released until the transaction has reached a stage at which no new locks are needed.
The rules are:
A transaction must acquire a lock on an item before operating on that item. The lock may be read or write, depending on the type of access required.
Once a transaction releases a lock, it can never acquire any new locks.
If upgrading of locks is supported, it can only happen during the growing phase and may require the transaction to wait until another transaction releases a read lock on the item. Downgrading can only take place during the shrinking phase.
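The first two rules can be turned into a simple check on the operations of one transaction. The sketch assumes each operation is an (action, item) pair with actions "read_lock", "write_lock", "unlock", "read" and "write"; these labels and the function name are invented for the example, and the check does not verify that a write is covered by a write lock rather than a read lock.

```python
def follows_two_phase_locking(operations):
    """True if every access is locked and no lock is acquired after an unlock."""
    shrinking = False
    held = set()
    for action, item in operations:
        if action in ("read_lock", "write_lock"):
            if shrinking:            # acquiring after the first unlock
                return False
            held.add(item)
        elif action == "unlock":
            shrinking = True         # the shrinking phase has begun
            held.discard(item)
        elif item not in held:       # read or write without a lock
            return False
    return True


two_phase = [("write_lock", "X"), ("read", "X"), ("write", "X"),
             ("write_lock", "Y"), ("unlock", "X"),
             ("read", "Y"), ("write", "Y"), ("unlock", "Y")]
not_two_phase = [("write_lock", "X"), ("read", "X"), ("write", "X"), ("unlock", "X"),
                 ("write_lock", "Y"), ("read", "Y"), ("write", "Y"), ("unlock", "Y")]
unlocked_access = [("read", "X")]
```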
Deadlock
A deadlock is an impasse that may occur when two (or more) transactions are each waiting for locks held by the other to be released.
Assume we have two transactions TA and TB:
Time  TA                  TB
t1    Begin transaction
t2    Write_lock (balx)   Begin transaction
t3    Read (balx)         Write_lock (baly)
t4    balx = balx - 1000  Read (baly)
t5    Write (balx)        baly = baly + 2000
t6    Write_lock (baly)   Write (baly)
t7    WAIT                Write_lock (balx)
t8    WAIT                WAIT
t9    WAIT                WAIT
t10   .                   WAIT
t11   .                   .
In the above case, there is only one way to break the deadlock: abort one or more of the transactions, which involves undoing all the changes made by the aborted transactions. Assume we abort transaction TB. Once this is done, the locks held by TB are released and TA is able to proceed. Deadlocks should be transparent to users, so the DBMS should automatically restart the aborted transactions.
There are two techniques for handling deadlock: deadlock prevention, and deadlock detection and recovery.
In deadlock prevention, the DBMS looks ahead to determine whether a transaction would cause deadlock, and never allows deadlock to occur.
In deadlock detection and recovery, the DBMS allows deadlock to occur but recognizes occurrences of deadlock and breaks them.
Since it is generally easier to test for deadlock and break it when it occurs than to prevent it, many systems use deadlock detection and recovery.
Deadlock prevention
A common approach to deadlock prevention is to order transactions using transaction timestamps.
Two algorithms are used here:
Wait-die - only an older transaction is allowed to wait for a younger one; a younger transaction that requests a lock held by an older one is aborted (dies) and restarted with the same timestamp, so that it will eventually become the oldest active transaction and will not die.
Wound-wait - only a younger transaction is allowed to wait for an older one. If an older transaction requests a lock held by a younger one, the younger one is aborted (wounded).
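Both algorithms reduce to a comparison of two timestamps when a lock request conflicts with a lock already held. A minimal sketch, assuming that a smaller timestamp means an older transaction; the function names and the returned strings are chosen for this example only.

```python
def wait_die(requester_ts, holder_ts):
    """Wait-die: an older requester waits, a younger requester dies."""
    if requester_ts < holder_ts:       # requester is older
        return "requester waits"
    return "requester aborts"          # restarted later with the same timestamp


def wound_wait(requester_ts, holder_ts):
    """Wound-wait: an older requester wounds the holder, a younger one waits."""
    if requester_ts < holder_ts:       # requester is older
        return "holder aborts"
    return "requester waits"
```

In both schemes it is always the younger transaction that is aborted, so the oldest active transaction is never rolled back.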
Deadlock detection
Deadlock detection is usually handled by constructing a wait-for graph (WFG) that shows transaction dependencies, i.e. transaction Ti is dependent on Tj if Tj holds a lock on a data item that Ti is waiting for.
The WFG is constructed as follows:
Create a node for each transaction.
Create a directed edge Ti → Tj, if transaction Ti is waiting to lock an item that is currently locked by Tj.
Deadlock exists if and only if the WFG contains a cycle. Since a cycle in the WFG is a necessary and sufficient condition for deadlock, the deadlock detection algorithm generates the WFG at regular intervals and examines it for a cycle.
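One way to examine a WFG for a cycle is to repeatedly remove transactions that are not waiting for anyone; whatever remains is stuck. The sketch below assumes the graph is given as a dictionary mapping each transaction to the set of transactions it waits for; the function name is hypothetical. Note that it returns the transactions on a cycle together with any that are waiting on them.

```python
def find_deadlocked(waits_for):
    """Return the transactions that can never proceed (empty set: no deadlock)."""
    graph = {txn: set(targets) for txn, targets in waits_for.items()}
    for targets in waits_for.values():
        for txn in targets:
            graph.setdefault(txn, set())
    changed = True
    while changed:
        changed = False
        for txn in list(graph):
            if not graph[txn]:                 # waits for nobody: it can finish
                del graph[txn]
                for targets in graph.values():
                    targets.discard(txn)       # its locks are released
                changed = True
    return set(graph)


deadlock = find_deadlocked({"TA": {"TB"}, "TB": {"TA"}})
no_deadlock = find_deadlocked({"TA": {"TB"}})
blocked_behind = find_deadlocked({"TC": {"TA"}, "TA": {"TB"}, "TB": {"TA"}})
```

The first call models the TA/TB example above, where each transaction waits for a lock the other holds.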
Timestamping
A timestamp is a unique identifier created by the DBMS that indicates the relative starting time of a transaction.
Timestamping is a concurrency control protocol whose key objective is to order transactions globally in such a way that older transactions (those with smaller timestamps) get priority in the event of conflict.
Optimistic techniques
In some systems, conflicts between transactions are rare, and the additional processing required by locking or timestamping protocols is unnecessary for many transactions.
Optimistic techniques assume that conflict is rare and that it is more efficient to allow transactions to proceed unsynchronised. When a transaction wishes to commit, a check is performed to determine whether conflict has occurred.
If there has been conflict, the transaction must be rolled back and restarted. Since conflict is rare, rollback is rare too.
The overhead involved in restarting a transaction may be considerable, since it effectively means redoing the entire transaction. This can be tolerated only if it happens very infrequently, in which case the majority of transactions will be processed without any delay. This allows greater concurrency than traditional protocols, since no locking is needed.
An optimistic concurrency control protocol has two or three phases, depending on whether the transaction is read-only or an update transaction:
Read phase: this extends from the start of the transaction until immediately before the commit. The transaction reads the values of all the data items it needs from the database and stores them in local variables. Updates are applied to a local copy of the data, not to the database.
Validation phase: this follows the read phase. Checks are performed to ensure that serializability is not violated if the transaction’s updates are applied to the database.
For a read-only transaction, this consists of checking that the data values read are still the current values for the corresponding data items. If no interference occurred, the transaction is committed. If interference occurred, the transaction is aborted and restarted.
For an update transaction, validation consists of determining whether the current transaction leaves the database in a consistent state, with serializability maintained. If not, the transaction is aborted and restarted.
Write phase: this follows a successful validation phase for an update transaction. During this phase, the updates made to the local copy are applied to the database.
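The three phases can be sketched with a store that keeps a version counter per item. Version counters are one common way to implement the validation check and are an assumption of this example, as are the class names; the sketch is single-threaded and treats validation plus write as one indivisible step.

```python
class VersionedStore:
    """Toy database: a value and a version counter for every item."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}


class OptimisticTransaction:
    def __init__(self, store):
        self.store = store
        self.local = {}            # local copies (read phase)
        self.read_versions = {}    # version seen when each item was first read
        self.dirty = set()         # items updated locally

    def read(self, key):
        if key not in self.local:
            self.read_versions[key] = self.store.version[key]
            self.local[key] = self.store.data[key]
        return self.local[key]

    def write(self, key, value):
        self.read(key)             # an update is based on the value first read
        self.local[key] = value    # applied to the local copy only
        self.dirty.add(key)

    def commit(self):
        # Validation phase: everything read must still be current.
        for key, seen in self.read_versions.items():
            if self.store.version[key] != seen:
                return False       # conflict: abort, caller restarts
        # Write phase: apply local updates to the database.
        for key in self.dirty:
            self.store.data[key] = self.local[key]
            self.store.version[key] += 1
        return True


store = VersionedStore({"balance": 10000})
t1, t2 = OptimisticTransaction(store), OptimisticTransaction(store)
t1.write("balance", t1.read("balance") - 1000)
t2.write("balance", t2.read("balance") + 5000)
t2_committed = t2.commit()         # validates first
t1_committed = t1.commit()         # fails validation: balance has changed
retry = OptimisticTransaction(store)
retry.write("balance", retry.read("balance") - 1000)
retry_committed = retry.commit()
```

The failed transaction is simply restarted and re-reads the current balance, so no update is lost even though no locks were taken.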
Granularity of data items
This refers to the size of the data items chosen as the unit of protection by a concurrency control protocol.
The granule may be:
The entire database
A data file
A page (sometimes called an area or database space: a section of physical disk in which relations are stored)
A record
A field value of a record
