Transaction Processing

A transaction is a program unit whose execution may change the contents of a database. If the database was in a consistent state before a transaction, then on completion of the program unit corresponding to the transaction, the database will again be in a consistent state. This requires that the transaction be considered atomic: either it executes successfully or, in case of errors, the user can view the transaction as not having been executed at all. The commit and rollback operations included at the end of a transaction ensure that the user can view a transaction as an atomic operation that preserves database consistency. The commit operation, executed at the completion of the modifying phase of the transaction, allows the modifications made on the temporary copy of the database items to be reflected in the permanent copy of the database. The rollback operation is executed if there was an error of some type during the modification phase of the transaction. It indicates that any modifications made by the transaction are to be ignored; consequently, none of these modifications is allowed to change the contents of the database.

Properties of Transaction: Until a transaction terminates, its status and its actions are not visible outside the transaction. Nothing about what a transaction is doing may be communicated until the transaction has terminated. Once a transaction ends, the user may be notified of its success or failure, and the changes made by the transaction become accessible. In order for a transaction to achieve these characteristics, it should have the properties of atomicity, consistency, isolation and durability. These properties, referred to as the ACID test, represent the transaction paradigm.

The Atomicity property of a transaction implies that it will run to completion as an indivisible unit, at the end of which either no changes have occurred to the database or the database has been changed in a consistent manner. At the end of a transaction the updates made by the transaction will be accessible to other transactions and to processes outside the transaction.

The Consistency property of a transaction implies that if the database was in a consistent state before the start of a transaction, then on termination of the transaction the database will also be in a consistent state.

The Isolation property of a transaction indicates that actions performed by a transaction will be isolated or hidden from outside the transaction until the transaction terminates. This property gives the transaction a measure of relative independence.

The Durability property of a transaction ensures that the commit action of a transaction, on its termination, will be reflected in the database. The permanence of the commit action requires that any failures after the commit operation will not cause loss of the updates made by the transaction.

Important properties

Atomicity: Each execution is all or nothing.
Consistency: Each execution leaves the system state, as well as the state of the real world, consistent.
Isolation: Partial effects of transactions are hidden from each other.
Durability: The effects of a successful transaction survive future system malfunctions.

ACID Example

Withdraw $100 from a checking account using an ATM.

Atomicity: The account is debited if and only if the user gets the money.
Consistency: The balance of the account is always positive.
Isolation: Concurrent execution of withdrawals, deposits and transfers does not result in an incorrect account balance.
Durability: After the withdrawal, despite failures, the account reflects the withdrawn $100.
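The commit and rollback behaviour behind the withdrawal example can be sketched with Python's standard sqlite3 module. This is only an illustration, not the ATM's actual logic: the `accounts` table, the `withdraw` function and the rule that the balance may not go below zero are assumptions made for the example.

```python
import sqlite3

def withdraw(conn, account_id, amount):
    """Debit `amount`; commit on success, roll back on any rule violation."""
    try:
        row = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (account_id,)
        ).fetchone()
        if row is None:
            raise ValueError("unknown account")
        # The UPDATE implicitly opens a transaction in sqlite3.
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, account_id),
        )
        # Assumed consistency rule: the balance may not become negative.
        if row[0] - amount < 0:
            raise ValueError("insufficient funds")
        conn.commit()      # make the debit permanent
        return True
    except ValueError:
        conn.rollback()    # the debit is discarded as if it never ran
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 150)")
conn.commit()

ok_first = withdraw(conn, 1, 100)    # enough money: committed
ok_second = withdraw(conn, 1, 100)   # would overdraw: rolled back
balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
```

The second call performs the update and then rolls it back, so the stored balance is left exactly as the first call committed it.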

Concurrent Execution of Transactions

In concurrent operation, a number of transactions are running and modifying parts of the database at the same time. We have to make sure that a transaction has exclusive access to the data items it is using for at least the duration of its use of those data items. This requires an appropriate locking mechanism that grants exclusive access to these data items to the transaction requiring them. If no such locking is used, concurrent transactions can interfere with one another, with the consequence that the result is not the same as expected.

The concurrency control scheme ensures that the schedule that can be produced by a set of concurrent transactions will be serializable. One of two approaches is usually used to ensure serializability: delaying one or more contending transactions, or aborting and restarting one or more of the contending transactions. Concurrency can be controlled by:

1. Timestamp-based ordering
2. Optimistic scheduling
3. Locking scheme

Serializability

We assume that transactions are independent. An execution schedule in which transactions run one after another is called a serial execution: each transaction runs to completion before any statements from any other transaction are executed. In a shared environment the result obtained by independent transactions that modify the same data item depends on the order in which the transactions are run; any of these results is considered to be correct.

A non-serial schedule, in which operations from a set of concurrent transactions are interleaved, is considered to be serializable if the execution of the operations in the schedule leaves the database in the same state as some serial execution of the transactions. That is, if the given interleaved schedule produces the same result as one of the serial schedules, then the interleaved schedule is said to be serializable.

Given an interleaved execution of a set of transactions, the following condition is assumed to hold for each transaction in the set: all transactions are correct, in the sense that if any one of the transactions is executed on a consistent database, the resulting database will be consistent.
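The definition above can be tried out on a toy database. The sketch below is an illustration under assumptions: the transactions T1 and T2, the data items A and B, and the helpers `read`, `write`, `run` and `is_serializable` are invented for the example. It compares results for one starting state only, which is a weaker test than a real scheduler would apply.

```python
from itertools import permutations

def read(item):
    # Copy the database value into the transaction's private workspace.
    def step(db, workspace):
        workspace[item] = db[item]
    return step

def write(item, fn):
    # Write back a value computed from the transaction's earlier read.
    def step(db, workspace):
        db[item] = fn(workspace[item])
    return step

def run(schedule, initial):
    """Execute a list of (transaction name, step) pairs on a copy of the database."""
    db, workspaces = dict(initial), {}
    for name, step in schedule:
        step(db, workspaces.setdefault(name, {}))
    return db

def is_serializable(schedule, transactions, initial):
    """True if the schedule ends in the same state as some serial order."""
    serial_states = []
    for order in permutations(transactions.items()):
        serial = [(name, s) for name, steps in order for s in steps]
        serial_states.append(run(serial, initial))
    return run(schedule, initial) in serial_states

T1 = [read("A"), write("A", lambda v: v + 100)]
T2 = [read("B"), write("B", lambda v: v + 10),
      read("A"), write("A", lambda v: v * 2)]
transactions = {"T1": T1, "T2": T2}
initial = {"A": 200, "B": 50}

# Interleaved, but T2 touches A only after T1 has finished with it.
good = [("T2", T2[0]), ("T1", T1[0]), ("T2", T2[1]),
        ("T1", T1[1]), ("T2", T2[2]), ("T2", T2[3])]
# T2 reads A before T1 writes it, so T1's update is lost.
bad = [("T1", T1[0]), ("T2", T2[0]), ("T2", T2[1]),
       ("T2", T2[2]), ("T1", T1[1]), ("T2", T2[3])]

good_ok = is_serializable(good, transactions, initial)
bad_ok = is_serializable(bad, transactions, initial)
```

The two serial orders give A = 600 or A = 500; the second interleaving ends with A = 400, which matches neither.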

Concurrency Control Time Stamp-based Order:-

In timestamp-based ordering, each transaction is assigned a unique identifier, which is usually based on the system clock. This identifier is called a timestamp, and the value of the timestamp is used to schedule contending transactions. The rule is to ensure that a transaction with a smaller timestamp is effectively executed before a transaction with a larger timestamp. Any variation from this rule is corrected by aborting a transaction, rolling back any modifications made by it, and starting it again.
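One common way to enforce this rule is to remember, for each data item, the largest timestamp that has read it and the timestamp that last wrote it. The sketch below follows that textbook scheme (often called basic timestamp ordering); the per-item bookkeeping and the names `TimestampScheduler` and `Abort` are assumptions for illustration, since the text above states only the ordering rule.

```python
class Abort(Exception):
    """Raised when an operation would violate timestamp order."""

class TimestampScheduler:
    def __init__(self):
        self.clock = 0        # stands in for the system clock
        self.read_ts = {}     # item -> largest timestamp that read it
        self.write_ts = {}    # item -> timestamp of the last writer

    def begin(self):
        self.clock += 1
        return self.clock     # the new transaction's timestamp

    def read(self, ts, item):
        # A younger transaction already wrote the item: too late to read.
        if ts < self.write_ts.get(item, 0):
            raise Abort(f"T{ts} must restart: {item} written by a later transaction")
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)

    def write(self, ts, item):
        # A younger transaction already read or wrote the item.
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            raise Abort(f"T{ts} must restart: {item} used by a later transaction")
        self.write_ts[item] = ts

scheduler = TimestampScheduler()
t1 = scheduler.begin()        # older transaction
t2 = scheduler.begin()        # younger transaction
scheduler.write(t2, "A")      # the younger transaction writes first
try:
    scheduler.read(t1, "A")   # the older one arrives too late
    t1_aborted = False
except Abort:
    t1_aborted = True         # it would be rolled back and restarted
```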

Concurrency Control Optimistic Scheduling:-

In optimistic scheduling, the philosophy is that contention between transactions will be very unlikely and that any data item used by a transaction is not likely to be used for modification by any other transaction. If this assumption is found to be invalid for a given transaction, the transaction is aborted and rolled back. The optimistic technique uses a timestamp method to assign a unique identifier to each transaction. In the optimistic approach each transaction is made up of three phases:

1. Read Phase:- This phase starts with the activation of a transaction and is considered to last until the commit. All data items are read into local variables, and any modifications are made only to these local copies.

2. Validation Phase:- For data items that were read, the DBMS verifies that the values read are still the current values of the corresponding data items. For data items that are modified, the DBMS verifies that the modifications will not cause the database to become inconsistent. Any change in the value of a data item read, or any possibility of inconsistency due to a modification, causes the transaction to be rolled back.

3. Write Phase:- If a transaction has passed the validation phase, the modifications made by the transaction are committed.
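The three phases can be sketched in a few lines. This is a simplified illustration, not a DBMS implementation: the database is a plain dictionary, validation only compares the values that were read against the current values, and the class name `OptimisticTransaction` is an assumption.

```python
class OptimisticTransaction:
    def __init__(self, db):
        self.db = db
        self.read_set = {}   # item -> value seen during the read phase
        self.writes = {}     # item -> new value, held locally

    # Read phase: work on local copies only.
    def read(self, item):
        if item in self.writes:
            return self.writes[item]
        value = self.db[item]
        self.read_set.setdefault(item, value)
        return value

    def write(self, item, value):
        self.writes[item] = value

    def commit(self):
        # Validation phase: every value read must still be current.
        for item, seen in self.read_set.items():
            if self.db[item] != seen:
                self.writes.clear()   # roll back: drop the local copies
                return False
        # Write phase: publish the local modifications.
        self.db.update(self.writes)
        return True

db = {"A": 100}
t1 = OptimisticTransaction(db)
t2 = OptimisticTransaction(db)
t1.write("A", t1.read("A") + 50)
t2.write("A", t2.read("A") - 10)
first = t1.commit()    # nothing changed since t1 read A
second = t2.commit()   # A changed after t2 read it
```

Here the second transaction fails validation because the value it read has since been replaced, so it would be restarted.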

Concurrency Control Locking Scheme:-

A database can be considered as being made up of a set of data items. A lock is a variable associated with each such data item. Manipulating the value of a lock is called locking. The value of the lock variable is used in a locking scheme to control concurrent access to, and manipulation of, the associated data item. Locking the items used by a transaction prevents other concurrently running transactions from using these locked items. The locking is done by a subsystem of the DBMS usually called the lock manager. Two types of locks are defined: the exclusive lock and the shared lock.

Exclusive Lock:- The exclusive lock is also called an update or write lock. The intention of this mode of locking is to provide exclusive use of a data item to one transaction. If a transaction locks a data item in exclusive mode, no other transaction can access that item, not even to read it, until the lock is released by the transaction that holds it.

Shared Lock:- The shared lock is also called a read lock. The intention of this mode of locking is to ensure that the data item does not undergo any modification while it is locked in this mode. Any number of transactions can concurrently lock and access a data item in shared mode, but none of these transactions can modify the data item. A data item locked in shared mode cannot be locked in exclusive mode until the shared lock is released by all transactions holding it. A data item locked in exclusive mode cannot be locked in shared mode until the exclusive lock on the data item is released.
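The compatibility rules for the two lock modes fit in a small lock table. The following is a minimal sketch with assumed names (`LockManager`, modes "S" and "X"); it reports whether a request can be granted instead of actually blocking, and it does not handle a transaction upgrading its own shared lock.

```python
class LockManager:
    def __init__(self):
        self.table = {}   # item -> (mode, set of holding transactions)

    def acquire(self, txn, item, mode):
        """Return True if the lock is granted, False if txn would have to wait."""
        held = self.table.get(item)
        if held is None:
            self.table[item] = (mode, {txn})
            return True
        held_mode, holders = held
        # Only shared locks are compatible with each other.
        if mode == "S" and held_mode == "S":
            holders.add(txn)
            return True
        return False

    def release(self, txn, item):
        held = self.table.get(item)
        if held is None:
            return
        held[1].discard(txn)
        if not held[1]:
            del self.table[item]   # last holder gone: the item is unlocked

lm = LockManager()
r1 = lm.acquire("T1", "A", "S")   # granted
r2 = lm.acquire("T2", "A", "S")   # granted, shared with T1
r3 = lm.acquire("T3", "A", "X")   # refused while shared locks are held
lm.release("T1", "A")
lm.release("T2", "A")
r4 = lm.acquire("T3", "A", "X")   # granted once all readers have released
r5 = lm.acquire("T1", "A", "S")   # refused while T3 holds the exclusive lock
```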

Two Phase Locking:-

All locks are acquired before any of the locks are released. Once a lock is released, no additional locks are requested. In other words, the release of locks is delayed until all locks on all data items required by the transaction have been acquired. This method of locking is called two-phase locking. It has two phases: a growing phase, in which the number of locks increases from zero to the maximum for the transaction, and a contracting phase, in which the number of locks held decreases from the maximum to zero. Both of these phases are monotonic: the number of locks only increases in the first phase and only decreases in the second. Once a transaction starts releasing locks, it is not allowed to request any further locks. A transaction is thus obliged to request all the locks it may need during its life before it releases any, which can lead to a lower degree of concurrency. The two-phase locking protocol ensures that schedules involving transactions that use this protocol will always be serializable.
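Whether a transaction obeys the two-phase rule can be checked by scanning its lock and unlock requests in order. The sketch below assumes a hypothetical representation of a transaction as a list of `("lock" | "unlock", item)` pairs.

```python
def is_two_phase(operations):
    """True if no lock request follows the first unlock."""
    shrinking = False
    for action, _item in operations:
        if action == "unlock":
            shrinking = True          # the contracting phase has begun
        elif action == "lock" and shrinking:
            return False              # a lock request after a release
    return True

two_phase = [("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")]
not_two_phase = [("lock", "A"), ("unlock", "A"), ("lock", "B"), ("unlock", "B")]

ok = is_two_phase(two_phase)
violates = not is_two_phase(not_two_phase)
```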

Concurrency Control Deadlock:-

Deadlock is a situation that arises when data items are locked in different orders by different transactions. A deadlock exists when there is a circular chain of transactions, each transaction in the chain waiting for a data item already locked by the next transaction in the chain. Deadlock situations can be either avoided or detected. One method of avoiding deadlock is to assign a rank to each data item and to request locks for data items in a given order. Another technique depends on selectively aborting some transactions and allowing others to wait. The selection is based on the timestamps of the contending transactions, and the decision as to which transactions to abort and which to allow to wait is determined by the protocol being used. Wait-die and wound-wait are two such protocols.

Deadlock Detection:- Deadlock detection depends on detecting the existence of a circular chain of transactions and then aborting or rolling back one transaction at a time until no further deadlocks are present.
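The text names the wait-die and wound-wait protocols without spelling out their rules. As they are usually defined, a smaller timestamp means an older transaction, and the decision is taken when a requester asks for an item that another transaction holds. The function names and return strings below are assumptions for illustration.

```python
def wait_die(requester_ts, holder_ts):
    """Wait-die: an older requester waits; a younger requester is aborted."""
    return "wait" if requester_ts < holder_ts else "abort requester"

def wound_wait(requester_ts, holder_ts):
    """Wound-wait: an older requester aborts the holder; a younger requester waits."""
    return "abort holder" if requester_ts < holder_ts else "wait"

# Transaction with timestamp 1 is older than the one with timestamp 2.
old_asks_wd = wait_die(1, 2)
young_asks_wd = wait_die(2, 1)
old_asks_ww = wound_wait(1, 2)
young_asks_ww = wound_wait(2, 1)
```

In both protocols it is always the younger transaction that is aborted, so waits only ever run in one direction of age and no circular chain can form.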

Concurrency Control Deadlock Detection:

To detect deadlock, the system must have the following information:

1. The current set of transactions.
2. The current allocation of data-items to each of the transactions.
3. The current set of data-items for which each of the transactions is waiting.

A deadlock is said to occur when there is a circular chain of transactions, each waiting for the release of a data-item held by the next transaction in the chain. A wait-for graph is used to detect deadlocks. The wait-for graph is a directed graph made up of nodes and directed arcs; the nodes of the graph are the active transactions. An arc is inserted between two nodes if there is a data-item required by the node at the tail of the arc that is being held by the node at the head of the arc. That is, if a transaction Ti is waiting for a data-item that is currently allocated to and held by transaction Tj, then there is a directed arc from the node for Ti to the node for Tj.

Concurrency Control Deadlock Detection (Contd.,):

Consider the situation:

1. Transaction T1 is waiting for data-items locked by transactions T2 and T5.
2. Transaction T2 is waiting for data-items locked by transactions T3 and T4.
3. Transaction T3 is waiting for data-items locked by transactions T5 and T6.
4. Transaction T4 is waiting for data-items locked by transaction T5.
5. Transaction T6 is waiting for data-items locked by transaction T7.
6. Transaction T7 is waiting for data-items locked by transaction T5.

The situation given above is deadlock free. Assume that after some time T5 makes a request for a data-item held by transaction T2. Assuming that none of the requests depicted in the wait-for graph of the deadlock-free situation has been satisfied in the meantime, this request adds an arc from the node for transaction T5 to the node for transaction T2. The addition of this arc causes the wait-for graph to have a number of cycles. One of the cycles is formed by the arc from T2 to T4, then from T4 to T5, and finally from T5 back to T2. Consequently, the graph now represents a situation in which a number of sets of transactions are deadlocked.

Concurrency Control Wait-for Graph

[Figure: wait-for graph with nodes T1 to T7. Deadlock-free situation.]

Concurrency Control Wait-for Graph

[Figure: wait-for graph with nodes T1 to T7. Deadlock situation.]
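The wait-for situation listed earlier can be checked mechanically by searching the graph for a cycle. The sketch below uses a depth-first search; the dictionary encoding of the graph and the function name `find_cycle` are assumptions, while the arcs are taken from the six wait conditions in the text.

```python
def find_cycle(graph):
    """Return one cycle as a list of nodes (first == last), or None."""
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on current path, finished
    colour, path = {}, []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph.get(node, []):
            state = colour.get(nxt, WHITE)
            if state == GREY:         # arc back into the current path
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(graph):
        if colour.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None

# Arc Ti -> Tj means Ti waits for a data-item held by Tj.
wait_for = {
    "T1": ["T2", "T5"],
    "T2": ["T3", "T4"],
    "T3": ["T5", "T6"],
    "T4": ["T5"],
    "T5": [],
    "T6": ["T7"],
    "T7": ["T5"],
}
before = find_cycle(wait_for)          # no cycle: deadlock free

deadlocked = dict(wait_for)
deadlocked["T5"] = ["T2"]              # T5 now waits for T2
after = find_cycle(deadlocked)         # some cycle through T5 and T2
```

The search stops at the first cycle it meets, which need not be the T2, T4, T5 cycle mentioned in the text; any cycle is enough to show that a deadlock exists.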

Concurrency Control Deadlock Recovery:

To recover from deadlock, the cycles in the wait-for graph must be broken. The common method of doing this is to roll back one or more transactions in the cycles until the system exhibits no further deadlock situation. The selection of the transactions to be rolled back is based on the following considerations:

1. The progress of the transaction and the number of data-items it has used and modified. It is preferable to roll back a transaction that has just started or has not modified any data-item, rather than one that has run for a considerable time and/or has modified many data-items.

2. The amount of computing remaining for the transaction and the number of data-items that have yet to be accessed by the transaction. It is preferable not to roll back a transaction if it has almost run to completion and/or it needs very few additional data-items before its termination.

3. The relative cost of rolling back a transaction. Notwithstanding the above considerations, it is preferable to roll back a less important or non-critical transaction.

Concurrency Control Deadlock Avoidance:

In the deadlock avoidance scheme, care is taken to ensure that a circular chain of transactions, each holding some resources and waiting for additional ones held by other transactions in the chain, never occurs. The two-phase locking protocol ensures serializability, but does not ensure a deadlock-free situation. Consider the following two transactions, in which Locks appears to request a shared lock and Lockx an exclusive lock:

    T1                     T2
    Sum=0                  Sum=0
    Locks (A)              Lockx (B)
    Read (A)               Read (B)
    Sum=Sum + A            B=B+100
    Locks (B)              Show (B)
    Read (B)               Sum=Sum + B
    Sum=Sum + B            Lockx (A)
    Show (B)               Unlock (B)
    Unlock (A)             Read (A)
    Unlock (B)             A=A-100
                           Show (A)
                           Unlock (A)
                           Sum=Sum + A
                           Show (Sum)

Concurrency Control Deadlock Avoidance (Contd.,):

Schedule leading to deadlock with two-phase transactions:

    T1                     T2
    Sum=0
    Locks (A)
    Read (A)
    Sum=Sum + A
                           Sum=0
                           Lockx (B)
                           Read (B)
                           B=B+100
                           Show (B)
                           Sum=Sum + B
    Locks (B)
    *Transaction T1 will wait
                           Lockx (A)
                           *Transaction T2 will wait

Concurrency Control Deadlock Avoidance (Contd.,):

One of the simplest methods of avoiding a deadlock situation is to lock all data items at the beginning of a transaction. This has to be done in an atomic manner; otherwise there could again be a deadlock situation. The main disadvantage of this scheme is that the degree of concurrency is lowered considerably.

Another approach to avoiding deadlock is to assign an order to the data-items and to require transactions to request locks in a given order, such as ascending order only. Thus, data-items may be ordered as having rank 1, 2, 3, and so on. A transaction T requiring locks for data-items A and B must first request a lock for the data-item with the lowest rank, namely A. Only when it succeeds in getting the lock for A can it request a lock for data-item B.
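Rank-ordered locking can be sketched with ordinary thread locks standing in for the lock manager. The ranks, the `balances` dictionary and the `transfer` function are hypothetical; the point is only that both threads acquire A before B, whatever order they name the items in.

```python
import threading

RANK = {"A": 1, "B": 2}                       # assumed ranking of data-items
locks = {name: threading.Lock() for name in RANK}
balances = {"A": 500, "B": 500}

def lock_in_rank_order(items):
    """Acquire the locks for `items` in ascending rank; return the order used."""
    ordered = sorted(items, key=RANK.__getitem__)
    for name in ordered:
        locks[name].acquire()
    return ordered

def unlock_all(ordered):
    for name in reversed(ordered):
        locks[name].release()

def transfer(src, dst, amount):
    held = lock_in_rank_order([src, dst])     # always A first, then B
    try:
        balances[src] -= amount
        balances[dst] += amount
    finally:
        unlock_all(held)

# The two transfers name the items in opposite orders, which could
# deadlock if each thread locked its source item first.
threads = [
    threading.Thread(target=transfer, args=("A", "B", 100)),
    threading.Thread(target=transfer, args=("B", "A", 30)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```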

Security, Integrity and Control Accidental Security and Integrity Threats:

1. A user can get access to a portion of the database not normally accessible to that user, due to a system error or an error on the part of another user.

2. Failures of various forms during normal operation, for example during transaction processing, or storage media loss. Proper recovery procedures are normally used to recover from failures occurring during transaction processing.

3. Concurrent usage anomalies. Proper synchronization mechanisms are used to avoid data inconsistencies due to concurrent usage.

4. System error. A dial-in user may be assigned the identity of another dial-in user who was disconnected accidentally or who hung up without going through a log-off procedure.

5. Improper authorization. The authorizer can accidentally give improper authorization to a user, which could lead to database security and/or integrity violations.

6. Hardware failures. For example, memory protection hardware that fails could lead to software errors and culminate in database security and/or integrity violations.

Security, Integrity and Control Intentional Security and Integrity Threats:

1. A computer system operator or system programmer can intentionally bypass the normal security and integrity mechanisms, alter or destroy the data in the database, or make unauthorized copies of sensitive data.

2. An unauthorized user can get access to a secure terminal or to the password of an authorized user and compromise the database. Such users could also destroy the database files.

3. System and application programmers could bypass normal security in their programs by directly accessing database files and making changes and copies for illegal use.

Security, Integrity and Control Defense Mechanisms:

Four levels of defense are generally recognized for database security.

1. Human Factors: At the outermost level are the human factors, which encompass the ethical, legal and societal environments. An organization depends on these to provide a certain degree of protection. Thus it is unethical for a person to obtain something by stealth, and it is illegal to forcibly enter the premises of an organization. Privacy laws also make it illegal to use information for purposes other than that for which it was collected.

2. Physical Security: Physical security mechanisms include appropriate locks and keys, and entry logs to the computing facility and terminals. The security of the physical storage devices must be maintained, both within the organization and when they are being transported from one location to another. User identification and passwords have to be kept confidential; otherwise unauthorized users can borrow the identification and password of a more privileged user and compromise the database.

3. Administrative Controls: Administrative controls are the security and access control policies that determine what information will be accessible to what class of users, and the type of access that will be allowed to this class.

Security, Integrity and Control Defense Mechanisms (Contd.,):

4. DBMS and OS Security Mechanisms: The database depends on some of the protection features of the OS for security. Among these are proper mechanisms for the identification and verification of users: each user is assigned an account number and a password, and the OS ensures that access to the system is denied unless the number and password are valid.

Security, Integrity and Control Authorization:

Authorization is the culmination of the administrative policies of the organization, expressed as a set of rules that can be used to determine which user has what type of access to which portion of the database. The person who is in charge of specifying the authorization is usually called the authorizer. The authorizer can be distinct from the DBA and is usually the person who owns the data.

Cryptography and Encryption:

Consider the secure transmission of the message: Mr. Watson, can you please come here. One method of transmitting this message is to substitute a different character of the alphabet for each character in the message. If we ignore the spaces between words and the punctuation, and if the substitution is made by shifting each character by a different random amount, then the above message can be transformed into the following string of characters: xhlkunsikevoabondwinhwoajahf

Cryptography has been practiced since the days of the Roman Empire. With the increasing use of public communication facilities to transmit data, there is an increased need to make such transmissions secure.
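The substitution described for the Watson message, shifting each letter by its own random amount, can be sketched as follows. The function names are assumptions, and the key generated here is not the one that produced the ciphertext quoted in the text. Python's `random` module is used only for illustration and is not suitable for real cryptographic keys.

```python
import random
import string

ALPHABET = string.ascii_lowercase

def normalise(message):
    """Drop spaces and punctuation and fold to lower case."""
    return "".join(ch for ch in message.lower() if ch in ALPHABET)

def make_key(length, seed=None):
    """One random shift (0 to 25) per character of the message."""
    rng = random.Random(seed)
    return [rng.randrange(26) for _ in range(length)]

def encrypt(plaintext, key):
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + shift) % 26]
        for ch, shift in zip(plaintext, key)
    )

def decrypt(ciphertext, key):
    return "".join(
        ALPHABET[(ALPHABET.index(ch) - shift) % 26]
        for ch, shift in zip(ciphertext, key)
    )

plain = normalise("Mr. Watson, can you please come here.")
key = make_key(len(plain), seed=1)     # sender and receiver must share this key
cipher = encrypt(plain, key)
recovered = decrypt(cipher, key)
```

The receiver can recover the message only with the same list of shifts, which is why the scheme depends on the key being kept secret.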

Security, Integrity and Control Cryptography and Encryption (Contd.,):

Transmitting information between geographically dispersed sites requires the data to be encrypted before it is transmitted. At the receiving end, the received data is decrypted before it is used. An encryption scheme developed by the U.S. National Bureau of Standards is called the Data Encryption Standard (NBS-DES). The NBS-DES scheme is based on the substitution of characters and the rearrangement of their order, and assumes the existence of secure encryption keys.

Security, Integrity and Control Failure:

The failure of a system occurs when the system does not function according to its specification and fails to deliver the service for which it was intended. A component is said to be in an erroneous state if further use of the component will lead to a failure that cannot be attributed to any other factor.

Types of failure:

1. Hardware Failures: A failure that occurs in the hardware is called a hardware failure. It can be caused by design error, poor quality control, over-utilization, or wear-out.

2. Software Failures: The errors that can lead to a software failure are very similar to those that lead to hardware failure, except for wear-out.

3. Storage Medium Error: Storage media can be classified as volatile and nonvolatile.

Security, Integrity and Control Recovery:

The basic technique for implementing the database transaction paradigm in the presence of failures of various kinds is to use data redundancy in the form of logs, checkpoints and archival copies of the database.

Log based Recovery:

The log, which is written to stable storage, contains the redundant data required to recover from volatile storage failure and also from errors discovered by the transaction or the database system. For each transaction the following data is recorded in the log:

1. A start of transaction marker.

2. The transaction identifier, which could include the who and where information.

3. The record identifiers, which include the identifiers for the record occurrences.

4. The operations performed on the records.

5. The previous values of the modified data. This information is required for undoing the changes made by a partially completed transaction; it is called the undo part of the log. Where the modification made by the transaction is the insertion of a new record, the previous values can be assumed to be null.

6. The updated values of the modified records. This information is required for making sure that the changes made by a committed transaction are in fact reflected in the database, and it can be used to redo these modifications. This information is called the redo part of the log. Where the modification made by the transaction is the deletion of a record, the updated values can be assumed to be null.

7. A commit transaction marker if the transaction is committed; otherwise an abort or rollback transaction marker.
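How the undo and redo parts of the log are used after a failure can be sketched with a list of log records. The tuple layout of the records and the `recover` function are assumptions for illustration; `None` plays the role of the null value for inserted or deleted records.

```python
def recover(db, log):
    """Bring `db` to a state reflecting exactly the committed transactions.

    Assumed log records:
      ("start", txn), ("update", txn, record, old_value, new_value),
      ("commit", txn), ("abort", txn)
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Undo pass, newest first: restore old values of uncommitted updates.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _, record, old, _new = rec
            if old is None:
                db.pop(record, None)      # undo an insertion
            else:
                db[record] = old

    # Redo pass, oldest first: reapply new values of committed updates.
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            _, _, record, _old, new = rec
            if new is None:
                db.pop(record, None)      # redo a deletion
            else:
                db[record] = new
    return db

log = [
    ("start", "T1"),
    ("update", "T1", "A", 100, 50),
    ("commit", "T1"),
    ("start", "T2"),
    ("update", "T2", "B", 200, 999),
    ("update", "T2", "C", None, 7),       # insertion of a new record
    # crash: no commit marker for T2
]
# State found on disk after the crash: T1's change was lost, T2's were written.
on_disk = {"A": 100, "B": 999, "C": 7}
recovered = recover(on_disk, log)
```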

Security, Integrity and Control Shadow Paging: Shadow page scheme is one possible form of the indirect page allocation. In shadow page scheme, the database is considered to be made up of logical units of storage called pages. The pages are mapped into physical blocks of storage by means of a page table. The shadow page scheme uses two page tables for a transaction that is going to modify the database. The original page table is called shadow page table and the transaction addresses the database using another page table known as the current page table. Initially, both tables point to the same blocks of physical storage. The current page table entries may change during the life of the transaction. The changes are made whenever the transaction modifies the database by means of a write operation. The pages that are affected by a transaction are copied to new blocks of physical storage and these blocks, along with the blocks not modified, are accessible to the transaction via the current page table.
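The two page tables can be sketched with lists of block indices. The class below is an illustration under assumptions: physical storage is a Python list, and `commit` and `abort` follow the usual completion of the scheme (adopt or discard the current page table), which the passage above does not itself describe.

```python
class ShadowPagedDB:
    def __init__(self, pages):
        self.blocks = list(pages)                 # physical blocks of storage
        self.shadow = list(range(len(pages)))     # page number -> block index
        self.current = None                       # exists only during a transaction

    def begin(self):
        # Initially both tables point to the same physical blocks.
        self.current = list(self.shadow)

    def read(self, page):
        table = self.current if self.current is not None else self.shadow
        return self.blocks[table[page]]

    def write(self, page, value):
        if self.current[page] == self.shadow[page]:
            # First write to this page: place it in a new block so the
            # block reachable from the shadow table stays untouched.
            self.blocks.append(value)
            self.current[page] = len(self.blocks) - 1
        else:
            self.blocks[self.current[page]] = value

    def commit(self):
        self.shadow = self.current                # assumed: current table takes over
        self.current = None

    def abort(self):
        self.current = None                       # assumed: shadow table still intact

db = ShadowPagedDB(["p0", "p1"])
db.begin()
db.write(1, "new p1")
seen_inside = db.read(1)      # the transaction sees its own change
db.abort()
seen_after_abort = db.read(1)

db.begin()
db.write(0, "new p0")
db.commit()
seen_after_commit = db.read(0)
```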

Security, Integrity and Control Buffer Management: The input and output operations required by a program including a DBMS application program are usually performed by a component of the operating system. These operations normally use buffers to match the speed of the processor and the relatively fast primary memories with the slower secondary memories and to minimize, whenever possible, the number of input and output operations between the secondary and primary memories. The assignment and management of memory blocks is called buffer management and the component of the operating system that performs this task is called buffer manager.

File Organization Hashing:

Hashing is used mainly in direct file organization. In direct file organization the key value is mapped directly to the storage location. The usual method of direct mapping is to perform some arithmetic manipulation of the key value. This process is called hashing. A hash function that maps many different key values to a single address, or one that does not map the key values uniformly, is a bad hash function. A collision is said to occur when two distinct key values are mapped to the same storage location. With hashing schemes there are no indexes to traverse; with well-designed hashing functions, where collisions are few, this is a great advantage.

Advantages of Hashing:

1. Exact key matches are extremely quick.
2. Hashing is very good for long keys, or those with multiple columns, provided the complete key value is provided for the query.
3. This organization usually allows for the allocation of disk space, so a good deal of disk management is possible.
4. No disk space is used by this indexing method.

File Organization Hashing (Contd.,):

Disadvantages:

1. It becomes difficult to predict overflow, because the workings of the hashing algorithm will not be visible to the DBA.
2. No sorting of data occurs either physically or logically, so sequential access is poor.
3. This organization usually takes a lot of disk space to ensure that no overflow occurs.
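The mapping from key value to storage location, and what a collision looks like, can be sketched as follows. The hash function (sum of the key's bytes modulo the number of buckets) and the `DirectFile` class are assumptions chosen for brevity, and colliding records are simply kept together in the same bucket.

```python
def bucket_address(key, num_buckets):
    """Arithmetic manipulation of the key: sum of its bytes, modulo the file size."""
    return sum(key.encode("utf-8")) % num_buckets

class DirectFile:
    def __init__(self, num_buckets):
        self.buckets = [[] for _ in range(num_buckets)]

    def insert(self, key, record):
        address = bucket_address(key, len(self.buckets))
        self.buckets[address].append((key, record))

    def lookup(self, key):
        # Go straight to the computed bucket; no index is traversed.
        address = bucket_address(key, len(self.buckets))
        for stored_key, record in self.buckets[address]:
            if stored_key == key:
                return record
        return None

f = DirectFile(7)
f.insert("ab", "record for ab")
f.insert("ba", "record for ba")       # same byte sum as "ab": a collision
collide = bucket_address("ab", 7) == bucket_address("ba", 7)
found = f.lookup("ba")
missing = f.lookup("zz")
```

Because this function ignores the order of the characters, any two keys made of the same letters collide, which is an example of the non-uniform mapping the text warns against.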