
NOORUL ISLAM COLLEGE OF ENGINEERING,KUMARACOIL DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING B.E.

SIXTH SEMESTER

2 MARKS & 16 MARKS

CS606- ADVANCED DATABASE TECHNOLOGY

Prepared By, J.E.Judith Lecturer/CSE NICE

TWO MARKS

UNIT I
1.Define ER model?
The entity-relationship model (or ER model) is a top-down approach to database design that begins by identifying the important data, called entities, and the relationships between the data. The ER model was first proposed by Peter Pin-Shan Chen.

2.Define Entity type?
A group of objects with the same properties, identified by the enterprise as having an independent existence. In an ER model, we diagram an entity type as a rectangle containing the type name, such as Student.

3.Define Entity occurrence?
A uniquely identifiable object of an entity type is known as an entity occurrence. An entity occurrence is also simply called an entity.

4.Define relationship type?
A relationship type is a set of meaningful associations among entity types. For example, the Student entity type is related to the Team entity type because each student is a member of a team. ER diagram notation for the relationship type MemberOf:

Student

MemberOf

Team

5.Define relationship occurrence? A uniquely identifiable association which includes one occurrence from each participating entity type.

6.Define degree of relationship?

The degree of a relationship type is the number of entity types that participate. If two entity types participate, the relationship type is binary. A role name indicates the purpose of an entity in a relationship.

7.Define recursive relationship with diagrammatic representation?
A recursive relationship is one in which the same entity type participates more than once in the relationship. The supervision relationship is a recursive relationship because the same entity type participates more than once, as a supervisor and as a supervisee.

8.What are the types of attribute?
The types of attributes are
1. Simple and composite attributes
2. Single-valued and multi-valued attributes
Simple and composite attributes: Attributes that cannot be divided into subparts are called simple or atomic attributes; such an attribute is composed of a single component with an independent existence. Ex: the position and salary attributes of the Staff entity. A composite attribute is composed of multiple components, each with an independent existence, and can be divided into smaller subparts. For example, a Name attribute can be divided into sub-parts such as First name, Middle name and Last name.
Single-valued and multi-valued attributes: Attributes that hold a single value at a particular instant of time are called single-valued. A person cannot have more than one age value; therefore, the age of a person is a single-valued attribute. A multi-valued attribute can have more than one value at a time. For example, the degree of a person is a multi-valued attribute, since a person can hold more than one degree.

9.Define candidate key?

A minimal set of attributes that uniquely identifies each occurrence of an entity type is known as a candidate key. For example, the branch number attribute is the candidate key for the Branch entity type.

10.Define primary key?
The candidate key that is selected to uniquely identify each occurrence of an entity type is called the primary key. Primary keys may consist of a single attribute or multiple attributes in combination.

11.Differentiate strong and weak entity type?
An entity type that is not existence-dependent on some other entity type is called a strong entity type. For example, the Student entity type is strong because its existence does not depend on another entity type. An entity type that is existence-dependent on some other entity type is called a weak entity type. For example, a Child entity is weak because it relies on the Parent entity in order to exist.

12.Define query processing?
Query processing transforms a query written in a high-level language into a correct and efficient execution strategy expressed in a low-level language, and executes the strategy to retrieve the required data.

13.Define query optimization?
Query optimization means converting a query into an equivalent form that is more efficient to execute. It is necessary for high-level relational queries, and it gives the DBMS an opportunity to systematically evaluate alternative query execution strategies and to choose an optimal one.

14.What are the phases of query processing?
The phases are 1) Query decomposition 2) Query optimization 3) Code generation 4) Runtime query execution.

15.Define query decomposition and what are its stages?

Query decomposition is the first phase of query processing, whose aims are to transform a high-level query into a relational algebra query and to check that the query is syntactically and semantically correct. The different stages are 1) Analysis 2) Normalization 3) Semantic analysis 4) Simplification 5) Query restructuring.

16.Define conjunctive and disjunctive normal form?
Conjunctive normal form: a sequence of conjuncts connected by the AND operator, where each conjunct contains one or more terms connected by OR.
Disjunctive normal form: a sequence of disjuncts connected by the OR operator, where each disjunct contains one or more terms connected by AND.

17.Differentiate dynamic vs static query optimization?
Dynamic optimization: the query has to be parsed, validated and optimized each time before it can be executed, so all information required to select an optimum strategy is up to date.
Static optimization: the query is parsed, validated and optimized only once, so runtime overhead is reduced.

18.What are the problems addressed by concurrency control?
The process of managing simultaneous operations on the database without having them interfere with one another is called concurrency control. The problems it addresses are
i. Lost update problem
ii. Uncommitted dependency problem
iii. Inconsistent analysis problem

19.Define 3NF and BCNF

Third Normal Form (3NF): A relation that is in 1NF and 2NF, and in which no non-primary-key attribute is transitively dependent on the primary key.
Boyce-Codd Normal Form (BCNF): A relation is in BCNF if, and only if, every determinant is a candidate key.

20. Define Timestamp?
A timestamp is a unique identifier created by the DBMS that indicates the relative starting time of a transaction. Timestamping is a concurrency control protocol that orders transactions in such a way that an older transaction, with a smaller timestamp, gets priority in the event of conflict.

21.What are the properties of transaction?
The four basic properties of transactions are called the ACID properties: A - atomicity, C - consistency, I - isolation, D - durability.
ATOMICITY: The "all or nothing" property. A transaction is an indivisible unit that is either performed in its entirety or not performed at all.
CONSISTENCY: A transaction must transform the database from one consistent state to another consistent state.
ISOLATION: Transactions execute independently of one another. In other words, the partial effects of incomplete transactions should not be visible to other transactions.
DURABILITY: The effects of a successfully completed transaction are permanently recorded in the database and must not be lost because of a subsequent failure.

22.Define concurrency control?

The process of managing simultaneous operations on the database without having them interfere with one another.

23.What are the problems addressed by concurrency control?
The problems are
1. Lost update problem
2. Uncommitted dependency problem
3. Inconsistent analysis problem
LOST UPDATE: An apparently successfully completed update operation by one user can be overridden by another user. This is known as the lost update problem.
UNCOMMITTED DEPENDENCY: An uncommitted dependency problem occurs when one transaction is allowed to see the intermediate results of another transaction before it has committed.
INCONSISTENT ANALYSIS: A problem of inconsistent analysis occurs when a transaction reads several values from the database but a second transaction updates some of them during the execution of the first.

24.Define serial schedule?
A schedule where the operations of each transaction are executed consecutively, without any interleaved operations from other transactions.

25.Define serializable?
If a set of transactions executes concurrently, we say that the (non-serial) schedule is correct if it produces the same results as some serial execution. Such a schedule is called serializable.

26.Define the conservative and optimistic concurrency control methods?
CONSERVATIVE METHOD: This approach causes transactions to be delayed in case they conflict with other transactions at some time in the future. Locking and timestamping are essentially conservative approaches.
OPTIMISTIC METHOD:

This approach is based on the premise that conflict is rare, so transactions are allowed to proceed unsynchronized and are only checked for conflicts at the end, when a transaction commits.

27.Define shared and exclusive lock?
SHARED LOCK: If a transaction has a shared lock on a data item, it can only read the item but cannot update it.
EXCLUSIVE LOCK: If a transaction has an exclusive lock on a data item, it can both read and update the item.
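The shared/exclusive lock rules above can be sketched as a small compatibility check. This is an illustrative sketch only, not a real DBMS API; the mode names "S" and "X" are conventional shorthand for shared and exclusive.

```python
# Lock compatibility: shared locks are compatible with each other,
# while an exclusive lock conflicts with every other lock.
COMPATIBLE = {
    ("S", "S"): True,   # two readers may share the item
    ("S", "X"): False,  # a writer must wait for readers
    ("X", "S"): False,  # a reader must wait for a writer
    ("X", "X"): False,  # two writers always conflict
}

def can_grant(requested, held_modes):
    """Return True if a lock of mode `requested` ('S' or 'X') can be
    granted given the modes already held on the data item."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)

print(can_grant("S", ["S", "S"]))  # True: readers coexist
print(can_grant("X", ["S"]))       # False: writer blocked by a reader
print(can_grant("X", []))          # True: no lock held on the item
```

The DBMS step "determines whether the request is compatible with the existing lock" reduces to exactly this table lookup.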

28.Define 2PL?
A transaction follows the two-phase locking protocol if all locking operations precede the first unlock operation in the transaction.

29.Define ignore obsolete write rule?
Transaction T asks to write an item x whose value has already been written by a younger transaction, that is, ts(T) < write_timestamp(x). This means that a later transaction has already updated the value of the item, and the value that the older transaction is writing must be based on an obsolete value of the item. In this case, the write operation can safely be ignored. This is sometimes known as the ignore obsolete write rule, and it allows greater concurrency.

30.List out different db recovery facilities?
A DBMS should provide the following facilities to assist with recovery:
1. A backup mechanism, which makes periodic backup copies of the database.
2. Logging facilities, which keep track of the current state of transactions and database changes.
3. A checkpoint facility, which enables updates to the database that are in progress to be made permanent.
4. A recovery manager, which allows the system to restore the database to a consistent state following a failure.

31.What is the need for db tuning?
The needs for tuning a database are
1. Existing tables may be joined.
2. For a given set of tables, there may be an alternative design choice.

32.Define normalization?
Normalization is a bottom-up approach to database design that begins by examining the relationships between attributes. It is a validation technique: it supports a database designer by presenting a series of tests which can be applied to individual relations, so that the relational schema can be normalized to a specific form to prevent the possible occurrence of update anomalies.

33.What is flattening the table?
We remove repeating groups by entering the appropriate data in the empty columns of the rows containing the repeated data. In other words, we fill in the blanks by duplicating the non-repeating data where required. This approach is called flattening the table.
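The "flattening the table" idea can be illustrated with a tiny example: the repeating group (one student holding many degrees) is removed by duplicating the non-repeating data on every row. The student name and degree values below are invented for illustration.

```python
# A record with a repeating group: one name, many degrees.
nested = {"name": "Anu", "degrees": ["B.E.", "M.E."]}

# Flatten: duplicate the non-repeating data (the name) per row so
# that each row holds a single degree value (first normal form).
flattened = [
    {"name": nested["name"], "degree": d}
    for d in nested["degrees"]
]
print(flattened)
```

Each resulting row is atomic, which is exactly what removing the repeated group achieves.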

UNIT II
1.Define parallel DBMS.
A DBMS running across multiple processors and disks that is designed to execute operations in parallel, whenever possible, in order to improve performance.

2.What are the different parallel db architectures?
Shared memory
Shared disk
Shared nothing
Hierarchical

3.Differentiate interquery and intraquery parallelism.
Interquery parallelism: Different queries or transactions execute in parallel with one another. It increases scale-up and throughput.
Intraquery parallelism: The execution of a single query in parallel on multiple processors and disks. It is important for speeding up long-running queries.

4.Differentiate intraoperation parallelism and interoperation parallelism.
Intraoperation parallelism: Speeds up processing of a query by parallelizing the execution of each individual operation.
Interoperation parallelism: Speeds up processing of a query by executing in parallel the different operations in a query expression. It is of 2 types:
pipelined parallelism
independent parallelism

5.Define distributed DBMS.
The software system that permits the management of a distributed database and makes the distribution transparent to the user.

6.What is the fundamental principle of distributed DBMS?
The fundamental principle of a DDBMS is to make the distribution transparent to the user, that is, to make the distributed system appear like a centralised system.

7.List any four advantages and disadvantages of DDBMS.

Advantages: capacity and incremental growth; reliability and availability; efficiency and flexibility; sharing.
Disadvantages: managing and controlling is complex; less security because data is at different sites.

8.Define homogenous and heterogenous DDBMS.
Homogenous DDBMS: In all sites the same DBMS product is used. It is easier to design and manage. Advantages: easy communication, possible to add more sites, and increased performance.
Heterogenous DDBMS: Sites may run different DBMS products, which need not be based on the same data model. Translations are required for communication between the different DBMSs. Data may be required from another site that has different hardware, a different DBMS product, or both.

9.What are the major components of DDBMS?
There are four major components in a DDBMS:
(1) Local DBMS component (LDBMS)
(2) Data Communication component (DC)
(3) Global System Catalog (GSC)
(4) Distributed DBMS component

10.What are the correctness rules for fragmentation?
Any fragmentation should follow the correctness rules. There are 3 correctness rules:
(1) Completeness
(2) Reconstruction
(3) Disjointness

11. Define multiple copy consistency problem?
The multiple copy consistency problem occurs when there is more than one copy of a data item in different locations. To maintain consistency of the global database, when a replicated data item is updated at one site, all other copies of the data item must also be updated. If a copy is not updated, the database becomes inconsistent.

12. Define distributed serializability?
If the schedule of transaction execution at each site is serializable, then the global schedule is also serializable, provided the local serialization orders are identical. This is called distributed serializability.

13. What are the different types of locking protocols in DDBMS?
The different types of locking protocols employed to ensure serializability in a DDBMS are centralized 2PL, primary copy 2PL, distributed 2PL and majority locking.

14. What are the types of deadlock detection in DDBMS? There are three common methods for deadlock detection in DDBMSs: centralized, hierarchical and distributed deadlock detection.

15. What is the general approach for timestamping in DDBMS?
The general approach for timestamping in a DDBMS is to use the concatenation of the local timestamp with a unique site identifier: <local timestamp, site identifier>. The site identifier is placed in the least significant position to ensure that events can be ordered according to their occurrence as opposed to their location.

16. What are the phases of 2PC protocol?
The two phases of the 2PC protocol are a voting phase and a decision phase.

17. Define cooperative termination protocol?
The cooperative termination protocol handles the case where the termination protocol would block a participant without any information. The participant can contact each of the other participants, attempting to find one that knows the decision.

18. What is the use of election protocols?
If the participants detect the failure of the coordinator, they can elect a new site to act as coordinator by using election protocols. This protocol is relatively efficient.

19. Define 3PC?
Three-phase commit is an alternative non-blocking protocol. It is non-blocking for all site failures, except in the event of the failure of all sites. The basic idea of 3PC is to remove the uncertainty period for participants that have voted COMMIT from the coordinator. 3PC introduces a third phase, called pre-commit, between voting and the global decision.

20. Define Distributed Query Processing?
Query processing is the process of converting a high-level language query into a low-level language with an effective execution strategy, in order to achieve good performance. In distributed query processing the query is distributed and processed at different locations.
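The global timestamp scheme of question 15 above (<local timestamp, site identifier>, with the site identifier in the least significant position) can be sketched with ordinary tuple comparison. The timestamp values and site numbers below are invented for illustration.

```python
# Global timestamps as (local_timestamp, site_id) pairs: comparison
# is by event time first, with the site id only breaking ties.
t1 = (100, 2)   # event at local time 100 on site 2
t2 = (100, 7)   # simultaneous event on site 7
t3 = (101, 1)   # later event on site 1

# Python's tuple ordering compares the most significant field first,
# so events are ordered by occurrence, not by location.
events = sorted([t3, t2, t1])
print(events)
```

Placing the site identifier last is what guarantees that an event at time 100 on any site sorts before an event at time 101 on any other site.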

21. Write the differences between locking and non-locking protocols?

Locking protocol:
1. Locking guarantees that the concurrent execution is equivalent to some serial execution of the transactions.
2. It involves checking for deadlock at each local level and at the global level.
3. It does not involve the generation of timestamps.

Non-locking protocol:
1. Timestamping guarantees that the concurrent execution is equivalent to a specific serial execution of the transactions, corresponding to the order of the timestamps.
2. It does not involve checking for deadlock at any level.
3. It involves the generation of unique timestamps, both globally and locally.

UNIT III
1.Define OODM?
OODM - Object Oriented Data Model: a (logical) data model that captures the semantics of objects supported in object-oriented programming.

2. Define OODB?
OODB - Object Oriented Database: a persistent and sharable collection of objects defined by an OODM.

3. Define OODBMS?
OODBMS - Object Oriented Database Management System: the manager of an OODB.
OO refers to abstract data types plus inheritance and object identity. An OODBMS is the combination of OO capability and DB capability.

4. What are the types of OID?
There are 2 types of OID:
Logical OID
Physical OID

5. Define pointer swizzling or object faulting?
To achieve the required performance, the OODBMS must be able to convert OIDs to and from in-memory pointers. This conversion technique is known as pointer swizzling or object faulting.

6. What is the aim of pointer swizzling?
The aim of pointer swizzling is to optimize access to objects, since references between objects are normally represented using OIDs.

7. List the classification of pointer swizzling?
Techniques for pointer swizzling:
Copy vs in-place swizzling
Eager vs lazy swizzling
Direct vs indirect swizzling

8. Define persistent object?
An object that exists even after the session is over is called a persistent object. There are 2 types of objects:
Persistent
Transient

9. Define transient object?
A transient object lasts only for the invocation of the program.

The object's memory is allocated and deallocated by the programming language's run-time system.

10. List the schemes for implementing persistence within OODBMS?
There are 3 schemes for implementing persistence in an OODBMS:
Checkpointing
Serialization
Explicit paging

11. List the two methods for creating or updating persistent objects using explicit paging?
Reachability-based method
Allocation-based method

12. What are the fundamental principles of orthogonal persistence?
It is based on 3 fundamental principles:
Persistence independence
Data type orthogonality
Transitive persistence

13. Define nested transaction model?
A transaction is viewed as a collection of related subtransactions, each of which may also contain any number of subtransactions.

14. Define sagas?
A saga is a sequence of flat transactions that can be interleaved with other transactions. Sagas are based on the use of compensating transactions: the DBMS guarantees that either all the transactions in a saga are successfully completed, or compensating transactions are run to recover from a partial execution.
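The saga behaviour in question 14 (either every transaction completes, or compensating transactions undo the partial execution in reverse order) can be sketched as follows. The runner and the booking steps are hypothetical illustrations, not a real DBMS interface.

```python
def run_saga(steps):
    """steps: list of (action, compensation) pairs of callables.
    If an action fails, run the compensations for the completed
    steps in reverse order (recovery from partial execution)."""
    done = []
    for action, compensation in steps:
        try:
            action()
            done.append(compensation)
        except Exception:
            for comp in reversed(done):   # undo partial execution
                comp()
            return "compensated"
    return "committed"

log = []

def fail():
    raise RuntimeError("second flat transaction fails")

steps = [
    (lambda: log.append("book flight"), lambda: log.append("cancel flight")),
    (fail, lambda: None),
]
print(run_saga(steps))   # the saga is compensated, not committed
print(log)               # the booking was made, then cancelled
```

A saga trades isolation for availability: intermediate results are visible, so correctness rests entirely on the compensating transactions.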

15. How is concurrency control implemented in OODBMS?
A multiversion concurrency control protocol is commonly used; by keeping multiple versions of objects, concurrent access is implemented.

16.List the basic architectures for client server DBMS?
The 3 basic architectures for a client-server DBMS are:
Object server
Page server
Database server

17. Define POSTGRES?
POSTGRES is a research system from the designers of INGRES that attempts to extend the relational model with abstract data types, procedures and rules.

18.What is a GEMSTONE?
GemStone is a product which extends an existing object-oriented programming language with database capability. It extends 3 languages: Smalltalk, C++ and Java.

19.What is OQL?
OQL - Object Query Language. An OQL query is a function that delivers an object whose type may be inferred from the operators contributing to the query expression. OQL is used for both associative and navigational access.

20. Advantages and disadvantages of OODBMS?
Advantages:
Enriched modeling capabilities
Extensibility
Removal of impedance mismatch
Improved performance
Disadvantages:
Lack of a universal data model
Lack of experience
Lack of standards
Complexity

UNIT IV
1.Define Data Mining.
The process of extracting valid, previously unknown, comprehensible and actionable information from large databases and using it to make crucial business decisions.

2.List the different steps in data mining.
Data cleaning
Data integration
Data selection
Data transformation
Data mining
Pattern evaluation
Knowledge presentation

3.Define Classification.
Classification is used to establish a specific, predetermined class for each record in a database from a finite set of possible class values.

4. Define Clustering.
Clustering can be considered the most important unsupervised learning problem. A cluster is a collection of objects which are similar to one another and dissimilar to the objects belonging to other clusters.

5.Define data warehousing.
A subject-oriented, integrated, time-variant and non-volatile collection of data in support of the management's decision-making process.
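The classification operation in question 3 can be illustrated with a minimal nearest-neighbour sketch: each new record is assigned the predetermined class of its closest labelled example. The training records and class labels below are invented.

```python
# Labelled training data: ((feature vector), class label).
train = [((1.0, 1.0), "low"), ((8.0, 9.0), "high")]

def classify(record):
    """Assign the class of the nearest labelled example (1-NN)."""
    def dist2(a, b):
        # squared Euclidean distance between two feature vectors
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(train, key=lambda t: dist2(t[0], record))[1]

print(classify((2.0, 1.5)))  # closest to (1.0, 1.0), so "low"
print(classify((7.0, 8.0)))  # closest to (8.0, 9.0), so "high"
```

The key contrast with clustering (question 4) is that here the class values are fixed in advance; clustering would have to discover the groups itself.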

6.Define web database.
A database that is used for web applications, which use an architecture called the three-tier architecture: web browser, web server and database server.

7.Define mobile database.
A database that is portable and physically separate from a centralized database server, but is capable of communicating with that server from remote sites, allowing the sharing of corporate data.

8.Define upflow.
Upflow means adding value to the data in the data warehouse through summarizing, packaging and distribution of data.

9.Define downflow.
Downflow means archiving and backing up the data in the warehouse.

10.What are the different groups of end user access tools?
Reporting and query tools.
Application development tools.
Executive information system tools.
Online analytical processing tools.
Data mining tools.

11.What are the four main operations associated with data mining techniques?
1. Predictive modeling.
2. Database segmentation.
3. Link analysis.
4. Deviation detection.

12.Define outliers.
Outliers are values which express deviation from some previously known expectations and norms.

13.List the benefits of data warehousing.
1. Potential high returns on investment.
2. Competitive advantage.
3. Increased productivity of corporate decision makers.

14.Define XML.
The basic object in XML is the XML document. Two main structuring concepts are used to construct an XML document: elements and attributes. Attributes in XML provide additional information that describes elements.

15.What are the uses of DTD?
A DTD gives an overview of an XML schema. It specifies the elements and their nested structures.

16.Define data mart.
Data marts generally are targeted to a subset of the organization, such as a department, and are more tightly focused.

17.Define client/server model.
The client-server model is a two-tier architecture. It consists of 2 tiers, namely client and server. Here the client performs presentation services and the server performs data services. The client is called a fat client because it requires more resources.

18.List the use of data mining tools.

19.Define OLAP.
OLAP is a term used to describe the analysis of complex data from the data warehouse. OLAP tools use distributed computing capabilities for analyses that require more storage and processing power.

20.List the problems of data warehousing.
Project management is an important and challenging consideration that should not be underestimated.
Administration of a data warehouse is an intensive enterprise, proportional to the size and complexity of the warehouse.
Data preparation.
Selection of data mining operations.
Providing scalability and improving performance.
Facilities for visualization of results.

21.List some examples of data mining applications.
Marketing.
Finance.
Manufacturing.
Health care.

Unit-V
1.Define deductive database.
A deductive database includes capabilities to define (deductive) rules, which can deduce or infer additional information from the facts that are stored in the database. Because part of the theoretical foundation for some deductive database systems is mathematical logic, such deductive databases are often referred to as logic databases.

2.Define spatial database.
Spatial databases provide concepts for databases that keep track of objects in a multi-dimensional space.

3.Define multimedia database.
Multimedia databases provide features that allow users to store and query different types of multimedia information, which includes images (such as photos or drawings), video clips (such as movies, newsreels, or home videos), audio clips (such as songs, phone messages, or speeches), and documents (such as books or articles).

4.List the different spatial query languages.
The different spatial query types are
1. Range query
2. Nearest neighbor query
3. Spatial joins or overlays

5. Define inference engine.
An inference engine (or deductive mechanism) within the system can deduce new facts from the database by interpreting the rules. The model used for deductive databases is closely related to the relational data model, and particularly to the domain relational calculus formalism. It is also related to the field of logic programming and the Prolog language.

6.Example for spatial database.
An example of a spatial database is a cartographic database that stores maps, including two-dimensional spatial descriptions of their objects - from countries and states to rivers, cities, roads, seas and so on. These applications are also known as Geographical Information Systems (GIS), and are used in areas such as environmental, emergency, and battle management. Other databases, such as meteorological databases for weather information, are three-dimensional, since temperatures and other meteorological information are related to three-dimensional spatial points.

7. Define active database.
Active databases provide additional functionality for specifying active rules. These rules can be automatically triggered by events that occur, such as database updates or certain times being reached, and can initiate certain actions that have been specified in the rule declaration, if certain conditions are met.

8. Example for multimedia database.
For example, one may want to locate all video clips in a video database that include a certain person, say Bill Clinton. One may also want to retrieve video clips based on certain activities included in them, such as video clips where a soccer goal is scored by a certain player or team.

9. Define Quad trees.
Quad trees generally divide each space or subspace into equally sized areas, and proceed with the subdivision of each subspace to identify the positions of various objects.

10. What are the two main methods of defining the truth values of predicates in actual datalog programs?
There are two main methods of defining the truth values of predicates in actual Datalog programs:
1. Fact-defined predicates (or relations)
2. Rule-defined predicates (or views)

11. What is Fact-defined predicates?

Fact-defined predicates (or relations) are defined by listing all the combinations of values (the tuples) that make the predicate true. They correspond to base relations whose contents are stored in a database system.

12. What is Rule-defined predicates?
Rule-defined predicates (or views) are defined by being the head of one or more Datalog rules; they correspond to virtual relations whose contents can be inferred by the inference engine.

13. What is the use of relational operations?
It is straightforward to specify many operations of relational algebra in the form of Datalog rules that define the result of applying these operations on the database relations (fact predicates). This means that relational queries and views can easily be specified in Datalog.

14. What are the characteristics of the nature of multimedia applications?
Applications may be categorized based on their data management characteristics as follows:
1. Repository applications
2. Presentation applications
3. Collaborative work using multimedia information

15. What are the terms included in multimedia information systems?
Multimedia information systems are complex and embrace a large set of issues, including the following:
1. Modeling
2. Design
3. Storage
4. Queries and retrieval
5. Performance

16. What are the different characteristics of Hypermedia links or hyperlinks?
1. Links can be specified with or without associated information, and they may have large descriptions associated with them.
2. Links can start from a specific point within a node or from the whole node.

3. Links can be directional, or nondirectional when they can be traversed in either direction.

17. What are the applications of multimedia database?
1. Documents and records management
2. Knowledge dissemination
3. Education and training
4. Marketing, advertising, retailing, entertainment, and travel
5. Real-time monitoring

18. What are the three main possibilities for rule consideration?
The three main possibilities for rule consideration are:
1. Immediate consideration
2. Deferred consideration
3. Detached consideration

19. What is Horn Clauses?
In Datalog, rules are expressed as a restricted form of clauses called Horn clauses, in which a clause can contain at most one positive literal.

20. What are the two alternatives for interpreting the theoretical meaning of rules?
There are two main alternatives for interpreting the theoretical meaning of rules:
1. Proof-theoretic
2. Model-theoretic
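The fact-defined and rule-defined predicates of questions 11 and 12 can be sketched in a few lines: stored tuples play the role of base relations, and a rule is evaluated by the inference engine as a join over the facts. The parent/grandparent predicates are invented for illustration.

```python
# Fact-defined predicate: tuples stored in the database.
facts = {("parent", "ann", "bob"), ("parent", "bob", "cal")}

def grandparent():
    """Rule-defined predicate, inferred rather than stored:
    grandparent(X, Z) :- parent(X, Y), parent(Y, Z)."""
    return {
        ("grandparent", x, z)
        for (p1, x, y1) in facts if p1 == "parent"
        for (p2, y2, z) in facts if p2 == "parent" and y1 == y2
    }

print(grandparent())  # the virtual relation, computed on demand
```

Note the rule above is a Horn clause in the sense of question 19: one positive literal in the head, and only conjoined conditions in the body.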

16 MARKS UNIT I
1. Explain the different phases in query processing.
Query processing comprises the activities involved in retrieving data from the database. The different phases involved in query processing are
i. Query decomposition
ii. Query optimization
iii. Code generation
iv. Runtime query execution
Query decomposition:
It transforms a high-level query into a relational algebra expression and checks whether the query is syntactically and semantically correct. The different stages of query decomposition are
Analysis
Normalization
Semantic analysis
Simplification
Query restructuring
Query optimization:
Query optimization is the activity of choosing an efficient execution strategy for processing a query. It is of two types:
1. Dynamic query optimization
2. Static query optimization
Heuristical approach to query processing:
It uses transformation rules to convert one relational algebra expression into an equivalent form that is more efficient.
Transformation rules for relational algebra operations
Heuristical processing strategies

2. Explain the heuristical approach to query optimization.
It uses transformation rules to convert one relational algebra expression into an equivalent form that is more efficient.
Transformation rules for relational algebra operations - write 12 rules
Heuristical processing strategies - write 5 strategies

3. Explain the problems caused by concurrency control.

The process of managing simultaneous operations on the database without having them to interfere with one another is called as concurrency control. The problems caused by concurrency control are iv. v. vi. i. Lost update problem Uncommitted dependency problem Inconsistent analysis problem

i. Lost update problem:
A successfully completed update operation by one user can be overridden by another user.

ii. Uncommitted dependency problem (dirty read problem):
Occurs when one transaction is allowed to see the intermediate results of another transaction before it has committed.

iii. Inconsistent analysis problem:
Occurs when a transaction reads several values from the database while a second transaction updates some of them during the execution of the first.
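The lost update problem described above can be reproduced with a small, deterministic Python sketch. The account values and the interleaving schedule are invented for illustration.

```python
# Deterministic sketch of the lost update problem: two transactions read
# the same balance before either writes, so one update is overwritten.

balance = {"x": 100}

def run_schedule():
    t1_read = balance["x"]        # T1: read(x) = 100
    t2_read = balance["x"]        # T2: read(x) = 100 (no lock held)
    balance["x"] = t2_read + 50   # T2: write(x) = 150, then commit
    balance["x"] = t1_read - 30   # T1: write(x) = 70, overwriting T2
    return balance["x"]

final = run_schedule()
# A serial execution would give 100 + 50 - 30 = 120; instead T2's
# deposit of 50 is lost and the balance ends at 70.
```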

4. Explain the steps in using locks and how the concurrency control problems can be prevented using 2PL.
Steps in using locks:
- Any transaction that needs to access a data item must first lock the item.
- If the item is not already locked by another transaction, the lock will be granted.
- If the item is currently locked, the DBMS determines whether the request is compatible with the existing lock.
- A transaction continues to hold a lock until it explicitly releases it, either during execution or when it terminates.
Two-Phase Locking (2PL):

A transaction follows the 2PL protocol if all locking operations precede the first unlock operation in the transaction. The two phases are:
1. Growing phase
2. Shrinking phase
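A minimal Python sketch of the two-phase rule (the class and item names are illustrative) shows how a lock request is refused once the shrinking phase has begun:

```python
# Sketch of the 2PL rule: once a transaction releases any lock, the
# shrinking phase starts and further lock requests must be refused.

class TwoPhaseTxn:
    def __init__(self):
        self.locks = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violated: lock after first unlock")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True      # the growing phase is over
        self.locks.discard(item)

t = TwoPhaseTxn()
t.lock("x"); t.lock("y")           # growing phase
t.unlock("x")                      # shrinking phase starts here
try:
    t.lock("z")                    # this request violates 2PL
    violated = False
except RuntimeError:
    violated = True
```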

- Preventing the lost update problem using 2PL
- Preventing the uncommitted dependency problem using 2PL
- Preventing the inconsistent analysis problem using 2PL

5. Explain the basic timestamp ordering protocol and Thomas's write rule.
A timestamp is a unique identifier created by the DBMS that indicates the relative starting time of a transaction. Timestamping is a concurrency control protocol that orders transactions in such a way that an older transaction, with a smaller timestamp, gets priority in the event of conflict.
The basic timestamp ordering protocol works as follows:
1. Transaction T issues a read(x):
a) If ts(T) < write_timestamp(x), transaction T is aborted and restarted with a new timestamp.
b) If ts(T) >= write_timestamp(x), the read operation can proceed, and read_timestamp(x) is set to max(ts(T), read_timestamp(x)).
2. Transaction T issues a write(x):
a) If ts(T) < read_timestamp(x), transaction T is rolled back and restarted with a new timestamp.
b) If ts(T) < write_timestamp(x), transaction T is rolled back and restarted with a new timestamp.
c) Otherwise, the write operation can proceed, and write_timestamp(x) is set to ts(T).
Thomas's write rule:
a) If ts(T) < read_timestamp(x), transaction T is rolled back and restarted with a new timestamp.
b) If ts(T) < write_timestamp(x), the write operation is ignored.
c) Otherwise, the write operation can proceed, and write_timestamp(x) is set to ts(T).

6. Explain 1NF, 2NF, 3NF, BCNF, 4NF, 5NF, 6NF with examples.
First Normal Form (1NF): A relation in which the intersection of each row and column contains one and only one value.
Second Normal Form (2NF): A relation that is in 1NF and in which every non-primary-key attribute is fully functionally dependent on the primary key.
Third Normal Form (3NF): A relation that is in 1NF and 2NF, and in which no non-primary-key attribute is transitively dependent on the primary key.
Boyce-Codd Normal Form (BCNF): A relation is in BCNF if, and only if, every determinant is a candidate key.
Fourth Normal Form (4NF): A relation that is in BCNF and contains no nontrivial multi-valued dependencies.
Fifth Normal Form (5NF): A relation that has no join dependency.
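The timestamp-ordering checks from question 5 above can be sketched in Python. The timestamps and item state are illustrative; the `thomas=True` flag switches on Thomas's write rule.

```python
# Sketch of basic timestamp ordering for read(x) and write(x).
# An item carries read_ts and write_ts; transactions carry ts_T.

def to_read(ts_T, item):
    if ts_T < item["write_ts"]:
        return "restart"                      # T is too old to read x
    item["read_ts"] = max(ts_T, item["read_ts"])
    return "ok"

def to_write(ts_T, item, thomas=False):
    if ts_T < item["read_ts"]:
        return "restart"
    if ts_T < item["write_ts"]:
        # Thomas's write rule: the obsolete write is simply ignored.
        return "ignore" if thomas else "restart"
    item["write_ts"] = ts_T
    return "ok"

x = {"read_ts": 0, "write_ts": 5}
r1 = to_read(3, x)                 # 3 < write_ts 5, so restart
r2 = to_read(8, x)                 # ok; read_ts becomes 8
r3 = to_write(6, x)                # 6 < read_ts 8, so restart
r4 = to_write(9, x, thomas=True)   # ok; write_ts becomes 9
```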

UNIT II
1. Explain the types of fragmentation.
- Fragmentation: definition
- Three correctness rules: completeness, reconstruction, disjointness
- Four types: horizontal fragmentation (explanation with example), vertical fragmentation (explanation with example), mixed fragmentation (explanation with example), derived fragmentation (explanation with example)

2. Explain the different types of locking protocols in DDBMS.
- Locking protocols: definition; they ensure serializability
- Four types of locking protocols
- Centralized 2PL: the lock manager is centralized; 5 messages; advantages and disadvantages

- Primary copy 2PL: many lock managers are available; advantages and disadvantages
- Distributed 2PL: the lock manager is distributed to every site; advantages and disadvantages
- Majority locking: advantages and disadvantages

3. Explain distributed deadlock management.
- Distributed deadlock management: definition; a transaction waiting for another transaction
- Wait-For Graph (WFG): local wait-for graphs; combined wait-for graph
- Handling deadlocks: centralized (definition, advantages and disadvantages); hierarchical (definition, advantages and disadvantages); distributed (definition, advantages and disadvantages)

4. Explain the reference architecture and the component architecture for DDBMS.
- Reference architecture diagram: global external schemas, fragmentation schema, allocation schema, global conceptual schema, local mapping schemas
- Component architecture diagram: 4 major components: local DBMS (LDBMS), data communications component (DC), global system catalog, distributed DBMS

5. Explain the phases of 2PC.
- 2PC: a blocking protocol; coordinator and participant definitions
- The two phases of 2PC: voting phase (explanation); decision phase (explanation)
- Procedure for the coordinator and procedure for the participants

7. Give notes on distributed transaction management.
- Distributed transaction management: definition
- Modules: transaction manager, scheduler, recovery manager, buffer manager, locking manager, data communications component
- Procedure to execute a global transaction initiated at site S1
- Distributed concurrency control: concurrency control problems; distributed serializability

8. Explain 3PC.
- 3PC: definition; a non-blocking protocol
- Coordinator: the states of the coordinator are initial, waiting, and decided
- Participant: the states of the participant are initial, prepared, pre-commit, abort, and commit
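The voting and decision phases of 2PC (question 5 above) can be sketched in Python, with participants reduced to their votes. There is no real messaging here; the names are illustrative.

```python
# Sketch of the two phases of 2PC: the coordinator collects votes
# (voting phase) and then broadcasts one global decision (decision
# phase) that every participant applies.

def two_phase_commit(votes):
    # Voting phase: the coordinator asks every participant to vote.
    if all(v == "commit" for v in votes):
        decision = "global-commit"
    else:
        decision = "global-abort"   # a single abort vote aborts all
    # Decision phase: every participant receives the same decision.
    return decision, [decision] * len(votes)

d1, sites1 = two_phase_commit(["commit", "commit", "commit"])
d2, sites2 = two_phase_commit(["commit", "abort", "commit"])
```

This also shows why 2PC is blocking: every participant waits for the coordinator's decision before it can finish.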

UNIT III
1. Explain the schemes for implementing persistence.
The DBMS must provide for the storage of persistent objects. There are three schemes for implementing persistence:
- checkpointing
- serialization
- explicit paging
Checkpointing:
- Copies all or part of the program's address space to secondary storage.
- If the complete address space is saved, the program can be restarted from the checkpoint.
- In other cases, only the program's heap is saved.
Drawbacks:
- The saved data can be used only by the program that created it.
- A large amount of data may be saved that is never used.
Serialization:
- Implements persistence by copying the closure of a data structure to disk.
- Reading back this flattened data structure produces a new copy of the original data.
- Serialization is also called pickling or, in a distributed computing context, marshalling.
Drawbacks:
- It does not preserve object identity.
- It is not incremental.
Explicit paging:
- Involves the application programmer explicitly paging objects between the application heap and the persistent store.
- Reachability-based persistence means that an object will persist if it is reachable from a persistent root object. The programmer does not need to decide at object creation time whether the object should be persistent; after creation, an object can become persistent by adding it to the reachability tree.
- Allocation-based persistence means that an object is made persistent only if it is explicitly declared as such within the application program.
- By class: a class is statically declared to be persistent, and all instances of the class are made persistent when they are created. Alternatively, a class may be a subclass of a system-supplied persistent class.

2. Explain the classification of pointer swizzling techniques.
Pointer swizzling is the action of converting object identifiers (OIDs) to main-memory pointers, and back again. The aim of pointer swizzling is to optimize access to objects:
- If we read an object from secondary storage into the database cache, we should be able to locate any referenced objects on secondary storage using their OIDs.
- We want to record which objects are held in main memory and access them through main-memory pointers.
- Pointer swizzling attempts to provide a more efficient access strategy by storing main-memory pointers in place of OIDs.
No swizzling:
- The earliest implementation of faulting objects into memory.
- Objects are faulted into memory by the underlying object manager, and a handle is passed back to the application.
- The system maintains some sort of lookup table so that the object's virtual-memory pointer can be located and then used to access the object.
- This is suitable when the application tends to access an object only once; Moss proposed an analytical model for evaluating the conditions under which swizzling is appropriate.
Classification of pointer swizzling:
Techniques can be classified according to three dimensions:
1. copy versus in-place swizzling
2. eager versus lazy swizzling
3. direct versus indirect swizzling

Copy versus in-place swizzling:
- Data can either be copied into the application's local object cache, or it can be accessed in place within the object manager's database cache.
- With the in-place technique, only modified objects have to be unswizzled back to their OIDs, but an entire page of objects may have to be unswizzled if one object on the page is modified.
- With the copy approach, every object must be explicitly copied into the local object cache.
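A small Python sketch of copy swizzling (the store layout and OIDs are invented for illustration) shows OIDs in faulted objects being replaced by in-memory references:

```python
# Sketch of copy swizzling: when an object is faulted into the local
# object cache, stored OID fields are replaced by direct references.

store = {                        # persistent store: OID -> record
    1: {"name": "order-1", "customer": ("oid", 2)},
    2: {"name": "cust-2"},
}
cache = {}                       # OID -> in-memory copy

def fault(oid):
    if oid in cache:
        return cache[oid]
    obj = dict(store[oid])       # copy into the local object cache
    cache[oid] = obj
    for field, value in obj.items():
        if isinstance(value, tuple) and value[0] == "oid":
            obj[field] = fault(value[1])   # swizzle: OID -> pointer
    return obj

order = fault(1)
# order["customer"] is now a direct in-memory reference, not an OID.
same_object = order["customer"] is cache[2]
```

On writing the object back, the references would be unswizzled to OIDs again, which is the "back again" half of the definition above.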

Eager versus lazy swizzling:
- Moss and Eliot define eager swizzling as the swizzling of all OIDs in all data pages used by the application before any object can be accessed.
- Kemper and Kossman provide a more relaxed definition, restricting the swizzling to all persistent OIDs within the objects the application wishes to access.
- Lazy swizzling involves less overhead when an object is faulted into memory, but it means that two different types of pointer must be handled on every object access.
Direct versus indirect swizzling:
- This distinction matters only when it is possible for a swizzled pointer to refer to an object that is no longer in virtual memory.
- With direct swizzling, the virtual-memory pointer of the referenced object is placed directly in the swizzled pointer.
- With indirect swizzling, the virtual-memory pointer is placed in an intermediate object, which acts as a placeholder for the actual object.

3. Explain the locking protocols.
Centralized 2PL:
- With the centralized 2PL protocol, there is a single site that maintains all locking information.
- There is only one scheduler, or lock manager, for the whole of the distributed DBMS that can grant and release locks.
- The transaction coordinator at site S1 divides the transaction into a number of subtransactions, using the information held in the global system catalog.
- The coordinator has responsibility for ensuring that consistency is maintained; if a data item is replicated, the coordinator must ensure that all copies of the data item are updated. Thus the coordinator requests exclusive locks on all copies.
- The local transaction managers involved in the global transaction request and release locks.
- The advantages of centralized 2PL are that the implementation is straightforward and deadlock detection is no more difficult than in a centralized DBMS.
- The disadvantages are the bottlenecks and lower reliability that come with centralization in a distributed DBMS.
- For example, a global update operation that has agents (subtransactions) at n sites may require a minimum of 2n + 3 messages with a centralized lock manager:
  - 1 lock request;
  - 1 lock grant message;
  - n update messages;
  - n acknowledgements;
  - 1 unlock request.
Primary copy 2PL:
- This protocol distributes the lock managers to a number of sites; each lock manager is then responsible for managing the locks for a set of data items.
- For each replicated data item, one copy is chosen as the primary copy; the other copies are called slave copies.
- The choice of primary site is flexible, and the site that is chosen to manage the locks for a primary copy need not hold the primary copy of the item.
- The protocol is a straightforward extension of centralized 2PL, except that lock requests must be sent to the appropriate lock manager.
- The disadvantages of this approach are that deadlock handling is more complex, there is a risk of reading out-of-date values from slave copies, and lock requests for a specific primary copy can be handled only by one site (unless backup sites hold the locking information).
- The advantages are lower communication costs and better performance than centralized 2PL.
Distributed 2PL:
- The lock manager is distributed to every site; otherwise, distributed 2PL implements a Read-One-Write-All (ROWA) replica control protocol.
- Deadlock handling is more complex, and an update requires:
  - n lock request messages;
  - n lock grant messages;
  - n update messages;
  - n acknowledgements;
  - n unlock requests.
Majority locking:
- Avoids having to lock all copies of a replicated item before an update: a transaction needs locks on only a majority of the copies.
- When a transaction receives a majority of the locks, it has the lock and informs all the sites that it has received the lock.
- The disadvantages are that the protocol is more complicated, and there must be at least (n + 1)/2 messages for lock requests and (n + 1)/2 messages for unlock requests.

4. Explain the strategies for developing an OODBMS.
- Extend an existing object-oriented programming language with database capabilities: traditional database capabilities are added to a language such as Smalltalk, C++, or Java. This approach is taken by GemStone.
- Provide extensible object-oriented DBMS class libraries: class libraries are provided that support persistence, aggregation, data types, transactions, and concurrency. This approach is taken by Ontos, Versant, and ObjectStore.
- Embed object-oriented database language constructs in a conventional host language: similar to the way SQL can be embedded in a conventional host language. This approach is taken by O2.
- Extend an existing database language with object-oriented capabilities: this approach is being pursued by both RDBMS and OODBMS vendors, for example Ontos and Versant.
- Develop a novel database data model/data language: start from the beginning and develop an entirely new database system. This approach is taken by SIM (Semantic Information Manager).

5) Give notes on (i) the nested transaction model, (ii) the sagas transaction model, and (iii) the multilevel transaction model.
(i) Nested transaction model:
- Introduced by Moss.
- The complete transaction is depicted as a tree, or hierarchy, of subtransactions.
- The top-level transaction can have a number of child transactions, and each child transaction can also have nested transactions.
- Transactions have to commit from the bottom up.
- A transaction abort at one level does not affect the progress of transactions at a higher level; instead, a parent is allowed to perform its own recovery.
- Different ways to recover: abort the transaction; ignore the failure, in which case the subtransaction is non-vital; retry the subtransaction; or run an alternative subtransaction, called a contingency or compensating subtransaction.
Advantages: modularity, granularity, intra-transaction parallelism, intra-transaction recovery.
(ii) Sagas:
- A saga is a sequence of flat transactions that can be interleaved with other transactions.
- Sagas are based on the use of compensating transactions.
- The DBMS guarantees that either all the transactions in the saga are successfully completed, or compensating transactions are run to recover from a partial execution.
- If we have a saga comprising a sequence of n transactions T1, T2, ..., Tn with compensating transactions C1, C2, ..., Cn, the final outcome is either T1, T2, ..., Tn if the saga completes successfully, or T1, T2, ..., Ti, Ci-1, ..., C2, C1 if subtransaction Ti fails.
- Sagas relax the isolation property, and it is difficult to define compensating transactions in advance.
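The saga behaviour above, run T1 through Tn, and on failure run the compensating transactions of the completed steps in reverse, can be sketched in Python. The step names and the failing step are illustrative.

```python
# Sketch of a saga executor: on failure of step Ti, the compensating
# transactions of the already-completed steps run in reverse order.

def run_saga(steps, fail_at=None):
    """steps is a list of (transaction, compensation) names;
    returns the log of everything that was executed."""
    log, done = [], []
    for name, comp in steps:
        if name == fail_at:
            for _, c in reversed(done):   # compensate Ti-1 .. T1
                log.append(c)
            return log
        log.append(name)
        done.append((name, comp))
    return log                            # all transactions committed

steps = [("T1", "C1"), ("T2", "C2"), ("T3", "C3")]
ok = run_saga(steps)                  # T1, T2, T3 all commit
rolled = run_saga(steps, fail_at="T3")  # T1, T2 committed, then C2, C1
```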

(iii) Multilevel transaction model:
There are two types: (i) closed nested transactions and (ii) open nested transactions.

6) Explain the ODMG model.
ODMG stands for Object Data Management Group. The ODMG object model is a superset of the OMG object model (OM), which enables both designs and implementations to be ported between compliant systems.
The basic modelling primitives are: (i) objects and (ii) literals.
(i) Objects are described by four characteristics:
(i) object structure
(ii) object identifier
(iii) object name
(iv) object lifetime
(ii) Literals:

Literals are decomposed into: (i) atomic, (ii) collections, (iii) structured, and (iv) null.
There are two types of collection: (i) ordered and (ii) unordered.
There are five different built-in collection types: (i) set, (ii) bag, (iii) list, (iv) array, and (v) dictionary.

7) Explain the features of Postgres.
Postgres is a research database system designed to be a successor to the INGRES RDBMS.
Objectives:
(i) to provide better support for complex objects
(ii) to provide user extensibility for data types, operators, and access methods
(iii) to provide active database facilities and inferencing support
(iv) to simplify the DBMS code for crash recovery
(v) to make as few changes as possible to the relational model
Postgres extends the relational model to include the following mechanisms:
(i) abstract data types
(ii) data of type procedure
(iii) rules
GemStone:
- GemStone has a single database monitor process, called Stone.
- Gem processes incorporate a data management kernel and an interpreter for OPAL.

- A GemStone configuration may include multiple hosts in the network, with clients simultaneously accessing databases on multiple hosts.
- The user interface to each Gem process is provided by a program running as a separate process on the host machine.
- A special-purpose GemStone interface, the OPAL programming environment, is provided for application development.
- The GemStone architecture is a client-server architecture.
- The database allocates object identifiers and performs transaction management.
Features:
GemStone is a highly scalable client-multiserver database for commercial applications. GemStone's features include:
- Server Smalltalk
- Concurrent support for multiple languages
- Flexible multi-user transaction control
- Object-level security
- Dynamic schema and object evolution
- Production services
- Scalability
- Legacy gateways
- Developer tools
- Database administration tools
Management functions:
- GemStone sessions access the database directly, thus eliminating a potential performance bottleneck.
- GemStone provides the facilities necessary to allow external application programs, written in C or Smalltalk, to access GemStone, for example:
  localBolt <- (GemStone objectNamed: #Bolt) asLocalObject
  localBolt remotePerform: #partName

The first message has the selector remotePerform: with the literal #partName as its argument.

UNIT IV
1. Explain the operations associated with data mining techniques.
There are four main operations associated with data mining:
1. Predictive modeling
2. Database segmentation
3. Link analysis
4. Deviation detection
Predictive modeling:
The model is developed using a supervised learning approach, which has two phases: (i) training and (ii) testing.
There are two techniques associated with predictive modeling: (i) classification and (ii) value prediction.
Classification:
Used to establish a specific, predetermined class for each record in a database from a finite set of possible class values.
Two specializations of classification:
(i) tree induction
(ii) neural induction

Tree induction:
Creates a decision tree; for example, one that predicts customers who have rented for more than two years and are over 25 years old.
Neural induction:

Constructs a neural network, which contains a collection of input, processing, and output nodes.
Value prediction:
Used to estimate a continuous numerical value that is associated with a database record.
Two techniques: (i) linear regression and (ii) nonlinear regression.
Database segmentation:
Partitions a database into an unknown number of segments, or clusters, of similar records.
Link analysis:
There are three specializations of link analysis:
(i) association discovery

(ii) sequential pattern discovery
(iii) time sequence discovery
Deviation detection:
A source of true discovery, because it identifies outliers.

2. Describe the data warehouse architecture.
The components are:
Operational data:
The source of data for the data warehouse, supplied from mainframe operational data, departmental operational data, private data, and external systems.
- Mainframe operational data: held in hierarchical and network databases.
- Departmental operational data: data held in relational DBMSs.
- Private data: data held in private databases.
- External systems: data held in external systems, e.g. the Internet.

Operational data store (ODS):
Data from the different sources is stored in the ODS, which is mainly used for analysis of the data.
Load manager:
Gets the data directly from the operational data sources or from the ODS.
Warehouse manager:
Manages the data warehouse, i.e. the management of the data in the data warehouse.

Query manager:
Manages the user queries. Query profiles are used to determine which indexes and aggregations are appropriate.
Detailed data:
This area of the warehouse stores all the detailed data in the database schema.
Summary information:
Its purpose is to speed up query performance.
Archive/backup data:
Both detailed and summarized data are stored as backups that are used for recovery.
Metadata:
This area of the warehouse stores all the metadata, i.e. data about data.
End-user access tools:
Users interact with the data warehouse using end-user access tools.
Data warehouse data flows:
There are five primary data flows: inflow, outflow, upflow, downflow, and metaflow.
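The purpose of summary information, answering queries without scanning the detailed data, can be sketched in Python. The sales rows are invented for illustration.

```python
# Sketch of warehouse summary information: a pre-computed aggregate,
# maintained by the warehouse manager, answers a common query without
# touching the detailed data area.

detailed = [
    {"store": "S1", "amount": 100},
    {"store": "S1", "amount": 250},
    {"store": "S2", "amount": 400},
]

def build_summary(rows):
    """Aggregate detailed sales into per-store totals."""
    summary = {}
    for r in rows:
        summary[r["store"]] = summary.get(r["store"], 0) + r["amount"]
    return summary

summary = build_summary(detailed)
s1_total = summary["S1"]      # answered from the summary area alone
```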

3. Explain the mobile database with a diagram.

A mobile database is a database that is portable and physically separate from a centralized database server, but is capable of communicating with that server from remote sites, allowing the sharing of corporate data.
Architecture of a mobile database environment; components:
1) Corporate database server and DBMS: manages and stores corporate data and provides corporate applications.
2) Remote database and DBMS: manages and stores mobile data and provides mobile applications.
3) Mobile database platform: laptops, PDAs, or other Internet access devices.
4) Two-way communication links between the corporate and mobile DBMSs.
Issues associated with mobile databases:
(i) management of the mobile database

(ii) communication between the mobile and corporate databases
Additional functionality required of a mobile DBMS:
1) The ability to communicate with the centralized database server through modes such as wireless.
2) The ability to replicate data held on the mobile device or the centralized server.
3) The ability to synchronize data on the centralized database server and the mobile device.
4) The ability to capture data from different sources.
5) The ability to manage data on the mobile device.
6) The ability to analyse data on the mobile device.
7) The ability to create customized mobile applications.

4. Explain the Web DBMS architecture.
The Web was developed at CERN in 1990. Hyperlinks are used for moving from one page to another; a website is a collection of web pages.
The Web as a database application platform:
The aim is to integrate the Web with the DBMS.

Requirements for Web-DBMS integration:
1) Security
2) Independent connectivity
3) The ability to interface to the database
4) A connectivity solution that takes advantage of all the features of an organisation's DBMS
5) Open-architecture support
6) Scalability
7) Support for multiple HTTP requests
8) Support for session- and application-based authentication
9) Acceptable performance
10) Minimal administration overhead
Web DBMS architecture:
Traditionally, the Web uses a 2-tier client-server architecture, where tier 1 is the client and tier 2 is the server. The task performed by the client is the presentation service; the task performed by the server is the data service.
Two problems with the 2-tier architecture:
(i) a "fat" client
(ii) a significant client-side administration overhead

3-tier architecture:
There are three layers:
- Tier 1: the client, known as a "thin" client
- Tier 2: the business logic and data processing layer (the application layer, which performs the main processing)
- Tier 3: the DBMS
Advantages of the 3-tier architecture:

1) Less expensive hardware
2) Application maintenance is centralized
3) Load balancing is easier
The 3-tier architecture can be extended to an n-tier architecture; the application layer is divided into (i) an application server and (ii) a web server.

5. Explain the approaches to integrating the Web and the DBMS.
1) Scripting languages such as VBScript and JavaScript
2) Using CGI (the Common Gateway Interface)
3) Using HTTP cookies
4) Extensions to the web server, such as the Netscape API and the Microsoft Internet Information Server API
5) Java and JDBC, SQLJ, servlets, and JSP
6) Microsoft's Web Solution Platform, with ASP and ActiveX Data Objects
7) Oracle's Internet Platform
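As a hedged sketch of the CGI-style server-side integration listed above, the following Python fragment uses sqlite3 as a stand-in for the corporate DBMS; the schema, table, and parameter names are assumptions for illustration only.

```python
# Sketch of Web-DBMS integration: a request parameter drives a
# parameterized query and the result is rendered as an HTML page,
# the way a CGI program or servlet would.

import sqlite3

def handle_request(conn, branch_no):
    cur = conn.execute(
        "SELECT name FROM staff WHERE branch = ?", (branch_no,)
    )
    items = "".join(f"<li>{name}</li>" for (name,) in cur)
    return f"<html><body><ul>{items}</ul></body></html>"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, branch TEXT)")
conn.execute("INSERT INTO staff VALUES ('Ann', 'B3'), ('Bob', 'B5')")
page = handle_request(conn, "B3")
```

The parameterized query (`?` placeholder) also illustrates the security requirement from the list above: request input is never spliced directly into SQL.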

UNIT V
1. Explain deductive database concepts.
Deductive database:
A deductive database includes capabilities to define (deductive) rules, which can deduce or infer additional information from the facts that are stored in the database. Because part of the theoretical foundation for some deductive database systems is mathematical logic, such databases are often referred to as logic databases.
Inference engine:
An inference engine (or deduction mechanism) within the system can deduce new facts from the database by interpreting these rules. The model used for deductive databases is closely related to the relational data model, and particularly to the domain relational calculus formalism. It is also related to the field of logic programming and the Prolog language.
Horn clauses:

In Datalog, rules are expressed as a restricted form of clauses called Horn clauses, in which a clause can contain at most one positive literal.
Use of relational operations:
It is straightforward to specify many operations of the relational algebra in the form of Datalog rules that define the result of applying these operations to the database relations (fact predicates). This means that relational queries and views can easily be specified in Datalog.

2. Explain multimedia databases.
Multimedia database:
A multimedia database provides features that allow users to store and query different types of multimedia information, including images (such as photos or drawings), video clips (such as movies, newsreels, or home videos), audio clips (such as songs, phone messages, or speeches), and documents (such as books or articles).
Multimedia information systems are complex and embrace a large set of issues, including the following:
1. Modeling
2. Design
3. Storage
4. Queries and retrieval
5. Performance
Applications of multimedia databases:
1. Documents and records management
2. Knowledge dissemination
3. Education and training
4. Marketing, advertising, retailing, entertainment, and travel
5. Real-time control and monitoring

3. Explain spatial databases in detail.
Spatial database:
Spatial databases provide concepts for databases that keep track of objects in a multi-dimensional space.
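The deductive rules described under question 1 can be sketched with a naive bottom-up (fixpoint) evaluation in Python. The parent facts and the ancestor rule are illustrative:

```python
# Sketch of Datalog-style deduction evaluated bottom-up:
#   ancestor(X, Y) :- parent(X, Y).
#   ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).

parent = {("alice", "bob"), ("bob", "carol")}   # stored facts

def deduce_ancestors(parent_facts):
    ancestor = set(parent_facts)        # first (non-recursive) rule
    changed = True
    while changed:                      # iterate until a fixpoint
        changed = False
        for (x, y) in parent_facts:
            for (y2, z) in list(ancestor):
                if y == y2 and (x, z) not in ancestor:
                    ancestor.add((x, z))   # second (recursive) rule
                    changed = True
    return ancestor

ancestor = deduce_ancestors(parent)
# The inference engine derives ("alice", "carol"), a fact that is
# not stored anywhere in the database.
```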

Spatial query languages:
The different types of spatial query include:
1. range queries
2. nearest-neighbour queries
Quadtrees:
Quadtrees generally divide each space or subspace into equally sized areas, and proceed with the subdivision of each subspace to identify the positions of various objects.

4. Explain active databases in detail.
Active databases:
Active databases provide additional functionality for specifying active rules. These rules can be automatically triggered by events that occur, such as database updates or certain times being reached, and can initiate certain actions that have been specified in the rule declaration if certain conditions are met.
There are three main possibilities for rule consideration:
1. Immediate consideration
2. Deferred consideration
3. Detached consideration
Triggers: a technique for specifying certain types of active rules.
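The event-condition-action behaviour of active rules, with immediate consideration, can be sketched in Python. The rule content and the database values are invented for illustration.

```python
# Sketch of event-condition-action (ECA) rules with immediate
# consideration: every update event is checked against the rule set
# at once, and the actions of matching rules fire.

rules = []      # list of (event, condition, action)
fired = []      # record of actions that have run

def on(event, condition, action):
    rules.append((event, condition, action))

def update(db, key, value):
    old = db.get(key)
    db[key] = value
    for ev, cond, act in rules:        # immediate consideration
        if ev == "update" and cond(key, old, value):
            fired.append(act(key, old, value))

# Illustrative rule: alert whenever a salary is lowered.
on("update",
   lambda k, old, new: k == "salary" and old is not None and new < old,
   lambda k, old, new: f"alert: {k} lowered {old}->{new}")

db = {}
update(db, "salary", 1000)   # no previous value: rule does not fire
update(db, "salary", 900)    # lowers the salary: rule fires
```

Deferred consideration would instead queue the checks until commit, and detached consideration would run them in a separate transaction.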
