
Distributed DBMS(2170714)

Sr. No.  Practical Title

1.  Create two databases on a single DBMS, design the database to fragment and share the fragments from both databases, and write a single query for creating a view.

2.  Create two databases on two different computer systems and create a database view to generate a single DDB.

3.  Create various views using any one example database and design various constraints.

4.  Write and implement an algorithm for query processing using any example, in either C / C++ / Java / .NET.

5.  Using any example, write various transaction statements and show information about concurrency control (i.e. various locks from the dictionary) by executing multiple updates and queries.

6.  Using transaction / commit / rollback, show the transaction ACID properties.

7.  Write a Java JDBC program and use JTA to show various isolation levels in a transaction.

8.  Implement the Two Phase Commit Protocol.

9.  Case study on NoSQL.

10. Case study on Hadoop.


PRACTICAL – 1
AIM: Create two databases on a single DBMS, design the database to fragment and share the fragments from both databases, and write a single query for creating a view.
Solution:
Creating table computers in database manoj,

Creating table mobiles in database jonam,


Creating fragment and viewing it..
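Since the screenshots are not reproduced here, the steps above can be sketched in SQL roughly as follows; the column definitions are illustrative assumptions rather than the exact ones shown in the screenshots:

CREATE DATABASE manoj;
CREATE DATABASE jonam;

CREATE TABLE manoj.computers (id INT PRIMARY KEY, name VARCHAR(50), price INT);
CREATE TABLE jonam.mobiles (id INT PRIMARY KEY, name VARCHAR(50), price INT);

-- a horizontal fragment of the computers table
CREATE VIEW manoj.cheap_computers AS
SELECT id, name, price FROM manoj.computers WHERE price < 30000;

-- a single view combining fragments from both databases
CREATE VIEW manoj.all_products AS
SELECT id, name, price FROM manoj.computers
UNION ALL
SELECT id, name, price FROM jonam.mobiles;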


PRACTICAL – 2
AIM: Create two databases on two different computer systems and create a database view to generate a single DDB.
Solution:
Creating table computers in database manoj (in Computer 1),

Creating table mobiles in database jonam (in Computer 2),


Connecting the two computers (registering the linked-server logins):

exec sp_addlinkedsrvlogin @rmtsrvname='localhost', @useself='false', @rmtuser='computer1', @rmtpassword='1234';

Linking the second computer:

exec sp_addlinkedsrvlogin @rmtsrvname='(36061)', @useself='false', @rmtuser='computer2', @rmtpassword='4321';

Creating a single view for both the tables in different databases…
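A possible form of that view, assuming the remote machine's linked server is registered under the placeholder name LINKEDSRV and both tables live in the dbo schema (the actual names appear in the screenshots):

CREATE VIEW all_gadgets AS
SELECT id, name, price FROM manoj.dbo.computers
UNION ALL
SELECT id, name, price FROM LINKEDSRV.jonam.dbo.mobiles;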


PRACTICAL – 3
AIM: Create various views using any one example database and design various constraints.
Solution:
VIEW :
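The screenshot for this step is omitted; a minimal example of creating and querying a view, using an assumed mobiles table, might look like this:

CREATE VIEW costly_mobiles AS
SELECT id, name, price FROM mobiles WHERE price > 20000;

SELECT * FROM costly_mobiles;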

CONSTRAINTS :
SQL constraints are used to specify rules for the data in a table.
Constraints are used to limit the type of data that can go into a table.

This ensures the accuracy and reliability of the data in the table. If there is any violation
between the constraint and the data action, the action is aborted.
The following constraints are commonly used in SQL:

• NOT NULL - Ensures that a column cannot have a NULL value.
• UNIQUE - Ensures that all values in a column are different.
• PRIMARY KEY - A combination of NOT NULL and UNIQUE. Uniquely identifies each row in a table.
• FOREIGN KEY - Uniquely identifies a row/record in another table.
• CHECK - Ensures that all values in a column satisfy a specific condition.
• DEFAULT - Sets a default value for a column when no value is specified.
• INDEX - Used to create and retrieve data from the database very quickly.

Examples,


NOT NULL

UNIQUE

AUTO_INCREMENT and PRIMARY KEY
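As the example screenshots are not included, the sketch below combines the constraints listed above into one assumed table definition:

CREATE TABLE students (
    id INT AUTO_INCREMENT PRIMARY KEY,
    enrollment_no VARCHAR(12) NOT NULL UNIQUE,
    name VARCHAR(50) NOT NULL,
    age INT CHECK (age >= 17),
    city VARCHAR(30) DEFAULT 'Ahmedabad'
);

CREATE INDEX idx_students_name ON students(name);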


PRACTICAL – 4
AIM: Write and implement an algorithm for query processing using any example, in either C / C++ / Java / .NET.
Solution:

Query processing includes translating high-level queries into low-level expressions that can be used at the physical level of the file system, optimizing the query, and actually executing it to get the result.

Parser: During the parse call, the database converts the query into relational algebra and performs the following checks (refer to the detailed diagram): a syntax check, a semantic check, and a shared pool check.

Syntax check: verifies that the statement is syntactically valid SQL. Example: SELECT * FORM employee fails because FROM is misspelled.

Semantic check: determines whether the statement is meaningful. Example: a query that refers to a table which does not exist fails this check.

Shared pool check: every query is given a hash code during its execution. This check determines whether that hash code already exists in the shared pool; if it does, the database does not take the additional optimization and execution steps and reuses the cached plan instead.


Optimizer: During the optimization stage, the database must perform a hard parse at least once for every unique DML statement and carries out optimization during this parse. The database never optimizes DDL unless it includes a DML component, such as a subquery, that requires optimization.

Optimization is the process in which multiple execution plans for satisfying a query are examined and the most efficient plan is selected for execution.

The database catalog stores the execution plans, and the optimizer passes the lowest-cost plan on for execution.

Row Source Generation

The row source generator is software that receives the optimal execution plan from the optimizer and produces an iterative execution plan usable by the rest of the database. The iterative plan is a binary program that, when executed by the SQL engine, produces the result set.

Execution Engine: Finally runs the query and displays the required result.
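Although the practical asks for an implementation in a host language, the optimizer's choice of plan can also be observed directly from SQL; for example, MySQL's EXPLAIN statement shows the plan selected for a query without executing it (the computers table is an assumed example):

EXPLAIN SELECT name, price FROM computers WHERE price < 30000;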


PRACTICAL – 5
AIM: Using any example, write various transaction statements and show information about concurrency control (i.e. various locks from the dictionary) by executing multiple updates and queries.
Solution:

INTRODUCTION
When more than one transaction is running simultaneously, there is a chance of conflicts that can leave the database in an inconsistent state.

To handle these conflicts we need concurrency control in the DBMS, which allows transactions to run simultaneously but handles them in such a way that the integrity of the data remains intact.

LOCKS:
A lock is a mechanism that ensures the integrity of data is maintained. There are two types of locks that can be placed while accessing data, so that a concurrent transaction cannot alter the data while we are processing it.

1. Shared Lock(S)
2. Exclusive Lock(X)

1. Shared Lock (S): A shared lock is placed when we are reading the data. Multiple shared locks can be placed on the same data, but while a shared lock is held no exclusive lock can be placed.

2. Exclusive Lock (X): An exclusive lock is placed when we want to both read and write the data. Once this lock is placed on the data, no other lock (shared or exclusive) can be placed on it until the exclusive lock is released.

      S       X

S     True    False

X     False   False

There are two rows: the first row says that when an S lock is placed, another S lock can be acquired (marked True) but no exclusive lock can be acquired (marked False).

In the second row, when an X lock is acquired, neither an S nor an X lock can be acquired, so both are marked False.


Lock Examples,

Creating table details,

Viewing details,

LOCKING the table details into READ mode,
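Since the screenshots are omitted, the steps above correspond roughly to the following MySQL statements; the column definitions are assumptions:

CREATE TABLE details (id INT, name VARCHAR(50));
INSERT INTO details VALUES (1, 'abc'), (2, 'xyz');
SELECT * FROM details;

LOCK TABLE details READ;
SELECT * FROM details;                   -- allowed: reads are permitted under a READ (shared) lock
INSERT INTO details VALUES (3, 'pqr');   -- fails: the table was locked with a READ lock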

To make the table writable again for this session, we re-lock it in WRITE mode (or release all locks with UNLOCK TABLES):


mysql> LOCK TABLE details WRITE;
This will then allow the insert and update tasks to be redone.


PRACTICAL – 6
AIM: Using transaction / commit / rollback, show the transaction ACID properties.

Solution:

A transaction in a database system must maintain Atomicity, Consistency, Isolation, and Durability, known as the ACID properties, in order to ensure accuracy, completeness, and data integrity:

Atomicity: This property states that a transaction must be treated as an atomic unit; that is, either all of its operations are executed or none of them are.

Consistency: The database must remain in a consistent state after any transaction. No transaction should have any adverse effect on the data residing in the database.

Durability: The database should be durable enough to hold all its latest updates even if the system fails or restarts.

Isolation: In a database system where more than one transaction is being executed simultaneously and in parallel, the property of isolation states that each transaction is carried out and executed as if it were the only transaction in the system.

Presenting the transaction properties through SQL queries,

1. Deleting records from table before committing:


- creating the table and showing it before the delete command,

- executing the delete command


2. Performing rollback and showing output:

3. Now again deleting the last record:

4. Performing commit and then rollback:
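Since the output screenshots are omitted, the SQL behind these four steps is roughly the following (the students table, its rows, and the use of a transactional InnoDB engine are assumptions); it shows that a rollback undoes an uncommitted delete, while a delete that has been committed can no longer be rolled back:

START TRANSACTION;
DELETE FROM students WHERE id = 3;   -- step 1: delete before committing
ROLLBACK;                            -- step 2: the deleted row reappears

START TRANSACTION;
DELETE FROM students WHERE id = 3;   -- step 3: delete the last record again
COMMIT;                              -- step 4: commit the delete
ROLLBACK;                            -- a rollback after commit has no effect (atomicity and durability)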


PRACTICAL – 7
AIM: Write a Java JDBC program and use JTA to show various isolation levels in a transaction.

Solution:

TRANSACTION
When auto-commit mode is disabled, no SQL statements are committed until you call the method commit explicitly.

All statements executed after the previous call to the method commit are included in the current transaction and committed together as a unit.
-----------------------------------------------JAVA Programming------------------------------------------------------
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Main {

    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost/testdb";
        String username = "root";
        String password = "";

        Class.forName("com.mysql.jdbc.Driver");

        Connection conn = null;

        try {
            conn = DriverManager.getConnection(url, username, password);
            conn.setAutoCommit(false);

            Statement st = conn.createStatement();
            st.execute("INSERT INTO orders (username, order_date) VALUES ('java', '2007-12-13')",
                    Statement.RETURN_GENERATED_KEYS);

            ResultSet keys = st.getGeneratedKeys();
            int id = 1;
            while (keys.next()) {
                id = keys.getInt(1);
            }

            PreparedStatement pst = conn.prepareStatement(
                    "INSERT INTO order_details (order_id, product_id, quantity, price) VALUES (?, ?, ?, ?)");
            pst.setInt(1, id);
            pst.setString(2, "1");
            pst.setInt(3, 10);
            pst.setDouble(4, 100);
            pst.execute();

            // both inserts are committed together as one unit
            conn.commit();
            System.out.println("Transaction commit...");
        } catch (SQLException e) {
            if (conn != null) {
                conn.rollback();
                System.out.println("Connection rollback...");
            }
            e.printStackTrace();
        } finally {
            if (conn != null && !conn.isClosed()) {
                conn.close();
            }
        }
    }
}

ROLLING BACK TO SAVEPOINTS


The method Connection.setSavepoint sets a Savepoint object within the current transaction. The Connection.rollback method is overloaded to take a Savepoint argument.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Savepoint;
import java.sql.Statement;

public class Main {

    public static void main(String[] args) throws Exception {
        Connection conn = getConnection();
        conn.setAutoCommit(false);
        Statement st = conn.createStatement();

        st.executeUpdate("create table survey (id int, myURL CHAR);");
        st.executeUpdate("insert into survey(id) values(01)");
        st.executeUpdate("insert into survey(id) values(02)");

        Savepoint mySavepoint = conn.setSavepoint("MYSAVEPOINT");

        st.executeUpdate("insert into survey(id) values(03)");

        // undo everything after the savepoint (the third insert), then commit the rest
        conn.rollback(mySavepoint);
        conn.commit();

        st.close();
        conn.close();
    }

    private static Connection getConnection() throws Exception {
        Class.forName("org.hsqldb.jdbcDriver");
        String url = "jdbc:hsqldb:mem:data/tutorial";
        return DriverManager.getConnection(url, "sa", "");
    }
}

JTA TRANSACTION MANAGER


Traditionally, it was the enterprise application server's (e.g. Wildfly, WebLogic) responsibility to provide a JTA-compliant transaction manager. Nowadays, there is also a great variety of stand-alone JTA transaction managers:

• Bitronix

• Atomikos
• RedHat Narayana
In this test, we are going to use Bitronix:
<bean id="jtaTransactionManager" factory-method="getTransactionManager"
class="bitronix.tm.TransactionManagerServices" depends-on="btmConfig, dataSource"

destroy-method="shutdown"/>

<bean id="transactionManager"
class="org.springframework.transaction.jta.JtaTransactionManager">
<property name="transactionManager" ref="jtaTransactionManager"/>


<property name="userTransaction" ref="jtaTransactionManager"/>

</bean>

Running the test initially fails with the following exception:

org.springframework.transaction.InvalidIsolationLevelException: JtaTransactionManager does


not support custom isolation levels by default - switch 'allowCustomIsolationLevels' to 'true'
So, let’s enable the custom isolation level setting and rerun the test:
<bean id="transactionManager"
class="org.springframework.transaction.jta.JtaTransactionManager">
<property name="transactionManager" ref="jtaTransactionManager"/>
<property name="userTransaction" ref="jtaTransactionManager"/>
<property name="allowCustomIsolationLevels" value="true"/>
</bean>

OUTPUT:
DEBUG [main]: c.v.s.i.StoreServiceImpl - Transaction isolation level is READ_COMMITTED

Even with this extra configuration, the transaction-scoped isolation level wasn’t propagated to the
underlying database connection, as this is the default JTA transaction manager behavior.
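One possible workaround, not shown in the original test, is to set the isolation level on the database session itself instead of through the JTA transaction definition; in MySQL, for example:

SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;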


PRACTICAL – 8
AIM: Implement Two Phase Commit Protocol.

Solution:
------------------------------------------PYTHON Programming---------------------------------------------------------
'''
Implementation of 2 Phase Commit as explained at Wikipedia:
https://en.wikipedia.org/wiki/Two-phase_commit_protocol
'''

import random, logging, time
from threading import Thread, Semaphore, Lock

_fmt = '%(user)s:%(levelname)s >>> %(message)s'
logging.basicConfig(format=_fmt)
LOG = logging.getLogger(__name__)
LOG.setLevel(logging.DEBUG)

MIN_ACCOUNT = 0
MAX_ACCOUNT = 100
NO_COHORTS = 2


class Coordinator(Thread):

    def __init__(self):
        Thread.__init__(self)
        self.start_sem = Semaphore(0)
        self.cohorts = []
        self.votes = []
        self.acks = []
        self._log_extra = dict(user='COORD')

    def yes(self):
        self.votes.append(True)

    def no(self):
        self.votes.append(False)

    def ack(self):
        self.acks.append(True)

    def start_voting(self, cohort):
        self.cohorts.append(cohort)
        self.start_sem.release()

    def run(self):
        # wait until every cohort has registered itself for voting
        for _ in range(NO_COHORTS):
            self.start_sem.acquire()

        ## Voting Phase:
        # 1. The coordinator sends a query to commit message to all cohorts and
        #    waits until it has received a reply from all cohorts.
        for cohort in self.cohorts:
            LOG.info('query_to_commit to {}'.format(cohort.uname), extra=self._log_extra)
            cohort.query_to_commit()

        ## Commit Phase:
        # If the coordinator received an agreement message from all cohorts
        # during the commit-request phase
        if all(self.votes):
            # 1. The coordinator sends a commit message to all the cohorts.
            LOG.info('Committing', extra=self._log_extra)
            for cohort in self.cohorts:
                cohort.commit()
        # If any cohort votes No during the commit-request phase (or the
        # coordinator's timeout expires)
        else:
            # 1. The coordinator sends a rollback message to all the cohorts.
            LOG.warning('Rolling back', extra=self._log_extra)
            for cohort in self.cohorts:
                cohort.rollback()

        if all(self.acks):
            LOG.info('END', extra=self._log_extra)
        else:
            LOG.error('KO something went wrong while receiving acks', extra=self._log_extra)

        for cohort in self.cohorts:
            cohort.end()


class Cohort(Thread):

    def __init__(self, uname, coord):
        Thread.__init__(self)
        self.uname = uname
        self.coord = coord
        self.do = None
        self.undo = None
        self.sem = Semaphore(0)
        self.lock = Lock()
        self.account = random.randint(MIN_ACCOUNT, MAX_ACCOUNT)
        self._log_extra = dict(user=uname)

    def query_to_commit(self):
        ## Voting phase:
        # 3. Each cohort replies with an agreement message (cohort votes Yes to
        #    commit), if the cohort's actions succeeded, or an abort message
        #    (cohort votes No, not to commit), if the cohort experiences a failure
        #    that will make it impossible to commit.
        if self.res:
            LOG.info('vote YES', extra=self._log_extra)
            self.coord.yes()
        else:
            LOG.info('vote NO', extra=self._log_extra)
            self.coord.no()

    def commit(self):
        self.commit = True

    def rollback(self):
        self.commit = False

    def end(self):
        self.sem.release()

    def run(self):
        LOG.debug('BEFORE {}'.format(self.account), extra=self._log_extra)

        # executing operation and saving result
        self.lock.acquire()

        ## Voting phase:
        # 2. The cohorts execute the transaction up to the point where they
        #    will be asked to commit. They each write an entry to their undo log
        #    and an entry to their redo log.
        for do in self.do:
            do()
        self.res = self.account >= MIN_ACCOUNT and self.account <= MAX_ACCOUNT
        self.coord.start_voting(self)

        LOG.debug('DURING {}'.format(self.account), extra=self._log_extra)

        # waiting for the end of the voting phase
        self.sem.acquire()

        ## Commit phase:
        if self.commit:
            # 2. Each cohort completes the operation ...
            LOG.info('commit', extra=self._log_extra)
        else:
            # 2. Each cohort undoes the transaction using the undo log ...
            for undo in self.undo:
                undo()
            LOG.info('rollback', extra=self._log_extra)

        # 2. ... and releases all the locks and resources held during the
        #    transaction.
        self.lock.release()

        # 3. Each cohort sends an acknowledgment to the coordinator.
        self.coord.ack()

        LOG.debug('AFTER {}'.format(self.account), extra=self._log_extra)


if __name__ == '__main__':
    coord = Coordinator()
    u1 = Cohort('user1', coord)
    u2 = Cohort('user2', coord)

    amount = random.randint(MIN_ACCOUNT, MAX_ACCOUNT)

    def u1_do():
        u1.account -= amount

    def u1_undo():
        u1.account += amount

    def u2_do():
        u2.account += amount

    def u2_undo():
        u2.account -= amount

    u1.do = [u1_do, ]
    u2.do = [u2_do, ]
    u1.undo = [u1_undo, ]
    u2.undo = [u2_undo, ]

    coord.start()
    u1.start()
    u2.start()

    u2.join()
    u1.join()
    coord.join()

OUTPUT:


PRACTICAL – 9
AIM: Case study on NoSQL.

Solution:

INTRODUCTION
NoSQL encompasses a wide variety of different database technologies that were developed in response to the demands presented in building modern applications:

• Developers are working with applications that create massive volumes of new, rapidly changing data types: structured, semi-structured, unstructured and polymorphic data.

• Long gone is the twelve-to-eighteen month waterfall development cycle. Now small teams work in agile sprints, iterating quickly and pushing code every week or two, some even multiple times every day.

• Applications that once served a finite audience are now delivered as services that must be always-on, accessible from many different devices and scaled globally to millions of users.

• Organizations are now turning to scale-out architectures using open source software, commodity servers and cloud computing instead of large monolithic servers and storage infrastructure.

Relational databases were not designed to cope with the scale and agility challenges that face modern applications, nor were they built to take advantage of the commodity storage and processing power available today.

NoSQL DATABASE TYPES

• Document databases pair each key with a complex data structure known as a document. Documents can contain many different key-value pairs, or key-array pairs, or even nested documents.

• Graph stores are used to store information about networks of data, such as social connections. Graph stores include Neo4J and Giraph.

• Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or 'key'), together with its value. Examples of key-value stores are Riak and Berkeley DB. Some key-value stores, such as Redis, allow each value to have a type, such as 'integer', which adds functionality.

• Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.


SCALABILITY
NoSQL databases were designed when cloud computing and server clusters were already the
de facto standard. So you'd think they'd be able to scale seamlessly.
In many respects, they can. But there are still scalability challenges. For example, not all
NoSQL databases are good at automating the process of sharding, which means spreading a
database across multiple nodes. If a database can't shard automatically, it can't scale up or
down automatically in response to fluctuating demand.
SQL databases are subject to these same sorts of problems. In many ways, in fact, SQL is even
worse at scaling than most NoSQL databases. Still, the fact that NoSQL is not completely
scalable in all situations constitutes another hurdle, especially as the DevOps revolution
makes the rest of the software stack more scalable than ever.

THE BENEFITS OF NOSQL

When compared to relational databases, NoSQL databases are more scalable and provide superior performance, and their data model addresses several issues that the relational model is not designed to address:

• Large volumes of rapidly changing structured, semi-structured, and unstructured data

• Agile sprints, quick schema iteration, and frequent code pushes

• Object-oriented programming that is easy to use and flexible

• Geographically distributed scale-out architecture instead of expensive, monolithic architecture


PRACTICAL – 10
AIM: Case study on Hadoop.

Solution:

INTRODUCTION
Hadoop is an open source distributed processing framework that manages data processing
and storage for big data applications running in clustered systems. It is at the center of a
growing ecosystem of big data technologies that are primarily used to support advanced
analytics initiatives, including predictive analytics, data mining and machine learning
applications. Hadoop can handle various forms of structured and unstructured data, giving
users more flexibility for collecting, processing and analyzing data than relational databases
and data warehouses provide.

Formally known as Apache Hadoop, the technology is developed as part of an open source
project within the Apache Software Foundation (ASF). Commercial distributions of Hadoop
are currently offered by four primary vendors of big data platforms: Amazon Web Services
(AWS), Cloudera, Hortonworks and MapR Technologies. In addition, Google, Microsoft and
other vendors offer cloud-based managed services that are built on top of Hadoop and
related technologies

HADOOP AND BIG DATA


Hadoop runs on clusters of commodity servers and can scale up to support thousands of
hardware nodes and massive amounts of data. It uses a namesake distributed file system
that's designed to provide rapid data access across the nodes in a cluster, plus fault-tolerant
capabilities so applications can continue to run if individual nodes fail. Consequently, Hadoop
became a foundational data management platform for big data analytics uses after it emerged
in the mid-2000s.

Hadoop was created by computer scientists Doug Cutting and Mike Cafarella, initially to
support processing in the Nutch open source search engine and web crawler. After Google
published technical papers detailing its Google File System (GFS) and MapReduce
programming framework in 2003 and 2004, respectively, Cutting and Cafarella modified earlier
technology plans and developed a Java-based MapReduce implementation and a file system
modeled on Google's.

In early 2006, those elements were split off from Nutch and became a separate Apache
subproject, which Cutting named Hadoop after his son's stuffed elephant. At the same time,
Cutting was hired by internet services company Yahoo, which became the first production user
of Hadoop later in 2006. (Cafarella, then a graduate student, went on to become a university
professor.)


Use of the framework grew over the next few years, and three independent Hadoop vendors
were founded: Cloudera in 2008, MapR a year later and Hortonworks as a Yahoo spinoff in
2011. In addition, AWS launched a Hadoop cloud service called Elastic MapReduce in 2009.
That was all before Apache released Hadoop 1.0.0, which became available in December 2011
after a succession of 0.x releases.

COMPONENTS OF HADOOP
The core components in the first iteration of Hadoop were MapReduce, the Hadoop
Distributed File System (HDFS) and Hadoop Common, a set of shared utilities and libraries. As
its name indicates, MapReduce uses map and reduce functions to split processing jobs into
multiple tasks that run at the cluster nodes where data is stored and then to combine what the
tasks produce into a coherent set of results. MapReduce initially functioned as both Hadoop's
processing engine and cluster resource manager, which tied HDFS directly to it and limited
users to running MapReduce batch applications.

That changed in Hadoop 2.0, which became generally available in October 2013 when version
2.2.0 was released. It introduced Apache Hadoop YARN, a new cluster resource management
and job scheduling technology that took over those functions from MapReduce. YARN -- short
for Yet Another Resource Negotiator but typically referred to by the acronym alone -- ended
the strict reliance on MapReduce and opened up Hadoop to other processing engines and
various applications besides batch jobs.

EVOLUTION OF THE HADOOP MARKET


In addition to AWS, Cloudera, Hortonworks and MapR, several other IT vendors -- most notably,
IBM, Intel and Pivotal (a Dell Technologies subsidiary) -- entered the Hadoop distribution
market. However, those three companies all later dropped out and aligned themselves with
one of the remaining vendors after failing to make much headway with Hadoop users. Intel
dropped its distribution and invested in Cloudera in 2014, while Pivotal and IBM agreed to
resell the Hortonworks version in 2016 and 2017, respectively.

Even the remaining vendors have hedged their bets on Hadoop itself by expanding their big
data platforms to also include Spark and numerous other technologies. Spark, which runs
both batch and real-time workloads, has ousted MapReduce in many batch applications and
can bypass HDFS to access data from Amazon Simple Storage Service (S3) in the AWS cloud --
a capability supported by Cloudera and Hortonworks, as well as AWS itself. In 2017, both
Cloudera and Hortonworks dropped the word Hadoop from the names of their rival
conferences for big data users.
