
Blockchain Meets Database: Design and Implementation of a Blockchain Relational Database

Senthil Nathan¹, Chander Govindarajan¹, Adarsh Saraf¹, Manish Sethi², and Praveen Jayachandran¹
¹IBM Research India, ²IBM Industry Platforms USA
¹(snatara7, chandg12, adasaraf, praveen.j)@in.ibm.com, ²manish.sethi1@ibm.com

ABSTRACT

In this paper, we design and implement the first-ever decentralized replicated relational database with blockchain properties, which we term a blockchain relational database. We highlight several similarities between features provided by blockchain platforms and a replicated relational database, although they are conceptually different, primarily in their trust model. Motivated by this, we leverage the rich features, decades of research and optimization, and available tooling in relational databases to build a blockchain relational database. We consider a permissioned blockchain model of known, but mutually distrustful, organizations each operating their own database instance that are replicas of one another. The replicas execute transactions independently and engage in decentralized consensus to determine the commit order for transactions. We design two approaches: the first, where the commit order for transactions is agreed upon prior to executing them, and the second, where transactions are executed without prior knowledge of the commit order while the ordering happens in parallel. We leverage serializable snapshot isolation (SSI) to guarantee that the replicas across nodes remain consistent and respect the ordering determined by consensus, and devise a new variant of SSI based on block height for the latter approach. We implement our system on PostgreSQL and present detailed performance experiments analyzing both approaches.

1. INTRODUCTION

Blockchain has gained immense popularity over recent years, with its application being actively explored in several industries. At its core, it is an immutable append-only log of cryptographically signed transactions that is replicated and managed in a decentralized fashion through distributed consensus among a set of untrusted parties. Opinion on the technology has varied widely, from being hailed as the biggest innovation since the Internet to being dismissed as just another database in disguise. For the first time, we undertake the challenge of explaining blockchain technology from the perspective of concepts well known in databases, and highlight the similarities and differences between the two.

Existing blockchain platforms [17, 51, 39, 14], in an attempt to build something radically new and transformative, rebuild a lot of features provided by databases and treat the database as just a store of information. We take a contrarian view in this paper. By leveraging the rich features and transactional processing capabilities already built into relational databases over decades of research and development, we demonstrate that we can build a blockchain relational database with all the features provided by existing blockchain platforms and with better support for complex data types, constraints, triggers, complex queries, and provenance queries. Furthermore, our approach makes building blockchain applications as easy as building database applications: developers can leverage the rich tooling available for backup, analytics, and integration with existing enterprise systems. Applications that have strong compliance and audit requirements and a need for rich analytical queries, such as those in financial services already built atop relational databases, or applications that rely on rich provenance information, such as those in supply chains, are likely to benefit most from the blockchain relational database proposed in this paper.

With a focus on enterprise blockchain applications, we target a permissioned setup of known but untrusted organizations, each operating their own independent database nodes connected together to form a blockchain network. These database nodes need to be replicas of each other but distrust one another. The replication of transaction logs and state among multiple database nodes is possible using master-slave [16, 25, 45, 50] and master-master [25, 48, 50] protocols. In these protocols, the master node is trusted, as it is the only node that executes transactions and propagates the final updates to other masters and slaves, either synchronously or asynchronously [28]. For synchronous replication in the master-master architecture, the serializable order is decided using techniques such as 2-phase commit [38], 3-phase commit [20], or a quorum protocol [44] to ensure consistency. Asynchronous replication can result in an intermediate inconsistent state between masters, and various solutions [42, 46] exist to reconcile the inconsistency. However, in a blockchain network, one node cannot trust the execution results of another node. Hence, all transactions need to be independently executed on all nodes, and such execution and the subsequent commits across multiple untrusted nodes must result in the same serializable order to ensure consistency. As a result, existing replication protocols cannot be used for blockchain networks, and we need a novel protocol with a notion of decentralized trust. Although crash fault-tolerant synchronous commit protocols [29, 49] exist, they involve multiple rounds of communication between database nodes per transaction or a centralized trusted controller. Applying synchronous commit per transaction on a blockchain network would be very costly, as nodes may be globally distributed, and a trusted controller cannot be used.

In this paper, we address this key challenge of ensuring that all the untrusted database nodes execute transactions independently and commit them in the same serializable order asynchronously. Towards achieving this, we make two key design choices. First, we modify the database to separate the concerns of concurrent transaction execution and decentralized ordering of blocks of transactions. We leverage ordering through consensus only to order blocks of transactions, and not for the serializable ordering of transactions within a single block. Second, independently at each node, we leverage serializable snapshot isolation (SSI) [23, 43] to execute transactions concurrently and then serially validate and commit each transaction in a block, to obtain a serializable order that will be the same across all untrusted nodes and respect the ordering of blocks obtained from consensus. Using these ideas, we present two novel approaches, the first where block ordering is performed before transaction execution and the second where transaction execution happens in parallel without prior knowledge of block ordering, and discuss their trade-offs. We leverage and modify serializable snapshot isolation [43] to create a novel variant that guarantees the same serializable ordering across all nodes. We implement our ideas on PostgreSQL and discuss several subtle challenges we had to overcome. We make the following contributions in this paper:

1. We devise two novel approaches to building a blockchain platform starting from a database, ensuring that a network of untrusted database nodes can remain consistent.

2. We devise a new variant of SSI based on block height to permit transaction execution to happen in parallel with ordering based on consensus.

3. We implement the system in PostgreSQL, requiring an addition of only around 4000 lines of C code, and present the modified architecture.

The rest of this paper is organized as follows. We highlight the properties required in a blockchain platform, motivate how several of these are either partially or fully available in relational databases, and identify shortcomings that need to be addressed in §2. We discuss our design in §3 and describe two approaches to transaction processing leveraging serializable snapshot isolation. We provide an account of our prototype implementation on PostgreSQL in §4. In §5, we present a detailed performance study. We discuss related work in §6, future work in §7, and conclude in §8.

2. MOTIVATION

Table 1 presents a comparison of properties required by a blockchain platform and features available in relational databases. We argue that there is much to be gained by leveraging and enhancing the features of relational databases to build a permissioned blockchain platform, rather than building one from scratch. This approach helps us leverage the decades of research in relational databases, stable enterprise-ready code with rich features, the large community, libraries, tools, and integrations with enterprise applications. Next, we compare blockchains and databases across eight features and describe the enhancements needed.

Table 1: Similarities between blockchain properties and relational database features.

| Blockchain Properties | Relational Database Features | Enhancement Needed |
|---|---|---|
| Smart contract | PL/SQL procedure | Deterministic execution |
| Authenticity, non-repudiability | User management with groups and roles | Crypto-based transaction authentication |
| Access control | Role & content-based ACL policies | None |
| Immutable transaction logs | Transaction logs | Digitally signed transactions |
| Consistent replicated ledger between untrusted nodes | Master-slave & master-master replication with trust on transaction ordering and update logs | Decentralized trust and transaction ordering determined by consensus |
| Serializable isolation level | Strict 2-phase locking, optimistic concurrency control, serializable snapshot isolation | Order must respect block ordering obtained through consensus |
| Async transaction, notification | LISTEN & NOTIFY commands | None |
| Provenance queries | Maintains all versions of a row | Enable query on historical records |

(1) Smart contract & transaction: Many blockchain platforms such as Hyperledger Fabric [17], Ethereum [51] and Corda [30] support smart contracts, which are functions that manage state on the blockchain ledger. Transactions are invocations of smart contract functions. Independent execution of the smart contract functions by different mutually distrustful peers needs to be deterministic. This is similar to stored PL/SQL procedures in relational databases. Both accept input arguments, retrieve data by executing queries, apply the predefined logic, and write data back to the database. However, the execution of a PL/SQL procedure may not be deterministic when executed independently on multiple nodes if utility functions such as random and timestamp are used. Hence, we need to constrain PL/SQL procedures to ensure that the execution is deterministic.

(2) User authenticity & non-repudiability: Permissioned blockchain systems employ public key infrastructure for user management and ensuring authenticity. Participating users belong to organizations that are permissioned to participate in the blockchain network. Transactions are digitally signed by the invoker and recorded on the blockchain, making them non-repudiable. Relational databases have sophisticated user management capabilities including support for users, roles, groups, and various user authentication options. However, submitted and recorded transactions are not signed by the invoker, making them repudiable. Hence, we need the invoker's signature on each transaction.

(3) Confidentiality & access control: Some permissioned blockchains support the notion of confidentiality, where smart contracts, transactions and data are only accessible by authorized users. In addition, access to invoke functions and to modify data is restricted to specific users. Relational databases have comprehensive support for access control on tables, rows, columns and PL/SQL procedures. Some relational databases even provide features like content-based access control, encryption and isolation of data. Advanced confidentiality features, where only a subset of the nodes have access to a particular data element or smart contract, can be implemented, although we don't consider that in this paper.

(4) Immutability of transactions and ledger state: The blockchain ledger is an append-only log of blocks containing sets of transactions that are chained together by adding the hash of each block to its succeeding block. This makes it difficult for an attacker to tamper with a block and avoid detection. The ledger state can only be modified by invoking smart contract functions recorded as transactions on the blockchain, and is immutable otherwise. Although a database logs all submitted transactions, executed queries, and the user information, it isn't possible to detect changes to the log as there are no digital signatures.

(5) Consistent replicated ledger across distrustful nodes: All non-faulty peers in a blockchain must maintain a consistent replica of all transactions and the ledger state. This is ensured by all peers committing transactions in the same sequential order as agreed through consensus. Note that consensus is performed on blocks of transactions rather than on individual transactions to be more performant. Databases support replication of transaction logs and state among nodes using master-slave [16, 25, 45, 50] and master-master [25, 48, 50] protocols. In these protocols, the master node is trusted as it is the only node that executes transactions and propagates the final updates to other masters and slaves. Hence, we need to propose a replication protocol with a notion of decentralized trust, while ensuring that the replicas remain consistent.

(6) Serializable isolation, ACID: Blockchain transactions require serializable isolation, in which dirty reads, non-repeatable reads, phantom reads, and serialization anomalies are not possible. Transactions, when executed in parallel, must follow the same serializable order across all nodes. Further, transactions should be ACID [22] compliant. Serializable isolation can be achieved in databases by employing (i) strict 2-phase locking [38], (ii) optimistic concurrency control [32], or (iii) SSI [23, 43]. This needs to be enhanced to follow the block order as determined through consensus, but can be leveraged to a large extent.

(7) Asynchronous transactions: As transaction processing and consensus may involve a non-trivial delay, clients submit transactions asynchronously and then leverage notification mechanisms to learn whether their transaction was successfully committed. Databases provide support for notification channels and triggers that applications can leverage.

(8) Provenance: The auditable append-only log of transactions in blockchains can be harnessed as a provenance store, for several use cases including supply chain tracking and financial compliance. However, most blockchain platforms today do not support complex queries on historic data or are not optimized for provenance queries. In certain relational databases that support Multi-Version Concurrency Control [19], such as snapshot isolation [18], all versions of a row are maintained and old rows are purged on demand or periodically. However, a SQL query cannot access the old rows. For provenance queries, we need to disable purging and enable queries on the old rows.

3. DESIGN OF A BLOCKCHAIN DATABASE

We identify two approaches to achieve our goal of building a consistent replicated ledger across untrusted nodes starting from a relational database. The first approach, order-then-execute, orders all the transactions through a consensus service and then nodes execute them concurrently, whereas in the second approach, execute-order-in-parallel, transaction execution happens on nodes without prior knowledge of ordering while block ordering happens in parallel through a consensus service. Intuitively, while the first approach is simpler and requires fewer modifications to the relational database, the second approach has the potential to achieve better performance. We design and implement both approaches to study their trade-offs.

We first describe the key components of our system in §3.1. We provide background on serializable snapshot isolation in §3.2 and show later in §3.4 that SSI, if directly applied, does not guarantee serializability and consistency across nodes for the execute-order-in-parallel approach. Figure 1 juxtaposes the two proposed approaches, and we describe them in detail in §3.3 and §3.4, including a novel SSI based on block height technique. We discuss security properties of the proposed approaches in §3.5. We describe a mechanism for peer node recovery in §3.6 and then discuss how a permissioned blockchain network can be bootstrapped in §3.7.

3.1 Key Components

We consider a permissioned network of organizations that are known to one another but may be mutually distrustful. The network is private to the participating organizations, and a new organization must be permissioned to join the network. Each organization may include clients, database peer nodes and ordering service nodes, which we describe below, that together form the decentralized network.

Client: Each organization has an administrator responsible for onboarding client users onto the network. The administrator and each client have a digital certificate registered with all the database peers in the system, which they use to digitally sign and submit transactions on the network (we describe the network setup process in §3.7). This helps support the authenticity, non-repudiability and access control properties. Clients may also listen on a notification channel to receive transaction status.

Database Peer Node: An organization may run one or more database nodes in the network. All communication to send and receive transactions and blocks happens via a secure communication protocol such as TLS. Each node also has a cryptographic identity (i.e., public key) and all communications are signed and authenticated cryptographically. Each database node maintains its own replica of the ledger as database files, independently executes smart contracts as stored procedures, and validates and commits blocks of transactions formed by the ordering service.

Ordering Service: Consensus is required to ensure that the untrusted database nodes agree on an ordering of blocks of transactions. In order to leverage the rich literature on consensus algorithms with different trust models, such as crash fault tolerant (e.g., Raft [40], Paxos [34], Zab [31]) and byzantine fault tolerant (e.g., PBFT [24], XFT [37]) consensus, we make the ordering service in our system pluggable and agnostic to the database implementation. The ordering service consists of consensus or orderer nodes, each owned by a different organization. Each orderer node, similar to database nodes, has its own digital certificate or identity. The output of consensus yields a block of transactions, which is then atomically broadcast to all the database nodes. A block consists of (a) a sequence number, (b) a set of transactions, (c) metadata associated with the consensus protocol, (d) the hash of the previous block, (e) the hash of the current block, i.e., hash(a, b, c, d); and (f) a digital signature on the hash of the current block by the orderer node.
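To make the block layout concrete, the following C sketch mirrors the fields (a)-(f) listed above. The struct layout, field names, and digest sizes are illustrative assumptions, not the exact wire format of our implementation.

```c
/* Illustrative sketch of the block layout; names and sizes are assumptions. */
#include <stdint.h>

#define HASH_LEN 32              /* e.g., SHA-256 digest size */
#define SIG_LEN  64              /* orderer signature size    */

typedef struct Transaction Transaction;    /* defined elsewhere */

typedef struct Block {
    uint64_t     seq_no;                   /* (a) sequence number             */
    uint32_t     num_txs;                  /* (b) set of transactions         */
    Transaction *txs;
    uint8_t     *consensus_meta;           /* (c) consensus protocol metadata */
    uint32_t     meta_len;
    uint8_t      prev_hash[HASH_LEN];      /* (d) hash of the previous block  */
    uint8_t      block_hash[HASH_LEN];     /* (e) hash over (a, b, c, d)      */
    uint8_t      orderer_sig[SIG_LEN];     /* (f) signature on block_hash     */
} Block;
```

Because each block embeds the hash of its predecessor, modifying any block changes its hash and invalidates the signature and chaining of every subsequent block, which is what makes tampering evident.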
3.2 Serializable Snapshot Isolation (SSI)

Strict 2-phase locking (S2PL), optimistic concurrency control (OCC), and SSI are approaches to achieve serializability. As SSI offers greater concurrency than S2PL and OCC, we have chosen SSI in our design to ensure serializability. SSI achieves serializability using a modified snapshot isolation (SI) technique.

Snapshot Isolation and Anomalies. In SI, each transaction reads from a consistent snapshot of the database comprising the last committed values that existed at the time the transaction started. Although SI prevents dirty reads, non-repeatable reads, and phantom reads, it cannot guarantee serializability due to SI anomalies [18], which violate consistency (i.e., the C in ACID), specifically integrity constraints (refer to [18, 23, 43] for examples). Hence, Cahill et al. [23] proposed SSI to detect and resolve anomalies automatically to ensure serializability.

Background on SI Anomalies. To detect SI anomalies, Adya et al. [15] proposed the Multi-Version Serialization history Graph (MVSG). This graph contains a node per transaction, and an edge from transaction T1 to transaction T2 if T1 must have preceded T2 in the apparent serial order of execution. Three types of dependencies can create these edges:

• rw-dependency: if T1 writes a version of some object, and T2 reads the previous version of that object, then T1 appears to have executed after T2, because T2 did not see the update. To represent this, an edge from T2 to T1 with label rw needs to be added. As we will see, these rw-conflicts are central to SSI. Further, a dependency can also be caused by predicate reads.

• wr-dependency: if T1 writes a version of an object, and T2 reads that version, then T1 appears to have executed before T2 (an edge from T1 to T2 with label wr).

• ww-dependency: if T1 writes a version of some object, and T2 replaces that version with the next version, then T1 appears to have executed before T2 (an edge from T1 to T2 with label ww).

[Figure 2: SI anomalies between 2 and 3 transactions, showing the possible cycles of rw- and wr-dependency edges among T1, T2, and T3.]

If a cycle is present in the graph, then the execution does not correspond to any serial order, i.e., an SI anomaly has caused a serializability violation. Otherwise, the serial order can be determined using a topological sort.
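The following minimal C sketch illustrates the MVSG bookkeeping just described: one node per transaction, a directed edge per dependency, and a depth-first search for cycles. The fixed-size adjacency matrix and function names are illustrative assumptions.

```c
/* Minimal MVSG sketch: edge[a][b] records that a must precede b. */
#include <stdbool.h>

#define MAX_TX 1024

static bool edge[MAX_TX][MAX_TX];

/* wr/ww: the writer precedes the reader/overwriter, so add edge[t1][t2];
 * rw: the reader precedes the writer, so the caller adds edge[t2][t1]. */
static void add_edge(int from, int to) { edge[from][to] = true; }

/* Depth-first search; an edge back to a transaction on the current path
 * is a cycle, i.e., an SI anomaly that violates serializability. */
static bool cycle_from(int tx, int n, bool on_path[], bool done[])
{
    on_path[tx] = true;
    for (int next = 0; next < n; next++) {
        if (!edge[tx][next])
            continue;
        if (on_path[next] || (!done[next] && cycle_from(next, n, on_path, done)))
            return true;
    }
    on_path[tx] = false;
    done[tx] = true;
    return false;
}
```

When no cycle is found, a topological sort over the same edges yields the apparent serial order mentioned above.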
In SI, a wr- or ww-dependency from T1 to T2 denotes that T1 must have committed before T2 began. In practice, writes acquire an exclusive lock to prevent a ww-dependency between two concurrent transactions. A wr-dependency also cannot occur between two concurrent transactions. In contrast, only rw-dependencies occurring between concurrent transactions are relevant when studying serialization anomalies.

Adya et al. [15] observed that every cycle in a serialization graph contains at least two rw-dependency edges. Fekete et al. [26] subsequently found that two such edges must be adjacent. Figure 2(a) shows the only possible cycle with two transactions, and Figures 2(b) and (c) show the two possible cycles with three transactions. If any of the transactions is aborted, a serializable order can be achieved.

SSI - Resolving Anomalies. SSI automatically detects and resolves anomalies. As tracking rw- & wr-dependencies and subsequently detecting cycles is costly, SSI applies heuristics that are anomaly-safe but could result in false positives. They are:

(1) abort immediately: Cahill et al. [23] used two flags per transaction T: (a) inConflict, set when there is a rw-dependency from a concurrent transaction to T; and (b) outConflict, set when there is a rw-dependency from T to a concurrent transaction. As soon as both of these flags are set for T, which could lead to an SI anomaly, T is aborted.

(2) abort during commit: Ports et al. [43] maintained two lists per transaction T: (a) inConflictList, the list of concurrent transactions from which there is a rw-dependency to T; and (b) outConflictList, the list of concurrent transactions to which T has a rw-dependency. The transactions present in the inConflictList of T are called nearConflicts. The transactions present in the inConflictList of each nearConflict are called farConflicts. For example, in Figure 2(b), for T2, T1 is a nearConflict and T3 is a farConflict. Recall that a rw-dependency occurs only between concurrently executing transactions (such as in Figures 2(a) and (b)). For each pair of nearConflict and farConflict, if both transactions are not yet committed, this heuristic aborts the nearConflict so that an immediate retry can succeed. In contrast, a wr-dependency only occurs between a committed and a just-commenced transaction (Figure 2(c)). In this case, only if T3 has committed first is its nearConflict T2 aborted; otherwise, no transactions are aborted. In other words, while the heuristic does not track wr-dependencies, it accounts for their possibility and aborts a transaction whose outConflict has committed.
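A sketch of the abort during commit bookkeeping, assuming simplified transaction structs; following the two rules in the prose above, the victim selected is always the nearConflict.

```c
/* Sketch of the "abort during commit" heuristic; the Tx struct and
 * list representation are simplified assumptions. */
#include <stdbool.h>
#include <stddef.h>

typedef struct Tx {
    struct Tx **inConflictList;  /* concurrent txs with a rw-dependency to us */
    size_t      nIn;
    bool        committed;
} Tx;

/* Called when transaction t reaches its commit point: returns the
 * nearConflict to abort, or NULL if t can commit safely. */
static Tx *abort_during_commit_victim(Tx *t)
{
    for (size_t i = 0; i < t->nIn; i++) {
        Tx *nearc = t->inConflictList[i];          /* nearConflict */
        for (size_t j = 0; j < nearc->nIn; j++) {
            Tx *farc = nearc->inConflictList[j];   /* farConflict  */
            if (!nearc->committed && !farc->committed)
                return nearc;  /* both pending: abort the nearConflict      */
            if (farc->committed)
                return nearc;  /* farConflict committed first: abort nearC. */
        }
    }
    return NULL;
}
```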
In the next two sections, we describe the transaction flows for (1) Order then Execute and (2) Execute and Order in Parallel, as shown in Figure 1. In both flows, transactions are committed asynchronously across nodes.

[Figure 1: Proposed transaction flows: (a) order-then-execute and (b) execute-and-order-in-parallel. Each panel depicts the interaction between a client, the database nodes of each organization, and the ordering service across the execution, ordering, committing, and checkpointing phases.]

3.3 Order then Execute

A transaction submitted by a client in the order-then-execute approach comprises (a) a unique identifier, (b) the username of the client, (c) the PL/SQL procedure execution command with the name of the procedure and arguments, and (d) a digital signature on hash(a, b, c) using the client's private key. The transaction flow consists of four pipelined phases: ordering, execution, committing, and checkpointing, wherein a submitted transaction needs to first be ordered, then executed, and finally committed before a checkpoint is recorded.

3.3.1 Ordering Phase

Clients submit transactions directly to any one of the ordering service nodes (it is possible for a peer to act as a proxy for the client to submit transactions to the ordering service, as long as the client trusts the peer not to drop its transactions). On a periodic timeout (say every 1 second), the ordering service nodes initiate a consensus protocol among themselves to construct a block of transactions. After constructing a block, the ordering service delivers it to all nodes in the network via atomic broadcast. In Figure 1(a), steps 2 and 3 denote this phase.

3.3.2 Execution Phase

On receiving a block of transactions, each database node verifies whether the received block is in sequence and was sent by the ordering service. On successful verification, the node appends the block to a block store maintained in the local file system. In parallel, the node retrieves unprocessed blocks one at a time, in the order of their block sequence number, from the block store and performs the following four operations:

1. The database node assigns a thread per transaction (provided that the identifier present in the transaction is unique) to authenticate and execute it. In an append-only ledger table, it records each transaction in a block. This ledger table is used for recovery as explained in §3.6 and also for provenance queries as demonstrated in §4.2.

2. Each thread retrieves the public key associated with the username in the transaction to verify the user's digital signature. On successful authentication, the username is set for the thread's session, which is needed for access control. We leverage the database's access control capabilities without modification.

3. Each thread executes the PL/SQL procedure with the passed arguments as per the transaction. To ensure that the transactions are executed on the same committed state on all nodes, all valid transactions in a block are executed concurrently using the abort during commit SSI variant. This helps ensure that the set of transactions marked to be committed is the same, and that they follow the same serializable order, across all nodes.

4. Once a thread completes execution of a transaction, the transaction is ready to either commit or abort (as per the procedure's execution logic), but waits without proceeding, because proceeding could result in a different commit order on different nodes if the execution completion order differs (which could result in different aborts by SSI on each node).

Only after committing a block of transactions does the execution phase process the next block. In Figure 1(a), step 4 denotes the execution phase.

3.3.3 Committing Phase

To ensure that the commit order is the same on all nodes, the order in which transactions get committed is the order in which they appear in the block. Only when all valid transactions are executed and ready to be either committed or aborted does the node serially notify one thread at a time to proceed further. Every transaction applies the abort during commit approach to determine whether to commit, and only then does the next transaction enter the commit stage. While it is possible to apply SSI for all transactions at once to obtain a serializable order, we defer such optimizations for simplicity. The node records the transaction status in the ledger table and emits an event via a notification channel to inform the client. In Figure 1(a), step 5 denotes the committing phase.

There is one additional challenge. In SSI, a ww-dependency is handled using an exclusive lock. For example, if T1 and T2 are competing to write an object, only one transaction can acquire the lock (say T1) and the other (say T2) must wait for the lock to be released, which can happen only after the commit/abort of T1. However, in our system, to ensure consistency across all nodes, the committing phase cannot start unless all transactions complete execution, so we cannot use locks for ww-dependencies. Instead, as the ordering between transactions is determined by consensus and fixed across all nodes, we leverage this to permit both transactions to write to different copies of the object, but subsequently commit only the first as determined by the ordering. We show how such an implementation is possible in §4.

It is noteworthy that certain existing blockchain platforms such as Ethereum also follow an order-then-execute approach. However, they execute transactions sequentially in the order they appear in the block to ensure serializability and consistency across nodes, which affects performance. In contrast, leveraging SSI to execute transactions concurrently and then sequentially committing them as per the ordering from consensus enables us to optimize performance.

3.3.4 Checkpointing Phase

Once all transactions in a block are processed, each node computes the hash of the write set, which is the union of all changes made to the database by the block, and submits it to the ordering service as a proof of execution and commit. When a node receives subsequent blocks, it receives the hash of the write set computed by other nodes. The hash computed by all non-faulty nodes would be the same, and the node then proceeds to record a checkpoint (note that this is different from write-ahead logging checkpointing). It is not necessary to record a checkpoint every block. Instead, the hash of write sets can be computed for a preconfigured number of blocks and then sent to the ordering service. In Figure 1(a), steps 6, 7, and 8 denote the checkpointing phase.
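The checkpoint hash can be computed by folding every row change of a block into a single digest. The sketch below uses OpenSSL's SHA-256 as an example; the RowChange layout is an assumption, and the changes must be fed in a deterministic (commit) order so that all non-faulty nodes derive the same hash.

```c
/* Sketch of the per-block write-set hash; layout is an assumption. */
#include <openssl/sha.h>
#include <stddef.h>
#include <stdint.h>

typedef struct RowChange {
    const uint8_t *data;    /* serialized row version (insert or delete) */
    size_t         len;
} RowChange;

void block_write_set_hash(const RowChange *changes, size_t n,
                          uint64_t block_no,
                          uint8_t out[SHA256_DIGEST_LENGTH])
{
    SHA256_CTX ctx;
    SHA256_Init(&ctx);
    SHA256_Update(&ctx, &block_no, sizeof block_no);
    for (size_t i = 0; i < n; i++)          /* changes in commit order */
        SHA256_Update(&ctx, changes[i].data, changes[i].len);
    SHA256_Final(out, &ctx);
}
```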
3.4 Execute and Order in Parallel

A transaction submitted by a client in the execute-order-in-parallel approach comprises (a) the username of the client, (b) the PL/SQL procedure execution command with the name of the procedure and arguments, (c) a block number, (d) a unique identifier computed as hash(a, b, c), and (e) a digital signature on hash(a, b, c, d) using the client's private key. The transaction flow consists of four phases: execution, ordering, committing, and checkpointing. A submitted transaction is executed by the database nodes and, in parallel, ordered by the ordering nodes and placed in a block. This is followed by the commit and checkpoint phases. We describe each phase in detail below.

3.4.1 Execution Phase

Clients submit transactions directly to one of the database nodes. When a node receives a transaction, it assigns a thread to authenticate, forward, and execute the transaction. On successful authentication (same as in the order-then-execute approach), the transaction is forwarded to the other database nodes and the ordering service in the background. The four steps described in §3.3.2 for the execution phase apply for the execute-order-in-parallel approach as well, except for the default SSI and the logic to handle duplicate transaction identifiers. This is shown in steps 1-3 in Figure 1(b). Unlike the order-then-execute approach, where execution of transactions happens after ordering and on the committed state from the previous block, in the execute-order-in-parallel approach we endeavor to execute and order in parallel. To ensure that the transaction executes on the same committed data on all nodes, the submitted transaction includes the block height at which it should be executed (the client can obtain this from the peer it is connected with). We propose SSI based on block height, as depicted in Figure 3, because the default SSI cannot ensure that transactions are executed on the same committed state on all nodes, as each node can be at a different block height depending on its processing speed.

[Figure 3: Logical database storage to enable snapshot isolation based on block height. Each block height is associated with the database state as of that block; for example, a transaction with snapshot-height 1 can access only the database state at block height 1.]

SSI Based on Block Height: In this isolation level, each row of a table contains a creator block number and a deleter block number, which denote the block that created and deleted this row, respectively (irrespective of the transaction within the block that created or deleted it). Note that the creator and deleter block numbers are part of the implementation for the order-then-execute approach as well, but are used only for provenance queries and are not required for SSI. During a transaction's execution, based on the block number specified, the transaction can view the database state comprising all commits up to this block height, as shown in Figure 3. We refer to this specified block number as the transaction's snapshot-height. In other words, a transaction sees all rows with creator less than or equal to its snapshot-height; for all such rows, the deleter must be either empty or greater than the snapshot-height. For this SSI to work, we need to maintain all versions of a row: every update must be a flagging of the old row and an insertion of the new row, and every delete must be a marking of the deleter. This facilitates the transaction being executed on the same committed data on all nodes.
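This visibility rule reduces to a simple predicate over the creator and deleter block numbers. A minimal sketch, assuming a sentinel value for "not deleted":

```c
/* Row visibility under SSI based on block height; DELETER_EMPTY is an
 * assumed sentinel meaning "not deleted". */
#include <stdbool.h>
#include <stdint.h>

#define DELETER_EMPTY UINT64_MAX

bool row_visible(uint64_t creator, uint64_t deleter,
                 uint64_t snapshot_height)
{
    if (creator > snapshot_height)
        return false;                      /* created after the snapshot */
    return deleter == DELETER_EMPTY || deleter > snapshot_height;
}
```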
However, the current block (current-block) being processed for commit by a node might be either lower or higher than the specified snapshot-height of a transaction. If the current-block is lower, the transaction starts executing only once the node completes processing all blocks and transactions up to the specified snapshot-height. If the current-block is higher, the transaction is executed immediately, but the serializability requirement could be violated because of a phantom or stale data read, which needs to be detected and handled. For example, assume that a transaction is updating all rows in a table that satisfy a given predicate mentioned in a WHERE clause. There is a possibility that a row satisfying the predicate was committed by a block with a number greater than the specified snapshot-height and less than or equal to current-block. In this scenario, a phantom read [18] has occurred, violating serializability. Similarly, a transaction can read a row from a snapshot as of snapshot-height, but if that row was either updated or deleted by a subsequent block, it would result in a stale read for this transaction.

In order to ensure serializability, the proposed SSI approach detects such phantom reads and stale data reads and aborts the corresponding transaction. To detect and abort such a transaction, the proposed approach applies row visibility logic on the committed state:

1. when a row with creator greater than the specified snapshot-height satisfies the given predicate, abort the transaction, provided that the deleter is empty (this handles the phantom read);

2. when a row with creator less than the snapshot-height satisfies the given predicate, abort the transaction, provided that the deleter is non-empty and greater than the snapshot-height (this handles the stale read).

Further, concurrent transactions that are going to be committed during or after current-block can also create a phantom read or a stale read problem. Such cases are tracked by our proposed SSI as described in the committing phase.
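Both abort rules above can be expressed as one check applied to each committed row version that matches the transaction's read predicate. A sketch, with the same "not deleted" sentinel assumption as before:

```c
/* Phantom/stale-read detection; row layout is an assumption. */
#include <stdbool.h>
#include <stdint.h>

#define DELETER_EMPTY UINT64_MAX

typedef struct RowVersion {
    uint64_t creator;   /* block that created this row version */
    uint64_t deleter;   /* block that deleted it, if any       */
} RowVersion;

/* `row` matches the transaction's read predicate in the committed state;
 * returns true if the transaction must abort. */
bool must_abort(const RowVersion *row, uint64_t snapshot_height)
{
    /* rule 1: phantom read, a live row created after snapshot-height */
    if (row->creator > snapshot_height && row->deleter == DELETER_EMPTY)
        return true;

    /* rule 2: stale read, a visible row deleted by a later block */
    if (row->creator <= snapshot_height &&
        row->deleter != DELETER_EMPTY && row->deleter > snapshot_height)
        return true;

    return false;
}
```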
3.4.2 Ordering Phase

Database nodes submit transactions to the ordering service, unlike the order-then-execute approach. Note that transactions are being executed on the database nodes while they are being ordered by the ordering service. The rest of the logic is the same as explained in §3.3.1 for the order-then-execute approach. In Figure 1(b), steps 4 and 5 denote the ordering phase.

3.4.3 Committing Phase

Similar to the order-then-execute approach, an important pre-condition for entering the commit phase is that all transactions in a block must have completed their execution and be waiting to proceed with commit/abort. However, there are two key differences compared to the commit phase of order-then-execute. First, after receiving a block, if all transactions are not running (this occurs if the node has not received a transaction from another peer due to maliciousness or network delay), the committer starts executing all missing transactions and waits for their completion before proceeding to the committing phase. Second, unlike the previous approach, it is possible for concurrent transactions to be executing at different snapshot heights (as specified by the respective clients). Further, transactions that are concurrent on one node may not be concurrent on another, but we need the set of transactions that are decided to be committed to be the same on all nodes. As a result, we do not support blind updates such as UPDATE table SET column = value; which might result in a lock for a ww-dependency on only a subset of nodes. Note, ww-dependencies are handled in the same way as described in §3.3.3. Further, instead of employing the abort during commit variant of SSI, we propose a block-aware abort during commit variant of SSI, described next.

SSI variant: block-aware abort during commit. Table 2 presents the abort rules for the proposed SSI variant. In addition to the nearConflict and farConflict used in abort during commit, our variant considers two additional parameters:

1. whether the nearConflict and farConflict are in the same block as T;

2. if they are in the same block, which among them is earlier as per the ordering.

Table 2: Abort rule for our proposed SSI variants when transaction T is committing.

| nearConflict in same block | farConflict in same block | To commit first among conflicts | Abort |
|---|---|---|---|
| ✓ | ✓ | nearConflict | farConflict |
| ✓ | ✓ | farConflict | nearConflict |
| ✓ | ✗ | nearConflict | farConflict |
| ✗ | ✓ | farConflict | nearConflict |
| ✗ | ✗ | - | nearConflict |
| ✗ | none | - | nearConflict |

When either or both of the conflicts are in the same block as transaction T, it is straightforward to abort the one that comes later in the ordering, and this is deterministic on all nodes. The tricky case is when neither of the conflicts is in the same block. In this case, we abort the nearConflict transaction. Note, the nearConflict is not in the same block but executes concurrently with T; this means that the snapshot-height specified for both could be less than or equal to the current block height at the node. With no synchronization on transaction executions between nodes, it is possible for an SI anomaly to occur at only a subset of nodes. To ensure consistency between nodes, we need to ensure that the same set of transactions is aborted on all nodes. Let us consider the possible scenarios on other nodes (say T2 is the nearConflict transaction):

1. If T2 is concurrent with T and an anomaly structure is detected, then T2 is aborted as per our heuristic.

2. If T commits before T2 starts execution, then T2, being a nearConflict for T, read a stale value and would be aborted as discussed earlier.

3. If T2 is concurrent with T, but T is committed first, then this is a case of a rw-dependency where the outConflict has already committed, leading to an anomaly structure (similar to Figure 2(c)), and T2 will be aborted in this case as well.

Even if there is no farConflict, the nearConflict would get aborted (if it is not in the same block as T), as it could result in a stale read at only a subset of nodes, whereas on other nodes it might result in a rw-dependency without an anomaly structure. Hence, we must abort the nearConflict irrespective of the presence of a farConflict, whenever the nearConflict is not in the same block.
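The rules of Table 2 can be summarized as a small decision function. The following sketch is one possible encoding; the Conflict struct and the way in-block ordering is exposed are assumptions.

```c
/* One possible encoding of the Table 2 decision; names are assumptions. */
#include <stdbool.h>
#include <stdint.h>

typedef enum { ABORT_NONE, ABORT_NEAR, ABORT_FAR } AbortChoice;

typedef struct Conflict {
    bool     exists;
    bool     in_same_block;   /* in the same block as committing T? */
    uint32_t order_in_block;  /* commit position; valid if in block */
} Conflict;

AbortChoice block_aware_abort(Conflict nearc, Conflict farc)
{
    if (!nearc.exists)
        return ABORT_NONE;

    if (nearc.in_same_block && farc.exists && farc.in_same_block)
        /* both ordered by consensus: abort whichever commits later */
        return nearc.order_in_block < farc.order_in_block ? ABORT_FAR
                                                          : ABORT_NEAR;

    if (nearc.in_same_block)
        /* nearConflict is ordered; an out-of-block farConflict is later */
        return farc.exists ? ABORT_FAR : ABORT_NONE;

    /* nearConflict outside the block: abort it regardless of farConflict */
    return ABORT_NEAR;
}
```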
The unique identifier used for a transaction must be the hash of (a) the username of the client, (b) the PL/SQL procedure execution command with the name of the procedure and arguments, and (c) the block number specified for the SSI by the client. The reason is that if two different transactions are submitted to two different nodes with the same unique identifier, whichever transaction executes first on a given node is the one that would succeed, whereas the other would fail due to the duplicate identifier. As this can result in an inconsistent state across nodes, the unique identifier is composed of the three fields in the transaction to ensure that no two different transactions have the same identifier.

3.4.4 Checkpointing Phase

This phase is the same as the one explained in §3.3.4.

3.5 Discussion on Security Properties

(1) Submission of invalid transactions. A malicious client can potentially submit a large number of invalid transactions (e.g., ones that will eventually fail due to stale reads, or that try to perform operations they do not have access for) in an attempt to launch a denial-of-service attack. This can be thwarted in one of two ways. First, similar to permissionless blockchain networks, a transaction fee could be levied for each transaction using a native currency (it is possible to implement a native currency in our system if desired). Second, by monitoring clients and their behavior on the network, it is possible to exclude them from participation. We leave such enhancements for future work as they do not affect the core design of our system.

(2) Obscuration of valid transactions. In the order-then-execute approach, when a malicious orderer node receives a transaction from the user, it might not include the transaction in a block. Similarly, in the execute-order-in-parallel approach, when a malicious database node receives a transaction from the user, it might not forward the transaction to other database and orderer nodes. In both scenarios, on the client side, the transaction request would time out. The client can then submit the same transaction to some other orderer or database node, depending on the approach. Note that even if the client-side timeout was a false alarm (i.e., the transaction was forwarded, included in a block, and executed), the resubmission of the same transaction does not affect data consistency, as all nodes check for the unique identifier included in the transaction before execution.

In the execute-order-in-parallel approach, if the database node forwards the transaction to an orderer node but not to other database nodes, the request would eventually be added to a block and delivered to all database nodes. The default committing phase described in §3.4 takes care of executing missing transactions while processing the block.

(3) Withholding of a block or transaction commit. In both approaches, when a malicious database node receives a block, it might skip committing the block or a particular transaction in that block. In such a scenario, the hash computed during the checkpointing phase would differ from that of other nodes, and it would become evident during the checkpointing process that the malicious node did not commit the block correctly. As other organizations can detect such maliciousness, there is no incentive for the malicious organization to engage in such behavior. Since database nodes validate and commit blocks independently of one another, a malicious database node cannot hamper the liveness of the network.

(4) Byzantine ordering nodes. A malicious ordering node could send an incorrect block to the database nodes connected to it. If an organization's database node trusts the ordering node it operates, but the ordering node is malicious, then that database node would also become faulty. This scenario would then be similar to the previous scenario of the database node being malicious. If a database node does not trust the ordering node, it should obtain blocks from 2f + 1 ordering nodes (assuming the consensus algorithm tolerates f failures) to ensure that it obtains the correct block after consensus.

(5) Tampering of user data. In both approaches, the user can submit a read query to a single database node to fetch stored data. A malicious database node can tamper with the data and return an incorrect result to the user. The following are two ways to detect such malicious behavior. The user can submit the query to multiple database nodes and verify whether the results are the same. Otherwise, any of the existing query authentication methods [35, 36, 41, 52, 53] can be used to verify the integrity of query results. Further, when stored data is tampered with, it would eventually be identified through the checkpointing phase.

(6) Tampering of the blockchain. Each database node stores all blocks in a separate store called the block store. If a malicious database node tampers with its block store, it will not affect the replicas maintained at other organizations' nodes. In order to tamper with the block store and not be detected, the database node would need the private cryptographic keys of the orderer node as well as of the client who submitted the transaction, in order to forge their signatures. Even if the malicious node achieves this, if the majority of the nodes are non-malicious, honest organizations can prove that the malicious organization has tampered with its block store by comparing the replicated chain when a conflict occurs. A blockchain can be successfully tampered with only if a majority (51%) of the nodes are malicious.

3.6 Recovery After a Failure

A blockchain network is required to tolerate node failures, and we need to ensure that a failed node recovers to a consistent state when it restarts. In both the order-then-execute and execute-order-in-parallel approaches, the block processing stage has the following two common operations per block: (1) atomically store all transaction information in the ledger table along with the block number; (2) atomically store all transactions' statuses (i.e., commit/abort) in the ledger table. Note, only after all transactions are written to the write-ahead log and the default transaction logs does step 2 get executed. A node can fail during either of the above two stages. When the node comes back online, it executes the following operations to recover to a consistent state:

1. It retrieves the last processed block number from the ledger table and checks whether all transactions in the ledger table have a status. Note, as we store all transactions' statuses atomically, either all transactions must have a status or none.

2. If a status is found in the ledger table for the transactions, it means that the block was committed successfully and no recovery is needed. Note, if a node fails after committing a transaction, during restart the default recovery mechanism that uses write-ahead logging (WAL) takes care of disk consistency.

3. If a status is not found in the ledger table for the transactions, the node might have (a) failed after committing all those transactions but before updating the ledger table with the status, or (b) failed before committing all or some of those transactions. If a transaction was successfully committed, the relational database would have recorded its status in the default transaction logs. The node first checks the status of each transaction by reading the default transaction logs. If a status is present for all transactions, the node updates the ledger table with the statuses (i.e., case (a)).

4. If a status is not present in the transaction log for all or some transactions (i.e., case (b)), the node needs to roll back all other committed transactions in the same block. Then the node can start re-executing all transactions as described in §3.3.2 and §3.4.1, commit them, and record the statuses in the ledger table. The rollback of committed transactions is required because we need to execute all transactions in a block in parallel using SSI at the same time to get a result consistent with other nodes.

By the time a node restarts and recovers after a failure, a few more blocks may have been added in the network. When the node receives a block from the ordering service, it would notice a gap in the sequence numbers due to the missed blocks. To ensure consistency, the node then retrieves any missing blocks, and processes and commits them one by one. We observe that it is possible to do a single atomic commit for all transactions in a block after validating them serially. This can simplify the recovery process described above, but we have omitted describing it in the interest of simplicity.
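The recovery steps above can be summarized in the following sketch; the storage-layer helpers are hypothetical placeholders, not an actual API of our implementation.

```c
/* Sketch of the restart procedure; all extern helpers are hypothetical. */
#include <stdbool.h>
#include <stdint.h>

extern uint64_t ledger_last_block(void);
extern bool     ledger_block_has_status(uint64_t blk);           /* step 1 */
extern bool     txlog_all_committed(uint64_t blk);               /* step 3 */
extern void     ledger_record_status_from_txlog(uint64_t blk);
extern void     rollback_block(uint64_t blk);                    /* step 4 */
extern void     reexecute_and_commit_block(uint64_t blk);

void recover_node(void)
{
    uint64_t blk = ledger_last_block();

    if (ledger_block_has_status(blk))
        return;          /* step 2: fully committed; WAL covers disk state */

    if (txlog_all_committed(blk)) {
        /* case (a): committed but ledger status missing; record it */
        ledger_record_status_from_txlog(blk);
    } else {
        /* case (b): roll the block back and re-execute all of its
         * transactions concurrently under SSI (see §3.3.2 / §3.4.1) */
        rollback_block(blk);
        reexecute_and_commit_block(blk);
    }
}
```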
3.7 Network Bootstrapping

To bootstrap a permissioned network, we first need to bring up the database and orderer nodes, each with their respective admin users and secure communication via TLS between them. We then need to create users & roles for each organization, deploy PL/SQL procedures as smart contracts, and define access control policies.

Setting up database nodes and ordering service. At network startup, each organization shares data consisting of TLS certificates, domain names of database and orderer nodes, and credentials of admin users (i.e., public keys) with all other organizations. Each organization then starts its database and orderer nodes with the credentials of all organizations' admin users.

Support for blockchain and non-blockchain schemas. As part of database node startup, the node creates a default database with two schemas named blockchain and non-blockchain (the latter being optional, for any private data held by the organization). Any transaction submitted to the blockchain schema is treated as a blockchain transaction and must follow the transaction flow described in §3.3 or §3.4. In the blockchain schema, both users and admins can execute only PL/SQL procedures and individual SELECT statements. Note, individual SELECT statements are not recorded on the blockchain and can only be read-only operations. Other individual DML & DDL statements are not allowed to be executed directly, but only through PL/SQL procedures. On the non-blockchain schema, transactions are executed using the default single-node transaction flow implemented in a relational database. The non-blockchain schema in a database node is accessible only by the organization that owns that node. Users of other organizations can query or submit transactions only on the blockchain schema of the database node, and not on the non-blockchain schema. Users of an organization can execute reports or analytical queries combining the blockchain and non-blockchain schemas on the node owned by their organization.

Setting up of PL/SQL procedures. To facilitate the deployment of smart contracts, each node exposes the following system smart contracts. These contracts are created in the blockchain schema during database node startup, can only be invoked by organization admins, are considered blockchain transactions, and follow the transaction flow described earlier.

1. create_deployTx() creates, replaces or drops a smart contract. The argument to this procedure can be a CREATE FUNCTION statement (to create a new smart contract), CREATE OR REPLACE FUNCTION (to update an existing smart contract), or DROP FUNCTION (to delete an existing smart contract). On a successful execution, this procedure creates a record in the deployment table with the passed argument, but does not yet execute it to CREATE/REPLACE/DROP.

2. submit_deployTx() executes the SQL statement present in the deployment table after verifying that an admin from each organization has approved the deployment transaction. If not all organizations have approved, the invocation returns an error. If a smart contract is updated, any uncommitted transactions that executed on an older version of the contract are aborted.

3. approve_deployTx() approves a deployment transaction by adding a digital signature provided by the organization's admin.

4. reject_deployTx() rejects a deployment transaction by adding a digital signature provided by the organization's admin and a reason for the rejection.

5. comment_deployTx() adds a comment to a deployment transaction. When an admin wants to suggest a change to a smart contract present in a deployment transaction, this system contract can be used.

Further, each node creates three more system smart contracts to create, delete, and update users with cryptographic credentials. As invocations of all system smart contracts are recorded as blockchain transactions, the network holds an immutable transaction log and a history of approvals from organizations before deploying smart contracts. For access control policies, the network cannot rely on a system contract, as it is unaware of the tables to be created and the privileges intended for each user. Hence, access control policies need to be embedded within the smart contract itself. Once users' smart contracts are created successfully using the above system contracts, applications can start invoking them (a hypothetical client driving this workflow is sketched below).
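To illustrate how an admin application might drive this deployment workflow, the following hypothetical libpq client invokes the system contracts above. The SELECT-based invocation style and the deployment-transaction-id argument (1) are assumptions; only the contract names come from the design.

```c
/* Hypothetical admin client for the deployment workflow, via libpq. */
#include <libpq-fe.h>
#include <stdio.h>

static void invoke(PGconn *conn, const char *sql)
{
    PGresult *res = PQexec(conn, sql);
    if (PQresultStatus(res) != PGRES_TUPLES_OK &&
        PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "invocation failed: %s", PQerrorMessage(conn));
    PQclear(res);
}

int main(void)
{
    PGconn *conn = PQconnectdb("host=db1.org1.example dbname=blockchain");
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* one organization's admin proposes a contract ... */
    invoke(conn, "SELECT create_deployTx("
                 "'CREATE FUNCTION transfer(...) RETURNS void AS $$ ... $$ "
                 "LANGUAGE plpgsql')");
    /* ... every organization's admin approves it ... */
    invoke(conn, "SELECT approve_deployTx(1)");
    /* ... and it is then submitted, executing the stored DDL */
    invoke(conn, "SELECT submit_deployTx(1)");

    PQfinish(conn);
    return 0;
}
```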
application Denotes modified or newly added module
(2) Middleware: Forward transaction to other
(1) [Transaction{uuid, queries}blk#, sig] nodes and ordering service
libpq jdbc (6) Middleware: On receiving the block, verify
odbc
(2) Execution Phase. Verify user signature. Invoke
fork orderer signature and store in pgBlockstore
postgres backend Middleware to forward transaction.
postgres Middleware fork BlockProcessor
exec_simple_query(query) (3) Execute transaction using proposed SSI. Wait for fork
(server) 1. Parse/Analyze
2. Plan Crypto signal from BlockProcessor before either commit/abort.
3. Execute Modified SSI (8) Commit Phase. Apply the proposed SSI
XLOG: WAL CLOG vaccum Lock Statistics Semaphore
page buffer
BlockProcessor
background writer TxWriteSet TxOrder TxMetadata Metadata
Postgres Node 1 Shared Memory
storage
Middleware pgCerts pgBlockstore pgLedger (5) Block
application (2) Transaction{uuid, queries} blk#, sig
(5) Block (7) BlockProcessor: Signal each backend
libpq jdbc odbc
(4) Ordering Phase: Ordering Service serially as per transaction order in block. Wait
Postgres Node 2 Construct & send the block (solo, kafka, pbft)
for all to commit/abort. Update pgLedger.

Figure 4: Transaction Flow in PostgreSQL for execute-order-in-parallel

4.1 PostgreSQL Background

PostgreSQL supports three isolation levels—read committed, repeatable read (i.e., SI), and serializable (i.e., SI with detection and mitigation of anomalies). A snapshot comprises the set of transaction IDs that were committed as of the start of this transaction and whose effects are therefore visible. Each row has two additional elements in its header, namely xmin and xmax, which are the IDs of the transactions that created and deleted the row, respectively. Note, every update to a row is a delete followed by an insert (both in the table and the index). Deleted rows are flagged by setting xmax rather than being physically removed. In other words, PostgreSQL maintains all versions of a row, unlike implementations such as Oracle that update rows in place and keep a rollback log. This is ideal for our goal of building a blockchain that maintains all versions of data. For each row, a snapshot checks xmin and xmax to see which of these transaction IDs are included in the snapshot, and thereby determines row visibility.
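Both header fields are exposed as hidden system columns, so their effect can be observed directly. For instance (a minimal illustration; invoices is a hypothetical application table):

  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  -- xmin and xmax are system columns PostgreSQL maintains for every
  -- row version; they are returned only when selected explicitly.
  SELECT xmin, xmax, * FROM invoices;
  -- A row version with a non-zero xmax was deleted (or superseded by
  -- an update) by the transaction with that ID; the old version is
  -- retained rather than removed.
  COMMIT;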
Clients connect to the PostgreSQL server, postgres, using an application interface such as libpq [10]. A backend process [9] is then assigned to each client connection to execute its queries. PostgreSQL supports background workers [8] (additional processes other than backends) to perform other activities such as logical replication and statistics collection. Note, the postgres server, the backends, and the background workers coordinate and share data through shared memory regions. Both the shared memory data structures and the background workers are easily extensible.

4.2 New Components Introduced

Communication Middleware & Block Processor. We introduced two new background workers: (1) the communication middleware, which communicates with the other nodes and the orderer to transfer and receive transactions and blocks. Received blocks are stored in an append-only file named pgBlockstore. Further, the middleware is responsible for starting a transaction using libpq in the order-then-execute flow, and for starting any missing transactions in the execute-order-in-parallel flow; (2) the block processor, which processes a block by executing the commit phase described in §3.3.3 and §3.4.3.

Shared Memory Data Structures. We introduced the following four data structures in shared memory (the last two are used only for the execute-order-in-parallel flow):

1. TxMetadata enables communication and synchronization between the block processor and the backends executing transactions (as needed in the commit phase). The block processor uses this data structure to check whether all transactions have completed execution and to signal each backend to proceed further. Each entry consists of the global transaction identifier (as present in the transaction), the transaction ID assigned locally by the node, the process ID of the backend, a semaphore denoting whether the transaction has completed its execution and is waiting for a signal to proceed further, and a final status (i.e., commit/abort).

2. BlockProcessorMetadata helps in signaling the block processor from a backend once it commits or aborts a transaction. Additionally, this data structure holds the last committed block number, the current block number (so that a backend can use it to set the creator & deleter block numbers in rows), and a semaphore enabling signaling from the middleware to the block processor.

3. TxWriteSet holds the write-set of each transaction, so that after SSI validation and before writing to the WAL, the backend can store the creator or deleter in each row. Note, TxWriteSet keeps only a pointer to the row in the buffer. For an update, it stores both the old and the new row pointers; for a delete, only the old row pointer. Further, TxWriteSet is used to compute the hash of the write-set for each block.

4. TxOrder helps a backend apply our proposed block-aware abort during commit variant of SSI. It stores the global transaction identifier of each transaction in the block, as per the commit order, to aid in finding whether a nearConflict or farConflict is present in the same block.

Blockchain Related Catalog Tables. We introduced two system catalog tables, pgLedger and pgCerts. The pgLedger table is the ledger table described in §3.6; it stores information about each transaction, such as the global identifier, the local transaction ID, the query passed in the transaction, the client who submitted the transaction, and the commit/abort status. It is used for recovery and for supporting provenance queries. The pgCerts table stores the cryptographic credentials of all blockchain users.
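As a rough illustration, pgLedger could be declared as follows. This is a sketch only: the column set is inferred from the description above and from the queries in Table 3, and the exact names and types are assumptions.

  CREATE TABLE pgLedger (
      blockNumber bigint,      -- block in which the transaction was ordered
      txid        xid,         -- transaction ID assigned locally by this node
      uuid        text,        -- global transaction identifier
      "user"      text,        -- blockchain user who submitted the transaction
      query       text,        -- submitted PL/SQL invocation
      status      text,        -- final status: commit or abort
      commitTime  timestamptz  -- commit time, as used in Table 3
  );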

Table 3: Example provenance queries to audit user transactions.

Audit scenario 1: get all invoice details from table invoices which were updated by a supplier S between blocks 100 and 300.

  SELECT invoices.* FROM invoices, pgLedger
  WHERE pgLedger.blockNumber BETWEEN 100 AND 300
    AND pgLedger.user = S
    AND invoices.xmax = pgLedger.txid
    AND invoices.deleter IS NULL;

Audit scenario 2: get all historical details of an invoice from table invoices whose invoiceID (primary key) is k and which was updated by either supplier S or manufacturer M in the last 24 hours.

  SELECT invoices.* FROM invoices, pgLedger
  WHERE invoiceID = k
    AND pgLedger.user IN (S, M)
    AND pgLedger.commitTime > now() - interval '24 hours'
    AND invoices.xmax = pgLedger.txid
    AND invoices.deleter IS NULL;

Provenance Query. We introduced a special type of read-only query called provenance. This query type can see all committed rows present in tables, irrespective of whether a row is inactive (i.e., marked with xmax) or active. As it can access all historical content, it enables support for very complex analytical and audit queries with the help of pgLedger. Table 3 presents two examples of provenance queries.

4.3 Components Modified

Application Interface & Deterministic PL/SQL Procedures. We have enhanced the default application interface, i.e., libpq, with additional APIs to submit blockchain transactions and provenance queries, and to fetch the latest block height at the node. To make PL/SQL procedures deterministic, we have restricted the usage of the date/time library, random functions from the mathematics library, sequence manipulation functions, and system information functions. Further, SELECT statements must specify ORDER BY primary_key when using LIMIT or FETCH. Additionally, procedures cannot use row headers such as xmin and xmax in a WHERE clause.
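A smart contract obeying these restrictions could look as follows; this is an illustrative sketch only, with hypothetical table, column, and parameter names, not one of the contracts used in our evaluation.

  -- Deterministic PL/pgSQL contract: no date/time, random, sequence,
  -- or system-information calls; the LIMITed read is ordered by the
  -- primary key; predicates are assumed to be served by indexes.
  CREATE FUNCTION settle_oldest_invoice(supplier text) RETURNS void AS $$
  DECLARE
      target int;
  BEGIN
      SELECT invoiceID INTO target
      FROM invoices
      WHERE supplierID = supplier AND settled = false
      ORDER BY invoiceID       -- primary key, so the pick is deterministic
      LIMIT 1;

      IF target IS NOT NULL THEN
          UPDATE invoices SET settled = true WHERE invoiceID = target;
      END IF;
  END;
  $$ LANGUAGE plpgsql;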
SSI Based on Block Height. We have added two fields to each row: (i) the creator block number and (ii) the deleter block number. During commit, these two fields are filled in for entries in the TxWriteSet, depending on whether an entry is an update, insert, or delete. SI applies a row visibility logic using xmin and xmax to identify whether a row should be seen by a transaction (as described in §4.1). We enhance the row visibility logic with additional conditions that compare the row's creator and deleter block numbers against the snapshot-height of the transaction (as described in §3.4.1). This is in addition to the default row visibility logic, which helps avoid seeing rows updated or deleted by concurrent transactions. During predicate reads, the default visibility logic in PostgreSQL traverses rows as per the index entries that satisfy the given predicate, or traverses the whole table when an index is missing. For our approach to work (mainly to avoid the phantom and stale reads described in §3.4.1), all read access is enforced to happen via index entries that satisfy a given predicate clause. Otherwise, there is a high possibility of transactions getting aborted due to false-positive phantom or stale reads. Hence, if an index is not available for the given predicate clause, nodes abort the transaction. Further, SELECT * FROM table_name; is not allowed from PL/SQL procedures, as it always traverses the whole table. It is possible to implement our approach without the need for an index, but for simplicity we defer such optimizations. Note, individual SELECT statements without any writes on the blockchain schema are not affected by SSI, as such a transaction would be marked read-only and would be executed on one node only.
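Conceptually, the enhanced check behaves as if every row access carried the following additional predicate, where creator_block and deleter_block stand for the two new row fields and H for the snapshot-height of the reading transaction (a sketch of the intended semantics, not the actual C implementation):

  -- A row version is visible at snapshot height H only if it was
  -- created in a block at or below H and not yet deleted as of H.
  creator_block <= H
  AND (deleter_block IS NULL OR deleter_block > H)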
SSI Block-Aware Abort During Commit & ww-dependency. For the execute-order-in-parallel approach, we have modified the abort rule to follow our proposed rules in Table 2. The modified SSI utilizes the TxOrder data structure in shared memory. For ww-conflicts, we allow writes to the same object by different transactions (as updates are in any case not in-place in PostgreSQL) by maintaining an array of xmax values comprising the transaction IDs of all competing transactions on the row being updated. During commit, for each old row entry in the TxWriteSet, the backend (corresponding to the transaction that is committing now) checks the xmax values and marks all other competing transactions for abort, since only one transaction can write to the row if lost updates are to be avoided. Finally, it retains only its own transaction ID in the xmax field.

4.4 Ordering Service

As described in §3.1, any consensus algorithm can be leveraged. For this work we use two different ordering services: a crash fault-tolerant one based on Apache Kafka [1] and ZooKeeper [2], and a byzantine fault-tolerant one based on BFT-SMaRt [21]. Clients/peers connect to independent orderer nodes to submit transactions and receive created blocks. Orderer nodes either (1) connect to a Kafka cluster and publish all received transactions to a Kafka topic, which delivers the transactions in a FIFO order, or (2) submit the transactions to a BFT-SMaRt cluster, which is then responsible for totally ordering the received transactions. For creating a block of transactions, we use two parameters: block size, the maximum number of transactions in a block, and block timeout, the maximum time since the first transaction to appear in a block was received. Each orderer node either publishes a time-to-cut message to the Kafka topic or sends it to the BFT-SMaRt cluster when its timer expires. The first time-to-cut message triggers the cutting of a block, and all duplicates are ignored. Once a block is cut, orderer nodes append their signatures to the block, persist the block in the filesystem, and then send it to the connected peers.
4.5 Transaction Flow

Figure 4 depicts the new and modified components described in the previous sections and outlines the execute-order-in-parallel transaction flow. Since the order-then-execute flow is simpler, we omit its details in the interest of brevity. The application leverages the blockchain interfaces in libpq to fetch the latest block height and submit the transaction to the PostgreSQL backend. After verifying the client signature leveraging pgCerts, the transaction is forwarded to the other PostgreSQL nodes and to the ordering service using the communication middleware (the client could also submit the transaction to all peers directly, rather than one peer forwarding it to all other peers). The backend updates TxMetadata, executes the transaction leveraging the SSI variant based on block-aware abort during commit, and collects the write-set into TxWriteSet. It then sets a ready-to-proceed status in TxMetadata and waits for a signal from the block processor.

Upon receiving the block from the ordering service, the middleware verifies the orderer signature and stores the block in the pgBlockstore. The block processor retrieves each unprocessed block from the pgBlockstore one at a time and adds all of its transactions to pgLedger. It confirms from TxMetadata that all transactions have completed execution, and then serially signals each backend, as per the order in the block, to proceed further. Upon receiving this signal, the backend validates the transaction based on the block-aware abort during commit logic explained in §3.4.3, sets either commit or abort status in TxMetadata, and signals the block processor. Once all transactions have been processed, the block processor updates the pgLedger table with the status of each transaction. We are yet to implement the checkpoint flow described in §3.4.4.

[Figure 5: Performance with simple contract]

Table 4: Order then execute: micro metrics for an arrival rate of 2100 tps (bs: block size; brr, bpr: blocks per second; bpt, bet, bct, tet: ms; su: system utilization).

  bs    brr     bpr     bpt     bet    bct    tet   su
  10    209.7   163.5   6       5      1      0.2   98.1%
  100   20.9    17.9    55.4    47     8.3    0.2   99.1%
  500   4.2     3.5     285.4   245    44.3   0.4   99.7%
5. EVALUATION

In this section, we study the performance and cost of both our design approaches. We measure performance in terms of throughput and latency. Throughput is defined as the average number of unique transactions committed per second in the blockchain network, and latency is defined as the average time taken to commit a transaction, measured from the time a client submits it. A transaction is said to be committed in the blockchain network when a majority of the nodes commit it. Cost is measured in terms of resource utilization, such as CPU, memory, disk, and network bandwidth utilization. We study the effect on performance and cost of varying several system parameters, namely (a) the block size, (b) the transaction arrival rate, (c) the smart contract query complexity, (d) the deployment model (local/wide area network), and (e) the number of database nodes (network size).

We use three smart contracts: (1) simple contract—inserts values into a table; (2) complex-join contract—performs complex joins between two tables to execute aggregate operations and writes the result to a third table; (3) complex-group contract—performs aggregate queries over subgroups in a group and uses order by and limit to write the max aggregate across the subgroups into a table. We illustrate these queries in Appendix A. Note that while the simple contract can be implemented in most blockchain platforms today, both the complex-join contract and the complex-group contract are impossible to implement efficiently.

We consider two deployment models: the first, where all organizations host their database nodes in a common public cloud data center (LAN), and the second, where nodes are hosted independently, leading to a multi/hybrid-cloud setup (WAN). In our experiments, the WAN setup involved four data centers spread across four continents. These models help us study the communication overheads of the two approaches. In both deployments, database and orderer nodes were hosted on virtual machines, each with 32 vCPUs of Intel Xeon E5-2683 v3 @ 2.00GHz and 64 GB of memory. In the multi-cloud deployment, the network bandwidth between nodes was between 50 and 60 Mbps, whereas it was 5 Gbps in the single-cloud deployment.
In addition to throughput and latency, we also measure the following seven micro metrics (all as averages) to gain a deeper understanding of the system behavior:

1. the block receive rate (brr)—the number of blocks received per second at the middleware from the orderer;

2. the block processing rate (bpr)—the number of blocks processed and committed per second at the block processor;

3. the block processing time (bpt, in ms)—the time taken to process and commit a block;

4. the block execution time (bet, in ms)—the time taken to start all transactions in a block until they suspend for commit/abort;

5. the transaction execution time (tet, in ms)—the time taken by the backend to execute a transaction until it suspends for commit/abort;

6. the block commit time (bct, in ms)—the time taken to perform the serial commit of all transactions in a block, which can be measured as bpt − bet;

7. the missing transactions (mt)—the number of transactions missing per second while processing blocks at the block processor (relevant for execute-order-in-parallel).

We define two additional terms: (1) the peak throughput, the maximum achieved throughput for a given smart contract type and block size across all possible arrival rates; and (2) the system utilization (su), the fraction of time the block processor is busy (bpb), expressed as a percentage. bpb can be estimated as the average number of blocks processed in 1s (bpr) multiplied by the time taken to process each block (bpt); for example, for a block size of 100 in Table 4, su ≈ bpr × bpt = 17.9 × 55.4 ms ≈ 99.1%. In an ideal scenario, when peak throughput is achieved, the system utilization should be close to 100%.

Unless mentioned otherwise, our experiments use a single data center with three organizations, each running one database node and one orderer node. At the orderer, the block timeout was set to 1s and the block size was varied. We used pgTune [7] to configure PostgreSQL, and max_connection (i.e., the number of backends) was set to 2600. The arrival rate was load balanced among the orderer nodes in the case of order-then-execute, and among the database nodes in the case of execute-order-in-parallel.
[Figure 6: Performance with complex-join contract]

[Figure 7: Performance with complex-group contract]

Table 5: Execute and order in parallel: micro metrics for an arrival rate of 2400 tps (bs: block size; brr, bpr: blocks per second; bpt, bet, bct, tet: ms; mt: missing transactions per second; su: system utilization).

  bs    brr      bpr      bpt      bet     bct     tet    mt    su
  10    232.26   232.26   3.86     2.05    1.81    0.58   479   89%
  100   24.00    24.00    35.26    18.57   16.69   3.08   519   84%
  500   4.83     4.83     149.64   50.83   98.81   6.27   230   72%
5.1 Block Size and Arrival Rate

Order then Execute. Figure 5(a) plots the throughput and latency achieved in the order-then-execute approach using the simple contract. Table 4 presents the micro metrics for an arrival rate of 2100 transactions per second (tps). With an increase in the transaction arrival rate, the throughput increased linearly, as expected, until it flattened out at around 1800 tps, which was the peak throughput (with system utilization close to 100%, as shown in Table 4). When the arrival rate was close to or above the peak throughput, the latency increased significantly, from the order of 100s of milliseconds to 10s of seconds. For an arrival rate lower than the peak throughput, the latency increased with an increase in the block size. The reason is that with a larger block size, it took longer for the orderer to fill a block with transactions, leading to an increased block creation time and, hence, a longer waiting time for each transaction at the orderer. For an arrival rate greater than the peak throughput, the latency decreased with an increase in block size. This is because there was no waiting for transactions to arrive and form a block, and more transactions were executed in parallel. This is also observable in that the block processing time (bpt) of a block of size n was always less than the sum of the bpt of m blocks, each of size n/m. For the same reason, the throughput also increased with an increase in the block size.

Execute and Order in Parallel. Figure 5(b) plots the same for the execute-and-order-in-parallel approach. Table 5 presents the micro metrics for an arrival rate of 2400 tps. The maximum throughput achieved was 2700 tps, i.e., 1.5× higher than what was achieved with order-then-execute. Note that the system utilization (su) was only 90%. When the arrival rate was greater than 2700 tps, the throughput started to degrade and su never reached 100%. We believe the reason to be the large number of active backends, which resulted in heavy contention on the default shared memory data structures (we intend to study this further as part of future work). Though the block processing time (bpt) and the block execution time (bet) were observed to be lower with execute-and-order-in-parallel than with order-then-execute, the block commit time (bct) was observed to be higher. This could again be because of the large number of active backends.

Comparison With Ethereum's Order then Execute. Blockchain platforms such as Ethereum also adopt an order-then-execute approach, but execute transactions serially once the block is formed. To emulate this behavior, we made our block processor also execute and commit transactions one at a time. This resulted in a peak throughput of 800 tps (for a block size of 100, although the block size does not matter when we execute serially). This is only about 40% of the throughput achieved with our approach, which supports parallel execution of transactions leveraging SSI.

5.2 Smart Contract Complexity

Order then Execute. Figure 6(a) plots the peak throughput and micro metrics achieved when using the complex-join contract with order-then-execute. As observed earlier, with an increase in block size, the throughput increased and reached a peak throughput of 400 tps. This was less than 25% of what was observed with the simple contract, primarily because the transaction execution time (tet) increased by about 160× as compared to the simple contract. The block processing time (bpt) and the block execution time (bet) also increased. The CPU and memory utilization for the simple contract were 10% and 15 GB respectively, compared to 30% and 15 GB for the complex-join contract.

Execute and Order in Parallel. Figure 6(b) plots the same for execute-order-in-parallel. Both the block processing time (bpt) and the block execution time (bet) were lower than those observed in order-then-execute. This is because, by the time the block reaches the block processor, all transactions were either executing or had already completed execution. The peak throughput achieved was more than twice that of order-then-execute. Unlike the order-then-execute approach, this approach permits concurrent execution of a larger number of transactions, unrestricted by the block size (in order-then-execute, the maximum number of concurrently executing transactions is capped by the block size). This manifested as a significantly larger increase in the transaction execution time (tet) compared to order-then-execute. For the same reason, the CPU utilization increased to 100%, i.e., 3.3× higher than what was observed for order-then-execute.

With the third smart contract, complex-group, for a block size of 100, the maximum throughput achieved with order-then-execute and execute-order-in-parallel was 1.75× and 1.6× higher, respectively, than what was observed with the complex-join contract. We plot the corresponding graphs in Figure 7.

5.3 Deployment Model and Network Size

Deployment Model. Figure 8(a) plots the peak throughput achieved with both approaches in a multi-cloud deployment. As compared to a single-cloud deployment (refer to Figure 6), only the latency increased, on average by 100 ms, due to the WAN; the throughput remained the same for the most part, except for a 4% reduction in the peak throughput when the block size was 100.
[Figure 8: Performance (a) with complex contract in a multi-cloud deployment; (b) with different network sizes.]
In our system, each transaction was only 196 bytes in size, and hence the size of a block with 500 transactions was only about 100 KB. Hence, the lower bandwidth in the multi-cloud deployment did not have a significant impact on performance, indicating that our system can work equally well in a geographically distributed environment.

Network Size. Figure 8(b) plots the throughput achieved with the Kafka-based and BFT-based ordering services while varying the number of orderer nodes and fixing the transaction arrival rate at 3000 tps. With an increase in the number of orderer nodes, we observed no impact on the performance of the Kafka-based ordering service, but the throughput achieved with the BFT-based ordering service reduced from 3000 tps to 650 tps, as expected, due to the communication overhead. With an increase in the number of database nodes alone, the overall system throughput was unaffected, but it remained limited by the peak throughput of the ordering service.
6. RELATED WORK

Bitcoin [39] and Ethereum [51] adopt an order-execute model, where transactions are first ordered in a block through consensus (such as proof-of-work), and each node validates and commits transactions to its copy of the blockchain by serially executing each transaction. In our order-then-execute approach, we leverage SSI to execute transactions concurrently. Further, such platforms only support a simple key-value data model.

Hyperledger Fabric [17] adopts an execute-then-order approach, where transactions are first executed and endorsed by multiple nodes, then ordered by consensus, followed by serial validation and commit. In contrast, in our execute-order-in-parallel approach, transaction execution and ordering happen in parallel. Fabric supports only LevelDB and CouchDB as the underlying database, with support for composite keys and range queries. In contrast, we support the full spectrum of SQL queries in a smart contract (for the first time), with the exception of libraries that could introduce non-determinism, and of blind updates. Further, by leveraging capabilities already available within a relational database, we provide support for complex data types, schemas, constraints, and provenance queries. Performance studies on Fabric [17, 47] have shown that a throughput of 3000 tps can be achieved with goLevelDB as the state database for a simple smart contract, while the throughput drops to 700 tps with CouchDB.

Hyperledger Composer [4] is a set of collaboration tools for building blockchain business networks utilizing Hyperledger Fabric. It allows a user to write a basic SQL query with limited syntax [5] and internally converts the SQL query to a JSON selector query [3] supported by CouchDB.

Corda [30] refers to its platform as a decentralized database. Transactions are executed by one node at a time (not in parallel by all nodes) and the results are shared with other nodes that have a need to know. There is no notion of a blockchain in Corda. However, optionally, a notary could order transactions and validate them for double spending. Corda uses the H2 embedded SQL engine as the state database. State objects can define a relational mapping, and an object serialization framework is used to store the states in a relational database. This permits querying the database using SQL, and it also permits rich queries (such as joins) with an organization's private non-blockchain data. However, it does not enable rich query support within the smart contract itself.

Veritas [27] proposes shared verifiable tables using a set of Redis key-value stores, each owned by an organization. Only the verifiability property, i.e., immutable logs, is supported. For ensuring consistency across replicas, it uses a centralized trusted timestamp server to order transactions. Further, a transaction is executed only on one of the nodes, and each node periodically ships logs of multiple read-write sets to other nodes via a Kafka-based broadcast service. Nodes are trusted to execute and share results correctly with other nodes, and hence the system is not decentralized. Nodes vote on transactions to resolve any conflicts.

BigchainDB [14] employs Tendermint consensus [33, 12] over a set of independent MongoDB [6] instances, each owned by a different organization. It supports immutability and decentralization. While the overall goals of BigchainDB are similar to ours, there are fundamental architectural differences. It supports only user tokens/assets similar to Bitcoin, with no support for smart contracts. A transaction submitted to BigchainDB is forwarded to Tendermint nodes, which ensure that all the BigchainDB nodes agree on the next block in a byzantine fault-tolerant way. Transactions are serially executed post ordering, similar to Ethereum.

It is evident from the related work that ours is the first blockchain relational database with rich features of both blockchains and databases. Further, the execute-and-order-in-parallel approach proposed in this paper is highly novel, and we believe it is also applicable to centralized replicated databases.

7. DISCUSSION AND FUTURE WORK

In ongoing work, we are studying the performance of both our design approaches in the presence of rw/ww dependencies with two setups: (a) a homogeneous network, so that all nodes would be approximately at the same current-block; (b) a heterogeneous network, which could lead to a significant gap in the current-block of different nodes over time.

As stated in §3.3.3 and §3.6, there is potential to optimize the current implementation by (a) executing the serialization check using all transactions' inConflictList and outConflictList at once in a separate background process, instead of executing it at each backend serially, and (b) committing the whole block atomically rather than one transaction at a time. Optimizations are also possible in our communication layer by leveraging gossip protocols (for a larger number of nodes) and by batching transaction replication. We expect the performance to improve with these optimizations.

Over time, the amount of historical data in the network would grow significantly. Hence, we need to enhance existing pruning tools such as VACUUM [13] to remove rows based on fields such as the creator and deleter block numbers.
14
existing pruning tool such as vacuum [13] to remove rows G. Laventman, Y. Manevich, S. Muralidharan,
based on fields such as creator, deleter. As privacy is C. Murthy, B. Nguyen, M. Sethi, G. Singh, K. Smith,
an important consideration for blockchain solutions, there A. Sorniotti, C. Stathakopoulou, M. Vukolić, S. W.
is potential to enhance relational database features such as Cocco, and J. Yellick. Hyperledger fabric: A
user management to provide stronger privacy guarantees. distributed operating system for permissioned
Data privacy can be achieved using database schemas and blockchains. In Proceedings of the Thirteenth EuroSys
the concept of channels as in Hyperledger Fabric [17]. Conference, EuroSys ’18, pages 30:1–30:15, New York,
NY, USA, 2018. ACM.
8. CONCLUSION [18] H. Berenson, P. Bernstein, J. Gray, J. Melton,
In this paper, we presented the design of a blockchain E. O’Neil, and P. O’Neil. A critique of ansi sql
relational database, a decentralized database with replicas isolation levels. In Proceedings of the 1995 ACM
managed by different organizations that do not trust one SIGMOD International Conference on Management of
another. The key challenge we addressed is in ensuring that Data, SIGMOD ’95, pages 1–10, New York, NY, USA,
all untrusted replicas commit transactions in the same serial- 1995. ACM.
izable order that respects the block ordering determined by [19] P. A. Bernstein and N. Goodman. Multiversion
consensus. We proposed two design approaches that lever- concurrency control—theory and algorithms.
aged and modified SSI to achieve this, and devised a new ACM Trans. Database Syst., 8(4):465–483, Dec. 1983.
variant of SSI based on block height. Leveraging features [20] P. A. Bernstein, V. Hadzilacos, and N. Goodman.
already available in databases enables us to better support Concurrency Control and Recovery in Database
complex data types, schemas, complex queries and prove- Systems. Addison-Wesley Longman Publishing Co.,
nance queries that are not provided by blockchain platforms Inc., Boston, MA, USA, 1986.
today. We implemented the system on PostgreSQL and pre- [21] A. Bessani, J. Sousa, and E. E. P. Alchieri. State
sented detailed performance results. machine replication for the masses with bft-smart. In
2014 44th Annual IEEE/IFIP International
9. REFERENCES Conference on Dependable Systems and Networks,
[1] Apache Kafka. https://kafka.apache.org.
[2] Apache ZooKeeper. http://zookeeper.apache.org.
[3] CouchDB selector query. https://docs.couchdb.org/en/2.2.0/api/database/find.html.
[4] Hyperledger Composer. https://www.hyperledger.org/projects/composer.
[5] Hyperledger Composer query language. https://hyperledger.github.io/composer/v0.19/reference/query-language.
[6] MongoDB. https://www.mongodb.com/.
[7] pgtune. https://github.com/gregs1104/pgtune.
[8] PostgreSQL background worker processes. https://www.postgresql.org/docs/10/bgworker.html.
[9] PostgreSQL frontend/backend protocol. https://www.postgresql.org/docs/10/protocol.html.
[10] PostgreSQL libpq - C library. https://www.postgresql.org/docs/10/libpq.html.
[11] PostgreSQL v10. https://www.postgresql.org/.
[12] Tendermint. https://tendermint.com/.
[13] VACUUM. https://www.postgresql.org/docs/10/sql-vacuum.html.
[14] BigchainDB: The blockchain database. https://www.bigchaindb.com/whitepaper/bigchaindb-whitepaper.pdf, 2018.
[15] A. Adya, B. Liskov, and P. O'Neil. Generalized isolation level definitions. In Proceedings of the 16th International Conference on Data Engineering, pages 67–78, Feb 2000.
[16] P. A. Alsberg and J. D. Day. A principle for resilient sharing of distributed resources. In Proceedings of the 2nd International Conference on Software Engineering, ICSE '76, pages 562–570, Los Alamitos, CA, USA, 1976. IEEE Computer Society Press.
[17] E. Androulaki, A. Barger, V. Bortnikov, C. Cachin, K. Christidis, A. De Caro, D. Enyeart, C. Ferris, G. Laventman, Y. Manevich, S. Muralidharan, C. Murthy, B. Nguyen, M. Sethi, G. Singh, K. Smith, A. Sorniotti, C. Stathakopoulou, M. Vukolić, S. W. Cocco, and J. Yellick. Hyperledger Fabric: A distributed operating system for permissioned blockchains. In Proceedings of the Thirteenth EuroSys Conference, EuroSys '18, pages 30:1–30:15, New York, NY, USA, 2018. ACM.
[18] H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O'Neil, and P. O'Neil. A critique of ANSI SQL isolation levels. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, SIGMOD '95, pages 1–10, New York, NY, USA, 1995. ACM.
[19] P. A. Bernstein and N. Goodman. Multiversion concurrency control—theory and algorithms. ACM Trans. Database Syst., 8(4):465–483, Dec. 1983.
[20] P. A. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency Control and Recovery in Database Systems. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1986.
[21] A. Bessani, J. Sousa, and E. E. P. Alchieri. State machine replication for the masses with BFT-SMaRt. In 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, pages 355–362, June 2014.
[22] E. Brewer. CAP twelve years later: How the "rules" have changed. Computer, 45(2):23–29, Feb 2012.
[23] M. J. Cahill, U. Röhm, and A. D. Fekete. Serializable isolation for snapshot databases. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, SIGMOD '08, pages 729–738, New York, NY, USA, 2008. ACM.
[24] M. Castro and B. Liskov. Practical byzantine fault tolerance. In Proceedings of the Third Symposium on Operating Systems Design and Implementation, OSDI '99, pages 173–186, Berkeley, CA, USA, 1999. USENIX Association.
[25] E. Cecchet, G. Candea, and A. Ailamaki. Middleware-based database replication: The gaps between theory and practice. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, SIGMOD '08, pages 739–752, New York, NY, USA, 2008. ACM.
[26] A. Fekete, D. Liarokapis, E. O'Neil, P. O'Neil, and D. Shasha. Making snapshot isolation serializable. ACM Trans. Database Syst., 30(2):492–528, June 2005.
[27] J. Gehrke, L. Allen, P. Antonopoulos, A. Arasu, J. Hammer, J. Hunter, R. Kaushik, D. Kossmann, R. Ramamurthy, S. T. V. Setty, J. Szymaszek, A. van Renen, J. Lee, and R. Venkatesan. Veritas: Shared verifiable databases and tables in the cloud. In CIDR 2019, 9th Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 13-16, 2019, Online Proceedings, 2019.
[28] J. Gray, P. Helland, P. O'Neil, and D. Shasha. The dangers of replication and a solution. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, SIGMOD '96, pages 173–182, New York, NY, USA, 1996. ACM.
[29] J. Gray and L. Lamport. Consensus on transaction commit. ACM Trans. Database Syst., 31(1):133–160, Mar. 2006.
[30] M. Hearn. Corda, 2016. https://www.corda.net/content/corda-technical-whitepaper.pdf.
[31] F. P. Junqueira, B. C. Reed, and M. Serafini. Zab: High-performance broadcast for primary-backup systems. In Proceedings of the 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks, DSN '11, pages 245–256, Washington, DC, USA, 2011. IEEE Computer Society.
[32] H. T. Kung and J. T. Robinson. On optimistic methods for concurrency control. ACM Trans. Database Syst., 6(2):213–226, June 1981.
[33] J. Kwon. Tendermint: Consensus without mining. 2014.
[34] L. Lamport. The part-time parliament. ACM Trans. Comput. Syst., 16(2):133–169, May 1998.
[35] F. Li, M. Hadjieleftheriou, G. Kollios, and L. Reyzin. Dynamic authenticated index structures for outsourced databases. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, SIGMOD '06, pages 121–132, New York, NY, USA, 2006. ACM.
[36] F. Li, M. Hadjieleftheriou, G. Kollios, and L. Reyzin. Authenticated index structures for aggregation queries. ACM Trans. Inf. Syst. Secur., 13(4):32:1–32:35, Dec. 2010.
[37] S. Liu, P. Viotti, C. Cachin, V. Quéma, and M. Vukolic. XFT: Practical fault tolerance beyond crashes. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, OSDI '16, pages 485–500, Berkeley, CA, USA, 2016. USENIX Association.
[38] D. A. Menasce, G. J. Popek, and R. R. Muntz. A locking protocol for resource coordination in distributed databases. In Proceedings of the 1978 ACM SIGMOD International Conference on Management of Data, SIGMOD '78, New York, NY, USA, 1978. ACM.
[39] S. Nakamoto. Bitcoin: A peer-to-peer electronic cash system. http://bitcoin.org/bitcoin.pdf.
[40] D. Ongaro and J. Ousterhout. In search of an understandable consensus algorithm. In Proceedings of the 2014 USENIX Annual Technical Conference, USENIX ATC '14, pages 305–320, Berkeley, CA, USA, 2014. USENIX Association.
[41] H. H. Pang and K.-L. Tan. Authenticating query results in edge computing. In Proceedings of the 20th International Conference on Data Engineering, pages 560–571, April 2004.
[42] K. Petersen, M. J. Spreitzer, D. B. Terry, M. M. Theimer, and A. J. Demers. Flexible update propagation for weakly consistent replication. In Proceedings of the Sixteenth ACM Symposium on Operating Systems Principles, SOSP '97, pages 288–301, New York, NY, USA, 1997. ACM.
[43] D. R. K. Ports and K. Grittner. Serializable snapshot isolation in PostgreSQL. Proc. VLDB Endow., 5(12):1850–1861, Aug. 2012.
[44] S. H. Son. Replicated data management in distributed database systems. SIGMOD Rec., 17(4):62–69, Nov. 1988.
[45] M. Stonebraker. Concurrency control and consistency of multiple copies of data in distributed INGRES. IEEE Transactions on Software Engineering, SE-5(3):188–194, May 1979.
[46] D. B. Terry, M. M. Theimer, K. Petersen, A. J. Demers, M. J. Spreitzer, and C. H. Hauser. Managing update conflicts in Bayou, a weakly connected replicated storage system. In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles, SOSP '95, pages 172–182, New York, NY, USA, 1995. ACM.
[47] P. Thakkar, S. Nathan, and B. Viswanathan. Performance benchmarking and optimizing Hyperledger Fabric blockchain platform. In 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), pages 264–276, Sep. 2018.
[48] R. H. Thomas. A majority consensus approach to concurrency control for multiple copy databases. ACM Trans. Database Syst., 4(2):180–209, June 1979.
[49] B. Vandiver, H. Balakrishnan, B. Liskov, and S. Madden. Tolerating byzantine faults in transaction processing systems using commit barrier scheduling. In Proceedings of Twenty-first ACM SIGOPS Symposium on Operating Systems Principles, SOSP '07, pages 59–72, New York, NY, USA, 2007. ACM.
[50] M. Wiesmann, F. Pedone, A. Schiper, B. Kemme, and G. Alonso. Understanding replication in databases and distributed systems. In Proceedings of the 20th IEEE International Conference on Distributed Computing Systems, pages 464–474, April 2000.
[51] G. Wood. Ethereum: A secure decentralised generalised transaction ledger. Ethereum project yellow paper, 2014.
[52] C. Xu, J. Xu, H. Hu, and M. H. Au. When query authentication meets fine-grained access control: A zero-knowledge approach. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD '18, pages 147–162, New York, NY, USA, 2018. ACM.
[53] Y. Zhang, J. Katz, and C. Papamanthou. IntegriDB: Verifiable SQL for outsourced databases. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS '15, pages 1480–1491, New York, NY, USA, 2015. ACM.
APPENDIX
A. SMART CONTRACTS USED FOR EVALUATION

[Figure 9: Simple query with single inserts into table]
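A minimal PL/pgSQL sketch consistent with the description of the simple contract in §5; the table, column, and parameter names are illustrative.

  -- simple contract: inserts values into a table
  CREATE FUNCTION simple_contract(k int, v text) RETURNS void AS $$
  BEGIN
      INSERT INTO kvstore (key, value) VALUES (k, v);
  END;
  $$ LANGUAGE plpgsql;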

[Figure 10: Complex queries with joins and aggregates]
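A sketch consistent with the description of the complex-join contract in §5 (a join between two tables, an aggregate, and a write of the result to a third table); all names are illustrative.

  -- complex-join contract: join two tables, aggregate, store result
  CREATE FUNCTION complex_join_contract(dept int) RETURNS void AS $$
  BEGIN
      INSERT INTO dept_totals (deptID, total)
      SELECT e.deptID, SUM(s.amount)
      FROM employees e
      JOIN salaries s ON s.empID = e.empID
      WHERE e.deptID = dept
      GROUP BY e.deptID;
  END;
  $$ LANGUAGE plpgsql;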

[Figure 11: Complex queries with group-by, order-by, limit and aggregates]
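A sketch consistent with the description of the complex-group contract in §5 (aggregates over subgroups, with order by and limit used to record the maximum aggregate across the subgroups); all names are illustrative.

  -- complex-group contract: per-subgroup aggregate; keep the subgroup
  -- with the maximum total. The ORDER BY includes subgroupID so the
  -- LIMITed pick stays deterministic, per the restriction in §4.3.
  CREATE FUNCTION complex_group_contract(grp int) RETURNS void AS $$
  BEGIN
      INSERT INTO group_max (groupID, subgroupID, max_total)
      SELECT groupID, subgroupID, SUM(amount)
      FROM measurements
      WHERE groupID = grp
      GROUP BY groupID, subgroupID
      ORDER BY SUM(amount) DESC, subgroupID
      LIMIT 1;
  END;
  $$ LANGUAGE plpgsql;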

