CrateDB The DBMS For IoT.v2

CrateDB: The SQL DB for Machine Data
Designing a Real-time SQL DBMS for the Things Data Era
August 2017
IoT is a New Database Workload
With the rise of IoT, we are entering the era of things data. In it, IoT applications process data
generated by millions of sensors. Data is analyzed in real time to monitor and control the
connected vehicles we drive, the machinery we operate, and the smart cities we inhabit.
Gartner Research suggests that IoT will pose new data volume, query complexity, and
integration challenges. And the TPC, the independent standards-setter for DBMS benchmarks,
is defining a new mixed workload benchmark for IoT.
At Crate.io, we engineered CrateDB to process IoT data. By building a distributed SQL engine
on a NoSQL storage and clustering foundation, we've made it easy and economical for
mainstream developers to meet data requirements like these:
- Ingest millions of data points per second: sensor or GPS readings, network messages, logs...
- Query data in real time
- Handle a wide variety of data structures
- Execute complex queries such as time series, geospatial, text search, and machine learning
- Process data at the edge and in the cloud
CrateDB is a unique combination of SQL, NoSQL, and Container technology
2017 Crate.io, Inc. CrateDB for IoT - Technical Overview 1

CrateDB - Machine Data Customer Use Cases
Over 75% of CrateDB customers use it to manage machine-generated data in systems such as:
- Industrial IoT
- Connected cities and buildings
- Vehicle fleet tracking & management
- Network & IT security monitoring
Here are some examples of typical CrateDB customer projects...
- Alpla, a $4B global manufacturer of packaging products, uses CrateDB to process
  data from thousands of different sensors on each of its 1000+ production lines. The
  data provides real-time insights that enable operators to optimize machinery
  efficiency and reduce defects, downtime, and raw material waste.
- Zumtobel, a $2B global producer of intelligent lighting systems, uses CrateDB to
  monitor and control smart lighting in large retail chains and buildings. They migrated
  from MySQL to CrateDB to enable better scaling and performance in a system that
  provides real-time monitoring of system status, lighting & sensor outages, and energy
  consumption.
- Clickdrive.io collects GPS and system data from fleet vehicles in order to
  provide real-time location tracking and to inform dispatchers that repairs are
  needed. As a result, Clickdrive has helped its customers reduce breakdowns
  and accidents, and lower fleet maintenance costs by 20%.
- Skyhigh Networks analyzes cloud network traffic in real time for nearly half of
  the Fortune to help keep them safe from cyber-security threats. They manage
  over 80TB of data, and replacing MySQL & Elasticsearch with CrateDB
  reduced their database hosting costs by 75%.

Designing a DBMS for the IoT Era - the CrateDB Architecture
CrateDB was started in 2014 to make database development and scaling simple. It was one of
the first databases to combine the familiarity of SQL with the scalability and data flexibility of
NoSQL. These are the design choices we made to build a database for the IoT era:
Architecture: Distributed, shared-nothing, container-native

CrateDB operates in a shared-nothing architecture as a cluster of identically configured
servers (nodes). The nodes coordinate seamlessly with each other, and the execution of write
and query operations is automatically distributed across the nodes in the cluster.
Increasing or decreasing database capacity is a simple matter of adding or removing nodes. We
worked hard on the simple part by automating the sharding, replication (for fault tolerance),
and rebalancing of data as the cluster changes size.
CrateDB was born in the container era and allows you to scale and administer it easily via
container orchestration platforms like Docker or Kubernetes in a microservices environment.

Access: SQL via Postgres wire protocol, JDBC, ODBC, REST...
We chose SQL as the data access language to make CrateDB easy for mainstream developers
to adopt. Everyone knows SQL; it's powerful, and it makes integration easy. CrateDB is
compatible with most SQL tools, interfacing via the PostgreSQL wire protocol, JDBC, ODBC,
and a REST interface.
CrateDB is compatible with much of the ANSI SQL-92 standard. It supports joins, aggregations,
indexes, BLOBs, sub-queries, user-defined functions, and so on. We juiced our SQL up with
some nice things commonly found in NoSQL systems, like full-text search, geospatial queries,
and nested JSON object columns.
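As a rough illustration of how these extensions can sit together in one table, the sketch below defines a hypothetical events table with a full-text indexed column, a geo point, and a nested object column, then queries all three at once. The table name, column names, search term, and coordinates are invented for this example, and exact syntax may differ between CrateDB versions.

```sql
-- Hypothetical table; all names are illustrative, not from a real schema.
CREATE TABLE IF NOT EXISTS events (
    "ts" TIMESTAMP,
    "device_location" GEO_POINT,
    -- full-text index so the column can be searched with MATCH
    "log_message" STRING INDEX USING FULLTEXT,
    -- nested object: one declared field, further fields may be added on insert
    "payload" OBJECT(DYNAMIC) AS (
        "temperature" FLOAT
    )
);

-- Text search + geospatial filter + access to a nested object field.
-- distance() returns meters; the point is given as WKT (longitude latitude).
SELECT "ts", "payload"['temperature'] AS temperature
FROM events
WHERE MATCH("log_message", 'overheat')
  AND distance("device_location", 'POINT (9.74 47.41)') < 5000
ORDER BY "ts" DESC
LIMIT 10;
```

The point of the sketch is that text, location, and nested JSON criteria can be combined in a single ordinary SELECT, with no separate search or document store involved.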
Open machine data stack

Another benefit of SQL is ease of integration. With CrateDB, you are free to choose your own
machine data stack rather than being locked into ETL and visualization and reporting tools
written for specific NoSQL engines like Splunk, Elasticsearch, or InfluxDB. CrateDB can be
accessed via SQL from most new and legacy ETL tools, BI and reporting tools, programming
frameworks, and so on.
Other machine data interfaces

CrateDB supports other access interfaces that are common in IoT and machine data:
- MQTT - CrateDB (Enterprise Edition) embeds an MQTT broker, which enables it to
  subscribe to and receive MQTT messages, parse them, and store them in a table. This
  simplifies application architectures by eliminating the need for message queueing
  middleware.
- Telegraf interface - CrateDB is a Telegraf target, which makes it easy to route time
  series data from various Telegraf-supported source systems into CrateDB.
- Prometheus remote reader/writer - Enables the Prometheus time series database to
  pass data and queries through to CrateDB for processing larger volumes of data or
  performing more complex analyses. It makes it easy to scale up software systems (such
  as Docker) that support Prometheus as an endpoint for time series metrics.
- Grafana - A Grafana plugin makes it easy to visualize and interact with time series data
  from CrateDB.
- Apache Kafka, Spark, Node-RED, StreamSets, et al. - CrateDB fits well into the IoT
  ecosystem, with customers using tools like Kafka and Spark and many others to build
  scalable, fault-tolerant systems. Contact Crate.io if you have questions about specific
  integrations.

Storage & Indexing: NoSQL-style
CrateDB was one of the first databases to combine the familiarity of SQL with the scalability and
data flexibility of NoSQL. This was accomplished by building a distributed SQL engine on a
foundation of our own and other open source NoSQL technologies instead of using traditional
relational DBMS techniques.
CrateDB uses parts of the following open source projects to form its physical foundation:
- Lucene - storage and indexing, including text search and geospatial
- Elasticsearch - masterless clustering and transaction logging
- Netty - asynchronous, event-driven, full-mesh networking between nodes
CrateDB is packaged into a single binary, which is simple to install and start.
Access to scaling and replication features is simple, via SQL. CREATE TABLE supports
additional storage and table parameters for sharding, replication and routing of the data. In the
example below, a table to hold sensor readings is partitioned by week; queries will execute on
relevant partitions only, which speeds up performance. And partitions can be dropped, which
makes data deletion or archival of aging data easier.
The example also creates shards, which contain subsets of the table data and are distributed
across the cluster. A rule of thumb is to have as many shards as there are CPUs in the cluster;
CrateDB will parallelize query execution across all of the shards for maximum throughput.
Replicas create redundant copies of the data, which are also distributed across the cluster for
high availability and query throughput.
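CREATE TABLE IF NOT EXISTS t1 (
    "ts" TIMESTAMP,
    "tenant_id" INTEGER,
    "sensor_id" STRING,
    "v1" INTEGER,
    "v3" FLOAT,
    "v5" BOOLEAN,
    -- derived from ts; serves as the partition key
    "week_generated" TIMESTAMP GENERATED ALWAYS AS date_trunc('week', ts)
)
PARTITIONED BY ("week_generated")          -- one partition per week
CLUSTERED BY ("tenant_id") INTO 3 SHARDS   -- rows are routed to shards by tenant_id
WITH (number_of_replicas = 2);             -- two redundant copies of each shard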
CrateDB distributes shards and replicas intelligently and automatically. This helps avoid
performance bottlenecks and ensures that the database will continue to operate reliably, even if
node hardware failures occur.
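To make the partitioning benefit concrete, here is a minimal sketch against the t1 table defined above. It assumes that date_trunc('week', ...) yields weeks starting on Monday, so the literal dates (both Mondays) are placeholders standing in for real partition values; they are not taken from any actual data set.

```sql
-- Aggregate one week of readings. Filtering on the partition column
-- means only the matching weekly partition needs to be searched.
SELECT "sensor_id", avg("v3") AS avg_v3
FROM t1
WHERE "week_generated" = '2017-08-07'
GROUP BY "sensor_id";

-- Retire aging data. A DELETE that filters only on the partition column
-- should let CrateDB drop whole partitions instead of deleting row by row.
DELETE FROM t1
WHERE "week_generated" < '2017-01-02';
```

Choosing the partition granularity (week, day, month) is a trade-off between how selectively queries can prune and how many partitions, and therefore shards, the cluster has to manage.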

Schema: Dynamic
Another benefit of the CrateDB SQL-NoSQL architecture is schema flexibility. Traditional
relational schemas are rigid and changing them is a pain. As you saw before, tables are defined
using the CREATE TABLE statement. If an INSERT statement includes a column that wasn't
defined in the table, CrateDB can be configured to either:
a) Enforce the original schema by rejecting the INSERT and throwing an error, or
b) Dynamically update the schema by adding the new column found in the INSERT statement.
Internally, each relational record in CrateDB is actually stored as a JSON document, and those
can change structure on the fly. This gives CrateDB the flexibility to handle evolving data
structures.
For example: a global packaging manufacturer collects data from 900 different types of sensors
on each of its production lines. In SQL Server, they stored that data in 900 different tables, one
per sensor type. After moving to CrateDB, they stored all the readings in just one table. Much
simpler. And queries executed 40 times faster.
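The two behaviors can be sketched with the column_policy table parameter. The table names, the humidity column, and the sample values below are hypothetical and exist only to show the contrast; defaults and details may vary by CrateDB version.

```sql
-- (a) Strict: an INSERT that names an undefined column is rejected.
CREATE TABLE IF NOT EXISTS readings_strict (
    "ts" TIMESTAMP,
    "sensor_id" STRING
) WITH (column_policy = 'strict');

-- (b) Dynamic: an undefined column is added to the schema on INSERT.
CREATE TABLE IF NOT EXISTS readings_dynamic (
    "ts" TIMESTAMP,
    "sensor_id" STRING
) WITH (column_policy = 'dynamic');

-- "humidity" was never declared. Against readings_dynamic this statement
-- is expected to succeed and add the column; the same statement against
-- readings_strict is expected to fail with an error.
INSERT INTO readings_dynamic ("ts", "sensor_id", "humidity")
VALUES ('2017-08-07T12:00:00', 'sensor-42', 0.61);
```

A dynamic policy suits cases like the single wide sensor table described above, where new sensor types appear over time; a strict policy is the safer choice when unexpected columns would indicate a bug upstream.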
Writing: High Velocity INSERTs

IoT systems ingest streams of machine-generated data. We decided on an eventually
consistent, non-blocking, data insertion model. This allows CrateDB to insert tens of thousands
of data points per second per node, while querying the data at the same time.
The CrateDB distributed architecture provides linearly scalable INSERT performance. As the
customer benchmark here shows, CrateDB provided superior ingestion versus other distributed
databases as the customer increased the number of threads concurrently connecting to CrateDB.
Data durability and consistency are also important, and we took steps to address those with
as little impact on performance as possible. To ensure data durability, we implemented
write-ahead logging. For consistency, CrateDB includes record versioning, optimistic
concurrency control, and a table-level refresh frequency setting, which forces CrateDB data
to become consistent on a periodic basis (every n milliseconds).
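The consistency controls mentioned above might be used roughly as follows. This is a sketch under assumptions: the devices table and its values are hypothetical, and the use of the _version system column for optimistic concurrency control reflects CrateDB releases from the period of this paper; newer releases may expose a different mechanism.

```sql
-- Hypothetical table. The refresh interval is given in milliseconds:
-- newly written rows become visible to queries at most ~500 ms later.
CREATE TABLE IF NOT EXISTS devices (
    "id" STRING PRIMARY KEY,
    "status" STRING
) WITH (refresh_interval = 500);

-- Trade freshness for write throughput by refreshing less often.
ALTER TABLE devices SET (refresh_interval = 1000);

-- Force an immediate refresh when a reader must see the latest writes.
REFRESH TABLE devices;

-- Optimistic concurrency control: read the row together with its version...
SELECT "id", "status", "_version"
FROM devices
WHERE "id" = 'dev-1';

-- ...then update only if nobody else changed the row in the meantime.
-- If the version no longer matches, no row is updated and the client
-- can re-read and retry.
UPDATE devices
SET "status" = 'maintenance'
WHERE "id" = 'dev-1' AND "_version" = 1;
```

Because no locks are taken, writers never block one another; conflicts surface only as an update that affects zero rows.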

Querying: Real-time via in-memory columnar indexing
Real-time databases usually require all data to fit in main memory, but that limits how much data
you can manage. Our solution for real-time performance without data volume limitations was to
implement memory-resident columnar field caches on each node. The caches tell the query
engine whether there are rows on that node that meet the query criteria and where the rows are
located; this is all performed at in-memory speed.
Distributed query processing also contributes to fast performance, as does a query planner
that makes smart decisions about which nodes are best suited to finalize processing of
aggregations and joins.
Benchmarks show that the CrateDB query architecture:
- Scales linearly
- Provides up to 33x better price-performance than traditional SQL databases when executing
  complex time series and text search queries
- Provides 10x higher time series query throughput under load than specialized time series
  databases like InfluxDB
Platform: Java, run at the edge or in the cloud

IoT data processing is often distributed, from cloud data centers to remote sites and even onto
devices. DBMS portability makes cloud and edge architectures easier to implement, so we
wrote CrateDB in Java. Thus, CrateDB can run anywhere a JVM runs: in the data center, or at
remote sites when internet latency is intolerable or when data needs to be aggregated
before being pipelined to a central cloud instance for wider-scale processing.

How does CrateDB Compare?
The CrateDB architecture combines the familiarity of SQL with the scalability and data flexibility
of NoSQL. CrateDB is oriented towards mixed analytic workloads with heavy querying and
ingestion.
Experiences might differ based on use case, but here's how CrateDB generally compares to
other database categories for IoT workloads:
                                            Traditional SQL   NoSQL   CrateDB
Fire hose of data                           No                Yes     Yes
Query versatility & real-time performance   No                No      Yes
Data versatility                            No                Yes     Yes
SQL access                                  Yes               No      Yes
Simple scalability                          No                Yes     Yes
Next Steps...
CrateDB is freely available under the Apache 2 open source license, or with a commercial
license for the CrateDB Enterprise Edition.
You can download CrateDB and find other resources at crate.io.
