
ClickHouse: an open source column-oriented database management system developed by Yandex (it currently powers Yandex.Metrica, the world's second-largest web analytics platform).

Parallel processing for a single query (utilizing multiple cores).

Distributed processing on multiple servers.

Very fast scans (see the benchmarks in the source article) that can be used for real-time queries.

Column storage is great for working with wide / denormalized tables (many
columns).

Good compression.

Good set of functions, including support for approximated calculations.

Different storage engines (disk storage formats).

Great for structured log/event data as well as time series data (the MergeTree engine requires a date field).

Index support (primary key only, not all storage engines).

Nice command line interface with user-friendly progress bar and formatting.

https://dzone.com/articles/a-look-at-clickhouse-a-new-open-source-columnar-database
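
A minimal sketch of talking to ClickHouse from Python over its HTTP interface (default port 8123). The table, columns, and data are made up for illustration, and the legacy MergeTree(date, primary key, granularity) declaration is used only to mirror the date-field note above; newer releases declare the engine differently.

import requests

CLICKHOUSE_URL = "http://localhost:8123/"  # assumes a local server with default settings

def ch(query: str) -> str:
    """Send a SQL statement to ClickHouse and return the raw response body."""
    resp = requests.post(CLICKHOUSE_URL, data=query)
    resp.raise_for_status()
    return resp.text

# Wide, denormalized event table using the legacy MergeTree form,
# which takes a Date column, a primary key, and an index granularity.
ch("""
CREATE TABLE IF NOT EXISTS events (
    event_date  Date,
    user_id     UInt64,
    action      String,
    duration_ms UInt32
) ENGINE = MergeTree(event_date, (event_date, user_id), 8192)
""")

ch("INSERT INTO events VALUES ('2017-08-01', 42, 'click', 130)")

# Fast aggregating scan over the column store.
print(ch("SELECT action, count(), avg(duration_ms) FROM events GROUP BY action FORMAT TabSeparated"))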

StatsD: A network daemon that runs on the Node.js platform and listens for
statistics, like counters and timers, sent over UDP or TCP and sends aggregates to
one or more pluggable backend services (e.g., Graphite).

We (Etsy) blogged about how it works and why we created it.

Inspiration

StatsD was inspired (heavily) by the project (of the same name) at Flickr. Here's a post where Cal Henderson described it in depth: Counting and timing. Cal re-released the code recently: Perl StatsD.

Key Concepts

buckets: Each stat is in its own "bucket". They are not predefined anywhere. Buckets can be named anything that will translate to Graphite (periods make folders, etc.).

values: Each stat will have a value. How it is interpreted depends on modifiers. In general, values should be integers.

flush: After the flush interval timeout (defined by config.flushInterval, default 10 seconds), stats are aggregated and sent to an upstream backend service.
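
The wire format itself is tiny: each metric is a plain-text "bucket:value|type" datagram, conventionally sent over UDP to port 8125. A small Python sketch (bucket names are invented):

import socket
import time

STATSD_ADDR = ("127.0.0.1", 8125)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def counter(bucket: str, n: int = 1) -> None:
    # Counter: "<bucket>:<n>|c"
    sock.sendto(f"{bucket}:{n}|c".encode(), STATSD_ADDR)

def timer(bucket: str, ms: float) -> None:
    # Timer in milliseconds: "<bucket>:<ms>|ms"
    sock.sendto(f"{bucket}:{ms}|ms".encode(), STATSD_ADDR)

start = time.time()
# ... do some work ...
counter("myapp.logins")                               # increments the "myapp.logins" bucket
timer("myapp.login.duration", (time.time() - start) * 1000)
# Every flush interval (10 s by default) the daemon aggregates these values
# and pushes them to its configured backends, e.g. Graphite.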

Hekad:
Heka is an open source stream processing software system developed by Mozilla. Heka is a Swiss Army knife-style tool for data processing, useful for a wide variety of different tasks, such as:

Loading and parsing log files from a file system.
Accepting statsd type metrics data for aggregation and forwarding to upstream time series data stores such as Graphite or InfluxDB.
Launching external processes to gather operational data from the local system.
Performing real time analysis, graphing, and anomaly detection on any data flowing
through the Heka pipeline.
Shipping data from one location to another via the use of an external transport
(such as AMQP) or directly (via TCP).
Delivering processed data to one or more persistent data stores.

Heka is a heavily plugin-based system. Common operations such as adding data to Heka, processing it, and writing it out are implemented as plugins. Heka ships with numerous plugins for performing common tasks.

There are six different types of Heka plugins:

Inputs

Input plugins acquire data from the outside world and inject it into the Heka
pipeline. They can do this by reading files from a file system, actively making
network connections to acquire data from remote servers, listening on a network
socket for external actors to push data in, launching processes on the local system
to gather arbitrary data, or any other mechanism.

Input plugins must be written in Go.

Splitters

Splitter plugins receive the data that is being acquired by an input plugin and slice it up into individual records. They must be written in Go.

Decoders

Decoder plugins convert data that comes in through the Input plugins to Heka's internal Message data structure. Typically decoders are responsible for any parsing, deserializing, or extracting of structure from unstructured data that needs to happen.

Decoder plugins can be written entirely in Go, or the core logic can be written in
sandboxed Lua code.

Filters

Filter plugins are Heka's processing engines. They are configured to receive messages matching certain specific characteristics (using Heka's Message Matcher Syntax) and are able to perform arbitrary monitoring, aggregation, and/or processing of the data. Filters are also able to generate new messages that can be reinjected into the Heka pipeline, such as summary messages containing aggregate data, notification messages in cases where suspicious anomalies are detected, or circular buffer data messages that will show up as real time graphs in Heka's dashboard.

Filters can be written entirely in Go, or the core logic can be written in sandboxed Lua code. It is also possible to configure Heka to allow Lua filters to be dynamically injected into a running Heka instance without needing to reconfigure or restart the Heka process, or even to have shell access to the server on which Heka is running.

Encoders

Encoder plugins are the inverse of Decoders. They generate arbitrary byte streams
using data extracted from Heka Message structs. Encoders are embedded within Output
plugins; Encoders handle the serialization, Outputs handle the details of
interacting with the outside world.

Encoder plugins can be written entirely in Go, or the core logic can be written in
sandboxed Lua code.

Outputs

Output plugins send data that has been serialized by an Encoder to some external destination. They handle all of the details of interacting with the network, filesystem, or any other outside resource. They are, like Filters, configured using Heka's Message Matcher Syntax so they will only receive and deliver messages matching certain characteristics.

Output plugins must be written in Go.

Information about developing plugins in Go can be found in the Extending Heka section. Details about using Lua sandboxes for Decoder, Filter, and Encoder plugins can be found in the Sandbox section.
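
The following is only a conceptual Python sketch of that plugin chain (input -> splitter -> decoder -> filter -> encoder -> output), not Heka's actual Go or Lua API; all names are invented, just to show how a record moves through such a pipeline.

import json

def input_plugin():
    """Acquire raw data from the outside world (here: a hard-coded byte stream)."""
    yield b'{"level":"error","msg":"disk full"}\n{"level":"info","msg":"ok"}\n'

def splitter(chunks):
    """Slice the raw stream into individual records."""
    for chunk in chunks:
        for line in chunk.splitlines():
            if line:
                yield line

def decoder(records):
    """Parse each record into a structured message."""
    for raw in records:
        yield json.loads(raw)

def filter_plugin(messages):
    """Keep only messages matching some characteristic (here: level == error)."""
    for msg in messages:
        if msg.get("level") == "error":
            yield msg

def encoder(messages):
    """Serialize messages back into bytes for delivery."""
    for msg in messages:
        yield (json.dumps(msg) + "\n").encode()

def output(payloads):
    """Deliver serialized data to an external destination (here: stdout)."""
    for p in payloads:
        print(p.decode(), end="")

output(encoder(filter_plugin(decoder(splitter(input_plugin())))))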

hekad
The core of the Heka system is the hekad daemon. A single hekad process can be
configured with any number of plugins, simultaneously performing a variety of data
gathering, processing, and shipping tasks. Details on how to configure a hekad
daemon are in the Configuring hekad section.

Telegraf:
Telegraf is an agent written in Go for collecting metrics and writing them into
InfluxDB or other possible outputs. This guide will get you up and running with
Telegraf. It walks you through the download, installation, and configuration
processes, and it shows how to use Telegraf to get data into InfluxDB.
https://docs.influxdata.com/telegraf/v1.3/introduction/getting_started/

Telegraf is an agent written in Go for collecting, processing, aggregating, and writing metrics.

Design goals are to have a minimal memory footprint with a plugin system so that
developers in the community can easily add support for collecting metrics from well
known services (like Hadoop, Postgres, or Redis) and third party APIs (like
Mailchimp, AWS CloudWatch, or Google Analytics).

Telegraf is plugin-driven and has the concept of four distinct plugin types:

Input Plugins collect metrics from the system, services, or 3rd party APIs
Processor Plugins transform, decorate, and/or filter metrics
Aggregator Plugins create aggregate metrics (e.g. mean, min, max, quantiles, etc.)
Output Plugins write metrics to various destinations

https://github.com/influxdata/telegraf
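
As a sketch of how data reaches the agent: one common pattern is pushing InfluxDB line protocol at a socket-based input. The example below assumes a Telegraf instance with such an input (e.g. inputs.socket_listener) listening on UDP port 8094; both that input and the port are assumptions about the local configuration.

import socket
import time

TELEGRAF_ADDR = ("127.0.0.1", 8094)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# InfluxDB line protocol: measurement,tag=value field=value timestamp(ns)
line = "app_requests,host=web01,region=eu duration_ms=42i,status=200i {}".format(
    int(time.time() * 1e9)
)
sock.sendto(line.encode(), TELEGRAF_ADDR)
# Telegraf's processor/aggregator plugins can then transform or aggregate the
# metric before an output plugin (e.g. the InfluxDB output) writes it on.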

ElasticSearch: Basic Elasticsearch Concepts

In this document, we'll cover the basics of what you need to know about
Elasticsearch in order to use it.

Indexing

Elasticsearch is able to achieve fast search responses because, instead of searching the text directly, it searches an index.

This is like retrieving pages in a book related to a keyword by scanning the index
at the back of a book, as opposed to searching every word of every page of the
book.

This type of index is called an inverted index, because it inverts a page-centric data structure (page->words) to a keyword-centric data structure (word->pages). Elasticsearch uses Apache Lucene to create and manage this inverted index.

https://www.3pillarglobal.com/insights/advantages-of-elastic-search
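
A toy Python sketch of the idea (not of how Lucene actually stores its index): documents map to words, and the inverted index maps each word back to the documents containing it, so a search becomes a lookup plus a set intersection rather than a scan of every document.

from collections import defaultdict

# A tiny "forward" (page-centric) corpus: document id -> text.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "the quick dog jumps",
}

# Invert it: word -> set of document ids containing that word.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        inverted[word].add(doc_id)

def search(*terms):
    """Return the ids of documents containing all of the given terms."""
    hits = [inverted.get(t, set()) for t in terms]
    return set.intersection(*hits) if hits else set()

print(search("quick", "dog"))   # -> {3}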

Apache Kafka is a distributed streaming platform. What exactly does that mean?

We think of a streaming platform as having three key capabilities:

It lets you publish and subscribe to streams of records. In this respect it is similar to a message queue or enterprise messaging system.
It lets you store streams of records in a fault-tolerant way.
It lets you process streams of records as they occur.

What is Kafka good for?

It gets used for two broad classes of application:

Building real-time streaming data pipelines that reliably get data between systems or applications
Building real-time streaming applications that transform or react to the streams of data

To understand how Kafka does these things, let's dive in and explore Kafka's capabilities from the bottom up.

First a few concepts:

Kafka is run as a cluster on one or more servers.
The Kafka cluster stores streams of records in categories called topics.
Each record consists of a key, a value, and a timestamp.
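
A small producer/consumer sketch using the third-party kafka-python client (not part of Kafka itself); the broker address and topic name are assumptions about the local setup.

from kafka import KafkaProducer, KafkaConsumer

BROKERS = "localhost:9092"
TOPIC = "page-views"

# Publish a few records; each record carries a key, a value, and a timestamp.
producer = KafkaProducer(bootstrap_servers=BROKERS)
for user, page in [("alice", "/home"), ("bob", "/pricing")]:
    producer.send(TOPIC, key=user.encode(), value=page.encode())
producer.flush()

# Subscribe and process the stream of records as they occur.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKERS,
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop iterating when no new records arrive
)
for record in consumer:
    print(record.key, record.value, record.timestamp)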


Aerospike:
Aerospike is a distributed, scalable NoSQL database. The architecture has three key
objectives:

Create a flexible, scalable platform for web-scale applications.
Provide the robustness and reliability (as in ACID) expected from traditional databases.
Provide operational efficiency with minimal manual involvement.
http://www.aerospike.com/docs/architecture/index.html
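
A minimal Python sketch using the Aerospike client library; the host, namespace ("test"), set ("users"), and bin names are assumptions about the local cluster.

import aerospike

config = {"hosts": [("127.0.0.1", 3000)]}   # 3000 is the default service port
client = aerospike.client(config).connect()

# Records are addressed by (namespace, set, user key).
key = ("test", "users", "user:42")

# Write a record; each bin is a named value.
client.put(key, {"name": "Alice", "logins": 1})

# Read it back: returns the key tuple, record metadata, and the bins.
(_, meta, bins) = client.get(key)
print(meta["gen"], bins)

client.close()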

Kibana
Grafana
InfluxDB
