ClickHouse:
Very fast scans (see the benchmarks in the linked article) that can be used for real-time queries.
Column storage is great for working with wide / denormalized tables (many columns).
Different storage engines (disk storage formats).
Great for structured log/event data as well as time series data (the MergeTree engine requires a date field).
Nice command-line interface with a user-friendly progress bar and formatting.
https://dzone.com/articles/a-look-at-clickhouse-a-new-open-source-columnar-database
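ClickHouse exposes a simple HTTP interface (by default on port 8123), which makes its fast scans easy to try from any language. A minimal sketch, assuming a local server; the table name in the example query is hypothetical:

```python
# Minimal sketch: querying ClickHouse over its HTTP interface.
# Assumes a server reachable on port 8123 (ClickHouse's default HTTP port);
# the table name used below is a made-up example.
import urllib.parse
import urllib.request


def build_query_url(host: str, query: str) -> str:
    """Encode a SQL query into a ClickHouse HTTP GET URL."""
    return "http://%s:8123/?%s" % (host, urllib.parse.urlencode({"query": query}))


def run_query(host: str, query: str) -> str:
    """Execute the query and return the raw (tab-separated) response body."""
    with urllib.request.urlopen(build_query_url(host, query)) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(run_query("localhost", "SELECT count() FROM events"))
```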
StatsD: A network daemon that runs on the Node.js platform and listens for
statistics, like counters and timers, sent over UDP or TCP and sends aggregates to
one or more pluggable backend services (e.g., Graphite).
Inspiration
StatsD was inspired (heavily) by the project (of the same name) at Flickr. Here's a
post where Cal Henderson described it in depth: Counting and timing. Cal re-released
the code recently: Perl StatsD.
Key Concepts
buckets: Each stat is in its own "bucket". They are not predefined anywhere. Buckets
can be named anything that will translate to Graphite (periods make folders, etc.).
values: Each stat will have a value. How it is interpreted depends on modifiers. In
general, values should be integers.
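The wire format behind these buckets and values is a tiny plain-text protocol, `<bucket>:<value>|<type>`, where the type is `c` for counters, `ms` for timers, and `g` for gauges. A minimal client sketch; the bucket names are illustrative, and the address assumes StatsD's default port 8125:

```python
# Minimal sketch of a StatsD client: format a metric line and fire it
# over UDP. The daemon address is an assumption (8125 is StatsD's default).
import socket


def format_metric(bucket: str, value: int, metric_type: str) -> str:
    """Build a StatsD line, e.g. 'api.requests:1|c' for a counter."""
    return "%s:%d|%s" % (bucket, value, metric_type)


def send_metric(bucket: str, value: int, metric_type: str,
                host: str = "localhost", port: int = 8125) -> None:
    """Send one metric over UDP; fire-and-forget, as StatsD intends."""
    payload = format_metric(bucket, value, metric_type).encode("ascii")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    finally:
        sock.close()


if __name__ == "__main__":
    send_metric("api.requests", 1, "c")    # counter increment
    send_metric("api.latency", 320, "ms")  # timer in milliseconds
```

Note how the periods in `api.requests` become folder levels once the aggregate reaches Graphite.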
Hekad:
Heka is an open source stream processing software system developed by Mozilla. Heka
is a Swiss Army Knife type tool for data processing, useful for a wide variety of
different data gathering, processing, and shipping tasks.
Heka is a heavily plugin based system. Common operations such as adding data to
Heka, processing it, and writing it out are implemented as plugins. Heka ships with
numerous plugins for performing common tasks.
Inputs
Input plugins acquire data from the outside world and inject it into the Heka
pipeline. They can do this by reading files from a file system, actively making
network connections to acquire data from remote servers, listening on a network
socket for external actors to push data in, launching processes on the local system
to gather arbitrary data, or any other mechanism.
Splitters
Splitter plugins receive the data that is being acquired by an input plugin and
slice it up into individual records. They must be written in Go.
Decoders
Decoder plugins convert data that comes in through the Input plugins to Heka's
internal Message data structure. Typically decoders are responsible for any
parsing, deserializing, or extracting of structure from unstructured data that
needs to happen.
Decoder plugins can be written entirely in Go, or the core logic can be written in
sandboxed Lua code.
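The decoder's job can be illustrated with a small sketch. This is plain Python, not Heka's actual Go or Lua plugin API; the log format is a made-up example:

```python
# Conceptual sketch of what a decoder does (not Heka's plugin API):
# extract structure from an unstructured log line into a message dict.
import re

# Hypothetical log format: "<timestamp> <SEVERITY> <free-text payload>"
LOG_PATTERN = re.compile(r"(?P<timestamp>\S+) (?P<severity>[A-Z]+) (?P<payload>.*)")


def decode(raw_line: str) -> dict:
    """Parse one log line into a structured message, mirroring the
    parsing/extraction work Heka decoders perform."""
    match = LOG_PATTERN.match(raw_line)
    if match is None:
        raise ValueError("unparseable record: %r" % raw_line)
    return match.groupdict()
```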
Filters
Filter plugins are Heka's processing engines. They are configured to receive
messages matching certain specific characteristics (using Heka's Message Matcher
Syntax) and are able to perform arbitrary monitoring, aggregation, and/or
processing of the data. Filters are also able to generate new messages that can be
reinjected into the Heka pipeline, such as summary messages containing aggregate
data, notification messages in cases where suspicious anomalies are detected, or
circular buffer data messages that will show up as real time graphs in Heka's
dashboard.
Filters can be written entirely in Go, or the core logic can be written in
sandboxed Lua code. It is also possible to configure Heka to allow Lua filters to
be dynamically injected into a running Heka instance without needing to reconfigure
or restart the Heka process, nor even to have shell access to the server on which
Heka is running.
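The match-then-aggregate pattern can be sketched as follows. This is a simplified stand-in for the idea, not Heka's real Message Matcher grammar; the field and logger names are invented:

```python
# Simplified stand-in for the idea behind Heka's Message Matcher Syntax
# (not the real grammar): a filter subscribes via a predicate over
# message fields, then aggregates the matching messages.
def make_matcher(**expected):
    """Return a predicate that is true when every given field matches."""
    def matches(message: dict) -> bool:
        return all(message.get(field) == value
                   for field, value in expected.items())
    return matches


def run_filter(messages, matcher):
    """Count matching messages per logger - a toy aggregation of the kind
    a Heka filter might reinject as a summary message."""
    counts = {}
    for msg in messages:
        if matcher(msg):
            counts[msg["logger"]] = counts.get(msg["logger"], 0) + 1
    return counts
```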
Encoders
Encoder plugins are the inverse of Decoders. They generate arbitrary byte streams
using data extracted from Heka Message structs. Encoders are embedded within Output
plugins; Encoders handle the serialization, Outputs handle the details of
interacting with the outside world.
Encoder plugins can be written entirely in Go, or the core logic can be written in
sandboxed Lua code.
Outputs
Output plugins send data that has been serialized by an Encoder to some external
destination. They handle all of the details of interacting with the network,
filesystem, or any other outside resource. They are, like Filters, configured using
Heka's Message Matcher Syntax so they will only receive and deliver messages
matching certain characteristics.
hekad
The core of the Heka system is the hekad daemon. A single hekad process can be
configured with any number of plugins, simultaneously performing a variety of data
gathering, processing, and shipping tasks. Details on how to configure a hekad
daemon are in the Configuring hekad section.
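The plugin stages above compose into a single pipeline inside the hekad process. A conceptual sketch of how the stages chain together, in plain Python rather than Heka's actual API, with a canned input and an invented record format:

```python
# Conceptual sketch of the hekad pipeline stages (not Heka's real API):
# input -> splitter -> decoder -> filter -> encoder -> output.

def input_plugin() -> bytes:
    """Acquire raw bytes from the outside world (here: a canned buffer)."""
    return b"a INFO start\nb ERROR boom\n"


def splitter(stream: bytes):
    """Slice the raw stream into individual records."""
    return [line for line in stream.decode().split("\n") if line]


def decoder(record: str) -> dict:
    """Extract structure from an unstructured record."""
    request_id, severity, payload = record.split(" ", 2)
    return {"id": request_id, "severity": severity, "payload": payload}


def message_filter(msg: dict) -> bool:
    """Keep only messages matching certain characteristics."""
    return msg["severity"] == "ERROR"


def encoder(msg: dict) -> bytes:
    """Serialize a message back into a byte stream for an output."""
    return ("%(id)s: %(payload)s" % msg).encode()


def output_plugin(blobs):
    """Deliver serialized data to an external destination (here: return it)."""
    return list(blobs)


def run_pipeline():
    msgs = (decoder(r) for r in splitter(input_plugin()))
    return output_plugin(encoder(m) for m in msgs if message_filter(m))
```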
Telegraf:
Telegraf is an agent written in Go for collecting metrics and writing them into
InfluxDB or other possible outputs. This guide will get you up and running with
Telegraf. It walks you through the download, installation, and configuration
processes, and it shows how to use Telegraf to get data into InfluxDB.
https://docs.influxdata.com/telegraf/v1.3/introduction/getting_started/
Design goals are to have a minimal memory footprint with a plugin system so that
developers in the community can easily add support for collecting metrics from well
known services (like Hadoop, Postgres, or Redis) and third party APIs (like
Mailchimp, AWS CloudWatch, or Google Analytics).
Input Plugins collect metrics from the system, services, or 3rd party APIs
Processor Plugins transform, decorate, and/or filter metrics
Aggregator Plugins create aggregate metrics (e.g. mean, min, max, quantiles, etc.)
Output Plugins write metrics to various destinations
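The four plugin types are wired together in Telegraf's TOML configuration file. A minimal example, collecting CPU and memory metrics and shipping them to a local InfluxDB; the URL and database name are common local defaults, not values from this document:

```toml
# Minimal telegraf.conf sketch - adjust names/URLs for your environment.
[agent]
  interval = "10s"

# Input plugins: collect from the system
[[inputs.cpu]]
  percpu = true
  totalcpu = true

[[inputs.mem]]

# Output plugins: write to a destination
[[outputs.influxdb]]
  urls = ["http://localhost:8086"]
  database = "telegraf"
```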
https://github.com/influxdata/telegraf
Elasticsearch:
In this document, we'll cover the basics of what you need to know about
Elasticsearch in order to use it.
Indexing
This is like retrieving pages in a book related to a keyword by scanning the index
at the back of the book, as opposed to searching every word of every page. Under
the hood, Elasticsearch builds an inverted index that maps each term to the
documents containing it.
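The book-index analogy maps directly onto an inverted index: each term points at the documents containing it, so a search is a lookup rather than a scan. A toy sketch of the idea only; Elasticsearch's real index adds analysis, scoring, sharding, and much more:

```python
# Toy inverted index illustrating the book-index analogy
# (conceptual only - not Elasticsearch's implementation).
from collections import defaultdict


def build_index(docs: dict) -> dict:
    """Map each lowercased term to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index: dict, term: str) -> set:
    """Look the term up directly instead of scanning every document."""
    return index.get(term.lower(), set())
```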
Kafka:
Apache Kafka is a distributed streaming platform. What exactly does that mean? It
is generally used for two broad classes of applications:
Building real-time streaming data pipelines that reliably get data between systems
or applications
Building real-time streaming applications that transform or react to the streams of
data
To understand how Kafka does these things, let's dive in and explore Kafka's
capabilities from the bottom up.
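The core abstraction behind both classes of applications is a partitioned, append-only commit log that consumers read by offset. A toy sketch of that idea in plain Python, not the Kafka client API; the partitioning scheme here is a simplistic stand-in:

```python
# Toy partitioned commit log - the abstraction underlying Kafka topics
# (conceptual only, not the Kafka client API).
class Topic:
    def __init__(self, num_partitions: int = 2):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key: str, value: str) -> int:
        """Append to a partition chosen by key; return the record's offset.
        (A deterministic byte-sum stands in for Kafka's key hashing.)"""
        partition = self.partitions[sum(key.encode()) % len(self.partitions)]
        partition.append(value)
        return len(partition) - 1

    def consume(self, partition: int, offset: int):
        """Read everything at or after `offset` - consumers track their own
        position, so many applications can read the same stream."""
        return self.partitions[partition][offset:]
```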
Aerospike:
Aerospike is a distributed, scalable NoSQL database. The architecture has three key
objectives:
Create a flexible, scalable platform for web-scale applications.
Provide the robustness and reliability (as in ACID) expected from traditional
databases.
Provide operational efficiency with minimal manual involvement.
Kibana
Grafana
InfluxDB