
A Glance at Pulsar and Druid

Pulsar is an open source project from eBay that consists of two parts: the Pulsar pipeline and Pulsar reporting. The
Pulsar pipeline is a streaming framework that distributes more than 8 billion events every day, and Pulsar reporting is
responsible for storing, querying and visualizing this data. Druid is part of Pulsar reporting.
This article introduces Druid, takes a short deep dive into it, and shows the role it plays in Pulsar reporting.

Druid components introduction


Druid is an open source analytics data store designed for business intelligence (OLAP, online analytical
processing) queries on event data.
Druid features (from the official website):
1. Sub-second queries
Supports multidimensional filtering and aggregation, and is able to target exactly the data needed to answer a query.
2. Real-time ingestion
Supports streaming data ingestion and offers insights on events immediately after they occur.
3. Scalable
Able to handle trillions of events in total and millions of events per second.
4. Highly available
Built for SaaS (software as a service) deployments, which need to be up all the time; scaling up and down does not lose data.
5. Designed for analytics
Supports many filters, aggregators and query types, and new functionality can be plugged in.
Supports approximate algorithms for cardinality estimation and for histogram and quantile calculations.

A glance at the Druid structure in Pulsar reporting:

The cluster receives about 10 billion events per day, and peak traffic is about 200k events/s.
Each machine in our cluster has 128 GB of memory, and each historical node has more than 6 TB of disk.
Druid at a glance:

Brief introduction to all nodes:

Real-time
Real-time nodes index the incoming data, and the indexed data can be queried immediately. Real-time nodes build
the data up into segments, and after a period of time each segment is handed over to the historical nodes.

An example of a real-time segment: 2015-11-18T06:00:00.000Z_2015-11-18T07:00:00.000Z, which is stored in
the folder of the schema you defined. All segments are named in this format.
Here is the segment information stored in MySQL:
Columns: id | dataSource | created_date | start | end | partitioned | version | used | payload

Example row (selected columns):
id: pulsar_event_2014-09-15T05:00:00.000-07:00_2014-09-15T06:00:00.000-07:00_2014-09-15T05:00:00.000-07:00_1
dataSource: pulsar_event
created_date: 2014-09-15T09:37:30.231-07:00
start: 2014-09-15T05:00:00.000-07:00
end: 2014-09-15T06:00:00.000-07:00
version: 2014-09-15T05:00:00.000-07:00
payload:
{"dataSource":"pulsar_event","interval":"2014-09-15T05:00:00.000-07:00/2014-09-15T06:00:00.000-07:00","version":"2014-09-15T05:00:00.000-07:00","loadSpec":{"type":"hdfs","path":"hdfs://xxxx/20140915T050000.000-0700_20140915T060000.000-0700/2014-09-15T05_00_00.000-07_00/1/index.zip"},"dimensions":"browserfamily,browserversion,city,continent,country,deviceclass,devicefamily,eventtype,guid,js_ev_type,linespeed,osfamily,osversion,page,region,sessionid,site,tenant,timestamp,uid","metrics":"count","shardSpec":{"type":"linear","partitionNum":1},"binaryVersion":9,"size":60096778,"identifier":"pulsar_event_2014-09-15T05:00:00.000-07:00_2014-09-15T06:00:00.000-07:00_2014-09-15T05:00:00.000-07:00_1"}
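The identifier in the example row appears to be built from the other payload fields. As a minimal sketch (the function name is ours, and this mirrors only the layout of the example row, not Druid's own code), it can be reassembled like this:

```python
import json

# Trimmed copy of the example payload; only the fields used below are kept.
payload = json.loads("""
{"dataSource": "pulsar_event",
 "interval": "2014-09-15T05:00:00.000-07:00/2014-09-15T06:00:00.000-07:00",
 "version": "2014-09-15T05:00:00.000-07:00",
 "shardSpec": {"type": "linear", "partitionNum": 1}}
""")


def segment_identifier(p):
    """Join dataSource, interval start, interval end, version and partition number.

    Hypothetical helper: it reproduces the pattern seen in the example row.
    """
    start, end = p["interval"].split("/")
    parts = [p["dataSource"], start, end, p["version"],
             str(p["shardSpec"]["partitionNum"])]
    return "_".join(parts)


print(segment_identifier(payload))
```

Reading an identifier this way makes it easy to see at a glance which data source, hour and version a segment row belongs to.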

Real-time configuration:

A Druid ingestion spec consists of three components:
{
"dataSchema": {...}, # specifies the schema of the incoming data
"ioConfig": {...}, # specifies where the data comes from and where it goes
"tuningConfig": {...} # detailed tuning parameters
}
Real-time nodes do not need much disk space, because a segment is deleted locally once it has been handed over successfully.
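To make the three components concrete, here is a small skeleton built in Python. Only the three top-level keys come from the text above; the nested field names are assumptions based on the Druid 0.8 documentation and should be checked against it, and every value is a placeholder rather than the Pulsar production configuration.

```python
import json

# Skeleton ingestion spec. Nested fields are assumptions, values are placeholders.
spec = {
    "dataSchema": {
        "dataSource": "pulsar_event",
        "metricsSpec": [{"type": "count", "name": "count"}],
        "granularitySpec": {
            "type": "uniform",
            "segmentGranularity": "HOUR",  # hourly segments, as in the examples above
            "queryGranularity": "NONE",
        },
    },
    "ioConfig": {"type": "realtime"},
    "tuningConfig": {"type": "realtime"},
}

REQUIRED = ("dataSchema", "ioConfig", "tuningConfig")


def missing_components(candidate):
    """Return the top-level components that a spec is missing."""
    return [key for key in REQUIRED if key not in candidate]


print(missing_components(spec))
print(json.dumps(spec, indent=2))
```

A check like `missing_components` is cheap to run before deploying a new spec to a real-time node.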

Historical Node
Historical nodes load historical segments and expose them for querying.
Historical nodes keep a constant connection to ZooKeeper (ZK) and do not connect to each other directly. The ZK
environment is therefore very important: if it runs into problems, the historical cluster will lose nodes.
When a historical node is asked to load a segment, it first checks its local disk (cache) for the segment's information.
If nothing is found there, it turns to ZK for that information, which includes where the segment is stored and how to
decompress and process it. After this, it announces to ZK that the segment is served by this node.

The historical configuration is not complex; for reference, see:
http://druid.io/docs/0.8.1/configuration/historical.html

A glance at the disk resources of a historical node in our QA environment (40 machines; cache expiration time set to
30 days):

Filesystem   Size   Used   Avail   Use%   Mounted on
/dev/sda1    49G    15G    31G     33%    /
tmpfs        127G   0      127G    0%     /dev/shm
/dev/sda3    494G   199M   469G    1%     /data
/dev/sdb     1.1T   167G   878G    16%    /data01
/dev/sdc     1.1T   167G   878G    16%    /data02
/dev/sdd     1.1T   167G   878G    16%    /data03
/dev/sde     1.1T   167G   878G    16%    /data04
/dev/sdf     1.5T   168G   1.2T    13%    /data10

Coordinator Node
Coordinator nodes are responsible for assigning new segments to historical nodes, which is done by creating an
ephemeral ZK entry. In general, the coordinator is responsible for loading new segments, dropping old ones, managing
replicas and balancing segments.
The coordinator runs periodically and assesses the cluster before taking action, based on rules defined by the
user.
Rules set at coordinator:

CleanJobs

Losing historical nodes


A missing node is not forgotten immediately; a transitional data structure represents it for a period of time.
If the historical node is started again within this period, its segments are not reassigned.
Balancing segments
Find the historical nodes with the highest and the lowest utilization. If the difference in utilization between them
exceeds the threshold, move segments from the most utilized node to the least utilized one.
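The balancing rule can be sketched as follows. This is a simplified illustration, not Druid's balancer: the node names, the byte counts and the use of a plain percentage-point difference are all assumptions made for the example.

```python
def pick_move(used_bytes, capacity_bytes, threshold_pct):
    """Return (source, target) node names, or None if the cluster is balanced.

    used_bytes and capacity_bytes are dicts keyed by a hypothetical node name.
    """
    utilization = {
        node: 100.0 * used_bytes[node] / capacity_bytes[node]
        for node in used_bytes
    }
    highest = max(utilization, key=utilization.get)
    lowest = min(utilization, key=utilization.get)
    # Only move segments when the gap is larger than the configured threshold.
    if utilization[highest] - utilization[lowest] > threshold_pct:
        return highest, lowest
    return None


capacity = {"h1": 1000, "h2": 1000, "h3": 1000}
print(pick_move({"h1": 800, "h2": 200, "h3": 500}, capacity, 10))
print(pick_move({"h1": 500, "h2": 450, "h3": 480}, capacity, 10))
```

In the first call the gap between h1 and h2 is 60 percentage points, so segments would move from h1 to h2; in the second call the gap is only 5 points and nothing moves.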

Broker Node

Brokers query the real-time and historical nodes and merge the results.


Query process:

Cache
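The cache uses an LRU eviction strategy, and the broker is able to store results per segment.
A query can be broken down to the segment level, answered from the cache where possible, and the per-segment results merged.
Memcached is supported.
Results from real-time nodes are not cached, because real-time data is always changing.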
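A minimal sketch of a per-segment LRU result cache, assuming a simple count query whose per-segment results can be summed. The class, the function and the `fetch` callback are hypothetical names for illustration; this is not the broker's actual implementation.

```python
from collections import OrderedDict


class SegmentResultCache:
    """Tiny LRU cache keyed by (query id, segment id)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used


def count_query(query_id, segments, cache, fetch):
    """Sum per-segment counts, using the cache for historical segments only.

    segments: iterable of (segment_id, is_realtime) pairs.
    fetch: callable that computes the count for one segment on a data node.
    """
    total = 0
    for segment_id, is_realtime in segments:
        key = (query_id, segment_id)
        partial = None if is_realtime else cache.get(key)
        if partial is None:
            partial = fetch(segment_id)
            if not is_realtime:
                cache.put(key, partial)
        total += partial
    return total
```

On a repeated query only the real-time segment is fetched again; the historical segments are served from the cache and merged with the fresh real-time result.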

Indexing service
Used to create (or destroy) Druid segments. It is usually used to re-index historical data.
We have not used this part much. We ran some tests on this component, which showed that it is
far from flawless.

An index job pic

The overlord node is responsible for accepting tasks, coordinating task distribution, creating locks around tasks, and
returning statuses to callers.
The middle manager node is a worker node that executes submitted tasks.
Peons run a single task in a single JVM. The middle manager is responsible for creating peons to run tasks.

Deep Dive into Druid: what makes Druid fast
General storage strategy
Pre-aggregation/roll-up
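As a rough illustration of roll-up, the sketch below truncates each event's timestamp to the hour and keeps one counted row per combination of hour and dimension values. The events, the dimension names and the hourly granularity are made up for the example.

```python
from collections import Counter

# Hypothetical raw events: (ISO timestamp, page, country).
raw_events = [
    ("2015-11-18T06:01:12Z", "home", "US"),
    ("2015-11-18T06:17:45Z", "home", "US"),
    ("2015-11-18T06:30:00Z", "cart", "CN"),
    ("2015-11-18T07:02:09Z", "home", "US"),
]


def roll_up(events):
    """Aggregate events into one row per (hour, page, country) with a count."""
    counts = Counter()
    for timestamp, page, country in events:
        hour = timestamp[:13] + ":00:00Z"  # truncate the timestamp to the hour
        counts[(hour, page, country)] += 1
    return dict(counts)


for key, count in roll_up(raw_events).items():
    print(key, count)
```

Four raw events become three stored rows here; with real traffic, where many events share the same dimension values within an hour, the reduction is much larger.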

The table is made up of three things:

Partitioned data

Data is stored in immutable blocks called segments, the fundamental storage unit in Druid. Because segments are
immutable, there is no contention between reads and writes.

Column-based storage


Druid scans/loads only what you need:

Column compression: dictionaries


Create IDs
Xiaomizhang -> 0, Ken -> 1
Store
people -> [0 0 0 1 1 1]
team -> [0 0 0 0 0 0]
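The dictionary step above can be sketched in a few lines. As a simplification for this example, IDs are assigned in order of first appearance so that the output matches the mapping shown above.

```python
def dictionary_encode(values):
    """Return (value -> id mapping, list of ids) for one column."""
    ids = {}
    encoded = []
    for value in values:
        if value not in ids:
            ids[value] = len(ids)  # next unused integer id
        encoded.append(ids[value])
    return ids, encoded


people = ["Xiaomizhang"] * 3 + ["Ken"] * 3
ids, column = dictionary_encode(people)
print(ids)
print(column)
```

A column with a single distinct value, like the team column above, encodes to all zeros, which compresses very well.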

Bitmap indices

Xiaomizhang -> [0, 1, 2] -> [111000]
Ken -> [3, 4, 5] -> [000111]
Fast and flexible queries
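A sketch of how such bitmaps answer filters with simple bit operations. Plain Python integers stand in for the bitmaps here, which is an assumption made for brevity; it says nothing about the compressed bitmap format Druid uses internally.

```python
def build_bitmaps(column):
    """Map each distinct value to an int whose bit i is set if row i has that value."""
    bitmaps = {}
    for row, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps


def to_bits(bitmap, n_rows):
    """Render a bitmap as a string, row 0 first, as in the example above."""
    return "".join("1" if (bitmap >> row) & 1 else "0" for row in range(n_rows))


people = ["Xiaomizhang"] * 3 + ["Ken"] * 3
bitmaps = build_bitmaps(people)

# OR answers "people = Xiaomizhang OR people = Ken"; AND answers the conjunction.
either = bitmaps["Xiaomizhang"] | bitmaps["Ken"]
both = bitmaps["Xiaomizhang"] & bitmaps["Ken"]

print(to_bits(bitmaps["Xiaomizhang"], 6))
print(to_bits(bitmaps["Ken"], 6))
print(to_bits(either, 6), to_bits(both, 6))
```

Because a filter reduces to OR and AND over bitmaps, the rows that match can be found without scanning the column values themselves.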

Summary of the structure of a segment column:

1. A dictionary that encodes column values
{
"xiaomizhang": 0,
"Ken": 1
}
2. Column data
[0, 0, 1, 1]
3. Bitmaps, one for each unique value of the column
value="xiaomizhang": [1,1,0,0]
value="Ken": [0,0,1,1]

Segment data components

1. version.bin
4 bytes representing the current segment version as an integer
2. meta.smoosh
A file with metadata (filenames and offsets) about the contents of the other smoosh files
3. xxxx.smoosh
There are some number of these files, which contain concatenated binary data.
The smoosh files house individual files for each of the columns in the data.
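As a small example of inspecting a segment directory, the sketch below reads the 4-byte version file. The big-endian byte order is an assumption (it is the default for Java byte buffers), and the demo writes its own file in a temporary directory rather than reading a real segment.

```python
import os
import struct
import tempfile


def read_segment_version(segment_dir):
    """Read version.bin from a segment directory and return it as an int."""
    with open(os.path.join(segment_dir, "version.bin"), "rb") as f:
        data = f.read()
    if len(data) != 4:
        raise ValueError("version.bin should contain exactly 4 bytes")
    # Assumption: the integer is stored big-endian.
    (version,) = struct.unpack(">i", data)
    return version


# Demo with a stand-in file; 9 matches the binaryVersion in the payload above.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "version.bin"), "wb") as f:
        f.write(struct.pack(">i", 9))
    print(read_segment_version(demo_dir))
```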

Everything above you may already know from the website or from the Druid team's QCon talk.
What follows is:

Druid bugs: lessons paid for in blood

1. Restarting the coordinator and the historical nodes at the same time

We did this once, and afterwards found that the historical nodes were "living dead": the historical processes still
existed, but according to the logs they were not doing anything, because they were not receiving any tasks from the coordinator.
From basic knowledge of Druid we know that its components are not isolated. They depend on each other, so
Druid components cannot be started in a random order.

2. Consuming the slc and phx Kafka clusters in the same real-time instance

The Kafka clusters at slc and phx are identical: the same data format and the same topics. We run two Kafka clusters
purely for HA.
We struggled with this for more than a month. We knew something was wrong, but we did not know what.
We were lucky enough to have a Druid contributor debug it for us, and he told us it is a Druid bug.
3. Using uppercase names

Recently we found that some real-time nodes had handed over segments but had not deleted them. The free space on
the real-time nodes kept shrinking, and eventually we found that a lot of segments had failed.
The logs gave us nothing that helped with debugging. Then Ken casually remarked that he found nothing unusual except that
the name was uppercase, which reminded me that I had seen posts in the Druid Google group where
somebody said this is a Druid bug.
4. Using kill -9

This may only affect older versions of Druid; we have not seen the error in recent versions. Once, Druid was
unable to start again, and the only information in the log was that Druid could not read the data because
the format was wrong.
Stopping Druid forcibly can cause unpredictable problems. In our experience, though, if the process is still there more
than 10 minutes after a normal stop, it can be stopped forcibly. My guess is that local disk I/O has completed by then
and only some network connections have not been closed.

Druid monitoring and daily issues

Pipeline traffic monitor:

Reporting traffic monitor

Compare with metric calculator

Druid hourly monitor:


A job runs hourly to query the segments that are registered in MySQL:
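The core check of such an hourly job might look like the sketch below. It assumes the segment start times have already been read from MySQL into a list of datetimes (the query itself is omitted), and the function name is ours.

```python
from datetime import datetime, timedelta


def missing_hours(segment_starts, first, last):
    """Return every hour in [first, last] that has no registered segment."""
    registered = set(segment_starts)
    missing = []
    hour = first
    while hour <= last:
        if hour not in registered:
            missing.append(hour)
        hour += timedelta(hours=1)
    return missing


# Hypothetical start times as they might come back from the segment table.
starts = [
    datetime(2014, 9, 15, 5),
    datetime(2014, 9, 15, 6),
    datetime(2014, 9, 15, 8),
]
print(missing_hours(starts, datetime(2014, 9, 15, 5), datetime(2014, 9, 15, 8)))
```

Any hour reported as missing is a candidate for a failed handover and worth checking on the real-time nodes.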

With these tools, we found problems such as:

Disk full

1. ZooKeeper disk full. This is always caused by logs: as time goes by, the ZooKeeper logs become larger and
larger. Need to pay attention to both

2. Druid disk full. This happens more frequently than the first case. When real-time handover fails, the segments stay
on the real-time node. Each Druid segment is usually quite large: several MB per data schema per hour. If handover
keeps failing for a long period of time, the real-time node runs out of space and fails to consume.
3. Druid disk full because of Druid temp files. This happened after we had been running Druid for more than half a
year. Pay attention to this parameter in the Druid startup file: -Djava.io.tmpdir should point to a path on a relatively large volume.
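A simple free-space check covers all three cases above. The sketch below uses the local temp directory as a stand-in path; in practice the paths would be the ZooKeeper log directory, the real-time segment directory and the directory given to -Djava.io.tmpdir, and the threshold is whatever fits your volumes.

```python
import shutil
import tempfile


def free_gib(path):
    """Free space, in GiB, on the volume that holds path."""
    return shutil.disk_usage(path).free / 2**30


def has_enough_space(path, min_free_gib):
    """True if the volume holding path has at least min_free_gib free."""
    return free_gib(path) >= min_free_gib


# Stand-in path and threshold for the demo.
tmp_path = tempfile.gettempdir()
print(tmp_path, round(free_gib(tmp_path), 1), has_enough_space(tmp_path, 1))
```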

ZK issues
1. ZK machine down. This happens quite frequently, and most of the time we need another team's help to
recover. We then bind the CNAMEs again and use a Fabric script to configure the machines.
2. ZK machine OOM. We found this issue just last week, after running Druid for more than half a year. Usually we
set Druid to delete segments that have been kept for one or two months. One schema had been kept the whole time
because we forgot to clean up its outdated data. If we keep all the segments, the heavy traffic of our pipeline will crash
our Druid cluster.

Druid issues
1. Handover fails. Handover needs the cooperation of real-time nodes, historical nodes, coordinator nodes and the
HDFS cluster. The coordinator and HDFS are less likely to run into problems, so when handover fails we should first
check whether all historical machines are healthy. Last time, historical nodes had died on a large scale and handover
failed.
2. GC strategy. We tuned the G1GC parameters several times and finally chose not to set them. G1 automatically
does things for us, such as resizing each generation; if we pin these to fixed values, G1 becomes less efficient.
3. HDFS path. It is OK to change the HDFS file path, but once it is changed, please shut down the cluster and
reconfigure everything. In addition, segments are recorded in MySQL. All segment information is recorded there,
including dimensions, types, schemas and the HDFS path, so please change the HDFS path in those records manually.
4. Cache volume. Both real-time and historical nodes have this setting. Segments assigned to a historical node are
first stored on the local file system (in a disk cache) and then served by the historical node. These locations define
where that local cache resides. Please set them properly.
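For the HDFS path change in item 3, the manual update amounts to rewriting the path inside each segment's payload. A minimal sketch of that rewrite is shown below; the prefixes are hypothetical, the payload layout follows the example row shown earlier, and reading and writing the MySQL rows is left out.

```python
import json


def rewrite_load_path(payload_json, old_prefix, new_prefix):
    """Return the payload with loadSpec.path moved from old_prefix to new_prefix.

    Payloads that are not HDFS-backed, or do not start with old_prefix,
    are returned with their path unchanged.
    """
    payload = json.loads(payload_json)
    load_spec = payload.get("loadSpec", {})
    path = load_spec.get("path", "")
    if load_spec.get("type") == "hdfs" and path.startswith(old_prefix):
        load_spec["path"] = new_prefix + path[len(old_prefix):]
    return json.dumps(payload, separators=(",", ":"))


# Hypothetical prefixes and a shortened payload for the demo.
old = '{"dataSource":"pulsar_event","loadSpec":{"type":"hdfs","path":"hdfs://old/segments/1/index.zip"}}'
print(rewrite_load_path(old, "hdfs://old/", "hdfs://new/"))
```

Running the rewrite on a copy of the table first, with the cluster shut down as described above, keeps a mistake from corrupting the segment records.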

Druid installation notes:

1. Create or change the Hadoop path
2. Set up MySQL for segment metadata
3. Point the real-time and historical paths at a volume with plenty of space
4. Make sure the user has the required permissions
5. Use a new Kafka consumer group
6. Install Java if needed
7. If the prod machines cannot access the internet, copy all dependencies to the Maven repo on the prod machines

Reference links:
Druid official website: http://druid.io/
Druid Google group: https://groups.google.com/forum/#!forum/druid-user

Source: http://blog.csdn.net/ebay/article/details/50205611, 08/08/2016
