
Asynchronous Elastic Search Log Appender

By: Anand, Abhidnya, Humza

Motivation:
Manually going through plain log files, grepping all over the place,
severely limits the value you can extract from them.
We therefore felt the need for a log analytics tool that supports quick
searches and helps draw useful conclusions from the logs.

Available options for log management tools:


ElasticSearch vs Splunk
Comparing the two on Google Trends, we see that both are rising in
popularity, with interest in solutions based on ElasticSearch quickly
gaining momentum and gradually passing Splunk.
The other log management tools in the interest graph do not even come
close.

Criticism against the two:


1. Splunk is expensive.
2. The ElasticSearch solution, while free and open source, is time
consuming and brings additional hardware costs that grow exponentially.
We were not expecting:
1. use cases to grow fast over time,
2. complex scenarios with ever-changing specs,
3. to serve multiple users and departments inside the company.
Since all we were looking for was advanced grepping capabilities (which
have a lot of value on their own) and easy visualizations, a full Splunk
deployment would probably be overkill.

Analysis of the Log message:


A sample Log message generated by Log4j in its LoggingEvent:
Sat Sep 03 09:41:37 2016 GMT AESWidgetsUI 238957E7002595217428E8CD4@dev-dsk-anasinha-1c-i-5efa9583.us-west2.amazon.com:0 [INFO] <AYTUALVFGD3WG@10.50.228.245> (http-bio-8663-exec-2) com.amazon.aes.widgets.commands.onboarding.CalculateIndividualOnboardingSchemaCommand: Documents needs to be collected for onboarding person [amzn.aop.person.70f4e45c-2c29-4db3-b754-e95ecd5d4e71], RID=7E7002595217428E8CD4, SESSION=845-3348347-4301820, MARKETPLACE=ATVPDKIKX0DER

This includes the timestamp, SessionID, RequestID, the logging message,
marketplace, and hostID.
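
The fields listed above can be pulled out of a raw line mechanically. The following is a minimal sketch, assuming the level always appears in square brackets and the context values always appear as KEY=value pairs as in the sample; the class name LogLineFields and the regular expressions are illustrative and not part of the appender.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LogLineFields {
    // Level appears in square brackets, e.g. [INFO].
    private static final Pattern LEVEL =
            Pattern.compile("\\[(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\\]");
    // Context values appear as KEY=value, separated by commas.
    private static final Pattern KEY_VALUE =
            Pattern.compile("\\b(RID|SESSION|MARKETPLACE)=([^,\\s]+)");

    /** Returns the level and the KEY=value pairs found in one log line. */
    static Map<String, String> extract(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher level = LEVEL.matcher(line);
        if (level.find()) {
            fields.put("LEVEL", level.group(1));
        }
        Matcher kv = KEY_VALUE.matcher(line);
        while (kv.find()) {
            fields.put(kv.group(1), kv.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Shortened version of the sample line; only the parts the patterns look at.
        String line = "Sat Sep 03 09:41:37 2016 GMT AESWidgetsUI [INFO] (http-bio-8663-exec-2) "
                + "RID=7E7002595217428E8CD4, SESSION=845-3348347-4301820, MARKETPLACE=ATVPDKIKX0DER";
        System.out.println(extract(line));
    }
}
```

Inside the appender this parsing is not needed, because the same values are available directly from the MDC; the sketch only shows what structure is hiding in the flat text.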

Conclusions from Analysis of the Log message:


The MDC (Mapped Diagnostic Context) values stored in the LoggingEvent,
such as HostID, RequestID, SessionID, and thread details, are the
important ones for our use cases.

Implementation details on Elastic Search:


Document level:

A single document in ElasticSearch represents a single LoggingEvent.
From the ElasticSearch perspective, we decided to keep the MDC values as
separate fields in each document, in addition to the message field
(which contains the whole message). This lets each MDC value be indexed
with an accurate data type. As for the string message, ElasticSearch
already builds an inverted index over the words of the text, so its
keywords are readily available for fast searches as well.
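
As an illustration of this document shape, here is a minimal sketch that builds one document per logging event, with the MDC values as top-level fields next to the full message. The field names (timestamp, level, message, and whatever keys the MDC map holds) are assumptions, and the hand-written JSON serializer handles only strings and numbers; a real appender would more likely use a JSON library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LogDocument {
    /** Builds one document: the full message plus each MDC value as its own field. */
    static Map<String, Object> build(long timestampMillis, String level,
                                     String message, Map<String, String> mdc) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("timestamp", timestampMillis); // kept numeric so it can be mapped as a date
        doc.put("level", level);
        doc.put("message", message);           // the whole message, searchable by keyword
        doc.putAll(mdc);                        // e.g. requestId, sessionId, hostId
        return doc;
    }

    /** Minimal JSON rendering: numbers unquoted, everything else as an escaped string. */
    static String toJson(Map<String, Object> doc) {
        StringBuilder json = new StringBuilder("{");
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (json.length() > 1) {
                json.append(',');
            }
            json.append('"').append(escape(e.getKey())).append("\":");
            Object value = e.getValue();
            if (value instanceof Number) {
                json.append(value);
            } else {
                json.append('"').append(escape(String.valueOf(value))).append('"');
            }
        }
        return json.append('}').toString();
    }

    // Escapes only backslash, double quote, and newline; enough for this sketch.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }
}
```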

Index Level:
ElasticSearch query performance can be improved considerably by
organizing documents into indexes properly.
A few possible solutions would be:
1. Keep all logs in one index. (Con: very large search space.)
2. Organize logs by the HostID that generated them. (Con: still a very
large search space. Pro: if the host to be examined is known in
advance, searches can be targeted.)
3. Organize logs by the date/day they were generated. (Pro 1: the search
space is reasonable, provided the expected number of hosts and the
number of logs generated per day per host remain reasonable. Pro 2:
logs that are no longer relevant can be removed simply by deleting
the index. Pro 3: given the time frame in which an issue was
observed, the search space reduces to the logs from those particular
days.)
4. Both 2 and 3. (Con: duplicated data.)

Conclusion:
As per the given use cases, HostID-related queries are rare. Hence,
organizing logs by time (Option 3) was selected.
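
With one index per day, the appender has to derive the index name from the event timestamp. A minimal sketch, assuming a hypothetical "logs" prefix, a yyyy.MM.dd suffix, and UTC day boundaries (none of which are specified in this document):

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class DailyIndexName {
    // Day suffix, computed in UTC so every host agrees on where a day starts.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy.MM.dd").withZone(ZoneOffset.UTC);

    /** Returns "<prefix>-<yyyy.MM.dd>" for the day the event was logged. */
    static String forTimestamp(String prefix, long epochMillis) {
        return prefix + "-" + DAY.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        long sample = Instant.parse("2016-09-03T09:41:37Z").toEpochMilli();
        System.out.println(forTimestamp("logs", sample)); // prints logs-2016.09.03
    }
}
```

A query for an issue seen on specific days then only needs to target the indexes for those days.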

Other ElasticSearch related decisions made:


1. An index stays in ES for 2 months, after which it is deleted.
2. The time-to-live field in the index properties is used to auto-delete
the index after the stipulated time. (Con: if we later want to change
the time to live, each index has to be updated individually. An
alternative would be an ElasticSearch Curator (AWS Lambda) that runs
like a cron job at a fixed interval and deletes the indexes that are no
longer relevant. If a change is needed, only the Curator has to be
modified.)
3. Currently, the first logging event of each day results in the
creation of the index; subsequent calls only verify that the index
exists. (Con: two HTTP calls are made. Alternative: have a curator
create the index at a fixed interval, such as every day, week, or
month.)
To talk to ElasticSearch we use Jest (a Java HTTP REST client for
ElasticSearch).
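
The curator-style cleanup mentioned as an alternative in item 2 comes down to choosing which daily indexes are past the retention period. The sketch below covers only that selection step, assuming index names of the hypothetical form "<prefix>-yyyy.MM.dd"; listing and deleting the indexes over HTTP is left out.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

final class IndexRetention {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy.MM.dd");

    /** Returns the indexes whose date suffix is more than `months` months before `today`. */
    static List<String> expired(List<String> indexNames, String prefix,
                                LocalDate today, int months) {
        LocalDate cutoff = today.minusMonths(months);
        List<String> expired = new ArrayList<>();
        for (String name : indexNames) {
            if (!name.startsWith(prefix + "-")) {
                continue; // not one of our log indexes
            }
            try {
                LocalDate day = LocalDate.parse(name.substring(prefix.length() + 1), DAY);
                if (day.isBefore(cutoff)) {
                    expired.add(name);
                }
            } catch (DateTimeParseException e) {
                // Unexpected suffix: leave the index alone rather than guess.
            }
        }
        return expired;
    }
}
```

Because the retention period is just the months argument here, changing it later means changing one value in the curator rather than touching every index.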

Design For the LogAppender:


There is no centralized logger; instead, the appender runs on each host,
and all hosts make HTTP calls to the same ElasticSearch host.
Main requirements:
1. Asynchronous logging, to give control back to the main logic
execution as soon as possible.
2. Making fewer HTTP calls to ElasticSearch.
Asynchronous logging necessitates the use of threads, and to reduce
thread creation/deletion time, a thread pool is used. Too many threads
can hamper CPU performance, so the appropriate number of threads needs
to be determined empirically during testing.

The Diagram shows flow for a single host. This applies to all hosts running the Appender.

Using only one extra thread can defeat the idea of asynchronous logging:
when the number of logging requests generated in parallel increases, the
main thread would need to wait for the single thread, busy making HTTP
calls, to finish sending the previous lot of logs before it can take up
the new queue of logs. A thread pool adds more parallelism, and to scale
later, only the number of threads needs to be adjusted.
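
The hand-off to a pool can be sketched with the standard java.util.concurrent executor. This is an illustration under assumptions, not the appender's actual code: the class name AsyncSender is hypothetical, and the sender callback stands in for the HTTP call to ElasticSearch.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class AsyncSender {
    private final ExecutorService pool;
    private final Consumer<List<String>> sender; // stand-in for the HTTP call

    AsyncSender(int threads, Consumer<List<String>> sender) {
        // The pool size is the tuning knob: adjust it as load grows.
        this.pool = Executors.newFixedThreadPool(threads);
        this.sender = sender;
    }

    /** Queues the batch and returns at once; a pool thread does the slow send. */
    void submit(List<String> batch) {
        pool.execute(() -> sender.accept(batch));
    }

    /** Stops accepting work and waits for queued batches to be sent. */
    boolean shutdown(long timeout, TimeUnit unit) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

Note that a fixed pool created this way queues work without bound, so if ElasticSearch is slow for a long time the queue grows; a bounded queue with a rejection policy would be a design choice to consider.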

To make fewer HTTP calls, a buffer is used to aggregate enough logging
events before handing them to a thread from the thread pool, which sends
the batch to ElasticSearch. Because ElasticSearch supports bulk indexing
of data, a single HTTP call can take care of the whole batch.
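
A minimal sketch of such a buffer follows. It collects JSON documents and, once a batch is full, turns them into one newline-delimited bulk request body (an action line followed by the document, for each event). The class name BulkBuffer, the batch-size threshold, and the bulkSender callback are assumptions; in the appender the callback would pass the body to a pool thread, which makes the single HTTP call. The action line here carries only the index name; older ElasticSearch versions also expect a document type in the action line or the URL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BulkBuffer {
    private final String index;
    private final int batchSize;
    private final Consumer<String> bulkSender; // stand-in for one HTTP POST of a bulk body
    private List<String> pending = new ArrayList<>();

    BulkBuffer(String index, int batchSize, Consumer<String> bulkSender) {
        this.index = index;
        this.batchSize = batchSize;
        this.bulkSender = bulkSender;
    }

    /** Adds one JSON document; a full batch is handed off as a single bulk body. */
    synchronized void append(String jsonDocument) {
        pending.add(jsonDocument);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered, e.g. on shutdown or on a timer. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<String> batch = pending;
        pending = new ArrayList<>();
        bulkSender.accept(toBulkBody(index, batch));
    }

    /** Bulk format: an action line, then the document, each terminated by a newline. */
    static String toBulkBody(String index, List<String> jsonDocuments) {
        StringBuilder body = new StringBuilder();
        for (String doc : jsonDocuments) {
            body.append("{\"index\":{\"_index\":\"").append(index).append("\"}}\n");
            body.append(doc).append('\n');
        }
        return body.toString();
    }
}
```

A size threshold alone can leave a few events sitting in the buffer on a quiet host, so calling flush() periodically and at shutdown is worth considering.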