Visual analytics framework to deal with the unique cyber security problems in big data



The management of large data warehouses has traditionally been expensive, and their
deployment usually requires a strong business case. Big Data tools such as Hadoop are
commoditizing the deployment of large-scale, reliable clusters and are therefore enabling new
opportunities to process and analyze data. At the same time, security practitioners are
beginning to see the need to conduct behaviour profiling to counter security attacks. Whether
the attacks are carried out by malware or by humans, they often exhibit behaviours that deviate
from statistical norms. If security practitioners can model the norms, statistical outliers
could point to potential attacks worthy of further investigation. This is called
behaviour-based anomaly detection, and it complements conventional signature-based detection.
The framework also aims to improve the information available to security analysts by
correlating, consolidating, and contextualizing more diverse data sources over longer periods
of time. The proposed framework aims to provide techniques that enable humans to analyze
real-time data streams by presenting results in a meaningful and intuitive way while allowing
interaction with the data. Tools such as Hadoop will be used for storing and processing the
collected data.
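
As an illustration of behaviour-based anomaly detection, the following minimal sketch models the
norm for each account as the mean and standard deviation of its past activity and flags new
observations that deviate strongly from it. The account names, the daily login counts, and the
threshold of three standard deviations are assumptions made for this example only; the proposed
framework may well use richer behaviour profiles and more robust statistics.

```python
from statistics import mean, stdev


def build_profile(history):
    """Model the norm for one entity as (mean, standard deviation) of past activity."""
    return mean(history), stdev(history)


def is_outlier(value, profile, threshold=3.0):
    """Return True if value lies more than `threshold` standard deviations from the mean."""
    mu, sigma = profile
    if sigma == 0:
        # A perfectly constant history: any change at all is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical daily login counts per account (illustrative data only).
history = {
    "alice": [4, 5, 6, 5, 4, 6, 5],
    "bob": [20, 22, 19, 21, 20, 23, 21],
}
today = {"alice": 40, "bob": 22}

profiles = {user: build_profile(counts) for user, counts in history.items()}
flagged = sorted(
    user for user, count in today.items() if is_outlier(count, profiles[user])
)
print(flagged)  # prints ['alice']
```

A flagged account is only a candidate for further investigation, not a confirmed attack, which is
why this kind of detection complements signature-based detection rather than replacing it.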

Big Data analytics can be leveraged to improve information security and situational awareness.
For example, Big Data analytics, as used for enterprise event analytics, can be employed to
analyze financial transactions, log files, and network traffic to identify anomalies and
suspicious activities, and to correlate multiple sources of information into a coherent view. Big
Data tools have the potential to provide a significant advance in actionable security intelligence
by reducing the time for correlating, consolidating, and contextualizing diverse security event
information, and also for correlating long-term historical data for forensic purposes. New Big
Data technologies, such as databases related to the Hadoop ecosystem and stream processing,
are enabling the efficient storage and analysis of large data sets, including heterogeneous,
incomplete, and noisy data, at an unprecedented scale and speed.

The activities can be summarized as five analytic approaches: summarization, dimensionality
reduction, clustering and aggregation, anomaly detection, and scoring. The research questions
for the PhD fall into three broad topics. The first concerns the data: feature engineering,
inferring data structure, automatic parsing, and building a taxonomy of the data. The second
concerns the algorithms: making sure they work well on categorical data and IP addresses, and
on sparse and long-tailed data, and how to develop an auto-adaptive mechanism that chooses the
relevant algorithm. The third, building on the first two, concerns how to build pattern
detection and classification and how to detect anomalous activity.
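
To make the feature engineering questions more concrete, the sketch below shows one possible way
to prepare IP addresses and long-tailed categorical values for an algorithm. It derives two
simple features from an IPv4 address (whether it is private, and its /24 prefix) and collapses
rarely seen categories into a single bucket. The choice of features, the `__rare__` label, the
`min_count` cut-off, and the sample user agents are all assumptions for illustration, not the
method the research will necessarily adopt.

```python
import ipaddress
from collections import Counter


def ip_features(ip_text):
    """Derive simple features from an IPv4 address given as text."""
    ip = ipaddress.ip_address(ip_text)
    # strict=False lets us pass a host address and obtain its enclosing /24 network.
    network = ipaddress.ip_network(f"{ip_text}/24", strict=False)
    return {"is_private": ip.is_private, "prefix24": str(network.network_address)}


def collapse_rare(values, min_count=2):
    """Replace categories seen fewer than `min_count` times with a shared '__rare__' label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else "__rare__" for v in values]


# Hypothetical user agents observed in a log (illustrative data only).
user_agents = ["curl", "chrome", "chrome", "chrome", "nmap", "firefox", "firefox", "sqlmap"]

print(ip_features("192.168.1.77"))
print(collapse_rare(user_agents))
```

Collapsing the long tail keeps the number of distinct categories manageable, at the cost of hiding
exactly which rare value occurred; for anomaly detection the rare values may themselves be the
interesting ones, so the original value is worth keeping alongside the collapsed label.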