
Tracking Attacks Data Through Log Files Using MapReduce

Yassine Azizi (✉), Mostafa Azizi, and Mohamed Elboukhari

Lab. MATSI, ESTO, University Mohammed 1st, Oujda, Morocco


azizi.yass@gmail.com, {azizi.mos,m.elboukhari}@ump.ac.ma

Abstract. In this paper, we propose a security analysis methodology that applies
Big Data techniques, such as MapReduce, to several system log files in order to
locate and extract data likely related to attacks. Through a process of analysis,
these data help to identify attacks or detect intrusions. We illustrate this approach
with a concrete case study that exploits the access log files of Apache web servers
to detect SQLI and DDOS attacks. The results obtained are promising: we are able
to extract the malicious indicators and events that characterize the intrusions,
which helps us make an accurate diagnosis of system security.

Keywords: Big Data · Security · Attacks · Log files · MapReduce · SQL injection · DDOS

1 Introduction

The world has experienced a data revolution in all digital domains due to the exponential
growth in the use of connected tools and objects. According to statistics published by
IBM [1], we generate 2.5 quintillion bytes of data each day. These data come from
different sources, namely social networks, climate information, GPS signals, sensors and
log files. Log files are a very important source of information: they record all the events
that occur while a system is running. They are often very large and come from many
sources, such as operating systems, application servers and data servers.
In this paper, we present our approach to system security analysis, which aims to
track data related to DDOS and SQL injection attacks and to analyze them in order to
extract knowledge that helps us improve security rules. The proposed method is
mainly based on log file analysis. Log files are of vital interest in computer security
because they give an overview of everything that has happened across the whole system,
which makes it possible, for example, to explain an error or to understand how an attack
or anomaly occurred.
This paper is organized as follows: Sect. 2 presents the main related works that
have used log files to extract useful information. Sect. 3 then describes the methodology
we use to process log files and extract data on possible attacks. Sect. 4 presents a case
study on Apache web servers, and Sect. 5 concludes.

© Springer Nature Switzerland AG 2019


Á. Rocha and M. Serrhini (Eds.): EMENA-ISTL 2018, SIST 111, pp. 331–336, 2019.
https://doi.org/10.1007/978-3-030-03577-8_36

2 Related Works

In the literature, several research studies consider log files a very useful data
source in several areas. The authors of [2, 3] exploit log files in the field of
e-commerce to predict customer behavior and improve business income. In [4], the
work is devoted to an in-depth analysis of log file data from NASA’s website to
identify important information about a web server: user behavior, the main errors
and potential visitors of the site, all in order to help the system administrator and
web designer improve the system. In [5], the log files of routers are used for error
diagnosis and troubleshooting in home networks, because the information contained
in the log files helps to clarify the causes of network problems, such as
misconfigurations or hardware failures. In [6], the researchers propose a diagnostic
approach for a cloud computing architecture; this approach exploits the log files of
the different systems in that architecture to find misuse and detect anomalies, which
improves system security. Finally, [7] proposes a multi-stage log analysis
architecture, which uses the logs generated by the application during attacks to
detect attacks effectively and to help prevent future ones.

3 Proposed Methodology

We are interested in applying Big Data techniques to the security analysis of systems
and networks. To this end, we propose a methodology that consists of four stages
(Fig. 1):

1. Data collection
2. Data processing
3. Data storage
4. Data analysis

Fig. 1. Proposed approach

4 Case Study

Nowadays, there are more than 3.81 billion users connected to the Internet and more than
a billion websites; 60% of these websites are hosted on Apache web servers. The
web server provides different mechanisms for logging anything that may occur on the
server, from the initial request, through the URL mapping process, to the final resolution
of the connection, including any errors that may occur during processing.
In our case study, we work on the access log files of Apache web servers. To
apply the proposed methodology, we started by identifying the usable data in the
“access log” file of the web server. Using a Java program, we retrieve the indicators
of each event and save them in a database; we then use an ETL tool, “Pentaho
Data Integration”, to transform the collected data into a standard XML format.
These preprocessing and data formatting steps turn unstructured data into
well-structured, consolidated data, which facilitates subsequent analysis and
exploitation.
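
As a minimal sketch of this preprocessing step, the Python snippet below extracts the main indicators of one access log line into a structured record. It is not the Java program used in the case study: it assumes that the server writes the Apache Common or Combined Log Format, and the pattern, the field names and the function name parse_access_line are illustrative choices.

```python
import re

# Assumed layout (Common/Combined Log Format):
# host ident user [time] "method url protocol" status size ...
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)(?: (?P<protocol>[^"]*))?" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_access_line(line):
    """Return the indicators of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])  # the HTTP status is numeric
    return record


if __name__ == "__main__":
    sample = ('192.168.1.10 - - [10/Oct/2018:13:55:36 +0000] '
              '"GET /index.php?id=1 HTTP/1.1" 200 2326')
    print(parse_access_line(sample))
```

Each record could then be stored or exported to whatever structured format the later stages expect. Lines that do not match the assumed layout are returned as None so that they can be counted or inspected separately.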
Then, we analyze the log files of web servers in order to trace attacks such as
SQL injection (SQLi) and distributed denial of service (DDOS). The approach
aims to analyze and correlate several events recorded in access log files over time
and to extract useful security information.
We store all generated log files on a common platform to make their analysis
more efficient. We then use MapReduce to perform parallel and distributed
processing. Our MapReduce implementation runs on large access log files stored
in HDFS. The inputs and outputs of our MapReduce job are sets of key-value
pairs {(K, V)}. The input of each Map task is the set of words {w} of one partition
of the log file records. The Map function counts the number of times a key w
appears in the partition; the Reduce function computes the total number of
occurrences of a key indicator (Fig. 2).

Fig. 2. MapReduce processing
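
The counting scheme described above can be sketched in plain Python as follows. This is a single-process illustration of the Map and Reduce roles only, not the Hadoop job used in the case study; the function names map_partition and reduce_counts are hypothetical.

```python
from collections import defaultdict


def map_partition(words):
    """Map: emit (key, count) pairs for the words of one log partition."""
    counts = defaultdict(int)
    for word in words:
        counts[word] += 1
    return list(counts.items())


def reduce_counts(mapped_partitions):
    """Reduce: sum the partial counts of each key over all partitions."""
    totals = defaultdict(int)
    for pairs in mapped_partitions:
        for key, value in pairs:
            totals[key] += value
    return dict(totals)


if __name__ == "__main__":
    partitions = [["%27", "UNION", "%27"], ["UNION", "SELECT"]]
    mapped = [map_partition(p) for p in partitions]
    print(reduce_counts(mapped))
```

In a real cluster each partition would be mapped on a different node and the framework would group the pairs by key before the Reduce step; here both steps run sequentially to keep the example self-contained.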

SQL Injection
SQL injection is an attack that exploits a security vulnerability in an application
that interacts with a database. It occurs when an SQL query not intended by the
system is inserted [8]: the attacker injects SQL code that will be interpreted by the
database engine. The attack involves entering specific characters in a variable that is
used in an SQL query. These characters divert the original query from its purpose and
open the way for malicious users [9]. They could, for example, authenticate themselves
without knowing the password, create a new administrator user whose password they
know, destroy a table, corrupt the data, and so on.
Three injection mechanisms can execute malicious SQL code on the databases of a
web application: injection into user inputs, injection into cookies, and injection into
server variables, which consists of injecting values into the HTTP header. In each case,
the attacker injects special characters that divert the original request from its
purpose (Table 1).

Table 1. Some indicators of SQL injection

Indicators                          Meaning
(\')|(\%27)                         The single quote and its URL-encoded version
(\-\-)|(%20--%20)                   The double dash, which starts a single-line comment
(;)|(%20;%20)                       Semicolon, request delimiter
(%20UNION%20), (%20SELECT%20),      Structured Query Language keywords
(%20OR%20), (%20INSERT%20)

To detect SQL injection attacks in log files, we parse the access log file line by
line and look for SQL keywords or suspicious characters in order to identify
deviations in the behavior of the monitored events and to single out the IP addresses
that make SQL injection attempts. An attack cannot be carried out without injecting
dangerous characters into the input parameters, since this is the only way to change
the structure or the syntax tree of an SQL query at run time. As a result, we obtain
the IP addresses that launched the malicious requests, the number of attempts and the
indicator of the attack. After running our MapReduce program, which contains the SQL
injection tracking and detection instructions, we obtain the result of this analysis in
a file named part-r-00000. From it, we can clearly identify the malicious users who
attempt to attack the system in question, together with the number of attempts and the
detection indicator, in order to take the necessary countermeasures (Fig. 3).

Fig. 3. The result of the SQLI attack detection approach
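
The detection step can be illustrated with the short Python sketch below, which scans requested URLs for the indicators of Table 1 and counts the attempts per IP address and indicator. It is only an approximation of the idea under stated assumptions: the regular expressions are simplified versions of the table entries, the indicator labels and the function name detect_sqli are hypothetical, and the input is assumed to be already reduced to (IP address, URL) pairs.

```python
import re
from collections import Counter

# Simplified forms of the Table 1 indicators (illustrative, not exhaustive).
SQLI_INDICATORS = {
    "quote": re.compile(r"'|%27"),
    "comment": re.compile(r"--"),
    "semicolon": re.compile(r";"),
    "keyword": re.compile(r"%20(UNION|SELECT|OR|INSERT)%20", re.IGNORECASE),
}


def detect_sqli(requests):
    """Count suspicious requests per (ip, indicator) over (ip, url) pairs."""
    hits = Counter()
    for ip, url in requests:
        for name, pattern in SQLI_INDICATORS.items():
            if pattern.search(url):
                hits[(ip, name)] += 1
    return hits


if __name__ == "__main__":
    sample = [
        ("10.0.0.1", "/item.php?id=1%27%20OR%20%271%27=%271"),
        ("10.0.0.1", "/item.php?id=1%20UNION%20SELECT%20name"),
        ("10.0.0.2", "/index.html"),
    ]
    for (ip, indicator), attempts in sorted(detect_sqli(sample).items()):
        print(ip, indicator, attempts)
```

Patterns this broad will also match some legitimate URLs, for instance a double dash inside a page name, so the counts are best read as candidates for review rather than confirmed attacks.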

DDOS Attack
Distributed Denial of Service (DDoS) is a malicious attempt to disrupt the normal
traffic of a targeted server, service or network by saturating the target or its
surrounding infrastructure with a flood of Internet traffic [10]. DDOS attacks owe their
effectiveness to the use of multiple compromised computer systems as sources of attack
traffic. Specifically, attackers send a large number of requests to a device (host,
server, web application, etc.) in order to saturate it and cause a total interruption
of service.
In this work, we are interested in detecting DDOS attacks that aim to exhaust the
processing capabilities of a target. For example, an attacker can try to reach the limit
of the number of concurrent connections that a web server can process. In this case, the
attacker constantly sends a large number of HTTP GET or POST requests to the targeted
server. A single HTTP request is not expensive to issue on the client side, but it can be
expensive for the target server to answer, because the server must often load multiple
files and execute database queries to build a web page.
Our approach is to scan the access log file to detect users or machines that send a
massive number of requests for particular resources in a very short time interval in the
hope of making the service unavailable. For this, we have developed a MapReduce program
that reports the number of requests sent by each user in a time interval of 5 s (Fig. 4).

Fig. 4. The result of the DDOS attack detection approach
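
A minimal Python sketch of this counting is given below. It groups requests by IP address and by 5 s window and flags the addresses that exceed a threshold. The timestamp format is assumed to be the one used by Apache access logs, the threshold value is an arbitrary illustration (the text above does not specify one), and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime

WINDOW_SECONDS = 5                     # interval used in the case study
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"   # assumed Apache timestamp layout


def count_requests_per_window(records, window=WINDOW_SECONDS):
    """Count requests per (ip, window start) over (ip, timestamp) pairs."""
    counts = Counter()
    for ip, timestamp in records:
        epoch = int(datetime.strptime(timestamp, TIME_FORMAT).timestamp())
        counts[(ip, epoch - epoch % window)] += 1
    return counts


def flag_suspects(counts, threshold):
    """Return the IPs with more than `threshold` requests in a single window."""
    return sorted({ip for (ip, _), n in counts.items() if n > threshold})


if __name__ == "__main__":
    sample = [
        ("10.0.0.1", "10/Oct/2018:13:55:35 +0000"),
        ("10.0.0.1", "10/Oct/2018:13:55:36 +0000"),
        ("10.0.0.1", "10/Oct/2018:13:55:37 +0000"),
        ("10.0.0.1", "10/Oct/2018:13:55:39 +0000"),
        ("10.0.0.2", "10/Oct/2018:13:55:36 +0000"),
    ]
    print(flag_suspects(count_requests_per_window(sample), threshold=3))
```

Fixed windows are the simplest choice; a burst that straddles two windows is split between them, so a sliding window or a lower threshold may be preferable in practice.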

The analysis of the log files allowed us to extract some indicators that characterize
attacks such as SQLI and DDOS, in order to anticipate these threats and to take a number
of technical and organizational measures to protect system security. These results also
have some limitations. On the one hand, it is difficult to confirm whether an event is an
actual attack or not, which can generate false alarms. On the other hand, it is
challenging to determine in advance all the dangerous characters and behaviors, which
evolve rapidly.

5 Conclusion

In this paper, we present a methodology that exploits log files in the domain of
computer security, with the aim of improving anomaly detection and increasing the
level of security. This methodology consists of four stages: data collection, data
processing, data storage, and data analysis. As a case study, we collected and
saved the events of a web server related to SQL injection and DDOS attacks, and then
organized these data in a common structure. For data analysis and knowledge
extraction, we used a parallel and distributed approach based on MapReduce. The results
obtained are encouraging, but there are still some limitations regarding accurate
detection.

References

1. Miranda, M.: S. Big Brother au Big Data. In: Conférence de Big Data, Université Sophia
Antipolis (2015)
2. Savitha, K., Vijaya, M.S.: Mining of web server logs in a distributed cluster using Big Data
technologies. IJACSA 5(3), 137–142 (2014)
3. Salama, S.E., Marie, M.I., El-Fangary, L.M., Helmy, Y.K.: Web server logs preprocessing
for web intrusion detection. Comput. Inf. Sci. 4(4), 123–133 (2011)
4. Saravanan, S., Uma Maheswari, B.: Analyzing large web log files in a Hadoop distributed
cluster environment. Int. J. Comput. Technol. Appl. (IJCTA) 5(5), 1677–1681 (2014)
5. Müller, A., Münz, G., Carle, G.: Collecting router information for error diagnosis and
troubleshooting in home networks. In: IEEE 36th Conference on Local Computer Networks
(LCN), pp. 764–769. IEEE, October 2011
6. Amar, M.M., Lemoudden, M., El Ouahidi, B.: Log file’s centralization to improve cloud
security. In: International Conference on Cloud Computing Technologies and Applications,
CloudTech 2016, pp. 178–183 (2016)
7. Moh, M., et al.: Detecting web attacks using multi-stage log analysis. In: IEEE 6th
International Conference on Advanced Computing (IACC). IEEE (2016)
8. Halfond, W.G., Viegas, J., Orso, A.: A classification of SQL-injection attacks and
countermeasures. In: Proceedings of the IEEE International Symposium on Secure Software
Engineering, vol. 1, pp. 13–15. IEEE, March 2006
9. Alwan, Z.S., Younis, M.F.: Detection and prevention of SQL injection attack: a survey (2017)
10. Balakrishnan, H.P., Moses, J.C.: A survey on defense mechanism against DDOS attacks. Int.
J. 4(3) (2014)
