Global Terrorism Predictive-Analysis

Sandeep Chaurasia, Vinayak Warikoo and Shanu Khan
Abstract In view of the recent increase in the number and lethality of terrorist attacks, it has become important to develop a strategic vision that helps prepare for and prevent such events. This paper presents descriptive and predictive analyses of the Global Terrorism Database that reveal vital information about the trends in such events and help identify the perpetrators of future terrorist activities. The descriptive phase explores the dataset to identify features useful for forecasting. The predictive phase involves data manipulation and compares the performance of several multi-class classification and regression algorithms, such as decision trees and random forests, on the dataset. Python with the scikit-learn library was used for the experiments.
Keywords Global terrorism database · Descriptive analysis · Predictive analysis · Classification and regression
S. Chaurasia (corresponding author) · V. Warikoo
Manipal University Jaipur, Jaipur, India
e-mail: chaurasia.sandeep@gmail.com
S. Khan
Indian Institute of Technology Roorkee, Roorkee, India
© Springer Nature Singapore Pte Ltd. 2019
S. K. Bhatia et al. (eds.), Advances in Computer Communication and Computational Sciences, Advances in Intelligent Systems and Computing 924,
https://doi.org/10.1007/978-981-13-6861-5_7

1 Introduction
Terrorist attacks have recently become both more frequent and more lethal, so it has become important for us as citizens to develop a strategic vision, to prepare, and, where possible, to prevent such events. Each step taken in this direction helps us handle such adverse situations better and bring the perpetrators to justice.
The Global Terrorism Database (GTD) is an open-source database maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism (START) at the University of Maryland [5]. It records information on terrorist events around the world from 1970 to 2017 and can be used to identify terrorist groups and to analyze the frequency of attacks [1, 2]. The purpose of this work is to develop a supervised predictive model [3] on this database that can be trained to make predictions about future terrorist activities and to provide vital information for countermeasures. Interest in predictive analytics for the counter-terrorism agenda can be traced back to the reaction to the 9/11 attacks in 2001 [4]. Those attacks were concrete evidence that unlikely events with disastrous effects on society can happen, and that pre-existing social systems are complex and unpredictable.
Many data-analytics organizations began mining and profiling such data, but the sheer volume of uncategorized data posed a challenge. More recently, a dependable source, the Global Terrorism Database [5], was released by the US Department of Homeland Security in partnership with the University of Maryland.
2 Literature Review
2.1 Survey
The literature on predictive-analysis models used to identify terrorist groups, or to make predictions about them, is divided between practical technical literature and theoretical, ideological literature.
The technical literature consists of a large number of reports directed toward intelligence-service practice. This literature is pivotal for understanding the possibilities offered by pattern-based data mining and classification [6]. The theoretical literature shares a similar trust in these technical possibilities, but many practitioners find that terrorism cannot be predicted with outlier-based algorithms, because the recorded data are very noisy and there are too many feature variables to classify or to form any pattern.
This difficulty, however, tends to be seen as just another social problem that can be addressed technically through the new avenues opened up by big data, and prevention and prediction are widely believed to be possible now. On this view, new methods of predictive analytics allow society's existing profiling of terrorist groups and the elimination of terrorist attacks to become more targeted, just, and effective.
2.2 Previous Shortcomings
Earlier work in this field has produced models with only a moderate range of predictive accuracy [2], so none of them is accurate enough to be deployed as a reliable model for forecasting the group responsible for such events.
This can be traced back to the methods used to develop these models, which fail to address the limitations of outlier algorithms [7]: because every type of descriptive data is potentially relevant, the number of variables becomes very large, making it unlikely that the model converges on a single well-fitting solution.
2.3 Distinctive Methodology
Processing a large stock of unconventional data can be time-consuming and counterproductive for the desired result. The data need to be categorized, and superfluous or erroneous records need to be removed. Proper visualization of the data is required to find the key factors that help train the model better, and a distinct categorization of the data is required to train multiple models that achieve high accuracy.
2.4 Result Significance
A predictive accuracy above 85% would be a major step toward eliminating terrorism and toward deploying a real-time model that can be fed the required metadata to forecast future terrorist activities.
3 Methodology
The main aim of this research is to draw inferences from the Global Terrorism Database and obtain vital demographics from it, in order to develop a predictive model that learns from the dataset and, when given real-time data on a recent event, forecasts the terrorist group responsible and the casualties.
To keep the work structured, the task is divided into three phases (Sects. 3.2–3.4): descriptive analysis, data manipulation and selection, and predictive analysis.
3.1 Experimental Setup
The Global Terrorism Database kit, which includes the dataset and a codebook explaining its contents, was obtained from the official website after verification.
Python 2.7 was the primary programming language used to develop the models, alongside several open-source libraries. The pandas library was used to import the dataset into a data structure for editing and modification, the scikit-learn library was used for the machine learning part of model development, and Plotly, an open-source client-side graphing framework, was used to produce all the graphs and plots.
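The import step can be sketched with pandas as below. This is a minimal sketch, not the authors' code: it targets Python 3 rather than Python 2.7, assumes GTD codebook column names such as `iyear`, `region_txt`, `attacktype1_txt`, `gname` and `nkill`, assumes the file needs latin-1 decoding, and uses a few made-up rows in place of the real file. The helper `load_gtd` is hypothetical.

```python
import io

import pandas as pd

# Made-up rows standing in for the GTD export; column names follow the
# GTD codebook (assumed), values are illustrative only.
SAMPLE_CSV = (
    "eventid,iyear,region_txt,attacktype1_txt,gname,nkill\n"
    "1,2015,South Asia,Bombing/Explosion,Group A,3\n"
    "2,2016,South Asia,Armed Assault,Group B,\n"
    "3,2016,Western Europe,Bombing/Explosion,Unknown,0\n"
)


def load_gtd(source, usecols=None):
    """Hypothetical helper: read the GTD CSV into a pandas DataFrame.

    `source` can be a file path or a binary file-like object. Latin-1
    decoding is an assumption about the distributed file.
    """
    return pd.read_csv(source, usecols=usecols, encoding="latin-1", low_memory=False)


gtd = load_gtd(io.BytesIO(SAMPLE_CSV.encode("latin-1")))
print(gtd.shape)  # (3, 6)
print(gtd["nkill"].isna().sum())  # 1: the empty cell is read as NaN
```

With the real file, passing `usecols` keeps only the columns needed later and reduces memory use.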
3.2 Descriptive Analysis
Descriptive analysis gives a clearer picture of the database and summarizes it in a meaningful way. In this study, a large amount of data had to be simplified and understood. Each descriptive statistic reduces a large amount of data to a simple summary and brings out trends or important features that are otherwise hard to see [8]. Several plots were produced to identify the most active terrorist groups, the groups with the most incidents, and the attack methods of active terrorist groups.
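The summaries behind such plots reduce to simple counts. A minimal pandas sketch, assuming the GTD columns `gname` (perpetrator group) and `attacktype1_txt` (primary attack type); the helper names and rows are hypothetical, and excluding the "Unknown" group from the ranking is our assumption, not a step stated in the paper.

```python
import pandas as pd


def top_groups(gtd, n=10):
    """Most active groups by number of incidents, ignoring unattributed attacks."""
    known = gtd[gtd["gname"] != "Unknown"]
    return known["gname"].value_counts().head(n)


def attack_methods(gtd, groups):
    """Count each primary attack type used by the given groups."""
    subset = gtd[gtd["gname"].isin(groups)]
    return pd.crosstab(subset["gname"], subset["attacktype1_txt"])


# Illustrative data only.
gtd = pd.DataFrame({
    "gname": ["A", "A", "B", "Unknown", "A", "B", "C"],
    "attacktype1_txt": ["Bombing", "Armed Assault", "Bombing", "Bombing",
                        "Bombing", "Hijacking", "Bombing"],
})
top = top_groups(gtd, n=2)
table = attack_methods(gtd, top.index)
print(top)
print(table)
```

The resulting Series and table can be passed directly to a plotting library such as Plotly to produce bar charts.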
3.3 Data Manipulation and Selection
Data manipulation and selection cover all steps of data transformation, formatting, and structuring. Based on the criteria for selecting important features, tasks such as updating, adding or removing, sorting, selecting, merging, shifting, and aggregating data were undertaken.
A total of 136 features were reduced to 19 features that provide vital information about each terrorist event, keeping the performance of the prediction model in mind. Doubtful attacks were removed, and unknown or missing values were filled with the median value of the same group to help categorize the terrorist activity better. Categorical data such as attack type, weapon type, and target type were one-hot encoded to make them suitable for the predictive classification and regression algorithms.
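The cleaning and encoding steps can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes the GTD codebook conventions that `doubtterr == 1` flags a doubtful incident and that -9/-99 encode unknown numeric values, and it uses made-up rows; `clean_events` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def clean_events(gtd, numeric_cols, categorical_cols, group_col="gname"):
    """Drop doubtful attacks, impute numeric gaps per group, one-hot encode."""
    # Assumed codebook convention: doubtterr == 1 means the incident is doubtful.
    df = gtd[gtd["doubtterr"] != 1].copy()
    # Assumed codebook convention: -9 / -99 mark unknown numeric values.
    df[numeric_cols] = df[numeric_cols].replace([-9, -99], np.nan)
    for col in numeric_cols:
        # Fill each gap with the median of the same perpetrator group.
        group_median = df.groupby(group_col)[col].transform("median")
        df[col] = df[col].fillna(group_median)
    # One-hot encode categorical attributes (attack, weapon, target type, ...).
    return pd.get_dummies(df, columns=categorical_cols)


events = pd.DataFrame({
    "gname": ["A", "A", "A", "B", "B", "B"],
    "doubtterr": [0, 0, 0, 1, 0, 0],
    "nkill": [2, np.nan, 4, 10, -99, 6],
    "attacktype1_txt": ["Bombing", "Armed Assault", "Bombing",
                        "Bombing", "Hijacking", "Bombing"],
})
clean = clean_events(events, ["nkill"], ["attacktype1_txt"])
print(clean["nkill"].tolist())  # [2.0, 3.0, 4.0, 6.0, 6.0]
```

The medians are computed after the doubtful rows are removed, so a doubtful incident never influences the imputed values.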
The data were divided geographically into 12 regions to maintain continuity and to obtain a specialized model for each region. The top five terrorist groups were identified in every region, and the remaining groups were labeled as “Others”.
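The per-region relabelling can be sketched in pandas as below. This is a minimal sketch assuming the GTD columns `region_txt` and `gname`, with hypothetical helper names and made-up rows. The paper does not say how ties between equally active groups are broken; here the order returned by `value_counts` decides.

```python
import pandas as pd


def label_top_groups(gtd, region_col="region_txt", group_col="gname", top_n=5):
    """Keep the top_n most active groups in each region; relabel the rest "Others"."""
    df = gtd.copy()

    def relabel(groups):
        top = groups.value_counts().head(top_n).index
        return groups.where(groups.isin(top), "Others")

    df[group_col] = df.groupby(region_col)[group_col].transform(relabel)
    return df


def split_by_region(df, region_col="region_txt"):
    """One DataFrame per region, so that a specialized model can be trained on each."""
    return {region: part for region, part in df.groupby(region_col)}


events = pd.DataFrame({
    "region_txt": ["X"] * 6 + ["Y"],
    "gname": ["A", "A", "A", "B", "B", "C", "D"],
})
labelled = label_top_groups(events, top_n=2)
print(labelled["gname"].tolist())  # ['A', 'A', 'A', 'B', 'B', 'Others', 'D']
regions = split_by_region(labelled)
print(sorted(regions))  # ['X', 'Y']
```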
3.4 Predictive Analysis
The first step is to design a predictive model using supervised classification and regression algorithms. The dataset is then divided at random into a training set and a test set. The training set is used by the machine learning algorithms to learn how the given features map to the corresponding labels, and the test set is used to measure how accurately the trained model predicts the labels from the feature information.
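A minimal sketch of this train/test protocol with scikit-learn. The paper does not state the split ratio, seed, or exact data layout, so the 80/20 split, the synthetic data from `make_classification` (19 features and 6 labels, mirroring the five top groups plus "Others"), and the `evaluate` helper are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def evaluate(model, X, y, test_size=0.2, seed=0):
    """Randomly split the data, train on one part and score accuracy on the other."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed)
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


# Synthetic stand-in for one region: 19 features and 6 labels
# (five top groups plus "Others"). The 80/20 split is an assumption.
X, y = make_classification(n_samples=600, n_features=19, n_informative=8,
                           n_classes=6, random_state=0)
acc = evaluate(DecisionTreeClassifier(random_state=0), X, y)
print(f"test accuracy: {acc:.2%}")
```

Any scikit-learn classifier can be passed to `evaluate`, which keeps the comparison between models on the same footing.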
The predictive analysis is broken down into constituent models, each based on a different machine learning algorithm.
3.4.1 K-Nearest Neighbors
K-nearest neighbors (KNN) is a simple machine learning algorithm that stores all normalized cases and classifies new cases based on a similarity measure [8]. Similarity is measured with a distance function (Euclidean) over a specified number of neighbors. KNN is based on learning by analogy and is known for its simplicity and its applicability to many data structures. A drawback of KNN classifiers is that they assign equal weight to all attributes, which may cause erratic results when the data contain many irrelevant attributes [9]. KNN also incurs a high computational cost when the number of potential neighbors is large, and therefore requires an efficient indexing technique.
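A minimal scikit-learn sketch of such a KNN classifier, assuming synthetic data in place of a GTD region and an assumed `n_neighbors=5`. Standardizing the features plays the role of "normalized cases", `p=2` selects the Euclidean distance, and the k-d tree is one example of an indexing structure for the neighbor search.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for one region (19 features, 6 labels).
X, y = make_classification(n_samples=600, n_features=19, n_informative=8,
                           n_classes=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

knn = make_pipeline(
    # Normalize the cases so that no single attribute dominates the distance.
    StandardScaler(),
    # Minkowski distance with p=2 is the Euclidean distance; a k-d tree
    # indexes the training cases to speed up the neighbor search.
    KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2, algorithm="kd_tree"),
)
knn.fit(X_train, y_train)
score = knn.score(X_test, y_test)
print(f"KNN test accuracy: {score:.2%}")
```

Every feature still contributes equally to the distance after scaling, which is exactly the equal-weighting drawback noted above.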
3.4.2 Logistic Regression
Logistic regression is a classification algorithm based on a linear discriminant; it does not try to predict the value of a numerical variable from a given set of inputs [10]. Instead, it outputs the probability that a given input point belongs to a certain class. In its basic form it predicts a binary outcome, so for our multi-class data (several group names), N distinct logistic regression models are built to provide probabilities for the N groups [11].
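The one-model-per-group scheme can be made explicit with scikit-learn's `OneVsRestClassifier`. Recent scikit-learn versions fit a single multinomial model when `LogisticRegression` sees more than two classes, so wrapping it is one way to obtain the N binary models described above. A minimal sketch on synthetic data; `max_iter=1000` is an assumed setting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Synthetic stand-in for one region (19 features, 6 group labels).
X, y = make_classification(n_samples=600, n_features=19, n_informative=8,
                           n_classes=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# One binary logistic regression per group (that group vs. all others).
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X_train, y_train)

# One probability column per group; rows are normalized to sum to 1.
proba = ovr.predict_proba(X_test)
print(len(ovr.estimators_), proba.shape)  # 6 (120, 6)
```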
3.4.3 Decision Trees
Decision trees are a non-parametric supervised learning method used for classification. The objective is to build a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. A decision tree is a decision-support tool that uses a tree-like graph or model of decisions and their possible consequences [10]. It assigns a class label to an unseen record and explains why the decision was made in the form of a classification rule [8]. Decision trees are weak when only a few instances are available for a large variety of classes, as this leads to a higher classification error rate.
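To show how a tree's decision can be read back as rules, a minimal scikit-learn sketch on synthetic data; the shallow `max_depth=3` and the generic feature names are assumptions chosen to keep the printed rules short.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for one region (19 features, 6 labels).
X, y = make_classification(n_samples=600, n_features=19, n_informative=8,
                           n_classes=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# A shallow tree keeps the learned rules readable.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# Each root-to-leaf path is a rule explaining why a record gets its label.
feature_names = [f"feature_{i}" for i in range(X.shape[1])]
rules = export_text(tree, feature_names=feature_names)
print(rules)
print(f"test accuracy: {tree.score(X_test, y_test):.2%}")
```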
3.4.4 Random Forest
Random forest is an ensemble classifier that uses several decision tree models to predict the result. It fits a number of decision tree classifiers on various sub-samples of the dataset and averages their predictions to improve predictive accuracy [8] and control over-fitting.
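A minimal scikit-learn sketch on synthetic data; the hyperparameters are assumptions, not the authors' settings. In scikit-learn's implementation, each tree is trained on a bootstrap sample, and the averaging mentioned above is an average of the trees' predicted class probabilities. The last lines check this directly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for one region (19 features, 6 labels).
X, y = make_classification(n_samples=600, n_features=19, n_informative=8,
                           n_classes=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # number of trees (assumed value)
    bootstrap=True,       # each tree sees a bootstrap sub-sample of the rows
    max_features="sqrt",  # random subset of features considered at each split
    random_state=0,
).fit(X_train, y_train)

# The forest's probability estimate is the mean over its individual trees.
tree_mean = np.mean([t.predict_proba(X_test) for t in forest.estimators_], axis=0)
print(np.allclose(tree_mean, forest.predict_proba(X_test)))  # True
print(f"test accuracy: {forest.score(X_test, y_test):.2%}")
```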
3.4.5 XGBoost
XGBoost (extreme gradient boosting) builds new models that predict the residuals, or errors, of the previous models; these are then added together to make the final prediction [12]. The method is called gradient boosting because it uses a gradient descent algorithm to minimize the loss while adding new models.
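The `xgboost` package provides a scikit-learn-compatible `XGBClassifier`. The sketch below does not use it. Instead, it is a plain gradient-boosting loop for squared-error regression, written with scikit-learn regression trees to show the residual-fitting idea. XGBoost adds regularization, second-order gradient information, and many engineering optimizations on top of this. The helper names, data, and hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosted_trees(X, y, n_rounds=50, learning_rate=0.1, max_depth=3):
    """Plain gradient boosting for squared error (an illustration, not XGBoost itself)."""
    base = float(np.mean(y))
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_rounds):
        # For the loss 0.5 * (y - pred)**2 the negative gradient is the residual.
        residuals = y - pred
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        # Each new tree corrects the errors of the ensemble built so far.
        pred += learning_rate * tree.predict(X)
        trees.append(tree)
    return base, learning_rate, trees


def predict_boosted(model, X):
    base, learning_rate, trees = model
    # The final prediction adds up the (shrunken) outputs of all trees.
    return base + learning_rate * sum(tree.predict(X) for tree in trees)


rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

model = fit_boosted_trees(X, y)
mse_constant = np.mean((y - np.mean(y)) ** 2)
mse_boosted = np.mean((y - predict_boosted(model, X)) ** 2)
print(f"MSE: constant {mse_constant:.3f}, boosted {mse_boosted:.3f}")
```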
4 Result
Five different classifiers were applied to the dataset. The scikit-learn Python library was used within the Anaconda distribution on Intel Core i5 processors. The results obtained for each region are given in Table 1.
Table 1 shows that, among the five classifiers, random forest and XGBoost outperform the others. For North and South America, random forest gave the best accuracy, whereas for the Asian regions XGBoost gave the best or joint-best accuracy.
Fig. 1 plots the best accuracy obtained (y-axis) for each of the twelve regions (x-axis). Accuracy was notably higher for the African regions and for South and Central Asia (for example, 94.77% for Sub-Saharan Africa and 89.11% for South Asia) than for North America, South America, and Europe.
Table 1 Performance accuracy (%) of all models by region on the test dataset

Region                          KNN    Logistic regression  Decision tree  Random forest  XGBoost
North America                   76.34  64.15                78.49          80.864         80.28
South America                   73.90  49.10                78.19          79.18          76.63
Western Europe                  75.36  72.06                80.40          81.52          81.41
Central America and Caribbean   82.05  80.46                87.57          89.06          90.39
Sub-Saharan Africa              75.97  56.74                93.5           94.76          94.77
Middle East and North Africa    79.87  47.65                89.37          90.58          90.98
East Asia                       60     65                   70             70             70
Eastern Europe                  69.38  65.30                77.55          81.63          77.55
Australia and Oceania           28.57  64.28                85.71          57.14          85.71
South East Asia                 71.74  66.70                72.30          78.81          80.27
South Asia                      77.32  60.57                87.57          89.05          89.11
Central Asia                    36.36  72.72                100            100            100
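The per-region comparison in Table 1 can be checked mechanically. This short sketch copies the accuracies from Table 1 and reports the best model, or models when tied, for each region. The helper `best_models` is hypothetical and does not reproduce any experiment.

```python
from collections import Counter

MODELS = ("KNN", "Logistic regression", "Decision tree", "Random forest", "XGBoost")

# Test-set accuracy (%) per region, copied from Table 1.
TABLE_1 = {
    "North America": (76.34, 64.15, 78.49, 80.864, 80.28),
    "South America": (73.90, 49.10, 78.19, 79.18, 76.63),
    "Western Europe": (75.36, 72.06, 80.40, 81.52, 81.41),
    "Central America and Caribbean": (82.05, 80.46, 87.57, 89.06, 90.39),
    "Sub-Saharan Africa": (75.97, 56.74, 93.5, 94.76, 94.77),
    "Middle East and North Africa": (79.87, 47.65, 89.37, 90.58, 90.98),
    "East Asia": (60, 65, 70, 70, 70),
    "Eastern Europe": (69.38, 65.30, 77.55, 81.63, 77.55),
    "Australia and Oceania": (28.57, 64.28, 85.71, 57.14, 85.71),
    "South East Asia": (71.74, 66.70, 72.30, 78.81, 80.27),
    "South Asia": (77.32, 60.57, 87.57, 89.05, 89.11),
    "Central Asia": (36.36, 72.72, 100, 100, 100),
}


def best_models(table):
    """For each region, return the best accuracy and every model that reaches it."""
    result = {}
    for region, scores in table.items():
        top = max(scores)
        result[region] = (top, [m for m, s in zip(MODELS, scores) if s == top])
    return result


best = best_models(TABLE_1)
for region, (acc, models) in best.items():
    print(f"{region}: {acc} ({', '.join(models)})")

# Regions won outright by a single model.
wins = Counter(models[0] for _, models in best.values() if len(models) == 1)
print(dict(wins))
```

Running it shows random forest winning outright in North America, South America, Western Europe, and Eastern Europe. XGBoost wins outright in Central America and the Caribbean, Sub-Saharan Africa, the Middle East and North Africa, South East Asia, and South Asia. East Asia, Australia and Oceania, and Central Asia end in ties.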
Fig. 1 Prediction on various regions (plot "PREDICTION": % of accuracy on the y-axis, regions on the x-axis)
Fig. 2 XGBoost classifier performance on regions
Since the experiments show that the random forest and XGBoost classifiers perform best, their performance on each region is compared in Figs. 2 and 3.
Fig. 3 Random forest classifier performance on regions
5 Conclusions
The selection of features from the data, together with the strategy of dividing the dataset into regions to improve the classification results, has proven successful. We expect that the desired accuracy can be reached by fine-tuning the aforementioned models. Further research in the mathematical and statistical domains is needed to better understand the intricacies of these machine learning models and to improve their performance.
References
1. LaFree, G., Dugan, L.: Introducing the global terrorism database. Terror. Polit. Violence 19, 181–204 (2007). Zarri, G.P.: Semantic web and knowledge representation. In: Proceedings of the 13th International Workshop on Database and Expert System Applications (DEXA’02), pp. 1529–4188 (2002)
2. LaFree, G.: The global terrorism database: accomplishments and challenges. Perspect. Terrorism 4(1) (2010)
3. Al Hasan, M., Chaoji, V., Salem, S., Zaki, M.: Link prediction using supervised learning. In: SDM06: Workshop on Link Analysis, Counter Terrorism and Security, Rensselaer Polytechnic Institute, NY (2006)
4. Munk, T.B.: Why anti-terror algorithms don’t work. First Monday 22(9) (2017)
5. National Consortium for the Study of Terrorism and Responses to Terrorism (START): Global Terrorism Database [Data file] (2017). Retrieved from https://www.start.umd.edu/gtd
6. Taipale, K.A.: Data mining and domestic security: connecting the dots to make sense of data. Columbia Sci. Tech. Law Rev. 5, 1–83 (2013)
7. Malathi, A., Santhosh Baboo, S.: Evolving data mining algorithms on the prevailing crime trend: an intelligent crime prediction model. Int. J. Sci. Eng. Res. 2(6) (2016)
8. Du, H.: Data Mining Techniques and Applications: An Introduction. Cengage Learning, Boston (2010)
9. Xiao, X., Ding, H.: Enhancement of K-nearest neighbor algorithm based on weighted entropy of attribute value. In: 2012 5th International Conference on Bio-Medical Engineering and Informatics (BMEI) (2012)
10. Cerri, R., et al.: An extensive evaluation of decision tree-based hierarchical multilabel classification methods and performance measures. Comput. Intell. 31(1), 1–46 (2015)
11. Logistic Regression Module. http://scikit-learn.org
12. Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM (2016)
