Terrorist Network Analysis and Identification of Main Actors Using Machine Learning Techniques


Jawairia Rasheed, COMSATS University Islamabad, latest_genius@gmail.com
Usman Akram, EME College NUST, usman.akram@ceme.edu.pk
Ahmed Kamran Malik, COMSATS University Islamabad, Kamranocp@gmail.com

ABSTRACT
Predicting terrorist networks and identifying their main actors is an important issue for intelligence and security informatics. In this article we present a method to analyze social networks using machine learning techniques. The proposed technique uses the k-core concept to remove unwanted and passive nodes from the whole network. It then extracts multiple features and uses a hybrid classifier to identify the main actors. The proposed technique is tested on a publicly available dataset, and the results show the significance of the proposed system.

CCS Concepts
• Applied computing➝Sociology

Keywords
Terrorist network analysis; covert network analysis; social network analysis; machine learning techniques for identification of main actors.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
ICIT 2018, December 29–31, 2018, Hong Kong, Hong Kong
© 2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-6629-8/18/12…$15.00
https://doi.org/10.1145/3301551.3301573

1. INTRODUCTION
Preventing and stopping terrorist networks is of utmost importance for national security and public safety [1]. Scientists and intelligence communities devote tremendous effort, manpower and resources to the eradication of terrorist activities. Nowadays terrorist networks are becoming increasingly organized, international and socio-technical [2]. These groups cannot work in isolation: to continue their work, terrorists operate in groups and build units, books, websites, and TV channels [3]. Terrorists also guard the secrecy of their interactions to hide from the surveillance of legal institutions [4]. Predicting the group behavior of terrorists is important because it provides early warning signs of criminal activity [5].

A terrorist network is a social network comprising different social actors and representing social ties such as kinship, friendship, classmates, business partners etc. Terrorist groups also cooperate with other terrorist groups to enhance the effectiveness of activities such as training, defending etc. In this way they drive the system dynamically, making it complex in the form of a multilayer network. To combat terrorism, it is important for counter-terrorism agencies to understand the development of a terrorist network, the different types of relationships among its members and their different roles in the network, so that the network can be destabilized through proper modification of key nodes, changing its structure by addition or dissolution of members to make it inoperable [6, 7].

To distinguish between multiple terrorist networks, four types of networks are of particular importance. The first is the trust network. The second is the support network, which is necessary for a successful terrorist network; the operations network is a sub-network of the trust network that performs a specific task such as an attack. The third type is the communication network, which captures all communication between members of the trust network. The fourth type is the proximity network, which links two individuals who have been present at the same location in the past; it is very difficult to track because most individuals move between geographical locations. Proximity networks are useful for anticipating an upcoming attack [6].

According to Gupta [7], the existence of a terrorist network depends upon three factors: a fascinating leader, the need for support, and ideology or greed. Counter-terrorist agencies detect a terrorist, analyze his relations and add his close contacts to the network using any of the above-mentioned strategies. The agencies strengthen their network around him by continuing their support in his business or facilitating his other tasks, and repeat the process as desired. Stopping prematurely fails to recognize some of the main actors, while stopping at a later stage will also include surrounding main actors [8].

Identification of prominent actors is the major task in social network analysis. The most important node in a social network represents the main actor. Centrality is the most crucial feature for determining social network structure and predicting organizational behavior. The objective of this paper is to identify the main actors involved in a terrorist network through a combination of centrality measures, k-core and machine learning techniques, and to analyze the differences among these techniques. Section 2 reviews the literature, Section 3 proposes the model for the identification of main actors, Section 4 describes the results and analysis, and Section 5 presents the conclusion.

2. LITERATURE REVIEW
Social Network Analysis (SNA) has long been widely used for analyzing various social structures, especially in the social sciences [9]. Later this field merged with and was used extensively in various other fields such as computer science, biology, medicine and physics [1, 10, 11]. A social network is a graph G(V, E) with V vertices and E edges. Here the vertices V represent social actors and the edges E describe the relationships between the vertices [12]. The prominence of a node in a graph or network is measured through centrality. Different measures of centrality help in determining the network pattern [1]. The degree centrality (DC) of a node is the number of direct links from other nodes connected to it. Degree centrality is the simplest centrality measure. It represents the connections of a
node as a measure of interconnectedness [13]. Using the adjacency matrix $A = (a_{ij})$, DC can be formulated as follows.

\[ DC(k) = \sum_{i=1}^{n} a_{ik} \tag{1} \]

Here k is the node for which we are calculating the degree centrality. Closeness centrality (CC) describes how central an actor is in terms of its shortest distances to the other connected nodes it communicates with. It measures the closeness of a node to other important nodes. Nodes at a short distance from the main actors are useful for spreading important information to other nodes in the network.

\[ CC(k) = \frac{1}{\sum_{i=1}^{n} d(k, i)} \tag{2} \]

Here d is the distance between the nodes. Betweenness centrality (BC) is the interaction of an actor between pairs of nonadjacent nodes [14]. It identifies the node that lies on the shortest paths between many important nodes. The importance of node k between two important nodes i and j is its role in the interaction between these two nodes, which are not directly connected.

\[ BC(k) = \sum_{\substack{i=1 \\ i \neq k}}^{n} \; \sum_{\substack{j \neq i \\ j \neq k}}^{n} \frac{g_{ij}(k)}{g_{ij}} \tag{3} \]

Here $g_{ij}$ is the number of shortest paths between nodes i and j, and $g_{ij}(k)$ is the number of those paths that pass through node k. Eigenvector centrality (EC) identifies the most influential node, that is, a node that is also connected to other well-connected nodes in the network.

\[ EC(x) = k_x = \frac{1}{\lambda_{\max}(A)} \sum_{j=1}^{n} a_{jx} k_j \tag{4} \]

where $k = (k_1, k_2, \ldots, k_n)^T$, $k_x$ is the entry for node x, and $\lambda_{\max}$ is the maximum eigenvalue of the adjacency matrix A.

K-core is another technique; it finds the maximal subgraphs of a graph and is used in a number of social network applications. Seidman used the idea of cores for the very first time. It is used to find the influential or important nodes in a network. Batagelj and Zaveršnik [15] proposed an efficient algorithm of complexity O(m), where m is the number of edges, for decomposing networks into cores in order to find the important nodes in the network. Azimi-Tafreshi et al. [16] used k-core percolation on multiplex networks with vertices of one type and different edges having multiple interactions. In such graphs the k-core is the largest subgraph. To combat terrorism, a modeling and simulation technique [17] was used for detecting and tracking terrorist groups and their intents. The authors also presented a technique based on the support vector machine (SVM) for recognition of simulated potential attacks. In [13] the authors presented a novel technique for predicting main members of covert networks through a combined framework that computes centrality to determine the structure of the network and then applies a hybrid classifier (using KNN, GMM and SVM) for main actor detection. They also applied a technique for detecting abnormal activity in order to prevent anomalous events. Pal et al. [18] used deep learning for network analysis. They performed supervised and unsupervised classification on networks and designed a framework for learning node representations of networks. In another study the authors [19] used online social network data of bloggers to identify their behavior using different classifiers such as lazy learning, decision trees and ensemble techniques. Comito et al. [20] used a hybrid algorithm for event detection that combines space-time feature extraction with text analysis of the social networking site Twitter. The authors used a peak detection method based on deviations in the time series of events to detect abnormal behavior. Outlier detection is also important for detecting abnormal activity when combined with identification of the main actor [13]. Irregular behavior on social networking sites and user mobility across geographical areas are the most distinguishing factors for determining an event [20]. In multiplex networks, k-cores were generated through self-consistency equations [16]. In another paper [21], important communities were identified with the same k-core algorithm. In this paper we use the dataset of the multilayer Noordin Top terrorist network [22] to analyze different patterns using the Gephi 0.9.2 tool, and we extract features from the dataset and apply machine learning algorithms for social network analysis and main actor identification.

3. CK-SDK MODEL
Our proposed methodology is divided into two parts. In the first part we use terrorist social network data to identify the main actors involved in the network on the basis of centrality measures. In the second part we apply data mining techniques for feature extraction.

Figure 1. CK-SDK model for identification of main actors.

3.1. Data Pre-processing Stage
In the pre-processing stage we remove noisy data and either discard or impute missing values. We use the KNN (K nearest neighbor) approach to compute missing values, since KNN uses neighboring values to estimate the missing information. For noise removal we use the Euclidean distance d between a pair of nodes in n-space [23].

\[ d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \tag{5} \]

where $p = (p_1, p_2, \ldots, p_n)$ and $q = (q_1, q_2, \ldots, q_n)$ are two nodes in Euclidean space. By assigning a distance threshold between pairs of nodes we calculate the distances within a cluster of nodes. If a distance is greater than the threshold, it represents noise and we remove the noisy value.

3.1.1. Resampling of Data
To improve the performance of the classifiers under class imbalance we apply resampling. The number of key players in the proposed research work is low compared to the total number of nodes in the network, so we use the synthetic minority over-sampling technique (SMOTE) to increase the number of key players in our dataset [24].
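The resampling step of Section 3.1.1 can be illustrated with a minimal sketch of the SMOTE idea: each synthetic key player is placed on the line segment between a minority sample and one of its nearest minority neighbors. The function name `smote`, the parameters `n_synthetic`, `k` and `seed`, and the two-dimensional feature vectors are illustrative assumptions; the paper does not state which SMOTE implementation or settings were used.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority samples by interpolating between neighbors.

    `minority` is a list of feature tuples for the minority class
    (here: key players). All names and defaults are hypothetical.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base sample (excluding itself)
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Toy feature vectors for four key players (made-up values).
key_players = [(0.9, 0.8), (0.8, 0.9), (1.0, 1.0), (0.7, 0.95)]
new_samples = smote(key_players, n_synthetic=6)
```

Because every synthetic point is a convex combination of two existing key players, the new samples stay inside the region already occupied by the minority class rather than duplicating existing rows.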
3.2. Identification of Main Actor
We use centrality measures to find the main actors involved in the terrorist network. We use a hybrid centrality measure (HCM) that combines DC, BC, CC and EC to improve the identification of main actors. Features are extracted from every layer of the multilayer social network and combined into a global feature set to support a better decision. We further use the K-core algorithm to improve the accuracy of our results. A k-core of a network is the sub-network formed by repeatedly deleting vertices whose degree is less than some threshold k.
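The pruning rule just described can be sketched in a few lines. This is a minimal illustration, assuming an undirected graph stored as a dictionary of neighbor sets; the function name `k_core` and the toy graph are hypothetical, and this is not the O(m) algorithm of [15].

```python
def k_core(adjacency, k):
    """Return the k-core by repeatedly deleting vertices of degree < k.

    `adjacency` maps each vertex to the set of its neighbors and is
    assumed to be symmetric (undirected graph). The input is not modified.
    """
    graph = {v: set(nbrs) for v, nbrs in adjacency.items()}
    changed = True
    while changed:
        changed = False
        low_degree = [v for v, nbrs in graph.items() if len(nbrs) < k]
        for v in low_degree:
            # Detach v from its remaining neighbors, then delete it.
            for u in graph[v]:
                graph[u].discard(v)
            del graph[v]
            changed = True
    return graph


# Toy layer: a triangle a-b-c, a pendant vertex d on c, an isolated vertex e.
layer = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
    "e": set(),
}
core_1 = k_core(layer, 1)  # drops only the isolated vertex e
core_2 = k_core(layer, 2)  # also drops the pendant vertex d
core_3 = k_core(layer, 3)  # nothing survives: no vertex keeps 3 neighbors
```

Deleting a vertex can push its neighbors below the threshold, which is why the loop repeats until a full pass removes nothing.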
3.3. K-Core Algorithm
Let G(V, E) be a graph with vertices V and edges E, and let H be a maximal subgraph of G such that $H \subseteq G$ and $\delta(H) \geq k$. Here $\delta(H)$ is the minimum degree of the vertices of H. The k-core is therefore the largest set of nodes in which every node has at least k neighbors within that set [25]. Each vertex of the subgraph H is adjacent to at least k other vertices of the subgraph. The example in Figure 2 shows that the 1-core, with parameter k=1, deletes all isolated vertices from the graph. Similarly, the 2-core deletes all vertices with fewer than two neighbors, and so on. The algorithm deletes all vertices of degree less than k. The core with the maximum value of k is the main core.

(a) Graph with 0-core (b) Graph with 1-core (c) Graph with 2-core (d) Graph with 3-core
Figure 2. An example of graph with k-core parameter.

1) Input graph G = (V, E) and threshold k
2) Compute the degrees of the vertices
3) For each vertex v ∈ V whose degree is less than k
4) For each neighboring vertex u of v, reduce the degree of u
5) Delete the minimum-degree vertex v
6) End, repeating steps 3 to 5 until no vertex of degree less than k remains

3.4. Hybrid Machine Learning Algorithm
In this process we use the extracted features and apply three classifiers: support vector machines (SVM), decision trees (DT) and K-nearest neighbor (KNN). Voting is then used to combine them into a hybrid ensemble classifier, improving the reliability of our model.

Figure 3. Framework of hybrid machine learning algorithm.

4. RESULTS AND ANALYSIS
When we analyzed the data of the Noordin Top terrorist network we identified ten layers, with a total network size of 79. Table 1 lists the layers and the number of nodes involved in each layer of the network. Figure 4 shows only the Business and Finance network, with labels representing the actors involved in this layer and their connections within the network.

Table 1. Representing different layers in terrorist network and nodes in every layer
Layers                   Nodes
1. Business & Finance    13
2. Classmates            39
3. Communications        74
4. Friendship            61
5. Kinship               24
6. Logistics             16
7. Meetings              26
8. Operations            39
9. Soul mates            9
10. Training Events      38

4.1. Network Centrality Measure
We applied social network analysis techniques to the dataset. First we applied centrality measures to the first layer and found the important nodes of the network layers on the basis of these measures, as shown in Table 2. To further understand the structure of the network we applied modularity and identified the partitioning of nodes with respect to degree, as shown in Figure 5.
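The centrality measures applied to each layer can be computed along the following lines. This is a minimal sketch of Equations (1) and (2) only, assuming an unweighted, undirected layer stored as a dictionary of neighbor sets; the function names and the three-node path graph are illustrative, and the paper itself reports using Gephi rather than custom code.

```python
from collections import deque


def degree_centrality(adjacency, k):
    """Equation (1): number of direct links of node k."""
    return len(adjacency[k])


def closeness_centrality(adjacency, k):
    """Equation (2): reciprocal of the summed shortest-path distances.

    Distances are hop counts found by breadth-first search; nodes that
    cannot be reached from k are ignored (an assumption of this sketch).
    """
    dist = {k: 0}
    queue = deque([k])
    while queue:
        v = queue.popleft()
        for u in adjacency[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    total = sum(dist.values())
    return 1.0 / total if total else 0.0


# Toy layer: a path a - b - c, where b sits in the middle.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
dc_b = degree_centrality(path, "b")
cc_b = closeness_centrality(path, "b")
cc_a = closeness_centrality(path, "a")
```

On this toy path the middle node b has both the highest degree and the highest closeness, which matches the intuition that a node close to all others is well placed to spread information.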
Figure 4. Business and finance network.

Figure 5. Business and Finance layer before applying k-core representing the structure of network, nodes=13.

4.2. Application of K-core Algorithm
We applied the K-core algorithm after applying the statistical techniques of social network analysis. With a threshold value of 3, the passive nodes were removed and only the prominent nodes remained, representing the key players, as shown in Figure 6. The orange nodes were removed and only the green and purple nodes remained, leaving 8 of the 13 nodes. Comparing the graph of Figure 4 with Figure 6 shows the evident difference. Similarly, we applied k-core to each layer of the network and identified the main nodes of each layer, as shown in Table 3.

Figure 6. Business and finance layer after k-core, K=3, nodes=8.

Table 3. Nodes filtered after k-core from each layer
Layers                   Nodes    Key nodes after k-core
1. Business & Finance    13       8
2. Classmates            39       12
3. Communications        74       22
4. Friendship            61       16
5. Kinship               24       6
6. Logistics             16       6
7. Meetings              26       6
8. Operations            39       18
9. Soul mates            9        4
10. Training Events      38       11

4.3. Hybrid Machine Learning Algorithm
We extracted the features from each layer of the network and labeled the main actors according to their k-core (CK) values. We then identified main actors from the CK-derived data through the hybrid centrality measure, and in the second iteration applied the CK algorithm to the main actors identified through HCM in order to focus only on the core members of the network. We used three classifiers (decision trees, support vector machines and K-nearest neighbor) and combined them with an ensemble technique based on majority voting. We trained the model to identify the main actors, and it correctly classified the key players in our test data. The hybrid classifier performed better than the individual classifiers. We also applied the hybrid learning algorithm to the features extracted through hybrid centrality measures (HCM) and preprocessed through SMOTE. We compared the hybrid learning algorithm with the decision tree, K-nearest neighbor and support vector machine algorithms; the results are shown in Figures 7 and 8 respectively.

4.3.1. Performance Parameters
The designed technique was evaluated through different performance parameters. The parameters analyzed in this paper are sensitivity, specificity, accuracy and area under the ROC curve. In the expressions that follow, TP and TN are the numbers of correctly classified main members and normal members, and FP and FN are the numbers of normal members classified as main members and main members classified as normal members. Sensitivity is the proportion of main members that are correctly classified as main members and is expressed as

\[ SN = \frac{TP}{TP + FN} \]

Specificity is the proportion of normal members that are correctly classified as normal members and is expressed as

\[ SP = \frac{TN}{TN + FP} \]

Accuracy is the percentage of correctly classified main members and normal members.
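As a minimal sketch of how majority voting and these performance parameters fit together, the example below combines three label vectors and scores them against the ground truth, with accuracy taken in its standard form (TP + TN) / (TP + TN + FP + FN). The function names `majority_vote` and `performance_parameters` and all label vectors are made-up illustrations; they are not the classifiers' actual outputs on the Noordin Top data.

```python
from collections import Counter


def majority_vote(*predictions):
    """Combine per-classifier label lists by majority voting."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def performance_parameters(y_true, y_pred, positive=1):
    """Sensitivity, specificity and accuracy from true and predicted labels.

    Assumes both classes occur in y_true, otherwise a denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Toy labels: 1 = main actor, 0 = normal member (hypothetical values).
y_true = [1, 1, 0, 0, 0, 1]
dt_pred = [1, 0, 0, 0, 1, 1]
svm_pred = [1, 1, 0, 1, 0, 1]
knn_pred = [0, 1, 0, 0, 0, 1]

ensemble_pred = majority_vote(dt_pred, svm_pred, knn_pred)
dt_scores = performance_parameters(y_true, dt_pred)
ensemble_scores = performance_parameters(y_true, ensemble_pred)
```

With three binary classifiers a vote can never tie, and in this toy case each classifier errs on a different node, so the vote corrects every individual mistake. Real classifiers often share errors, so the gain from voting is usually smaller.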
Accuracy is expressed as

\[ ACC = \frac{TP + TN}{TP + TN + FP + FN} \]

Our hybrid ensemble algorithm showed 100 percent accuracy, sensitivity and specificity on the test data. When only the decision tree classifier was applied it showed 95 percent accuracy, compared to K-nearest neighbor, which showed 97 percent accuracy. We tested our model on publicly available data and it successfully identified the main actors.

Figure 7. Comparison of CK-SDK with other classifiers.

Table 4. Comparison of results of hybrid learning algorithm CK-SDK
Classifier       Accuracy    Sensitivity    Specificity
Decision Tree    95          77             100
SVM              100         100            0.58
KNN              97          100            96
Ensemble         100         100            100

Figure 8. Comparison of HCM model with other classifiers.

Table 5. Comparison of HCM learning algorithm.
Classifier       Accuracy    Sensitivity    Specificity
Decision Tree    98          100            100
SVM              100         100            0.58
KNN              100         100            96
Ensemble         100         74             100

Figure 9. Comparison of hybrid learning algorithm with HCM and CK.

5. CONCLUSION
In this paper we extracted features of the network that are maximally connected to core members, both inside particular layers and across other layers, through a combination of hybrid centrality measures and k-core, in order to identify the core members of terrorist networks. There are many techniques for identifying key players in social networks, and each of these techniques focuses on its objective from its own perspective. The important nodes are extracted through centrality measures and then further pruned, so that only important nodes remain after application of the algorithm. Some counter-terrorism applications require a set of prominent actors that can accomplish goals by interacting well with special key players who belong to a particular group, for information retrieval. The CK-SDK framework identifies main actors with more accuracy than identification through the hybrid centrality measure alone.

6. REFERENCES
[1] S. Wasserman and K. Faust, Social network analysis: Methods and applications vol. 8: Cambridge university press, 1994.
[2] A. Gutfraind and M. Genkin, "A graph database framework for covert network analysis: An application to the Islamic State network in Europe," Social Networks, vol. 51, pp. 178-188, 2017.
[3] C. Comito, D. Falcone, and D. Talia, "A peak detection method to uncover events from social media."
[4] C. Chiu, Y. Ku, T. Lie, and Y. Chen, "Internet auction fraud detection using social network analysis and classification tree approaches," International Journal of Electronic Commerce, vol. 15, pp. 123-147, 2011.
[5] I. McCulloh and K. M. Carley, "Detecting change in longitudinal social networks," Military Academy West Point NY Network Science Center (NSC), 2011.
[6] H. Eiselt, "Destabilization of terrorist networks," Chaos, Solitons & Fractals, vol. 108, pp. 111-118, 2018.
[7] D. K. Gupta, Understanding terrorism and political violence: The life cycle of birth, growth, transformation, and demise: Routledge, 2008.
[8] V. Krebs, "Connecting the dots: tracking two identified terrorists," (2001a) http://www.orgnet.com/tnet.html, 2015.
[9] R. H. Davis, "Social network analysis: An aid in conspiracy investigations," FBI L. Enforcement Bull., vol. 50, p. 11, 1981.
[10] G. A. Pavlopoulos, A.-L. Wegener, and R. Schneider, "A survey of visualization tools for biological network analysis," Biodata mining, vol. 1, p. 12, 2008.
[11] K. Matia, Y. Ashkenazy, and H. E. Stanley, "Multifractal properties of price fluctuations of stocks and commodities," EPL (Europhysics Letters), vol. 61, p. 422, 2003.
[12] A.-L. Barabási and R. Albert, "Emergence of scaling in random networks," science, vol. 286, pp. 509-512, 1999.
[13] W. H. Butt, M. U. Akram, S. A. Khan, and M. Y. Javed, "Covert network analysis for key player detection and event prediction using a hybrid classifier," The Scientific World Journal, vol. 2014, 2014.
[14] N. E. Friedkin and E. C. Johnsen, "Social positions in influence networks," Social Networks, vol. 19, pp. 209-222, 1997.
[15] V. Batagelj and M. Zaveršnik, "Fast algorithms for determining (generalized) core groups in social networks," Advances in Data Analysis and Classification, vol. 5, pp. 129-145, 2011.
[16] N. Azimi-Tafreshi, J. Gómez-Gardenes, and S. Dorogovtsev, "k-core percolation on multiplex networks," Physical Review E, vol. 90, p. 032816, 2014.
[17] C. Weinstein, W. Campbell, B. Delaney, and G. O'Leary, "Modeling and detection techniques for counter-terror social network analysis and intent recognition," in Aerospace conference, 2009 IEEE, 2009, pp. 1-16.
[18] S. Pal, Y. Dong, B. Thapa, N. V. Chawla, A. Swami, and R. Ramanathan, "Deep learning for network analysis: Problems, approaches and challenges," in Military Communications Conference, MILCOM 2016-2016 IEEE, 2016, pp. 588-593.
[19] Y. Asim, A. R. Shahid, A. K. Malik, and B. Raza, "Significance of machine learning algorithms in professional blogger's classification," Computers & Electrical Engineering, 2017.
[20] C. Comito, D. Falcone, and D. Talia, "A peak detection method to uncover events from social media," in Data Science and Advanced Analytics (DSAA), 2017 IEEE International Conference on, 2017, pp. 459-467.
[21] R.-H. Li, L. Qin, J. X. Yu, and R. Mao, "Finding influential communities in massive networks," The VLDB Journal—The International Journal on Very Large Data Bases, vol. 26, pp. 751-776, 2017.
[22] N. a. S. F. E. Roberts. (2011, 15/3/2018). Roberts and Everton Terrorist Data: Noordin Top Terrorist Network (Subset). Available: https://sites.google.com/site/sfeverton18/research/appendix-1
[23] M. A. Malik and M. Kang, "Euclidean distance based label noise cleaning," in Ubiquitous and Future Networks (ICUFN), 2017 Ninth International Conference on, 2017, pp. 237-239.
[24] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of artificial intelligence research, vol. 16, pp. 321-357, 2002.
[25] A. Bickle, The k-cores of a graph: Western Michigan University, 2010.
