
Chapter 3

Data Mining for Counter-Terrorism

Bhavani Thuraisingham
The MITRE Corporation
Burlington Road, Bedford, MA
On leave at the National Science Foundation, Arlington, VA

Abstract:
Data mining is becoming a useful tool for detecting and preventing terrorism. This
paper first discusses some technical challenges for data mining as applied to
counter-terrorism. Next it provides an overview of the various types of terrorist
threats and describes how data mining techniques could provide solutions for
counter-terrorism. Finally, it discusses some privacy concerns and potential solutions
that could detect terrorist activities and yet attempt to maintain privacy.

Keywords: Counter-terrorism, Data Mining, Privacy

3.1 Introduction
Data mining is the process of posing queries and extracting useful patterns or trends,
often previously unknown, from large amounts of data using various techniques such
as those from pattern recognition and machine learning. There have been several
developments in data mining, and the technology is being used for a wide variety of
applications, from marketing and finance to medicine and biotechnology to multimedia
and entertainment. Recently there has been much interest in exploring the use of data
mining for counter-terrorism applications. For example, data mining can be used to
detect unusual patterns, terrorist activities and fraudulent behavior. While all of these
applications of data mining can benefit humans and save lives, there is also a negative
side to this technology, since it could be a threat to the privacy of individuals. This is
because data mining tools are available on the web or otherwise, and even naive users
can apply these tools to extract information from the data stored in various databases
and files and consequently violate the privacy of individuals. To carry out effective
data mining and extract useful information for counter-terrorism and national security,
we need to gather all kinds of information about individuals. However, this information
could be a threat to the individuals’ privacy and civil liberties.
In this paper we provide an overview of applying data mining for counter-terrorism.
At the workshop on Next Generation Data Mining (NGDM), a panel was
conducted on Data Mining for Counter-terrorism. The panel raised many interesting
technical challenges. In Section 3.2 we discuss some of these challenges. To
understand how data mining could be applied, we need a good understanding of
what the terrorist threats are. We have grouped the threats into several
categories and discuss them in Section 3.3. Applying data mining techniques for
counter-terrorism is the subject of Section 3.4. There have been many discussions
recently of the privacy violations that could occur as a result of data mining. In
Section 3.5 we address privacy and discuss data mining solutions that attempt
to detect or prevent terrorism and at the same time maintain some level of privacy. The
paper is concluded in Section 3.6.

3.2 Research Challenges


The panel on data mining for counter-terrorism at the NGDM workshop discussed
several technical challenges. We discuss a few of them in this section. Data
mining technologies have advanced a great deal and are now being applied in many
applications. The main question is, are they ready for detecting and/or preventing
terrorist activities? For example, can we completely eliminate false positives and false
negatives? False positives could be disastrous for the individuals wrongly flagged. False
negatives could let terrorist activities proceed undetected. The challenge is to find the
“needle in the haystack.” We need knowledge directed data mining to eliminate false
positives and false negatives as much as possible.
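To see why the rarity of real threats makes false positives so hard to avoid, the
following minimal sketch works through the arithmetic. All numbers (population size,
number of real threats, detection rate and false alarm rate) are hypothetical and chosen
only for illustration, and screening_outcomes is a name we introduce here.

```python
# Hypothetical numbers, chosen only to illustrate the base-rate effect.
def screening_outcomes(population, true_threats, sensitivity, false_positive_rate):
    """Return (true_positives, false_positives, false_negatives, precision)."""
    innocents = population - true_threats
    true_positives = true_threats * sensitivity        # real threats that are flagged
    false_negatives = true_threats - true_positives    # real threats that are missed
    false_positives = innocents * false_positive_rate  # innocents wrongly flagged
    flagged = true_positives + false_positives
    precision = true_positives / flagged if flagged else 0.0
    return true_positives, false_positives, false_negatives, precision


tp, fp, fn, precision = screening_outcomes(
    population=1_000_000,      # assumed number of people screened
    true_threats=10,           # assumed number of real threats among them
    sensitivity=0.99,          # assumed fraction of real threats detected
    false_positive_rate=0.01,  # assumed fraction of innocents flagged
)
print(f"innocents flagged: {fp:.0f}, threats missed: {fn:.1f}, precision: {precision:.4f}")
```

Under these assumed numbers, roughly ten thousand innocent individuals are flagged for
about ten real threats, so fewer than one flag in a thousand is correct even though the
tool looks accurate on paper.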
Another challenge is mining data in real-time. We now have tools to detect credit
card violations and calling card violations. These tools function in real-time. However,
can one build models in real-time? The general view among the research community
is that real-time model building is a challenge. Furthermore, for detecting terrorist
activities we need good training examples. How can we get such examples,
especially in an unclassified setting?
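One very simple form of incremental model building is to keep summary statistics up
to date as each observation arrives, rather than retraining on the full data set. The
sketch below uses Welford's online algorithm for the mean and variance. It is only an
illustration of the idea under our own assumptions; the class name RunningStats and
the sample values are hypothetical, and realistic models are far harder to update online.

```python
class RunningStats:
    """Mean and sample variance updated one observation at a time (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for amount in [2, 4, 4, 4, 5, 5, 7, 9]:  # hypothetical transaction amounts
    stats.update(amount)                 # constant work per new observation
print(stats.mean, stats.variance)
```

Each update takes constant time and memory, which is what makes this kind of summary
usable in real-time; whether richer models can be maintained the same way is the open
question raised above.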
A third challenge is multimedia data mining. While we now have tools to mine
structured and relational databases, mining unstructured databases is still a challenge.
Do we extract structure from unstructured databases and then mine the structured data,
or do we apply mining tools directly on unstructured data? Furthermore, while there is
progress on text mining, we need work on audio and video as well as on image mining.

Other directions include graph and pattern mining. For example, one has to connect
all the dots. Essentially one builds a graph structure based on the information he or she
has. If multiple agencies are working on the problem, then each agency will have its
own graph. The challenge is to be able to make inferences about missing nodes and
links in the graph. Also the graph could be very large. The question is how can one
reduce the graph to a more manageable size?
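
As a rough illustration of inferring missing links, the sketch below merges the graphs
held by two agencies and ranks unlinked pairs of nodes by how many neighbours they
share. The common-neighbours score is one simple heuristic that we pick for
illustration; the node names, the two agency graphs and the function names are all
hypothetical.

```python
from itertools import combinations


def merge_graphs(*graphs):
    """Union several adjacency maps (node -> set of neighbours) into one undirected graph."""
    merged = {}
    for graph in graphs:
        for node, neighbours in graph.items():
            merged.setdefault(node, set())
            for other in neighbours:
                merged[node].add(other)
                merged.setdefault(other, set()).add(node)
    return merged


def candidate_links(graph, min_common=2):
    """Rank unlinked node pairs by their number of shared neighbours."""
    scored = []
    for a, b in combinations(sorted(graph), 2):
        if b in graph[a]:
            continue  # already linked, nothing to infer
        common = graph[a] & graph[b]
        if len(common) >= min_common:
            scored.append((len(common), a, b))
    return sorted(scored, reverse=True)


# Hypothetical graphs held by two different agencies.
agency_1 = {"A": {"B", "C"}, "B": {"C"}}
agency_2 = {"D": {"B", "C"}, "C": {"E"}}

merged = merge_graphs(agency_1, agency_2)
links = candidate_links(merged)
print(links)
```

In this toy example neither agency alone can see that A and D share two contacts; the
pair only surfaces after the graphs are merged. Comparing every pair of nodes does not
scale to very large graphs, which is where the reduction question above becomes important.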

Finally finding the data to test the ideas is still a major challenge. How can we
get unclassified data? Is it possible to scrub and clean the classified data and produce
reasonable data at the unclassified level? How can we find large data sets consisting
of multimedia data types? Is it possible to develop a test-bed where one can apply the
various data mining tools to determine their efficiency?

Web mining is a challenge for detecting unusual patterns. In a way web mining
encompasses data mining, as one has to mine all the data on the web as well as mine
the structure and usage patterns. By mining the usage patterns one could discover, for
example, that there is an unusual number of visits to a federal web site from Paris
around 3 a.m. Data on the web includes structured as well as unstructured data.
Therefore the tools developed for data mining apply to web mining also. In addition,
we need tools to mine the structure of the web as well as the usage patterns.
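
A minimal sketch of the usage-mining example follows: it compares today's visit count
for each (origin, hour) pair with that pair's own history and flags counts that lie far
above the historical mean. The z-score rule, the threshold of 3 and all the counts are
assumptions made for illustration, not a method taken from the workshop.

```python
from statistics import mean, stdev


def unusual_counts(history, today, threshold=3.0):
    """Flag (origin, hour) keys whose count today is far above their historical mean.

    history: dict mapping key -> list of past daily counts
    today:   dict mapping key -> count observed today
    """
    flagged = []
    for key, count in today.items():
        past = history.get(key, [])
        if len(past) < 2:
            continue  # not enough history to estimate the spread
        mu, sigma = mean(past), stdev(past)
        if sigma == 0:
            sigma = 1.0  # assumed fallback so a constant history does not divide by zero
        z = (count - mu) / sigma
        if z > threshold:
            flagged.append((key, count, round(z, 1)))
    return flagged


# Hypothetical daily visit counts to one web site, keyed by (origin, hour of day).
history = {
    ("Paris", 3): [2, 3, 1, 2, 2, 3, 1],
    ("Boston", 14): [40, 42, 38, 41, 39, 43, 40],
}
today = {("Paris", 3): 30, ("Boston", 14): 44}

flagged = unusual_counts(history, today)
print(flagged)
```

With these made-up counts only the Paris 3 a.m. traffic is flagged; the busier Boston
afternoon traffic stays within its usual range. A flag of this kind is only a starting
point for an analyst, since unusual traffic has many innocent explanations.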

Privacy is a major challenge with respect to data mining for counter-terrorism. The
challenge is to extract useful information from data mining but at the same time main-
tain privacy. Several efforts are under way for privacy preserving data mining. The
idea here is to use various techniques such as randomization, cover stories, as well
as multi-party policy enforcement for privacy preserving data mining. While there is
some progress, the effectiveness of these techniques needs to be determined.
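
As one illustration of the randomization idea, the sketch below uses randomized
response, a classic survey technique in which each individual's true value is reported
only with some probability. It is our choice of example and not necessarily the
technique discussed at the workshop; the population, the 30% rate and the function
names are hypothetical.

```python
import random


def randomized_response(truth, p_truth, rng):
    """Report the true bit with probability p_truth, otherwise a fair coin flip."""
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5


def estimate_proportion(reports, p_truth):
    """Invert the randomization: E[report] = p_truth * pi + (1 - p_truth) * 0.5."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) * 0.5) / p_truth


rng = random.Random(0)                           # fixed seed so the run is repeatable
true_values = [i < 3000 for i in range(10_000)]  # assumed: 30% hold the sensitive attribute
reports = [randomized_response(v, 0.5, rng) for v in true_values]

estimate = estimate_proportion(reports, 0.5)
print(round(estimate, 3))
```

No single report reveals an individual's true value with certainty, yet the aggregate
proportion can still be estimated. The price is extra noise in the estimate, which is
one reason the effectiveness of such techniques still needs to be determined.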

The above are some of the challenges for data mining for counter-terrorism dis-
cussed at the workshop. That is, while data mining could become a useful tool for
counter-terrorism, there are many challenges that need to be addressed. They include
mining multimedia data, graph mining, building models in real-time, knowledge di-
rected data mining to eliminate false positives and false negatives, web mining, and
privacy sensitive data mining. Research is progressing in the right direction. However,
there is still much to be done (see also [14]).

Now that we have provided an overview of the challenges in data mining for
counter-terrorism, in the next three sections we provide some more details on this
topic. To understand how data mining may be applied, we need a good understanding
of what the threats are. In Section 3.3 we provide an overview of various threats
and protection measures. In Section 3.4 we examine how data mining could provide
potential counter-terrorism solutions, especially for the threats discussed in Section
3.3. Because of the importance of privacy and the potential threats to privacy due to data
mining, we discuss various privacy issues in Section 3.5.

3.3 Some Information on Terrorism, Security Threats, and Protection Measures
3.3.1 Overview
Now we are ready to embark on a critical application of data mining technologies.
This application is counter-terrorism. Counter-terrorism is mainly about developing
counter-measures to threats occurring from terrorist activities. In this section we focus
on the various types of threats that could occur. In section 3.4 we will discuss how data
mining could help prevent and detect the threats.
Our discussion of counter-terrorism is rather preliminary. We are not claiming to be
counter-terrorism experts. The information on terrorist threats we have presented here
has been obtained entirely from unclassified newspaper articles and news reports that
have appeared over the years. Our focus is to illustrate how data mining could help to-
wards combating terrorism. We are not saying that data mining solves all the problems.
But because of the fact that data mining has the capability to extract patterns and trends,
often previously unknown, we should certainly explore the various data and web data
mining technologies for counter-terrorism. For us web data mining goes beyond data
mining. It not only includes data mining techniques, but also focuses on web traffic
and usage mining as well as web structure mining. That is, there are additional chal-
lenges for web data mining that are not present for just data mining. Furthermore, web
data mining also includes structured data mining as well as unstructured data mining.
Furthermore, we believe that much of the data will eventually be on the web, whether
on public networks such as the Internet or private ones such as corporate intranets and
classified intranets. Therefore, studying web data mining encompasses studying data
mining as well.
Before we embark on a discussion of counter-terrorism, we need to discuss the
types of threats. Note that threats could be malicious threats due to terror attacks
or non-malicious threats due to inadvertent errors. While our main focus is on
malicious attacks, we also cover some of the inadvertent errors, as there may be similar
solutions to combat such problems. The types of terrorist threats we discuss
include non-information related terrorism, information related terrorism, bio-terrorism
and chemical attacks. By non-information related terrorism we mean people attacking
others with, say, bombs and guns. For this we need to find out who these people are
by analyzing their connections and then develop counter-terrorism solutions. By
information related threats we mean threats due to the existence of computer systems and
networks. These are unauthorized intrusions and viruses as well as computer related
vandalism. Information related terrorism is essentially cyber terrorism. Then there are
bio-terrorism, chemical and nuclear attacks. These are terrorist attacks caused by
biological substances and chemical/nuclear weapons. This does not mean that these are all
the types of threats that exist, but these are the threats we will be examining. We will
discuss how data mining could perhaps be used to help prevent and detect attacks due to
such threats.
The organization of this section is as follows. Section 3.3.2 discusses threats from
natural disasters as well as human errors. We then focus on malicious threats in the
next three sections. Non-information related threats are discussed in Section
3.3.3. These include terrorist attacks as well as insider threat analysis, border and
transportation threats. In Section 3.3.4 we discuss information related threats.
Essentially this is about cyber-terrorism. Threats occurring from biological, chemical and
nuclear weapons are discussed in Section 3.3.5. Attacks on critical infrastructures
are given special consideration in Section 3.3.6. Note that infrastructures may also
be attacked during information related attacks and non-information related attacks. We
group the threats into two categories in Section 3.3.7: non real-time threats
and real-time threats. We analyze the threats discussed in Sections 3.3.3 through 3.3.6
to see whether they are non real-time threats or real-time threats. Then we focus on
counter-terrorism measures in Section 3.3.8. These include counter-terrorism for
non-information related threats, information related threats, and bio-terrorism. We
also briefly examine counter-terrorism measures for non real-time threats as well as for
real-time threats. Note that when we want to carry out data mining to combat terrorism,
we need good data. This means that we need data about terrorists as well as terrorist
activities. This also means we will have to gather data about all kinds of people, events
and entities, so there could be a serious threat to privacy. Therefore, we will
address privacy and civil liberties in Section 3.5.

3.3.2 Natural Disasters and Human Errors


As we have stated in Section 3.3.1, threats could occur due to natural disasters and
human errors as well as through malicious attacks. While the solutions to the attacks
in the near-term may not be that different in terms of emergency responses, the way to
combat these threats in the longer-term will very likely be quite different.
By natural disasters we mean disasters due to hurricanes, earthquakes, fires, power
failures and accidents. Some of these disasters may be due to human errors such as
pressing the wrong button in a process plant, causing the plant to explode. Data mining
could help detect some of the natural disasters. That is, by analyzing large amounts of
geological data, a data mining tool may predict that an earthquake is about to occur, in
which case the people in the area could be evacuated beforehand. Similarly, by analyzing
weather data, the tool could predict that hurricanes are about to occur. Emergency
responses, whether a building catches fire through a natural disaster or a terrorist
attack, may not be that different. In both cases there will be intense panic, although if the
building explodes due to a bomb the panic may be more intense and the collapse may
be more rapid. We need effective emergency response teams to handle such attacks.
Data mining could be used to analyze, say, previous attacks, train various tools, and
then give advice on how to handle the emergency situation. Here again we need
training examples, some of which may not exist. In this case we may need to train with
hypothetical scenarios and simulated examples.
The long-term measures to be taken for natural disasters may be quite different
from those for terrorist attacks. It is not every day that we have an earthquake, even in
the most earthquake prone regions. It is not often that we have hurricanes, even in the
most hurricane prone regions. Therefore we have time to plan and react. This does not mean
that a natural disaster is less complex to manage. It could be devastating and take many
human lives. Nevertheless, countries usually plan for such disasters mainly through
experience.
Human errors are also a source of major concern. We need to continually train
say the operators and give them advice to be cautious and alert. We need to take
proper actions if humans have been careless. That is, unless there is an absolutely good
excuse, human errors should not be treated lightly. This way, humans will be cautious
and perhaps not make such errors.
Terrorist attacks are quite different. The problem is that one does not know when an
attack will happen or how it will happen. Many of us could never have imagined that
airplanes would be used as weapons of mass destruction to bring down the famous World
Trade Center towers. Many of us still may not know what the next attack may be. Would it
be an attack by suicide bombers, an attack with chemical weapons, or an attack through
cyber terrorism? The counter-measures for prevention and detection may be quite intense
for terrorist attacks. As we have stated, we are not experts on counter-terrorism, nor
have we studied the nature of the attacks. Our
goal is to examine the various data mining techniques to see how they could be applied
to handle the various threats that have been discussed almost daily in the newspapers
and on television.
It should however be noted that to develop effective techniques, the data mining
specialists have to work together with counter-terrorism experts. That is, one cannot
use the techniques without a good understanding of what the threats are. Therefore,
while the contents of this paper may be used as a reference, I would urge those in-
terested in applying data mining techniques to solve real world problems and terrorist
attacks to work with counter-terrorism specialists. In the next few sections we will
discuss various types of terrorism and counter-terrorism measures.

3.3.3 Non-Information Related Terrorism


3.3.3.1 Overview
In this section we will provide an overview of various types of non-information related
terrorism. Note that by information related terrorism we mean attacks essentially on
computers and networks. That is, they are threats that damage electronic information.
By non-information related terrorism we mean terrorism due to other means such as
terrorist attacks, car bombing, vandalism such as setting fires etc.
The organization of this section is as follows. We discuss terrorist attacks and
external threats in Section 3.3.3.2. Insider threats are discussed in Section 3.3.3.3.
Attacks on borders and transportation are discussed in Section 3.3.3.4. Although border
and transportation attacks may be considered part of non-information related
attacks, we have given them special consideration as there is so much discussion now
related to securing the borders and transportation mechanisms.

3.3.3.2 Terrorist Attacks and External Threats


When we hear the word terrorism it is the external threats that come to mind.
My earliest recollection of terrorism is “riots” where one ethnic group attacks another
ethnic group by essentially killing, looting, setting fire to houses, and other acts of
terrorism and vandalism. Later on we heard of airplane hijackings, where a group
of terrorists hijack airplanes and then make demands on governments, such as releasing
political prisoners who could possibly be terrorists. Then we heard of suicide bombings,
where terrorists carry bombs and blow themselves up as well as others nearby.
Such attacks usually occur in crowded places. More recently we have heard of using
airplanes to blow up buildings.
While the above acts are all terrorist attacks, we hear almost daily about someone
shooting and killing someone else when neither party belongs to any gang or terrorist
group. This in a way is terrorism also, but these acts are more difficult to detect and
prevent because there are always what are called “crazy people” in our society. While
the technologies should detect and prevent such attacks also, this paper focuses
on how to detect attacks from people belonging to terrorist groups.
All of the threats we have discussed above are sort of external threats. These are
threats occurring from the outside. In general, the terrorists are usually neither friends
nor acquaintances of the victims involved. But there are also other kinds of threats and
they are insider threats. We will discuss them in the next section.

3.3.3.3 Insider Threats


Insider threats are threats from people inside an organization attacking the others around
them, perhaps not through bombs and airplanes but using other sinister mechanisms.
Examples of insider threats include someone from a corporation giving information
about proprietary products to a competitor. Another example is an agent from an
intelligence agency committing espionage. A third example is a threat coming from one’s
own family, for example, betrayal by a spouse who has insider information about
assets and gives that information to a competitor to his or her own advantage.
That is, insider threats can occur at all levels and in all walks of life and could be quite
dangerous and sinister because you never know who these terrorists are. They may be
your so-called “best friends” or even your spouse or your siblings.
Note that people from the inside could also use guns to shoot people around them.
We often hear about office shootings. But these shootings are not in general insider
threats, as they are not happening in sinister ways. That is, these shootings are sort
of external threats although they are coming from people within an organization. We
also hear often about domestic abuse and violence such as husbands shooting wives
or vice versa. These are also external threats although they are occurring from the
inside. Insider threats are threats where others around are totally unaware until perhaps
something quite dangerous occurs. We have heard that espionage goes on for years
before someone gets caught. While both insider threats and external threats are very
serious and could be devastating, insider threats can be even more dangerous because
one never knows who these terrorists are.

3.3.3.4 Transportation and Border Security Violations


Let us examine border threats first and then discuss transportation threats. Safeguarding
the borders is critical for the security of a nation. Threats at borders range from
illegal immigration to gun and drug trafficking as well as human trafficking to terrorists
entering a country. We are not saying that illegal immigrants are dangerous or are
terrorists. They may be very decent people. However, they have entered a country
without the proper papers and that could be a major issue. For official immigration into,
say, the USA, one needs to go through interviews at US embassies, medical
checkups and X-rays as well as checks for diseases such as tuberculosis, background
checks and many more things. This does not mean that people who have entered a country
legally are always innocent. They could be terrorists also. But at least there is some
assurance that proper procedures have been followed. Illegal immigration can also
cause problems for the economy of a society and violate human rights through cheap
illegal labor etc.
As we have stated, drug trafficking occurs a lot at borders. Drugs are a danger
to society. They could cripple a nation, corrupt its children, cause havoc in families,
damage the education system and cause extensive harm. It is therefore critical
that we protect the borders from drug trafficking as well as other types of trafficking,
including firearms and human slaves. Other threats at borders include prostitution and
child pornography, which are serious threats to decent living. This does not mean that
everything is safe inside the country and these problems are only at borders. Nevertheless,
we have to protect our borders so that no additional problems are brought into a nation.
Transportation systems security violations can also cause serious problems. Buses,
trains and airplanes are vehicles that can carry tens or hundreds of people at the same
time, and any security violation could cause serious damage and even deaths. A bomb
exploding in an airplane or a train or a bus could be devastating. Transportation systems
are also the means for terrorists to escape once they have committed crimes. Therefore
transportation systems have to be secure. A key aspect of transportation systems
security is port security. These ports are responsible for ships of the United States Navy.
Since these ships are at sea throughout the world, terrorists may have opportunities to
attack these ships and the cargo. Therefore, we need security measures to protect the
ports, cargo, and our military bases. In Section 3.3.8 we will discuss various
counter-terrorism measures for the threats we have discussed here. The next three sections
will discuss additional types of terrorism.

3.3.4 Information Related Terrorism


3.3.4.1 Overview
This section discusses information related terrorism. By information related terrorism
we mean cyber-terrorism as well as security violations through access control and other
means. Trojan horses as well as viruses are also information related security violations,
which we group into information related terrorism activities.
In the next few subsections we discuss various information related terrorist attacks.
In Section 3.3.4.2 we give an overview of cyber terrorism and then discuss insider
threats and external attacks. Malicious intrusions are the subject of Section 3.3.4.3.
Credit card and identity theft are discussed in Section 3.3.4.4. Information security
violations such as access control violations are discussed in Section 3.3.4.5. Since the web
is a major means of information transportation, we give web security threats special
consideration in Section 3.3.4.6. Note that an excellent book on web security discussing
various threats and solutions is the one by Ghosh [10]. We also discuss some of the
cyber threats and countermeasures in [11].

3.3.4.2 Cyber-terrorism, Insider Threats, and External Attacks

Cyber-terrorism is one of the major terrorist threats posed to our nation today. As we
have mentioned earlier, there is now so much information available electronically
and on the web. Attacks on our computers as well as networks, databases and the
Internet could be devastating to businesses. It is estimated that cyber-terrorism could
cost businesses billions of dollars. For example, consider a banking information
system. If terrorists attack such a system and deplete accounts of their funds, then the
bank could lose millions and perhaps billions of dollars. By crippling the computer
system, millions of hours of productivity could be lost, and that equates to money in
the end. Even a simple power outage at work through some accident could cause
several hours of productivity loss and as a result a major financial loss. Therefore it is
critical that our information systems be secure. Next we discuss various types of cyber
terrorist attacks. One is spreading viruses and Trojan horses that can wipe away files
and other important documents. Another is intruding into computer networks, which we
will discuss in the next section. Information security violations such as access control
violations, as well as various other threats such as sabotage and denial
of service, will be discussed later.
Note that threats can occur from outside or from the inside of an organization. Outside
attacks are attacks on computers from someone outside the organization. We hear
of hackers breaking into computer systems and causing havoc within an organization.
There are hackers who start spreading viruses, and these viruses cause great damage to
the files in various computer systems. But a more sinister problem is the insider threat.
Just as with non-information related attacks, there is the insider threat with information
related attacks. There are people inside an organization who have studied the business
practices and develop schemes to cripple the organization’s information assets. These
people could be regular employees or even those working at computer centers. The
problem is quite serious, as someone may be masquerading as someone else and causing
all kinds of damage. In the next few sections we will examine how data mining
could detect and perhaps prevent such attacks.

3.3.4.3 Malicious Intrusions

We have discussed some aspects of malicious intrusions. These intrusions could
target the networks, the web clients and servers, the databases, operating systems,
etc. Many of the cyber terrorism attacks that we have discussed in the previous sections
are malicious intrusions. We will revisit them in this section.
We hear a lot about network intrusions. What happens here is that intruders try to tap
into the networks and get the information that is being transmitted. These intruders may
be human intruders or Trojan horses set up by humans. Intrusions could also happen on
files. For example, one can masquerade as someone else, log into that person’s
computer system and access the files. Intrusions can also occur on databases. Intruders
posing as legitimate users can pose queries such as SQL queries and access data
that they are not authorized to know.
Essentially cyber terrorism includes malicious intrusions as well as sabotage through
malicious intrusions or otherwise. Cyber security consists of security mechanisms that
attempt to provide solutions to cyber attacks or cyber terrorism. When we discuss
malicious intrusions or cyber attacks, we need to think about the non cyber world, that is,
non information related terrorism, and then translate those attacks to attacks on
computers and networks. For example, a thief could enter a building through a trap door. In
the same way, a computer intruder could enter the computer or network through some
sort of a trap door that has been intentionally built by a malicious insider or left
unattended through perhaps careless design. Another example is a thief entering the bank
with a mask and stealing the money. The analogy here is an intruder masquerading as
someone else, seemingly legitimately entering the system and taking all the information
assets. Money in the real world would translate to information assets in the cyber world.
That is, there are many parallels between non-information related attacks and information
related attacks. We can proceed to develop counter-measures for both types of attacks.
These counter-measures are discussed in Section 3.3.8.

3.3.4.4 Credit Card Fraud and Identity Theft


We are hearing a lot these days about credit card fraud and identity theft. In the case
of credit card fraud, others get hold of a person’s credit card and make all kinds of
purchases. By the time the owner of the card finds out, it may be too late. The thief
may have left the country by then. A similar problem occurs with telephone calling
cards. In fact this type of attack has happened to me once. Perhaps while I was making
phone calls using my calling card at airports, someone noticed, say, the dial
tones and used my calling card. This was my company calling card. Fortunately our
telephone company detected the problem and informed my company. The problem was
dealt with immediately.
A more serious theft is identity theft. Here one assumes the identity of another
person, say by getting hold of the social security number, and essentially carries out
all the transactions under the other person’s name. This could even be selling houses
and depositing the income in a fraudulent bank account. By the time the owner finds
out, it will be far too late. It is very likely that the owner may have lost millions of
dollars due to the identity theft.
We need to explore the use of data mining both for credit card fraud detection as
well as for identity theft. There have been some efforts on detecting credit card fraud
(see citeAFCE). We need to start working actively on detecting and preventing identity
thefts.

3.3.4.5 Information Security Violations


In this section we provide an overview of the various information security violations.
These violations do not necessarily occur through cyber attacks
or cyber terrorism. They could occur through bad security design and practices.
Nevertheless we have included this discussion for completeness.

Information security violations typically occur due to access control violations.


That is, users are granted access depending on their roles which is called role-based
access control) or their clearance level (which is called multilevel access control) or
on a need to know basis. Access controls are violated usually due to poor design or
designer errors. For example, suppose John does not have access to salary data. By
some error this rule may not be enforced and as a result, John gets access to salary
values. Access control violations can occur due to malicious attacks also. That is,
someone could enter the system by pretending to be the system administrator and delete
the access control rule that John does not have access to salaries. Another way is for
a Trojan horse to operate on behalf of the malicious users and each time John makes a
request, the malicious code could ensure that the access control rule is bypassed.
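The salary example can be made concrete with a small sketch. The role names, the `PERMISSIONS` table and the `can_access` helper are hypothetical; they only illustrate how a mistaken or maliciously altered rule silently changes an access decision, and are not taken from any particular system.

```python
# Hypothetical role-based access control table: role -> resources it may read.
PERMISSIONS = {
    "hr_manager": {"salary", "address"},
    "engineer": {"address"},
}

# Hypothetical user-to-role assignment.
USER_ROLES = {"john": "engineer", "mary": "hr_manager"}


def can_access(user, resource, permissions=PERMISSIONS, user_roles=USER_ROLES):
    """Return True only if the user's role explicitly lists the resource."""
    role = user_roles.get(user)
    # Deny by default: unknown users or roles get no access.
    return resource in permissions.get(role, set())


# With the rules intact, John cannot read salaries.
john_allowed = can_access("john", "salary")

# A design error or a malicious change that widens John's role
# turns the same request into an access control violation.
tampered = {**PERMISSIONS, "engineer": {"address", "salary"}}
john_allowed_after_tampering = can_access("john", "salary", permissions=tampered)
```

The deny-by-default check is a design choice in this sketch: a rule that is missing should fail closed rather than grant access.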

3.3.4.6 Security Problems for the Web

As mentioned in section 3.3.4.1, there are numerous security attacks that can occur
due to the web. We discuss some of the web security threats in this section. As we
have mentioned, in his book Ghosh [10] has provided an excellent introduction to web
security and various threats. Note that while we have focused on web threats in this
section, the threats discussed are applicable to any information system such as net-
works, databases and operating systems. The threats include access control violations,
integrity violations, sabotage, fraud, denial of service and infrastructure attacks.
For example, the traditional access control violations could be extended to the web.
Users may access unauthorized data across the web. Note that with the web there is
so much data all over the place that controlling access to this data will be quite a
challenge. Data on the web may be subject to unauthorized modifications. This makes
it easier to corrupt the data. Also, data could originate from anywhere and the producers
of the data may not be trustworthy. Incorrect data could cause serious damage, such
as incorrect bank account records, which could result in incorrect transactions. We hear of
hackers breaking into systems and posting inappropriate messages. With so much
business and commerce being carried out on the web without proper controls, Internet
fraud could cause businesses to lose millions of dollars. Intruders could obtain the
identities of legitimate users and, through masquerading, may empty their bank accounts.
We hear about infrastructures being brought down by hackers. Infrastructures could be
the telecommunication system, power system, and the heating system. These systems
are being controlled by computers and often through the Internet. Such attacks would
cause denials of service.
Other threats include violations of confidentiality, authenticity, and non-repudiation.
Confidentiality violations enable intruders to listen in on messages. Authentication
violations include using passwords without permission, and non-repudiation violations
enable someone to deny having sent a message. The web threats discussed
here occur because of insecure clients, servers and networks. To have complete
security, one needs end-to-end security; that means secure clients, secure servers, secure
operating systems, secure databases, secure middleware and secure networks.
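As one hedged illustration of detecting unauthorized modification of a message, the sketch below uses a keyed hash (HMAC) from Python's standard library. The key and the messages are made up. A shared-key scheme like this addresses integrity and authenticity only; non-repudiation would need digital signatures, which are not shown here.

```python
import hashlib
import hmac

SECRET_KEY = b"example-shared-key"  # assumption: known only to sender and receiver


def sign(message: bytes, key: bytes = SECRET_KEY) -> str:
    """Compute an HMAC-SHA256 tag for the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, tag: str, key: bytes = SECRET_KEY) -> bool:
    """Check the tag in constant time; False means the message or tag was altered."""
    return hmac.compare_digest(sign(message, key), tag)


original = b"transfer 100 to account 42"
tag = sign(original)
unmodified_ok = verify(original, tag)                      # message as sent
tampered_ok = verify(b"transfer 900 to account 42", tag)   # message altered in transit
```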

3.3.5 Bio-Terrorism, Chemical and Nuclear Attacks


The previous two sections discussed non-information related as well as information re-
lated terrorist attacks. Note that by information related attacks we mean cyber attacks.
Non-information related attacks mean everything else. However we have separated
bio-terrorism and chemical weapons attacks from non-information related attacks. We
have also given special consideration to critical infrastructure attacks. That is, the
non-information related attacks are essentially attacks due to bombs, explosions and
other similar activities.
While bio-terrorism and chemical/nuclear weapons attacks have been discussed at
least for several decades, it is only after September 11, 2001 that the public is paying a
lot of attention to these discussions. The anthrax attacks that occurred during the latter
part of 2001 have resulted in increased fear and awareness of the potential dangers of
bio-terrorism attacks and chemical/nuclear weapons attacks. Such attacks could kill
several million people within a short space of time. More recently there has been increasing
awareness of the danger that bio-terrorism attacks could spread infectious
diseases such as smallpox and yellow fever. These diseases
are so infectious that it is critical that their spread is detected as soon as it occurs.
Preventing such attacks would be the ultimate goal. One option is to carry out mass
vaccination. But this would mean some health hazards to various groups of people. Our
challenge is to use technology to prevent and detect such deadly attacks. Technologies
would include sensor technology and data mining and data management technologies.
Attacks using chemical weapons are equally deadly. One could spray poisonous
gas and other chemicals into the air, water and food supplies. For example, various
dangerous chemical agents could be sprayed from the air on plants and crops. These
plants and crops could get into the food supply and kill millions. We have to develop
technologies to detect and prevent such deadly attacks. Another form of deadly attack
is the nuclear attack. Such attacks could wipe out the entire population of the world.
There are various nations developing nuclear weapons when they do not have the au-
thorization to develop them. That is, these weapons are being developed illegally. This
is what makes the world very dangerous. We have to develop technologies to detect
and prevent such deadly attacks.
In this section we have only briefly mentioned the various biological, chemical and
nuclear attacks. There are some good books that are being written about such terrorist
activities (see [4] and [5]). As we have stressed, we are not counter-terrorism experts;
nor have we studied the various types of terrorist attacks in any depth. Our information
is obtained from various newspaper articles and documentaries. Our main goal is to
examine various data mining techniques and see how they could be applied to detect
and prevent such deadly terrorist attacks. Data mining for counter-terrorism will be
discussed in sections 3.4 and 3.5.

3.3.6 Attacks on Critical Infrastructures


Attacks on critical infrastructures could cripple a nation and its economy. Infrastructure
attacks include attacks on telecommunication lines, electric power and gas supplies,
reservoirs and water supplies, food supplies and other basic entities that are critical for
the operation of a nation.


Attacks on critical infrastructures could occur during any type of attacks whether
they are non-information related, information related or bio-terrorism attacks. For ex-
ample, one could attack the software that runs the telecommunications industry and
close down all the telecommunications lines. Similarly, software that runs the power
and gas supplies could be attacked. Attacks could also occur through bombs and explosives.
That is, the telecommunication lines could be attacked with bombs. Attacks on
transportation lines such as highways and railway tracks are also attacks on
infrastructures.
As we have mentioned in Section 3.3.2, infrastructures could also be damaged by
natural disasters such as hurricanes and earthquakes. Our main interest here is
malicious attacks on infrastructures, both information related and non-information
related. Our goal is to examine data mining and related data management
technologies to detect and prevent such infrastructure attacks.

3.3.7 Non Real-time Threats vs. Real-time Threats

The threats that we have discussed so far can be grouped into two categories: non real-time
threats and real-time threats. In a way all threats are real-time as we have to act
in real-time once the threats have occurred. However, some threats are analyzed over
a period of time while some others have to be handled immediately. We discuss the
various threats here.
Consider for example the biological, chemical and nuclear threats. These threats
have to be handled in real-time. That is, the response to these threats has timing constraints.
If the smallpox virus is being spread maliciously, then we have to start vaccinations
immediately. Similarly, if networks, say for critical infrastructures, are being attacked,
the response has to be immediate. Otherwise we could lose millions of lives and/or
millions of dollars.
There are some other threats that do not have to be handled in real-time. For ex-
ample consider the behavior of suspicious people such as those belonging to a certain
terrorist organization or those enrolling in flight training schools. In a way these people
are also planning attacks but sometimes even they are not sure when they will attack.
Therefore, one has to monitor these people, analyze their behavior and predict their
actions. While there are timing constraints for these threats, the urgency is not as great
as say the spread of the smallpox virus. But one should be vigilant about these non
real-time threats also.
In general there is no way to say that A is a real-time threat and B is a non real-time
threat. A non real-time threat could turn into a real-time threat. For example, once
the terrorists had hijacked the airplanes on September 11, 2001, the threat became a
real-time threat as action had to be taken within say an hour.

3.3.8 Aspects of Counter-Terrorism


3.3.8.1 Overview

Now that we have provided some discussion on various types of terrorist attacks, including
non-information related terrorism, information related terrorism, bio-terrorism, etc.,
we will discuss what counter-terrorism is all about. Counter-terrorism is a collection of
techniques used to combat, prevent, and detect terrorism. Our goal in this paper is to
examine various data mining techniques to see how we can combat terrorism using
these techniques. In this section we will briefly discuss what counter-terrorism is all
about for the terrorist attacks discussed in the previous sections.
In Section 3.3.8.2 we discuss protecting from non-information related terrorism. In
section 3.3.8.3 we discuss protecting from information related terrorism. In particular,
we discuss various web security measures as well as other aspects such as intrusion
detection and access control, briefly. In section 3.3.8.4 we discuss protecting from
bio-terrorism and chemical attacks and nuclear attacks. In section 3.3.8.5 we discuss
protecting the critical infrastructures. We analyze counter-terrorism measures for non
real-time threats as well as for real-time threats in Section 3.3.8.6.

3.3.8.2 Protecting from Non-information Related Terrorism

As we have stated, non-information related counter-terrorism includes protecting from
bombings, explosions, vandalism and other kinds of terrorist attacks not involving
computers. For example, hijacking an airplane and attacking buildings with airplanes
is a case of non-information related terrorist activity. The question is, how do we
protect against such terrorist attacks?
First of all we need to gather information about various scenarios and examples.
That is, we need to identify all kinds of terrorist acts that have occurred in history
starting from airplane hijacking to bombing of buildings. We also need to gather infor-
mation about those under suspicion. All of the data that we have gathered needs to be
analyzed to see if any patterns emerge.
We also need to ensure there are physical safety measures. For example, we need
to check the identity at airports or other places. We need to check for identity randomly
say in trains as well as routinely say at checkpoints. We need to check the belongings
of a person either randomly, routinely or if that person arouses suspicions to see if there
are dangerous weapons or chemicals in his/her belongings. We should also use sniffer
dogs and sensor devices to see if there are potentially hazardous materials. We need
surveillance cameras to see who is entering the building. These cameras should also
capture perhaps the facial expressions of various people. The data gathered from the
cameras should be analyzed further for suspicious behavior. We also need to enforce
access control measures at military bases and seaports.
In summary, several counter-terrorism measures have to be taken to combat non-
information related terrorism. These include information gathering and analysis, surveil-
lance, physical security and various other mechanisms. In the next few sections we will
examine the data mining techniques and see how they can detect and prevent such ter-
rorist attacks.

3.3.8.3 Protecting from Information related terrorism

General Discussion
We will first provide an overview of counter-terrorism with respect to information related
terrorism. We will give special consideration to security solutions for the web
later on. Essentially, protecting from information related terrorism involves detecting
and preventing malicious attacks and intrusions. These attacks could be
due to viruses, spoofing or masquerading, or could involve stealing, say, information assets. These
attacks could also be attacks on databases and malicious corruption of data. That is,
terrorist attacks are not necessarily stealing and accessing unauthorized information.
They could also include malicious corruption and alteration of the data so that the data
will be of little or no use. Terrorist attacks also include credit card frauds and identity
thefts.
Various data mining techniques are being proposed for detecting intrusions as well
as credit card fraud. We will discuss them in later sections. Preventing malicious
attacks is more challenging. We need to design systems in such a way that malicious
attacks and intrusions are prevented. When an intruder attempts to attack the system,
the system would figure this out and alert the security officer. There is research being
carried out on secure systems design so that such intrusions are prevented. However,
there is more focus on detecting such intrusions than on preventing them.
Enforcing appropriate access control techniques is also a way to enforce security.
For example, users may have certificates to access the information they need to carry
out the jobs that they are assigned to do. The organization should give the users no
more and no fewer privileges than they need. There is much research on managing privileges and access
rights to various types of systems.
We have briefly discussed cyber security measures. We will discuss security solu-
tions for the web in more detail next. Note that there are also additional problems such
as the inference problem where users pose sets of queries and infer sensitive informa-
tion. This is also an attack. We will visit the inference problem later when we discuss
privacy.
Security Solutions for the Web
We need end-to-end security and therefore the components include secure clients,
secure servers, secure databases, secure operating systems, secure infrastructures, secure
networks, secure transactions and secure protocols. One needs good encryption
mechanisms to ensure that the sender and receiver communicate securely. Ultimately,
whether it be exchanging messages or carrying out transactions, the communication
between sender and receiver or the buyer and the seller has to be secure. Secure client
solutions include securing the browser, securing the Java virtual machine, securing
Java applets, and incorporating various security features into languages such as Java.
Note that Java is not the only component that has to be secure. Microsoft has come up
with a collection of products including ActiveX and these products have to be secure
also. Securing the protocols includes secure HTTP and the Secure Sockets Layer. Securing the
web server means that the server has to be installed securely and that it has to be ensured
that the server cannot be attacked. Various mechanisms that have been used to secure
operating systems and databases may be applied here. Notable among them are access
control lists, which specify which users have access to which web pages and data. The
web servers may be connected to databases at the backend and these databases have
to be secure. Finally, various encryption algorithms are being implemented for the networks,
and groups such as OMG (Object Management Group) are envisaging security
for middleware such as ORBs (Object Request Brokers).
One of the challenges faced by the web managers is implementing security policies.
One may have policies for clients, servers, networks, middleware, and databases. The
question is how do you integrate these policies? That is how do you make these policies
work together? Who is responsible for implementing these policies? Is there a global
administrator or are there several administrators that have to work together? Security
policy integration is an area that is being examined by researchers.
Finally, one of the emerging technologies for ensuring that an organization’s assets
are protected is firewalls. Various organizations now have web infrastructures for in-
ternal and external use. To access the external infrastructure one has to go through the
firewall. These firewalls examine the information that comes into and out of an orga-
nization. This way, the internal assets are protected and inappropriate information may
be prevented from coming into an organization. We can expect sophisticated firewalls
to be developed in the future. Other security mechanisms include cryptography.
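A very simple packet filter gives a feel for how a firewall examines traffic. The sketch below is a minimal illustration under stated assumptions: the rule list, the network ranges and the `decide` function are hypothetical, and real firewalls inspect far more than a source address and a destination port.

```python
import ipaddress

# Hypothetical rule list, checked top to bottom; the first match decides.
# Each rule: (action, source network, destination port or None for any port).
RULES = [
    ("allow", ipaddress.ip_network("10.0.0.0/8"), None),  # internal hosts
    ("allow", ipaddress.ip_network("0.0.0.0/0"), 443),    # HTTPS from anywhere
    ("deny", ipaddress.ip_network("0.0.0.0/0"), None),    # default: block
]


def decide(src_ip: str, dst_port: int, rules=RULES) -> str:
    """Return 'allow' or 'deny' for a packet described by source IP and port."""
    src = ipaddress.ip_address(src_ip)
    for action, network, port in rules:
        if src in network and (port is None or port == dst_port):
            return action
    return "deny"  # nothing matched: fail closed
```

For example, `decide("203.0.113.5", 22)` falls through to the final rule and is denied, while the same external host is allowed on port 443.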

3.3.8.4 Protecting from Bio-terrorism and Chemical Attacks


We discussed biological, chemical and nuclear threats in Section 3.3.5. In this sec-
tion we discuss counter-terrorism measures. First of all unlike say non information
related terrorism where bombing and shootings are fairly explicit, bio-terrorism and
even chemical attacks are not immediately obvious. If a terrorist spreads the
smallpox virus, it takes at least a few days before the symptoms surface and a few
more days before the diagnosis is made. By then it may be too late as millions of people
may be infected in trains and planes and large gatherings and meetings. The challenge
here is to prevent as well as detect such attacks as soon as possible.
Preventing such attacks could mean developing special sensors to sense that there
are certain viruses in the air. The sensors may also have to detect what these viruses are.
A cold virus may not be as harmful as a smallpox virus. If the disease has spread then
quick decisions have to be made about whom and how many to vaccinate. Chemical
weapons may be treated similarly. One needs sensors to detect who has these
weapons. Once dangerous chemicals are spilt, we need to determine what other
agents to spray to limit the damage caused by the chemicals. For example, when
one spills acidic material, one counters it by washing with soap-based materials.
In the case of nuclear attacks, we need to determine what nuclear weapons have
been used and then decide what actions to take. How do we evacuate the various
groups of people in an organized fashion? What medications do we give them? These
are very difficult challenges. Research activities are proceeding, but it will take a very
long time to find viable solutions.

3.3.8.5 Critical Infrastructure Protection


Next we discuss critical infrastructure protection. Our critical infrastructures are telecommunication
lines, networks, water, food, gas and electric lines, etc. Attacking the critical
infrastructure could cripple businesses and the country. We need to determine the mea-
sures to be taken when the infrastructures are attacked.
Essentially the counter-measures include those developed for non information-
based terrorism as well as for information-based terrorism. For example one could
bomb the telecommunication lines or create viruses that would affect the telecommunications
software. This means that communication through telephones as well
as computer communication that occurs through phone lines could be crippled. The
counter-measures developed for non information related terrorism as well as for information
related terrorism could be applied here. We need to gather information about the
terrorist groups and extract patterns. We also need to detect any unauthorized intru-
sions. Our ultimate goal is to prevent such disastrous acts.
Even biological, chemical and nuclear weapons could attack the infrastructure of
the nation. For example our food supplies, water supplies and hospitals could be dam-
aged by biological warfare. Here again we need to examine the counter-terrorism mea-
sures for biological, chemical and nuclear attacks and apply them here.

3.3.8.6 Protecting from Non Real-time and Real-time Threats


In section 3.3.7 we discussed both non real-time and real-time threats. As we have
mentioned, it is difficult to state that A is a real-time threat and B is a non real-time
threat. Over time, a non real-time threat could become a real-time threat. Real-time
threats have to be handled in real-time. An example of a real-time threat is the
spread of the smallpox virus, which has to be detected and prevented.
When it comes to counter-measures for handling these threats, one needs to develop
techniques that meet timing constraints to handle real-time threats. For example, if data
mining is to be used to detect and prevent the malicious intrusions into say corporate
networks, then these data mining techniques have to give results in real-time. In the
case of non real-time threats, the data mining techniques could analyze the data and
make predictions that certain threats could occur say in July 2003.
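To illustrate what "results in real-time" might mean for intrusion detection, the following sketch processes each event as it arrives instead of analyzing a stored log afterwards. It is a deliberately simple threshold rule, not a data mining model; the class name, the limit and the window length are all assumptions made for the example.

```python
from collections import deque


class FailedLoginMonitor:
    """Flag a source when it exceeds `limit` failed logins within `window` seconds."""

    def __init__(self, limit=3, window=60.0):
        self.limit = limit
        self.window = window
        self.events = {}  # source -> deque of recent failure timestamps

    def record_failure(self, source, timestamp):
        """Record one failed login; return True if an alert should be raised now."""
        recent = self.events.setdefault(source, deque())
        recent.append(timestamp)
        # Drop failures that fell out of the time window.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        return len(recent) > self.limit
```

Each call does a small, bounded amount of work per event, which is the property a technique needs if it has to meet timing constraints. A non real-time analysis could instead scan months of stored records in batch.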
In the next section we will revisit non real-time threats and real-time threats from
a data mining perspective. While real-time threats need immediate response, both non
real-time threats as well as real-time threats could be deadly and have to be taken
seriously.

3.4 Data Mining Applications in Counter-Terrorism


3.4.1 Overview
In the previous section we discussed various threats and counter-measures. In partic-
ular, we discussed non information related attacks such as bombings and explosions;
information related attacks such as cyber terrorism; biological, chemical and nuclear
attacks such as the spread of smallpox; and critical infrastructure attacks such as at-
tacks on power and gas lines. Counter-terrorism measures include ways of protecting
from non-information related attacks, information related attacks, biological, chemical
and nuclear attacks, as well as critical infrastructure attacks.
In this section we will provide a high level overview of how web data mining as well
as data mining could help toward counter-terrorism. Note that we have used web data
mining and data mining sort of interchangeably as our definition of web data mining
goes beyond just mining structured data. We have included mining unstructured data,
mining for business intelligence, web usage mining and web structure mining as part
of web data mining. That is, in a way web data mining encompasses data mining.
As we have stated data mining could contribute towards counter-terrorism. We are
not saying that data mining will solve all our national security problems. However
the ability to extract hidden patterns and trends from large quantities of data is very
important for detecting and preventing terrorist attacks.
The organization of this section is as follows. Section 3.4.2 provides an overview
of web data mining for counter-terrorism. We will analyze the techniques in Section
3.4.3. A particular technique, called link analysis, that may be very important for
counter-terrorism applications will be given more consideration in Section 3.4.4. The
section is summarized in section ??.

3.4.2 Data Mining for Handling Threats


3.4.2.1 Overview

In Section 3.3 we grouped threats in different ways. One grouping was based on whether
they were information related or non-information related. It was somewhat artificial, as
we need information for all types of threats. However in our terminology, information
related threats were threats dealing with computers; some of these threats were real-
time threats while some others were non real-time threats. Even here the grouping
was somewhat arbitrary, as a non real-time threat could become a real-time threat. For
example, one could suspect that a group of terrorists will eventually perform some act
of terrorism. However when we set time bounds such as a threat will likely occur say
before July 1, 2003, then it becomes a real-time threat and we have to take actions
immediately. If the time bounds are tighter such as a threat will occur within two days
then we cannot afford to make any mistakes in our response.
The purpose of this section is to examine both the non real-time threats and real-
time threats and see how data mining in general and web data mining in particular could
handle such threats. Again we want to stress that web data mining in our terminology
encompasses data mining as it deals with data mining on the web as well as mining
structured and unstructured data. Furthermore, we are assuming that much of the data
will be on the web whether they be public networks such as the Internet or private
networks such as corporate intranets. Therefore, we are using the terms data mining
and web data mining interchangeably. In section 3.4.2.2 we discuss non real-time
threats and in section 3.4.2.3 we discuss real-time threats. We will refer to the specific
examples that we have mentioned in the previous section in our discussions as needed.
Section 3.4.3 will examine the various data mining outcomes and techniques and see
how they can help toward counter-terrorism. Some very good articles on data mining
for counter-terrorism have been presented at the Security Informatics Workshop held
in June 2003 (see [6]).

3.4.2.2 Non Real-time Threats

Non real-time threats are threats that do not have to be handled in real-time. That is,
there are no timing constraints for these threats. For example, we may need to collect
data over months, analyze the data and then detect and/or prevent some terrorist attack,
which may or may not occur. The question is how does data mining help towards such
threats and attacks? As we have stressed in [14], we need good data to carry out data
mining and obtain useful results. We also need to reason with incomplete data. This is
the big challenge, as organizations are often not prepared to share the data. This means
that the data mining tools have to make assumptions about the data belonging to other
organizations. The other alternative is to carry out federated data mining under some
federated administrator. For example, the Department of Homeland Security could serve
as the federated administrator and ensure that the various agencies have autonomy but
at the same time collaborate when needed.
Next, what data should we collect? We need to start gathering information about
various people. The question is, who? Everyone in the world? This is quite impossible.
Nevertheless we need to gather information about as many people as possible; because
sometimes even those who seem most innocent may have ulterior motives. One possi-
bility is to group the individuals depending on say where they come from, what they
are doing, who their relatives are etc. Some people may have more suspicious back-
grounds than others. If we know that someone has had a criminal record, then we need
to be more vigilant about that person.
Again to have complete information about people, we need to gather all kinds of
information about them. This information could include information about their behav-
ior, where they have lived, their religion and ethnic origin, their relatives and associates,
their travel records etc. Yes, gathering such information is a violation of one’s privacy
and civil liberties. The question is what alternative do we have? By omitting informa-
tion we may not have the complete picture. From a technology point of view, we need
complete data not only about individuals but also about various events and entities. For
example, suppose I drive a particular vehicle and information is being gathered about
me. This will also include information about my vehicle, how long I have been driving,
whether I have other hobbies or interests such as flying airplanes, and whether I have
enrolled in a flight school and told the instructor that I would like to learn to fly an
airplane but do not care about learning take-offs or landings.
Once the data is collected, the data has to be formatted and organized. Essentially
one may need to build a warehouse to analyze the data. The data may be structured or
unstructured. Also, there will be some data that is warehoused that may not be of
much use. For example, the fact that I like ice cream may not help the analysis a great
deal. Therefore, we can segment the data in terms of critical data and non-critical data.
Once the data is gathered and organized, the next step is to carry out mining. The
question is what mining tools to use and what outcomes to find? Do we want to find
associations or clusters? This will depend on what our goal is. We may want to find
anything that is suspicious. For example, the fact that I want to learn flying without
caring about take-off or landing should raise a red flag as in general one would want to
take a complete course on flying. In Section 3.4.3 we discuss the various outcomes of
interest to counter-terrorism activities. Once we determine the outcomes we want, we
determine the mining tools to use and start the mining process.
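To make the notion of an association concrete, the sketch below computes the support and confidence of one candidate rule over a handful of made-up records. The attribute names and the records are purely illustrative assumptions, and a real association mining tool would search over many candidate rules rather than score a single one.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= record for record in records) / len(records)


def confidence(records, antecedent, consequent):
    """Estimate how often the consequent holds when the antecedent holds."""
    both = support(records, set(antecedent) | set(consequent))
    base = support(records, antecedent)
    return both / base if base else 0.0


# Made-up records: each is the set of attributes observed for one person.
records = [
    {"flight_school", "skips_landing_lessons", "one_way_ticket"},
    {"flight_school", "skips_landing_lessons"},
    {"flight_school"},
    {"one_way_ticket"},
]
rule_support = support(records, {"flight_school", "skips_landing_lessons"})
rule_confidence = confidence(records, {"skips_landing_lessons"}, {"flight_school"})
```

Support says how common a pattern is in the data, and confidence says how reliably one attribute accompanies another. Which of the two matters more depends on the outcome the analyst is looking for.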
Then comes the very hard part. How do we know that the mining results are use-
ful? There could be false positives and false negatives. For example, the tool could
incorrectly produce the result that John is planning to attack the Empire State Building
on July 1, 2003. Then the law enforcement officials will be after John and the consequences
could be disastrous. The tool could also incorrectly produce the result that
James is innocent when he is in fact guilty. In this case the law enforcement officials
may not pay much attention to James. The consequence here could be disastrous also.
As we have stated we need intelligent mining tools. At present we need the human
specialists to work with the mining tools. If the tool states that John could be a ter-
rorist, the specialist will have to do some more checking before arresting or detaining
John. On the other hand if the tool states that James is innocent, the specialist should
do some more checking in this case also.
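The two kinds of error can be counted once a tool's output has been checked against what the specialists later establish. The sketch below is a minimal illustration with hypothetical function names; the boolean labels stand for "flagged by the tool" and "actually a threat".

```python
def confusion_counts(predicted, actual):
    """Count true/false positives and negatives for boolean labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, a in zip(predicted, actual):
        if p and a:
            counts["tp"] += 1
        elif p and not a:
            counts["fp"] += 1  # innocent person flagged (the "John" case)
        elif not p and a:
            counts["fn"] += 1  # real threat missed (the "James" case)
        else:
            counts["tn"] += 1
    return counts


def rates(counts):
    """False positive rate and false negative rate; 0.0 when undefined."""
    negatives = counts["fp"] + counts["tn"]
    positives = counts["tp"] + counts["fn"]
    fpr = counts["fp"] / negatives if negatives else 0.0
    fnr = counts["fn"] / positives if positives else 0.0
    return fpr, fnr
```

Tracking both rates separately matters here because the two errors have different costs: one harms an innocent person, the other lets a real threat go unexamined.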
Essentially with non real-time threats, we have time to gather data, build say pro-
files of terrorists, analyze the data and take actions. Now, a non real-time threat could
become a real-time threat. That is, the data mining tool could state that there could
be some potential terrorist attacks. But after a while, with some more information, the
tool could state that the attacks will occur between September 10, 2001 and September
12, 2001. Then it becomes a real-time threat. The challenge will then be to find exactly
what the attack will be. Will it be an attack on the World Trade Center or will it be an
attack on the Tower of London or will it be an attack on the Eiffel Tower? We need data
mining tools that can continue with the reasoning as new information comes in. That
is, as new information comes in, the warehouse needs to get updated and the mining
tools should be dynamic and take the new data and information into consideration in
the mining process.
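One way to picture a mining tool that takes new data into account as it arrives is a structure whose statistics are updated record by record instead of being recomputed from the whole warehouse. The sketch below keeps running co-occurrence counts; the class and method names are hypothetical, and pair counting stands in for whatever model a real tool would maintain.

```python
from collections import Counter
from itertools import combinations


class IncrementalPairCounter:
    """Maintain co-occurrence counts that are updated as each new record arrives."""

    def __init__(self):
        self.pair_counts = Counter()
        self.total = 0

    def update(self, record):
        """Fold one new record (an iterable of items) into the running counts."""
        self.total += 1
        for pair in combinations(sorted(set(record)), 2):
            self.pair_counts[pair] += 1

    def pair_support(self, a, b):
        """Current fraction of records containing both items."""
        if self.total == 0:
            return 0.0
        return self.pair_counts[tuple(sorted((a, b)))] / self.total
```

Because `update` touches only the new record, the current estimates are available immediately after each arrival, at the cost of storing one counter per observed pair.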

3.4.2.3 Real-time Threats


In the previous section we discussed non real-time threats where we have time to han-
dle the threats. In the case of real-time threats there are timing constraints. That is,
such threats may occur within a certain time and therefore we need to respond to them immediately.
Examples of such threats are the spread of the smallpox virus, chemical attacks,
nuclear attacks, network intrusions, bombing of a building before 9 a.m.,
etc. The question is what type of data mining techniques do we need for real-time
threats?
By definition, data mining works on data that has been gathered over a period of
time. The goal is to analyze the data and make deductions and predict future trends.
Ideally it is used as a decision support tool. However, the real-time situation is entirely
different. We need to rethink the way we do data mining so that the tools can give out
results in real-time.
For data mining to work effectively, we need many examples and patterns. We
use known patterns and historical data and then make predictions. Often, for real-time
data mining as well as for terrorist attacks, we have no prior knowledge. For example, the
attack on the World Trade Center came as a surprise to many of us. As ordinary citizens,
we could not have imagined that the buildings would be attacked by airplanes.
Another good example is the recent sniper attacks in the Washington DC area. Here
again many of us could never have imagined that the sniper would do the shootings
from the trunk of a car. So the question is, how do we train the data mining tools such
as neural networks without historical data? Here we need to use hypothetical data as
well as simulated data. We need to work with counter-terrorism specialists and get as
many examples as possible. Once we gather the examples and start training the neural
networks and other data mining tools, the question is what sort of models do we build?
Often the models for data mining are built beforehand. These models are not dynamic.
To handle real-time threats, we need the models to change dynamically. This is a big
challenge.
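One way to picture a model that changes dynamically is an incremental (online) estimator that revises its parameters with every new observation instead of being rebuilt from a warehouse. The following is only a minimal sketch and not a method proposed in this chapter: it uses Welford's running mean and variance as a stand-in for a model, and the class name `OnlineBaseline`, the threshold of three standard deviations and the sample readings are all illustrative assumptions.

```python
import math


class OnlineBaseline:
    """Running mean/variance (Welford's algorithm), updated one value at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        """Fold one new observation into the model."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def std(self):
        """Sample standard deviation; 0.0 until two values have been seen."""
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_unusual(self, x, k=3.0):
        """Flag x if it lies more than k standard deviations from the mean."""
        s = self.std()
        return s > 0 and abs(x - self.mean) > k * s


# Hypothetical stream of readings (assumption: any numeric signal would do).
model = OnlineBaseline()
flags = []
for reading in [10.0, 11.0, 9.0, 10.0, 10.0, 30.0]:
    flags.append(model.is_unusual(reading))  # score against the model so far
    model.update(reading)                    # then let the model absorb it
```

Each reading is scored before it is absorbed, so the model never needs the full history in memory. Whether such a simple estimator is adequate for any real threat data is an open question.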
Data gathering is also a challenge for real-time data mining. In the case of non real-
time data mining, we can collect data, clean data, format the data, build warehouses and
then carry out mining. All these tasks may not be possible for real-time data mining
as there are time constraints. Therefore, the questions are what tasks are critical and
what tasks are not? Do we have time to analyze the data? Which data do we discard?
How do we build profiles of terrorists for real-time data mining? We need real-time
data management capabilities for real-time data mining.
From the previous discussion it is clear that a lot has to be done before we can
effectively carry out real-time data mining. Some have argued that there is no such thing
as real-time data mining and it will be impossible to build models in real-time. Some
others have argued that without real world examples and historical data we cannot do
effective data mining. These arguments may be true. However our challenge is to then
perhaps redefine data mining and figure out ways to handle real-time threats.
As we have stated, there are several situations that have to be managed in real-
time. Examples are the spread of smallpox, network intrusions, and even analyzing
data emanating from sensors. For example, there are surveillance cameras placed in
various places such as shopping centers and in front of embassies and other public
places. The data emanating from the sensors have to be analyzed in many cases in
real-time to detect/prevent attacks. For example, by analyzing the data, we may find
that there are some individuals at a mall carrying bombs. Then we have to alert the
law enforcement officials so that they can take actions. This also raises the questions
of privacy and civil liberties. The questions are what alternatives do we have? Should
we sacrifice privacy to protect the lives of millions of people? As stated in [12] we
need technologists, policy makers and lawyers to work together to come up with viable
solutions. We will revisit privacy in section 3.5.

3.4.3 Analyzing the Techniques


In section 3.4.2 we discussed data mining both for non real-time threats as well as real-
time threats. As we have mentioned, applying data mining for real-time threats is a
major challenge. This is because the goal of data mining is to analyze data, make
predictions and identify trends. Current tools are not capable of making predictions and
identifying trends in real-time, although there are some real-time data mining tools emerging and
some of them have been listed in [16]. The challenge is to develop models in real-time
as well as get patterns and trends based on real world examples.
In this section we will examine the various data mining outcomes and discuss how
they could be applied for counter-terrorism. Note that the outcomes include making
associations, link analysis, forming clusters, classification and anomaly detection. The
techniques that result in these outcomes are techniques based on neural networks,
decision trees, market basket analysis techniques, inductive logic programming, rough
sets, link analysis based on graph theory, and nearest neighbor techniques. As we
have stated in [14], the methods used for data mining are top down reasoning, where
we start with a hypothesis and then determine whether the hypothesis is true, or bottom
up reasoning, where we start with examples and then come up with a hypothesis.
Let us start with association techniques. Examples of these techniques are market
basket analysis techniques. The goal is to find which items go together. For exam-
ple, we may apply a data mining tool to data that has been gathered and find that
John comes from Country X and he has associated with James who has a criminal
record. The tool also outputs the result that an unusually large percentage of people
from Country X have performed some form of terrorist attacks. Because of the asso-
ciations between John and Country X, as well as between John and James, and James
and criminal records, one may need to conclude that John has to be under observation.
This is an example of an association. Link analysis is closely associated with making
associations. While association-rule based techniques are essentially intelligent search
techniques, link analysis uses graph theoretic methods for detecting patterns. With
graphs (i.e., nodes and links), one can follow the chain and find links. For example, A
is seen with B, B is friends with C, C and D travel a lot together, and D has a
criminal record. The question is what conclusions can we draw about A? Link analysis
is becoming a very important technique for detecting abnormal behavior. Therefore,
we will discuss this technique in a little more detail in the next section.
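The idea of finding which items go together, as in market basket analysis, is usually quantified with two measures: support (how often the items appear together) and confidence (how often the second item appears given the first). The sketch below shows that computation on a handful of made-up records. The item labels, the thresholds and the function names `support` and `pair_rules` are illustrative assumptions, and a high-confidence rule is only a statistical association that a human specialist would still have to examine.

```python
from itertools import combinations

# Hypothetical records; each is the set of items observed together.
records = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "d"},
]


def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    return sum(itemset <= r for r in records) / len(records)


def pair_rules(records, min_support=0.5, min_confidence=0.7):
    """Return rules lhs -> rhs over item pairs that clear both thresholds."""
    items = sorted(set().union(*records))
    rules = []
    for x, y in combinations(items, 2):
        pair_support = support({x, y}, records)
        if pair_support < min_support:
            continue  # prune rare pairs before computing confidence
        for lhs, rhs in ((x, y), (y, x)):
            confidence = pair_support / support({lhs}, records)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


rules = pair_rules(records)
for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

Pruning on support first is what keeps the search from enumerating every possible combination of items.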
Next let us consider clustering techniques. One could analyze the data and form
various clusters. For example, people with origins from country X and who belong to a
certain religion may be grouped into Cluster I. People with origins from country Y and
who are less than 50 years old may form another Cluster II. These clusters are formed
based on their travel patterns or eating patterns or buying patterns or behavior patterns.
While clustering divides the population not based on any pre-specified condition, clas-
sification divides the population based on some predefined condition. The condition is
found based on examples. For example, we can form a profile of a terrorist. He could
have the following characteristics: male, less than 30 years old, of a certain religion and
of a certain ethnic origin. This means all males under 30 years of age belonging to the same
religion and the same ethnic origin will be classified into this group and could possibly
be placed under observation.
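The difference between the two outcomes can be made concrete with a small sketch. Clustering discovers the groups from the data, whereas classification applies a condition that was fixed in advance. The example below uses plain k-means as one possible clustering technique, which this chapter does not prescribe, on two invented behavioral features (trips per year and purchases per month). The feature choice, the data points, the seeding rule and the threshold in `classify` are all assumptions made for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, iterations=10):
    """Plain k-means; the first k points seed the centroids (an assumption)."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centroids


def classify(point, trips_threshold=5):
    """Classification, by contrast, applies a condition fixed in advance."""
    if point[0] >= trips_threshold:
        return "frequent traveller"
    return "infrequent traveller"


# Hypothetical (trips per year, purchases per month) pairs.
points = [(1, 2), (9, 8), (2, 1), (1, 1), (8, 9), (9, 9)]
labels, centroids = kmeans(points, k=2)
```

With this data the two clusters separate cleanly, but k-means is sensitive to how the centroids are seeded, so a poor choice of starting points can produce groups that mean very little.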
Another data mining outcome is anomaly detection. A good example here is learning
to fly an airplane without wanting to learn to take off or land. The general pattern
is that people want to get a complete training course in flying. However, there are now
some individuals who want to learn flying but do not care about takeoff or landing.
This is an anomaly. Another example is that John always goes to the grocery store on
Saturdays. But on Saturday October 26, 2002 he goes to a firearms store and buys a
rifle. This is an anomaly and may need some further analysis as to why he is going
to a firearms store when he has never done so before. Is it because he is nervous after
hearing about the sniper shootings or is it because he has some ulterior motive? If he is
living, say, in the Washington DC area, then one could understand why he wants to buy
a firearm, possibly to protect himself. But if he is living in, say, Socorro, New Mexico, then
his actions may have to be followed up further.
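The grocery store example can be expressed as a simple frequency check: a visit is unusual if the place accounts for almost none of the person's recorded history. The sketch below assumes a history of weekly visits and a cut-off of 5 percent. The function name `rare_visits`, the record layout and the week labels are hypothetical. A flagged visit only marks a record for further analysis by a specialist and says nothing about motive.

```python
from collections import Counter


def rare_visits(history, new_visits, max_share=0.05):
    """Return new visits whose place makes up at most max_share of history."""
    counts = Counter(place for _, place in history)
    total = len(history)
    return [
        (day, place)
        for day, place in new_visits
        if counts[place] / total <= max_share
    ]


# Hypothetical history: one grocery visit per week for 42 weeks.
history = [(f"2002-week-{w:02d}", "grocery store") for w in range(1, 43)]
new_visits = [
    ("2002-10-26", "firearms store"),
    ("2002-10-26", "grocery store"),
]
flagged = rare_visits(history, new_visits)
```

Context such as where the person lives is not captured by this check, which is why the follow-up reasoning described above remains a human task.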


As we have stated, all of the discussions on data mining for counter-terrorism have
consequences when it comes to privacy and civil liberties. As we have mentioned
repeatedly, what are our alternatives? How can we carry out data mining and at the
same time preserve privacy? We revisit privacy in section 3.5.

3.4.4 Link Analysis


In this section we discuss a particular data mining technique that is especially useful
for detecting abnormal patterns. This technique is link analysis. There have been many
discussions in the literature on link analysis. In fact, one of the earlier books on data
mining by Berry and Linoff [2] discussed link analysis in some detail. As mentioned
in the previous section, link analysis uses various graph theoretic techniques. It is
essentially about analyzing graphs. Note that link analysis is also used in web data
mining, especially for web structure mining. With web structure mining the idea is to
mine the links and extract the patterns and structures about the web. Search engines
such as Google use some form of link analysis for displaying the results of a search.
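Following a chain of links, as in the earlier example where A is seen with B, B is friends with C, and C travels with D, amounts to a path search in a graph. The sketch below uses breadth-first search to find the shortest such chain. The edge labels, the dictionary-based graph representation and the function names `build_graph` and `shortest_chain` are assumptions made for illustration.

```python
from collections import deque

# Hypothetical links taken from the A, B, C, D example.
links = {
    ("A", "B"): "seen with",
    ("B", "C"): "friends with",
    ("C", "D"): "travels with",
}


def build_graph(links):
    """Turn labeled pairs into an undirected adjacency map."""
    graph = {}
    for u, v in links:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def shortest_chain(graph, start, goal):
    """Breadth-first search; returns the node list from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in sorted(graph.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


graph = build_graph(links)
chain = shortest_chain(graph, "A", "D")
```

The search establishes only that a chain exists and how long it is. What conclusion, if any, can be drawn about A from a three-step chain is exactly the question raised above.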
As mentioned in [2], the challenge in link analysis is to reduce the graphs into
manageable chunks. As in the case of market basket analysis, where one needs to carry
out intelligent searching by pruning unwanted results, with link analysis one needs to
reduce the graphs so that the analysis is manageable and not combinatorially explosive.
Therefore, results from graph reduction research need to be applied to the graphs that are obtained
by representing the various associations. The challenge here is to find the interesting
associations and then determine how to reduce the graphs. Various graph theorists
are working on graph reduction problems. We need to determine how to apply the
techniques to detect abnormal and suspicious behavior.
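One very simple form of reduction is to restrict the analysis to the neighborhood of a node of interest, keeping only the nodes within a fixed number of hops. This is just one heuristic among many, and it is not the specific reduction technique discussed in [2]. In the sketch below the function name `k_hop_subgraph`, the example graph and the choice of two hops are assumptions.

```python
def k_hop_subgraph(graph, seed, k):
    """Return the subgraph induced by the nodes within k hops of seed."""
    frontier = {seed}
    kept = {seed}
    for _ in range(k):
        # Expand one hop outward, ignoring nodes that are already kept.
        frontier = {n for node in frontier for n in graph.get(node, ())} - kept
        kept |= frontier
    # Keep only the edges whose two endpoints both survived.
    return {node: graph.get(node, set()) & kept for node in kept}


# Hypothetical undirected chain A - B - C - D - E.
full_graph = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
reduced = k_hop_subgraph(full_graph, "A", 2)
```

The cost of the later analysis then depends on the size of the neighborhood and not on the size of the whole graph, at the price of missing associations that lie further away.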
Another challenge in using link analysis for counter-terrorism is reasoning with
partial information. For example, agency A may have a partial graph, agency B another
partial graph and agency C a third partial graph. The question is how do you find the
associations between the graphs when no agency has the complete picture? One would
argue that we need a data miner that would reason under uncertainty and be able to
figure out the links between the three graphs. This would be the ideal solution and
the research challenge is to develop such a data miner. The other approach is to have
an organization above the three agencies that will have access to the three graphs and
make the links. One can think of this organization as the Homeland Security agency.
In the next section as well as in some of the ensuing sections we will discuss various
federated architectures for counter-terrorism.
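The second approach, an organization with access to all three graphs, can be sketched as a union of the partial graphs followed by a connectivity check. The sketch does not address the harder problem of reasoning under uncertainty without sharing the graphs. The node names `P1` to `P4`, the three agency graphs and the function names `merge_graphs` and `connected` are hypothetical.

```python
def merge_graphs(*partials):
    """Union several partial adjacency maps into one undirected graph."""
    merged = {}
    for partial in partials:
        for node, neighbours in partial.items():
            merged.setdefault(node, set()).update(neighbours)
            for n in neighbours:
                merged.setdefault(n, set()).add(node)
    return merged


def connected(graph, start, goal):
    """Depth-first search: is there any chain of links from start to goal?"""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for n in graph.get(node, ()):
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return False


# Each hypothetical agency holds one fragment of the chain P1 - P2 - P3 - P4.
agency_a = {"P1": {"P2"}}
agency_b = {"P2": {"P3"}}
agency_c = {"P3": {"P4"}}

combined = merge_graphs(agency_a, agency_b, agency_c)
seen_by_a_alone = connected(agency_a, "P1", "P4")
seen_when_merged = connected(combined, "P1", "P4")
```

No single fragment links `P1` to `P4`, while the merged graph does. This is the benefit of the central organization, and it is also the reason such sharing raises the privacy questions revisited in section 3.5.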
We need to conduct extensive research on link analysis as well as on other data
and web data mining techniques to determine how they can be applied effectively for
counter-terrorism. For example, by following the various links, one could perhaps trace,
say, the financing of the terrorist operations to the president of, say, country X. Another
challenge with link analysis as well as with other data mining techniques is having good
data. However, for the domain that we are considering much of the data could be
classified. If we are to truly get the benefits of the techniques we need to test with actual
data. But not all of the researchers have the clearances to work on classified data. The
challenge is to find unclassified data that is a representative sample of the classified
data. It is not straightforward to do this, as one has to make sure that all classified
information, even through implications, is removed. Another alternative is to find as
good data as possible in an unclassified setting for the researchers to work on. However,
the researchers have to work not only with counter-terrorism experts but also with data
mining specialists who have the clearances to work in classified environments. That is,
the research carried out in an unclassified setting has to be transferred to a classified
setting later to test the applicability of the data mining algorithms. Only then can we
get the true benefits of data mining.

3.5 A Note on Privacy


In section 3.4 we briefly mentioned the challenges to privacy due to data mining. There
has been much debate recently among the counter-terrorism experts and civil liberties
unions and human rights lawyers about the privacy of individuals. That is, gathering
information about people, mining information about people, conducting surveillance
activities and examining say e-mail messages and phone conversations are all threats
to privacy and civil liberties. However, what are the alternatives if we are to combat
terrorism effectively? Today we do not have any effective solutions. Do we wait until
privacy violations occur and then prosecute or do we wait until national security dis-
asters occur and then gather information? What is more important? Protecting nations
from terrorist attacks or protecting the privacy of individuals? This is one of the ma-
jor challenges faced by technologists, sociologists and lawyers. That is, how can we
have privacy but at the same time ensure the safety of nations? What should we be
sacrificing and to what extent?
The challenge is to provide solutions to enhance national security but at the same
time ensure privacy. There is now research at various laboratories on privacy enhanced,
sometimes called privacy sensitive, data mining (e.g., Agrawal at IBM Almaden,
Gehrke at Cornell University and Clifton at Purdue University; see for example [1,3,9]).
The idea here is to continue with mining but at the same time ensure privacy as much
as possible. For example, Clifton has proposed the use of the multiparty security policy
approach for carrying out privacy sensitive data mining. While there is some progress
we still have a long way to go. Some useful references are provided in [3] (see also [8]).
An approach we are proposing is to process privacy constraints in a database man-
agement system. Note that one mines the data and extracts patterns and trends. The
privacy constraints determine which patterns are private and to what extent. For exam-
ple, suppose one could extract the names and healthcare records. If we have a privacy
constraint that states that names and healthcare records are private then this information
is not released to the general public. If the information is semi-private, then it is re-
leased to those who have a need to know. Essentially the inference controller approach
we have discussed in [15] is one solution to achieving some level of privacy. It could
be regarded as a type of privacy sensitive data mining. In [13] we have proposed an
approach to handle privacy constraints during query, update and database design operations.
Also, IBM Almaden Research Center has recently been developing a similar approach
to privacy management. They call their approach Hippocratic databases (see [7]).
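The release rule described above can be sketched as a small filter that sits between the mining step and the user. This is not the design from [13] or [15]. It is an illustrative assumption in which each constraint names a combination of attributes and a privacy level, and an extracted pattern inherits the most restrictive level of any constraint it covers. The attribute names, the constraint table and the function names are hypothetical, and the sketch further assumes that private patterns are withheld from every requester.

```python
PUBLIC, SEMI_PRIVATE, PRIVATE = "public", "semi-private", "private"
LEVEL_ORDER = [PUBLIC, SEMI_PRIVATE, PRIVATE]  # least to most restrictive

# Hypothetical privacy constraints over combinations of attributes.
constraints = {
    frozenset({"name", "healthcare_record"}): PRIVATE,
    frozenset({"name", "employer"}): SEMI_PRIVATE,
}


def privacy_level(attributes, constraints):
    """Most restrictive level among the constraints the attributes cover."""
    level = PUBLIC
    for constrained, lvl in constraints.items():
        covers = constrained <= attributes
        if covers and LEVEL_ORDER.index(lvl) > LEVEL_ORDER.index(level):
            level = lvl
    return level


def may_release(attributes, need_to_know, constraints):
    """Decide whether a pattern over these attributes may be released."""
    level = privacy_level(frozenset(attributes), constraints)
    if level == PRIVATE:
        return False  # assumption: private patterns are withheld from everyone
    if level == SEMI_PRIVATE:
        return need_to_know  # released only to those with a need to know
    return True


decisions = [
    may_release({"name", "healthcare_record"}, True, constraints),
    may_release({"name", "employer"}, False, constraints),
    may_release({"name", "employer"}, True, constraints),
    may_release({"zip_code"}, False, constraints),
]
```

Checking combinations of attributes, and not single attributes, matters because a name or a healthcare record may be harmless on its own while the pairing of the two is what must be protected.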
Note that not all approaches to privacy enhanced data mining are the same. Researchers
are taking different approaches to such data mining. Some have argued that
privacy enhanced data mining may be time consuming and may not be scalable. However,
we need to investigate this area more before we can come up with viable solutions.

3.6 Summary and Directions


We first provided an overview of some of the challenges for applying data mining for
counter-terrorism. These include eliminating false positives and false negatives, mul-
timedia data mining, real-time data mining and privacy. Next we discussed various
threats. That is, we provided a fairly broad overview of various aspects of threats and
counter-terrorism measures. First we discussed natural disasters and human errors.
Then we divided the threats into various groups including non-information related ter-
rorism, information related terrorism and biological, chemical, and nuclear threats.
We also discussed critical infrastructure threats. Next we discussed counter-terrorism
measures for all types of threats. For example, we need to gather information about
terrorists and terrorist groups, mine the information and extract patterns. In the case of
bio-terrorism, we need to prevent terrorist attacks, say, with the use of sensors.
Next we provided a rather broad overview of data mining for counter-terrorism.
We have used the terms data mining and web data mining interchangeably. Again we
can expect much of the data to be on the web, whether they be on the Internet or on
corporate intranets, and therefore, mining the data sources and databases on the web to
detect and prevent terrorist attacks will become a necessity. These databases could be
public databases or private databases.
First we discussed data mining for non real-time threats. The idea here is to gather
data, build profiles of terrorists, learn from examples and then detect as well as prevent
attacks. The challenge here is to find real world examples as in many cases a particular
attack has not happened before. Next we discussed real-time data mining. Here the
challenge is to build models in real-time. Finally we discussed data mining outcomes
and techniques for counter-terrorism as well as focused on link analysis for counter-
terrorism.
We are not counter-terrorism experts. Our discussions on counter-terrorism are
based on various newspaper articles and documentaries. Our goal is to explore how
data mining can be exploited for counter-terrorism. We want to raise the awareness
that data mining could possibly help detect and prevent terrorist attacks. Again this
area is a new area. A lot of research needs to be done. It should be noted that we also
need to make sure that the data mining tools produce accurate and useful results. For
example, if there are false positives, the effects could be disastrous. That is, we do not
want to investigate someone who is innocent. This will raise many privacy concerns.
We also do not want the data mining tools to give out false negatives. We hope that this
paper will spawn interesting ideas so that researchers and practitioners start or continue
to work on data mining and apply the techniques for counter-terrorism.
We also provided an overview of some of the privacy concerns and discussed the
directions in privacy preserving data mining and privacy constraint processing. There
are many discussions now on privacy preserving approaches, and we need to continue
with this research and develop viable solutions that can carry out useful mining and at
the same time ensure privacy.


Data mining and web data mining technologies will have a significant impact on
counter-terrorism. As we are seeing, one of the major concerns of our nation today is to
detect and prevent terrorist attacks. This is also becoming the goal of many nations in
the world. We need to examine the various data mining and web mining technologies
and see how they can be adapted for counter-terrorism. We also need to develop special
web mining techniques for counter-terrorism. As we have stressed in [14], we expect
much of the data to be on the web. The web could be the Internet or Intranets. Analysts
will have to collaborate via the web within an agency or between agencies. Also, the
founding of the Department of Homeland Security may have an impact on how
data mining will be carried out.
In addition to improving on data mining and web mining techniques and adapting
them for counter-terrorism, we also need to focus on federated data mining. We can
expect agencies to work together. They will have to share the data as
well as mine the data collaboratively. We can expect to see an increased interest in
federated data mining. In this paper we have discussed just the high level ideas. We
need to explore the details.
Some other areas of interest include multilingual data mining. Terrorism is not confined
to one country and it has no borders. Terrorism occurs everywhere and is carried
out by people from different countries speaking different languages. We need technologies
to understand the various languages as well as mine the text in different languages.
We also need translators to translate one language to another before mining. We also
need language experts to work with technologists for multilingual data management
and mining. Note that terrorists may come from different countries and speak different
languages. We need to understand their language without any ambiguity.
As we have stressed in [14], we cannot forget about privacy. National security
measures will mean violating privacy and civil liberties. We cannot abandon our quest
for eliminating terrorism. However, we also have to be sensitive to the privacy of in-
dividuals. This will be a major challenge. We need to develop techniques for privacy
sensitive data sharing and data mining.

Disclaimer: The views and conclusions expressed in this paper are those of the
author and do not reflect the policies or procedures of the MITRE Corporation or of
the National Science Foundation.
Bibliography

[1] R. Agrawal and R. Srikant. Privacy-preserving data mining. In Proceedings of
the ACM SIGMOD Conference, Dallas, TX, May 2000.
[2] M. Berry and G. Linoff. Data Mining Techniques for Marketing, Sales, and Cus-
tomer Support. John Wiley, New York, 1997.
[3] C. Clifton, M. Kantarcioglu, and J. Vaidya. Defining privacy for data mining.
Technical report, Purdue University, 2002. (see also Next Generation Data Min-
ing Workshop, Baltimore, MD, November 2002).
[4] H. Ellison. Handbook of Chemical and Biological Warfare Agents. CRC Press,
1999.
[5] F. Bolz et al. The Counterterrorism Handbook: Tactics, Procedures, and Tech-
niques. CRC Press, 2001.
[6] H. Chen et al. In Proceedings of the 1st Conference on Security Informatics,
Tucson, AZ, June 2003.
[7] R. Agrawal et al. Hippocratic databases. In Proceedings of VLDB, 2002.
[8] A. Evfimievski, R. Srikant, R. Agrawal, and J. Gehrke. Privacy preserving min-
ing of association rules. In Proceedings of the Eighth ACM SIGKDD Interna-
tional Conference on Knowledge Discovery and Data Mining, Edmonton, Al-
berta, Canada, July 2002.
[9] J. Gehrke. Research problems in data stream processing and privacy-preserving
data mining. In Proceedings of the Next Generation Data Mining Workshop,
Baltimore, MD, November 2002.
[10] A. Ghosh. Ecommerce Security, Weak Links and Strong Defenses. John Wiley,
New York, 1998.
[11] B. Thuraisingham. Managing threats to web databases and cyber systems: Issues,
solutions and challenges. In V. Kumar et al., editors, Cyber Security: Threats and
Countermeasures. Kluwer.
[12] B. Thuraisingham. Data mining, national security, privacy and civil liberties.
SIGKDD Explorations, January 2003.

[13] B. Thuraisingham. Privacy constraint processing in a database management system.
(accepted to be published) Data and Knowledge Engineering Journal, 2003.
[14] B. Thuraisingham. Web Data Mining Technologies and Their Applications in
Business Intelligence and Counter-terrorism. CRC Press, June 2003.
[15] B. Thuraisingham and W. Ford. Security constraint processing in a multilevel
distributed database management system. IEEE Transactions on Knowledge and
Data Engineering, April 1995.
[16] http://www.kdnuggets.com.
