
Behavioral Biometrics and Machine Learning to Secure Website Logins

Falaah Arif Khan¹, Sajin Kunhambu², and K. Chakravarthy G²

¹ DES India, Dell, Bangalore, India
Falaah_Arif_Khan@dell.com
² DCS DCP India, Dell, Bangalore, India
{Sajin_Kunhambu,k_chakravarthy_g}@dell.com

Abstract. In a world dominated by e-commerce and electronic transactions, the business value of a secure website is immeasurable. With the ongoing wave of Artificial Intelligence and Big Data, hackers have far more sophisticated tools at their disposal to orchestrate identity fraud on login portals. Such attacks bypass static security rules, and protection against them therefore requires machine learning based ‘intelligent’ security algorithms. This paper explores the use of client behavioral biometrics to secure website logins. A client’s mouse dynamics, keystrokes and click patterns during login are used to create a customized security model for each user that can differentiate the user of interest from any other impersonator. Such a model, combined with existing protocols, provides enhanced security for the user’s profile even if credentials are compromised. The module first employs a means of collecting relevant behavioral data from the client side when a new account is created. The collection module can easily be integrated with any web application without impacting website performance. After sufficient login data has been collected, a biometric-based fraud detection algorithm is created that secures the account against future impersonators. Our choice of algorithms is the Multilayer Perceptron, Support Vector Machine and Adaptive Boosting, the outcomes of which are polled to give the prediction. We find that such a model shows good performance (accuracy, precision and recall) for different train:test splits. Moreover, the model is easily implementable for any web based authentication, is scalable and can be fully automated, provided a dataset like ours can be created from client activity on the web application of interest.

Keywords: Behavioral biometrics · Machine learning · Artificial Intelligence · Login fraud · Intelligent security · Keystroke · Mouse movements · Multilayer Perceptron · Support Vector Machine · Adaptive Boosting

1 Introduction

Net bots and other ‘intelligent’ methods at the disposal of malicious users to perform fraudulent logins on websites make user information susceptible to misuse. In an era where online transactions drive sales, such attacks cost businesses millions of dollars. Dictionary and other brute force attacks easily bypass static security rules and put user information in malicious hands. Once credentials have been compromised,

© Springer Nature Singapore Pte Ltd. 2019


S. M. Thampi et al. (Eds.): SSCC 2018, CCIS 969, pp. 667–677, 2019.
https://doi.org/10.1007/978-981-13-5826-5_52

intruders can perform multiple subsequent malicious logins that go virtually undetected during authentication. The most relevant, generally overlooked and underused information from the client side is behavioral, namely: the mouse dynamics, keystrokes and click patterns of the user. We propose a model, based on the behavioral biometrics of the user during login, that secures user accounts even if credentials have been compromised. The model first employs a means of collecting relevant behavioral data from the client side at login to create a unique template. A fraud detection model is then created. It consists of three separate modules, namely: the Multilayer Perceptron, Support Vector Machine and Adaptive Boosting, the outcomes of which are polled to give an optimal prediction in real time, while the user is logging in.
The key requirements of this work are as follows. First, the model must account for the sensitive nature of the dataset: while the algorithm is being created, the account has only the default level of security, making it susceptible to attacks by imposters. Hence, our model should be designed such that it can be created from a reasonably small amount of data. Second, the model should not be computationally expensive: fraud detection needs to be done in real time, and evaluation of the model should not impact login performance on the respective website. Lastly, the model should be easily scalable; by fixing the architecture of the model we allow the creation of the detection model to be automated for each new user/account on the respective website.

2 Literature Review

Research on security applications that use behavioral information of the client for authentication has identified two sources of relevant data, mouse movements and keystrokes, which are together termed behavioral biometrics. A myriad of applications rely on behavioral biometrics, using logs of mouse movements and keystrokes both in isolation and in combination.
The use of such applications has varied from re-authentication [1], to replacing conventional password type logins [2], to adding additional layers of security [4–6, 11]. The authors of [1] use behavioral biometrics for re-authentication and not as the first wall of security. The authors of [2] propose a twofold security system, where a keystroke based template is the first level of authentication and mouse movements are the second. However, such a system is not based on passive authentication, where the biometrics work to complement existing security protocols. The authors instead use a keystroke template as the entity against which authentication is provided. Similarly, a template of a unique mouse movement is used as a ‘password’ at the second level of security. Another application of such a model is seen in [4], where the authors use it for Data Loss Prevention by predicting the identity of a data creator. This can be contrasted with [5, 6], which focus more on user profile identification in web applications.
Looking at the kinds of features extracted from the client’s behavioral biometrics in all these applications, we see certain fundamental similarities. Most literature using mouse movements [1, 4, 6] identifies 8 classes into which each mouse event can be classified, based on the relative direction of movement. We find this feature engineering to be well researched and proven to be meaningful, and hence decided to extract similar features from mouse event data. Work on keystroke biometrics mostly focuses on monograms and digrams, i.e. dwell time and flight time. Researchers, however, have not tried to extract features from the click patterns of the user, and so we see our work contributing new features that can be extracted from login activity.
There is a fundamental lack of open source datasets for such applications. Moreover, since this is a customized form of security for a specific web application, most authors choose to collect their own data. In [2], the authors explain how, for data collection, each user was made to log in 10 times, from which features were extracted and a template was created.
In [4], the model created is implemented as a software agent that resides on the user’s desktop. For data collection, the authors explain how an organization can mandate its employees to install this agent and require them to run the software in the background of the operating system every time they use a computer. This kind of organizationally mandated data collection would enable the agent to record and analyze the user’s keystroke and mouse movement behavior over more than just login activity.
For data collection, the authors of [5] made each user log in to a web site as themselves (genuine user) or as other users (intruder). For each login session, the login type was recorded. Credentials were shared with all users, to allow for impersonation attacks. In total, 24 users with different backgrounds and computer skills participated in the data collection, giving rise to logs of 193 legitimate visits and 101 intrusive visits. In [6], a total of 25 subjects were asked to come up with a new password. Each subject or owner typed this password 150 to 400 times during a period of several days, and the last 75 timing vectors collected were set aside for testing. The remaining timing vectors were used to train the network. A total of 15 imposters were given all 21 passwords and asked to type each password five times, resulting in 75 imposter test vectors for each password. Combined with the owner’s 75 test vectors previously set aside, a total of 150 test vectors per password were obtained. That is a significant amount of login activity that needs to be performed at the default (reduced) level of security before the intelligent fraud detector can be created. In all these papers, as in [6, 12], we see a significant amount of data being logged, and this gave us a metric of how much login data would be required to create a reasonably good model.
Some preliminary analysis was also done on the few datasets that are publicly available. For mouse movements, the Balabit Mouse Dynamics Challenge dataset [7] was used. The goal of the challenge was to protect a set of users from the unauthorized usage of their accounts by learning the characteristics of how they use their mouse. The dataset contained timing and positioning information of the mouse pointers of different users, from multiple sessions on a web application. For the purpose of collecting data, a network monitoring device was set between the client and the remote computer that inspected all traffic, including the mouse interactions of the user that are transmitted from the client to the server during the remote session. Hence, the dataset contained the following fields:
• record timestamp: elapsed time (in sec) since the start of the session, as recorded by the network monitoring device
• client timestamp: elapsed time (in sec) since the start of the session, as recorded by the RDP client
• button: the current condition of the mouse buttons
• state: additional information about the current state of the mouse
• x, y: the coordinates of the cursor on the screen
Work on this dataset helped us understand what kind of features can be meaningful for a model based on mouse movements.
We then worked on the open source keystroke datasets. The Benchmark Data Set (by Kevin Killourhy and Roy Maxion), released as an accompaniment to [8], contains the timing data for 51 typists all typing the same word. The dataset contains the flight and dwell times for the predefined password, and hence no preprocessing or feature engineering was required before using it. The BeiHang Keystroke Dynamics Database (released as an accompaniment to [9]) was another dataset we did extensive work on before making our own dataset and model. It contains 2057 test samples and 556 train samples, taken from 117 subjects, divided into two subsets based on the collection environment. The keystrokes for a particular session were read as a sequence of (Pi, Ri) vectors, where Pi and Ri represent the press and release times of the ith key of the password. With this dataset we had the liberty to extract as many n-gram features as we wanted.
Our work on these available datasets verified the documented success of feature engineering of client biometrics. Moreover, this investigation showed that relevant work has been done either on a mouse dynamics dataset or on a keystroke one. Our dataset, which exploits the valuable information present in both of these sources, allows for a more holistic dataset for such applications.
A metric of success for such a project is its behavior across users. The authors of [1] verify the scalability of a biometric based solution by demonstrating that it works for multiple users and across environments. The authors of [2] keep in mind that their model, while analyzing the user’s keystroke and mouse movement behavior, needs to track the strokes on the keyboard and the movement of the mouse without influencing the user’s work. Hence, they show the practical implementation of such models as a software agent that resides on the user’s desktop.
Another practical consideration is the variation in the passwords that users choose. The authors of [5] identify that a limited amount of work has been done on free text detection, and make accommodations for dealing with free text and free mouse movements, as well as the fact that many web sessions tend to be very short. Our work focuses on a customized model for each user and hence overcomes the limitation of fixed password length. By formulating this problem as N binary classifiers (one user of interest vs. all other users) instead of an N-class classifier, we accommodate variable lengths of credentials.
Considerations on practicality are further addressed in [6], where the imbalance between data labels is identified. The authors highlight how this is a binary classification problem (owner vs. imposters), yet the patterns from only one class, the owner’s, are available in advance. Since there are millions of potential imposters, it is not practical to obtain enough patterns from all kinds of imposters. It is also not practically feasible to publicize credentials in order to collect potential imposters’ timing vectors. Hence, they propose that the only solution is to build a model of the owner’s keystroke dynamics and use it to detect imposters via some sort of similarity measure. They show that our problem is one of a “partially exposed environment”, or “novelty detection”.

Another consideration the authors of [6, 11] point out is the situation where a new password has been registered. This requires new data to be collected and a new model to be created. During this time the proposed identity verification cannot be used, so an ordinary level of security can be maintained with the conventional password security system. The length of the collection period can be dynamically determined by monitoring the variability of typing patterns. Moreover, a separate model must be constructed for each password or user, and whenever a user changes his or her password, a new model needs to be built.
Looking at the types of detection algorithms used in the literature, we see a variety of approaches. The authors of [2] create a classifier that verifies the similarity between the pattern to be verified and the template of prototypes (created from the collected logs), using the distance between the feature vector of the pattern and the prototype. In [4], separate Support Vector Classifiers are employed on mouse dynamics features and on keystrokes. The authors of [5] show the applicability of Bayesian Networks to such a problem. The authors of [6] use an Auto Associative Neural Network for novelty detection. [3] summarizes other similar works, covering techniques ranging from the Monte Carlo approach for data collection to Gaussian probability density functions, direction similarity measures, parallel decision trees, etc. for classification. The authors of [15] propose the interesting notion that there exist keystroke classes, akin to blood types, into which people can be classified, and hence find a clustering model for keystrokes suitable, as implemented in [10]. Seeing as the performance of these algorithms differs based on their inherent biases and variances, we identified the need for a polling or ensemble of conventional algorithms.

3 Dataset and Features

3.1 Dataset Description


The dataset was created by mimicking login activity at a dummy login page with 8 different typists: 1 true user and 7 imposters, all entering the same credentials into the created portal. A total of 102 login sessions were recorded, of which 65 sessions were of the true user and 37 were fraudulent login attempts. The behavioral information collected was:
• Mouse coordinates at each time instant
• Keystrokes: timestamps of each key press and key release event
• Timestamps of all clicks
• Key code of each key pressed

3.2 Features Extracted


The mouse activity of each session was divided into N minibatches (results are reported for N = 4 and N = 5). Within each minibatch, each mouse movement was classified into one of 8 classes based on the relative direction of movement. These classes of mouse movements are shown in Fig. 1 and described in Table 1. After categorization, features were extracted as an average of the attributes logged across the minibatch for each class, namely:

Fig. 1. Categorization of mouse movements

Table 1. Classes of mouse movements


Class Angle (in degrees)
1 0–45
2 45–90
3 90–135
4 135–180
5 180–225
6 225–270
7 270–315
8 315–360

• Average speed in x direction, per class


• Average speed in y direction, per class
• Average speed per class
• Average distance covered, per class
• Percentage of mouse movements logged in each direction
This method gave us 5 features per class per minibatch, which translates to 40 features per minibatch since there are 8 classes. Choosing N (the number of minibatches) as 4, we got 160 features from the mouse activity logs.
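The categorization of Table 1 and the per-class averaging described above can be sketched as follows; the function names and the half-open angle intervals are our illustrative assumptions:

```python
import math
from collections import defaultdict

def direction_class(dx, dy):
    """Map a movement vector to one of the 8 classes of Table 1.
    Class 1 covers [0, 45) degrees, class 2 covers [45, 90), ...,
    class 8 covers [315, 360), counter-clockwise from the x axis."""
    angle = math.degrees(math.atan2(dy, dx)) % 360
    return int(angle // 45) + 1

def per_class_features(samples):
    """samples: [(t, x, y), ...] for one minibatch of mouse activity.
    Returns {class: (avg_speed_x, avg_speed_y, avg_speed, avg_distance,
    share_of_movements)}, i.e. the 5 features per class described above."""
    buckets = defaultdict(list)
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        dx, dy = x1 - x0, y1 - y0
        dist = math.hypot(dx, dy)
        buckets[direction_class(dx, dy)].append(
            (abs(dx) / dt, abs(dy) / dt, dist / dt, dist))
    total = sum(len(v) for v in buckets.values())
    if total == 0:
        return {}
    features = {}
    for cls, rows in buckets.items():
        vx, vy, v, d = (sum(col) / len(rows) for col in zip(*rows))
        features[cls] = (vx, vy, v, d, len(rows) / total)
    return features

print(direction_class(1, 1))  # 45 degrees -> class 2
```

Concatenating these per-class tuples over the N minibatches yields the 40 × N mouse features of a session.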
The click times give us an approximation of how long the user takes to login, by
making the simple assumption that the first click is to enter the username field, while
the last one is to submit the entered credentials. Hence, we also extracted the login time
from the click patterns as a relevant feature.
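Under the assumption stated above (first click enters the username field, last click submits), the login-time feature reduces to a single subtraction:

```python
def login_time(click_timestamps):
    """Approximate login duration from click times: the first click enters
    the username field, the last click submits the credentials."""
    if len(click_timestamps) < 2:
        return 0.0
    return max(click_timestamps) - min(click_timestamps)

print(login_time([0.4, 2.1, 7.9]))  # roughly 7.5 seconds
```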
For the keystrokes, we first divided the typing activity based on the kind of key that was pressed: control keys, shift altered keys, lower case keys or other keys. Each keystroke was associated with a corresponding category, as described in Table 2.

Table 2. Key categories


Category Description
1 Uppercase: A-Z and special characters that require a preceding shift (control key)
2 Lower case: a-z, numbers
3 Control: tab, backspace, delete, arrow keys
4 Others

Next, we split each session’s keystrokes into those for username typing and those for the password. For each of these we extracted:
• Mean flight time, per key category
• Mean dwell time, per key category
This gave us 2 features for each input (there are 2 inputs, namely username and password), and hence 4 features per category of keys. We defined 4 key categories and hence got a total of 16 such features. We further extracted the mean and standard deviation of the dwell and flight times for each type of input, across all categories, giving another 8 features. Finally, we also noted the distribution of keystrokes across categories (as a percentage), which gave rise to 4 more features. This brings the total number of features extracted from the keystroke logs to 28. Hence, our feature vector for each session had a length of 189, i.e. we had 189 features for each data point.
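The key categorization of Table 2 and the dwell/flight computation can be sketched as below; the key names, the millisecond timestamps and the exact categorization rules are simplified assumptions for illustration:

```python
CONTROL_KEYS = {"Tab", "Backspace", "Delete",
                "ArrowLeft", "ArrowRight", "ArrowUp", "ArrowDown"}

def key_category(key, shift_down=False):
    """Map a key to the four categories of Table 2 (illustrative rules)."""
    if key in CONTROL_KEYS:
        return 3          # control keys
    if key.isupper() or shift_down:
        return 1          # uppercase / shift-altered characters
    if key.islower() or key.isdigit():
        return 2          # lower case letters and numbers
    return 4              # others

def dwell_and_flight(events):
    """events: [(key, press_ms, release_ms), ...] in typing order.
    Dwell time = release - press of one key; flight time = press of the
    next key minus release of the previous key."""
    dwell = [release - press for _, press, release in events]
    flight = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return dwell, flight

print(dwell_and_flight([("h", 0, 100), ("i", 300, 400)]))  # ([100, 100], [200])
```

Averaging these per key category, per input field, yields the 16 category features; their means and standard deviations across categories give the remaining 12.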

4 Methods and Results

4.1 Classifier Architectures


The classifiers for detecting fraudulent logins were implemented in Python using the sklearn library.
• Multilayer Perceptron (MLP): A neural network with 2 hidden layers, each containing 250 neurons and a tanh activation, was created. Training was done using the Adam optimizer, with a minibatch size of 1 sample and an initial learning rate of 0.001 which was updated adaptively.
• Support Vector Machine (SVM): The libsvm implementation with a polynomial kernel of degree 3 was used.
• Adaptive Boosting (Adaboost): An ensemble of decision trees, each with a maximum depth of 200, was created using the Adaboost module of the sklearn library.
Keeping in mind the importance of maintaining the performance of the web application while implementing this model, we must analyze the computational complexity of our model. The choice of machine learning models with simple architectures over deep learning models is a conscious one, to ensure that the detection API, which is essentially a polling of these three architectures, does not become too computationally expensive.
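The three classifiers and the polling step could be assembled in sklearn roughly as follows. This is a sketch using the hyperparameters stated above, not the exact production code; the toy data stands in for the 189-dimensional session features, and the try/except guards sklearn’s rename of AdaBoost’s base_estimator parameter:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def build_models():
    """The three classifiers with the hyperparameters stated above."""
    # 2 hidden layers of 250 tanh neurons; Adam adapts the step size itself.
    mlp = MLPClassifier(hidden_layer_sizes=(250, 250), activation="tanh",
                        solver="adam", batch_size=1, learning_rate_init=0.001)
    # libsvm implementation with a polynomial kernel of degree 3.
    svm = SVC(kernel="poly", degree=3)
    # AdaBoost over decision trees of maximum depth 200.
    tree = DecisionTreeClassifier(max_depth=200)
    try:                                   # sklearn >= 1.2
        ada = AdaBoostClassifier(estimator=tree)
    except TypeError:                      # older sklearn releases
        ada = AdaBoostClassifier(base_estimator=tree)
    return [mlp, svm, ada]

def polled_predict(models, X):
    """Majority vote over the three binary (genuine vs. imposter) classifiers."""
    votes = np.stack([m.predict(X) for m in models])
    return (votes.sum(axis=0) >= 2).astype(int)

if __name__ == "__main__":
    # Toy stand-in for the 189-dimensional session feature vectors.
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.],
                  [10., 10.], [10., 11.], [11., 10.], [11., 11.]])
    y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
    models = [m.fit(X, y) for m in build_models()]
    print(polled_predict(models, np.array([[0.5, 0.5], [10.5, 10.5]])))
```

The majority vote means a misprediction by any single model is outvoted by the other two, which is the robustness motivation for polling.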

4.2 Results
Our dataset consisted of biometrics from 102 user logins. We applied random splits to this data and averaged the accuracy over 50 such splits. This was done to understand the optimal amount of data required to create an effective model. The results for different lengths of the feature vector, obtained by varying the number of mouse movement minibatches (N), are tabulated in Tables 3 and 4.

Table 3. Comparison of accuracies for different train:test splits, with N = 4


Train: test split Accuracy
MLP SVM Adaboost
80:20 0.883 0.969 0.961
70:30 0.879 0.954 0.946
60:40 0.873 0.949 0.936
50:50 0.854 0.949 0.937

Table 4. Comparison of accuracies for different train:test splits, with N = 5


Train: test split Accuracy
MLP SVM Adaboost
80:20 0.895 0.973 0.947
70:30 0.900 0.971 0.947
60:40 0.902 0.969 0.942
50:50 0.892 0.983 0.952
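The evaluation behind Tables 3 and 4 — averaging test accuracy over repeated random splits — could be sketched as below; it is shown for the SVM only, and the demo data is an illustrative stand-in for our 102 × 189 feature matrix:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def mean_accuracy(X, y, test_size=0.2, n_splits=50):
    """Average test accuracy over repeated random train:test splits."""
    scores = []
    for seed in range(n_splits):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed, stratify=y)
        clf = SVC(kernel="poly", degree=3).fit(X_tr, y_tr)
        scores.append(clf.score(X_te, y_te))
    return float(np.mean(scores))

if __name__ == "__main__":
    # Two well-separated classes as a stand-in for genuine/imposter sessions.
    rng = np.random.RandomState(0)
    X = np.vstack([rng.randn(10, 2), rng.randn(10, 2) + 10.0])
    y = np.array([0] * 10 + [1] * 10)
    print(mean_accuracy(X, y, n_splits=10))
```

Varying `test_size` over 0.2, 0.3, 0.4 and 0.5 reproduces the split schedule of the tables.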

Seeing as our models gave reasonably good results (accuracy), we saved the best performing models to be loaded and used in our real-time detection API. Table 5 summarizes the performance of these chosen models against further performance metrics.

Table 5. Summary of saved models


Performance metric Classifier
MLP SVM Adaboost
Accuracy 0.952 1 0.952
Precision 1 1 0.928
Recall 0.933 1 1

5 Conclusions
5.1 Analysis
The results of our work are very promising. Looking at Tables 3 and 4, we see reasonably reliable performance even on a 40:60 split. This means that model creation required around 30 user logs, which is far less than the sizes reported previously. Moreover, the created application that performs prediction using the loaded models, described in Table 5, performed exceedingly well in testing and did not affect the performance of the website.
Another very promising outcome of this study is the simplicity of the network architectures involved. The support vector machine showed the best performance on our dataset, which led us to conclude that the data extracted after feature engineering is separable by a polynomial kernel. Given that the features extracted remain the same no matter who the user is, how fast or slowly they type, or how much they move the mouse around the screen, it is safe to assume that a similar architecture will work for creating an SVM-based detection algorithm for any user. Similarly, with reference to the MLP, when taking 4 minibatches of mouse movements per session (N = 4), we get 189 features. From our results we saw that a network with 2 hidden layers of 250 neurons each works well. Across all users, the number of features remains fixed and is independent of the length of the password. Consequently, the same architecture will create an equally well performing MLP for any other user. The same argument applies to the Adaboost model.
These results show that we can automate the entire process of model creation by making the creation of the API architecture-independent and purely data-dependent. Defining a certain minimum required accuracy, we can automate the deployment of these modules into a new user’s detection API by requiring data to be collected and models to be retrained until the specified accuracy is reached.

5.2 Challenges Faced


The biggest challenge faced in the creation of this model, and sure to be faced in scaling such models up to live websites, is data collection. While the technology required to log such behavioral biometrics is abundantly available and easy to create, the sources of data are scarce. A new user on a web platform will provide the positive samples of the dataset in their first few logins. However, it is virtually impossible to gather negative samples. Even if imposters use the profile, there is no way of labelling those logs as fraudulent (negative samples) prior to the existence of the very algorithm that is being created to identify them. If we had such a capability, we would not require the creation of our detection algorithm in the first place. Hence, subsequent work needs to be done to convert the outcomes of this study into ones that work with unsupervised, or at the very least semi-supervised, learning frameworks that do novelty detection. As is the case with any increase in client data being logged by a business, a review of the legal implications might also be necessary and possibly a challenge.

Another major challenge to this application is the variety of platforms and environments that a user can type from. Casual typing, account sharing by multiple persons or one-handed typing can pose challenges for our application, as pointed out in [14]. One might also need to consider the effect that different client hardware might have on the dataset. To make this model more generalizable, care should be taken to generalize for hardware and to accommodate the aforementioned cases during the feature extraction and training stages themselves, to ensure that the model has seen these rare cases.

5.3 Future Work


The results of this study enable the easy implementation of an effective security protocol for web based applications. The use of biometrics makes the security customizable, where protection is granted against human imposters as well as net bots and other malicious scripts. Moreover, such a paradigm provides the conventional protection against attacks that seek to discover credentials, as well as against imposters who already possess the user’s credentials. Such a security protocol would be extremely secure and defensible against a myriad of attackers and attacks. The results of this study can be built upon by recreating the same experiment with a larger number of users and a larger dataset.
Using the same feature engineering, the results of this study can be implemented for different levels of intelligent authentication based on transactional importance. For premium account holders, a customized model for each user can be created. This is intuitive, as the amount of activity from these accounts (and hence the quantity of data available) will be higher, as will the importance of securing such accounts. For regular users, clustering into broad biometric classes can be done. Checking the cluster into which the incoming client sample falls against the expected cluster for those credentials might be sufficient for regular accounts.
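The clustering-based check for regular accounts proposed above could be sketched with sklearn’s KMeans; the cluster count and the matching rule are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_biometric_clusters(feature_vectors, n_classes=4, seed=0):
    """Group session feature vectors into broad biometric classes."""
    return KMeans(n_clusters=n_classes, n_init=10,
                  random_state=seed).fit(feature_vectors)

def login_matches_profile(kmeans, expected_cluster, sample):
    """Coarse check for a regular account: does the incoming login sample
    fall into the cluster recorded for these credentials?"""
    return int(kmeans.predict(sample.reshape(1, -1))[0]) == expected_cluster
```

One fitted clustering would serve all regular accounts, so the per-user cost reduces to a single nearest-centroid lookup at login time.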
Another extension of this work could be to replace the conventional models employed here with semi-supervised learning frameworks, to overcome the problem of collecting negatively labelled samples. A deep neural network could replace the proposed MLP; keeping in mind our prioritization of web application performance, further studies into the computational complexity of using deep architectures in such an application should be conducted. Work can also be done to replace the existing credential-based security protocols with biometric based ones, as opposed to using these intelligent security protocols as an accompaniment to static rules.

Acknowledgment. We thank our managers, Mukund and Swami, for their unwavering support.
We also extend a hearty thanks to all the interns at Dell, Hyderabad who took part in the process
of data collection. Without the data, there could have been no machine learning and so your
contribution does not go unnoticed. We dedicate this project to the Python community for all the
extraordinary work they do in creating new useful libraries for developers, while maintaining
requisite documentation and user support on existing libraries. The work of this study, like the
work of countless others, would not have been possible without their unwavering dedication to
the Pythonic way.

References
1. Zheng, N., Paloski, A., Wang, H.: An efficient user verification system via mouse
movements. In: Proceedings of the 18th ACM Conference on Computer and Communi-
cations Security, pp. 139–150 (2011)
2. Gurav, S., Gadekar, R., Mhangore, S.: Combining keystroke and mouse dynamics for user
authentication. Int. J. Emerg. Trends Technol. Comput. Sci. (IJETTCS) 6(2), 055–058
(2017). ISSN 22786856
3. Ponkshe, R.V., Chole, V.: Keystroke and mouse dynamics: a review on behavioral
biometrics. Int. J. Comput. Sci. Mob. Comput. 4, 341–345 (2015)
4. Wu, J.-H., Lin, C.-T., Lee, Y.-J., Chong, S.-K.: Keystroke and mouse movement profiling
for data loss prevention
5. Traore, I., Woungang, I., Obaidat, M.S., Nakkabi, Y., Lai, I.: Combining mouse and
keystroke dynamics biometrics for risk based authentication in web environments
6. Cho, S., Han, C., Han, D., Kim, H.: Web based keystroke dynamics identity verification
using neural network. J. Organ. Comput. Electron. Commer. 10(4), 295–307 (2000)
7. Fülöp, Á., Kovács, L., Kurics, T., Windhager-Pokol, E.: Balabit mouse dynamics challenge
data set (2016)
8. Killourhy, K.S., Maxion, R.A.: Comparing anomaly detectors for keystroke dynamics. In:
Proceedings of the 39th Annual International Conference on Dependable Systems and
Networks (DSN-2009), pp. 125–134, Estoril, Lisbon, Portugal, 29 June–2 July, 2009
9. Li, Y., Cao, B.Z., Zhao, S., Gao, Y., Liu, J.: Study on the BeiHang keystroke dynamics
database. In: International Joint Conference on Biometrics (IJCB), pp. 1–5 (2011)
10. Monrose, F., Rubin, A.: Authentication via keystroke dynamics. In: ACM Conference on
Computer and Communications Security, pp. 48–56 (1997)
11. Hashia, S., Pollett, C., Stamp, M.: On using mouse movements as a biometric. In:
International Conference on User Science and Engineering (i- USEr), pp. 206–211,
December 2011
12. Jorgensen, Z., Yu, T.: On mouse dynamics as a behavioral biometric for authentication.
IEEE Syst. J. 8(2), 262–284 (2013)
13. Gamboa, H., Fred, A.: A behavioral biometric system based on human-computer interaction.
In: Proceedings of the SPIE, vol. 5404, Biometric Technology for Human Identification,
381, 25 August 2004
14. Teh, P.S., Teoh, A.B.J., Ong, T.S., Tee, C.: Keystroke dynamics in password authentication
enhancement. Expert Syst. Appl. 37, 8618–8627 (2010)
15. Lau, S.-h., Maxion, R.: Clusters and Markers for Keystroke Typing Rhythms. Learning from
Authoritative Security Experiment Result, LASER 2014 (2014)