1 Introduction
Net bots and other ‘intelligent’ methods that malicious users employ to perform
fraudulent logins on websites make user information susceptible to misuse. In an era
where online transactions drive sales, such attacks cost businesses millions of
dollars. Dictionary and other brute-force attacks easily bypass static security rules and
put user information in malicious hands. Once credentials have been compromised,
intruders can perform multiple subsequent malicious logins that go virtually undetected
during authentication. The most relevant, yet generally overlooked and underused,
information from the client side is behavioral, namely the mouse dynamics, keystrokes and
click patterns of the user. We propose a model, based on the behavioral biometrics of the
user during login, that would secure user accounts even if credentials have been
compromised. The model would first employ a means of collecting relevant behavioral
data from the client side at login to create a unique template. Then a fraud detection
model is created. It consists of three separate modules, namely the Multilayer
Perceptron, Support Vector Machine and Adaptive Boosting, the outcomes of which are
polled to give a prediction in real time, while the user is logging in.
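The polling step can be illustrated with a minimal sketch. It assumes that each module
emits a binary label (1 for a genuine login, 0 for a fraudulent one) and that a simple
majority decides; the function name `poll_predictions` and the label encoding are
illustrative assumptions rather than details of the deployed system.

```python
from collections import Counter


def poll_predictions(votes):
    """Return the majority label among the module votes.

    `votes` holds one binary label per module (assumed encoding:
    1 = genuine, 0 = fraud). With three modules and two labels a tie
    cannot occur.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    label, _ = Counter(votes).most_common(1)[0]
    return label


# Hypothetical outputs of the MLP, SVM and AdaBoost modules for one login.
decision = poll_predictions([1, 0, 1])
```

With an odd number of modules a majority always exists, which is one reason a
three-module poll is convenient.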
Three key requirements were identified for this work. First, the model must account for
the sensitive nature of the dataset: while the detection model is being created, the
account has only the default level of security, making it susceptible to attacks by
imposters. Hence, it should be possible to create the model from a reasonably small
amount of data. Second, the model should not be computationally expensive: fraud
detection needs to be done in real time, and evaluation of the model should not impact
login performance on the respective website. Lastly, the model should be easily
scalable: by fixing the architecture of the model, we allow the creation of the detection
model to be automated for each new user account on the respective website.
2 Literature Review
Research on security applications that use behavioral information of the client for
authentication has identified two sources of relevant data, mouse movements and
keystrokes, which are together termed behavioral biometrics. A myriad of applications
rely on behavioral biometrics; these use logs of mouse movements and keystrokes
either in isolation or in combination.
The use of such applications has varied from being a method of re-authentication
[1], to replacing conventional password type logins [2], to adding additional layers of
security [4–6, 11]. The authors of [1] use behavioral biometrics for re-authentication
and not as the first wall of security. The authors of [2] propose a twofold security
system, where a keystroke-based template is the first level of authentication and mouse
movements are the second. However, such a system is not based on passive
authentication, in which the biometrics complement existing security protocols. The
authors instead use a keystroke template as the entity against which authentication is
provided. Similarly, a template of a unique mouse movement is used as a ‘password’ at
the second level of security. Another kind of application of such a model is seen in [4],
where the authors seek to use such a model for Data Loss Prevention by predicting the
identity of a data creator. This can be contrasted with [5, 6], which focus more on user
profile identification in web applications.
Looking at the kind of features extracted from the client’s behavioral biometrics in
all these applications, we see certain fundamental similarities. Most literature [1, 4, 6]
using mouse movements identifies 8 classes into which each mouse event can be
Behavioral Biometrics and Machine Learning 669
classified, based on the relative direction of movement. We find this feature
engineering to be well researched and proven to be meaningful, and hence decide to
extract similar features from mouse event data. Work on keystroke biometrics mostly
focuses on monograms and di-grams, i.e. dwell time and flight time. Researchers,
however, have not tried to extract features from the click patterns of the user, and so we
see our work as contributing new features that can be extracted from login activity.
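The 8-class direction feature described above can be sketched as follows. The sector
boundaries (45 degree sectors starting at 0 degrees), the class numbering and the helper
names are assumptions made for illustration; the cited papers may orient or number the
sectors differently.

```python
import math


def direction_class(x0, y0, x1, y1):
    """Map a mouse movement to one of 8 direction classes (1..8).

    Assumption: class 1 covers angles in [0, 45) degrees, class 2 covers
    [45, 90), and so on. Screen coordinates usually grow downwards, so
    "up" on screen corresponds to a negative y difference.
    """
    angle = math.degrees(math.atan2(y1 - y0, x1 - x0)) % 360.0
    # The extra modulo guards against a rounded angle of exactly 360.0.
    return int(angle // 45) % 8 + 1


def direction_histogram(points):
    """Count how many consecutive movements fall into each class."""
    counts = [0] * 8
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        counts[direction_class(x0, y0, x1, y1) - 1] += 1
    return counts


# Hypothetical cursor trace of four sampled positions.
histogram = direction_histogram([(0, 0), (2, 1), (3, 3), (5, 2)])
```

A per-session histogram of this kind has a fixed length regardless of how long the
cursor trace is, which is what makes it usable as a feature vector.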
There is a fundamental lack of open source datasets for such applications. Moreover,
seeing as this is a customized form of security for a specific web application, most
authors choose to create their own data. In [2], the authors explain how, for data
collection, each user was made to log in 10 times, from which features were extracted
and a template was created.
In [4], the model is implemented as a software agent that resides on the user’s
desktop. For data collection for model creation, the authors explain how an
organization can mandate its employees to install this agent and require them to run the
software in the background of the operating system every time they use a computer. This
kind of organizationally mandated data collection would enable the agent to record and
analyze the user’s keystroke and mouse movement behavior over more than just login
activity.
For data collection, the authors of [5] made each user log in to a web site as
themselves (genuine user) or as other users (intruder). For each login session, the login
type was recorded. Credentials were shared with all users to allow for impersonation
attacks. In total, 24 users with different backgrounds and computer skills participated in
the data collection, giving rise to logs of 193 legitimate visits and 101 intrusive visits. In [6], a
total of 25 subjects were asked to come up with a new password. Each subject or owner
typed this password 150 to 400 times during a period of several days, and the last 75
timing vectors collected were set aside for testing. The remaining timing vectors were
used to train the network. A total of 15 imposters were given all the 21 passwords and
asked to type each password five times, resulting in 75 imposter test vectors for each
password. Combined with the owner’s 75 test vectors previously set aside, a total of
150 test vectors per password were obtained. That is a significant amount of login
activity that needs to be performed at the default (reduced) level of security before the
intelligent fraud detector can be created. In all these papers, as in [6, 12], we see a
significant amount of data being logged, which gave us an estimate of how much login
data would be required to create a reasonably good model.
Some preliminary analysis was also done on the few datasets that are publicly
available. For mouse movements, the Balabit Mouse Dynamics Challenge data set [7]
was used. The goal of the challenge was to protect a set of users from the unauthorized
usage of their accounts by learning the characteristics of how they use their mouse. The
dataset contained timing and positioning information of mouse pointers of different
users, from multiple sessions on a web application. For the purpose of collecting data, a
network monitoring device that inspected all traffic was set between the client and the
remote computer. This traffic included the mouse interactions of the user that are
transmitted from the client to the server during the remote session. Hence, the dataset contained the
following fields: record timestamp: elapsed time (in sec) since the start of the session as
recorded by the network monitoring device, client timestamp: elapsed time (in sec)
since the start of the session as recorded by the RDP client, button: the current
670 F. Arif Khan et al.
condition of the mouse buttons, state: additional information about the current state of
the mouse, x: the x coordinate of the cursor on the screen and y: the y coordinate of the
cursor on the screen. Work on this dataset helped us understand what kind of features
can be meaningful for a model based on mouse movements.
We then worked on the open source datasets on keystrokes. The Benchmark Data
Set (by Kevin Killourhy and Roy Maxion) released as an accompaniment to [8]
contains the timing data for 51 typists all typing the same word. The dataset contains
the flight and dwell time for the predefined password, and hence no preprocessing or
feature engineering was required before using the dataset. The BeiHang Keystroke
Dynamics Database (released as an accompaniment to [9]) was another dataset we did
extensive work on, before making our own dataset and model. The dataset contains
2057 test samples and 556 train samples, taken from 117 subjects, divided into two
subsets, based on the collection environment. The keystrokes for a particular session
were read as a sequence of (P_i, R_i) vectors, where P_i and R_i represent the press and
release times of the i-th key of the password. With this dataset we had the liberty to
extract as many n-gram features as we wanted.
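As a minimal sketch of how dwell and flight times follow from such press and release
timestamps: the function name is hypothetical, and flight time is taken here as the gap
between releasing one key and pressing the next, which is one common convention among
several.

```python
def dwell_and_flight(events):
    """Compute dwell and flight times from (press, release) timestamps.

    `events` is a list of (P_i, R_i) pairs in typing order.
    Dwell time of key i is R_i - P_i. Flight time between keys i and
    i + 1 is taken as P_(i+1) - R_i (assumed convention; it becomes
    negative when the next key is pressed before the previous release).
    """
    dwell = [release - press for press, release in events]
    flight = [events[i + 1][0] - events[i][1] for i in range(len(events) - 1)]
    return dwell, flight


# Hypothetical timestamps in milliseconds for a three-key password.
dwell, flight = dwell_and_flight([(0, 90), (150, 230), (300, 410)])
```

A password of k keys therefore yields k dwell times and k - 1 flight times; longer
n-grams can be built by summing consecutive values.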
Our work on these available datasets was a verification of the documented success
of feature engineering of client biometrics. Moreover, this investigation showed that
relevant work has been done either on a mouse dynamics dataset or on a keystroke one.
Our dataset, which exploits the valuable information present in both of these sources,
offers a more holistic basis for such applications.
A metric of success for such a project is its behavior across users. The authors of [1]
successfully verify the scalability of a biometric-based solution by showing that it works
for multiple users and across environments. The authors of [4] keep in mind the fact
that their model, while analyzing the user’s keystroke and mouse movement behavior,
needs to track keyboard strokes and mouse movements without interfering with the
user’s work. Hence, they show the practical implementation of such models
as a software agent that resides on the user’s desktop.
Another practical consideration would be the variation in the passwords that users
choose. The authors of [5] identify that a limited amount of work has been
accomplished on free text detection and make accommodations for dealing with free
text and free mouse movements, and for the fact that many web sessions tend to be very
short. Our work focuses on a customized model for each user and hence overcomes the
limitation of fixed password length. By formulating this problem as N binary classifiers
(one user of interest vs. all other users) instead of an N-class classifier, we
accommodate variable lengths of credentials.
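The one-vs-rest formulation can be sketched as a relabelling of the same session log once
per user. The function names and the example user names are hypothetical; the sketch only
shows how N binary targets are derived from one list of session owners.

```python
def one_vs_rest_labels(session_owners, user):
    """Label each session 1 if it belongs to `user`, else 0."""
    return [1 if owner == user else 0 for owner in session_owners]


def per_user_labels(session_owners):
    """Build one binary target vector per user (N users -> N vectors)."""
    users = sorted(set(session_owners))
    return {user: one_vs_rest_labels(session_owners, user) for user in users}


# Hypothetical log of who produced each recorded login session.
targets = per_user_labels(["alice", "bob", "alice", "carol"])
```

Each binary classifier is then trained on its own target vector, so adding a new user
adds one classifier without retraining a shared N-class model.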
Considerations of practicality are further addressed in [6], where the imbalance
between data labels is identified. The authors highlight how this is a binary
classification problem (owner vs. imposters), yet the patterns from
only one class, the owner’s, are available in advance. Since there are millions of
potential imposters, it is not practical to obtain enough patterns from all kinds of
imposters. Also, it is not practically feasible to publicize credentials in order to collect
potential imposters’ timing vectors. Hence, they propose that the only solution is to
build a model of the owner’s keystroke dynamics and use this to detect imposters using
some sort of similarity measure. They show us how our problem is that of a “partially
exposed environment” or “novelty detection”.
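To make the idea of owner-only modelling concrete, the following sketch builds a template
from the owner's timing vectors alone and scores a new vector by its mean absolute
z-score. This is a simple stand-in chosen for illustration, not the auto-associative
network of [6]; the function names and the threshold of 2.0 are assumptions.

```python
from statistics import mean, pstdev


def build_template(owner_vectors):
    """Per-feature (mean, std) computed from the owner's vectors only."""
    columns = list(zip(*owner_vectors))
    # A zero spread is replaced by 1.0 to avoid division by zero.
    return [(mean(col), pstdev(col) or 1.0) for col in columns]


def novelty_score(template, vector):
    """Mean absolute z-score of `vector` relative to the template."""
    return mean(abs(v - m) / s for (m, s), v in zip(template, vector))


def is_owner(template, vector, threshold=2.0):
    """Accept the login when the vector is close enough to the template."""
    return novelty_score(template, vector) <= threshold


# Hypothetical owner timing vectors (two features, milliseconds).
template = build_template([[100, 50], [110, 60], [90, 40]])
accepted = is_owner(template, [102, 52])
rejected = is_owner(template, [200, 150])
```

No imposter data is needed to build the template; only the threshold has to be chosen,
for example from the spread of the owner's own scores.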
Another consideration the authors of [6, 11] point out is the situation where a new
password has been registered. This would require new data to be collected for a new
model to be created. During this time the proposed identity verification cannot be used
and so an ordinary level of security can be maintained with the conventional password
security system. The length of the collection period can be dynamically determined by
monitoring the variability of typing patterns. Moreover, for each password or user, a
separate model must be constructed. Also, whenever a user changes his or her pass-
word, a new model needs to be built.
Looking at the type of detection algorithms used in the literature, we see a variety of
approaches. The authors of [2] create a classifier that verifies the similarity between the
pattern to be verified and the template of the prototypes (created from the collected
logs), using the Distance Pattern between the feature vector of the pattern and that of the
prototype. The authors of [4] employ separate Support Vector Classifiers for mouse
dynamics features and for keystrokes. The authors of [5] show the applicability of
Bayesian Networks to such a problem. The authors of [6] use an Auto-Associative
Neural Network for novelty detection. The review in [3] summarizes other similar works,
covering techniques that range from the Monte Carlo approach for data collection to
Gaussian probability density functions, direction similarity measures, parallel decision
trees, etc. for classification. The authors of [15] propose the interesting notion that
there exist keystroke classes, akin to blood types, into which people can be classified,
and hence find the implementation of a clustering model for keystrokes suitable, as done
in [10]. Seeing as the performance of these algorithms differs based on their inherent
biases and variances, we identified the need for polling, or an ensemble, of
conventional algorithms.
Next, we split the keystrokes of the entire session into those typed for the username
and those typed for the password. For each of these we extracted:
• Mean flight time, per key category
• Mean dwell time, per key category
This gave us 2 features for each input (there are 2 inputs, namely username and
password) and hence 4 features per category of keys. We defined 4 key
categories and hence obtained a total of 16 such features. We further extracted the mean
and standard deviation of dwell and flight times for each type of input, across all
categories. This gave us another 8 features. Finally, we also noted the distribution of
keystrokes across categories (as a percentage), which gave rise to 4 more features. This
brings the total number of features extracted from keystroke logs to 28. Hence, our
feature vector for each session had a length of 189, i.e. we had 189 features for each
data point.
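The count of 28 keystroke features can be checked with a short sketch. The four category
names, the input format (one `(category, dwell, flight)` tuple per keystroke) and the use
of 0.0 for categories that do not occur are assumptions made for illustration only.

```python
from statistics import mean, pstdev

# Hypothetical names for the 4 key categories.
CATEGORIES = ("letter", "digit", "special", "control")


def _mean(values):
    return mean(values) if values else 0.0


def _std(values):
    return pstdev(values) if values else 0.0


def keystroke_features(session):
    """Build the 28 keystroke features for one login session.

    `session` maps "username" and "password" to lists of
    (category, dwell, flight) tuples, one per keystroke.
    """
    features = []
    for field in ("username", "password"):
        keys = session[field]
        # 4 categories x (mean flight, mean dwell) = 8 features per input.
        for cat in CATEGORIES:
            features.append(_mean([f for c, _, f in keys if c == cat]))
            features.append(_mean([d for c, d, _ in keys if c == cat]))
        # Mean and standard deviation across all categories: 4 per input.
        dwell = [d for _, d, _ in keys]
        flight = [f for _, _, f in keys]
        features += [_mean(dwell), _std(dwell), _mean(flight), _std(flight)]
    # Share of keystrokes per category, as a percentage: 4 features.
    all_keys = session["username"] + session["password"]
    total = len(all_keys) or 1
    for cat in CATEGORIES:
        count = sum(1 for c, _, _ in all_keys if c == cat)
        features.append(100.0 * count / total)
    return features


# Hypothetical session with two keystrokes per input.
example = keystroke_features({
    "username": [("letter", 90, 60), ("letter", 80, 70)],
    "password": [("digit", 100, 50), ("special", 120, 40)],
})
```

The vector length is independent of how many keys were typed, which is what allows the
same model architecture to be reused for credentials of any length.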
4.2 Results
Our dataset consisted of biometrics from 102 user logins. We applied a random split of
this data, and found the average accuracy over 50 such splits. This was done to
understand the optimal amount of data required to create an effective model.
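The evaluation protocol of averaging accuracy over repeated random splits can be sketched
as below. The function names, the fixed random seed and the toy scoring function are
assumptions; a real run would pass a function that trains one of the three modules on the
training part and returns its accuracy on the test part.

```python
import random


def mean_split_accuracy(samples, labels, train_fraction, fit_and_score,
                        n_splits=50, seed=0):
    """Average the accuracy returned by `fit_and_score` over random splits."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    n_train = round(train_fraction * len(samples))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        scores.append(fit_and_score(
            [samples[i] for i in train], [labels[i] for i in train],
            [samples[i] for i in test], [labels[i] for i in test],
        ))
    return sum(scores) / n_splits


def fixed_threshold(x_train, y_train, x_test, y_test):
    """Toy placeholder for a real model: ignores the training data and
    predicts 1 whenever the single feature exceeds 75."""
    hits = sum(int(x > 75) == y for x, y in zip(x_test, y_test))
    return hits / len(x_test)


# Hypothetical, perfectly separable data with 102 samples.
samples = list(range(51)) + list(range(100, 151))
labels = [0] * 51 + [1] * 51
average = mean_split_accuracy(samples, labels, 0.4, fixed_threshold)
```

Repeating the call with different values of `train_fraction` gives the accuracy as a
function of the amount of training data.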
The results for different lengths of the feature vector, obtained by varying the number
of minibatches of mouse movements (N), are tabulated in Tables 3 and 4.
Seeing as our models gave reasonably good accuracy, we saved the best-performing
models, to be loaded and used in our real-time detection API. Table 5
summarizes the performance of these chosen models on further performance metrics.
5 Conclusions
5.1 Analysis
The results of our work are promising. Tables 3 and 4 show reasonably reliable
performance even on a 40:60 split. This means that model creation required around 30
user logs, which is far less than the sizes reported previously. Moreover, the
application that performs prediction using the loaded models described in Table 5
performed exceedingly well in testing and did not affect the performance of the
website.
Another promising outcome of this study is the simplicity of the model
architectures involved. The support vector machine showed the best performance on our
dataset, which led us to conclude that the data extracted after feature engineering is
separable by a polynomial kernel. Given that the set of extracted features remains the
same no matter who the user is, how fast or slowly he/she types, or how much he/she
moves the mouse around the screen, it is reasonable to assume that a similar
architecture will work for creating an SVM-based detection algorithm for any user.
Similarly, with reference to the MLP, when taking 4 minibatches of mouse movements per
session (N = 4), we get 189 features. From our results we saw that a network with 2
hidden layers of 250 neurons each works well. Across all users, the number of features
remains fixed and is independent of the length of the password. Consequently, we expect
the same architecture to produce a comparably well-performing MLP for any other user.
The same argument applies to the AdaBoost model.
These results suggest that we can automate the entire process of model creation by
making the creation of the API architecture-independent and purely data-dependent.
By defining a certain minimum required accuracy, we can automate the deployment of
these modules into a new user’s detection API by requiring data to be collected and
models to be retrained until the specified accuracy is reached.
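The collect-and-retrain loop described above can be sketched as follows. All names are
hypothetical, and the two callables stand in for the real data collection and training
steps; the cap on the number of rounds is an added assumption so that the loop always
terminates.

```python
def train_until_accurate(collect_batch, train_and_evaluate, min_accuracy,
                         max_rounds=20):
    """Collect data and retrain until `min_accuracy` is reached.

    Returns (model, accuracy, rounds_used); model is None when the
    target was not reached within `max_rounds`.
    """
    data, accuracy = [], 0.0
    for round_no in range(1, max_rounds + 1):
        data.extend(collect_batch())
        model, accuracy = train_and_evaluate(data)
        if accuracy >= min_accuracy:
            return model, accuracy, round_no
    return None, accuracy, max_rounds


# Stand-ins for illustration only: each batch adds 10 sessions and the
# pretend accuracy grows with the amount of data collected.
def fake_collect():
    return [0] * 10


def fake_train(data):
    return "model", min(1.0, len(data) / 50)


model, accuracy, rounds = train_until_accurate(fake_collect, fake_train, 0.8)
```

Until the loop returns a model, the account would remain at the default level of
security.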
Another major challenge to this application is the variety of platforms and
environments that a user can type from. Casual typing, account sharing by multiple
persons or one-handed typing can pose challenges in our application, as pointed out in
[14]. One might also need to consider the effect that different client hardware has
on the dataset. To make this model more generalizable, care should be taken to account
for hardware differences and to accommodate the aforementioned cases during the feature
extraction and training stages themselves, to ensure that the model has seen these rare
cases.
Acknowledgment. We thank our managers, Mukund and Swami, for their unwavering support.
We also extend a hearty thanks to all the interns at Dell, Hyderabad who took part in the process
of data collection. Without the data, there could have been no machine learning and so your
contribution does not go unnoticed. We dedicate this project to the Python community for all the
extraordinary work they do in creating new useful libraries for developers, while maintaining
requisite documentation and user support on existing libraries. The work of this study, like the
work of countless others, would not have been possible without their unwavering dedication to
the Pythonic way.
References
1. Zheng, N., Paloski, A., Wang, H.: An efficient user verification system via mouse
movements. In: Proceedings of the 18th ACM Conference on Computer and
Communications Security, pp. 139–150 (2011)
2. Gurav, S., Gadekar, R., Mhangore, S.: Combining keystroke and mouse dynamics for user
authentication. Int. J. Emerg. Trends Technol. Comput. Sci. (IJETTCS) 6(2), 055–058
(2017). ISSN 22786856
3. Ponkshe, R.V., Chole, V.: Keystroke and mouse dynamics: a review on behavioral
biometrics. Int. J. Comput. Sci. Mob. Comput. 4, 341–345 (2015)
4. Wu, J.-H., Lin, C.-T., Lee, Y.-J., Chong, S.-K.: Keystroke and mouse movement profiling
for data loss prevention
5. Traore, I., Woungang, I., Obaidat, M.S., Nakkabi, Y., Lai, I.: Combining mouse and
keystroke dynamics biometrics for risk based authentication in web environments
6. Cho, S., Han, C., Han, D., Kim, H.: Web based keystroke dynamics identity verification
using neural network. J. Organ. Comput. Electron. Commer. 10(4), 295–307 (2000)
7. Fülöp, Á., Kovács, L., Kurics, T., Windhager-Pokol, E.: Balabit mouse dynamics challenge
data set (2016)
8. Killourhy, K.S., Maxion, R.A.: Comparing anomaly detectors for keystroke dynamics. In:
Proceedings of the 39th Annual International Conference on Dependable Systems and
Networks (DSN-2009), pp. 125–134, Estoril, Lisbon, Portugal, 29 June–2 July, 2009
9. Li, Y., Cao, B.Z., Zhao, S., Gao, Y., Liu, J.: Study on the BeiHang keystroke dynamics
database. In: International Joint Conference on Biometrics (IJCB), pp. 1–5 (2011)
10. Monrose, F., Rubin, A.: Authentication via keystroke dynamics. In: ACM Conference on
Computer and Communications Security, pp. 48–56 (1997)
11. Hashia, S., Pollett, C., Stamp, M.: On using mouse movements as a biometric. In:
International Conference on User Science and Engineering (i-USEr), pp. 206–211,
December 2011
12. Jorgensen, Z., Yu, T.: On mouse dynamics as a behavioral biometric for authentication.
IEEE Syst. J. 8(2), 262–284 (2013)
13. Gamboa, H., Fred, A.: A behavioral biometric system based on human-computer interaction.
In: Proceedings of the SPIE, vol. 5404, Biometric Technology for Human Identification,
381, 25 August 2004
14. Teh, P.S., Teoh, A.B.J., Ong, T.S., Tee, C.: Keystroke dynamics in password authentication
enhancement. Expert Syst. Appl. 37, 8618–8627 (2010)
15. Lau, S.-h., Maxion, R.: Clusters and Markers for Keystroke Typing Rhythms. Learning from
Authoritative Security Experiment Result, LASER 2014 (2014)