
Binary Watermarks: A Practical Method to Address Face Recognition Replay Attacks on Consumer Mobile Devices

Daniel F. Smith, Arnold Wiliem, and Brian C. Lovell
School of ITEE, The University of Queensland
Qld. 4072, Australia

Abstract

Mobile devices (laptops, tablets, and smart phones) are ideal for the wide deployment of biometric authentication, such as face recognition. However, their uncontrolled use and distributed management increases the risk of remote compromise of the device by intruders or malicious programs. Such compromises may result in the device being used to capture the user's face image and replay it to gain unauthorized access to their online accounts, possibly from a different device. Replay attacks can be highly automated and are cheap to launch worldwide, as opposed to spoofing attacks, which are relatively expensive as they must be tailored to each individual victim. In this paper, we propose a technique to address replay attacks for a face recognition system by embedding a binary watermark into the captured video. Our monochrome watermark provides high contrast between the signal states, resulting in a robust signal that is practical in a wide variety of environmental conditions. It is also robust to different cameras and tolerates relative movements well. In this paper, the proposed technique is validated on different subjects using several cameras in a variety of lighting conditions. In addition, we explore the limitations of current devices and environments that can negatively impact performance, and propose solutions to reduce the impact of these limitations.

1. Introduction

Imagine the near future. By now, most banks will have implemented three-factor authentication for access to higher risk transactions. The usability of online banking has improved significantly — no more difficult-to-remember passwords [4, 5]. Accessing your account now is as simple as looking at your smart phone or desktop computer. The face verification system is smart enough to detect spoofing attempts, such as a photograph or 3D mask [7, 10]. Banks have greatly reduced disputed transactions as the system has a strong audit trail of who performed the transaction, since the video of the authenticated user performing the transaction is available.

You access online banking to pay a bill, only to find that there is no money in your account. After contacting the bank, you find that all of your money was transferred to a foreign bank — by you! Indeed, your bank shows you the video of you actually transferring the funds. Reports in the news are that thousands of customers have also been affected. How could this happen?

Replay attacks are a significant threat to face recognition systems that use cameras on uncontrolled devices (such as desktop computers, laptops, tablets, or smart phones) to capture a person's biometric data. Replay attacks occur when the images of the user's face are captured in digital form and then injected back into the system at a later time (and possibly from a different device). The attack can be created cheaply and launched worldwide in the form of malicious software such as Trojan horses or viruses [21, 22]. A major attack of this nature could completely undermine public trust in biometric systems.

Replay attacks are very different from spoofing attacks, which occur when the attacker creates a biometric facsimile, such as a photograph or 3D mask of the target victim, for use in front of the camera. Current smart phones now possess fingerprint readers, which were quickly compromised through the use of fake fingerprints. Spoofing attacks are expensive to create as each attack must be crafted for the target victim. Yet the public is more aware of spoofing attacks, as they have been portrayed in popular movies (e.g. the James Bond film "Never Say Never Again", 1983).

In our opening scenario, since the video was of a real person, the anti-spoofing mechanisms are unable to detect that it is fraudulent. By automating the attack, many victims could be compromised in a short space of time. The root cause of the problem is that the bank cannot tell when the video was captured, or from which device. The communication path from the camera to the device is generally not protected, allowing the original video stream to be intercepted prior to any additional security (such as a timestamp) being added. If security is to be added to the video, it must be done prior to an intruder gaining access to the video data.

Recently, Smith et al. [19] proposed a method of using a coded color sequence on the device screen to reflect from the user's face. These reflections were analyzed to deter-
mine the color patterns in the captured video of the user's face. The method was very sensitive to small movements, and required a dark room to provide the necessary contrast.

In this paper, we propose significant improvements in both the utility and robustness of this work. Instead of encoding a watermark signal as a series of colors, our proposed system uses high contrast monochromatic illumination. The proposed challenge signal is coded as a frame-wise binary stream. The use of monochromatic illumination provides higher contrast between the two illumination states in a wider variety of environments and mitigates color constancy problems. In addition, instead of using a Support Vector Machine and specific training data, we propose a classification algorithm that adaptively learns its model from the current environment. The improved system is far more tolerant to small hand movements, which naturally occur when the smart device is hand held.

Contributions: The contributions of this paper are:

• To propose a non-cooperative, watermark-based, anti-replay attack technique for face recognition on uncontrolled consumer devices that is robust in a wide and practical set of environmental conditions;

• To test the performance of our proposed system under different environmental conditions and different cameras, under natural usage conditions;

• To provide public datasets of our experiment to encourage independent validation and further research.

We continue this paper as follows: Section 2 examines previous work on determining the liveness of biometric subjects. Section 3 outlines our proposal to address replay attacks in face recognition systems. Section 4 details the experimental evaluation of the proposal, with Section 5 showing the results of that experiment. Section 6 covers conclusions and future work.

2. Related works

Ambalakat [2], Bolle et al. [3], and Khan et al. [13] proposed challenge-response systems that require an intelligent biometric sensor. Unfortunately, such intelligent sensors are not currently widely deployed on consumer smart devices.

Frischholz and Werner [8] used head pose estimation to determine the liveness of the subject, directing the user to turn their head, but details were omitted.

Maltoni et al. [14] stated that the replay of fingerprint data "can be prevented by using standard cryptographic techniques". Galbally et al. [9] indicated that replay attacks "exploit possible weak points in the communication channels of the system". Shelton et al. [18] limit their discussion of replay attacks to data sent across a network. For non-specialized consumer devices, the internal communication channels are generally not well protected, and using cryptography to protect those channels, multiple sensors, or multi-factor authentication is simply not feasible.

Jee et al. [12] used eye movement and blinking to determine liveness. Pan et al. [16] also proposed eye blinking for liveness detection. De Marsico et al. [6] used random head movements to determine the 3-D nature of the face. However, these undirected movements could be previously captured and replayed at a later time.

Akhtar et al. [1] used a fusion of biometric modes and classification algorithms to determine liveness. Whilst these techniques demonstrate liveness, the timing of when the video was captured is not secured.

Smith et al. [19] proposed to watermark the video signal captured using a webcam on an uncontrolled device by displaying a color sequence on the screen that is reflected from the user's face. Their proposal was limited to very dark environments, and was susceptible to the small hand movements that naturally occur when using a hand held device.

3. Proposed approach

In this paper, we propose using high contrast monochromatic illumination to produce the reflected watermark. This high contrast signal, coupled with an adaptive analysis process, provides a robust and practical solution to the replay attack problem in a wide variety of environments. The watermark is a binary nonce challenge in a challenge-response system, and is inserted into the video prior to the intruder gaining access to the video signal.

The watermark challenge signal is displayed on the entire screen of the device as a random sequence of illumination levels (Light or illuminated means screen illumination on; Dark or non-illuminated means screen illumination off). The challenge signal can include consecutive displays of the same illumination level. This displayed illumination reflects from the person or object positioned in front of the camera. Each time the next illumination level in the sequence is displayed, an image of the object is captured, and later analyzed to determine the level of reflection from the object.

These reflections form the response in the challenge-response system. If the reflection sequence is (mostly) the same as the displayed illumination sequence, then the video is deemed to have been captured by this device and at the time the illumination was displayed, thus defeating any attempt to replay this video later, as the next challenge sequence will be different.

The next subsections discuss the binary watermark insertion process, highlighting limitations imposed by the current technology, followed by a description of the extraction process that classifies the reflected illumination states.

3.1. Binary watermark insertion process

In our binary watermark insertion process, we use a static Region Of Interest (ROI) window to assist with analysis. This window is 30% of the width by 50% of the height of the video frame, centered on the frame, which is the approximate face region when normally using a webcam. The camera is started for three seconds (3s) to allow the automatic camera settings to settle. During this time, the user is
requested to align with the ROI and to remain relatively stationary throughout the entire capture process. In practice, this ROI could be replaced with a robust face detector.

Six frames of Dark are captured and recorded, followed by one frame of Light. These are used as analysis start and calibration signals.

The challenge signal is then displayed as a randomized sequence of 31 illuminations (Dark or Light) while the images are captured and recorded. The capture protocol concludes with six frames of Dark. The proposed challenge signal therefore forms a 2^31 entropy challenge. Figure 1 shows a sequence of Light and Dark frames being captured, with the different illumination states representing the inserted binary watermark.

Figure 1. Video sequence of Light and Dark frames.

During testing, it was observed that when changing illumination state, captured frames would continue to display reflections from the previous illumination state, either fully or partially, for several frames. Therefore, the watermark insertion process implements a delay each time the illumination level changes, as defined in the next section.

3.2. Time delay between illumination changes

From testing, there is a delay between when the illumination state is requested to be updated and when that update is observed in the reflection by the camera. The root cause was identified as a delay in updating the screen illumination, and not in when the cameras capture images. Empirical analysis (results not shown) determined that a fixed delay of 200ms was sufficient to complete the screen update in most cases. If the illumination state is unchanged, the next frame can be used immediately. The current capture process therefore requires a duration from 4.5s (min.) to 9.9s (max.) to fully complete (at 30 fps).

Reducing the screen update time (e.g. using an LED for illumination instead of the screen) may reduce this 200ms time delay. Recent smart phones now possess bright white LEDs, as well as high speed cameras capable of capturing 240 fps. The time to complete the capture process could then fall to 3.2s (including the preamble of 3s) for a 2^31 entropy challenge. Such a system could allow many more frames to be captured in a shorter time, significantly increasing the entropy and allowing for an increased tolerance to classification errors. Therefore, the current capture time could be significantly reduced in the near future, further reducing the problem of movement occurring during the capture process.

Another possible way to address this was by measuring the intensity of each frame as it was captured, and discarding frames until the illumination state change was observed. However, this technique could not determine whether a frame had only partially, or fully, changed illumination.

3.3. Binary watermark extraction

To extract the binary watermark from the video data stream, we propose an adaptive analysis technique that incrementally updates its model based upon that data stream. Unlike Smith et al. [19], which used multiple color states, we use only Light and Dark illumination states.

To that end, we propose an intensity feature to represent the intensity of reflected illumination above the ambient background in each ROI window (see Section 3.1). We then define an illumination state classifier that determines the frame's illumination state and adaptively updates its model. The sequence of classified illumination states represents the extracted binary watermark signal.

Intensity feature extraction: We extract the intensity feature from the ROI window using θ : R^(p×q×3) → [0, 1]. More precisely, we define the ROI window in the nth frame, Wn ∈ R^(p×q×3), as a static area, centered on the frame, where p and q are 30% of the width and 50% of the height of the frame. This is the most likely area within the captured frame to find the object of interest. This also serves to remove extraneous background information from the frame, which may produce spurious results due to subtle movements. We first minimize the noise (ambient light and static scenery) by performing a clipped subtraction of a non-illuminated window, W0, from the subsequent windows, Wn, in the RGB color space, as:

∆Wn = max(Wn − W0, 0), (1)

where ∆Wn is the result of the clipped subtraction.

Next, we convert ∆Wn from the RGB color space to the HSV color space to isolate the intensity using the V channel (similar to Smith et al. [19]). In our method, we are not concerned with which color (or Hue) is used for illumination.

Finally, we calculate the function, θ, as the normalized, weighted level of intensity, using Equation 2:

θ(∆Wn) = ( Σ_{i=1}^{p} Σ_{j=1}^{q} M(∆Wn(i, j)[V]) ) / (α × p × q), (2)

where

M(∆Wn(i, j)[V]) = 0 if 0 ≤ ∆Wn(i, j)[V] < β; β if β ≤ ∆Wn(i, j)[V] < α; α if α ≤ ∆Wn(i, j)[V]. (3)

∆Wn(i, j)[V] represents the value of the V channel of the pixel in the ith row and jth column of ∆Wn. α and β are parameters defined from experimental analysis. The weighting function in Equation 3 quantizes the V channel values to limit the effect of noise pixels from relative movements during the capture process. Other techniques can be explored here that may produce better results.

Illumination state classifier: The illumination state classifier determines if the illumination state for the current frame
will implicitly remain the same as for the previous frame, or be explicitly set to a particular state. In order to detect whether the illumination state should be updated, we first define the change in intensity between consecutive frames, ∆θn, as:

∆θn = θ(∆Wn) − θ(∆Wn−1). (4)

Small movements of the subject during the capture process result in small changes in ∆θn. Therefore, we must define the required threshold, τ1, such that if |∆θn| > τ1, the illumination state will be determined appropriately.

To calculate τ1, we maintain Light and Dark models, λL and λD, represented as the average intensity of the previously classified Light and Dark windows, respectively, updated for each frame. We calculate τ1 as the square of the Euclidean distance between λL and λD, as:

τ1 = max(||λL − λD||², γ). (5)

By squaring the Euclidean distance, we place more emphasis on larger differences between λL and λD, and less emphasis on smaller differences. We enforce that τ1 is not less than γ to address the situation where the difference between λL and λD is small, which would otherwise result in the illumination state being updated regardless of the magnitude of ∆θn.

In addition, as discussed in Section 3.2, a time delay of 200ms is used when capturing frames to remove almost all of the partially illuminated frames. However, very occasionally, one partially illuminated frame is captured. As a result, both ∆θn and ∆θn+1 might not exceed τ1, whereas, if ∆θn represented a completed illumination change, it would have exceeded τ1.

Therefore, a further condition is defined to determine the illumination state, based on the intensity of the current frame (θ(Wn)). This condition first requires that |∆θn| > γ to ensure that some illumination change has occurred. If this is true, then the illumination state will be determined by whether θ(Wn) is closer to λL or λD.

To this end, a second threshold, τ2, is defined as the midpoint between λL and λD (i.e. τ2 = (λL + λD)/2). The illumination state of Wn will be defined as Dark if θ(Wn) < τ2, Light if θ(Wn) > τ2, and unchanged if θ(Wn) = τ2 or |∆θn| ≤ γ.

The illumination state classifier therefore uses the results of the following four test conditions as input:

L1: sign(∆θn − τ1) = +; (i.e. ∆θn > τ1);
L2: sign(∆θn − γ) = sign(θ(Wn) − τ2) = +;
D1: sign(∆θn + τ1) = −; (i.e. ∆θn < −τ1);
D2: sign(∆θn + γ) = sign(θ(Wn) − τ2) = −.

If either condition L1 or L2 is met, then the illumination state for Wn is Light. If either condition D1 or D2 is met, then the illumination state for Wn is Dark. If no conditions are met, the illumination state for Wn remains unchanged from Wn−1. Note that Li and Di are mutually exclusive.

Due to the increased entropy in this system, it is possible to tolerate a small number of incorrect illumination state classifications without significantly reducing the security of the system, resulting in a more robust and practical solution.

4. Experiment

During our test experiment, we set the weighting values in Equations 2 and 3 (α and β) to 30 and 3, respectively. We set γ from Equation 5 and Conditions L2 and D2 to 1% of the maximum possible value of the function θ.

The experimental equipment consists of: a Microsoft Surface Pro 2 tablet; Windows 8.1 Pro; a VMware Player 6.0.4 virtual machine; Ubuntu 13.10; OpenCV 2.4.9; the internal Microsoft LifeCam webcam; an external Logitech QuickCam Pro 5000 webcam; an external Logitech QuickCam Pro 9000 webcam; a screen resolution of 1920 × 940 pixels; and a video capture resolution of 640 × 480 pixels at either 15 or 30 fps (selected automatically by the camera driver, depending on the lighting conditions).

Our experiment uses six environments: Dark Room; Office Light (using fluorescent lights); Natural Light indoors; Cloud Cover outdoors; Full Shade but under an open sky; and Full Sunlight. In Smith et al. [19], the environment was restricted to only a darkened room, which is not a typical use case for online authentication.

Although this technique is aimed at face recognition systems, any objects in front of the camera are sufficient to generate suitable reflections for classification. All objects are positioned approximately 35cm in front of the screen, which is a normal usage distance. The experiment is performed on three primary sets of objects.

The first set of objects consists of Soft Toys. A total of 20 Soft Toys are used, which provides a wide variety of shapes, sizes, textures, and colors. The Soft Toys are hand held in front of the tablet to simulate the hand holding of the mobile device. Each toy is recorded five times in each environment. This results in 100 videos for each of the six environments.

The second set of objects consists of faces printed on paper, with the face outline cut out and the background removed. This is done to contrast our results against those obtained by Smith et al. [19]. The Paper Faces are hand held in front of the tablet to simulate the hand holding of the mobile device. Five different faces are used, and each is recorded five times for each environment. This results in 25 videos for each of the six environments.

The final set of objects consists of real people. Five participants hold the tablet computer in their hands, which introduces additional movement relative to the background in the video. Each participant is recorded five times in the six environments, resulting in 25 videos per environment.

The resulting datasets (FRAUD2) are available from: .

We then performed further analysis on the conditions of each environment. Our proposed system can be viewed as a typical communication system in a noisy environment [17]. The reflected light is a communication signal, and the ambient light is noise. Using this analogy, we can quantify the problem by calculating the Signal-to-Noise Ratio (SNR).
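Equation 7 below makes this analogy concrete. As a quick illustration (our sketch, not the authors' code), the helper below plugs in two of the measured lux values reported in Figures 2 and 3: the Surface Pro 2 screen reflection and the Full Shade ambient level.

```python
import math

def snr_db(p_signal: float, p_noise: float) -> float:
    """Equation 7: Signal-to-Noise Ratio in decibels."""
    return 10.0 * math.log10(p_signal / p_noise)

# Surface Pro 2 screen reflection (51.6 lux) against Full Shade ambient
# light (12500 lux): well below the ~ -15 dB minimum identified later.
print(round(snr_db(51.6, 12500.0), 1))  # → -23.8
```

This is consistent with the poor Full Shade results reported in Section 5.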
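With the parameter values above (α = 30, β = 3), the intensity feature of Equations 1–3 reduces to a few array operations. The NumPy sketch below is our illustration, not the paper's code: the function names are ours, and it exploits the fact that the V channel of an HSV pixel is simply the maximum of its R, G, and B components.

```python
import numpy as np

def roi(frame):
    """Static ROI window: central 30% of width x 50% of height (Sec. 3.1)."""
    h, w = frame.shape[:2]
    return frame[int(0.25 * h):int(0.75 * h), int(0.35 * w):int(0.65 * w)]

def intensity_feature(win, win_dark, alpha=30, beta=3):
    """theta(dW_n) in [0, 1] for an 8-bit RGB ROI window (Eqs. 1-3)."""
    # Eq. 1: clipped subtraction removes ambient light and static scenery.
    dw = np.clip(win.astype(np.int32) - win_dark.astype(np.int32), 0, None)
    # The HSV V channel of an RGB pixel is the maximum of R, G, B.
    v = dw.max(axis=2)
    # Eq. 3: quantize V to {0, beta, alpha} to suppress noise pixels.
    m = np.where(v >= alpha, alpha, np.where(v >= beta, beta, 0))
    # Eq. 2: normalized, weighted intensity.
    p, q = v.shape
    return float(m.sum()) / (alpha * p * q)
```

A window saturated well above α yields θ = 1, a non-illuminated window yields θ = 0, and weak reflections between β and α each contribute only β/α.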
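The illumination state classifier of Section 3.3 and the Hamming-distance acceptance test can likewise be sketched in a few lines. This is a simplified rendering under stated assumptions: the models λL and λD are updated with a running average as a stand-in for the paper's mean over previously classified windows, and the first two θ values are assumed to come from the Dark and Light calibration frames of Section 3.1. Function names are ours.

```python
def classify_states(thetas, gamma=0.01):
    """Classify each frame's intensity feature as Light ('L') or Dark ('D')."""
    lam_d, lam_l = thetas[0], thetas[1]  # Dark / Light model intensities
    states = ["D", "L"]
    for n in range(2, len(thetas)):
        d_theta = thetas[n] - thetas[n - 1]      # Eq. 4: intensity change
        tau1 = max((lam_l - lam_d) ** 2, gamma)  # Eq. 5: squared distance
        tau2 = (lam_l + lam_d) / 2               # midpoint threshold
        if d_theta > tau1 or (d_theta > gamma and thetas[n] > tau2):
            state = "L"                          # conditions L1, L2
        elif d_theta < -tau1 or (d_theta < -gamma and thetas[n] < tau2):
            state = "D"                          # conditions D1, D2
        else:
            state = states[-1]                   # no condition met: unchanged
        states.append(state)
        # Adaptive model update (simplified running average).
        if state == "L":
            lam_l = (lam_l + thetas[n]) / 2
        else:
            lam_d = (lam_d + thetas[n]) / 2
    return "".join(states)

def accepted(challenge, response, max_errors=2):
    """Accept if the Hamming distance is within tolerance (Sec. 5)."""
    return sum(c != r for c, r in zip(challenge, response)) <= max_errors
```

For example, `classify_states([0.0, 1.0, 0.9, 0.1, 0.85, 0.9])` yields `"DLLDLL"`: the third frame stays Light because neither threshold is crossed, while the larger swings flip the state.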
Lux is the measure of illumination over a defined area. ISO2720 [11] defines luminance, L, when calibrating reflected light meters as:

L = K1 × A² ÷ (t × S), (6)

where A is the f-number, t is the exposure time (in seconds), and S is the arithmetic film speed. K1 is a calibration constant, defined by Padfield [15] as K1 = π × ρ × σ, where π converts luminance in cd/m² to lux; ρ is 12.4 (ISO2720 defines the range as 10.6 to 13.4); and σ is a correction factor of 1.3 to account for lens absorption and diffuse reflectance.

We used a dSLR camera to measure the reflected ambient light for each environment, and calculated the lux using Equation 6. These calculated lux values are a guide only as, for example, office lighting varies between locations, cloud cover is highly variable, and sun illumination varies with geographic location, time of day, and season.

We also measured the lux of the reflection illuminated by the following devices: Apple iPhone 6 and iPad 3 screens; Apple iPhone 6 LED in light and flash mode; and Microsoft Surface Pro 2 screen. Using the following equation, we calculated the SNR for the reflected illumination signal in each environment (results shown in Figure 2):

SNR_dB = 10 × log10(P_signal ÷ P_noise). (7)

Figure 2. Signal-to-Noise Ratio (SNR, in dB) for different devices in each environment, from Dark (8 lux) to Sunlight (60000 lux). Measured reflections: iPhone 6 LED, flash (363.0 lux); iPad 3 Screen (62.3 lux); iPhone 6 LED (60.8 lux); Surface Pro 2 (51.6 lux); iPhone 6 Screen (20.1 lux).

5. Results and analysis

Figure 3 shows the results for the experiment. In all cases, we accept up to two errors in frame illumination classification (i.e. a Hamming Distance of two is acceptable) out of the total of 31 illumination classification possibilities.

Figure 3. Correct Classification Rate (CCR, in %) for all objects (Soft Toys, Paper Faces, Live Faces) in each environment: Dark (8 lux); Office (125 lux); Natural (800 lux); Cloud (9000 lux); Shade (12500 lux); Sunlight (60000 lux).

As can be seen, for a typical indoor environment, our proposed technique performs exceptionally well, regardless of the object used in front of the camera. All cameras performed equally well (detailed analysis not shown here). Despite hand holding and simulated hand holding of the tablet computer during the video capture, which produces normal relative movements, our results demonstrate that our proposed technique is robust to these movements.

For outdoor environments, the results are much less favorable. The Cloud Cover environment produced a significantly lower correct classification rate than all indoor environments. In particular, the performance for Live Faces was no better than in any other outdoor environment (i.e. 0%).

The Full Shade environment produced even lower results. The measured ambient lux for this environment was 39% more than the Cloud Cover environment, resulting in a significant decrease in performance. From Figure 2, this suggests that the minimum SNR to perform adequately is approximately -15dB.

In Full Sunlight, it was not possible to determine any significant difference between when the subject was illuminated by the screen and when it was not. Figure 4 shows two examples of the signal strength of the reflected light. The signal strength was calculated as Σ_{i=1}^{p} Σ_{j=1}^{q} Σ_{Ci∈{R,G,B}} Wn(i, j)[Ci], where Ci is the Red, Green, or Blue color channel for pixel Wn(i, j). The total score for each frame was then normalized to the range of [0, 100%].

Figure 4. Input signal and unprocessed reflected signal strength (in %) for sample videos (LifeCam, Subject 4) from the Natural Light and Full Sunlight environments, taken from the FRAUD2 dataset.

As can be seen in Figure 4, there is a strong signal in the Natural Light environment, but the signal is more challenging to identify in the Full Sunlight environment. Based upon the results shown in Figure 3, we conclude that the proposed technique using the screen to illuminate the subject is limited to indoor environments that are not excessively lit.

Possible causes for the failure to discriminate reflections when outdoors are that the camera may adjust its auto settings to a point where it cannot observe the reflections, or
that the reflection from the subject is simply not strong enough to be distinguished from the ambient light (i.e. the SNR is too low; see Figure 2).

To investigate the former, we brought an object (in this case, a hand) closer to the screen and camera. Since light follows the inverse-square law [20], bringing the object closer to the screen should increase the signal (reflection). Holding a hand at 5cm distance in the Full Shade environment resulted in a 93.3% CCR (allowing zero or one errors).

To investigate the latter, we used a small mirror at 35cm distance in the Full Sunlight environment. This resulted in a 100% CCR (with zero errors).

These investigations confirmed that camera settings do not contribute to the classification problem, and that increasing the SNR had a positive effect.

The Apple iPhone 6 already has a white LED that provides seven times the illumination of the Microsoft Surface Pro 2 screen (see Figure 2), and a camera that can record at 240 fps. These facilities may create a usable system in most environments (Full Sunlight may remain challenging, but most people will not use their phone in Full Sunlight as it is too difficult to read the screen).

6. Conclusions and future work

The results of our experiment show that our proposed technique of inserting a binary watermark into captured video using reflected Light and Dark illumination is highly effective and practical in most indoor environments. Although it is less effective in outdoor environments, we identified SNR constraints that could be removed to improve this performance in the future. The provided FRAUD2 dataset may be used for further research in defeating replay attacks.

Acknowledgment: The experiment data involving humans was collected in accordance with the School of ITEE Ethics Approval Number: EC201303SMI.A1.

References

[1] Z. Akhtar, C. Micheloni, C. Piciarelli, and G. L. Foresti. Mobio livdet: Mobile biometric liveness detection. In 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pages 187–192. IEEE, 26–29 Aug 2014.

[2] P. Ambalakat. Security of biometric authentication systems. In 21st Annual Computer Science Seminar. Rensselaer Polytechnic Institute, 2005.

[3] R. M. Bolle, J. H. Connell, and N. K. Ratha. Biometric perils and patches. Pattern Recognition, 35(12):2727–2738, Dec 2002.

[4] J. Bonneau, C. Herley, P. C. van Oorschot, and F. Stajano. The quest to replace passwords: A framework for comparative evaluation of web authentication schemes. In IEEE Symposium on Security and Privacy, pages 553–567. IEEE, 20–23 May 2012.

[5] B. Coskun and C. Herley. Can "something you know" be saved? In T.-C. Wu, C.-L. Lei, V. Rijmen, and D.-T. Lee, editors, Information Security Conference (ISC08), volume 5222 of Lecture Notes in Computer Science, pages 421–440. Springer Berlin / Heidelberg, 16–18 Sep 2008.

[6] M. De Marsico, M. Nappi, D. Riccio, and J.-L. Dugelay. Moving face spoofing detection via 3d projective invariants. In 5th IAPR International Conference on Biometrics (ICB), pages 73–78. IEEE, 29 Mar – 1 Apr 2012.

[7] N. Erdogmus and S. Marcel. Spoofing face recognition with 3d masks. IEEE Transactions on Information Forensics and Security, 9(7):1084–1097, 2014.

[8] R. W. Frischholz and A. Werner. Avoiding replay-attacks in a face recognition system using head-pose estimation. In International Workshop on Analysis and Modeling of Faces and Gestures (AMFG), pages 234–235. IEEE, 17 Oct 2003.

[9] J. Galbally, J. Fierrez, and J. Ortega-Garcia. Vulnerabilities in biometric systems: Attacks and recent advances in liveness detection. In Spanish Workshop on Biometrics (SWB), volume 1, 5 Jun 2007.

[10] J. Galbally and S. Marcel. Face anti-spoofing based on general image quality assessment. In 22nd International Conference on Pattern Recognition (ICPR). IEEE, 24–28 Aug 2014.

[11] ISO 2720-1974. Photography - general purpose photographic exposure meters (photoelectric type) - guide to product specification, 15 Aug 1974.

[12] H.-K. Jee, S.-U. Jung, and J.-H. Yoo. Liveness detection for embedded face recognition system. International Journal of Biological and Medical Sciences, 1(4):235–238, 2006.

[13] M. K. Khan, J. Zhang, and K. Alghathbar. Challenge-response-based biometric image scrambling for secure personal identification. Future Generation Computer Systems, 27(4):411–418, 2011.

[14] D. Maltoni, D. Maio, A. K. Jain, and S. Prabhakar. Securing Fingerprint Systems, book section 9, pages 371–416. Springer-Verlag, London, UK, 2003.

[15] T. Padfield. Using a camera as a lux meter, 2003. Retrieved 06-Nov-2014 from http:

[16] G. Pan, L. Sun, Z. Wu, and S. Lao. Eyeblink-based anti-spoofing in face recognition from a generic webcamera. In 11th International Conference on Computer Vision (ICCV), pages 1–8. IEEE, 2007.

[17] C. E. Shannon. Communication in the presence of noise. Proceedings of the IRE, 37(1):10–21, 1949.

[18] J. Shelton, G. Dozier, J. Adams, and A. Alford. Permutation-based biometric authentication protocols for mitigating replay attacks. In IEEE Congress on Evolutionary Computation (CEC), pages 1–5. IEEE, 10–15 Jun 2012.

[19] D. F. Smith, A. Wiliem, and B. C. Lovell. Face recognition on consumer devices: Reflections on replay attacks. IEEE Transactions on Information Forensics and Security, 2015. In press.

[20] H. Stockman. Communication by means of reflected power. Proceedings of the IRE, 36(10):1196–1204, 1948.

[21] C. Xiao. Wirelurker: A new era in iOS and OS X malware. Report PAN WP U42 WL 0110514, Palo Alto Networks, 5 Nov 2014. Retrieved 13-Nov-2014.

[22] Y. Zhou and X. Jiang. Dissecting android malware: Characterization and evolution. In IEEE Symposium on Security and Privacy (SP), pages 95–109. IEEE, 2012.