Using the ENF criterion for determining the time of recording of short digital audio recordings


Maarten Huijbregtse, Zeno Geradts
Netherlands Forensic Institute, Department Digital Evidence and Biometrics, Laan van Ypenburg 6, 2497 GB
Den Haag, Netherlands
Z.geradts@nfi.minjus.nl
Abstract. The Electric Network Frequency (ENF) Criterion is a recently developed forensic technique
for determining the time of recording of digital audio recordings, by matching the ENF pattern from a
questioned recording with an ENF pattern database. In this paper we discuss its inherent limitations in
the case of short (i.e., less than 10 minutes in duration) digital audio recordings. We also present a
matching procedure based on the correlation coefficient, as a more robust alternative to squared error
matching.
Keywords: ENF, authentication, integrity
1 Introduction
Electric networks operate at their own specific frequency: the Electric Network Frequency (ENF)¹.
However, due to imbalances between production and consumption of electrical energy, the ENF is known to
fluctuate slightly over time rather than staying at its exact set point (figure 1). The fluctuation pattern
is the same throughout the entire network [2] [3].
Digital recording equipment, both mains and battery powered, can pick up the ENF², which ends up as
an extra frequency component in the recorded audio file [2] [3] [4]. By band pass filtering the audio signal,
the ENF can be isolated and its pattern can be retrieved. Under the assumption that the ENF fluctuations are
random, this effectively puts a time-stamp on the audio recording: the ENF pattern is unique for the time at
which the recording was made.
The ENF criterion
One of the challenges in authenticating digital audio evidence is to gain insight into its time of
recording [5]. A technique known as the ENF criterion [2] uses the aforementioned ENF fluctuation to
achieve this³. By comparing the recorded ENF pattern to a database ENF pattern from the same electric
network, it is possible to:
1) verify (or falsify) a questioned time of recording, or
2) determine an unknown time of recording.
A visual comparison of the recorded and database ENF patterns is often adequate for the first case, while an
(automated) search routine is necessary for the latter, to locate the best match between recorded and
database pattern⁴.
¹ The main part of continental Europe is served by one large electric network, controlled by the UCTE [1]. Its ENF is
set at 50 Hz.
² Claims are that recording equipment's microphones are sensitive to the power socket signal (when mains powered)
and the electromagnetic fields emanating from nearby power lines (when battery powered). A thorough investigation of
recording equipment for which these claims hold is, however, lacking.
³ See [2] for other applications of the ENF criterion.
⁴ In this paper, we focus on using the ENF criterion for determining an unknown time of recording.
Paper outline
Up until now, the ENF criterion has mostly been used with long recordings (i.e., approximately one hour
in duration) [2] [3] [4]. However, digital audio evidence (often accompanied by video footage) of short
duration (i.e., ten minutes or less) is increasingly prominent with the advent of audio and video recording
capabilities in consumer products (e.g., cell phones and digital cameras).
In the first half of this paper (sections 2 and 3), we describe our means of building an ENF pattern
database and extracting the ENF pattern from a digital audio recording.
In the second half (sections 4 and 5), we discuss the limitations of using the ENF criterion with short
recordings, and show examples of erroneous determination of the time of recording when using a
minimum squared error-based matching procedure. We present a maximum correlation coefficient-based
matching procedure as a more robust alternative.
2 ENF pattern database
Since the ENF is the frequency at which voltage levels in an electric network oscillate, it is possible to
obtain the ENF pattern by analyzing the voltage level signal, e.g., from a power socket. In our setup, we fed
this signal in attenuated form to a PC sound card that was set to a sampling frequency of 8000 Hz. The
sampled signal $x[n]$, with the index $n$ starting at 1, can be modeled as:

$$x[n] = k \cdot V\big((n-1)\,T_s + t_1\big) \qquad (1)$$

where $V(t)$ denotes the voltage level at time $t$, $k$ is a factor representing the attenuation, $T_s$ the
sampling period ($= 1.25 \cdot 10^{-4}$ s) and $t_1$ the time of the first sample.

Fig. 1: ENF fluctuation over time.

We used the method of zero crossings, mentioned by Grigoras [2], for analysis of $x[n]$. The idea is to treat
the signal as sinusoidal, although this is not strictly true since its frequency (the ENF) varies slightly over
time. For a sinusoidal signal, the time $\tau$ between two consecutive zero crossings equals half the oscillation
period, so that its inverse equals twice the oscillation frequency $f$:

$$f = \frac{1}{2\tau} \qquad (2)$$

We determined the times of zero crossings by linear interpolation between samples $x[k]$ and $x[k+1]$
that differ in sign, and calculated the difference between two consecutive times of zero crossings to obtain
values for $\tau$. The corresponding values for $f$ were calculated using (2) and averaged for every second of
signal. This finally results in a series of frequency values (i.e., the ENF pattern) with a time resolution of
1 second (figure 2a). For visual clarity, ENF patterns are often depicted as continuous (figure 2b).
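The zero-crossing analysis can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used to build the database: the function name `enf_from_zero_crossings` is hypothetical, the input is a synthetic test tone with made-up frequency and phase rather than an attenuated mains signal, and samples that are exactly zero are ignored for brevity.

```python
import math

def enf_from_zero_crossings(x, fs):
    """Estimate one frequency value per second from the zero crossings of x."""
    # Times (in seconds) of zero crossings, found by linear interpolation
    # between neighbouring samples that differ in sign.
    crossings = []
    for k in range(len(x) - 1):
        a, b = x[k], x[k + 1]
        if a * b < 0:
            crossings.append((k + a / (a - b)) / fs)
    # Each interval tau between consecutive crossings gives f = 1 / (2 * tau),
    # as in equation (2); group the values by whole second of signal.
    per_second = {}
    for t0, t1 in zip(crossings, crossings[1:]):
        f = 1.0 / (2.0 * (t1 - t0))
        per_second.setdefault(int(t0), []).append(f)
    # Average the values within each second.
    return [sum(v) / len(v) for _, v in sorted(per_second.items())]

fs = 8000  # sampling frequency in Hz, as in the setup described above
# Synthetic 3-second tone at 50.02 Hz with an arbitrary phase (made-up values).
signal = [math.sin(2 * math.pi * 50.02 * n / fs + 0.3) for n in range(3 * fs)]
pattern = enf_from_zero_crossings(signal, fs)
```

For the synthetic tone, `pattern` holds three values, one per second, each close to the tone frequency; on a real mains signal the values would trace the ENF fluctuation instead.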
3 ENF pattern extraction from digital audio recording
We have adopted the method presented by Cooper [4] for extracting the ENF pattern from a digital audio
recording. We cover this method only briefly here, since Cooper's paper offers an excellent and
comprehensive description.
The basic steps are:
1) Signal decimation. Many digital audio recordings are recorded at high sampling frequencies,
e.g., 44100 Hz. To detect the ENF, which is approximately 50 Hz, much lower sampling
frequencies suffice. The audio file is thus decimated to a sampling frequency of 300 Hz,
which significantly reduces computational time.
2) Band pass filtering. The frequencies of interest are around 50 Hz, so the decimated audio file is
digitally band pass filtered from 49.5 Hz to 50.5 Hz to isolate the ENF.
3) Short Time Fourier Transform (STFT). In discrete time STFT analysis, a signal is divided into $J$
partly overlapping frames (figure 3) for which, after windowing and zero-padding, the frequency
spectrum is calculated via a Discrete Fourier Transform (DFT). The jump $H$ (in samples)
between frames determines the time resolution of the final ENF pattern, while the amount of
overlap $M - H$ (with $M$ the frame length) affects its smoothness. In our specific case, we have
chosen $H = 300$ so that the extracted ENF pattern time resolution equals that of the database,
i.e., 1 second. Each frame was windowed with a rectangular window and zero-padded by a factor of 4.
4) Peak frequency estimation. For each frequency spectrum⁵, the frequency with maximum
amplitude is estimated. As it is unlikely that this peak frequency coincides exactly with a DFT
frequency bin, quadratic interpolation around the bin with maximum amplitude is performed. The
estimated peak frequency is stored as the ENF value for the corresponding frame, so that we end
up with an extracted ENF pattern of $J$ ENF values.

⁵ Actually, we used the log power spectrum, defined as $\log |X[f]|^2$, where $X[f]$ is the frequency spectrum.

Fig. 2: a) ENF pattern as a series of ENF values; b) Continuous ENF pattern, obtained by interpolating ENF values.
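The STFT and peak frequency estimation steps can be illustrated with a short Python sketch. It is a simplified stand-in, not Cooper's implementation: decimation and band pass filtering are assumed to have been done already, the DFT is evaluated directly for the few bins between 49.5 Hz and 50.5 Hz, and the function names, the frame length of 600 samples and the 50.1 Hz test tone are all assumptions made for the example.

```python
import cmath
import math

def log_power(frame, n_fft, b):
    """Log power of DFT bin b for a frame zero-padded to n_fft samples."""
    s = sum(v * cmath.exp(-2j * math.pi * b * n / n_fft) for n, v in enumerate(frame))
    return math.log(abs(s) ** 2)

def peak_frequency(frame, fs, pad=4, f_lo=49.5, f_hi=50.5):
    """Peak frequency (Hz) of a rectangular-windowed frame within [f_lo, f_hi]."""
    n_fft = pad * len(frame)          # zero-padding by a factor of `pad`
    res = fs / n_fft                  # DFT bin spacing in Hz
    bins = range(round(f_lo / res), round(f_hi / res) + 1)
    power = {b: log_power(frame, n_fft, b) for b in bins}
    b0 = max(power, key=power.get)    # bin with maximum log power
    # Quadratic interpolation through the peak bin and its two neighbours.
    lo, mid, hi = (log_power(frame, n_fft, b0 + d) for d in (-1, 0, 1))
    shift = 0.5 * (lo - hi) / (lo - 2 * mid + hi)
    return (b0 + shift) * res

def enf_pattern(signal, fs, frame_len, hop):
    """One ENF value per frame; consecutive frames start `hop` samples apart."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return [peak_frequency(signal[s:s + frame_len], fs) for s in starts]

fs = 300  # sampling frequency after decimation, in Hz
# Synthetic 5-second tone at 50.1 Hz (made-up value) standing in for the filtered audio.
tone = [math.sin(2 * math.pi * 50.1 * n / fs) for n in range(5 * fs)]
values = enf_pattern(tone, fs, frame_len=600, hop=300)
```

With a jump of 300 samples at 300 Hz, each entry of `values` corresponds to one second of signal, which matches the time resolution of the database pattern.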
4 Matching by minimum squared error
Calculating the squared difference ("error") between two vectors is a common approach in determining
their equivalence: the smaller the squared error, the more both vectors are alike. The squared error $E$ for
two length $L$ vectors $x$ and $y$ is defined as:

$$E = \sum_{i=1}^{L} \big(x[i] - y[i]\big)^2 \qquad (3)$$

When determining the time of recording using the ENF criterion, we have in general one longer vector (the
database ENF pattern $d$) and one shorter vector (the recorded ENF pattern $r$). The approach is then to
calculate a vector $e$ of squared error values, according to:

$$e[k] = \sum_{i=1}^{R} \big(r[i] - d[i+k-1]\big)^2 \qquad (4)$$

in which $R$ is the length of the recorded ENF pattern, while the index $k$ runs from 1 to $D - R + 1$; $D$
being the length of the database ENF pattern. The minimum value in $e$ determines the location of the best
match between recorded and database pattern, and hence the time of recording.
Ideally, the recorded ENF pattern and its corresponding database pattern are exactly equal and the
minimum error value would be zero. In practice, however, this is not the case. The reliability of the ENF
criterion in determining the right time of recording is therefore limited by the occurrence of similar patterns
within the database itself, i.e., ENF patterns with squared errors in the same range as "typical" squared
errors between recorded and corresponding database ENF pattern.
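The search of equation (4) can be written down directly. The Python sketch below is illustrative only: the function name `squared_error_match` is hypothetical, and the "database" is a seeded random walk around 50 Hz with a made-up step size, not measured ENF data.

```python
import random

def squared_error_match(r, d):
    """Return (k, e_min): the 1-based index k that minimises e[k] of equation (4)."""
    R, D = len(r), len(d)
    # e[k] for k = 1 .. D - R + 1, stored 0-based.
    e = [sum((r[i] - d[i + k]) ** 2 for i in range(R)) for k in range(D - R + 1)]
    k_best = min(range(len(e)), key=e.__getitem__)
    return k_best + 1, e[k_best]

# Stand-in database: 600 values of a random walk around 50 Hz (made-up data).
rng = random.Random(0)
database = [50.0]
for _ in range(599):
    database.append(database[-1] + rng.gauss(0, 0.002))

# Idealised recording: an exact copy of database values 201..260 (1-based).
recorded = database[200:260]
k, e_min = squared_error_match(recorded, database)
```

In this idealised case the recorded pattern is an exact copy of a database segment, so the minimum error is zero at the true position; real recordings deviate from the database and the minimum is larger than zero.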
Fig. 3: Division of a signal of length N into J partly overlapping frames
Database analysis
In an experiment, we took roughly 1.5 years of ENF data⁶ and calculated the (root mean) squared error⁷
between two randomly picked, non-overlapping pieces of 600 ENF values (= 10 minutes). By repeating
this one million times, we were able to picture the approximate distribution of root mean squared (rms)
errors between ENF patterns of length 600 within the database (figure 4). Similar experiments were run for
patterns of 60, 120, 240 and 420 ENF values.
The most interesting part of the histogram in figure 4 is near zero: the smallest (observed) rms error within
the database ENF pattern. For length 600, we found this smallest rms error to be about 0.004 Hz.
We thus conclude that the minimum rms error between a recorded ENF pattern of length 600 and a (large)
database should be "well below" 0.004 Hz for a reliable determination of the time of recording.
Table 1 lists the observed smallest rms errors for all experiments. As can be expected, the error increases
for longer ENF patterns: the longer a pattern, the less likely it will have a similar counterpart over its whole
length within the database⁸.
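The structure of this database experiment can be sketched as follows. The sketch is hypothetical in every detail except the procedure itself: the function names are ours, the "database" is a small synthetic mean-reverting series with made-up parameters instead of 1.5 years of measured ENF values, and far fewer than one million pairs are drawn, so the resulting number has no forensic meaning.

```python
import math
import random

def rms_error(x, y):
    """Root mean squared error between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def smallest_rms_within(database, length, trials, rng):
    """Smallest rms error seen between randomly picked, non-overlapping pieces."""
    smallest = math.inf
    last_start = len(database) - length
    for _ in range(trials):
        i = rng.randint(0, last_start)
        j = rng.randint(0, last_start)
        if abs(i - j) < length:   # the two pieces overlap: skip this draw
            continue
        err = rms_error(database[i:i + length], database[j:j + length])
        smallest = min(smallest, err)
    return smallest

# Synthetic stand-in for an ENF database (made-up process and parameters).
rng = random.Random(1)
database = [50.0]
for _ in range(19999):
    prev = database[-1]
    database.append(prev - 0.01 * (prev - 50.0) + rng.gauss(0, 0.002))

smallest_60 = smallest_rms_within(database, length=60, trials=2000, rng=rng)
```

Running the same function for several pattern lengths on a real database would give a table of smallest observed rms errors per length.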
⁶ Collected as described in section 2, at the Netherlands Forensic Institute (The Hague, The Netherlands) from
September 2005 until February 2007. ENF values were stored minute-by-minute in plain text files (i.e., 60 values per
file).
⁷ Following the notation of equation (3), the root mean squared error $E_{rms}$ is defined as
$$E_{rms} = \sqrt{\frac{1}{L} \sum_{i=1}^{L} \big(x[i] - y[i]\big)^2}$$
Conclusions are independent of a choice for $E$ or $E_{rms}$ as the dissimilarity measure.
⁸ This is why the ENF criterion works well for audio recordings of long duration (as confirmed by some of our
experiments not mentioned here). In this case, the rms error between recorded and corresponding database ENF
pattern is almost certainly much smaller than those found within the database itself.
Fig. 4: Normalized histogram of squared errors between database pieces, 600 ENF values in length
Table 1: Smallest observed rms errors within 1.5 years of ENF database
ENF pattern length    Smallest observed rms error [Hz]
60                    0.0001
120                   0.0015
240                   0.0020
420                   0.0035
600                   0.0040
Test recordings
For a second experiment, we took an "American Audio Pocket Record" portable digital audio recorder and
set it up to be mains powered. We made a total of 70 recordings with durations of 60, 120, 240, 420 and
600 seconds (i.e., 14 recordings for each duration). The exact times of recording were known and the audio
files, sampled at 44.1 kHz, were stored in lossless WAV format.
The ENF pattern from each recording, extracted as described in section 3, was compared to a small
database consisting of two weeks of ENF data, including the period of recording. Results are summarized
in table 2.
Table 2: Test recording results for minimum root-mean-squared error matching
Recording duration    Correct time estimate for:    Minimum rms error ranging from:
60 s                  0 out of 14 recordings        0.0008 Hz to 0.0028 Hz
120 s                 0 out of 14 recordings        0.0018 Hz to 0.0031 Hz
240 s                 2 out of 14 recordings        0.0033 Hz to 0.0049 Hz
420 s                 10 out of 14 recordings       0.0045 Hz to 0.0055 Hz
600 s                 14 out of 14 recordings       0.0045 Hz to 0.0055 Hz
It is seen that the ENF criterion failed in correctly estimating the time of recording for 44 out of the 70
recordings. Moreover, the found minimum rms errors are all above the values mentioned in table 1.
Comparison to a larger database could thus have resulted in even less satisfying results.
5 Matching by maximum correlation coefficient
Figure 5 shows a main reason for the failure of the ENF criterion: our recorded ENF patterns have a slight
offset compared to the database pattern, a phenomenon also noted by Kajstura et al. [3]. In general, this
cannot be known beforehand and thus the matching procedure should be robust to this type of behavior.
We propose matching based on equivalence of shape, by using the correlation coefficient. Following the
notation of equation (3), the correlation coefficient $\rho$ between two vectors is defined as:

$$\rho = \frac{\sum_{n=1}^{L} \big(x[n] - \bar{x}\big)\big(y[n] - \bar{y}\big)}{(L-1)\,\sigma_x \sigma_y} \qquad (5)$$

where the horizontal bars and sigmas denote the averages and the standard deviations of the vectors
respectively. $\rho$ can run from -1 to +1; the closer the value is to +1, the more both vectors are alike in
shape. When comparing a recorded ENF pattern to a database, we thus search for the maximum correlation
coefficient between recorded and database pattern.
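A maximum correlation search in the sense of equation (5) might look like the following Python sketch. The names `correlation` and `correlation_match` are hypothetical, the database is a seeded random walk with a made-up step size, and the constant offset of 0.003 Hz is an arbitrary illustrative value, not a measured one. Segments with zero variance would cause a division by zero and are not handled here.

```python
import math
import random

def correlation(x, y):
    """Correlation coefficient of equation (5), with sample standard deviations."""
    L = len(x)
    mx, my = sum(x) / L, sum(y) / L
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (L - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (L - 1))
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((L - 1) * sx * sy)

def correlation_match(r, d):
    """Return (k, rho_max): the 1-based database index with maximum correlation."""
    R = len(r)
    rho = [correlation(r, d[k:k + R]) for k in range(len(d) - R + 1)]
    k_best = max(range(len(rho)), key=rho.__getitem__)
    return k_best + 1, rho[k_best]

# Stand-in database: 600 values of a random walk around 50 Hz (made-up data).
rng = random.Random(0)
database = [50.0]
for _ in range(599):
    database.append(database[-1] + rng.gauss(0, 0.002))

# Recorded pattern: database values 201..260 (1-based), shifted down by a
# constant offset so that it lies consistently below the database pattern.
recorded = [v - 0.003 for v in database[200:260]]
k, rho_max = correlation_match(recorded, database)
```

Because subtracting a constant changes neither the deviations from the mean nor the standard deviation, the correlation at the true position stays at +1 up to rounding, whereas the squared error of equation (4) at that position grows with the square of the offset.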
Database analysis
As with minimum squared error matching, the reliability of a maximum correlation coefficient-based
matching procedure will be limited by high correlations within the database itself. We have repeated the
first experiment described in the preceding section, this time calculating correlation coefficients instead of
rms errors. Here, we are interested in the largest observed values, which are listed in table 3⁹.
Table 3: Largest observed correlation coefficients within 1.5 years of ENF database
ENF pattern length    Largest observed correlation coefficient
60                    0.998
120                   0.990
240                   0.987
420                   0.982
600                   0.985
Test recordings
Matching the same 70 test recordings with the same database by a maximum correlation coefficient search
yielded the results mentioned in table 4. Correct time estimation is significantly improved, with only 3
failures out of 70. Also, from a duration of 240 seconds onwards, the maximum correlation coefficients all lie
above the values mentioned in table 3. This suggests that even comparisons to a larger database would have
resulted in correct time estimates.
Table 4: Test recording results for maximum correlation coefficient matching
Recording duration    Correct time estimate for:    Maximum corr. coeff. ranging from:
60 s                  12 out of 14 recordings       0.9758 to 0.9989
120 s                 13 out of 14 recordings       0.9572 to 0.998
240 s                 14 out of 14 recordings       0.9893 to 0.9989
420 s                 14 out of 14 recordings       0.991 to 0.9992
600 s                 14 out of 14 recordings       0.9945 to 0.9993
⁹ The higher value for length 600 compared to length 420 is probably due to using 'only' one million random pairs of
database pattern: the experiment for length 600 just happened to come across a better matching pattern than the one for
length 420.
Fig. 5: Recorded ENF patterns lie consistently below the corresponding database pattern. a) Example of a
recording 240 s in duration; b) Example of a recording 420 s in duration
6 Conclusion
We have shown that the reliability of the ENF criterion is inherently limited by similarities within the ENF
pattern database to which the recording is compared. The possible presence of a frequency offset further
increases the danger of erroneous determination of the time of recording, especially for recordings shorter
than 10 minutes in duration in combination with a minimum squared error-based matching procedure. We
have shown improvements by using a maximum correlation coefficient-based matching procedure.
References
1. Union for the Co-ordination of Transmission of Electricity, Homepage, http://www.ucte.org
2. Grigoras, C.: Digital audio recording analysis - the electric network frequency criterion, International
Journal of Speech Language and the Law, vol. 12, no. 1, pp. 63-76 (2005)
3. Kajstura, M., Trawinska, A., Hebenstreit, J.: Application of the Electrical Network Frequency (ENF)
Criterion - A case of a Digital Recording, Forensic Science International, vol. 155, pp. 165-171 (2005)
4. Cooper, A.J.: The electric network frequency (ENF) as an aid to authenticating forensic digital audio
recordings - an automated approach, Conference paper, AES 33rd International Conference, USA (2008)
5. Brixen, E.B.: Techniques for the authentication of digital audio recordings, Presented at the AES 122nd
Convention, Vienna (2007)
