Beruflich Dokumente
Kultur Dokumente
Abstract
Statistical discrimination for nonstationary random processes is important in many applications.
Our goal was to develop a discriminant scheme that can extract local features of the time series,
was consistent, and computationally ecient. Here, we propose a discriminant scheme based on
the SLEX (Smooth Localized Complex EXponential) library. The SLEX library forms a collection
of Fourier-type bases that are simultaneously orthogonal and localized in both time and frequency
domains. Thus, the SLEX library has the ability to extract local spectral features of the time series.
The rst step in our procedure, which is the feature extraction step based on Saito (1994), is to nd
a basis from the SLEX library that can best illuminate the dierence between two or more classes of
time series. In the next step, we construct a discriminant criterion that is related to the KullbackLeibler divergence between the SLEX spectra of the dierent classes. The discrimination criterion
is based on estimates of the SLEX spectra that are computed using the SLEX basis selected in the
feature extraction step. We show that the discrimination method is consistent and demonstrate
via nite sample simulation studies that our proposed method performs well. Finally, we apply
our method to a seismic waves dataset with the primary purpose of classifying an unknown seismic
recording as having either an earthquake or explosion origin.
Keywords: Classication and discrimination, Kullback-Leibler divergence, likelihood ratio, nonstationary random process, seismic time series, SLEX library.
Introduction
place near the Russian nuclear test facility in Novaya Zemlya. The problem of discriminating between mining explosions and earthquakes is a reasonable proxy for the problem of discriminating
between nuclear explosions and earthquakes. This latter problem is one of critical importance
for monitoring a comprehensive test-ban treaty. An extensive discussion of this problem can be
found in Shumway and Stoer (2000, Ch 5); the data are available on the web site for the text,
http://www.stat.pitt.edu/stoffer/tsa.html and its mirrors. Time series classication problems are not restricted to geophysical applications, but occur under many and varied circumstances
in other elds. Traditionally, the detecting of a signal embedded in a noise series has been analyzed
in the engineering literature by statistical pattern recognition techniques.
The historical approaches to the problem of discriminating among dierent classes of time series
can be divided into two distinct categories. The optimality approach, as found in the engineering and statistics literature, makes specic Gaussian assumptions about the probability density
functions of the separate groups and then develops solutions that satisfy well-dened minimum
error criteria. Typically, in the time series case, we might assume the dierence between classes is
expressed through dierences in the theoretical mean and covariance functions and use likelihood
methods to develop an optimal classication function. A second class of techniques, which might
be described as a feature extraction approach, proceeds more heuristically by looking at quantities
that tend to be good visual discriminators for well-separated populations and have some basis in
physical theory or intuition. Less attention is paid to nding functions that are approximations to
some well-dened optimality criterion.
When analyzing time series data, both time domain and frequency domain approaches are typically available; this is true in the case of discrimination and classication. For relatively short
stationary series, a time domain approach that follows conventional multivariate discriminant analysis is viable; an example is given in Shumway and Stoer (2000, Example 5.11). For longer
stationary time series, a frequency domain approach is computationally easier because it reduces
the dimension of the problem. For nonstationary time series, the reduction in dimension is essential
for computational eciency.
There has been extensive research on the discrimination problem of stationary time series. For
example, Shumway (1982) reviewed many dierent discriminant methods for time series including time domain and frequency domain approaches. Kakizawa, Shumway and Taniguchi (1998)
proposed the method of discrimination for multivariate time series by using the Kullback-Leibler
discrimination information and the Cherno information measure. Pulli (1996) considered using
the ratio of spectra to process the discriminant problem between earthquakes and explosions. These
methods are developed for stationary time series. A discussion of the current state of the art for
stationary series can be found in Shumway and Stoer (2000, 5.7).
In many practical problems, however, the time series are realizations of nonstationary random
processes. For example, it is clear that the seismic waves displayed in Figure 1 have variance and
spectrum that change over time. Various models of nonstationary random processes have been
proposed in the literature. Priestley (1965) was the rst to introduce the concept of a Cramer
representation with time-varying transfer function. This idea was later rened by Dahlhaus (1997)
where he established an asymptotic framework for locally stationary processes. Recently, Ombao,
Raz, von Sachs and Guo (2002) introduced the SLEX (Smooth Localized Complex EXponentials)
model of a nonstationary random process. The SLEX model uses the SLEX vectors (orthogonal
and localized Fourier vectors) as the stochastic building blocks in the Cramer representation.
Sakiyama & Taniguchi (2003) and Shumway (2003) discussed discrimination for Dahlhaus locally stationary time series. We approach the problem of discrimination and classication of nonstationary time series using the SLEX model. One very important consideration in using the SLEX
library is that it employs computationally ecient algorithms. In particular, it uses the fast Fourier
transform (FFT) algorithm in computing the SLEX transform and the best basis algorithm (BBA)
of Coifman and Wickerhauser (1992) in searching for the best basis for discrimination between
classes.
Our discriminant scheme contains of two parts, a feature extraction part and a classication
part. The feature extraction step consists of selecting a basis from the SLEX library that illuminates
the most dierences between the groups. We follow Saito (1994) who proposed a criterion that is
based on the Kullback-Leibler divergence. This method is aimed at selecting the basis from any
library of orthogonal bases that can best present the time series as clouds in space with maximal
distance (distance is between the normalized time-varying spectrum). Because the SLEX functions
are local, they are able to extract local spectral features of the time series. After selecting a basis
using training data sets, we compute the SLEX periodogram of the time series that we will need to
classify. We propose a classication criterion which we will show to be asymptotically consistent.
The essential idea is that an observed time series is assigned to a class (population, or group) c
if the Kullback-Leibler divergence between the estimated spectrum (SLEX periodogram computed
from the data) and the spectrum of c is smaller than that between the estimated spectrum and
the spectra of any other class.
In the next section we present our procedure for selecting the best local discriminant basis from
the SLEX library. In 3, we discuss our discriminant criterion. Finally, in 4, we will present some
3
The rst step in our proposed method is to choose a basis from the SLEX library, which is a
collection of bases; each basis consists of orthogonal and localized basis vectors. The SLEX vector
is a time and frequency localized orthogonal generalization of the Fourier (complex exponential)
basis vectors; it is ideal for analyzing nonstationary time series. Ombao, Raz, von Sachs and Mallow
(2001) use the SLEX in estimating the spectral density matrix of a bivariate nonstationary time
series. Moreover, Ombao, Raz, von Sachs and Guo (2002) introduce a model of a nonstationary
random process having a spectral representation in terms of the SLEX. The SLEX library is a rich
collection of localized bases, thus it is able to extract local spectral features of the time series and
are well suited to the problem of discrimination and classication of nonstationary time series.
2.1
Here, we give a brief overview of the SLEX transform. For details, we refer the reader to Ombao, et
al (2001) for specic applications to time series and to Wickerhauser (1994) for a broader discussion
on local trigonometric transforms.
Fourier basis vectors are perfectly localized in frequency and hence are ideal at representing
stationary time series. However, they cannot adequately represent nonstationary time series, i.e.,
the time series with spectra that change over time. In this paper, we will use the SLEX basis vectors
which are simultaneously orthogonal and localized in time and frequency. They are constructed by
applying a projection operator on the Fourier vectors. It turns out that the action of a projection
operator on any periodic vector is identical to applying two specially constructed smooth windows
to the Fourier vectors. A SLEX basis vector S, (t) having support on the discrete time block
S = {0 + 1, . . . , 1 } and oscillating at frequency has the form
t
|S|
+ S, (t) exp i2
t
|S|
(2.1)
where [1/2, 1/2], |S| = 1 0 , is a small overlap between two consecutive time blocks (the
size of is given in 2.2. The windows S,+ (t) and S, (t) are the particularly constructed two
smooth windows which take the form
t 0 2 1 t
S,+ (t) = r
r
t 0
0 t
t 1
1 t
r
r
r
S, (t) = r
2
(2.2)
(2.3)
where r() is called a rising cut-o function. In our implementation, we use the sine rising cut-o
function
1.0
0.0
-1.0
0
500
1000
1500
2000
Figure 2: Real and imaginary parts of a SLEX waveform with support on block indexed by time
t {512, . . . , 1023} with = 32.
The SLEX basis vectors are dened at dyadic blocks that overlap. One important property of the
SLEX vectors is that they remain orthogonal despite the overlap. Moreover, in our implementation,
we use the fundamental frequencies k = k/|S|, k = |S|/2 + 1, . . . , |S|/2 where |S| is the length
of time block S. An example of a SLEX basis vector is given in Figure 2.
2.2
The SLEX library is a collection of bases, each having orthogonal vectors with time support that is
obtained by segmenting the time series, of length T , in a dyadic manner. The library is constructed
by rst specifying the nest resolution level J (smallest time block has length T /2J ). At resolution
level j (where j = 0, . . . , J), the time series is divided into 2j overlapping blocks. The amount of
overlap at is set to the = T /2J+1 which is the same for all levels j. We denote the block b on
level j to be S(j, b) and Mj = T /2j . The SLEX vectors on block S(j, b) are allowed to oscillate at
dierent fundamental frequencies k = k/Mj (where k = Mj /2 + 1, . . . , Mj /2).
To illustrate this further, consider the bottom of Figure 3 where the SLEX library is constructed
by setting J = 2. In this example, the SLEX library consists of 5 orthogonal bases and we enumerate
the support of these basis vectors: (i) S(0, 0); (ii) S(1, 0) S(1, 1); (iii) S(2, 0) S(2, 1) S(2, 2)
S(2, 3); (iv) S(1, 0) S(2, 2) S(2, 3); (v) S(2, 0) S(2, 1) S(1, 1). Clearly, the SLEX basis
vectors are allowed to have dierent lengths of support (or dierent time and frequency resolutions).
Moreover, each basis captures local spectral features of the time series that will be useful for
discrimination and classication. In the latter part of this section, we will discuss our criterion for
selecting a basis from the SLEX library.
2.3
The SLEX transform consists of the set of coecients corresponding to all the SLEX vectors dened
in the library. The SLEX coecients on block S = S(j, b) are dened by
1
S,k =
Xt,T S,k (t),
Mj t
where the fundamental frequency is k = k/Mj and k = Mj /2 + 1, . . . , Mj /2.
5
(2.5)
S(0, 0)
S(1, 0)
S(2, 0)
S(1, 1)
S(2, 1)
S(2, 2)
S(2, 3)
Figure 3: The SLEX library constructed with level J = 2. The shaded blocks S(1, 0) S(2, 2)
S(2, 3) represent one basis (out of the 5 in this library) .
2.4
(2.6)
We use the best local discriminant criterion that is described in Saito (1994) as a criterion for
selecting the best basis. In the actual search over the SLEX library, we can employ either the algorithm in Saito or the BBA of Coifman and Wickerhauser (1992). Both of these are computationally
ecient algorithms, each of order O(T ).
Intuitively, a best basis for discrimination is the one which gives the largest disparity between
two or more dierent classes. One well known measure of disparity is relative entropy (also known as
cross entropy, Kullback-Leibler divergence, or I-divergence) which we now dene. Let p = {pi }ni=1
and q = {qi }ni=1 be two nonnegative sequences that satisfy i pi = i qi = 1. The relative entropy
between p and q is dened to be
n
I(pp, q ) =
pi log
i=1
pi
.
qi
C1
C
I(pp , pk ).
=1 k=+1
c
Now, let {x
xci }N
i=1 be a set of Nc training signals belonging to class c, for c = 1, . . . , C. Then the
time-frequency energy map of class c, denoted by c , is a table of real values specied by the triplet
(j, b, k) as
c (j, b, k)
Nc
i=1
i
(j,b),k
Nc
i=1
x
xci 2
(2.7)
i
(j,b),k
where
is the periodogram of the ith time series computed at level j = 0, . . . , J, in block
j
b = 0, . . . , 2 1 and frequency k where k = Mj /2 + 1, . . . , Mj /2. Note that the time-frequency
energy map for a class c are in fact average normalized periodograms, i.e., for any basis BT in each
class c, (j,b,k)BT c (j, b, k) = 1. The time-frequency energy map gives the location (in time
and frequency) where the energy in a particular class is mostly concentrated. Thus, our procedure
for classication will be based on the time-frequency energy concentration of the signals.
After computing the time-frequency energy map of each class, we then calculate the divergence
between the C signals at block S(j, b) as
Mj /2
j,b =
k=Mj /2+1
In this case, j,b gives a local measure of disparity between the time-frequency energy maps of
the C classes in each time block S(j, b). Our procedure will select the basis that consists of blocks
S(j, b) where, essentially, the distance between classes, j,b is maximized.
2.5
In this section, we give the specic steps in selecting the best basis. Saito (1994) developed the
local discriminant basis algorithm (LDBA) which we apply to the SLEX library. Let Aj,b represent
the best basis (maximizer of the local discriminant criterion) and Bj,b denote the SLEX vectors at
level j in block b. The algorithm for selecting the best basis is as follows:
Step 0: Specify the maximum depth of decomposition J.
Step 1: Construct time-frequency energy maps c for c = 1, . . . , C.
Step 2: Set AJ,b = BJ,b . Determine the Aj,b for b = 0, . . . , 2j 1, and j = J 1, . . . , 0, by the
following rule:
If j,b j+1,2b + j+1,2b+1 , then Aj,b = Bj,b ,
else Aj,b = Aj+1,2b Aj+1,2b+1 and set j,b = j+1,2b + j+1,2b+1 .
Step 3: Extract the blocks S(j , b ) that are dened by A(j, b) in the previous step. The SLEX
periodograms computed at the blocks dened by the best basis will be used in constructing
the classication rule.
The algorithm essentially compares a parent block against its children blocks; e.g., the
parent block S(j, b) vs its children blocks S(j + 1, 2b) S(j + 1, 2b + 1). If the value of the local
discriminant criterion at the parent block is smaller than the corresponding sum at the children
blocks, then the children blocks separate the populations better than the parent block. Thus the
children blocks are chosen in favor of their parent in the best basis BT . If the value of the local
discriminant criterion at the parent block is greater than or equal to the corresponding sum at the
children blocks, then the children blocks do not separate the populations better than the parent
block, and the parent block would be selected in favor of its children.
Finally, we point out the connection between LDBA and the BBA. The basis that maximizes
the I-divergence is equivalent to the basis that minimizes the negative I-divergence. One may use
the BBA in searching for the best local discriminant basis in the SLEX library. Thus, our procedure
is computationally ecient and can handle massive time series datasets.
2.6
We now describe the SLEX model for a nonstationary time series Xt,T , for t = 0, . . . , T 1. Let
BT be the best local discriminant basis selected from the SLEX library and let Si be the blocks
in BT . Note that Si is a particular dyadic segmentation of the time series. Dene Mi to be the
number of points on the block Si . Let JT to be the highest time resolution level in BT , i.e., the
smallest time block in BT has length T /2JT . The frequencies dened on Si are the grid frequencies
ki = ki /Mi for ki = Mi /2 + 1, . . . , Mi /2. The spectral representation of Xt,T is
Xt,T =
Si BT
Mi
Mi /2
(2.8)
k=Mi /2+1
where i,k,T is the transfer function on time block Si and frequency k; i,k is the SLEX basis vector
oscillating at frequency k and having support at block Si ; and zi,k is an orthonormal random process
with nite fourth moment.
2.6.1
The SLEX spectrum is dened analogously to the spectrum of a stationary process. It is the square
of the modulus of the time-varying transfer function. It is dened on rescaled time [0, 1]. Let u
be in an interval I [0, 1] such that [uT ] is in some time block Si on BT . The SLEX spectrum
is fT (u, k ) = |i,k,T |2 [uT ] Si . Note that for a xed frequency k , fT (u, k ) is constant
within each time sub-interval (or block). This is because for each xed T the SLEX model gives
an explicit partitioning of the time-frequency plane, as determined by the blocks i Si in the basis
BT .
Ombao et al (2002) developed asymptotic theory on the SLEX model. In this paper, we discuss
only the ideas that are essential in proving the consistency of our discrimination procedure. As
T , fT (u, k ) the limiting spectrum f (u, ) which is independent of T . The limiting logspectrum is dened below. As T increases, the number of observations in each block also increases.
However, the number of blocks in BT should also be allowed to increase if we want to model a
limiting spectrum which, as a function of time, has an innite wavelet expansion. Thus, we need
to allow JT to increase as T increases, but at a rate that is slower than KT = log2 (T ). The innite
wavelet expansion in Ombao, et al (2002) was dened on log f (u, ) instead of f (u, ) in order to
guarantee positivity of the spectrum. Let Haar wavelet basis of L2 ([0, 1]) be {0,0 } {,m }0,m0
where is the father Haar wavelet and is the mother wavelet. The innite wavelet expansion
of the log limiting SLEX spectrum is
1
J
T 1 2
=0 m=0
(2.9)
1,0 () =
log f (u, )
0,0 (u)
du,
,m () =
Ombao, et al (2002) demonstrated that, for a given T, log f (u, ) and SLEX basis BT with nest
time resolution level JT , log fT (u, ) arises as a nite wavelet expansion of log f (u, ) truncated to
a level JT .
As a nal remark, Ombao, et al (2002) have shown that the SLEX model and the Dahlhaus (1997)
model of a locally stationary process are asymptotically mean-squared equivalent. The equivalence
implies nonstationary processes that have smoothly time-varying spectrum can be modelled using the SLEX model. For example, one may model ARMA processes with parameters that vary
smoothly over time by a SLEX model with spectrum that varies with time but is piecewise constant
within small time blocks.
2.6.2
We now derive the likelihood of the SLEX periodograms. When the orthonormal random increment
processes, zi,k , are complex Gaussian then the SLEX coecients, i,k , are independent complex
i,k = |i,k |2 are distributed as fT (ui , k )Vi,k where
Gaussian. Moreover, the SLEX periodograms
Vi,k are independent over i and k 0 and are distributed as 22 /2 when k
= 0, Mi /2 and 21 when
k = 0, Mi /2. Because the periodograms at each level j and block b are symmetric about the zero
frequency (k = 0), we will consider only the periodograms at non-negative frequencies.
= {
i,k }; let p(
) be the joint density of the periodograms under the SLEX process
Denote
and fT be the spectrum. Then the log likelihood of the periodograms is
) =
(fT |
Mi /2
Si BT k=0
i,k
log fT (ui , k ) +
.
fT (ui , k )
(2.10)
In the next section, we will develop a classication rule that is based on the log likelihood ratio.
3.1
The basic idea of our approach is as follows. We rst extract the SLEX periodograms, which we
x), from the blocks of the best basis selected. We then form the likelihood under 1
denote as (x
9
DT (fT1 , fT2 ; x) =
x)]
p1 [(x
1
log
,
x)]
T
p2 [(x
(3.1)
= {
i,k } and let p1 (
) and p2 (
) be the density under processes 1 and 2 respectively
Denote
1
2
and let fT and fT be the spectra under these two processes. Then the log likelihood under these
two densities are, respectively,
(fT1
(fT2
Mi /2
) =
|
Si BT k=0
Mi /2
) =
|
Si BT k=0
i,k
+ 1
fT (ui , k )
i,k
+ 2
fT (ui , k )
(3.2)
(3.3)
Our classication rule is based on the likelihood ratio, i.e., we classify x into 1 if
(fT1 |
)
(fT2 |
). Otherwise, it is classied into 2 . Following equations (3.1), (3.2) and (3.3), the
discriminant statistic is
1
DT (fT1 , fT2 ; x) =
T
Mi /2
i Si BT k=Mi /2+1
f 2 (ui , k )
i,k [1/fT2 (ui , k ) 1/fT1 (ui , k )] .
+
log T1
fT (ui , k )
(3.4)
We classify x into 1 if
DT (fT1 , fT2 ; x)
0.
Remarks:
)
(fT2 |
) . In the computation of the log
1. First, we note that DT (fT1 , fT2 ; x) T2
(fT1 |
likelihood, we do not explicitly include the negative frequencies because the periodograms are
symmetric about k = 0.
The likelihood is obtained by segmenting the time series into N time blocks, each having
length M , and then computing the Fourier periodograms PM (b, ), at blocks b = 1, . . . , N .
Their classication criterion is
DT (f 1 , f 2 ; x) =
N 1/2
1
f 2 (uj , )
2
1
log D
1 (u , ) + PM (uj , )[1/fD (uj , ) 1/fD (uj , )]d, (3.5)
N j=1 1/2
fD
j
N
1
,m
N m=1 i,k
= 1, 2.
(3.6)
Essentially, we perform averaging across subjects which gives the similar desired eect of
reducing the variance when smoothing over frequency. However, if N1 is small, we might not
achieve the desired reduction in the variance and thus we may need to do an extra smoothing
step which is over frequency. Hence, we may use the weighted form
f
T (ui , k ) =
M
s=M
Ws f
T (ui , ks )
M
2
f
T (ui , k ) can be obtained in a similar manner if N2 is small.
3.2
s=M
Ws = 1. The estimate
The discriminant statistic, DT given in (3.4), can be derived using the Kullback-Leibler divergence.
As before, let fT1 and fT2 represent the spectra of 1 and 2 , respectively. Let the periodogram
= {
i,k }. The
of the time series, x, to be classied into either 1 and 2 , be denoted by
1
2
and fT and fT are:
Kullback-Leibler divergences between
, fT1 )
KL(
, fT2 ) =
KL(
1
T
1
T
Mi /2
i Si BT k=Mi /2+1
Mi /2
i Si BT k=Mi /2+1
i,k
i,k
This leads to the same classication rule discussed in 3.1.1 below (3.4), viz., classify x into 1 if
DT 0; otherwise, classify x into 2 .
11
3.3
Consistency Theorem
We now state the results pertaining to the consistency of our classication scheme based on the
SLEX model. Denote the probability of misclassifying a time series x to process 2 , given that
the true process is 1 , to be PT (2 | 1) = P {DT (fT1 , fT2 ; x) < 0 | 1 }. Similarly, we denote
PT (1 | 2) = P {DT (fT1 , fT2 ; x) 0 | 2 }. Then, under appropriate conditions,
lim PT (2 | 1) = 0
(3.7)
lim PT (1 | 2) = 0.
(3.8)
T
T
Moreover, (3.7) and (3.8) hold when fT1 and fT2 are replaced by the consistent estimates dened in
(3.6) if, in addition, we have N for
= 1, 2. The conditions and proofs of these results are
given in the Appendix, A.2.
4.1
Simulation Results
A number of numerical experiments were performed to test how well our proposed method was able
to classify time series. We discuss a few of simulation experiments here. In the rst experiment we
had two classes, where processes from 1 were Gaussian white noise, and processes from 2 were
stationary rst order autoregressive with parameter . The length of each time series was set to
T = 1024, and the training set for each group consisted of eight time series. Several simulation
studies were performed by varying . Using the training datasets, we selected the best local
discriminant basis and then set up the discriminant criterion. We then generated 50 time series
datasets from each of 1 and 2 . In Table 1 we report the error rate which is the number of
incorrect classication out of a total of 100 trials. Note that there is a small misclassication rate
when the value of is close to 0. For example, when = 0.1, there is some misclassication error
because these AR(1) processes are close to white noise.
Table 1: Misclassication rate for the rst simulation
study where 1 is Gaussian white noise
and 2 is AR(1) with parameter .
Yt =
(1)
(2)
(4.1)
Xt =
(1)
Yt if 1 t T /2
(2)
Yt if T /2 + 1 t T
(1)
Xt if 1 t T /2
(2)
Xt if T /2 + 1 t T
12
(4.2)
1.2
=
=
=
=
0.5
0.4
0.3
0.2
0.4
0.6
0.8
1.0
tau
tau
tau
tau
200
400
600
800
1000
Figure 4: The coecient at; versus t for = 0.5, 0.4, 0.3, 0.2
8
100
200
300
400
500
Time
600
700
800
900
1000
(1)
(2)
13
100
200
300
400
500
Time
600
700
800
900
1000
0.4
0.3
0.2
Error rate 0.08 0.00 0.00
at; = 0.8[1 cos(t/1024)], = 0.5, and t were iid standard normal. We performed three subexperiments where 2 was dened by the processes Yt = at; Yt1 0.81Yt2 + t , t = 1, . . . , 1024
where takes on the three dierent values = 0.4, 0.3, 0.2. The plots of at; for the various values
of are shown in Figure 4. A simulated data set from process 1 is shown in Figure 5 and that
from 2 with = 0.3 is shown in Figure 6.
We generated 10 data sets from each of 1 and 2 as the training data. From the training data
set, we obtained the best basis and the estimated spectra. Then, we generated 10 new data for
each category to evaluate the error rate. The simulation results are shown in Table 3. We can see
that the error rates are zero when the = 0.3, 0.2 in 2 . There was a misclassication error for
the case where the s are close for both processes, i.e., = 0.4 for 2 and = 0.5 for 1 .
4.2
Data Analysis
14
15
50
Power
40
30
8
20
6
10
0
4
Time
0.5
0.4
2
0.3
0.2
0.1
Frequency
80
Power
60
40
8
20
7
6
0
0.5
5
4
0.4
0.3
0.2
Frequency
Time
3
2
0.1
0
in the Introduction. The earthquake and explosion seismic recordings are given in Figures 7 and 8
respectively. The SLEX spectral estimate for one earthquake and explosion time series are given
in Figures 9 and 10 respectively while the SLEX spectral estimate of the NZ event is in Figure 11.
To evaluate our method, we used the holdout procedure; that is, we removed one time series (at
a time) to be used for classication and used all remaining time series in the training dataset (excluding the unknown NZ event) in constructing the classication rule. For each holdout time series,
we performed the following steps in sequence: we selected a basis for discrimination, we extracted
the SLEX periodograms computed at the blocks in this basis, we constructed the classication rule
and nally, we assigned the holdout time series. We repeated the process for each of the holdout
time series. In the classication criterion, we estimated the spectra for each group by rst averaging the SLEX periodograms across subjects and then smoothing the averaged SLEX periodograms
16
25
Power
20
15
10
8
5
7
6
0
0.5
5
4
0.4
0.3
3
0.2
Frequency
Time
0.1
0
2
DT (fT1 , f
T ; x)
0.1526
0.6271
0.8132
0.0131
0.5283
0.7189
0.6116
0.6469
Holdout series
Exp1
Exp2
Exp3
Exp4
Exp5
Exp6
Exp7
Exp8
NZ
1
2
DT (f
T , fT ; x )
0.5096
18.6391
1.4676
17.5373
12.5065
10.3111
2.1453
0.1044
3.1552
across frequency. Denote the estimated spectra for the earthquake and explosion classes to be,
2
1 2
1
respectively, f
T and fT . The classication criterion was DT (fT , fT ; x ) and the testing data was
1 2
1
2
classied as an earthquake event if DT (f
T , fT ; x ) 0 and to explosion if DT (fT , fT ; x ) < 0. The
results are reported in Table 4 where we denote Eqk to be the k-th earthquake in the data set and
Expk to be the k-th explosion in the data set. The proposed method had a zero misclassication
rate. Moreover, unknown NZ event here is classied as explosion which agrees with the result in
Kakizawa et al (1998).
Conclusion
In this paper, we proposed the SLEX method for discrimination and classication of nonstationary
time series that is based on the SLEX transform. The SLEX transform is localized in both time
and frequency domains thus our method is able to extract local features of the data. Moreover,
our method is computationally ecient and hence is able to handle large datasets. Our procedure
17
computes the Kullback-Leibler distance of the time-frequency energy maps (normalized SLEX
periodograms or the estimated normalized spectrum) of the classes and selects the basis from
the SLEX library that can best discriminate between classes of nonstationary time series. Our
classication rule, which is based on the Kullback-Leibler distance between the estimated spectra
of the groups and the time series that needs to be classied, is shown to be consistent, i.e., the
probability of misclassication goes to zero as the length of the time series goes to innity. Finite
sample simulation studies and data analysis demonstrate that the method performs well in practice.
Appendix
A.1
As mentioned in 2.6, the existence of the limiting SLEX spectrum depends on three assumptions
which we state now for completeness. Details can be found in Ombao et al (2002).
Assumption 1. For f (u, ) as a function of [1/2, 1/2], uniformly in u [0, 1] we assume a
Holder condition of order (0, 1] with constant L > 0
|f (u, ) f (u, )| L| |
Assumption 2. There exists a hierarchical collection I of dyadic subintervals of [0, 1]
I=
2 m, 2 (m + 1) :
= 0, 1, . . . ; m = 0, . . . , 2 1
and a subset of intervals, say Iv = [uv , uv+1 ) I, such that v Iv = [0, 1] and that f (u, ), as a
function of u, is Holder of order 0 < sv < 1 on Iv , for all [1/2, 1/2]. Moreover, the transition
points uv between the intervals Iv1 and Iv allow for a nite number of possible jumps of nite
height.
Assumption 3. (a) Either the maximum depth J is xed, or, if J = JT is allowed to grow as a
function of T , i.e., JT , then we must have 2JT /T 0 as T . Furthermore, JT J2T .
(b) The length Mj of each segment Sj satises MJT /Mj 1, for j = 0, 1, . . . , JT .
A.2
Proof of Consistency
In Section 3.3 we claimed that our discrimation rule is consistent in the sense of (3.7) and (3.8). We
now prove these results under Assumptions 13. First, we state a theorem that relates the SLEX
spectrum, fT to the SLEX limiting spectrm, f (recall Remark 4 in 2.6).
Theorem 1 (Ombao et al, 2002) Under Assumptions 13, given the SLEX model as defined in
2.6, with SLEX spectrum fT and SLEX limiting spectrum f , and a sequence of frequencies k,T
as T , we have the following:
sv
+s
log
f
(u,
)]
du
=
O
T
.
T
k,T
0
To aid in the proof of the consistency of our method, we rst dene the following functions. Let
T (u, ) = 1/fT2 (u, ) 1/fT1 (u, )
Dene
1
HT (T ) =
T
Mi /2
1 1/2
H() =
i,k ,
T (ui , k )
i Si BT k=Mi /2+1
1/2
i,k is the SLEX periodogram of x, i Si represents the blocks corresponding to the best basis,
where
BT , and f is the SLEX limiting spectrum corresponding, generically, to class (i.e. x ).
Similarly, fT will represent the SLEX spectrum corresponding to .
E HT (T ) = H() + O
M
T
+O
2JT
T
+ O (bsT ) ,
E HT (T ) =
i Si BT
i Si BT
i Si BT
i Si BT
i Si BT
Mi 1
T Mi
Mi 1
T Mi
Mi 1
T Mi
Mi
T
Mi
T
1 1/2
1/2
Mi /2
k=Mi /2+1
Mi /2
k=Mi /2+1
Mi /2
k=Mi /2+1
1/2
1/2
1/2
1/2
(ui , )f (ui , ) d + O
(ui , )f (ui , ) d + O
19
M
T
2JT
T
2JT
T
+O
+ O (bsT )
+ O (bsT )
2JT
T
+ O (bsT ) .
)+O
s
b
T
+O
2JT
T
Mi /2
Then
1
T
Mi /2
i,k
T (ui , k )
i Si BT k=Mi /2+1
1
1
Var
(P
)
+
Cov (Pi , Pj ),
i
T2 i
T 2 i=j
1
T2
1
T2
1
T2
1
T2
Mi /2
i Si BT k=Mi /2+1
Mi /2
i Si BT k=Mi /2+1
Si /2
i,k }
2T (ui , k )Var {
i Si BT k=Si /2+1
Mi /2
i Si BT k=Mi /2+1
2 (ui , k ) + O(bsT )
f 2 (ui , k ) + O(bsT )
= O(T 1 ) + O (bsT /T ) ,
and
1
T2
Cov (Pi , Pj ) =
i=j
1
T2
Cov
i=j
ki
i,ki ,
T (ui , ki )
kj
j,kj
T (uj , kj )
1
T2
i=j
ki
kj
j,kj
i,ki ,
T (ui , ki )T (uj , kj )Cov
1
s
(u
,
)(u
,
)
+
O(b
)
i
j
ki
kj
T
T 2 i=j k k
2
1
Cov
xt Si ,ki (t) ,
Mi
1
Cov
(u
,
(u
,
i
j
T
T
k
i,k
k
j,k
i
i
j
j
T 2 i=j k k
i
1
T2
i=j
ki
kj
2
1
xt Sj ,kj (t)
Mj
t
where zi,k are zero mean and orthonormal with nite fourth moment, say 4 (cf. 2.6). wolog,
assume that the set of frequencies in block Si is coarser than that in block Sj , (i.e., Mi Mj ), then
1
2
2
(1s)JT
(u
,
)(u
,
)|
|
|
|
+
O
2
/T
i
j
4
S
,
,T
S
,
,T
k
k
i
j
i
i
k
k
i
i
T 2 i=j k
1
Cov (Pi , Pj ) =
T 2 i=j
JT
= O 2
/T + O 2(1s)JT /T
= O 2JT /T .
Lemma 3 Let DT (fT1 , fT2 ; x) denote the discriminant statistic defined in (3.4), where fT1 and fT2
are the SLEX spectra in classes 1 and 2 , respectively. Then under the established notation and
conditions, as T ,
DT (fT1 , fT2 ; x) E{DT (fT1 , fT2 ; x) | i } p 0,
for x i , i = 1, 2.
Mi /2
1
T
log
i Si BT k=Mi /2+1
fT2 (ui , k )
.
fT1 (ui , k )
Then
DT (fT1 , fT2 ; x) = RT + HT (T ),
and
E{DT (fT1 , fT2 ; x) | 1 } = RT + E{HT (T ) | 1 }.
Hence,
DT (fT1 , fT2 ; x)
2
1 = Var{HT (T ) | 1 } 0,
DT (fT1 , fT2 ; x)
as T by Lemma 2.
We are now ready to establish the main result on the consistency of our discrimination rule.
Theorem 2 Under the established notation and conditions,
Proof. We prove the rst result, the second result follows in a similar manner. Using Lemma 1,
we have
DT (fT1 , fT2 ; x)
| 1
1
T
Mi /2
log
i Si BT k=Mi /2+1
1 1/2
1/2
+O
log
M
T
fT2 (ui , k )
+ fT1 (ui , k )/fT2 (ui , k ) 1
fT1 (ui , k )
f 2 (u, )
+ f 1 (u, )/f 2 (u, ) 1 d du
f 1 (u, )
+O
21
2JT
T
+O
MT
T
s
However,
log
so that
f 2 (u, )
+ f 1 (u, )/f 2 (u, ) 1 0,
f 1 (u, )
E DT (fT1 , fT2 ; x) | 1 C 0
(A.1)
1
2
lim lim P DT (f
T , fT ; x ) < 0 | 1 = 0;
T N
1
2
lim lim P {DT (f
T , fT ; x ) 0 | 2 } = 0.
T N
1
2
1
2
Proof. By the SLLN, DT (f
T , fT ; x ) as DT (fT , fT ; x ) as N , and the result follows immediately from Theorem 2.
References
Blandford, R. R. (1993). Discrimination of Earthquakes and Explosions. AFTAC-TR-93-044
HQ, Air Force Technical Applications Center, Patrick Air Force Base, FL.
Coifman, R. and Wickerhauser, M. (1992). Entropy based algorithms for best basis selection.
IEEE Transactions on Information Theory, 32, 712718.
Dahlhaus, R. (1997). Fitting time series models to nonstationary processes. Annals of Statistics,
25, 137.
Kakizawa, Y., Shumway, R. H., and Taniguchi, M. (1998). Discrimination and Clustering
for Multivariate Time Series. Journal of the American Statistical Association, 93, 328-340.
Ombao, H., Raz, J., von Sachs, R. and Malow, B. (2001). Automatic statistical analysis
of bivariate nonstationary time series. Journal of the American Statistical Association, 96,
543-560.
Ombao, H., Raz, J., von Sachs, R. and Guo, W. (2002). The SLEX model of nonstationary
processes. Annals of the Institute of Statistical Mathematics, 32, 201-225.
Priestley, M. (1965). Evolutionary spectra and nonstationary processes. Journal of the Royal
Statistical Society, Ser. B, 28, 228-240.
Pulli, J. J. (1996). Extracting and processing signal parameters for regional seismic event
identication. In Monitoring a Comprehensive Test Ban Treaty, pp. 791-803. E. S. Husebye
and A. M. Dainty eds. Doordrecht: Kluwer.
Saito, N. (1994). Local Feature Extraction and Its Applications. Ph.D. Dissertation, Department
of Mathematics, Yale University.
Sakiyama, K. and Taniguchi, M. (2003). Discrimination for Locally Stationary Random
Processes. Journal of Multivariate Analysis. In press.
22
Shumway, R. H. (1982). Discriminant analysis for time series. In Classification, Pattern Recognition and Reduction of Dimensionality, Handbook of Statistics Vol. 2, pp. 1-46. P. R. Krishnaiah
and L. N. Kanal eds. Amsterdam: North Holland.
Shumway, R. H. (2003). Time-frequency clustering and discriminant analysis. Stat. Prob.
Letters. In press.
Shumway, R.H. and Stoffer, D.S. (2000). Time Series Analysis and Its Applications. New
York: Springer.
Wickerhauser, M. (1994). Adapted Wavelet Analysis from Theory to Software. IEEE Press,
Wellesley, MA.
23