I. Introduction
A factor model is a model which summarizes the co-movement of a large set of response (observed)
variables with a small set of explanatory (unobserved) variables. In a linear factor model the
observed response variables are linearly related through the unobserved factors.
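Let $x_{it}$ be the observed value of response variable $i$ ($= 1, \dots, N$) at time $t$ ($= 1, \dots, T$). The response
variables are generated by an $r \times 1$ vector of common factors, $f_t = (f_{1t}, \dots, f_{rt})'$:
$$x_{jt} = \lambda_{j1} f_{1t} + \lambda_{j2} f_{2t} + \dots + \lambda_{jr} f_{rt} + \varepsilon_{jt}$$
for $j = 1, \dots, N$ and $t = 1, \dots, T$. In vector form,
$$x_{\cdot t} = \Lambda^o f_t + \varepsilon_{\cdot t},$$
where $x_{\cdot t} = (x_{1t}, x_{2t}, \dots, x_{Nt})'$, $\Lambda^o = (\lambda_1^o, \lambda_2^o, \dots, \lambda_N^o)'$, $\lambda_i^o$ is the $r \times 1$ vector of factor loadings for
variable $i$, and $\varepsilon_{\cdot t} = (\varepsilon_{1t}, \dots, \varepsilon_{Nt})'$ is the vector of the idiosyncratic components of the response
variables. In matrix form,
$$X = F \Lambda^{o\prime} + \varepsilon,$$
where $X$ is the $T \times N$ matrix of response variables and $F$ is the $T \times r$ matrix of factors.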
The APT leads naturally to a factor model since it begins by assuming that returns follow a factor
structure. More precisely,
$$r = E(r) + \beta f + \varepsilon$$
where $r$ is an $N \times T$ matrix of asset returns, $\beta$ is an $N \times r$ matrix of factor loadings, $f$ is an $r \times T$ matrix
of factor scores and $\varepsilon$ is an $N \times T$ matrix of idiosyncratic components.
The basic assumptions are that $E[f] = 0$, $E[\varepsilon] = 0$ and $E[f\varepsilon'] = 0$, plus that the expected
returns and variances exist.
Then,
$$\Sigma = \beta E[ff'] \beta' + V$$
where $\Sigma$ is the covariance matrix of returns and $V = E[\varepsilon\varepsilon']$.
In a strict factor model the idiosyncratic components are uncorrelated which implies that the matrix
V is diagonal.
Therefore, in a strict factor model the covariance matrix of the response variables can be
decomposed into two matrices: the one corresponding to the common components, with rank $r$, and the
one corresponding to the idiosyncratic component, with rank $\min(N, T)$.
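Note that the variance of the idiosyncratic component of a portfolio with weight $\omega_i$ in each asset $i$ is
equal to $\omega' V \omega$. Since the portfolio weights must sum to one, the average portfolio weight is $1/N$.
Writing $\sigma_i^2$ for the $i$th diagonal element of $V$, we have that
$$\omega' V \omega = \sum_{i=1}^{N} \omega_i^2 \sigma_i^2 \le \left(\max_i \sigma_i^2\right) \sum_{i=1}^{N} \omega_i^2$$
Then, as long as all idiosyncratic variances have a finite upper bound and the weights are spread out so that
$\sum_{i=1}^{N} \omega_i^2 \to 0$ (for equal weights this sum is $1/N$), the variance of the
idiosyncratic component of the portfolio converges to zero as $N$ goes to infinity.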
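The bound above can be checked numerically. The following is a minimal sketch, assuming a diagonal $V$, equal weights, and idiosyncratic variances drawn from an arbitrary bounded range; the function name `idiosyncratic_variance` and the simulated numbers are illustrative only and are not part of the original notes.

```python
import numpy as np

def idiosyncratic_variance(weights, sigma2):
    """Return w'Vw when V is diagonal with entries sigma2 (strict factor model)."""
    weights = np.asarray(weights, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    return float(np.sum(weights**2 * sigma2))

rng = np.random.default_rng(0)
results = []
for n in (10, 100, 1000, 10000):
    sigma2 = rng.uniform(0.5, 2.0, size=n)   # bounded variances (made-up range)
    w = np.full(n, 1.0 / n)                  # equally weighted portfolio
    var = idiosyncratic_variance(w, sigma2)
    bound = sigma2.max() * np.sum(w**2)      # (max_i sigma_i^2) * sum_i w_i^2
    results.append((n, var, bound))
    print(n, var, bound)
```

With equal weights the bound equals $(\max_i \sigma_i^2)/N$, so both columns shrink at rate $1/N$.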
A strict factor model imposes too severe a restriction on the covariance matrix when N/r is large. In
the case of stock returns in which r is less than 10 and N greater than 1000, this restriction becomes
a real problem.
The approximate factor model preserves the diversifiability of the idiosyncratic component but
weakens the diagonality condition on $V$. In other words, it allows for $\mathrm{cov}(\varepsilon_i, \varepsilon_j) \neq 0$. It also imposes
conditions that ensure that the factor risks are not diversifiable.
We still assume that $E[f] = 0$, $E[\varepsilon] = 0$ and $E[f\varepsilon'] = 0$, so we can still express the
covariance matrix of asset returns as
$$\Sigma = \beta E[ff'] \beta' + V$$
Diversifiability of idiosyncratic risk
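We say $\varepsilon$ is diversifiable risk if $\lim_{n \to \infty} \omega^{n\prime} \omega^{n} = 0$ implies that $\lim_{n \to \infty} E[(\omega^{n\prime} \varepsilon)^2] = 0$, where
$\omega^{n}$ are the weights of a portfolio formed with $n$ assets.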
NON-diversifiability of common risk (or that the factors are pervasive)
Let's define $z_j$ as an $r$-vector with a 1 in the $j$th component and 0 elsewhere. Then, the $r$ factors $f$ are
pervasive if for each $z_j$, $j = 1, \dots, r$, there exists an $\omega^{n}$ such that $\lim_{n \to \infty} \omega^{n\prime} \omega^{n} = 0$ and $\omega^{n\prime} \beta = z_j'$ for all
$n$.
This condition ensures that each factor affects many assets in the economy.
Connor and Korajczyk (1986, 1988) showed that the r eigenvectors corresponding to the largest r
eigenvalues are consistent estimators of the r underlying common factors. But the estimated factors
are unique only up to rotation.
More specifically, there is an indeterminacy of rotation in the definition of the factors and betas in
the equation $r = E(r) + \beta f + \varepsilon$.
For example, given $f$ and $\beta$, consider any nonsingular $r \times r$ matrix $L$ and define $\beta^* = \beta L$ and
$f^* = L^{-1} f$.
Replacing $\beta$ with $\beta^*$ and $f$ with $f^*$ yields an observationally equivalent data generating process, since $\beta^* f^* = \beta L L^{-1} f = \beta f$.
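The rotation indeterminacy is easy to verify numerically. The sketch below uses made-up dimensions and a hand-picked nonsingular matrix `L` (both are assumptions for illustration) and checks that the common component is unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, r = 6, 50, 2
beta = rng.normal(size=(N, r))          # N x r loadings (simulated)
f = rng.normal(size=(r, T))             # r x T factor scores (simulated)
L = np.array([[2.0, 1.0],
              [1.0, 1.0]])              # nonsingular r x r matrix, det = 1

beta_star = beta @ L                    # beta* = beta L
f_star = np.linalg.solve(L, f)          # f* = L^{-1} f

common = beta @ f
common_star = beta_star @ f_star
print(np.allclose(common, common_star))
```

Since only $\beta f$ enters the returns, no amount of data can distinguish $(\beta, f)$ from $(\beta^*, f^*)$.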
A common procedure (when $T < N$) is to normalize the factors such that $E[ff'] = I_r$. In this case,
the estimated factors $\hat f$ are $\sqrt{T}$ times the eigenvectors corresponding to the $r$ largest
eigenvalues of the $T \times T$ matrix $XX'$ (where $X$ is the $T \times N$ matrix of response variables), so that $\hat f$ is $T \times r$ and $\hat f' \hat f = T I_r$. At the same time,
the matrix of factor loadings becomes $\hat\beta' = (\hat f' \hat f)^{-1} \hat f' X = \hat f' X / T$.
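The procedure just described can be sketched in a few lines. This is an illustrative implementation under the stated normalization, not code from the original notes; the function name `estimate_factors` and the simulated panel are assumptions.

```python
import numpy as np

def estimate_factors(X, r):
    """Principal-components estimates with f_hat' f_hat / T = I_r.

    X is T x N. Returns f_hat (T x r) and beta_hat (N x r).
    """
    T = X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(X @ X.T)   # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :r]                # eigenvectors of the r largest
    f_hat = np.sqrt(T) * top
    beta_hat = (f_hat.T @ X / T).T               # beta_hat' = f_hat' X / T
    return f_hat, beta_hat

rng = np.random.default_rng(2)
T, N, r = 40, 200, 2
F = rng.normal(size=(T, r))                      # simulated true factors
Lam = rng.normal(size=(N, r))                    # simulated true loadings
X = F @ Lam.T + 0.5 * rng.normal(size=(T, N))
f_hat, beta_hat = estimate_factors(X, r)
print(np.round(f_hat.T @ f_hat / T, 6))          # identity matrix by construction
```

The estimates recover the true factors only up to a rotation, in line with the indeterminacy discussed above.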
Note that if we normalize the factors as in the above case we have that
$$\Sigma = \beta\beta' + V$$
The diversifiability of the idiosyncratic component and the pervasiveness of the factors imply that
as $n$ goes to infinity, the eigenvalues of $V$ are bounded while the $r$ largest eigenvalues of $\beta\beta'$ go to infinity
(Note: if we divide the matrix by $NT$ then the eigenvalues of $\beta\beta'$ are bounded while the ones of
$V$ converge to zero). This property is the one that will allow us to construct tests for the number of
factors.
III. Estimation of the number of factors
III.1.a Limits
Definition: let $\{b_n\}$ be a sequence of real numbers. If there exists a real number $b$ and if for every
$\delta > 0$ there exists an integer $N(\delta)$ such that for all $n \ge N(\delta)$, $|b_n - b| < \delta$, then $b$ is the limit of the
sequence $\{b_n\}$.
When a limit exists, we say that the sequence $\{b_n\}$ converges to $b$ as $n$ goes to infinity,
written $b_n \to b$ as $n \to \infty$.
Examples:
i. If $b_n = 1 + 1/n$ then $b_n \to 1$
ii. If $b_n = (1 + a/n)^n$ then $b_n \to e^a$
iii. If $b_n = n^2$ then $b_n \to \infty$
Definition:
i. The sequence $\{b_n\}$ is at most of order $n^\lambda$, denoted $O(n^\lambda)$, if and only if for some real number $\Delta$,
$0 < \Delta < \infty$, there exists a finite integer $N$ such that for all $n \ge N$, $|n^{-\lambda} b_n| < \Delta$.
ii. The sequence $\{b_n\}$ is of order smaller than $n^\lambda$, denoted $o(n^\lambda)$, if and only if for every real
number $\delta > 0$, there exists a finite integer $N(\delta)$ such that for all $n \ge N(\delta)$, $|n^{-\lambda} b_n| < \delta$.
Note: we take the number $\Delta$ to be a positive real number as large as necessary and the number $\delta$ to
be a positive real number as small as necessary.
Important
$\{b_n\}$ is $O(n^\lambda)$ if $\{n^{-\lambda} b_n\}$ is eventually bounded, whereas $\{b_n\}$ is $o(n^\lambda)$ if $n^{-\lambda} b_n \to 0$.
Obviously if $\{b_n\}$ is $o(n^\lambda)$ then it is also $O(n^\lambda)$.
Further, if $\{b_n\}$ is $O(n^\lambda)$, then for every $\delta > 0$, $\{b_n\}$ is $o(n^{\lambda + \delta})$.
When $\{b_n\}$ is $O(n^0)$ it is simply bounded and may or may not have a limit. It is written $O(1)$.
When $\{b_n\}$ is $o(1)$ it means $b_n \to 0$.
Examples:
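Let $\kappa = \max(\lambda, \mu)$.
i. If $\{a_n\}$ is $O(n^\lambda)$ and $\{b_n\}$ is $O(n^\mu)$, then $\{a_n b_n\}$ is $O(n^{\lambda + \mu})$ and $\{a_n + b_n\}$ is $O(n^\kappa)$.
ii. If $\{a_n\}$ is $o(n^\lambda)$ and $\{b_n\}$ is $o(n^\mu)$, then $\{a_n b_n\}$ is $o(n^{\lambda + \mu})$ and $\{a_n + b_n\}$ is $o(n^\kappa)$.
iii. If $\{a_n\}$ is $O(n^\lambda)$ and $\{b_n\}$ is $o(n^\mu)$, then $\{a_n b_n\}$ is $o(n^{\lambda + \mu})$ and $\{a_n + b_n\}$ is $O(n^\kappa)$.
Proof (i): Since $\{a_n\}$ is $O(n^\lambda)$ and $\{b_n\}$ is $O(n^\mu)$, there exist a $0 < \Delta < \infty$ and an $N$ such that for all
$n \ge N$, $|n^{-\lambda} a_n| < \Delta$ and $|n^{-\mu} b_n| < \Delta$. Consider $\{a_n b_n\}$. Now $|n^{-(\lambda + \mu)} a_n b_n| = |n^{-\lambda} a_n| \, |n^{-\mu} b_n| < \Delta^2$ for
all $n \ge N$. Hence $\{a_n b_n\}$ is $O(n^{\lambda + \mu})$.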
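As a quick numerical illustration of these order statements, the sketch below uses two made-up sequences: $a_n = 3n + \sqrt{n}$, which is $O(n)$, and $b_n = \ln n$, which is $o(n^{1/2})$. The sequences and variable names are chosen only for illustration.

```python
import numpy as np

n = np.array([1e1, 1e2, 1e3, 1e4, 1e5])
a_n = 3.0 * n + np.sqrt(n)        # O(n): a_n / n stays bounded (tends to 3)
b_n = np.log(n)                   # o(n^0.5): b_n / n^0.5 tends to 0

scaled_a = a_n / n                # n^{-1} a_n
scaled_b = b_n / np.sqrt(n)       # n^{-1/2} b_n
scaled_prod = a_n * b_n / n**1.5  # n^{-(1 + 1/2)} a_n b_n, should also tend to 0

for row in zip(n, scaled_a, scaled_b, scaled_prod):
    print(row)
```

The last column shrinks toward zero, consistent with the product of an $O(n^\lambda)$ and an $o(n^\mu)$ sequence being $o(n^{\lambda + \mu})$.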
Let's start with the factor model $X = F \Lambda^{o\prime} + \varepsilon$ and define $m = \min(N, T)$.
We also use $\psi_k(A)$ to denote the $k$th largest eigenvalue of a positive semi-definite matrix $A$. Define, for $k = 1, \dots, m$,
$$\tilde\mu_{NT,k} \equiv \psi_k\!\left[\frac{XX'}{NT}\right] = \psi_k\!\left[\frac{X'X}{NT}\right],$$
where $X'X/T$ is the sample variance matrix of $x_{\cdot t}$ if the means of $x_{it}$ are all zeros.
Under some standard assumptions, based on Bai and Ng (2002), Ahn and Horenstein (2012) prove
the following convergence rates for the eigenvalues of $XX'/(NT)$:
1. $\tilde\mu_{NT,k} = O_p(1)$ for $k \le r$
2. $\tilde\mu_{NT,k} = O_p(m^{-1})$ for $k > r$
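The two rates can be seen in a small simulation. The sketch below assumes i.i.d. standard normal factors, loadings and errors, which is stronger than the assumptions in the cited papers; the helper names `scaled_eigenvalues` and `simulate_panel` are hypothetical.

```python
import numpy as np

def scaled_eigenvalues(X):
    """Eigenvalues of XX'/(NT) in descending order; X is T x N."""
    T, N = X.shape
    # XX' and X'X share their nonzero eigenvalues, so use the smaller matrix.
    M = X @ X.T if T <= N else X.T @ X
    vals = np.linalg.eigvalsh(M / (N * T))[::-1]
    return np.clip(vals, 0.0, None)

def simulate_panel(N, T, r, rng):
    """Simulated r-factor panel with unit-variance noise (illustrative only)."""
    F = rng.normal(size=(T, r))
    Lam = rng.normal(size=(N, r))
    return F @ Lam.T + rng.normal(size=(T, N))

rng = np.random.default_rng(3)
r = 2
for N, T in [(50, 50), (200, 200), (400, 400)]:
    mu = scaled_eigenvalues(simulate_panel(N, T, r, rng))
    print(N, T, mu[:r], mu[r])
```

As $m$ grows, the first $r$ eigenvalues stay of the same magnitude while the $(r+1)$th shrinks roughly in proportion to $1/m$.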
Given these results we can find the number of factors using the following three methods.
The Scree Test developed by Cattell (1966) is a visual way to find the number of factors from
the eigenvalues of $XX'$. It consists of plotting the eigenvalues of the matrix $XX'$ in descending
order and looking for a relatively sharp break. In Cattell's words, the break "appeared where the true number of factors ended and the
detritus, presumably due to error factors, appeared. From the analogy of the steep descent of a
mountain till one comes to the scree of rubble at the foot of it, I decided to call this the Scree Test."
[Figure: scree plot of the eigenvalues (vertical axis, 0 to 1.2) against k = 1, ..., 20 (horizontal axis).]
While the Scree Test is only a visual test, Bai and Ng (2002) provided the first consistent method to
estimate the number of factors in panels of large dimensions.
The method is based on a penalty function. Intuitively, the threshold separates information from
noise by converging to zero at a slower rate than the eigenvalues from the idiosyncratic component.
Recall that
1. $\tilde\mu_{NT,k} = O_p(1)$ for $k \le r$
2. $\tilde\mu_{NT,k} = O_p(m^{-1})$ for $k > r$
Then, given a threshold $g(N, T)$, the estimated number of factors is the number of $\tilde\mu_{NT,k}$ such that $\tilde\mu_{NT,k} > g(N, T)$.
Bai and Ng (2002) proposed several thresholds. In fact, there are infinitely many penalty
functions available. As an example, the penalty called BIC3 has the following functional form:
$$g(N, T) = \hat\sigma^2 \, \frac{N + T - k}{NT} \ln(NT),$$
where $\hat\sigma^2$ is the average squared residual from a regression of the set of response variables on the
maximum number of factors to test for.
Note: if $kmax$ is the maximum number of factors to test for then $\hat\sigma^2 = \sum_{j = kmax + 1}^{m} \tilde\mu_{NT,j}$.
Exercise: prove that $g(N, T) = \hat\sigma^2 \, \frac{N + T - k}{NT} \ln(NT)$ is a consistent penalty function.
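The counting rule with the BIC3 threshold can be sketched as follows. This follows the simplified description in these notes (count the eigenvalues above $g$), not Bai and Ng's full information criterion; the function names, the simulated panel and the choice `kmax = 8` are assumptions for illustration.

```python
import numpy as np

def scaled_eigenvalues(X):
    """Eigenvalues of XX'/(NT) in descending order; X is T x N."""
    T, N = X.shape
    M = X @ X.T if T <= N else X.T @ X
    return np.clip(np.linalg.eigvalsh(M / (N * T))[::-1], 0.0, None)

def threshold_estimate(X, kmax):
    """Count the first kmax eigenvalues that exceed the BIC3-type threshold."""
    T, N = X.shape
    mu = scaled_eigenvalues(X)
    sigma2 = mu[kmax:].sum()                  # sum of mu_j for j = kmax+1, ..., m
    k = np.arange(1, kmax + 1)
    g = sigma2 * (N + T - k) / (N * T) * np.log(N * T)
    return int(np.sum(mu[:kmax] > g))

rng = np.random.default_rng(4)
T, N, r = 100, 100, 3                          # simulated panel with 3 factors
X = rng.normal(size=(T, r)) @ rng.normal(size=(N, r)).T + rng.normal(size=(T, N))
k_hat = threshold_estimate(X, kmax=8)
print(k_hat)
```

In this design the factor eigenvalues are far above the threshold and the noise eigenvalues far below it, so the count is not sensitive to the exact value of `kmax`.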
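Consider the sequence of eigenvalue ratios
$$\frac{\tilde\mu_{NT,1}}{\tilde\mu_{NT,2}};\ \frac{\tilde\mu_{NT,2}}{\tilde\mu_{NT,3}};\ \dots;\ \frac{\tilde\mu_{NT,kmax}}{\tilde\mu_{NT,kmax+1}}, \quad \text{where } kmax > r.$$
(Simplified) Theorem: If the $k$th ratio of the sequence is the maximum, then $k$ is a consistent
estimator of the number of factors.
$$\tilde k_{ER} = \arg\max_{k \le kmax} \frac{\tilde\mu_{NT,k}}{\tilde\mu_{NT,k+1}}$$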
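Graphically,
[Figure: eigenvalue ratios $\tilde\mu_{NT,k} / \tilde\mu_{NT,k+1}$ (vertical axis, 0 to 25) against k = 1, ..., 19 (horizontal axis), with $\tilde k_{ER}$ marked on the horizontal axis.]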
Why does it work?
Remember that
1. $\tilde\mu_{NT,k} = O_p(1)$ for $k \le r$
2. $\tilde\mu_{NT,k} = O_p(m^{-1})$ for $k > r$
From 1, $\frac{\tilde\mu_{NT,1}}{\tilde\mu_{NT,2}}, \dots, \frac{\tilde\mu_{NT,r-1}}{\tilde\mu_{NT,r}}$ are bounded, positive, real numbers.
From 2, $\frac{\tilde\mu_{NT,r+1}}{\tilde\mu_{NT,r+2}}, \dots, \frac{\tilde\mu_{NT,kmax}}{\tilde\mu_{NT,kmax+1}}$ are also bounded, positive, real numbers.
However, at $k = r$, $\frac{\tilde\mu_{NT,r}}{\tilde\mu_{NT,r+1}}$ is $\frac{O_p(1)}{O_p(m^{-1})} = O_p(m)$ and is the only ratio that explodes to infinity.
Therefore,
Theorem 1: Suppose that $r \ge 1$. Then, under Assumptions A-F, there exists a real number
$$\lim_{m \to \infty} \Pr(\tilde k_{ER} = r) = 1$$
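The eigenvalue ratio estimator is straightforward to compute. The sketch below is an illustration on simulated data with i.i.d. normal factors, loadings and errors; the function names and `kmax = 8` are assumptions, and the code is not taken from Ahn and Horenstein.

```python
import numpy as np

def scaled_eigenvalues(X):
    """Eigenvalues of XX'/(NT) in descending order; X is T x N."""
    T, N = X.shape
    M = X @ X.T if T <= N else X.T @ X
    return np.clip(np.linalg.eigvalsh(M / (N * T))[::-1], 0.0, None)

def er_estimate(X, kmax):
    """Return the k in 1..kmax that maximizes mu_k / mu_{k+1}, plus all ratios."""
    mu = scaled_eigenvalues(X)
    ratios = mu[:kmax] / mu[1:kmax + 1]
    return int(np.argmax(ratios)) + 1, ratios

rng = np.random.default_rng(5)
T, N, r = 100, 100, 3                          # simulated panel with 3 factors
X = rng.normal(size=(T, r)) @ rng.normal(size=(N, r)).T + rng.normal(size=(T, N))
k_er, ratios = er_estimate(X, kmax=8)
print(k_er)
print(np.round(ratios, 2))
```

Unlike the threshold approach, this estimator needs no penalty function; the only tuning choice is `kmax`, which must exceed $r$.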
III.3 Estimation of the number of factors among a set of factor candidates
We have already seen that in order to apply the Fama-MacBeth method we need the true beta matrix to
have full rank; otherwise all the risk premium estimators in the second pass are undefined.
Let $x_{it}$ be the response variable for the $i$th cross-section unit at time $t$, where $i = 1, 2, \dots, N$, and
$t = 1, 2, \dots, T$. Explicitly, $x_{it}$ can be the (excess) return on asset $i$ at time $t$. The response
variables $x_{it}$ depend on the individual effect $\alpha_i$, the time effect $\theta_t$ and the $k$ factor-candidate
variables in $f_t = (f_{1t}, f_{2t}, \dots, f_{kt})'$. That is,
$$x_{it} = \alpha_i + \theta_t + \beta_i' f_t + \varepsilon_{it}, \quad (1)$$
where $\beta_i = (\beta_{i1}, \beta_{i2}, \dots, \beta_{ik})'$ is the beta vector for cross-section unit $i$. The product $\beta_i' f_t$ is the
common component of $x_{it}$, and the $\varepsilon_{it}$ are idiosyncratic components or idiosyncratic risks.¹
Our interest for model (1) is to estimate the rank of the beta matrix $B$, where
$B = (\beta_1, \beta_2, \dots, \beta_N)'$. However, because of the presence of the time effects $\theta_t$, we are unable to
estimate $\beta_i$. Instead we can estimate the demeaned betas, $\dot\beta_i = \beta_i - \bar\beta$, where $\bar\beta = N^{-1} \sum_{i=1}^{N} \beta_i$.
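The demeaned betas can be estimated by estimating the following double demeaned model,
$$\underset{(T \times 1)}{\ddot x_{i\cdot}} = \underset{(T \times k)}{\ddot F} \, \underset{(k \times 1)}{\dot\beta_i} + \underset{(T \times 1)}{\ddot\varepsilon_{i\cdot}},$$
where $\ddot x_{i\cdot} = (\ddot x_{i1}, \ddot x_{i2}, \dots, \ddot x_{iT})'$, $\ddot\varepsilon_{i\cdot}$ is similarly defined, and $\ddot F = (\ddot f_1, \ddot f_2, \dots, \ddot f_T)'$. For all data, we
have
$$\underset{(T \times N)}{\ddot X} = \underset{(T \times k)}{\ddot F} \, \underset{(k \times N)}{B_d'} + \underset{(T \times N)}{\ddot E},$$
where $\ddot X = (\ddot x_{1\cdot}, \ddot x_{2\cdot}, \dots, \ddot x_{N\cdot})$ and $\ddot E = (\ddot\varepsilon_{1\cdot}, \ddot\varepsilon_{2\cdot}, \dots, \ddot\varepsilon_{N\cdot})$. Then, the demeaned beta matrix $B_d$ can
be estimated by the OLS estimator $\hat B_d = \ddot X' \ddot F (\ddot F' \ddot F)^{-1}$.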
¹ In this model, we consider only the case of time-invariant betas. Our method can be easily extended to the
case of time-varying betas since the rank estimation is based on the estimated beta matrix.
In what follows, we use $\psi_j(A)$ to denote the $j$th largest eigenvalue of a matrix $A$. Then, we can
define $\hat\mu_{NT,j} = \psi_j(\hat B_d' \hat B_d / N)$, where $j$ indicates that $\hat\mu_{NT,j}$ is the $j$th largest eigenvalue of the
matrix $\hat B_d' \hat B_d / N$.
Theorem 1: Under Assumptions A-D, (i) $\operatorname{plim}_{T \to \infty} \hat\mu_{NT,j} > 0$ for $0 < j \le r$; and (ii)
$\hat\mu_{NT,j} = O_p(T^{-1})$ for $0 \le r < j \le k$.
The following theorem defines the consistent estimator that we call the Threshold estimator.
Theorem 2 (Threshold Estimator): For a given threshold function $g(T) > 0$ such that $g(T) \to 0$
and $T g(T) \to \infty$ as $T \to \infty$, define $\hat r_{TH} = \#\{1 \le j \le k : \hat\mu_{NT,j} > g(T)\}$, where $\#\{\cdot\}$ is the
cardinality of a set. Then, under Assumptions A-D, $\lim_{T \to \infty} \Pr(\hat r_{TH} = r) = 1$.
d 2
g (d , T ) = , (3)
Td
where $d = 1 - R^2$ for $0.3 \le 1 - R^2 \le 0.8$, $d = 0.3$ for $1 - R^2 < 0.3$, and $d = 0.8$ for $1 - R^2 > 0.8$.
Intuition
Assume:
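1- The CAPM is the true model (1 factor model) and we have the market portfolio. Thus, $r = 1$.
Then, with the market (M), SMB and HML as the factor candidates,
$$\hat B' \hat B = \begin{pmatrix}
\sum_{i=1}^{N} \hat\beta_{iM}^2 & \sum_{i=1}^{N} \hat\beta_{iM} \hat\beta_{iSMB} & \sum_{i=1}^{N} \hat\beta_{iM} \hat\beta_{iHML} \\
\sum_{i=1}^{N} \hat\beta_{iM} \hat\beta_{iSMB} & \sum_{i=1}^{N} \hat\beta_{iSMB}^2 & \sum_{i=1}^{N} \hat\beta_{iSMB} \hat\beta_{iHML} \\
\sum_{i=1}^{N} \hat\beta_{iM} \hat\beta_{iHML} & \sum_{i=1}^{N} \hat\beta_{iSMB} \hat\beta_{iHML} & \sum_{i=1}^{N} \hat\beta_{iHML}^2
\end{pmatrix}_{3 \times 3}$$
But HML and SMB are useless factors! Then, as $T \to \infty$ we have that $\hat\beta_{iSMB}, \hat\beta_{iHML} \to_p 0$ for all $i$, so
$$\frac{\hat B' \hat B}{N} = \begin{pmatrix}
N^{-1} \sum_{i=1}^{N} \hat\beta_{iM}^2 & N^{-1} \sum_{i=1}^{N} \hat\beta_{iM} \hat\beta_{iSMB} & N^{-1} \sum_{i=1}^{N} \hat\beta_{iM} \hat\beta_{iHML} \\
N^{-1} \sum_{i=1}^{N} \hat\beta_{iM} \hat\beta_{iSMB} & N^{-1} \sum_{i=1}^{N} \hat\beta_{iSMB}^2 & N^{-1} \sum_{i=1}^{N} \hat\beta_{iSMB} \hat\beta_{iHML} \\
N^{-1} \sum_{i=1}^{N} \hat\beta_{iM} \hat\beta_{iHML} & N^{-1} \sum_{i=1}^{N} \hat\beta_{iSMB} \hat\beta_{iHML} & N^{-1} \sum_{i=1}^{N} \hat\beta_{iHML}^2
\end{pmatrix}_{3 \times 3}
\to_p
\begin{pmatrix}
N^{-1} \sum_{i=1}^{N} \beta_{iM}^2 & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{pmatrix}_{3 \times 3}$$
$\hat B' \hat B / N$ has 3 eigenvalues: $\hat\mu_M$, $\hat\mu_{SMB}$, $\hat\mu_{HML}$.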
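The intuition can be reproduced in a small simulation. The sketch below is only illustrative: the three "factors" are simulated independent normals standing in for the market, SMB and HML, returns load only on the first one, and for simplicity the data are demeaned over time only (no time effects), which is a simplification of the double-demeaned model above.

```python
import numpy as np

rng = np.random.default_rng(6)
T, N = 600, 100
F = rng.normal(size=(T, 3))                  # columns: market, SMB, HML stand-ins
beta_M = rng.uniform(0.5, 1.5, size=N)       # true market betas; other betas are 0
X = np.outer(F[:, 0], beta_M) + rng.normal(size=(T, N))

Fd = F - F.mean(axis=0)                      # demean over time
Xd = X - X.mean(axis=0)
B_hat = Xd.T @ Fd @ np.linalg.inv(Fd.T @ Fd)  # N x 3 matrix of OLS betas

eigs = np.linalg.eigvalsh(B_hat.T @ B_hat / N)[::-1]   # descending order
print(np.round(eigs, 4))
```

One eigenvalue stays close to the average squared market beta, while the two associated with the useless candidates are of order $1/T$; this gap is what the threshold estimator exploits.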