
Factor Models

I. Introduction

A factor model summarizes the co-movement of a large set of response (observed) variables using a small set of explanatory (unobserved) variables. In a linear factor model the observed response variables are linked linearly through the unobserved factors.

Let $x_{jt}$ be the observed value of response variable $j$ ($= 1, \ldots, N$) at time $t$ ($= 1, \ldots, T$). The response variables are generated by an $r \times 1$ vector of common factors, $f_t$:

$$x_{jt} = \lambda_{j1} f_{1t} + \lambda_{j2} f_{2t} + \ldots + \lambda_{jr} f_{rt} + \varepsilon_{jt}$$

For $j = 1, \ldots, N$ and $t = 1, \ldots, T$:

- only the $x_{jt}$ are observable,
- the $f_{kt}$ are the common factors,
- the $\lambda_{jk}$ are the factor loadings (sensitivities of the observed variables to the unobserved factors),
- the $\varepsilon_{jt}$ are the idiosyncratic components.

The factor model can be described as follows:

$$x_t = \Lambda^o f_t + \varepsilon_t,$$

where $x_t = (x_{1t}, x_{2t}, \ldots, x_{Nt})'$, $\Lambda^o = (\lambda_1^o, \lambda_2^o, \ldots, \lambda_N^o)'$, $\lambda_i^o$ is the $r \times 1$ vector of factor loadings for variable $i$, and $\varepsilon_t = (\varepsilon_{1t}, \ldots, \varepsilon_{Nt})'$ is the vector of idiosyncratic components of the response variables.

We can describe the model for the complete panel of data by

$$X = F \Lambda^{o\prime} + \varepsilon,$$

where $X = (x_1, \ldots, x_T)'$, $F = (f_1, \ldots, f_T)'$, and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_T)'$.
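
To fix ideas, here is a minimal simulation sketch of the panel form $X = F\Lambda^{o\prime} + \varepsilon$. The dimensions, the noise scale, and the random design are illustrative choices, not values taken from the notes.

```python
# Minimal simulation of the linear factor model X = F Lambda' + eps.
import numpy as np

rng = np.random.default_rng(0)
N, T, r = 100, 200, 3                       # series, time periods, common factors (arbitrary)

F = rng.standard_normal((T, r))             # T x r matrix of common factors
Lam = rng.standard_normal((N, r))           # N x r matrix of factor loadings
eps = 0.5 * rng.standard_normal((T, N))     # T x N idiosyncratic components

X = F @ Lam.T + eps                         # T x N panel of response variables
print(X.shape)                              # (200, 100)
```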

II. Strict and Approximate Factor Models

The APT leads naturally to a factor model, since it begins by assuming that returns follow a factor structure. More precisely,

$$r = E(r) + \beta f + \varepsilon,$$

where $r$ is an $N \times T$ matrix of asset returns, $\beta$ is an $N \times r$ matrix of factor loadings, $f$ is an $r \times T$ matrix of factor scores, and $\varepsilon$ is an $N \times T$ matrix of idiosyncratic components.

The basic assumptions are that $E[f] = 0$, $E[\varepsilon] = 0$, and $E[\varepsilon f'] = 0$, plus that the expected returns and variances exist.

II.1 Strict Factor Models

The covariance matrix of asset returns can be expressed as follows:

$$\Sigma = E[(r - E(r))(r - E(r))']$$

Then,

$$\Sigma = \beta E[ff'] \beta' + V,$$

where $V = E[\varepsilon\varepsilon']$.

In a strict factor model the idiosyncratic components are uncorrelated, which implies that the matrix $V$ is diagonal.

Therefore, in a strict factor model the covariance matrix of the response variables can be decomposed into two matrices: one corresponding to the common components, with rank $r$, and one corresponding to the idiosyncratic components, with rank $\min(N, T)$.

Note that the variance of the idiosyncratic component of a portfolio with weight $\omega_i$ in each asset is equal to $\omega'V\omega$. Since the portfolio weights must sum to one and the average portfolio weight is $1/N$, we have that

$$\omega'V\omega = \sum_{i=1}^{N} \omega_i^2 \sigma_i^2 \le \left(\max_i \sigma_i^2\right) \sum_{i=1}^{N} \omega_i^2 .$$

Then, as long as all idiosyncratic variances have a finite upper bound and the sum of squared weights shrinks as assets are added (with equal weights $\omega_i = 1/N$, for example, $\sum_i \omega_i^2 = 1/N$), the idiosyncratic component of the portfolio converges to zero as $N$ goes to infinity.
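
This $1/N$ behaviour is easy to check numerically. The sketch below uses an equal-weighted portfolio and arbitrarily drawn (but bounded) idiosyncratic variances; it is only an illustration of the bound above.

```python
# Idiosyncratic variance of an equal-weighted portfolio shrinks like 1/N.
import numpy as np

rng = np.random.default_rng(0)
for N in (10, 100, 1_000, 10_000):
    sigma2 = rng.uniform(0.5, 2.0, size=N)   # bounded idiosyncratic variances (arbitrary)
    w = np.full(N, 1.0 / N)                  # equal weights, summing to one
    port_var = np.sum(w**2 * sigma2)         # w'Vw with diagonal V
    print(N, port_var)                       # roughly proportional to 1/N
```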

II.2 Approximate Factor Models

A strict factor model imposes too severe a restriction on the covariance matrix when $N/r$ is large. In the case of stock returns, where $r$ is less than 10 and $N$ is greater than 1000, this restriction becomes a real problem.

The approximate factor model preserves the diversifiability of the idiosyncratic component but weakens the diagonality condition on $V$. In other words, it allows for $\operatorname{cov}(\varepsilon_i, \varepsilon_j) \neq 0$. It also imposes conditions that ensure that the factor risks are not diversifiable.

We still assume that $E[f] = 0$, $E[\varepsilon] = 0$, and $E[\varepsilon f'] = 0$, so we can still express the covariance matrix of asset returns as

$$\Sigma = \beta E[ff'] \beta' + V.$$

Diversifiability of idiosyncratic risk

We say $\varepsilon$ is diversifiable risk if $\lim_{n\to\infty}\omega_n'\omega_n = 0$ implies that $\lim_{n\to\infty} E[(\omega_n'\varepsilon)^2] = 0$, where $\omega_n$ is the vector of weights of a portfolio formed with $n$ assets.
NON-diversifiability of common risk (or that the factors are pervasive)

Let's define $z_j$ as an $r$-vector with a 1 in the $j$th component and 0 elsewhere. Then, the $r$ factors $f$ are pervasive if for each $z_j$, $j = 1, \ldots, r$, there exists a sequence of portfolio weights $\omega_n$ such that $\lim_{n\to\infty}\omega_n'\omega_n = 0$ and $\omega_n'\beta = z_j'$ for all $n$.

This condition ensures that each factor affects many assets in the economy.

II.3 Eigenvectors, eigenvalues and choice of rotation

Connor and Korajczyk (1986, 1988) showed that the r eigenvectors corresponding to the largest r
eigenvalues are consistent estimators of the r underlying common factors. But the estimated factors
are unique only up to rotation.

More specifically, there is an indeterminacy of rotation in the definition of the factors and betas in the equation $r = E(r) + \beta f + \varepsilon$.

For example, given $f$ and $\beta$, consider any nonsingular $r \times r$ matrix $L$ such that $\beta^* = \beta L$ and $f^* = L^{-1} f$.

Replacing $\beta$ with $\beta^*$ and $f$ with $f^*$ yields an observationally equivalent data-generating process.

A common procedure (when $T < N$) is to normalize the factors such that $E[ff'] = I_r$. In this case, the estimated factors $\hat{f}$ are $\sqrt{T}$ times the eigenvectors corresponding to the $r$ largest eigenvalues of the $T \times T$ matrix $XX'$ (where $X$ is the $T \times N$ matrix of response variables). At the same time, the matrix of factor loadings becomes $\hat{\Lambda}' = (\hat{f}'\hat{f})^{-1}\hat{f}'X = \hat{f}'X / T$.

Note that if we normalize the factors as in the above case we have that

$$\Sigma^* = \Lambda\Lambda' + V.$$

The diversifiability of the idiosyncratic component and the pervasiveness of the factors imply that, as $N$ goes to infinity, the eigenvalues of $V$ remain bounded while the eigenvalues of $\Lambda\Lambda'$ go to infinity (note: if we divide the matrix by $NT$, then the eigenvalues of $\Lambda\Lambda'$ are bounded while those of $V$ converge to zero). This property is what will allow us to construct tests for the number of factors.
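
A sketch of this principal-components recipe, assuming the number of factors is known (here $r = 3$) and using simulated data; the design is illustrative only. Up to a rotation, `f_hat` spans approximately the same space as the true factors.

```python
# Principal-components estimation under the normalization f'f/T = I_r:
# f_hat = sqrt(T) times the eigenvectors of XX' for the r largest eigenvalues,
# Lambda_hat = X' f_hat / T.
import numpy as np

rng = np.random.default_rng(0)
N, T, r = 100, 200, 3
X = rng.standard_normal((T, r)) @ rng.standard_normal((r, N)) \
    + 0.5 * rng.standard_normal((T, N))      # simulated T x N panel

eigval, eigvec = np.linalg.eigh(X @ X.T)     # eigenvalues returned in ascending order
idx = np.argsort(eigval)[::-1][:r]           # indices of the r largest eigenvalues
f_hat = np.sqrt(T) * eigvec[:, idx]          # T x r estimated factors
Lam_hat = X.T @ f_hat / T                    # N x r estimated loadings
print(f_hat.shape, Lam_hat.shape)            # (200, 3) (100, 3)
```
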
III. Estimation of the number of factors

III.1 Digression to Asymptotic Theory

III.1.a Limits

The most fundamental concept is that of limit.

Definition: let $\{b_n\}$ be a sequence of real numbers. If there exists a real number $b$ such that for every $\varepsilon > 0$ there exists an integer $N(\varepsilon)$ such that for all $n \ge N(\varepsilon)$, $|b_n - b| < \varepsilon$, then $b$ is the limit of the sequence $\{b_n\}$.

When a limit exists, we say that the sequence $\{b_n\}$ converges to $b$ as $n$ tends to infinity, written $b_n \to b$ as $n \to \infty$.

Examples:

i. If $b_n = 1 + 1/n$ then $b_n \to 1$.

ii. If $b_n = (1 + a/n)^n$ then $b_n \to e^a$.

iii. If $b_n = n^2$ then $b_n \to \infty$.

iv. If $b_n = (-1)^n$ then no limit exists.
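
A quick numerical check of example (ii), purely illustrative:

```python
# (1 + a/n)^n approaches e^a as n grows.
import math

a = 2.0
for n in (10, 100, 10_000, 1_000_000):
    print(n, (1 + a / n) ** n)
print("e^a =", math.exp(a))
```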

III.1.b Order of Magnitude

It is often useful to have an order of magnitude of a particular sequence without worrying precisely about its convergence. The following definition compares the behavior of a sequence $\{b_n\}$ with the behavior of a power of $n$, say $n^\lambda$, where $\lambda$ is chosen such that $\{b_n\}$ and $\{n^\lambda\}$ behave similarly.

Definition:

i. The sequence $\{b_n\}$ is at most of order $n^\lambda$, denoted $O(n^\lambda)$, if and only if for some real number $\Delta$, $0 < \Delta < \infty$, there exists a finite integer $N$ such that for all $n \ge N$, $|n^{-\lambda} b_n| < \Delta$.

ii. The sequence $\{b_n\}$ is of order smaller than $n^\lambda$, denoted $o(n^\lambda)$, if and only if for every real number $\delta > 0$, there exists a finite integer $N(\delta)$ such that for all $n \ge N(\delta)$, $|n^{-\lambda} b_n| < \delta$.

Note: we take the number $\Delta$ to be a positive real number as large as necessary and the number $\delta$ to be a positive real number as small as necessary.
Important

- $\{b_n\}$ is $O(n^\lambda)$ if $\{n^{-\lambda} b_n\}$ is eventually bounded, whereas $\{b_n\}$ is $o(n^\lambda)$ if $n^{-\lambda} b_n \to 0$.
- Obviously, if $\{b_n\}$ is $o(n^\lambda)$ then it is also $O(n^\lambda)$.
- Further, if $\{b_n\}$ is $O(n^\lambda)$, then for every $\delta > 0$, $\{b_n\}$ is $o(n^{\lambda+\delta})$.
- When $\{b_n\}$ is $O(n^0)$ it is simply bounded and may or may not have a limit; this is written $O(1)$.
- When $\{b_n\}$ is $o(1)$, it means that $b_n \to 0$.

Examples:

i. If $b_n = 4 + 2n + 6n^2$ then $b_n$ is $O(n^2)$ and $o(n^{2+\delta})$ for every $\delta > 0$.

ii. If $b_n = (-1)^n$ then $b_n$ is $O(1)$ and $o(n^\delta)$ for every $\delta > 0$.

iii. If $b_n = e^n$ then $b_n$ is not $o(n^\lambda)$ for any $\lambda > 0$, and it is not $O(n^\lambda)$ either.
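
These orders can be checked numerically; for example (i), $n^{-2} b_n$ settles near 6 while $n^{-(2+\delta)} b_n$ drifts toward zero (here with $\delta = 0.1$):

```python
# b_n = 4 + 2n + 6n^2 is O(n^2): n^{-2} b_n stays bounded; n^{-2.1} b_n -> 0.
for n in (10, 100, 1_000, 10_000):
    b = 4 + 2 * n + 6 * n**2
    print(n, b / n**2, b / n**2.1)
```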

Properties: let $\{a_n\}$ and $\{b_n\}$ be sequences of scalars. Then,

i. if $\{a_n\}$ is $O(n^\lambda)$ and $\{b_n\}$ is $O(n^\mu)$, then $\{a_n b_n\}$ is $O(n^{\lambda+\mu})$;

ii. if $\{a_n\}$ is $O(n^\lambda)$ and $\{b_n\}$ is $O(n^\mu)$, then $\{a_n + b_n\}$ is $O(n^\kappa)$, where $\kappa = \max(\lambda, \mu)$;

iii. if $\{a_n\}$ is $o(n^\lambda)$ and $\{b_n\}$ is $o(n^\mu)$, then $\{a_n b_n\}$ is $o(n^{\lambda+\mu})$ and $\{a_n + b_n\}$ is $o(n^\kappa)$;

iv. if $\{a_n\}$ is $O(n^\lambda)$ and $\{b_n\}$ is $o(n^\mu)$, then $\{a_n b_n\}$ is $o(n^{\lambda+\mu})$ and $\{a_n + b_n\}$ is $O(n^\kappa)$.

Proof (i): Since $\{a_n\}$ is $O(n^\lambda)$ and $\{b_n\}$ is $O(n^\mu)$, there exist $0 < \Delta < \infty$ and an $N$ such that for all $n \ge N$, $|n^{-\lambda} a_n| < \Delta$ and $|n^{-\mu} b_n| < \Delta$. Consider $\{a_n b_n\}$. Now $|n^{-(\lambda+\mu)} a_n b_n| = |n^{-\lambda} a_n|\,|n^{-\mu} b_n| < \Delta^2$ for all $n \ge N$. Hence $\{a_n b_n\}$ is $O(n^{\lambda+\mu})$.

Proof (ii): Consider $\{a_n + b_n\}$. Now, $|n^{-\kappa}(a_n + b_n)| = |n^{-\kappa} a_n + n^{-\kappa} b_n| \le |n^{-\kappa} a_n| + |n^{-\kappa} b_n|$ by the triangle inequality. Since $\kappa \ge \lambda$ and $\kappa \ge \mu$, $|n^{-\kappa}(a_n + b_n)| \le |n^{-\kappa} a_n| + |n^{-\kappa} b_n| \le |n^{-\lambda} a_n| + |n^{-\mu} b_n| < 2\Delta$ for all $n \ge N$. Hence, $\{a_n + b_n\}$ is $O(n^\kappa)$.

III.2 Latent approach. Estimation of the total number of factors

Let's start with the factor model $X = F\Lambda^{o\prime} + \varepsilon$. Let's define $m \equiv \min(N, T)$ and assume that $F$ is a $T \times r$ matrix with $T > r$.

We also use $\psi_k(A)$ to denote the $k$th largest eigenvalue of a positive semi-definite matrix $A$. Define

$$\tilde{\mu}_{NT,k} \equiv \psi_k\!\left[\frac{XX'}{NT}\right] = \psi_k\!\left[\frac{X'X}{NT}\right],$$

where $k = 1, \ldots, m$ and $m = \min(T, N)$. Then, $T\tilde{\mu}_{NT,k}$ ($N\tilde{\mu}_{NT,k}$) is the $k$th largest eigenvalue of the sample covariance matrix of the $x_i$ (of the $x_t$) if the means of the $x_{it}$ are all zero.

Under some standard assumptions based on Bai and Ng (2002), Ahn and Horenstein (2012) prove that the convergence rates of the first $r$ eigenvalues of $XX'/(NT)$ are determined by the convergence rates of the eigenvalues of $F\Lambda^{o\prime}\Lambda^o F'/(NT)$, while those of the other eigenvalues of $XX'/(NT)$ are determined by those of the eigenvalues of $\varepsilon\varepsilon'/(NT)$. More specifically, assuming for simplicity that no sample eigenvalue of $XX'$ is equal to zero, we can show that

1. $\tilde{\mu}_{NT,k} = O_p(1)$ for $k \le r$,
2. $\tilde{\mu}_{NT,k} = O_p(m^{-1})$ for $k > r$.

Given these results we can find the number of factors using the following three methods.

III.2.a Scree Test

The Scree Test developed by Cattell (1966) is a visual way to find the number of factors from the eigenvalues of $XX'$. It consists of plotting the eigenvalues of $XX'$ in descending order and looking for a relatively sharp break where the true number of factors ends and the detritus, presumably due to error factors, begins. From the analogy with the steep descent of a mountain until one comes to the scree of rubble at its foot, Cattell called this the Scree Test.

For example, if r=5, the Scree Test might look like:


[Figure: scree plot of the 20 largest eigenvalues (y-axis: eigenvalue, roughly 0 to 1.4) against $k = 1, \ldots, 20$ (x-axis: $k$); the eigenvalues drop sharply after $k = 5$.]
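
A sketch that reproduces such a scree plot from simulated data with $r = 5$; all simulation settings are illustrative and `matplotlib` is assumed to be available for plotting.

```python
# Scree plot: eigenvalues of XX'/(NT) from a simulated panel with r = 5 factors.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
N, T, r = 200, 100, 5
X = rng.standard_normal((T, r)) @ rng.standard_normal((r, N)) \
    + rng.standard_normal((T, N))

mu = np.sort(np.linalg.eigvalsh(X @ X.T / (N * T)))[::-1]   # descending eigenvalues

plt.plot(range(1, 21), mu[:20], marker="o")
plt.xlabel("k")
plt.ylabel("eigenvalue")
plt.show()                                   # a sharp break should appear after k = 5
```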

III.2.b Bai and Ng (2002)

This method provides a consistent way to estimate the number of factors. While the Scree Test is only a visual device, Bai and Ng provided the first consistent method to estimate the number of factors in panels of large dimensions.

The method is based on a penalty function. Intuitively, the threshold separates information from noise by converging to zero at a slower rate than the eigenvalues of the idiosyncratic component.

More specifically, given that under their assumptions

1. $\tilde{\mu}_{NT,k} = O_p(1)$ for $k \le r$,
2. $\tilde{\mu}_{NT,k} = O_p(m^{-1})$ for $k > r$,

constructing a penalty function $g(N, T)$ such that $g(N, T) \to 0$ while $m\, g(N, T) \to \infty$ as $N, T \to \infty$ is enough to separate information from noise.

The estimated number of factors is then the number of $\tilde{\mu}_{NT,k}$ such that $\tilde{\mu}_{NT,k} > g(N, T)$.

Bai and Ng (2002) proposed several thresholds; in fact, there is an infinite number of valid penalty functions. As an example, the penalty called BIC3 has the following functional form:

$$g(N, T) = \hat{\sigma}^2\, \frac{N + T - k}{NT} \ln(NT),$$

where $\hat{\sigma}^2$ is the average of the squared residuals from a regression of the response variables on the maximum number of factors to test for.


Note: if $k_{\max}$ is the maximum number of factors to test for, then $\hat{\sigma}^2 = \sum_{j=k_{\max}+1}^{m} \tilde{\mu}_{NT,j}$.

Exercise: prove that $g(N, T) = \hat{\sigma}^2\, \frac{N + T - k}{NT} \ln(NT)$ is a consistent penalty function.
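
A sketch of the resulting counting rule, using the BIC3-type penalty above and computing $\hat{\sigma}^2$ as in the Note. The helper name `bai_ng_count` and the simulated design are illustrative; this is not Bai and Ng's own code.

```python
# Count the eigenvalues of XX'/(NT) that exceed the BIC3-type penalty g(N, T).
import numpy as np

def bai_ng_count(X, kmax):
    T, N = X.shape
    m = min(N, T)
    mu = np.sort(np.linalg.eigvalsh(X @ X.T / (N * T)))[::-1]   # descending eigenvalues
    sigma2 = mu[kmax:m].sum()                # sum of eigenvalues beyond kmax (see the Note)
    count = 0
    for k in range(1, kmax + 1):
        g = sigma2 * (N + T - k) / (N * T) * np.log(N * T)      # BIC3-type threshold
        if mu[k - 1] > g:
            count += 1
    return count

rng = np.random.default_rng(0)
N, T, r = 200, 100, 3
X = rng.standard_normal((T, r)) @ rng.standard_normal((r, N)) \
    + rng.standard_normal((T, N))
print(bai_ng_count(X, kmax=8))               # expected to report 3 in this simulation
```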

III.2.c Ahn and Horenstein (2012)

Create a sequence of ratios of adjacent eigenvalues:

$$\frac{\tilde{\mu}_{NT,1}}{\tilde{\mu}_{NT,2}};\ \frac{\tilde{\mu}_{NT,2}}{\tilde{\mu}_{NT,3}};\ \ldots;\ \frac{\tilde{\mu}_{NT,k_{\max}}}{\tilde{\mu}_{NT,k_{\max}+1}}, \qquad \text{where } k_{\max} > r.$$

(Simplified) Theorem: If the kth ratio of the sequence is the maximum, then k is a consistent
estimator of the number of factors.

$$\hat{k}_{ER} = \arg\max_{k \le k_{\max}} \frac{\tilde{\mu}_{NT,k}}{\tilde{\mu}_{NT,k+1}}$$

Graphically,

[Figure: eigenvalue ratios $\tilde{\mu}_{NT,k}/\tilde{\mu}_{NT,k+1}$ (y-axis, roughly 0 to 25) against $k = 1, \ldots, 19$ (x-axis); the ratio spikes at $k = \hat{k}_{ER}$.]

Why does it work?

Remember that,

1. $\tilde{\mu}_{NT,k} = O_p(1)$ for $k \le r$,
2. $\tilde{\mu}_{NT,k} = O_p(m^{-1})$ for $k > r$.

From 1, $\frac{\tilde{\mu}_{NT,1}}{\tilde{\mu}_{NT,2}}, \ldots, \frac{\tilde{\mu}_{NT,r-1}}{\tilde{\mu}_{NT,r}}$ are bounded, positive, real numbers.

From 2, $\frac{\tilde{\mu}_{NT,r+1}}{\tilde{\mu}_{NT,r+2}}, \ldots, \frac{\tilde{\mu}_{NT,k_{\max}}}{\tilde{\mu}_{NT,k_{\max}+1}}$ are also bounded, positive, real numbers.

However, at $k = r$ the ratio is $\frac{\tilde{\mu}_{NT,r}}{\tilde{\mu}_{NT,r+1}} = \frac{O_p(1)}{O_p(m^{-1})} = O_p(m)$, and it is the only ratio that explodes to infinity.
Therefore,

Theorem 1: Suppose that $r \ge 1$. Then, under Assumptions A-F, there exists a real number $d_c \in (0, 1]$ such that, for a choice of integer $k_{\max} \in (r, [d_c m] - 2r - 1]$,

$$\lim_{m \to \infty} \Pr\left(\hat{k}_{ER} = r\right) = 1.$$
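
A sketch of the ER estimator on simulated data; the helper name `eigenvalue_ratio` and the simulation design are illustrative.

```python
# Eigenvalue-ratio (ER) estimator: k_ER = argmax over k <= kmax of mu_k / mu_{k+1}.
import numpy as np

def eigenvalue_ratio(X, kmax):
    T, N = X.shape
    mu = np.sort(np.linalg.eigvalsh(X @ X.T / (N * T)))[::-1]   # descending eigenvalues
    ratios = mu[:kmax] / mu[1:kmax + 1]      # mu_k / mu_{k+1}, k = 1, ..., kmax
    return int(np.argmax(ratios)) + 1        # +1 because Python indexing starts at 0

rng = np.random.default_rng(0)
N, T, r = 200, 100, 3
X = rng.standard_normal((T, r)) @ rng.standard_normal((r, N)) \
    + rng.standard_normal((T, N))
print(eigenvalue_ratio(X, kmax=8))           # expected to pick 3 here
```
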
III.3 Estimation of the number of factors among a set of factor candidates

We have already seen that, in order to apply the Fama-MacBeth method, we need the true beta matrix to be of full rank; otherwise all the risk-premium estimators in the second pass are undefined.

Ahn, Horenstein and Wang (2012)

Let $x_{it}$ be the response variable for the $i$th cross-section unit at time $t$, where $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, T$. For example, $x_{it}$ can be the (excess) return on asset $i$ at time $t$. The response variables $x_{it}$ depend on the individual effect $\alpha_i$, the time effect $\theta_t$, and the $k$ factor-candidate variables in $f_t = (f_{1t}, f_{2t}, \ldots, f_{kt})'$. That is,

$$x_{it} = \alpha_i + \theta_t + \beta_i' f_t + \varepsilon_{it}, \qquad (1)$$

where $\beta_i = (\beta_{i1}, \beta_{i2}, \ldots, \beta_{ik})'$ is the beta vector for cross-section unit $i$. The product $\beta_i' f_t$ is the common component of $x_{it}$, and the $\varepsilon_{it}$ are idiosyncratic components or idiosyncratic risks.¹

Our interest in model (1) is to estimate the rank of the beta matrix $B = (\beta_1, \beta_2, \ldots, \beta_N)'$. However, because of the presence of the time effects $\theta_t$, we are unable to estimate $\beta_i$ itself. Instead, we can estimate the demeaned betas, $\tilde{\beta}_i = \beta_i - \bar{\beta}$, where $\bar{\beta} = N^{-1}\sum_{i=1}^{N}\beta_i$.

The demeaned betas can be estimated from the following double-demeaned model:

$$\tilde{x}_{it} = \tilde{\beta}_i' \tilde{f}_t + \tilde{\varepsilon}_{it}, \qquad (2)$$

where $\tilde{x}_{it} = x_{it} - \bar{x}_t - \bar{x}_i + \bar{x}$, $\tilde{f}_t = f_t - \bar{f}$, $\tilde{\varepsilon}_{it} = \varepsilon_{it} - \bar{\varepsilon}_t - \bar{\varepsilon}_i + \bar{\varepsilon}$, $\bar{x}_t = N^{-1}\sum_{i=1}^{N} x_{it}$, $\bar{x}_i = T^{-1}\sum_{t=1}^{T} x_{it}$, $\bar{x} = (NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} x_{it}$, $\bar{f} = (\sum_{t=1}^{T} f_t)/T$, and $\bar{\varepsilon}_t$, $\bar{\varepsilon}_i$, and $\bar{\varepsilon}$ are similarly defined. For each cross-section unit $i$, model (2) can be written as

$$\underset{(T\times 1)}{\tilde{x}_i} \;=\; \underset{(T\times k)}{\tilde{F}}\; \underset{(k\times 1)}{\tilde{\beta}_i} \;+\; \underset{(T\times 1)}{\tilde{\varepsilon}_i},$$

where $\tilde{x}_i = (\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots, \tilde{x}_{iT})'$, $\tilde{\varepsilon}_i$ is similarly defined, and $\tilde{F} = (\tilde{f}_1, \tilde{f}_2, \ldots, \tilde{f}_T)'$. For all the data, we have

$$\underset{(T\times N)}{\tilde{X}} \;=\; \underset{(T\times k)}{\tilde{F}}\; \underset{(k\times N)}{B_d'} \;+\; \underset{(T\times N)}{\tilde{\varepsilon}},$$

where $\tilde{X} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N)$ and $\tilde{\varepsilon} = (\tilde{\varepsilon}_1, \tilde{\varepsilon}_2, \ldots, \tilde{\varepsilon}_N)$. Then, the demeaned beta matrix $B_d = (\tilde{\beta}_1, \tilde{\beta}_2, \ldots, \tilde{\beta}_N)'$ can be estimated by the OLS estimator $\hat{B}_d = \tilde{X}'\tilde{F}(\tilde{F}'\tilde{F})^{-1}$.

¹ In this model we consider only the case of time-invariant betas. Our method can be easily extended to the case of time-varying betas, since the rank estimation is based on the estimated beta matrix.
In what follows, we use $\psi_j(A)$ to denote the $j$th largest eigenvalue of a matrix $A$. Then, we define $\hat{\mu}_{NT,j} = \psi_j(\hat{B}_d'\hat{B}_d / N)$, where the subscript $j$ indicates that $\hat{\mu}_{NT,j}$ is the $j$th largest eigenvalue of the matrix $\hat{B}_d'\hat{B}_d / N$.

Under a set of fairly common assumptions the following theorem holds:

Theorem 1: Under Assumptions A-D, (i) $\operatorname{plim}_{T\to\infty}\hat{\mu}_{NT,j} > 0$ for $0 < j \le r$; and (ii) $\hat{\mu}_{NT,j} = O_p(T^{-1})$ for $0 \le r < j \le k$.

Theorem 1 shows that the first $r > 0$ largest eigenvalues of $\hat{B}_d'\hat{B}_d / N$ share the same convergence rate, which is different from that of the other eigenvalues. This difference in convergence rates is used to identify $r$, the rank of the matrix $B_d$.

The following theorem defines the consistent estimator that we call the Threshold estimator.

Theorem 2 (Threshold Estimator): For a given threshold function $g(T) > 0$ such that $g(T) \to 0$ and $T g(T) \to \infty$ as $T \to \infty$, define $\hat{r}_{TH} = \#\{1 \le j \le k : \hat{\mu}_{NT,j} > g(T)\}$, where $\#\{\cdot\}$ denotes the cardinality of a set. Then, under Assumptions A-D, $\lim_{T\to\infty}\Pr(\hat{r}_{TH} = r) = 1$.

Let $\hat{\sigma}^2 = [(N-1)(T-1)]^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{e}_{it}^2$, where the $\tilde{e}_{it}$ are the OLS residuals from the regression of the double-demeaned model (2). The estimator $\hat{\sigma}^2$ is a consistent estimator of $\operatorname{var}(\varepsilon_{it})$. Also, let $R^2 = 1 - [\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{e}_{it}^2] / [\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{x}_{it}^2]$ be the R-squared from the OLS regression of model (2). Then, the threshold function we suggest for the Threshold estimator is given by

$$g(\hat{d}, T) = \frac{\hat{d}\,\hat{\sigma}^2}{T^{\hat{d}}}, \qquad (3)$$

where $\hat{d} = 1 - R^2$ for $0.3 \le 1 - R^2 \le 0.8$, $\hat{d} = 0.3$ for $1 - R^2 < 0.3$, and $\hat{d} = 0.8$ for $1 - R^2 > 0.8$.
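
A sketch of the whole procedure on simulated data: double-demean the panel, estimate the betas by OLS, and count the eigenvalues of $\hat{B}_d'\hat{B}_d/N$ that exceed the threshold in (3). The helper name `threshold_rank` and the simulated design (one pervasive factor plus two useless candidates) are illustrative assumptions, not code from the paper.

```python
# Threshold estimator for the rank of the beta matrix.
# x: T x N panel of response variables, f: T x k matrix of factor candidates.
import numpy as np

def threshold_rank(x, f):
    T, N = x.shape
    x_dd = x - x.mean(0) - x.mean(1, keepdims=True) + x.mean()   # double-demeaned x
    f_dm = f - f.mean(0)                                         # demeaned factor candidates
    B_hat = x_dd.T @ f_dm @ np.linalg.inv(f_dm.T @ f_dm)         # N x k OLS betas
    mu = np.sort(np.linalg.eigvalsh(B_hat.T @ B_hat / N))[::-1]  # descending eigenvalues
    resid = x_dd - f_dm @ B_hat.T
    sigma2 = (resid**2).sum() / ((N - 1) * (T - 1))
    R2 = 1 - (resid**2).sum() / (x_dd**2).sum()
    d = np.clip(1 - R2, 0.3, 0.8)
    g = d * sigma2 / T**d                                        # threshold in equation (3)
    return int(np.sum(mu > g))

rng = np.random.default_rng(0)
N, T = 200, 300
f_true = rng.standard_normal((T, 1))
beta = rng.standard_normal((N, 1))
x = f_true @ beta.T + rng.standard_normal((T, N))                # one pervasive factor
f_cand = np.column_stack([f_true, rng.standard_normal((T, 2))])  # one true + two useless
print(threshold_rank(x, f_cand))                                 # expected to report 1 here
```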

Intuition

Assume:

1- The CAPM is the true model (a one-factor model) and we observe the market portfolio. Thus, $r = 1$.

2- We use the Fama-French three-factor model to estimate betas:

$$r_{it} - r_{ft} = \alpha_i + \beta_{iM}(r_{Mt} - r_{ft}) + \beta_{i,SMB}\, SMB_t + \beta_{i,HML}\, HML_t + \varepsilon_{it}$$

From the above regression we obtain the $N \times 3$ matrix of estimated betas:

$$\hat{B} = \begin{pmatrix} \hat{\beta}_{1M} & \hat{\beta}_{1,SMB} & \hat{\beta}_{1,HML} \\ \hat{\beta}_{2M} & \hat{\beta}_{2,SMB} & \hat{\beta}_{2,HML} \\ \vdots & \vdots & \vdots \\ \hat{\beta}_{NM} & \hat{\beta}_{N,SMB} & \hat{\beta}_{N,HML} \end{pmatrix}_{N \times 3}$$

Then,

$$\hat{B}'\hat{B} = \begin{pmatrix} \sum_{i=1}^{N}\hat{\beta}_{iM}^2 & \sum_{i=1}^{N}\hat{\beta}_{iM}\hat{\beta}_{i,SMB} & \sum_{i=1}^{N}\hat{\beta}_{iM}\hat{\beta}_{i,HML} \\ \sum_{i=1}^{N}\hat{\beta}_{iM}\hat{\beta}_{i,SMB} & \sum_{i=1}^{N}\hat{\beta}_{i,SMB}^2 & \sum_{i=1}^{N}\hat{\beta}_{i,SMB}\hat{\beta}_{i,HML} \\ \sum_{i=1}^{N}\hat{\beta}_{iM}\hat{\beta}_{i,HML} & \sum_{i=1}^{N}\hat{\beta}_{i,SMB}\hat{\beta}_{i,HML} & \sum_{i=1}^{N}\hat{\beta}_{i,HML}^2 \end{pmatrix}_{3\times 3}$$

But HML and SMB are useless factors! Then, as $T \to \infty$, we have $\hat{\beta}_{i,SMB}, \hat{\beta}_{i,HML} \to_p 0$ for all $i$, so

$$\frac{1}{N}\hat{B}'\hat{B} = \begin{pmatrix} N^{-1}\sum_{i=1}^{N}\hat{\beta}_{iM}^2 & N^{-1}\sum_{i=1}^{N}\hat{\beta}_{iM}\hat{\beta}_{i,SMB} & N^{-1}\sum_{i=1}^{N}\hat{\beta}_{iM}\hat{\beta}_{i,HML} \\ N^{-1}\sum_{i=1}^{N}\hat{\beta}_{iM}\hat{\beta}_{i,SMB} & N^{-1}\sum_{i=1}^{N}\hat{\beta}_{i,SMB}^2 & N^{-1}\sum_{i=1}^{N}\hat{\beta}_{i,SMB}\hat{\beta}_{i,HML} \\ N^{-1}\sum_{i=1}^{N}\hat{\beta}_{iM}\hat{\beta}_{i,HML} & N^{-1}\sum_{i=1}^{N}\hat{\beta}_{i,SMB}\hat{\beta}_{i,HML} & N^{-1}\sum_{i=1}^{N}\hat{\beta}_{i,HML}^2 \end{pmatrix}_{3\times 3} \xrightarrow{\;p\;} \begin{pmatrix} N^{-1}\sum_{i=1}^{N}\beta_{iM}^2 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}_{3\times 3}$$



$\hat{B}'\hat{B}/N$ has 3 eigenvalues: $\hat{\mu}_M$, $\hat{\mu}_{SMB}$, $\hat{\mu}_{HML}$.

- $\hat{\mu}_M$ converges in probability to a positive real number.
- $\hat{\mu}_{SMB}$ and $\hat{\mu}_{HML}$ converge in probability to zero at a rate $O_p(T^{-1})$.

Therefore, we propose a threshold function $g(T)$ that converges to zero at a rate slower than $O_p(T^{-1})$.

Conclusion: as $T \to \infty$, $\hat{\mu}_M > g(T) > \hat{\mu}_{SMB}, \hat{\mu}_{HML}$.
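
A small simulation consistent with this intuition: the eigenvalue associated with a useless factor shrinks roughly like $1/T$, while the market eigenvalue stays bounded away from zero. All simulation settings below are illustrative.

```python
# Eigenvalues of B_hat'B_hat / N with one pervasive (market) factor and one useless factor.
import numpy as np

rng = np.random.default_rng(0)
N = 500
beta_M = 0.8 + 0.4 * rng.standard_normal(N)        # true market betas (arbitrary spread)

for T in (100, 400, 1600):
    mkt = rng.standard_normal(T)                    # simulated market excess return
    useless = rng.standard_normal(T)                # factor unrelated to returns
    ret = np.outer(mkt, beta_M) + rng.standard_normal((T, N))   # T x N excess returns
    F = np.column_stack([mkt, useless])
    B_hat = ret.T @ F @ np.linalg.inv(F.T @ F)      # N x 2 estimated betas
    mu = np.sort(np.linalg.eigvalsh(B_hat.T @ B_hat / N))[::-1]
    print(T, mu)   # first eigenvalue stays near the mean of beta_M^2; second shrinks roughly like 1/T
```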
