Recall that multivariate linear regression can be written in vector form. Suppose the dependent variable vector is $y \in \mathbb{R}^n$, the covariate vector is $x \in \mathbb{R}^p$ with sample matrix $X \in \mathbb{R}^{n \times p}$, the coefficient vector is $\beta \in \mathbb{R}^p$, and the error random vector is $\varepsilon \in \mathbb{R}^n$. Then the regression equation is
$$y = X\beta + \varepsilon$$
The assumptions of multilinear regression are as follows:
A1: Linearity
A2: No autocorrelation and homoscedasticity, $\mathrm{Var}(\varepsilon \mid X) = \sigma^2 I$
A3: Rank condition, $\mathrm{rank}(X) = p$
A4: Exogeneity, $\mathrm{Cov}(\varepsilon, X) = 0$
A5: Normality, $\varepsilon \sim N(0, \sigma^2 I)$
Under these assumptions, the OLS estimator is the uniformly minimum-variance unbiased estimator (UMVUE) of $\beta$.
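As a quick numerical illustration (not from the text), the following is a minimal numpy sketch of the classical vector model above; all variable names and sizes are illustrative.

import numpy as np

# A minimal OLS sketch under A1-A5.
rng = np.random.default_rng(0)
n, p = 200, 3                        # n samples, p covariates
X = rng.normal(size=(n, p))          # sample matrix, rank(X) = p (A3)
beta = np.array([1.0, -2.0, 0.5])    # true coefficient vector
eps = rng.normal(scale=0.1, size=n)  # eps ~ N(0, sigma^2 I): A2 and A5
y = X @ beta + eps                   # A1: linearity

# OLS estimate: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                      # close to the true beta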
We can generalize this regression to tensor form. To set up the tensor regression model, we need the following: (a) how to express the elements of a tensor space; (b) what the operator between $X$ and $\beta$ is in tensor space; (c) how to define a random variable in tensor space. To define a tensor linear regression model, we will first work out (a), (b), and (c).
A tensor of order $d$ can be viewed as a multilinear map $\Phi: (\mathbb{R}^n)^d \to \mathbb{R}$; this means $\Phi$ keeps linearity in every component. The addition of tensors is defined as
$$(\Phi + \Psi)(u_1, u_2, \cdots, u_d) = \Phi(u_1, u_2, \cdots, u_d) + \Psi(u_1, u_2, \cdots, u_d)$$
and the scalar multiplication of a tensor is defined, for all $\alpha \in \mathbb{R}$, as
$$(\alpha\Phi)(u_1, u_2, \cdots, u_d) = \alpha\, \Phi(u_1, u_2, \cdots, u_d)$$
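A small numpy sketch of these two operations (my own illustration, assuming an orthonormal basis so a tensor is just its component array): evaluating an order-3 tensor on vectors via Einstein summation, then checking additivity and homogeneity.

import numpy as np

rng = np.random.default_rng(1)
n = 4
Phi = rng.normal(size=(n, n, n))   # components Phi_{i1 i2 i3}
Psi = rng.normal(size=(n, n, n))
u1, u2, u3 = rng.normal(size=(3, n))

def ev(T, u1, u2, u3):
    # T(u1, u2, u3) = T_{ijk} u1^i u2^j u3^k (Einstein summation)
    return np.einsum('ijk,i,j,k->', T, u1, u2, u3)

# (Phi + Psi)(u) = Phi(u) + Psi(u) and (a Phi)(u) = a Phi(u)
assert np.isclose(ev(Phi + Psi, u1, u2, u3), ev(Phi, u1, u2, u3) + ev(Psi, u1, u2, u3))
assert np.isclose(ev(2.5 * Phi, u1, u2, u3), 2.5 * ev(Phi, u1, u2, u3))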
Below we give the definition of tensor space. Suppose $\{g_i\}_{i=1}^n$ is a basis of $\mathbb{R}^n$; then $\det(g_1, g_2, \cdots, g_n) \neq 0$, so there exists a unique dual basis $\{g^i\}_{i=1}^n$ satisfying $(g_i, g^j) = \delta_i^j$, where $\delta_i^j$ is the Kronecker symbol. Define $g_{ij} = (g_i, g_j)$ and $g^{ij} = (g^i, g^j)$; note that $(\cdot, \cdot)$ is the Euclidean inner product. So a tensor can be expressed using the basis:
$$\Phi(u_1, u_2, \cdots, u_d) = \Phi(u_1^{i_1} g_{i_1}, u_2^{i_2} g_{i_2}, \cdots, u_d^{i_d} g_{i_d}) = u_1^{i_1} u_2^{i_2} \cdots u_d^{i_d}\, \Phi(g_{i_1}, g_{i_2}, \cdots, g_{i_d})$$
Note that $u_1^{i_1} u_2^{i_2} \cdots u_d^{i_d} = g^{i_1} \otimes g^{i_2} \otimes \cdots \otimes g^{i_d}(u_1, u_2, \cdots, u_d)$, and define $\Phi_{i_1 i_2 \cdots i_d} = \Phi(g_{i_1}, g_{i_2}, \cdots, g_{i_d})$ as a component of the tensor $\Phi$. So
$$\Phi = \Phi_{i_1 i_2 \cdots i_d}\, g^{i_1} \otimes g^{i_2} \otimes \cdots \otimes g^{i_d},$$
where $i_j = 1, 2, \cdots, n$ and $j = 1, 2, \cdots, d$.
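The following is a short sketch (my own illustration) of the dual basis and component extraction in a non-orthonormal basis of $\mathbb{R}^n$; the matrix names are illustrative.

import numpy as np

rng = np.random.default_rng(2)
n = 3
G = rng.normal(size=(n, n))          # columns are the basis vectors g_i
assert abs(np.linalg.det(G)) > 1e-9  # det(g_1, ..., g_n) != 0

G_dual = np.linalg.inv(G).T          # columns are the dual basis vectors g^i
assert np.allclose(G.T @ G_dual, np.eye(n))  # (g_i, g^j) = delta_i^j

g_low = G.T @ G                      # g_ij = (g_i, g_j)
g_up = G_dual.T @ G_dual             # g^ij = (g^i, g^j)

M = rng.normal(size=(n, n))          # Phi(u, v) = u^T M v, an order-2 tensor
comp = G.T @ M @ G                   # components Phi_{ij} = Phi(g_i, g_j)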
Below we introduce three operations on tensor space. The first is the tensor product, also called the Kronecker product, defined as
$$\otimes: \mathcal{T}^p \times \mathcal{T}^q \to \mathcal{T}^{p+q}$$
In fact, for all $\Phi \in \mathcal{T}^p$ and $\Psi \in \mathcal{T}^q$, suppose
$$\Phi = \Phi_{i_1 i_2 \cdots i_p}\, g^{i_1} \otimes g^{i_2} \otimes \cdots \otimes g^{i_p}$$
$$\Psi = \Psi_{j_1 j_2 \cdots j_q}\, g^{j_1} \otimes g^{j_2} \otimes \cdots \otimes g^{j_q}$$
Then $\Phi \otimes \Psi = \Phi_{i_1 \cdots i_p} \Psi_{j_1 \cdots j_q}\, g^{i_1} \otimes \cdots \otimes g^{i_p} \otimes g^{j_1} \otimes \cdots \otimes g^{j_q} \in \mathcal{T}^{p+q}$. The second operation is the $e$-fold contraction $\binom{e}{\cdot}$, which pairs $e$ upper indices with $e$ lower indices; the third is the inner product $\odot$, the full contraction of two tensors of the same order. For the regression model, suppose
$$\beta = \beta_{j_1 j_2 \cdots j_q}\, g^{j_1} \otimes g^{j_2} \otimes \cdots \otimes g^{j_q}$$
Then
$$x \binom{e}{\cdot} \beta = x^{k_1 k_2 \cdots k_{p-e} s_1 \cdots s_e}\, \beta_{s_1 \cdots s_e j_{e+1} j_{e+2} \cdots j_q}\; g_{k_1} \otimes g_{k_2} \otimes \cdots \otimes g_{k_{p-e}} \otimes g^{j_{e+1}} \otimes g^{j_{e+2}} \otimes \cdots \otimes g^{j_q}$$
Sorting the basis product $g_{k_1} \otimes g_{k_2} \otimes \cdots \otimes g_{k_{p-e}} \otimes g^{j_{e+1}} \otimes \cdots \otimes g^{j_q}$ properly makes it equal to
$$c\, g_{i_1} \otimes g_{i_2} \otimes \cdots \otimes g_{i_d}$$
Here $c$ is a constant which can be calculated exactly, so it has no impact on $\beta$. Thus the regression model can be written componentwise as
$$y^{i_1 i_2 \cdots i_d} = x^{k_1 k_2 \cdots k_{p-e} s_1 \cdots s_e}\, \beta_{s_1 \cdots s_e j_{e+1} j_{e+2} \cdots j_q} + \varepsilon^{i_1 i_2 \cdots i_d}$$
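A minimal sketch of the $e$-fold contraction with numpy (my own illustration, assuming an orthonormal basis so upper and lower components coincide; the orders are illustrative):

import numpy as np

rng = np.random.default_rng(3)
n, p, q, e = 3, 3, 2, 1
x = rng.normal(size=(n,) * p)        # x in T^p
beta = rng.normal(size=(n,) * q)     # beta in T^q

# contract the last e indices of x against the first e indices of beta
xb = np.tensordot(x, beta, axes=e)   # result in T^{p+q-2e}
assert xb.shape == (n,) * (p + q - 2 * e)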
So the residual is
$$\epsilon = y - x \binom{e}{\cdot} \hat\beta = \left(y^{i_1 i_2 \cdots i_d} - x^{k_1 k_2 \cdots k_{p-e} s_1 \cdots s_e}\, \beta_{s_1 \cdots s_e j_{e+1} j_{e+2} \cdots j_q}\right) g_{i_1} \otimes g_{i_2} \otimes \cdots \otimes g_{i_d}$$
$$\epsilon^2 = \epsilon \odot \epsilon = \left(y^{i_1 i_2 \cdots i_d} - x^{k_1 k_2 \cdots k_{p-e} s_1 \cdots s_e}\, \beta_{s_1 \cdots s_e j_{e+1} \cdots j_q}\right) \left(y_{i_1 i_2 \cdots i_d} - x_{k_1 k_2 \cdots k_{p-e} s_1 \cdots s_e}\, \beta^{s_1 \cdots s_e j_{e+1} \cdots j_q}\right)$$
Minimizing with respect to $\hat\beta$:
$$\frac{\partial \epsilon^2}{\partial \hat\beta} = \frac{\partial}{\partial \beta_{j_1 j_2 \cdots j_q}} \left[\left(y^{i_1 i_2 \cdots i_d} - x^{k_1 \cdots k_{p-e} s_1 \cdots s_e}\, \beta_{s_1 \cdots s_e j_{e+1} \cdots j_q}\right) \left(y_{i_1 i_2 \cdots i_d} - x_{k_1 \cdots k_{p-e} s_1 \cdots s_e}\, \beta^{s_1 \cdots s_e j_{e+1} \cdots j_q}\right)\right]$$
$$= -2\, x^{k_1 k_2 \cdots k_{p-e} s_1 \cdots s_e} \left(y^{i_1 i_2 \cdots i_d} - x^{k_1 \cdots k_{p-e} s_1 \cdots s_e}\, \beta_{s_1 \cdots s_e j_{e+1} \cdots j_q}\right) = 0$$
That is,
$$x^{k_1 k_2 \cdots k_{p-e} s_1 \cdots s_e}\, y^{i_1 i_2 \cdots i_d} - x^{k_1 \cdots k_{p-e} s_1 \cdots s_e}\, x^{k_1 \cdots k_{p-e} s_1 \cdots s_e}\, \beta_{s_1 \cdots s_e j_{e+1} \cdots j_q} = 0$$
Note that in $x^{k_1 k_2 \cdots k_{p-e} s_1 \cdots s_e}\, y^{i_1 i_2 \cdots i_d}$ there are $q$ dummy indices, so
$$x^{k_1 k_2 \cdots k_{p-e} s_1 \cdots s_e}\, y^{i_1 i_2 \cdots i_d}\; g^{j_1} \otimes g^{j_2} \otimes \cdots \otimes g^{j_q} = x \binom{e_1}{\cdot} y, \qquad p + d - 2e_1 = q$$
$$x^{k_1 \cdots k_{p-e} s_1 \cdots s_e}\, x^{k_1 \cdots k_{p-e} s_1 \cdots s_e}\, \beta_{s_1 \cdots s_e j_{e+1} \cdots j_q} = x \binom{e_1}{\cdot} \left(x \binom{e}{\cdot} \beta\right) = \left(x \binom{e_1}{\cdot} x\right) \binom{e}{\cdot} \beta$$
So the normal equation is
$$\left(x \binom{e_1}{\cdot} x\right) \binom{e}{\cdot} \beta = x \binom{e_1}{\cdot} y$$
Some results on tensor linear equations are shown in Appendix A.
Solving it gives the OLS estimator
$$\hat\beta_{OLS} = \left(x \binom{e_1}{\cdot} x\right)_q^{-1} x \binom{e_1}{\cdot} y$$
where the inverse is the tensor inverse defined below.
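A minimal sketch of how this estimator could be computed numerically, by unfolding the contractions into a matrix equation (my own illustration; an orthonormal basis is assumed, and the orders $p = 4$, $q = 1$, $e = 1$ are chosen so that $d = p + q - 2e = 3$):

import numpy as np

rng = np.random.default_rng(5)
n, p, q, e = 3, 4, 1, 1
d = p + q - 2 * e

x = rng.normal(size=(n,) * p)
beta = rng.normal(size=(n,) * q)             # true coefficient tensor
eps = 0.01 * rng.normal(size=(n,) * d)
y = np.tensordot(x, beta, axes=e) + eps      # regression model y = x (e.) beta + eps

# Unfold: x (e.) beta  <->  X_mat @ vec(beta)
X_mat = x.reshape(n ** d, n ** q)
y_vec = y.reshape(n ** d)

# Normal equation (x (e1.) x) (e.) beta = x (e1.) y in unfolded form
beta_hat = np.linalg.solve(X_mat.T @ X_mat, X_mat.T @ y_vec).reshape((n,) * q)
print(np.max(np.abs(beta_hat - beta)))       # small: close to the true beta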
Now calculate the expectation:
$$E(\hat\beta_{OLS}) = E\left(\left(x \binom{e_1}{\cdot} x\right)_q^{-1} x \binom{e_1}{\cdot} y\right) = E\left(\left(x \binom{e_1}{\cdot} x\right)_q^{-1} x \binom{e_1}{\cdot} \left(x \binom{e}{\cdot} \beta + \varepsilon\right)\right)$$
$$= E\left(\left(x \binom{e_1}{\cdot} x\right)_q^{-1} x \binom{e_1}{\cdot} x \binom{e}{\cdot} \beta\right) + E\left(\left(x \binom{e_1}{\cdot} x\right)_q^{-1} x \binom{e_1}{\cdot} \varepsilon\right)$$
$$= I_q \binom{e}{\cdot} \beta + \left(x \binom{e_1}{\cdot} x\right)_q^{-1} x \binom{e_1}{\cdot} E(\varepsilon) = \beta$$
So $\hat\beta_{OLS}$ is still an unbiased estimator of $\beta$.
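The unbiasedness can also be checked by simulation; here is a small Monte Carlo sketch reusing the unfolded setup above ($x$ fixed, fresh noise each replication; again my own illustration):

import numpy as np

rng = np.random.default_rng(6)
n, p, q, e = 3, 4, 1, 1
d = p + q - 2 * e
x = rng.normal(size=(n,) * p)
beta = rng.normal(size=(n,) * q)
X_mat = x.reshape(n ** d, n ** q)

draws = []
for _ in range(2000):
    y_vec = X_mat @ beta.reshape(-1) + rng.normal(scale=0.5, size=n ** d)
    draws.append(np.linalg.solve(X_mat.T @ X_mat, X_mat.T @ y_vec))
print(np.mean(draws, axis=0) - beta.reshape(-1))   # approximately zero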
$\Phi = \Phi_{i_1 i_2 \cdots i_d}\, g^{i_1} \otimes g^{i_2} \otimes \cdots \otimes g^{i_d}$ is a tensor in $\mathcal{T}^d$; we will first define three kinds of elementary transformations, generalizing the elementary row operations on matrices. Under the three elementary transformations, we can transform a tensor into its standard form. Here the standard form means
$$\Phi_{i_1 i_2 \cdots i_r i_{r+1} \cdots i_d} = 1, \quad \text{if } i_j = k,\ j = 1, 2, \cdots, r,\ k = 1, 2, \cdots, n,$$
and all other components are $0$, where $i_j = 1, \cdots, n$ and $j = 1, \cdots, d$.
If a tensor $\Phi$ can be transformed into the above form, then $\mathrm{rank}(\Phi) = r$. The rank of a general tensor can also be defined in this way. Below we prove that every tensor has exactly one standard form, and hence a unique rank.
If we generalize the definition of the inverse of a matrix, we get the following definition of the inverse of a tensor. For $\Phi = \Phi_{i_1 i_2 \cdots i_d}\, g^{i_1} \otimes g^{i_2} \otimes \cdots \otimes g^{i_d} \in \mathcal{T}^d$, if $\Psi \in \mathcal{T}^d$ and
$$\Phi \binom{d/2}{\cdot} \Psi = \Psi \binom{d/2}{\cdot} \Phi = I_d,$$
then $\Psi$ is called the inverse of $\Phi$. Here $I_d$ is the unit $d$-dimensional tensor satisfying
$$I = I^{i_1 i_2 \cdots i_d}\, g_{i_1} \otimes g_{i_2} \otimes \cdots \otimes g_{i_d} \in \mathcal{T}^d, \qquad I^{i_1 i_2 \cdots i_d} = 1 \ \text{if } i_j = j,\ j = 1, 2, \cdots, d,$$
and $0$ otherwise.
With $e = d/2$, the defining condition reads
$$\Phi \binom{e}{\cdot} \Psi = \Phi^{i_1 i_2 \cdots i_{d-e} s_1 \cdots s_e}\, \Psi_{s_1 \cdots s_e j_1 j_2 \cdots j_{d-e}}\; g_{i_1} \otimes g_{i_2} \otimes \cdots \otimes g_{i_{d-e}} \otimes g^{j_1} \otimes \cdots \otimes g^{j_{d-e}} = I^{i_1 i_2 \cdots i_d}\, g_{i_1} \otimes g_{i_2} \otimes \cdots \otimes g_{i_d}$$
This is the basic equation for solving the inverse of a $d$-order tensor. Below we give two ways to solve it: the first by elementary transformations and the other by the adjoint tensor.
$$A \binom{e}{\cdot} x = A^{i_1 i_2 \cdots i_d}{}_{i_{d+1} \cdots i_{d+e}}\, x^{i_{d+1} i_{d+2} \cdots i_{d+e}}\; g_{i_1} \otimes g_{i_2} \otimes \cdots \otimes g_{i_d}$$
Note that
$$A \binom{e}{\cdot} x = A \binom{e}{\cdot} \left(A_d^{-1} \binom{e}{\cdot} b\right) = I_d \binom{e}{\cdot} b = b,$$
so $x = A_d^{-1} \binom{e}{\cdot} b$ solves the equation. In fact, the dimension of $A$ is $e + d/2$.
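A minimal sketch of this machinery with numpy (my own illustration, assuming an orthonormal basis): the inverse of an order-4 tensor is computed by unfolding it into a matrix, then a tensor linear equation is solved by contracting the inverse with the right-hand side.

import numpy as np

rng = np.random.default_rng(7)
n, d = 3, 4
Phi = rng.normal(size=(n,) * d)

M = Phi.reshape(n ** 2, n ** 2)            # unfold T^4 -> matrix
Psi = np.linalg.inv(M).reshape((n,) * d)   # inverse tensor (generically exists)

# Phi (d/2 .) Psi equals the unit tensor over the paired index groups
prod = np.tensordot(Phi, Psi, axes=2).reshape(n ** 2, n ** 2)
assert np.allclose(prod, np.eye(n ** 2))

# Solve A (e.) x = b by contracting the inverse with b
b = rng.normal(size=(n, n))
x = np.tensordot(Psi, b, axes=2)
assert np.allclose(np.tensordot(Phi, x, axes=2), b)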
Appendix B. Tensor Calculus
$\Gamma_{jk}^{i}$ is the Christoffel symbol of the second kind, while $\Gamma_{ji,k}$ is the Christoffel symbol of the first kind.
$$\Phi^2 = \Phi \odot \Phi = \Phi \binom{d}{\cdot} \Phi = \Phi^{i_1 i_2 \cdots i_d}\, \Phi_{i_1 i_2 \cdots i_d}$$
We can define $|\Phi| = \sqrt{\Phi_{i_1 i_2 \cdots i_d}\, \Phi^{i_1 i_2 \cdots i_d}}$ as a norm on tensor space. Check whether this satisfies the axioms of a norm. First, $|\Phi| \geq 0$, with equality if and only if $\Phi = 0$. Secondly,
$$|k\Phi| = \sqrt{k\Phi_{i_1 i_2 \cdots i_d}\, k\Phi^{i_1 i_2 \cdots i_d}} = |k| \sqrt{\Phi_{i_1 i_2 \cdots i_d}\, \Phi^{i_1 i_2 \cdots i_d}} = |k|\, |\Phi|$$
Thirdly, by the Cauchy–Schwarz inequality,
$$|\Phi| + |\Psi| = \sqrt{\Phi_{i_1 i_2 \cdots i_d}\, \Phi^{i_1 i_2 \cdots i_d}} + \sqrt{\Psi_{i_1 i_2 \cdots i_d}\, \Psi^{i_1 i_2 \cdots i_d}} \geq \sqrt{(\Phi + \Psi)_{i_1 i_2 \cdots i_d}\, (\Phi + \Psi)^{i_1 i_2 \cdots i_d}} = |\Phi + \Psi|$$
So $|\Phi| + |\Psi| \geq |\Phi + \Psi|$ for all $\Phi, \Psi \in \mathcal{T}^d$. This is a generalization of the Frobenius norm of a matrix, so we call it the Frobenius norm (of a tensor). Since all norms on a finite-dimensional space are equivalent, we will use the norm above in the analysis below.
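In an orthonormal basis this norm is simply the Euclidean norm of the flattened component array, and the three properties can be checked numerically (my own sketch):

import numpy as np

rng = np.random.default_rng(8)
Phi = rng.normal(size=(3, 3, 3))
Psi = rng.normal(size=(3, 3, 3))

norm = lambda T: np.sqrt(np.sum(T * T))    # |Phi| = sqrt(Phi_{i...} Phi^{i...})
assert np.isclose(norm(Phi), np.linalg.norm(Phi.ravel()))
assert np.isclose(norm(-2.0 * Phi), 2.0 * norm(Phi))   # homogeneity
assert norm(Phi) + norm(Psi) >= norm(Phi + Psi)        # triangle inequality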
Define the induced metric $\rho(\Phi, \Psi) = |\Phi - \Psi|$. For any $\Theta \in \mathcal{T}^d$,
$$\rho(\Phi, \Psi) \leq \sqrt{(\Phi - \Theta)_{i_1 i_2 \cdots i_d}\, (\Phi - \Theta)^{i_1 i_2 \cdots i_d}} + \sqrt{(\Theta - \Psi)_{i_1 i_2 \cdots i_d}\, (\Theta - \Psi)^{i_1 i_2 \cdots i_d}} = \rho(\Phi, \Theta) + \rho(\Theta, \Psi)$$
This metric is derived from the Frobenius norm, so we call it the Frobenius metric. If tensor space is complete under the Frobenius metric, then tensor space is a Banach space. There is a good deal of literature establishing the completeness of tensor space, so we assume this holds.
Suppose $f: \mathcal{T}^d \to \mathcal{T}^p$ is a tensor map with $\Psi = f(\Phi)$. The derivative of $f$ is defined as
$$\frac{d}{d\Phi} f(\Phi) = \frac{d\Psi^{j_1 j_2 \cdots j_p}}{d\Phi^{i_1 i_2 \cdots i_d}}\; g_{j_1} \otimes g_{j_2} \otimes \cdots \otimes g_{j_p} \otimes g^{i_1} \otimes g^{i_2} \otimes \cdots \otimes g^{i_d} \in \mathcal{T}^{d+p}$$
Further, we can calculate the higher-order derivatives of a tensor map:
$$\frac{d^2}{d\Phi^2} f(\Phi) = \frac{d^2 \Psi^{j_1 j_2 \cdots j_p}}{d(\Phi^{i_1 i_2 \cdots i_d})^2}\; g_{j_1} \otimes g_{j_2} \otimes \cdots \otimes g_{j_p} \otimes g^{i_1} \otimes g^{i_2} \otimes \cdots \otimes g^{i_d} \otimes g^{k_1} \otimes g^{k_2} \otimes \cdots \otimes g^{k_d} \in \mathcal{T}^{2d+p}$$
Thus, generally,
$$\frac{d^n}{d\Phi^n} f(\Phi) = \frac{d^n \Psi^{j_1 j_2 \cdots j_p}}{d(\Phi^{i_1 i_2 \cdots i_d})^n}\; g_{j_1} \otimes g_{j_2} \otimes \cdots \otimes g_{j_{p+nd}} \in \mathcal{T}^{nd+p}$$
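A numerical sketch of the first derivative (my own illustration): for the scalar map $f(\Phi) = \Phi \odot \Phi$, the derivative is $2\Phi \in \mathcal{T}^d$ (here $d = 2$, $p = 0$), which we verify by central finite differences.

import numpy as np

rng = np.random.default_rng(9)
Phi = rng.normal(size=(3, 3))
f = lambda T: np.sum(T * T)                # full contraction Phi (d.) Phi

h = 1e-6
grad = np.zeros_like(Phi)
for idx in np.ndindex(Phi.shape):          # d f / d Phi^{i1 i2}, component-wise
    E = np.zeros_like(Phi); E[idx] = h
    grad[idx] = (f(Phi + E) - f(Phi - E)) / (2 * h)
assert np.allclose(grad, 2 * Phi, atol=1e-4)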
Now we can give the optimality conditions for a tensor map. Suppose $\Phi_0$ is a maximum point of $f(\Phi)$, so $f(\Phi) \leq f(\Phi_0)$ for all $\Phi \in \mathcal{T}^d$. We use the second-order approximation
$$f(\Phi) - f(\Phi_0) = \frac{d}{d\Phi} f(\Phi_0) \binom{d}{\cdot} (\Phi - \Phi_0) + \frac{1}{2!} \frac{d^2}{d\Phi^2} f(\Phi_0) \binom{2d}{\cdot} \left[(\Phi - \Phi_0) \otimes (\Phi - \Phi_0)\right]$$
Let $\frac{d}{d\Phi} f(\Phi_0) = 0$; then for all $\Phi \in \mathcal{T}^d$,
$$\frac{d^2}{d\Phi^2} f(\Phi_0) \binom{2d}{\cdot} \left[(\Phi - \Phi_0) \otimes (\Phi - \Phi_0)\right] \leq 0$$
Equivalently, for all $\Psi \in \mathcal{T}^d$, $\frac{d^2}{d\Phi^2} f(\Phi_0) \binom{2d}{\cdot} (\Psi \otimes \Psi) \leq 0$; in this case $\frac{d^2}{d\Phi^2} f(\Phi_0)$ is called negative semidefinite.
Further, consider the random error tensor $\varepsilon$. Its distribution is defined as
$$F(t) = P\left(\varepsilon^{-1}((-\infty, t])\right) = \int I\left(\varepsilon^{-1}((-\infty, t])\right) dP \in [0, 1]$$
Thus the density is
$$f(t) = \frac{d}{dt} F(t) = \frac{d}{dt} \int I\left(\varepsilon^{-1}((-\infty, t])\right) dP$$
So the variance is
$$\mathrm{Var}(\varepsilon) = E(\varepsilon \otimes \varepsilon) - E(\varepsilon) \otimes E(\varepsilon) \in \mathcal{T}^{2d}$$
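A Monte Carlo sketch of this variance (my own illustration, orthonormal basis assumed; for i.i.d. standard normal components the result should have $\delta_{il}\delta_{jm}$ structure, here with $d = 2$):

import numpy as np

rng = np.random.default_rng(10)
n, d, N = 3, 2, 20000
eps = rng.normal(size=(N,) + (n,) * d)             # N i.i.d. draws in T^d

mean = eps.mean(axis=0)
second = np.einsum('kij,klm->ijlm', eps, eps) / N  # E(eps ⊗ eps)
var = second - np.einsum('ij,lm->ijlm', mean, mean)  # Var(eps) in T^{2d}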
We start from the simplest case. Define $X = X^{i_1 i_2 \cdots i_d}\, g_{i_1} \otimes g_{i_2} \otimes \cdots \otimes g_{i_d} \in \mathcal{T}^d(\mathbb{R}^n)$, where each component $X^{i_1 i_2 \cdots i_d} \sim u$ independently and $u \sim N(0, 1)$. Below we will derive the distribution of the standard tensor normal distribution and its characteristic function; then we will turn to the general tensor normal distribution. Note that
$$F(t) = P\left(X^{-1}((-\infty, t])\right) = \int I\left(X^{-1}((-\infty, t])\right) dP = \int_{-\infty}^{t} (2\pi)^{-n^d/2} \exp\left(-\tfrac{1}{2}\, x \odot x\right) dx$$
So the density is
$$f(x) = \frac{dF}{dt}(x) = (2\pi)^{-n^d/2} \exp\left(-\tfrac{1}{2}\, x \odot x\right)$$
Thus the characteristic function of the standard tensor normal distribution is
$$\varphi_X(t) = E\left(e^{i\, t \odot X}\right) = \exp\left(-\tfrac{1}{2}\, t \odot t\right)$$
Now suppose $Y = \mu + A \binom{e}{\cdot} X$, with $Y, \mu \in \mathcal{T}^p(\mathbb{R}^n)$, $A \in \mathcal{T}^q(\mathbb{R}^n)$, and $p = q + d - 2e$. Then we say $Y \sim N_{\mathcal{T}^p}(\mu, \Sigma)$, $\Sigma \in \mathcal{T}^{2p}(\mathbb{R}^n)$, where
$$\Sigma = A^2 \binom{2e}{\cdot} I_{2d}$$
Generally, the characteristic function is
$$\varphi_Y(t) = E\left(e^{i\, t \odot Y}\right) = \exp\left(i\, \mu \odot t - \tfrac{1}{2}\, t \odot \Sigma \odot t\right)$$
and the density is
$$f(y) = (2\pi)^{-n^p/2}\, \left(\det(\Sigma)\right)^{-n^p/2} \exp\left(-\tfrac{1}{2}\, (y - \mu)^2 \odot \Sigma^{-1}\right)$$
We will use reverse mathematical induction to prove the density formula for $e = 1, 2, \cdots, p$.
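The affine construction $Y = \mu + A \binom{e}{\cdot} X$ can be simulated directly; the following sketch (my own illustration, with illustrative orders $q = 2$, $d = 2$, $e = 1$, so $p = 2$) checks the empirical mean and covariance against the theoretical ones.

import numpy as np

rng = np.random.default_rng(11)
n, N = 3, 50000
mu = rng.normal(size=(n, n))                 # mu in T^p
A = rng.normal(size=(n, n))                  # A in T^q

X = rng.normal(size=(N, n, n))               # N standard tensor normal draws in T^d
Y = mu + np.einsum('is,kst->kit', A, X)      # Y = mu + A (e.) X with e = 1

print(np.max(np.abs(Y.mean(axis=0) - mu)))   # empirical mean close to mu

# Empirical covariance E((Y-mu) ⊗ (Y-mu)) vs. the covariance built from A
C = Y - mu
emp = np.einsum('kit,kjs->itjs', C, C) / N
Sig = np.einsum('ij,lj->il', A, A)           # A A^T on the contracted index
theo = np.einsum('ij,ts->itjs', Sig, np.eye(n))  # Sigma_{it js} = (A A^T)_{ij} delta_{ts}
print(np.max(np.abs(emp - theo)))            # small for large N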