
Tensor Regression Theory and Application

2 Classical Least Squares Regression

2.1 Tensor linear regression

Recall that multivariate linear regression can be written in vector form. Suppose the dependent variable vector is y ∈ R^n, the covariates are collected in the sample matrix X ∈ R^{n×k} with rows x_i ∈ R^k, the coefficient vector is β ∈ R^k, and the error term is a random vector ε ∈ R^n. Then the regression equation is
y = Xβ + ε
The assumptions of multivariate linear regression are as follows:
A1: Linearity
A2: No autocorrelation and homoscedasticity: Var(ε|X) = σ²I
A3: Rank condition: rank(X) = k (full column rank)
A4: Exogeneity: Cov(ε, X) = 0
A5: Normality: ε ~ N(0, σ²I)
Under these assumptions, the OLS estimator is the UMVUE of β.
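For readers who want a quick numerical reference point, here is a minimal sketch of the classical case in NumPy (synthetic data; the sizes and variable names are illustrative assumptions, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3                        # n observations, k covariates (illustrative)
X = rng.normal(size=(n, k))          # sample matrix X
beta_true = np.array([1.0, -2.0, 0.5])
eps = rng.normal(scale=0.1, size=n)  # homoscedastic, uncorrelated errors
y = X @ beta_true + eps              # y = X beta + eps

# OLS estimator: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                      # close to beta_true
```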

We can generalize this regression to tensor form. To set up the tensor regression model, we need the following ingredients: (a) how to express the elements of a tensor space; (b) what the operator between X and β is in tensor space; (c) how to define a random variable in tensor space. To define a tensor linear regression model, we will first work out (a), (b), (c).

2.1.1 Tensor and Tensor Space

We can think of a tensor intuitively as an extension of a matrix. While an n × n matrix stores 2-dimensional data and can be treated as a 2-order tensor with base space R^n, a d-order tensor is a d-dimensional box of numbers with n entries along each edge. Formally, we define a tensor as a multilinear function, so we can write a tensor as Φ(u_1, u_2, ⋯, u_d), where u_i ∈ R^n, ∀i ∈ {1, 2, ⋯, d}. We say Φ is a d-order tensor with base space R^n if ∀α, β ∈ R
Φ(u_1, u_2, ⋯, αû_j + βũ_j, ⋯, u_d) = αΦ(u_1, u_2, ⋯, û_j, ⋯, u_d) + βΦ(u_1, u_2, ⋯, ũ_j, ⋯, u_d)

This means Φ is linear in each argument. The addition of tensors is defined as
(Φ + Ψ)(u_1, u_2, ⋯, u_d) = Φ(u_1, u_2, ⋯, u_d) + Ψ(u_1, u_2, ⋯, u_d)
and scalar multiplication of a tensor is defined as, ∀α ∈ R,
(αΦ)(u_1, u_2, ⋯, u_d) = αΦ(u_1, u_2, ⋯, u_d)
Below we give the definition of a tensor space.

A set of tensors 𝒯 is a d-order tensor space with base space R^n if the following conditions are satisfied:
1) (Φ + Ψ)(u_1, u_2, ⋯, u_d) = Φ(u_1, u_2, ⋯, u_d) + Ψ(u_1, u_2, ⋯, u_d), ∀Φ, Ψ ∈ 𝒯
2) (αΦ)(u_1, u_2, ⋯, u_d) = αΦ(u_1, u_2, ⋯, u_d), ∀Φ ∈ 𝒯, α ∈ R
3) (Φ + Ψ) + Θ = Φ + (Ψ + Θ), ∀Φ, Ψ, Θ ∈ 𝒯
4) Φ + Ψ = Ψ + Φ, ∀Φ, Ψ ∈ 𝒯
5) ∃ 0 ∈ 𝒯 such that Φ + 0 = Φ, ∀Φ ∈ 𝒯
6) ∀Φ ∈ 𝒯, ∃! Ψ ∈ 𝒯 such that Φ + Ψ = 0
7) 1Ψ = Ψ, ∀Ψ ∈ 𝒯
8) (αβ)Θ = α(βΘ), ∀α, β ∈ R, ∀Θ ∈ 𝒯
9) (α + β)Θ = αΘ + βΘ, ∀α, β ∈ R, ∀Θ ∈ 𝒯
10) α(Φ + Ψ) = αΦ + αΨ, ∀α ∈ R, Φ, Ψ ∈ 𝒯
Then 𝒯 is called a tensor space, written 𝒯^d(R^n). Obviously a tensor space is a linear space. We will simply write it as 𝒯^d when the base space is R^n.

Suppose {g_i}_{i=1}^n is a basis of R^n; then det(g_1, g_2, ⋯, g_n) ≠ 0, so there exists a unique {g^i}_{i=1}^n such that (g^1, g^2, ⋯, g^n)(g_1, g_2, ⋯, g_n) = I and det(g^1, g^2, ⋯, g^n) ≠ 0, which means {g^i}_{i=1}^n is also a basis of R^n. Here we call {g_i}_{i=1}^n the covariant basis and {g^i}_{i=1}^n the contravariant basis. From (g^1, g^2, ⋯, g^n)(g_1, g_2, ⋯, g_n) = I we know
(g_i, g^j) = δ_i^j
where δ_i^j is the Kronecker symbol. Define g^{ij} = (g^i, g^j) and g_{ij} = (g_i, g_j). Note (·,·) is the Euclidean inner product.¹ So a tensor can be expressed using the basis²
Φ(u_1, u_2, ⋯, u_d) = Φ(u_1^{i_1} g_{i_1}, u_2^{i_2} g_{i_2}, ⋯, u_d^{i_d} g_{i_d}) = u_1^{i_1} u_2^{i_2} ⋯ u_d^{i_d} Φ(g_{i_1}, g_{i_2}, ⋯, g_{i_d})
Note u_1^{i_1} u_2^{i_2} ⋯ u_d^{i_d} = g^{i_1} ⨂ g^{i_2} ⨂ ⋯ ⨂ g^{i_d}(u_1, u_2, ⋯, u_d) and define
Φ(g_{i_1}, g_{i_2}, ⋯, g_{i_d}) = Φ^{i_1 i_2 ⋯ i_d}
Φ^{i_1 i_2 ⋯ i_d} is a component of the tensor Φ. So Φ = Φ^{i_1 i_2 ⋯ i_d} g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_d}, where i_j = 1, 2, ⋯, n and j = 1, 2, ⋯, d.³

¹ Under the Einstein summation convention the summation sign over repeated indices is omitted, e.g. g^{ij} = ∑_k g^{ik} g^{jk} is written g^{ij} = g^{ik} g^{jk}.
² Note (i_1, i_2, ⋯, i_d) = σ(1, 2, ⋯, d), where σ denotes a permutation; see Appendix A2 for details.
³ j = 1, 2, ⋯, d means that j may take any of the d values.
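To make the component expression concrete, here is a small NumPy sketch. It assumes the standard orthonormal basis of R^n (so covariant and contravariant components coincide and the components are just array entries) and checks multilinearity numerically; all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 4, 3                                   # base space R^n, tensor order d
Phi = rng.normal(size=(n,) * d)               # components Phi^{i1 i2 i3}
u1, u2, u3 = (rng.normal(size=n) for _ in range(d))

# Multilinear evaluation: Phi(u1, u2, u3) = Phi^{i1 i2 i3} u1_{i1} u2_{i2} u3_{i3}
value = np.einsum("ijk,i,j,k->", Phi, u1, u2, u3)

# Linearity in the second argument, checked numerically
a, b = 2.0, -0.5
v2 = rng.normal(size=n)
lhs = np.einsum("ijk,i,j,k->", Phi, u1, a * u2 + b * v2, u3)
rhs = a * value + b * np.einsum("ijk,i,j,k->", Phi, u1, v2, u3)
assert np.isclose(lhs, rhs)
```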

Below we will introduce three operations in tensor space. The first one is the tensor product (also called the Kronecker product), defined as
⨂: 𝒯^p × 𝒯^q → 𝒯^{p+q}
In fact, ∀Φ ∈ 𝒯^p, Ψ ∈ 𝒯^q, suppose
Φ = Φ^{i_1 i_2 ⋯ i_p} g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_p}
Ψ = Ψ_{j_1 j_2 ⋯ j_q} g^{j_1} ⨂ g^{j_2} ⨂ ⋯ ⨂ g^{j_q}
Then
Φ⨂Ψ = Φ^{i_1 i_2 ⋯ i_p} Ψ_{j_1 j_2 ⋯ j_q} g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_p} ⨂ g^{j_1} ⨂ g^{j_2} ⨂ ⋯ ⨂ g^{j_q}
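A minimal sketch of the tensor product at the component level (standard orthonormal basis assumed, sizes illustrative): the entries of Φ⨂Ψ are simply the products of the entries of Φ and Ψ.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 3, 2, 3
Phi = rng.normal(size=(n,) * p)      # p-order tensor
Psi = rng.normal(size=(n,) * q)      # q-order tensor

# Tensor product: a (p+q)-order tensor whose entries are products of entries
TP = np.multiply.outer(Phi, Psi)     # same as np.tensordot(Phi, Psi, axes=0)
assert TP.shape == (n,) * (p + q)
assert np.isclose(TP[1, 2, 0, 1, 2], Phi[1, 2] * Psi[0, 1, 2])
```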

Given this, the e-dot product can be defined (e ≤ min(p, q)); we write the e-dot product of Φ and Ψ as Φ (e∙) Ψ:
(e∙): 𝒯^p × 𝒯^q → 𝒯^{p+q−2e}
Φ (e∙) Ψ = Φ^{i_1 i_2 ⋯ i_{p−e} s_1 ⋯ s_e} Ψ_{s_1 ⋯ s_e j_{e+1} j_{e+2} ⋯ j_q} g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_{p−e}} ⨂ g^{j_1} ⨂ g^{j_2} ⨂ ⋯ ⨂ g^{j_{q−e}}
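In the same component picture, the e-dot product is a contraction over e shared indices, which np.tensordot expresses directly; a hedged sketch (standard orthonormal basis, illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q, e = 3, 3, 2, 2
Phi = rng.normal(size=(n,) * p)
Psi = rng.normal(size=(n,) * q)

# e-dot product: contract the last e indices of Phi with the first e indices of Psi
dot_e = np.tensordot(Phi, Psi, axes=e)        # order p + q - 2e
assert dot_e.shape == (n,) * (p + q - 2 * e)

# e = p = q gives the full contraction (a scalar), used later as the squared norm
full = np.tensordot(Phi, Phi, axes=p)
assert np.isclose(full, np.sum(Phi ** 2))
```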

2.1.3 Tensor Linear Regression Model

Suppose the dependent variable is y ∈ 𝒯^d, the covariate is x ∈ 𝒯^p, the coefficient is β ∈ 𝒯^q, and the error term is a random variable ε: (Ω, ℱ, P) → (𝒯^d, 𝔅(𝒯^d)). So the regression model is
y = x (e∙) β + ε

This is the tensor form of linear regression; here p + q − 2e = d. Below we will give the component form of the equation. Suppose
y = y^{i_1 i_2 ⋯ i_d} g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_d}
ε = ε^{i_1 i_2 ⋯ i_d} g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_d}
x = x^{k_1 k_2 ⋯ k_p} g_{k_1} ⨂ g_{k_2} ⨂ ⋯ ⨂ g_{k_p}
β = β_{j_1 j_2 ⋯ j_q} g^{j_1} ⨂ g^{j_2} ⨂ ⋯ ⨂ g^{j_q}
x (e∙) β = x^{k_1 k_2 ⋯ k_{p−e} s_1 ⋯ s_e} β_{s_1 ⋯ s_e j_{e+1} j_{e+2} ⋯ j_q} g_{k_1} ⨂ g_{k_2} ⨂ ⋯ ⨂ g_{k_{p−e}} ⨂ g^{j_1} ⨂ g^{j_2} ⨂ ⋯ ⨂ g^{j_{q−e}}

Sort g_{k_1} ⨂ g_{k_2} ⨂ ⋯ ⨂ g_{k_{p−e}} ⨂ g^{j_1} ⨂ g^{j_2} ⨂ ⋯ ⨂ g^{j_{q−e}} properly and make it equal to g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_d}; in fact
g_{k_1} ⨂ g_{k_2} ⨂ ⋯ ⨂ g_{k_{p−e}} ⨂ g^{j_1} ⨂ g^{j_2} ⨂ ⋯ ⨂ g^{j_{q−e}} = c g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_d}
Here c is a fixed number that can be calculated exactly,⁴ so it has no impact on β. Thus the regression model can be written as
y^{i_1 i_2 ⋯ i_d} = x^{k_1 k_2 ⋯ k_{p−e} s_1 ⋯ s_e} β_{s_1 ⋯ s_e j_{e+1} j_{e+2} ⋯ j_q} + ε^{i_1 i_2 ⋯ i_d}

⁴ c = (g_{i_1}, g_{k_1})(g_{i_2}, g_{k_2}) ⋯ (g_{i_d}, g^{j_{q−e}})

Below we give the assumptions for classical least squares regression:

A1: Linearity
A2: No autocorrelation and homoscedasticity: Var(ε) = σ² g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_d}
A3: Rank condition: rank(X) = d
A4: Exogeneity: Cov(ε, X) = 0
A5: Normality: ε ~ N(0, σ² g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_d})

The definition of the rank of a tensor is given in Appendix A. The definition and properties of the tensor normal distribution are given in Appendix C.

2.2 Estimating Tensor Linear Regression Model

2.2.1 Ordinary Least Square Estimation

Suppose the estimator of β is β̂, with
β̂ = β̂_{j_1 j_2 ⋯ j_q} g^{j_1} ⨂ g^{j_2} ⨂ ⋯ ⨂ g^{j_q}
So the residual is
ϵ = y − x (e∙) β̂ = (y^{i_1 i_2 ⋯ i_d} − x^{k_1 k_2 ⋯ k_{p−e} s_1 ⋯ s_e} β̂_{s_1 ⋯ s_e j_{e+1} j_{e+2} ⋯ j_q}) g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_d}
ϵ^{i_1 i_2 ⋯ i_d} = y^{i_1 i_2 ⋯ i_d} − x^{k_1 k_2 ⋯ k_{p−e} s_1 ⋯ s_e} β̂_{s_1 ⋯ s_e j_{e+1} j_{e+2} ⋯ j_q}

To get the least squares estimator of β, we solve the following optimization problem:
min_{β̂} ϵ² = ϵ⨀ϵ = [y − x (e∙) β̂] ⨀ [y − x (e∙) β̂]

Note: to carry out this optimization we need to establish tensor differentiation and integration, and then we can naturally extend to the Taylor mean value theorem; we show these in Appendix B.
β̂_OLS = arg min_{β̂} ϵ² ⟹ ∂ϵ²/∂β̂ |_{β̂_OLS} = 0
and ∂²ϵ²/∂β̂² |_{β̂_OLS} is a semi-positive definite tensor.

ϵ² = ϵ⨀ϵ = (y^{i_1 i_2 ⋯ i_d} − x^{k_1 ⋯ k_{p−e} s_1 ⋯ s_e} β̂_{s_1 ⋯ s_e j_{e+1} ⋯ j_q})(y^{i_1 i_2 ⋯ i_d} − x^{k_1 ⋯ k_{p−e} s_1 ⋯ s_e} β̂_{s_1 ⋯ s_e j_{e+1} ⋯ j_q})
∂ϵ²/∂β̂ = ∂/∂β̂_{j_1 j_2 ⋯ j_q} [(y^{i_1 i_2 ⋯ i_d} − x^{k_1 ⋯ k_{p−e} s_1 ⋯ s_e} β̂_{s_1 ⋯ s_e j_{e+1} ⋯ j_q})(y^{i_1 i_2 ⋯ i_d} − x^{k_1 ⋯ k_{p−e} s_1 ⋯ s_e} β̂_{s_1 ⋯ s_e j_{e+1} ⋯ j_q})] g^{j_1} ⨂ g^{j_2} ⨂ ⋯ ⨂ g^{j_q} = 0
∂/∂β̂_{j_1 j_2 ⋯ j_q} [(y^{i_1 i_2 ⋯ i_d} − x^{k_1 ⋯ k_{p−e} s_1 ⋯ s_e} β̂_{s_1 ⋯ s_e j_{e+1} ⋯ j_q})(y^{i_1 i_2 ⋯ i_d} − x^{k_1 ⋯ k_{p−e} s_1 ⋯ s_e} β̂_{s_1 ⋯ s_e j_{e+1} ⋯ j_q})]
= −x^{k_1 ⋯ k_{p−e} s_1 ⋯ s_e} (y^{i_1 i_2 ⋯ i_d} − x^{k_1 ⋯ k_{p−e} s_1 ⋯ s_e} β̂_{s_1 ⋯ s_e j_{e+1} ⋯ j_q}) − (y^{i_1 i_2 ⋯ i_d} − x^{k_1 ⋯ k_{p−e} s_1 ⋯ s_e} β̂_{s_1 ⋯ s_e j_{e+1} ⋯ j_q}) x^{k_1 ⋯ k_{p−e} s_1 ⋯ s_e}
= −2x^{k_1 ⋯ k_{p−e} s_1 ⋯ s_e} (y^{i_1 i_2 ⋯ i_d} − x^{k_1 ⋯ k_{p−e} s_1 ⋯ s_e} β̂_{s_1 ⋯ s_e j_{e+1} ⋯ j_q}) = 0
x^{k_1 ⋯ k_{p−e} s_1 ⋯ s_e} y^{i_1 i_2 ⋯ i_d} − x^{k_1 ⋯ k_{p−e} s_1 ⋯ s_e} x^{k_1 ⋯ k_{p−e} s_1 ⋯ s_e} β̂_{s_1 ⋯ s_e j_{e+1} ⋯ j_q} = 0
Note that in x^{k_1 ⋯ k_{p−e} s_1 ⋯ s_e} y^{i_1 i_2 ⋯ i_d} there are q dummy indices,⁵ and
x^{k_1 ⋯ k_{p−e} s_1 ⋯ s_e} y^{i_1 i_2 ⋯ i_d} g^{j_1} ⨂ g^{j_2} ⨂ ⋯ ⨂ g^{j_q} = x (e_1∙) y, with p + d − 2e_1 = q
x^{k_1 ⋯ k_{p−e} s_1 ⋯ s_e} x^{k_1 ⋯ k_{p−e} s_1 ⋯ s_e} β̂_{s_1 ⋯ s_e j_{e+1} ⋯ j_q} = x (e_1∙) (x (e∙) β̂) = (x (e_1∙) x) (e∙) β̂
Therefore
(x (e_1∙) x) (e∙) β̂ = x (e_1∙) y
Some results on tensor linear equations are shown in Appendix A. Hence
β̂_OLS = (x (e_1∙) x)_q^{−1} x (e_1∙) y

⁵ This means they can traverse 1, 2, ⋯, n.
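As a concrete sketch of this estimator (standard orthonormal basis, simulated data, all sizes hypothetical): because the contraction pairs the last e indices of x with the first e indices of β, the free indices of x can be grouped as rows and the contracted indices as columns, and the normal equations reduce to an ordinary least squares solve:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, q, e = 3, 5, 3, 2                       # y has order d = p + q - 2e = 4
x = rng.normal(size=(n,) * p)
beta_true = rng.normal(size=(n,) * q)
noise = 1e-3 * rng.normal(size=(n,) * (p + q - 2 * e))

# Model: y = x (e-dot) beta + eps, contracting the last e indices of x
# with the first e indices of beta
y = np.tensordot(x, beta_true, axes=e) + noise

# Matricize: rows = free indices of x, columns = contracted indices
X_mat = x.reshape(n ** (p - e), n ** e)
Y_mat = y.reshape(n ** (p - e), n ** (q - e))

# Solve the normal equations (x e1-dot x) beta = x e1-dot y as an OLS problem
B_mat, *_ = np.linalg.lstsq(X_mat, Y_mat, rcond=None)
beta_hat = B_mat.reshape((n,) * q)
print(np.max(np.abs(beta_hat - beta_true)))   # small, up to the noise level
```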

2.2.2 Unbiased Estimation

Calculate
E(β̂_OLS) = E((x (e_1∙) x)_q^{−1} x (e_1∙) y)
= E((x (e_1∙) x)_q^{−1} x (e_1∙) (x (e∙) β + ε))
= E((x (e_1∙) x)_q^{−1} x (e_1∙) x (e∙) β) + E((x (e_1∙) x)_q^{−1} x (e_1∙) ε)
= I_q (e∙) β + (x (e_1∙) x)_q^{−1} x (e_1∙) E(ε) = β
So β̂_OLS is an unbiased estimator of β.



Appendix A. Tensor Algebra

Appendix A1. Rank of Tensor

Let Φ = Φ^{i_1 i_2 ⋯ i_d} g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_d} be a tensor in 𝒯^d. We will first define three kinds of elementary transformations of a tensor. We call i_j = 1, 2, ⋯, n, j = 1, 2, ⋯, d the j-th dimension of Φ.
The first kind of elementary transformation exchanges several dimensions of the tensor, which can be expressed by a permutation σ(i_1 i_2 ⋯ i_d) (sometimes a rotation (i_{k_1} ⋯ i_{k_l}), l ≤ d). This means Φ^{i_1 i_2 ⋯ i_d} →_{σ(i_1 i_2 ⋯ i_d)} Φ^{σ(i_1 i_2 ⋯ i_d)}, so the tensor after the change is Φ^{σ(i_1 i_2 ⋯ i_d)} g_{σ(i_1)} ⨂ g_{σ(i_2)} ⨂ ⋯ ⨂ g_{σ(i_d)}, where
Φ^{σ(i_1 i_2 ⋯ i_d)} = Φ(u^{σ(i_1)}, u^{σ(i_2)}, ⋯, u^{σ(i_d)})

The second kind of elementary transformation selects some dimensions (i_{k_1}, i_{k_2}, ⋯, i_{k_l}) and multiplies them by a given number k ∈ R. Then
Φ^{i_1 i_2 ⋯ i_d} →_{k(i_{k_1}, i_{k_2}, ⋯, i_{k_l})} Φ^{k(i_{k_1}, i_{k_2}, ⋯, i_{k_l})}
Φ^{i_1 i_2 ⋯ i_d} = Φ(u^{i_1}, u^{i_2}, ⋯, u^{i_d})
Φ^{k(i_{k_1}, i_{k_2}, ⋯, i_{k_l})} = Φ(u^{i_1}, ⋯, k u^{i_{k_1}}, ⋯, k u^{i_{k_l}}, ⋯, u^{i_d})

The third kind of elementary transformation selects some dimensions (i_{k_1}, i_{k_2}, ⋯, i_{k_l}), multiplies them by a given number k ∈ R, and adds the result to other dimensions (i_{j_1}, i_{j_2}, ⋯, i_{j_l}):
Φ^{i_1 i_2 ⋯ i_d} →_{k(i_{k_1}, ⋯, i_{k_l}) + (i_{j_1}, ⋯, i_{j_l})} Φ^{(i_{j_1}, i_{j_2}, ⋯, i_{j_l})}
Φ^{(i_{j_1}, i_{j_2}, ⋯, i_{j_l})} = Φ(u^{i_1}, ⋯, k u^{i_{k_1}} + u^{i_{j_1}}, ⋯, k u^{i_{k_l}} + u^{i_{j_l}}, ⋯, u^{i_d})

Under these three elementary transformations, we can transform a tensor into its standard form. Here the standard form means
Φ^{i_1 i_2 ⋯ i_r i_{r+1} ⋯ i_d} = 1, if i_j = k, j = 1, 2, ⋯, r, k = 1, 2, ⋯, n
Φ^{i_1 i_2 ⋯ i_d} = 0, if i_j = 1, 2, ⋯, j − 1, j + 1, ⋯, n, j = 1, 2, ⋯, r, or i_j = 1, 2, ⋯, n, j = r + 1, ⋯, d
If the tensor Φ can be transformed into the above form, then rank(Φ) = r. The rank of a general tensor can also be defined in this way. Below we prove that every tensor has only one standard form and hence a unique rank.

Appendix A2. Determinant of a Tensor

Before introducing the determinant of a tensor, we introduce permutations. Note that a permutation is a map that changes the order of an ordered set:
σ ≔ (i_1 ⋯ i_d ; σ(i_1) ⋯ σ(i_d))
This σ is a d-order permutation, and the set of d-order permutations is 𝒫_d. The sign of the permutation σ is defined as
sgn(σ) = { +1, if an even number of transpositions restores the natural order
           −1, if an odd number of transpositions restores the natural order
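For reference, a short Python sketch that computes sgn(σ) by counting inversions, which is equivalent to counting the parity of the transpositions needed to restore the natural order:

```python
from itertools import permutations

def sgn(perm):
    """Sign of a permutation of 0..len(perm)-1, via the parity of inversions."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return 1 if inversions % 2 == 0 else -1

# All permutations in P_3 and their signs
for p in permutations(range(3)):
    print(p, sgn(p))
```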
Given this, we can define the determinant of a tensor. For
Φ = Φ^{i_1 i_2 ⋯ i_d} g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_d} ∈ 𝒯^d
det(Φ) = ∑_{σ_1, σ_2, ⋯, σ_d ∈ 𝒫_d} ∏ Φ^{σ_1(i_1) σ_2(i_2) ⋯ σ_d(i_d)}
Below we will show that det(Φ) is unchanged under elementary transformations. We will also prove that when rank(Φ) < d, det(Φ) = 0.

Appendix A3. Inverse of Tensor

If we generalize the definition of the inverse of a matrix, we get the following definition of the inverse of a tensor. For Φ = Φ^{i_1 i_2 ⋯ i_d} g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_d} ∈ 𝒯^d, Ψ ∈ 𝒯^d is an inverse of Φ if
Φ (d/2 ∙) Ψ = Ψ (d/2 ∙) Φ = I_d
Here I_d is the unit d-order tensor satisfying
I = I^{i_1 i_2 ⋯ i_d} g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_d} ∈ 𝒯^d, with I^{i_1 i_2 ⋯ i_d} = 1 if i_j = j, j = 1, 2, ⋯, d
But more generally, we define the inverse of a tensor as Ψ ∈ 𝒯^d with
Φ (e∙) Ψ = Ψ (e∙) Φ = I_p, e = d − p/2
Then Ψ ∈ 𝒯^d is the p-order inverse of the tensor Φ, written Ψ = Φ_p^{−1}. Writing
Ψ = Ψ^{i_1 i_2 ⋯ i_d} g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_d}
Φ (e∙) Ψ = Φ^{i_1 i_2 ⋯ i_{d−e} s_1 ⋯ s_e} Ψ^{s_1 ⋯ s_e j_1 j_2 ⋯ j_{d−e}} g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_{d−e}} ⨂ g^{j_{e+1}} ⨂ ⋯ ⨂ g^{j_d} = I^{i_1 i_2 ⋯ i_p} g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_p}
Φ^{i_1 i_2 ⋯ i_{d−e} s_1 ⋯ s_e} Ψ^{s_1 ⋯ s_e j_1 j_2 ⋯ j_{d−e}} = I^{i_1 i_2 ⋯ i_p}

This is the basic equation for solving the p-order inverse of a tensor. Below we will give two ways to solve it: the first by elementary transformations and the other by the adjoint tensor.
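Separately from those two methods, a quick numerical sketch of the d/2-dot inverse can be obtained by matricization (standard basis; for this illustration the unit tensor is taken to be the tensor whose matricization is the identity matrix, which is an assumption of the sketch rather than the definition above):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 3, 4
e = d // 2

Phi = rng.normal(size=(n,) * d)

# Matricize: group the first d/2 indices as rows and the last d/2 as columns
Phi_mat = Phi.reshape(n ** e, n ** e)
Psi = np.linalg.inv(Phi_mat).reshape((n,) * d)   # candidate (d/2)-dot inverse

# Check: Phi (e-dot) Psi matricizes to the identity matrix
prod = np.tensordot(Phi, Psi, axes=e).reshape(n ** e, n ** e)
assert np.allclose(prod, np.eye(n ** e))
```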

Appendix A4. Tensor Linear Equation

Generally, the base spaces of the different dimensions of a tensor can differ. So in general a tensor is defined as Φ ∈ 𝒯^d(R^{n_1}, R^{n_2}, ⋯, R^{n_d}); in this case
Φ = Φ^{i_1 i_2 ⋯ i_d} g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_d}, i_j = 1, 2, ⋯, n_j, j = 1, 2, ⋯, d

We can also perform elementary transformations on this kind of tensor, but the determinant and inverse are not available. Below we will focus on tensor linear equations, especially the existence and uniqueness of their solutions.
Consider A (e∙) x = b, with x ∈ 𝒯^e(R^{n_{d+1}}, ⋯, R^{n_{d+e}}),
A ∈ 𝒯^{d+e}(R^{n_1}, R^{n_2}, ⋯, R^{n_d}, R^{n_{d+1}}, ⋯, R^{n_{d+e}}), b ∈ 𝒯^d(R^{n_1}, R^{n_2}, ⋯, R^{n_d})
A = A^{i_1 i_2 ⋯ i_{d+e}} g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_{d+e}}
b = b^{i_1 i_2 ⋯ i_d} g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_d}
x = x^{i_{d+1} i_{d+2} ⋯ i_{d+e}} g_{i_{d+1}} ⨂ g_{i_{d+2}} ⨂ ⋯ ⨂ g_{i_{d+e}}
A (e∙) x = A^{i_1 i_2 ⋯ i_d}_{i_{d+1} ⋯ i_{d+e}} x^{i_{d+1} i_{d+2} ⋯ i_{d+e}} g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_d} = b^{i_1 i_2 ⋯ i_d} g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_d}
A^{i_1 i_2 ⋯ i_d}_{i_{d+1} ⋯ i_{d+e}} x^{i_{d+1} i_{d+2} ⋯ i_{d+e}} = b^{i_1 i_2 ⋯ i_d}
There are n_{d+1} × ⋯ × n_{d+e} unknowns and n_1 × ⋯ × n_d equations, so we can use the results of linear algebra. If the coefficient array has full rank and n_{d+1} ⋯ n_{d+e} = n_1 ⋯ n_d, the tensor linear equation has exactly one solution. If n_{d+1} ⋯ n_{d+e} > n_1 ⋯ n_d, there is a solution space of dimension n_{d+1} ⋯ n_{d+e} − n_1 ⋯ n_d. If n_{d+1} ⋯ n_{d+e} < n_1 ⋯ n_d, the equation generally has no exact solution. Below we will give a tensor criterion to determine whether a tensor equation has a solution.
Note that if rank(A) = d there is only one solution; if rank(A) < d and the system is consistent, the solution space has dimension d − rank(A); otherwise there is no exact solution. Let us prove these claims.
If n_1 = ⋯ = n_d = n_{d+1} = ⋯ = n_{d+e}, then for A (e∙) x = b with rank(A) = d, the only solution is
x = A_d^{−1} (e_b∙) b, e_b = d − e/2
Note A (e∙) x = A (e∙) A_d^{−1} (e_b∙) b = I_d (e_b∙) b = b. In fact, the dimension of A is e + d/2.
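A hedged numerical sketch of solving A (e∙) x = b by matricization (standard bases, illustrative sizes): grouping the b-indices as rows and the x-indices as columns turns the tensor equation into an ordinary linear system.

```python
import numpy as np

rng = np.random.default_rng(6)
n1, n2, n3 = 2, 3, 4                      # base-space dimensions may differ
d, e = 2, 1                               # b has order d, x has order e

A = rng.normal(size=(n1, n2, n3))         # A in T^{d+e}(R^{n1}, R^{n2}, R^{n3})
x_true = rng.normal(size=(n3,))           # x in T^{e}(R^{n3})
b = np.tensordot(A, x_true, axes=e)       # b in T^{d}(R^{n1}, R^{n2})

# Matricize: rows = indices of b (n1*n2 equations), columns = indices of x (n3 unknowns)
A_mat = A.reshape(n1 * n2, n3)
x_hat, *_ = np.linalg.lstsq(A_mat, b.reshape(n1 * n2), rcond=None)
print(np.allclose(x_hat, x_true))         # True: consistent system, more equations than unknowns
```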
Appendix B. Tensor Calculus

B1. Christoffel Symbol of Curvilinear Coordinate System

Note that the base space of the tensor space 𝒯^d is R^n. Suppose a map on R^n is
f(x): R^n ∋ x → f(x) ∈ R^n
If f(x) ∈ C¹(R^n), then D(f(x)) exists with f(x + h) = f(x) + D(f(x))h + o(‖h‖), where D(f(x)) is the Jacobian matrix of f(x). Writing D(f(x)) = (g_1, g_2, ⋯, g_n)(x), the vectors g_1, g_2, ⋯, g_n make up a local basis of the curvilinear coordinate system.


Note that
∂g_j/∂x_i = (∂g_j/∂x_i, g^k) g_k = (∂g_j/∂x_i, g_k) g^k = Γ_{ji}^k g_k = Γ_{ji,k} g^k
Γ_{ji}^k is the Christoffel symbol of the second kind, while Γ_{ji,k} is the Christoffel symbol of the first kind.

B2. Metric and Completeness

Note that the full dot product of a tensor with itself is
Φ = Φ^{i_1 i_2 ⋯ i_d} g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_d}
Φ² = Φ⨀Φ = Φ (d∙) Φ = Φ^{i_1 i_2 ⋯ i_d} Φ^{i_1 i_2 ⋯ i_d}
We can define |Φ| = √(Φ^{i_1 i_2 ⋯ i_d} Φ^{i_1 i_2 ⋯ i_d}) as a norm on the tensor space. Let us check whether this definition satisfies the axioms of a norm.


Firstly, |Φ| = √(Φ^{i_1 i_2 ⋯ i_d} Φ^{i_1 i_2 ⋯ i_d}) ≥ 0, with equality if and only if Φ^{i_1 i_2 ⋯ i_d} = 0.
Secondly, |kΦ| = √(kΦ^{i_1 i_2 ⋯ i_d} kΦ^{i_1 i_2 ⋯ i_d}) = |k| √(Φ^{i_1 i_2 ⋯ i_d} Φ^{i_1 i_2 ⋯ i_d}) = |k||Φ|.
Thirdly, |Φ| + |Ψ| = √(Φ^{i_1 i_2 ⋯ i_d} Φ^{i_1 i_2 ⋯ i_d}) + √(Ψ^{i_1 i_2 ⋯ i_d} Ψ^{i_1 i_2 ⋯ i_d}); according to the Minkowski inequality,
√(Φ^{i_1 i_2 ⋯ i_d} Φ^{i_1 i_2 ⋯ i_d}) + √(Ψ^{i_1 i_2 ⋯ i_d} Ψ^{i_1 i_2 ⋯ i_d}) ≥ √((Φ + Ψ)^{i_1 i_2 ⋯ i_d} (Φ + Ψ)^{i_1 i_2 ⋯ i_d}) = |Φ + Ψ|
So |Φ| + |Ψ| ≥ |Φ + Ψ|, ∀Φ, Ψ ∈ 𝒯^d. This is a generalization of the Frobenius norm of a matrix, so we call it the Frobenius norm (of a tensor). Since all norms on this space are equivalent, we will use the norm above for the analysis below.
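A small numerical check of the Frobenius norm in NumPy (standard basis, so the full self-contraction is just the sum of squared entries; sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
shape = (3, 3, 3)
Phi = rng.normal(size=shape)
Psi = rng.normal(size=shape)

def fro_norm(T):
    """|T| = sqrt(T (d-dot) T): square root of the full self-contraction."""
    return np.sqrt(np.tensordot(T, T, axes=T.ndim))

assert np.isclose(fro_norm(Phi), np.linalg.norm(Phi))          # flattened 2-norm
assert fro_norm(Phi + Psi) <= fro_norm(Phi) + fro_norm(Psi)    # triangle inequality
```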

We define a metric from the Frobenius norm: ∀Φ, Ψ ∈ 𝒯^d with
Φ = Φ^{i_1 i_2 ⋯ i_d} g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_d}
Ψ = Ψ^{i_1 i_2 ⋯ i_d} g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_d}
ρ(Φ, Ψ) = |Φ − Ψ| = √((Φ − Ψ)^{i_1 i_2 ⋯ i_d} (Φ − Ψ)^{i_1 i_2 ⋯ i_d})
Let us check whether this definition satisfies the axioms of a metric.
Firstly, ρ(Φ, Ψ) ≥ 0, with equality if and only if (Φ − Ψ)^{i_1 i_2 ⋯ i_d} = 0.
Secondly, ρ(Φ, Ψ) = |Φ − Ψ| = |Ψ − Φ| = ρ(Ψ, Φ).
Thirdly, ρ(Φ, Ψ) = √((Φ − Ψ)^{i_1 i_2 ⋯ i_d} (Φ − Ψ)^{i_1 i_2 ⋯ i_d}); according to the Minkowski inequality,
√((Φ − Ψ)^{i_1 i_2 ⋯ i_d} (Φ − Ψ)^{i_1 i_2 ⋯ i_d}) ≤ √((Φ − Θ)^{i_1 i_2 ⋯ i_d} (Φ − Θ)^{i_1 i_2 ⋯ i_d}) + √((Θ − Ψ)^{i_1 i_2 ⋯ i_d} (Θ − Ψ)^{i_1 i_2 ⋯ i_d}) = ρ(Φ, Θ) + ρ(Θ, Ψ)
This metric is derived from the Frobenius norm, so we call it the Frobenius metric. If the tensor space is complete under the Frobenius metric, then the tensor space is a Banach space. There is a considerable literature establishing the completeness of tensor space, so we take this as given.

B3. Differential and Optimization

Suppose a map between tensor spaces
f(Φ): 𝒯^d ∋ Φ → Ψ = f(Φ) ∈ 𝒯^p
If the map is differentiable on 𝒯^d, we need to know what the derivative is. With
Φ = Φ^{i_1 i_2 ⋯ i_d} g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_d}
f(Φ) = Ψ = Ψ^{j_1 j_2 ⋯ j_p} g_{j_1} ⨂ g_{j_2} ⨂ ⋯ ⨂ g_{j_p}
the derivative is
df(Φ)/dΦ = (dΨ^{j_1 j_2 ⋯ j_p}/dΦ^{i_1 i_2 ⋯ i_d}) g_{j_1} ⨂ g_{j_2} ⨂ ⋯ ⨂ g_{j_p} ⨂ g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_d} ∈ 𝒯^{d+p}
Further, we can calculate higher-order derivatives of the tensor map:
d²f(Φ)/dΦ² = (d²Ψ^{j_1 j_2 ⋯ j_p}/d(Φ^{i_1 i_2 ⋯ i_d})²) g_{j_1} ⨂ ⋯ ⨂ g_{j_p} ⨂ g_{i_1} ⨂ ⋯ ⨂ g_{i_d} ⨂ g_{k_1} ⨂ ⋯ ⨂ g_{k_d}
This can be renumbered as
d²f(Φ)/dΦ² = (d²Ψ^{j_1 j_2 ⋯ j_p}/d(Φ^{i_1 i_2 ⋯ i_d})²) g_{j_1} ⨂ g_{j_2} ⨂ ⋯ ⨂ g_{j_{p+2d}} ∈ 𝒯^{2d+p}
Thus, generally,
dⁿf(Φ)/dΦⁿ = (dⁿΨ^{j_1 j_2 ⋯ j_p}/d(Φ^{i_1 i_2 ⋯ i_d})ⁿ) g_{j_1} ⨂ g_{j_2} ⨂ ⋯ ⨂ g_{j_{p+nd}} ∈ 𝒯^{nd+p}

∀Φ, Φ_0 ∈ 𝒯^d and ε > 0 with ρ(Φ, Φ_0) < ε,
f(Φ) − f(Φ_0) = (df(Φ)/dΦ) (d∙) (Φ − Φ_0) + (1/2!) (d²f(Φ)/dΦ²) (2d∙) [(Φ − Φ_0)⨂(Φ − Φ_0)] + ⋯ + (1/n!) (dⁿf(Φ)/dΦⁿ) (nd∙) [⨂ⁿ(Φ − Φ_0)] + ⋯
where ⨂ⁿ(Φ − Φ_0) is the tensor product of n copies of (Φ − Φ_0).

Now we can give the optimality conditions for a tensor map. Suppose Φ_0 is a maximizer of f(Φ), so f(Φ) ≤ f(Φ_0), ∀Φ ∈ 𝒯^d. Using the second-order approximation,
f(Φ) − f(Φ_0) = (df(Φ)/dΦ) (d∙) (Φ − Φ_0) + (1/2!) (d²f(Φ)/dΦ²) (2d∙) [(Φ − Φ_0)⨂(Φ − Φ_0)]
Let df(Φ)/dΦ = 0; then ∀Φ ∈ 𝒯^d
(d²f(Φ)/dΦ²) (2d∙) [(Φ − Φ_0)⨂(Φ − Φ_0)] ≤ 0
Equivalently, ∀Ψ ∈ 𝒯^d, (d²f(Φ)/dΦ²) (2d∙) (Ψ⨂Ψ) ≤ 0. In this case d²f(Φ)/dΦ² is called a semi-negative definite tensor.

B4. Integration of Tensor Function

Consider f(x): 𝒯^d ⊇ 𝒟_x → R. Since the base space of 𝒟_x is a Euclidean space, we can establish Riemann integration on the tensor space 𝒯^d. Suppose x = x^{i_1 i_2 ⋯ i_d} g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_d}. Give a segmentation of the tensor domain 𝒟_x as
l^{(j)} = t_0^{(j)} < t_1^{(j)} < t_2^{(j)} < ⋯ < t_{n−1}^{(j)} < t_n^{(j)} = u^{(j)}, j = 1, 2, ⋯, d, t_k^{(j)} ∈ R
where l^{(j)}, u^{(j)} ∈ R̅. This segmentation is labeled P, and we also define
P = inf{λ(∏_{j=1}^d (t_{k_j}^{(j)} − t_{k_j−1}^{(j)})): k_j = 1, 2, ⋯, n}

Here P is also called the mesh of the segmentation and λ is the Lebesgue measure on Euclidean space. The integral of f(x) is then defined as
∫ f(x) dx = lim_{P→0} σ(f, ξ, P)
Here σ(f, ξ, P) is a partial (Riemann) sum of f, while ξ denotes an arbitrary selection of evaluation points:
σ(f, ξ, P) = ∑ f(ξ_{k_1 ⋯ k_d}) λ(∏_{j=1}^d (t_{k_j}^{(j)} − t_{k_j−1}^{(j)})), ξ_{k_1 ⋯ k_d} ∈ ∏_{j=1}^d [t_{k_j−1}^{(j)}, t_{k_j}^{(j)}]

There are three equivalent criteria for the existence of the integral; we will introduce them and prove their equivalence below.

Further, consider

Appendix C. Tensor Random Variables

C1. Definition, Distribution and Density

The traditional definition of an n-dimensional random variable in R^n is a map from a probability space (Ω, ℱ, P) to the measurable space (R^n, 𝔅(R^n)). Similarly, we can define a tensor random variable ε: (Ω, ℱ, P) → (𝒯^d, 𝔅(𝒯^d)).
∀E ∈ 𝔅(𝒯^d), the probability of the event ε ∈ E is
P(ε ∈ E) = P(ε^{−1}(E)) = ∫ I(ε^{−1}(E)) dP

Before defining the distribution of ε, we need to clarify the Borel algebra of the tensor space:
𝔅(𝒯^d) = σ({(−∞, t]: t ∈ 𝒯^d})
This means the Borel algebra of 𝒯^d is the σ-algebra generated by {(−∞, t]: t ∈ 𝒯^d}; note that for tensors Φ ∈ 𝒯^p, Ψ ∈ 𝒯^p, Φ > Ψ means Φ^{i_1 i_2 ⋯ i_p} > Ψ^{i_1 i_2 ⋯ i_p}, ∀i_1, i_2, ⋯, i_p.
So the distribution of ε is defined as
F(t) = P(ε^{−1}((−∞, t])) = ∫ I(ε^{−1}((−∞, t])) dP ∈ [0, 1]
Thus the density is
f(t) = dF(t)/dt = d/dt ∫ I(ε^{−1}((−∞, t])) dP

C2. Expectations, Variance and Characteristic Function

Given the density, we can calculate the expectation as
E(ε) = ∫ t (d/dt) I(ε^{−1}((−∞, t])) dP = ∫ t f(t) dt ∈ 𝒯^d
and the second-order moment is
E(ε²) = ∫ t² f(t) dt ∈ 𝒯^{2d}
So the variance is
Var(ε) = E(ε²) − E²(ε) ∈ 𝒯^{2d}

C3. Tensor Normal Distribution

We start from the simplest case. We say X = X^{i_1 i_2 ⋯ i_d} g_{i_1} ⨂ g_{i_2} ⨂ ⋯ ⨂ g_{i_d} ∈ 𝒯^d(R^n) follows the standard tensor normal distribution if
X^{i_1 i_2 ⋯ i_d} = u, for i_j = 1, 2, ⋯, n, j = 1, 2, ⋯, d
where u ~ N(0, 1). Below we will derive the distribution of the standard tensor normal distribution and its characteristic function; then we will turn to the general tensor normal distribution. Note
F(t) = P(X^{−1}((−∞, t])) = ∫ I(X^{−1}((−∞, t])) dP = ∫_{−∞}^{t} (2π)^{−n^d/2} exp(−½ x⨀x) dx
So the density is
f(x) = dF/dt (x) = (2π)^{−n^d/2} exp(−½ x⨀x)
Thus the characteristic function of the tensor normal distribution is
φ_X(t) = E(e^{itX}) = exp(−½ t⨀t)

Now suppose Y = μ + A (e∙) X, with Y, μ ∈ 𝒯^p(R^n), A ∈ 𝒯^q(R^n), p = q + d − 2e. Then we say Y ~ N_{𝒯^p}(μ, Σ), Σ ∈ 𝒯^{2p}(R^n), where
Σ = A² (2e∙) I_{2d}
Generally, the characteristic function is
φ_Y(t) = E(e^{itY}) = exp(i μ⨀t − ½ t⨀Σ⨀t)
and the density is
f(y) = (2π)^{−n^p/2} (det(Σ))^{−n^p/2} exp(−½ (y − μ)²⨀Σ^{−1})

C4. Tensor Wishart Distribution

Consider the standard tensor normal distribution X ~ N_{𝒯^p}(0, 1), and suppose
χ² = X⨀X
Then χ² ∼ χ²(n^p), which suggests we can derive the χ² distribution from the tensor normal distribution. Note the density of χ² is
f(y) = [2^{n^p/2} Γ(n^p/2)]^{−1} y^{n^p/2 − 1} e^{−y/2}, y > 0
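A quick simulation consistent with the χ²(n^p) claim. One reading that makes the claim hold is that the n^p components of X are i.i.d. standard normal; this sketch adopts that reading as an assumption and checks the mean and variance of X⨀X:

```python
import numpy as np

rng = np.random.default_rng(8)
n, p, reps = 3, 2, 20000
dof = n ** p                                  # n^p degrees of freedom

samples = np.empty(reps)
for r in range(reps):
    X = rng.normal(size=(n,) * p)             # i.i.d. N(0,1) components (assumption)
    samples[r] = np.tensordot(X, X, axes=p)   # full contraction X (p-dot) X

print(samples.mean(), dof)                    # sample mean close to n^p = 9
print(samples.var(), 2 * dof)                 # sample variance close to 2 n^p = 18
```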
Below we generalize this distribution to tensor form.
Recall that for a vector normal distribution x_i ~ N_{𝒯^1}(0, Σ), i = 1, 2, ⋯, n, letting X = [x_1, x_2, ⋯, x_n], we define W = X^T X as its Wishart matrix, and W ~ W_n(n, Σ) means W follows a Wishart distribution.
Here we suppose x_1, x_2, ⋯, x_n ~ N_{𝒯^p}(0, Σ) i.i.d., and let X⁶ = [x_1, x_2, ⋯, x_n] ∈ 𝒯^{p+1}; then we can define W_e = X (e∙) X ∈ 𝒯^{2p+2−2e} as the Wishart matrix of the simple random sample [x_1, x_2, ⋯, x_n]. Notice there are some differences between the Wishart matrix of a vector sample and that of a tensor sample. Since the product between two tensors has several forms, e can be 1, 2, ⋯, p + 1, and the outcomes have different orders, so W_e represents the Wishart matrix derived from the e-dot product of the sample tensor X. Thus the Wishart matrix of a vector normal sample is derived from the 1-dot product and can be written as W_1 = X (1∙) X. Below we calculate the distribution of the tensor Wishart distribution.

⁶ X ~ N_{𝒯^p}(0, Σ⨂I_n)
𝑒
suppose 𝑥1 , 𝑥2 , ⋯ , 𝑥𝑛 ~𝑁𝒯 𝑝 (0, Σ) 𝑖. 𝑖. 𝑑, and let 𝑋 = [𝑥1 , 𝑥2 , ⋯ , 𝑥𝑛 ], 𝑊𝑒 = 𝑋 ( ) 𝑋

1
exp[− 2 𝑡𝑟(𝑊𝑒 )]
𝑓(𝑊𝑒 ) = 𝑝+1 ⁄2
2𝑛 [det(𝑊𝑒 )]1⁄2 Γ𝑛𝑒 (𝑛⁄2)
𝑛−1
𝑝
Γ𝑛 (𝑛⁄2) = [Γ𝑛 (𝑛⁄2)]𝑝 , Γ𝑛 (𝑛⁄2) = 𝜋 2 Γ(𝑛⁄2)Γ((𝑛 − 1)⁄2) ⋯ Γ(1⁄2)
𝑒
Proof: Note 𝑒 = 1,2, ⋯ , 𝑝 + 1, when 𝑒 = 𝑝 + 1, 𝑊𝑝+1 = 𝑋 ( ) 𝑋 = 𝑋⨂𝑋 ∈ 𝑅, at this

this 𝑊𝑝+1 follows 𝜒 2 distribution
−1
𝑛𝑝+1 ⁄2
𝑛𝑝+1 𝑊𝑝+1
𝑓(𝑊𝑝+1 ) = [2 Γ( )] 𝑒− 2
2

we will use reversed mathematical induction to prove the density for 𝑒 = 1,2, ⋯ , 𝑝

