
S1/2 Regularization Methods and Fixed Point Algorithms for Affine Rank Minimization Problems

Dingtao Peng, Naihua Xiu, and Jian Yu

Abstract
The affine rank minimization problem is to minimize the rank of a matrix under linear
constraints. It has many applications in various areas such as statistics, control, system
identification and machine learning. Unlike the literature that uses the nuclear norm or
the general Schatten q (0 < q < 1) quasi-norm to approximate the rank of a matrix, in
this paper we use the Schatten 1/2 quasi-norm approximation, which approximates the
rank better than the nuclear norm but leads to a nonconvex, nonsmooth and non-Lipschitz
optimization problem. Importantly, we give a globally necessary optimality condition for
the S1/2 regularization problem by virtue of its special objective function. This is very
different from the local optimality conditions usually used for general Sq regularization
problems. Explicitly, the global optimality condition for the S1/2 regularization problem
is a fixed point equation associated with the singular value half thresholding operator.
Naturally, we propose a fixed point iterative scheme for the problem and provide its
convergence analysis. By discussing the location and setting of the optimal regularization
parameter, and by using an approximate singular value decomposition procedure, we
obtain a very efficient algorithm, the half norm fixed point algorithm with an approximate
SVD (HFPA algorithm), for the S1/2 regularization problem. Numerical experiments on
randomly generated and real matrix completion problems are presented to demonstrate
the effectiveness of the proposed algorithm.

Key words. affine rank minimization problem; matrix completion problem; S1/2 regularization problem; fixed point algorithm; singular value half thresholding operator

AMS Subject Classification. 90C06, 90C26, 90C59, 65F22

1 Introduction
The affine rank minimization problem, which is to minimize the rank of a matrix under linear constraints, can be described as follows:

    min_{X ∈ R^{m×n}}  rank(X)
    s.t.  A(X) = b,                                                      (1.1)

College of Science, Guizhou University, Guiyang 550025, Guizhou, China; and School of Science, Beijing
Jiaotong University, Beijing 100044, China; (dingtaopeng@126.com). This author was supported by the NSFC
grant 11171018 and the Guizhou Provincial Science and Technology Foundation grant 20102133.

School of Science, Beijing Jiaotong University, Beijing 100044, China (nhxiu@bjtu.edu.cn). This author
was supported by the National Basic Research Program of China grant 2010CB732501 and the NSFC grant
71271021.

College of Science, Guizhou University, Guiyang 550025, Guizhou, China (sci.jyu@gzu.edu.cn).

where b ∈ R^p is a given vector and A : R^{m×n} → R^p is a given linear transformation determined by p matrices A_1, ..., A_p ∈ R^{m×n} via

    A(X) := [⟨A_1, X⟩, ..., ⟨A_p, X⟩]^T   for all X ∈ R^{m×n},

with ⟨A_i, X⟩ := trace(A_i^T X), i = 1, ..., p. An important special case of (1.1) is the matrix completion problem [6]

    min_{X ∈ R^{m×n}}  rank(X)
    s.t.  X_{ij} = M_{ij},  (i, j) ∈ Ω,                                  (1.2)

where X and M are both m × n matrices, Ω is a subset of index pairs (i, j), and a small subset {M_{ij} | (i, j) ∈ Ω} of the entries is known.
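For the matrix completion problem (1.2), the linear map A reduces to the operator that samples the entries indexed by Ω, and its adjoint scatters a vector of observed values back onto Ω with zeros elsewhere. The following minimal NumPy sketch illustrates this pair; the function names sample_op and sample_adj are ours and are meant only as an illustration.

```python
import numpy as np

def sample_op(X, omega):
    """A(X): extract the observed entries X[i, j] for (i, j) in omega."""
    rows, cols = omega                 # omega = (row_indices, col_indices)
    return X[rows, cols]

def sample_adj(y, omega, shape):
    """A*(y): place the values y back on omega, zeros elsewhere."""
    Z = np.zeros(shape)
    rows, cols = omega
    Z[rows, cols] = y
    return Z

# tiny consistency check of the adjoint identity <A(X), y> = <X, A*(y)>
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))
omega = (np.array([0, 1, 3]), np.array([2, 0, 3]))
y = rng.standard_normal(3)
assert np.isclose(sample_op(X, omega) @ y, np.sum(X * sample_adj(y, omega, X.shape)))
```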
Many applications arising in various areas can be captured by solving the model (1.1), for instance, the low-degree statistical models for a random process [17, 36], the low-order realization of linear control systems [19, 37], low-dimensional embedding of data in Euclidean spaces [20], system identification in engineering [28], machine learning [32], and other applications [18]. The matrix completion problem (1.2) often arises, with examples including the Netflix problem, global positioning, remote sensing and so on [5, 6]. Moreover, problem (1.1) is an extension of the well-known sparse signal recovery (or compressed sensing) problem, which is formulated as finding a sparsest solution of an underdetermined system of linear equations [7, 15].
Problem (1.1) was considered by Fazel [18], in which its computational complexity is analyzed and it is proved to be NP-hard. To overcome this difficulty, Fazel [18] and other researchers (e.g., [6, 8, 34]) have suggested relaxing the rank of X by the nuclear norm, that is, considering the following nuclear norm minimization problem

    min_{X ∈ R^{m×n}}  ‖X‖_*
    s.t.  A(X) = b,                                                      (1.3)

or the nuclear norm regularization problem

    min_{X ∈ R^{m×n}}  ‖A(X) − b‖_2^2 + λ‖X‖_*                            (1.4)

if the data contain noise, where ‖X‖_* is the nuclear norm of X, i.e., the sum of its singular values. It is well known that problems (1.3) and (1.4) are both convex and can therefore be solved more easily (at least in theory) than (1.1).
Many existing algorithms rely on the nuclear norm. For example, problem (1.3) can be reformulated as a semidefinite program [34] and solved by SDPT3 [41]; Lin et al. [26] and Tao and Yuan [39] adopt augmented Lagrangian multiplier (ALM) methods to solve robust PCA problems and their extensions, which contain the matrix completion problem as a special case; SVT [4] solves (1.3) by applying a singular value thresholding operator; Toh and Yun [40] solve a general model that contains (1.3) as a special case by an accelerated proximal gradient (APG) method; Liu, Sun and Toh [27] present a framework of proximal point algorithms in the primal, dual and primal-dual forms for solving the nuclear norm minimization with linear equality and second order cone constraints; Ma, Goldfarb and Chen [29] propose fixed point and Bregman iterative algorithms for solving problem (1.3).
Considering the nonconvexity of the original problem (1.1), some researchers [23, 25, 30, 31, 33] suggest using the Schatten q (0 < q < 1) quasi-norm (for short, Sq norm) relaxation, that is, solving the Sq-norm minimization problem

    min_{X ∈ R^{m×n}}  ‖X‖_q^q
    s.t.  A(X) = b,                                                      (1.5)

or the Sq regularization problem

    min_{X ∈ R^{m×n}}  ‖A(X) − b‖_2^2 + λ‖X‖_q^q                          (1.6)

if the data contain noise, where the Schatten q quasi-norm of X is defined by ‖X‖_q^q := Σ_{i=1}^{min{m,n}} σ_i^q and σ_i (i = 1, ..., min{m,n}) are the singular values of X. Problem (1.5) is intermediate between (1.1) and (1.3) in the sense that

    rank(X) = Σ_{i: σ_i ≠ 0} σ_i^0,   ‖X‖_q^q = Σ_{i=1}^{min{m,n}} σ_i^q,   and   ‖X‖_* = Σ_{i=1}^{min{m,n}} σ_i^1.

Obviously, the Sq quasi-norm is a better approximation of the rank function than the nuclear norm, but it leads to a nonconvex, nonsmooth, non-Lipschitz optimization problem for which the global minimizers are difficult to find.
In fact, the nonconvex relaxation method was first proposed in the area of sparse signal recovery [9, 10]. Recently, nonconvex regularization methods associated with the ℓq (0 < q < 1) norm have attracted much attention, and many theoretical results and algorithms have been developed to solve the resulting nonconvex, nonsmooth, even non-Lipschitz optimization problems, see, e.g., [2, 14, 23, 25, 31]. Extensive computational results have shown that using the ℓq norm can find very sparse solutions from very few measurements, see, e.g., [9-14, 25, 31, 36, 45]. However, since the ℓq norm minimization is a nonconvex, nonsmooth and non-Lipschitz problem, it is in general difficult to give a theoretical guarantee for finding a global solution. Moreover, which q should be selected is another interesting problem. The results in [43-45] revealed that the ℓ1/2 relaxation can be regarded as a representative among the ℓq relaxations with q in (0, 1), in the sense that the ℓ1/2 relaxation has a more powerful recovering ability than the ℓq relaxation for 1/2 < q < 1, while the recovering ability does not differ much between the ℓ1/2 relaxation and the ℓq relaxation for 0 < q < 1/2. Moreover, Xu et al. [44] in fact provided a globally necessary optimality condition for the ℓ1/2 regularization problem, which is expressed as a fixed point equation involving the half thresholding function. This condition may not hold at the local minimizers. They then developed a fast iterative half thresholding algorithm for the ℓ1/2 regularization problem, which matches the iterative hard thresholding algorithm for the ℓ0 regularization problem and the iterative soft thresholding algorithm for the ℓ1 regularization problem.
In this paper, inspired by the works on nonconvex regularization methods, especially the ℓ1/2 regularization mentioned above, we focus our attention on the following S1/2 regularization problem:

    min_{X ∈ R^{m×n}}  { ‖A(X) − b‖_2^2 + λ‖X‖_{1/2}^{1/2} },              (1.7)

where ‖X‖_{1/2}^{1/2} = Σ_{i=1}^{min{m,n}} σ_i^{1/2} and σ_i (i = 1, ..., min{m,n}) are the singular values of X.
This paper is organized as follows. In Section 2, we briefly discuss the relation between the global minimizers of problem (1.5) and problem (1.6). In Section 3, we deduce an analytical thresholding expression associated with the solutions to problem (1.7), and establish an exact lower bound on the nonzero singular values of the solutions. Moreover, we prove that the solutions to problem (1.7) are fixed points of a matrix-valued thresholding operator. In Section 4, based on the fixed point condition, we give a natural iterative formula and provide the convergence analysis of the proposed iteration. Section 5 discusses the location of the optimal regularization parameter and a parameter setting strategy which coincides with the fixed point continuation technique used in convex optimization. Since the singular value decomposition is computationally expensive, in Section 6 we employ an approximate singular value decomposition procedure to cut the computational cost. Thus we get a very fast, robust and powerful algorithm, which we call the HFPA algorithm (half norm fixed point algorithm with an approximate SVD). Numerical experiments on randomly generated and real matrix completion problems are presented in Section 7 to demonstrate the effectiveness of the HFPA algorithm. Finally, we conclude our results in Section 8.
Before continuing, we summarize the notation that will be used in this paper. Throughout this paper, without loss of generality, we always suppose m ≤ n. Let ‖x‖_2 denote the Euclidean norm of any vector x ∈ R^p. For any x, y ∈ R^p, ⟨x, y⟩ = x^T y denotes the inner product of two vectors. For any matrix X ∈ R^{m×n}, σ(X) = (σ_1(X), ..., σ_m(X))^T denotes the vector of singular values of X arranged in nonincreasing order, and it will be simply denoted by σ = (σ_1, ..., σ_m)^T if no confusion is caused in the context; Diag(σ(X)) denotes a diagonal matrix whose diagonal vector is σ(X); and ‖X‖_F denotes the Frobenius norm of X, i.e., ‖X‖_F = (Σ_{i,j} X_{ij}^2)^{1/2} = (Σ_{i=1}^m σ_i^2)^{1/2}. For any X, Y ∈ R^{m×n}, ⟨X, Y⟩ = tr(Y^T X) denotes the inner product of two matrices. Let the linear transformation A : R^{m×n} → R^p be determined by p given matrices A_1, ..., A_p ∈ R^{m×n}, that is, A(X) = (⟨A_1, X⟩, ..., ⟨A_p, X⟩)^T. Define A = (vec(A_1), ..., vec(A_p))^T ∈ R^{p×mn} and x = vec(X) ∈ R^{mn}, where vec(·) is the stretch operator; then we have A(X) = Ax and ‖A(X)‖_2 ≤ ‖A‖ ‖X‖_F, where ‖A‖ := max{‖A(X)‖_2 : ‖X‖_F = 1} = ‖A‖_2 and ‖A‖_2 is the spectral norm of the matrix A. Let A* denote the adjoint of A. Then for any y ∈ R^p, we have A*y = Σ_{i=1}^p y_i A_i and ⟨A(X), y⟩ = ⟨X, A*y⟩ = ⟨vec(X), vec(A*y)⟩ = ⟨x, A^T y⟩.
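To illustrate this notation, the following NumPy sketch (ours, with hypothetical small dimensions) builds a general linear map A from matrices A_1, ..., A_p, its matrix representation A = (vec(A_1), ..., vec(A_p))^T, and its adjoint, and checks the identity ⟨A(X), y⟩ = ⟨X, A*y⟩; the names A_op and A_adj are ours.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 4, 5, 7
As = rng.standard_normal((p, m, n))        # the matrices A_1, ..., A_p
X = rng.standard_normal((m, n))
y = rng.standard_normal(p)

def A_op(X):
    """A(X) = (<A_1, X>, ..., <A_p, X>)^T."""
    return np.array([np.trace(Ai.T @ X) for Ai in As])

def A_adj(y):
    """A*(y) = sum_i y_i A_i."""
    return np.tensordot(y, As, axes=1)

# matrix form A(X) = A vec(X); row-major flattening is used consistently here
Amat = As.reshape(p, m * n)
assert np.allclose(A_op(X), Amat @ X.reshape(-1))
# adjoint identity: <A(X), y> = <X, A*(y)>
assert np.isclose(A_op(X) @ y, np.sum(X * A_adj(y)))
```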

2 Relation between global minimizers of problem (1.5) and problem (1.6)

We now show that, in some sense, problem (1.5) can be solved by solving problem (1.6). The theorem here is general and covers problem (1.7) as a special case. We note that the regularization term ‖X‖_q^q is nonconvex, nonsmooth and non-Lipschitz, hence the result is nontrivial.

Theorem 2.1 For each λ > 0, the set of global minimizers of (1.6) is nonempty and bounded. Let {λ_k} be a decreasing sequence of positive numbers with λ_k → 0, and let X_{λ_k} be a global minimizer of problem (1.6) with λ = λ_k. Suppose that problem (1.5) is feasible. Then {X_{λ_k}} is bounded and any of its accumulation points is a global minimizer of problem (1.5).

Proof. Since C_λ(X) := ‖A(X) − b‖_2^2 + λ‖X‖_q^q ≥ λ‖X‖_q^q, the objective function C_λ(X) is bounded from below and is coercive, i.e., C_λ(X) → ∞ if ‖X‖_F → ∞, and hence the set of global minimizers of (1.6) is nonempty and bounded.

Suppose that problem (1.5) is feasible and X is any feasible point; then A(X) = b. Since X_{λ_k} is a global minimizer of problem (1.6) with λ = λ_k, we have

    max{ λ_k ‖X_{λ_k}‖_q^q, ‖A(X_{λ_k}) − b‖_2^2 } ≤ λ_k ‖X_{λ_k}‖_q^q + ‖A(X_{λ_k}) − b‖_2^2
                                                   ≤ λ_k ‖X‖_q^q + ‖A(X) − b‖_2^2
                                                   = λ_k ‖X‖_q^q.

From λ_k ‖X_{λ_k}‖_q^q ≤ λ_k ‖X‖_q^q, we get ‖X_{λ_k}‖_q^q ≤ ‖X‖_q^q, that is, the sequence {X_{λ_k}} is bounded. Thus, {X_{λ_k}} has at least one accumulation point. Let X* be any accumulation point of {X_{λ_k}}. From ‖A(X_{λ_k}) − b‖_2^2 ≤ λ_k ‖X‖_q^q and λ_k → 0, we derive A(X*) = b, that is, X* is a feasible point of problem (1.5). It follows from ‖X_{λ_k}‖_q^q ≤ ‖X‖_q^q that ‖X*‖_q^q ≤ ‖X‖_q^q. Then, by the arbitrariness of X, we obtain that X* is a global minimizer of problem (1.5).

3 Globally necessary optimality condition

In this section, we give a globally necessary optimality condition for problem (1.7), which perhaps does not hold at the local minimizers. This condition is expressed as a matrix-valued fixed point equation associated with a special thresholding operator which we call the half thresholding operator. Before we start to study the S1/2 regularization problem, we begin by introducing the half thresholding operator.

3.1 Half thresholding operator

First, we introduce the half thresholding function, which arises from minimizing a real-valued function. The following key lemma follows from, but is different from, Xu et al. [44].

Lemma 3.1 Let t ∈ R, λ > 0 be two given real numbers. Suppose that x* ∈ R is a global minimizer of the problem

    min_{x ≥ 0}  f(x) := (x − t)^2 + λ x^{1/2}.                           (3.1)

Then x* is uniquely determined by (3.1) when t ≠ (54^{1/3}/4) λ^{2/3}, and can be analytically expressed by

    x* = h_λ(t) := { h_{λ,1/2}(t),          if t > (54^{1/3}/4) λ^{2/3},
                   { ∈ {h_{λ,1/2}(t), 0},    if t = (54^{1/3}/4) λ^{2/3},   (3.2)
                   { 0,                     if t < (54^{1/3}/4) λ^{2/3},

where

    h_{λ,1/2}(t) = (2/3) t ( 1 + cos( (2π/3) − (2/3) φ_λ(t) ) )           (3.3)

with

    φ_λ(t) = arccos( (λ/8) (t/3)^{−3/2} ).                                (3.4)

Proof. First, we consider the positive stationary points of (3.1). The first order optimality condition of (3.1) gives

    x − t + λ/(4√x) = 0.                                                 (3.5)

This equation has positive roots if and only if t > 0; if t ≤ 0, f(x) is increasing on [0, +∞) and x* = 0 is the unique minimizer of (3.1). Hence, we only need to consider t > 0 from now on. By solving equation (3.5) and comparing the values of f at each root of equation (3.5), Xu et al. [44] have shown that x̄ = h_{λ,1/2}(t) defined by (3.3) is the unique positive stationary point of (3.1) such that f(x̄) is the smallest among the values of f at all its positive stationary points (see (14), (15) and (16) in [44]; we note that, in (16), x_i > (3/4) λ^{2/3} is not necessary, in fact, x_i > 0 is enough).

It remains to compare the values f(x̄) and f(0). Fortunately, Xu et al. (see Lemma 1 and Lemma 2 in [44]) have shown that

    f(x̄) < f(0)  ⟺  t > (54^{1/3}/4) λ^{2/3}

and

    f(x̄) = f(0)  ⟺  t = (54^{1/3}/4) λ^{2/3}.

The remaining case is naturally

    f(x̄) > f(0)  ⟺  t < (54^{1/3}/4) λ^{2/3}.

The above three relationships imply

    x* = { x̄,          if t > (54^{1/3}/4) λ^{2/3},
         { ∈ {x̄, 0},    if t = (54^{1/3}/4) λ^{2/3},
         { 0,           if t < (54^{1/3}/4) λ^{2/3},

which completes the proof.


Figure 1 shows the minimizers of the function f(x) for two different pairs of (t, λ). Specifically, in (a) t = 2, λ = 8 and in (b) t = 4, λ = 8. In (b), x = 0 is a local minimizer of f(x); meanwhile, since t = 4 > (54^{1/3}/4) λ^{2/3} = 54^{1/3} ≈ 3.78, the global minimizer is x* = h_{λ,1/2}(4) > 0.

[Figure 1: The minimizers of the function f(x) with two different pairs of (t, λ). Panel (a): t = 2, λ = 8; panel (b): t = 4, λ = 8.]


Lemma 3.2 (Appendix A in [44]) If t > (54^{1/3}/4) λ^{2/3}, then the function h_λ(t) is strictly increasing.

Similar to [33, 44], using h_λ(·) defined in Lemma 3.1, we can define the following half thresholding function and half thresholding operators.

Definition 3.3 (Half thresholding function) Assume t ∈ R. For any λ > 0, the function h_λ(·) defined by (3.2)-(3.4) is called a half thresholding function.

Definition 3.4 (Vector half thresholding operator) For any λ > 0, the vector half thresholding operator H_λ(·) is defined as

    H_λ(x) := (h_λ(x_1), h_λ(x_2), ..., h_λ(x_n))^T,   x ∈ R^n.

Definition 3.5 (Matrix half thresholding operator) Suppose Y ∈ R^{m×n} of rank r admits a singular value decomposition (SVD) as

    Y = U Diag(σ) V^T,

where U and V are respectively m × r and n × r matrices with orthonormal columns, and the vector σ = (σ_1, σ_2, ..., σ_r)^T consists of the positive singular values of Y arranged in nonincreasing order (unless specified otherwise, we will always suppose the SVD of a matrix is given in this reduced form). For any λ > 0, the matrix half thresholding operator H_λ(·) : R^{m×n} → R^{m×n} is defined by

    H_λ(Y) := U Diag(H_λ(σ)) V^T.
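To make the thresholding operators concrete, here is a small NumPy sketch (ours, not the authors' code) of the scalar half thresholding function h_λ in (3.2)-(3.4) and the matrix operator H_λ of Definition 3.5; at and below the threshold t = (54^{1/3}/4)λ^{2/3} it returns 0, as in the adjusted rule (4.2) used later.

```python
import numpy as np

def half_threshold_scalar(t, lam):
    """h_lam(t) from (3.2)-(3.4); returns 0 at or below the threshold (54^(1/3)/4)*lam^(2/3)."""
    thresh = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    if t <= thresh:
        return 0.0
    phi = np.arccos((lam / 8.0) * (t / 3.0) ** (-1.5))
    return (2.0 / 3.0) * t * (1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))

def half_threshold_matrix(Y, lam):
    """H_lam(Y) = U Diag(h_lam(sigma)) V^T, using the reduced SVD of Y."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_new = np.array([half_threshold_scalar(si, lam) for si in s])
    return (U * s_new) @ Vt

# quick check: the scalar rule solves min_{x>=0} (x - t)^2 + lam*sqrt(x)
t, lam = 4.0, 8.0
xs = np.linspace(0.0, 6.0, 600001)
f = (xs - t) ** 2 + lam * np.sqrt(xs)
print(half_threshold_scalar(t, lam), xs[np.argmin(f)])   # both approximately 2.81
```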

In what follows, we will see that the matrix half thresholding operator defined above is in fact a proximal operator associated with ‖X‖_{1/2}^{1/2}, a nonconvex and non-Lipschitz function. This can in some sense be regarded as an extension of the well-known proximal operator associated with convex functions [27, 35].

Lemma 3.6 The global minimizer X_s of the following problem

    min_{X ∈ R^{m×n}}  ‖X − Y‖_F^2 + λ‖X‖_{1/2}^{1/2}                     (3.6)

can be analytically given by

    X_s = H_λ(Y).

Proof. See the Appendix.

3.2 Fixed point equation for global minimizers

Now, we can begin to consider our S1/2 regularization problem (1.7):

    min_{X ∈ R^{m×n}}  { ‖A(X) − b‖_2^2 + λ‖X‖_{1/2}^{1/2} }.              (3.7)

For any λ, μ > 0 and Z ∈ R^{m×n}, let

    C_λ(X) := ‖A(X) − b‖_2^2 + λ‖X‖_{1/2}^{1/2},                          (3.8)

    C_{λ,μ}(X, Z) := μ ( C_λ(X) − ‖A(X) − A(Z)‖_2^2 ) + ‖X − Z‖_F^2,       (3.9)

    B_μ(Z) := Z + μ A*(b − A(Z)).                                         (3.10)

Lemma 3.7 If X_s ∈ R^{m×n} is a global minimizer of C_{λ,μ}(X, Z) for any fixed λ, μ and Z, then X_s can be analytically expressed by

    X_s = H_{λμ}(B_μ(Z)).                                                 (3.11)

Proof. Note that C_{λ,μ}(X, Z) can be re-expressed as

    C_{λ,μ}(X, Z) = μ ( ‖A(X) − b‖_2^2 + λ‖X‖_{1/2}^{1/2} − ‖A(X) − A(Z)‖_2^2 ) + ‖X − Z‖_F^2
                  = ‖X‖_F^2 + 2μ⟨A(X), A(Z)⟩ − 2μ⟨A(X), b⟩ − 2⟨X, Z⟩ + λμ‖X‖_{1/2}^{1/2}
                    + ‖Z‖_F^2 + μ‖b‖_2^2 − μ‖A(Z)‖_2^2
                  = ‖X‖_F^2 − 2⟨X, Z + μA*(b − A(Z))⟩ + λμ‖X‖_{1/2}^{1/2}
                    + ‖Z‖_F^2 + μ‖b‖_2^2 − μ‖A(Z)‖_2^2
                  = ‖X − B_μ(Z)‖_F^2 + λμ‖X‖_{1/2}^{1/2}
                    + ‖Z‖_F^2 + μ‖b‖_2^2 − μ‖A(Z)‖_2^2 − ‖B_μ(Z)‖_F^2.

This implies that minimizing C_{λ,μ}(X, Z) for any fixed λ, μ and Z is equivalent to solving

    min_{X ∈ R^{m×n}}  { ‖X − B_μ(Z)‖_F^2 + λμ‖X‖_{1/2}^{1/2} }.

By applying Lemma 3.6 with Y = B_μ(Z) and regularization parameter λμ, we get expression (3.11).

Lemma 3.8 Let λ and μ be two fixed numbers satisfying λ > 0 and 0 < μ ≤ ‖A‖^{−2}. If X* is a global minimizer of C_λ(X), then X* is also a global minimizer of C_{λ,μ}(X, X*), that is,

    C_{λ,μ}(X*, X*) ≤ C_{λ,μ}(X, X*)   for all X ∈ R^{m×n}.               (3.12)

Proof. Since 0 < μ ≤ ‖A‖^{−2}, we have

    ‖X − X*‖_F^2 − μ‖A(X) − A(X*)‖_2^2 ≥ 0.

Hence for any X ∈ R^{m×n},

    C_{λ,μ}(X, X*) = μ ( C_λ(X) − ‖A(X) − A(X*)‖_2^2 ) + ‖X − X*‖_F^2
                   = μ ( ‖A(X) − b‖_2^2 + λ‖X‖_{1/2}^{1/2} ) + ( ‖X − X*‖_F^2 − μ‖A(X) − A(X*)‖_2^2 )
                   ≥ μ ( ‖A(X) − b‖_2^2 + λ‖X‖_{1/2}^{1/2} ) = μ C_λ(X)
                   ≥ μ C_λ(X*) = C_{λ,μ}(X*, X*),

where the last inequality is due to the fact that X* is a global minimizer of C_λ(X). The proof is thus complete.
By applying Lemmas 3.7 and 3.8, we can now derive the main result of this section.

Theorem 3.9 Given λ > 0 and 0 < μ ≤ ‖A‖^{−2}, let X* be a global minimizer of problem (1.7) and let B_μ(X*) = X* + μA*(b − A(X*)) admit the following SVD:

    B_μ(X*) = U Diag(σ(B_μ(X*))) V^T.                                     (3.13)

Then X* satisfies the following fixed point equation:

    X* = H_{λμ}(B_μ(X*)).                                                 (3.14)

In particular, componentwise one can express

    [σ(X*)]_i = h_{λμ}([σ(B_μ(X*))]_i)
              = { h_{λμ,1/2}([σ(B_μ(X*))]_i),          if [σ(B_μ(X*))]_i > (54^{1/3}/4)(λμ)^{2/3},
                { ∈ {h_{λμ,1/2}([σ(B_μ(X*))]_i), 0},    if [σ(B_μ(X*))]_i = (54^{1/3}/4)(λμ)^{2/3},   (3.15)
                { 0,                                    if [σ(B_μ(X*))]_i < (54^{1/3}/4)(λμ)^{2/3}.

Moreover, we have

    either  [σ(X*)]_i ≥ (54^{1/3}/6)(λμ)^{2/3}   or   [σ(X*)]_i = 0.       (3.16)
6
Proof. Since X* is a global minimizer of C_λ(X), by Lemma 3.8, X* is also a global minimizer of C_{λ,μ}(X, X*). Consequently, by Lemma 3.7, X* satisfies equation (3.14); (3.15) is a componentwise re-expression of equation (3.14). According to (3.2)-(3.4), by direct computation, we have

    lim_{t → ((54^{1/3}/4)(λμ)^{2/3})+} φ_{λμ}(t) = π/4   and   lim_{t → ((54^{1/3}/4)(λμ)^{2/3})+} h_{λμ}(t) = (54^{1/3}/6)(λμ)^{2/3}.

This limit, together with the strict monotonicity of h_{λμ} on t > (54^{1/3}/4)(λμ)^{2/3} (Lemma 3.2), implies that [σ(X*)]_i ≥ (54^{1/3}/6)(λμ)^{2/3} whenever [σ(B_μ(X*))]_i > (54^{1/3}/4)(λμ)^{2/3}. The last case of (3.15) shows that [σ(X*)]_i = 0 whenever [σ(B_μ(X*))]_i < (54^{1/3}/4)(λμ)^{2/3}. Thus, (3.16) is derived.

Theorem 3.9 provides not only the lower bound estimate, namely (54^{1/3}/6)(λμ)^{2/3}, for the nonzero singular values of the global minimizers of the S1/2 regularization problem, but also a globally necessary optimality condition in the form of a fixed point equation associated with the matrix half thresholding operator H_{λμ}(·). On the one hand, it is analogous to the fixed point condition of the nuclear norm regularization solution associated with the so-called singular value shrinkage operator (see, e.g., [4, 29]). On the other hand, the half thresholding operator involved here is more complicated than the singular value shrinkage operator because our minimization problem is nonconvex, nonsmooth and non-Lipschitz.

Definition 3.10 We call X* a global stationary point of problem (1.7) if there exists 0 < μ ≤ ‖A‖^{−2} such that X* satisfies the fixed point equation (3.14).

4 Fixed point iteration and its convergence

According to the fixed point equation (3.14), a fixed point iterative formula for the S1/2 regularization problem (1.7) can be naturally proposed as follows: given X_0,

    X_{k+1} = H_{λμ}(X_k + μA*(b − A(X_k))).                              (4.1)

To simplify the iterations and with the aim of finding low rank solutions, we slightly adjust h_{λμ} in (4.1) as follows:

    h_{λμ}(t) := { h_{λμ,1/2}(t),   if t > (54^{1/3}/4)(λμ)^{2/3},
                 { 0,               otherwise.                            (4.2)

The adjustment here is to choose h_{λμ}(t) = 0 when t = (54^{1/3}/4)(λμ)^{2/3}.
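For illustration, a minimal driver of iteration (4.1) with the adjusted rule (4.2) can be written as follows; this is only a sketch, where A_op and A_adj stand for any operator/adjoint pair (e.g., closures built from the sampling or general-operator sketches above) and half_threshold_matrix is the routine sketched after Definition 3.5.

```python
import numpy as np
# Assumes half_threshold_matrix (sketched after Definition 3.5) is in scope,
# and that A_op / A_adj are single-argument closures over the problem data.

def s_half_fixed_point(A_op, A_adj, b, shape, lam, mu, n_iter=500):
    """Run X_{k+1} = H_{lam*mu}( X_k + mu * A*(b - A(X_k)) ), starting from X_0 = 0."""
    X = np.zeros(shape)
    for _ in range(n_iter):
        B = X + mu * A_adj(b - A_op(X))             # B_mu(X_k) as in (3.10)
        X = half_threshold_matrix(B, lam * mu)      # prox step with parameter lam*mu
    return X
```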
Next, let us analyze the convergence of the above fixed point iteration.

Theorem 4.1 Given λ > 0, choose 0 < μ < ‖A‖^{−2}. Let {X_k} be the sequence generated by iteration (4.1). Then:
(i) {C_λ(X_k)} is strictly monotonically decreasing and converges to C_λ(X*), where X* is any accumulation point of {X_k}.
(ii) {X_k} is asymptotically regular, that is, lim_{k→∞} ‖X_{k+1} − X_k‖_F = 0.
(iii) Any accumulation point of {X_k} is a global stationary point of problem (1.7).

Proof. (i) Let C_λ(X), C_{λ,μ}(X, Z) and B_μ(Z) be defined by (3.8)-(3.10), and let B_μ(Z) admit the SVD B_μ(Z) = U Diag(σ) V^T, where U ∈ R^{m×r}, V ∈ R^{n×r} and σ ∈ R^r_{++}. From Lemma 3.7, we have

    C_{λ,μ}(H_{λμ}(B_μ(Z)), Z) = min_X C_{λ,μ}(X, Z),

and therefore

    C_{λ,μ}(X_{k+1}, X_k) = min_X C_{λ,μ}(X, X_k),                        (4.3)

where X_{k+1} = H_{λμ}(B_μ(X_k)) = U_k Diag(H_{λμ}(σ_k)) V_k^T and U_k Diag(σ_k) V_k^T is the SVD of B_μ(X_k). Since 0 < μ < ‖A‖^{−2}, we have

    ‖A(X_{k+1}) − A(X_k)‖_2^2 − (1/μ) ‖X_{k+1} − X_k‖_F^2 < 0.

Hence,

    C_λ(X_{k+1}) = (1/μ) ( C_{λ,μ}(X_{k+1}, X_k) − ‖X_{k+1} − X_k‖_F^2 ) + ‖A(X_{k+1}) − A(X_k)‖_2^2
                 ≤ (1/μ) ( C_{λ,μ}(X_k, X_k) − ‖X_{k+1} − X_k‖_F^2 ) + ‖A(X_{k+1}) − A(X_k)‖_2^2
                 = (1/μ) C_{λ,μ}(X_k, X_k) + ( ‖A(X_{k+1}) − A(X_k)‖_2^2 − (1/μ) ‖X_{k+1} − X_k‖_F^2 )
                 < (1/μ) C_{λ,μ}(X_k, X_k) = C_λ(X_k),

which shows that {C_λ(X_k)} is strictly monotonically decreasing. Since {C_λ(X_k)} is bounded from below, {C_λ(X_k)} converges to a constant C*. Since {X_k} ⊂ {X : C_λ(X) ≤ C_λ(X_0)}, which is bounded, it follows that {X_k} is bounded, and therefore {X_k} has at least one accumulation point. Let X* be an accumulation point of {X_k}. By the continuity of C_λ(X) and the convergence of {C_λ(X_k)}, we get C_λ(X_k) → C* = C_λ(X*) as k → +∞.
(ii) Since 0 < μ < ‖A‖^{−2}, we have 0 < θ := 1 − μ‖A‖^2 < 1 and

    ‖X_{k+1} − X_k‖_F^2 ≤ (1/θ) ( ‖X_{k+1} − X_k‖_F^2 − μ‖A(X_{k+1}) − A(X_k)‖_2^2 ).

From (3.8), (3.9) and (4.3), we derive

    μ ( C_λ(X_k) − C_λ(X_{k+1}) ) = C_{λ,μ}(X_k, X_k) − μ C_λ(X_{k+1})
                                  ≥ C_{λ,μ}(X_{k+1}, X_k) − μ C_λ(X_{k+1})
                                  = ‖X_{k+1} − X_k‖_F^2 − μ‖A(X_{k+1}) − A(X_k)‖_2^2.

The above two inequalities yield that, for any positive integer K,

    Σ_{k=0}^{K} ‖X_{k+1} − X_k‖_F^2 ≤ (1/θ) Σ_{k=0}^{K} ( ‖X_{k+1} − X_k‖_F^2 − μ‖A(X_{k+1}) − A(X_k)‖_2^2 )
                                    ≤ (μ/θ) Σ_{k=0}^{K} ( C_λ(X_k) − C_λ(X_{k+1}) )
                                    = (μ/θ) ( C_λ(X_0) − C_λ(X_{K+1}) )
                                    ≤ (μ/θ) C_λ(X_0).

Hence, Σ_{k=0}^{∞} ‖X_{k+1} − X_k‖_F^2 < +∞, and so ‖X_{k+1} − X_k‖_F → 0 as k → +∞. Thus, {X_k} is asymptotically regular.
(iii) Let {X_{k_j}} be a convergent subsequence of {X_k} and let X* be its limit point, i.e.,

    X_{k_j} → X*,   as k_j → +∞.                                          (4.4)

From this limit, we derive

    B_μ(X_{k_j}) = X_{k_j} + μA*(b − A(X_{k_j})) → X* + μA*(b − A(X*)) = B_μ(X*),   as k_j → +∞,

i.e.,

    U_{k_j} Diag(σ_{k_j}) V_{k_j}^T → U* Diag(σ*) V*^T,   as k_j → +∞,     (4.5)

where B_μ(X_{k_j}) = U_{k_j} Diag(σ_{k_j}) V_{k_j}^T and B_μ(X*) = U* Diag(σ*) V*^T are the SVDs of B_μ(X_{k_j}) and B_μ(X*), respectively. According to (4.5) and [22, Corollary 7.3.8], we have

    [σ_{k_j}]_i → [σ*]_i   for each i = 1, ..., r,   as k_j → +∞,           (4.6)

where r is the rank of B_μ(X*). By the selection principle (see, e.g., [22, Lemma 2.1.8]), we can suppose that

    U_{k_j} → Û,   Diag(σ_{k_j}) → Diag(σ*),   V_{k_j} → V̂,   as k_j → +∞,  (4.7)

for some Û ∈ R^{m×r} and V̂ ∈ R^{n×r}, both with orthonormal columns. From (4.7), we get U_{k_j} Diag(σ_{k_j}) V_{k_j}^T → Û Diag(σ*) V̂^T. This together with (4.5) implies

    Û Diag(σ*) V̂^T = U* Diag(σ*) V*^T = B_μ(X*).                           (4.8)

The limit (4.4) and the asymptotic regularity of {X_k} imply

    ‖X_{k_j+1} − X*‖_F ≤ ‖X_{k_j+1} − X_{k_j}‖_F + ‖X_{k_j} − X*‖_F → 0,   as k_j → +∞,

which verifies that {X_{k_j+1}} also converges to X*. Note that X_{k_j+1} = U_{k_j} Diag(H_{λμ}(σ_{k_j})) V_{k_j}^T, which together with X_{k_j+1} → X* yields

    U_{k_j} Diag(H_{λμ}(σ_{k_j})) V_{k_j}^T → X*,   as k_j → +∞.             (4.9)

If it holds that

    h_{λμ}([σ_{k_j}]_i) → h_{λμ}([σ*]_i)   for each i = 1, 2, ..., r,   as k_j → +∞,   (4.10)
then from (4.7), (4.10) and (4.8), we get

    U_{k_j} Diag(H_{λμ}(σ_{k_j})) V_{k_j}^T → Û Diag(H_{λμ}(σ*)) V̂^T = H_{λμ}(B_μ(X*)),   as k_j → +∞,

where the last equality is due to the well-definedness¹ of H_{λμ}(·). This limit together with (4.9) gives X* = H_{λμ}(B_μ(X*)), that is, X* is a global stationary point of problem (1.7).

It remains to prove that (4.10) holds true.

For i = 1, ..., r, if [σ*]_i < (54^{1/3}/4)(λμ)^{2/3}, then by (4.6),

    [σ_{k_j}]_i < (54^{1/3}/4)(λμ)^{2/3}   when k_j is sufficiently large.

This inequality together with the definition of h_{λμ} in (4.2) gives

    h_{λμ}([σ_{k_j}]_i) = 0 → h_{λμ}([σ*]_i) = 0,   as k_j → +∞.

If [σ*]_i > (54^{1/3}/4)(λμ)^{2/3}, then by (4.6),

    [σ_{k_j}]_i > (54^{1/3}/4)(λμ)^{2/3}   when k_j is sufficiently large.

Note that although h_{λμ}(·) defined by (4.2) is not continuous on [0, +∞), it is continuous on ( (54^{1/3}/4)(λμ)^{2/3}, +∞ ). So it follows from [σ_{k_j}]_i → [σ*]_i that

    h_{λμ}([σ_{k_j}]_i) → h_{λμ}([σ*]_i),   as k_j → +∞.
If [σ*]_i = (54^{1/3}/4)(λμ)^{2/3}, since [σ_{k_j}]_i → [σ*]_i, there are two possible cases:

Case 1: There is a subsequence of {[σ_{k_j}]_i}, say {[σ_{k_{j_m}}]_i}, converging to [σ*]_i such that [σ_{k_{j_m}}]_i ≤ [σ*]_i for each k_{j_m}. In this case, we have

    h_{λμ}([σ_{k_{j_m}}]_i) = 0 → h_{λμ}([σ*]_i) = 0,   as k_{j_m} → +∞.

Case 2: There is a subsequence of {[σ_{k_j}]_i}, say {[σ_{k_{j_n}}]_i}, converging to [σ*]_i such that [σ_{k_{j_n}}]_i > [σ*]_i = (54^{1/3}/4)(λμ)^{2/3} for each k_{j_n}. However, we will verify that this case can never happen as long as μ is chosen appropriately.

If Case 2 happens, there is a large integer N_1 such that

    [σ_{k_{j_n}}]_i ∈ ( (54^{1/3}/4)(λμ)^{2/3}, (54^{1/3}/3)(λμ)^{2/3} )

holds for any k_{j_n} ≥ N_1. By (ii), ‖X_{k_{j_n}+1} − X_{k_{j_n}}‖_F → 0 as k_{j_n} → +∞. Then there is a large integer N_2 ≥ N_1 such that

    [σ_{k_{j_n}+1}]_i ∈ ( (54^{1/3}/4)(λμ)^{2/3}, (54^{1/3}/3)(λμ)^{2/3} )              (4.11)

holds for any k_{j_n} ≥ N_2.

On the other hand, since B_μ(X_{k_{j_n}}) = X_{k_{j_n}} + μA*(b − A(X_{k_{j_n}})) is continuous in μ and B_μ(X_{k_{j_n}}) → X_{k_{j_n}} as μ → 0, we know that if μ is chosen sufficiently small, [σ(B_μ(X_{k_{j_n}}))]_i will be close to [σ(X_{k_{j_n}})]_i. Let μ be chosen such that

    [σ(B_μ(X_{k_{j_n}}))]_i ∈ ( (54^{1/3}/4)(λμ)^{2/3}, (54^{1/3}/3)(λμ)^{2/3} )

holds for any k_{j_n} ≥ N_2. According to (3.2)-(3.4), by direct computation, we know that

    lim_{t → ((54^{1/3}/4)(λμ)^{2/3})+} φ_{λμ}(t) = π/4   and   lim_{t → ((54^{1/3}/4)(λμ)^{2/3})+} h_{λμ}(t) = (54^{1/3}/6)(λμ)^{2/3}.

Note that [σ_{k_{j_n}+1}]_i = h_{λμ}([σ(B_μ(X_{k_{j_n}}))]_i) and that h_{λμ}(·) is increasing on ( (54^{1/3}/4)(λμ)^{2/3}, +∞ ) (Lemma 3.2); then there is a large integer N_3 ≥ N_2 such that

    [σ_{k_{j_n}+1}]_i = h_{λμ}([σ(B_μ(X_{k_{j_n}}))]_i) ∈ ( (54^{1/3}/6)(λμ)^{2/3}, (54^{1/3}/4)(λμ)^{2/3} )     (4.12)

holds for any k_{j_n} ≥ N_3. One can find that (4.12) is in contradiction with (4.11). This contradiction shows that Case 2 can never happen as long as μ is chosen appropriately.

Therefore, we have shown that (4.10) is true. The proof is thus complete.

¹ The matrix half thresholding operator H_λ : R^{m×n} → R^{m×n} here is in fact a non-symmetric Löwner operator [38] associated with the half thresholding function h_λ : R → R. The non-symmetric Löwner operator H_λ : R^{m×n} → R^{m×n} is called well-defined if it is independent of the choice of the matrices U and V in the SVD. In other words, if Y ∈ R^{m×n} has two different SVDs, such as Y = U_1 Diag(σ) V_1^T = U_2 Diag(σ) V_2^T, we have H_λ(Y) = U_1 Diag(h_λ(σ_1), ..., h_λ(σ_m)) V_1^T = U_2 Diag(h_λ(σ_1), ..., h_λ(σ_m)) V_2^T. Theorem 1 of Lecture III in [38] proves that a non-symmetric Löwner operator H : R^{m×n} → R^{m×n} associated with a scalar-valued function h : R_+ → R_+ is well-defined if and only if h(0) = 0. By this theorem, our matrix half thresholding operator H_λ is well-defined since h_λ(0) = 0.

5 Setting of parameters and fixed point continuation

In this section, we discuss the problem of parameter selection in our algorithm. As is well known, the quality of solutions to regularization problems depends seriously on the setting of the regularization parameter λ. However, the selection of proper parameters is a very hard problem, and there is no optimal rule in general. Nevertheless, when some prior information (e.g., low rank) is known for a problem, it is realistic to set the regularization parameter more reasonably.

5.1 Location of the optimal regularization parameter

We begin by finding the location of the optimal λ, which then serves as the basis of the parameter setting strategy used in the algorithm to be proposed. Specifically, suppose that a problem can be formulated in the S1/2 regularization form (1.7), whose solutions are matrices of rank r. Thus, we are required to solve the S1/2 regularization problem restricted to the subregion Ω_r = {X ∈ R^{m×n} | rank(X) = r}. For any μ, denote B_μ(X) = X + μA*(b − A(X)). Assume X* is a solution to the S1/2 regularization problem and σ(B_μ(X*)) is arranged in nonincreasing order. By Theorem 3.9 (particularly (3.16)) and (4.2), we have

    [σ(B_μ(X*))]_i > (54^{1/3}/4)(λμ)^{2/3}  ⟺  [σ(X*)]_i > (54^{1/3}/6)(λμ)^{2/3},   ∀ i ∈ {1, 2, ..., r},

and

    [σ(B_μ(X*))]_i ≤ (54^{1/3}/4)(λμ)^{2/3}  ⟺  [σ(X*)]_i = 0,   ∀ i ∈ {r+1, r+2, ..., n},

which implies

    (√96/(9μ)) ([σ(B_μ(X*))]_{r+1})^{3/2} ≤ λ < (√96/(9μ)) ([σ(B_μ(X*))]_r)^{3/2}.

The above estimate provides an exact location of where the optimal parameter λ* should be. We can then take

    λ* = (√96/(9μ)) [ (1 − γ) ([σ(B_μ(X*))]_{r+1})^{3/2} + γ ([σ(B_μ(X*))]_r)^{3/2} ]

with any γ ∈ [0, 1). In particular, a most reliable choice of λ* is

    λ* = (√96/(9μ)) ([σ(B_μ(X*))]_{r+1})^{3/2}.                            (5.1)

Of course, this may not be the best choice, since we should note that the larger λ* is, the larger the threshold value (54^{1/3}/4)(λ*μ)^{2/3}, and the lower the rank of the solution produced by the thresholding algorithm.
We also note that formula (5.1) is valid for any fixed μ. Below we will use it with a fixed μ_0 satisfying 0 < μ_0 < ‖A‖^{−2}. In applications, we may use X_k instead of the real solution X* and the rank of X_k instead of r + 1; that is, we can take

    λ_{k+1} = (√96/(9μ_0)) ([σ(X_k)]_{r_k})^{3/2},                         (5.2)

where r_k is the rank of X_k. More often, we can take

    λ_{k+1} = max{ λ̄, min{ ηλ_k, (√96/(9μ_0)) ([σ(X_k)]_{r_k})^{3/2} } },   (5.3)

where λ̄ is a sufficiently small positive real number, η ∈ (0, 1) is a constant, and r_k is the rank of X_k. In this case, {λ_k} is kept monotonically decreasing. In the next subsection, one will see that (5.3) may result in an acceleration of the iteration.
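For concreteness, the continuation update (5.3) can be written as the following small function (our sketch); the argument names eta and lam_bar correspond to η and λ̄ above, and the default values are those we read off from Table 1 below.

```python
import numpy as np

def next_lambda(sigma_rk, lam_k, mu0, eta=0.25, lam_bar=1e-4):
    """Continuation update (5.3): lam_{k+1} = max(lam_bar, min(eta*lam_k, sqrt(96)/(9*mu0)*sigma_rk^{3/2}))."""
    candidate = np.sqrt(96.0) / (9.0 * mu0) * sigma_rk ** 1.5
    return max(lam_bar, min(eta * lam_k, candidate))
```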

5.2 Interpretation as a method of fixed point continuation

In this subsection, we recast (5.3) as a continuation technique (i.e., a homotopy approach) which accelerates the convergence of the fixed point iteration. In [21], Hale et al. describe a continuation technique to accelerate the convergence of the fixed point iteration for the ℓ1 regularization problem. Inspired by this work, Ma et al. [29] provide a similar continuation technique to accelerate the convergence of the fixed point iteration for the nuclear norm regularization problem. As shown in [21, 29], this continuation technique considerably improves the convergence speed of fixed point iterations. The main idea of their continuation technique, explained in our context, is to choose a decreasing sequence {λ_k}: λ_1 > λ_2 > ... > λ_L = λ̄ > 0, and in the kth iteration use λ = λ_k. Therefore, formula (5.3) coincides with this continuation technique. Generally speaking, our algorithm can be regarded as a fixed point continuation algorithm, but one applied to a nonsmooth, nonconvex and non-Lipschitz optimization problem.

Thus, a fixed point iterative algorithm based on the half norm of matrices for problem (1.7) can be specified as follows.

Algorithm 5.2. Half Norm Fixed Point algorithm (HFP algorithm)

Given the linear operator A : R^{m×n} → R^p and the vector b ∈ R^p;
Set the parameters μ_0 > 0, λ̄ > 0 and η ∈ (0, 1).
- Initialize: Choose the initial values {X_0, λ_1} with λ_1 ≥ λ̄, set X = X_0 and λ = λ_1.
- for k = 1 : maxiter, do λ = λ_k,
    - while NOT converged, do
        Compute B = X + μ_0 A*(b − A(X)), and its SVD, say B = U Diag(σ) V^T;
        Compute X = U Diag(H_{λμ_0}(σ)) V^T;
    - end while, and output: X_k, λ_k, r_k = rank(X_k);
    - set λ_{k+1} = max{ λ̄, min{ ηλ_k, (√96/(9μ_0)) ([σ(X_k)]_{r_k})^{3/2} } };
    - if λ_{k+1} = λ̄, return;
- end for

In Algorithm 5.2, the positive integer maxiter is chosen large enough that convergence of the outer loop can be ensured.
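The following NumPy sketch assembles Algorithm 5.2 for the matrix completion case. It is our illustration only (using an exact SVD rather than the approximate SVD of Section 6, and an initial λ_1 reconstructed from Table 1 below), reusing the helpers sample_op, sample_adj, half_threshold_matrix and next_lambda from the earlier sketches; it is not the authors' MATLAB implementation.

```python
import numpy as np
# Assumes sample_op, sample_adj, half_threshold_matrix and next_lambda
# (from the earlier sketches) are in scope.

def hfp_matrix_completion(b, omega, shape, mu0=1.5, eta=0.25, lam_bar=1e-4,
                          maxiter=100, inner_max=10, xtol=1e-4):
    X = sample_adj(b, omega, shape)                              # X_0 = A*(b)
    lam = min(3.0, shape[0] * shape[1] / b.size) * np.linalg.norm(X, 2)  # lambda_1 (Table 1)
    for _ in range(maxiter):
        for _ in range(inner_max):                               # inner loop at fixed lambda
            B = X + mu0 * sample_adj(b - sample_op(X, omega), omega, shape)
            X_new = half_threshold_matrix(B, lam * mu0)
            done = np.linalg.norm(X_new - X, 'fro') < xtol * max(1.0, np.linalg.norm(X, 'fro'))
            X = X_new
            if done:                                             # stopping criterion (5.4)
                break
        r_k = np.linalg.matrix_rank(X)
        if r_k == 0:
            return X
        sigma_rk = np.linalg.svd(X, compute_uv=False)[r_k - 1]   # smallest nonzero singular value
        lam_next = next_lambda(sigma_rk, lam, mu0, eta, lam_bar)
        if lam_next == lam_bar:                                  # outer termination as in Algorithm 5.2
            return X
        lam = lam_next
    return X
```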

5.3 Stopping criteria for inner loops

Note that in the half norm fixed point algorithm, the kth inner loop solves problem (1.7) for a fixed λ = λ_k. We must determine when to stop this inner iteration and go to the next subproblem. When X_k gets close to an optimal solution X*, the distance between X_k and X_{k+1} should become very small. Hence, we use the following criterion:

    ‖X_{k+1} − X_k‖_F / max{1, ‖X_k‖_F} < xtol,                           (5.4)

where xtol is a small positive number.

Besides the above stopping criterion, we use I_m to control the maximum number of inner iterations, i.e., if the stopping rule (5.4) is not satisfied after I_m iterations, we terminate the subproblem and update λ to start the next subproblem.

6 HFPA algorithm: HFP algorithm with an approximate SVD

In Algorithm 5.2, computing singular value decompositions is the main computational cost. Inspired by the works of Cai et al. [4] and Ma et al. [29], instead of computing the full SVD of the matrix B in each iteration, we implement a variant of the HFP algorithm in which we compute only a rank-r approximation to B, where r is an estimate of the rank of the optimal solution. We call this half norm fixed point algorithm with an approximate SVD the HFPA algorithm. This approach greatly reduces the computational effort required by the algorithm. Specifically, we compute an approximate SVD by a fast Monte Carlo algorithm: the Linear Time SVD algorithm developed by Drineas et al. [16]. For a given matrix A ∈ R^{m×n}, and parameters c_s, k_s ∈ Z_+ with 1 ≤ k_s ≤ c_s ≤ n, and {p_i}_{i=1}^n with p_i ≥ 0, Σ_{i=1}^n p_i = 1, this algorithm returns an approximation to the largest k_s singular values and the corresponding left singular vectors of the matrix A in linear O(m + n) time. The Linear Time Approximate SVD Algorithm is outlined below.

Linear Time Approximate SVD Algorithm [16, 29]

- Input: A ∈ R^{m×n}, c_s, k_s ∈ Z_+ such that 1 ≤ k_s ≤ c_s ≤ n, {p_i}_{i=1}^n such that p_i ≥ 0, Σ_{i=1}^n p_i = 1.
- Output: H_{k_s} ∈ R^{m×k_s} and σ_t(C), t = 1, 2, ..., k_s.
- For t = 1 to c_s:
    Pick i_t ∈ {1, 2, ..., n} with Prob{i_t = α} = p_α, α = 1, 2, ..., n;
    Set C^{(t)} = A^{(i_t)} / √(c_s p_{i_t}).
- Compute C^T C and its SVD, say C^T C = Σ_{t=1}^{c_s} σ_t^2(C) y^t (y^t)^T.
- Compute h^t = C y^t / σ_t(C) for t = 1, 2, ..., k_s.
- Return H_{k_s}, where H_{k_s}^{(t)} = h^t, and σ_t(C), t = 1, 2, ..., k_s.

The outputs σ_t(C) (t = 1, 2, ..., k_s) are approximations to the largest k_s singular values, and H_{k_s}^{(t)} (t = 1, 2, ..., k_s) are approximations to the corresponding left singular vectors of the matrix A. Thus, the SVD of A is approximated by

    A ≈ A_{k_s} := H_{k_s} Diag(σ(C)) ( A^T H_{k_s} Diag(1/σ(C)) )^T.

Drineas et al. [16] prove that, with high probability, A_{k_s} is an approximation to the best rank-k_s approximation to A.
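A compact NumPy sketch of the column-sampling procedure outlined above is given below (ours, following the outline only); the function name linear_time_approx_svd is ours, and the small test assumes the top k_s sampled singular values are positive.

```python
import numpy as np

def linear_time_approx_svd(A, cs, ks, p=None, rng=None):
    """Monte Carlo approximate SVD by column sampling, as outlined above (a sketch).
    Returns H (m x ks approximate left singular vectors) and the ks approximate singular values."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    p = np.full(n, 1.0 / n) if p is None else p          # uniform sampling probabilities by default
    idx = rng.choice(n, size=cs, replace=True, p=p)       # pick cs columns i.i.d. from p
    C = A[:, idx] / np.sqrt(cs * p[idx])                  # scaled sampled columns
    w, Y = np.linalg.eigh(C.T @ C)                        # eigendecomposition of C^T C (ascending)
    order = np.argsort(w)[::-1][:ks]
    sigma = np.sqrt(np.maximum(w[order], 0.0))            # top ks approximate singular values
    H = (C @ Y[:, order]) / sigma                         # h^t = C y^t / sigma_t
    return H, sigma

# rank-ks approximation A_ks = H H^T A; near machine precision here since rank(A) = ks
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 30)) @ rng.standard_normal((30, 150))
H, s = linear_time_approx_svd(A, cs=60, ks=30, rng=rng)
A_ks = H @ (H.T @ A)
print(np.linalg.norm(A - A_ks, 'fro') / np.linalg.norm(A, 'fro'))
```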
In our numerical experiments, the same as in [29], we set c_s = 2r_m − 2, where r_m = ⌊((m + n) − √((m + n)^2 − 4p))/2⌋ is, for a given number of sampled entries, the largest rank of m × n matrices for which the matrix completion problem has a unique solution. We refer to [29] for how to set k_s. We also set all p_i equal to 1/n. For more details on the choices of the parameters in the Linear Time Approximate SVD Algorithm, please see [16, 29]. The Linear Time Approximate SVD code we use is written by Shiqian Ma and is available at http://www.columbia.edu/sm2756/FPCA.htm.

7 Numerical experiments

In this section, we report numerical results on a series of matrix completion problems of the form (1.2) to demonstrate the performance of the HFPA algorithm. The purpose of the numerical experiments is to assess the effectiveness, accuracy, robustness and convergence of the algorithm. The effectiveness is measured by how few measurements are required to exactly recover a low-rank matrix: the fewer the measurements used by an algorithm, the better the algorithm. Under the same measurements, the shorter the time used by an algorithm and the higher the accuracy achieved by it, the better the algorithm. We also test the robustness of the algorithm with respect to varying dimensions, varying ranks and varying sampling ratios. To compare the performance in finding low-rank matrix solutions, some other competitive algorithms, namely the singular value thresholding algorithm (SVT²) [4], the fixed point continuation algorithm based on an approximate SVD using the iterative Lanczos algorithm (FPC³) [29], and the fixed point continuation algorithm based on a linear time approximate SVD (FPCA⁴) [29], are also demonstrated together with our HFPA algorithm. Note that the former three algorithms are all based on nuclear norm minimization, while these four algorithms all depend on an approximate SVD. We also note that some manifold-based algorithms without SVD, such as GenRTR [1], RTRMC [3], OptSpace [24] and LMaFit [42], have good performance. Because of space constraints, we do not compare with them.

² The SVT code is available at http://svt.caltech.edu; it was written by Emmanuel Candes, October 2008, and last modified by Farshad Harirchi and Stephen Becker, April 2011.
³ The FPC code is available at http://svt.caltech.edu; it was coded by Stephen Becker, March 2009, referring to [29].
⁴ The FPCA code is available at http://www.columbia.edu/sm2756/FPCA.htm; it was coded and modified by Shiqian Ma, July 2008 and April 2009, respectively.
All computational experiments were performed in MATLAB R2009a on a Dell desktop
computer with an Intel(R) Core (TM) i3-2120 3.29GHZ CPU and 3.23GB of RAM.
In our simulations, we generate m × n matrices of rank r in the same way as in related work (for instance, [4, 6, 29]): we first generate random matrices M_L ∈ R^{m×r} and M_R ∈ R^{n×r} with i.i.d. Gaussian entries, and then set M = M_L M_R^T. We then sample a subset Ω of p entries uniformly at random. Thus, the entries of M on Ω are the observed data and M is the real unknown matrix. For each problem with m × n matrix M, measurement number p and rank r, we solve a fixed number of randomly created matrix completion problems. We use SR := p/(mn), i.e., the number of measurements divided by the number of entries of the matrix, to denote the sampling ratio. Recall that an m × n matrix of rank r depends upon df := r(m + n − r) degrees of freedom. Then OS := p/df is the oversampling ratio, i.e., the ratio between the number of sampled entries and the true dimensionality of an m × n matrix of rank r. Note that if OS < 1, there is always an infinite number of matrices of rank r with the given entries, so we cannot hope to recover the matrix in this situation. We also note that when OS ≥ 1, the closer OS is to 1, the more difficult it is to recover the matrix. For this reason, following [29], we call a matrix completion problem an easy problem if OS and SR are such that OS·SR > 0.5 and OS > 2.6, and equivalently a hard problem if OS·SR ≤ 0.5 or OS ≤ 2.6. In the tables, FR := 1/OS = df/p is a quantity often used in the literature. We use rank to denote the average rank of matrices that are recovered by an algorithm. We use time and iter to denote the average time (seconds) and the average number of iterations, respectively, that an algorithm takes to reach convergence.
We use three relative errors: the relative error on Ω, the relative recovery error in the Frobenius norm, and the relative recovery error in the spectral norm,

    rel.err(Ω) := ‖M(Ω) − X_opt(Ω)‖_F / ‖M(Ω)‖_F,   rel.err.F := ‖M − X_opt‖_F / ‖M‖_F,   rel.err.s := ‖M − X_opt‖_2 / ‖M‖_2,

to evaluate the closeness of X_opt to M, where X_opt is the optimal solution to (1.2) obtained by an algorithm.
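For reference, the random test problems and the three relative errors described above can be generated and evaluated with the following short NumPy sketch (ours); the function names make_problem and relative_errors are chosen for exposition.

```python
import numpy as np

def make_problem(m, n, r, p, rng):
    """Random rank-r matrix M = M_L M_R^T and p entries sampled uniformly at random."""
    M = rng.standard_normal((m, r)) @ rng.standard_normal((n, r)).T
    flat = rng.choice(m * n, size=p, replace=False)
    omega = (flat // n, flat % n)                 # row and column indices of observed entries
    b = M[omega]                                  # observed data
    return M, omega, b

def relative_errors(M, X_opt, omega):
    """rel.err(Omega), rel.err.F and rel.err.s as defined above."""
    err_omega = np.linalg.norm(M[omega] - X_opt[omega]) / np.linalg.norm(M[omega])
    err_F = np.linalg.norm(M - X_opt, 'fro') / np.linalg.norm(M, 'fro')
    err_s = np.linalg.norm(M - X_opt, 2) / np.linalg.norm(M, 2)
    return err_omega, err_F, err_s

# example: m = n = 100, r = 6, OS = 3 gives p = 3 * r * (m + n - r) sampled entries
rng = np.random.default_rng(0)
m = n = 100; r = 6; p = 3 * r * (m + n - r)
M, omega, b = make_problem(m, n, r, p, rng)
print(p / (m * n))                                # sampling ratio SR, about 0.349 as in Table 2
```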
The parameters and initial values in the HFPA algorithm for matrix completion problems are listed in Table 1.

Table 1: Parameters and initial values in the HFPA algorithm

λ̄ = 1e-4, η = 1/4, λ_1 = min{3, mn/p}·‖A*(b)‖_2, X_0 = A*(b), maxiter = 10,000,

if {hard problem & max(m, n) < 1000}:  μ_0 = 1.7;  I_m = 200;
else
    if {SR < 0.5 & min(m, n) ≥ 1000}:  μ_0 = 1.98;  else  μ_0 = 1.5;  end;  I_m = 10;
end

7.1 Results for randomly created noiseless matrix completion problems

Our first experiment compares the recovering ability of HFPA with SVT, FPC and FPCA for small and easy matrix completion problems. Here a small matrix means that the dimension of the matrix is less than 200. Specifically, in the first experiment we take m = n = 100, OS = 3, FR = 0.33 and let the true rank increase from 6 to 16 in steps of 1. The tolerance in the four algorithms is set to 10^{-4}. For each scale of these problems, we solve 10 randomly created matrix completion problems. The computational results for this experiment are displayed in Table 2.

From Table 2, the first observation is that only HFPA recovers all the true ranks. When r < 11, the ranks recovered by SVT are larger than the true ranks; the ranks recovered by FPC are also larger than the true ones when r < 10; the same happens to FPCA when the true rank equals 16. The second observation is that HFPA runs fastest among the four algorithms for most of the problems. As the true ranks change from 6 to 16, the time cost by HFPA remains almost unchanged. Although FPCA is slightly faster than HFPA by a few hundredths of a second when r ≤ 6, HFPA is much faster than FPCA when r ≥ 12. Obviously, HFPA runs faster than SVT and FPC. Finally, let us compare the accuracies achieved by the four algorithms. We observe that HFPA achieves the most accurate solutions for most of the problems; even when r ≥ 12, at least one of the three relative errors of HFPA reaches 10^{-6}. Meanwhile, the accuracies of SVT and FPC are not very good when r ≤ 7, and FPCA begins to yield very inaccurate solutions when r ≥ 13. We can draw the conclusion that for small and easy matrix completion problems, HFPA is very fast, effective and robust.
Our second experiment compares the recovering abilities of HFPA with SVT, FPC and FPCA for small but hard matrix completion problems. These problems are hard and challenging because the oversampling ratio OS = 2 is very close to 1, which implies that the observed data are very limited relative to the degrees of freedom of the unknown matrices. In this experiment, we take m = n = 100, OS = 2, FR = 0.50 and let r increase from 2 to 24 in steps of 2. For this set of problems, SR ranges from 7.9% to 84.5%. The tolerance in this experiment is set to 10^{-6}. For each scale of these problems, we also solve 10 randomly created matrix completion problems. The results are displayed in Table 3. From Table 3, we find that SVT and FPC do not work well, in the sense that the ranks they recover are far larger than the true ranks and the accuracies of their solutions are poor until the true rank increases to 20. It is clear that FPCA and HFPA both work very well. We observe that as r increases from 2 to 24, the times cost by HFPA and FPCA both increase, but slowly. As we can see, HFPA attains accuracy as good as or slightly better than FPCA, while being obviously faster.
Now we test our algorithm on large randomly created matrix completion problems. We only run 5 instances for each large scale problem. The numerical results of HFPA for easy and large matrix completion problems are presented in Table 4. For easy problems, since SVT performs in general better than FPC, we omit the results of FPC in Table 4 for the sake of space. For example, when m = n = 1000, r = 10, OS = 6, SR = 0.119 and FR = 0.17, FPC costs more than 350 seconds to recover the matrix while SVT only costs about 8 seconds, and they achieve similar accuracy.

From Table 4, we can see that for a 3000 × 3000 unknown matrix of rank 200 with 38.7% sampling ratio, HFPA can recover it well in only 12 minutes, while SVT needs half an hour and FPCA fails to work. We also find that for an unknown matrix of fixed size, a decrease of the sampling ratio has little influence on the computational time of HFPA, whereas an increase of the sampling ratio can remarkably improve its accuracy. We conclude that for these easy problems, some of which have a very low rank and some of which have a low but not very low rank, HFPA is always powerful enough to recover them.

Table 2: Comparison of SVT, FPC, FPCA and HFPA for randomly created small and easy matrix completion problems (m = n = 100, r = 6 : 1 : 16, OS = 3, FR = 0.33, xtol = 10^{-4})

r SR Solver rank iter time rel.err(Ω) rel.err.F rel.err.s


SVT 11 1000 29.19 4.43e-2 3.51e-2 5.51e-2
6 0.349 FPC 10 742 14.39 3.67e-4 1.13e-2 2.11e-2
HFPA 6 96 0.14 1.58e-4 3.44e-4 5.43e-4
FPCA 6 84 0.12 9.56e-5 1.91e-4 2.49e-4
SVT 11 1000 29.21 3.02e-2 3.49e-2 7.32e-2
7 0.405 FPC 11 904 15.09 3.07e-4 1.66e-2 3.63e-2
HFPA 7 70 0.13 1.54e-4 4.33e-4 8.95e-4
FPCA 7 80 0.13 1.01e-4 2.46e-4 4.81e-4
SVT 14 1000 25.61 3.78e-3 3.51e-3 7.67e-3
8 0.461 FPC 10 661 11.21 2.65e-4 4.14e-4 4.62e-4
HFPA 8 58 0.14 4.89e-5 9.68e-5 1.33e-4
FPCA 8 75 0.16 5.20e-5 7.94e-5 7.94e-5
SVT 13 1000 23.70 9.02e-3 1.27e-2 2.79e-2
9 0.516 FPC 10 390 6.37 2.23e-4 6.27e-3 1.38e-2
HFPA 9 57 0.13 4.07e-5 7.51e-5 9.27e-5
FPCA 9 76 0.16 5.20e-5 7.94e-5 7.94e-5
SVT 11 181 3.93 9.84e-5 2.45e-4 4.03e-4
10 0.570 FPC 10 229 4.20 1.97e-4 2.62e-4 2.47e-4
HFPA 10 55 0.13 2.44e-5 4.42e-5 5.62e-5
FPCA 10 89 0.17 5.03e-5 4.06e-5 4.32e-5
SVT 11 150 3.77 9.65e-5 1.77e-4 2.01e-4
11 0.624 FPC 11 186 3.11 1.75e-4 2.23e-4 1.92e-4
HFPA 11 52 0.14 1.13e-5 1.65e-5 1.82e-5
FPCA 11 153 0.31 5.86e-5 4.71e-5 6.21e-5
SVT 12 146 3.11 9.81e-5 1.81e-4 2.29e-4
12 0.677 FPC 12 161 2.64 1.59e-4 2.00e-4 1.83e-4
HFPA 12 52 0.13 7.92e-6 1.12e-5 1.40e-5
FPCA 12 511 0.91 5.98e-5 4.96e-5 7.86e-5
SVT 13 127 2.15 9.71e-5 1.67e-4 2.28e-4
13 0.729 FPC 13 126 2.10 1.45e-4 1.74e-4 1.51e-4
HFPA 13 51 0.12 5.57e-6 7.77e-6 1.14e-5
FPCA 13 600 1.07 1.67e-2 1.43e-2 1.64e-2
SVT 14 118 1.77 9.87e-5 1.61e-4 2.13e-4
14 0.781 FPC 14 114 2.06 1.35e-4 1.57e-4 1.35e-4
HFPA 14 51 0.14 4.91e-6 6.36e-6 9.23e-6
FPCA 14 600 1.17 7.31e-2 6.46e-2 7.03e-2
SVT 15 113 1.60 9.82e-5 1.55e-4 2.10e-4
15 0.833 FPC 15 98 1.85 1.22e-4 1.39e-4 1.13e-4
HFPA 15 51 0.13 2.34e-6 2.40e-6 2.76e-6
FPCA 15 600 1.21 1.91e-1 1.74e-1 2.15e-1
SVT 16 91 1.49 9.97e-5 1.46e-4 1.61e-4
16 0.883 FPC 16 85 1.72 1.13e-4 1.21e-4 8.76e-5
HFPA 16 51 0.13 1.76e-6 1.72e-6 1.65e-6
FPCA 17 600 1.10 6.70e-1 6.30e-1 6.89e-1

Table 3: Comparison of SVT, FPC, FPCA and HFPA for randomly created small but hard matrix completion problems (m = n = 100, r = 2 : 2 : 24, OS = 2, FR = 0.50, xtol = 10^{-6})

r SR Solver rank iter time rel.err(Ω) rel.err.F rel.err.s


SVT Divergent - - -
2 0.079 FPC 18 2755 90.85 1.67e-3 4.07e-1 3.92e-1
HFPA 2 2300 1.04 2.72e-3 4.35e-2 5.16e-2
FPCA 2 8000 3.02 8.13e-4 2.02e-2 2.57e-2
SVT 18 1000 60.76 7.91e-1 5.47e-1 3.60e-1
4 0.157 FPC 25 2585 76.99 8.74e-4 2.45e-1 2.43e-1
HFPA 4 1702 1.03 1.92e-6 5.58e-6 5.86e-6
FPCA 4 4603 2.58 3.17e-6 1.10e-5 1.13e-5
SVT 24 1000 85.26 5.78e-1 4.35e-1 3.47e-1
6 0.233 FPC 33 2384 70.79 5.99e-4 1.81e-1 1.83e-1
HFPA 6 1651 1.26 1.33e-6 3.68e-6 3.94e-6
FPCA 6 4595 3.38 2.27e-6 6.87e-6 7.71e-6
SVT 34 1000 123.11 4.03e-1 3.41e-1 2.78e-1
8 0.307 FPC 38 2296 74.53 4.87e-4 1.38e-1 1.59e-1
HFPA 8 1621 1.65 1.30e-6 3.06e-6 2.98e-6
FPCA 8 4581 4.54 1.82e-6 4.99e-6 5.15e-6
SVT 29 1000 65.02 2.60e-1 2.38e-1 2.23e-1
10 0.380 FPC 43 2704 98.62 3.99e-4 4.82e-2 6.11e-2
HFPA 10 1613 2.09 8.75e-7 1.74e-6 1.57e-6
FPCA 10 4556 5.72 1.44e-6 3.59e-6 3.21e-6
SVT 30 1000 70.19 1.15e-1 1.21e-1 1.52e-1
12 0.451 FPC 42 2493 86.29 3.26e-4 1.22e-2 2.50e-2
HFPA 12 1609 2.53 7.79e-7 1.52e-6 1.68e-6
FPCA 12 4572 7.61 1.06e-6 2.47e-6 2.52e-6
SVT 31 1000 78.73 2.19e-2 3.72e-2 6.11e-2
14 0.521 FPC 26 2223 66.01 2.45e-4 5.55e-3 1.47e-2
HFPA 14 1609 2.74 5.48e-7 9.73e-7 1.10e-6
FPCA 14 4908 7.97 7.46e-7 1.86e-6 2.59e-6
SVT 38 1000 36.12 2.10e-4 2.17e-3 2.32e-3
16 0.589 FPC 22 1001 28.71 2.02e-4 3.54e-4 3.00e-4
HFPA 16 1610 2.80 5.61e-7 8.62e-7 9.38e-7
FPCA 16 5011 8.99 6.54e-7 1.42e-6 1.68e-6
SVT 28 1000 36.86 3.84e-5 2.91e-4 7.52e-4
18 0.655 FPC 20 739 17.93 1.71e-4 2.69e-4 2.49e-4
HFPA 18 1613 2.86 4.26e-7 5.55e-7 5.45e-7
FPCA 18 5051 9.03 5.70e-7 1.08e-6 1.17e-6
SVT 27 804 28.73 9.93e-7 3.85e-6 8.27e-6
20 0.720 FPC 21 445 11.10 1.56e-4 2.24e-4 2.00e-4
HFPA 20 1610 3.03 4.96e-7 5.45e-7 5.69e-7
FPCA 20 5011 9.23 3.37e-7 6.03e-7 6.47e-7
SVT 24 451 15.78 9.81e-7 2.52e-6 5.07e-6
22 0.079 FPC 22 317 7.47 1.40e-4 1.89e-4 1.63e-4
HFPA 22 1611 3.13 3.86e-7 3.89e-7 4.28e-7
FPCA 22 5011 9.34 2.79e-7 4.23e-7 4.58e-7
SVT 24 328 10.86 9.91e-7 1.94e-6 2.28e-6
24 0.845 FPC 24 234 5.67 1.22e-4 1.52e-4 1.28e-4
HFPA 24 1611 3.17 4.37e-7 4.31e-7 5.66e-7
FPCA 24 5011 9.50 1.93e-7 2.96e-7 4.14e-7

Table 4: Comparison of SVT, FPCA and HFPA for randomly created large and easy matrix completion problems (xtol = 10^{-4})

Problems SVT HFPA FPCA


m=n r OS SR FR time rel.err.F time rel.err.F time rel.err.F
200 20 3 0.570 0.33 9.29 1.80e-4 0.43 2.98e-5 0.64 3.19e-5
4 0.760 0.25 2.33 1.37e-4 0.43 1.24e-6 4.73 3.36e-3
500 50 3 0.570 0.33 29.20 1.73e-4 3.67 1.76e-5 6.50 2.56e-5
4 0.760 0.25 11.25 1.38e-4 3.45 7.73e-7 44.00 6.00e-4
800 80 3 0.570 0.33 78.78 1.70e-4 15.48 1.21e-5 28.99 2.84e-5
4 0.760 0.25 37.70 1.34e-4 15.03 2.25e-7 162.05 2.57e-3
1000 10 6 0.119 0.17 8.29 1.65e-4 4.96 4.05e-4 6.66 4.36e-4
50 4 0.390 0.25 51.33 1.61e-5 14.56 1.34e-5 24.86 3.14e-5
100 3 0.570 0.33 129.58 1.67e-4 21.99 1.04e-5 42.38 2.29e-5
2000 10 6 0.060 0.17 17.70 1.69e-4 26.87 9.01e-4 35.51 9.14e-4
100 4 0.390 0.25 231.85 1.60e-4 195.95 1.06e-5 356.37 2.36e-5
200 3 0.570 0.33 909.87 1.69e-4 257.87 1.11e-5 1413.06 4.34e-5
3000 50 5 0.165 0.20 167.38 1.54e-4 94.99 2.45e-4 126.15 2.54e-4
100 4 0.262 0.25 368.42 1.66e-4 283.15 8.67e-5 420.30 9.86e-5
200 3 0.387 0.33 1837.82 1.85e-4 717.59 3.88e-5 Out of memory!

Table 5: Comparison of FPCA and HFPA for randomly created large and hard matrix completion problems (xtol = 10^{-4})

Problems HFPA FPCA


m=n r OS SR FR time rel.err(Ω) rel.err.F time rel.err(Ω) rel.err.F
200 10 2 0.195 0.50 2.03 1.43e-4 3.60e-4 7.11 2.44e-4 6.91e-4
20 1.3 0.247 0.77 3.39 1.24e-4 5.75e-4 10.62 2.04e-4 1.07e-3
500 25 2 0.195 0.50 15.48 1.40e-4 3.07e-4 60.16 2.44e-4 5.97e-4
50 1.2 0.228 0.83 38.24 1.30e-4 6.66e-4 98.68 2.21e-4 1.53e-3
800 40 2 0.195 0.50 43.85 1.10e-4 2.32e-4 185.64 2.25e-4 5.36e-4
80 1.2 0.228 0.83 123.46 1.29e-4 6.77e-4 336.79 2.21e-4 1.53e-3
1000 20 2 0.079 0.50 21.00 2.95e-4 8.19e-4 22.79 2.80e-4 7.67e-4
50 2 0.195 0.50 18.01 1.64e-4 3.99e-4 20.05 1.64e-4 3.92e-4
100 1.5 0.285 0.67 49.41 1.12e-4 3.09e-4 54.00 1.11e-4 2.98e-4
2000 20 3 0.060 0.33 52.15 4.21e-4 8.94e-4 60.78 4.18e-4 8.74e-4
50 2 0.099 0.50 89.02 2.58e-4 6.59e-4 100.45 2.52e-4 6.47e-4
100 2 0.195 0.50 109.22 1.69e-4 3.99e-4 127.21 1.67e-4 3.94e-4
200 2 0.380 0.50 348.40 7.58e-5 1.50e-4 414.71 7.38e-5 1.41e-4
3000 50 2 0.066 0.50 277.01 3.16e-4 8.28e-4 290.93 3.27e-4 8.52e-4
100 2 0.131 0.50 282.62 2.14e-4 5.27e-4 304.70 2.10e-4 5.17e-4
200 2 0.258 0.50 659.74 1.39e-4 3.12e-4 747.07 1.33e-4 2.96e-4
300 2 0.320 0.50 1247.87 8.15e-5 1.60e-4 1420.86 6.90e-5 1.32e-4

For hard problems, without exception SVT and FPC either diverge, cannot terminate in one hour, or yield very inaccurate solutions. For example, when m = n = 200, r = 10 and SR = 0.195, which is the simplest case, SVT costs more than 300 seconds to recover a matrix of rank 43 with a relative error in the Frobenius norm of order 10^{-1}, while FPC recovers a matrix of rank 69 with a relative error in the Frobenius norm of order 10^{-1}. Another simple example: when m = n = 200, r = 20 and SR = 0.380, SVT costs more than 700 seconds to recover a matrix of rank 87 with a relative error in the Frobenius norm of order 10^{-1}, while FPC recovers a matrix of rank 96 with a relative error in the Frobenius norm of order 10^{-2}. Therefore, in this case only FPCA is comparable to HFPA. The results are displayed in Table 5. From Table 5, we can see that HFPA still has a powerful recovering ability for hard and large matrix completion problems.

7.2 Results for matrix completion from noisy sampled entries

In this subsection, we briefly present the results of HFPA for matrix completion problems with noisy sampled entries. Suppose we observe data from the following model:

    B_{ij} = M_{ij} + Z_{ij},   (i, j) ∈ Ω,                               (7.1)

where Z is a zero-mean Gaussian white noise with standard deviation σ. The results of HFPA together with those of SVT and FPCA are displayed in Table 6. The quantities are averages over 5 runs. The tolerance is set to 10^{-4}.

From Table 6, we see that for noisy sampled data HFPA performs as well as or slightly better than FPCA, while it is obviously more powerful than SVT.

Table 6: Numerical results for SVT, HFPA and FPCA on random matrix completion problems with noisy data

noise Problems SVT HFPA FPCA


level m=n r OS SR time rel.err.F time rel.err.F time rel.err.F
10^{-2} 1000 10 6 0.119 74.38 3.15e-3 33.44 1.71e-3 34.88 1.73e-3
50 4 0.390 448.84 1.65e-3 206.44 1.02e-3 208.34 1.02e-3
100 3 0.570 930.83 1.26e-3 294.20 8.29e-4 292.99 1.09e-3
150 2 0.555 - - 291.02 9.04e-4 298.56 1.30e-3
10^{-1} 1000 10 6 0.119 - - 33.22 1.71e-2 34.49 1.73e-2
50 4 0.390 1770.99 *1.66e-2 203.97 1.00e-2 204.89 1.01e-2
100 3 0.570 2392.36 **1.23e-2 289.36 8.38e-3 288.25 1.10e-2
150 2 0.555 - - 308.96 5.09e-3 309.24 7.36e-3
* The recovered rank by SVT is 125.
** The recovered rank by SVT is 167.
- The SVT algorithm can not terminate in one hour.

7.3 Results for real problems

In this subsection, we apply HFPA to image inpainting problems in order to test its effectiveness on real data matrices. It is well known that grayscale images and color images can be expressed by matrices and tensors, respectively. In grayscale image inpainting, the grayscale values of some of the pixels of the image are missing, and we want to fill in these missing values. If the image is of low rank, or of numerical low rank, we can solve the image inpainting problem as a matrix completion problem (1.2) (see, e.g., [29]).

[Figure 2: (a) Original 600 × 903 image with full rank; (b) image of rank 80 truncated from (a); (c) 50% randomly masked version of (a); (d) recovered image from (c) (rel.err.F = 8.30e-2); (e) 50% randomly masked version of (b); (f) recovered image from (e) (rel.err.F = 6.56e-2); (g) deterministically masked version of (b); (h) recovered image from (g) (rel.err.F = 6.97e-2).]
Here, Figure 2(a) is a 600 × 903 grayscale image of rank 600. We applied the SVD to Figure 2(a) and truncated this decomposition to get the rank-80 image shown in Figure 2(b). Figure 2(c) is a masked version of the image in Figure 2(a), where half of the pixels in Figure 2(a) have been lost at random. Figure 2(d) is the image obtained from Figure 2(c) by applying HFPA; it is of rank 54 and is a low-rank approximation to Figure 2(a) with a relative error of 8.30e-2. Figure 2(e) is a masked version of the image in Figure 2(b), where half of the pixels in Figure 2(b) have been masked uniformly at random. Figure 2(f) is the image obtained from Figure 2(e) by applying HFPA; it is of rank 46 and approximates Figure 2(b) with a relative error of 6.56e-2. Figure 2(g) is another masked image obtained from Figure 2(b), where 10 percent of the pixels have been masked in a non-random fashion. Figure 2(h) is the image obtained from Figure 2(g) by applying HFPA; it is of rank 52 and approximates Figure 2(b) with a relative error of 6.97e-2.

8 Conclusion

In this paper, we proposed using the S1/2 regularization method, a nonconvex, nonsmooth and non-Lipschitz optimization problem, to solve the affine rank minimization problem. We first gave a globally necessary optimality condition for the S1/2 regularization problem, which is characterized as a matrix-valued fixed point equation associated with the singular value half thresholding operator. The fixed point iterative method for the S1/2 regularization problem was then naturally proposed, and its convergence analysis was established. By using a useful parameter setting strategy together with an approximate singular value decomposition procedure, we obtained a very efficient algorithm (HFPA) for affine rank minimization problems. Numerical results on matrix completion problems showed that the proposed HFPA algorithm is fast, efficient and robust.

Acknowledgements
The authors are grateful to Prof. Shiqian Ma at The Chinese University of Hong Kong for sharing the FPCA code. The authors also thank Prof. Wotao Yin at Rice University for his valuable suggestions on the convergence analysis.

References
[1] P.-A. Absil, C. Baker, and K. Gallivan, Trust-region methods on Riemannian manifolds,
Found. Comput. Math., 7(2007), pp. 303-330.
[2] W. Bian, X. Chen, Worst-case complexity of smoothing quadratic regularization meth-
ods for non-Lipschitzian optimization, SIAM J. Optim., 23 (2013), pp. 1718-1741.
[3] N. Boumal, and P. Absil, RTRMC: A riemannian trust-region method for low-rank
matrix completion, In NIPS, 2011.
[4] J. Cai, E. Candes, and Z. Shen, A singular value thresholding algorithm for matrix
completion, SIAM J. Optim., 20 (2010), pp. 1956-1982.
[5] E. Candes, and Y. Plan, Matrix completion with noise, Proceedings of the IEEE, 98
(2010), pp. 925-936.
[6] E. Candes, and B. Recht, Exact matrix completion via convex optimization, Found.
Comput. Math., 9 (2009), pp. 717-772.

[7] E. Candes, J. Romberg, and T. Tao, Robust uncertainty principles: exact signal recon-
struction from highly incomplete frequency information, IEEE Trans. Inform. Theory,
52 (2006), pp. 489-509.
[8] E. Candes, and T. Tao, The power of convex relaxation: Near-optimal matrix comple-
tion, IEEE Trans. Inform. Theory, 56 (2010), pp. 2053-2080.
[9] R. Chartrand, Exact reconstructions of sparse signals via nonconvex minimization, IEEE
Signal Process. Lett., 14 (2007), pp. 707-710.
[10] R. Chartrand, Nonconvex regularization for shape preservation, IEEE Inter. Confer.
Image Process., I (2007), pp. 293-296.
[11] R. Chartrand, Fast algorithms for nonconvex compressive sensing: MRI reconstruction
from very few data, in Proc. IEEE Int. Symp. Biomed. Imag., 2009, pp. 262-265.
[12] R. Chartrand, and V. Staneva, Restricted isometry properties and nonconvex compres-
sive sensing, Inverse Problems, 24 (2008), pp. 20-35.
[13] X. Chen, D. Ge, Z. Wang, Y. Ye, Complexity of unconstrained ℓ2-ℓp minimization, Math.
Program., to appear.
[14] X. Chen, F. Xu, and Y. Ye, Lower bound theory of nonzero entries in solutions of ℓ2-ℓp
minimization, SIAM J. Sci. Comput., 32 (2010), pp. 2832-2852.
[15] D. Donoho, Compressed sensing, IEEE Trans. Inform. Theory, 52 (2006), pp. 1289-1306.
[16] P. Drineas, R. Kannan, and M. W. Mahoney, Fast Monte Carlo algorithms for matrices
II: computing low-rank approximations to a matrix, SIAM J. Comput., 36 (2006), pp.
158-183.
[17] B. Efron, T. Hastie, and I. M. Johnstone et al, Least angle regression, Annals of Statis-
tics, 32 (2004), pp. 407-499.
[18] M. Fazel, Matrix rank minimization with applications, PhD thesis, Stanford University,
2002.
[19] M. Fazel, H. Hindi, and S. Boyd, A rank minimization heuristic with application to
minimum order system approximation, in Proc. Amer. Control Confer., 2001.
[20] M. Fazel, H. Hindi, and S. Boyd, Log-det heuristic for matrix rank minimization with
applications to Hankel and Euclidean distance matrices, in Proc. Amer. Control Confer.,
2003.
[21] E. T. Hale, W. Yin, and Y. Zhang, A fixed-point continuation method for ℓ1-regularized
minimization: methodology and convergence, SIAM J. Optim., 19 (2008), pp. 1107-1130.
[22] R. Horn, and C. Johnson, Matrix Analysis, Cambridge University Press, New York,
1990.
[23] S. Ji, K.-F. Sze, and Z. Zhou et al, Beyond convex relaxation: a polynomial-
time non-convex optimization approach to network localization, available at
http://www.stanford.edu/ yyye/lp snl local.
[24] R. Keshavan, A. Montanari, and S. Oh, Matrix completion from a few entries, IEEE
Trans. Inform. Theory, 56 (2010), pp. 2980-2998.
[25] M.-J. Lai, Y. Xu, and W. Yin, Improved iteratively reweighted least squares for uncon-
strained smoothed ℓp minimization, Rice CAAM technical report 11-12, 2012.
[26] Z. Lin, M. Chen, and Y. Ma, The augmented Lagrange multiplier method for exact
recovery of corrupted low-rank matrices, arxiv.org/abs/1009.5055v2, 2011.
[27] Y. Liu, D. Sun, and K.-C. Toh, An implementable proximal point algorithmic framework
for nuclear norm minimization, Math. Program. Ser A, 133 (2012), pp. 399-436.

[28] Z. Liu, and L. Vandenberghe, Interior-point method for nuclear norm approximation
with application to system identification, SIAM J. Matrix Anal. Appl., 31 (2009), pp.
1235-1256.
[29] S. Ma, D. Goldfarb, and L. Chen, Fixed point and Bregman iterative methods for matrix
rank minimization, Math. Program. Ser A, 128 (2011), pp. 321-353.
[30] K. Mohan, and M. Fazel, Iterative reweighted algorithms for matrix rank minimization,
J Machine Learn. Res., 13 (2012), pp. 3253-3285.
[31] F. Nie, H. Huang, and C. Ding, Low-rank matrix recovery via efficient Schatten p-
norm minimization, Proceedings of the Twenty-Sixth AAAI Conference on Artificial
Intelligence, 2012, pp. 655-661.
[32] A. Rakotomamonjy, R. Flamary, and G. Gasso et al, ℓp-ℓq penalty for sparse linear and
sparse multiple kernel multitask learning, IEEE Trans. Neural Netw., 22 (2011), pp.
1307-1320.
[33] G. Rao, Y. Peng, and Z. Xu, Robust sparse and low-rank matrix fraction based on the
S1/2 modeling, Science China-Infor. Sci., to appear.
[34] B. Recht, M. Fazel, and P. Parrilo, Guaranteed minimum rank solutions of matrix
equations via nuclear norm minimization, SIAM Review, 52 (2010), pp. 471-501.
[35] R. T. Rockafellar, Monotone Operators and the proximal point algorithm, SIAM J.
Control Optim., 14 (1976), pp. 877-898.
[36] A. Rohde, and A. Tsybakov, Estimation of high-dimensional low-rank matrices, Annals
of Statistics, 39 (2011), pp. 887-930.
[37] R. Skelton, T. Iwasaki, and K. Grigoriadis, A unified algebraic approach to linear control
design, Taylor and Francis, 1998.
[38] D. Sun, Matrix Conic Programming, Dalian University of Science and Technology,
Dalian, 2011.
[39] M. Tao, and X. Yuan, Recovering low-rank and sparse components of matrices from
incomplete and noisy observations, SIAM J. Optim., 21 (2011), pp. 57-81.
[40] K.-C. Toh, S. Yun, An accelerated proximal gradient algorithm for nuclear norm regu-
larized least squares problems, Pacific J. Optim., 6 (2010), pp. 615-640.
[41] R. Tutuncu, K. Toh, and M. Todd, Solving semidefinite-quadratic-linear programs using
SDPT3, Math. Program. Ser. B, 95 (2003), pp. 189-217.
[42] Z. Wen, W. Yin, and Y. Zhang, Solving a low-rank factorization model for matrix
completion by a nonlinear successive over-relaxation algorithm, Math. Program. Comp.,
4 (2012), pp. 333-361.
[43] Z. Xu, Data modeling: Visual psychology approach and ℓ1/2 regularization theory, in
Proc. Inter. Congr. Math., 4 (2010), pp. 3151-3184.
[44] Z. Xu, X. Chang, and F. Xu et al, ℓ1/2 regularization: a thresholding representation
theory and a fast solver, IEEE Trans. Neural Netw. Learn. Sys., 23 (2012), pp. 1013-
1027.
[45] Z. Xu, H. Guo, and Y. Wang et al, Representation of ℓ1/2 regularizer among ℓq (0 < q ≤
1) regularizers: an experimental study based on phase diagram, Acta Autom. Sinica, 38
(2012), pp. 1225-1228.

Appendix: Proof of Lemma 3.6
The proof follows the argument of [33], but is presented here in a more complete form.
Let the SVD of Y \in \mathbb{R}^{m\times n} be Y = U\,\mathrm{Diag}(\sigma)V^T. Denote

    \sigma = (\sigma_1, \dots, \sigma_r)^T, \quad U = [u_1, \dots, u_r], \quad V = [v_1, \dots, v_r],                (A.1)

where u_i \in \mathbb{R}^m, v_i \in \mathbb{R}^n (i = 1, \dots, r) are the orthonormal columns of U and V, respectively,
and \sigma_1 \ge \cdots \ge \sigma_r > 0.
Let X admit the SVD X = \bar U\,\mathrm{Diag}(\bar\sigma)\bar V^T = \sum_{i=1}^m \bar\sigma_i \bar u_i \bar v_i^T, where \bar\sigma = (\bar\sigma_1, \dots, \bar\sigma_m),
\bar U = [\bar u_1, \dots, \bar u_m], \bar V = [\bar v_1, \dots, \bar v_m] with \bar u_i \in \mathbb{R}^m, \bar v_i \in \mathbb{R}^n and \bar\sigma_1 \ge \cdots \ge \bar\sigma_m \ge 0. Note
that the SVD of X here may not be in the most reduced form. Denote \bar t_i = \bar u_i^T Y \bar v_i for
each i = 1, \dots, m. Note that

    \|X - Y\|_F^2 + \lambda\|X\|_{1/2}^{1/2}
      = \|\bar U\,\mathrm{Diag}(\bar\sigma)\bar V^T - Y\|_F^2 + \lambda\|\bar U\,\mathrm{Diag}(\bar\sigma)\bar V^T\|_{1/2}^{1/2}
      = \|\mathrm{Diag}(\bar\sigma) - \bar U^T Y \bar V\|_F^2 + \lambda\|\mathrm{Diag}(\bar\sigma)\|_{1/2}^{1/2}
      = \sum_{i=1}^m \big[(\bar\sigma_i - \bar u_i^T Y \bar v_i)^2 + \lambda \bar\sigma_i^{1/2}\big]
      = \sum_{i=1}^m \big[(\bar\sigma_i - \bar t_i)^2 + \lambda \bar\sigma_i^{1/2}\big].                (A.2)
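The chain of identities above is easiest to see in the aligned case where \bar U and \bar V are taken to be the singular vectors of Y itself, so that \bar U^T Y \bar V is diagonal and \bar t_i = \sigma_i. The short Python check below (with hypothetical variable names) verifies (A.2) numerically in exactly this case; it is only a sanity check of the identity, not part of the proof.

    import numpy as np

    rng = np.random.default_rng(1)
    m, n, lam = 5, 8, 0.3
    Y = rng.standard_normal((m, n))
    U, sigma, Vt = np.linalg.svd(Y, full_matrices=False)        # here m < n, so U is m x m

    sigma_bar = np.sort(np.abs(rng.standard_normal(m)))[::-1]   # arbitrary candidate singular values
    X = U @ np.diag(sigma_bar) @ Vt                             # X shares Y's singular vectors

    t_bar = np.array([U[:, i] @ Y @ Vt[i, :] for i in range(m)])  # t_i = u_i^T Y v_i = sigma_i

    lhs = np.linalg.norm(X - Y, 'fro') ** 2 + lam * np.sum(np.sqrt(sigma_bar))
    rhs = np.sum((sigma_bar - t_bar) ** 2 + lam * np.sqrt(sigma_bar))
    print(abs(lhs - rhs))   # ~1e-15: both sides of (A.2) agree in the aligned case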

Let

    Q(\bar U, \bar V) = \min\Big\{ \sum_{i=1}^m \big[(\bar\sigma_i - \bar t_i)^2 + \lambda \bar\sigma_i^{1/2}\big] \;\Big|\; \bar\sigma_1 \ge 0, \dots, \bar\sigma_m \ge 0 \Big\},

then problem (3.6) is equivalent to

    \min_{\bar U, \bar V} \ Q(\bar U, \bar V)
    \text{s.t.} \ \bar U^T \bar U = I_m, \ \bar V^T \bar V = I_m.                (A.3)

Let f(\bar\sigma_i) = (\bar\sigma_i - \bar t_i)^2 + \lambda \bar\sigma_i^{1/2}; then

    Q(\bar U, \bar V) = \min\Big\{ \sum_{i=1}^m f(\bar\sigma_i) \;\Big|\; \bar\sigma_1 \ge 0, \dots, \bar\sigma_m \ge 0 \Big\}.                (A.4)

Fixing \bar U, \bar V, note that \sum_{i=1}^m f(\bar\sigma_i) is separable with respect to (\bar\sigma_1, \dots, \bar\sigma_m). Hence, solving problem
(A.4) is equivalent to solving the following m problems, for each i = 1, \dots, m,

    \min_{\bar\sigma_i \ge 0} \ f(\bar\sigma_i) = (\bar\sigma_i - \bar t_i)^2 + \lambda \bar\sigma_i^{1/2}.                (A.5)

By Lemma 3.1, for each i = 1, \dots, m, the global minimizer of (A.5) is \bar\sigma_i^* = h_\lambda(\bar t_i). Thus

    f(\bar\sigma_i^*) = [h_\lambda(\bar t_i) - \bar t_i]^2 + \lambda h_\lambda^{1/2}(\bar t_i).

From (3.2) and (A.5), we know that f(\bar\sigma_i^*) = 0 when \bar t_i \le \frac{\sqrt[3]{54}}{4}\lambda^{2/3}, and that f(\bar\sigma_i^*) < 0 and
\bar\sigma_i^* > 0 when \bar t_i > \frac{\sqrt[3]{54}}{4}\lambda^{2/3}. When \bar t_i > \frac{\sqrt[3]{54}}{4}\lambda^{2/3}, the first-order optimality condition of (A.5) yields

    2h_\lambda(\bar t_i) - 2\bar t_i + \frac{\lambda}{2} h_\lambda^{-1/2}(\bar t_i) = 0.                (A.6)

By direct computation and using (A.6), we get

    \frac{d f(\bar\sigma_i^*)}{d \bar t_i}
      = 2h_\lambda(\bar t_i)h'_\lambda(\bar t_i) - 2h_\lambda(\bar t_i) - 2\bar t_i h'_\lambda(\bar t_i) + \frac{\lambda}{2} h_\lambda^{-1/2}(\bar t_i) h'_\lambda(\bar t_i)
      = h'_\lambda(\bar t_i)\Big( 2h_\lambda(\bar t_i) - 2\bar t_i + \frac{\lambda}{2} h_\lambda^{-1/2}(\bar t_i) \Big) - 2h_\lambda(\bar t_i)
      = -2h_\lambda(\bar t_i) = -2\bar\sigma_i^* < 0.

This implies that f(\bar\sigma_i^*) = h_\lambda^2(\bar t_i) - 2\bar t_i h_\lambda(\bar t_i) + \lambda h_\lambda^{1/2}(\bar t_i) is a monotonically decreasing function
of \bar t_i = \bar u_i^T Y \bar v_i. Note that Q(\bar U, \bar V) = \sum_{i=1}^m f(\bar\sigma_i^*). Therefore, solving problem
(A.3) is equivalent to solving the following m problems, for each i = 1, \dots, m,

    \max_{\bar u_i, \bar v_i} \ \bar t_i = \bar u_i^T Y \bar v_i = \sum_{j=1}^r \sigma_j (\bar u_i^T u_j)(v_j^T \bar v_i)
    \text{s.t.} \ \|\bar u_i\|_2 = 1, \ \|\bar v_i\|_2 = 1,                (A.7)
         \ \bar u_i \perp \{\bar u_1, \bar u_2, \dots, \bar u_{i-1}\},
         \ \bar v_i \perp \{\bar v_1, \bar v_2, \dots, \bar v_{i-1}\}.

Note that \sigma_1 \ge \cdots \ge \sigma_r > 0 and m \ge r. Solving the m problems one by one from i = 1 to
i = m, we find that for each i = 1, \dots, r, the maximizer of (A.7) is \bar u_i = u_i, \bar v_i = v_i, and
the optimal value of the objective function is \bar t_i = u_i^T Y v_i = \sigma_i, where u_i, v_i, \sigma_i all belong to the SVD of Y
(see (A.1)); and that for each i = r+1, \dots, m, since \bar u_i \perp \{u_1, \dots, u_r\} and \bar v_i \perp \{v_1, \dots, v_r\},
the optimal value of the objective function is \bar t_i = 0, and hence h_\lambda(\bar t_i) = 0. Thus, we have

    X^s = \sum_{i=1}^m \bar\sigma_i^* \bar u_i \bar v_i^T = \sum_{i=1}^m h_\lambda(\bar t_i) \bar u_i \bar v_i^T
        = \sum_{i=1}^r h_\lambda(\sigma_i) u_i v_i^T + \sum_{i=r+1}^m 0 \cdot (\bar u_i \bar v_i^T)
        = U\,\mathrm{Diag}(H_\lambda(\sigma))V^T
        = H_\lambda(Y),

where H_\lambda(\cdot) is defined in Definition 3.5. The proof is complete.
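As a final sanity check of the scalar step (A.5), the snippet below compares a closed-form half thresholding value against a brute-force grid minimization of f(\sigma) = (\sigma - t)^2 + \lambda\sigma^{1/2}, and evaluates the first-order condition (A.6) at the closed-form point. The closed form used here is the one usually attributed to [44]; it stands in for the statement of Lemma 3.1, which is assumed rather than reproduced in this appendix.

    import numpy as np

    def half_threshold_scalar(t, lam):
        """Closed-form minimizer of f(s) = (s - t)^2 + lam * s**0.5 over s >= 0,
        as given in the L1/2 thresholding literature [44] (assumed form of Lemma 3.1)."""
        if t <= (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0):
            return 0.0
        phi = np.arccos((lam / 8.0) * (t / 3.0) ** (-1.5))
        return (2.0 / 3.0) * t * (1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))

    lam, t = 0.5, 2.0
    s_closed = half_threshold_scalar(t, lam)

    # Brute-force check on a fine grid over [0, 2t].
    grid = np.linspace(0.0, 2.0 * t, 2_000_001)
    f = (grid - t) ** 2 + lam * np.sqrt(grid)
    s_grid = grid[np.argmin(f)]

    print(s_closed, s_grid)                                        # the two minimizers agree closely
    print(2 * (s_closed - t) + 0.5 * lam / np.sqrt(s_closed))      # first-order condition (A.6), ~0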
