
The Canonical Tensor Decomposition and

Its Applications to Social Network Analysis

Evrim Acar, Tamara G. Kolda and Daniel M. Dunlavy


Sandia National Labs

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United
States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
What is the Canonical Tensor Decomposition?
CANDECOMP/PARAFAC (CP) model [Hitchcock’27, Harshman’70, Carroll & Chang’70]

Given an $I \times J \times K$ tensor $\mathcal{X}$, the CP model approximates it by a sum of $R$ rank-one components:

$\mathcal{X} \approx \sum_{r=1}^{R} a_r \circ b_r \circ c_r$, with $a_r \in \mathbb{R}^I$, $b_r \in \mathbb{R}^J$, $c_r \in \mathbb{R}^K$.
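As a concrete illustration, here is a minimal MATLAB sketch (plain built-ins, no toolbox assumed; not code from the talk) that reconstructs a tensor from CP factor matrices:

```matlab
% Form the I x J x K array Xhat = sum_r a_r o b_r o c_r from factor
% matrices A (I x R), B (J x R), C (K x R).  Save as cp_reconstruct.m.
function Xhat = cp_reconstruct(A, B, C)
  [I, R] = size(A);  J = size(B, 1);  K = size(C, 1);
  CB = zeros(J*K, R);
  for r = 1:R
    CB(:, r) = kron(C(:, r), B(:, r));  % columnwise Kronecker product
  end
  Xhat = reshape(A * CB', [I, J, K]);   % fold the I x JK unfolding back up
end
```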
CP Application: Neuroscience
Epileptic Seizure Localization:

[Figure: a channels × scales × time-samples EEG tensor approximated by a two-component CP model, $\mathcal{X} \approx a_1 \circ b_1 \circ c_1 + a_2 \circ b_2 \circ c_2$.]

Acar et al., 2007; De Vos et al., 2007


CP has Numerous Applications!
• Chemometrics
– Fluorescence spectroscopy
– Chromatographic data analysis
(Andersen and Bro, Journal of Chemometrics, 2003)
• Neuroscience
– Epileptic seizure localization
– Analysis of EEG and ERP
(Mørup, Hansen and Arnfred, Journal of Neuroscience Methods, 2007)
• Signal Processing
(Sidiropoulos, Giannakis and Bro, IEEE Trans. Signal Processing, 2000)
• Computer Vision
– Image compression, classification
– Texture analysis
(Hazan, Polak and Shashua, ICCV 2005)
• Social Network Analysis
– Web link analysis
– Conversation detection in emails
– Text analysis
(Bader, Berry, Browne, Survey of Text Mining: Clustering, Classification, and Retrieval, 2nd Ed., 2007)
• Approximation of PDEs
(Doostan and Iaccarino, Journal of Computational Physics, 2009)
Algorithms: How Can We Compute CP?
Mathematical Details for CP

Unfolding (matricization) rearranges the fibers of a tensor into the columns of a matrix. The columns of $\mathcal{X}$ are its mode-1 fibers, the rows are its mode-2 fibers, and the tubes are its mode-3 fibers:
• $X_{(1)} \in \mathbb{R}^{I \times JK}$: columns are the mode-1 (column) fibers
• $X_{(2)} \in \mathbb{R}^{J \times IK}$: columns are the mode-2 (row) fibers
• $X_{(3)} \in \mathbb{R}^{K \times IJ}$: columns are the mode-3 (tube) fibers

In unfolded form, the CP model $\mathcal{X} \approx \sum_{r=1}^{R} a_r \circ b_r \circ c_r$ becomes
$X_{(1)} \approx A (C \odot B)^T$, $X_{(2)} \approx B (C \odot A)^T$, $X_{(3)} \approx C (B \odot A)^T$,
where $\odot$ denotes the matrix Khatri-Rao product (the columnwise Kronecker product).
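A minimal MATLAB sketch of the three unfoldings via permute/reshape under MATLAB's column-major ordering, consistent with the Khatri-Rao identities above (illustrative, not the toolbox implementation):

```matlab
% Mode-n unfoldings of an I x J x K array X: the mode-n fibers become columns.
X1 = reshape(X, I, J*K);                    % I x JK, mode-1 (column) fibers
X2 = reshape(permute(X, [2 1 3]), J, I*K);  % J x IK, mode-2 (row) fibers
X3 = reshape(permute(X, [3 1 2]), K, I*J);  % K x IJ, mode-3 (tube) fibers
```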


CP is a Nonlinear Optimization Problem

Given a tensor $\mathcal{X}$ and $R$ (# of components), find matrices A, B, C that solve the following problem:

$\min_{A,B,C} \; f(A,B,C) = \frac{1}{2} \Big\| \mathcal{X} - \sum_{r=1}^{R} a_r \circ b_r \circ c_r \Big\|^2$

where the vector $x$ of $(I+J+K)R$ variables comprises the entries of A, B, and C stacked column-wise:
$x = [a_1; \ldots; a_R; b_1; \ldots; b_R; c_1; \ldots; c_R]$.
Traditional Approach: CPALS

CPALS, dating back to Harshman’70 and Carroll & Chang’70, solves for one factor matrix at a time.

Alternating algorithm: for $k = 1, 2, \ldots$, fix B and C and solve for A, then fix A and C and solve for B, then fix A and B and solve for C.

Each step can be converted to a matrix least squares problem, e.g.,
$\min_A \| X_{(1)} - A (C \odot B)^T \|^2$,
where $X_{(1)}$ is $I \times JK$ and $C \odot B$ is $JK \times R$. The solution
$A = X_{(1)} (C \odot B) (C^T C \ast B^T B)^{\dagger}$
is $I \times R$ and requires inverting only an $R \times R$ matrix ($\ast$ denotes the Hadamard product).
Traditional Approach: CPALS

Repeat the following steps until “convergence”:
$A \leftarrow X_{(1)} (C \odot B) (C^T C \ast B^T B)^{\dagger}$
$B \leftarrow X_{(2)} (C \odot A) (C^T C \ast A^T A)^{\dagger}$
$C \leftarrow X_{(3)} (B \odot A) (B^T B \ast A^T A)^{\dagger}$

Very fast, but not always accurate.
Not guaranteed to converge to a stationary point.
Other issues, e.g., it cannot exploit symmetry.
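A minimal sketch of one CP-ALS sweep under the updates above, with a small Khatri-Rao helper (the Tensor Toolbox ships its own khatrirao; this is an illustrative stand-in, not the parafac_als code used in the experiments):

```matlab
% One CP-ALS sweep: solve the three matrix least squares problems in turn,
% using the unfoldings X1, X2, X3 from the earlier sketch.  pinv of the
% small R x R Gram product guards against rank deficiency.
A = X1 * khatrirao(C, B) * pinv((C'*C) .* (B'*B));   % I x R update
B = X2 * khatrirao(C, A) * pinv((C'*C) .* (A'*A));   % J x R update
C = X3 * khatrirao(B, A) * pinv((B'*B) .* (A'*A));   % K x R update

% Khatri-Rao (columnwise Kronecker) product of U (M x R) and V (N x R).
% Local function (valid in R2016b+ scripts), or save as khatrirao.m.
function W = khatrirao(U, V)
  R = size(U, 2);
  W = zeros(size(U, 1) * size(V, 1), R);
  for r = 1:R
    W(:, r) = kron(U(:, r), V(:, r));
  end
end
```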
Our Approach: CPOPT

Unlike CPALS, CPOPT solves for all factor matrices simultaneously using gradient-based optimization.

Define the objective function:
$f(A,B,C) = \frac{1}{2} \Big\| \mathcal{X} - \sum_{r=1}^{R} a_r \circ b_r \circ c_r \Big\|^2$
Rewriting the Objective Function

$f(A,B,C) = \frac{1}{2}\|\mathcal{X}\|^2 - \Big\langle \mathcal{X}, \sum_{r} a_r \circ b_r \circ c_r \Big\rangle + \frac{1}{2} \Big\| \sum_{r} a_r \circ b_r \circ c_r \Big\|^2$

Inner product: $\big\langle \mathcal{X}, \sum_r a_r \circ b_r \circ c_r \big\rangle = \sum_{r=1}^{R} \sum_{i,j,k} x_{ijk}\, a_{ir} b_{jr} c_{kr}$

Norm: $\big\| \sum_r a_r \circ b_r \circ c_r \big\|^2 = \sum_{r=1}^{R} \sum_{s=1}^{R} (a_r^T a_s)(b_r^T b_s)(c_r^T c_s)$
Derivative of 2nd Summand

$\frac{\partial}{\partial a_r} \Big\langle \mathcal{X}, \sum_{s} a_s \circ b_s \circ c_s \Big\rangle = X_{(1)} (c_r \otimes b_r)$

a tensor-vector multiplication. Analogous formulas exist for partials w.r.t. columns of B and C.
Derivative of 3rd Summand

$\frac{\partial}{\partial a_r} \frac{1}{2} \Big\| \sum_{s} a_s \circ b_s \circ c_s \Big\|^2 = \sum_{s=1}^{R} a_s (b_s^T b_r)(c_s^T c_r)$

Analogous formulas exist for partials w.r.t. columns of B and C.
Objective and Gradient

Objective function: $f(A,B,C) = \frac{1}{2} \big\| \mathcal{X} - \sum_{r=1}^{R} a_r \circ b_r \circ c_r \big\|^2$

Gradient (for $r = 1, \ldots, R$):
$\frac{\partial f}{\partial a_r} = -X_{(1)}(c_r \otimes b_r) + \sum_{s=1}^{R} a_s (b_s^T b_r)(c_s^T c_r)$
and analogously for $b_r$ and $c_r$.
Gradient in Matrix Form

Objective function: $f(A,B,C) = \frac{1}{2} \big\| \mathcal{X} - \sum_{r=1}^{R} a_r \circ b_r \circ c_r \big\|^2$

Gradient:
$\frac{\partial f}{\partial A} = -X_{(1)}(C \odot B) + A (C^T C \ast B^T B)$
$\frac{\partial f}{\partial B} = -X_{(2)}(C \odot A) + B (C^T C \ast A^T A)$
$\frac{\partial f}{\partial C} = -X_{(3)}(B \odot A) + C (B^T B \ast A^T A)$

Note that this formulation can be used to derive the ALS approach: setting each block of the gradient to zero recovers the corresponding ALS update.
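A minimal sketch of the CPOPT objective and gradient under these formulas, reusing X1, X2, X3 and khatrirao from the earlier sketches; the outputs can be stacked into one vector and handed to any first-order solver such as NCG (illustrative, not the Poblano/Tensor Toolbox code used in the experiments):

```matlab
% CPOPT objective f = 0.5*||X - sum_r a_r o b_r o c_r||^2 and its gradient.
function [f, GA, GB, GC] = cp_fg(X1, X2, X3, A, B, C)
  AtA = A'*A;  BtB = B'*B;  CtC = C'*C;
  U = khatrirao(C, B);                          % JK x R
  innerprod = sum(sum((X1*U) .* A));            % <X, model>
  modelnorm = sum(sum(AtA .* BtB .* CtC));      % ||model||^2
  f  = 0.5*norm(X1, 'fro')^2 - innerprod + 0.5*modelnorm;
  GA = -X1*U               + A*(CtC .* BtB);
  GB = -X2*khatrirao(C, A) + B*(CtC .* AtA);
  GC = -X3*khatrirao(B, A) + C*(BtB .* AtA);
end
```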
Indeterminacies of CP

• CP is often unique.
• However, CP has two fundamental indeterminacies:
– Permutation: the components can be reordered, e.g., swap $(a_1, b_1, c_1)$ with $(a_3, b_3, c_3)$. Not a big deal; it leads to multiple, but separated, minima.
– Scaling: the vectors comprising a single rank-one factor can be scaled, e.g., replace $a_1$ and $b_1$ with $2a_1$ and $\frac{1}{2}b_1$. This leads to a continuous space of equivalent solutions.
Adding Regularization

Objective function:
$f_\lambda(A,B,C) = \frac{1}{2} \Big\| \mathcal{X} - \sum_{r=1}^{R} a_r \circ b_r \circ c_r \Big\|^2 + \frac{\lambda}{2} \big( \|A\|^2 + \|B\|^2 + \|C\|^2 \big)$

Gradient:
$\frac{\partial f_\lambda}{\partial A} = \frac{\partial f}{\partial A} + \lambda A$, and analogously for B and C.
Our methods: CPOPT & CPOPTR

CPOPT: Apply a derivative-based optimization method to the objective function $f(A,B,C)$ defined above.

CPOPTR: Apply a derivative-based optimization method to the regularized objective function $f_\lambda(A,B,C)$ defined above.
Another competing method: CPNLS

CPNLS: Apply a nonlinear least squares solver to the equations
$\mathcal{X} \approx \sum_{r=1}^{R} a_r \circ b_r \circ c_r$ (one residual per tensor entry).

The Jacobian is of size $IJK \times (I+J+K)R$.

Proposed by Paatero’97 and also Tomasi and Bro’05.
Experimental Set-Up [Tomasi & Bro’06]

20 triplets.

Step 1: Generate random factor matrices A, B, C with $R_{true} = 3$ or $5$ columns each and collinearity set to 0.5, i.e., $a_r^T a_s / (\|a_r\| \|a_s\|) = 0.5$ for $r \neq s$ (and likewise for B and C).

Step 2: Construct a tensor from the factor matrices and add noise. All combinations of:
• Homoscedastic: 1%, 5%, 10%
• Heteroscedastic: 0%, 1%, 5%
This yields 180 tensors.

Step 3: Use each algorithm to extract factors, using $R_{true}$ and $R_{true}+1$ factors, and compare against the factors from Step 1. This yields 360 tests.
Implementation Details

• All experiments were performed in MATLAB on a Linux workstation (Quad-Core Intel Xeon 2.50GHz, 9 GB RAM).
• Methods
– CPALS: Alternating least squares. Used parafac_als in the Tensor Toolbox (Bader & Kolda).
– CPNLS: Nonlinear least squares. Used PARAFAC3W, which implements Levenberg-Marquardt (necessary due to the scaling ambiguity), by Tomasi and Bro.
– CPOPT: Optimization. Used routines in the Tensor Toolbox to compute function values and gradients. Optimization via the Nonlinear Conjugate Gradient (NCG) method with the Hestenes-Stiefel update, using Poblano (in-house code to be released soon).
– CPOPTR: Optimization with regularization. Same as above. (Regularization parameter $\lambda$ = 0.02.)
CPOPT is Fast and Accurate

Generated 360 dense test problems (with ranks 3 and 5) and factorized each with R set to the correct number of components and to one more than that. Total of 720 tests for each entry below.

[Accuracy/timing table not recoverable from the extraction. Surviving complexity row, for a $K \times K \times K$ tensor with R = # components: $O(RK^3)$ per iteration for CPALS, $O(R^3K^3)$ for CPNLS, and $O(RK^3)$ for CPOPT and CPOPTR.]

Overfactoring has a significant impact, but CPOPT is robust to overfactoring.
Amino (http://www.models.life.ku.dk/)

[Figure: three panels of emission-mode factors plotted against emission wavelength (250-450 nm).]
Application: Link Prediction

Link Prediction on Bibliometric Data

DBLP data, 1991-2007: an authors × conferences × years tensor whose $(i,j,k)$ entry is the number of papers by the $i$th author at the $j$th conference in year $k$.

Question 1: Can we use tensor decompositions to model the data and extract meaningful factors?
Question 2: Can we predict who is going to publish at which conferences in the future?
Components make sense!

Fit the DBLP tensor (authors × conferences × years) with an R-component CP model: $\mathcal{X} \approx \sum_{r=1}^{R} a_r \circ b_r \circ c_r$.

[Figure: one component $(a_r, b_r, c_r)$. Author mode: dominant coefficients for Hans Peter Meinzer, Thomas Martin Lehmann, and Heinrich Niemann. Conference mode: dominant coefficients for BILDMED, CARS, and DAGM. Time mode: activity profile over 1992-2004.]
Components make sense!

[Figure: another component. Author mode: dominant coefficients for Craig Boutilier and Daphne Koller. Conference mode: dominant coefficient for IJCAI. Time mode: activity profile over 1992-2004.]
Link Prediction Problem

TRAIN: the 1991-2004 slices of the authors × conferences × years tensor, fit with a CP model $\mathcal{X} \approx \sum_{r=1}^{R} a_r \circ b_r \circ c_r$.

TEST: the 2005-2007 data, flattened to an authors × conferences matrix with $\langle author_i, conf_j \rangle = 1$ if the $i$th author publishes at the $j$th conference and $0$ otherwise.
• ~60K links out of 19 million possible <author, conf> pairs (~0.3% dense)
• ~32K of these links are previously unseen in the training set
Score for <author_i, conf_j>

• Sign ambiguity: the factors of each rank-one component are determined only up to sign flips whose product is $+1$, e.g., $(a_r, b_r, c_r)$ and $(-a_r, -b_r, c_r)$ give the same component.
• Fix signs using the signs of the maximum-magnitude entries, and then compute a score for each author-conference pair using the information from the time domain ($c_r$).

[Figure: time-mode profiles $c_r$ for two components over the 14 training years.]
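The slides leave the exact temporal weighting unspecified; as one hedged sketch, suppose each component is weighted by the mean of its last few time coefficients (the window length L and this choice of weight are illustrative assumptions, not the authors' published rule):

```matlab
% Score all <author, conf> pairs from sign-fixed CP factors
% A (authors x R), B (conferences x R), C (years x R).
L = 3;                               % ASSUMPTION: look-back window in years
gamma = mean(C(end-L+1:end, :), 1);  % 1 x R temporal weight per component
S = A * diag(gamma) * B';            % S(i,j) = sum_r gamma_r * A(i,r) * B(j,r)
```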
Performance Measure: AUC

$s$ contains the scores for all possible pairs, e.g., ~19 million; each pair has label 1 if the author publishes at the conference and 0 otherwise, with $N$ the number of 1’s and $M$ the number of 0’s.

Sort the scores in decreasing order and sweep a threshold down the sorted list: each 1 raises the true-positive (TP) rate by $1/N$, and each 0 raises the false-positive (FP) rate by $1/M$. Plotting the TP rate against the FP rate traces the Receiver Operating Characteristic (ROC) curve; the performance measure is the Area Under the Curve (AUC).
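A minimal sketch of this computation, with scores s and 0/1 labels y as column vectors (illustrative, not the evaluation code behind the reported numbers):

```matlab
% AUC via the threshold sweep described above.
[~, idx] = sort(s, 'descend');
y  = y(idx);                          % labels in decreasing-score order
tp = cumsum(y == 1) / sum(y == 1);    % TP rate: each 1 adds 1/N
fp = cumsum(y == 0) / sum(y == 0);    % FP rate: each 0 adds 1/M
auc = trapz([0; fp], [0; tp]);        % area under the ROC curve
```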
Performance Evaluation

Predicting links for 2005-2007 (~60K): CP achieves AUC = 0.92.
Predicting previously unseen links for 2005-2007 (~32K): CP achieves AUC = 0.87.

[Figure: ROC curves for CP against a RANDOM baseline.]
CP-WOPT: Handling Missing Data
Missing Data Examples

Missing data arises in different disciplines due to loss of information, machine failures, and differing sampling frequencies or experimental set-ups:
• Chemometrics
• Biomedical signal processing (e.g., EEG)
• Network traffic analysis (e.g., packet drops)
• Computer vision (e.g., occlusions)
• …

[Figure, CHEMISTRY: an excitation × emission fluorescence array with missing entries (Tomasi & Bro’05).]
[Figure, EEG: a channels × time-frequency × subjects array (subject 1 through subject N) approximated by a CP model.]
Modify the objective for CP

NO MISSING DATA (optimization problem):
$\min_{A,B,C} \; \frac{1}{2} \Big\| \mathcal{X} - \sum_{r=1}^{R} a_r \circ b_r \circ c_r \Big\|^2$

FOR HANDLING MISSING DATA, our approach, CP-WOPT, weights the residuals by an indicator tensor $\mathcal{W}$ with $w_{ijk} = 1$ if $x_{ijk}$ is known and $0$ if it is missing:

$f_{\mathcal{W}}(A,B,C) = \frac{1}{2} \Big\| \mathcal{W} \ast \Big( \mathcal{X} - \sum_{r=1}^{R} a_r \circ b_r \circ c_r \Big) \Big\|^2 = \frac{1}{2} \sum_{i,j,k} w_{ijk} \Big( x_{ijk} - \sum_{r=1}^{R} a_{ir} b_{jr} c_{kr} \Big)^2$

Gradient (for $r = 1,\ldots,R$; $i = 1,\ldots,I$; $j = 1,\ldots,J$; $k = 1,\ldots,K$):
$\frac{\partial f_{\mathcal{W}}}{\partial a_{ir}} = -\sum_{j,k} w_{ijk} \Big( x_{ijk} - \sum_{s} a_{is} b_{js} c_{ks} \Big) b_{jr} c_{kr}$

Gradient in matrix form, with $\mathcal{Y} = \mathcal{W} \ast \mathcal{X}$ and $\mathcal{Z} = \mathcal{W} \ast \sum_r a_r \circ b_r \circ c_r$:
$\frac{\partial f_{\mathcal{W}}}{\partial A} = \big(Z_{(1)} - Y_{(1)}\big)(C \odot B)$, and analogously for B and C.
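A minimal sketch of the weighted objective and gradient under these formulas, reusing cp_reconstruct and khatrirao from the earlier sketches (an illustrative dense implementation, not the CP-WOPT code itself):

```matlab
% CP-WOPT objective and gradient; W is a 0/1 indicator array (1 = known).
% Missing entries of X must hold a finite placeholder (e.g., 0), since
% they are zeroed out by the mask below.
function [f, GA, GB, GC] = cp_wfg(X, W, A, B, C)
  [I, J, K] = size(X);
  Y = W .* X;                          % known data, zeros at missing entries
  Z = W .* cp_reconstruct(A, B, C);    % model values, masked the same way
  f = 0.5 * norm(Y(:) - Z(:))^2;
  D  = Z - Y;                          % masked residual tensor
  D1 = reshape(D, I, J*K);             % its three unfoldings
  D2 = reshape(permute(D, [2 1 3]), J, I*K);
  D3 = reshape(permute(D, [3 1 2]), K, I*J);
  GA = D1 * khatrirao(C, B);
  GB = D2 * khatrirao(C, A);
  GC = D3 * khatrirao(B, A);
end
```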
Experimental Set-Up [Tomasi & Bro’05]

20 triplets.

Step 1: Generate random factor matrices A, B, C with R = 5 or 10 columns each and collinearity set to 0.5.

Step 2: Construct a tensor from the factor matrices and add noise (2% homoscedastic noise).

Step 3: Set some entries to missing.
• Percentage of missing data: 10%, 40%, 70%
• Missing pattern: entries or fibers

Step 4: Use each algorithm to extract R factors and compare against the factors from Step 1.
CP-WOPT is Accurate!

Generated 40 test problems (with ranks 5 and 10) and factorized each with an R-component CP model. Each table entry is the percentage of correctly recovered solutions, reported alongside the ratio of known data entries to variables.

CPNLS: Nonlinear least squares. Used INDAFAC, which implements Levenberg-Marquardt [Tomasi and Bro’05].
Another alternative is ALS-based imputation (for comparisons, see Tomasi and Bro’05).
CP-WOPT is Fast!

Generated 60 test problems (with M = 10%, 40%, and 70% missing data) and factorized each with an R-component CP model. Each entry is the average and standard deviation of the run time over the CP models that successfully recover the underlying factors.
CP-WOPT is useful for real data!
Thanks to Morten Mørup!

GOAL: To differentiate between left- and right-hand stimulation.

[Figure: a channels × time-frequency × subjects EEG tensor with missing channels for some subjects, factorized with a three-component CP model on the COMPLETE DATA and on the INCOMPLETE DATA; the extracted factors agree.]
Summary & Future Work
• New CPOPT method
– Accurate & scalable
• Extend CPOPT to CP-WOPT to
handle missing data
– Accurate & scalable
• More open questions…
– Starting point?
– Tuning the optimization
– Regularization
– Exploiting sparsity
– Nonnegativity
• Application to link prediction
– On-going work comparing to other
methods
Thank you!
• More on tensors and tensor models:
– Survey: E. Acar and B. Yener, Unsupervised Multiway Data Analysis: A Literature Survey, IEEE Transactions on Knowledge and Data Engineering, 21(1): 6-20, 2009.
– CPOPT: E. Acar, T. G. Kolda and D. M. Dunlavy, An Optimization Approach for Fitting Canonical Tensor Decompositions, submitted for publication.
– CP-WOPT: E. Acar, T. G. Kolda, D. M. Dunlavy and M. Mørup, Tensor Factorizations with Missing Data, submitted for publication.
– Link Prediction: E. Acar, T. G. Kolda and D. M. Dunlavy, Link Prediction on Evolving Data, in preparation.
• Contact:
– Evrim Acar, eacarat@sandia.gov
– Tamara G. Kolda, tgkolda@sandia.gov
– Daniel M. Dunlavy, dmdunla@sandia.gov

Minisymposia on
Tensors and Tensor-based Computations