
Convolutional neural networks on graphs
Michal Punčochář (michal.puncochar@gmail.com)
CNS course @ UniPi, 2018
Outline
1. Motivation
2. Dynamical system approach
3. Early models
• NN4G
• Learnable molecular fingerprints
4. Analogy with CNN on images and sequences
5. Spectral convolution approach
6. Unified view
Motivation
• Molecules – many small graphs
• Classification, regression
• Biological networks: protein-protein interactions, gene regulatory, gene co-expression, metabolic, signaling, ...
• Citation networks, social networks, ...
• We can generate graphs from non-graph data
• K-nearest neighbors, $e^{-\|x_1 - x_2\|^2/\sigma} < \delta$, supervised, ... (see sketch below)
⇒ Node embeddings, semi-supervised node labeling, inductive learning
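To make the graph-construction bullet concrete, here is a minimal sketch (illustrative, not from any of the cited papers) of building a symmetric k-nearest-neighbor graph with Gaussian edge weights from plain feature vectors; all names are arbitrary.

```python
import numpy as np

def knn_graph(X, k=3, sigma=1.0):
    """Symmetric k-NN adjacency from row vectors in X, with Gaussian
    similarity exp(-||x_i - x_j||^2 / sigma) as edge weights."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    W = np.exp(-d2 / sigma)
    np.fill_diagonal(W, 0.0)              # no self-loops
    A = np.zeros_like(W)
    for i in range(n):
        nbrs = np.argsort(-W[i])[:k]      # k most similar nodes
        A[i, nbrs] = W[i, nbrs]
    return np.maximum(A, A.T)             # symmetrize

A = knn_graph(np.random.rand(10, 5), k=3)
```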
GNN[1,2]

• Dynamical system, diffusion between nodes


$x(v) = f\big(u(v),\, x(\mathcal{N}(v))\big)$
• Cyclical dependencies ⇒ iterate
• e.g. $x_t(v) = \sigma\big(W_{\text{in}}\, u(v) + \sum_{w \in \mathcal{N}(v)} W\, x_{t-1}(w)\big)$
• Ensure contractivity of 𝑓 and iterate to convergence
• Learning using the gradient at equilibrium[1,2] (Almeida-Pineda algorithm)
• No learning – GraphESN[3]
• Or unroll to fixed time-step, then backprop through time[4]
• No need for contractivity ⇒ greater expressivity?
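As a rough sketch of the iterate-to-convergence scheme (my own toy code, not the training procedure of [1,2]): contractivity is encouraged here simply by giving $W$ a small norm, which is an illustrative shortcut rather than the constraint actually used in the papers.

```python
import numpy as np

def gnn_fixed_point(A, U, W_in, W, tol=1e-6, max_iter=200):
    """Iterate x_t(v) = tanh(W_in u(v) + sum_{w in N(v)} W x_{t-1}(w))
    until the node states stop changing (approximate fixed point)."""
    X = np.zeros((A.shape[0], W.shape[0]))
    for _ in range(max_iter):
        X_new = np.tanh(U @ W_in.T + A @ X @ W.T)
        if np.linalg.norm(X_new - X) < tol:
            return X_new
        X = X_new
    return X

rng = np.random.default_rng(0)
A = (rng.random((6, 6)) < 0.3).astype(float); A = np.maximum(A, A.T)
np.fill_diagonal(A, 0)
U = rng.standard_normal((6, 4))
W_in = rng.standard_normal((8, 4))
W = 0.1 * rng.standard_normal((8, 8))   # small norm ~ contractive map
X = gnn_fixed_point(A, U, W_in, W)      # one state vector per node
```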

NN4G[5]

$x_l(v) = \sigma\Big(W_l^{\text{in}}\, u(v) + \sum_{l' < l} \sum_{w \in \mathcal{N}(v)} W_{l,l'}\, x_{l'}(w)\Big)$

• Connections to input and all previous layers


• ~ unrolled GNN up to 𝐿 layers with “skip-connections”
• Stationarity assumption: the same procedure is applied to all vertices
• Hierarchical growth of context ~ diffusion
• Suggested training by cascade correlation
• Possible to use backprop through all layers
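A forward-pass sketch of the formula above, with assumed names (W_in_list, W_skip) and tanh for $\sigma$; the constructive cascade-correlation training is not shown.

```python
import numpy as np

def nn4g_forward(A, U, W_in_list, W_skip, L=3):
    """x_l(v) = sigma(W_l_in u(v) + sum_{l'<l} sum_{w in N(v)} W_{l,l'} x_{l'}(w)).
    W_in_list[l] maps input features to layer l; W_skip[l][l'] maps layer l' to l."""
    states = []
    for l in range(L):
        z = U @ W_in_list[l].T
        for lp in range(l):                        # connections to ALL previous layers
            z += A @ states[lp] @ W_skip[l][lp].T
        states.append(np.tanh(z))
    return states

rng = np.random.default_rng(1)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)   # path graph on 3 nodes
U = rng.standard_normal((3, 2))
d = 4
W_in_list = [rng.standard_normal((d, 2)) for _ in range(3)]
W_skip = [[rng.standard_normal((d, d)) for _ in range(l)] for l in range(3)]
xs = nn4g_forward(A, U, W_in_list, W_skip, L=3)   # one state matrix per layer
```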

Learnable molecular fingerprints[6]

• Simplified model updated with modern DL practices

$x_l(v) = \sigma\Big(W_{l,|\mathcal{N}(v)|} \sum_{w \in \mathcal{N}_0(v)} x_{l-1}(w)\Big)$, where $\mathcal{N}_0(v) = \{v\} \cup \mathcal{N}(v)$
• $x_0 = u$ ... atom features
• Weight matrix for each layer and degree (# of bonds ≤ 5)
• Connection only to the previous layer – fewer parameters
• 𝜎 = ReLU, better results than tanh (deep architecture)
• Fingerprint $f = \sum_l \operatorname{softmax}(W_o\, x_l)$
• Differentiable version of the fixed ECFP → task-specific fingerprint
• Similar concept to Neural Turing Machine
• Trained by backprop gradient descent
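A minimal sketch of the whole fingerprint computation under the reading above (degree-indexed weight matrices, sum over the inclusive neighborhood, softmax pooling over layers and atoms); sizes and names are illustrative, not the reference code of [6].

```python
import numpy as np

def softmax(Z):
    E = np.exp(Z - Z.max(-1, keepdims=True))
    return E / E.sum(-1, keepdims=True)

def neural_fingerprint(A, U, W_deg, W_out, L=2):
    """W_deg[l][d]: weights of layer l for nodes of degree d; the fingerprint
    accumulates softmax(W_out x_l(v)) over all layers l and atoms v."""
    n = A.shape[0]
    deg = A.sum(1).astype(int)
    A_self = A + np.eye(n)                      # inclusive neighborhood N_0(v)
    X = U.copy()                                # x_0 = atom features
    f = softmax(X @ W_out.T).sum(0)
    for l in range(L):
        S = A_self @ X                          # sum over w in N_0(v)
        X = np.array([np.maximum(W_deg[l][deg[v]] @ S[v], 0.0)  # ReLU
                      for v in range(n)])
        f += softmax(X @ W_out.T).sum(0)
    return f

rng = np.random.default_rng(2)
n, d, F = 5, 4, 8
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.maximum(A, A.T); np.fill_diagonal(A, 0)
U = rng.standard_normal((n, d))
W_deg = [[rng.standard_normal((d, d)) for _ in range(6)] for _ in range(2)]  # degrees 0..5
W_out = rng.standard_normal((F, d))
f = neural_fingerprint(A, U, W_deg, W_out)    # fixed-length, task-trainable vector
```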
Convolution on graphs?
• Sequences, images → $d$-dimensional grids in Euclidean space
• Regularity, notion of direction, distance, translation
• Natural convolution using sliding windows
• Natural subsampling (pooling)
• Hierarchical stacking of layers ⇒ growth of context
[Figure: 1-D convolution and pooling over a sequence indexed $i-3, \ldots, i+3$]
• General graphs:
• Irregular
• All edges from a node are equivalent
• No notion of left, right, up, down
Signal processing on graphs[7]
• Graph $G = \{V, E\}$, $n = |V|$
• Signal on a graph: function $f: V \to \mathbb{R}$
• Or a vector $f \in \mathbb{R}^n$
• Laplace operator $\Delta$
• In Euclidean $\mathbb{R}^d$: $\Delta = \nabla^2 = \frac{\partial^2}{\partial x_1^2} + \cdots + \frac{\partial^2}{\partial x_d^2}$
• Finite-difference approximation, e.g. $\Delta f(x) \approx \frac{f(x-h) - 2f(x) + f(x+h)}{h^2}$
• For graphs: $\Delta f(v) = \sum_{w \in \mathcal{N}(v)} \big(f(v) - f(w)\big)$
• Laplacian matrix $L = D - A$, with $D = \operatorname{diag}(\deg V)$ and $A$ the adjacency matrix:
$L_{i,j} = \begin{cases} \deg(v_i) & i = j \\ -1 & (i,j) \in E \\ 0 & \text{otherwise} \end{cases}$
• $\Delta f = L f$
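In matrix form this is two lines of numpy; a small illustrative check that $Lf$ matches the pointwise sum above:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # small undirected graph
D = np.diag(A.sum(axis=1))                  # degree matrix
L = D - A                                   # combinatorial Laplacian

f = np.array([1.0, -2.0, 0.5, 3.0])        # a signal on the 4 nodes
v = 2
lhs = (L @ f)[v]
rhs = sum(f[v] - f[w] for w in np.nonzero(A[v])[0])
assert np.isclose(lhs, rhs)                 # (Lf)(v) = sum_{w in N(v)} (f(v) - f(w))
```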
Graph spectrum and Fourier transform[7]
• $L$ is positive semi-definite ⇒ $L = U^T \Lambda\, U$
• Real eigenvalues $\lambda_0 \le \cdots \le \lambda_{n-1}$ and eigenvectors $u_0, \ldots, u_{n-1}$
• Classical Fourier transform: $\hat{f}(\xi) = \int f(t)\, e^{-2\pi i \xi t}\, dt = \langle f, e^{2\pi i \xi t} \rangle$
• $e^{2\pi i \xi t}$ are eigenfunctions of $\Delta$ (Fourier basis)
• FT on graphs: $\hat{f}(\lambda_j) = \sum_{i=1}^{n} f(i)\, u_j(i)$ ⇒ $\hat{x} = U x$
• Inverse FT: $f(i) = \sum_{j=0}^{n-1} \hat{f}(\lambda_j)\, u_j(i)$ ⇒ $x = U^T \hat{x}$
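A sketch of the transform pair with numpy; note that np.linalg.eigh returns eigenvectors as columns, so under the slide's convention ($\hat{x} = Ux$) the rows of $U$ below are the eigenvectors.

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(1)) - A

lam, V = np.linalg.eigh(L)   # eigenvalues ascending, eigenvectors as columns
U = V.T                      # rows = eigenvectors (graph Fourier basis)

x = np.random.default_rng(3).standard_normal(4)
x_hat = U @ x                                  # graph Fourier transform
assert np.allclose(U.T @ x_hat, x)             # inverse transform recovers x
assert np.allclose(U.T @ np.diag(lam) @ U, L)  # L = U^T Lambda U
```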

Graph spectral convolution
• Classical convolution: $(f * g)(t) = \int_{\mathbb{R}} f(\tau)\, g(t - \tau)\, d\tau$
• Works on grids, but no translation in graphs
• Convolution theorem: $\mathcal{F}(f * g) = \mathcal{F}(f) \circ \mathcal{F}(g)$ ⇒ $f * g = \mathcal{F}^{-1}\big(\mathcal{F}(f) \circ \mathcal{F}(g)\big)$
• Graph convolution: $x * g = U^T\big((U x) \circ (U g)\big) = U^T \operatorname{diag}(U g)\, U x$
• First idea[8,9]: learn the vector $w = U g$
• Convolutional layer: $\operatorname{conv}(x) = \sigma\big(U^T \operatorname{diag}(w)\, U x\big)$
• Issues:
• Filters are not localized
• Number of parameters depends on input size: $w \in \mathbb{R}^n$
• Multiplications with $U$ and $U^T$ are costly: $\mathcal{O}(n^2)$
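A deliberately naive implementation of this first spectral layer (a sketch, not the code of [8,9]), which makes the issues tangible: $w$ has one free entry per node, and applying $U$ and $U^T$ are dense multiplies.

```python
import numpy as np

def spectral_conv(x, L, w):
    """conv(x) = sigma(U^T diag(w) U x) with one free parameter per eigenvalue."""
    lam, V = np.linalg.eigh(L)               # full eigendecomposition: O(n^3)
    U = V.T
    return np.tanh(U.T @ (w * (U @ x)))      # two dense O(n^2) products

n = 50
A = (np.random.rand(n, n) < 0.1).astype(float)
A = np.maximum(A, A.T); np.fill_diagonal(A, 0)
L = np.diag(A.sum(1)) - A
y = spectral_conv(np.random.randn(n), L, w=np.random.randn(n))  # w grows with n
```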
Fast localized spectral filters[10]

• $x * g = U^T \operatorname{diag}(w)\, U x$
• $U^T \operatorname{diag}(w)\, U$ is interpreted as a Laplacian with modified frequencies
• $x * g_\theta = g_\theta(L)\, x = U^T g_\theta(\Lambda)\, U x$
• $g_\theta(\Lambda) = \operatorname{diag}\big(g_\theta(\lambda_0), \ldots, g_\theta(\lambda_{n-1})\big)$
• Theorem: when $g_\theta(\lambda)$ is a $K$-degree polynomial, convolution at a vertex is localized to its $K$-neighborhood
• Naive parametrization: $g_\theta(\Lambda) = \sum_{k=0}^{K-1} \theta_k \Lambda^k$ ... $\mathcal{O}(n^2)$
• Using Chebyshev polynomials: $g_\theta(L) = \sum_{k=0}^{K-1} \theta_k T_k(L)$
• $T_0(x) = 1$, $T_1(x) = x$, $T_k(x) = 2x\, T_{k-1}(x) - T_{k-2}(x)$ ... $\mathcal{O}(K|E|)$
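The recurrence means filtering needs only $K$ sparse matrix-vector products and no eigendecomposition. A sketch, assuming the Laplacian has been rescaled so its spectrum lies in the Chebyshev domain $[-1, 1]$ (in [10]: $\tilde{L} = 2L/\lambda_{\max} - I$):

```python
import numpy as np

def cheb_filter(L_s, x, theta):
    """g_theta(L) x = sum_k theta_k T_k(L_s) x via T_k = 2 L_s T_{k-1} - T_{k-2}."""
    Tx_prev, Tx = x, L_s @ x            # T_0(L)x = x,  T_1(L)x = Lx
    y = theta[0] * Tx_prev
    if len(theta) > 1:
        y = y + theta[1] * Tx
    for k in range(2, len(theta)):
        Tx_prev, Tx = Tx, 2 * (L_s @ Tx) - Tx_prev
        y = y + theta[k] * Tx
    return y

n = 100
A = (np.random.rand(n, n) < 0.05).astype(float)
A = np.maximum(A, A.T); np.fill_diagonal(A, 0)
L = np.diag(A.sum(1)) - A
L_s = 2 * L / np.linalg.eigvalsh(L).max() - np.eye(n)   # spectrum into [-1, 1]
y = cheb_filter(L_s, np.random.randn(n), theta=np.random.randn(4))   # K = 4
```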

The circle closes
• Setting $K = 1$ [11]: $g_\theta(L) = \theta_0 I + \theta_1 L = \theta_0 I + \theta_1 D - \theta_1 A$
• $(x * g_\theta)(v) = \big((\theta_0 I + \theta_1 D - \theta_1 A)\, x\big)(v)$

$x_l(v) = \sigma\Big(W_{0,l}\, x_{l-1}(v) + W_{1,l} \sum_{w \in \mathcal{N}(v)} \big(x_{l-1}(w) - x_{l-1}(v)\big)\Big)$

• Using the normalized $L_N = D^{-1/2} L\, D^{-1/2} = I - D^{-1/2} A\, D^{-1/2}$ ⟹ $\lambda \in [0, 2]$:

$x_l(v) = \sigma\Big(W_{0,l}\, x_{l-1}(v) + W_{1,l} \sum_{w \in \mathcal{N}(v)} \frac{1}{\sqrt{\deg(v) \deg(w)}}\, x_{l-1}(w)\Big)$

• The normalized sum $\approx \operatorname{mean}\{x_{l-1}(w) : w \in \mathcal{N}(v)\}$; alternatively a max or LSTM aggregator[12]
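A minimal sketch of the resulting $K = 1$ layer with symmetric normalization; this is essentially the propagation rule of [11], though written with separate W0, W1 (assumed names) rather than the paper's single-matrix renormalization trick.

```python
import numpy as np

def gcn_layer(A, X, W0, W1):
    """x_l(v) = tanh(W0 x(v) + W1 sum_{w in N(v)} x(w) / sqrt(deg v * deg w))."""
    d = A.sum(1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1.0)))  # guard isolated nodes
    A_norm = D_inv_sqrt @ A @ D_inv_sqrt
    return np.tanh(X @ W0.T + A_norm @ X @ W1.T)

rng = np.random.default_rng(4)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], float)
X = rng.standard_normal((3, 5))
H = gcn_layer(A, X, rng.standard_normal((8, 5)), rng.standard_normal((8, 5)))
```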
Putting it together
1. Time-unrolled dynamical system (diffusion)
2. Spatial convolution (hierarchical growth of context)
3. Spectral convolution
A general convolutional layer:
$x_l(v) = \sigma\Big(W_{0,l}\, x_{l-1}(v) + W_{1,l} \cdot \operatorname{Aggregate}\big\{x_{l-1}(w) : w \in \mathcal{N}(v)\big\}\Big)$

• Aggregate by sum, mean, max, LSTM, ...


• Mean and max should prevent overfitting on local neighborhood structures
• LSTM on an unordered set?!
• Additional skip-connections and other ad hoc modifications are possible (see the sketch below)
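The unified layer is a few lines when the aggregator is a parameter; a sketch with assumed names, showing sum, mean, and max as interchangeable choices (an LSTM aggregator would consume the neighbor list sequentially, which is exactly what makes its use on an unordered set questionable).

```python
import numpy as np

def graph_layer(A, X, W0, W1, aggregate="mean"):
    """x_l(v) = tanh(W0 x(v) + W1 Aggregate{x(w) : w in N(v)})."""
    agg = np.zeros_like(X)
    for v in range(A.shape[0]):
        nbrs = np.nonzero(A[v])[0]
        if len(nbrs) == 0:
            continue                      # isolated node: no message
        if aggregate == "sum":
            agg[v] = X[nbrs].sum(0)
        elif aggregate == "mean":
            agg[v] = X[nbrs].mean(0)
        elif aggregate == "max":
            agg[v] = X[nbrs].max(0)
    return np.tanh(X @ W0.T + agg @ W1.T)
```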
References
[1] Gori, M.; Monfardini, G.; Scarselli, F. A new model for learning in graph domains. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, 2005; pp 729-734, vol. 2.
[2] Scarselli, F.; Gori, M.; Tsoi, A. C.; Hagenbuchner, M.; Monfardini, G. The Graph Neural Network Model. IEEE Transactions on Neural Networks 2009, 20 (1), 61-80.
[3] Gallicchio, C.; Micheli, A. Graph Echo State Networks. In The 2010 International Joint Conference on Neural Networks (IJCNN), 18-23 July 2010; pp 1-8.
[4] Li, Y.; Tarlow, D.; Brockschmidt, M.; Zemel, R. Gated Graph Sequence Neural Networks. arXiv e-prints, 2015.
[5] Micheli, A. Neural Network for Graphs: A Contextual Constructive Approach. IEEE Transactions on Neural Networks 2009, 20 (3), 498-511.
[6] Duvenaud, D.; Maclaurin, D.; Aguilera-Iparraguirre, J.; Gómez-Bombarelli, R.; Hirzel, T.; Aspuru-Guzik, A.; Adams, R. P. Convolutional Networks on Graphs for Learning Molecular Fingerprints. arXiv e-prints, 2015.
[7] Shuman, D.; Narang, S.; Frossard, P.; Ortega, A.; Vandergheynst, P. The Emerging Field of Signal Processing on Graphs: Extending High-Dimensional Data Analysis to Networks and Other Irregular Domains. IEEE Signal Processing Magazine 2013, 30, 83-98.
[8] Bruna, J.; Zaremba, W.; Szlam, A.; LeCun, Y. Spectral Networks and Locally Connected Networks on Graphs. arXiv e-prints, 2013.
[9] Henaff, M.; Bruna, J.; LeCun, Y. Deep Convolutional Networks on Graph-Structured Data. arXiv e-prints, 2015.
[10] Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. arXiv e-prints, 2016.
[11] Kipf, T. N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv e-prints, 2016.
[12] Hamilton, W. L.; Ying, R.; Leskovec, J. Inductive Representation Learning on Large Graphs. arXiv e-prints, 2017.
Thank you for your attention
Questions?
