Abstract—In imitation learning, multivariate Gaussians are widely used to encode robot behaviors. Such approaches do not provide the ability to properly represent end-effector orientation, as the distance metric in the space of orientations is not Euclidean. In this work we present an extension of common imitation learning techniques to Riemannian manifolds. This generalization enables the encoding of joint distributions that include the robot pose. We show that Gaussian conditioning, Gaussian product and nonlinear regression can be achieved with this representation. The proposed approach is illustrated with examples on a 2-dimensional sphere, with an example of regression between two robot end-effector poses, as well as an extension of Task-Parameterized Gaussian Mixture Model (TP-GMM) and Gaussian Mixture Regression (GMR) to Riemannian manifolds.

Index Terms—Learning and Adaptive Systems, Probability and Statistical Methods

I. INTRODUCTION

… distributions over movement trajectories [3], and the encoding of movement from multiple perspectives [4].

Probabilistic encoding using Gaussians is restricted to variables that are defined in the Euclidean space. This typically excludes the use of end-effector orientation. Although one can approximate orientation locally in the Euclidean space [5], this approach becomes inaccurate when there is large variance in the orientation data.

The most compact and complete representation for orientation is the unit quaternion¹. Quaternions can be represented as elements of the 3-sphere, a 3-dimensional Riemannian manifold. Riemannian manifolds allow various notions such as length, angles, areas, curvature or divergence to be computed, which is convenient in many applications including statistics, optimization and metric learning [6]–[9].

In this work we present a generalization of common probabilistic imitation learning techniques to Riemannian manifolds. Our contributions are twofold: (i) We show how to …
… where µ ∈ M is the Riemannian center of mass, and Σ the covariance defined in the tangent space T_µM (see e.g. [10], [11], [13]).

The Riemannian center of mass for a set of datapoints {x_0, …, x_N} can be estimated through a Maximum Likelihood Estimate (MLE). The corresponding log-likelihood function is

  L(x) = c − (1/2) ∑_{i=1}^{N} h_i Log_µ(x_i)^⊤ Σ^{-1} Log_µ(x_i),   (7)

where h_i are weights that can be assigned to individual datapoints. For a single Gaussian one can set h_i = 1, but different weights are required when estimating the likelihood of a mixture of Gaussians.

Omitting the constant c, which has no role in the optimization, (7) can be written in the compact form

  f(µ) = −(1/2) ε(µ)^⊤ W ε(µ),   (8)

where

  ε(µ) = [Log_µ(x_0)^⊤, Log_µ(x_1)^⊤, …, Log_µ(x_N)^⊤]^⊤   (9)

is a stack of tangent space vectors in T_µM which give the distance and direction of x_i with respect to µ, and W is a weight matrix with a block diagonal structure (see Table II).

The gradient of (8) is

  Δ = −(J^⊤ W J)^{-1} J^⊤ W ε(µ),   (10)

with J the Jacobian of ε(µ) with respect to the tangent basis of T_µM. It is a concatenation of individual Jacobians J_i corresponding to Log_µ(x_i), which have the simple form J_i = −I_d.

Δ provides an estimate of the optimal value mapped into the tangent space T_µM. The optimal value is obtained by mapping Δ onto the manifold

  µ ← Exp_µ(Δ).   (11)

The computation of (10) and (11) is repeated until Δ reaches a predefined convergence threshold. We observed fast convergence in our experiments for MLE, conditioning, product and GMR (typically 2-5 iterations). After convergence, the covariance Σ is computed in T_µM.

Note that the presented Gauss-Newton algorithm performs optimization over a domain that is a Riemannian manifold, while standard Gauss-Newton methods consider a Euclidean domain.

Fig. 2: (a) Tangent space alignment with respect to e. (b) Even though the tangent bases are aligned with respect to e, base misalignment exists between T_gM and T_hM when one of them does not lie on a geodesic through e. In such cases, parallel transport of p from T_gM to T_hM requires a rotation A_g^{∥h}(p) to compensate for the misalignment of the tangent spaces.

TABLE I: Overview of the exponential and logarithmic maps, and the action and parallel transport functions for all Riemannian manifolds considered in this work (ℝ^d, S² and S³). s(·), c(·) and ac*(·) are short notations for the sine, cosine, and a modified version of the arccosine², respectively. The elements of S³ are quaternions, * defines their product, and ^{-1} a quaternion inverse. Q_g and Q_h represent the quaternion matrices of g and h.

  g ∈ M:        ℝ^d: [g_{x_1}, …, g_{x_d}]^⊤;   S²: [g_x, g_y, g_z]^⊤;   S³: [q_0, q^⊤]^⊤.
  Exp_e(g):     ℝ^d: e + g;
                S²: [s(||g||) g^⊤/||g||, c(||g||)]^⊤ if ||g|| ≠ 0, [0, 0, 1]^⊤ otherwise;
                S³: [c(||g||), s(||g||) g^⊤/||g||]^⊤ if ||g|| ≠ 0, [1, 0, 0, 0]^⊤ otherwise.
  Log_e(g):     ℝ^d: g − e;
                S²: ac(g_z) [g_x, g_y]^⊤/||[g_x, g_y]^⊤|| if g_z ≠ 1, 0 otherwise;
                S³: ac*(q_0) q/||q|| if q_0 ≠ 1, 0 otherwise.
  A_g^h(p):     ℝ^d: p_g − g + h;   S²: R_e^h R_g^e p_g;   S³: h * g^{-1} * p_g.
  A_g^{∥h}(p):  ℝ^d: I_d p;   S², S³: the rotation of p that compensates for the misalignment between T_gM and T_hM (Fig. 2), constructed from the rotation matrices R and the quaternion matrices Q_g, Q_h together with the tangent space bases B.

² The space of unit quaternions, S³, provides a double covering over rotations. To ensure that the distance between two antipodal rotations is zero, we define arccos*(ρ) = arccos(ρ) − π for −1 ≤ ρ < 0, and arccos*(ρ) = arccos(ρ) for 0 ≤ ρ ≤ 1.

III. PROBABILISTIC IMITATION LEARNING ON RIEMANNIAN MANIFOLDS

Many probabilistic imitation learning methods rely on some form of Gaussian Mixture Model (GMM), and inference through GMR. These include both non-autonomous (time-driven) and autonomous systems [11], [14]. The generalization capability of these methods can be improved by encoding the transferred skill from multiple perspectives [4], [15]. The elementary operations required in this framework are: MLE, Gaussian conditioning, and Gaussian product. Table II provides an overview of the parameters required to perform these operations on a Riemannian manifold using the likelihood maximization presented in Section II-B.

We present below Gaussian conditioning and Gaussian product in more detail, which are then used to extend GMR and TP-GMM to Riemannian manifolds.
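These operations all reuse the iterative likelihood maximization (10)-(11) of Section II-B. As a concrete illustration, the following minimal sketch (Python/NumPy, not part of the original paper; tangent vectors of S² are kept in ambient 3-D coordinates rather than a 2-D tangent basis, and all names are illustrative) implements the exponential and logarithmic maps of S² from Table I and the iterative MLE of the Riemannian center of mass with its tangent-space covariance:

import numpy as np

def exp_map(mu, v):
    """Exp_mu(v): map a tangent vector v (orthogonal to mu) onto S^2."""
    n = np.linalg.norm(v)
    if n < 1e-12:
        return mu
    return np.cos(n) * mu + np.sin(n) * v / n

def log_map(mu, x):
    """Log_mu(x): map a point x on S^2 into the tangent space at mu."""
    d = x - np.dot(mu, x) * mu               # component of x orthogonal to mu
    n = np.linalg.norm(d)
    if n < 1e-12:
        return np.zeros(3)
    return np.arccos(np.clip(np.dot(mu, x), -1.0, 1.0)) * d / n

def riemannian_mle(points, h=None, n_iter=10, tol=1e-9):
    """Iterative MLE following (10)-(11): Delta is the weighted average of the
    Log_mu(x_n), and mu is updated by Exp_mu(Delta) until convergence."""
    h = np.ones(len(points)) if h is None else np.asarray(h, dtype=float)
    mu = points[0] / np.linalg.norm(points[0])            # initial guess
    for _ in range(n_iter):
        delta = sum(hn * log_map(mu, xn) for hn, xn in zip(h, points)) / h.sum()
        mu = exp_map(mu, delta)
        if np.linalg.norm(delta) < tol:
            break
    # covariance computed in the tangent space of the converged mean
    U = np.stack([log_map(mu, xn) for xn in points])
    sigma = (h[:, None, None] * np.einsum('ni,nj->nij', U, U)).sum(0) / h.sum()
    return mu, sigma

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(20, 3)) + np.array([0.0, 0.0, 5.0])
    pts /= np.linalg.norm(pts, axis=1, keepdims=True)      # datapoints on S^2
    mu, sigma = riemannian_mle(pts)
    print("center of mass:", mu, "\ncovariance:\n", sigma)

Working in ambient coordinates avoids the bookkeeping of tangent bases; the resulting covariance is rank-deficient, with the base point spanning its null space.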
TABLE II: Overview of the parameters used in the likelihood maximization procedure presented in Sections II-B, III-A and III-B.

  MLE:
    ε(µ) = [Log_µ(x_0)^⊤, …, Log_µ(x_N)^⊤]^⊤;   W = diag(h_0 Σ^{-1}, …, h_N Σ^{-1});   J = [−I, …, −I]^⊤;
    Δ = (∑_n h_n)^{-1} ∑_n h_n Log_µ(x_n);   Σ = (∑_n h_n)^{-1} ∑_n h_n Log_µ(x_n) Log_µ(x_n)^⊤.

  Product:
    ε(µ) = [Log_µ(µ_1)^⊤, …, Log_µ(µ_P)^⊤]^⊤;   W = diag(Σ_{∥1}^{-1}, …, Σ_{∥P}^{-1});   J = [−I, …, −I]^⊤;
    Δ = (∑_{p=1}^{P} Σ_{∥p}^{-1})^{-1} ∑_{p=1}^{P} Σ_{∥p}^{-1} Log_µ(µ_p);   Σ = (∑_{p=1}^{P} Σ_{∥p}^{-1})^{-1}.

  Conditioning:
    ε(x) = [Log_{x_I}(µ_I)^⊤, Log_{x_O}(µ_O)^⊤]^⊤;   W = Λ_∥;   J = [0, −I]^⊤;
    Δ = Log_{x_O}(µ_O) + Λ_{∥OO}^{-1} Λ_{∥OI} Log_{x_I}(µ_I);   Σ = Λ_{∥OO}^{-1}.
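To make the connection with the update rule (10) explicit, the MLE row can be expanded as follows (a worked example added for clarity, assuming the weights enter through W as in (7)): with J = [−I, …, −I]^⊤ and W = diag(h_0 Σ^{-1}, …, h_N Σ^{-1}),

  J^⊤ W J = ∑_n h_n Σ^{-1}   and   J^⊤ W ε(µ) = −∑_n h_n Σ^{-1} Log_µ(x_n),

so that (10) reduces to

  Δ = (∑_n h_n)^{-1} ∑_n h_n Log_µ(x_n),

i.e. the weighted average of the datapoints expressed in the tangent space T_µM, which is then mapped back onto the manifold by (11). The product and conditioning rows follow analogously, with the parallel-transported matrices Σ_{∥p} and Λ_∥ taking the role of the weights.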
A. Gaussian Product

In Euclidean space, the product of P Gaussians is again a Gaussian. This is not generally the case for Riemannian Gaussians. This can be understood by studying its log-likelihood function

  L(x) = c − (1/2) ∑_{p=1}^{P} Log_{µ_p}(x)^⊤ Σ_p^{-1} Log_{µ_p}(x),   (12)

where P represents the number of Gaussians to multiply, and µ_p and Σ_p their parameters. The appearance of Log_{µ_p}(x) makes the log-likelihood potentially nonlinear. As a result, the product of Gaussians on Riemannian manifolds is not guaranteed to be Gaussian.

Figure 3 shows comparisons on S² between the true log-likelihood of a product of two Gaussians, and the log-likelihood obtained by approximating it using a single Gaussian. The neighborhood in which the approximation of the product by a single Gaussian is reasonable will vary depending on the values of µ_p and Σ_p. In our experiments we demonstrate that the approximation is suitable for movement regeneration in the task-parameterized framework.

Fig. 3: Log-likelihood for the product of two Gaussians on the manifold S². The Gaussians are visualized on their tangent spaces by the black ellipsoids; their center and contour represent the mean and covariance. The color of the sphere corresponds to the value of the log-likelihood (high=red, low=blue). The true log-likelihood (equation (12) with P = 2) is displayed on the left of each subfigure, while the log-likelihood approximated by the product is displayed on the right. The configuration of the Gaussians in (a) results in a log-likelihood with a single mode, while (b) shows the special case of multiple modes. Note the existence of a saddle point to which the gradient descent could converge.

Parameter estimation for the approximated Gaussian N(µ̃, Σ̃) can be achieved through likelihood maximization. Ignoring the constant c, we can rewrite (12) into the form (8) by defining

  ε(µ̃) = [Log_{µ_1}(µ̃)^⊤, Log_{µ_2}(µ̃)^⊤, …, Log_{µ_P}(µ̃)^⊤]^⊤,   (13)

and W = diag(Σ_1^{-1}, Σ_2^{-1}, …, Σ_P^{-1}), where µ̃ is the mean of the Gaussian we are approximating. The vectors Log_{µ_p}(µ̃) of ε(µ̃) in (13) are not defined in T_µ̃M, but in P different tangent spaces. In order to perform the likelihood maximization we need to switch the base and argument of Log(·) while ensuring that the original likelihood function (12) remains unchanged. This implies that the Mahalanobis distance should remain unchanged, i.e.

  Log_{µ_p}(µ̃)^⊤ Σ_p^{-1} Log_{µ_p}(µ̃) = Log_µ̃(µ_p)^⊤ Σ_{∥p}^{-1} Log_µ̃(µ_p),   (14)

where Σ_{∥p} is a modified weight matrix that ensures an equal distance measure. It is computed through parallel transportation of Σ_p from µ_p to µ̃ with

  Σ_{∥p} = A_{µ_p}^{∥µ̃}(L_p) A_{µ_p}^{∥µ̃}(L_p)^⊤,   (15)

where L_p is obtained through a symmetric decomposition of the covariance matrix, i.e. Σ_p = L_p^⊤ L_p. This operation transports the eigencomponents of the covariance matrix [16]. It has to be performed at each iteration of the gradient descent because it depends on the changing µ̃. For spherical manifolds, parallel transport is the linear operation (4), and (15) simplifies to Σ_{∥p} = R^⊤ Σ_p R with R = A_{µ_p}^{∥µ̃}(I_d).

Using the transported covariances, we can compute the gradient required to estimate µ̃ and Σ̃. The result is presented in Table II.

The Gaussian product arises in different fields of probabilistic robotics. Generalizations of the Extended Kalman Filter [17] and the Unscented Kalman Filter [18] to Riemannian manifolds required a similar procedure.

B. Gaussian Conditioning

In Gaussian conditioning, we compute the probability P(x_O | x_I) ∼ N(µ_{O|I}, Σ_{O|I}) of a Gaussian that encodes the joint probability density of x_O and x_I. The log-likelihood of the conditioned Gaussian is given by

  L(x) = c − (1/2) Log_µ(x)^⊤ Λ Log_µ(x),   (16)
with

  x = [x_I^⊤, x_O^⊤]^⊤,   µ = [µ_I^⊤, µ_O^⊤]^⊤,   Λ = [Λ_II, Λ_IO; Λ_OI, Λ_OO],   (17)

where the subscripts I and O indicate the input and the output, respectively, and the precision matrix Λ = Σ^{-1} is introduced to simplify the derivation.

Fig. 4: Application of task-parameters A ∈ ℝ^{d×d} and b ∈ S² to a Gaussian defined on S². (a) Transformation A. (b) Translation b.

Equation (16) is in the form of (8). Here we want to estimate x_O given x_I (i.e. µ_{O|I}). Similarly to the Gaussian product, we cannot directly optimize (16) because the dependent variable, x_O, is in the argument of the logarithmic map. This is again resolved by parallel transport, namely

  Λ_∥ = A_µ^{∥x}(V) A_µ^{∥x}(V)^⊤,   (18)

where V is obtained through a symmetric decomposition of the precision matrix: Λ = V^⊤ V. Using the transformed precision matrix, the values for ε(x) and J (both found in Table II), we can apply (11) to obtain the update rule. The covariance obtained through Gaussian conditioning is given by Λ_{∥OO}^{-1}. Note that we maximize the likelihood only with respect to x_O, and therefore 0 appears in the Jacobian.
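On S², both (15) and (18) amount to rotating tangent-space data along the geodesic connecting the two base points. The following minimal sketch (Python/NumPy, not part of the original paper; tangent vectors are kept in ambient 3-D coordinates and pseudo-inverses stand in for the tangent-space inverses) transports a covariance from the tangent space of µ_p to that of µ̃ and checks the invariance of the Mahalanobis distance required by (14):

import numpy as np

def log_map(mu, x):
    """Log_mu(x) on S^2, expressed as an ambient 3-D vector tangent at mu."""
    d = x - np.dot(mu, x) * mu
    n = np.linalg.norm(d)
    if n < 1e-12:
        return np.zeros(3)
    return np.arccos(np.clip(np.dot(mu, x), -1.0, 1.0)) * d / n

def transport_rotation(g, h):
    """Rotation R with R g = h (Rodrigues' formula about the axis g x h).
    Applying R to tangent data at g parallel-transports it to h along the geodesic."""
    axis = np.cross(g, h)
    s, c = np.linalg.norm(axis), np.dot(g, h)
    if s < 1e-12:
        return np.eye(3)
    k = axis / s
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + s * K + (1.0 - c) * (K @ K)

def transport_covariance(sigma, g, h):
    """Parallel transport of a covariance from the tangent space at g to that at h."""
    R = transport_rotation(g, h)
    return R @ sigma @ R.T

if __name__ == "__main__":
    mu_p = np.array([0.0, 0.0, 1.0])                    # mean of one Gaussian
    mu_t = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)     # candidate mean (mu tilde)
    b1, b2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
    sigma_p = 0.10 * np.outer(b1, b1) + 0.02 * np.outer(b2, b2)   # covariance at mu_p
    sigma_par = transport_covariance(sigma_p, mu_p, mu_t)
    v1, v2 = log_map(mu_p, mu_t), log_map(mu_t, mu_p)
    d1 = v1 @ np.linalg.pinv(sigma_p) @ v1              # left-hand side of (14)
    d2 = v2 @ np.linalg.pinv(sigma_par) @ v2            # right-hand side of (14)
    print("Mahalanobis distance at mu_p:", d1)
    print("Mahalanobis distance at mu~ :", d2)          # matches d1

In an implementation of the product or the conditioning, this transport is recomputed at every iteration of (10)-(11), since it depends on the current estimate of the mean.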
C. Gaussian Mixture Regression

Similarly to a Gaussian Mixture Model (GMM) in Euclidean space, a GMM on a Riemannian manifold is defined by a weighted sum of Gaussians

  P(x) = ∑_{i=1}^{K} π_i N(x; µ_i, Σ_i),

where π_i are the priors, with ∑_{i}^{K} π_i = 1. In imitation learning, they are used to represent nonlinear behaviors in a probabilistic manner. Parameters of the GMM can be estimated by Expectation Maximization (EM), an iterative process in which the data are given weights for each cluster (Expectation step), and in which the clusters are subsequently updated using a weighted MLE (Maximization step). We refer to [10] for a detailed description of EM for Riemannian GMMs.

… (19) would require a weighted sum of manifold elements, an operation that is not available on the manifold. Secondly, because directly applying (20) would incorrectly assume the alignment of the tangent spaces in which Σ_i are defined. To remedy the computation of the mean, we employ the approach for MLE given in Section II-B, and compute the weighted mean iteratively in the tangent space T_{µ̂_O}M, namely

  Δ = ∑_{i=1}^{K} h_i Log_{µ̂_O}(E[N(x_O | x_I; µ_i, Σ_i)]),   (22)
  µ̂_O ← Exp_{µ̂_O}(Δ).   (23)

Here, E[N(x_O | x_I; µ_i, Σ_i)] refers to the conditional mean point obtained using the procedure described in Section III-B. Note that the computation of the responsibilities h_i is straightforward using the Riemannian Gaussian (6).

After convergence of the mean, the covariance is computed in the tangent space defined at µ̂_O. First, Σ_i are parallel transported from T_{µ_i}M to T_{µ̂_O}M, and then summed similarly to (20) with

  Σ̂_O = ∑_{i=1}^{K} h_i Σ_{∥i},   where   (24)
  Σ_{∥i} = A_{µ_i}^{∥µ̂_O}(L_i) A_{µ_i}^{∥µ̂_O}(L_i)^⊤.   (25)
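In other words, GMR on a Riemannian manifold conditions each mixture component as in Section III-B and then fuses the K conditional estimates with (22)-(25). A minimal sketch of this fusion step (Python/NumPy, not part of the original paper; the conditional means, covariances and responsibilities h_i are assumed to be given, and S² tangent data are kept in ambient 3-D coordinates):

import numpy as np

def exp_map(mu, v):
    n = np.linalg.norm(v)
    return mu if n < 1e-12 else np.cos(n) * mu + np.sin(n) * v / n

def log_map(mu, x):
    d = x - np.dot(mu, x) * mu
    n = np.linalg.norm(d)
    return np.zeros(3) if n < 1e-12 else np.arccos(np.clip(np.dot(mu, x), -1.0, 1.0)) * d / n

def transport_covariance(sigma, g, h):
    # rotation taking g to h (Rodrigues' formula), used to transport tangent data
    axis, c = np.cross(g, h), np.dot(g, h)
    s = np.linalg.norm(axis)
    if s < 1e-12:
        return sigma
    k = axis / s
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    R = np.eye(3) + s * K + (1.0 - c) * (K @ K)
    return R @ sigma @ R.T

def gmr_fuse(cond_means, cond_covs, h, n_iter=10):
    """Fuse per-component conditional estimates on S^2 following (22)-(25):
    iterative weighted mean of the conditional means, then weighted sum of the
    covariances parallel-transported to the tangent space of the result."""
    h = np.asarray(h, dtype=float)
    h = h / h.sum()                                     # responsibilities sum to one
    mu_o = cond_means[int(np.argmax(h))]                # start at the dominant component
    for _ in range(n_iter):
        delta = sum(hi * log_map(mu_o, m) for hi, m in zip(h, cond_means))   # (22)
        mu_o = exp_map(mu_o, delta)                                          # (23)
    sigma_o = sum(hi * transport_covariance(S, m, mu_o)                      # (24)-(25)
                  for hi, S, m in zip(h, cond_covs, cond_means))
    return mu_o, sigma_o

if __name__ == "__main__":
    means = [np.array([0.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0]) / np.sqrt(2.0)]
    covs = [0.05 * (np.eye(3) - np.outer(m, m)) for m in means]   # isotropic in each tangent plane
    mu_o, sigma_o = gmr_fuse(means, covs, h=[0.7, 0.3])
    print("regression mean:", mu_o, "\nregression covariance:\n", sigma_o)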
Fig. 6: Snapshot sequences of a typical demonstration (top) and reproduction (bottom) of the pouring task. The colored perpendicular lines
indicate the end-effector orientations.
… with and without parallel transport, i.e. using Λ and Λ_∥, respectively. Without parallel transport the regression output (solid blue line) does not lie on a geodesic, since it does not coincide with the (unique) geodesic between the outer points (dotted blue line). Using the proposed method that relies on the parallel transported precision matrix Λ_∥, the conditioned means µ_{∥x|t} (displayed in yellow) follow a geodesic path on the manifold, thus generalizing the 'linear' behavior of Gaussian conditioning to Riemannian manifolds.

Furthermore, the generalization of GMR proposed in [11] relies on the weighted combination of quaternions presented in [22]. Although this method seems to be widely accepted, it skews the weighting of the quaternions. By applying this method, one computes the chordal mean instead of the required weighted mean [23].

In general, the objective in probabilistic imitation learning is to model behavior τ as the distribution P(τ). In this work we take a direct approach and model a (mixture of) Gaussian(s) over the state variables. Inverse Optimal Control (IOC) uses a more indirect approach: it uses the demonstration data to uncover an underlying cost function c_θ(τ) parameterized by θ. In [24], [25], pose information is learned by demonstration by defining cost features for specific orientation and position relationships between objects of interest and the robot end-effector. An interesting parallel exists between our 'direct' approach and recent advances in IOC where behavior is modeled as the maximum entropy distribution p(τ) = (1/Z_θ) exp(−c_θ(τ)) [26]–[28]. The Gaussian used in our work can be seen as an approximation of this distribution, where the cost function is the Mahalanobis distance defined on a Riemannian manifold. The benefits of such an explicit cost structure are the ease of finding the optimal reward (we learn its parameters through EM), and the derivation of the policy (which we achieve through conditioning).

VI. CONCLUSION

In this work we showed how a set of probabilistic imitation learning techniques can be extended to Riemannian manifolds. We described Gaussian conditioning, Gaussian product and parallel transport, which are the elementary tools required to extend TP-GMM and GMR to Riemannian manifolds. The selection of the Riemannian manifold approach is motivated by the potential of extending the developed approach to other Riemannian manifolds, and by the potential of drawing bridges between other robotics challenges that involve a rich variety of manifolds (including perception, planning and control problems).

REFERENCES

[1] K. P. Murphy, Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.
[2] A. G. Billard, S. Calinon, and R. Dillmann, "Learning from humans," in Handbook of Robotics, B. Siciliano and O. Khatib, Eds. Secaucus, NJ, USA: Springer, 2016, ch. 74, pp. 1995-2014, 2nd Edition.
[3] A. Paraschos, C. Daniel, J. Peters, and G. Neumann, "Probabilistic movement primitives," in Advances in Neural Information Processing Systems (NIPS), 2013, pp. 2616-2624.
[4] S. Calinon, "A tutorial on task-parameterized movement learning and retrieval," Intelligent Service Robotics, vol. 9, no. 1, pp. 1-29, January 2016.
[5] J. Silvério, L. Rozo, S. Calinon, and D. G. Caldwell, "Learning bimanual end-effector poses from demonstrations using task-parameterized dynamical systems," in Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), 2015, pp. 464-470.
[6] R. Hosseini and S. Sra, "Matrix manifold optimization for Gaussian mixtures," in Advances in Neural Information Processing Systems (NIPS), 2015, pp. 910-918.
[7] X. Pennec, "Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements," Journal of Mathematical Imaging and Vision, vol. 25, no. 1, pp. 127-154, 2006.
[8] N. Ratliff, M. Toussaint, and S. Schaal, "Understanding the geometry of workspace obstacles in motion optimization," in Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), 2015, pp. 4202-4209.
[9] S. Hauberg, O. Freifeld, and M. J. Black, "A geometric take on metric learning," in Advances in Neural Information Processing Systems (NIPS), 2012, pp. 2024-2032.
[10] E. Simo-Serra, C. Torras, and F. Moreno-Noguer, "3D human pose tracking priors using geodesic mixture models," International Journal of Computer Vision (IJCV), pp. 1-21, August 2016.
[11] S. Kim, R. Haschke, and H. Ritter, "Gaussian mixture model for 3-DoF orientations," Robotics and Autonomous Systems, vol. 87, pp. 28-37, October 2017.
[12] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds. Princeton, NJ: Princeton University Press, 2008.
[13] G. Dubbelman, "Intrinsic statistical techniques for robust pose estimation," Ph.D. dissertation, University of Amsterdam, September 2011.
[14] S. M. Khansari-Zadeh and A. Billard, "Learning stable non-linear dynamical systems with Gaussian mixture models," IEEE Trans. on Robotics, vol. 27, no. 5, pp. 943-957, 2011.
[15] S. Calinon, D. Bruno, and D. G. Caldwell, "A task-parameterized probabilistic model with minimal intervention control," in Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), 2014, pp. 3339-3344.
[16] O. Freifeld, S. Hauberg, and M. J. Black, "Model transport: Towards scalable transfer learning on manifolds," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1378-1385.
[17] B. B. Ready, "Filtering techniques for pose estimation with applications to unmanned air vehicles," Ph.D. dissertation, Brigham Young University-Provo, November 2012.
[18] S. Hauberg, F. Lauze, and K. S. Pedersen, "Unscented Kalman filtering on Riemannian manifolds," Journal of Mathematical Imaging and Vision, vol. 46, no. 1, pp. 103-120, 2013.
[19] K. V. Mardia and P. E. Jupp, Directional Statistics. John Wiley & Sons, 2009, vol. 494.
[20] W. Feiten, M. Lang, and S. Hirche, "Rigid motion estimation using mixtures of projected Gaussians," in 16th International Conference on Information Fusion (FUSION), 2013, pp. 1465-1472.
[21] M. Lang, M. Kleinsteuber, O. Dunkley, and S. Hirche, "Gaussian process dynamical models over dual quaternions," in European Control Conference (ECC), 2015, pp. 2847-2852.
[22] F. L. Markley, Y. Cheng, J. L. Crassidis, and Y. Oshman, "Averaging quaternions," Journal of Guidance, Control, and Dynamics (AIAA), vol. 30, pp. 1193-1197, July 2007.
[23] R. I. Hartley, J. Trumpf, Y. Dai, and H. Li, "Rotation averaging," International Journal of Computer Vision, vol. 103, no. 3, pp. 267-305, July 2013.
[24] P. Englert and M. Toussaint, "Inverse KKT: learning cost functions of manipulation tasks from demonstrations," in Proc. Intl Symp. on Robotics Research (ISRR), 2015.
[25] A. Doerr, N. Ratliff, J. Bohg, M. Toussaint, and S. Schaal, "Direct loss minimization inverse optimal control," in Proceedings of Robotics: Science and Systems (RSS), 2015, pp. 1-9.
[26] B. D. Ziebart, A. L. Maas, J. A. Bagnell, and A. K. Dey, "Maximum entropy inverse reinforcement learning," in AAAI Conference on Artificial Intelligence, 2008, pp. 1433-1438.
[27] S. Levine and V. Koltun, "Continuous inverse optimal control with locally optimal examples," in Proceedings of the 29th International Conference on Machine Learning (ICML), July 2012, pp. 41-48.
[28] C. Finn, S. Levine, and P. Abbeel, "Guided cost learning: Deep inverse optimal control via policy optimization," in Proceedings of the 33rd International Conference on Machine Learning (ICML), June 2016, pp. 49-58.