Learning for the Control of Dynamical Motion Systems
Pierre-François Marteau, Sylvie Gibet

VALORIA, University of Bretagne Sud, France
{pierre-francois.marteau, sylvie.gibet}@univ-ubs.fr
Abstract

This paper addresses the dynamic control of multi-joint systems based on the learning of sensory-motor transformations. To avoid making the controllers depend on analytical knowledge of the multi-joint system, a non-parametric learning approach is developed which identifies the nonlinear mappings between sensory signals and motor commands involved in motor control systems. The learning phase is handled through a General Regression Neural Network (GRNN) that implements a non-parametric Nadaraya-Watson regression scheme and a set of local PIDs. The resulting dynamic sensory-motor controller (DSMC) is intensively tested on hand-arm reaching and tracking movements in a dynamical simulation environment. The DSMC proves to be very effective and robust. Moreover, it reproduces kinematic behaviors close to captured hand-arm movements.

1. Introduction

For humans, moving skillfully and dynamically requires a long learning phase, acquired during infancy. After exploring the physical capabilities of the hand-arm system, both dynamically and kinematically, for pointing, grasping or tracking tasks in interaction with the environment, the performance exploits previous experiences and becomes more accurate, smooth and rapid for new tasks. Our approach is inspired by such a sensory-motor learning process. We propose a biologically plausible model, which uses examples of previous performances and is able to continuously select control variables, based on information coming from the sensory receptors, given a specific task.

In our approach, the muscular-skeleton motion system, controlled through sensory feedback (visual or proprioceptive), involves nonlinear transformations that mix motor commands with sensory information. These transformations play a central role in the control of multi-articulated chain systems, whether they are biological or artificial. To cope with the need for plasticity (adaptation to changes), generic control performance (similar control principles for various kinds of mappings, various kinds of articulated chains or neuro-anatomical variability among individuals) and anticipation (predictive capability and optimization of movements), it is more or less accepted in the neuroscience community that these mappings are learned by biological organisms rather than pre-programmed. For the design of artificial control systems, the biological plausibility of the control mechanisms involved is not really considered an issue. Nevertheless, the adaptive, predictive and generic capabilities of controlling components are key characteristics that have been carefully addressed for a long time, in particular within the optimal control field.

In the scope of complex artificial system design, the analytical equations that drive the dynamics and the kinematics of the motion system can be difficult to extract, and the corresponding solution to the set of differential equations tedious to estimate. In particular, computational implementations of sensory-motor controllers require complete or partial knowledge of transformations that depend directly on the multi-joint structure to control. Setting up control strategies for complex systems is consequently not a simple task. In this context, learning or identifying part of the control strategy from the observation of the system behavior is an appealing and efficient approach. The aim of this paper is to present a generic learning approach for the dynamical control of mechanical articulated systems, and more precisely hand-arm systems.

2. Related work

Previous work in learning motion control can be divided into two main classes, depending on the motivations: the first class concerns works which provide new insights into motor control. This kind of work may improve the understanding, via simulation, of hypothetical strategies that the Central Nervous System uses to control limb movements. The second
class concerns the design of artificial systems that mimic biological behaviors.

Within the first class of work, numerous approaches integrating learning mechanisms have been developed to control sensory-motor systems. Among them, several significant contributions highlight two main approaches: those which look for an a priori analogy with biological systems (identification of functions of the cerebellum) [1-3], and those which look for an a posteriori analogy with biological systems [4-5].

In the second class of work, the problem of learning motion control is encompassed by the highly developed field of neural network control [6]. A typical intelligent motion controller tends to output the control signals directly from a neural network, or a similar device. Two distinct and competing approaches are available when facing the problem of learning nonlinear transformations (NLT), and in particular the nonlinear mappings involved in multi-joint control systems: parametric learning (PL) and non-parametric learning (NPL) (see [7] and [8] for a review of PL and NPL models with biological relevance arguments regarding internal sensory-motor maps). The fundamental difference between PL and NPL is that PL addresses the learning essentially globally while NPL addresses it much more locally. In other words, PL methods try to learn nonlinear transforms over their whole domain of validity. This means that if a change in the environment occurs locally, it will potentially affect the learning process everywhere in the definition domain of the transform. Conversely, NPL learns the properties of the transform in the neighborhood of each point of interest within the definition domain of the transform. Thus, a local update in the learning process does not affect the rest of the learned definition domain. Multi-layer perceptrons [9-10] are instances of the PL class with synaptic weights as parameters, while Probabilistic Networks or General Regression Neural Networks [11-12] are instances of the NPL class.

Biological relevance can be found for the two kinds of approaches. Nevertheless, the local character of NPL is undoubtedly a great advantage when addressing incremental learning in variable environments, since the local modification resulting from any change does not affect the overall structure of the nonlinear transform already learned.

Our paper proposes a new learning scheme for the dynamical control of muscular-skeleton systems, based on the learning of inverse sensory-motor transformations. Our learning algorithm is applied to the control of an anthropomorphic hand-arm system.

3. Inverse sensory-motor control

The controller system refers to the process of defining a sensory-motor control policy for a muscular-skeleton system and a particular task goal. In our model, we assume that the system to move is composed of a skeleton dynamics part, labeled (D), and a skeleton kinematics part, labeled (K) (Fig. 1).

[Figure 1: block diagram. The task goal $s_T(t)$, $y_T(t)$ and the sensory information (visual, proprioceptive) define the error fed to the Dynamic Sensory-Motor Controller (DSMC), which applies the torque $\tau(t)$ to the muscular skeleton system: skeleton dynamics (D), producing the state $y(t)$, followed by skeleton kinematics (K), producing the output $s(t)$.]

Figure 1. Inverse sensory-motor control

At each time, the muscular skeleton system can be characterized by its state vector $y(t)$, represented by the joint coordinates and their derivatives $(q, \dot{q})$ ($q$ being Euler angles or quaternion coordinates). Moreover, the sensory data are used in a feedback loop to iteratively compute the motor command: given the task goal $s_T(t)$ or $y_T(t)$, the dynamic sensory-motor controller (DSMC) generates, from the error between the current values of $y(t)$ and $s(t)$ and the task goal, a torque input $\tau(t)$ applied to the direct (DK) system.

[Figure 2: block diagram of the controller, chaining the inverse kinematics $(K)^{-1}$ and the inverse dynamics $(D)^{-1}$, with inputs $s_T(t)$, $s(t)$, $y_T(t)$, $y(t)$ and output $\tau(t)$.]

Figure 2. Dynamic sensory-motor controller

The controller itself is composed of two inverse transformations, as shown in Figure 2, where $y_T(t)$ is the part of the command specified in the state space and $s_T(t)$ the part of the command specified in the output space. The error signals measured between sensory
outputs and task inputs are generally used as corrective information to update the torque command of the articulated system. The excess of degrees of freedom which characterizes the system (DK) makes the inversion of (D) and (K) a redundant problem, since the same sensory outputs may be observed for numerous different torque commands and states of the system. We briefly present hereinafter our solutions that implicitly invert the (D) and (K) models and show how to assemble these solutions to build up a dynamic sensory-motor controller (DSMC).

3.1. Inversion of the skeleton dynamics D

When controlling mechanical articulated systems, we have to design control laws which compute torque commands for each joint. This control problem is inherently nonlinear, which means that much of linear control theory is not directly applicable. Nonetheless, one solution classically used in robotics and computer animation, the Proportional Integral Derivative (PID) law, comes from linear control theory. For each internal joint, a PID controller takes as inputs the angular position of the joint and its derivative as well as the desired angular position, and computes the torque output required to produce the desired displacement of the joint, as expressed by equation (1):

$$\tau(t) = K_p\,(q_T - q) + K_d\,(\dot{q}_T - \dot{q}) + K_i \int q(t)\,dt \tag{1}$$

where $q$ is the angle of the joint, $q_T$ is the desired angle, and $K_p$, $K_d$ and $K_i$ are the proportional, derivative and integral gains respectively. The effect of the PID controller is to eliminate large step changes in the errors, thus smoothing the simulated motion.

3.2. Inversion of the skeleton kinematics K

Numerous solutions exist to control sensory-motor systems (see references [1-5] for some approaches that exploit learning mechanisms).

Numerical iterative approaches have been used in computer graphics to solve Inverse Kinematics (IK). IK can be regarded as a nonlinear optimization problem based on the minimization of a scalar potential function defined by:

$$E(y) = \left(M(y) - s_T\right)^T \left(M(y) - s_T\right) \tag{2}$$

where $M(y)$ denotes the forward kinematics mapping from the state space to the observation space and $s_T$ denotes the desired sensory output expressed in the output task space.

In previous works we developed a gradient-based controller in a sensory-motor closed-loop transformation which integrates neurophysiological elements. This model has proved able to control articulated chains and to produce motion that globally respects human motion laws [15]. To implement such a model, all coefficients of the Jacobian matrix $J(t)$ have to be known for all values of the state vector. These coefficients depend directly on the structure of the articulated chain to control. Furthermore, for any articulated chain, a specific Jacobian matrix has to be calculated. One solution to overcome this limitation is to introduce a learning scheme, a functionality that most biological systems have ingeniously implemented.

A first learning scheme for the inverse kinematics transformation was proposed in [16]. We extend here the learning to the inverse dynamics, by adding a set of distributed inverse dynamics modules associated with inverse kinematics modules in which the mappings are learned.

4. Learning kinematics using General Regression Neural Networks (GRNN)

General Regression Neural Networks (GRNN) were proposed by Donald Specht [11-12]. They are related to the Nadaraya-Watson kernel regression method and to Parzen window methods [17-18]. The definitions and assumptions behind the derivation of the Nadaraya-Watson estimate are briefly detailed hereinafter.

Let $X$ be an $m$-dimensional random variable in $(R^m, B^m)$ and $Y$ an $n$-dimensional dependent random variable in $(R^n, B^n)$ such that $Y = f(X)$, where $B^m$ and $B^n$ are the Borel $\sigma$-algebras over $R^m$ and $R^n$ respectively. Let $\rho(X, Y)$ be the joint continuous density function. Assuming that $x$ and $y$ are measured values for $X$ and $Y$ respectively, the expected value of $Y$ given $x$ (the regression of $Y$ upon $x$) is given as:

$$E(Y/x) = \frac{\int Y\,\rho(x, Y)\,dY}{\int \rho(x, Y)\,dY} \tag{3}$$

Let $K_m$ be a probability density function on $R^m$. We assume that $K_m$ satisfies the following conditions: $K_m$ is continuous, symmetric and bounded.

Assuming that the underlying density is continuous and smooth enough (the first derivatives of $\rho$ evaluated at any $x$ are small), and based on a set of $p$ observation samples $(x_i, y_i)_{i=1,\dots,p}$
, the joint probability density estimator $\hat{\rho}(x, y)$ using the non-parametric Parzen's window method can be formulated as follows:

$$\hat{\rho}(x, y) = \frac{1}{\eta} \sum_{i=1}^{p} K_m\!\left(\frac{(x - x_i)^T W_x (x - x_i)}{2(\sigma_i)^2}\right) K_m\!\left(\frac{(y - y_i)^T (y - y_i)}{2(\sigma_i)^2}\right) \tag{4}$$

where $\eta$ is a normalizing factor, $W_x$ a positive diagonal matrix used to weight the coordinates of vector $x$, and $\sigma_i$ the local bandwidth of the kernel $K_m$ centered on sample $(x_i, y_i)$. In general a Gaussian kernel is chosen, such that $K_m$ is identified with the exponential function.

Substituting equation (4) into equation (3) we get:

$$\hat{E}(Y/x) = \frac{\sum_{i=1}^{p} y_i\,K_m\!\left(\frac{(x - x_i)^T W_x (x - x_i)}{2(\sigma_i)^2}\right)}{\sum_{i=1}^{p} K_m\!\left(\frac{(x - x_i)^T W_x (x - x_i)}{2(\sigma_i)^2}\right)} \tag{5}$$

or:

$$\hat{E}(Y/x) = \frac{\sum_{i=1}^{p} y_i\,K_m(x_i, x)}{\sum_{i=1}^{p} K_m(x_i, x)} = \frac{\sum_{i=1}^{p} y_i\,w_i}{\sum_{i=1}^{p} w_i} \tag{6}$$

with

$$w_i = K_m\!\left(\frac{(x - x_i)^T W_x (x - x_i)}{2(\sigma_i)^2}\right) \tag{7}$$

Assumptions:

(A1) The $(x_i, y_i)$ are i.i.d. (This assumption will change later.)

(A2) $K_m$ is a symmetric function around zero with the following properties:

$$\int K_m(\nu)\,d\nu = 1, \qquad \int \nu^2 K_m(\nu)\,d\nu = \mu_2 \neq 0, \qquad \int K_m^2(\nu)\,d\nu < \infty$$

(A3) $\partial^2 \rho(x, y) / \partial x_k^2$ exists for all $k$, is continuous, and is bounded in a neighborhood around $x$ (and thus we can evaluate the bias and variance of $\hat{\rho}$).

(A4) For all $i$, $\sigma_i \to 0$ as $p \to \infty$.

(A5) For all $i$, $p\,\sigma_i \to \infty$ as $p \to \infty$.

Under assumptions (A1) to (A5), the consistency of the kernel estimator is established, as the estimator $\hat{E}(Y/x)$ tends in probability towards $E(Y/x)$.

GRNN calculates the Nadaraya-Watson estimate. GRNN is a normalized Radial Basis Function (RBF) network in which a hidden unit is centered at every training sample. The RBF units of a GRNN architecture are usually characterized by Gaussian kernels. The hidden-to-output weights are set to the target values, so that the output is a weighted average of the target values of the training samples close to the given input case. The only parameters of the network are the widths of the kernels associated with the RBF units. These widths (often a single width is used) are called smoothing parameters or bandwidths [17] [18]. They are usually chosen by cross-validation or by ad hoc methods that are not well described. GRNN is a universal approximator for smooth functions, so it should be able to solve any smooth function-approximation problem provided enough data is given. The main drawback of GRNN is that, like kernel methods in general, it suffers badly from a lack of learning data.

4.1. Learning SMC maps using GRNN

To estimate the normalized gradient of the error, the following map $f$ is defined:

$$\Delta y_s = f(y, \Delta s) \tag{8}$$

where $\Delta s$ is the 3D directional vector towards the task $s_T$ specified in the sensory space and $y$ is the vector of the state variables. $\Delta y_s$ is the estimated normalized modification vector within the state space that minimizes the error between the current output $s$ and the task specification $s_T$. Following the GRNN memory-based approach, the calculation of the map $f$ is approximated through a variable Gaussian kernel density estimator, as explained below.

Given a set of $p$ learning samples $\{(y_i, \Delta y_i, \Delta s_i)\}_{i=1,\dots,p}$, the state update $\Delta y_s$ that minimizes the error signal calculated from a current state $y$ and a 3D normalized directional vector $\Delta s$ is estimated as the conditional expectation of $\Delta y$ given $\theta$:

$$\Delta y_s = E(\Delta y/\theta) = \frac{\sum_{i=1}^{p} \Delta y_i\,K(\theta_i, \theta)}{C} \tag{9}$$

where $\theta = [y, \Delta s]^T$, $\theta_i = [y_i, \Delta s_i]^T$, $C$ is a normalizing factor, and $K$ a variable Gaussian kernel:

$$K(\theta_i, \theta) = \exp\!\left(-\frac{(\theta - \theta_i)^T W (\theta - \theta_i)}{\sigma}\right) \tag{10}$$

$W$ is a weighting diagonal matrix used to balance the weighting of sensory information $\Delta s$ with state
information $y$, and $\sigma$ is a parameter that scales the local density, both in the state space and in the sensory space: if the density is low, $\sigma$ is increased and conversely, if the density is high, $\sigma$ is lowered.

4.2. Naive GRNN learning algorithm

$\sigma$ is selected empirically, since an optimum value cannot be determined from a set of observations.

- Initialisation: select a small value $\varepsilon$ and an integer value $p$, and set $i$ to 0 ($\varepsilon$ can be a function of $p$).
- Randomly select a state vector $y$, position the multi-joint system according to $y$, and observe the corresponding sensory outputs $s$.
- Select a small normalized change $\Delta y$, position the multi-joint system according to $y + \Delta y$, and observe the change in sensory outputs $\Delta s$.
- Calculate $\Delta y_s$ using $\theta = [y, \Delta s]^T$ according to equations (9) and (10).
- If $\|\Delta y - \Delta y_s\| > \varepsilon$, save the association $([y, \Delta s], \Delta y)$ as a new learning sample $(\theta_i, \Delta y_i)$, create a corresponding neuron and increment $i$.
- If $i < p$, return to the random state selection step; otherwise stop.

4.3. Implementation issues

When estimating the expectation of the state update $\Delta y$ given $\theta$, distances must be computed in the $d$-dimensional space ($d$ = dimension of the state space + dimension of the sensory space). When summing Gaussian kernels (equations (9) and (10)), only the $\theta_i$ vectors belonging to the neighborhood of $\theta$ are retained. To speed up the computation, a kd-tree [14], which identifies neighborhoods in time logarithmic in $p$, can be used advantageously. (The kd-tree representation of the stored data leads us to reconsider the architecture of GRNN in order to implement a similar neighborhood search.)

5. Results

The learning approach is applied to a simulated mechanical system composed of two arms and two hands, subjected to successive reaching tasks. The mechanical systems are modeled with ODE (Open Dynamics Engine) [19], with a custom 3D rendering (see Fig. 4 b for the visualization of the 3D character). Each arm is composed of three joints with six degrees of freedom, and each finger is composed of three joints with four degrees of freedom.

The arms are controlled by DSMC controllers, where the mapping between $y$ and $s$ is learned on the basis of a gradient descent strategy. The learning processes are carried out for increasing values of the number of learning samples $p$. For each process, a thousand 3D spatial target positions and initial conditions are selected randomly to test the correctness of the learning process. For these 1000 conditions, the error rate (number of cases where the arm is not able to reach the target) is calculated.

The experimental settings for this test are the following. A target is considered to be reached when the residual distance between the arm end-point and the target is below 1% of the total length of the extended hand-arm chain. The size of $\sigma$ is selected such that at least 40 neighbors can be provided to evaluate $\Delta y$.

The results of this test are reported in Figure 3. For about 60000 learning samples, the map $f$ is apparently well modeled, since the residual error rate is low (about 0.5%) and very little improvement is gained when increasing $p$.

[Figure 3: plot of the error rate (%), vertical axis from 0 to 40, against the number of learning samples $p$, horizontal axis ticks from 1 to 81.]

Figure 3. Error rate as a function of the number of learning samples p.

The fact that $p$ can be chosen very low while maintaining good performance is a major result. Generally, for estimating a multivariate function with 9 variables (e.g. 6 degrees of freedom and 3D coordinates), a kernel density estimator requires more than 500,000 adequately selected samples. In our case, $p$ = 80,000 seems to be sufficient for the considered task. One reasonable explanation is that the sensory-motor loop performs a time average over successive gradient estimates, which compensates for small errors due to the coarse estimation. A rough gradient mapping estimation is consequently accurate enough for the reaching task considered in our experiments.
After the learning phase, a simulation was carried out for a tracking task which consisted in following discrete targets extracted from a motion-captured hand trajectory, according to an adaptive sub-sampling algorithm [20] (see Figure 4 a). Furthermore, the simulation of the DSMC is linked to this tracking task and applied to a virtual character (see Figure 4 b).

[Figure 4: two panels, a) and b).]

Figure 4. a) Trajectory of the human wrist in the Cartesian space with the localization of the targets; b) Simulation by a virtual character for a tracking task

For this hand-tracking task, the hand trajectories obtained through dynamical simulation can be superimposed on the captured trajectories, as illustrated in Figure 4 a).

6. Conclusion

In this paper, we proposed a Dynamical Sensory-Motor Controller (DSMC) for controlling a dynamical hand-arm system. The controller combines the inversion of the kinematics model, obtained from the learning of sensory-motor mappings, with the inversion of the dynamical system using classical PID controllers. The learning of sensory-motor mappings was performed with a non-parametric learning approach (GRNN), based on a variable kernel density estimator and the use of a kd-tree architecture to simulate neuron activation according to a nearest-neighbor search. Despite the apparently high memory requirement of this kind of estimator, the proposed learning scheme behaves properly when used to control articulated systems with six degrees of freedom simulated in a dynamical environment. This result is obtained even if the number of learning samples is reasonably low.

7. References

[1] Kawato M., Maeda Y., Uno Y., Suzuki R. Trajectory Formation of Arm Movement by Cascade Neural Network Model Based on Minimum Torque Criterion. Biological Cybernetics, vol. 62, 1990, pp. 275-288.
[2] Wolpert D.M., Miall R.C., Kawato M. Internal models in the cerebellum. Trends in Cognitive Sciences, vol. 2, no. 9, 1998, pp. 338-347.
[3] Spoelstra J., Schweighofer N., Arbib M.A. Cerebellar learning of accurate predictive control for fast reaching movements. Biological Cybernetics, vol. 82, 2000, pp. 321-333.
[4] Bullock D., Grossberg S., Guenther F.H. A Self-Organizing Neural Model of Motor Equivalent Reaching and Tool Use by a Multijoint Arm. Journal of Cognitive Neuroscience, vol. 5, no. 4, 1993, pp. 408-435.
[5] Jordan M.I. Computational motor control. In M. S. Gazzaniga (Ed.), The Cognitive Neurosciences. Cambridge, MA: MIT Press, 1995, pp. 587-609.
[6] Werbos P.J. An overview of neural networks for control. IEEE Control Systems Magazine, January 1991.
[7] Duda R.O., Hart P.E. Pattern Classification and Scene Analysis. New York, NY: Wiley, 1973.
[8] Schaal S. (in press). Nonparametric regression for learning nonlinear transformations. In: Ritter et al. (Eds.), Prerational Intelligence in Strategies, High-Level Processes and Collective Behavior. Kluwer Academic.
[9] Rumelhart D.E., Hinton G.E., Williams R.J. Learning Representations by Back-Propagating Errors. Nature, vol. 323, 1986, pp. 533-536.
[10] Bishop C.M. Neural Networks for Pattern Recognition. Oxford University Press, Oxford, UK, 1995.
[11] Specht D.F. A General Regression Neural Network. IEEE Trans. Neural Networks, vol. 2, no. 6, 1991, pp. 568-576.
[12] Specht D.F. Probabilistic Neural Networks. Neural Networks, vol. 3, 1990, pp. 109-118.
[13] Churchland P.S., Sejnowski T.J. The Computational Brain. MA: MIT Press, 1992.
[14] Friedman J.H., Bentley J.L., Finkel R.A. An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Software, vol. 3, 1977, pp. 209-226.
[15] Gibet S., Marteau P.F. A Self-Organized Model for the Control, Planning and Learning of Nonlinear Multi-Dimensional Systems Using a Sensory Feedback. Journal of Applied Intelligence, vol. 4, 1994, pp. 337-349.
[16] Gibet S., Marteau P.F. Expressive Gesture Animation Based on Non Parametric Learning of Sensory-Motor Models. CASA 2003, Computer Animation and Social Agents, 7-9 May 2003.
[17] Watson G.S. Smooth regression analysis. Sankhya, Series A, vol. 26, 1964, pp. 359-372.
[18] Nadaraya E.A. On estimating regression. Theory Probab. Applic., vol. 10, 1964, pp. 186-190.
[19] Open Dynamics Engine, 2000-2003, Russell Smith. http://opende.sourceforge.net/
[20] Marteau P.-F., Gibet S. Adaptive Sampling of Motion Trajectories for Discrete Task-Based Analysis and Synthesis of Gesture. In Gesture in Human-Computer Interaction and Simulation, 6th GW, Revised Selected Papers, LNCS, vol. 3881, 2006, pp. 168-171.