This article is about the chain rule in calculus. For the chain rule in probability theory, see Chain rule (probability). For other uses, see Chain rule (disambiguation).
In calculus, the chain rule is a formula for computing the derivative of the composition of two or more functions. That is, if f and g are functions, then the chain rule expresses the derivative of their composition $f \circ g$ (the function which maps x to $f(g(x))$) in terms of the derivatives of f and g and the product of functions as follows:
\[ (f \circ g)' = (f' \circ g) \cdot g'. \]
This can be written more explicitly in terms of the variable. Let $F = f \circ g$, or equivalently, $F(x) = f(g(x))$ for all x. Then one can also write
\[ F'(x) = f'(g(x))\, g'(x). \]
In Leibniz's notation, if z is a function of y and y is a function of x, the chain rule reads
\[ \frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}. \]
[Figure: Demonstrates the chain rule with z a function of y which is a function of x.]
In integration, the counterpart to the chain rule is the substitution rule.
1 History
The chain rule seems to have first been used by Leibniz. He used it to calculate the derivative of $\sqrt{a + bz + cz^2}$ as the composite of the square root function and the function $a + bz + cz^2$. He first mentioned it in a 1676 memoir (with a sign error in the calculation). The common notation of the chain rule is due to Leibniz.[1] L'Hôpital uses the chain rule implicitly in his Analyse des infiniment petits. The chain rule does not appear in any of Leonhard Euler's analysis books, even though they were written over a hundred years after Leibniz's discovery.
2 One dimension
2.1 First example
Suppose that a skydiver jumps from an aircraft. Assume that t seconds after his jump, his height above sea level in meters is given by $g(t) = 4000 - 4.9t^2$. One model for the atmospheric pressure at a height h is $f(h) = 101325\, e^{-0.0001h}$. These two equations can be differentiated and combined in various ways to produce the following data:
$g'(t) = -9.8t$ is the velocity of the skydiver at time t.
$(f \circ g)'(t)$ is the rate of change in atmospheric pressure with respect to time at t seconds after the skydiver's jump and is proportional to the buoyant force on the skydiver at t seconds after his jump.
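The skydiver quantities can be checked numerically. The following is a minimal sketch, not part of the original article: the helper names (`f_prime`, `pressure_rate`, `central_difference`) and the sample time t = 10 are illustrative assumptions, and `f_prime` is the derivative of the pressure model worked out by hand. It compares the chain-rule value of the pressure rate with a finite-difference estimate.

```python
import math

def g(t):
    """Height above sea level in meters, t seconds after the jump."""
    return 4000 - 4.9 * t**2

def g_prime(t):
    """Velocity of the skydiver: derivative of g."""
    return -9.8 * t

def f(h):
    """Atmospheric pressure model at height h."""
    return 101325 * math.exp(-0.0001 * h)

def f_prime(h):
    """Derivative of the pressure model with respect to height (hand-derived)."""
    return -0.0001 * 101325 * math.exp(-0.0001 * h)

def pressure_rate(t):
    """(f o g)'(t) computed with the chain rule: f'(g(t)) * g'(t)."""
    return f_prime(g(t)) * g_prime(t)

def central_difference(func, x, step=1e-4):
    """Finite-difference estimate of func'(x), used only as a cross-check."""
    return (func(x + step) - func(x - step)) / (2 * step)

t = 10.0  # illustrative sample time
by_chain_rule = pressure_rate(t)
by_difference = central_difference(lambda s: f(g(s)), t)
print(by_chain_rule, by_difference)
```

The two printed values should agree to several decimal places, and the rate is positive because pressure rises as the skydiver falls.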
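Computing $(f \circ g)'(t)$ directly from the definition of the derivative is usually difficult. The utility of the chain rule is that it turns a complicated derivative into several easy derivatives.
The chain rule states that, under appropriate conditions,
\[ (f \circ g)'(t) = f'(g(t)) \cdot g'(t). \]
2.2 Statement
The rule is sometimes abbreviated as
\[ (f \circ g)' = (f' \circ g) \cdot g'. \]
If $y = f(u)$ and $u = g(x)$, this abbreviated form is written in Leibniz notation as
\[ \frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}. \]
The points where the derivatives are evaluated may also be stated explicitly:
\[ \left.\frac{dy}{dx}\right|_{x=c} = \left.\frac{dy}{du}\right|_{u=g(c)} \cdot \left.\frac{du}{dx}\right|_{x=c}. \]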
2.3 Further examples
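As a small worked instance of the rule with the evaluation points stated explicitly, take $y = u^3$, $u = x^2 + 1$ and the point $c = 1$. These functions and the point are illustrative choices, not taken from the text.

```latex
\begin{align*}
u = g(x) &= x^2 + 1, & g(1) &= 2, \\
\left.\frac{du}{dx}\right|_{x=1} &= 2x\,\Big|_{x=1} = 2, &
\left.\frac{dy}{du}\right|_{u=2} &= 3u^2\,\Big|_{u=2} = 12, \\
\left.\frac{dy}{dx}\right|_{x=1} &= 12 \cdot 2 = 24.
\end{align*}
```

Expanding first gives the same value: $y = (x^2 + 1)^3$ has $dy/dx = 6x(x^2 + 1)^2$, which equals 24 at $x = 1$.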
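For concreteness, consider the function
\[ y = e^{\sin x^2}. \]
This can be decomposed as the composite of three functions:
\[ y = f(u) = e^u, \qquad u = g(v) = \sin v, \qquad v = h(x) = x^2. \]
Their derivatives are:
\[ \frac{dy}{du} = f'(u) = e^u, \qquad \frac{du}{dv} = g'(v) = \cos v, \qquad \frac{dv}{dx} = h'(x) = 2x. \]
In Leibniz notation, the chain rule says that the derivative of their composite at the point $x = a$ is:
\[ \frac{dy}{dx} = \left.\frac{dy}{du}\right|_{u=g(h(a))} \cdot \left.\frac{du}{dv}\right|_{v=h(a)} \cdot \left.\frac{dv}{dx}\right|_{x=a}, \]
or for short,
\[ \frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dv} \cdot \frac{dv}{dx}. \]
The derivative function is therefore:
\[ \frac{dy}{dx} = e^{\sin x^2} \cdot \cos x^2 \cdot 2x. \]
For a composite of n functions $f_1 \circ f_2 \circ \cdots \circ f_n$, define
\[ f_{a..b} = f_a \circ f_{a+1} \circ \cdots \circ f_{b-1} \circ f_b, \]
where $f_{a..a} = f_a$ and $f_{a..b}(x) = x$ when $b < a$. Then the chain rule takes the form
\[ Df_{1..n} = (Df_1 \circ f_{2..n})(Df_2 \circ f_{3..n}) \cdots (Df_{n-1} \circ f_{n..n})\, Df_n = \prod_{k=1}^{n} \left[ Df_k \circ f_{(k+1)..n} \right], \]
or, in Lagrange notation,
\[ f_{1..n}'(x) = f_1'(f_{2..n}(x))\, f_2'(f_{3..n}(x)) \cdots f_{n-1}'(f_{n..n}(x))\, f_n'(x) = \prod_{k=1}^{n} f_k'\left(f_{(k+1)..n}(x)\right). \]
The quotient rule follows from the product rule and the chain rule. Writing $f(x)/g(x)$ as the product $f(x) \cdot 1/g(x)$ and applying the product rule gives
\[ \frac{d}{dx}\left(\frac{f(x)}{g(x)}\right) = f'(x) \cdot \frac{1}{g(x)} + f(x) \cdot \frac{d}{dx}\left(\frac{1}{g(x)}\right). \]
To compute the derivative of $1/g(x)$, notice that it is the composite of g with the reciprocal function, that is, the function that sends x to $1/x$. The derivative of the reciprocal function is $-1/x^2$. By applying the chain rule, the last expression becomes:
\[ f'(x) \cdot \frac{1}{g(x)} + f(x) \cdot \left( -\frac{1}{g(x)^2} \cdot g'(x) \right) = \frac{f'(x)\, g(x) - f(x)\, g'(x)}{g(x)^2}, \]
which is the usual formula for the quotient rule.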
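The product formula for a composite of several functions can be sketched in a few lines. This is an illustrative sketch rather than anything from the original text: the function name `chain_derivative` and the sample point 0.7 are assumptions. It multiplies each derivative, evaluated at the value of the functions inside it, and compares the result with the closed form for $e^{\sin x^2}$.

```python
import math

def chain_derivative(funcs, derivs, x):
    """Derivative at x of funcs[0] o funcs[1] o ... o funcs[-1].

    Works from the innermost function outward. At each step `value`
    holds the inner composite evaluated at x, which is where the
    current derivative must be evaluated.
    """
    value = x        # the empty composite is the identity
    product = 1.0
    for func, deriv in zip(reversed(funcs), reversed(derivs)):
        product *= deriv(value)
        value = func(value)
    return product

# f(u) = e^u, g(v) = sin v, h(x) = x^2, listed outermost first.
funcs = [math.exp, math.sin, lambda x: x * x]
derivs = [math.exp, math.cos, lambda x: 2 * x]

x = 0.7  # illustrative sample point
from_product = chain_derivative(funcs, derivs, x)
closed_form = math.exp(math.sin(x**2)) * math.cos(x**2) * 2 * x
print(from_product, closed_form)
```

Passing empty lists returns 1.0, matching the convention that an empty composite is the identity map.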
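When f is the inverse function of g, so that $f(g(x)) = x$, differentiating both sides with the chain rule gives
\[ f'(g(x))\, g'(x) = 1. \]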
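As a quick check of this identity, one can take $g(x) = e^x$ and $f(y) = \ln y$, so that f undoes g. This pair is an illustrative choice, not taken from the text.

```latex
\begin{align*}
g'(x) &= e^x, & f'(y) &= \frac{1}{y}, \\
f'(g(x))\, g'(x) &= \frac{1}{e^x} \cdot e^x = 1.
\end{align*}
```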
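2.4 Higher derivatives
2.5.1 First proof
By the definition of the derivative,
\[ (f \circ g)'(a) = \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{x - a}. \]
If $g(x) \neq g(a)$ for all x near a, this is equal to
\[ \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{g(x) - g(a)} \cdot \frac{g(x) - g(a)}{x - a}. \]
To cover the case where $g(x) = g(a)$, define
\[ Q(y) = \begin{cases} \dfrac{f(y) - f(g(a))}{y - g(a)}, & y \neq g(a), \\ f'(g(a)), & y = g(a). \end{cases} \]
We will show that the difference quotient for $f \circ g$ is always equal to:
\[ Q(g(x)) \cdot \frac{g(x) - g(a)}{x - a}. \]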
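The role of the function Q in the first proof can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the article's own construction: the helper names, the choice f = sin, and the two inner functions are all hypothetical. It checks that the difference quotient of the composite equals Q(g(x)) times the difference quotient of g, including the case g(x) = g(a), where the naive factorization would divide by zero.

```python
import math

def difference_quotient(func, a, x):
    """(func(x) - func(a)) / (x - a) for x != a."""
    return (func(x) - func(a)) / (x - a)

def make_Q(f, f_prime, g, a):
    """Build Q for the outer function f and inner function g at the point a."""
    ga = g(a)
    def Q(y):
        if y == ga:
            return f_prime(ga)   # value assigned where the quotient is undefined
        return (f(y) - f(ga)) / (y - ga)
    return Q

f, f_prime = math.sin, math.cos
a, x = 1.0, 1.5

# Case 1: a constant inner function, so g(x) == g(a) for every x.
def g_const(s):
    return 2.0

Q_const = make_Q(f, f_prime, g_const, a)
constant_lhs = difference_quotient(lambda s: f(g_const(s)), a, x)
constant_rhs = Q_const(g_const(x)) * difference_quotient(g_const, a, x)

# Case 2: a non-constant inner function, g(s) = s^2.
def g_square(s):
    return s * s

Q_square = make_Q(f, f_prime, g_square, a)
square_lhs = difference_quotient(lambda s: f(g_square(s)), a, x)
square_rhs = Q_square(g_square(x)) * difference_quotient(g_square, a, x)

print(constant_lhs, constant_rhs)
print(square_lhs, square_rhs)
```

In the constant case both sides are zero; in the second case they agree up to floating-point rounding.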
2.5.2 Second proof
A function g is differentiable at a if there exist a real number $g'(a)$ and a function $\varepsilon(h)$ that tends to zero as h tends to zero, and furthermore
\[ g(a + h) - g(a) = g'(a)h + \varepsilon(h)h. \]
Here the left-hand side represents the true difference between the value of g at a and at a + h, whereas the right-hand side represents the approximation determined by the derivative plus an error term.
In the situation of the chain rule, such a function $\varepsilon$ exists because g is assumed to be differentiable at a. Again by assumption, a similar function also exists for f at g(a). Calling this function $\eta$, we have
\[ f(g(a) + k) - f(g(a)) = f'(g(a))k + \eta(k)k. \]
The above definition imposes no constraints on $\eta(0)$, even though it is assumed that $\eta(k)$ tends to zero as k tends to zero. If we set $\eta(0) = 0$, then $\eta$ is continuous at 0.
Proving the theorem requires studying the difference $f(g(a + h)) - f(g(a))$ as h tends to zero. The first step is to substitute for $g(a + h)$ using the definition of differentiability of g at a:
\[ f(g(a + h)) - f(g(a)) = f(g(a) + g'(a)h + \varepsilon(h)h) - f(g(a)). \]
The next step is to use the definition of differentiability of f at g(a).
The function Q from the first proof and the function $\eta$ are related by the equation
\[ Q(y) = f'(g(a)) + \eta(y - g(a)). \]
The need to define Q at g(a) is analogous to the need to define $\eta$ at zero.
2.6 Proof via infinitesimals
If $y = f(x)$ and $x = g(t)$ then choosing infinitesimal $\Delta t \neq 0$ we compute the corresponding $\Delta x = g(t + \Delta t) - g(t)$ and then the corresponding $\Delta y = f(x + \Delta x) - f(x)$, so that
\[ \frac{\Delta y}{\Delta t} = \frac{\Delta y}{\Delta x} \cdot \frac{\Delta x}{\Delta t}, \]
and applying the standard part we obtain
\[ \frac{dy}{dt} = \frac{dy}{dx} \cdot \frac{dx}{dt}, \]
which is the chain rule.
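The factorization of the increment ratio can be imitated with floating-point numbers. The sketch below is only an illustration: it uses a small real step as a stand-in for an infinitesimal, and the functions f = sin, g(t) = t³, the point t = 0.5 and the step size are all assumptions. The split requires Δx ≠ 0, which holds for this choice.

```python
import math

def f(x):
    return math.sin(x)

def g(t):
    return t**3

t = 0.5
dt = 1e-6                 # small real step standing in for an infinitesimal

x = g(t)
dx = g(t + dt) - g(t)     # corresponding increment in x (nonzero here)
dy = f(x + dx) - f(x)     # corresponding increment in y

ratio_direct = dy / dt
ratio_product = (dy / dx) * (dx / dt)

# Derivative predicted by the chain rule: f'(g(t)) * g'(t)
exact = math.cos(g(t)) * 3 * t**2

print(ratio_direct, ratio_product, exact)
```

The first two values coincide up to rounding because the split is an algebraic identity, and both approximate the chain-rule derivative once the step is small.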
3 Higher dimensions
The simplest generalization of the chain rule to higher dimensions uses the total derivative. The total derivative is a linear transformation that captures how the function changes in all directions. Fix differentiable functions $f : \mathbb{R}^m \to \mathbb{R}^k$ and $g : \mathbb{R}^n \to \mathbb{R}^m$ and a point a in $\mathbb{R}^n$.
Total derivatives, being linear transformations, can be written as matrices. The matrix corresponding to a total derivative is called a Jacobian matrix, and the composite of two derivatives corresponds to the product of their Jacobian matrices. From this perspective the chain rule therefore says:
\[ J_{f \circ g}(a) = J_f(g(a))\, J_g(a), \]
or for short,
\[ J_{f \circ g} = (J_f \circ g)\, J_g. \]
The chain rule for total derivatives implies a chain rule for partial derivatives. Recall that when the total derivative exists, the partial derivative in the ith coordinate direction is found by multiplying the Jacobian matrix by the ith basis vector. By doing this to the formula above, with $u = g(x)$ and $y = f(u)$, we find:
\[ \frac{\partial(y_1, \ldots, y_k)}{\partial x_i} = \frac{\partial(y_1, \ldots, y_k)}{\partial(u_1, \ldots, u_m)} \frac{\partial(u_1, \ldots, u_m)}{\partial x_i}. \]
Since the entries of the Jacobian matrix are partial derivatives, we may simplify the above formula to get:
\[ \frac{\partial(y_1, \ldots, y_k)}{\partial x_i} = \sum_{\ell=1}^{m} \frac{\partial(y_1, \ldots, y_k)}{\partial u_\ell} \frac{\partial u_\ell}{\partial x_i}. \]
More conceptually, this rule expresses the fact that a change in the $x_i$ direction may change all of $g_1$ through $g_m$, and any of these changes may affect f.
In the special case where k = 1, so that f is a real-valued function, this formula simplifies even further:
\[ \frac{\partial y}{\partial x_i} = \sum_{\ell=1}^{m} \frac{\partial y}{\partial u_\ell} \frac{\partial u_\ell}{\partial x_i}. \]
3.1 Example
3.2 Higher derivatives of multivariable functions
Main article: Faà di Bruno's formula § Multivariate version
Faà di Bruno's formula for higher-order derivatives of single-variable functions generalizes to the multivariable case. If $y = f(u)$ is a function of $u = g(x)$ as above, then the second derivative of $f \circ g$ is:
\[ \frac{\partial^2 y}{\partial x_i\, \partial x_j} = \sum_{k} \left( \frac{\partial y}{\partial u_k} \frac{\partial^2 u_k}{\partial x_i\, \partial x_j} \right) + \sum_{k,\ell} \left( \frac{\partial^2 y}{\partial u_k\, \partial u_\ell} \frac{\partial u_k}{\partial x_i} \frac{\partial u_\ell}{\partial x_j} \right). \]
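The Jacobian form of the chain rule stated in this section, $J_{f \circ g}(a) = J_f(g(a))\, J_g(a)$, can be checked on a small example. The following sketch uses only the standard library. The particular maps g : R² → R³ and f : R³ → R², their hand-written Jacobians, the point a, and all helper names are illustrative assumptions rather than anything from the original text.

```python
import math

def g(x):
    """Illustrative inner map g : R^2 -> R^3."""
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x2 ** 2]

def jacobian_g(x):
    """Hand-derived 3x2 Jacobian of g."""
    x1, x2 = x
    return [[x2, x1],
            [math.cos(x1), 0.0],
            [0.0, 2 * x2]]

def f(u):
    """Illustrative outer map f : R^3 -> R^2."""
    u1, u2, u3 = u
    return [u1 + u2 * u3, math.exp(u1)]

def jacobian_f(u):
    """Hand-derived 2x3 Jacobian of f."""
    u1, u2, u3 = u
    return [[1.0, u3, u2],
            [math.exp(u1), 0.0, 0.0]]

def matmul(A, B):
    """Plain-list matrix product A @ B."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def numerical_jacobian(func, x, step=1e-6):
    """Central-difference Jacobian; column j holds partials with respect to x_j."""
    rows = len(func(x))
    J = [[0.0] * len(x) for _ in range(rows)]
    for j in range(len(x)):
        forward, backward = list(x), list(x)
        forward[j] += step
        backward[j] -= step
        f_plus, f_minus = func(forward), func(backward)
        for i in range(rows):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * step)
    return J

a = [0.5, 1.5]  # illustrative evaluation point
by_chain_rule = matmul(jacobian_f(g(a)), jacobian_g(a))
by_differences = numerical_jacobian(lambda x: f(g(x)), a)
print(by_chain_rule)
print(by_differences)
```

The two 2×2 matrices should agree entry by entry up to the finite-difference error.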
4 Further generalizations
In abstract algebra, the derivative is interpreted as a morphism of modules of Kähler differentials. A ring homomorphism of commutative rings $f : R \to S$ determines a morphism of Kähler differentials $Df : \Omega_R \to \Omega_S$ which sends an element dr to d(f(r)), the exterior differential of f(r). The formula $D(f \circ g) = Df \circ Dg$ holds in this context as well.
5 See also
Integration by substitution
Leibniz integral rule
Quotient rule
Triple product rule
Product rule
Automatic differentiation, a computational method that makes heavy use of the chain rule to compute exact numerical derivatives.
6 References
[1] Omar Hernández Rodríguez and Jorge M. López Fernández (2010). "A Semiotic Reflection on the Didactics of the Chain Rule" (PDF). The Montana Mathematics Enthusiast 7 (2–3): 321–332. ISSN 1551-3440.
7 External links
Hazewinkel, Michiel, ed. (2001), "Leibniz rule", Encyclopedia of Mathematics, Springer, ISBN 978-1-55608-010-4
Weisstein, Eric W., "Chain Rule", MathWorld.
Khan Academy Lesson 1 Lesson 3
http://calculusapplets.com/chainrule.html
The Chain Rule explained