Sie sind auf Seite 1von 3

Variance-stabilizing transformation for DESeq

For parametrized dispersion fit

This file describes the variance stabilizing transformation (VST) used by DESeq when parametric dispersion estimation is
used.
This is a Mathematica notebook. The file vst.pdf is produced from vst.nb.

When using estimateDispersions with fitType="parametric", we parametrize the relation between mean Μ and dispersion Α
with two constants a0 and a1 as follows:

Α = a0 + a1  Μ

a1
a0 +
Μ

In the package, a0 is called the asymptotic dispersion and a1 the extra-Poisson factor.

The variance is hence

v = Μ + Α Μ2  Expand

Μ + Μ2 a0 + Μ a1

A variance stabilizing transformation (VST) is a transformation u, such that, if X is a random variable with variance-mean
relation v, i.e.,VarHXL = vHEHXLL, then uHXL has stabilized variance, i.e., is homoskedastic.

x dΜ
A VST u can be derived from a variance-mean relation v by uHxL = Ù .
vHΜL
Hence, we can get a general VST with
1
u0 = IntegrateB , 8Μ, 0, x<, Assumptions ® 8a0 > 0, a1 > 0, x > 0<F
v

1+2 x a0 +a1 +2 x a0 H1+x a0 +a1 L


LogA E
1+a1

a0

If u0 is a VST, then so is uHxL = Η u0 HxL + Ξ. Hence, this here is a VST, too:

u = Η u0 + Ξ

1+2 x a0 +a1 +2 x a0 H1+x a0 +a1 L


Η LogB F
1+a1
Ξ+
a0

We will now choose the parameters Η and Ξ such that our VST behaves like log2 for large values. Let us first look at the
asymptotic ratio of the two transformations:
Limit@u  Log@2, xD, x ® ¥, Assumptions ® 8a0 > 0, a1 > 0, x > 0<D

Η Log@2D

a0

Hence, if we set Η as follows, both tranformations have asymptotically the ratio 1.


2 vst.nb

a0
Η=
Log@2D

a0

Log@2D

We also want the difference to vanish for large values:

Limit@u - Log@2, xD, x ® ¥, Assumptions ® 8a0 > 0, a1 > 0, x > 0<D


4 a0
LogB F
1+a1
Ξ+
Log@2D

So, we set

4 a0
LogB F
1+a1
Ξ=-
Log@2D
4 a0
LogB F
1+a1
-
Log@2D

Check that both limits are now correct:

Limit@u  Log@2, xD, x ® ¥, Assumptions ® 8a0 > 0, a1 > 0, x > 0<D

Limit@u - Log@2, xD, x ® ¥, Assumptions ® 8a0 > 0, a1 > 0, x > 0<D

Hence, we arrive at this VST:

FullSimplify@u, Assumptions -> 8a0 > 0, a1 > 0, x > 0<D

1+2 x a0 +a1 +2 x a0 H1+x a0 +a1 L


LogB F
4 a0

Log@2D

This VST (red) now behaves asymptotically as log2 (blue), shown here for typical values for a0 and a1 .

Plot@ 8u . 8a0 ® .01, a1 -> 3<, Log@2, xD<, 8x, 0, 10 000<, PlotStyle ® 8Red, Blue<D

13

12

11

10

2000 4000 6000 8000 10 000

For small values, however, the VST (red) compresses the dynamics much more dramatically than the logarithm (blue) and
the identity (green). This reflects that the strong Poisson noise makes differences uninformative for small values.
vst.nb 3

Plot@ 8u . 8a0 ® .01, a1 -> 3<, Log@2, xD, x<,


8x, 0, 100<, PlotStyle ® 8Red, Blue, Green<, PlotRange ® 80, 20<D
20

15

10

0 20 40 60 80 100

A template for the R code in the function:

CForm@FullSimplify@u, Assumptions -> 8a0 > 0, a1 > 0, x > 0<DD .


8a0 ® asymptDisp, a1 ® extraPois, x ® q<
Log((1 + extraPois + 2*asymptDisp*q +
2*Sqrt(asymptDisp*q*(1 + extraPois + asymptDisp*q)))/
(4.*asymptDisp))/Log(2)

For local dispersion fit


x dΜ
In case of a local dispersion fit, the variance-stabilizing transformation uHxL = Ù is obtained by numerical integration
vHΜL
of the fitted mean-dispersion relation vHΜL (by adding up along a asinh-spaced grid and a fitting a spline). Then, the scaling
parameters Η and Ξ (see above) are chosen such that the VST is equal to log2 for two large normalized count values (for
which the 95- and the 99.9-percentile of the sample-averaged normalized count values are used.)

Das könnte Ihnen auch gefallen