
Sloppy Models

Jim Wu and Connor Richards

December 11, 2017

1 Introduction

Hi everyone, today we will be introducing sloppy models: what they are, how they relate to other model dimensionality reduction techniques, and how they connect to the ideas of simple models we discussed earlier in class.
Upon observing the phenomena of life, we naively believe that biological mechanisms are complex processes that must be modeled by extremely complicated equations with numerous interacting terms. It seems as if such models would be too complicated and messy to extract any useful information or any enduring ideas that describe the phenomenon well. This notion is, however, unfounded. Although numerous mechanisms occur simultaneously and affect one another, even simple models let us obtain the most salient features of the system. Not every detail may be correct, but simple models provide the gateway and the first stepping stone for attacking the phenomenon of interest further.
Furthermore, from our discussion of the renormalization group, simple models are perhaps all we really need to model the system. To write down the equations of motion and dynamics for a system, we often write a Lagrangian or Hamiltonian that includes kinetic terms for the particles, the potential energy each particle experiences, and the interactions between multiple agents. It would be impossible to account for all the higher order interactions, so we need to truncate the Lagrangian. Even without infinitely many terms, we would require an inordinate number of parameters to fully characterize our system. This forest of parameters makes the model opaque, difficult to understand, and hard to extract useful information from. Is it possible to trim the hedges, removing some of these parameters so that the salient features are still captured without being pedagogically opaque? Renormalization group flow seems to be the answer, as we have discussed. Through the process of coarse graining, the higher order interactions become irrelevant; the fine details of these interactions are washed out when we look at macroscopic behaviors. In the end, only a few parameters are necessary to describe complex systems.
However, it is difficult to apply renormalization group theory to biological data because such data exhibit heterogeneity. Furthermore, it is not obvious whether biological data exhibit any form of scale invariance. This is where techniques like Principal Component Analysis and sloppy models come in. They provide a way to take complex systems and reduce the model dimensionality to something tractable and easy to comprehend.

2 Principal Component Analysis

In Principal Component Analysis (PCA), we start off with an n × p data matrix X, where
n is the number of repeated experiments done and p denotes the number of parameters or
features we measured. The data matrix is shifted so that the mean of each column is at zero.
We perform a singular value decomposition of the data matrix:

X = UΣW^T

Here Σ is a rectangular diagonal matrix that contains the singular values of X, analogous to
eigenvalues. U is an n × n matrix with orthonormal column vectors that are the left singular
vectors of X, and W is a p × p matrix with orthonormal column vectors that are the right singular vectors. Notice that (1/n) X^T X is the covariance matrix, and thus

(1/n) X^T X = (1/n) W Σ² W^T    (1)

Hence, the column vectors of W are the eigenvectors of the covariance matrix, with corresponding eigenvalues given by the squared singular values of Σ (scaled by 1/n).
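As a quick sanity check of equation (1), the following numpy sketch (synthetic data; the matrix sizes and random seed are arbitrary choices of ours) verifies that the right singular vectors of the centered data matrix are eigenvectors of the covariance matrix, with eigenvalues given by the squared singular values over n.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))   # synthetic data with correlated columns
X = X - X.mean(axis=0)                                   # center each column

U, s, Wt = np.linalg.svd(X, full_matrices=False)         # X = U Sigma W^T
cov = X.T @ X / n                                        # covariance with the 1/n convention used above
eigvals = s ** 2 / n                                     # squared singular values over n

for i in range(p):
    w = Wt[i]                                            # i-th right singular vector (column of W)
    assert np.allclose(cov @ w, eigvals[i] * w)          # eigenvector relation of equation (1)
print("covariance eigenvalues (descending):", np.round(eigvals, 3))
```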
This effectively rotates our data (or covariance) and fits the data with a hyperellipsoid. The eigenvalues are ranked so that the major axis (the largest eigenvalue) captures most of the variance in the data, the second largest mode captures most of the remaining variance, and so on. In the end, we can truncate the system after a couple of eigenmodes (ranked from largest to smallest) and conclude that their corresponding eigenvectors effectively capture most of the variation in the system. In addition, these principal vectors have the nice property that they are orthogonal to one another, so the corresponding components are uncorrelated.
The hope of PCA is that the variance along a handful of principal components provides a reasonable characterization of the complete data set. This can greatly reduce the model dimensionality by replacing the large number of parameters with a handful of linear combinations that best capture the variation in the data. Of course, including more eigenvectors would increase accuracy, but at the cost of model complexity.
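The truncation step can be illustrated with a minimal numpy sketch. The data below are synthetic and secretly generated from two latent factors, so keeping k = 2 components should capture nearly all of the variance; the sizes and noise level are illustrative assumptions, not values from any data set discussed in class.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 300, 10, 2                                  # keep k principal components
latent = rng.normal(size=(n, k))                      # data secretly driven by 2 factors
X = latent @ rng.normal(size=(k, p)) + 0.05 * rng.normal(size=(n, p))
X = X - X.mean(axis=0)                                # center each column

U, s, Wt = np.linalg.svd(X, full_matrices=False)
explained = (s[:k] ** 2).sum() / (s ** 2).sum()       # fraction of total variance retained
scores = X @ Wt[:k].T                                 # n x k coordinates along the top components
print(f"top {k} components capture {explained:.1%} of the variance")
```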
The problem with PCA is the arbitrariness of the cutoff and its sensitivity to scaling. The first problem will always be present in data analysis, since one needs to use one's own judgment in deciding how much of the variance to capture. The second problem arises because PCA relies on the eigenvalues of the covariance matrix rather than on quantities that are invariant under reparameterization. Say one of the variables is a time constant expressed in seconds. If we rescale it to years, the eigenvalues change, and the eigenvalue corresponding to the rescaled time may become so small that it looks negligible compared to the other eigenvalues. We might then wrongly conclude that this eigenvalue and its corresponding principal component can be ignored, which is very problematic. So, to perform PCA, it is important that all variables have the same units. Lastly, PCA assumes linearity, when in fact a nonlinear combination of the variables may be a better descriptor of the data.
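The scaling problem can be made concrete with a small numerical illustration. The time constant, its spread, and the seconds-to-years conversion below are made-up values chosen only to show how the ranking of the eigenvalues flips under a change of units.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
tau_seconds = rng.normal(3600.0, 600.0, size=n)       # a hypothetical time constant, in seconds
other = rng.normal(0.0, 1.0, size=n)                  # an unrelated feature of order one

for label, tau in [("seconds", tau_seconds), ("years", tau_seconds / 3.154e7)]:
    X = np.column_stack([tau, other])
    X = X - X.mean(axis=0)                            # center each column
    s = np.linalg.svd(X, compute_uv=False)
    print(label.ljust(8), "covariance eigenvalues:", np.round(s ** 2 / n, 6))
```

In seconds, the time-constant direction dominates the spectrum; in years, the same direction looks negligible and would be discarded by a naive cutoff.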
So, although PCA is a very powerful technique for reducing the dimensionality of a model, it is not without flaws. We need a new approach for analyzing data sets and extracting lower-complexity models that still fit the data well and give a good description of the system. This is where sloppy models come in.

3 MBAM and Evaporated Parameters

The Manifold Boundary Approximation Method (MBAM) follows a four-step algorithm that relies on the model manifold's hyperribbon structure; a minimal numerical sketch follows the list below.

1. Find the (locally) least important parameter combination from the eigenvalues of the Fisher Information Matrix (FIM).

2. Follow a geodesic path oriented along this direction until the manifold boundary is
discovered.

3. Identify the limiting approximation corresponding to this boundary and explicitly eval-
uate it in the model.

4. Calibrate the new model by fitting its behavior to the behavior of the original model.
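To make these steps concrete, here is a minimal sketch on a toy sum-of-two-exponentials model. The model, the measurement times, the finite-difference geodesic, and the integration time are all illustrative assumptions of ours; this is not the implementation used in the MBAM papers.

```python
import numpy as np
from scipy.integrate import solve_ivp

t = np.linspace(0.1, 3.0, 20)                        # measurement times (toy choice)

def y(theta):
    """Toy model y(t) = exp(-k1 t) + exp(-k2 t); parameters are log-rates."""
    k1, k2 = np.exp(theta)                           # log-parameterization keeps rates positive
    return np.exp(-k1 * t) + np.exp(-k2 * t)

def jacobian(theta, h=1e-6):
    """Finite-difference Jacobian J[m, i] = d y_m / d theta_i."""
    J = np.empty((t.size, theta.size))
    for i in range(theta.size):
        d = np.zeros_like(theta); d[i] = h
        J[:, i] = (y(theta + d) - y(theta - d)) / (2 * h)
    return J

# Step 1: the sloppiest direction is the FIM eigenvector with the smallest eigenvalue.
theta0 = np.log([1.0, 1.2])                          # two nearly degenerate rates (sloppy regime)
J0 = jacobian(theta0)
evals, evecs = np.linalg.eigh(J0.T @ J0)             # eigh sorts eigenvalues in ascending order
v0 = evecs[:, 0]                                     # sign choice picks which boundary we head toward

# Step 2: integrate the geodesic theta'' = -(J^T J)^{-1} J^T (v . grad)^2 y along that direction.
def geodesic_rhs(tau, state):
    theta, v = state[:2], state[2:]
    J = jacobian(theta)
    h = 1e-3                                         # directional second derivative of the model
    A = (y(theta + h * v) - 2 * y(theta) + y(theta - h * v)) / h**2
    accel = -np.linalg.lstsq(J.T @ J, J.T @ A, rcond=None)[0]
    return np.concatenate([v, accel])

sol = solve_ivp(geodesic_rhs, [0.0, 25.0], np.concatenate([theta0, v0]), max_step=0.5)

# Steps 3-4 are done by inspection here: a log-rate running off toward +/- infinity signals the
# limiting approximation (that exponential "evaporates"); the reduced model is then refit.
print("log-rates at the end of the geodesic:", sol.y[:2, -1])
```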

Effectively, you remove parameter combinations one at a time until the model is sufficiently simple. What constitutes sufficiently simple depends on context, but the resulting model is a hypercorner of the original model manifold.
By repeatedly evaluating limiting approximations in the model, the irrelevant parameters are removed and the remaining parameters are grouped naturally into the physically important combinations (usually nonlinear combinations of the bare parameters). This naturally connects the microscopic physics with the emergent, macroscopic physics in a semi-automatic way.

3.1 Example: Application to EGFR Chemical Network

An example application of the sloppy models technique is the EGFR chemical network, which is important for signaling in developing rat cells. The original model had 48 bare parameters and a system of 29 differential equations. Since there were so many components interacting in nonlinear ways, it was very difficult to discern which parameter combinations are important, and using PCA for model reduction would not yield a simple, interpretable model.
However, using MBAM and the ideas of sloppy models, this group was able to reduce the complex model to a simpler effective model with only 12 parameters and 6 differential equations. (Draw the networks for the original model and the new model on the board.)
In the original network, most of the lines were superfluous and have been removed in the new model. What remains is the dominant negative feedback loop (from Erk to P90/RSK to Ras) that is the mechanism responsible for the system's behavior. Qualitatively, this negative feedback loop is how biologists understood the behavior of the system. MBAM naturally recovered this part of the network and produced a quantitative model describing the process.
