Beruflich Dokumente
Kultur Dokumente
CS 838
www.cs.wisc.edu/~craven/cs838.html
Mark Craven
craven@biostat.wisc.edu
April 2001
Announcements (4/9)
• talk today
Lise Getoor, Stanford University
Learning Statistical Models from Relational Data
4pm, 1221 Computer Sciences
• I will be out of town Wednesday afternoon through
Friday
– no office hours on Wednesday
– but I will be free most of Tuesday
• first project milestone (due 4/18)
– the algorithm(s) you will be investigating
– the data set(s) you will be using
– 1-3 hypotheses
1
Announcements (4/11)
• guest lecture on Monday
Prof. John Yin, Department of Chemical Engineering
Simulating the Growth of Viruses
read the paper from PSB that’s on the web page
• talk on Monday
Lila Kari, Univ. of Western Ontario
Biological Information Processing: From DNA to
Computation and Back
1:30pm
Center for Mathematical Sciences, B-level Seminar Room
• no office hours today
• first project milestone (due 4/18)
Regulatory Networks
• all cells in an organism have the same genomic
data, but the proteins synthesized in each vary
according to cell type, time, environmental factors
• there are networks of interactions among various
biochemical entities in a cell (DNA, RNA, protein,
small molecules).
• can we infer the networks of interactions among
genes?
2
Eukaryotic Expression Regulation
inactive mRNA
mRNA
primary degradation
RNA control
DNA transcript mRNA mRNA
transcriptional RNA RNA translation
control processing transport control
control control
inactive
protein protein
protein
activity
nucleus cytosol
control
Regulatory Networks
• there are lots of regulatory interactions that occur
after transcription, but we’ll focus on
transcriptional regulation:
– it plays a major role in the regulation of protein
synthesis
– we have good technology for measuring mRNA
levels
3
Transcriptional Regulation
Example: the lac Operon
Transcriptional Regulation
Example: the lac Operon
4
Transcriptional Regulation
Example: the lac Operon
5
Regulatory Network Models
• Boolean networks
[Kaufmann, 1993; Liang, Fuhrman & Somogyi, 1998]
• differential equations
[Chen, He & Church, 1999]
• weight matrices
[Weaver, Workman & Stormo, 1999]
ä Bayesian networks
[Friedman et al., 2000]
6
Bayesian Networks
• now consider the following Bayesian network for
the lac operon
L I
Z Y A
Bayesian Networks
• each node has a table representing conditional
distribution given parent variables
L Pr(L) L I
0 0.8
1 0.2
Z Y A
7
Bayesian Networks
• a Bayesian network provides a factored
representation of the joint probability distribution
L I
Z Y A
Pr( L, I , Z , Y , A) = Pr( L) × Pr( I ) × Pr( Z | L, I ) ×
Pr(Y | L, I ) × Pr( A | L, I )
• representing the joint distribution this way
requires 59 ( 2 + 3 + 18 × 3 ) parameters
8
Causaulity & Bayesian Networks
• more than one graph can represent the same set of
independences
• from observations alone, we cannot distinguish
causal relationships in general
• with interventions (e.g. gene knockouts) we can
9
Learning Bayesian Networks
• scoring function to evaluate a given network
score(G : D) = log Pr(G | D)
∝ log Pr( D | G ) + log Pr(G )
• search procedure
– operations: add, remove, reverse single arcs
– search methods: hill climbing etc.
10
Estimating Confidence in
Features: The Bootstrap Method
• for i = 1 to m
– sample (with replacement) N expression
experiments
– learn a Bayesian network from this sample
11
Confidence Levels of Features
• how can we tell if the confidence values for
features are meaningful?
• compare against confidence values for randomized
data – genes should then be independent and we
shouldn’t find “real” features
g 1,1 g 1, n
g 2 ,1 g 2,n
randomize each
g m ,1 g m ,n row independently
12
Biological Analysis
• using confidence in order relations, identified
dominant genes
– several of these are known to be involved in
cell-cycle control
– several have inviable null mutants
– many encode proteins involved in replication,
sporulation, budding
• assessing confident Markov relations
– most pairs are functionally related
13
Discussion
• extracts a richer structure from data than
clustering methods
– interactions among genes other than positive
correlation
– causal relationships (in some cases)
• compared to other approaches for extracting
genetic networks
– models have probabilistic (not deterministic)
semantics
– focus is on extracting “features” of networks,
not complete networks themselves
14