
Basic Principles in Bioinformatics:

Understanding Microarrays

Pierre Farmer/Pascale Anderle


Swiss Institute for Bioinformatics/ISREC
Aim of This Course

Rapid overview of microarray technologies

Introduction to different bioinformatic solutions related to microarrays
Overview of the Course

Part I
Introduction into the microarray technology
Illustration of typical biological questions related to microarray studies
Short presentation of methods being used for the analysis of microarray data

Part II (TP)
Discussion of biological questions and how they could be answered applying
microarray data mining

Part III
Functional classification
Biological Problem

What is the difference between a tumor and healthy tissue?

Are there different types of tumors?


Biological Fundamentals

Transcriptome: Genes Microarrays

Proteome: Proteins
Genomics Fundamentals

Exon 1 Intron 1 Exon 2 Intron 2 Exon 3 Intron 3


Genomic DNA: ATGC
Exon 1 Intron 1 Exon 2 Intron 2 Exon 3 Intron 3

Transcription

Exon 1 Exon 2 Exon 3 Messenger RNA: AUGC

Reverse Transcription: RT-PCR

Exon 1 Exon 2 Exon 3


cDNA: ATGC
Exon 1 Exon 2 Exon 3

PCR

Exon 2 Exon 3
cDNA/PCR product: ATGC

Exon 2 Exon 3
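
The flow shown on this slide (genomic DNA, then mRNA, then cDNA) can be mimicked on plain strings. This is a minimal sketch: the segment sequences and the function names are invented for illustration, and strand complementarity is ignored, as it is on the slide.

```python
# Toy model of the slide: genomic DNA -> mRNA (splicing + transcription) -> cDNA.
# The segment sequences below are made up for illustration.

def transcribe_and_splice(segments):
    """Join the exons (introns are dropped) and write the result as RNA (T -> U)."""
    exons = [seq for kind, seq in segments if kind == "exon"]
    return "".join(exons).replace("T", "U")

def reverse_transcribe(mrna):
    """Write the mRNA sequence back in the DNA alphabet (U -> T), as in the cDNA row."""
    return mrna.replace("U", "T")

gene = [
    ("exon", "ATGGCC"), ("intron", "GTAAGT"),
    ("exon", "TTCGAA"), ("intron", "GTACAG"),
    ("exon", "TGGTAA"),
]
mrna = transcribe_and_splice(gene)
cdna = reverse_transcribe(mrna)
print(mrna)
print(cdna)
```

The cDNA contains only exon sequence, which is why probes are designed against transcripts rather than against genomic DNA.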
Introduction into Microarray Technology : Sample

Tumor Tissue
Normal Tissue

Protein

CCAGGCAAUAAAAAA

CCAGGCAAUAAAAAA mRNA A U G AGUAAUAAAAAA

A U G AGUAAUAAAAAA

CCAGGCAATAAAAAA

CCAGGCAATAAAAAA
A T G AGTAATAAAAAA
A T G AGTAATAAAAAA

Signal N < Signal T


Signal N << Signal T
Introduction into Microarray Technology

Normal Tumor

Gene A Gene B Gene C


Introduction into Microarray Technology
Spotting:
Probes

Physical support:
Photolithography Glass slide,
nylon membrane
Printing

Oligomers PCR products

Sample preparation and hybridization:


cRNA or cDNA
Single-labeling or dual-labeling
Affymetrix: Fluorescence or radioactivity cDNA chip:
Short oligo chip Oligos or PCR products
Single labeling Dual-labeling

or
Microarray: Definition

Microarray analysis is a technology that makes it possible to detect the expression of thousands of genes simultaneously in a small sample.

Microarrays are simply ordered sets of DNA molecules of known sequence fixed on a physical support.
Different Microarray Platforms
Definition of biological questions

Experimental design

Custom array Commercial array


PCR products Short oligos: Affymetrix
Oligomers Long oligos: Agilent

Chip preparation
Probe design
Probe preparation
Printing

Sample preparation
cRNA/cDNA Labeling

Hybridization

Scanning

Data Acquisition and Data Analysis


Making the Chip: Probe Design

Choosing genes of interest for the experiment

Probe selection strategy should ensure:


everything! Or.
Biologically meaningful results (The truth...)
Coverage, sensitivity (... The whole truth...)
Specificity (... And nothing but the truth)
Annotation
Making the Chip: Probe Design
Coding region (ORF)
Annotation relatively safe
No problems with alternative polyA sites
No repetitive elements or other funny sequences
Danger of close isoforms
Danger of alternative splicing

3' untranslated region
Annotation less safe
Danger of alternative polyA sites
Danger of repetitive elements
Less likely to cross-hybridize with isoforms
Little danger of alternative splicing
5' untranslated region
Close linkage to promoter
Frequently not available

5utr EXON A EXON B 3utr

INTRON
Probe Design for Custom Array

Run hmmsearch Filter genes


Keywords, Search Pfam HMM Putative
against GenPept (human only, set cut off,
seed sequences HMM db Models new genes
db eliminate red. genes)

Transporters: 670
Channels: 263
Transporters: 316
Channels: 151
Contigs: 156
Positive Controls: 9 Run Pick70
Run Multiple alignment and
Negative Controls: 3 Tm = 70, Palindrome
Pick70 selection of repr. genes
Controls (diff. Oligos): 9 Uniqueness = 15 bp
RGS: 75
FGF/RGF-like: 7 236 Contigs and singlets
ADAM family: 18

Assemble contigs

Remove vector and


characterized ESTs

Protein seed Converged Core Protein Blast human EST nucleotide


sequence PSI-Blast Family EST db sequence

Brown et al. AAPS PharmSci. 2003 Anderle et al. Pharm Res. 2003
THE EXPERIMENT : Printing I
Microspotting is done by a robot called an arrayer
THE EXPERIMENT : Printing II

Microspotting
THE EXPERIMENT : Printing III
Oligo-spotting (Photolithography)
Summary

?
Microarray Analysis: Data Analysis

Definition of biological questions

Experimental design

Low level analysis:
Scanning and processing images
Calculation of expression values per probe set
Normalization across chips

High level analysis:
Statistical analysis of expression values
Clustering of expression values
Annotation of probe sets
Functional classification

Biological interpretation of data


Data Analysis: Processing of Images II

Addressing or gridding
Assigning coordinates to each of the spots
Segmentation
Classification of pixels either as foreground or as background
Intensity extraction (for each spot)
Foreground fluorescence intensity pairs (R, G)
Background intensities
Quality measures
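
The segmentation and intensity-extraction steps can be sketched for a single spot. This is only an illustration: the pixel values are invented, the simple threshold used to build the foreground mask is an assumption (real image-analysis packages use more elaborate segmentation), and `extract_spot` is a hypothetical helper name.

```python
from statistics import median

def extract_spot(pixels, foreground_mask):
    """Return (foreground, background) median intensities for one spot.

    pixels          -- 2-D list of pixel intensities around the spot
    foreground_mask -- 2-D list of booleans produced by the segmentation step
    """
    fg = [p for row, mrow in zip(pixels, foreground_mask)
          for p, is_fg in zip(row, mrow) if is_fg]
    bg = [p for row, mrow in zip(pixels, foreground_mask)
          for p, is_fg in zip(row, mrow) if not is_fg]
    return median(fg), median(bg)

pixels = [
    [10, 12, 11, 10],
    [11, 90, 95, 12],
    [10, 92, 97, 11],
    [12, 10, 11, 10],
]
mask = [[p > 40 for p in row] for row in pixels]   # toy threshold segmentation
fg, bg = extract_spot(pixels, mask)
print(fg, bg, fg - bg)   # foreground, background, background-corrected signal
```

Medians are used here because they are less sensitive to a few saturated or dusty pixels than means.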
Fluorescence Signal to Expression Level

GTTAAGCGTTCCGATGCTACTTACC PM
GTTAAGCGTTCCCATGCTACTTACC MM
Probes

Probes

mRNA reference sequence


= representative sequence

Consensus sequence
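
The PM/MM pair shown above differs only at the central base of the 25-mer. A small sketch, assuming the mismatch is made by complementing the middle (13th) base; `mismatch_probe` is a hypothetical helper name.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def mismatch_probe(pm):
    """Return the MM probe: the PM sequence with its middle base complemented."""
    mid = len(pm) // 2          # index 12 for a 25-mer, i.e. the 13th base
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]

pm = "GTTAAGCGTTCCGATGCTACTTACC"
mm = mismatch_probe(pm)
print(pm)
print(mm)
```

Applied to the PM sequence from the slide, this reproduces the MM sequence printed next to it.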
Fluorescence Signal to Expression Level I

Example: Affymetrix

For ~30% of probe pairs, MM signal > PM signal


Probes of a given set are mapped to different UniGene clusters
The same probe is mapped to different UniGene clusters
Ca. 340 MM probes are mapped to UniGene clusters
Computing Expression Values
Microarray Analysis Suite (MAS 5.0):

$\mathrm{signal} = \mathrm{TukeyBiweight}\{\log(PM_j - MM^*_j)\}$
with $MM^*$, a version of MM that is never bigger than PM. Tukey biweight is a type of robust estimator...
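
To make the idea of a robust estimator concrete, here is a hedged sketch of a one-step Tukey biweight and a MAS 5-like signal. The tuning constants `c=5` and `eps=1e-4` are commonly quoted values and should be treated as assumptions, and the rule used for MM* is a deliberate simplification of the real ideal-mismatch calculation.

```python
import math
from statistics import median

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight: a weighted mean that down-weights outliers."""
    m = median(values)
    mad = median(abs(v - m) for v in values)   # median absolute deviation
    num = den = 0.0
    for v in values:
        u = (v - m) / (c * mad + eps)
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0   # zero weight far from the median
        num += w * v
        den += w
    return num / den

def mas5_like_signal(pm, mm, delta=2 ** -20):
    """Simplified signal: TukeyBiweight{log2(PM_j - MM*_j)}.

    Assumption: MM*_j = MM_j when MM_j < PM_j, otherwise PM_j - delta,
    so that the difference stays positive.
    """
    logs = [math.log2(p - (m if m < p else p - delta)) for p, m in zip(pm, mm)]
    return tukey_biweight(logs)

print(tukey_biweight([1, 1, 1, 1, 100]))        # the outlier 100 gets zero weight
print(mas5_like_signal([12, 12, 12], [4, 4, 4]))
```

An ordinary mean of the first example would be 20.8; the biweight stays at the bulk of the data.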

Li and Wong model:

$PM_{ij} - MM_{ij} = \theta_i \phi_j + \varepsilon_{ij}, \quad \varepsilon_{ij} \sim N(0, \sigma^2)$
$\theta_i$ is gene expression in chip $i$, $\phi_j$ is the rate of increase of PM response over MM (probe-specific effect)

Robust multi-chip analysis (RMA)

$\log(PM_{ij} - BG) = a_i + b_j + \varepsilon_{ij}$
Uses only PM and ignores MM; assumes an additive model (on the log scale); estimates chip effects $a_i$ and probe effects $b_j$ using a robust method (median polish)
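
The median polish step of RMA can be sketched on a small matrix of log-scale PM values. This is an illustration only: background correction and normalization across chips are left out, the data are invented, and `median_polish` is a hypothetical helper rather than the code of any RMA implementation.

```python
from statistics import median

def median_polish(y, n_iter=10):
    """Fit y[i][j] ~ a[i] + b[j] by alternately removing row and column medians.

    y is a list of rows; row i = chip i, column j = probe j (log scale).
    """
    resid = [row[:] for row in y]
    a = [0.0] * len(y)          # chip effects
    b = [0.0] * len(y[0])       # probe effects
    for _ in range(n_iter):
        for i, row in enumerate(resid):
            m = median(row)
            a[i] += m
            resid[i] = [v - m for v in row]
        for j in range(len(b)):
            m = median(row[j] for row in resid)
            b[j] += m
            for row in resid:
                row[j] -= m
    return a, b, resid

# Exactly additive toy data: chip effects 7, 8, 9 and probe effects -1, 0, 1.
y = [[6.0, 7.0, 8.0],
     [7.0, 8.0, 9.0],
     [8.0, 9.0, 10.0]]
a, b, resid = median_polish(y)
print(a, b)
```

Because medians are used instead of means, a single aberrant probe on one chip ends up in the residuals instead of distorting the chip effect.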
MAS 5 vs. RMA: A Values
MAS 5 vs. RMA: M vs. A Plot

RMA MAS 5
Data Analysis: Transformation (Coding)
Log2 transformation
No transformation

Effect of different schemes of data transformation on the total distribution of expression values. Data: Alon et al. PNAS 1999

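Ratios (un-transformed vs. log2-transformed):
2^x = y; log2(y) = x
2^2 = 4; log2(4) = 2
Ratio 2 -> log2 ratio 1; ratio 1 -> log2 ratio 0; ratio 0.5 -> log2 ratio -1
Un-transformed, a 2-fold increase lies at distance 1 from a ratio of 1 and a 2-fold decrease at distance 0.5; after log2 transformation both lie at distance 1 from 0.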
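
A few lines of arithmetic show why log2-transformed ratios are preferred: up- and down-regulation of the same fold become symmetric around zero. The ratio values below are illustrative.

```python
import math

ratios = [4, 2, 1, 0.5, 0.25]                 # un-transformed expression ratios
log2_ratios = [math.log2(r) for r in ratios]
for r, lr in zip(ratios, log2_ratios):
    print(r, lr)

up, down = 2, 0.5                             # 2-fold up and 2-fold down
raw_distances = (abs(up - 1), abs(down - 1))  # distance from "no change" (ratio 1)
log_distances = (abs(math.log2(up)), abs(math.log2(down)))
print(raw_distances, log_distances)
```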
Data Analysis: Normalization I

Normalization attempts to separate systematic sources of variation ("artefacts"), which bias the results, from random sources of variation ("noise"), which hide the truth.

Typical Statistical Approach:

Measured value = real value + systematic errors + noise

Corrected value = real value + noise

Analysis of corrected value => (unbiased) CONCLUSIONS

Examples of systematic errors:


Fluorochrome incorporation
Spatial bias
Data Analysis: Normalization II

Self-self hybridization: Non-normalized data No Self-self hybridization: Non-normalized data

Scatter (MVA-)plots
Normalization: global

Normalization based on a global adjustment

$\log_2 R/G \rightarrow \log_2 R/G - c = \log_2 R/(kG)$

Common choices for $k$ or $c = \log_2 k$ are $c$ = median or mean of log ratios for a particular gene set (e.g. all genes, or control or housekeeping genes)

Another possibility is total intensity normalization, where $k = \sum R_i / \sum G_i$
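
Both global adjustments can be written in a few lines. A sketch with invented intensities in which the red channel reads about two-fold too high; `median_center` and `total_intensity_k` are hypothetical helper names, and the gene set used for the median is simply all spots.

```python
import math
from statistics import median

def median_center(red, green):
    """Return normalized log2 ratios M - c, with c = median of all log2(R/G)."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    c = median(m)
    return [v - c for v in m], c

def total_intensity_k(red, green):
    """Scaling factor k = sum(R_i) / sum(G_i) for total intensity normalization."""
    return sum(red) / sum(green)

red = [200.0, 400.0, 800.0, 1600.0, 100.0]    # toy intensities
green = [100.0, 200.0, 400.0, 200.0, 100.0]
normalized, c = median_center(red, green)
print(c, normalized)
print(total_intensity_k(red, green))
```

In this example the median-based constant is c = 1 (k = 2), whereas the total-intensity factor is pulled upward by the one strongly expressed spot; this is one reason the median is a common choice.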


Median centering Normalization
Ratio

0 2 Log2 Ratio
Data Analysis: Normalization III
Methods:
Median center: MEDIAN log2( CY3/CY5) = 0

[Figure: scatter plots of CY5 vs. CY3 intensities illustrating a linear transformation]
Why is this not satisfactory? There is more noise among low-expressed genes.
Data Analysis: Use of M vs A Plot

1. Logs stretch out region we are most interested in.


2. Can more clearly see features of the data such as intensity dependent variation,
and dye-bias.
3. Differentially expressed genes more easily identified.
4. Intuitive interpretation
M = log(R/G) = log R - log G
A = (log R + log G) / 2
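
The two coordinates of the M vs. A plot are easy to compute per spot. A minimal sketch, assuming base-2 logarithms (the slide does not state the base) and a hypothetical helper name `ma_values`.

```python
import math

def ma_values(r, g):
    """Return (M, A) for one spot from red and green intensities."""
    m = math.log2(r) - math.log2(g)          # log ratio
    a = (math.log2(r) + math.log2(g)) / 2    # average log intensity
    return m, a

print(ma_values(8, 2))      # 4-fold more red than green
print(ma_values(2, 8))      # same spot with the dyes swapped: M changes sign, A does not
```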
Data Analysis: Normalization IV
[Figure: M vs. A plots, including a magnified region, illustrating loess correction]
Data Analysis: Normalization IV

[Figure: M vs. A plots at sub-array and array level, illustrating regional variation (spatial bias)]
Data Analysis: Normalization V

Use of spikes
Before normalization After normalization
Data Analysis: Low Level Analysis

Summary:
Chip has been built!
Signals have been measured!
Systematic errors have been removed!
Data Analysis: Limitations
Problems in data analysis
Limitations of traditional biological interpretations:

Complexity (10 000 genes)

How to distinguish a true positive result from a false positive?

Methods:

1. Supervised learning: k-Nearest neighbor, LDA

2. Non-supervised learning: Clustering


Data Analysis: Clustering
Objectives

Gene discovery / Class identification (unsupervised learning)
Looking for characterization of the components of the data set, without a priori input on cases or genes
Labels are not used
Methods: hierarchical clustering/dendrograms, K-means clustering, self organizing maps (SOM)

Sample / Gene classification (supervised learning)
Finding genes, combinations of genes or samples that match a particular a priori pattern
Labels are used
Methods: LDA, k-NN classification
Supervised vs. Unsupervised Learning: Examples

1. Example

Identification of genes that are responsible for the fact that some patients respond differently to a
certain type of chemotherapy

2. Example

Identification of genes or group of genes that explain the difference between tumor tissue and
non-tumor tissue based on the expression profile of ~100 samples (60 tumor tissue/ 40
healthy tissue)

3. Example

Identify a group of genes that are co-regulated upon a given treatment


Unsupervised Learning Problems
Unsupervised Methods

Circularity of spots This is clustering!

Length of neck

Similar objects are grouped together


How do we measure similarity
Agglomerative Hierarchical Clustering I

Before doing such clustering, one has to define two things:

1- The similarity measure between two genes (or experiments)

Correlation: Distance = 1 - R
Euclidean: Distance = sqrt((x1-x2)^2 + (y1-y2)^2)

Sample 2 Sample 3

Sample 1 Sample 1

2- The distance measure between the new cluster and the others

Single Linkage: Distance between closest pair.


Complete Linkage: Distance between farthest pair.
Average Linkage: Distance between cluster centers
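
The two choices described on this slide, a similarity measure between profiles and a linkage rule between clusters, can be sketched as follows. The function names are illustrative. Note one assumption: "average" is implemented here as the mean of all pairwise distances, which is a common definition; some tools use the distance between cluster centers instead, as the slide states.

```python
import math
from itertools import product

def euclidean(p, q):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def correlation_distance(p, q):
    """1 - R, with R the Pearson correlation of the two profiles."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    cov = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    sp = math.sqrt(sum((a - mp) ** 2 for a in p))
    sq = math.sqrt(sum((b - mq) ** 2 for b in q))
    return 1 - cov / (sp * sq)

def linkage(cluster_a, cluster_b, dist=euclidean, how="single"):
    """Distance between two clusters (lists of profiles)."""
    d = [dist(p, q) for p, q in product(cluster_a, cluster_b)]
    if how == "single":
        return min(d)            # closest pair
    if how == "complete":
        return max(d)            # farthest pair
    return sum(d) / len(d)       # "average": mean over all pairs

print(euclidean((0, 0), (3, 4)))
print(correlation_distance([1, 2, 3], [2, 4, 6]))   # same shape, different scale
```

Two profiles with the same shape but different magnitude are far apart in Euclidean terms yet have a correlation distance of (numerically) zero, which is why the choice of measure matters.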
Agglomerative Hierarchical Clustering II

Distance between joined clusters

4 2

Gene 1 3

1 3 2 4 5 Gene 2
Dendrogram: The dendrogram induces a linear ordering of the data points
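
The agglomerative procedure behind a dendrogram can be sketched in a few lines. This is a naive single-linkage version written for readability, not efficiency; the five 2-D points are invented, and `agglomerate` is a hypothetical helper name.

```python
import math

def agglomerate(points):
    """Single-linkage agglomerative clustering; returns the merges in order.

    Each merge is (sorted member indices of the new cluster, merge distance).
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
        merges.append((merged, d))
    return merges

points = [(0, 0), (0, 1), (5, 5), (5, 7), (20, 0)]
merges = agglomerate(points)
for members, height in merges:
    print(members, round(height, 2))
```

The merge distances are the heights at which branches join in the dendrogram; reading the leaves from left to right gives the linear ordering mentioned above.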
Clustering: Defining Clusters
Unsupervised Clustering: Example

Sorlie et al. Proc Natl Acad Sci U S A 2001 Sep 11;98(19):10869-74


Supervised Methods: Learning Problems

Which criteria should we use?


Supervised Methods: Examples
k-Nearest Neighbor (knn)
Data Matrix Gene 2
Gene ?
Sample

Gene 1

PCA
LDA
Gene 2
Gene 2

Gene 1 Gene 1
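
As an illustration of the k-nearest-neighbor idea, the sketch below classifies a new sample from the expression of two genes. The expression values and labels are invented, and `knn_predict` is a hypothetical helper name.

```python
import math
from collections import Counter

def knn_predict(train, labels, sample, k=3):
    """Label `sample` by majority vote among its k nearest training samples."""
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], sample))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy expression values for (Gene 1, Gene 2); labels are illustrative.
train = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
labels = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]
print(knn_predict(train, labels, (2.9, 3.1)))
print(knn_predict(train, labels, (1.0, 1.0)))
```

An odd k avoids ties in a two-class problem.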


Supervised Learning: Problems
Supervised Methods: Cross-Validation
Genes

Labels
Tissues
Microarray Data
Data Matrix

Training Set Test Set Labels


Training Set Labels Test Set

Evaluation

Subset Subset
Training Test
Predicted
Labels
LDA Predictor
Supervised Methods: Experimental Design

Subset 1 | Subset 2 | Subset 3 | Subset 4
For each subset: trained model, then cross-validation (learning set vs. test set)

Characteristics:
Test set: 15 tissues
Training set: 45 tissues
Always the same proportions of normal / cancer tissues
All data once (and only once) in the test set

The 4 subsets are used for cross-validation (data set from Alon et al. 1999).
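
The design above (four subsets, each used once as test set, with constant class proportions) corresponds to a stratified 4-fold split. A sketch under stated assumptions: the 40/20 split of the 60 labels is chosen only for illustration, and `stratified_folds` is a hypothetical helper name.

```python
def stratified_folds(labels, n_folds=4):
    """Split sample indices into folds that keep the class proportions."""
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == cls]
        for pos, idx in enumerate(members):
            folds[pos % n_folds].append(idx)     # deal out round-robin
    return folds

# Hypothetical labels: 60 tissues; the 40/20 split is for illustration only.
labels = ["tumor"] * 40 + ["normal"] * 20
folds = stratified_folds(labels)
for test_idx in folds:
    train_idx = [i for i in range(len(labels)) if i not in test_idx]
    print(len(train_idx), len(test_idx))         # 45 training, 15 test tissues
```

Every tissue appears in exactly one test set, so the four test results together cover the whole data set once.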
Supervised Methods: Student's t-Test and LDA I

Group A Group B

t - Statistics
For all genes -> compute the t-statistic

LDA is run first with the most differentially expressed gene, then with the two most differentially expressed genes, and so on (cumulative)
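
The gene-selection step can be sketched as follows: compute a t statistic per gene, rank genes by its absolute value, and build cumulative gene sets to feed to the classifier. The sketch assumes the pooled-variance form of Student's t; gene names and values are placeholders, and the LDA step itself is not shown.

```python
import math
from statistics import mean, variance

def t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

def rank_genes(group_a, group_b):
    """Gene names ordered from most to least differentially expressed (by |t|)."""
    return sorted(group_a, key=lambda g: abs(t_statistic(group_a[g], group_b[g])),
                  reverse=True)

# Toy data: gene -> expression values per group (names are placeholders).
group_a = {"gene1": [1.0, 2.0, 3.0], "gene2": [5.0, 5.5, 4.5], "gene3": [2.0, 2.1, 1.9]}
group_b = {"gene1": [4.0, 5.0, 6.0], "gene2": [5.2, 4.8, 5.0], "gene3": [4.0, 4.1, 3.9]}
ranked = rank_genes(group_a, group_b)
print(ranked)
# Cumulative gene sets for the classifier: top 1, top 2, ...
print([ranked[:n] for n in range(1, len(ranked) + 1)])
```

To avoid an optimistic bias, the ranking should be computed on the training set only, inside each cross-validation round.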
Supervised Methods: Student's t-Test and LDA II
Effect of the Number of Genes Selected with a Student's t-Test
on the LDA Performance.

[Figure: Percent of correct predictions (y-axis, 0 to 120) vs. number of genes, cumulative (x-axis, 0 to 40), for the training set and the test set; one point is labeled (12, 89).]
Summary: Part I

Microarray analysis allows simultaneous detection of the expression of thousands of genes in a small sample.

A microarray experiment includes:


- Experimental design
- Making of the chip
- Preparation of samples, hybridization, detection of fluorescence signals
- Low level analysis:
- Transformation of fluorescence signal measurements into expression level values
- Normalization
- High level analysis
- Clustering, statistical analysis, functional classification
Part II: Practical Course
1. Exercise
In which steps of a typical microarray experiment could optimized computational methods contribute to an improvement?

2. Exercise
What features would you include in a probe design program?

3. Exercise
Which methods do you think the authors applied to answer their questions described in the
abstracts?

4. Exercise
What are the principal objectives of a supervised or unsupervised learning method, respectively?

5. Exercise
What do you think are the major limitations of microarrays?

6. Exercise
When would you rather use RMA or MAS5, respectively?

7. Exercise
Why is normalization crucial for the analysis of microarray data?

8. Exercise
How can you relate microarray data and phenotypes?
Part II: Abstract A

Novel genes and functional relationships in the adult mouse gastrointestinal tract identified by microarray analysis.

Bates MD, Erwin CR, Sanford LP, Wiginton D, Bezerra JA, Schatzman LC, Jegga AG, Ley-Ebert C, Williams SS,
Steinbrecher KA, Warner BW, Cohen MB, Aronow BJ.

Division of Gastroenterology, Children's Hospital Medical Center, University of Cincinnati College of Medicine, Cincinnati,
Ohio 45229, USA. michael.bates@chmcc.org

BACKGROUND & AIMS: A genome-level understanding of the molecular basis of segmental gene expression along the
anterior-posterior (A-P) axis of the mammalian gastrointestinal (GI) tract is lacking. We hypothesized that functional patterning
along the A-P axis of the GI tract could be defined at the molecular level by analyzing expression profiles of large numbers of
genes. METHODS: Incyte GEM1 microarrays containing 8638 complementary DNAs (cDNAs) were used to define expression
profiles in adult mouse stomach, duodenum, jejunum, ileum, cecum, proximal colon, and distal colon. Highly expressed
cDNAs were classified based on segmental expression patterns and protein function. RESULTS: 571 cDNAs were expressed
2-fold higher than reference in at least 1 GI tissue. Most of these genes displayed sharp segmental expression boundaries, the
majority of which were at anatomically defined locations. Boundaries were particularly striking for genes encoding proteins that
function in intermediary metabolism, transport, and cell-cell communication. Genes with distinctive expression profiles were
compared with mouse and human genomic sequence for promoter analysis and gene discovery. CONCLUSIONS: The
anatomically defined organs of the GI tract (stomach, small intestine, colon) can be distinguished based on a genome-level
analysis of gene expression profiles. However, distinctions between various regions of the small intestine and colon are much
less striking. We have identified novel genes not previously known to be expressed in the adult GI tract. Identification of genes
coordinately regulated along the A-P axis provides a basis for new insights and gene discovery relevant to GI development,
differentiation, function, and disease.

Gastroenterology 2002 May;122(5):1467-82


Part II: Abstract B

Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.

Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA,
Bloomfield CD, Lander ES.

Whitehead Institute/Massachusetts Institute of Technology Center for Genome Research, Cambridge, MA 02139, USA.
golub@genome.wi.mit.edu

Although cancer classification has improved over the past 30 years, there has been no general approach for identifying new cancer
classes (class discovery) or for assigning tumors to known classes (class prediction). Here, a generic approach to cancer
classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a
test case. A class discovery procedure automatically discovered the distinction between acute myeloid leukemia (AML) and acute
lymphoblastic leukemia (ALL) without previous knowledge of these classes. An automatically derived class predictor was able to
determine the class of new leukemia cases. The results demonstrate the feasibility of cancer classification based solely on gene
expression monitoring and suggest a general strategy for discovering and predicting cancer classes for other types of cancer,
independent of previous biological knowledge.

Science 1999 Oct 15;286(5439):531-7


Functional Classification


Microarray Data Analysis Workflow. Existing data (repository) (1)-> generate data (2) -> collect
& manage data (3) (Microarray data management systems) -> analyze interesting sequences
(4) -> depositing into repositories (5)
Functional Classification

Typical questions to be answered with functional classification:

Whether a gene has a known function, and if so, in what class?


Whether genes found to cluster together have been described as being
functionally similar or related (promoter motifs, transcription factors)
Whether homologs or orthologs have been found to be functionally related in
any known physiological or pathological state
Whether the resultant genes are known to be associated with the experimental
conditions tested.
Functional Classification

Grouping and Clustering
Transcriptional Signatures
Identification of common promoter elements and regulatory networks
GO: Gene Ontology (Biological process, Cellular component, Molecular function)
Chromosomal Location
Metabolic Pathway Assignment

Gene product description | Name / Gene ID | T1 T2
Cytochrome p450 subfamily 4 | Cyp 4A | x
HMG CoA synthase | HMG-CoA Syn | x
Apolipoprotein CIII | Apo CIII | x
Stearoyl-coenzyme A desaturase | SCOD-1 | x
Carnitine palmitoyl transferase-1 | CPT-1 | x
Fatty acid binding protein | FABP | x
Phosphoenolpyruvate carboxykinase | PEPCK | x x
Cluster determinant 36 | CD36 | x


Chromosomal Location: Annotation

Affymetrix
Representative Sequence

Representative sequence Consensus sequence Probes

BLAT against assembly Tagger


Comparison with UG DB Exact mapping to temp cDNA DB
sequence from UCSC Exact mapping to UG and RefSeq DB

NetAffx SIB annotation


Ensembl DB EnsMart DB
Unigene 4 quality levels

Representative Sequence: Chosen during chip design as a sequence which is best associated with the
transcribed region being interrogated

BLAT threshold: Only records with match / Qsize >= 75% and score >= 0.70 are kept, where score = (match - mismatch - gap# x 5 - gap_size x 2) / Qsize. If a record has several mapping locations that pass the threshold, the one with the highest score is chosen; if several mapping locations share the same highest score, all of them are kept.

EnsMart Approach: The target is the cDNA sequence plus an additional length of downstream sequence immediately following the most 3' exon. The individual probe sequences are mapped by exact matching. If more than 50% of the probes of a probe set are mapped, the probe set is listed as a hit.
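
The BLAT thresholds and the EnsMart rule described above translate directly into small filter functions. This is a sketch: the record field names, the example locations and the function names are hypothetical and do not reflect the actual BLAT output format or any existing pipeline.

```python
def blat_score(match, mismatch, gap_count, gap_size, qsize):
    """score = (match - mismatch - gap# x 5 - gap_size x 2) / Qsize"""
    return (match - mismatch - gap_count * 5 - gap_size * 2) / qsize

def best_locations(records, min_cover=0.75, min_score=0.70):
    """Keep the top-scoring mapping location(s) that pass both thresholds."""
    passing = []
    for rec in records:
        score = blat_score(rec["match"], rec["mismatch"], rec["gap_count"],
                           rec["gap_size"], rec["qsize"])
        if rec["match"] / rec["qsize"] >= min_cover and score >= min_score:
            passing.append((score, rec["location"]))
    if not passing:
        return []
    top = max(score for score, _ in passing)
    return [loc for score, loc in passing if score == top]   # ties are all kept

def ensmart_hit(n_probes_mapped, n_probes_total):
    """EnsMart rule: a hit if more than 50% of the probes map exactly."""
    return n_probes_mapped / n_probes_total > 0.5

# Hypothetical mapping records for one representative sequence of 500 bases.
records = [
    {"location": "chr1:1000", "match": 480, "mismatch": 5,
     "gap_count": 1, "gap_size": 10, "qsize": 500},
    {"location": "chr7:2000", "match": 400, "mismatch": 20,
     "gap_count": 4, "gap_size": 40, "qsize": 500},
    {"location": "chrX:3000", "match": 300, "mismatch": 0,
     "gap_count": 0, "gap_size": 0, "qsize": 500},
]
print(best_locations(records))
print(ensmart_hit(6, 11), ensmart_hit(5, 10))
```

The second record fails on score (0.56) and the third on coverage (60%), so only the first location is kept.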
Comparison of Various Annotations

[Figure: Venn diagrams comparing the probe-set annotations from NetAffx, EnsMart, and Tagger for the Mouse MOE A and B chips and for the Human U133 A and B chips; counts are given separately for chips A and B.]
Quality of Probe Sets

Mapped on: RefSeqs

Chip | High | Medium | Low | Undefined
HG-133A | 13792 | 1663 | 1103 | 5657
HG-133B | 3795 | 790 | 519 | 17473
Mu74v2A | 5340 | 1283 | 1697 | 4102
Mu74v2B | 2587 | 969 | 1190 | 7665
Mu74v2C | 756 | 302 | 982 | 9828
MOE-A | 12683 | 2395 | 1194 | 6354
MOE-B | 2453 | 620 | 592 | 18846

Mapped on: RefSeqs, mRNAs, ESTs, HTCs

Chip | High | Medium | Low | Undefined
HG-133A | 15703 | 1196 | 3983 | 1333
HG-133B | 10096 | 2026 | 3125 | 7330
Mu74v2A | 8015 | 615 | 2127 | 1665
Mu74v2B | 7010 | 1421 | 2306 | 1674
Mu74v2C | 2600 | 780 | 2555 | 5933
MOE-A | 18070 | 1222 | 2383 | 951
MOE-B | 11602 | 2376 | 2478 | 6055
Distribution: UGs per Probe Set
[Figure: log-log plot of the number of probe sets (y-axis, 1 to 100000) against the number of UniGenes per probe set (x-axis, 1 to 100). Series: EnsMart A, EnsMart B, Tagger A, Tagger B, NetAffx A, NetAffx B.]


Gene Ontology Project
GO Output

[Figure: GO annotation of ABCB1 under the three ontologies (Cellular Component, Molecular Function, Biological processes), with terms GO:X, GO:Y, GO:Z at levels L2 to L4.]

Two pragmatic purposes of ontology:
1. Facilitate communication between people and organizations
2. Improve interoperability between systems

Ontologies are structured vocabularies in the form of directed acyclic graphs (DAGs) that represent a network in which each term may be a child of one or more than one parent.
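
The DAG structure matters in practice: a gene annotated with one term is implicitly annotated with all of that term's ancestors, and a term can be reached through more than one parent. A toy sketch; the term IDs are placeholders, not real GO accessions, and `ancestors` is a hypothetical helper name.

```python
# Toy ontology: term -> list of parent terms. IDs are placeholders, not real GO accessions.
PARENTS = {
    "GO:root": [],
    "GO:A": ["GO:root"],
    "GO:B": ["GO:root"],
    "GO:X": ["GO:A", "GO:B"],    # one child, two parents: a DAG, not a tree
    "GO:Y": ["GO:X"],
}

def ancestors(term, parents=PARENTS):
    """All terms reachable by following parent links upward from `term`."""
    seen = set()
    stack = list(parents[term])
    while stack:
        t = stack.pop()
        if t not in seen:        # a term reachable via two paths is visited once
            seen.add(t)
            stack.extend(parents[t])
    return seen

print(sorted(ancestors("GO:Y")))
```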
Distribution: Probe Sets per UG
[Figure: log-log plot of the number of UniGenes (y-axis, 1 to 100000) against the number of probe sets per UniGene (x-axis, 1 to 100). Series: U133A, U133B, U133AB, U74Av2, U74Bv2, U74Cv2, U74ABCv2, U74ABCv3_NA, MOE430A, MOE430B, MOE430AB.]
Functional Classification II



MAPPFinder GenMAPP

Doniger et al. Genome Biology 2003


http://www.genmapp.org/

GenMAPP:
Gene Microarray Pathway Profiler
KEGG: Kyoto Encyclopedia of Genes and Genomes

The 3 main goals of the KEGG project:

1. Computerizing the current knowledge of genetics, biochemistry, and molecular and cellular biology in
terms of the pathway of interacting molecules or genes
2. Collection of gene catalogs for all organisms with completely sequenced genomes and selected organisms with partial genomes (consistent annotation)
3. Catalog of chemical elements, compounds and other substances in living cells

Summary of KEGG release 8.0

Kanehisa et al. 2002, Nucleic Acids Research, Ogata et al. 1999, Nucleic Acids Research; http://www.genome.ad.jp/kegg/


Signaling Pathways

Similar to other nuclear hormone receptors, PPAR acts as a ligand activated transcription factor. Upon binding fatty acids or hypolipidemic drugs, PPARα interacts with RXR and regulates the expression of target genes. These genes are involved in the catabolism of fatty acids. Conversely, PPARγ is activated by prostaglandins, leukotrienes and anti-diabetic thiazolidinediones and affects the expression of genes involved in the storage of the fatty acids. PPARβ is only weakly activated by fatty acids, prostaglandins and leukotrienes and has no known physiologically relevant ligand. However, data from PPARβ null mice suggest PPARβ does serve a role in fatty acid metabolism and perhaps in skin proliferation and cancer.
Genetic Network Models: Goals

Must incorporate rule-based dependencies between genes
  Rule-based dependencies may constitute important biological information.

Must allow systematic study of global network dynamics
  In particular, individual gene effects on long-run network behavior.

Must be able to cope with uncertainty
  Small sample size, noisy measurements, robustness

Must permit quantification of the relative influence and sensitivity of genes in their interactions with other genes
  This allows us to focus on individual (groups of) genes.
Microarray and Data Repositories
Name | Archival | Treatment | Visualization | Data normalization protocols and data analyses modules
Acuity | dual-color cDNA/oligo | dual-color cDNA/oligo | dual-color cDNA/oligo. Dendrograms, 2-D interactive plots, animated interactive 3-D plots, line graphs, scatter plots. Hierarchical clustering, K-means, PCA, SOM. | global normalization, normalization on control spots, spike controls, or subset of spots.
ArrayDB | dual-color cDNA/oligo | dual-color cDNA/oligo | dual-color cDNA/oligo | global mean or median ratio based normalization
ArrayInformatics | dual-color cDNA/oligo | dual-color cDNA/oligo | dual-color cDNA/oligo, Affymetrix. Scatter, line and series plots and a cluster image map. Does not support XML as of yet. | Normalization to LOWESS, total intensity, median ratio or to a user generated gene list; graphing data trends after normalization enabling examination of data variability.
BASE | dual-color cDNA/oligo, Affymetrix, SAGE | dual-color cDNA/oligo, Affymetrix, SAGE | dual-color cDNA/oligo, Affymetrix, SAGE | global mean or median ratio based normalization, Lowess, MDS module
Expressionist | Affymetrix | Affymetrix | Affymetrix, dual-color cDNA/oligo | standard data processing and clustering
GeneDirector | dual-color cDNA/oligo | dual-color cDNA/oligo | dual-color cDNA/oligo, Affymetrix | ImaGene and GeneSight packages
GeNet | dual-color cDNA/oligo, Affymetrix | dual-color cDNA/oligo, Affymetrix | dual-color cDNA/oligo, Affymetrix | GeneSpring package
GeneTraffic (Multi) | filters, dual-color cDNA/oligo, Affymetrix | filters, dual-color cDNA/oligo, Affymetrix | filters, dual-color cDNA/oligo, Affymetrix | Global normalization, z-score, Lowess normalization, full and sub-grid; for Affymetrix, alternative probe based protocol
GeneX | dual-color cDNA/oligo, Affymetrix | dual-color cDNA/oligo | dual-color cDNA/oligo, Affymetrix | R routines are available to manipulate the data (normalization, clustering, etc.)
maxdSQL | dual-color cDNA/oligo, Affymetrix | dual-color cDNA/oligo, Affymetrix | dual-color cDNA/oligo, Affymetrix, maxdView, expression data class which represents results from one or more hybridizations and any associated clusters of genes. Profiles viewers. | Filtering based on numerical values. 2-D correlation plot with overlay of cluster data, multidimensional plots.
NOMAD | dual-color cDNA/oligo, Axon scanner outcome | dual-color cDNA/oligo, Axon scanner outcome | dual-color cDNA/oligo, Axon scanner outcome | ScanAlyze package: global normalization
PartisanarrayLIMS | filters, dual-color cDNA/oligo, Affymetrix | filters, dual-color cDNA/oligo, Affymetrix | filters, dual-color cDNA/oligo, Affymetrix | global mean or median ratio based normalization
Resolver | Affymetrix, Nylon filters, dual-color cDNA/oligo | Affymetrix, Nylon filters, dual-color cDNA/oligo | Affymetrix, Nylon filters. Table Viewer: K-means, K-medians clustering, and SOM algorithms. | Error models with any experimental replicates performed, P-values computed and error bars for every gene expression measurement, ANOVA.
SMD | dual-color cDNA/oligo | dual-color cDNA/oligo | dual-color cDNA/oligo | ScanAlyze package: global normalization
Microarray and Data Repositories
Name | Data Type | Tissue Type | Description | Web address
GEO | Microarray/SAGE | Normal and tumor | Gene expression and hybridization array data repository | http://www.ncbi.nlm.nih.gov/geo/
RAD | Microarray/SAGE | Normal and tumor | The ultimate goal is to allow comparative analysis of experiments performed by different laboratories using different platforms and investigating different biological systems. | http://www.cbil.upenn.edu/RAD2/
ExpressDB | Microarray/SAGE | Yeast | Collection of yeast RNA expression datasets | http://arep.med.harvard.edu/cgi-bin/ExpressDByeast/EXDStart
CleanEx | Microarray/EST libraries | Normal and tumor | Gene expression and hybridization array data repository. SAGE will be added. | http://www.epd.isb-sib.ch/cleanex/
Gene Expression Database | Microarray | Tumor | Data from 60 cancer cell lines based on Affymetrix and cDNA technology | http://discover.nci.nih.gov/arraytools
SMD | Microarray | Normal and tumor | Extensive collection of cDNA microarray data | http://genome-www.stanford.edu/microarray
SAGEmap | SAGE | Normal and tumor | Data from one hundred SAGE (Serial Analysis of Gene Expression) CGAP (Cancer Genome Anatomy Project) libraries | http://www.ncbi.nlm.nih.gov/SAGE/
SAGE | SAGE | Normal and tumor | SAGE data from over 600,000 transcripts including SAGE data from human, mouse and yeast transcripts. | http://www.sagenet.org/SAGEData/sagedata.htm
UniGene | EST libraries | Normal and tumor | Collection of EST libraries from different species | http://www.ncbi.nlm.nih.gov/UniGene/
CGAP/Tissue | EST libraries | Normal and tumor | Information on CGAP and other cDNA libraries. | http://cgap.nci.nih.gov/Tissues/xProfiler
BodyMap | EST libraries | Normal and tumor | Database of expression information of human and mouse genes in various tissues and cell types. | http://bodymap.ims.u-tokyo.ac.jp
TissueInfo | EST libraries | Normal | Information on tissue expression profile of a sequence by comparing the given sequence against the EST database. Each EST comes from a library derived from a specific tissue type | http://icb.mssm.edu/services/tissueinfo/query
Web Resources : General Information

Leung's Links page & software info
Davison's DNA Microarray Methodology - Flash Animation
gene-chips Overview of the technique, papers
Chips & microassays General information
SMD guide Stanford's links page, very complete
Introduction Online introduction to microarrays (EBI)
Brown Lab Guide Microarrays protocols and arrayer construction.
Web Resources : Data Analysis Tools

Expression Profiler Online clustering and analysis tools (EBI)


GenEx Database, repository and analysis tools (NCGR)
MAExplorer MicroArray Explorer for data mining Gene Expression, free download
ArrayDB Downloadable tools, short online demo
MAXD Downloadable data warehouse and visualisation for expression data
Jexpress Java tools for gene expression data analysis, free download
Eisen Lab Michael Eisen's suite for image quantitation and data analysis (Scanalyze, Cluster, TreeView). Downloadable.
Web Resources : Public Databases I

SMD The Stanford Microarray Database


Chip DB Searchable database on gene expression (MIT)
ExpressDB Public queries of E. coli and yeast data
GEO Gene expression data repository and online resource (NCBI)
RAD RNA Abundance Database
Expression Connection Saccharomyces Genome Database expression data retrieval
EpoDB Expression information retrieval for one gene at a time
yMGV Public queries of yeast data
Web Resources : Public Databases II

AMAD Downloadable web driven database system


ArrayExpress Public data deposition and public queries (EBI)
maxdSQL Downloadable data warehouse and visualization environment
GXD Mouse expression data storage and integration
GeNet Distribution and visualization of gene expression data from any
organism
Web Resources : Public Databases III

Drosophila microarray project Drosophila Metamorphosis Time Course Database

Samson Lab Yeast Transcriptional Profiling Experiments


SageMap NCBI SAGE data and analysis tools
NCI60 cancer project Supplement to Ross et al. (Nat Genet., 2000).
Serum-response Supplement to Iyer et al. (1999) Science 283:83-87
Breast cancer Supplement to Perou et al. Nature 406:747-752(2000)
Cancer Molecular Pharmacology Integration of large databases on gene expression and molecular pharmacology.
References

Interesting Books

Kohane et al., Microarrays for an integrative genomics, 2003 MIT

Baldi and Hatfield, DNA Microarrays and gene expression, 2002 Cambridge University Press

Jagota, Microarray data analysis and visualization, 2001 Bay Press
