[Figure: Graph theory, Complex systems, Informatics, Modeling, Physiology, Biochemistry and Chemical analysis arranged along an axis from Systems to Deterministic/Reductionist]
http://www.thefullwiki.org/Hypercube
Univariate: Properties
vector of length m
mean
variance
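A minimal sketch of the two univariate summaries for a vector of length m. The example vector is arbitrary, and the `ddof` parameter name (borrowed from NumPy's convention) is an assumption of this sketch: `ddof=1` gives the sample variance (divide by m - 1), `ddof=0` the population variance.

```python
from typing import Sequence


def mean(x: Sequence[float]) -> float:
    """Arithmetic mean of a vector of length m."""
    return sum(x) / len(x)


def variance(x: Sequence[float], ddof: int = 1) -> float:
    """Variance of x: sum of squared deviations divided by (m - ddof)."""
    m = len(x)
    mu = mean(x)
    return sum((xi - mu) ** 2 for xi in x) / (m - ddof)


x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(mean(x), variance(x), variance(x, ddof=0))
```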
Univariate: Representations
Univariate: Assumptions
Normality
Univariate: Utility
Hypothesis testing
- type I error (α, false positive)
- type II error (β, false negative)
power (1 - β)
effect size: standardized difference in means
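A hedged sketch of effect size and power for two groups. It assumes Cohen's d with a pooled standard deviation as the standardized difference in means, and approximates the power (1 - β) of a two-sided two-sample test with the normal distribution rather than the t distribution; both are assumptions of this sketch, not the author's method.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def cohens_d(a, b):
    """Standardized difference in means using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided, two-sample z-test.

    Normal approximation; ignores the tiny rejection probability in the
    opposite tail.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)            # critical value fixing the type I error rate alpha
    shift = abs(d) * sqrt(n_per_group / 2)       # expected test statistic under the alternative
    return z.cdf(shift - z_crit)


print(cohens_d([1, 2, 3], [2, 3, 4]))
print(approx_power(0.5, 20), approx_power(0.5, 64))
```

For a fixed effect size, power rises with the number of observations per group, which is the sample size and power relationship listed under the limitations.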
Univariate: Limitations
Biological definition of the mean?
Relationship between sample size and test power
Multiple hypothesis testing
False discovery rate
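One standard way to control the false discovery rate across many tests is the Benjamini-Hochberg step-up procedure; the slide does not name a method, so this choice is an assumption. The p-values below are illustrative, not from the Old Faithful data.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return True for each hypothesis rejected at false discovery rate q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    # Largest rank k such that p_(k) <= (k / m) * q
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals))
```

An uncorrected 0.05 cutoff would flag five of these ten tests; the FDR-controlled procedure keeps only the first two.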
Old Faithful Data
272 observations
time between eruptions: 70 ± 14 min
duration of eruption: 3.5 ± 1 min
Azzalini, A. and Bowman, A. W. (1990). A look at some data on the Old Faithful geyser. Applied Statistics 39, 357-365.
Bivariate: Properties
Matrix of 2 vectors of length m
Bivariate: Representations
(X,Y)
Bivariate: Utility
bivariate distribution
correlation
(X,Y)
Variable 2 = m × Variable 1 + b (linear fit; here m is the slope and b the intercept)
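A minimal sketch of the Pearson correlation and the ordinary least-squares line for two vectors. The slope is named `slope` rather than `m` to avoid clashing with m, the vector length; the toy data are illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope and intercept for y = slope * x + b."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
print(pearson_r(x, y), fit_line(x, y))
```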
Bivariate: Limitations
Sensitive to outliers
http://en.wikipedia.org/wiki/Correlation
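A small illustration of outlier sensitivity, using made-up data: ten points lying close to a line give a Pearson r near 1, and appending a single point far below the trend turns the correlation negative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 8.1, 9.0, 10.2]
r_clean = pearson_r(x, y)
# One extreme observation far from the linear trend
r_outlier = pearson_r(x + [11], y + [-20])
print(r_clean, r_outlier)
```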
Old Faithful
Old Unfaithful?
Additional variables: nearby hydrofracking
Improve inference based on more information
Multivariate: Properties
A matrix of n vectors of length m
Challenges
data often wide (more variables than observations) and structured
integration
noise
Rewards
robust inference
signal amplification
holistic/systems approach
[Figure: correlation matrix]
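A minimal sketch of the correlation matrix for a multivariate data set stored as n vectors (variables) of length m. Constant variables would cause a division by zero and are assumed absent; the toy data are illustrative.

```python
from math import sqrt


def correlation_matrix(data):
    """Pearson correlation matrix for n variables, each a list of m observations.

    Entry [i][j] is corr(variable i, variable j). Assumes no variable is constant.
    """
    n, m = len(data), len(data[0])
    means = [sum(v) / m for v in data]
    centered = [[x - mu for x in v] for v, mu in zip(data, means)]
    norms = [sqrt(sum(x * x for x in v)) for v in centered]
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(n)]
        for i in range(n)
    ]


data = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
R = correlation_matrix(data)
print(R)
```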
Multivariate: Dimensional Reduction
Principal Components Analysis (PCA)
[Figure: data projected onto PC 1 and PC 2]
Multivariate: Dimensional Reduction
Calculating PCs: singular value decomposition (SVD)
Eigenvalue
Wall, Michael E., Andreas Rechtsteiner, Luis M. Rocha. "Singular value decomposition and principal component analysis." In A Practical Approach to Microarray Data Analysis, D.P. Berrar, W. Dubitzky, M. Granzow, eds., pp. 91-109. Kluwer: Norwell, MA (2003). LANL LA-UR-02-4001.
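The slides compute PCs by SVD. To stay dependency-free, this sketch swaps in power iteration with deflation on the sample covariance matrix; the eigenvectors it finds are the same directions as the right singular vectors of the centered data, and each eigenvalue equals s²/(m - 1) for the matching singular value s. Function names and the iteration count are assumptions of this sketch.

```python
import random
from math import sqrt


def covariance_matrix(X):
    """Sample covariance of X, given as m observations (rows) of n variables."""
    m, n = len(X), len(X[0])
    means = [sum(row[j] for row in X) / m for j in range(n)]
    C = [[0.0] * n for _ in range(n)]
    for row in X:
        d = [row[j] - means[j] for j in range(n)]
        for i in range(n):
            for j in range(n):
                C[i][j] += d[i] * d[j] / (m - 1)
    return C


def power_iteration(C, iters=500, seed=0):
    """Leading eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    n = len(C)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(n)]  # strictly positive start vector
    norm = sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:  # nothing left after deflation
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v is the eigenvalue for a unit vector v
    eigval = sum(v[i] * C[i][j] * v[j] for i in range(n) for j in range(n))
    return eigval, v


def pca(X, k):
    """Top-k principal components as (eigenvalue, loading vector) pairs."""
    C = covariance_matrix(X)
    n = len(C)
    components = []
    for _ in range(k):
        lam, v = power_iteration(C)
        components.append((lam, v))
        # Deflation: remove lam * v v^T so the next pass finds the next PC
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return components


def scores(X, loadings):
    """Project centered observations onto the loading vectors (PC scores)."""
    m, n = len(X), len(X[0])
    means = [sum(row[j] for row in X) / m for j in range(n)]
    return [[sum((row[j] - means[j]) * v[j] for j in range(n)) for v in loadings] for row in X]


X = [[2, 0], [0, 1], [-2, 0], [0, -1]]
comps = pca(X, 2)
print([lam for lam, _ in comps])
```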
Multivariate: Representations
Old Faithful 2.0
272 measurements of 8 variables (2 real, 6 random noise)
A matrix of n = 8 vectors of length m = 272
Multivariate: Representation
The number of PCs can be used to estimate true data complexity
Identify outliers using all measurements
Identify interesting groups
Evaluate uni- and bivariate relationships
Use known values to impute missing observations
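One common way to turn "number of PCs" into a complexity estimate is to keep the smallest number of components whose cumulative explained variance reaches a threshold. The 90% threshold and the eigenvalues below are hypothetical, chosen only to illustrate the calculation.

```python
def explained_variance_ratio(eigenvalues):
    """Fraction of total variance carried by each PC, largest first."""
    total = sum(eigenvalues)
    return [lam / total for lam in sorted(eigenvalues, reverse=True)]


def n_components_for(eigenvalues, threshold=0.9):
    """Smallest number of PCs whose cumulative explained variance reaches threshold."""
    cumulative = 0.0
    for k, ratio in enumerate(explained_variance_ratio(eigenvalues), start=1):
        cumulative += ratio
        if cumulative >= threshold:
            return k
    return len(eigenvalues)


eigs = [4.0, 3.0, 0.5, 0.3, 0.2]  # hypothetical eigenvalues
print(explained_variance_ratio(eigs), n_components_for(eigs))
```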
PCA: Considerations
data pre-treatment
outliers
noise
unsupervised projection
[Figure: PCA with no pre-treatment vs. data centered and scaled to unit variance]
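A minimal sketch of the "centered and scaled to unit variance" pre-treatment (autoscaling), applied per variable before PCA. Leaving zero-variance variables merely centered is an assumption of this sketch.

```python
from statistics import mean, stdev


def autoscale(data):
    """Center each variable to mean 0 and scale to unit sample variance.

    data: list of n variables, each a list of m observations.
    Zero-variance variables are only centered (avoids division by zero).
    """
    scaled = []
    for v in data:
        mu, sd = mean(v), stdev(v)
        scaled.append([(x - mu) / sd if sd > 0 else x - mu for x in v])
    return scaled


out = autoscale([[1, 2, 3], [10, 20, 30]])
print(out)
```

After autoscaling, variables measured on very different scales contribute equally to the covariance, so PCA is not dominated by the variable with the largest units.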
PCA: Considerations
data pre-treatment
outliers
linear reconstruction
noise
unsupervised projection
Independent components analysis (ICA)
Use ICA to calculate statistically independent components
PCA: Considerations
data pre-treatment
outliers
linear reconstruction
noise
supervised projection
Non-negative matrix factorization (NMF)
NMF uses an additive, parts-based encoding
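A sketch of NMF using the Lee-Seung multiplicative updates for squared reconstruction error; the slides do not say which NMF algorithm was used, so this choice, the random initialization, and the small `eps` guard are assumptions. Because every update multiplies by a ratio of non-negative terms, W and H stay non-negative, which is what yields the additive, parts-based encoding.

```python
import random


def matmul(A, B):
    """Multiply nested-list matrices A (p x r) and B (r x q)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - r) ** 2 for vr, rr in zip(V, WH) for v, r in zip(vr, rr))


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor non-negative V (p x q) as W (p x k) times H (k x q), with W, H >= 0."""
    rng = random.Random(seed)
    p, q = len(V), len(V[0])
    W = [[rng.random() for _ in range(k)] for _ in range(p)]
    H = [[rng.random() for _ in range(q)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(q)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(p)]
    return W, H


V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 1.0, 1.0], [3.0, 5.0, 7.0]]
W5, H5 = nmf(V, 2, iters=5)
W300, H300 = nmf(V, 2, iters=300)
print(frobenius_error(V, W5, H5), frobenius_error(V, W300, H300))
```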
Weaknesses
Need to derive an empirical reference for model performance
Poorly established model optimization methods
PLS-DA: Example
Data: Old Faithful 2.0, 272 observations on 8 variables
Select the appropriate number of latent variables (LVs) to maximize Q²
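A sketch of the LV selection criterion: Q² = 1 - PRESS/TSS computed from cross-validated (held-out) predictions, then the LV count with the largest Q² is kept. Fitting PLS-DA itself is out of scope here; the dictionary of cross-validated predictions and the 0/1 class coding of y are hypothetical inputs assumed to come from an external cross-validation loop.

```python
def q_squared(y_obs, y_cv_pred):
    """Q^2 = 1 - PRESS / TSS, using cross-validated predictions."""
    mean_y = sum(y_obs) / len(y_obs)
    press = sum((y - p) ** 2 for y, p in zip(y_obs, y_cv_pred))
    tss = sum((y - mean_y) ** 2 for y in y_obs)
    return 1.0 - press / tss


def best_n_lvs(cv_predictions_by_lv, y_obs):
    """Return the LV count with the highest Q^2, plus all Q^2 values."""
    scores = {n: q_squared(y_obs, preds) for n, preds in cv_predictions_by_lv.items()}
    return max(scores, key=scores.get), scores


y = [0, 0, 1, 1]  # hypothetical class labels coded 0/1
cv_preds = {  # hypothetical held-out predictions for 1, 2 and 3 LVs
    1: [0.4, 0.4, 0.6, 0.6],
    2: [0.1, 0.2, 0.9, 0.8],
    3: [0.2, 0.3, 0.7, 0.9],
}
best, scores = best_n_lvs(cv_preds, y)
print(best, scores)
```

Unlike R², Q² is computed on held-out predictions, so it typically peaks and then falls as extra LVs start fitting noise.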
Utility
Project statistical results into a biological context
Explore informative data aspects in the context of all that was observed.
Identify emergent patterns
Networks
Interpret statistical results within a biological context
Highlight changes in patterns of relationships.
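A sketch of one way to build networks from data and highlight changed relationships: connect two variables when |Pearson r| meets a threshold, then compare the edge sets of two conditions. The 0.7 threshold, the variable names, and the toy data are hypothetical, and this is not a description of how imdev builds its networks.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_network(variables, threshold=0.7):
    """variables: dict name -> list of m observations. Returns edges with |r| >= threshold."""
    edges = set()
    for a, b in combinations(sorted(variables), 2):
        if abs(pearson_r(variables[a], variables[b])) >= threshold:
            edges.add((a, b))
    return edges


def changed_edges(net_1, net_2):
    """Edges gained and lost going from condition 1 to condition 2."""
    return net_2 - net_1, net_1 - net_2


cond_1 = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, 3, 2, 4]}
cond_2 = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 1, 3, 2]}
gained, lost = changed_edges(correlation_network(cond_1), correlation_network(cond_2))
print(gained, lost)
```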
http://sourceforge.net/apps/mediawiki/imdev
Acknowledgements
Newman Lab
Designated Emphasis in Biotechnology (DEB)
NIH
This project is funded in part by grants NIGMS-NIH T32-GM008799, USDA-ARS 5306-51530-019-00D, and NIH-NIDDK R01DK078328-01.