Part 2
CS 838
www.cs.wisc.edu/~craven/cs838.html
Mark Craven
craven@biostat.wisc.edu
April 2001
Announcements
• reading for next week
  – Friedman et al., Journal of Computational Biology 2000
  – Brazma et al., Genome Research 1998
  – Craven et al., ISMB 2000
The Course Project
• implement an algorithm or two
• experiment with it on some real data
• milestones
  – description of the basic area (due 4/16)
    • the algorithm(s) you will be investigating
    • the data set(s) you will be using
    • 1-3 hypotheses
  – description of your experiments (due 4/23)
    • how you will test your hypotheses
    • data to be used
    • what will be varied
    • methodology
  – final write-up (due 5/16)
    • 8-10 pages similar in format to a CS conference paper
    • prototypical organization: introduction, description of methods, description of experiments, discussion of results
Non-Hierarchical Clustering
K-Means Clustering
• assume our objects are represented by vectors of
real values
• put k cluster centers in same space as objects
• now iteratively move cluster centers
[Figure: objects and cluster centers plotted in the same space]
K-Means Clustering
• each iteration involves two steps
– assignment of objects to clusters
– re-computation of the means
K-Means Clustering
given: a set $X = \{\vec{x}_1, \ldots, \vec{x}_n\}$ of objects
select k initial cluster centers $\vec{f}_1, \ldots, \vec{f}_k$
while stopping criterion not true do
    for all clusters $c_j$ do
        $c_j = \{\vec{x}_i \mid \forall \vec{f}_l : sim(\vec{x}_i, \vec{f}_j) \ge sim(\vec{x}_i, \vec{f}_l)\}$
    for all means $\vec{f}_j$ do
        $\vec{f}_j = \mu(c_j)$
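The loop above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. It assumes sim is the negative squared Euclidean distance, that initial centers are drawn at random from the objects, and that the stopping criterion is "assignments no longer change" (with a cap on iterations). The names `k_means`, `max_iter` and `seed` are hypothetical.

```python
import random

def sim(x, f):
    """Similarity of object x to center f (assumed: negative squared Euclidean distance)."""
    return -sum((xi - fi) ** 2 for xi, fi in zip(x, f))

def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def k_means(X, k, max_iter=100, seed=0):
    """Return (centers, assignment), where assignment[i] is the cluster index of X[i]."""
    rng = random.Random(seed)
    centers = [list(x) for x in rng.sample(X, k)]  # k distinct objects as initial centers
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each object goes to its most similar center.
        new_assignment = [max(range(k), key=lambda j: sim(x, centers[j])) for x in X]
        if new_assignment == assignment:  # stopping criterion: nothing moved
            break
        assignment = new_assignment
        # Re-computation step: each center becomes the mean of its members.
        for j in range(k):
            members = [x for x, a in zip(X, assignment) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean(members)
    return centers, assignment
```

For example, `k_means([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], k=2)` separates the three points near the origin from the three points near (10, 10).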
The CLICK Algorithm
• Sharan & Shamir, ISMB 2000
• objects to be clustered (e.g. genes) represented as
vertices in a graph
• weighted, undirected edges represent similarity of
objects
[Figure: example graph with edge weights 1, 5, 4, 1, 6]
$N(\mu_F, \sigma_F^2)$ for non-mates
CLICK: How Do We Get Graph?
• let f ( S ij | i, j are mates) be the probability
density function for similarity values when i and j
are mates
• then set the weight of an edge by:
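As a hedged illustration of such a weight, the sketch below assumes the log-odds form described by Sharan & Shamir: the prior probability of being mates times the mates density, divided by the corresponding quantity for non-mates, on a log scale. It also assumes both densities are Gaussian, in line with the $N(\mu_F, \sigma_F^2)$ model for non-mates. The function and parameter names (`edge_weight`, `p_mates`, `mu_t`, `sigma_t`, `mu_f`, `sigma_f`) are hypothetical.

```python
import math

def normal_pdf(s, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at s."""
    z = (s - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def edge_weight(s_ij, p_mates, mu_t, sigma_t, mu_f, sigma_f):
    """Assumed log-odds that i and j are mates, given their similarity s_ij."""
    mates = p_mates * normal_pdf(s_ij, mu_t, sigma_t)
    non_mates = (1.0 - p_mates) * normal_pdf(s_ij, mu_f, sigma_f)
    return math.log(mates / non_mates)
```

Under these assumptions, a similarity close to the mates mean gives a positive weight and a similarity close to the non-mates mean gives a negative one.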
    return V(G)
else /* partition graph, call recursively */
    (H, $\bar{H}$) ← MinWeightCut(G)
    BasicCLICK(H)
    BasicCLICK($\bar{H}$)
Minimum Weight Cuts
• a cut of a graph is a subset of edges whose
removal disconnects the graph
• a minimum weight cut is the cut with the smallest
sum of edge weights
• can be found efficiently
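To make the definition concrete, here is a brute-force sketch that tries every way of splitting the vertices into two non-empty sides and keeps the split whose crossing edges have the smallest total weight. It is exponential in the number of vertices, so it only illustrates the definition and is not one of the efficient algorithms the slide refers to. The graph in the usage note is hypothetical.

```python
from itertools import combinations

def cut_weight(edges, side):
    """Sum of the weights of edges with exactly one endpoint in `side`."""
    return sum(w for (u, v), w in edges.items() if (u in side) != (v in side))

def min_weight_cut(vertices, edges):
    """Return (weight, H, H_bar) for a minimum weight cut; needs at least 2 vertices.

    `edges` maps an undirected edge (u, v) to its weight.
    """
    vertices = list(vertices)
    first, rest = vertices[0], vertices[1:]
    best = None
    # Keep `first` on side H so each bipartition is examined only once;
    # r < len(rest) guarantees the other side is never empty.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            side = {first, *extra}
            w = cut_weight(edges, side)
            if best is None or w < best[0]:
                best = (w, side)
    weight, side = best
    return weight, side, set(vertices) - side
```

With vertices `a, b, c, d` and edges `{("a", "b"): 5, ("a", "c"): 4, ("b", "c"): 6, ("c", "d"): 1, ("b", "d"): 1}`, the minimum weight cut removes the two weight-1 edges and isolates `d`.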
[Figure: example graph with edge weights 1, 5, 4, 1, 6]
Deciding When a Subgraph Represents a Kernel
• if we assume a complete graph, the minimum
weight cut algorithm finds a cut that minimizes
this ratio, i.e.
$weight(C) = \log \frac{\Pr(H_1^C \mid C)}{\Pr(H_0^C \mid C)}$
The Full CLICK Algorithm
CLICK Experiment: Fibroblast Serum Response Data
• show table 2 from paper
Measuring Homogeneity
• average similarity of objects to their clusters
$H_{ave} = \frac{1}{|N|} \sum_{u \in N} sim(F(u), F(cluster(u)))$
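A minimal sketch of this measure follows. It assumes F(u) is the vector representing object u, F(cluster(u)) is the mean vector of u's cluster, and sim is cosine similarity. All three are assumptions made for illustration, and the names `average_homogeneity`, `fingerprints` and `clusters` are hypothetical.

```python
import math

def sim(a, b):
    """Cosine similarity (an assumed choice of sim); undefined for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def average_homogeneity(fingerprints, clusters):
    """Average similarity of each object to the mean vector of its own cluster.

    fingerprints: dict mapping object id -> vector
    clusters: list of lists of object ids, each object in exactly one cluster
    """
    total, count = 0.0, 0
    for members in clusters:
        centroid = mean_vector([fingerprints[u] for u in members])
        for u in members:
            total += sim(fingerprints[u], centroid)
            count += 1
    return total / count
```

With cosine similarity, the value approaches 1 as the objects in each cluster point in the same direction as their cluster mean.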
Measuring Separation
• average separation of pairs of clusters
$S_{ave} = \frac{1}{\sum_{i \ne j} |X_i||X_j|} \sum_{i \ne j} |X_i||X_j| \, sim(F(X_i), F(X_j))$
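The size-weighted average above can be sketched as follows. As before, this assumes F(X_i) is the mean vector of cluster X_i and sim is cosine similarity. Both are illustrative assumptions, and `average_separation`, `fingerprints` and `clusters` are hypothetical names.

```python
import math

def sim(a, b):
    """Cosine similarity (an assumed choice of sim); undefined for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def average_separation(fingerprints, clusters):
    """Average similarity between cluster mean vectors, weighted by |X_i||X_j|.

    Needs at least two clusters.
    """
    centroids = [mean_vector([fingerprints[u] for u in members]) for members in clusters]
    sizes = [len(members) for members in clusters]
    numerator, denominator = 0.0, 0.0
    for i in range(len(clusters)):
        for j in range(len(clusters)):
            if i != j:
                pair_weight = sizes[i] * sizes[j]
                numerator += pair_weight * sim(centroids[i], centroids[j])
                denominator += pair_weight
    return numerator / denominator
```

Because the measure averages similarities between clusters, smaller values correspond to clusters that are further apart.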