Abstraction, Invariance
36-350: Data Mining
2 September 2009
• Medical: x-rays, brain imaging, histology (“do these look like cancerous cells?”)
• Satellite imagery
• Fingerprints
• Finding illustrations for lectures...
Searching for Images by Searching for Text
• Assume there’s text accompanying the images (“annotation”)
• tags
• Search those text records with the query phrase
• Take images which appear close to the query phrase on highly-ranked records
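A minimal sketch of the steps above, assuming each record is an (annotation text, image name) pair and that records are ranked by a plain count of query words. The record layout, the file names, and the scoring rule are all illustrative assumptions, not the ranking used by any real search engine; the sketch also ignores how close the image sits to the query phrase within a record.

```python
def search_images(records, query, top=2):
    """Return images from the highest-scoring annotation records.

    records: list of (annotation_text, image_name) pairs (hypothetical layout).
    Score = number of occurrences of the query words in the annotation.
    """
    words = query.lower().split()

    def score(text):
        tokens = text.lower().split()
        return sum(tokens.count(w) for w in words)

    scored = [(score(text), image) for text, image in records]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Keep only the top records that actually mention the query.
    return [image for s, image in ranked[:top] if s > 0]


# Hypothetical annotated images.
records = [
    ("a tiger resting in tall grass", "tiger1.jpg"),
    ("ocean waves at sunset", "ocean1.jpg"),
    ("tiger cub and adult tiger", "tiger2.jpg"),
]
hits = search_images(records, "tiger")
```

The result depends entirely on the annotations: an image of a tiger whose text never says "tiger" is invisible to this search.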
...and sometimes it doesn’t; depends on the text!
Searching for images by representing images
• For text, we only cared about features, and only worked with feature vectors
• Abstraction
Abstraction
[Figure, built up over several slides. Abstract level: feature vectors v1 to v6, used for similarity matching, classification, clustering, dimensionality reduction, etc. Underneath, the vectors come in turn from bags of words (BoW) for Texts 1 to 6, topics for Texts 1 to 6, bitmaps for Pics. 1 to 6, bags of colors for Pics. 1 to 6, and motifs for Networks 1 to 6.]
[Figure. Abstract level: feature vectors v1 to v6, used for similarity matching, classification, clustering, dimensionality reduction, etc. Concrete level: meaningful objects (Texts 1 to 3, Pics. 1 and 2, a social network), turned into vectors by different representations (BoW, topics, bag of colors, bitmap, motifs).]
[Figure: three example images, flower1, flower2, flower3]
Bag of Colors
• “If it works, try it some more”
• For each possible color, count how many pixels there are of that color
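A rough sketch of the counting step, assuming an image is given as a sequence of (r, g, b) pixels with channels in 0..255. The quantization into a few levels per channel is an assumption added here to keep the number of "possible colors" manageable; the slide itself only says to count pixels of each color.

```python
from collections import Counter


def bag_of_colors(pixels, levels=4):
    """Count pixels per (quantized) color.

    pixels: iterable of (r, g, b) tuples, channels in 0..255.
    levels: bins per channel (an assumption; levels=256 counts exact colors).
    """
    step = 256 // levels
    return Counter((r // step, g // step, b // step) for r, g, b in pixels)


# Toy "image": two reddish pixels and two bluish pixels.
image = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (0, 0, 250)]
bag = bag_of_colors(image)
```

The resulting counts play the same role as word counts in a bag of words: one feature per color, one feature vector per image.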
[Figure: multidimensional scaling of the flower, ocean, and tiger images (flower1 to flower9, tiger1 to tiger9, ocean1 to ocean7), plotted on coordinates V1 and V2, each running from −1.0 to 1.0. The same image labels also appear along the axes of an accompanying plot.]
Invariants of bags of words
• Punctuation and word order
• Universal words (exact count of “the”, “of”, “to”, ...), if using inverse document frequency
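Both invariances can be checked on toy sentences. This is a sketch under assumptions: the tokenizer (lowercase, letters and apostrophes only) and the IDF formula log(N / document frequency) are choices made here for illustration, and the example sentences are invented.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Word counts, ignoring case and punctuation (assumed tokenizer)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def idf_weights(bags):
    """IDF weight log(N / df) for every word seen in any bag."""
    n = len(bags)
    vocab = set().union(*bags)
    return {w: math.log(n / sum(w in bag for bag in bags)) for w in vocab}


a = bag_of_words("The dog bit the man.")
b = bag_of_words("The man bit the dog!")
c = bag_of_words("The cat sat.")

# Different word order and punctuation, identical bag.
same_bag = (a == b)

# "the" occurs in every document, so its IDF weight is log(1) = 0
# and its exact count no longer affects the weighted vector.
idf = idf_weights([a, b, c])
```

The first two sentences mean opposite things, yet they map to the same feature vector; that is what an invariance costs.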
Invariants of bags of colors
Same color counts, different textures
Non-invariants
• Lighting, shadows
• Occlusion, 3D effects
• Blurring
• There are good ways to deal with blur (from astronomy)
• Breaking an invariance is easy
• e.g., add features for textures
• or sub-divide the image and do color counts on each part
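A small sketch of the sub-division idea, assuming an image is a list of rows of color labels whose height and width are divisible by the number of blocks (any leftover pixels would be ignored). The two 4×4 test images are invented for illustration: they have the same overall color counts but different textures.

```python
from collections import Counter


def color_counts(rows):
    """Whole-image bag of colors."""
    return Counter(pixel for row in rows for pixel in row)


def block_color_counts(rows, blocks=2):
    """Bag of colors for each block of a blocks-by-blocks grid,
    listed left to right, top to bottom."""
    h, w = len(rows), len(rows[0])
    bh, bw = h // blocks, w // blocks
    out = []
    for i in range(blocks):
        for j in range(blocks):
            block = [row[j * bw:(j + 1) * bw] for row in rows[i * bh:(i + 1) * bh]]
            out.append(color_counts(block))
    return out


B, W = "black", "white"
# Left half black, right half white.
halves = [[B, B, W, W] for _ in range(4)]
# Checkerboard: same number of black and white pixels, different texture.
checker = [[B if (r + c) % 2 == 0 else W for c in range(4)] for r in range(4)]

same_global = color_counts(halves) == color_counts(checker)
same_blocks = block_color_counts(halves) == block_color_counts(checker)
```

The whole-image counts cannot tell the two images apart, while the per-block counts can. The price is that the finer representation is no longer invariant to moving things around within the image.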
Similarity search with real images from the web (“retrievr”, see notes)
• Typically works better with more restricted domains (actually pretty good for medical images)