
VISUALIZATION

A Tour through the Visualization Zoo


A survey of powerful visualization techniques, from the obvious to the obscure

Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky, Stanford University

Thanks to advances in sensing, networking, and data management, our society is producing digital
information at an astonishing rate. According to one estimate, in 2010 alone we will generate 1,200
exabytes—60 million times the content of the Library of Congress. Within this deluge of data lies a
wealth of valuable information on how we conduct our businesses, governments, and personal lives.
To put the information to good use, we must find ways to explore, relate, and communicate the data
meaningfully.
The goal of visualization is to aid our understanding of data by leveraging the human visual
system’s highly tuned ability to see patterns, spot trends, and identify outliers. Well-designed visual
representations can replace cognitive calculations with simple perceptual inferences and improve
comprehension, memory, and decision making. By making data more accessible and appealing,
visual representations may also help engage more diverse audiences in exploration and analysis. The
challenge is to create effective and engaging visualizations that are appropriate to the data.
Creating a visualization requires a number of nuanced judgments. One must determine which
questions to ask, identify the appropriate data, and select effective visual encodings to map data values
to graphical features such as position, size, shape, and color. The challenge is that for any given
data set the number of visual encodings—and thus the space of possible visualization designs—is
extremely large. To guide this process, computer scientists, psychologists, and statisticians have
studied how well different encodings facilitate the comprehension of data types such as numbers,
categories, and networks. For example, graphical perception experiments find that spatial position (as
in a scatter plot or bar chart) leads to the most accurate decoding of numerical data and is generally
preferable to visual variables such as angle, one-dimensional length, two-dimensional area, three-
dimensional volume, and color saturation. Thus, it should be no surprise that the most common
data graphics, including bar charts, line charts, and scatter plots, use position encodings. Our
understanding of graphical perception remains incomplete, however, and must appropriately be
balanced with interaction design and aesthetics.
This article provides a brief tour through the “visualization zoo,” showcasing techniques for
visualizing and interacting with diverse data sets. In many situations, simple data graphics will
not only suffice, they may also be preferable. Here we focus on a few of the more sophisticated
and unusual techniques that deal with complex data sets. After all, you don’t go to the zoo to see
Chihuahuas and raccoons; you go to admire the majestic polar bear, the graceful zebra, and the
terrifying Sumatran tiger. Analogously, we cover some of the more exotic (but practically useful!)
forms of visual data representation, starting with one of the most common, time-series data;
continuing on to statistical data and maps; and then completing the tour with hierarchies and
networks. Along the way, bear in mind that all visualizations share a common “DNA” (a set of
mappings between data properties and visual attributes such as position, size, shape, and color) and
that customized species of visualization can always be constructed by varying these encodings.
Most of the visualizations shown here are accompanied by interactive examples. The live
examples were created using Protovis (http://vis.stanford.edu/protovis/), an open source language
for Web-based data visualization. To learn more about how a visualization was made (or to copy and
paste it for your own use), simply “View Source” on the page. All example source code is released
into the public domain and has no restrictions on reuse or modification. Note, however, that these
examples will work only on a modern, standards-compliant browser supporting SVG (scalable
vector graphics). Supported browsers include recent versions of Firefox, Safari, Chrome, and Opera.
Unfortunately, Internet Explorer 8 and earlier versions do not support SVG and so cannot be used to
view the interactive examples.

TIME-SERIES DATA
Time-series data—sets of values changing over time—is one of the most common forms of recorded
data. Time-varying phenomena are central to many domains such as finance (stock prices, exchange
rates), science (temperatures, pollution levels, electric potentials), and public policy (crime rates).
One often needs to compare a large number of time series simultaneously and can choose from a
number of visualizations to do so.

Figure 1A: Index Chart of Selected Technology Stocks, 2000-2010. Series: AAPL, AMZN, GOOG, IBM,
MSFT, S&P 500; y-axis: gain / loss factor (-1.0x to 5.0x); index point: Jan 2005.
Source: Yahoo! Finance; http://hci.stanford.edu/jheer/files/zoo/ex/time/index-chart.html


INDEX CHARTS
With some forms of time-series data, raw values are less important than relative changes. Consider
investors who are more interested in a stock’s growth rate than its specific price. Multiple stocks may
have dramatically different baseline prices but may be meaningfully compared when normalized.
An index chart is an interactive line chart that shows percentage changes for a collection of time-
series data based on a selected index point. For example, the image in figure 1A shows the percentage
change of selected stock prices if purchased in January 2005: one can see the rocky rise enjoyed by
those who invested in Amazon, Apple, or Google at that time.
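
The normalization behind an index chart can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the article's live example: the function name `index_series` and the sample prices are hypothetical, and the gain/loss factor is assumed to be the change relative to the value at the chosen index point.

```python
def index_series(prices, index_point):
    """Return gain/loss factors relative to the price at index_point."""
    base = prices[index_point]
    if base == 0:
        raise ValueError("the value at the index point must be non-zero")
    # 0.0 means unchanged, 1.0 means doubled, -0.5 means halved.
    return [(p - base) / base for p in prices]


# Hypothetical prices, not real market data.
stocks = {"A": [10.0, 20.0, 40.0], "B": [200.0, 100.0, 300.0]}
indexed = {name: index_series(series, 0) for name, series in stocks.items()}
```

Although the two hypothetical stocks differ in price by an order of magnitude, their indexed series share one scale. Moving the index point, as the interactive chart allows, only changes the `index_point` argument.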

STACKED GRAPHS
Other forms of time-series data may be better seen in aggregate. By stacking area charts on top of
each other, we arrive at a visual summation of time-series values—a stacked graph. This type of graph
(sometimes called a stream graph) depicts aggregate patterns and often supports drill-down into a
subset of individual series. The chart in figure 1B shows the number of unemployed workers in the
United States over the past decade, subdivided by industry. While such charts have proven popular
in recent years, they do have some notable limitations. A stacked graph does not support negative
numbers and is meaningless for data that should not be summed (temperatures, for example).

Figure 1B: Stacked Graph of Unemployed U.S. Workers by Industry, 2000-2010. Layers: Agriculture,
Business services, Construction, Education and Health, Finance, Government, Information, Leisure and
hospitality, Manufacturing, Mining and Extraction, Other, Self-employed, Transportation and
Utilities, Wholesale and Retail Trade; x-axis: 2000 to 2010.
Source: U.S. Bureau of Labor Statistics; http://hci.stanford.edu/jheer/files/zoo/ex/time/stack.html


Moreover, stacking may make it difficult to accurately interpret trends that lie atop other curves.
Interactive search and filtering is often used to compensate for this problem.
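
The visual summation in a stacked graph amounts to a running sum across series. The sketch below assumes a simple zero-baseline stack; the function name `stack` is hypothetical, and the check for negative values mirrors the limitation noted above.

```python
def stack(series_list):
    """Return one (baseline, top) pair of lists per series, stacked from zero."""
    n = len(series_list[0])
    baseline = [0.0] * n
    layers = []
    for series in series_list:
        if any(v < 0 for v in series):
            raise ValueError("stacked graphs do not support negative values")
        # Each layer sits on top of the sum of all layers below it.
        top = [b + v for b, v in zip(baseline, series)]
        layers.append((baseline, top))
        baseline = top
    return layers
```

Only the bottom layer has a flat baseline. Every other layer inherits the shape of the layers beneath it, which is why trends in the upper layers are harder to read.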

SMALL MULTIPLES
In lieu of stacking, multiple time series can be plotted within the same axes, as in the index chart.
Placing multiple series in the same space may produce overlapping curves that reduce legibility,
however. An alternative approach is to use small multiples: showing each series in its own chart. In
figure 1C we again see the number of unemployed workers, but normalized within each industry
category. We can now more accurately see both overall trends and seasonal patterns in each sector.
While we are considering time-series data, note that small multiples can be constructed for just
about any type of visualization: bar charts, pie charts, maps, etc. This often produces a more
effective visualization than trying to coerce all the data into a single plot.
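
As a rough sketch of the idea, each series can be scaled independently and assigned its own cell in a grid. The function name `small_multiples` and the choice to normalize by each series' maximum are assumptions made for illustration; the article does not specify the normalization used in figure 1C.

```python
def small_multiples(data, ncols=2):
    """Assign each named series a grid cell and scale it by its own maximum."""
    panels = []
    for i, (name, series) in enumerate(data.items()):
        peak = max(series)
        scaled = [v / peak if peak else 0.0 for v in series]
        panels.append({
            "name": name,
            "row": i // ncols,   # grid position of this panel
            "col": i % ncols,
            "values": scaled,    # each panel uses its own 0..1 scale
        })
    return panels
```

Because every panel has its own scale, shapes can be compared across panels but absolute magnitudes cannot.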

HORIZON GRAPHS
What happens when you want to compare even more time series at once? The horizon graph is a
technique for increasing the data density of a time-series view while preserving resolution. Consider
the four graphs shown in figure 1D. The first one is a standard area chart, with positive values
colored blue and negative values colored red. The second graph “mirrors” negative values into the
same region as positive values, doubling the data density of the area chart. The third chart, a
horizon graph, doubles the data density yet again by dividing the graph into bands and layering
them to create a nested form. The result is a chart that preserves data resolution but uses only a
quarter of the space. Although the horizon graph takes some time to learn, it has been found to be
more effective than the standard plot when the chart sizes get quite small.

Figure 1C: Small Multiples of Unemployed U.S. Workers Normalized by Industry, 2000-2010. Panels:
Self-employed, Agriculture, Other, Leisure and hospitality, Education and Health, Business services,
Finance, Information, Transportation and Utilities, Wholesale and Retail Trade, Manufacturing,
Construction, Mining and Extraction, Government.
Source: U.S. Bureau of Labor Statistics; http://hci.stanford.edu/jheer/files/zoo/ex/time/multiples.html
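
The mirroring and banding steps of the horizon graph can be sketched as follows. This is an illustrative decomposition only: the function name `horizon_bands` and its parameters are hypothetical, and rendering (drawing each band in a progressively darker shade within the same strip) is left out.

```python
def horizon_bands(values, num_bands, max_abs):
    """Split each value into per-band heights after mirroring negatives."""
    band_height = max_abs / num_bands
    result = []
    for v in values:
        magnitude = min(abs(v), max_abs)  # mirror negatives, clamp to range
        heights = [
            # Portion of the magnitude that falls inside band i.
            min(max(magnitude - i * band_height, 0.0), band_height)
            for i in range(num_bands)
        ]
        result.append({"negative": v < 0, "heights": heights})
    return result
```

With two bands plus mirroring, every value fits in a strip one quarter the height of the original signed chart, which matches the space saving described above.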

STATISTICAL DISTRIBUTIONS
Other visualizations have been designed to reveal how a set of numbers is distributed and thus help
an analyst better understand the statistical properties of the data. Analysts often want to fit their
data to statistical models, either to test hypotheses or predict future values, but an improper choice
of model can lead to faulty predictions. Thus, one important use of visualizations is exploratory data
analysis: gaining insight into how data is distributed to inform data transformation and modeling
decisions. Common techniques include the histogram, which shows the prevalence of values grouped
into bins, and the box-and-whisker plot, which can convey statistical features such as the mean,
median, quartile boundaries, or extreme outliers. In addition, a number of other techniques exist for
assessing a distribution and examining interactions between multiple dimensions.

Figure 1D: Horizon Graphs of U.S. Unemployment Rate, 2000-2010.
Source: U.S. Bureau of Labor Statistics; http://hci.stanford.edu/jheer/files/zoo/ex/time/horizon.html


STEM-AND-LEAF PLOTS
For assessing a collection of numbers, one alternative to the histogram is the stem-and-leaf plot. It
typically bins numbers according to the first significant digit, and then stacks the values within each
bin by the second significant digit. This minimalistic representation uses the data itself to paint a
frequency distribution, replacing the “information-empty” bars of a traditional histogram bar chart
and allowing one to assess both the overall distribution and the contents of each bin. In figure 2A,
the stem-and-leaf plot shows the distribution of completion rates of workers completing
crowd-sourced tasks on Amazon’s Mechanical Turk. Note the multiple clusters: one group clusters around
high levels of completion (99-100 percent); at the other extreme is a cluster of Turkers who complete
only a few tasks (~10 percent) in a group.

Figure 2A: Stem-and-Leaf Plot of Mechanical Turk Participation Rates

 0 | 1 1 1 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 5 6 7 8 8 8 8 8 8 9
 1 | 0 0 0 0 1 1 1 1 2 2 3 3 3 3 4 4 4 4 5 5 6 7 7 8 9 9 9 9 9
 2 | 0 0 1 1 1 5 7 8 9
 3 | 0 0 1 2 3 3 3 4 6 6 8 8
 4 | 0 0 1 1 1 1 3 3 4 5 5 5 6 7 8 9
 5 | 0 2 3 5 6 7 7 7 9
 6 | 1 2 6 7 8 9 9 9
 7 | 0 0 0 1 6 7 9
 8 | 0 0 1 2 3 4 4 4 4 4 4 4 5 6 7 7 7 9
 9 | 1 3 3 5 7 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
10 | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Source: Stanford Visualization Group; http://hci.stanford.edu/jheer/files/zoo/ex/stats/stem-and-leaf.html

Figure 2B: Q-Q Plots of Mechanical Turk Participation Rates. Panels: Uniform Distribution, Gaussian
Distribution, Fitted Mixture of 3 Gaussians; y-axis: Turker Task Group Completion % (0% to 100%);
x-axes: 0% to 100%.
Source: Stanford Visualization Group; http://hci.stanford.edu/jheer/files/zoo/ex/stats/qqplot.html
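
The binning behind a stem-and-leaf plot is simple enough to sketch directly. The version below assumes non-negative integers (such as whole percentages), uses the tens as the stem and the ones digit as the leaf, and separates the two with a vertical bar; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return text lines: one stem per line, followed by its sorted leaves."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)  # tens -> stem, ones digit -> leaf
    lines = []
    # Include empty stems so gaps in the distribution stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(d) for d in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines
```

For example, `stem_and_leaf([12, 5, 31, 10, 7, 100])` produces eleven lines, from stem 0 to stem 10, with stems 2 and 4 through 9 left empty.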

Figure 2C: Scatter Plot Matrix of Automobile Data. Variables: horsepower, weight, acceleration,
displacement; color legend: United States, European Union, Japan.
Source: GGobi; http://hci.stanford.edu/jheer/files/zoo/ex/stats/splom.html


Q-Q PLOTS
Though the histogram and the stem-and-leaf plot are common tools for assessing a frequency
distribution, the Q-Q (quantile-quantile) plot is a more powerful tool. The Q-Q plot compares two
probability distributions by graphing their quantiles (http://en.wikipedia.org/wiki/Quantile) against
each other. If the two are similar, the plotted values will lie roughly along the central diagonal. If the
two are linearly related, values will again lie along a line, though with varying slope and intercept.
Figure 2B shows the same Mechanical Turk participation data compared with three statistical
distributions. Note how the data forms three distinct components when compared with uniform
and normal (Gaussian) distributions: this suggests that a statistical model with three components
might be more appropriate, and indeed we see in the final plot that a fitted mixture of three normal
distributions provides a better fit. Though powerful, the Q-Q plot has one obvious limitation in that
its effective use requires that viewers possess some statistical knowledge.
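
To make the construction concrete, the sketch below pairs each sorted observation with the theoretical quantile at the same rank. It is an illustration under stated assumptions: the function name `qq_points` and the sample values are hypothetical, and the plotting position (i + 0.5) / n is one common convention among several.

```python
from statistics import NormalDist


def qq_points(sample, inverse_cdf):
    """Return (theoretical quantile, observed value) pairs for a Q-Q plot."""
    ordered = sorted(sample)
    n = len(ordered)
    # (i + 0.5) / n is an assumed plotting-position convention.
    return [(inverse_cdf((i + 0.5) / n), x) for i, x in enumerate(ordered)]


# Hypothetical completion rates expressed as fractions.
sample = [0.9, 0.1, 0.5, 0.3, 0.7]
uniform_points = qq_points(sample, lambda p: p)                    # Uniform(0, 1)
normal_points = qq_points(sample, NormalDist(0.5, 0.3).inv_cdf)    # Gaussian
```

Plotting each pair as a point gives the Q-Q plot. This evenly spaced sample falls exactly on the diagonal against the uniform distribution and bends away from it against the Gaussian.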

SPLOM (SCATTER PLOT MATRIX)


Other visualization techniques attempt to represent the relationships among multiple variables.
Multivariate data occurs frequently and is notoriously hard to represent, in part because of the
difficulty of mentally picturing data in more than three dimensions. One technique to overcome
this problem is to use small multiples of scatter plots showing a set of pairwise relations among
variables, thus creating the SPLOM (scatter plot matrix). A SPLOM enables visual inspection of
correlations between any pair of variables.

Figure 2D: Parallel Coordinates of Automobile Data. Axes (minimum to maximum): cylinders, 3 to 8;
displacement, 68 to 455 cubic inches; weight, 1613 to 5140 lbs; horsepower, 46 to 230 hp;
acceleration, 8 to 25 (0 to 60 mph); mpg, 9 to 47 miles/gallon; year, 70 to 82.
Source: GGobi; http://hci.stanford.edu/jheer/files/zoo/ex/stats/parallel.html


In figure 2C a scatter plot matrix is used to visualize the attributes of a database of automobiles,
showing the relationships among horsepower, weight, acceleration, and displacement. Additionally,
interaction techniques such as brushing-and-linking—in which a selection of points on one graph
highlights the same points on all the other graphs—can be used to explore patterns within the data.
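
The structure of a SPLOM and the logic of brushing-and-linking can be sketched without any drawing code. Both function names below are hypothetical, and the sketch covers only the off-diagonal panels.

```python
from itertools import permutations


def splom_panels(columns):
    """Return one (x, y) variable pair per off-diagonal panel of the matrix."""
    return list(permutations(columns, 2))


def brush(rows, var, low, high):
    """Return indices of rows whose `var` lies in [low, high].

    Linking means every panel highlights these same row indices,
    whichever pair of variables that panel happens to plot.
    """
    return {i for i, row in enumerate(rows) if low <= row[var] <= high}
```

The selection is a set of row indices rather than screen coordinates. That is what lets a brush drawn in one panel carry over to all the others.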

PARALLEL COORDINATES
Parallel coordinates (||-coord), shown in figure 2D, take a different approach to visualizing
multivariate data. Instead of graphing every pair of variables in two dimensions, we repeatedly plot
the data on parallel axes and then connect the corresponding points with lines. Each poly-line
represents a single row in the database, and line crossings between dimensions often indicate inverse
correlation. Reordering dimensions can aid pattern finding, as can interactive querying to filter
along one or more dimensions. Another advantage of parallel coordinates is that they are relatively
compact, so many variables can be shown simultaneously.
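
A minimal sketch of the underlying transformation: each dimension is scaled to its own [0, 1] axis, and each row becomes a poly-line across the axes. The function name `to_polylines` and the sample rows are hypothetical.

```python
def to_polylines(rows, dims):
    """Return one poly-line per row as a list of (axis_index, height) points."""
    # Each axis is scaled independently to its own minimum and maximum.
    extents = {d: (min(r[d] for r in rows), max(r[d] for r in rows)) for d in dims}
    lines = []
    for r in rows:
        line = []
        for i, d in enumerate(dims):
            lo, hi = extents[d]
            y = (r[d] - lo) / (hi - lo) if hi > lo else 0.5
            line.append((i, y))
        lines.append(line)
    return lines


# Hypothetical cars: the heavier car has the lower mileage.
cars = [{"weight": 2000, "mpg": 40}, {"weight": 4000, "mpg": 10}, {"weight": 3000, "mpg": 25}]
polylines = to_polylines(cars, ["weight", "mpg"])
```

Here the first line runs from the bottom of the weight axis to the top of the mpg axis and the second does the opposite. The two lines cross, which is the visual signature of inverse correlation mentioned above.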

MAPS
Although a map may seem a natural way to visualize geographical data, it has a long and rich history
of design. Many maps are based upon a cartographic projection: a mathematical function that maps
the three-dimensional geometry of the Earth to a two-dimensional image. Other maps knowingly
distort or abstract geographic features to tell a richer story or highlight specific data.

Figure 3A: Flow Map of Napoleon’s March on Moscow, based on the work of Charles Minard. Date labels
run from 18 Oct to 07 Dec; temperature scale marks: -10°, -20°, -30°. Map data ©2010 Geocentre
Consulting, PPWK, Tele Atlas.
http://hci.stanford.edu/jheer/files/zoo/ex/maps/napoleon.html


FLOW MAPS
By placing stroked lines on top of a geographic map, a flow map can depict the movement of a
quantity in space and (implicitly) in time. Flow lines typically encode a large amount of multivariate
information: path points, direction, line thickness, and color can all be used to present dimensions
of information to the viewer. Figure 3A is a modern interpretation of Charles Minard’s depiction of
Napoleon’s ill-fated march on Moscow. Many of the greatest flow maps also involve subtle uses of
distortion, as geography is modified to accommodate or highlight flows.

CHOROPLETH MAPS
Data is often collected and aggregated by geographical areas such as states. A standard approach to
communicating this data is to use a color encoding of the geographic area, resulting in a choropleth
map. Figure 3B uses a color encoding to communicate the prevalence of obesity in each state in the
U.S. Though this is a widely used visualization technique, it requires some care. One common error
is to encode raw data values (such as population) rather than using normalized values to produce a
density map. Another issue is that one’s perception of the shaded value can also be affected by the
underlying area of the geographic region.
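
The normalization step can be sketched as follows: raw counts are divided by population before being assigned to a color class. The function name `choropleth_classes` and the sample figures are hypothetical. The default class boundaries (seven classes, three percentage points wide, starting at 14 percent) are an assumption taken from the legend of the obesity map.

```python
def choropleth_classes(counts, populations, lower=14.0, width=3.0, num_classes=7):
    """Map each region to a color-class index based on its rate, not its raw count."""
    classes = {}
    for region, count in counts.items():
        rate = 100.0 * count / populations[region]  # normalize to a percentage
        index = int((rate - lower) // width)
        # Clamp rates outside the legend to the first or last class.
        classes[region] = max(0, min(num_classes - 1, index))
    return classes
```

With this binning, a region at 20 percent falls into class 2 (the 20 - 23% class) whatever its population.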

GRADUATED SYMBOL MAPS


An alternative to the choropleth map is the graduated symbol map, which places symbols over an
underlying map. This approach avoids confounding geographic area with data values and allows for
more dimensions to be visualized (e.g., symbol size, shape, and color). In addition to simple shapes
such as circles, graduated symbol maps may use more complicated glyphs such as pie charts. In
figure 3C, total circle size represents a state’s population, and each slice indicates the proportion of
people with a specific BMI rating.

Figure 3B: Choropleth Map of Obesity in the U.S., 2008. Color classes: 14 - 17%, 17 - 20%, 20 - 23%,
23 - 26%, 26 - 29%, 29 - 32%, 32 - 35%; states labeled by postal abbreviation.
Source: National Center for Chronic Disease Prevention and Health Promotion;
http://hci.stanford.edu/jheer/files/zoo/ex/maps/choropleth.html

CARTOGRAMS
A cartogram distorts the shape of geographic regions so that the area directly encodes a data variable.
A common example is to redraw every country in the world sizing it proportionally to population or
gross domestic product. Many types of cartograms have been created; in figure 3D we use the Dorling
cartogram, which represents each geographic region with a sized circle, placed so as to resemble the
true geographic configuration. In this example, circular area encodes the total number of obese
people per state, and color encodes the percentage of the total population that is obese.
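
When circular area encodes a value, as in graduated symbol maps and Dorling cartograms, the radius must grow with the square root of the value. The helper below is a minimal sketch with hypothetical names. It covers only the sizing step, not the placement of circles so that they resemble the true geography.

```python
import math


def radius_for(value, max_value, max_radius):
    """Return a radius such that circle area is proportional to value."""
    # Area grows with radius squared, so take the square root of the ratio.
    return max_radius * math.sqrt(value / max_value)
```

For example, a region with one quarter of the maximum value gets half the maximum radius, and therefore one quarter of the maximum area. Scaling the radius linearly instead would exaggerate large values.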

HIERARCHIES
While some data is simply a flat collection of numbers, most can be organized into natural
hierarchies. Consider: spatial entities, such as counties, states, and countries; command structures
for businesses and governments; software packages and phylogenetic trees. Even for data with no
apparent hierarchy, statistical methods (e.g., k-means clustering) may be applied to organize data
empirically. Special visualization techniques exist to leverage hierarchical structure, allowing rapid
multiscale inferences: micro-observations of individual elements and macro-observations of large
groups.

Figure 3C: Graduated Symbol Map of Obesity in the U.S., 2008. Pie slices: Normal, Overweight, Obese.
Source: National Center for Chronic Disease Prevention and Health Promotion;
http://hci.stanford.edu/jheer/files/zoo/ex/maps/symbol.html


Figure 3D: Dorling Cartogram of Obesity in the U.S., 2008. Circle-size legend: 100K, 1M, 5M, 10M;
color classes: 14 - 17%, 17 - 20%, 20 - 23%, 23 - 26%, 26 - 29%, 29 - 32%, 32 - 35%.
Source: National Center for Chronic Disease Prevention and Health Promotion;
http://hci.stanford.edu/jheer/files/zoo/ex/maps/cartogram.html

Figure 4A: Radial Node-link Diagram of the Flare Package Hierarchy.
Source: Flare Visualization Toolkit (http://flare.prefuse.org);
http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/tree.html

NODE-LINK DIAGRAMS
The word tree is used interchangeably with hierarchy, as the fractal branches of an oak might mirror
the nesting of data. If we take a two-dimensional blueprint of a tree, we have a popular choice for
visualizing hierarchies: a node-link diagram. Many different tree-layout algorithms have been
designed; the Reingold-Tilford algorithm, used in figure 4A on a package hierarchy of software
classes, produces a tidy result with minimal wasted space.
An alternative visualization scheme is the dendrogram (or cluster) algorithm, which places leaf
nodes of the tree at the same level. Thus, in the diagram in figure 4B, the classes (orange leaf nodes)
are on the circumference of the circle, with the packages (blue internal nodes) inside. Using polar
rather than Cartesian coordinates has a pleasing aesthetic, while using space more efficiently.

Figure 4B: Cartesian Node-link Diagram of the Flare Package Hierarchy.
Source: Flare Visualization Toolkit (http://flare.prefuse.org);
http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/cluster-radial.html

Figure 4C: Indented Tree Layout of the Flare Package Hierarchy

flare 933KB
  analytics 47KB
    cluster 14KB
    graph 25KB
    optimization 6KB
  animate 97KB
  data 29KB
  display 23KB
  flex 4KB
  physics 29KB
    DragForce 1KB
    GravityForce 1KB
    IForce 0KB
    NBodyForce 10KB
    Particle 2KB
    Simulation 9KB
    Spring 2KB
    SpringForce 1KB
  query 87KB
  scale 30KB
  util 161KB
  vis 422KB
    Visualization 16KB
    axis 33KB
    controls 43KB
    data 107KB
    events 6KB
    legend 35KB
    operator 179KB
      IOperator 1KB
      Operator 2KB
      OperatorList 5KB
      OperatorSequence 4KB
      OperatorSwitch 2KB
      SortOperator 1KB
      distortion 13KB
      encoder 14KB
      filter 11KB
      label 16KB
      layout 105KB
        AxisLayout 6KB
        BundledEdgeRouter 3KB
        CircleLayout 9KB
        CirclePackingLayout 11KB
        DendrogramLayout 4KB
        ForceDirectedLayout 8KB
        IcicleTreeLayout 4KB
        IndentedTreeLayout 3KB
        Layout 7KB
        NodeLinkTreeLayout 12KB
        PieLayout 2KB
        RadialTreeLayout 12KB
        RandomLayout 0KB
        StackedAreaLayout 8KB
        TreeMapLayout 8KB

Source: Flare Visualization Toolkit (http://flare.prefuse.org);
http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/indent.html

We would be remiss to overlook the indented tree, used ubiquitously by operating systems to
represent file directories, among other applications (see figure 4C). Although the indented tree
requires excessive vertical space and does not facilitate multiscale inferences, it does allow efficient
interactive exploration of the tree to find a specific node. In addition, it allows rapid scanning of node
labels, and multivariate data such as file size can be displayed adjacent to the hierarchy.
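
An indented tree is straightforward to generate with a depth-first traversal. The sketch below assumes a hypothetical nested-dictionary format with `name`, optional `size` (in KB), and optional `children` keys. It prints each node's aggregated size next to its label, as in the file-size example above.

```python
def total_size(node):
    """Return the node's own size plus the sizes of all its descendants."""
    children = node.get("children", [])
    return node.get("size", 0) + sum(total_size(c) for c in children)


def indented_lines(node, depth=0):
    """Return one text line per node, indented two spaces per level."""
    lines = [f"{'  ' * depth}{node['name']} {total_size(node)}KB"]
    for child in node.get("children", []):
        lines.extend(indented_lines(child, depth + 1))
    return lines
```

Each node occupies one line, which is why the layout consumes vertical space quickly on large trees.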

ADJACENCY DIAGRAMS
The adjacency diagram is a space-filling variant of the node-link diagram; rather than drawing a link
between parent and child in the hierarchy, nodes are drawn as solid areas (either arcs or bars), and
their placement relative to adjacent nodes reveals their position in the hierarchy. The icicle layout in
figure 4D is similar to the first node-link diagram in that the root node appears at the top, with child
nodes underneath. Because the nodes are now space-filling, however, we can use a length encoding
for the size of software classes and packages. This reveals an additional dimension that would be
difficult to show in a node-link diagram.
The sunburst layout, shown in figure 4E, is equivalent to the icicle layout, but in polar
coordinates. Both are implemented using a partition layout, which can also generate a node-link
diagram. Similarly, the previous cluster layout can be used to generate a space-filling adjacency
diagram in either Cartesian or polar coordinates.
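
The relationship between the icicle and sunburst layouts can be sketched with a small partition routine. This is an illustration under assumptions, not the layout code used for the figures: the names `size_of`, `partition`, and `to_arc` are hypothetical, as is the nested-dictionary tree format.

```python
import math


def size_of(node):
    """Return a leaf's size, or the sum of its children's sizes."""
    children = node.get("children", [])
    return sum(size_of(c) for c in children) if children else node.get("size", 0)


def partition(node, x0=0.0, x1=1.0, depth=0, out=None):
    """Give each node an extent [x0, x1] proportional to its size; depth is its row."""
    if out is None:
        out = []
    out.append((node["name"], depth, x0, x1))
    children = node.get("children", [])
    total = sum(size_of(c) for c in children)
    x = x0
    for c in children:
        w = (x1 - x0) * size_of(c) / total if total else 0.0
        partition(c, x, x + w, depth + 1, out)
        x += w
    return out


def to_arc(x0, x1, depth):
    """Reinterpret an icicle cell as a sunburst arc: extent -> angle, row -> ring."""
    return (2 * math.pi * x0, 2 * math.pi * x1, depth, depth + 1)
```

Drawing each tuple from `partition` as a rectangle gives an icicle layout. Passing the same extents through `to_arc` gives the start angle, end angle, inner radius, and outer radius of the corresponding sunburst arc.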

Figure 4D: Icicle Tree Layout of the Flare Package Hierarchy.
Source: Flare Visualization Toolkit (http://flare.prefuse.org);
http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/icicle.html


ENCLOSURE DIAGRAMS
The enclosure diagram is also space filling, using containment rather than adjacency to represent
the hierarchy. Introduced by Ben Shneiderman in 1991, a treemap recursively subdivides area into
rectangles. As with adjacency diagrams, the size of any node in the tree is quickly revealed. The
example shown in figure 4F uses padding (in blue) to emphasize enclosure; an alternative saturation
encoding is sometimes used. Squarified treemaps use approximately square rectangles, which offer
better readability and size estimation than a naive “slice-and-dice” subdivision. Fancier algorithms
such as Voronoi and jigsaw treemaps also exist but are less common.

Figure 4E: Sunburst (Radial Space-filling) Layout of the Flare Package Hierarchy.
Source: Flare Visualization Toolkit (http://flare.prefuse.org);
http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/sunburst.html

By packing circles instead of subdividing rectangles, we can produce a different sort of enclosure
diagram that has an almost organic appearance. Although it does not use space as efficiently as a
treemap, the “wasted space” of the circle-packing layout, shown in figure 4G, effectively reveals the
hierarchy. At the same time, node sizes can be rapidly compared using area judgments.
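
To make the recursive subdivision concrete, the following is a minimal Python sketch of the naive slice-and-dice treemap mentioned above. It is not the squarified algorithm and not the code behind figure 4F; the `Node` class, the function name, and the sample tree are illustrative assumptions, and padding is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical hierarchy node: leaves carry a size, parents sum their children."""
    name: str
    size: float = 0.0
    children: list[Node] = field(default_factory=list)

    def total(self) -> float:
        if not self.children:
            return self.size
        return sum(child.total() for child in self.children)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Assign every node a rectangle (x, y, w, h) nested inside its parent's.

    The cut direction alternates with depth: even depths split the width,
    odd depths split the height. Each child receives a share of the parent
    rectangle proportional to its total size. Node names are assumed unique.
    """
    if out is None:
        out = {}
    out[node.name] = (x, y, w, h)
    total = node.total()
    if not node.children or total == 0:
        return out
    offset = 0.0  # fraction of the parent already handed to earlier children
    for child in node.children:
        share = child.total() / total
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


# Illustrative data only; not taken from the Flare package hierarchy.
tree = Node("root", children=[
    Node("a", children=[Node("a1", 1), Node("a2", 3)]),
    Node("b", 4),
])
rects = slice_and_dice(tree, 0, 0, 8, 4)
for name, rect in rects.items():
    print(name, rect)
```

Each leaf's area is proportional to its size, but deep or unbalanced trees quickly produce long, thin rectangles, which is the readability problem that squarified treemaps are designed to reduce.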

Treemap Layout of the Flare Package Hierarchy
Source: Flare Visualization Toolkit (http://flare.prefuse.org)
http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/treemap.html

NETWORKS
In addition to organization, one aspect of data that we may wish to explore through visualization is
relationship. For example, given a social network, who is friends with whom? Who are the central
players? What cliques exist? Who, if anyone, serves as a bridge between disparate groups? Abstractly,
a hierarchy is a specialized form of network: each node has exactly one link to its parent, while the
root node has no links. Thus node-link diagrams are also used to visualize networks, but the loss of
hierarchy means a different algorithm is required to position nodes.
Mathematicians use the formal term graph to describe a network. A central challenge in graph
visualization is computing an effective layout. Layout techniques typically seek to position closely
related nodes (in terms of graph distance, such as the number of links between nodes, or other
metrics) close in the drawing; critically, unrelated nodes must also be placed far enough apart
to differentiate relationships. Some techniques may seek to optimize other visual features, for
example, by minimizing the number of edge crossings.
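
As an illustration of graph distance measured as the number of links between nodes, here is a small Python sketch that uses breadth-first search. The function name and the sample network are hypothetical, and a layout algorithm may just as well use another metric.

```python
from collections import deque


def graph_distances(adjacency, source):
    """Return the number of links on a shortest path from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:  # first visit is along a shortest path
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


# Hypothetical undirected network stored as neighbor lists.
network = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}
distances = graph_distances(network, "A")
print(distances)
```

A layout that respects these distances would draw B and C near A, and E farthest away.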

Nested Circles Layout of the Flare Package Hierarchy
Source: Flare Visualization Toolkit (http://flare.prefuse.org)
http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/pack.html

FORCE-DIRECTED LAYOUTS
A common and intuitive approach to network layout is to model the graph as a physical system:
nodes are charged particles that repel each other, and links are dampened springs that pull
related nodes together. A physical simulation of these forces then determines the node positions;
approximation techniques that avoid computing all pairwise forces enable the layout of large
numbers of nodes. In addition, interactivity allows the user to direct the layout and jiggle nodes
to disambiguate links. Such a force-directed layout is a good starting point for understanding the
structure of a general undirected graph. In figure 5A we use a force-directed layout to view the
network of character co-occurrence in the chapters of Victor Hugo’s classic novel, Les Misérables.
Node colors depict cluster memberships computed by a community-detection algorithm.
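
The physical model can be sketched in a few lines of Python. This is a simplified illustration, not the simulation used for figure 5A: it computes every pairwise repulsion directly rather than using an approximation technique, and the function name, force constants, damping factor, and speed cap are all assumptions chosen for the example.

```python
import math
import random


def force_layout(nodes, links, steps=300, repulsion=0.05, spring=0.1,
                 rest_length=1.0, damping=0.85, max_speed=0.5, seed=0):
    """Position nodes by simulating pairwise repulsion and spring attraction.

    All constants are illustrative assumptions. Every pair of nodes is
    visited on each step (O(n^2)), so this only suits small graphs.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    vel = {n: [0.0, 0.0] for n in nodes}

    for _ in range(steps):
        force = {n: [0.0, 0.0] for n in nodes}

        # Charged particles: each pair repels with an inverse-square force.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = repulsion / (dist * dist)
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist

        # Springs: each link moves its endpoints toward the rest length.
        for a, b in links:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = spring * (dist - rest_length)
            force[a][0] += pull * dx / dist
            force[a][1] += pull * dy / dist
            force[b][0] -= pull * dx / dist
            force[b][1] -= pull * dy / dist

        # Damped integration, with a speed cap to keep the simulation stable.
        for n in nodes:
            vx = (vel[n][0] + force[n][0]) * damping
            vy = (vel[n][1] + force[n][1]) * damping
            speed = math.hypot(vx, vy)
            if speed > max_speed:
                vx, vy = vx * max_speed / speed, vy * max_speed / speed
            vel[n] = [vx, vy]
            pos[n][0] += vx
            pos[n][1] += vy
    return pos


# Hypothetical three-node path: A is linked to B, and B is linked to C.
layout = force_layout(["A", "B", "C"], [("A", "B"), ("B", "C")])
for name, (x, y) in layout.items():
    print(name, round(x, 2), round(y, 2))
```

In this small example the springs keep linked nodes near the rest length, while repulsion pushes the unlinked pair A and C apart, so the path tends to straighten out.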

ARC DIAGRAMS
An arc diagram, shown in figure 5B, uses a one-dimensional layout of nodes, with circular arcs to
represent links. Though an arc diagram may not convey the overall structure of the graph as effectively
as a two-dimensional layout, with a good ordering of nodes it is easy to identify cliques and bridges.
Further, as with the indented-tree layout, multivariate data can easily be displayed alongside nodes. The
problem of sorting the nodes in a manner that reveals underlying cluster structure is formally called
seriation and has diverse applications in visualization, statistics, and even archaeology.
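
To show why node ordering matters, the sketch below computes the semicircular arcs for a given one-dimensional ordering and sums their spans. The total span is only a crude, assumed proxy for ordering quality, not a formal seriation method and not the procedure used for figure 5B; the function names and the sample graph are hypothetical.

```python
def arc_geometry(order, links, spacing=1.0):
    """Return (center_x, radius) of the semicircular arc drawn for each link."""
    index = {node: i for i, node in enumerate(order)}
    arcs = []
    for a, b in links:
        xa, xb = index[a] * spacing, index[b] * spacing
        arcs.append(((xa + xb) / 2, abs(xa - xb) / 2))
    return arcs


def total_span(order, links, spacing=1.0):
    """Sum of arc diameters: smaller values mean linked nodes sit closer together."""
    return sum(2 * radius for _, radius in arc_geometry(order, links, spacing))


# Two hypothetical cliques, {A, B, C} and {D, E, F}, joined by the bridge C-D.
links = [("A", "B"), ("A", "C"), ("B", "C"),
         ("C", "D"),
         ("D", "E"), ("D", "F"), ("E", "F")]

grouped = ["A", "B", "C", "D", "E", "F"]      # cliques kept together
interleaved = ["A", "D", "B", "E", "C", "F"]  # cliques mixed

print(total_span(grouped, links), total_span(interleaved, links))
```

With the grouped ordering, each clique appears as a tight bundle of short arcs and the bridge as a single arc between the bundles; the interleaved ordering stretches every arc and hides that structure.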

Source: Knuth, D. E. 1993. The Stanford GraphBase: A Platform for Combinatorial Computing, Addison-Wesley.
http://hci.stanford.edu/jheer/files/zoo/ex/networks/force.html


MATRIX VIEWS
Mathematicians and computer scientists often think of a graph in terms of its adjacency matrix: each
value in row i and column j in the matrix corresponds to the link from node i to node j. Given this
representation, an obvious visualization then is: just show the matrix! Using color or saturation
instead of text allows values associated with the links to be perceived more rapidly.
The seriation problem applies just as much to the matrix view, shown in figure 5C, as to the arc
diagram, so the order of rows and columns is important: here we use the groupings generated by
a community-detection algorithm to order the display. While path following is harder in a matrix
view than in a node-link diagram, matrices have a number of compensating advantages. As networks
get large and highly connected, node-link diagrams often devolve into giant hairballs of line
crossings. In matrix views, however, line crossings are impossible, and with an effective sorting one
can quickly spot clusters and bridges. Allowing interactive grouping and reordering of the matrix
facilitates even deeper exploration of network structure.
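
As a rough sketch of this idea, the Python below builds an adjacency matrix whose rows and columns follow a community ordering. The group labels here are supplied by hand as a stand-in for the output of a community-detection algorithm, and the names, weights, and text rendering are illustrative assumptions.

```python
def adjacency_matrix(order, links):
    """Build a symmetric matrix for an undirected graph; cell (i, j) holds the link value."""
    index = {node: i for i, node in enumerate(order)}
    size = len(order)
    matrix = [[0] * size for _ in range(size)]
    for a, b, weight in links:
        i, j = index[a], index[b]
        matrix[i][j] = weight
        matrix[j][i] = weight  # undirected: mirror across the diagonal
    return matrix


# Hypothetical community labels and weighted links.
groups = {"A": 0, "B": 1, "C": 0, "D": 1}
links = [("A", "C", 3), ("B", "D", 2), ("C", "D", 1)]

# Order rows and columns by community, then by name within a community.
order = sorted(groups, key=lambda node: (groups[node], node))
matrix = adjacency_matrix(order, links)

print("  " + " ".join(order))
for name, row in zip(order, matrix):
    print(name + " " + " ".join(str(value) if value else "." for value in row))
```

Sorting by community places each group's links in a block along the diagonal, while the C-D link between the two groups falls outside those blocks and stands out as a bridge. A graphical version would map each value to color or saturation instead of printing it.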

Source: Knuth, D. E. 1993. The Stanford GraphBase: A Platform for Combinatorial Computing, Addison-Wesley.
http://hci.stanford.edu/jheer/files/zoo/ex/networks/arc.html

Source: Knuth, D. E. 1993. The Stanford GraphBase: A Platform for Combinatorial Computing, Addison-Wesley.
http://hci.stanford.edu/jheer/files/zoo/ex/networks/matrix.html

CONCLUSION
We have arrived at the end of our tour and hope that the reader has found examples both intriguing
and practical. Though we have visited a number of visual encoding and interaction techniques,
many more species of visualization exist in the wild, and others await discovery. Emerging domains
such as bioinformatics and text visualization are driving researchers and designers to continually
formulate new and creative representations or find more powerful ways to apply the classics. In
either case, the DNA underlying all visualizations remains the same: the principled mapping of data
variables to visual features such as position, size, shape, and color. As you leave the zoo and head
back into the wild, try deconstructing the various visualizations crossing your path. Perhaps you can
design a more effective display?

ADDITIONAL RESOURCES
Few, S. 2009. Now I See It: Simple Visualization Techniques for Quantitative Analysis. Analytics Press.
Tufte, E. 1983. The Visual Display of Quantitative Information. Graphics Press.
Tufte, E. 1990. Envisioning Information. Graphics Press.
Ware, C. 2008. Visual Thinking for Design. Morgan Kaufmann.
Wilkinson, L. 1999. The Grammar of Graphics. Springer.

VISUALIZATION DEVELOPMENT TOOLS
Prefuse (http://prefuse.org/): Java API for information visualization.
Prefuse Flare (http://flare.prefuse.org/): ActionScript 3 library for data visualization in the Adobe
Flash Player.
Processing (http://processing.org/): Popular language and IDE for graphics and interaction.
Protovis (http://vis.stanford.edu/protovis/): JavaScript tool for Web-based visualization.
The Visualization Toolkit (http://vtk.org/): Library for 3D and scientific visualization.


JEFFREY HEER is an assistant professor of computer science at Stanford University, where he works
on human-computer interaction, visualization, and social computing. His research investigates the
perceptual, cognitive, and social factors involved in making sense of large data collections, resulting
in new interactive systems for visual analysis and communication. He has also led the design of the
Prefuse, Flare, and Protovis visualization toolkits, in use by researchers, corporations, and thousands of
data enthusiasts. Heer is the recipient of the 2009 ACM CHI Best Paper Award and Faculty Awards from
IBM and Intel. In 2009 he was named to MIT Technology Review’s TR35. He holds B.S., M.S., and Ph.D.
degrees in Computer Science from the University of California, Berkeley.

MICHAEL BOSTOCK received the BSE degree in computer science in 2000 from Princeton University. He
is currently a Ph.D. student in the Department of Computer Science at Stanford University. His research
interests include information visualization and software design. Before joining Stanford, he was a staff
engineer at Google, where he developed search quality evaluation methodologies, experimental search
user interfaces, and reusable software components such as the Google Collections Library. He is currently
working on the Protovis visualization toolkit.

VADIM OGIEVETSKY is a master's student at Stanford University specializing in Human-Computer
Interaction. He is a core contributor to Protovis, an open-source web-based visualization toolkit.
Ogievetsky received a First Class BA degree in Mathematics and Computer Science from the University of
Oxford, where he specialized in linear algebra and programming languages. In addition to visualization,
his interests include massively parallel computing and computer-controlled manufacturing processes.
© 2010 ACM 1542-7730/10/0500 $10.00
