STHDA

Statistical tools for high-throughput data analysis

Beautiful dendrogram visualizations in R: 5+ must-know methods - Unsupervised Machine Learning


1 plot.hclust(): R base function
2 plot.dendrogram() function
3 Phylogenetic trees
4 ggdendro package: ggplot2 and dendrogram
4.1 Installation and loading
4.2 Visualize dendrogram using ggdendrogram() function
4.3 Extract dendrogram plot data
5 dendextend package: Extending R's dendrogram functionality
5.1 Chaining
5.2 Installation and loading
5.3 How to change a dendrogram
5.4 Create a simple dendrogram
5.5 Change labels
5.6 Change the points of a dendrogram nodes/leaves
5.7 Change the color of branches
5.8 Adding colored rectangles
5.9 Adding colored bars
5.10 ggplot2 integration
5.11 pvclust and dendextend
6 Infos

A variety of functions exist in R for visualizing and customizing dendrograms. The aim of this article is to describe 5+ methods for drawing a
beautiful dendrogram using R software.
We start by computing hierarchical clustering using the data set USArrests:

# Load data
data(USArrests)
# Compute distances and hierarchical clustering
dd <- dist(scale(USArrests), method = "euclidean")
hc <- hclust(dd, method = "ward.D2")

1 plot.hclust(): R base function

As you already know, the standard R function plot.hclust() can be used to draw a dendrogram from the results of hierarchical clustering analyses
(computed using hclust() function).
A simplified format is:

plot(x, labels = NULL, hang = 0.1,
     main = "Cluster dendrogram", sub = NULL,
     xlab = NULL, ylab = "Height", ...)

x: an object of the type produced by hclust()
labels: a character vector of labels for the leaves of the tree. The default is the row names of the original data. If labels = FALSE, no labels
are drawn.
hang: the fraction of the plot height by which labels should hang below the rest of the plot. A negative value will cause the labels to hang
down from 0.
main, sub, xlab, ylab: character strings for the title, the subtitle and the axis labels.

# Default plot
plot(hc)

# Put the labels at the same height: hang = -1
plot(hc, hang = -1, cex = 0.6)
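
The labels, main, sub, xlab and ylab arguments described above are not exercised by these two calls. A minimal sketch, assuming the same USArrests clustering; the abbreviated labels and the title strings are illustrative choices and not part of the original example:

```r
# Recompute the clustering so the snippet is self-contained
data(USArrests)
hc <- hclust(dist(scale(USArrests), method = "euclidean"), method = "ward.D2")

# Custom leaf labels: abbreviated state names (illustrative choice)
short_labels <- unname(abbreviate(rownames(USArrests), minlength = 4))

plot(hc, labels = short_labels, hang = -1, cex = 0.6,
     main = "USArrests: Ward clustering",  # illustrative title
     sub = "", xlab = "", ylab = "Height")

# labels = FALSE suppresses the leaf labels entirely
plot(hc, labels = FALSE, hang = -1, main = "No leaf labels")
```

Passing empty strings for sub and xlab removes the default annotation that plot.hclust() prints below the tree.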

2 plot.dendrogram() function

To visualize the result of a hierarchical clustering analysis using the function plot.dendrogram(), we must first convert it to a
dendrogram object with as.dendrogram().
The format of the function plot.dendrogram() is:

plot(x, type = c("rectangle", "triangle"), horiz = FALSE)

x: an object of class dendrogram
type: the type of plot. Possible values are "rectangle" or "triangle"
horiz: logical indicating whether the dendrogram should be drawn horizontally or not

# Convert hclust into a dendrogram and plot
hcd <- as.dendrogram(hc)
# Default plot
plot(hcd, type = "rectangle", ylab = "Height")
# Triangle plot
plot(hcd, type = "triangle", ylab = "Height")

# Zoom in to the first dendrogram
plot(hcd, xlim = c(1, 20), ylim = c(1, 8))
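
Zooming with xlim and ylim only changes the visible window. As an alternative sketch (not part of the original tutorial), the base R function cut() splits a dendrogram at a given height into an upper tree and a list of lower branches that can be plotted separately. The cut height of 6 used below is an arbitrary value chosen for illustration:

```r
data(USArrests)
hc <- hclust(dist(scale(USArrests)), method = "ward.D2")
hcd <- as.dendrogram(hc)

# Split the tree at height 6 (arbitrary value chosen for illustration)
parts <- cut(hcd, h = 6)

# parts$upper is the tree above the cut;
# parts$lower is a list of the sub-dendrograms below it
plot(parts$upper, main = "Upper tree (cut at h = 6)")
plot(parts$lower[[1]], main = "First branch below the cut")
```

Each element of parts$lower is itself a dendrogram, so all the plotting arguments of this section apply to it.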

The plot can be customized using the following arguments of plot.dendrogram():
nodePar: a list of plotting parameters to use for the nodes (see ?points). Default value is NULL. The list may contain components named pch,
cex, col, xpd, and/or bg, each of which can have length two for specifying separate attributes for inner nodes and leaves.
edgePar: a list of plotting parameters to use for the edge segments (see ?segments). The list may contain components named col, lty and
lwd (for the segments). As with nodePar, each can have length two for differentiating leaves and inner nodes.
leaflab: a string specifying how leaves are labeled. The default "perpendicular" writes text vertically; "textlike" writes text horizontally (in a
rectangle), and "none" suppresses leaf labels.

# Define nodePar
nodePar <- list(lab.cex = 0.6, pch = c(NA, 19),
                cex = 0.7, col = "blue")
# Customized plot; remove labels
plot(hcd, ylab = "Height", nodePar = nodePar, leaflab = "none")
# Horizontal plot
plot(hcd, xlab = "Height",
     nodePar = nodePar, horiz = TRUE)

# Change edge color
plot(hcd, xlab = "Height", nodePar = nodePar,
     edgePar = list(col = 2:3, lwd = 2:1))
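
nodePar and edgePar, as passed to plot(), apply the same settings to every leaf or to every inner node. To style leaves individually, one option in base R is to walk the tree with dendrapply() and set a nodePar attribute on each leaf. The following is a minimal sketch, assuming a cut into 4 clusters; the palette and the helper name color_leaf are illustrative and not part of the original tutorial:

```r
data(USArrests)
hc <- hclust(dist(scale(USArrests)), method = "ward.D2")
hcd <- as.dendrogram(hc)

# Cluster membership per state (named integer vector) and one color per cluster
clusters <- cutree(hc, k = 4)
palette4 <- c("red", "blue", "darkgreen", "black")

# Hypothetical helper: attach a label color to each leaf, leave inner nodes untouched
color_leaf <- function(node) {
  if (is.leaf(node)) {
    cl <- clusters[[attr(node, "label")]]
    attr(node, "nodePar") <- list(lab.col = palette4[cl], lab.cex = 0.6, pch = NA)
  }
  node
}

hcd_colored <- dendrapply(hcd, color_leaf)
plot(hcd_colored, ylab = "Height")
```

The per-leaf nodePar attribute is read by plot.dendrogram(), so no extra plotting argument is needed.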

3 Phylogenetic trees
The package ape (Analyses of Phylogenetics and Evolution) can be used to produce a more sophisticated dendrogram.
The function plot.phylo() can be used for plotting a dendrogram. A simplified format is:

plot(x, type = "phylogram", show.tip.label = TRUE,
     edge.color = "black", edge.width = 1, edge.lty = 1,
     tip.color = "black")

x: an object of class phylo
type: the type of phylogeny to be drawn. Possible values are "phylogram" (the default), "cladogram", "fan", "unrooted" and "radial"
show.tip.label: if TRUE, the tip labels are shown
edge.color, edge.width, edge.lty: line color, width and type to be used for the edges
tip.color: color used for the tip labels

# install.packages("ape")
library("ape")
# Default plot
plot(as.phylo(hc), cex = 0.6, label.offset = 0.5)

# Cladogram
plot(as.phylo(hc), type = "cladogram", cex = 0.6,
     label.offset = 0.5)
# Unrooted
plot(as.phylo(hc), type = "unrooted", cex = 0.6,
     no.margin = TRUE)

# Fan
plot(as.phylo(hc), type = "fan")
# Radial
plot(as.phylo(hc), type = "radial")

# Cut the dendrogram into 4 clusters
colors <- c("red", "blue", "green", "black")
clus4 <- cutree(hc, 4)
plot(as.phylo(hc), type = "fan", tip.color = colors[clus4],
     label.offset = 1, cex = 0.7)
# Change the appearance
# Change edge and label (tip)
plot(as.phylo(hc), type = "cladogram", cex = 0.6,
     edge.color = "steelblue", edge.width = 2, edge.lty = 2,
     tip.color = "steelblue")

4 ggdendro package: ggplot2 and dendrogram

The R package ggdendro can be used to extract the plot data from a dendrogram and to draw a dendrogram using ggplot2.

4.1 Installation and loading

ggdendro can be installed as follows:

install.packages("ggdendro")

ggdendro requires the package ggplot2. Make sure that ggplot2 is installed and loaded before using ggdendro.
Load ggdendro as follows:

library("ggplot2")
library("ggdendro")

4.2 Visualize dendrogram using ggdendrogram() function

The function ggdendrogram() creates a dendrogram plot using ggplot2.

# Visualization using the default theme named theme_dendro()
ggdendrogram(hc)

# Rotate the plot and remove the default theme
ggdendrogram(hc, rotate = TRUE, theme_dendro = FALSE)

4.3 Extract dendrogram plot data

The function dendro_data() can be used for extracting the plot data. It returns a list of data frames which can be accessed using the functions below:
segment(): to extract the data for the dendrogram line segments
label(): to extract the labels

# Build a dendrogram object from the hclust results
dend <- as.dendrogram(hc)
# Extract the data (for rectangular lines)
# Type can be "rectangle" or "triangle"
dend_data <- dendro_data(dend, type = "rectangle")
# What does dend_data contain?
names(dend_data)

## [1] "segments"    "labels"      "leaf_labels" "class"

# Extract data for line segments
head(dend_data$segments)

##           x         y     xend      yend
## 1 19.771484 13.516242 8.867188 13.516242
## 2  8.867188 13.516242 8.867188  6.461866
## 3  8.867188  6.461866 4.125000  6.461866
## 4  4.125000  6.461866 4.125000  2.714554
## 5  4.125000  2.714554 2.500000  2.714554
## 6  2.500000  2.714554 2.500000  1.091092

# Extract data for labels
head(dend_data$labels)

##   x y          label
## 1 1 0        Alabama
## 2 2 0      Louisiana
## 3 3 0        Georgia
## 4 4 0      Tennessee
## 5 5 0 North Carolina
## 6 6 0    Mississippi

dend_data can be used to draw a customized dendrogram using ggplot2:

# Plot line segments and add labels
p <- ggplot(dend_data$segments) +
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_text(data = dend_data$labels, aes(x, y, label = label),
            hjust = 1, angle = 90, size = 3) +
  ylim(-3, 15)
print(p)

5 dendextend package: Extending R's dendrogram functionality

The package dendextend contains many functions for changing the appearance of a dendrogram and for comparing dendrograms.
In this section we'll use the chaining operator (%>%) to simplify our code.

5.1 Chaining

The chaining operator (%>%) turns x %>% f(y) into f(x, y) so you can use it to rewrite multiple operations such that they can be read from left to
right, top to bottom. For instance, the two R code chunks below apply the same sequence of steps (the second one uses only the first five rows of
USArrests).
Standard R code for creating a dendrogram:

data <- scale(USArrests)
dist.res <- dist(data)
hc <- hclust(dist.res, method = "ward.D2")
dend <- as.dendrogram(hc)
plot(dend)

R code for creating a dendrogram using chaining operator:


library(dendextend) # also makes the %>% operator available
dend <- USArrests[1:5, ] %>% # Data
  scale %>% # Scale the data
  dist %>% # Calculate a distance matrix
  hclust(method = "ward.D2") %>% # Hierarchical clustering
  as.dendrogram # Turn the object into a dendrogram
plot(dend)
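
To make the equivalence concrete, the sketch below builds the same dendrogram twice, once with nested calls and once with a pipe. It uses the native pipe |> of base R (an assumption: R 4.1 or later) instead of %>%, so that it runs without extra packages; for this chain the two operators behave the same way.

```r
data(USArrests)
x <- USArrests[1:5, ]

# Nested calls: read from the inside out
dend_nested <- as.dendrogram(hclust(dist(scale(x)), method = "ward.D2"))

# Piped calls: read from left to right, top to bottom
dend_piped <- x |>
  scale() |>
  dist() |>
  hclust(method = "ward.D2") |>
  as.dendrogram()

# Compare the leaf order of the two trees
identical(labels(dend_nested), labels(dend_piped))
```

Unlike %>%, the native pipe requires the parentheses after each function name.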

5.2 Installation and loading

Install the stable version as follows:

install.packages('dendextend')

Loading:

library(dendextend)

5.3 How to change a dendrogram

The function set() can be used to change the parameters of a dendrogram with dendextend.
The format is:

set(object, what, value)

1. object: a dendrogram object
2. what: a character string indicating which property of the tree should be set/updated
3. value: a vector with the value to set in the tree (the type of the value depends on what).

Possible values for the argument what include:

Value for the argument what                   Description
--------------------------------------------  ------------------------------------------------------------
labels                                        Set the labels
labels_colors and labels_cex                  Set the color and the size of labels, respectively
leaves_pch, leaves_cex and leaves_col         Set the point type, size and color for leaves, respectively
nodes_pch, nodes_cex and nodes_col            Set the point type, size and color for nodes, respectively
hang_leaves                                   Hang the leaves
branches_k_color                              Color the branches
branches_col, branches_lwd, branches_lty      Set the color, the line width and the line type of branches,
                                              respectively
by_labels_branches_col,                       Set the color, the line width and the line type of branches
by_labels_branches_lwd and                    with specific labels, respectively
by_labels_branches_lty
clear_branches and clear_leaves               Clear branches and leaves, respectively

5.4 Create a simple dendrogram

# Create a dendrogram and plot it
dend <- USArrests[1:5, ] %>% scale %>%
  dist %>% hclust %>% as.dendrogram
dend %>% plot
# Get the labels of the tree
labels(dend)

## [1] "Alaska"     "Arizona"    "California" "Alabama"    "Arkansas"

5.5 Change labels

This section describes how to change label names as well as the color and the size for labels.

# Change the labels, and then plot:
dend %>% set("labels", c("a", "b", "c", "d", "e")) %>% plot

# Change color and size for labels
dend %>% set("labels_col", c("green", "blue")) %>% # Change color
  set("labels_cex", 2) %>% # Change size
  plot(main = "Change the color \nand size") # Plot

# Color labels by specifying the number of clusters (k)
dend %>% set("labels_col", value = c("green", "blue"), k = 2) %>%
  plot(main = "Color labels \nper cluster")
abline(h = 2, lty = 2)

In the R code above, the color vector is shorter than the number of labels, so it is recycled.

5.6 Change the points of a dendrogram nodes/leaves

# Change the type, the color and the size of node points
# +++++++++++++++++++++++++++++
dend %>% set("nodes_pch", 19) %>% # Node point type
  set("nodes_cex", 2) %>% # Node point size
  set("nodes_col", "blue") %>% # Node point color
  plot(main = "Node points")

# Change the type, the color and the size of leaf points
# +++++++++++++++++++++++++++++
dend %>% set("leaves_pch", 19) %>% # Leaf point type
  set("leaves_cex", 2) %>% # Leaf point size
  set("leaves_col", "blue") %>% # Leaf point color
  plot(main = "Leaves points")

# Specify different point types and colors for each leaf
dend %>% set("leaves_pch", c(17, 18, 19)) %>% # Leaf point type
  set("leaves_cex", 2) %>% # Leaf point size
  set("leaves_col", c("blue", "red", "green")) %>% # Leaf point color
  plot(main = "Leaves points")

5.7 Change the color of branches

The branches can be colored according to the cluster they belong to when the tree is cut into k clusters:

# Default colors
dend %>% set("branches_k_color", k = 2) %>%
  plot(main = "Default colors")
# Customized colors
dend %>% set("branches_k_color",
             value = c("red", "blue"), k = 2) %>%
  plot(main = "Customized colors")

It's also possible to use the function color_branches().

5.8 Adding colored rectangles

Clusters can be highlighted by adding colored rectangles. This is done using the rect.dendrogram() function (modeled on the rect.hclust()
function). One advantage of rect.dendrogram() over rect.hclust() is that it also works on horizontally plotted trees:

# Vertical plot
dend %>% set("branches_k_color", k = 3) %>% plot
dend %>% rect.dendrogram(k = 3, border = 8, lty = 5, lwd = 2)
# Horizontal plot
dend %>% set("branches_k_color", k = 3) %>% plot(horiz = TRUE)
dend %>% rect.dendrogram(k = 3, horiz = TRUE, border = 8, lty = 5, lwd = 2)
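
For comparison, the base R counterpart rect.hclust() works on the hclust object itself and needs an existing vertical plot. A minimal sketch on the full USArrests data; the choice of k = 4 and the border colors are illustrative:

```r
data(USArrests)
hc <- hclust(dist(scale(USArrests)), method = "ward.D2")

plot(hc, hang = -1, cex = 0.6)
# Draw one rectangle per cluster; the list of cluster members is returned invisibly
groups <- rect.hclust(hc, k = 4, border = 2:5)

# Number of observations in each highlighted cluster
lengths(groups)
```

Capturing the return value is optional, but it gives direct access to the members of each highlighted cluster.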

5.9 Adding colored bars

This is useful for annotating the items in the clusters:

grp <- c(1, 1, 1, 2, 2)
k_3 <- cutree(dend, k = 3, order_clusters_as_data = FALSE)
# The FALSE above makes sure we get the clusters in the order of the
# dendrogram, and not in that of the original data. It is like:
# cutree(dend, k = 3)[order.dendrogram(dend)]
the_bars <- cbind(grp, k_3)
dend %>% set("labels", "") %>% plot
colored_bars(colors = the_bars, dend = dend)
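
The ordering issue mentioned in the comments can be reproduced in base R. A minimal sketch, assuming the same five-state subset: stats::cutree() returns the cluster ids in the order of the original data, and indexing by the leaf order rearranges them to match the plot from left to right.

```r
data(USArrests)
hc <- hclust(dist(scale(USArrests[1:5, ])))

k3_data_order <- cutree(hc, k = 3)        # order of the rows in the data
k3_dend_order <- k3_data_order[hc$order]  # order of the leaves in the plot

names(k3_data_order)
names(k3_dend_order)
```

Bars drawn under a dendrogram must follow the second ordering, otherwise they would be attached to the wrong leaves.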

5.10 ggplot2 integration

The following 2 steps are used:

1. Transform a dendrogram into a ggdend object using the as.ggdend() function
2. Make the plot using the function ggplot()

# iris[1:30, -5] drops column 5 (Species, a factor) before scaling
dend <- iris[1:30, -5] %>% scale %>% dist %>%
  hclust %>% as.dendrogram %>%
  set("branches_k_color", k = 3) %>% set("branches_lwd", 1.2) %>%
  set("labels_colors") %>% set("labels_cex", c(.9, 1.2)) %>%
  set("leaves_pch", 19) %>% set("leaves_col", c("blue", "red"))
# Plot the dend in the usual "base" plotting engine:
plot(dend)

Produce the same plot in ggplot2 using the function ggplot():

library(ggplot2)
# Rectangle dendrogram using ggplot2
ggd1 <- as.ggdend(dend)
ggplot(ggd1)
# Change the theme to the default ggplot2 theme
ggplot(ggd1, horiz = TRUE, theme = NULL)

# Theme minimal
ggplot(ggd1, theme = theme_minimal())
# Create a radial plot and remove labels
ggplot(ggd1, labels = FALSE) +
  scale_y_reverse(expand = c(0.2, 0)) +
  coord_polar(theta = "x")

5.11 pvclust and dendextend

The package dendextend can be used to enhance many packages including pvclust. Recall that pvclust is for calculating p-values for hierarchical
clustering.
pvclust can be used as follows:

library(pvclust)
data(lung) # 916 genes for 73 subjects
set.seed(1234)
result <- pvclust(lung[1:100, 1:10], method.dist = "cor",
                  method.hclust = "average", nboot = 10)

## Bootstrap (r = 0.5)... Done.
## Bootstrap (r = 0.6)... Done.
## Bootstrap (r = 0.7)... Done.
## Bootstrap (r = 0.8)... Done.
## Bootstrap (r = 0.9)... Done.
## Bootstrap (r = 1.0)... Done.
## Bootstrap (r = 1.1)... Done.
## Bootstrap (r = 1.2)... Done.
## Bootstrap (r = 1.3)... Done.
## Bootstrap (r = 1.4)... Done.

# Default plot of the result
plot(result)
pvrect(result)

# pvclust and dendextend
result %>% as.dendrogram %>%
  set("branches_k_color", k = 2, value = c("purple", "orange")) %>%
  plot
result %>% text
result %>% pvrect

6 Infos

This analysis has been performed using R software (ver. 3.2.1).

