
Ward's minimum variance method can be defined and implemented recursively by a
Lance-Williams algorithm. The Lance-Williams algorithms are an infinite family of
agglomerative hierarchical clustering algorithms which are represented by a
recursive formula for updating cluster distances at each step (each time a pair of
clusters is merged). At each step, it is necessary to optimize the objective
function (find the optimal pair of clusters to merge). The recursive formula
simplifies finding the optimal pair.
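
The "find the optimal pair" step can be sketched as follows. This is a minimal illustration, assuming the current distances are kept in a dictionary keyed by unordered pairs of cluster labels; the names `closest_pair`, `dist`, and `active` are hypothetical and do not come from the text.

```python
from itertools import combinations


def closest_pair(dist, active):
    """Return the pair of active cluster labels with the smallest distance.

    dist maps frozenset({i, j}) to the current distance between clusters
    i and j (a hypothetical storage layout chosen for this sketch).
    """
    pairs = combinations(sorted(active), 2)
    return min(pairs, key=lambda p: dist[frozenset(p)])


# Toy distances between three singleton clusters labelled 0, 1, 2.
dist = {
    frozenset({0, 1}): 4.0,
    frozenset({0, 2}): 25.0,
    frozenset({1, 2}): 9.0,
}
pair = closest_pair(dist, {0, 1, 2})  # clusters 0 and 1 are merged next
```

The exhaustive scan over all pairs is the simplest choice, not the fastest one. The recursive formula matters because it updates the distances after a merge without going back to the original data.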

Suppose that clusters $C_i$ and $C_j$ are next to be merged. At this point all of
the current pairwise cluster distances are known. The recursive formula gives the
updated cluster distances following the pending merge of clusters $C_i$ and $C_j$.
Let $d_{ij}$, $d_{ik}$, and $d_{jk}$ be the pairwise distances between clusters
$C_i$, $C_j$, and $C_k$, respectively, and let $d_{(ij)k}$ be the distance between
the new cluster $C_i \cup C_j$ and $C_k$.
An algorithm belongs to the Lance-Williams family if the updated cluster distance
$d_{(ij)k}$ can be computed recursively by

$$d_{(ij)k} = \alpha_i d_{ik} + \alpha_j d_{jk} + \beta d_{ij} + \gamma\,|d_{ik} - d_{jk}|,$$

where $\alpha_i$, $\alpha_j$, $\beta$, and $\gamma$ are parameters, which may depend
on cluster sizes, that together with the cluster distance function $d_{ij}$
determine the clustering algorithm. Several standard clustering algorithms
such as single linkage, complete linkage, and the group average method have a recursive
formula of the above type. A table of parameters for standard methods is given by
several authors.[2][3][4]
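
As a hedged illustration of the update formula, the sketch below evaluates it for three standard methods. The parameter values are the commonly tabulated ones and are not quoted in the text itself: single linkage uses $\alpha_i = \alpha_j = 1/2$, $\beta = 0$, $\gamma = -1/2$; complete linkage uses the same values with $\gamma = +1/2$; group average uses $\alpha_i = n_i/(n_i+n_j)$, $\alpha_j = n_j/(n_i+n_j)$, $\beta = \gamma = 0$. All function names are hypothetical.

```python
def lance_williams_update(d_ik, d_jk, d_ij, alpha_i, alpha_j, beta, gamma):
    """Distance between the merged cluster C_i ∪ C_j and C_k."""
    return (alpha_i * d_ik + alpha_j * d_jk
            + beta * d_ij + gamma * abs(d_ik - d_jk))


def single_linkage(d_ik, d_jk, d_ij):
    # With gamma = -1/2 the formula reduces to min(d_ik, d_jk).
    return lance_williams_update(d_ik, d_jk, d_ij, 0.5, 0.5, 0.0, -0.5)


def complete_linkage(d_ik, d_jk, d_ij):
    # With gamma = +1/2 the formula reduces to max(d_ik, d_jk).
    return lance_williams_update(d_ik, d_jk, d_ij, 0.5, 0.5, 0.0, 0.5)


def group_average(d_ik, d_jk, d_ij, n_i, n_j):
    # Size-weighted mean of the two old distances; d_ij is not used.
    n = n_i + n_j
    return lance_williams_update(d_ik, d_jk, d_ij, n_i / n, n_j / n, 0.0, 0.0)


s = single_linkage(3.0, 7.0, 2.0)       # equals min(3.0, 7.0)
c = complete_linkage(3.0, 7.0, 2.0)     # equals max(3.0, 7.0)
g = group_average(3.0, 7.0, 2.0, 1, 3)  # equals (1*3.0 + 3*7.0) / 4
```

Keeping one generic update function and passing the parameters in mirrors how the family is defined: the methods differ only in the four coefficients.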

Ward's minimum variance method can be implemented by the Lance-Williams formula.
For disjoint clusters $C_i$, $C_j$, and $C_k$ with sizes $n_i$, $n_j$, and $n_k$
respectively:

$$d(C_i \cup C_j, C_k) = \frac{n_i+n_k}{n_i+n_j+n_k}\,d(C_i, C_k) + \frac{n_j+n_k}{n_i+n_j+n_k}\,d(C_j, C_k) - \frac{n_k}{n_i+n_j+n_k}\,d(C_i, C_j).$$

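Hence Ward's method can be implemented as a Lance-Williams algorithm with

$$\alpha_i = \frac{n_i+n_k}{n_i+n_j+n_k}, \qquad \alpha_j = \frac{n_j+n_k}{n_i+n_j+n_k}, \qquad \beta = \frac{-n_k}{n_i+n_j+n_k}, \qquad \gamma = 0.$$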
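
The Ward update can be checked numerically on a toy example. The sketch below makes an assumption not stated in this passage: the initial distance between two points is their squared Euclidean distance. Under that assumption the recurrence should agree with the closed form $\frac{2\,n_A n_B}{n_A+n_B}\,\lVert \bar{x}_A - \bar{x}_B \rVert^2$, where $\bar{x}_A$ and $\bar{x}_B$ are the cluster means. The names `ward_update` and `ward_direct` are hypothetical, and the data are one-dimensional for brevity.

```python
def ward_update(d_ik, d_jk, d_ij, n_i, n_j, n_k):
    """Ward distance between C_i ∪ C_j and C_k from the three old distances."""
    n = n_i + n_j + n_k
    return ((n_i + n_k) / n * d_ik
            + (n_j + n_k) / n * d_jk
            - n_k / n * d_ij)


def ward_direct(a, b):
    """Closed form for 1-D clusters a and b, assuming squared Euclidean
    distances between the original points (an assumption of this sketch)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return 2 * len(a) * len(b) / (len(a) + len(b)) * (mean_a - mean_b) ** 2


# Three points on a line, each starting as its own cluster.
x = [0.0, 2.0, 5.0]
d_01 = (x[0] - x[1]) ** 2   # 4.0
d_02 = (x[0] - x[2]) ** 2   # 25.0
d_12 = (x[1] - x[2]) ** 2   # 9.0

# Merge clusters 0 and 1, then compute the distance to cluster 2 both ways.
updated = ward_update(d_02, d_12, d_01, 1, 1, 1)
direct = ward_direct([x[0], x[1]], [x[2]])
```

Both values should come out as 64/3, up to floating-point rounding. This is a consistency check on one example, not a proof.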
