[Haskell-cafe] FGL/Haskell and Hierarchical Clustering/dendograms

Nikolas Borrel-Jensen nikolasborrel at gmail.com
Wed Dec 23 05:57:35 EST 2009


Hi! I have some trouble implementing single-linkage clustering algorithm by
using a minimum-spanning tree, so I would appreciate if some of you could
give me some advise.

I am implementing a single-linkage clustering algorithm, and my approach is
to use minimum spanning trees for that task. I am using the library FGL (
http://web.engr.oregonstate.edu/~erwig/fgl/haskell/), and I have managed to
compute a minimum spanning tree from an arbitrary fully connected graph with
5 nodes. I get [ [(4,0) ] , [ (3,1) , (4,0) ] , [ (1,1) , (3,1) , (4,0) ] ,
[ (2,3) , (4,0) ] , [ (5,12) , (2,3) , (4,0) ] ], which is the root path
tree of the minimum spanning tree created by the function msTreeAt.

>From that I would create a dendrogram. [ (1,1) , (3,1) , (4,0) ]  is telling
that node 1,3 and 4 has the same cost, namely cost 1. Therefore these are
merged at level 1. At level 1 we now have 3 clusters: (1,3,4), 2 and 5. Now
the second lowest should be merged, that is 2 and 4. BUT because 4 is
already merged in the cluster (1,3,4), we should merge (1,3,4) and 2 at
level 3 (because the cost is 3). Now at level 3 we have 2 clusters,
(1,2,3,4) and 5. Now we merge the last one at level 12: (1,2,3,4,5), and we
are finished.

I have very hard to see, how this could be done efficiently without pointers
(as in C). I have thought of just saving the nodes from the start of the
root path, and traversing it, but a lot of searching should be done all the
time.

Can you please give me some advise on that?

Kind regards

Nikolas Borrel-Jensen
Computer Science
University Of Copenhagen
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20091223/3179f818/attachment.html


More information about the Haskell-Cafe mailing list