[Haskellcafe] FGL/Haskell and Hierarchical Clustering/dendograms
Matt Morrow
moonpatio at gmail.com
Wed Dec 23 23:53:01 EST 2009
Hi Nikolas,
Interesting problem. I'd do something like the following, where
the initial spanning tree from you example (retreeified) is:
{
ghci> :t t
t :: Tree (Id, Cost)
g
ghci> ppT t
(4,0)

+ (3,1)
 
 ` (1,1)

` (2,3)

` (5,12)
}
and which results in the tree:
{
ghci> let s = agglom fst snd t
ghci> :t s
s :: Tree (Cost, [Id])
ghci> ppT s
(0,[4])

+ (1,[3,1])

` (3,[2])

` (12,[5])
}
which can then be flattened/etc as needed by further steps of the algo.
The code for `agglom':

import Data.Tree
import Data.List
type Id = Int
type Cost = Int
t :: Tree (Id,Cost)
t = Node (4,0)
[Node (3,1) [Node (1,1) []]
,Node (2,3) [Node (5,12) []]]
ppT :: Show a => Tree a > IO ()
ppT = putStrLn . drawTree . fmap show
  Compress the incoming @Tree a@ with @accumEq at .
agglom :: Eq cost
=> (a > b)
> (a > cost)
> Tree a > Tree (cost,[b])
agglom proj cost = go
where accum = accumEq proj cost
go (Node a []) = Node (cost a,[proj a]) []
go (Node a ts) = let b = proj a
c = cost a
(bs,ss) = accum c ts
in Node (c,b:bs) (fmap go ss)
  Repeatedly @splitEq@, and return a pair
 whose /first/ element is a list of the projected
 @b at s from those root values along paths from
 the roots of the trees in the incoming @[Tree a]@
 which have @cost@ equal to the third function parameter,
 and whose /second/ element is the (concatenation of the)
 list(s) gotten from each of the @splitEq@ calls.
accumEq :: Eq cost
=> (a > b)
> (a > cost) > cost
> [Tree a] > ([b],[Tree a])
accumEq proj cost c = go [] []
where split ts = splitEq proj cost c ts
go xs ys [] = (xs,ys)
go xs ys ts = let (eqs,neqs) = split ts
in case eqs of
[]> ([],ts)
_ > let (bs,tss) = unzip eqs
in go (bs++xs)
(neqs++ys)
(concat tss)
  Split the incoming trees into
 (1) a @[(b,Tree a)]@ of each @b@ is the
 @proj at ected value from an @a@ where
 the @cost@ of that @a@ is equal to
 the third function parameter, and (2)
 the members of the incoming @[Tree a]@
 whose roots' costs are /not/ equal to
 the third function parameter.
splitEq :: Eq cost
=> (a > b)
> (a > cost) > cost
> [Tree a] > ([(b,[Tree a])],[Tree a])
splitEq proj cost c = foldl' go ([],[])
where go (!eqs,!neqs)
t@(Node a ts)
 c==cost a = ((proj a,ts):eqs,neqs)
 otherwise = (eqs,t:neqs)

Cheers,
Matt
On 12/23/09, Nikolas BorrelJensen <nikolasborrel at gmail.com> wrote:
> Hi! I have some trouble implementing singlelinkage clustering algorithm by
> using a minimumspanning tree, so I would appreciate if some of you could
> give me some advise.
>
> I am implementing a singlelinkage clustering algorithm, and my approach is
> to use minimum spanning trees for that task. I am using the library FGL (
> http://web.engr.oregonstate.edu/~erwig/fgl/haskell/), and I have managed to
> compute a minimum spanning tree from an arbitrary fully connected graph with
> 5 nodes. I get [ [(4,0) ] , [ (3,1) , (4,0) ] , [ (1,1) , (3,1) , (4,0) ] ,
> [ (2,3) , (4,0) ] , [ (5,12) , (2,3) , (4,0) ] ], which is the root path
> tree of the minimum spanning tree created by the function msTreeAt.
>
> >From that I would create a dendrogram. [ (1,1) , (3,1) , (4,0) ] is
> telling
> that node 1,3 and 4 has the same cost, namely cost 1. Therefore these are
> merged at level 1. At level 1 we now have 3 clusters: (1,3,4), 2 and 5. Now
> the second lowest should be merged, that is 2 and 4. BUT because 4 is
> already merged in the cluster (1,3,4), we should merge (1,3,4) and 2 at
> level 3 (because the cost is 3). Now at level 3 we have 2 clusters,
> (1,2,3,4) and 5. Now we merge the last one at level 12: (1,2,3,4,5), and we
> are finished.
>
> I have very hard to see, how this could be done efficiently without pointers
> (as in C). I have thought of just saving the nodes from the start of the
> root path, and traversing it, but a lot of searching should be done all the
> time.
>
> Can you please give me some advise on that?
>
> Kind regards
>
> Nikolas BorrelJensen
> Computer Science
> University Of Copenhagen
>
More information about the HaskellCafe
mailing list