# [Haskell-cafe] Newbie seeking advice regarding data structure for a tricky algorithm

Tue Apr 24 15:31:49 EDT 2007

```Hi,

Toby Hutton wrote:
> Say I want to put the words 'foo', 'bar' and 'baz' into a binary tree.  The
> heuristic requires I split the words into letters and sort them:
> 'aabbfoorz'.  The heuristic then may decide, based on the sorted letters,
> that 'bar' and 'foo' should go in the left child and 'baz' goes in the
> right.  Typically we'd then simply recurse and for example, the left
> child's
> words would be re-sorted into 'abfoor' and the heuristic is reapplied.
>
> If we assume that sorting is relatively expensive, we can avoid the re-sort
> for the children by unmerging the parent's sorted list of letters.  Two
> sublists of a sorted list should already be sorted.  If we know which word
> each letter belongs to it would be more efficient to tag the letters with
> 'left' or 'right' as the words are classified.  Then we can iterate down
> the sorted letter list and  produce new sorted sublists rather simply.
>
> So it's not actually that complicated, and I can imagine exactly how it
> could be done in C but I really don't know how to approach this in Haskell.

What about just storing with each character a "reference" to the word it
orignally comes from?

> import Data.List (sortBy)
>
> data Tagged = Tag String Char
>
> tag :: Tagged -> String
> tag (Tag x _) = x
>
> compareTagged :: Tagged -> Tagged -> Ordering
> (Tag _ x) `compareTagged` (Tag _ y) = x `compare` y
>
> tagWord :: String -> [Tagged]
> tagWord word = map (Tag word) word
>
> unmerge :: [a] -> (a -> Bool) -> ([a], [a])
> unmerge xs p = foldr f ([], []) xs where
>   f x (ls, rs) | p x = (x:ls, rs)
>                | otherwise = (ls, x:rs)
>
> data Tree = Empty | Leaf String | Node Tree Tree
>
> tree :: ([Tagged] -> String -> Bool) -> [String] -> Tree
> tree heuristic words = tree' words sorted where
>
>   -- we only need to sort once ...
>   sorted = sortBy compareTagged . concat . map tagWord \$ words
>
>   tree' [] _ = Empty
>   tree' [word] _ = Leaf word
>
>   tree' words sorted = let
>      predicate = heuristic sorted
>      (leftWords, rightWords) = unmerge words predicate
>      -- ... because we reuse the ordered list by unmerging it
>      (leftSorted, rightSorted) = unmerge sorted (predicate . tag)
>    in
>      Node (tree' leftWords leftSorted) (tree' rightWords rightSorted)

Tillmann

```