[Haskell-cafe] Re: Paths to tree

Mon Jan 29 09:01:44 EST 2007

John Ky wrote:
> I've written some code and was wondering if there was a better way to write
> it in terms of readability, brevity and/or efficiency.
> 
> The function concerned is pathsToForest which takes a list of paths (ie.
> [[String]]) and converts it into a tree structure where the individual
> nodes
> are the names in the path.  Siblings with the same name are merged.
> 
> For instance:
> 
>  prettyPrint $ mergeForest $ pathsToForest [["a", "b", "c"], ["c", "b",
> "a"], ["a", "b", "d"]]
> 
> gives:
> 
>  a
>   b
>    d
>    c
>  c
>   b
>    a

Your code is fine, I like it. A minor hint is that mergeForest is a fold:
   mergeForest = foldr merge []
Also, we have
   prettyPrint = putStr . unlines . prettyPrint' $ forest

So far, so good. Ironically, Data.Tree is not used much because it is so
easy to invent your own tree type in Haskell. Most often, Data.Tree is
too rigid and does not offer enough structure. Indeed, this is the case
here, too: there is more structure behind your task than you may think
at first. Let me show it.

I can't know, but it doesn't seem unreasonable that you intend to use
the ArcForest as a trie, i.e. an efficient implementation of a set of
paths which allows to look up quickly whether a given path (here of type
[String]) is in the set or not. So, we have

  type Path = [String]
  type SetPath = ... -- a set of paths, to be defined later

     -- checks whether a given path is in the trie
  member  :: Path -> SetPath -> Bool

where 'member' is the function akin to 'elem' on lists. The focus of
your code is not on the membership test, but constructing the trie from
a list of paths. You named the function 'pathsToForest', we will name it

  fromList :: [Path] -> SetPath

Now, if we have a function

  insert :: Path -> SetPath -> SetPath

that inserts a path into the set and we if we are given the notion of an
empty set

  empty :: SetPath

, we can write

  fromList = foldr insert empty

Indeed, your function 'merge' corresponds to 'insert'. So let's find
'insert'.

It will turn out that it is easier to generalize from SetPath to a
storage that associates a value of type v with every path. This is
called "finite map" (here, the keys are of type Path):

  data MapPath v = ... -- to be defined later

     -- the empty map contains no values
  empty     :: MapPath v
     -- a map that contains a single key-value pair
  singleton :: Path -> v -> MapPath v
     -- lookup the value for a given Path
  lookup    :: Path -> MapPath v -> Maybe v
     -- insert a value with given key into the finite map
  insert    :: (v -> v -> v) -> Path -> v -> MapPath v -> MapPath v

These functions are like the ones from the standard library Data.Map
(btw, you can use this library for your task, too). Note that we had to
generalize the type signature for 'insert' considerably because we have
to specify what to do when there is already a value for the given key in
the trie. This is what the argument of type '(v -> v -> v)' does: it
takes the old and the new value and merges them somehow. We will need
it, soon.

Given the generalization, we can now simply put

  type SetPath = MapPath ()

and function 'fromList' will read

  fromList :: [Path] -> SetPath
  fromList = foldr (\path -> insert (\_ x -> x) path ()) empty

It is time to think about coding 'lookup' and 'insert'. And here comes
the trick: we know that 'Path = [String]' so let's assume that we
already have a finite map for strings:

   data MapString v = ...

   emptyStr     :: MapString v
   singletonStr :: String -> v -> MapString v
   lookupStr    :: String -> MapString v -> Maybe v
   insertStr    :: (v -> v -> v) -> String -> v
                 -> MapString v -> MapString v

Now, we can build up our finite map for paths:

   data MapPath v = TriePath (Maybe v) (MapString (MapPath v))

because it (maybe) contains a value for the key '[] :: Path' and it
(maybe) contains a map of paths that is organized by their first String
element. Let's write this down:

   lookup :: Path -> MapPath v -> Maybe v
   lookup []     (TriePath w _) = w
   lookup (x:xs) (TriePath _ m) = lookup xs (lookupStr x m)

Coding 'insert' is slightly more involved and it may be very instructive
to find out how your code and the following are essentially the same.
We'll prepare us with 'singleton' which corresponds to your 'pathToTree'

   singleton :: Path -> v -> MapPath v
   singleton []     v = TriePath (Just v) emptyStr
   singleton (x:xs) v =
       TriePath Nothing $ singletonStr x (singleton xs v)

Now, we can tackle 'insert'. Compared to your implementation, it is
roughly equivalent to 'merge':

   insert _ p v m  ^=  merge (singleton p v) m

We have

   insert :: (v -> v -> v) -> Path -> v -> MapPath v -> MapPath v
   insert f []     v (TriePath w m) = case w of
       Just v' -> TriePath (Just (f v v')) m
       Nothing -> TriePath (Just v) m
   insert f (x:xs) v (TriePath w m) =
       TriePath w (insertStr (insert f xs v) x empty m)

(Coding 'empty' is left as an exercise. Also note that the case
expression can be seen as a kind of 'insertMaybe :: (v -> v -> v) -> v
-> Maybe v -> Maybe v).

Now what about 'MapString v', how do we get this? Well, your
implementation corresponds to the choice

  type MapString v = [(String,v)]

But in our case, we can apply the same trick again! We have 'String =
[Char]' and given an implementation of

  data MapChar v = ...

we can use exactly the same code from 'MapPath v' to implement
'MapString v'! (This reuse can be abstracted into a type class, but I'll
not cover that here.) Of course, we need 'MapChar v' now. But yet, again
we can think of Char as

  Char ^= Int ^= [Bool]

where the '[Bool]' means the list of digits in binary representation.
So, given 'MapBool v', we can implement 'MapChar v' with yet again the
same code that we used for the preceding finite maps! Fortunately, the
recursion ends here because a finite map for 'Bool'eans is just the pair

  type MapBool v = (Maybe v, Maybe v)

In case your head does not yet hurt too much :), further information
about tries in Haskell and a detailed explanation of why the code above
works, can be found in

  Ralf Hinze. Generalizing generalized tries. Journal of Functional
  Programming, 10(4):327-351, July 2000
  http://www.informatik.uni-bonn.de/~ralf/publications/GGTries.ps.gz

Regards,
apfelmus

> import Data.Tree
> import Control.Monad
> 
> data ArcData = ArcData
>  { name :: String
>  } deriving Show
> 
> type ArcTree = Tree ArcData
> type ArcForest = Forest ArcData
> 
> pathsToForest :: [[String]] -> ArcForest
> pathsToForest paths = mergeForest $ concat $ map pathToTree paths
> 
> 
> mergeForest :: ArcForest -> ArcForest
> mergeForest [] = []
> mergeForest (x:xs) = merge x (mergeForest xs)
>  where
>    merge :: ArcTree -> ArcForest -> ArcForest
>    merge tree [] = [tree]
>    merge tree (y:ys) =
>      if sameTreeName tree y
>        then
>          merge
>            tree
>            { subForest = mergeForest ((subForest tree) ++ (subForest y))
>            }
>            ys
>        else
>          (y:merge tree ys)
> 
> treeName :: ArcTree -> String
> treeName tree = name $ rootLabel $ tree
> 
> sameTreeName :: ArcTree -> ArcTree -> Bool
> sameTreeName treeLeft treeRight = treeName treeLeft == treeName treeRight
> 
> pathToTree :: [String] -> ArcForest
> pathToTree [] = []
> pathToTree (name:subpath) =
>  [ Node
>    { rootLabel = ArcData { name = name }
>    , subForest = pathToTree subpath
>    }
>  ]
> 
> prettyPrint' :: ArcForest -> [String]
> prettyPrint' [] = []
> prettyPrint' (x:xs) =
>      [name $ rootLabel $ x] ++ (map (" " ++) (prettyPrint' $ subForest x))
> ++
>      prettyPrint' xs
> 
> prettyPrint :: ArcForest -> IO ()
> prettyPrint forest = do
>  forM_ (prettyPrint' forest) putStrLn