[Haskell-cafe] Re: Paths to tree
apfelmus at quantentunnel.de
apfelmus at quantentunnel.de
Mon Jan 29 09:01:44 EST 2007
John Ky wrote:
> I've written some code and was wondering if there was a better way to write
> it in terms of readability, brevity and/or efficiency.
>
> The function concerned is pathsToForest which takes a list of paths (ie.
> [[String]]) and converts it into a tree structure where the individual
> nodes
> are the names in the path. Siblings with the same name are merged.
>
> For instance:
>
> prettyPrint $ mergeForest $ pathsToForest [["a", "b", "c"], ["c", "b",
> "a"], ["a", "b", "d"]]
>
> gives:
>
> a
> b
> d
> c
> c
> b
> a
Your code is fine, I like it. A minor hint is that mergeForest is a fold:
mergeForest = foldr merge []
Also, we have
prettyPrint = putStr . unlines . prettyPrint' $ forest
So far, so good. Ironically, Data.Tree is not used much because it is so
easy to invent your own tree type in Haskell. Most often, Data.Tree is
too rigid and does not offer enough structure. Indeed, this is the case
here, too: there is more structure behind your task than you may think
at first. Let me show it.
I can't know, but it doesn't seem unreasonable that you intend to use
the ArcForest as a trie, i.e. an efficient implementation of a set of
paths which allows to look up quickly whether a given path (here of type
[String]) is in the set or not. So, we have
type Path = [String]
type SetPath = ... -- a set of paths, to be defined later
-- checks whether a given path is in the trie
member :: Path -> SetPath -> Bool
where 'member' is the function akin to 'elem' on lists. The focus of
your code is not on the membership test, but constructing the trie from
a list of paths. You named the function 'pathsToForest', we will name it
fromList :: [Path] -> SetPath
Now, if we have a function
insert :: Path -> SetPath -> SetPath
that inserts a path into the set and we if we are given the notion of an
empty set
empty :: SetPath
, we can write
fromList = foldr insert empty
Indeed, your function 'merge' corresponds to 'insert'. So let's find
'insert'.
It will turn out that it is easier to generalize from SetPath to a
storage that associates a value of type v with every path. This is
called "finite map" (here, the keys are of type Path):
data MapPath v = ... -- to be defined later
-- the empty map contains no values
empty :: MapPath v
-- a map that contains a single key-value pair
singleton :: Path -> v -> MapPath v
-- lookup the value for a given Path
lookup :: Path -> MapPath v -> Maybe v
-- insert a value with given key into the finite map
insert :: (v -> v -> v) -> Path -> v -> MapPath v -> MapPath v
These functions are like the ones from the standard library Data.Map
(btw, you can use this library for your task, too). Note that we had to
generalize the type signature for 'insert' considerably because we have
to specify what to do when there is already a value for the given key in
the trie. This is what the argument of type '(v -> v -> v)' does: it
takes the old and the new value and merges them somehow. We will need
it, soon.
Given the generalization, we can now simply put
type SetPath = MapPath ()
and function 'fromList' will read
fromList :: [Path] -> SetPath
fromList = foldr (\path -> insert (\_ x -> x) path ()) empty
It is time to think about coding 'lookup' and 'insert'. And here comes
the trick: we know that 'Path = [String]' so let's assume that we
already have a finite map for strings:
data MapString v = ...
emptyStr :: MapString v
singletonStr :: String -> v -> MapString v
lookupStr :: String -> MapString v -> Maybe v
insertStr :: (v -> v -> v) -> String -> v
-> MapString v -> MapString v
Now, we can build up our finite map for paths:
data MapPath v = TriePath (Maybe v) (MapString (MapPath v))
because it (maybe) contains a value for the key '[] :: Path' and it
(maybe) contains a map of paths that is organized by their first String
element. Let's write this down:
lookup :: Path -> MapPath v -> Maybe v
lookup [] (TriePath w _) = w
lookup (x:xs) (TriePath _ m) = lookup xs (lookupStr x m)
Coding 'insert' is slightly more involved and it may be very instructive
to find out how your code and the following are essentially the same.
We'll prepare us with 'singleton' which corresponds to your 'pathToTree'
singleton :: Path -> v -> MapPath v
singleton [] v = TriePath (Just v) emptyStr
singleton (x:xs) v =
TriePath Nothing $ singletonStr x (singleton xs v)
Now, we can tackle 'insert'. Compared to your implementation, it is
roughly equivalent to 'merge':
insert _ p v m ^= merge (singleton p v) m
We have
insert :: (v -> v -> v) -> Path -> v -> MapPath v -> MapPath v
insert f [] v (TriePath w m) = case w of
Just v' -> TriePath (Just (f v v')) m
Nothing -> TriePath (Just v) m
insert f (x:xs) v (TriePath w m) =
TriePath w (insertStr (insert f xs v) x empty m)
(Coding 'empty' is left as an exercise. Also note that the case
expression can be seen as a kind of 'insertMaybe :: (v -> v -> v) -> v
-> Maybe v -> Maybe v).
Now what about 'MapString v', how do we get this? Well, your
implementation corresponds to the choice
type MapString v = [(String,v)]
But in our case, we can apply the same trick again! We have 'String =
[Char]' and given an implementation of
data MapChar v = ...
we can use exactly the same code from 'MapPath v' to implement
'MapString v'! (This reuse can be abstracted into a type class, but I'll
not cover that here.) Of course, we need 'MapChar v' now. But yet, again
we can think of Char as
Char ^= Int ^= [Bool]
where the '[Bool]' means the list of digits in binary representation.
So, given 'MapBool v', we can implement 'MapChar v' with yet again the
same code that we used for the preceding finite maps! Fortunately, the
recursion ends here because a finite map for 'Bool'eans is just the pair
type MapBool v = (Maybe v, Maybe v)
In case your head does not yet hurt too much :), further information
about tries in Haskell and a detailed explanation of why the code above
works, can be found in
Ralf Hinze. Generalizing generalized tries. Journal of Functional
Programming, 10(4):327-351, July 2000
http://www.informatik.uni-bonn.de/~ralf/publications/GGTries.ps.gz
Regards,
apfelmus
> import Data.Tree
> import Control.Monad
>
> data ArcData = ArcData
> { name :: String
> } deriving Show
>
> type ArcTree = Tree ArcData
> type ArcForest = Forest ArcData
>
> pathsToForest :: [[String]] -> ArcForest
> pathsToForest paths = mergeForest $ concat $ map pathToTree paths
>
>
> mergeForest :: ArcForest -> ArcForest
> mergeForest [] = []
> mergeForest (x:xs) = merge x (mergeForest xs)
> where
> merge :: ArcTree -> ArcForest -> ArcForest
> merge tree [] = [tree]
> merge tree (y:ys) =
> if sameTreeName tree y
> then
> merge
> tree
> { subForest = mergeForest ((subForest tree) ++ (subForest y))
> }
> ys
> else
> (y:merge tree ys)
>
> treeName :: ArcTree -> String
> treeName tree = name $ rootLabel $ tree
>
> sameTreeName :: ArcTree -> ArcTree -> Bool
> sameTreeName treeLeft treeRight = treeName treeLeft == treeName treeRight
>
> pathToTree :: [String] -> ArcForest
> pathToTree [] = []
> pathToTree (name:subpath) =
> [ Node
> { rootLabel = ArcData { name = name }
> , subForest = pathToTree subpath
> }
> ]
>
> prettyPrint' :: ArcForest -> [String]
> prettyPrint' [] = []
> prettyPrint' (x:xs) =
> [name $ rootLabel $ x] ++ (map (" " ++) (prettyPrint' $ subForest x))
> ++
> prettyPrint' xs
>
> prettyPrint :: ArcForest -> IO ()
> prettyPrint forest = do
> forM_ (prettyPrint' forest) putStrLn
More information about the Haskell-Cafe
mailing list