Generic tries (long)

Wed Jun 18 12:16:21 EDT 2008

On Mon, Jun 16, 2008 at 9:28 PM, Jamie Brandon <jcb73 at cam.ac.uk> wrote:
> Hi, I'm writing a library for generic tries for the Summer of Code.
> The main point of this post is to get some feedback on the api but
> I'll briefly explain the idea.
>
> The point of a trie is to exploit the recursive nature of ADTs to save
> on expensive key comparisons and reduce space consumption. Hinze'
> original formulation is very elegant but results in very deep
> structures and is fairly inefficient. The normal trie optimisations
> (concatenating singleton maps, mainly) cant be applied to the generic
> version.
>
> I intend instead to encode ADTs as lists. This encoding can range from
> a simple walk of the ADT to creating a compressed, bit packing
> representation. The resulting tradeoff between encoding time and space
> usage should make the design fairly flexible. This will look something
> like:
>
> class Serial k where
>
> -- | Flattened form of key consists of a list of Nodes
> -- Node will be Int or UArray Int for compressed implementations
> type Node k
>
> -- | Flatten to a list of Nodes
> serialise :: k -> [Node k]
>
> -- | Reconstruct from a list of Nodes
> unserialise :: [Node k] -> k
>
> The api below is mostly cribbed from Adrian Hey's initial design.
> Guarantees about ordering will probably vary between maps. All
> ascending have descending version too. Strict versions of functions
> will be written where appropriate, I've omitted them here for brevity.
> Key reconstruction is likely to be expensive so it may make more sense
> to seperate foldrKeys and friends into a seperate class.
>
> Adrian has written instance of GMap for lists, UInts and Ord types so
> I can declare
> instance (Serial k) => GMap (ListGMap (Node k)) k where ... etc
>
> I should have a spot on code.haskell.org soon, at which point I'll put
> up a Haddock page with the most up to date version of the api.
>
> class GMap map where
>
> type k
>
> -- | The empty map.
> empty :: map a
>
> -- | Create a map with a single association.
> singleton :: k -> a -> map a
>
> -- | Create a map from a list of associations which /must/ be in
> ascending order of keys
> -- (with /no/ duplicate keys). If in doubt use one of the safer (but
> slower) 'fromAssocs' functions.
> fromAssocsAscending :: [(k,a)] -> map a
>
> -- | Return 'True' if the map contains no associations.
> isEmpty :: map a -> Bool
>
> -- | Return 'True' if the map contains exactly one association.
> isSingleton :: map a -> Bool
>
> -- | Return the value associated with the supplied key (if any).
> lookup :: k -> map a -> Maybe a
>
> -- | Insert a new association in the map if there is currently no
> value associated with the key.
> -- If there is a value associated with the key then replace it with
> the result of
> -- applying the supplied function to that value.
> insert :: (a -> a) -> k -> a -> map a -> map a
>
> -- | Delete the association for the supplied key (if any).
> delete :: k -> map a -> map a
>
> -- | This is a combined insert\/modify\/delete operation. The argument
> to the supplied function
> -- is ('Just' a) if there is a value (a) associated with the supplied
> key, otherwise 'Nothing'.
> -- If the return value is ('Just' a'), a' becomes the new value
> associated with the supplied key.
> -- If the return value is 'Nothing', the association for the supplied
> key (if any) is deleted.
> alter :: (Maybe a -> Maybe a) -> k -> map a -> map a
>
> -- | Evaluate the union of two maps. If the maps contain common keys
> then combine the
> -- values associated with those keys using the supplied function. The
> value arguments
> -- to this function are supplied in the same order as the map arguments.
> union :: (a -> a -> a) -> map a -> map a -> map a
>
> -- | Evaluate the intersection of two maps, combining common
> associations using the supplied function.
> intersection :: (a -> b -> c) -> map a -> map b -> map c
>
> -- | Evaluate the difference between two maps. For any key occuring in
> the second map,
> -- the corresponding association (if any) is deleted from the first map.
> -- The associated values in the second map are irrelevant.
> difference :: map a -> map b -> map a
>
> -- | Returns true if the keys in the first map are a subset of the
> keys in the second map.
> -- (This includes the case where the key sets are identical). Note
> that this function does
> -- not examine the associated values (which are irrelevant). See
> 'isSubmapOf' if you
> -- do want associated values examined.
> isSubsetOf :: map a -> map b -> Bool
>
> -- | Returns true if the keys in the first map are a subset of the
> keys in the second map
> -- and the corresponding function always returns true when applied to
> the values associated
> -- with matching keys.
> isSubmapOf :: (a -> b -> Bool) -> map a -> map b -> Bool
>
> -- | Apply the supplied function to every associated value in the map.
> map :: (a -> b) -> map a -> map b
>
> -- | Apply the supplied function to every association in the map, and
> use the result
> -- as the new associated value for the corresponding key.
> mapWithKey :: (k -> a -> b) -> map a -> map b
>
> -- | Delete associations for which the supplied predicate returns
> 'False' when applied to
> -- the associated value.
> filter :: (a -> Bool) -> map a -> map a
>
> -- | Fold right over the list of elements in ascending order of keys.
> -- See 'foldrElemsAscending'' for a strict version of this function.
> foldrElemsAscending :: (a -> b -> b) -> map a -> b -> b
>
> -- | Fold right over the list of keys in ascending order.
> -- See 'foldrKeysAscending'' for a strict version of this function.
> foldrKeysAscending :: (k -> b -> b) -> map a -> b -> b
>
> -- | Fold right over the list of associations in ascending order of keys.
> -- See 'foldrAssocsAscending'' for a strict version of this function.
> foldrAssocsAscending :: (k -> a -> b -> b) -> map a -> b -> b
>
> -- | Fold over elements in un-specified order using /unboxed/ Int
> accumulator (with GHC).
> -- Defaults to boxed Int for other Haskells. Typically used for
> counting functions.
> -- Implementations are free to traverse the map in any order.
> -- The folded function is always applied strictly.
> foldElemsUINT :: (a -> UINT -> UINT) -> map a -> UINT -> UINT
>
> In addition there a few functions which are useful for groups of maps
> or nested maps.
>
> -- | Add the number of associations in a map to the supplied /unboxed/
> Int (with GHC).
> -- Defaults to boxed Int for other Haskells.
> addSize :: map a -> UINT -> UINT
>
> -- | Find the value associated with the supplied key (if any) and
> return the result
> -- of applying the supplied continuation function to that value.
> Useful for nested lookup.
> lookupCont :: (a -> Maybe b) -> k -> map a -> Maybe b
>
> -- | Reject empty maps (return Nothing).
> nonEmpty :: map a -> Maybe (map a)
>
> The following functions are useful internally as most of the maps are
> defined in terms of simpler maps. Also see this thread on the need for
> unionMaybe.
>
> -- | Similar to 'insert', but the association is deleted if the
> supplied function returns 'Nothing'.
> -- (The supplied function is always applied strictly.)
> insertMaybe :: (a -> Maybe a) -> k -> a -> map a -> map a
>
> -- | Find the value associated with the supplied key (if any) and
> apply the supplied function
> -- to that value. Delete the association if the result is 'Nothing'.
> Replace the old value with
> -- the new value if the result is ('Just' something).
> -- (The supplied function is always applied strictly.)
> deleteMaybe :: (a -> Maybe a) -> k -> map a -> map a
>
> -- | Evaluate the union of two maps, but delete combined associations
> from the result map
> -- if the combining function returns 'Nothing'.
> -- (The combining function is always applied strictly.)
> unionMaybe :: (a -> a -> Maybe a) -> map a -> map a -> map a
>
> -- | Evaluate the intersection of two maps, but delete combined
> associations from the result map
> -- if the combining function returns 'Nothing'.
> -- (The combining function is always applied strictly.)
> intersectionMaybe :: (a -> b -> Maybe c) -> map a -> map b -> map c
>
> -- | Difference with a combining function. If the combining function returns
> -- @Just a@ then the corresponding association is not deleted from the
> result map
> -- (it is retained with @a@ as the associated value).
> differenceMaybe :: (a -> b -> Maybe a) -> map a -> map b -> map a
>
> -- | Apply the supplied function to every associated value in the map.
> -- If the result is 'Nothing' then the delete the corresponding association.
> -- (The supplied function is always applied strictly.)
> mapMaybe :: (a -> Maybe b) -> map a -> map b
>
> Thanks
>
> Jamie
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://www.haskell.org/mailman/listinfo/libraries
>

Just a thought: why not have:

import Data.Traversable

class Traversable map => GMap map where
  ...
  (remove 'map')
  ...

Because all maps ought to be traversable.

See: http://haskell.org/ghc/docs/latest/html/libraries/base/Data-Traversable.html

Bas