Generic tries (long)

Mon Jun 16 15:28:32 EDT 2008

Hi, I'm writing a library for generic tries for the Summer of Code.
The main point of this post is to get some feedback on the api but
I'll briefly explain the idea.

The point of a trie is to exploit the recursive nature of ADTs to save
on expensive key comparisons and reduce space consumption. Hinze'
original formulation is very elegant but results in very deep
structures and is fairly inefficient. The normal trie optimisations
(concatenating singleton maps, mainly) cant be applied to the generic
version.

I intend instead to encode ADTs as lists. This encoding can range from
a simple walk of the ADT to creating a compressed, bit packing
representation. The resulting tradeoff between encoding time and space
usage should make the design fairly flexible. This will look something
like:

class Serial k where

-- | Flattened form of key consists of a list of Nodes
-- Node will be Int or UArray Int for compressed implementations
type Node k

-- | Flatten to a list of Nodes
serialise :: k -> [Node k]

-- | Reconstruct from a list of Nodes
unserialise :: [Node k] -> k

The api below is mostly cribbed from Adrian Hey's initial design.
Guarantees about ordering will probably vary between maps. All
ascending have descending version too. Strict versions of functions
will be written where appropriate, I've omitted them here for brevity.
Key reconstruction is likely to be expensive so it may make more sense
to seperate foldrKeys and friends into a seperate class.

Adrian has written instance of GMap for lists, UInts and Ord types so
I can declare
instance (Serial k) => GMap (ListGMap (Node k)) k where ... etc

I should have a spot on code.haskell.org soon, at which point I'll put
up a Haddock page with the most up to date version of the api.

class GMap map where

type k

-- | The empty map.
empty :: map a

-- | Create a map with a single association.
singleton :: k -> a -> map a

-- | Create a map from a list of associations which /must/ be in
ascending order of keys
-- (with /no/ duplicate keys). If in doubt use one of the safer (but
slower) 'fromAssocs' functions.
fromAssocsAscending :: [(k,a)] -> map a

-- | Return 'True' if the map contains no associations.
isEmpty :: map a -> Bool

-- | Return 'True' if the map contains exactly one association.
isSingleton :: map a -> Bool

-- | Return the value associated with the supplied key (if any).
lookup :: k -> map a -> Maybe a

-- | Insert a new association in the map if there is currently no
value associated with the key.
-- If there is a value associated with the key then replace it with
the result of
-- applying the supplied function to that value.
insert :: (a -> a) -> k -> a -> map a -> map a

-- | Delete the association for the supplied key (if any).
delete :: k -> map a -> map a

-- | This is a combined insert\/modify\/delete operation. The argument
to the supplied function
-- is ('Just' a) if there is a value (a) associated with the supplied
key, otherwise 'Nothing'.
-- If the return value is ('Just' a'), a' becomes the new value
associated with the supplied key.
-- If the return value is 'Nothing', the association for the supplied
key (if any) is deleted.
alter :: (Maybe a -> Maybe a) -> k -> map a -> map a

-- | Evaluate the union of two maps. If the maps contain common keys
then combine the
-- values associated with those keys using the supplied function. The
value arguments
-- to this function are supplied in the same order as the map arguments.
union :: (a -> a -> a) -> map a -> map a -> map a

-- | Evaluate the intersection of two maps, combining common
associations using the supplied function.
intersection :: (a -> b -> c) -> map a -> map b -> map c

-- | Evaluate the difference between two maps. For any key occuring in
the second map,
-- the corresponding association (if any) is deleted from the first map.
-- The associated values in the second map are irrelevant.
difference :: map a -> map b -> map a

-- | Returns true if the keys in the first map are a subset of the
keys in the second map.
-- (This includes the case where the key sets are identical). Note
that this function does
-- not examine the associated values (which are irrelevant). See
'isSubmapOf' if you
-- do want associated values examined.
isSubsetOf :: map a -> map b -> Bool

-- | Returns true if the keys in the first map are a subset of the
keys in the second map
-- and the corresponding function always returns true when applied to
the values associated
-- with matching keys.
isSubmapOf :: (a -> b -> Bool) -> map a -> map b -> Bool

-- | Apply the supplied function to every associated value in the map.
map :: (a -> b) -> map a -> map b

-- | Apply the supplied function to every association in the map, and
use the result
-- as the new associated value for the corresponding key.
mapWithKey :: (k -> a -> b) -> map a -> map b

-- | Delete associations for which the supplied predicate returns
'False' when applied to
-- the associated value.
filter :: (a -> Bool) -> map a -> map a

-- | Fold right over the list of elements in ascending order of keys.
-- See 'foldrElemsAscending'' for a strict version of this function.
foldrElemsAscending :: (a -> b -> b) -> map a -> b -> b

-- | Fold right over the list of keys in ascending order.
-- See 'foldrKeysAscending'' for a strict version of this function.
foldrKeysAscending :: (k -> b -> b) -> map a -> b -> b

-- | Fold right over the list of associations in ascending order of keys.
-- See 'foldrAssocsAscending'' for a strict version of this function.
foldrAssocsAscending :: (k -> a -> b -> b) -> map a -> b -> b

-- | Fold over elements in un-specified order using /unboxed/ Int
accumulator (with GHC).
-- Defaults to boxed Int for other Haskells. Typically used for
counting functions.
-- Implementations are free to traverse the map in any order.
-- The folded function is always applied strictly.
foldElemsUINT :: (a -> UINT -> UINT) -> map a -> UINT -> UINT

In addition there a few functions which are useful for groups of maps
or nested maps.

-- | Add the number of associations in a map to the supplied /unboxed/
Int (with GHC).
-- Defaults to boxed Int for other Haskells.
addSize :: map a -> UINT -> UINT

-- | Find the value associated with the supplied key (if any) and
return the result
-- of applying the supplied continuation function to that value.
Useful for nested lookup.
lookupCont :: (a -> Maybe b) -> k -> map a -> Maybe b

-- | Reject empty maps (return Nothing).
nonEmpty :: map a -> Maybe (map a)

The following functions are useful internally as most of the maps are
defined in terms of simpler maps. Also see this thread on the need for
unionMaybe.

-- | Similar to 'insert', but the association is deleted if the
supplied function returns 'Nothing'.
-- (The supplied function is always applied strictly.)
insertMaybe :: (a -> Maybe a) -> k -> a -> map a -> map a

-- | Find the value associated with the supplied key (if any) and
apply the supplied function
-- to that value. Delete the association if the result is 'Nothing'.
Replace the old value with
-- the new value if the result is ('Just' something).
-- (The supplied function is always applied strictly.)
deleteMaybe :: (a -> Maybe a) -> k -> map a -> map a

-- | Evaluate the union of two maps, but delete combined associations
from the result map
-- if the combining function returns 'Nothing'.
-- (The combining function is always applied strictly.)
unionMaybe :: (a -> a -> Maybe a) -> map a -> map a -> map a

-- | Evaluate the intersection of two maps, but delete combined
associations from the result map
-- if the combining function returns 'Nothing'.
-- (The combining function is always applied strictly.)
intersectionMaybe :: (a -> b -> Maybe c) -> map a -> map b -> map c

-- | Difference with a combining function. If the combining function returns
-- @Just a@ then the corresponding association is not deleted from the
result map
-- (it is retained with @a@ as the associated value).
differenceMaybe :: (a -> b -> Maybe a) -> map a -> map b -> map a

-- | Apply the supplied function to every associated value in the map.
-- If the result is 'Nothing' then the delete the corresponding association.
-- (The supplied function is always applied strictly.)
mapMaybe :: (a -> Maybe b) -> map a -> map b

Thanks

Jamie