[Haskell-cafe] Re: Optimization problem

Sun Sep 17 19:23:11 EDT 2006

On Thu, Sep 14, 2006 at 07:51:59PM +0200, Bertram Felgenhauer wrote:
> It's a result of thinking about lazy evaluation, and
> especially lazy patterns (and let bindings) for some time. A wiki article
> that helped me a lot to understand these is
> 
>   http://www.haskell.org/hawiki/TyingTheKnot
> 
> I'd like to point out the trustList function there which uses the idea
> of encoding the structure of a term and its actual values in different
> arguments, i.e. a blueprint.

One view of your device is as separating the shape (blueprint) from the
contents, e.g. one can split a finite map type

	data Map k a  = Node !Int k a (Map k a) (Map k a) | Leaf

into a pair of types

	data MapShape k = SNode !Int k (MapShape k) (MapShape k) | SLeaf
	data MapData a = DNode a (MapData a) (MapData a) | DLeaf

(We don't even need DLeaf, as it's never examined.)
We need functions

	fill :: a -> MapShape k -> MapData a
	update :: Ord k => k -> (a -> a) -> MapShape k ->
		MapData a -> MapData a
	insert :: Ord k => k -> MapShape k ->
		(MapShape k, MapData a -> (a, MapData a))

In a dependently typed language we could parameterize MapData with shapes
to give more precise types:

	fill :: a -> forall s :: MapShape k. MapData s a
	update :: Ord k => k -> (a -> a) ->
		forall s :: MapShape k.  MapData s a -> MapData s a
	insert :: Ord k => k -> forall s :: MapShape k.
		(exists s' :: MapShape k, MapData s' a -> (a, MapData s a))

hinting that the function returned by insert reverses the effect of
the insertion.  It's not possible to code this with GADTs, because
existentials are incompatible with irrefutable patterns, at least
in GHC.

Anyway, splitSeq then becomes

	splitSeq :: Ord a => [(a,b)] -> [(a,[b])]
	splitSeq = fst . splitSeq' SLeaf

	splitSeq' :: Ord a => MapShape a -> [(a,b)] -> ([(a,[b])], MapData [b])
	splitSeq' bp [] = ([], fill [] bp)
	splitSeq' bp ((a,b):xs)
	  | member a bp = let (l, m)         = splitSeq' bp xs
			  in  (l, update a (b:) bp m)
	  | otherwise   = let (bp', extract) = insert a bp
			      (l, m')        = splitSeq' bp' xs
			      (bs, m)        = extract m'
			  in  ((a, b:bs) : l, m)

Again, no knot-tying is required.

To prove that (even for partial and infinite lists ps)

	splitSeq ps = [(a, seconds a ps) | a <- nub ps]

where

	seconds :: Eq a => a -> [(a,b)] -> [b]
	seconds a ps = [b | (a', b) <- ps, a' == a]

we need another function

	get :: Ord k => k -> MapShape k -> MapData a -> a

and the properties

	member k s  =>
		get k s (fill v s) = v

	member k s  =>
		get k s (update k f s m) = f (get k s m)
	member k s /\ member x s /\ x /= k  =>
		get x s (update k f s m) = get x s m

	not (member k s) /\ insert k s = (s', extract)  =>
		member x s' = member x s || x == k
	not (member k s) /\ insert k s = (s', extract)  =>
		fst (extract m) = get k s' m
	not (member k s) /\ insert k s = (s', extract) /\ member x s  =>
		get x s (snd (extract m)) = get x s' m

Then we can establish, by induction on the input list,

(1)	fst (splitSeq' s ps) =
		[(a, seconds a ps) | a <- nub ps, not (member a s)]
(2)	member x s  =>
		get x s (snd (splitSeq' s ps)) = seconds x ps

This is enough to sustain the induction and obtain the desired property,
but it's a bit inelegant not to have an exact characterization of the
second component.  It seems essential to observe the map data only
though get.