[Haskell-cafe] Reference for technique wanted

Tue Nov 2 21:13:37 EDT 2010

On 2/11/2010, at 10:08 PM, Stephen Tetley wrote:
> Och, adding reverse or even head and tail to a Dlist / Hughes list
> seems out of spirit with the idea - build as a Hughes list (enjoying
> cheap concat) - convert to a list and manipulate thereafter. I know
> the version on Hackage has head, tail, fmap etc. their existence is
> one of the reasons I avoid it and roll my own.
> 
> Interestingly what was the test doing for boring old plain list to do
> so well? More than just cons I hope.

My message included the code, and yes, it was just cons.
But that's the point of the "The Concatenates Vanish" paper:
a straightforward source to source transformation that
doesn't so much make concatenation fast as eliminate it
completely.

For the case of a tree with 2**25 leaves, we have
- building a list using cons:     0.215 seconds
- using a dlist then converting:  5.144 seconds  ("fancy" dlists)
- using a dlist then converting:  5.512 seconds  ("plain" dlists)
- building a list using append:  17.764 seconds
(using MLton again).

plain dlists used
	fun l2dl xs ys = xs @ ys	(* list to dlist *)
	fun dl2l d     = d []		(* dlist to list *)
	fun dlap e d   = e o d		(* dlist append *)
which is precisely the roll-your-own concatenation-only approach.

I must admit that I was surprised to find the "fancy" version
that handles reverse in O(1) time as well as concatenation
was faster than the "plain" version; it just goes to show that
rolling your own simplified code DOESN'T always pay off.

Flattening this tree using append involved 25 layers of appends,
each allocating 2**24 cons cells.  If dlists can only beat _that_
by a factor of 3.2-3.5, they don't seem worth bothering about;
even if you refuse to rewrite your source code to avoid
concatenation, a data structure does better than a function.

The data structure is more versatile too.
"Plain" dlists give you O(1) concatenate and O(n) conversion to list.
"Fancy" ones give you O(1) reverse as well.
There they stop.

With a data structure, it's easy to get O(1) length and even
easier to do O(1) null test, you can do folding without first
copying.

Of course using Haskell and GHC the relative costs would be different,
but GHC knows enough about lists that the payoff from using a data
structure the compiler knows how to optimise might outweigh other
concerns.