[Haskell-cafe] Re: OCaml list sees abysmal Language Shootoutresults

John Meacham john at repetae.net
Fri Oct 8 18:49:53 EDT 2004


On Fri, Oct 08, 2004 at 12:43:45PM -0400, Robert Dockins wrote:
> >>x = [3,5,7]
> >>primes = 2:x
> >>odds = 1:x
> >>You can't do sharing like this if your lists are doubly-linked; lots
> >>of cool algorithms depend on this sharing.
> >
> >
> >That constraint makes various other things painful. I suppose there is
> >no one-size-fits-all solution.
> 
> Then perhaps it is worth considering having multiple implementations and 
>  choosing between them with pragmas and/or command line switches (with 
> a sensible default naturally).  Maybe doubly linked lists are not a 
> great idea, but if we had a good implementation with, eg. O(1) access to 
> both ends of the list but poor sharing, we can choose to use it only in 
> cases where queue semantics are important and sharing is not.  It would 
> be nice to be able to monkey about with that kind of "under the hood" 
> functionality w/o having to make any code changes.  You could also do 
> fun things like have chained-buffer list implementations for [Word8], 
> [Char] etc.

What I thought would be interesting to implement (and should be possible
with ghc) would be as lists were evaluated, if they were packed into
UArrays by the thunk-evaluation code. basically, the update mechanism,
rather than replace a node with a constant Char thunk, would write the
char to a UArray-like structure (of a certain block size) and replace
the code pointer to one that knows how to read the list out of said
UArray structure.

That way, lists would behave like normal, but once evaluated, would be
optimized into efficient arrays. also, the work of the update-analysis
phase could be leveraged to avoid this when we know the list is not
shared.

A variation would be to not do this in the update code, but rather the
GC scavaging code, if it notices chains of unboxable values in HNF. This
is more advantagous as we would avoid the work if the lists are GCed and
has no run-time cost. At the time of the GC, we might also have more
information, like the full size of the list (if it is fully evaluated)
which could help us choose buffer sizes more efficiently or pack it into
8 bit chars if we find out it is all ascii.

        John


-- 
John Meacham - ⑆repetae.net⑆john⑈ 


More information about the Haskell-Cafe mailing list