[Haskell-cafe] Structural sharing in haskell data structures?

Thu May 14 09:03:30 EDT 2009

On May 13, 2009, at 6:58 PM, wren ng thornton wrote:

> Jan-Willem Maessen wrote:
>> I wanted to clear up one misconception here...
>> wren ng thornton wrote:
>> > In heavily GCed languages like Haskell allocation and collection  
>> is > cheap, so we don't mind too much; but in Java and the like,  
>> both > allocation and collection are expensive so the idea of cheap  
>> throwaway > objects is foreign.
>> Not true!
>
> I was speaking of Java, not Clojure. I believe the costs in Java are  
> well documented, though I don't know enough about the JVM to know  
> where the blame belongs. (All I know of Clojure is that it's a Lisp- 
> like on the JVM :)

I think you're missing the point here: the code I refer to below *is  
in Java* and is running on a standard JVM; the "costs" you refer to  
simply don't exist!  As Vladimir Ivanov points out, and as Rich Hickey  
is happy to observe in his talks on Clojure, the JVM handles  
allocation-intensive garbage-intensive programs very well.

>> If you look at the internals of Clojure, you'll discover they're  
>> using trees with *very* wide fanout (eg fanout-64 leaf trees for  
>> lists).  Why?  Because it's so cheap to allocate and GC these  
>> structures!  By using shallow-but-wide trees we reduce the cost of  
>> indexing and accessing list elements.  I suspect you'd still be  
>> hard-pressed to support this kind of allocation behavior in any of  
>> the present Haskell implementations, and Haskell implementations of  
>> the same kinds of structures have limited fanout to 2-4 elements or  
>> so.
>
> I was under the impression that the reason datastructures in Haskell  
> tend to be limited to 4-fanout had more to do with the cleanliness  
> of the implementations--- pattern matching on 64-wide cells is quite  
> ugly, as is dealing with the proliferation of corner cases for  
> complex structures like finger trees, patricia trees, etc. The use  
> of view patterns could clean this up significantly. On the other  
> hand, we do have things like lazy ByteStrings and UVector which do  
> have wide fanouts.

Hmm, I think neither of the data structures you name actually support  
both O(lg n) indexing and O(lg n) cons or append.  That said, your  
point is well taken, so let's instead state it as a challenge:

Can you, oh Haskellers, implement a fast, wide-fanout (say >= 8) tree- 
based sequence implementation in Haskell, which supports at-least-log- 
time indexing and at-least-log-time cons with a large base for the  
logarithm?  Can you do it without turning off array bounds checking  
(either by using unsafe operations or low-level peeking and poking)  
and without using an algebraic data type with O(f) constructors for  
fanout of f?  You can turn off bounds checks if your program encodes  
static guarantees that indices cannot be out of bounds (there are a  
couple of libraries to do this).

The spirit here is "Work in Haskell with safe operations and no FFI  
except through safe libraries, but otherwise use any extensions you  
like."

I actually think this *is* doable, but it touches a few areas where  
Haskell doesn't presently do well (the bounds checking in particular  
is a challenge).  I threw in the bounds checking when I realized that  
in fact the equivalent Java code is always bounds checked, and these  
bounds checks are then optimized away where possible.  Actually, I'd  
*love* to see an *in*efficient solution to eliminating as many bounds  
checks as possible!

-Jan

>
>
> -- 
> Live well,
> ~wren
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe