Using associated data types to create unpacked data structures

Thu Aug 12 13:22:58 EDT 2010

On Thu, Aug 12, 2010 at 6:07 PM, Simon Peyton-Jones
<simonpj at microsoft.com>wrote:

>  1.       You **want** a distinct blob of lookup code for each different
> key type, because you really do want a different lookup structure for each
>

There are two reasons to want different blobs of code for the lookup
function:

1. You want a different data structure (e.g. Patricia trees or weight
balanced trees) for different key types.
2. You want the key type unboxed (e.g. Int#) to reduce indirection/reboxing
and that way speed up the function by some constant factor.

The second could perhaps be achieved using SPECIALIZE pragmas, if they could
be given in modules that uses the data type and not only in the module that
defines the data type.

> 2.       For the most part you **dont want** a different blob of lookup
> code for each value type, because almost all of them are represented
> uniformly by a pointer.
>
I don't know if I want a different blob of lookup code for each value
type. It seems to be that having a different blob of lookup code even for
the value type could avoid some reboxing of the value when the caller will
immediately unbox the value again.

What I definitely want is different data representation for each key/value
type to avoid the four word overhead (two pointers and two constructors) for
key/value types that can be unboxed into the data structures data
constructors.

> So it’s be silly to generate all possible combinations of  types.
>

My proposal is to do the generation lazily, only for the combination of
types that are actually used, just like C++ does. If we can get the same
result without code duplication that would be better.

>  The exception to (2) is that  you want different code for a handful of
> types that you want to unbox into the tree structure itself: Int#, Float#
> etc.  It would be good to design a convenient way to do that.  It’s nothing
> directly to do with associated types.  We’d like to allow Maybe Int#, say,
> but we don’t at the moment because that data structure would really be
> represented differently.  Some kind of data type and code cloning (a la C++)
> is probably the right thing.  This is what Max meant by “just engineering”
> but it would require careful thought and design.
>

The number of data types you might want to unbox may be larger than you
think: All the Int types, all the Word types, Floats, Doubles, tuples, and
user defined records (which are technically tuples I guess). Anything the
UNPACK pragma applies to really.

The unboxing into the data type constructors is really what I'm after, the
specialization of e.g. the lookup function to avoid reboxing seems to follow
naturally from that.

Cheers,
Johan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/glasgow-haskell-users/attachments/20100812/67cc6e64/attachment.html