strict collections via types (was: [Haskell-cafe] implementing
python-style dictionary in Haskell)
claus.reinke at talk21.com
Wed Nov 19 06:50:13 EST 2008
> One problem is that Haskell collections are lazy by default. I'm
> aware of a few use cases where laziness lets you formulate a very
> elegant recursive population of a collection, but I think that in
> general, strictness is what you want,
While I strongly agree with the gist of your post, I never liked this
particular kind of argument - different people have different needs.
> and further, that if you want lazy, you store your data as
> one-tuples: data Lazy a = Lazy a
> (If there's a workaround/solution in the other direction, I'd really
> like to hear it).
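To make the quoted one-tuple trick concrete, here is a minimal sketch (my example, not from the quoted post) of why it shields a value from forcing:

```haskell
-- The one-tuple from the quote: a plain data constructor, not a newtype,
-- because a newtype is transparent to seq and would give no shielding.
data Lazy a = Lazy a

main :: IO ()
main =
  -- Forcing the wrapper to WHNF leaves the payload untouched,
  -- so the undefined inside is never evaluated.
  Lazy undefined `seq` putStrLn "wrapper forced, payload untouched"
```

Note that `data`, not `newtype`, is essential here: `newtype Lazy a` would make `seq` see straight through to the payload.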
That is the crunch point. If we can't choose between strict and
non-strict usage, the default becomes the only option, and it simply
doesn't fit all use cases. Workarounds are currently somewhat random:
- for indexed collections (which tend to be strict in the keys), one
can use strict pairs when building from lists of pairs, but that
doesn't work for all operations (at least not with the current APIs)
- some collections provide strict alternatives for a few operations,
but that doesn't cover much of the API, and isn't even consistent
across variations of the same kind of collection
- for some types, strict alternatives are provided as separate packages
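The strict-pair workaround from the first item can be sketched like this (the helper names are mine, not part of any API):

```haskell
import qualified Data.Map as M

-- A strict pair: both components are forced when the pair is matched,
-- thanks to the bangs on its fields.
data P k v = P !k !v

-- Build a map by routing each pair through P, so every value is in
-- WHNF as it goes into the map instead of sitting there as a thunk.
fromListStrict :: Ord k => [(k, v)] -> M.Map k v
fromListStrict kvs =
  M.fromList [ (k, v) | P k v <- map (uncurry P) kvs ]

main :: IO ()
main = print (M.toList (fromListStrict [("a", 1 + 1), ("b", 2 + 2)]
                          :: M.Map String Int))
```

As noted above, this only helps for building from lists; operations like `insertWith` or `adjust` can reintroduce thunks behind one's back.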
That means not only that the usage patterns for the same problem
differ between contexts (making it harder to switch types), but also
that the solutions exist only for some operations in some types:
one is limited to the spartan, incomplete subset of a data type's
API that supports switching between strict and non-strict (check which
operations in Data.Map support both modes, then check what happens
when you want to move from Data.Map to Data.IntMap..).
Not to mention that the strict alternatives aren't made obvious (you
have to know to look for primed ('-suffixed) variants of operation
names, and only sometimes; that convention goes all the way back to the
well-known trap of foldl with a strict accumulator) and are documented
in names instead of being specified in types.
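The foldl trap referred to above, for anyone who hasn't hit it yet: the primed variant forces its accumulator at every step, the unprimed one builds a tower of thunks. A minimal illustration:

```haskell
import Data.List (foldl')

-- foldl leaves (((0+1)+2)+...) unevaluated until the very end;
-- foldl' forces the accumulator at each step, keeping space constant.
sumLazy, sumStrict :: [Int] -> Int
sumLazy   = foldl  (+) 0
sumStrict = foldl' (+) 0

main :: IO ()
main = print (sumStrict [1 .. 100000])
```

Nothing in the types distinguishes the two; the strictness lives entirely in the prime, which is exactly the documentation-by-naming problem.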
Using a single set of packages and operations, with standardized
ways of instantiating a collection/type as either strict or non-strict,
would be much better (well, usable, for a start;-), IMHO.
- using a strict default, optionally disabled via non-strict one-tuples,
sounds workable (perhaps the one-tuple should be standardized,
to give the compiler an indication that this is really an annotation,
and to get similar benefits as from newtype deriving).
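A rough sketch of how that strict-default-plus-one-tuple scheme might look from the user's side (all of this is hypothetical; I'm simulating the strict default with an explicit seq over today's lazy Data.Map):

```haskell
import qualified Data.Map as M

data Lazy a = Lazy a

-- strict style: force the value before insertion
-- (standing in for an imagined strict-by-default insert)
insertStrict :: Ord k => k -> v -> M.Map k v -> M.Map k v
insertStrict k v m = v `seq` M.insert k v m

-- lazy style: opt out via the one-tuple; only the wrapper is forced
insertLazy :: Ord k => k -> v -> M.Map k (Lazy v) -> M.Map k (Lazy v)
insertLazy k v = M.insert k (Lazy v)

main :: IO ()
main = do
  let s = insertStrict "x" (2 + 2) M.empty :: M.Map String Int
  print (M.lookup "x" s)                       -- the 4 is already in WHNF
  let l = insertLazy "thunk" undefined M.empty :: M.Map String (Lazy Int)
  print (M.size l)                             -- the payload is never forced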
- using strictness annotations in types seems simpler, but has the
drawback that the annotations are really not part of the types;
types just happen to be a nice place to put annotations that
amount to invariants ('!t' in some context saying: "I always want
whnf in this position, not unevaluated thunks") that the compiler
can use for modifying strictness analysis and code generation.
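Strict fields in data declarations already behave like such annotations, which is what makes the idea tempting; a small demonstration (my own):

```haskell
-- A bang on a field makes construction force that component to WHNF;
-- the unannotated field stays a thunk until someone demands it.
data Pair = Pair !Int Int

main :: IO ()
main = do
  let p = Pair (1 + 1) undefined  -- strict field forced, lazy field not
  case p of
    Pair a _ -> print a           -- the undefined is never evaluated
```

The point in the bullet above is that '!t' there is attached to a constructor, not genuinely part of the type language; '!' in an arbitrary type position would be annotation, not type.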
- an in-between step might be to parameterize collections by a
type constructor, with two pre-defined constructors ("Strict"
and "Id") that effectively play the role of annotations while being
proper parts of types (type constructors), thereby documenting
intent and enabling (pre-)specialization of code without creating
programming or efficiency overheads (at least not for the library
author - not so sure about library users).
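A rough sketch of that in-between step, with a trivial container standing in for a real collection (every name here is hypothetical):

```haskell
-- Two evaluation-policy constructors: Id leaves the payload lazy,
-- Strict forces it at construction via its bang.
newtype Id a   = Id a
data Strict a  = Strict !a

class Policy f where
  wrap   :: a -> f a
  unwrap :: f a -> a

instance Policy Id where
  wrap = Id
  unwrap (Id a) = a

instance Policy Strict where
  wrap a = Strict a
  unwrap (Strict a) = a

-- A container parameterized by the policy; here just a list of cells.
build :: Policy f => [a] -> [f a]
build = map wrap

main :: IO ()
main = do
  let xs = build [1, undefined] :: [Id Int]
  print (unwrap (head xs))  -- the undefined cell stays a thunk
```

Choosing `[Strict Int]` instead of `[Id Int]` at the use site would flip the policy without touching the library code, which is the documenting-intent-in-types part.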
One important point is composition of contexts: if one wants, say,
a "Data.Map.Map Key (Data.Map.Map Key type)", it should be
possible to specify that one wants both Maps element-strict, and
hence the composed structure strict in "type", without obfuscating
the code with 'seq's or 'Lazy's all over the place. I can see that
working with an annotation + code-generation/specialization
approach, but how close would a type-tag approach come to that?
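As a footnote from a later vantage point: the containers library eventually grew Data.Map.Strict, which gets close to this for the nested-map case; with strict operations at both levels, the innermost values are forced as they are combined, with no explicit seq in user code:

```haskell
import qualified Data.Map.Strict as M

type Nested = M.Map String (M.Map String Int)

-- Count occurrences in a two-level map; both levels use strict
-- operations, so the Ints are forced as they are combined.
bump :: String -> String -> Nested -> Nested
bump outer inner = M.insertWith (M.unionWith (+)) outer (M.singleton inner 1)

main :: IO ()
main = print (bump "a" "x" (bump "a" "x" M.empty))
```

It is still strictness-by-module-name rather than strictness-by-type, though: Data.Map.Strict and Data.Map share one representation, so nothing in the type records which discipline built a given map.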
> I'm therefore tempted to suggest that collections should be
> strict by default, and in particular, that there should be strict
> arrays for arbitrary types, not just the ones that happen to be
> unboxable. Unboxing should be an optimization for (some) strict
Another suggestion I find myself agreeing with: when I'm looking
for strict variants of standard data types, unboxing is a natural
follow-on optimization, but if unboxing isn't possible, that shouldn't
keep me from getting the benefits of strictness where I need them.