strict collections via types (was: [Haskell-cafe] implementing
python-style dictionary in Haskell)
claus.reinke at talk21.com
Wed Nov 19 06:50:13 EST 2008
> One problem is that Haskell collections are lazy by default. I'm
> aware of a few use cases where laziness lets you formulate a very
> elegant recursive population of a collection, but I think that in
> general, strictness is what you want,
While I strongly agree with the gist of your post, I never liked this
particular kind of argument - different people have different needs.
> and further, that if you want lazy, you store your data as
> one-tuples: data Lazy a = Lazy a
> (If there's a workaround/solution in the other direction, I'd really
> like to hear it).
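To make the quoted one-tuple trick concrete, here is a minimal sketch (my example, not from the quoted post) of why it shields a value from forcing:

```haskell
-- The one-tuple from the quote: a plain data constructor, not a newtype,
-- because a newtype is transparent to seq and would give no shielding.
data Lazy a = Lazy a

main :: IO ()
main =
  -- Forcing the wrapper to WHNF leaves the payload untouched,
  -- so the undefined inside is never evaluated.
  Lazy undefined `seq` putStrLn "wrapper forced, payload untouched"
```

Note that `data`, not `newtype`, is essential here: `newtype Lazy a` would make `seq` see straight through to the payload.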
That is the crunch point. If we can't choose between strict and
non-strict usage, the default becomes the only option, and it simply
doesn't fit all use cases. Workarounds are currently somewhat random:
- for indexed collections (which tend to be strict in the keys), one
can use strict pairs when building from lists of pairs, but that
doesn't work for all operations (at least not with the current APIs)
- some collections provide strict alternatives for a few operations,
but that doesn't cover much of the API, and isn't even consistent
across variations of the same kind of collection
- for some types, strict alternatives are provided as separate packages
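The strict-pair workaround from the first item can be sketched like this (the helper names are mine, not part of any API):

```haskell
import qualified Data.Map as M

-- A strict pair: both components are forced when the pair is matched,
-- thanks to the bangs on its fields.
data P k v = P !k !v

-- Build a map by routing each pair through P, so every value is in
-- WHNF as it goes into the map instead of sitting there as a thunk.
fromListStrict :: Ord k => [(k, v)] -> M.Map k v
fromListStrict kvs =
  M.fromList [ (k, v) | P k v <- map (uncurry P) kvs ]

main :: IO ()
main = print (M.toList (fromListStrict [("a", 1 + 1), ("b", 2 + 2)]
                          :: M.Map String Int))
```

As noted above, this only helps for building from lists; operations like `insertWith` or `adjust` can reintroduce thunks behind one's back.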
That means not only that the usage patterns for the same problem
differ between contexts (making it harder to switch types), but also
that the solutions exist only for some operations in some types:
one is limited to the spartan, incomplete subset of a data type's
API that supports switching between strict and non-strict (check which
operations in Data.Map support both modes, then check what happens
when you want to move from Data.Map to Data.IntMap..).
Not to mention that the strict alternatives aren't made obvious (you
have to know to look for primed ('-suffixed) variants of operation
names, and only sometimes; that convention goes all the way back to the
well-known trap of foldl with a strict accumulator) and are documented
in names instead of being specified in types.
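The foldl trap referred to above, for anyone who hasn't hit it yet: the primed variant forces its accumulator at every step, the unprimed one builds a tower of thunks. A minimal illustration:

```haskell
import Data.List (foldl')

-- foldl leaves (((0+1)+2)+...) unevaluated until the very end;
-- foldl' forces the accumulator at each step, keeping space constant.
sumLazy, sumStrict :: [Int] -> Int
sumLazy   = foldl  (+) 0
sumStrict = foldl' (+) 0

main :: IO ()
main = print (sumStrict [1 .. 100000])
```

Nothing in the types distinguishes the two; the strictness lives entirely in the prime, which is exactly the documentation-by-naming problem.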
Using a single set of packages and operations, with standardized
ways of instantiating a collection/type as either strict or non-strict,
would be much better (well, usable, for a start;-), IMHO.
- using a strict default, optionally disabled via non-strict one-tuples,
sounds workable (perhaps the one-tuple should be standardized,
to give the compiler an indication that this is really an annotation,
and to get similar benefits as from newtype deriving).
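A rough sketch of how that strict-default-plus-one-tuple scheme might look from the user's side (all of this is hypothetical; I'm simulating the strict default with an explicit seq over today's lazy Data.Map):

```haskell
import qualified Data.Map as M

data Lazy a = Lazy a

-- strict style: force the value before insertion
-- (standing in for an imagined strict-by-default insert)
insertStrict :: Ord k => k -> v -> M.Map k v -> M.Map k v
insertStrict k v m = v `seq` M.insert k v m

-- lazy style: opt out via the one-tuple; only the wrapper is forced
insertLazy :: Ord k => k -> v -> M.Map k (Lazy v) -> M.Map k (Lazy v)
insertLazy k v = M.insert k (Lazy v)

main :: IO ()
main = do
  let s = insertStrict "x" (2 + 2) M.empty :: M.Map String Int
  print (M.lookup "x" s)                       -- the 4 is already in WHNF
  let l = insertLazy "thunk" undefined M.empty :: M.Map String (Lazy Int)
  print (M.size l)                             -- the payload is never forced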
- using strictness annotations in types seems simpler, but has the
drawback that the annotations are really not part of the types;
types just happen to be a nice place to put annotations that
amount to invariants ('!t' in some context saying: "I always want
whnf in this position, not unevaluated thunks") that the compiler
can use for modifying strictness analysis and code generation.
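Strict fields in data declarations already behave like such annotations, which is what makes the idea tempting; a small demonstration (my own):

```haskell
-- A bang on a field makes construction force that component to WHNF;
-- the unannotated field stays a thunk until someone demands it.
data Pair = Pair !Int Int

main :: IO ()
main = do
  let p = Pair (1 + 1) undefined  -- strict field forced, lazy field not
  case p of
    Pair a _ -> print a           -- the undefined is never evaluated
```

The point in the bullet above is that '!t' there is attached to a constructor, not genuinely part of the type language; '!' in an arbitrary type position would be annotation, not type.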
- an in-between step might be to parameterize collections by a
type constructor, with two pre-defined constructors ("Strict"
and "Id") that effectively play the role of annotations while being
proper parts of types (type constructors), thereby documenting
intent and enabling (pre-)specialization of code without creating
programming or efficiency overheads (at least not for the library
author - not so sure about library users).
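A rough sketch of that in-between step, with a trivial container standing in for a real collection (every name here is hypothetical):

```haskell
-- Two evaluation-policy constructors: Id leaves the payload lazy,
-- Strict forces it at construction via its bang.
newtype Id a   = Id a
data Strict a  = Strict !a

class Policy f where
  wrap   :: a -> f a
  unwrap :: f a -> a

instance Policy Id where
  wrap = Id
  unwrap (Id a) = a

instance Policy Strict where
  wrap a = Strict a
  unwrap (Strict a) = a

-- A container parameterized by the policy; here just a list of cells.
build :: Policy f => [a] -> [f a]
build = map wrap

main :: IO ()
main = do
  let xs = build [1, undefined] :: [Id Int]
  print (unwrap (head xs))  -- the undefined cell stays a thunk
```

Choosing `[Strict Int]` instead of `[Id Int]` at the use site would flip the policy without touching the library code, which is the documenting-intent-in-types part.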
One important point is composition of contexts: if one wants, say,
a "Data.Map.Map Key (Data.Map.Map Key type)", it should be
possible to specify that one wants both Maps element-strict, and
hence the composed structure strict in "type", without obfuscating
the code with 'seq's or 'Lazy's all over the place. I can see that
working with an annotation + code-generation/specialization
approach, but how close would a type-tag approach come to that?
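As a footnote from a later vantage point: the containers library eventually grew Data.Map.Strict, which gets close to this for the nested-map case; with strict operations at both levels, the innermost values are forced as they are combined, with no explicit seq in user code:

```haskell
import qualified Data.Map.Strict as M

type Nested = M.Map String (M.Map String Int)

-- Count occurrences in a two-level map; both levels use strict
-- operations, so the Ints are forced as they are combined.
bump :: String -> String -> Nested -> Nested
bump outer inner = M.insertWith (M.unionWith (+)) outer (M.singleton inner 1)

main :: IO ()
main = print (bump "a" "x" (bump "a" "x" M.empty))
```

It is still strictness-by-module-name rather than strictness-by-type, though: Data.Map.Strict and Data.Map share one representation, so nothing in the type records which discipline built a given map.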
> I'm therefore tempted to suggest that collections should be
> strict by default, and in particular, that there should be strict
> arrays for arbitrary types, not just the ones that happen to be
> unboxable. Unboxing should be an optimization for (some) strict
Another suggestion I find myself agreeing with: when I'm looking
for strict variants of standard data types, unboxing is a natural
follow-on optimization, but if unboxing isn't possible, that shouldn't
keep me from getting the benefits of strictness where I need them.