[Haskell-cafe] Motion to unify all the string data types

Sat Nov 10 13:37:01 CET 2012

On Sat, Nov 10, 2012 at 4:00 AM, Johan Tibell <johan.tibell at gmail.com> wrote:
> As for type classes, I don't think we use them enough. Perhaps because
> Haskell wasn't developed as an engineering language, some good software
> engineering principles (code against an interface, not a concrete
> implementation) aren't used in out base libraries. One specific example is
> the lack of a sequence abstraction/type class, that all the string, list,
> and vector types could implement. Right now all these types try to implement
> a compatible interface (i.e. the traditional list interface), without a
> language mechanism to express that this is what they do.

I think the challenge is designing an abstraction that everyone is
comfortable with. If you just make everything a class method
(ListLike), it's ugly. If you don't, how do you figure out what goes
in the class and what gets implemented on top of it? Is there any
principled reason for it, or is it just ad hoc? How do you make sure
that none of the implementations suffers a performance decrease? What
about sequential vs. random access (list vs. array) issues? Should an
interface be implemented if it's semantically reasonable, but slow? If
you treat everything as a uniform sequence, doesn't that bring back
the Unicode issues again? (And can you make it work for all of
Text/ByteString (kind *), boxed Vectors and lists (* -> *), and
unboxed vectors (* -> * with a constraint)? What about operations that
change the element type? Surely it's possible with TypeFamilies,
ConstraintKinds, and PolyKinds all available, but I'm not sure if it's
obvious. Can it go into the Prelude if it uses extensions? Should it
also support other containers, like Maps? And so on.)

So my impression is that the reason the problem hasn't been solved yet
is that it's hard. We do have some useful things: Functor, Foldable,
Traversable, and the classes in Data.Key[1], but for starters none of
them can be implemented by Text and ByteString, so that brings us back
to square one.

But a constructive idea: what if strict Text and ByteString were both
synonyms for unboxed Vectors (already available in ByteString's
case[2])? What if, for lazy Text and ByteString, we either had lazy
Vectors to make them synonyms of, or a 'data Lazy v' which made a lazy
chunked sequence out of any underlying strict Vector-ish type? That
would cut down on the number of types, which is a good thing in
itself, and it would suggest an obvious way to abstract over them: the
existing Functor/Foldable/Traversable/Data.Key classes extended with
an associated constraint. I'm not sure how much of the use cases that
would cover, but certainly a lot more than we have now. It wouldn't
solve every one of the questions above, but it anwers many of them,
and it seems like a good compromise. The big drawbacks I can see are
that (a) it would be a *lot* of work, especially if we want to be
completely uncompromising on performance, and (b) I'm not sure how
pinned arrays and interoperation with C would be handled without
making it complicated again. (Though I suppose we could just punt and
have ByteString be a synonym for Vector.Storable (pinned) and Text for
Vector.Unboxed (not pinned) to mirror the current situation. Or maybe
we could have a pinArray# primop?)

Anyway, if I'm blue-sky dreaming, that's what looks appealing to me.

[1] http://hackage.haskell.org/packages/archive/keys/3.0.1/doc/html/Data-Key.html
[2] http://hackage.haskell.org/package/vector-bytestring

-- 
Your ship was destroyed in a monadic eruption.