Containers in the Haskell Platform

Alexander Dunlap alexander.dunlap at gmail.com
Fri Aug 7 01:51:29 EDT 2009


On Thu, Aug 6, 2009 at 12:15 AM, Alexander
Dunlap<alexander.dunlap at gmail.com> wrote:
> Hi all,
>
> As we consider what we want to go into the Haskell Platform, I would
> like to seriously discuss the issue of container polymorphism,
> especially, but not exclusively, in dealing with strings. We currently
> have a large number of container types, many of which are going to
> co-exist long-term, i.e. there is no winner; they are useful in
> different situations. Here is an incomplete list of what we are
> dealing with:
>
> Fully polymorphic sequence types:
> []
> Data.Sequence
> Data.Edison sequence types
> Data.Array.Array & friends
>
> Restricted polymorphic sequence types:
> uvector
> vector
> storable-vector
>
> Monomorphic sequence types:
> ByteString (Word8 and Char8)
> Lazy ByteString (Word8 and Char8)
> Text
>
> Fully polymorphic associative array types:
> Data.Map
> Edison associative arrays
> gmap (?)
>
> Less-polymorphic associative array types:
> bytestring-tries
> list-tries
> Data.IntMap
>
> I'm sure there are more, of course, but those are the ones that a
> cursory search reveals.
>
> The question, then, is how we ought to deal with all of these types in
> the Haskell Platform. The current popular libraries deal with the
> multitude of container types in a very ad-hoc way. For instance, the
> Parsec parsing library allows you to parse Strings and Strict
> ByteStrings; it also includes a class so that you can parse other
> things. Attoparsec allows you to parse only lazy bytestrings. I don't
> know of any library that parses Text yet. The binary library deals
> with lazy bytestrings, while binary-strict lets you use strict
> bytestrings. Libraries like zlib just pick a type (lazy bytestrings)
> and use that, not accounting for other possible types you may want to
> use (granted, in zlib's case, the choice of a single data structure,
> seems fairly reasonable, given potential use cases).
>
> The point here is that the same functionality and practically the same
> code can be split over several different packages based on what data
> structure you want to use. Worse, there is no standard way of dealing
> with this separation: Parsec uses a type class, while binary uses two
> separate packages. And interoperability could potentially become a
> problem: the split library, for instance, only supports lists, so you
> have to roll your own solutions for splitting ByteStrings or Text. I
> think that with the introduction of the Haskell Platform, we ought to
> take the opportunity to clean up some of this chaos so that we have a
> clean set of standard libraries for people to build on.
>
> The way I see it, we have a few options. We could introduce some type
> classes to cover the various container operations and ask all Platform
> packages to only use operations from these classes, where applicable.
> (The ListLike library would probably be a good place to start.) The
> issue here is efficiency, massive class dictionaries, and type system
> issues (we need MPTCs or TFs; also, since not all sequence types are
> polymorphic, we probably need to have separate map and rigidMap
> functions like in ListLike, and even that doesn't work for types like
> uvector that can take some, but not all, concrete element types).
>
> We could also implement a standard module-naming scheme: for every
> package that operates on strings, for instance, we could require the
> list-based code to go in Foo.Bar.String and the Text code to go in
> Foo.Bar.Text. Whenever we "blessed" a new string datatype (presumably
> not very often), all packages would have to add a new module. The
> issue is code duplication.
>
> We could also implement a package naming scheme: have foo-text and
> foo-string for the two different datatypes.
>
> What does everyone think about the need for standardization here and
> how we ought to do it? I'm sorry that, having less experience with
> Haskell than many, I can't give as much in the way of concrete
> proposals, but I think it's important that we hash something out here
> to get more organization in our Platform.
>
> Thanks for your time in reading this admittedly long message,
> Alex
>

Both comments are well-taken; thank you very much for the feedback!
I'm starting to understand more about the issue and have a more
concrete idea to suggest.

Some of the most common libraries for dealing with strings seem to be
parsers. Parsing libraries would benefit a lot by becoming
polymorphic, as Strings, ByteStrings, Texts, and sequences of other
token types all have legitimate reasons to be parsed. Furthermore,
parsers seem to be libraries that use a very small number of functions
from the string type: I think something along the lines of the
following would suffice:

class Monoid c => Parseable c where
  type El c :: *
  uncons :: c -> Maybe (El c, c)
  splitAt :: Int -> c -> (c,c)
  cons :: El c -> c -> c
  span, break :: (El c -> Bool) -> c -> (c,c)

with mempty and mappend used from the Monoid class. I don't think this
would have very great efficiency implications, since most of the
parsing libraries seem to use those functions anyway. I will try to
mock up an example of this and try some tests and benchmarks to see if
I can come up with an even more concrete proposal. Any further input
is very much appreciated.

Thanks!
Alex


More information about the Libraries mailing list