[Haskell-cafe] The Proliferation of List-Like Types

Thu Feb 21 05:21:50 EST 2008

On Wed, 2008-02-20 at 19:01 -0600, John Goerzen wrote:
> On Wednesday 20 February 2008 5:13:34 pm Duncan Coutts wrote:
> > On Wed, 2008-02-20 at 08:39 -0600, John Goerzen wrote:
> > > * The iconv library works only on lazy ByteStrings, and does not
> > >    handle Strings or strict ByteStrings
> >
> > There is a very good reason for this. The right solution in this
> > particular example is not to overload every internal string operation in
> > the iconv lib (which would be far far too slow) but to convert to/from
> 
> I guess the first question here is: in general, why?

If one is stitching together coarse grained operations then it doesn't
matter too much that we pass in a dictionary and indirect every
operating through that. When we're using very fine grained operations
the overhead per-operation is more significant. If we do not specialise
to the list type we get all the extra overhead and we loose out on all
the static optimisations. If we do specialise we get N copies of the
code.

Iconv is a weird example because it is calling out to a foreign lib that
requires blocks of elements. A more convincing example might be an xml
parser. I conjecture that it is faster and uses less code to make a
single implementation on the best string representation and convert at
the boundaries than to overload all the operations inside the parser by
the string type. My guess is that a well written xml parser over lazy
bytestring will be about the same speed as one written one over String
*even* if we have to initially convert from a String input and certainly
faster than an xml parser that takes a StringLike dictionary at
runtime. 

So I'm claiming that the single impl with boundary conversion gives us
the best of both worlds, no code bloat due to specialisation and working
with whichever string type you like, by converting it at the beginning
and end. Of course only an experiment can say either way.

> Let's say you were using something like ListLike (or StringLike, see below).  
> If a library used these operations exclusively, you could make it work on 
> most any type of list by simply changing your imports.  (Hide the regular 
> functions from Prelude, and import ListLike).  For types such as ByteStrings 
> or lists, that already have a very rich native implementation of these 
> functions, the native implementation is used.  You should be getting greater 
> compatibility essentially for free.  ListLike is an exhaustive mapping over 
> these native functions.  This would be great for anything from sort 
> algorithms to parsers, etc.  I even have a ListLikeIO typeclass[2] to 
> facilitate this. [2]
> 
> Now in your iconv case, you have a special case because you are doing 
> manipulation specifically upon 8-bit binary data.  It may not make sense for 
> you to support a [Char] or even a Char8 ByteString because it does not lend 
> itself to those very well.  You could, perhaps, support a [Word8] as well as 
> a ByteString when using ListLike.  That is, you may have a function like 
> this:
> 
> head :: ListLike full Word8 => full -> Word8
> 
> You could still use this with a ByteString at native speeds, and a [Word8] at 
> its native speed.

Only if we definitely eliminate the type class dictionary statically and
pay the code bloat cost of having several implementations around.

> But this doesn't buy us the ability to use this library interchangably with a 
> Word8-based ByteString and a [Char].  That is a scenario ListLike is not 
> designed to help with.  ListLike is designed to make the container 
> interchangable, but does not address making the contents interchangable.
> 
> I think this is what you are pointing out?

Not really. The iconv example would work with any ListLike(withCString)
whose contents are Word8.

> > your favourite representation on the edge. So in this case those
> > conversions would be pack/unpack or the similar equivalents for strict
> > <-> lazy bytestrings.
> >
> > If we want it to be generic then we want a class of string like things
> > that provides conversions only, not operations.
> >
> > For example we could export iconv as:
> >
> > iconv :: StringLike string => Encoding -> Encoding -> string -> string
> > iconv to from = (convertStringRep :: Lazy.ByteString -> string)
> >               . theRealIconv
> >               . (convertStringRep :: string -> Lazy.ByteString)
> >
> > class StringLike string where
> >   ...
> >
> > convertStringRep :: (StringLike s1, StringLike s2) => s1 -> s2
> > -- analogous to fromIntegral
> 
> ListLike has something along these lines, too: [1]
> 
> class StringLike s where
>   toString :: s -> String
>   fromString :: String -> s
>   lines :: ListLike full s => s -> full
>   words :: ListLike full s => s -> full
>   unlines :: ListLike full s => full -> s
>   unwords :: ListLike full s => full -> s
> 
> The last four functions are there as a way to provide a universal interface 
> to optimized native functions, where available.  The minimal complete 
> definition is just toString and fromString.
> 
> Technically, you could make every function look like:
> 
> iconv data = fromString . toString $ realIConv (fromString . toString $ data)

> That can be made simpler for the programmer with a helper function, but is 
> not necessarily very efficient.  With an MPTC, we could do:
> 
> class StringConvertable s1 s2 where
> 	convertString :: s1 -> s2
> 	withString :: s1 -> (s2 -> a) -> a
> 	withString x func = func (convertString x)
> 
> 	interactString :: s1 -> (s2 -> s2) -> s1
> 	interactString x func = convertString (func (convertString x)

We can do something similar to fromIntegral so that we don't need a
general StringConvertable class.

-- | general coercion from integral types
fromIntegral :: (Integral a, Num b) => a -> b
fromIntegral = fromInteger . toInteger

> which ought to make easy conversions (String to ByteString, for instance) 
> easily doable for a library such as iconv.  What do you think?

I think it needs some performance and code size experiments.

Duncan