[Haskell-cafe] The container problem

Fri Sep 26 20:01:03 EDT 2008

On Fri, 2008-09-26 at 22:37 +0100, Andrew Coppin wrote:
> Derek Elkins wrote:
> > One aspect of it is a bit of a You Aren't Going To Need It.
> >   
> 
> Personally, I haven't had a huge problem with this in practice.

I suspected as much.  Personally I'd recomend worrying about the
problems you actually encounter, rather than worrying about problems
that maybe you'll have later.  Solving problems that you don't have
isn't very gratifying for you.

> > Finally, there -are- several more or less standard classes that capture
> > different general operations on data structures (though there certainly
> > could be more.) They, of course, have different names and different
> > interfaces and different factorings from imperative equivalents.  We
> > have Functor, Applicative, Monad, MonadPlus, Monoid, Foldable,
> > Traversable, IArray, MArray and others.  Notice how the ridiculous
> > proliferation of array types in Haskell has pressed the issue and led to
> > the creation of IArray and MArray.
> >   
> 
> As already noted, Data.Set *should* be a Monad, but can't be. 

No it shouldn't.  Data.Set forms a categorical monad but not one over
Haskell which is what the Monad class expresses.  Data.Set doesn't meet
the interface of Monad and doesn't provide the same guarantees.
Incidentally, Java would have the same problem if it was capable of
expressing something equivalent to the Monad type class; the "issue" is
with parametric polymorphism not type classes. So unsurprisingly the
type system is right because, in my opinion, parametricity is a property
to valuable to lose.  This does have the effect, however, that join
corresponds to the useful function unions with it's same definition only
using different "monad" operations.  Note that, for this particular
example there is a beautiful solution.  We don't really need to take the
union of a -Set- of Sets, all we need to be able to do is traverse the
outer structure.  Taking a hint from my previous reply, we could
specialize to lists and we would end up with mconcat from the Data.Set
instance of Data.Monoid.  If we didn't feel like imposing the conversion
to lists on the user we could write combine = mconcat . toList.
Conveniently, Data.Foldable has a generic toList function, however, even
more conveniently the function we're looking for is simply
Data.Foldable.fold.

> The type 
> system won't allow it. (And I know I'm not the first person to notice 
> this...) Similar fun and frolics with Functor, and presumably 
> Applicative and Foldable (I haven't actually heard of these until just now).
> 
> Frankly, the whole "array" thing is slightly crazy to me. There are 
> several things which the array libraries ought to support, but don't:

> - Making "slices" of arrays. (I.e., generating a subarray in O(1) by 
> using transparent reindexing.)
> - Linked lists of arrays that provide an array-like interface. 
> (ByteString.Lazy does this, but only for Word8 or Char.)
> - It really ought to be possible to unbox *any* type. Technically this 
> is implementable now, but I can't find details of how...
> - Performing "map" in-place for mutable arrays. (This must surely be a 
> very common operation.)
> - Build-in functions for joining arrays together, and splitting at a 
> given index.
> - Array sorting. [Arrays have O(1) indexing, which has big implications 
> for what sorting algorithm to choose.]
> - Lists have about 5,000,000 functions for processing them. Arrays have, 
> like, a dozen. Just how efficient is it to convert an array to a list, 
> process it, and then convert it back?

With the exception of slicing, none of these are interface issues and
thus are irrelevant to the topic of this thread.  All the functions you
want can be implemented with reasonable efficiency and correct
asymptotic complexity in terms of the provided interface.  Whether these
functions are in the standard library or not has no effect on the
contractual obligations between chunks of code.  Slicing can't be
implemented with the correct asymptotic behaviour in terms of these
operations.  So then it comes down to a cost/benefit analysis of whether
the cost of adding it to the interface is justified by the benefits of
being able to slice generically.  In this case, I think the scales tilt
in favor of adding such support.