[Haskell-cafe] The difficulty of designing a sequence class

Sun Jul 30 14:36:52 EDT 2006

On Sunday 30 July 2006 07:47, Brian Hulley wrote:
> Hi -
>
> Part 1 of 2 - Monoid versus MonadPlus
> ===========================
>
> I've just run into a troublesome question when trying to design a sequence
> class:
>
>     class ISeq c a | c -> a where
>          empty :: c
>          single :: a -> c
>          append :: c -> c -> c
>
> However I've noticed that people sometimes separate the empty and append
> operations out as sequences with these ops form a Monoid therefore:
>
>      class Monoid c => ISeq c a | c -> a where
>          single :: a -> c
>
>      -- now outside the class
>      append :: ISeq c a => c -> c -> c
>      append = mappend
>
>      empty :: ISeq c a => c
>      empty = mempty
>
> Another option, is the Edison library which uses:
>
>      class (Functor s, MonadPlus s) => Sequence s where
>
> so here MonadPlus is used instead of Monoid to provide empty and append.
> So I've got three main questions:

> 1) Did Edison choose MonadPlus just because this fitted in with the lack of
> multi-parameter typeclasses in H98?

Edison's design hails from a time when MPTCs were not only non-standard (as  
they still are), but also not widely used, and before fundeps were avaliable 
(I think).  So the answer to this one is pretty much "yes".  I've considered 
reformulating the Sequence class to be more similar to the Collection classes 
(which use MPTCs, fundeps and mention the element type), but for the 1.2 
version I wanted to make as few changes as I thought I could to the overall 
design decisions.

In fact, I will likely make this change at some point.  It would allow, eg, 
making Don's ByteString (or whatever it's called now, I forget) an instance 
of Sequence, which is currently impossible.  OTOH, it would require 
sacrificing the Functor, Monad and MonadPlus instances...

> 2) Are there any reasons to prefer the Edison design over the MPTC design
> (apart from H98 compatibility) or vice versa?

H98 is probably the big one.  I'm currently in wait-and-see mode concerning 
MPTCs and especially fundeps.  As Edison maintainer, I've tried to use them 
sparingly in the hope that Edison can be made Haskell' compliant (crosses 
fingers).  Aside: I hope the Haskell' committee makes some sort of decision 
here soonish.

> 3) Is it worth bothering to derive ISeq from Monoid (with the possible
> extra inefficiency of the indirection through the definitions for append =
> mappend etc or does the compiler completely optimize this out)?

Not sure about this one.  I suspect, however, that the appropriate SPECIALIZE 
pragmas could cover any cases that you really care about.

> and a fourth more long term question:
>
> 4) Would it be worth reconsidering the rules for top level names so that
> class methods could always be local to their class (ditto for value
> constructors and field names being local to their type constructor).

[snip more question]

No comment.

> Part 2 of 2 - Monad versus Ancillary result type
> ================================
>
> Another issue relates to left and right views of a sequence. If a sequence
> is non-empty, the left view is just the leftmost element and the rest of
> the sequence. The problem arises when the sequence is empty. In the Edison
> library, this is solved by returning a monadic value ie:
>
>      lview :: Monad m => s a -> m (a, s a)
>
> whereas from the paper "Finger trees: a simple general purpose data
> structure" by Ralf Hinze and Ross Paterson they use an ancillary data type
> to store the result of a view:
>
>     data ViewL s a = NilL | ConsL a (s a)
>
>     viewL :: FingerTree a -> ViewL FingerTree a
>
> So my question here is: what's the best choice? I can see that the monadic
> version has the advantage that it could be used in do notation in cases
> where you expect the sequence to be non-empty, but has the disadvantage
> that it treats the empty sequence as something special (resulting in
> Monad/fail) and an extra indirection to find the components when the
> sequence is non-empty.

Well, the empty sequence IS special, when it comes to looking the leftmost 
(resp. righmost) element.  You have to deal somehow with the fact that the 
operation is a partial function.

I think the arbitrary monad option is better.  It gives the user more 
flexibility about how to use the operation.  Really the only way to use ViewL 
is to put it inside a "case of".  With a monad you can use any of the 
plethora of monadic operations and, as you mentioned, the do notation.  In 
addition, if you want the use case of ViewL, you can always use the Maybe 
monad.

I'm not inclined to worry too much about the extra indirection, which seems 
like a viable target for being erased by the compiler (I've not tested if 
this happens, however).

> Anyway these are my main questions for now - any feedback appreciated ;-)

BTW, for what purpose are you desiging a new sequence class?  You are clearly 
aware of other efforts in this area; in what ways to they not meet your 
needs?

> Thanks, Brian.

-- 
Rob Dockins

Talk softly and drive a Sherman tank.
Laugh hard, it's a long way to the bank.
       -- TMBG