RFC: Collection Framework, reprise.

Thu Feb 23 09:58:05 EST 2006

On Feb 23, 2006, at 5:33 AM, Jean-Philippe Bernardy wrote:

> Hello,
>
> I'm glad you see so many good point in my design.
>
> I must say however that the operator names are (c) Ross Paterson :)
>
> On 2/21/06, Robert Dockins <robdockins at fastmail.fm> wrote:
>> Well, I have to begin with the caveat that I am not a data structures
>> expert, and that the design of Edison is almost entirely due to Chris
>> Okasaki.  That said, I can certainly share my opinion.  In no
>> particular order:
>> -- Your use of two type parameters to model read-only or unobservable
>> collections is very interesting.  I wonder, though, how useful it
>> actually is.  I facilitates some interesting possibilities with
>> views, but I can't think of any compelling use cases.  I sort of
>> dislike the fact that it can turn certain of the class methods
>> "useless", but maybe the idea just hasn't grown on me yet.
>
> The truth is, I hope this will become nicer when (if) we move to
> associated types.
> I need to play a bit more with Phrac to have a clearer idea though.
>
>> -- I like the idea of views generally, and I'm now trying to think if
>> there is a good way to work something like them into Edison.
>> -- The 'isSingleton' method seems unnecessary.  Are singleton
>> collections interesting enough to supply a special test for them?
>
> This is based on Christan Maeder idea. We are planning to move to AVL
> trees for A. Hey, which do not provide a O(1) size function. A
> singleton test thus becomes valuable, and since costs next to nothing
> to add to the class I decided to to it.

OK.  I'm still not sure why you'd be interested in knowing that a  
collection is a singleton.  Especially given that you can't extract  
elements with only your Collection class methods.  It seems to me  
that the only time you want a singleton class is when you want to get  
that thing out.  If that's the case, you can just extract one element  
and test the resulting collection for nullness;  but maybe there's  
something I'm missing.

>> -- Your 'lookup' method has a Monad context, which I assume is the
>> 'NotJustMaybe' pattern, but 'front' and 'back' return Maybe.  You
>> should probably pick one style and stick to it.
>
> True.
>
>> -- Qualified infix functions look horrible.  I know you say this
>> module should be imported unqualified, but with Prelude clashes
>> people qualify it  anyway.  I'd suggest you stick to regular prefix
>> functions for your class methods and define the infix functions as
>> the appropriate aliases.  That way people importing qualified can
>> still use something reasonable looking.
>> -- In a related thread I've been discussing argument orders.  I think
>> I've just about come to the decision that all the Edison API
>> functions will take the collection argument last, including the
>> troublesome 'rcons'.  If you take my suggestion to remove infix
>> symbols from your classes, I suggest you also make your right cons
>> and index functions take the collection argument last.  However, I'd
>> probably leave the symbols as is, so that, e.g.,  (|>) = (flip  
>> rcons).
>
> If i understand this correctly, you suggest that regular functions
> should have a prefix meaning, whereas operators should have infix
> meaning (also in the light of your answer to Ross).

I'm not sure I understand exactly what you mean by "prefix/infix  
meaing", but I think so.

> I remember some peope spoke against this when we tuned Data.Map (&
> friends) API some years ago. I guess the reason is that some functions
> cannot be assigned a clear enough operator (isSubsetOf, etc.)

Humm.  I can't say I'm fully persuaded by this.  As others have  
mentioned Data.{Map,Set} aren't exactly paragons of elegance, and I'm  
not inclined to be bound to their design decisions.  I also think the  
landscape may be changing with Haskell implementations drifting  
toward full support for Unicode source files, which would provide  
lovely operators for all sorts of collection operations.  In the  
meantime, I think we can just provide ASCII infix operators for  
troublesome or particularly common operations.

Does anyone know any other reasons this approach was rejected?  I  
suspect that qualified imports also played a role in this decision.

>> -- In fact, I like your right and left insert symbols so much, I may
>> adopt them myself.
>> -- You claim that 'union' takes an unspecified element in the case of
>> duplicates, but you also claim it is commutative.  You need to
>> specify in what sense you claim commutativity; your collection class
>> doesn't require an Eq or Ord instance, and the claim seems very
>> suspect if you mean up to identity.  Similar with intersection.
>
> This has been written in a hurry, and I didn't mean anything very  
> formal.
> We have re-hashed implied meanings of Eq and Ord quite a few times
> however, and the general idea is that we require Eq to mean equality,
> and similarly for Ord, otherwise all bets are off.

I guess my point is that your classes don't require an Eq or Ord  
context, so its hard to know what defines the equivalence class under  
consideration.

> This is better stated in a general comment rather than on the
> definition on union though.
>
>> -- Does 'adjust' affect all elements with the given key, or just a
>> single unspecified one?
>
> The class has the (implicit) assumption that a key cannot exist twice.
> The semantic could be generalized however.
>
>> -- 'Indexed' doesn't have any notion of bounds.  That seems like a
>> pretty necessary concept.  Humm, I see now that MapLike is a subclass
>> of Indexed.  Perhaps then you should have 'DenseIndexed' or
>> 'ArrayLike' which has bounds related methods together with the
>> "denseness" property.
>
> It seems like the class you suggest to add would become a replacement
> for the Array class.

Well, the only reason I mention it is because you say in the comments  
that indexes are "dense", and that lookup cannot return "not found".   
That sounds particularly array-like, except for the lack of bounds.   
As "MapLike" is a subclass of "Indexed", you should probably remove  
those comments.  If you want a class with those properties it should  
not be a superclass of "MapLike" and probably needs bounds.

>> Overall, I'd say the methods you have carved out seem to be the ones
>> that are most-used, with the notable exception of 'head' and 'tail'
>> for sequences.
>
> Rather, I intend to start with those. It seems easier to add than  
> to remove.

Indeed.

>>  If you are attempting to go for a fairly minimal set
>> of operations (that's more or less what I read your design goals as
>> saying), then the only method I'd consider adding is a strict fold.
>> People get irritated if they can't, e.g., sum the elements in a
>> collection without building a gajillion unnecessary closures.
>
> Every day I pray for a genius to come with the generic solution to
> eliminate unnecessary lazyness :)

I just bet that complete and sound strictness analysis is undecidable  
(does anybody know for sure?); I don't think we'll ever fully get  
away from having to deal with manual strictness annotations once  
we've committed to non-strict semantics.

<aside>
If there's one thing I learned from reading _Purely Function Data  
Structures_, it is that strictness and laziness both have their  
places.  I'm glad to see bang patterns getting some serious  
consideration for Haskell'.
</aside>

> Thanks again for the helpful comments.
> JP.

Rob Dockins

Speak softly and drive a Sherman tank.
Laugh hard; it's a long way to the bank.
           -- TMBG