Edison 1.2rc2

Mon Mar 6 12:26:56 EST 2006

On Mar 6, 2006, at 10:56 AM, Christian Maeder wrote:

> Robert Dockins wrote:
>> I'm not sure I understand; why do you think that revealing  
>> structure  is unsafe?
>
> I don't want that i.e. equal maps yield different results, just  
> because they are internally differently balanced.

Well, this obviously depends on what you mean by "equal".  That can  
get slippery pretty fast.  I assume from your comments that you mean  
extensional equality (ie, viewing the map as a partial function with  
a known finite domain).

>> I'm hesitant to give "fold" a longer name because I think it  
>> should  be the first fold a programmer reaches for.
>
> I don't aggree. I immediately made it wrong by my own "toList =  
> fold (:) []" and did not even notice foldr or foldl. The fold for  
> efficiency should be mainly used by library writers. But users I  
> tell: correctness first, efficiency (when needed) later!

I tend to agree with that philosophy.  On the other hand, if you are  
bothering to use a dedicated data structure library, its probably  
because you are at least somewhat cognizant of performance issues;  
otherwise you'd just use lists and tuples for everything (a  
phenomenon one one can actually observe in a lot of projects).

>> In the fairly common  case where you have a commutative,  
>> associative function (eg (+)), you  don't care in what order the  
>> function is applied to the elements
>
> These properties are not garuanteed and difficult to track down in  
> the few (but severe) cases it's made wrong.

These properties _can't_ be guaranteed (at least not by Haskell  
compilers).  It is true that you can shoot yourself in the foot with  
the Edison API (this is one of the stronger criticisms one can level  
at it, IMO).  Some operations even allow you to violate internal data  
structure consistency if used incorrectly.  Edison has already made  
the design choice to present the programmer with abuse-able  
operations.  I think an arbitrary-order fold is pretty tame (on the  
other hand, it is a significantly more "visible" operation than the  
actually unsafe ones).

Humm... I wonder if there is a case for extracting a subset of the  
API which only presents "safe" (ie "difficult/impossible to abuse")  
operations?  Perhaps with a simplified class hierarchy as well....    
What do you think?  That way you could start by using the simplified  
API and move to the full API when/if you discover opportunities for  
performance enhancements from the less-safe or more esoteric operations.

>> , and  you have fold f z === foldl f z === foldr f z.
>
> I see, foldr and foldl are your safe versions! (I think Jean- 
> Philippe only has a safe "fold" and misses an unsafe one.)

Well, its actually more complicated than safe vs unsafe.  The base  
"AssocX" class (wherein fold appears) doesn't assume an ordering  
relation on keys (no "Ord k" context). What we do in the absence of a  
total ordering on keys is to present the elements in an "arbitrary"  
order.  Because the order is "arbitrary" we can chose it to be  
advantageous (ie, follow the internal structure), but that's really  
just a side benefit of the relaxed semantics.  The subclass  
"OrdAssocX" further assumes an ordering on keys, so it can provide  
foldr and friends (because now key ordering is defined).  However,  
even with foldr, foldl, etc, there is a degree undefinedness if the  
data structure is a finite relation rather than a finite map (in what  
order should you present elements bound to equal keys?).  The folds  
for bags have similar properties.

>> In light of this and previous discussions about folds, I think I  
>> will  be adding a section to the docs about how to chose an  
>> appropriate  fold for your application.
>
> At least that plus a warning!

Indeed.  Its seems folds are sufficiently complicated to warrant some  
special attention in the docs.  In addition, I suppose I should track  
down each and every operations which can be observed to behave  
differently based on hidden state and mark them as "non-deteministic"  
or some such.  What is a good term for an operation whose result is  
only partially defined?  I really want to reserve the term "unsafe"  
for operations that can violate internal consistency.

Thanks again!  This is exactly the kind of discussion that helps  
produce good APIs.

Rob Dockins

Speak softly and drive a Sherman tank.
Laugh hard; it's a long way to the bank.
           -- TMBG