Edison 1.2rc2

Fri Mar 3 21:34:39 EST 2006

On Friday 03 March 2006 08:28 pm, you wrote:
> Robert Dockins writes:
> > The second release candidate for Edison 1.2 is now ready for
> > comments.  The notable changes from RC1 are:
> >
> >    * add strict variants of all folds and reduces
>
> You know, I was looking through the code last night and thought, "what
> about foldl'?" Funny.
>
> That being said, eight variants of fold* (plus six variants of reduce*)
> feels excessive to me. I can understand the desire for generality, but
> I'm not sure it's possible to decide between, say, foldr and foldr'
> without knowing the sequence type.

Well, that's certainly an issue I considered when adding them to the API.

> As I understand it, with regular lists you almost always[*] want either
> foldr or foldl', because foldl builds up a chain of unevaluated thunks
> and foldr' has linear call stack growth. (When I was learning ML, we
> were told to always prefer foldl over foldr for that reason.)
>
> With reverse lists (i.e., Rev []), the situation is reversed: foldr' is
> tail-recursive, foldl is fully lazy, and foldr and foldl' have linear
> growth.

I toyed for a good while with only defining some of the strict folds, or doing 
some other halfway measure.  The thing that eventually made be decide to 
throw in the whole kitchen sink (all the strict folds) was the realization 
that the API is specified at a purely semantic level; at that level, the only 
thing that matters about strict folds is that they have exactly the same 
semantics as the non-strict fold given a combining function which is strict 
in the correct argument(s) (otherwise they are guaranteed to be the same 
except that the strict fold may diverge in cases where the non-strict fold 
does not).  When it comes time to implement, the strict folds give you more 
leway to force closures early BUT YOU DON'T HAVE TO!  For the regular list 
example, I could define foldr' = foldr and it would be correct according to 
the specification.  For now I have defined all the implementations to be as 
strict as possible in all cases, but I can make them lazier later if it turns 
out to have better space behavior; as you mention, foldr' on regular lists is 
problematic and it will probably be the first thing to be reverted to a lazy 
implementation.

> For the other sequences, there may not be clear winners. With
> SimpleQueue, for example, the choice between foldr and foldr' might
> depend on how the queue is arranged.
>
> Maybe it would be better to have each implementation default to the
> strictness that's most likely best, at least in the generic Seq
> interface. That way, [] would get strict foldl and non-strict foldr, but
> Rev [] would get strict foldr and non-strict foldl. Individual modules
> could provide extra variants as desired outside the class interface.

Both the collection and the associated collection have fold (and fold') which 
folds over elements in an unspecified order.  Perhaps I could add that to the 
sequence API to give the "best behavior" choice to the sequence implementer.  
Then for lists you would have (lazy) fold = foldr and (strict) fold' = 
foldl'.

> [*] I guess you might want a non-strict foldl with lists if the foldee
> is expensive enough that building up a pile of thunks is worth it if
> they can potentially be skipped.

> > -- Edison defines 'subset' and 'subsetEq' in the set API.  Data.Set
> > has operations with the
> >     same meanings named 'isProperSubsetOf' and 'isSubsetOf'.  I am
> > considering adoping the
> >     Data.Set names, but they are quite a bit longer.  Also what about
> > the meaning of "subset"
> >     vs "proper subset"?  What do you find most intuitive?
>
> I would go with subset and properSubset. The "isSubsetOf" form only
> makes sense if used in-line, but we have (|=) for that.

I'm leaning in this direction as well; I think I've pretty well decided the 
names as they stand now have to go (they contradict standard usage), and I 
don't really like the Data.Map names.

> > In addition, I'm interested in any API related feedback you might
> > have.
>
> Is there a reason why the TernaryTrie and the PatriciaLoMap don't
> implement OrdAssoc?

Not really.  When I inherited Edison, none of the associated collections 
implemented the Ord* classes (I don't know why).  I added AssocList and 
StandardMap because they were easy.  The others are significantly more 
complicated and I thought it was more important to work out the API issues 
and kick 1.2 out the door before tacking this (same reason the 'strict' folds 
for TernaryTrie are really just the non-strict folds).

> For the Seq class, I'd rather have the O(n) stuff listed with the
> implementations, rather than having a default in the interface.

Noted.  Its a dreadfully tedious cut-n-paste job which is why I haven't done 
it; patches and clever scripts to do this are welcome ;-)  Of course, it 
would be nice to have time complexity for the Collection and Associated 
Collections as well...

> Finally, thanks for reviving Edison. I had always been disappointed by
> its obscurity.

Indeed; it has always seemed to me like a great project that never got the 
attention it deserved.

Thanks for your comments!
Rob Dockins