seth at cql.com
Fri Mar 3 22:38:26 EST 2006
I just noticed this email thread. I've wanted to use some things in Edison for a while so I'm glad to see that it is still alive.
As far as the fold variations go, I can't see that it can possibly hurt anything to include more variations, especially if the work is already done. I would like to see the documentation constructed so that the ones that are most likely to be "correct" (or desirable) are listed first. That way I don't have to figure out exactly what the issues are with them in order to grab the "right" one. Especially since I usually look at the more subtle performance issues after I've built a working program.
On Fri, 3 Mar 2006 21:34:39 -0500
Robert Dockins <robdockins at fastmail.fm> wrote:
> On Friday 03 March 2006 08:28 pm, you wrote:
> > Robert Dockins writes:
> > > The second release candidate for Edison 1.2 is now ready for
> > > comments. The notable changes from RC1 are:
> > >
> > > * add strict variants of all folds and reduces
> > You know, I was looking through the code last night and thought, "what
> > about foldl'?" Funny.
> > That being said, eight variants of fold* (plus six variants of reduce*)
> > feels excessive to me. I can understand the desire for generality, but
> > I'm not sure it's possible to decide between, say, foldr and foldr'
> > without knowing the sequence type.
> Well, that's certainly an issue I considered when adding them to the API.
> > As I understand it, with regular lists you almost always[*] want either
> > foldr or foldl', because foldl builds up a chain of unevaluated thunks
> > and foldr' has linear call stack growth. (When I was learning ML, we
> > were told to always prefer foldl over foldr for that reason.)
> > With reverse lists (i.e., Rev ), the situation is reversed: foldr' is
> > tail-recursive, foldl is fully lazy, and foldr and foldl' have linear
> > growth.
> I toyed for a good while with only defining some of the strict folds, or doing
> some other halfway measure. The thing that eventually made be decide to
> throw in the whole kitchen sink (all the strict folds) was the realization
> that the API is specified at a purely semantic level; at that level, the only
> thing that matters about strict folds is that they have exactly the same
> semantics as the non-strict fold given a combining function which is strict
> in the correct argument(s) (otherwise they are guaranteed to be the same
> except that the strict fold may diverge in cases where the non-strict fold
> does not). When it comes time to implement, the strict folds give you more
> leway to force closures early BUT YOU DON'T HAVE TO! For the regular list
> example, I could define foldr' = foldr and it would be correct according to
> the specification. For now I have defined all the implementations to be as
> strict as possible in all cases, but I can make them lazier later if it turns
> out to have better space behavior; as you mention, foldr' on regular lists is
> problematic and it will probably be the first thing to be reverted to a lazy
> > For the other sequences, there may not be clear winners. With
> > SimpleQueue, for example, the choice between foldr and foldr' might
> > depend on how the queue is arranged.
> > Maybe it would be better to have each implementation default to the
> > strictness that's most likely best, at least in the generic Seq
> > interface. That way,  would get strict foldl and non-strict foldr, but
> > Rev  would get strict foldr and non-strict foldl. Individual modules
> > could provide extra variants as desired outside the class interface.
> Both the collection and the associated collection have fold (and fold') which
> folds over elements in an unspecified order. Perhaps I could add that to the
> sequence API to give the "best behavior" choice to the sequence implementer.
> Then for lists you would have (lazy) fold = foldr and (strict) fold' =
> > [*] I guess you might want a non-strict foldl with lists if the foldee
> > is expensive enough that building up a pile of thunks is worth it if
> > they can potentially be skipped.
> > > -- Edison defines 'subset' and 'subsetEq' in the set API. Data.Set
> > > has operations with the
> > > same meanings named 'isProperSubsetOf' and 'isSubsetOf'. I am
> > > considering adoping the
> > > Data.Set names, but they are quite a bit longer. Also what about
> > > the meaning of "subset"
> > > vs "proper subset"? What do you find most intuitive?
> > I would go with subset and properSubset. The "isSubsetOf" form only
> > makes sense if used in-line, but we have (|=) for that.
> I'm leaning in this direction as well; I think I've pretty well decided the
> names as they stand now have to go (they contradict standard usage), and I
> don't really like the Data.Map names.
> > > In addition, I'm interested in any API related feedback you might
> > > have.
> > Is there a reason why the TernaryTrie and the PatriciaLoMap don't
> > implement OrdAssoc?
> Not really. When I inherited Edison, none of the associated collections
> implemented the Ord* classes (I don't know why). I added AssocList and
> StandardMap because they were easy. The others are significantly more
> complicated and I thought it was more important to work out the API issues
> and kick 1.2 out the door before tacking this (same reason the 'strict' folds
> for TernaryTrie are really just the non-strict folds).
> > For the Seq class, I'd rather have the O(n) stuff listed with the
> > implementations, rather than having a default in the interface.
> Noted. Its a dreadfully tedious cut-n-paste job which is why I haven't done
> it; patches and clever scripts to do this are welcome ;-) Of course, it
> would be nice to have time complexity for the Collection and Associated
> Collections as well...
> > Finally, thanks for reviving Edison. I had always been disappointed by
> > its obscurity.
> Indeed; it has always seemed to me like a great project that never got the
> attention it deserved.
> Thanks for your comments!
> Rob Dockins
> Libraries mailing list
> Libraries at haskell.org
More information about the Libraries