Proposal: priority queues in containers

Thu Mar 18 11:43:43 EDT 2010

Oh, god, so much to respond to...heh.

> Submit this package for canonicalization as part of the Haskell Platform. I
> for one would support its inclusion.

This is an option I seriously hadn't considered.  To be fair, that's because
I've never used the Platform myself, preferring rather to have the most
up-to-date version of GHC at all times, heh.  That said, while I'd be okay
with this option, I'd prefer putting it into containers, because I feel like
a canonical, reliable priority queue implementation is the sort of thing a
self-respecting language ought to have built in.

> As does Python. In Python, though, the PQ implementation is not a built-in
> class in the default namespace (as are dict and set).  Rather, it is one of
> the "batteries included" libraries that come with Python. I think that's the
> right place for it in Haskell, too.

I don't know Python, but according to Wikipedia, dict and set are built into
the language.  I don't think it's a fair comparison: set and dict in Python
seem to have a role almost as ubiquitous as [] in Haskell, much more
ubiquitous than e.g. Data.Set or Data.Map.  I'm also not entirely sure that
"batteries included" doesn't describe containers, given all the other
packages that come with GHC.

> >   * There is no distinction between keys and priority values.  A utility
> >     type Prio p a with the instance Ord p => Ord (Prio p a) is exported to
> >     allow usage of distinct keys and priority values.

> I disagree with this one.  It requires an Ord instance that isn't really an
> ordering, and makes a Functor instance impossible.  I would prefer an
> interface separating keys and values like that of Data.Map (which would also
> increase consistency within the package).

I'd be okay with separating out a priority/value version.  However, I'm
still clueless as to what you're talking about with respect to Functor --
what's wrong with the following?
data Prio p a = Prio p a
instance Ord p => Ord (Prio p a) where ...
instance Functor (Prio p) where fmap f (Prio p a) = Prio p (f a)

I can understand if you're uncomfortable with (==) and
(\x y -> compare x y == EQ) being inequivalent, but neither the H98 Report
nor the Prelude makes any such claim, as far as I can tell.

> The Foldable instance breaks the abstraction.  I think it should
> process elements in order.

I think that wanting to iterate over the priority queue in the middle of the
computation, without caring about order, is a perfectly legitimate desire
for a programmer!  Moreover, the only way to make a Foldable instance
process elements in order would be something like
data Ord a => PQueue a = ....
which I think is an awfully dirty approach.  I'm not at all a fan of adding
restrictions like that, not least because it adds lots of awkward overhead.
Would you be okay with not exporting a Foldable instance at all, but still
exporting a toList method which doesn't guarantee any ordering on the return
list?
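The constraint problem can be sketched with a toy heap. This is a plain skew heap chosen for brevity, not the implementation under discussion, and the names Heap, merge, insert and toAscList are illustrative assumptions. It also assumes a GHC recent enough that Foldable's foldr is exported by the Prelude. The Foldable methods have no Ord a available, so the instance can only walk the tree as it is stored, whereas an ordered traversal has to merge subtrees and therefore needs Ord a.

```haskell
import Data.Foldable (toList)

-- A toy skew heap: the root holds the minimum of the whole tree.
data Heap a = Empty | Node a (Heap a) (Heap a)

-- No Ord constraint is available here, so the fold just follows
-- the shape of the tree; the element order is unspecified.
instance Foldable Heap where
  foldr _ z Empty        = z
  foldr f z (Node x l r) = f x (foldr f (foldr f z r) l)

-- Merging compares roots, so it needs Ord.
merge :: Ord a => Heap a -> Heap a -> Heap a
merge Empty h = h
merge h Empty = h
merge h1@(Node x l1 r1) h2@(Node y l2 r2)
  | x <= y    = Node x (merge h2 r1) l1
  | otherwise = Node y (merge h1 r2) l2

insert :: Ord a => a -> Heap a -> Heap a
insert x = merge (Node x Empty Empty)

-- An ordered traversal repeatedly removes the root and merges its
-- children, which is only possible with Ord in scope.
toAscList :: Ord a => Heap a -> [a]
toAscList Empty        = []
toAscList (Node x l r) = x : toAscList (merge l r)
```

Here toList comes for free from Foldable and returns every element in storage order, while toAscList is a separate function carrying the Ord constraint.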

> My advice would be not to stress over whether priority queues go into
> containers. It's not some pristine thing of beauty that deserves treatment
> with velvet gloves.

I'm...not sure how to respond to this claim.  At least part of me wants to
say, I genuinely do think the containers package is a piece of art...and
then another part pipes up, "except for the inconsistencies between the
various minView/maxView versions, and the little differences between IntMap
and Map, and..."  That said, I wouldn't be a fan of scrapping the style
which the containers package has at least tried to create.  I could be
convinced that rewriting the rest of containers would be a good thing to do,
though...and I might just want to do that myself.  Hah.

> How does this implementation compare with using Map/Set as a
> priority queue?

Continuing the discussion of the benchmarks: first, Jim, it appears that I'm
the one who made a n00b benchmarking error.  TT_____TT  That said, as you
found, my implementation is still slightly faster when the benchmark is
corrected.  Some comments:

   - QuickBinom needs to have O(1) findMin for a reasonable comparison.  I
   added that in the benchmark below, and here.
   - I can't think of any more optimizations for the sparse binomial heap --
   I genuinely think it's not going to get better.
   - There is a way to optimize my implementation still further, but it
   makes my code much less readable.  (Specifically, I start the BinomForest at
   Succ Zero, and unpack the first child of every node still in the forest.
   Modifying the whole implementation that way, though, makes it unreadably
   ugly...and I think QuickBinom is possibly already at that point.  I started
   implementing it, and realized just how ugly it was, and I stopped, but I can
   finish it if I have to.)

Sorting 500,000 elements, compiled with -O2, run with +RTS -H128m -K64m,
with another few implementations thrown in for good measure:
Times (ms)
               min      mean      +/-sd    median      max
Pairing:    1440.090  1482.893    31.501  1482.093  1532.095
Binomial:   1356.084  1389.687    26.881  1388.087  1436.090
SBinomial:  1376.086  1422.489    48.453  1400.088  1520.095
Data.Set:   1712.107  1800.912    74.880  1782.111  1976.123
Skew:       1584.099  1644.503    85.702  1602.101  1848.116
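For readers wondering what "sorting" through a container used as a priority queue might look like, here is a rough sketch. It is not the benchmark code. Because Data.Set discards duplicates, the sketch assumes a Data.Map from element to multiplicity instead, and the names PQ, insertPQ, extractMin and pqSort are made up for illustration.

```haskell
import qualified Data.Map as M
import Data.List (foldl', unfoldr)

-- Assumption: a multiset represented as element -> number of occurrences.
type PQ a = M.Map a Int

insertPQ :: Ord a => a -> PQ a -> PQ a
insertPQ x = M.insertWith (+) x 1

-- Remove one occurrence of the smallest element, if there is one.
extractMin :: Ord a => PQ a -> Maybe (a, PQ a)
extractMin q = case M.minViewWithKey q of
  Nothing -> Nothing
  Just ((x, n), rest)
    | n > 1     -> Just (x, M.insert x (n - 1) rest)
    | otherwise -> Just (x, rest)

-- Insert everything, then repeatedly extract the minimum.
pqSort :: Ord a => [a] -> [a]
pqSort = unfoldr extractMin . foldl' (flip insertPQ) M.empty
```

Every insert and every extraction in this sketch is O(log n) in the number of distinct elements, which is the cost a dedicated heap is competing against.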

Some other benchmarks
<http://hackage.haskell.org/trac/ghc/attachment/ticket/3909/plot_2.png>
were done by Milan Straka earlier, when we hadn't decided what heap
implementation to use at all.  His comments:

> I think the pairing heaps are out of the question now. I would choose
> between Binomial and Leftist. The Binomial have O(1) amortized inserts, but
> beware, that this does not work in persistent setting -- the insert is O(log
> N) when the heaps are used persistently (there are Skew Binomial Heaps with
> O(1) insert in the persistent setting too).

> I vote for current (v5) Binomial implementation, even if the O(1)
> amortized inserts works only non-persistently (ie. under heavy persistent
> usage, leftist heaps might be a _little_ better).
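The point about persistence can be illustrated with a textbook-style binomial heap insert. This is a sketch, not the v5 code, and linksOnInsert is a hypothetical helper added only to count work. Inserting behaves like incrementing a binary counter: a heap holding 2^k - 1 elements has trees of ranks 0 to k-1, so one insert performs k links. Used once, that cost is paid for by the cheap inserts before it. If the same 2^k - 1 element heap is kept and inserted into repeatedly, every one of those inserts pays the k links again.

```haskell
-- A binomial tree: rank, root element, children (largest rank first).
data Tree a = Node Int a [Tree a]

-- A heap is a list of trees in increasing order of rank.
type Heap a = [Tree a]

rank :: Tree a -> Int
rank (Node r _ _) = r

-- Link two trees of equal rank; the smaller root becomes the parent.
link :: Ord a => Tree a -> Tree a -> Tree a
link t1@(Node r x c1) t2@(Node _ y c2)
  | x <= y    = Node (r + 1) x (t2 : c1)
  | otherwise = Node (r + 1) y (t1 : c2)

-- Insert a tree, carrying like binary addition while ranks collide.
insTree :: Ord a => Tree a -> Heap a -> Heap a
insTree t [] = [t]
insTree t ts@(t' : rest)
  | rank t < rank t' = t : ts
  | otherwise        = insTree (link t t') rest

insert :: Ord a => a -> Heap a -> Heap a
insert x = insTree (Node 0 x [])

-- Hypothetical helper: how many links the next insert would perform,
-- i.e. the length of the run of ranks 0, 1, 2, ... at the front.
linksOnInsert :: Heap a -> Int
linksOnInsert = length . takeWhile id . zipWith (==) [0 ..] . map rank
```

For a heap built from seven elements, linksOnInsert reports three links for the next insert, and it reports three again for every further insert into that same seven-element heap.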

Conclusions: There aren't any differences in asymptotics, and as it stands,
the existing implementation is just as fast.  It's also a) done, and b) full
of Haskellish type goodness.

After about five hours' work (!!!) I *finally* managed to install Criterion
yesterday, so I'll send out those tests ASAP.

Louis Wasserman
wasserman.louis at gmail.com
http://profiles.google.com/wasserman.louis