Proposal #3271: new methods for Data.Sequence

Thu Jun 18 05:22:06 EDT 2009

Sorry about the delay in responding.

On Tue, Jun 02, 2009 at 04:53:44PM -0500, Louis Wasserman wrote:
> http://hackage.haskell.org/trac/ghc/ticket/3271
> 
> Data.Sequence is meant as a general-purpose implementation of finite
> sequences.  The range of methods it offers, however, is considerably
> more constrained than Data.List, even allowing for the constraint that
> sequences are finite.
> 
> The following methods cannot be efficiently implemented based on currently
> exported methods from Data.Sequence.
> 
>   * zipWith and its variants. Note: it suffices to define zipWith alone.
>   * replicate. (This can be implemented in O(log n) time with node sharing.)

This is great stuff: your version of replicate is obviously a big win,
though I would simplify replicate slightly to

replicateFinger n x = case n of
       0       -> Empty
       1       -> Single x
       2       -> deep one Empty one
       3       -> deep two Empty one
       4       -> deep two Empty two
       5       -> deep three Empty two
       _       -> case (n - 6) `quotRem` 3 of
               (q, 0)  -> deep three (replicateFinger q node) three
               (q, 1)  -> deep four (replicateFinger q node) three
               (q, 2)  -> deep four (replicateFinger q node) four
       where   one = One x             -- Maximize node sharing.
               two = Two x x
               three = Three x x x
               four = Four x x x x
               node = node3 x x x

It's not quite true that zipWith can't be implemented efficiently using
the current interface: one can do a mapAccumL over the shorter sequence
with a toList of the other one.  But it's convenient to have, and it's
also more efficient to have a separate zipWith3 and zipWith4 instead of
iterating zipWith.

You've also given concise definitions of many of the other functions,
which will be convenient to users.

But I have a general concern about the maintainability of large amounts of
repetitive (often machine-generated) code.  That may seem hypocritical,
as I put such code in there myself (notably in append), but I think
there's a trade-off.  The QuickCheck tests are a great help (as they
were in the original development), but this code is still difficult
to manage.  I think it needs to bring considerable gains to justify the
maintainance cost.  The most extreme case is sort, where I think your
400-line version would need to be a lot faster than the naive

  sort :: Ord a => Seq a -> Seq a
  sort = fromList . Data.List.sort . Data.Foldable.toList

to justify inclusion, but you don't include measurements.  The loss of
stability is also problematic.  Incidentally, the following would be
log-time and stable, but I haven't measured it:

  newtype OrdList a = OrdList { getList :: [a] }

  sort :: Ord a => Seq a -> Seq a
  sort = fromList . getList . foldMap wrap
    where wrap x = OrdList [x]

  instance Ord a => Monoid (OrdList a) where
    mempty = OrdList []
    OrdList xs `mappend` OrdList ys = OrdList (merge xs ys)

Hmm, it's not clear how to make that fit sortBy.

The zips are only 80 lines, but I wonder whether a simpler version of
those might also give comparable performance.

Similarly, I wonder if your partition is much faster than the naive

  partition :: (a -> Bool) -> Seq a -> (Seq a, Seq a)
  partition p = foldMap partition1
    where partition1 x
            | p x = (singleton x, empty)
            | otherwise = (empty, singleton x)

By the way, if you have partition you might as well have filter too.

Why have you introduced consDigitToTree and snocDigitToTree?

Could tails be

  tails = scanr (<|) empty

symmetrically with inits?

I think it's problematic that iterate

  iterate :: Int -> (a -> a) -> a -> Seq a

has a different signature from the list version.  But then a new function
is not much easier to use than

  fromList . take n . Data.List.iterate x

and that ought to be deforested to something reasonable anyway.