Proposal #3271: new methods for Data.Sequence
Ross Paterson
ross at soi.city.ac.uk
Thu Jun 18 05:22:06 EDT 2009
Sorry about the delay in responding.
On Tue, Jun 02, 2009 at 04:53:44PM -0500, Louis Wasserman wrote:
> http://hackage.haskell.org/trac/ghc/ticket/3271
>
> Data.Sequence is meant as a general-purpose implementation of finite
> sequences. The range of methods it offers, however, is considerably
> more constrained than Data.List, even allowing for the constraint that
> sequences are finite.
>
> The following methods cannot be efficiently implemented based on currently
> exported methods from Data.Sequence.
>
> * zipWith and its variants. Note: it suffices to define zipWith alone.
> * replicate. (This can be implemented in O(log n) time with node sharing.)
This is great stuff: your version of replicate is obviously a big win,
though I would simplify replicate slightly to
replicateFinger n x = case n of
0 -> Empty
1 -> Single x
2 -> deep one Empty one
3 -> deep two Empty one
4 -> deep two Empty two
5 -> deep three Empty two
_ -> case (n - 6) `quotRem` 3 of
(q, 0) -> deep three (replicateFinger q node) three
(q, 1) -> deep four (replicateFinger q node) three
(q, 2) -> deep four (replicateFinger q node) four
where one = One x -- Maximize node sharing.
two = Two x x
three = Three x x x
four = Four x x x x
node = node3 x x x
It's not quite true that zipWith can't be implemented efficiently using
the current interface: one can do a mapAccumL over the shorter sequence
with a toList of the other one. But it's convenient to have, and it's
also more efficient to have a separate zipWith3 and zipWith4 instead of
iterating zipWith.
You've also given concise definitions of many of the other functions,
which will be convenient to users.
But I have a general concern about the maintainability of large amounts of
repetitive (often machine-generated) code. That may seem hypocritical,
as I put such code in there myself (notably in append), but I think
there's a trade-off. The QuickCheck tests are a great help (as they
were in the original development), but this code is still difficult
to manage. I think it needs to bring considerable gains to justify the
maintainance cost. The most extreme case is sort, where I think your
400-line version would need to be a lot faster than the naive
sort :: Ord a => Seq a -> Seq a
sort = fromList . Data.List.sort . Data.Foldable.toList
to justify inclusion, but you don't include measurements. The loss of
stability is also problematic. Incidentally, the following would be
log-time and stable, but I haven't measured it:
newtype OrdList a = OrdList { getList :: [a] }
sort :: Ord a => Seq a -> Seq a
sort = fromList . getList . foldMap wrap
where wrap x = OrdList [x]
instance Ord a => Monoid (OrdList a) where
mempty = OrdList []
OrdList xs `mappend` OrdList ys = OrdList (merge xs ys)
Hmm, it's not clear how to make that fit sortBy.
The zips are only 80 lines, but I wonder whether a simpler version of
those might also give comparable performance.
Similarly, I wonder if your partition is much faster than the naive
partition :: (a -> Bool) -> Seq a -> (Seq a, Seq a)
partition p = foldMap partition1
where partition1 x
| p x = (singleton x, empty)
| otherwise = (empty, singleton x)
By the way, if you have partition you might as well have filter too.
Why have you introduced consDigitToTree and snocDigitToTree?
Could tails be
tails = scanr (<|) empty
symmetrically with inits?
I think it's problematic that iterate
iterate :: Int -> (a -> a) -> a -> Seq a
has a different signature from the list version. But then a new function
is not much easier to use than
fromList . take n . Data.List.iterate x
and that ought to be deforested to something reasonable anyway.
More information about the Libraries
mailing list