How lazy is DData.Seq?

Robert Will robertw at stud.tu-ilmenau.de
Tue May 18 11:23:59 EDT 2004


Hi all,

Sorry I'm adding my 2c so late, but the problem is an instance of a
general tradeoff between data-structural abstraction and laziness,
which is explained at
http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/principles.html
(search for "lazy").

In short, only concrete data structures can be fully lazy, since *any*
optimisation (re-arrangement of the representation) destroys some
laziness.
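
To illustrate with a minimal sketch (hypothetical types, nothing from
DData): list append is lazy in its second argument, but as soon as a
sequence is "optimised" to cache its size, append has to inspect both
arguments and that laziness is gone.

    -- Plain lists:  head ([1,2,3] ++ undefined)  ==>  1
    -- The right-hand argument of (++) is never forced here.

    -- A hypothetical sequence that caches its size for O(1) length:
    data Sized a = Sized !Int [a]

    appendS :: Sized a -> Sized a -> Sized a
    appendS (Sized n xs) (Sized m ys) = Sized (n + m) (xs ++ ys)

    headS :: Sized a -> a            -- partial, like head
    headS (Sized _ (x:_)) = x

    -- appendS pattern-matches on both arguments to read the sizes, so
    --   headS (appendS (Sized 3 [1,2,3]) undefined)
    -- diverges, although the list version of the same computation is 1.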

So the problem with DData.Seq is that it has no clear design guideline
specifying whether it is to be thought of as a concrete structure (i.e.
users know the data constructors, as for the built-in Haskell lists) or
as an abstract structure (which is the only way to provide worst-case
time bounds that are independent of how the sequence was constructed
and depend only on its size).

On Sun, 9 May 2004, Ross Paterson wrote:
>
> You don't need a balanced tree, but the O(n) bound requires the invariant
> that no proper subtree is empty.  Otherwise, a tree containing n elements
> can be of unbounded size, and take an unbounded time to convert to a list.
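
To see why, here is a hedged join-list sketch (a representation assumed
only for illustration, not DData's actual source):

    data Seq a = Empty
               | Single a
               | Append (Seq a) (Seq a)

    append :: Seq a -> Seq a -> Seq a
    append = Append                      -- O(1) and fully lazy

    toList :: Seq a -> [a]
    toList s = go s []
      where
        go Empty        rest = rest
        go (Single x)   rest = x : rest
        go (Append l r) rest = go l (go r rest)

    -- Without the invariant, a million appends of Empty build a tree of
    -- a million Append nodes that still represents the empty sequence,
    -- so toList takes time proportional to the number of appends, not
    -- to the number of elements:
    --
    --   toList (foldr append Empty (replicate 1000000 Empty))  ==>  []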


> I suspect there is no satisfactory general-purpose sequence implementation
> (DData.Seq isn't trying to be one -- it does no balancing.)  A balanced
> tree would make most things O(log n), but other implementations give
> you O(1) for various subsets of operations (and the smaller the subset,
> the smaller the constant factor, as a rule of thumb).  It seems essential
> to offer a range of implementations.


On Tue, 11 May 2004, Simon Marlow wrote:

> But, the tradeoff is strictness in append.  It may be that we need
> multiple versions of Seq, but I suspect that not much would be gained by
> having all three:
>
>   1. ordinary lists with lazy append
>   2. Seq with O(1) lazy append
>   3. Seq with O(1) strict append and O(n) toList
>
> If we have 1 & 3, then I'm guessing that 2 would be very rarely
> required.
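
Continuing the join-list sketch from above (still hypothetical code,
not an actual DData proposal): option 3's strict append can restore the
"no empty proper subtree" invariant, and the place where it does so is
exactly where option 2's laziness is lost.

    strictAppend :: Seq a -> Seq a -> Seq a
    strictAppend Empty r = r
    strictAppend l Empty = l
    strictAppend l r     = Append l r

    -- Inspecting both arguments keeps every proper subtree non-empty
    -- (given arguments that already satisfy the invariant), so toList
    -- is O(n); but  strictAppend xs undefined  now diverges.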

Well, Dessy has only 1 and 3, and for good reason.

In fact, this discussion somewhat misses the point, since none of these
three is a truly "general-purpose Sequence": all of them have many
unnecessarily O(N) operations (e.g. 'last' and (!!)).  I think the most
general implementation is Democratic Sequences (with O(log N) bounds
for all operations that permit them), followed by Deques (with O(1)
bounds for all tip-based operations, which I think are exactly the
operations that permit it).  Write-Only Sequences (such as DData.Seq)
are already an "optimised case" that should only be used when needed,
since the user has to take into account that every "read access"
implies some kind of O(N) "conversion" that is not shared among those
calls.
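
To make that last point concrete (hypothetical helpers on top of the
join-list sketch above): every observation rebuilds a list, and that
O(N) work is paid again on every call.

    index :: Seq a -> Int -> a
    index s i = toList s !! i        -- O(N) conversion on every lookup

    lastS :: Seq a -> a
    lastS = last . toList            -- another full O(N) traversal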

When using Dessy, one has the advantage of a good design: each data
structure is optimal in some sense.  Streams are optimal for laziness;
all the others are different optima in time (and space) bounds.  Dessy
sits at the edge of what is possible, so you never have to ask
questions like "could we make this a little lazier?".

I would appreciate it if someone could appreciate this ;-)

Robert

