[Haskell-cafe] Musings on lists

Thu Jul 13 03:39:08 UTC 2023

Three points.
1.  xs ++ ys never copies the *elements* of xs at all.
    What it copies is the SPINE of xs (think of each cons node as a
    vertebra), the cons cells.  It's important to be clear about
    this, because students are only to ready to believe that the
    elements themselves are copied.

2.  xs ++ ys does copy the spine of xs.  In a strict language like
    ML or F#, it does so right away.  In a non-strict language
    like Haskell, it does so, but later.  If you ever explore the
    result of xs ++ ys far enough to reach ys, you will by then
    have copied the spine of xs.  This means that xs ++ [a] is still
    more expensive than a : xs, it's just a matter of when the
    payment is made.

3.  If you actually need to build a sequence from both ends for some
    reason, Data.Sequence exists.  Or you can build a tree and then
    flatten it.

On Thu, 13 Jul 2023 at 13:12, Jeff Clites via Haskell-Cafe <
haskell-cafe at haskell.org> wrote:

> Lazy evaluation does make it more subtle to actually define things like
> algorithmic complexity. So regarding this:
>
> > isn't xs ++ ys just a thunk that, as elements are "requested" from it,
> produces them first from xs and then, when those run out, defers entirely
> to ys from that point on? If so, then the elements of xs aren't "copied"
> any more than the elements of ys are; they are just produced from different
> places.
>
> It helps to start with the definition of ++:
>
>   (++) :: [a] -> [a] -> [a]
>   (++) []     ys = ys
>   (++) (x:xs) ys = x : xs ++ ys
>
> But to be even more explicit, let's make our own list type so that the
> constructors have normal names:
>
>   data List a = Cons a (List a) | Empty
>
> and now this definition directly translates to:
>
>   concat :: List a -> List a -> List a
>   concat Empty ys = ys
>   concat (Cons z zs) ys = Cons z (concat zs ys)
>
> But let's be even more explicit and simplify (desugar) this definition
> further:
>
>   concat :: List a -> List a -> List a
>   concat xs ys =
>     case xs of
>       Empty -> ys
>       Cons z zs ->
>         let temp = concat zs ys
>         in Cons z temp
>
> The point of all of this is to emphasize that when you evaluate `concat xs
> ys`, in the non-empty xs case you will allocate a new data structure, the
> Cons holding the head element of xs, and the `temp` thunk representing the
> tail of the result.
>
> So that's the additional overhead--you have to allocate a new cell,
> because although the head is an existing element, the tail is not--it has
> to capture the parameters necessary for the recursive call, which is new
> information. If (and only if) you evaluate the tail, you will force
> evaluation of the `temp` thunk, which will allocate another cell if zs is
> not empty (via the recursive call). So if you traverse the entire list, you
> will allocate a new cell for each element of xs--once these run out, you
> will recurse into the Empty case, and return ys, and you won't allocate new
> cells for the elements of ys. But you only perform this allocation as you
> traverse the list (for the first time), so if you never actually traverse
> it then indeed you will only ever allocate one new cell in total. So you
> are right that saying that "xs ++ ys copies xs" is misleading--that "copy"
> only happens during subsequent traversal, and it would be better described
> as "allocates a new cell for each element of xs, as it traverses".
>
> Also, in this notation, (:) becomes:
>
>   prepend :: a -> List a -> List a
>   prepend x xs = Cons x xs
>
> Here, you allocate one new cell, but that's it--there is no recursive
> call, just a constructor application to existing values.
>
> That was a bit long-winded but I hope it was clear.
>
> Important note: This is different from (for example) the Java case, where
> List is an interface, and you can create an implementation with a custom
> iterator which just serves out elements from two existing lists without
> having to do any O(n) allocation. In the Haskell case, a list is a concrete
> data type, and it's expressed directly as a linked list of cells, and
> retrieving an element requires creation (at some point) of a cell holding
> that element (as well as the tail representing the rest of the list.)
>
> Jeff
>
> > On Jul 12, 2023, at 3:07 PM, Todd Wilson <twilson at csufresno.edu> wrote:
> >
> > 
> > Deaf Cafe,
> >
> > I'm an educator using Haskell, and I've been telling my students certain
> stories about Haskell evaluation that I'm now starting to question, so I'm
> writing for some clarification. I've written before about similar things
> (e.g., "Basic list exercise" from Mar 16), so it seems my own education on
> this topic is not yet complete! My apologies if this is too trivial for the
> Cafe.
> >
> > One story I tell students is that list concatenation, xs ++ ys, ends up
> "copying" the elements of xs but not the elements of ys, and in particular
> an expression like xs ++ [a], putting a new element at the end of a list,
> is particularly expensive, at least compared to x : xs, putting one at the
> beginning, which seems obvious, right?
> >
> > But here is my question: isn't xs ++ ys just a thunk that, as elements
> are "requested" from it, produces them first from xs and then, when those
> run out, defers entirely to ys from that point on? If so, then the elements
> of xs aren't "copied" any more than the elements of ys are; they are just
> produced from different places. Even the expression xs ++ [a] isn't
> problematic in this light: just act like xs, but with one more "card up the
> sleeve" when you are done. Have I gotten that right?
> >
> > Second, if the above is correct, then if there is a nesting, (xs ++ ys)
> ++ zs, and we want an element from it, there will be two steps before we
> can get that element from xs. And this will continue until xs is exhausted,
> at which point we get new elements in one step from ys, and finally in zero
> steps from zs. But ++ is associative, and so this list is also xs ++ (ys ++
> zs), which requires less work getting the elements from xs. Is this kind of
> optimization something that is or can be done (automatically)?
> >
> > Thanks for your indulgence,
> >
> > Todd Wilson
> > _______________________________________________
> > Haskell-Cafe mailing list
> > To (un)subscribe, modify options or view archives go to:
> > http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> > Only members subscribed via the mailman list are allowed to post.
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20230713/68c542f4/attachment.html>