[Haskell-cafe] Q about last&init

Thu Jul 17 12:23:13 UTC 2014

Закиров Марат <marat61 at gmail.com> writes:

> My hypothesis is that somehow compiler reduces creating of a new list
> to just adding or removing one element. If it is not so.
> Then even ':' which is just adding to list head would be an O(n)
> operation just because it should return brand new list with one elem
> added. Or maybe functional approach uses pretty much different
> complexity metric, there copying of some structure "list" for example
> is just O(1)? If so then Q about compiler is still exists.

There is no magic in the compiler. It's a nice side-effect of having
immutable data.

If we copied the whole list, that would would be O(n), but we don't need
to. We just make references to the old data. In a language like C++ that
would be dangerous, since modifying the old list would affect the new
one. It's safe in Haskell since data is immutable, so nobody can affect
our code by mutating the data we're referencing.

Finger trees are complicated, so I'll demonstrate it with a simple
implementation of lists. These will work the same way as Haskell's built
in lists, except for the types:

l1 = (20, (10, ()))

This works just like "20 : 10 : []". Now, internally this is not
represented as one big array in memory. Instead, each part is stored
separately and joined together like this (ie. it's a singly-linked
list):

l1 = (20, l2)
l2 = (10, ())

Since l2 and "()" are immutable, there's no point copying their values
around, so the runtime will just use pointers to a single copy.

Prepending an element works in the same way, ie. there's no need to copy
l1:

l3 = (30, l1)

Appending to the end is a different story. We can't reference any
existing value, since their pointers will be wrong. For example
"(30, (20, (10, (40, ()))))" would become:

l4 = (30, l5)
l5 = (20, l6)
l6 = (10, l7)
l7 = (40, ())

The only new part here is l7, which is the "40" we've added, but we
couldn't re-use l2 for the "10" since its pointer goes to "()" and we
need it to go to l7. We're forced to make a new value l6 which has the
pointer we need.

Of course, the problem has just been shifted since we now can't use l1
for our "20" because it's pointing to l2 and we need one which points to
l6. That forces us to define the new value l5, and so on all the way
along the list, which is why it takes O(n) time.

Datastructures like Finger Trees use references to existing values in
much the same way, but they manage these references in clever ways so
that operations have small effects (like prepending to a list) rather
than expensive chain-reactions (like appending to a list).

Cheers,
Chris

PS : DLists work very differently, since they're created out of
     functions!