[Haskell-cafe] Re: Red-Blue Stack

Fri Sep 26 05:30:40 EDT 2008

apfelmus wrote:
> 
> data Stack2 r b = Empty | S [r] (Stack2 b r) deriving (Eq, Show)
> 

In the previous post, I considered an implementation of red-blue stacks
with the data type above. Unfortunately, it failed to perform in O(1)
time because list concatenation needs linear time:

  xs ++ ys  takes  O(length xs) time

But in Java, it's easy to append two (doubly) linked lists can in
constant time; I mean, just link the tail of the first list to the head
of the second. Why do we need linear time in Haskell? As Derek already
said, the thing is that in Haskell, all data structures are *persistent*
by default. Appending two linked lists in Java mutates the first list
and its old version is no longer available. Doubly linked lists are said
to be an *ephemeral* data structure. In Haskell, xs ++ ys  does not
change  xs  or  ys  at all, both are still around.

Persistent data structures are harder to come up with than ephemeral
ones, but there are some beautiful techniques available. For more, see
Okasaki's book

  Chris Okasaki. Purely Function Data Structures.
  http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf
  (This is the thesis on which the book is based.)

In particular, chapter 7.2.1 presents a simple implementation of lists
that support head, tail and (++) in constant time. So, we could "rescue"
our red-blue stack by

    data StackL r b = Empty | S (List r) (StackL b r)

using some more efficient data structure  List a  instead of  [a].

That's exactly what we are going to do now, but with a twist: our lists
won't store any elements at all!

    newtype List a = Length Int   deriving (Eq,Show,Num)

Instead, we're only storing the length of the list, so that

  empty list   corresponds to   0
  tail         corresponds to   n-1
  ++           corresponds to   +

Clearly, this is a very efficient implementation of "lists" with
"concatenation" in constant time^1. (Enable  GeneralizedNewtypeDeriving
 so that the compiler will implement the  instance Num (Length a)  for
us.)

The implementation is just as before, except that we don't have any
elements. But we can think of () as the element type.

    recolorL :: StackL r b -> StackL b r
    recolorL (S 0 t) = t
    recolorL t       = S 0 t

    pushL :: () -> StackL r b -> StackL r b
    pushL Empty    = S 1 Empty
    pushL (S rs t) = S (rs+1) t

    popL :: StackL r b -> StackL r b
    popL (S 0  (S bs t)) = S 0 . s bs . popL $ t
        where
        s bs (S 0 Empty     ) = S bs Empty
        s bs (S 0 (S bs' t')) = S (bs + bs') t'
        s bs t                = S bs t
    popL (S rs t       ) = S (rs-1) t
    popL _               = Empty

    topL :: StackL r b -> Maybe (Either () ())
    topL Empty   = Nothing
    topL (S 0 t) = fmap (\(Left b) -> Right b) (topL t)
    topL (S _ _) = Just (Left ())

Note that we still enjoy the benefits of using different types for red
and blue elements. For instance, the compiler won't allow us to add
List r  and  List b , though both are plain integers. The  r  in  List r
 is called a *phantom type* because, well, just like a phantom, the  r
isn't really there.

Now, this is all well and good, but we wanted to store actual elements,
didn't we? While the implementation above can't do that, it's perfectly
able to keep track of the order of elements. And we just need to combine
that with an external element storage to get a full red-blue stack:

    data RBStack r b = RBS [r] [b] (StackL r b)

    recolor (RBS rs bs n) = RBS bs rs (recolorL n)
    push r  (RBS rs bs n) = RBS (r:rs) bs (pushL () n)
    pop     (RBS rs bs n) = RBS (drop 1 rs) bs (popL n)

    top (RBS rs bs n) = fmap f (topL n)
        where
        f (Left  _) = Left  (head rs)
        f (Right _) = Right (head bs)

Last but not least, I would like to add that the above implementation is
of course inspired by the technique of "numerical representation", i.e.
the analogy between the representation of a number  n  and a container
with  n elements. So, the trick was basically to replace the peano
numbers  [()]  with the more efficient representation  Int  while the
special nature of the problem allowed us to store the elements
externally. For more about designing purely functional data structures
with numerical representations, see of course Okasaki's book and also

  Ralf Hinze, Ross Paterson.
  Finger Trees: A Simple General-purpose Data Structure.
  http://www.soi.city.ac.uk/~ross/papers/FingerTree.html

Regards,
apfelmus

Footnotes:
^1 Actually, addition of big integers is linear in the number of digits,
i.e. logarithmic in the size of the integer.