[Haskell-cafe] Re: Need for speed: the Burrows-Wheeler Transform

Bertram Felgenhauer bertram.felgenhauer at googlemail.com
Sun Jun 24 01:22:19 EDT 2007


Felipe Almeida Lessa wrote:
> On 6/23/07, Bertram Felgenhauer <bertram.felgenhauer at googlemail.com> wrote:
> >"rbwt" implements the corresponding inverse BWT. It's a fun knot tying
> >exercise.
> >
> >> rbwt xs = let
> >>     res = sortBy (\(a:as) (b:bs) -> a `compare` b) (zipWith' (:) xs res)
> >>   in
> >>     tail . map snd . zip xs $ head res
> 
> Indeed it's very fun =). Although I must admit that, even after
> staring at the code for some time, I still don't see how it works. Is
> it too much to ask for a brief explanation?
> 
> I see that zipWith' creates all the lazy list without requiring any
> knowledge of the second list, which means that the list to be sorted
> is created nicely. But when sortBy needs to compare, say, the first
> with the second items, it forces the thunk of the lazy list. But that
> thunk depends on the sorted list itself!
> 
> What am I missing? ;)

Mainly the precise effect of the irrefutable pattern in zipWith',
I think:

> zipWith' f []     _       = []
> zipWith' f (x:xs) ~(y:ys) = f x y : zipWith' f xs ys

The ~(y:ys) does not pattern match immediately; instead it asserts
that the argument will have a (:) constructor later, when either
y or ys are forced. The last line is equivalent to

> zipWith' f (x:xs) ys' = let (y:ys) = ys' in f x y : zipWith' f xs ys

which may be easier to understand.

The end effect is that

> map head $ zipWith' (:) [1..3] undefined

returns [1..3], never touching the "undefined" value at all.

Trying

> zipWith' [1..] [[]]

is also instructive:

[[1],[2*** Exception: bwt.hs:(18,0)-(19,51): Irrefutable pattern failed for pattern (y : ys)

Note the point where the error happens.

The second thing is that the comparison in "rbwt" only compares
the first character from the result lists - that is, it really
only sorts the list xs, annotated with some additional information
that is not available yet (isn't lazy evaluation great?).

Once the sorting is done, the ~(y:ys) pattern matches of "zipWith'"
all succeed. The effect is that a single circular list is formed
from the characters in xs, with each rotation of the list occuring
exactly once. In fact we get all rotations of the original text, in
sorted order. This is by no means obvious, but is how the inverse
BWT works.

Hope that helps,

Bertram

P.S. Thanks Chris for your improvements :)


More information about the Haskell-Cafe mailing list