[Haskell-cafe] Re: Need for speed: the Burrows-Wheeler Transform

Bertram Felgenhauer bertram.felgenhauer at googlemail.com
Sat Jun 23 17:29:03 EDT 2007


Andrew Coppin wrote:
> apfelmus wrote:
> >Note that the one usually adds an "end of string" character $ in the
> >Burrows-Wheeler transform for compression such that sorting rotated
> >strings becomes sorting suffices.
> >  
> 
> Yeah, I noticed that the output from by program can never actually be 
> reverted to its original form. ;-) Maybe it I alter the code to stick a 
> 0-byte in there or something...

To recover the original string you either need to store the offset to
the first character or add a sentinel character to mark the string end.
One advantage of the sentinel character approach is that it's sufficient
to sort just the suffixes of the text. A disadvantage is that you need
an additional character. The code below reserves \0 for this purpose.

>>> Code: >>>

"bwt" implements a  variation of the Burrows-Wheeler transform, using
\0 as a sentinel character for simplicity. The sentinel has to be smaller
than all other characters in the string.

> bwt xs = let
>     suffixes = [(a,as) | a:as <- tails ('\0':xs)]
>   in
>     map fst . sortBy (\(_,a) (_,b) -> a `compare` b) $ suffixes

"rbwt" implements the corresponding inverse BWT. It's a fun knot tying
exercise.

> rbwt xs = let
>     res = sortBy (\(a:as) (b:bs) -> a `compare` b) (zipWith' (:) xs res)
>   in
>     tail . map snd . zip xs $ head res

"zipWith'" is a variant of zipWith that asserts that the third argument
has the same shape as the second one.

> zipWith' f []     _       = []
> zipWith' f (x:xs) ~(y:ys) = f x y : zipWith' f xs ys

<<< End Code <<<

Both transforms use linear memory (but quite a lot, admittedly).

regards,

Bertram


More information about the Haskell-Cafe mailing list