[Haskell-cafe] Re: Need for speed: the Burrows-Wheeler Transform
Bertram Felgenhauer
bertram.felgenhauer at googlemail.com
Sat Jun 23 17:29:03 EDT 2007
Andrew Coppin wrote:
> apfelmus wrote:
> >Note that the one usually adds an "end of string" character $ in the
> >Burrows-Wheeler transform for compression such that sorting rotated
> >strings becomes sorting suffices.
> >
>
> Yeah, I noticed that the output from by program can never actually be
> reverted to its original form. ;-) Maybe it I alter the code to stick a
> 0-byte in there or something...
To recover the original string you either need to store the offset to
the first character or add a sentinel character to mark the string end.
One advantage of the sentinel character approach is that it's sufficient
to sort just the suffixes of the text. A disadvantage is that you need
an additional character. The code below reserves \0 for this purpose.
>>> Code: >>>
"bwt" implements a variation of the Burrows-Wheeler transform, using
\0 as a sentinel character for simplicity. The sentinel has to be smaller
than all other characters in the string.
> bwt xs = let
> suffixes = [(a,as) | a:as <- tails ('\0':xs)]
> in
> map fst . sortBy (\(_,a) (_,b) -> a `compare` b) $ suffixes
"rbwt" implements the corresponding inverse BWT. It's a fun knot tying
exercise.
> rbwt xs = let
> res = sortBy (\(a:as) (b:bs) -> a `compare` b) (zipWith' (:) xs res)
> in
> tail . map snd . zip xs $ head res
"zipWith'" is a variant of zipWith that asserts that the third argument
has the same shape as the second one.
> zipWith' f [] _ = []
> zipWith' f (x:xs) ~(y:ys) = f x y : zipWith' f xs ys
<<< End Code <<<
Both transforms use linear memory (but quite a lot, admittedly).
regards,
Bertram
More information about the Haskell-Cafe
mailing list