[Haskell-cafe] Re: Need for speed: the Burrows-Wheeler Transform
Andrew Coppin
andrewcoppin at btinternet.com
Sat Jun 23 06:41:04 EDT 2007
apfelmus wrote:
> Andrew Coppin wrote:
>
>> Yeah, I noticed that the output from by program can never actually be
>> reverted to its original form.
>>
>
> Well it can, but that's a different story told in
>
> Richard S. Bird and Shin-Cheng Mu.
> Inverting the Burrows-Wheeler transform.
> http://web.comlab.ox.ac.uk/oucl/work/richard.bird/publications.html
> #DBLP:journals/jfp/BirdM04
>
> Oh, and you had a function inv_bwt, right?
>
Implementation #1 had an inverse - but it works by finding an unused
character and prepending that to the input before doing the forward BWT. ;-)
For the other implementations, there is no inverse at all.
>> I was trying to avoid O(n^2) RAM usage. :-}
>>
>
> Note that for ByteStrings, this takes only O(n) RAM because the
> substrings are shared. But for lists, this would take O(n^2) RAM since
> (take n) cannot share hole sublists.
Presumably that's only true for *strict* ByteStrings?
(I still don't actually understand how lazy bytestrings are any
different to normal lists - or why they produced such a massive
performance increase...)
> An O(n) choice for lists that
> doesn't recalculate ++ all the time would be
>
> bwt :: Ord a => [a] -> [a]
> bwt xs = map last . sortBy (compare `on` (take n)) $ rotations
> where
> n = length xs
> rotations = take n . tails $ xs ++ xs
>
> with the well-known
>
> on :: (a -> a -> c) -> (b -> a) -> (b -> b -> c)
> on g f x y = g (f x) (f y)
>
Interesting... But how does it perform? ;-)
More information about the Haskell-Cafe
mailing list