[Haskell-cafe] Tuesday Morning Games (was: Friday Afternoon Play:) The Burrows Wheeler Transformation

Ketil Malde Ketil.Malde at bccs.uib.no
Tue Nov 21 03:53:09 EST 2006


apfelmus at quantentunnel.de wrote:
> sadly, nobody wants to play with you. 
Okay, here's a small possible contribution.
> Most likely, it's because the next
> move is too hard: for a faster forward transform, you need (something as
> tricky as) suffix trees. 
>   
To make it fast, I guess you'll need a suffix array.  (And in a sense,
that is what we're making when we sort the list of rotations, but
of course the time complexity is way worse than the linear time
SA algorithms.)

>>> rotations l = init $ zipWith (++) (tails l) (inits l)
>>>       
I haven't done the benchmarks, but I'm a bit disturbed by the number of
list traversals.  It seems to me that instead of sorting the rotations and
taking the last element, it would be more efficient to sort the 
rotations by the tails,
and taking the head?  Especially if the string contains little 
redundancy, the
sorting wouldn't need to traverse very deeply.  Here's a (rather ugly) 
attempt,
which seems to work:

\begin{code}

slow_bwt2:: Ord t => [t] -> ([t], Maybe Int)
slow_bwt2 l
    = (map fst tagged_bwt, index_of_original)
      where tagged_bwt = map (head><id)
                          $ sortBy ((. tail . fst) . compare . tail . 
fst) -- `on` (tail . fst)
                          $ rotations l`zip`[0..]
            index_of_original = findIndex ((==0).snd) tagged_bwt

rotations l = zipWith (++) (drop 1 $ tails l) (repeat l)

\end{code}


More information about the Haskell-Cafe mailing list