[Haskell-cafe] Re: Friday Afternoon Play: The Burrows Wheeler
Transformation
apfelmus at quantentunnel.de
apfelmus at quantentunnel.de
Mon Nov 20 11:34:24 EST 2006
Hello Jón,
sadly, nobody wants to play with you. Most likely, it's because the next
move is too hard: for a faster forward transform, you need (something as
tricky as) suffix trees. See also
Robert Giegerich, Stefan Kurtz:
"A Comparison of Imperative and Purely Functional Suffix Tree
Constructions"
http://citeseer.ist.psu.edu/giegerich95comparison.html
Regards,
apfelmus
Jón Fairbairn wrote:
> So, it's Friday afternoon (well, here anyway), and
> presumably everyone is thinking of leaving for the
> weekend. Is anyone interested in playing a little game? I'm
> going to make the first move, and then just wait and see if
> anyone makes the next. I'm certainly not promising to make
> any further moves myself, but I'm not promising I won't,
> either.
>
> Below you'll see a naïve implementation of the Burrows
> Wheeler transformation and its inverse. That's my
> move. (And don't complain too much about style and such --
> I'm half asleep.) Further moves are made by transforming
> the programme so that it becomes more efficient. The first
> target is not raw speed, but to adjust it until the
> computational complexity gets to be what it should be. I'm
> hoping for elegant moves here. After that, folk might start
> replacing the lists with more efficient datastructures and
> so on, but still /elegantly/.
>
> The object of the game isn't ultimate speed -- I've no
> particular wish to see a Haskell version that's as fast as a
> good C version, especially not if it ends up /looking/ like
> a C version. Really the objective is to think about Haskell
> programming -- and probably come up with some observations
> such as "there really ought to be a class of indexable
> structures, so that we don't have to change (!!) to (!) all
> over the place". And to have fun, anyhow...
>
> * * *
>
> Demonstration of the principle of the Burrows Wheeler
> transform. The implementations are simplistic (and have
> worse complexity than is possible).
>
>> module Burrows_Wheeler_Transform where
>> import List
>
> This is surely the wrong module to have to find (><) from:
>
>> import Data.Graph.Inductive.Query.Monad((><))
>
> The forward transformation. Input a list, output the
> transformed list and possibly an int (Nothing if there were
> no elements in the input -- perhaps the result type should
> be Maybe (NonEmptyList t,Int))
>
>> slow_bwt:: Ord t => [t] -> ([t], Maybe Int)
>
> This version works by computing a list of all the rotations
> of the input and sorting it. The transformation is then the
> last element of each line of the sorted rotations. Tag the
> data with an index so that it's possible to find where the
> original form of the unput ends up after sorting. This is
> essentially the specification of the transform coded up
> directly.
>
>> slow_bwt l
>> = (map fst tagged_bwt, index_of_original)
>> where tagged_bwt = map (last><id)
>> $ sort
>> $ rotations l`zip`[0..]
>> index_of_original = findIndex ((==0).snd) tagged_bwt
>
> that looks like worst case n²×log n to me (where n is the
> length of the string): if every element is the same, the
> n×log n comparisons the sort does will each look at every
> element, and that's really more than is necessary.
>
>
>
> The inverse transform. We should have that
>
> @
> slow_inv_bwt . slow_bwt == id
> @
>
> Observe that sorting the input gives the first column of the
> sorted rotations (and the input is the last). So after one
> sort we have
>
> X ... c
> Y ... b
> Z ... a
> .. . .
> .. . .
> .. . .
>
>
> Since they are rotations, rotating each row of this gives
> pairs that belong to the original list:
>
> cX ...
> by ...
> aZ ...
> .. .
> .. .
> .. .
>
> sorting these gives us the first two columns of the sorted
> rotations, and again we already know the last column, which
> can be rotated into the first and so on.
>
>> slow_inv_bwt ([],_) = []
>> slow_inv_bwt (l,Just index_of_original)
>> = iterate catsort (replicate len [])
>
> After (length input) iterations we have the original sorted
> rotations,
>
>> !! len
>
> and we have the index of where the untransformed string is,
> so we can just pick it back out.
>
>> !! index_of_original
>
>> where catsort s = sort $ map (uncurry (:)) $ l `zip`s
>> len = length l
>
>
> This version does rather fewer sorts:
>
>> mark2_inv_bwt ([],_) = []
>> mark2_inv_bwt (l,Just n)
>> = (transpose $ take (length l) $ iterate (permuteWith perm) l)!!(perm!!n)
>> where perm = map snd $ sort $ l `zip` [0..]
>
> Should I simply have left it out and hoped that it would
> turn up after a few moves? I'm not going to explain it,
> anyway.
>
>> rotations l = init $ zipWith (++) (tails l) (inits l)
>
>> permuteWith ns l = map (l!!) ns
More information about the Haskell-Cafe
mailing list