[Haskell-cafe] None brute-force BWT algorithm
larry.liuxinyu
liuxinyu95 at gmail.com
Thu Jun 23 07:32:53 CEST 2011
Hi,
I read a previous thread about BWT implementation in Haskell:
http://www.mail-archive.com/haskell-cafe@haskell.org/msg25609.html
and
http://sambangu.blogspot.com/2007/01/burrows-wheeler-transform-in-haskell
They are all in a `brute-force' way, that is implement based on
Burrows-Wheeler's definition like below:
BWT: sort the rotations of input S to get a matrix M', return the last
column L, and the row I, where S appears in M'
-- O( n^2 \lg n)
bwt :: (Ord a)=>[a]->([a], Int)
bwt s = (map last m, i) where
m = sort [ drop i s ++ take i s | i<-[1..length s]]
(Just i) = elemIndex s m
And the IBWT: Re-construct M' by iteratively sort on input L, add one column
at a time, and pick the I-th row in M'
-- O( n^3 \lg n )
ibwt :: (Ord a)=> ([a], Int) -> [a]
ibwt (r, i) = m !! i where
m = iterate f (replicate n []) !! n
f = sort . zipWith (:) r
n = length r
I read Richard Bird's book, `Pearls of functional algorithm design', there
is another solution. Although it is deduced step by step,
the core idea is based on random access the list by index. The algorithm
mentioned in the book uses suffixes for
sorting instead of rotations. The performance are same in terms of big-O. I
wrote the following program accordingly.
BWT: sort S along with the index to get a new order of IDs, and return a
permutation of S based on IDs.
-- O( n^2 \lg n) if (!!) takes O(n) time
bwt' :: (Ord a)=> [a] -> ([a], Int)
bwt' s = (l, i) where
l = map (\i->s !! ((i-1) `mod` n)) ids
(Just i) = elemIndex 0 ids
ids = map snd $ sortBy (compare `on` fst) $ zip rots [0..]
rots = init $ zipWith (++) (tails s) (inits s) -- only non-empties
n = length s
IBWT: Sort the input L along with index to get a Transform vector, T [1],
then permute L iteratively on T start from row I.
-- O( n^2 ) if (!!) takes O(n) time
ibwt' :: (Ord a) => ([a], Int) -> [a]
ibwt' (r, i) = fst $ iterate f ([], i) !! n where
t = map snd $ sortBy (compare `on` fst) $ zip r [0..]
f (s, j) = let j' = t !! j in (s++[r !! j'], j')
n = length r
However, As I commented, the (!!) takes time proportion to the length of the
list, Although it can be turned into real Haskell Array
by listArray (0, n-1) xs.
I wonder if there are any purely functional implementations of BWT/IBWT,
which don't base on random access idea nor in brute-force way.
[Reference]
[1], Mark Nelson. `Data Compression with the Burrows-Wheeler Transform'.
http://marknelson.us/1996/09/01/bwt/
--
Larry, LIU Xinyu
https://github.com/liuxinyu95/AlgoXY
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20110622/dc17220c/attachment.htm>
More information about the Haskell-Cafe
mailing list