[Haskell-cafe] Haskell Speed
chris at mightyreason.com
Sat Dec 24 06:52:51 EST 2005
Greg Buchholz wrote:
> True. But there are some tests like "fasta" that appear to have a
> laziness induced space leak that presumably could be fixed.
Playing with the space-leaking code did not show any obvious reason to
expect the space leak. By commenting out the third output in main it
did not leak space.
Could some look at the original code at the link above and let me know
why there is a 300 MB space leak?
But if you never express huge lists, then it won't leak space. Here is
a tweaked version, called tweak6.hs, that interleaves making and writing
a line of output. It runs in 2.16 MB of RSIZE instead of over 300 MB.
Still slow, however. If you want more speed without extra libraries,
then use an unboxed array of Word8 (length of one line: 60 bytes), since
Unicode Char and linked lists are overkill.
The other killer is probably the array of tuple linear lookup to find
the chosen symbol. Hand coded if/then branches could be used to speed
that up. For even faster speed, do not use 0.0 to 1.0 probabilities at
all. Instead of normalize :: Int -> Double and then lookup (Double ->
Base) remove the use of Double and do (Int -> Base) lookup.
But the space leak was the embarrassing part. So I just fixed that.
-------------- next part --------------
-- The Great Computer Language Shootout
-- contributed by Jeff Newbern
-- Modified to tweak6.hs by Chris Kuklewicz
import System.IO (hFlush,stdout)
$ time ./fasta.ghc_run 2500000 > /dev/null
330+ Megs! Space leaking all over the place. No idea why.
$ time ./fasta.tweak6 2500000 > /dev/null
2.2 Megs RSIZE
-- Uses random generation code derived from Simon Marlow and Einar Karttunen's
-- "random" test entry.
-- Orignal version note: This code has not been optimized *at all*.
-- It is written to be clear and to follow standard Haskell idioms as
-- much as possible (but we have to match the stateful PRNG idiom in
-- the test definition, which is oriented toward an imperative
-- language). Performance is decent with ghc -O2, but if it can be
-- improved without sacrificing the clarity of the code, by all means
-- go for it!
-- Mondified tweak6.hs version note: Use a StateT around IO to manage
-- the seed. This makes interleaving the generation and output of the
-- sequence easier. It is only *minimally* abstracted.
type Base = Char
type Sequence = [Base]
alu :: Sequence -- predefined sequence
alu = "GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGG" ++
type BaseFrequency = (Base,Double)
iub :: [BaseFrequency]
iub = [ ('a', 0.27), ('c', 0.12), ('g', 0.12), ('t', 0.27),
('B', 0.02), ('D', 0.02), ('H', 0.02), ('K', 0.02), ('M', 0.02), ('N', 0.02),
('R', 0.02), ('S', 0.02), ('V', 0.02), ('W', 0.02), ('Y', 0.02) ]
homosapiens :: [BaseFrequency]
homosapiens = [ ('a', 0.3029549426680), ('c', 0.1979883004921),
('g', 0.1975473066391), ('t', 0.3015094502008) ]
-- select a base whose interval covers the given double
chooseBase :: [BaseFrequency] -> Double -> Base
chooseBase [(b,_)] _ = b
chooseBase ((b,f):xs) p | p < f = b
| otherwise = chooseBase xs (p-f)
type Seed = Int
type Pseudo a = StateT Seed IO a
prng :: Pseudo Double
prng = let nextSeed s = (s * ia + ic) `mod` im
normalize n = (fromIntegral n) * (1.0 / fromIntegral im)
im = 139968; ia = 3877; ic = 29573
in do seed <- get
let seed' = nextSeed seed
return (normalize seed')
prngN count = sequence $ replicate count prng
-- write a sequence in Fasta format
writeFasta :: String -> String -> Sequence -> IO ()
writeFasta label title sequence =
do putStrLn $ ">" ++ (label ++ (" " ++ title))
writeWrapped 60 sequence
where writeWrapped _  = do return ()
writeWrapped len str = do let (s1,s2) = splitAt len str
writeWrapped len s2
writeFastaHeader :: (MonadIO m) => String -> String -> m ()
writeFastaHeader label title = liftIO $ putStrLn $ ">" ++ (label ++ (" " ++ title))
writeWrapped' :: Int -> Int -> (Double->Base) -> Pseudo ()
writeWrapped' wrap total trans =
let work c = case c of
0 -> return ()
n -> do let c' = min wrap n
nextC = c - c'
s <- liftM (map trans) (prngN c')
liftIO $ putStrLn s
in work total
writeWrapped = writeWrapped' 60
main = do args <- getArgs
let n = if (null args) then 1000 else read (head args)
writeFasta "ONE" "Homo sapiens alu" (take (2*n) (cycle alu))
writeFastaHeader "TWO" "IUB ambiguity codes"
seed' <- execStateT (writeWrapped (3*n) (chooseBase iub)) 42
writeFastaHeader "THREE" "Homo sapiens frequency"
execStateT (writeWrapped (5*n) (chooseBase homosapiens)) seed'
More information about the Haskell-Cafe