[Haskell-cafe] Fast number parsing with strict bytestrings [Was: Re: Seemingly subtle change causes large performance variation]

Donald Bruce Stewart dons at cse.unsw.edu.au
Thu Jun 7 23:49:56 EDT 2007


dons:
> dons:
> > mdanish:
> > > Hello,
> > > 
> > > I've been playing with the INTEST problem on SPOJ which demonstrates
> > > the ability to write a program which processes large quantities of
> > > input data.  http://www.spoj.pl/problems/INTEST/
> >   
> > > But when I make a slight modification, the program chews up a ton more memory
> > > and takes more time:
> > > 
> > > import Control.Monad
> > > import Data.Maybe
> > > import qualified Data.ByteString.Char8 as B
> > > 
> > > divisibleBy :: Int -> Int -> Bool
> > > a `divisibleBy` n = a `rem` n == 0
> > > 
> > > main :: IO ()
> > > main = do
> > >     [n,k] <- (map int . B.split ' ') `fmap` B.getLine :: IO [Int]
> > > 
> > >     let
> > >         doLine :: Int -> Int -> IO Int
> > >         doLine r _ = B.getLine >>= return . testDiv r
> > >         -- 'return' moved here      ^^
> > 
> 
> 
> Original,
> 
>     95% cpu 1.668 total
> 
>     <<ghc: 258766440 bytes,
>            452 GCs,
>            3036/3036 avg/max bytes residency (1 samples), 
>            3M in use, 0.00 INIT (0.00 elapsed), 
>            1.51 MUT (1.63 elapsed), 
>            0.01 GC (0.03 elapsed) :ghc>>
  
> And the killer strict chunk parser:
> 
>     78% cpu 0.327 total
>     <<ghc: 20685092 bytes,
>     -->    38 GCs,
>     -->    81348/81348 avg/max bytes residency (1 samples),
>            2M in use,
>            0.00 INIT (0.00 elapsed),
>     -->    0.21 MUT (0.32 elapsed),
>            0.00 GC (0.00 elapsed) :ghc>>
> 

I note there was a missing constructor specialisation happening in the
calls to 'add', in the good program. We can fix that with some well
place inline pragma:

    add :: Int -> Int -> S.ByteString -> Int
    add k i s = if S.null s then i else test k i (parse x) xs
      where (x,xs) = uncons s
    {-# INLINE add #-}

Before, GHC -ddump-simpl-stats reported:

        22 RuleFired
        2 SC:$wprocess1
        4 SC:$wprocess2
        2 SC:comb1

After:
        24 RuleFired
        4 SC:$wprocess1
        4 SC:$wprocess2
        2 SC:comb1

And timing stats:

    $ time ./F < in
    29359
    ./F < in  0.20s user 0.04s system 81% cpu 0.288 total

So some 10% better.  It's often a good idea to inline non-recursive wrapper
functions like this, in bytestring code.

-- Don


More information about the Haskell-Cafe mailing list