[Haskell-cafe] State monad strictness (was: ... abysmal Language Shootout results)

Peter Simons simons at cryp.to
Thu Sep 30 11:06:32 EDT 2004


How can anyone stay away from such a deliciously pointless
waste of time as implementing a wc(1) derivate? :-)

Here is my attempt:

 > import IO
 >
 > type Count         = Int
 > data CountingState = ST !Bool !Count !Count !Count
 >                      deriving (Show)
 >
 > initCST = ST True 0 0 0
 >
 > wc :: CountingState -> [Char] -> CountingState
 > wc (ST _     l w c) ('\n':xs) = wc (ST True (l+1)  w   (c+1)) xs
 > wc (ST _     l w c) (' ' :xs) = wc (ST True   l    w   (c+1)) xs
 > wc (ST _     l w c) ('\t':xs) = wc (ST True   l    w   (c+1)) xs
 > wc (ST True  l w c) ( x  :xs) = wc (ST False  l  (w+1) (c+1)) xs
 > wc (ST False l w c) ( x  :xs) = wc (ST False  l    w   (c+1)) xs
 > wc st [] = st
 >
 > main :: IO ()
 > main = do
 >   ST _ l w c <- getContents >>= return . wc initCST
 >   putStrLn $ (l `shows`) . spaces . (w `shows`) . spaces . (c `shows`) $ []
 >     where spaces = (' ':) . (' ':) . (' ':)

I compiled this with "ghc -O2 -funbox-strict-fields" and got
the following performance results in a simple test.

The wc(1) tool:

    $ time /usr/bin/wc </usr/share/dict/words
     234937  234937 2486824

    real	0m0.069s
    user	0m0.059s
    sys	0m0.008s

My version:

    $ time ./wc </usr/share/dict/words
    234937   234937   2486824

    real    0m0.406s
    user    0m0.322s
    sys     0m0.060s

The version from the shootout pages:

    $ time ./wc-shootout </usr/share/dict/words
    234937 234937 2486824

    real    0m2.749s
    user    0m2.682s
    sys     0m0.062s

Then I made another experiment. I figured, the code above
yells out to be written with a State monad. So I did that:

 > import IO
 > import Control.Monad.State
 >
 > type Count         = Int
 > data CountingState = ST !Bool !Count !Count !Count
 >                      deriving (Show)
 >
 > type WordCounter   = State CountingState ()
 >
 > initCST = ST True 0 0 0
 >
 > wc :: Char -> WordCounter
 > wc x = get >>= \(ST b l w c) ->
 >   case (b,x) of
 >     (  _  , '\n') -> put (ST True (l+1) w (c+1))
 >     (  _  , '\t') -> put (ST True   l   w (c+1))
 >     (  _  , ' ' ) -> put (ST True   l   w (c+1))
 >     (True,   _  ) -> put (ST False  l  (w+1) (c+1))
 >     (False,  _  ) -> put (ST False  l    w   (c+1))
 >
 > main :: IO ()
 > main = do
 >   xs <- getContents
 >   let ST _ l w c = snd (runState (mapM wc xs) initCST)
 >   putStrLn $ (l `shows`) . spaces . (w `shows`) . spaces . (c `shows`) $ []
 >     where
 >     spaces = (' ':) . (' ':) . (' ':)

Curiously enough, this version fails to process the "words"
file because it runs out of stack space! Naturally, it is
very slow, too. So I wonder: How needs that program above to
be changed in order to solve this space leak?

Why does this happen in the first place?

Peter



More information about the Haskell-Cafe mailing list