[Haskell-cafe] Space usage problems

Chris Kuklewicz haskell at list.mightyreason.com
Tue Jan 10 12:28:03 EST 2006


I'll make a guess...

Ian Lynagh wrote:
> Hi all,
> 
> In the middle (readChunks) is the equivalent of gunzip. It repeatedly
> calls foo until there is no more input left.
> 
> At the top is a simple main function that calls them.
> 
> If I do
> 
>     dd if=/dev/zero of=data bs=1000 count=3000" # making data around 3MB
>     ghc --make Test1 -o Test1 -O -Wall
>     ./Test1
> 
> then in top I see Test1 increasing memory usage to around 150MB. I think
> this is because the "let (ys, zs) = foo xs" means zs holds on to xs
> (it's hard to be sure as compiling for profiling is too happy to change
> the behaviour).

I don't have a comment on your guess.

> 
> I tried (Test2) changing foo to be a monad transformer over the calling
> monad, so the caller's remaining input was updated as we went along, but
> (as well as memory usage not obviously being fixed) this is giving me a
> stack overflow.

I will ignore Test2

> Has anyone got any suggestions for making a constant space, constant
> stack version?

Not yet.

> 
> 
> Thanks
> Ian

> ------------------------------------------------------------------------
> 
> 
> module Main (main) where
> 
> import Control.Monad (liftM)
> import Control.Monad.State (State, runState, evalState, get, put)
> 
> main :: IO ()
> main = do xs <- readFile "data"
>           ys <- readFile "data"
>           print (evalState readChunks xs == ys)

The equality should be constant space.

> 
> ---
> 
> type FirstMonad = State String
> 
> readChunks :: FirstMonad String
> readChunks = do xs <- get
>                 if null xs then return []
>                            else do let (ys, zs) = foo xs
>                                    put zs
And zs is the final state of "runState bar" which is suspect is []
And ys is the whole input (which is now all in memory)
>                                    rest <- readChunks
>                                    return (ys ++ rest)
> 
> ---
> 
> type SecondMonad = State String
> 
> foo :: String -> (String, String)
> foo = runState bar
> 
> bar :: SecondMonad String
> bar = do inp <- get
>          case inp of
>              [] -> return []
>              x:xs -> do put xs
>                         liftM (x:) bar
The liftM should be equivalent to
                          temp <- bar
                          return ( (x:) temp )

It looks like the first call to foo will have bar consuming the entire
input string.

So the flow looks like

main

 readChuncks all-input

  foo all-input

   bar (iterated over whole input length)

   foo returns (all-input, [])

  "rest <- readChunks" (recursive call, sees null xs then "return []")

  "return (ys ++ rest)" which is return (all-input ++ [])


In essence, your bar traverses the whole string until the state is
empty.  This loads your whole file into memory


More information about the Haskell-Cafe mailing list