Array + memory usage

Tomasz Zielonka t.zielonka@students.mimuw.edu.pl
Thu, 19 Jun 2003 14:49:28 +0200


On Thu, Jun 19, 2003 at 11:11:00AM +0200, mies wrote:
> Hello, i'm a haskell newbie, and i'm trying to use arrays for counting 
> letters. But when I input a textfile of lets say 100KB the program uses 
> 75 M of memory, and I really don;t have a clue where the problem could 
> be. I have searched many topics here but I didn't find a solution. I 
> have made an axample of how i am using the array.
>
> module Main where
> 
> import Array
> import IO
> 
> main = do let fr = testUpdater f 100000
>          writeFile "test.txt" (show (fr!155))
> 
> f :: Array Int Int
> f = (array (0,255) [(i, 0) | i <- [0..255]])
> 
> testUpdater :: Array Int Int -> Int -> Array Int Int
> testUpdater fr 0 = fr
> testUpdater fr x = testUpdater ((fr//[(155, fr!(155) + 1)])) (x - 1)
> 
> updateF :: Array Int Int -> Array Int Int
> updateF x = (x//[(155, x!(155) + 1)])

I see two problems here:
a) you use immutable Arrays ineffectively - a new array is created for
   every input character
b) "too lazy" evaluation, resulting from laziness of Array type, which
   causes a space leak

Possible fixes are:

1. use accumArray or accum, for example:

    import Data.Array
    import IO

    main = do
	let fr = accumArray (\s _ -> s+1) 0 (0,255)
			    [ (155, ()) | _ <- [1..100000] ]
	writeFile "test.txt" (show (fr!155))

If you encounter a problem with stack overflow (I had it), you can
increase the maximal stack size, or use Unboxed arrays (in GHC), for
example:

    import Data.Array.Unboxed
    import IO

    main = do
	let fr :: UArray Int Int
	    fr = accumArray (\s _ -> s+1) 0 (0,255)
			    [ (155, ()) | _ <- [1..100000] ]
	writeFile "test.txt" (show (fr!155))

Alternatively you can use mutable arrays or write your own eager accum.

2. use deepSeq or Strategies.rnf in testUpdater, like this:
    (this will only fix b, not a)

    {- Probably only in GHC, compile with -package concurrent -}

    import Strategies

    ...

    testUpdater :: Array Int Int -> Int -> Array Int Int
    testUpdater fr 0 = fr
    testUpdater fr x =
        let fr' = ((fr//[(155, fr!(155) + 1)]))
	in  testUpdater fr' (x - 1) `demanding` rnf fr'

3. use unboxed immutable arrays (and possibly strictness annotations)
    (this will only fix b, not a)

    {- Not in Hugs -}

    import IO
    import Data.Array.Unboxed

    main = do
	let fr = testUpdater f 100000
	writeFile "test.txt" (show (fr!155))

    f :: UArray Int Int
    f = (array (0,255) [(i, 0) | i <- [0..255]])

    testUpdater :: UArray Int Int -> Int -> UArray Int Int
    testUpdater fr 0 = fr
    testUpdater fr x = (testUpdater (fr//[(155, fr!(155) + 1)])) (x - 1)

4. use mutable arrays, eg. STArray or STUArray within (ST s) monad:

    {- Works in GHC and Hugs -}
    
    import IO
    import Data.Array (Array)
    import Data.Array.ST
    import Data.Array.IArray
    import Control.Monad.ST

    main = do
	let fr :: Array Int Int
	    fr = runST (do
		    arr <- newArray (0, 255) 0 :: ST s (STArray s Int Int)
		    testUpdater arr 100000
		    freeze arr
		)
	writeFile "test.txt" (show (fr ! 155))

    testUpdater :: STArray s Int Int -> Int -> ST s ()
    testUpdater fr 0 = return ()
    testUpdater fr x = do
	n <- readArray fr 155
	writeArray fr 155 $! (n+1)  -- NOTE: $! used here
	testUpdater fr (x - 1)

> Regards, Richard Nieuwenhuis
> rnieuwen@cs.uu.nl

Best regards,
Tom

-- 
.signature: Too many levels of symbolic links