[Haskell-cafe] Mutable arrays

Sat Feb 2 06:26:12 EST 2008

I prerequest your forgiveness if I sound patronizing, I'm just writing
everything that comes to mind.

2008/2/2 Jeff φ <jeff1.61803 at gmail.com>:
> {-# OPTIONS_GHC -fglasgow-exts -fbreak-on-exception #-}
>
> -- normalize_ary This takes a mutable array.  Determines the largest
>  -- element in the array (max_elem) and then divides every element by
> -- max_elem.
>  normalize_ary :: (Num t1,
>                   Num t,
>                    Ix t,
>                   Ix t1,
>                    MArray a e t2,
>                   Ord e,
>                    Fractional e,
>                   Enum t,
>                    Enum t1) =>
>                  a (t, t1) e -> t2 ()

Yagh!  Look at that type signature.  That looks like it came from
ghci.  That type should raise a few alarms, such as the Num t, Num t1.
  Why should the indices be numbers?  That indicates that your
implementation is not as general as it should be, so maybe try another
method.  (Really it's calc_max_2d_elem which is losing that
generality).  I usually write my type signatures first, and then let
that guide my implementation.  But you will find differing valid
opinions on this list on that issue.  Anyway, without further ado,
into the guts we go.

>  normalize_ary ary =
>     do
>          -- The following two commented out lines of code show my first
>         -- attempt at determining a value for max_elem.  However, this
>          -- produces a stack overflow.
>
>         -- elem_ary <- getElems ary
>       -- let max_elem = foldl1 max elem_ary

Hmm, how big is the array?   If it's pretty big, that's
understandable.  Frankly, it's because foldl sucks: I have never seen
a reason to use it.  You should be using the strict variant foldl'
here.  (I don't think there is a foldl1').  And that will get rid of
your big function calc_max_2d_elem.

>
>     max_elem <- calc_max_2d_elem ary
>      max_elem `seq` map_in_place_2d_arr (\x -> x / max_elem) ary

I don't think that max_elem `seq` is doing anything useful here  (but
I could be missing something subtle).

Oh and a really low level thing which may or may not make a
difference:  floating point division is expensive.  You'd be better
off precalculating 1 / max_elem and then multiplying by that instead.

>  map_in_place_2d_arr :: (MArray a e t, Enum t2, Enum t1, Ix t1, Ix t2) =>
>                        (e -> e) -> a (t1, t2) e -> t ()

Another conspicuous type signature.  Enum t2, Enum t1 is the red flag
here.  It's because you're using [i1..i2] instead of range (i1,i2)
from Data.Ix.

>  map_in_place_2d_arr fn arr = ret
>     where
>        ret = do ((i1,j1),(i2,j2)) <- getBounds arr
>                ( mapM_ (\i ->  do v <- readArray arr i
>                                    writeArray arr i (fn v) )
>                        [(i,j) | i <- [i1..i2], j <- [j1..j2]])

This looks pretty good modulo the [i1..i2] I mentioned above. For this
kind of stuff I prefer to use forM_, as it is a more
imperative-looking construct for imperative-looking code (then you can
lose the parentheses around (\i -> ...))...

> calc_max_2d_elem :: (Ord t, MArray a t t1, Ix t2, Ix t3, Num t3, Num t2) =>
>                      a (t3, t2) t -> t1 t
> calc_max_2d_elem arr =
>      do m <- readArray arr (0,0)
>        (_,(i_max, j_max)) <- getBounds arr
>         let calc_max_loop arr m (i,j)
>          | j == j_max     = return m
>           | otherwise      = do e <- readArray arr (i,j)
>                    let m2 = max e m
>                                     m2 `seq` calc_max_loop arr m2 nxt_idx
>          where nxt_idx | i == i_max - 1 = (0,j+1)
>                             | otherwise      = (i+1,j)
>        calc_max_loop arr m (0,0)

Hopefully we have done away with this thing given the foldl' thing.
There are a lot of implicit assumptions hiding in this code, such as
indices being zero-based integers.  Writing your type signature first
would have caught those assumptions, since you wouldn't have had (Num
t3, Num t2)  ;-).

Luke