[Haskell-cafe] How do I get this done in constant mem?
mf-hcafe-15c311f0c at etc-network.de
mf-hcafe-15c311f0c at etc-network.de
Sat Oct 10 16:06:59 EDT 2009
On Fri, Oct 09, 2009 at 05:48:15PM -0600, Luke Palmer wrote:
> To: mf-hcafe-15c311f0c at etc-network.de
> From: Luke Palmer <lrpalmer at gmail.com>
> Date: Fri, 9 Oct 2009 17:48:15 -0600
> Subject: Re: [Haskell-cafe] How do I get this done in constant mem?
> On Fri, Oct 9, 2009 at 2:05 PM, <mf-hcafe-15c311f0c at etc-network.de> wrote:
> > Hi all,
> > I think there is something about my use of the IO monad that bites me,
> > but I am bored of staring at the code, so here you g. The code goes
> > through a list of records and collects the maximum in each record
> > position.
> > -- test.hs
> > import Random
> > import System.Environment (getArgs)
> > import System.IO (putStr)
> > samples :: Int -> Int -> IO [[Double]]
> > samples i j = sequence . replicate i . sequence . replicate j $ randomRIO (0, 1000 ** 3)
> Yes, you should not do this in IO. That requires the entire
> computation to finish before the result can be used. This computation
> should be pure and lazy.
Yeah. I also got an excellent reason via private mail why sequence
has to be strict:
sequence [Maybe 3, Maybe 4, Nothing] = Nothing
sequence [Maybe 3, Maybe 4] = Just [3, 4]
> > maxima :: [[Double]] -> [Double]
> > maxima samples@(_:_) = foldr (\ x y -> map (uncurry max) $ zip x y) (head samples) (tail samples)
> FWIW, This function has a beautiful alternate definition:
> maxima :: [[Double]] -> [Double]
> maxima = map maximum . transpose
Beautiful indeed! But see below.
To be honest, I don't really roll dice, but I am reading from a file.
I just thought that randomRIO would be more concise, but now the
discussion has gone totally in that direction. Sorry... (-: reading
the random number code is more fun, though!
Anyhow, I fixed my example to do lazy file processing where before I
used readFile (which has to be strict, as I can see now). First, I
generate a file with the samples, and then I read that file back (this
is the phase I'm interested in, since my real data is not really
samples :: Int -> Int -> IO [[Int]]
samples i j = sequence . replicate i . sequence . replicate j $ randomRIO (0, 1000 * 1000 * 1000)
maxima :: [[Int]] -> [Int]
maxima samples@(_:_) = foldr (\ x y -> map (uncurry max) $ zip x y) (head samples) (tail samples)
lazyProcess :: ([[Int]] -> a) -> FilePath -> IO a
lazyProcess f fileName =
h <- openFile fileName ReadMode
v <- fmap (f . map read . lines) $ hGetContents h
v `seq` hClose h
mkSamples = do
args <- getArgs
x <- samples (read (head args)) 5
putStr . (++ "\n") . join . intersperse "\n" . map show $ x
-- main = mkSamples
-- ghc --make -O9 test.hs -o test && ./test 10000 > test.data
main = lazyProcess length "test.data" >>= putStr . show
lazyProcess (What would be a better name? foldSampleFile perhaps?) is
where the IO happens, but the computation is located in a pure
function. And yet, only those lines are read that are relevant, and
GC on previous lines is allows if the pure function allows it.
This program has constant memory usage. Unfortunately, if I replace
the length function with implementation of maxima, it explodes again.
I tried a few things, such as
maxima'3 :: [[Int]] -> [Int]
maxima'3 (h:t) = foldr (\ x y -> let v = map (uncurry max) $ zip x y in sum v `seq` v) h t
with no luck so far. Tricky business, that! But much more curiously,
if I replace maxima'3 in main with this
maxima'4 :: [[Int]] -> [Int]
maxima'4 = map maximum . transpose
(with explicit type signature in both definitions), I get a 'no parse'
error from Prelude.read. maxima'3 with the same file gives me a
result. How can there be a difference if the type signatures are
Probably something about "don't use Prelude.read" :-)? I have to play
with this some more...
More information about the Haskell-Cafe