[Haskell-cafe] Data.Enumerator.Text.utf8 not constant memory?

Felipe Almeida Lessa felipe.lessa at gmail.com
Tue Apr 26 04:03:15 CEST 2011


[CC'ing John Millikin, enumerator's maintainer]

On Mon, Apr 25, 2011 at 7:10 PM, Skirmantas Kligys
<skirmantas.kligys at gmail.com> wrote:
> I expected to be able to do what SAX does in Java, i.e. to avoid loading the
> whole 2 gigabytes into memory.  For warm-up, I wrote an iteratee to count lines
> in the file, and it does load the whole file into memory!  After profiling, I
> see that the problem was Data.Enumerator.Text.utf8, it allocates up to 60
> megabytes when run on a 40 megabyte test file.

It seems to me that this is a bug in enumerator's "strict" fold not
being strict at all =).  The current version 0.4.9.1 of
Data.Enumerator.List.fold is

-- | Consume the entire input stream with a strict left fold, one element
-- at a time.
--
-- Since: 0.4.8
fold :: Monad m => (b -> a -> b) -> b
       -> Iteratee a m b
fold step = continue . loop where
	f = L.foldl' step
	loop acc stream = case stream of
		Chunks [] -> continue (loop acc)
		Chunks xs -> continue (loop (f acc xs))
		EOF -> yield acc EOF

Note that the list fold is strict (f = Data.List.foldl' step),
*however* the acc parameter of loop isn't strict at all!  It just
creates a big, fat thunk with references to all of you input =(.

But the fix is extremely easy, just change the 'Chunks xs' line to

		Chunks xs -> continue (loop $! f acc xs)

Using only your iterLinesWc test with a 105 MiB file (a movie I had
lying around), with enumerator's definition it takes 220 MiB of memory
and 1.3~1.5 seconds according to +RTS -s.  By doing only this very
change above, it takes 2 MiB of memory (100x improvement :P) and
0.8~0.9 seconds.

John Millikin, could you please apply the attached patch? =)

Cheers,

-- 
Felipe.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: make_strict.patch
Type: text/x-patch
Size: 5780 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20110425/90db7638/attachment.bin>


More information about the Haskell-Cafe mailing list