[Haskell-cafe] Data.Text performance problem

Bryan O'Sullivan bos at serpentine.com
Sun Sep 12 21:06:31 EDT 2010

On Sun, Sep 12, 2010 at 12:23 PM, Petr Prokhorenkov
<prokhorenkov at gmail.com>wrote:

> I experienced a following problem while dealing with some text processing.

Thanks for the report and the test case. There's nothing wrong with your
code - read on for details.

You ran into one of the few functions in Data.Text that I copied straight
over from the list implementation due to it not being used often.
Unfortunately, that implementation constructs a new Text value (using cons)
on every iteration through the loop, and as you might expect that's very
slow even on tiny inputs, as it has quadratic behaviour.

I've rewritten both strict and lazy mapAccumL and mapAccumR to use as much
of the stream fusion machinery as possible. (By the way, there's an
interesting fact behind why those functions started out life as they did:
you can't write mapAccum functions using only stream machinery, due to their
types, and the strict code is more work to write if you can't use the stream
machinery. In the early days it just wasn't worth writing the more complex
variants of them, as I had more pressing performance concerns at the time.)

Where the old version of mapAccumL caused your test case to take 5 seconds
to process an 11KB file (ouch!), with the rewritten version, your code can
process an 81MB file in the same amount of time, without any changes to your
code, and that O(n^2) behaviour is gone :-)

text is now up on hackage, with the fix included. Enjoy!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20100912/6418b0e0/attachment.html

More information about the Haskell-Cafe mailing list