[Haskell-cafe] Data.Text performance problem

Petr Prokhorenkov prokhorenkov at gmail.com
Mon Sep 13 06:26:14 EDT 2010


I really didn't expect mapAccumL to have quadratic complexity. Thank you a
lot for the fix!


On Mon, Sep 13, 2010 at 5:06 AM, Bryan O'Sullivan <bos at serpentine.com>wrote:

> On Sun, Sep 12, 2010 at 12:23 PM, Petr Prokhorenkov <
> prokhorenkov at gmail.com> wrote:
>
>> I experienced a following problem while dealing with some text processing.
>>
>
> Thanks for the report and the test case. There's nothing wrong with your
> code - read on for details.
>
> You ran into one of the few functions in Data.Text that I copied straight
> over from the list implementation due to it not being used often.
> Unfortunately, that implementation constructs a new Text value (using cons)
> on every iteration through the loop, and as you might expect that's very
> slow even on tiny inputs, as it has quadratic behaviour.
>
> I've rewritten both strict and lazy mapAccumL and mapAccumR to use as much
> of the stream fusion machinery as possible. (By the way, there's an
> interesting fact behind why those functions started out life as they did:
> you can't write mapAccum functions using only stream machinery, due to their
> types, and the strict code is more work to write if you can't use the stream
> machinery. In the early days it just wasn't worth writing the more complex
> variants of them, as I had more pressing performance concerns at the time.)
>
> Where the old version of mapAccumL caused your test case to take 5 seconds
> to process an 11KB file (ouch!), with the rewritten version, your code can
> process an 81MB file in the same amount of time, without any changes to your
> code, and that O(n^2) behaviour is gone :-)
>
> text 0.8.1.0 is now up on hackage, with the fix included. Enjoy!
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20100913/87e44826/attachment.html


More information about the Haskell-Cafe mailing list