[Haskell-cafe] Haskell performance when it comes to regex?
chris at chrisdornan.com
Sun May 28 12:22:01 UTC 2017
Sorry for being a bit late to this -- I have been on the road.
I have switched your example over to pre-compile the REs and use ByteString, and I can see a 13x speedup on scan and a 9x speedup on mapping. Curiously, nearly all of that speedup seems to come from lifting the RE compilation out of the loop, but I am pretty sure there are further gains to be had from rewriting the loops.
Do you have the Python code that was performing 80x better?
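For readers following along, here is a minimal sketch of what lifting the RE compilation out of the loop can look like. It assumes the regex-tdfa and bytestring packages; the pattern, the names emailRegex and countMatches, and the sample input are hypothetical and are not taken from the hanon code.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.ByteString.Char8 as B
import Text.Regex.TDFA (Regex, makeRegex, matchTest)

-- Hypothetical pattern, for illustration only. It is a top-level value,
-- so it is compiled at most once and then shared by every use below.
emailRegex :: Regex
emailRegex = makeRegex ("[a-z]+@[a-z]+\\.[a-z]+" :: B.ByteString)

-- Count the lines that match an already compiled regex. The regex is
-- passed in as a value, so nothing is recompiled per line.
countMatches :: Regex -> [B.ByteString] -> Int
countMatches re = length . filter (matchTest re)

main :: IO ()
main = do
  -- Hypothetical sample input; a real program would read a file instead.
  let ls = B.lines "alice@example.com\nno address here\nbob@example.org"
  print (countMatches emailRegex ls)
```

The contrast is with writing something like `line =~ pattern` inside the per-line function: there the pattern is given as a plain string and may be compiled again on each call, depending on whether GHC happens to float the compilation out.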
From: Alfredo Di Napoli <alfredo.dinapoli at gmail.com>
Date: Monday, 22 May 2017 at 08:48
To: Bram Neijt <bneijt at gmail.com>
Cc: Станислав Черничкин <schernichkin at gmail.com>, haskell-cafe <haskell-cafe at haskell.org>, Chris Dornan <chris at chrisdornan.com>
Subject: Re: [Haskell-cafe] Haskell performance when it comes to regex?
You might be interested in the “regex” package from my colleague Chris Dornan:
I know some proper performance work still needs to be done, but I would be curious to hear your experience report ;)
On 19 May 2017 at 18:52, Bram Neijt <bneijt at gmail.com> wrote:
I already changed to Text instead, but I thought the regex was already
memoized by GHC, so that should not be a problem.
I'm trying regex-applicative now, maybe that will help, but it takes
some time to figure out the syntax. I'll also try to see if
On Fri, May 19, 2017 at 1:17 PM, Станислав Черничкин
<schernichkin at gmail.com> wrote:
> Try to use Text or ByteString instead of Strings. Try to use the compile and
> execute methods
> to make sure the regex gets compiled once.
> 2017-05-16 12:12 GMT+03:00 Bram Neijt <bneijt at gmail.com>:
>> Dear reader,
>> I decided to do a little project which is a simple search and replace
>> program for large text files.
>> Written in Haskell, it does a few different regex matches on each line
>> and stores them in a leveldb key-value store to create a
>> consistent/reviewable search-replace index. It should provide for some
>> simple/brute-force anonymization of data and therefore I called it
>> hanon (sorry, could not think of a better name).
>> The code works, but I've done some benchmarking to compare it with
>> Python and the code is about 80x slower than doing the same thing in
>> Python, making it useless for larger data files.
>> I'm obviously doing something wrong.
>> Could you give me tips on improving the performance of this code?
>> Probably mainly looking at
>> where the regex code lives?
>> Haskell-Cafe mailing list
>> To (un)subscribe, modify options or view archives go to:
>> Only members subscribed via the mailman list are allowed to post.
> Sincerely, Stanislav Chernichkin.