[Haskell-cafe] Haskell performance when it comes to regex?

Bram Neijt bneijt at gmail.com
Fri May 19 16:52:05 UTC 2017

Thank you!

I had already switched from String to Text. I assumed GHC would memoize
the compiled regex, so I did not expect recompilation to be a problem.

I'm trying regex-applicative now, maybe that will help, but it takes
some time to figure out the syntax. I'll also try to see if
precompilation helps.
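
For reference, precompilation with regex-tdfa might look roughly like the
sketch below. It assumes the regex-tdfa and bytestring packages; the
pattern and the names emailPattern and countMatches are made up for
illustration and are not taken from hanon. The point is only that compile
is called once and the resulting Regex value is reused for every line.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.ByteString.Char8 as B
import Text.Regex.TDFA (Regex, defaultCompOpt, defaultExecOpt, matchTest)
import Text.Regex.TDFA.ByteString (compile)

-- Hypothetical pattern, used only for this illustration.
emailPattern :: B.ByteString
emailPattern = "[a-z0-9._]+@[a-z0-9.]+"

-- Count the lines matched by an already compiled regex.
-- The Regex is passed in, so no compilation happens per line.
countMatches :: Regex -> [B.ByteString] -> Int
countMatches re = length . filter (matchTest re)

main :: IO ()
main =
  -- compile runs exactly once; a bad pattern is reported as Left.
  case compile defaultCompOpt defaultExecOpt emailPattern of
    Left err -> putStrLn ("regex did not compile: " ++ err)
    Right re -> do
      let ls = B.lines "alice@example.com\nno match here\nbob@example.org"
      print (countMatches re ls)
```

By contrast, writing line =~ pattern with a String pattern inside the
per-line function gives no guarantee that the compiled regex is shared
between calls, which is what passing the Regex value explicitly avoids.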



On Fri, May 19, 2017 at 1:17 PM, Станислав Черничкин
<schernichkin at gmail.com> wrote:
> Try Text or ByteString instead of String. Use the compile and execute
> functions
> (http://hackage.haskell.org/package/regex-tdfa-1.2.1/docs/Text-Regex-TDFA-ByteString.html)
> and make sure the regex gets compiled only once.
> 2017-05-16 12:12 GMT+03:00 Bram Neijt <bneijt at gmail.com>:
>> Dear reader,
>> I decided to do a little project which is a simple search and replace
>> program for large text files.
>> Written in Haskell, it does a few different regex matches on each line
>> and stores them in a leveldb key-value store to create a
>> consistent/reviewable search-replace index. It should provide for some
>> simple/brute-force anonymization of data and therefore I called it
>> hanon (sorry, could not think of a better name).
>> https://github.com/BigDataRepublic/hanon
>> The code works, but I've done some benchmarking to compare it with
>> Python and the code is about 80x slower than doing the same thing in
>> Python, making it useless for larger data files.
>> I'm obviously doing something wrong.
>> Could you give me tips on improving the performance of this code?
>> The place to look is probably
>> https://github.com/BigDataRepublic/hanon/blob/master/src/Mapper.hs
>> where the regex code lives.
>> Greetings,
>> Bram
> --
> Sincerely, Stanislav Chernichkin.
