[Haskell-cafe] Haskell performance when it comes to regex?

David Fox dsf at seereason.com
Tue May 23 16:22:07 UTC 2017


I have been surprised at how rarely switching to Text or ByteString makes
things significantly faster.  If you do switch, you should also look at
Data.ByteString.Builder or Data.Text.Lazy.Builder.
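
For illustration, here is a minimal sketch of assembling output with
Data.ByteString.Builder instead of repeatedly appending strict
ByteStrings.  The helper name renderLines and the sample data are made
up for this example and are not taken from hanon.

```haskell
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as BL

-- Hypothetical helper: concatenate lines, each followed by a newline.
-- The Builder collects the pieces and the lazy ByteString is only
-- materialised once, by toLazyByteString.
renderLines :: [B.ByteString] -> BL.ByteString
renderLines = BB.toLazyByteString . foldMap line
  where
    line l = BB.byteString l <> BB.char7 '\n'

main :: IO ()
main = BL.putStr (renderLines (map B.pack ["alpha", "beta"]))
```

Whether this helps in practice depends on where the time is actually
spent, so it is worth profiling first.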

On Fri, May 19, 2017 at 4:17 AM, Станислав Черничкин <schernichkin at gmail.com> wrote:

> Try to use Text or ByteString instead of String. Use the compile and
> execute functions from Text.Regex.TDFA.ByteString
> (http://hackage.haskell.org/package/regex-tdfa-1.2.1/docs/Text-Regex-TDFA-ByteString.html)
> and make sure each regex is compiled only once.
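
A minimal sketch of the compile-once advice above, assuming the
regex-tdfa package is installed.  The pattern, the sample lines and the
helper names compileOnce and matchingLines are placeholders for this
example, not code from hanon.

```haskell
import qualified Data.ByteString.Char8 as B
import Text.Regex.TDFA (defaultCompOpt, defaultExecOpt, matchTest)
import Text.Regex.TDFA.ByteString (Regex, compile)

-- Compile the pattern a single time; a malformed pattern yields Left.
compileOnce :: B.ByteString -> Either String Regex
compileOnce = compile defaultCompOpt defaultExecOpt

-- Reuse the already compiled Regex for every line.
matchingLines :: Regex -> [B.ByteString] -> [B.ByteString]
matchingLines re = filter (matchTest re)

main :: IO ()
main =
  case compileOnce (B.pack "[0-9]+") of
    Left err -> putStrLn ("bad pattern: " ++ err)
    Right re ->
      mapM_ B.putStrLn
        (matchingLines re (map B.pack ["abc 123", "no digits", "42"]))
```

The point is that compile runs outside the per-line loop, so the cost of
building the automaton is paid once rather than once per input line.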
>
> 2017-05-16 12:12 GMT+03:00 Bram Neijt <bneijt at gmail.com>:
>
>> Dear reader,
>>
>> I decided to do a little project which is a simple search and replace
>> program for large text files.
>>
>> Written in Haskell, it does a few different regex matches on each line
>> and stores them in a leveldb key-value store to create a
>> consistent/reviewable search-replace index. It should provide for some
>> simple/brute-force anonymization of data and therefore I called it
>> hanon (sorry, could not think of a better name).
>>
>> https://github.com/BigDataRepublic/hanon
>>
>> The code works, but I've done some benchmarking to compare it with
>> Python and the code is about 80x slower than doing the same thing in
>> Python, making it useless for larger data files.
>>
>> I'm obviously doing something wrong.
>>
>> Could you give me tips on improving the performance of this code?
>> Probably mainly looking at
>>
>> https://github.com/BigDataRepublic/hanon/blob/master/src/Mapper.hs
>>
>> where the regex code lives?
>>
>> Greetings,
>>
>> Bram
>> _______________________________________________
>> Haskell-Cafe mailing list
>> To (un)subscribe, modify options or view archives go to:
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>> Only members subscribed via the mailman list are allowed to post.
>
> --
> Sincerely, Stanislav Chernichkin.