[Haskell-cafe] Haskell performance when it comes to regex?

Станислав Черничкин schernichkin at gmail.com
Fri May 19 11:17:28 UTC 2017


Try using Text or ByteString instead of String. Use the compile and
execute functions (
http://hackage.haskell.org/package/regex-tdfa-1.2.1/docs/Text-Regex-TDFA-ByteString.html)
and make sure each regex is compiled only once.
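
A minimal sketch of what I mean, assuming regex-tdfa and bytestring are
available. The pattern and the names compileOnce, lineMatches and
countMatchingLines are made up for illustration and are not taken from
hanon:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Char8 as B
import Text.Regex.TDFA (defaultCompOpt, defaultExecOpt)
import Text.Regex.TDFA.ByteString (Regex, compile, execute)

-- Hypothetical pattern, for illustration only.
emailPattern :: B.ByteString
emailPattern = "[a-z0-9._]+@[a-z0-9.]+"

-- Compile the pattern a single time; a malformed pattern aborts with
-- the error message returned by 'compile'.
compileOnce :: B.ByteString -> Regex
compileOnce pat = either error id (compile defaultCompOpt defaultExecOpt pat)

-- True when the precompiled regex matches somewhere in the line.
lineMatches :: Regex -> B.ByteString -> Bool
lineMatches re line =
  case execute re line of
    Right (Just _) -> True
    _              -> False

-- Reuse the same compiled regex for every line of the input.
countMatchingLines :: Regex -> B.ByteString -> Int
countMatchingLines re = length . filter (lineMatches re) . B.lines

main :: IO ()
main = do
  let re = compileOnce emailPattern  -- compiled once, outside the per-line work
  input <- B.getContents
  print (countMatchingLines re input)
```

The point is that the Regex value is built once and passed around. If the
pattern is instead given as a string to =~ inside the per-line function,
it may end up being recompiled for every line, depending on what the
optimiser does.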

2017-05-16 12:12 GMT+03:00 Bram Neijt <bneijt at gmail.com>:

> Dear reader,
>
> I decided to do a little project which is a simple search and replace
> program for large text files.
>
> Written in Haskell, it does a few different regex matches on each line
> and stores them in a leveldb key-value store to create a
> consistent/reviewable search-replace index. It should provide for some
> simple/brute-force anonymization of data and therefore I called it
> hanon (sorry, could not think of a better name).
>
> https://github.com/BigDataRepublic/hanon
>
> The code works, but I've done some benchmarking to compare it with
> Python and the code is about 80x slower than doing the same thing in
> Python, making it useless for larger data files.
>
> I'm obviously doing something wrong.
>
> Could you give me tips on improving the performance of this code?
> Probably mainly looking at
>
> https://github.com/BigDataRepublic/hanon/blob/master/src/Mapper.hs
>
> where the regex code lives?
>
> Greetings,
>
> Bram
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.




-- 
Sincerely, Stanislav Chernichkin.