[Haskell-cafe] Code runs 7x FASTER with profiling

Jonas Scholl anselm.scholl at tu-harburg.de
Thu Dec 7 07:49:42 UTC 2017


Looking at the produced core of both versions reveals that in the
profiled build a closure of type Regex is floated to top level. The
non-profiled build doesn't do this, thus it recompiles the regex for
every iteration. This is most likely the source of the slowdown of the
non-profiled build.
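
If that is the cause, one way to sidestep it is to do the floating by hand: compile the pattern once into a top-level value and match against that, instead of passing the pattern string to |=~| on every call. The following is only a sketch under that assumption, not something I have benchmarked. The name filenameRegex is made up for illustration, and the NOINLINE pragma is a precaution that may not be needed. makeRegex and match are the regex-tdfa functions that |=~| is built on.

```haskell
import Text.Regex.TDFA (Regex, makeRegex, match)

-- Hypothetical top-level binding: as a CAF it is evaluated at most once,
-- so the pattern is compiled a single time and shared by every call.
{-# NOINLINE filenameRegex #-}
filenameRegex :: Regex
filenameRegex = makeRegex
    "^\\./duplicity-(full|inc|new)(-signatures)?\\.\
    \([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]T[0-9][0-9][0-9][0-9][0-9][0-9]Z)\\."

-- Same logic as the original parseFilename, but matching against the
-- precompiled regex rather than recompiling the pattern string.
parseFilename :: String -> Either String (String, String)
parseFilename fn = case match filenameRegex fn :: [[String]] of
    [[_, full, _, time]] -> Right (full, time)
    _                    -> Left fn
```

The main function from the original program can stay as it is. Whether this brings the non-profiled build in line with the profiled one would need to be measured.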

On 12/07/2017 07:09 AM, Neil Mayhew wrote:
> I was writing a simple utility and I decided to use regexps to parse
> filenames. (I know, now I have two problems :-) )
> 
> I was surprised at how slow it ran, so I did a profiling build. The
> profiled code runs reasonably quickly, and is 7x faster, which makes it
> a bit hard to figure out where the slowdown is happening in the
> non-profiled code. I’m wondering if I’m doing something wrong, or if
> there’s a bug in |regex-tdfa| or in ghc.
> 
> I’ve pared my code down to just the following:
> 
>     import Text.Regex.TDFA ((=~))
> 
>     main :: IO ()
>     main = do
>         entries <- map parseFilename . lines <$> getContents
>         let check (Right (_, t)) = last t == 'Z'
>             check _ = False
>         print $ all check entries
> 
>     parseFilename :: String -> Either String (String, String)
>     parseFilename fn = case (fn =~ pattern :: [[String]]) of
>         [[_, full, _, time]] -> Right $ (full, time)
>         _ -> Left fn
>       where
>         pattern = "^\\./duplicity-(full|inc|new)(-signatures)?\\.\
>                   \([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]T[0-9][0-9][0-9][0-9][0-9][0-9]Z)\\."
> 
> The relevant part of my |.cabal| file looks like this:
> 
>     executable DuplicityAnalyzer
>       main-is:          DuplicityAnalyzer.hs
>       build-depends:    base >=4.6 && <4.11, regex-tdfa >= 1.0 && <1.3
>       default-language: Haskell2010
>       ghc-options:      -Wall -rtsopts
> 
> To run the profiling, I do:
> 
>     cabal clean
>     cabal configure --enable-profiling
>     cabal build
>     dist/build/DuplicityAnalyzer/DuplicityAnalyzer <names.in +RTS -sprofiling-summary.out -p
> 
> The |MUT| time in the non-profiling build is 7x bigger, and the |%GC|
> time goes from 8% to 21%. I’ve put the actual output in a gist
> <https://gist.github.com/neilmayhew/247a30738c0e294902e7c2830ca2c6f5>.
> I’ve also put my test input file there, in case anyone wants to try this
> themselves.
> 
> I’ve done my testing with NixOS (ghc 8.0.2) and Debian with the Haskell
> Platform (ghc 8.2.1) and the results are basically the same. I even
> tried using Docker containers with Debian Jessie and Debian Stretch,
> just to eliminate any OS influence, and the results are still the same.
> I’ve tried it on an i5-2500K, i5-3317U and Xeon E5-1620.
> 
> I also wrote a dummy implementation of |=~| that ignores the regex
> pattern and does a hard-coded manual parse, and that produces times just
> slightly better than the profiled ones. So I don’t think there’s a
> problem in my outer code that uses |=~|.
> 

