[Haskell-cafe] Monad transformer performance - Request to review benchmarking code + results

Oliver Charles ollie at ocharles.org.uk
Sun Jan 29 15:45:31 UTC 2017


I would start by inlining the operations in the Functor, Applicative and
Monad instances for your monad and for every layer in the stack (such as
HtmlT). A monadic bind that isn't inlined can end up allocating a lot, since
it's such a common operation.
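
To make that concrete, here is a minimal sketch of where the pragmas would
go. LogT and tellLine are hypothetical names for a small writer-like
transformer that I made up for illustration; this is not Lucid's HtmlT or
anything from transformers, and I have not measured what it does to your
benchmark. It only uses base.

```haskell
module Main (main) where

-- Hypothetical writer-like transformer, standing in for a layer such as
-- HtmlT. It accumulates a list of lines alongside the result.
newtype LogT m a = LogT { runLogT :: m (a, [String]) }

instance Functor m => Functor (LogT m) where
  fmap f (LogT m) = LogT (fmap (\(a, w) -> (f a, w)) m)
  {-# INLINE fmap #-}

instance Monad m => Applicative (LogT m) where
  pure a = LogT (return (a, []))
  {-# INLINE pure #-}
  LogT mf <*> LogT ma = LogT $ do
    (f, w1) <- mf
    (a, w2) <- ma
    return (f a, w1 ++ w2)
  {-# INLINE (<*>) #-}

instance Monad m => Monad (LogT m) where
  -- Bind is the hot path: asking GHC to inline it lets the optimiser see
  -- through the newtype and the underlying monad's own bind at call sites.
  LogT m >>= k = LogT $ do
    (a, w1) <- m
    (b, w2) <- runLogT (k a)
    return (b, w1 ++ w2)
  {-# INLINE (>>=) #-}

-- Primitive operations of the layer are worth marking too.
tellLine :: Applicative m => String -> LogT m ()
tellLine s = LogT (pure ((), [s]))
{-# INLINE tellLine #-}

main :: IO ()
main = do
  (_, ls) <- runLogT (mapM_ tellLine ["a", "b", "c"])
  print ls
```

The same pattern would apply to each layer in the stack. Whether it helps in
a given case is something to confirm by re-running the benchmarks with -O2.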

On Sun, 29 Jan 2017, 3:32 pm Saurabh Nanda, <saurabhnanda at gmail.com> wrote:

> Please tell me what to INLINE. I'll update the benchmarks.
>
> Also, shouldn't this be treated as a GHC bug then? Using monad
> transformers as intended should not result in a severe performance penalty!
> Either monad transformers themselves are a problem or GHC is not doing the
> right thing.
>
> -- Saurabh.
>
> On 29 Jan 2017 7:50 pm, "Oliver Charles" <ollie at ocharles.org.uk> wrote:
>
> I would wager a guess that this can be solved with INLINE pragmas. We
> recently added INLINE to just about everything in transformers and got a
> significant speed up.
>
> On Sun, 29 Jan 2017, 11:18 am David Turner, <dct25-561bs at mythic-beasts.com>
> wrote:
>
> I would guess that the issue lies within HtmlT, which looks vaguely
> similar to a WriterT transformer but without much in the way of
> optimisation (e.g. INLINE pragmas). But that's just a guess after about 30
> sec of glancing at
> https://hackage.haskell.org/package/lucid-2.9.7/docs/src/Lucid-Base.html
> so don't take it as gospel.
>
> My machine is apparently an i7-4770 of a similar vintage to yours, running
> Ubuntu in a VirtualBox VM hosted on Windows. 4GB of RAM in the VM, 16 in
> the host FWIW.
>
>
> On 29 Jan 2017 10:26, "Saurabh Nanda" <saurabhnanda at gmail.com> wrote:
>
> Thank you for the PR. Does your research suggest something is wrong with
> HtmlT when combined with any MonadIO, not necessarily ActionT? Is this an
> mtl issue or a lucid issue in that case?
>
> Out of curiosity, what's your machine config? I'm on a late 2011 MacBook
> Pro with 10G RAM and some old i5.
>
> -- Saurabh.
>
> On 29 Jan 2017 3:05 pm, "David Turner" <dct25-561bs at mythic-beasts.com>
> wrote:
>
> The methodology does look reasonable, although I think you should wait for
> all the scotty threads to start before starting the benchmarks, as I see
> this interleaved output:
>
> Setting phasers to stun... (port 3002) (ctrl-c to quit)
> Setting phasers to stun... (port 3003) (ctrl-c to quit)
> Setting phasers to stun... (port 3001) (ctrl-c to quit)
> benchmarking bareScotty
> Setting phasers to stun... (port 3000) (ctrl-c to quit)
>
> Your numbers are wayyy slower than the ones I see on my dev machine:
>
> benchmarking bareScotty
> Setting phasers to stun... (port 3000) (ctrl-c to quit)
> time                 10.94 ms   (10.36 ms .. 11.52 ms)
>                      0.979 R²   (0.961 R² .. 0.989 R²)
> mean                 12.53 ms   (11.98 ms .. 13.28 ms)
> std dev              1.702 ms   (1.187 ms .. 2.589 ms)
> variance introduced by outliers: 66% (severely inflated)
>
> benchmarking bareScottyBareLucid
> time                 12.95 ms   (12.28 ms .. 13.95 ms)
>                      0.972 R²   (0.951 R² .. 0.989 R²)
> mean                 12.20 ms   (11.75 ms .. 12.69 ms)
> std dev              1.236 ms   (991.3 μs .. 1.601 ms)
> variance introduced by outliers: 50% (severely inflated)
>
> benchmarking transScottyBareLucid
> time                 12.05 ms   (11.70 ms .. 12.39 ms)
>                      0.992 R²   (0.982 R² .. 0.996 R²)
> mean                 12.43 ms   (12.06 ms .. 13.01 ms)
> std dev              1.320 ms   (880.5 μs .. 2.071 ms)
> variance introduced by outliers: 54% (severely inflated)
>
> benchmarking transScottyTransLucid
> time                 39.73 ms   (32.16 ms .. 49.45 ms)
>                      0.668 R²   (0.303 R² .. 0.969 R²)
> mean                 42.59 ms   (36.69 ms .. 54.38 ms)
> std dev              16.52 ms   (8.456 ms .. 25.96 ms)
> variance introduced by outliers: 92% (severely inflated)
>
> benchmarking bareScotty
> time                 11.46 ms   (10.89 ms .. 12.07 ms)
>                      0.986 R²   (0.975 R² .. 0.994 R²)
> mean                 11.73 ms   (11.45 ms .. 12.07 ms)
> std dev              800.6 μs   (636.8 μs .. 975.3 μs)
> variance introduced by outliers: 34% (moderately inflated)
>
> but nonetheless I do also see the one using renderTextT to be
> substantially slower than the one without.
>
> I've sent you a PR [1] that isolates Lucid from Scotty and shows that
> renderTextT is about twice as slow over IO as it is over Identity, and
> it's ~10% slower over Reader too:
>
> benchmarking renderText
> time                 5.529 ms   (5.328 ms .. 5.709 ms)
>                      0.990 R²   (0.983 R² .. 0.995 R²)
> mean                 5.645 ms   (5.472 ms .. 5.888 ms)
> std dev              593.0 μs   (352.5 μs .. 908.2 μs)
> variance introduced by outliers: 63% (severely inflated)
>
> benchmarking renderTextT Id
> time                 5.439 ms   (5.243 ms .. 5.640 ms)
>                      0.991 R²   (0.985 R² .. 0.996 R²)
> mean                 5.498 ms   (5.367 ms .. 5.631 ms)
> std dev              408.8 μs   (323.8 μs .. 552.9 μs)
> variance introduced by outliers: 45% (moderately inflated)
>
> benchmarking renderTextT Rd
> time                 6.173 ms   (5.983 ms .. 6.396 ms)
>                      0.990 R²   (0.983 R² .. 0.995 R²)
> mean                 6.284 ms   (6.127 ms .. 6.527 ms)
> std dev              581.6 μs   (422.9 μs .. 773.0 μs)
> variance introduced by outliers: 55% (severely inflated)
>
> benchmarking renderTextT IO
> time                 12.35 ms   (11.84 ms .. 12.84 ms)
>                      0.989 R²   (0.982 R² .. 0.995 R²)
> mean                 12.22 ms   (11.85 ms .. 12.76 ms)
> std dev              1.159 ms   (729.5 μs .. 1.683 ms)
> variance introduced by outliers: 50% (severely inflated)
>
> I tried replacing
>
>     forM [1..10000] (\_ -> div_ "hello world!")
>
> with
>
>     replicateM_ 10000 (div_ "hello world!")
>
> which discards the list of 10,000 () values that the forM thing generates,
> but this made very little difference.
>
> Hope this helps,
>
> David
>
>
> [1] https://github.com/vacationlabs/monad-transformer-benchmark/pull/2
>
>
>
> On 29 January 2017 at 07:26, Saurabh Nanda <saurabhnanda at gmail.com> wrote:
>
> Hi,
>
> I noticed a severe drop in performance when Lucid's HtmlT is combined
> with Scotty's ActionT. I've tried putting together a minimal repro at
> https://github.com/vacationlabs/monad-transformer-benchmark . Could
> someone with better knowledge of benchmarking check whether the
> benchmarking methodology is correct?
>
> Is my reading of a 200ms performance penalty correct?
>
> -- Saurabh.
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.