[Haskell-cafe] Monad transformer performance - Request to review benchmarking code + results

Sun Jan 29 16:59:29 UTC 2017

Thanks for digging deeper, David. What exactly did you inline?

Also, am I the only one losing my mind over this? It's such a
straightforward use of available code structuring tools in Haskell. How
come the compiler is not being smart about this OOB?

-- Saurabh.

On 29 Jan 2017 9:42 pm, "David Turner" <dct25-561bs at mythic-beasts.com>
wrote:

Here's the profiling summary that I got:

COST CENTRE                      MODULE                              %time %alloc

getOverhead                      Criterion.Monad                      41.3    0.0
>>=                              Lucid.Base                           19.2   41.6
makeElement.\.\                  Lucid.Base                           11.4   23.4
fromHtmlEscapedString            Blaze.ByteString.Builder.Html.Utf8    7.9   14.9
>>=                              Data.Vector.Fusion.Util               2.3    1.7
return                           Lucid.Base                            1.4    2.1
runBenchmark.loop                Criterion.Measurement                 1.2    0.0
with.\                           Lucid.Base                            1.0    2.1
foldlMapWithKey                  Lucid.Base                            0.5    2.6
streamDecodeUtf8With.decodeChunk Data.Text.Encoding                    0.0    1.7

As expected, HtmlT's bind is the expensive bit. However I've been unable to
encourage it to go away using INLINE pragmas.
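
For readers unfamiliar with the suggestion, here is a minimal sketch of what
"INLINE on the Functor, Applicative and Monad methods" looks like. OutT is a
hypothetical WriterT-style transformer invented for illustration; it is not
Lucid's actual HtmlT definition, and String stands in for a real builder
type. The pragmas only ask GHC to inline; whether they remove the allocation
has to be checked by profiling, and as noted above they did not help here.

```haskell
import Data.Functor.Identity (Identity, runIdentity)

-- Hypothetical WriterT-style transformer (an assumption, not Lucid's HtmlT):
-- an action in m that returns accumulated output next to its result.
newtype OutT m a = OutT { runOutT :: m (String, a) }

instance Functor m => Functor (OutT m) where
  fmap f (OutT m) = OutT (fmap (\(w, a) -> (w, f a)) m)
  {-# INLINE fmap #-}

instance Monad m => Applicative (OutT m) where
  pure a = OutT (return ("", a))
  {-# INLINE pure #-}
  OutT mf <*> OutT ma = OutT $ do
    (w1, f) <- mf
    (w2, a) <- ma
    return (w1 ++ w2, f a)
  {-# INLINE (<*>) #-}

instance Monad m => Monad (OutT m) where
  -- Bind runs both actions in the underlying monad and appends their output.
  OutT m >>= k = OutT $ do
    (w1, a) <- m
    (w2, b) <- runOutT (k a)
    return (w1 ++ w2, b)
  {-# INLINE (>>=) #-}

-- Emit a chunk of output (hypothetical helper).
emit :: Monad m => String -> OutT m ()
emit s = OutT (return (s, ()))
{-# INLINE emit #-}

-- Run over Identity and keep only the accumulated output.
render :: OutT Identity a -> String
render = fst . runIdentity . runOutT
```

Comparing the profile (or the Core from -ddump-simpl) with and without the
pragmas is the only reliable way to tell whether they had any effect.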

On 29 January 2017 at 15:45, Oliver Charles <ollie at ocharles.org.uk> wrote:

> I would start by inlining operations in the Functor, Applicative and Monad
> classes for your monad and all the layers in the stack (such as HtmlT). An
> un-inlined monadic bind can end up allocating a lot (as it's such a common
> operation).
>
> On Sun, 29 Jan 2017, 3:32 pm Saurabh Nanda, <saurabhnanda at gmail.com>
> wrote:
>
>> Please tell me what to INLINE. I'll update the benchmarks.
>>
>> Also, shouldn't this be treated as a GHC bug then? Using monad
>> transformers as intended should not result in a severe performance penalty!
>> Either monad transformers themselves are a problem or GHC is not doing the
>> right thing.
>>
>> -- Saurabh.
>>
>> On 29 Jan 2017 7:50 pm, "Oliver Charles" <ollie at ocharles.org.uk> wrote:
>>
>> I would wager a guess that this can be solved with INLINE pragmas. We
>> recently added INLINE to just about everything in transformers and got a
>> significant speed up.
>>
>> On Sun, 29 Jan 2017, 11:18 am David Turner, <
>> dct25-561bs at mythic-beasts.com> wrote:
>>
>> I would guess that the issue lies within HtmlT, which looks vaguely
>> similar to a WriterT transformer but without much in the way of
>> optimisation (e.g. INLINE pragmas). But that's just a guess after about 30
>> sec of glancing at
>> https://hackage.haskell.org/package/lucid-2.9.7/docs/src/Lucid-Base.html
>> so don't take it as gospel.
>>
>> My machine is apparently an i7-4770 of a similar vintage to yours,
>> running Ubuntu in a VirtualBox VM hosted on Windows. 4GB of RAM in the VM,
>> 16 in the host FWIW.
>>
>>
>> On 29 Jan 2017 10:26, "Saurabh Nanda" <saurabhnanda at gmail.com> wrote:
>>
>> Thank you for the PR. Does your research suggest something is wrong with
>> HtmlT when combined with any MonadIO, not necessarily ActionT? Is this an
>> mtl issue or a lucid issue in that case?
>>
>> Curiously, what's your machine config? I'm on a late 2011 macbook pro
>> with 10G ram and some old i5.
>>
>> -- Saurabh.
>>
>> On 29 Jan 2017 3:05 pm, "David Turner" <dct25-561bs at mythic-beasts.com>
>> wrote:
>>
>> The methodology does look reasonable, although I think you should wait
>> for all the scotty threads to start before starting the benchmarks, as I
>> see this interleaved output:
>>
>> Setting phasers to stun... (port 3002) (ctrl-c to quit)
>> Setting phasers to stun... (port 3003) (ctrl-c to quit)
>> Setting phasers to stun... (port 3001) (ctrl-c to quit)
>> benchmarking bareScotty
>> Setting phasers to stun... (port 3000) (ctrl-c to quit)
>>
>> Your numbers are wayyy slower than the ones I see on my dev machine:
>>
>> benchmarking bareScotty
>> Setting phasers to stun... (port 3000) (ctrl-c to quit)
>> time                 10.94 ms   (10.36 ms .. 11.52 ms)
>>                      0.979 R²   (0.961 R² .. 0.989 R²)
>> mean                 12.53 ms   (11.98 ms .. 13.28 ms)
>> std dev              1.702 ms   (1.187 ms .. 2.589 ms)
>> variance introduced by outliers: 66% (severely inflated)
>>
>> benchmarking bareScottyBareLucid
>> time                 12.95 ms   (12.28 ms .. 13.95 ms)
>>                      0.972 R²   (0.951 R² .. 0.989 R²)
>> mean                 12.20 ms   (11.75 ms .. 12.69 ms)
>> std dev              1.236 ms   (991.3 μs .. 1.601 ms)
>> variance introduced by outliers: 50% (severely inflated)
>>
>> benchmarking transScottyBareLucid
>> time                 12.05 ms   (11.70 ms .. 12.39 ms)
>>                      0.992 R²   (0.982 R² .. 0.996 R²)
>> mean                 12.43 ms   (12.06 ms .. 13.01 ms)
>> std dev              1.320 ms   (880.5 μs .. 2.071 ms)
>> variance introduced by outliers: 54% (severely inflated)
>>
>> benchmarking transScottyTransLucid
>> time                 39.73 ms   (32.16 ms .. 49.45 ms)
>>                      0.668 R²   (0.303 R² .. 0.969 R²)
>> mean                 42.59 ms   (36.69 ms .. 54.38 ms)
>> std dev              16.52 ms   (8.456 ms .. 25.96 ms)
>> variance introduced by outliers: 92% (severely inflated)
>>
>> benchmarking bareScotty
>> time                 11.46 ms   (10.89 ms .. 12.07 ms)
>>                      0.986 R²   (0.975 R² .. 0.994 R²)
>> mean                 11.73 ms   (11.45 ms .. 12.07 ms)
>> std dev              800.6 μs   (636.8 μs .. 975.3 μs)
>> variance introduced by outliers: 34% (moderately inflated)
>>
>> but nonetheless I do also see the one using renderTextT to be
>> substantially slower than the one without.
>>
>> I've sent you a PR [1] that isolates Lucid from Scotty and shows that
>> renderTextT is twice as slow over IO than it is over Identity, and it's
>> ~10% slower over Reader too:
>>
>> benchmarking renderText
>> time                 5.529 ms   (5.328 ms .. 5.709 ms)
>>                      0.990 R²   (0.983 R² .. 0.995 R²)
>> mean                 5.645 ms   (5.472 ms .. 5.888 ms)
>> std dev              593.0 μs   (352.5 μs .. 908.2 μs)
>> variance introduced by outliers: 63% (severely inflated)
>>
>> benchmarking renderTextT Id
>> time                 5.439 ms   (5.243 ms .. 5.640 ms)
>>                      0.991 R²   (0.985 R² .. 0.996 R²)
>> mean                 5.498 ms   (5.367 ms .. 5.631 ms)
>> std dev              408.8 μs   (323.8 μs .. 552.9 μs)
>> variance introduced by outliers: 45% (moderately inflated)
>>
>> benchmarking renderTextT Rd
>> time                 6.173 ms   (5.983 ms .. 6.396 ms)
>>                      0.990 R²   (0.983 R² .. 0.995 R²)
>> mean                 6.284 ms   (6.127 ms .. 6.527 ms)
>> std dev              581.6 μs   (422.9 μs .. 773.0 μs)
>> variance introduced by outliers: 55% (severely inflated)
>>
>> benchmarking renderTextT IO
>> time                 12.35 ms   (11.84 ms .. 12.84 ms)
>>                      0.989 R²   (0.982 R² .. 0.995 R²)
>> mean                 12.22 ms   (11.85 ms .. 12.76 ms)
>> std dev              1.159 ms   (729.5 μs .. 1.683 ms)
>> variance introduced by outliers: 50% (severely inflated)
>>
>> I tried replacing
>>
>>     forM [1..10000] (\_ -> div_ "hello world!")
>>
>> with
>>
>>     replicateM_ 10000 (div_ "hello world!")
>>
>> which discards the list of 10,000 () values that the forM thing
>> generates, but this made very little difference.
>>
>> Hope this helps,
>>
>> David
>>
>>
>> [1] https://github.com/vacationlabs/monad-transformer-benchmark/pull/2
>>
>>
>>
>> On 29 January 2017 at 07:26, Saurabh Nanda <saurabhnanda at gmail.com>
>> wrote:
>>
>> Hi,
>>
>> I noticed a severe drop in performance when Lucid's HtmlT was combined
>> with Scotty's ActionT. I've tried putting together a minimal repro
>> at https://github.com/vacationlabs/monad-transformer-benchmark and would
>> like someone with better knowledge of benchmarking to check whether the
>> methodology is correct.
>>
>> Is my reading of a 200ms performance penalty correct?
>>
>> -- Saurabh.
>>
>>
>> _______________________________________________
>> Haskell-Cafe mailing list
>> To (un)subscribe, modify options or view archives go to:
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>> Only members subscribed via the mailman list are allowed to post.