[Haskell-cafe] Monad transformer performance - Request to review benchmarking code + results
David Turner
dct25-561bs at mythic-beasts.com
Sun Jan 29 19:27:17 UTC 2017
Because, I guess, nobody has put the time and effort into optimising this
particular benchmark. Lucid's fast enough that there are normally other
more pressing bottlenecks in a real application. The compiler has no
relevant smarts here; it's the library code we need to look at.
There's something funny going on on your system that I can't help with,
since I'm seeing rendering and serving the 230kB HTML page in a reasonably
punchy 10-20ms range, an order of magnitude less than your numbers.
Nonetheless, here is a fork of Lucid which performs substantially better
running renderTextT over the IO monad on the benchmarks I sent earlier,
thanks to a sprinkling of inlining:
https://github.com/chrisdone/lucid/compare/master...DaveCTurner:98d69d0457034390d596eb40113368b24504dd6c
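The change is essentially a sprinkling of INLINE pragmas on the class methods. The shape is roughly this (a simplified, writer-like stand-in for HtmlT, illustrative only; the real change is in the diff linked above):

```haskell
-- OutT is a made-up name for a writer-like transformer standing in
-- for Lucid's HtmlT, accumulating output next to the inner monad's
-- result. The point is the INLINE pragmas, especially on (>>=).
newtype OutT m a = OutT { runOutT :: m (String, a) }

instance Monad m => Functor (OutT m) where
  fmap f (OutT m) = OutT $ do (w, a) <- m; return (w, f a)
  {-# INLINE fmap #-}

instance Monad m => Applicative (OutT m) where
  pure a = OutT (return ("", a))
  {-# INLINE pure #-}
  OutT mf <*> OutT ma = OutT $ do
    (w1, f) <- mf
    (w2, a) <- ma
    return (w1 ++ w2, f a)
  {-# INLINE (<*>) #-}

instance Monad m => Monad (OutT m) where
  return = pure
  {-# INLINE return #-}
  OutT m >>= k = OutT $ do
    (w1, a) <- m
    (w2, b) <- runOutT (k a)
    return (w1 ++ w2, b)
  {-# INLINE (>>=) #-}
```

Without the pragmas, each (>>=) is an out-of-line call that allocates a closure per bind; with them, GHC can inline and simplify chains of binds at the use site.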
benchmarking renderText
time 4.900 ms (4.577 ms .. 5.218 ms)
0.895 R² (0.749 R² .. 0.988 R²)
mean 5.560 ms (5.189 ms .. 6.461 ms)
std dev 1.717 ms (510.4 μs .. 3.380 ms)
variance introduced by outliers: 95% (severely inflated)
benchmarking renderTextT Id
time 4.879 ms (4.755 ms .. 5.036 ms)
0.989 R² (0.979 R² .. 0.997 R²)
mean 5.057 ms (4.946 ms .. 5.219 ms)
std dev 373.7 μs (285.5 μs .. 483.4 μs)
variance introduced by outliers: 47% (moderately inflated)
benchmarking renderTextT Rd
time 5.034 ms (4.916 ms .. 5.152 ms)
0.994 R² (0.989 R² .. 0.997 R²)
mean 5.226 ms (5.090 ms .. 5.772 ms)
std dev 713.8 μs (261.3 μs .. 1.417 ms)
variance introduced by outliers: 74% (severely inflated)
benchmarking renderTextT IO
time 7.168 ms (6.694 ms .. 7.557 ms)
0.969 R² (0.946 R² .. 0.982 R²)
mean 8.388 ms (8.014 ms .. 8.880 ms)
std dev 1.132 ms (932.1 μs .. 1.397 ms)
variance introduced by outliers: 71% (severely inflated)
and here are all the things I tried on it:
https://github.com/chrisdone/lucid/compare/master...DaveCTurner:inline-the-things
Hope that helps,
David
On 29 January 2017 at 16:59, Saurabh Nanda <saurabhnanda at gmail.com> wrote:
> Thanks for digging deeper, David. What exactly did you inline?
>
> Also, am I the only one losing my mind over this? It's such a
> straightforward use of available code-structuring tools in Haskell. How
> come the compiler is not being smart about this out of the box?
>
> -- Saurabh.
>
> On 29 Jan 2017 9:42 pm, "David Turner" <dct25-561bs at mythic-beasts.com>
> wrote:
>
> Here's the profiling summary that I got:
>
> COST CENTRE                       MODULE                              %time  %alloc
>
> getOverhead                       Criterion.Monad                      41.3     0.0
> >>=                               Lucid.Base                           19.2    41.6
> makeElement.\.\                   Lucid.Base                           11.4    23.4
> fromHtmlEscapedString             Blaze.ByteString.Builder.Html.Utf8    7.9    14.9
> >>=                               Data.Vector.Fusion.Util               2.3     1.7
> return                            Lucid.Base                            1.4     2.1
> runBenchmark.loop                 Criterion.Measurement                 1.2     0.0
> with.\                            Lucid.Base                            1.0     2.1
> foldlMapWithKey                   Lucid.Base                            0.5     2.6
> streamDecodeUtf8With.decodeChunk  Data.Text.Encoding                    0.0     1.7
>
> As expected, HtmlT's bind is the expensive bit. However, I've been unable
> to encourage it to go away using INLINE pragmas.
>
>
>
>
> On 29 January 2017 at 15:45, Oliver Charles <ollie at ocharles.org.uk> wrote:
>
>> I would start by inlining the operations in the Functor, Applicative and
>> Monad instances for your monad and all the layers in the stack (such as
>> HtmlT). An un-inlined monadic bind can end up allocating a lot, as it's
>> such a common operation.
>>
>> On Sun, 29 Jan 2017, 3:32 pm Saurabh Nanda, <saurabhnanda at gmail.com>
>> wrote:
>>
>>> Please tell me what to INLINE. I'll update the benchmarks.
>>>
>>> Also, shouldn't this be treated as a GHC bug then? Using monad
>>> transformers as intended should not result in a severe performance penalty!
>>> Either monad transformers themselves are a problem or GHC is not doing the
>>> right thing.
>>>
>>> -- Saurabh.
>>>
>>> On 29 Jan 2017 7:50 pm, "Oliver Charles" <ollie at ocharles.org.uk> wrote:
>>>
>>> I would wager a guess that this can be solved with INLINE pragmas. We
>>> recently added INLINE to just about everything in transformers and got a
>>> significant speed up.
>>>
>>> On Sun, 29 Jan 2017, 11:18 am David Turner, <
>>> dct25-561bs at mythic-beasts.com> wrote:
>>>
>>> I would guess that the issue lies within HtmlT, which looks vaguely
>>> similar to a WriterT transformer but without much in the way of
>>> optimisation (e.g. INLINE pragmas). But that's just a guess after about 30
>>> sec of glancing at https://hackage.haskell.org/package/lucid-2.9.7/docs/src/Lucid-Base.html
>>> so don't take it as gospel.
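To make the resemblance concrete, here are the two shapes side by side, paraphrased from memory rather than copied from the source (ShowS stands in for Builder, and HtmlishT is just an illustrative name):

```haskell
-- transformers' WriterT pairs the result with accumulated output:
newtype WriterT w m a = WriterT { runWriterT :: m (a, w) }

-- HtmlT is similar in spirit, but pairs the result with a function
-- that produces the output (ShowS standing in for Builder here):
newtype HtmlishT m a = HtmlishT { runHtmlishT :: m (ShowS, a) }

main :: IO ()
main = do
  -- both run over an arbitrary inner monad, e.g. IO:
  (a, w) <- runWriterT (WriterT (return ('x', "logged")))
  (out, ()) <- runHtmlishT (HtmlishT (return (showString "<div></div>", ())))
  putStrLn (w ++ " " ++ [a] ++ " " ++ out "")
```

Being writer-like is what makes the missing INLINE pragmas plausible as the culprit: every bind has to thread that output component through.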
>>>
>>> My machine is apparently an i7-4770 of a similar vintage to yours,
>>> running Ubuntu in a VirtualBox VM hosted on Windows. 4GB of RAM in the VM,
>>> 16 in the host FWIW.
>>>
>>>
>>> On 29 Jan 2017 10:26, "Saurabh Nanda" <saurabhnanda at gmail.com> wrote:
>>>
>>> Thank you for the PR. Does your research suggest something is wrong with
>>> HtmlT when combined with any MonadIO, not necessarily ActionT? Is this an
>>> mtl issue or a lucid issue in that case?
>>>
>>> Curiously, what's your machine config? I'm on a late-2011 MacBook Pro
>>> with 10GB of RAM and some old i5.
>>>
>>> -- Saurabh.
>>>
>>> On 29 Jan 2017 3:05 pm, "David Turner" <dct25-561bs at mythic-beasts.com>
>>> wrote:
>>>
>>> The methodology does look reasonable, although I think you should wait
>>> for all the scotty threads to start before starting the benchmarks, as I
>>> see this interleaved output:
>>>
>>> Setting phasers to stun... (port 3002) (ctrl-c to quit)
>>> Setting phasers to stun... (port 3003) (ctrl-c to quit)
>>> Setting phasers to stun... (port 3001) (ctrl-c to quit)
>>> benchmarking bareScotty
>>> Setting phasers to stun... (port 3000) (ctrl-c to quit)
>>>
>>> Your numbers are wayyy slower than the ones I see on my dev machine:
>>>
>>> benchmarking bareScotty
>>> Setting phasers to stun... (port 3000) (ctrl-c to quit)
>>> time 10.94 ms (10.36 ms .. 11.52 ms)
>>> 0.979 R² (0.961 R² .. 0.989 R²)
>>> mean 12.53 ms (11.98 ms .. 13.28 ms)
>>> std dev 1.702 ms (1.187 ms .. 2.589 ms)
>>> variance introduced by outliers: 66% (severely inflated)
>>>
>>> benchmarking bareScottyBareLucid
>>> time 12.95 ms (12.28 ms .. 13.95 ms)
>>> 0.972 R² (0.951 R² .. 0.989 R²)
>>> mean 12.20 ms (11.75 ms .. 12.69 ms)
>>> std dev 1.236 ms (991.3 μs .. 1.601 ms)
>>> variance introduced by outliers: 50% (severely inflated)
>>>
>>> benchmarking transScottyBareLucid
>>> time 12.05 ms (11.70 ms .. 12.39 ms)
>>> 0.992 R² (0.982 R² .. 0.996 R²)
>>> mean 12.43 ms (12.06 ms .. 13.01 ms)
>>> std dev 1.320 ms (880.5 μs .. 2.071 ms)
>>> variance introduced by outliers: 54% (severely inflated)
>>>
>>> benchmarking transScottyTransLucid
>>> time 39.73 ms (32.16 ms .. 49.45 ms)
>>> 0.668 R² (0.303 R² .. 0.969 R²)
>>> mean 42.59 ms (36.69 ms .. 54.38 ms)
>>> std dev 16.52 ms (8.456 ms .. 25.96 ms)
>>> variance introduced by outliers: 92% (severely inflated)
>>>
>>> benchmarking bareScotty
>>> time 11.46 ms (10.89 ms .. 12.07 ms)
>>> 0.986 R² (0.975 R² .. 0.994 R²)
>>> mean 11.73 ms (11.45 ms .. 12.07 ms)
>>> std dev 800.6 μs (636.8 μs .. 975.3 μs)
>>> variance introduced by outliers: 34% (moderately inflated)
>>>
>>> but nonetheless I do also see the one using renderTextT to be
>>> substantially slower than the one without.
>>>
>>> I've sent you a PR [1] that isolates Lucid from Scotty and shows that
>>> renderTextT is twice as slow over IO as it is over Identity, and it's
>>> ~10% slower over Reader too:
>>>
>>> benchmarking renderText
>>> time 5.529 ms (5.328 ms .. 5.709 ms)
>>> 0.990 R² (0.983 R² .. 0.995 R²)
>>> mean 5.645 ms (5.472 ms .. 5.888 ms)
>>> std dev 593.0 μs (352.5 μs .. 908.2 μs)
>>> variance introduced by outliers: 63% (severely inflated)
>>>
>>> benchmarking renderTextT Id
>>> time 5.439 ms (5.243 ms .. 5.640 ms)
>>> 0.991 R² (0.985 R² .. 0.996 R²)
>>> mean 5.498 ms (5.367 ms .. 5.631 ms)
>>> std dev 408.8 μs (323.8 μs .. 552.9 μs)
>>> variance introduced by outliers: 45% (moderately inflated)
>>>
>>> benchmarking renderTextT Rd
>>> time 6.173 ms (5.983 ms .. 6.396 ms)
>>> 0.990 R² (0.983 R² .. 0.995 R²)
>>> mean 6.284 ms (6.127 ms .. 6.527 ms)
>>> std dev 581.6 μs (422.9 μs .. 773.0 μs)
>>> variance introduced by outliers: 55% (severely inflated)
>>>
>>> benchmarking renderTextT IO
>>> time 12.35 ms (11.84 ms .. 12.84 ms)
>>> 0.989 R² (0.982 R² .. 0.995 R²)
>>> mean 12.22 ms (11.85 ms .. 12.76 ms)
>>> std dev 1.159 ms (729.5 μs .. 1.683 ms)
>>> variance introduced by outliers: 50% (severely inflated)
>>>
>>> I tried replacing
>>>
>>> forM [1..10000] (\_ -> div_ "hello world!")
>>>
>>> with
>>>
>>> replicateM_ 10000 (div_ "hello world!")
>>>
>>> which discards the list of 10,000 () values that the forM thing
>>> generates, but this made very little difference.
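In other words, the only difference between the two is whether the list of results is kept. Outside Lucid, the same contrast looks like this (an illustrative IORef counter, not the benchmark code):

```haskell
import Control.Monad (forM, replicateM_)
import Data.IORef

main :: IO ()
main = do
  ref <- newIORef (0 :: Int)
  -- forM runs the action for each element and keeps a list of the
  -- results (here, 10000 copies of ()):
  units <- forM [1 .. 10000 :: Int] (\_ -> modifyIORef' ref (+ 1))
  -- replicateM_ runs the action the same number of times but
  -- discards the results:
  replicateM_ 10000 (modifyIORef' ref (+ 1))
  total <- readIORef ref
  print (length units, total)
```

Since dropping the 10,000-element list made little difference, the allocation cost evidently sits in the binds themselves, not in the accumulated results.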
>>>
>>> Hope this helps,
>>>
>>> David
>>>
>>>
>>> [1] https://github.com/vacationlabs/monad-transformer-benchmark/pull/2
>>>
>>>
>>>
>>> On 29 January 2017 at 07:26, Saurabh Nanda <saurabhnanda at gmail.com>
>>> wrote:
>>>
>>> Hi,
>>>
>>> I was noticing a severe drop in performance when Lucid's HtmlT was
>>> combined with Scotty's ActionT. I've tried putting together a minimal
>>> repro at https://github.com/vacationlabs/monad-transformer-benchmark.
>>> Could someone with better knowledge of benchmarking check whether the
>>> methodology is correct?
>>>
>>> Is my reading of 200ms performance penalty correct?
>>>
>>> -- Saurabh.
>>>
>>>
>>> _______________________________________________
>>> Haskell-Cafe mailing list
>>> To (un)subscribe, modify options or view archives go to:
>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>>> Only members subscribed via the mailman list are allowed to post.