[GHC] #14610: newtype wrapping of a monadic stack kills performance
GHC
ghc-devs at haskell.org
Sun Dec 24 13:43:21 UTC 2017
#14610: newtype wrapping of a monadic stack kills performance
-------------------------------------+-------------------------------------
Reporter: mrkkrp | Owner: (none)
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.2.2
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Description changed by mrkkrp:
Old description:
> I have a project where I in one module (A) I decided to build something
> like
> a minimal framework for the other bigger module (B) to use. One of the
> main
> points of the framework is that the monadic stacks are hidden behind
> `newtype`s and in those monads you can only use the functions that the
> module A provides.
>
> The module A can be found here:
>
> https://github.com/mrkkrp/mmark/blob/master/Text/MMark/Parser/Internal.hs
>
> There are two monadic stack wrapped with newtypes: `BParser` and
> `IParser`.
>
> The module B is this:
>
> https://github.com/mrkkrp/mmark/blob/master/Text/MMark/Parser.hs
>
> But it's not really relevant.
>
> Now if I change `newtype`s to `type` synonyms like so:
>
> {{{
> type BParser a = ParsecT MMarkErr Text (State BlockState) a
> type IParser a = StateT InlineState (Parsec MMarkErr Text) a
> }}}
>
> and do corresponding minor corrections, I get 2x less allocations and
> almost 2x faster code (before):
>
> {{{
> Case Allocated GCs Max
> with file: data/bench-yaml-block.md 119,080 0 11,088
> with file: data/bench-thematic-break.md 74,368 0 10,224
> with file: data/bench-heading.md 901,928 0 9,432
> with file: data/bench-fenced-code-block.md 145,744 0 9,368
> with file: data/bench-indented-code-block.md 124,312 0 9,368
> with file: data/bench-unordered-list.md 2,010,496 1 10,784
> with file: data/bench-ordered-list.md 2,025,016 1 10,728
> with file: data/bench-blockquote.md 1,961,672 1 42,648
> with file: data/bench-paragraph.md 2,084,104 1 42,648
> with file: data/comprehensive.md 25,899,496 24 44,200
>
> benchmarking with file: data/comprehensive.md
> time 3.590 ms (3.578 ms .. 3.601 ms)
> 1.000 R² (1.000 R² .. 1.000 R²)
> mean 3.555 ms (3.546 ms .. 3.565 ms)
> std dev 31.07 μs (24.63 μs .. 39.90 μs)
> }}}
>
> After:
>
> {{{
> Case Allocated GCs Max
> with file: data/bench-yaml-block.md 116,864 0 11,088
> with file: data/bench-thematic-break.md 64,776 0 10,392
> with file: data/bench-heading.md 615,672 0 9,432
> with file: data/bench-fenced-code-block.md 144,736 0 9,368
> with file: data/bench-indented-code-block.md 123,352 0 9,368
> with file: data/bench-unordered-list.md 795,072 0 41,568
> with file: data/bench-ordered-list.md 808,808 0 41,512
> with file: data/bench-blockquote.md 826,392 0 41,568
> with file: data/bench-paragraph.md 881,432 0 41,568
> with file: data/comprehensive.md 10,945,104 10 44,440
>
> benchmarking with file: data/comprehensive.md
> time 2.451 ms (2.448 ms .. 2.456 ms)
> 1.000 R² (1.000 R² .. 1.000 R²)
> mean 2.432 ms (2.427 ms .. 2.437 ms)
> std dev 15.86 μs (13.16 μs .. 19.01 μs)
> }}}
>
> I'm not great at reading non-trivial core, but I gave it a shot and
> dumped
> some core. One thing I noticed that core for the module A is the same in
> both cases. Then the problem is an inter-module problem, probably like
> when
> you don't dump definitions into interface files with `INLINEABLE` and end
> up
> without specialization, but here it perhaps has to do with the fact that
> B
> has no information about internals of the monadic stacks from A, but I'm
> not
> sure.
>
> Here is the beginnings of core dumps (before):
>
> {{{
> ==================== Tidy Core ====================
> 2017-12-23 04:55:17.967944505 UTC
>
> Result size of Tidy Core
> = {terms: 210,169,
> types: 209,675,
> coercions: 82,492,
> joins: 973/3,753}
> }}}
>
> After:
>
> {{{
> ==================== Tidy Core ====================
> 2017-12-23 04:58:46.386265108 UTC
>
> Result size of Tidy Core
> = {terms: 301,256,
> types: 263,104,
> coercions: 28,560,
> joins: 1,726/5,393}
> }}}
>
> So it looks like `newtype` wrapping reduces GHC's ability to create join
> points SPJ talked about recently.
>
> I do understand that you expect a minimal reproducing example but I could
> not produce it even though I spent several hours in futile attempts. I
> started by creating a small project, defining a similar stack with a
> `newtype` wrapper and using it in a simple parser in another module. Then
> I
> benchmarked the parser. There is no difference between `newtype`d and
> code
> and the code that uses just type synonyms. I tried different variations,
> no difference.
>
> The core output is too big for me to analyze, it's like 25 Mb and 33 Mb,
> and I have no idea what's going on there.
>
> You're welcome to experiment with the repo, there are benchmarks for
> memory usage and criterion benchmarks for speed of execution:
>
> https://github.com/mrkkrp/mmark
>
> If you like I can create a separate branch with newtypes replaced with
> type synonyms, so you can compare the both approaches easier.
>
> I'm submitting this because my friend convinced me that it's better to
> let you know (even though I could not create a minimal reproducing
> example on my own) than to let it go completely unnoticed.
New description:
I have a project where I in one module (A) I decided to build something
like
a minimal framework for the other bigger module (B) to use. One of the
main
points of the framework is that the monadic stacks are hidden behind
`newtype`s and in those monads you can only use the functions that the
module A provides.
The module A can be found here:
https://github.com/mrkkrp/mmark/blob/ghc-bug-
newtypes/Text/MMark/Parser/Internal.hs
There are two monadic stack wrapped with newtypes: `BParser` and
`IParser`.
The module B is this:
https://github.com/mrkkrp/mmark/blob/ghc-bug-newtypes/Text/MMark/Parser.hs
But it's not really relevant.
Now if I change `newtype`s to `type` synonyms like so:
{{{
type BParser a = ParsecT MMarkErr Text (State BlockState) a
type IParser a = StateT InlineState (Parsec MMarkErr Text) a
}}}
and do corresponding minor corrections, I get 2x less allocations and
almost 2x faster code (before):
{{{
Case Allocated GCs Max
with file: data/bench-yaml-block.md 119,080 0 11,088
with file: data/bench-thematic-break.md 74,368 0 10,224
with file: data/bench-heading.md 901,928 0 9,432
with file: data/bench-fenced-code-block.md 145,744 0 9,368
with file: data/bench-indented-code-block.md 124,312 0 9,368
with file: data/bench-unordered-list.md 2,010,496 1 10,784
with file: data/bench-ordered-list.md 2,025,016 1 10,728
with file: data/bench-blockquote.md 1,961,672 1 42,648
with file: data/bench-paragraph.md 2,084,104 1 42,648
with file: data/comprehensive.md 25,899,496 24 44,200
benchmarking with file: data/comprehensive.md
time 3.590 ms (3.578 ms .. 3.601 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 3.555 ms (3.546 ms .. 3.565 ms)
std dev 31.07 μs (24.63 μs .. 39.90 μs)
}}}
After:
{{{
Case Allocated GCs Max
with file: data/bench-yaml-block.md 116,864 0 11,088
with file: data/bench-thematic-break.md 64,776 0 10,392
with file: data/bench-heading.md 615,672 0 9,432
with file: data/bench-fenced-code-block.md 144,736 0 9,368
with file: data/bench-indented-code-block.md 123,352 0 9,368
with file: data/bench-unordered-list.md 795,072 0 41,568
with file: data/bench-ordered-list.md 808,808 0 41,512
with file: data/bench-blockquote.md 826,392 0 41,568
with file: data/bench-paragraph.md 881,432 0 41,568
with file: data/comprehensive.md 10,945,104 10 44,440
benchmarking with file: data/comprehensive.md
time 2.451 ms (2.448 ms .. 2.456 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 2.432 ms (2.427 ms .. 2.437 ms)
std dev 15.86 μs (13.16 μs .. 19.01 μs)
}}}
I'm not great at reading non-trivial core, but I gave it a shot and dumped
some core. One thing I noticed that core for the module A is the same in
both cases. Then the problem is an inter-module problem, probably like
when
you don't dump definitions into interface files with `INLINEABLE` and end
up
without specialization, but here it perhaps has to do with the fact that B
has no information about internals of the monadic stacks from A, but I'm
not
sure.
Here is the beginnings of core dumps (before):
{{{
==================== Tidy Core ====================
2017-12-23 04:55:17.967944505 UTC
Result size of Tidy Core
= {terms: 210,169,
types: 209,675,
coercions: 82,492,
joins: 973/3,753}
}}}
After:
{{{
==================== Tidy Core ====================
2017-12-23 04:58:46.386265108 UTC
Result size of Tidy Core
= {terms: 301,256,
types: 263,104,
coercions: 28,560,
joins: 1,726/5,393}
}}}
So it looks like `newtype` wrapping reduces GHC's ability to create join
points SPJ talked about recently.
I do understand that you expect a minimal reproducing example but I could
not produce it even though I spent several hours in futile attempts. I
started by creating a small project, defining a similar stack with a
`newtype` wrapper and using it in a simple parser in another module. Then
I
benchmarked the parser. There is no difference between `newtype`d and code
and the code that uses just type synonyms. I tried different variations,
no difference.
The core output is too big for me to analyze, it's like 25 Mb and 33 Mb,
and I have no idea what's going on there.
You're welcome to experiment with the repo, there are benchmarks for
memory usage and criterion benchmarks for speed of execution:
https://github.com/mrkkrp/mmark
I have created two branches I won't touch, one with newtypes and another
one with type synonyms:
* https://github.com/mrkkrp/mmark/tree/ghc-bug-newtypes
* https://github.com/mrkkrp/mmark/tree/ghc-bug-type-synonyms
Just checkout one of these and run `stack bench`.
This is the commit that changes newtypes to type synonyms:
https://github.com/mrkkrp/mmark/commit/759d8d4aa52dd57a393299c63e8c9b70d0d43290
I'm submitting this because my friend convinced me that it's better to let
you know (even though I could not create a minimal reproducing example on
my own) than to let it go completely unnoticed.
--
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14610#comment:1>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list