[Git][ghc/ghc][wip/simon-perf] 32 commits: ghc-prim: Turn some comments into haddocks

Sat Jul 11 16:18:26 UTC 2020

Ben Gamari pushed to branch wip/simon-perf at Glasgow Haskell Compiler / GHC

Commits:
e61d5395 by Chaitanya Koparkar at 2020-07-07T13:55:59-04:00
ghc-prim: Turn some comments into haddocks

[ci skip]

- - - - -
37743f91 by John Ericson at 2020-07-07T13:56:00-04:00
Support `timesInt2#` in LLVM backend

- - - - -
46397e53 by John Ericson at 2020-07-07T13:56:00-04:00
`genericIntMul2Op`: Call `genericWordMul2Op` directly

This unblocks a refactor, and removes partiality. It might be a PowerPC
regression but that should be fixable.

- - - - -
8a1c0584 by John Ericson at 2020-07-07T13:56:00-04:00
Simplify `PrimopCmmEmit`

Follow @simonpj's suggestion of pushing the "into regs" logic into
`emitPrimOp`. With the previous commit getting rid of the recursion in
`genericIntMul2Op`, this is now an easy refactor.

- - - - -
6607f203 by John Ericson at 2020-07-07T13:56:00-04:00
`opAllDone` -> `opIntoRegs`

The old name was and terrible and became worse after the previous
commit's refactor moved non-trivial funcationlity into its body.

- - - - -
fdcc53ba by Sylvain Henry at 2020-07-07T13:56:00-04:00
Optimise genericIntMul2Op

We shouldn't directly call 'genericWordMul2Op' in genericIntMul2Op
because a target may provide a faster primop for 'WordMul2Op': we'd
better use it!

- - - - -
686e7225 by Moritz Angermann at 2020-07-07T13:56:01-04:00
[linker/rtsSymbols] More linker symbols

Mostly symbols needed for aarch64/armv7l
and in combination with musl, where we have
to rely on loading *all* objects/archives

- __stack_chk_* only when not DYNAMIC

- - - - -
3f60b94d by Moritz Angermann at 2020-07-07T13:56:01-04:00
better if guards.

- - - - -
7abffced by Moritz Angermann at 2020-07-07T13:56:01-04:00
Fix (1)

- - - - -
cdfeb3f2 by Moritz Angermann at 2020-07-07T13:56:01-04:00
AArch32 symbols only on aarch32.

- - - - -
f496c955 by Adam Sandberg Ericsson at 2020-07-07T13:56:02-04:00
add -flink-rts flag to link the rts when linking a shared or static library #18072

By default we don't link the RTS when linking shared libraries because in the
most usual mode a shared library is an intermediary product, for example a
Haskell library, that will be linked into some executable in the end. So we
wish to defer the RTS flavour to link to the final link.

However sometimes the final product is the shared library, for example when
writing a plugin for some other system, so we do wish the shared library to
link the RTS.

For consistency we also make -staticlib honor this flag and its inversion.
-staticlib currently implies -flink-shared.

- - - - -
c59faf67 by Stefan Schulze Frielinghaus at 2020-07-07T13:56:04-04:00
hadrian: link check-ppr against debugging RTS if ghcDebugged

- - - - -
0effc57d by Adam Sandberg Ericsson at 2020-07-07T13:56:05-04:00
rts linker: teach the linker about GLIBC's special handling of *stat, mknod and atexit functions #7072

- - - - -
96153433 by Adam Sandberg Ericsson at 2020-07-07T13:56:06-04:00
hadrian: make hadrian/ghci use the bootstrap compiler from configure #18190

- - - - -
4d24f886 by Adam Sandberg Ericsson at 2020-07-07T13:56:07-04:00
hadrian: ignore cabal configure verbosity related flags #18131

- - - - -
7332bbff by Ben Gamari at 2020-07-07T13:56:08-04:00
testsuite: Widen T12234 acceptance window to 2%

Previously it wasn't uncommon to see +/-1% fluctuations in compiler
allocations on this test.

- - - - -
180b6313 by Gabor Greif at 2020-07-07T13:56:08-04:00
When running libtool, report it as such
- - - - -
d3bd6897 by Sylvain Henry at 2020-07-07T13:56:11-04:00
BigNum: rename BigNat types

Before this patch BigNat names were confusing because we had:

* GHC.Num.BigNat.BigNat: unlifted type used everywhere else
* GHC.Num.BigNat.BigNatW: lifted type only used to share static constants
* GHC.Natural.BigNat: lifted type only used for backward compatibility

After this patch we have:

* GHC.Num.BigNat.BigNat#: unlifted type
* GHC.Num.BigNat.BigNat: lifted type (reexported from GHC.Natural)

Thanks to @RyanGlScott for spotting this.

- - - - -
929d26db by Sylvain Henry at 2020-07-07T13:56:12-04:00
Bignum: don't build ghc-bignum with stage0

Noticed by @Ericson2314

- - - - -
d25b6851 by Sylvain Henry at 2020-07-07T13:56:12-04:00
Hadrian: ghc-gmp.h shouldn't be a compiler dependency

- - - - -
0ddae2ba by Sylvain Henry at 2020-07-07T13:56:14-04:00
DynFlags: factor out pprUnitId from "Outputable UnitId" instance

- - - - -
204f3f5d by Krzysztof Gogolewski at 2020-07-07T13:56:18-04:00
Remove unused function pprHsForAllExtra (#18423)

The function `pprHsForAllExtra` was called only on `Nothing`
since 2015 (1e041b7382b6aa).

- - - - -
3033e0e4 by Adam Sandberg Ericsson at 2020-07-08T20:36:49-04:00
hadrian: add flag to skip rebuilding dependency information #17636

- - - - -
b7de4b96 by Stefan Schulze Frielinghaus at 2020-07-09T09:49:22-04:00
Fix GHCi :print on big-endian platforms

On big-endian platforms executing

  import GHC.Exts
  data Foo = Foo Float# deriving Show
  foo = Foo 42.0#
  foo
  :print foo

results in an arithmetic overflow exception which is caused by function
index where moveBytes equals
  word_size - (r + item_size_b) * 8
Here we have a mixture of units. Both, word_size and item_size_b have
unit bytes whereas r has unit bits.  On 64-bit platforms moveBytes
equals then
  8 - (0 + 4) * 8
which results in a negative and therefore invalid second parameter for a
shiftL operation.

In order to make things more clear the expression
  (word .&. (mask `shiftL` moveBytes)) `shiftR` moveBytes
is equivalent to
  (word `shiftR` moveBytes) .&. mask
On big-endian platforms the shift must be a left shift instead of a
right shift. For symmetry reasons not a mask is used but two shifts in
order to zero out bits. Thus the fixed version equals
  case endian of
    BigEndian    -> (word `shiftL` moveBits) `shiftR` zeroOutBits `shiftL` zeroOutBits
    LittleEndian -> (word `shiftR` moveBits) `shiftL` zeroOutBits `shiftR` zeroOutBits

Fixes #16548 and #14455

- - - - -
3656dff8 by Sylvain Henry at 2020-07-09T09:50:01-04:00
LLVM: fix MO_S_Mul2 support (#18434)

The value indicating if the carry is useful wasn't taken into account.

- - - - -
d9f09506 by Simon Peyton Jones at 2020-07-10T10:33:44-04:00
Define multiShotIO and use it in mkSplitUniqueSupply

This patch is part of the ongoing eta-expansion saga;
see #18238.

It implements a neat trick (suggested by Sebastian Graf)
that allows the programmer to disable the default one-shot behaviour
of IO (the "state hack").  The trick is to use a new multiShotIO
function; see Note [multiShotIO].  For now, multiShotIO is defined
here in Unique.Supply; but it should ultimately be moved to the IO
library.

The change is necessary to get good code for GHC's unique supply;
see Note [Optimising the unique supply].

However it makes no difference to GHC as-is.  Rather, it makes
a difference when a subsequent commit

   Improve eta-expansion using ArityType

lands.

- - - - -
bce695cc by Simon Peyton Jones at 2020-07-10T10:33:44-04:00
Make arityType deal with join points

As Note [Eta-expansion and join points] describes,
this patch makes arityType deal correctly with join points.
What was there before was not wrong, but yielded lower
arities than it could.

Fixes #18328

In base GHC this makes no difference to nofib.

        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
         n-body          -0.1%     -0.1%     -1.2%     -1.1%      0.0%
--------------------------------------------------------------------------------
            Min          -0.1%     -0.1%    -55.0%    -56.5%      0.0%
            Max          -0.0%      0.0%    +16.1%    +13.4%      0.0%
 Geometric Mean          -0.0%     -0.0%    -30.1%    -31.0%     -0.0%

But it starts to make real difference when we land the change to the
way mkDupableAlts handles StrictArg, in fixing #13253 and friends.
I think this is because we then get more non-inlined join points.

- - - - -
2b7c71cb by Simon Peyton Jones at 2020-07-11T12:17:02-04:00
Improve eta-expansion using ArityType

As #18355 shows, we were failing to preserve one-shot info when
eta-expanding.  It's rather easy to fix, by using ArityType more,
rather than just Arity.

This patch is important to suport the one-shot monad trick;
see #18202.  But the extra tracking of one-shot-ness requires
the patch

   Define multiShotIO and use it in mkSplitUniqueSupply

If that patch is missing, ths patch makes things worse in
GHC.Types.Uniq.Supply.  With it, however, we see these improvements

    T3064     compiler bytes allocated -2.2%
    T3294     compiler bytes allocated -1.3%
    T12707    compiler bytes allocated -1.3%
    T13056    compiler bytes allocated -2.2%

Metric Decrease:
    T3064
    T3294
    T12707
    T13056

- - - - -
0c41e851 by Simon Peyton Jones at 2020-07-11T12:18:08-04:00
Use dumpStyle when printing inlinings

This just makes debug-printing consistent,
and more informative.

- - - - -
f44b5b9e by Simon Peyton Jones at 2020-07-11T12:18:08-04:00
Comments only

- - - - -
b9f0d583 by Simon Peyton Jones at 2020-07-11T12:18:08-04:00
Reduce result discount in conSize

Ticket #18282 showed that the result discount given by conSize
was massively too large.  This patch reduces that discount to
a constant 10, which just balances the cost of the constructor
application itself.

Note [Constructor size and result discount] elaborates, as
does the ticket #18282.

Reducing result discount reduces inlining, which affects perf.  I
found that I could increase the unfoldingUseThrehold from 80 to 90 in
compensation; in combination with the result discount change I get
these overall nofib numbers:

        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
          boyer          -0.2%     +5.4%     -3.2%     -3.4%      0.0%
       cichelli          -0.1%     +5.9%    -11.2%    -11.7%      0.0%
      compress2          -0.2%     +9.6%     -6.0%     -6.8%      0.0%
   cryptarithm2          -0.1%     -3.9%     -6.0%     -5.7%      0.0%
         gamteb          -0.2%     +2.6%    -13.8%    -14.4%      0.0%
         genfft          -0.1%     -1.6%    -29.5%    -29.9%      0.0%
             gg          -0.0%     -2.2%    -17.2%    -17.8%    -20.0%
           life          -0.1%     -2.2%    -62.3%    -63.4%      0.0%
           mate          +0.0%     +1.4%     -5.1%     -5.1%    -14.3%
         parser          -0.2%     -2.1%     +7.4%     +6.7%      0.0%
      primetest          -0.2%    -12.8%    -14.3%    -14.2%      0.0%
         puzzle          -0.2%     +2.1%    -10.0%    -10.4%      0.0%
            rsa          -0.2%    -11.7%     -3.7%     -3.8%      0.0%
         simple          -0.2%     +2.8%    -36.7%    -38.3%     -2.2%
   wheel-sieve2          -0.1%    -19.2%    -48.8%    -49.2%    -42.9%
--------------------------------------------------------------------------------
            Min          -0.4%    -19.2%    -62.3%    -63.4%    -42.9%
            Max          +0.3%     +9.6%     +7.4%    +11.0%    +16.7%
 Geometric Mean          -0.1%     -0.3%    -17.6%    -18.0%     -0.7%

I'm ok with these numbers, remembering that this change removes
an *exponential* increase in code size in some in-the-wild cases.

I investigated compress2.  The difference is entirely caused by this
function no longer inlining

WriteRoutines.$woutputCodes
  = \ (w :: [CodeEvent]) ->
      let result_s1Sr
            = case WriteRoutines.outputCodes_$s$woutput w 0# 0# 8# 9# of
                (# ww1, ww2 #) -> (ww1, ww2)
      in (# case result_s1Sr of (x, _) ->
              map @Int @Char WriteRoutines.outputCodes1 x
         , case result_s1Sr of { (_, y) -> y } #)

It was right on the cusp before, driven by the excessive result
discount.  Too bad!

Happily, the compiler/perf tests show a number of improvements:
    T12227     compiler bytes-alloc  -6.6%
    T12545     compiler bytes-alloc  -4.7%
    T13056     compiler bytes-alloc  -3.3%
    T15263     runtime  bytes-alloc -13.1%
    T17499     runtime  bytes-alloc -14.3%
    T3294      compiler bytes-alloc  -1.1%
    T5030      compiler bytes-alloc -11.7%
    T9872a     compiler bytes-alloc  -2.0%
    T9872b     compiler bytes-alloc  -1.2%
    T9872c     compiler bytes-alloc  -1.5%

Metric Decrease:
    T12227
    T12545
    T13056
    T15263
    T17499
    T3294
    T5030
    T9872a
    T9872b
    T9872c

- - - - -
f2506067 by Simon Peyton Jones at 2020-07-11T12:18:08-04:00
This patch addresses the exponential blow-up in the simplifier.

Specifically:
  #13253 exponential inlining
  #10421 ditto
  #18140 strict constructors
  #18282 another nested-function call case

This patch makes two significant changes:

1. For Ids that are used at most once in each branch of a case,
   make the occurrence analyser record the total number of
   syntactic occurrences.  Then in postInlineUnconditionally
   use that info to avoid inling something many many times.

   Actual changes:
     * See the occ_n_br field of OneOcc.
     * postInlineUnconditionally
   See Note [Suppress exponential blowup] in GHC.Core.Opt.Simplify.Utils

2. Change the way that mkDupableCont handles StrictArg.
   The details are explained in GHC.Core.Opt.Simplify
      Note [Duplicating StrictArg]

Current nofib run

        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
             VS          -0.3%   +115.9%    +12.1%    +11.2%      0.0%
         boyer2          -0.3%    +10.0%     +3.5%     +4.0%      0.0%
   cryptarithm2          -0.3%    +39.0%    +16.6%    +16.1%      0.0%
         gamteb          -0.3%     +4.1%     -0.0%     +0.4%      0.0%
     last-piece          -0.3%     +1.4%     -1.1%     -0.4%      0.0%
           mate          -0.4%    -11.1%     -8.5%     -9.0%      0.0%
     multiplier          -0.3%     -2.2%     -1.5%     -1.5%      0.0%
      transform          -0.3%     +3.4%     +0.5%     +0.8%      0.0%
--------------------------------------------------------------------------------
            Min          -0.8%    -11.1%     -8.5%     -9.0%      0.0%
            Max          -0.3%   +115.9%    +30.1%    +26.4%      0.0%
 Geometric Mean          -0.3%     +1.0%     +1.0%     +1.0%     -0.0%

Should investigate these numbers.

But the tickets are indeed cured, I think.

- - - - -

30 changed files:

- compiler/GHC/Builtin/PrimOps.hs
- compiler/GHC/CmmToLlvm/CodeGen.hs
- compiler/GHC/Core/Opt/Arity.hs
- compiler/GHC/Core/Opt/ConstantFold.hs
- compiler/GHC/Core/Opt/OccurAnal.hs
- compiler/GHC/Core/Opt/SetLevels.hs
- compiler/GHC/Core/Opt/Simplify.hs
- compiler/GHC/Core/Opt/Simplify/Utils.hs
- compiler/GHC/Core/SimpleOpt.hs
- compiler/GHC/Core/Unfold.hs
- compiler/GHC/Driver/Flags.hs
- compiler/GHC/Driver/Pipeline.hs
- compiler/GHC/Driver/Session.hs
- compiler/GHC/Hs/Type.hs
- compiler/GHC/Runtime/Heap/Inspect.hs
- compiler/GHC/StgToCmm/Prim.hs
- compiler/GHC/SysTools.hs
- compiler/GHC/SysTools/Tasks.hs
- compiler/GHC/Tc/Solver/Flatten.hs
- compiler/GHC/Types/Basic.hs
- compiler/GHC/Types/Id/Info.hs
- compiler/GHC/Types/Unique/Supply.hs
- compiler/GHC/Unit/Types.hs
- configure.ac
- docs/users_guide/8.12.1-notes.rst
- docs/users_guide/phases.rst
- docs/users_guide/shared_libs.rst
- hadrian/.gitignore
- hadrian/README.md
- hadrian/doc/make.md

The diff was not included because it is too large.

View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/d0a71dc443b9e4b017e043ee4c138720ddb4d9a6...f250606761b8120799a276eafc858bfb94743df9

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/d0a71dc443b9e4b017e043ee4c138720ddb4d9a6...f250606761b8120799a276eafc858bfb94743df9
You're receiving this email because of your account on gitlab.haskell.org.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20200711/50e748f9/attachment-0001.html>