ben at smart-cactus.org
Thu Mar 18 19:15:35 UTC 2021
Karel Gardas <karel.gardas at centrum.cz> writes:
> On 3/17/21 4:16 PM, Andreas Klebinger wrote:
>> Now that isn't really an issue anyway I think. The question is rather is
>> 2% a large enough regression to worry about? 5%? 10%?
> 5-10% is still around system noise even on a lightly loaded workstation.
> Not sure if CI is not run on some shared cloud resources where it may be
> even higher.
I think when we say "performance" we should be clear about what we are
referring to. Currently, GHC's CI does not measure instructions, cycles, or
time; it measures only allocations and residency. These are significantly
more deterministic than time measurements, even on cloud hardware.
I do think that eventually we should start to measure a broader spectrum
of metrics, but this is something that can be done on dedicated hardware
as a separate CI job.
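
To make the point about thresholds concrete, here is a minimal sketch of
a tolerance-window check on a deterministic metric such as bytes
allocated. The function name, the arguments, and the numbers are
hypothetical and chosen only for illustration; this is not the logic of
GHC's actual test driver.

```python
def classify_change(baseline, measured, tolerance_pct):
    """Compare a measured metric against a baseline value.

    Returns "regression" if the metric grew by more than tolerance_pct,
    "improvement" if it shrank by more than tolerance_pct, else "ok".
    All names here are illustrative assumptions.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    # Signed deviation from the baseline, in percent.
    deviation_pct = 100.0 * (measured - baseline) / baseline
    if deviation_pct > tolerance_pct:
        return "regression"
    if deviation_pct < -tolerance_pct:
        return "improvement"
    return "ok"


# Hypothetical allocation counts (bytes) with a 2% window.
print(classify_change(1_000_000, 1_030_000, 2.0))
print(classify_change(1_000_000, 1_010_000, 2.0))
```

Because allocation counts vary very little from run to run, a narrow
window like this can be meaningful, which would not be true for
wall-clock time on shared hardware.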
> I've done a simple experiment of pinning ghc compiling ghc-cabal and I've
> been able to "speed" it up by 5-10% on W-2265.
Do note that once we switch to Hadrian, ghc-cabal will vanish entirely
(since Hadrian implements its functionality directly).
> Also following this CI/performance regs discussion I'm not entirely sure
> if this is not just a witch-hunt that mostly hurts the most active GHC
> developers. Another idea may be to give up on CI doing perf reg testing
> at all and invest saved resources into proper investigation of
> GHC/Haskell programs performance. Not sure, if this would not be more
> beneficial longer term.
I don't think this would be beneficial. It's much easier to prevent a
regression from getting into the tree than it is to find and
characterise it after it has been merged.
> Just one random number thrown into the ring. Linux's perf claims that
> nearly every second L3 cache access on the example above ends with cache
> miss. Is it a good number or bad number? See stats below (perf stat -d
> on ghc with '+RTS -T -s -RTS').
It is very hard to tell; it sounds bad but it is not easy to know why or
whether it is possible to improve. This is one of the reasons why I have
been trying to improve sharing within GHC recently; reducing residency should
improve cache locality.
Nevertheless, the difficulty of interpreting architectural events is why I
generally only use `perf` for differential measurements.
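
By a differential measurement I mean comparing the same counters across
two builds rather than reading one number in isolation. A rough sketch,
assuming the event counts have already been collected into dictionaries
(for instance from two `perf stat` runs); the function name and the
counts below are made up for illustration:

```python
def relative_changes(before, after):
    """Relative change per event, for events present in both runs.

    before/after map event names to raw counts. Events whose baseline
    count is zero are skipped, since the ratio is undefined.
    """
    changes = {}
    for event in sorted(before.keys() & after.keys()):
        if before[event] == 0:
            continue
        changes[event] = (after[event] - before[event]) / before[event]
    return changes


# Hypothetical counts for a baseline build and a patched build.
baseline = {"cache-misses": 200, "instructions": 1000}
patched = {"cache-misses": 150, "instructions": 1000}
for event, change in relative_changes(baseline, patched).items():
    print(f"{event}: {change:+.1%}")
```

The absolute miss rate is hard to judge, but a consistent change in the
same counter between two otherwise identical runs is much easier to
interpret.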