nofib regressions in HEAD since 7.6.2 release

Nicolas Frisby nicolas.frisby at gmail.com
Tue Feb 12 12:00:50 CET 2013


Thanks for such specific requests, and great idea to focus on imaginary!

Outline: SUMMARY, HARDWARE, NUMBERS

== THE SUMMARY ==

  * I still get quite different numbers.
    * For wheel-sieve1 and kahan, we're in the same ballpark.
    * Though my kahan shows less than half as much allocation increase
      (+41.8% versus your +98.6%). Which hardware difference would explain
      this?
    * For bernouilli, exp3_8, and integrate, my nofib-analyse shows percent
      changes in Runtime, whereas yours shows absolutes. I've included my
      absolute tables below; comparing those, we still get appreciable
      differences.

  * I've included my mode=slow numbers.

  * We have some significant hardware differences.
    * My machine claims 32 processors, though it smells like it has 8 chips
with 8 cores each. I'll ask SPJ or Mainland.
    * My cache size is much smaller than yours: 512 KB versus 8 MB.
    * My CPU frequency is 2GHz compared to your 3.4GHz.

  * How do we want to handle hardware diversity like we're seeing in these
regular benchmark runs?
    * Are the different behaviors we're seeing expected for our hardware
differences or bugs of some sort?

Thanks, Johan.

== THE HARDWARE ==

$ cat /proc/cpuinfo
processor : 0 # counts up to 31, with physical id and core id pairs duplicated once
vendor_id : AuthenticAMD
cpu family : 16
model  : 9
model name : AMD Opteron(tm) Processor 6128
stepping : 1
microcode : 0x10000d4
cpu MHz  : 1999.949
cache size : 512 KB
physical id : 0 # counts up to 3 for each core id, twice
siblings : 8
core id  : 0 # counts up to 3, for each physical id, twice
cpu cores : 8
apicid  : 0 # varies
initial apicid : 0 # varies
fpu  : yes
fpu_exception : yes
cpuid level : 5
wp  : yes
flags  : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb
rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid
amd_dcm pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy
abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr npt lbrv
svm_lock nrip_save pausefilter
bogomips : 3999.89
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

The (physical id, core id) pairs exhaust all combinations of {0,1,2,3},
each appearing twice for some reason, as processor counts from 0 to 31. I
would have expected 64 processors, given the siblings and cpu cores values.
Am I tripping on a common misconception?
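
One way to check the topology question is to count the distinct values directly. The following is a minimal sketch, assuming the usual Linux /proc/cpuinfo layout of "key : value" lines with one stanza per logical processor; the function name summarize_cpuinfo and the sample text are made up for illustration.

```python
def summarize_cpuinfo(text):
    """Count logical processors, sockets, and distinct (socket, core) pairs."""
    logical = 0
    sockets = set()
    cores = set()
    physical_id = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between stanzas
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
            physical_id = None  # a new stanza starts here
        elif key == "physical id":
            physical_id = value
            sockets.add(value)
        elif key == "core id":
            cores.add((physical_id, value))
    return {"logical": logical, "sockets": len(sockets), "cores": len(cores)}


# Hypothetical four-stanza sample: every (physical id, core id) pair occurs twice.
SAMPLE = """\
processor : 0
physical id : 0
core id : 0

processor : 1
physical id : 0
core id : 1

processor : 2
physical id : 0
core id : 0

processor : 3
physical id : 0
core id : 1
"""

print(summarize_cpuinfo(SAMPLE))
```

On a real machine the input would be the contents of /proc/cpuinfo. A "logical" count larger than "cores" means several logical processors report the same (physical id, core id) pair; the sketch does not say which hardware layout causes that.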

I included the rest of the info because I still get different numbers than
you do.

$ uname -a
Linux cam-05-unx 3.2.0-35-generic #55-Ubuntu SMP Wed Dec 5 17:42:16 UTC
2012 x86_64 x86_64 x86_64 GNU/Linux

$ VERSION=7.0.4; HC=/home/t-nicof/installs/ghc-${VERSION}/bin/ghc-${VERSION}; $HC --info
 [("Project name","The Glorious Glasgow Haskell Compilation System")
 ,("Project version","7.0.4")
 ,("Booter version","6.12.1")
 ,("Stage","2")
 ,("Build platform","x86_64-unknown-linux")
 ,("Host platform","x86_64-unknown-linux")
 ,("Target platform","x86_64-unknown-linux")
 ,("Have interpreter","YES")
 ,("Object splitting","YES")
 ,("Have native code generator","YES")
 ,("Have llvm code generator","YES")
 ,("Support SMP","YES")
 ,("Unregisterised","NO")
 ,("Tables next to code","YES")
 ,("RTS ways","l debug  thr thr_debug thr_l thr_p  dyn debug_dyn thr_dyn
thr_debug_dyn")
 ,("Leading underscore","NO")
 ,("Debug on","False")
 ,("LibDir","/home/t-nicof/installs/ghc-7.0.4/lib/ghc-7.0.4")
 ,("Global Package
DB","/home/t-nicof/installs/ghc-7.0.4/lib/ghc-7.0.4/package.conf.d")
 ,("C compiler flags","[\"-fno-stack-protector\"]")
 ,("Gcc Linker flags","[]")
 ,("Ld Linker flags","[]")
 ]

$ VERSION=7.6.2; HC=/home/t-nicof/installs/ghc-${VERSION}/bin/ghc-${VERSION}; $HC --info
 [("Project name","The Glorious Glasgow Haskell Compilation System")
 ,("GCC extra via C opts"," -fwrapv")
 ,("C compiler command","/usr/bin/gcc")
 ,("C compiler flags"," -fno-stack-protector ")
 ,("ar command","/usr/bin/ar")
 ,("ar flags","q")
 ,("ar supports at file","@ArSupportsAtFile@")
 ,("touch command","touch")
 ,("dllwrap command","/bin/false")
 ,("windres command","/bin/false")
 ,("perl command","/usr/bin/perl")
 ,("target os","OSLinux")
 ,("target arch","ArchX86_64")
 ,("target word size","8")
 ,("target has GNU nonexec stack","True")
 ,("target has .ident directive","True")
 ,("target has subsections via symbols","False")
 ,("LLVM llc command","llc")
 ,("LLVM opt command","opt")
 ,("Project version","7.6.2")
 ,("Booter version","7.4.1")
 ,("Stage","2")
 ,("Build platform","x86_64-unknown-linux")
 ,("Host platform","x86_64-unknown-linux")
 ,("Target platform","x86_64-unknown-linux")
 ,("Have interpreter","YES")
 ,("Object splitting supported","YES")
 ,("Have native code generator","YES")
 ,("Support SMP","YES")
 ,("Unregisterised","NO")
 ,("Tables next to code","YES")
 ,("RTS ways","l debug  thr thr_debug thr_l thr_p dyn debug_dyn thr_dyn
thr_debug_dyn")
 ,("Leading underscore","NO")
 ,("Debug on","False")
 ,("LibDir","/home/t-nicof/installs/ghc-7.6/lib/ghc-7.6.2")
 ,("Global Package
DB","/home/t-nicof/installs/ghc-7.6/lib/ghc-7.6.2/package.conf.d")
 ,("Gcc Linker
flags","[\"-Wl,--hash-size=31\",\"-Wl,--reduce-memory-overheads\"]")
 ,("Ld Linker flags","[\"--hash-size=31\",\"--reduce-memory-overheads\"]")
 ]

== THE NUMBERS ==

I ran the following once with VERSION=7.0.4 and once with VERSION=7.6.2.
(The only difference from your commands is that I'm not relying on $PATH.)

$ HC=/home/t-nicof/installs/ghc-${VERSION}/bin/ghc-${VERSION}; (make clean && make boot WithNofibHc=${HC} && make WithNofibHc=${HC}) >& log-${VERSION}
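
To make the two runs easy to repeat, the command line can be generated per version. This is only a sketch: the helper name nofib_command is hypothetical, and the default install root is simply the path used in this message.

```python
def nofib_command(version, install_root="/home/t-nicof/installs"):
    """Build the shell command used above for one GHC version."""
    hc = f"{install_root}/ghc-{version}/bin/ghc-{version}"
    return (
        f"(make clean && make boot WithNofibHc={hc} "
        f"&& make WithNofibHc={hc}) >& log-{version}"
    )


# Print one command per compiler under comparison.
for version in ("7.0.4", "7.6.2"):
    print(nofib_command(version))
```

Each printed line is meant to be run from the nofib directory being measured, producing log-7.0.4 and log-7.6.2.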

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
     bernouilli          +3.3%     +0.2%     +7.2%     +7.4%     +0.0%
         exp3_8          +1.1%    +53.7%    +55.4%    +57.5%   +300.0%
    gen_regexps         +18.6%     +3.9%      0.00      0.00     +0.0%
      integrate          -0.1%    +39.0%   +110.2%    +88.5%     +0.0%
          kahan          +1.7%    +41.8%     +8.2%     +8.0%     +0.0%
      paraffins          +1.3%     -1.2%     -3.6%     -0.8%     +0.0%
         primes          +1.4%    +64.7%      0.11      0.11    +50.0%
         queens          +0.8%     -0.2%      0.02      0.02     +0.0%
           rfib          +1.7%    +42.8%      0.03      0.03     +0.0%
            tak          +0.9%    +12.0%      0.02      0.02     +0.0%
   wheel-sieve1          +1.4%    +66.6%     -4.0%     -4.3%    -17.6%
   wheel-sieve2          +1.4%     +0.0%     -0.2%     -2.1%     +0.0%
           x2n1         +10.3%    +41.7%      0.01      0.01   +200.0%
--------------------------------------------------------------------------------
            Min          -0.1%     -1.2%     -4.0%     -4.3%    -17.6%
            Max         +18.6%    +66.6%   +110.2%    +88.5%   +300.0%
 Geometric Mean          +3.3%    +25.6%    +19.6%    +18.1%    +23.0%

I ran it twice; here is the second run.

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
     bernouilli          +3.3%     +0.2%     +7.0%     +7.1%     +0.0%
         exp3_8          +1.1%    +53.7%    +56.6%    +57.8%   +300.0%
    gen_regexps         +18.6%     +3.9%      0.00      0.00     +0.0%
      integrate          -0.1%    +39.0%   +102.1%    +86.2%     +0.0%
          kahan          +1.7%    +41.8%     +9.5%     +8.9%     +0.0%
      paraffins          +1.3%     -1.2%     -0.6%     -4.8%     +0.0%
         primes          +1.4%    +64.7%      0.11      0.11    +50.0%
         queens          +0.8%     -0.2%      0.02      0.02     +0.0%
           rfib          +1.7%    +42.8%      0.03      0.03     +0.0%
            tak          +0.9%    +12.0%      0.02      0.02     +0.0%
   wheel-sieve1          +1.4%    +66.6%     -4.4%     -4.3%    -17.6%
   wheel-sieve2          +1.4%     +0.0%     -1.1%     -2.8%     +0.0%
           x2n1         +10.3%    +41.7%      0.01      0.01   +200.0%
--------------------------------------------------------------------------------
            Min          -0.1%     -1.2%     -4.4%     -4.8%    -17.6%
            Max         +18.6%    +66.6%   +102.1%    +86.2%   +300.0%
 Geometric Mean          +3.3%    +25.6%    +19.5%    +17.2%    +23.0%
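
For reference, the Geometric Mean row can be approximated from the printed percentages. The sketch below assumes each percentage is converted to a ratio (1 + p/100) before averaging; the function name geometric_mean_change is illustrative, and the inputs are the rounded Allocs values from the tables above, so the result only approximates what nofib-analyse computes from the raw logs.

```python
import math


def geometric_mean_change(percent_changes):
    """Geometric mean of percent changes, returned as a percent change."""
    ratios = [1 + p / 100 for p in percent_changes]
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return (math.exp(log_mean) - 1) * 100


# Allocs column, 7.0.4 -> 7.6.2, in table order (bernouilli ... x2n1).
allocs = [0.2, 53.7, 3.9, 39.0, 41.8, -1.2, 64.7, -0.2, 42.8, 12.0, 66.6, 0.0, 41.7]

print(f"geometric:  {geometric_mean_change(allocs):+.1f}%")
print(f"arithmetic: {sum(allocs) / len(allocs):+.1f}%")
```

The geometric figure lands close to the +25.6% in the table, while the plain arithmetic mean of the same column is noticeably higher, which is why the two should not be mixed when comparing summaries.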

Maybe your machine is too fast for nofib-analyse to report a percentage for
exp3_8.
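
The mix of percentages and absolute times in the Runtime column looks like a cutoff on the baseline time. The sketch below assumes a cutoff of 0.2 seconds, which fits every row in this message but is not taken from the nofib-analyse source; comparing the baseline (rather than the new time) against the cutoff is also an assumption, and format_change is a made-up name.

```python
def format_change(base, new, threshold=0.2):
    """Format a Runtime cell: absolute seconds for quick runs, else percent.

    threshold is an assumed cutoff in seconds, not a documented value.
    """
    if base < threshold:
        # Too quick to give a meaningful percentage: show the new time.
        return f"{new:.2f}"
    return f"{(new / base - 1) * 100:+.1f}%"


print(format_change(0.10, 0.11))  # baseline below the assumed cutoff
print(format_change(2.0, 3.0))    # baseline above the assumed cutoff
```

Under that assumption, a faster machine pushes more baselines below the cutoff, so the same program can appear as a percentage on one machine and as an absolute time on another.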

Allocations
-------------------------------------------------------------------------------
        Program            log-7.0.4       log-7.6.2
-------------------------------------------------------------------------------
     bernouilli            303890616           +0.2%
         exp3_8            389023528          +53.7%
    gen_regexps               304768           +3.9%
      integrate            546206856          +39.0%
          kahan            700842656          +41.8%
      paraffins             56201680           -1.2%
         primes             65899520          +64.7%
         queens             17387888           -0.2%
           rfib                81176          +42.8%
            tak                94408          +12.0%
   wheel-sieve1             14620056          +66.6%
   wheel-sieve2             88734064           +0.0%
           x2n1              2491928          +41.7%
        -1 s.d.                -----           +3.0%
        +1 s.d.                -----          +53.2%
        Average                -----          +25.6%

Run Time
-------------------------------------------------------------------------------
        Program            log-7.0.4       log-7.6.2
-------------------------------------------------------------------------------
     bernouilli                 0.28           +7.2%
         exp3_8                 0.21          +55.4%
    gen_regexps                 0.00            0.00
      integrate                 0.34         +110.2%
          kahan                 1.07           +8.2%
      paraffins                 0.22           -3.6%
         primes                 0.10            0.11
         queens                 0.02            0.02
           rfib                 0.03            0.03
            tak                 0.01            0.02
   wheel-sieve1                 0.68           -4.0%
   wheel-sieve2                 0.37           -0.2%
           x2n1                 0.00            0.01
        -1 s.d.                -----           -9.3%
        +1 s.d.                -----          +57.7%
        Average                -----          +19.6%

And here are the results using the "mode=slow" Nofib option. Only
bernouilli and gen_regexps do not have SLOW_OPTS defined in their Makefile.
It's odd that gen_regexps shows such a drastic change then...

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
     bernouilli          +3.3%     +0.2%     +6.7%     +7.4%     +0.0%
         exp3_8          +1.1%    +68.2%    +20.2%    +20.3%   +100.0%
    gen_regexps         +18.6%     -1.2%     -6.3%     -6.0%     +0.0%
      integrate          -0.1%    +39.0%   +114.3%   +104.9%     +3.9%
          kahan          +1.7%    +41.9%     +7.5%     +7.5%     +0.0%
      paraffins          +1.3%     -1.2%     -3.3%     -2.7%     +0.3%
         primes          +1.4%    +57.9%     +0.8%     +1.0%     +0.0%
         queens          +0.8%     -0.2%     -1.7%     -2.0%     +0.0%
           rfib          +1.7%    +42.8%     +6.3%     +6.0%     +0.0%
            tak          +0.9%    +12.0%     -2.4%     -2.5%     +0.0%
   wheel-sieve1          +1.4%    +99.2%     -3.4%     -3.4%    +58.8%
   wheel-sieve2          +1.4%     -0.1%     -3.6%     -3.8%     +0.0%
           x2n1         +10.3%    +43.1%      0.13      0.13  +1300.0%
--------------------------------------------------------------------------------
            Min          -0.1%     -1.2%     -6.3%     -6.0%     +0.0%
            Max         +18.6%    +99.2%   +114.3%   +104.9%  +1300.0%
 Geometric Mean          +3.3%    +27.4%     +8.2%     +7.8%    +34.3%

Allocations
-------------------------------------------------------------------------------
        Program       log-slow-7.0.4  log-slow-7.6.2
-------------------------------------------------------------------------------
     bernouilli            303890616           +0.2%
         exp3_8           3500234960          +68.2%
    gen_regexps            780759064           -1.2%
      integrate           1092338624          +39.0%
          kahan           2797648296          +41.9%
      paraffins            363166288           -1.2%
         primes            861820872          +57.9%
         queens            569243336           -0.2%
           rfib                81488          +42.8%
            tak                94408          +12.0%
   wheel-sieve1             24134568          +99.2%
   wheel-sieve2            160800936           -0.1%
           x2n1             19375720          +43.1%
        -1 s.d.                -----           +1.2%
        +1 s.d.                -----          +60.4%
        Average                -----          +27.4%

Run Time
-------------------------------------------------------------------------------
        Program       log-slow-7.0.4  log-slow-7.6.2
-------------------------------------------------------------------------------
     bernouilli                 0.28           +6.7%
         exp3_8                 3.39          +20.2%
    gen_regexps                 1.90           -6.3%
      integrate                 0.68         +114.3%
          kahan                 4.35           +7.5%
      paraffins                 1.93           -3.3%
         primes                 1.51           +0.8%
         queens                 0.72           -1.7%
           rfib                 0.31           +6.3%
            tak                 1.58           -2.4%
   wheel-sieve1                 2.28           -3.4%
   wheel-sieve2                 0.77           -3.6%
           x2n1                 0.04            0.13
        -1 s.d.                -----          -12.9%
        +1 s.d.                -----          +34.3%
        Average                -----           +8.2%
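
Since allocation should normally depend on the compiled code rather than on the CPU, the Allocs columns are a reasonable place to look for disagreements between the two machines. The sketch below compares my default-mode Allocs percentages with the ones in Johan's quoted table; the function name disagreements and the 0.5 percentage-point tolerance are arbitrary choices for illustration.

```python
# Allocs change 7.0.4 -> 7.6.2 in percent, copied from the two tables in this thread.
mine = {
    "bernouilli": 0.2, "exp3_8": 53.7, "gen_regexps": 3.9, "integrate": 39.0,
    "kahan": 41.8, "paraffins": -1.2, "primes": 64.7, "queens": -0.2,
    "rfib": 42.8, "tak": 12.0, "wheel-sieve1": 66.6, "wheel-sieve2": 0.0,
    "x2n1": 41.7,
}
johan = {
    "bernouilli": 0.2, "exp3_8": 53.7, "gen_regexps": 3.9, "integrate": 39.0,
    "kahan": 98.6, "paraffins": -1.2, "primes": 64.7, "queens": -0.5,
    "rfib": 42.8, "tak": 12.0, "wheel-sieve1": 66.6, "wheel-sieve2": 0.0,
    "x2n1": 87.3,
}


def disagreements(a, b, tolerance=0.5):
    """Programs whose percent changes differ by more than tolerance points."""
    return [name for name in a if abs(a[name] - b[name]) > tolerance]


for name in disagreements(mine, johan):
    print(f"{name}: {mine[name]:+.1f}% here, {johan[name]:+.1f}% on Johan's machine")
```

With these inputs only kahan and x2n1 are flagged; queens differs by 0.3 points, which falls under the chosen tolerance.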


On Tue, Feb 12, 2013 at 3:17 AM, Johan Tibell <johan.tibell at gmail.com> wrote:

> Hi Nicolas!
>
> I tried to reproduce the difference between 7.0.4 and 7.6.2 on the exp3_8,
> wheel-sieve1, and primes and couldn't get the same percent difference as
> you. We need to reconcile these differences somehow. Let's start with more
> exact machine specs. I have a:
>
> $ cat /proc/cpuinfo
> processor : 7
> vendor_id : GenuineIntel
> cpu family : 6
> model : 58
> model name : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
> stepping : 9
> microcode : 0x12
> cpu MHz : 1600.000
> cache size : 8192 KB
> ...
>
> $ uname -a
> Linux johantibell.com 3.2.0-29-generic #46-Ubuntu SMP Fri Jul 27 17:03:23
> UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> And GHC versions:
>
> $ ghc-7.0.4 --info
>  [("Project name","The Glorious Glasgow Haskell Compilation System")
>  ,("Project version","7.0.4")
>  ,("Booter version","6.12.1")
>  ,("Stage","2")
>  ,("Build platform","x86_64-unknown-linux")
>  ,("Host platform","x86_64-unknown-linux")
>  ,("Target platform","x86_64-unknown-linux")
>  ,("Have interpreter","YES")
>  ,("Object splitting","YES")
>  ,("Have native code generator","YES")
>  ,("Have llvm code generator","YES")
>  ,("Support SMP","YES")
>  ,("Unregisterised","NO")
>  ,("Tables next to code","YES")
>  ,("RTS ways","l debug  thr thr_debug thr_l thr_p  dyn debug_dyn thr_dyn
> thr_debug_dyn")
>  ,("Leading underscore","NO")
>  ,("Debug on","False")
>  ,("LibDir","/usr/local/lib/ghc-7.0.4")
>  ,("Global Package DB","/usr/local/lib/ghc-7.0.4/package.conf.d")
>  ,("C compiler flags","[\"-fno-stack-protector\"]")
>  ,("Gcc Linker flags","[]")
>  ,("Ld Linker flags","[]")
>  ]
>
> $ ghc-7.6.2 --info
>  [("Project name","The Glorious Glasgow Haskell Compilation System")
>  ,("GCC extra via C opts"," -fwrapv")
>  ,("C compiler command","/usr/bin/gcc")
>  ,("C compiler flags"," -fno-stack-protector ")
>  ,("ar command","/usr/bin/ar")
>  ,("ar flags","q")
>  ,("ar supports at file","@ArSupportsAtFile@")
>  ,("touch command","touch")
>  ,("dllwrap command","/bin/false")
>  ,("windres command","/bin/false")
>  ,("perl command","/usr/bin/perl")
>  ,("target os","OSLinux")
>  ,("target arch","ArchX86_64")
>  ,("target word size","8")
>  ,("target has GNU nonexec stack","True")
>  ,("target has .ident directive","True")
>  ,("target has subsections via symbols","False")
>  ,("LLVM llc command","llc")
>  ,("LLVM opt command","opt")
>  ,("Project version","7.6.2")
>  ,("Booter version","7.4.1")
>  ,("Stage","2")
>  ,("Build platform","x86_64-unknown-linux")
>  ,("Host platform","x86_64-unknown-linux")
>  ,("Target platform","x86_64-unknown-linux")
>  ,("Have interpreter","YES")
>  ,("Object splitting supported","YES")
>  ,("Have native code generator","YES")
>  ,("Support SMP","YES")
>  ,("Unregisterised","NO")
>  ,("Tables next to code","YES")
>  ,("RTS ways","l debug  thr thr_debug thr_l thr_p dyn debug_dyn thr_dyn
> thr_debug_dyn")
>  ,("Leading underscore","NO")
>  ,("Debug on","False")
>  ,("LibDir","/usr/local/lib/ghc-7.6.2")
>  ,("Global Package DB","/usr/local/lib/ghc-7.6.2/package.conf.d")
>  ,("Gcc Linker
> flags","[\"-Wl,--hash-size=31\",\"-Wl,--reduce-memory-overheads\"]")
>  ,("Ld Linker flags","[\"--hash-size=31\",\"--reduce-memory-overheads\"]")
>  ]
>
> I ran the benchmarks by running e.g.:
>
> $ cd nofib/imaginary/wheel-sieve1
> $ make clean && make boot WithNofibHc=ghc-${VERSION} && make WithNofibHc=ghc-${VERSION}
>
> Could you please try to run the "imaginary" benchmarks using exactly these
> commands and report the difference you see between 7.0.4 and 7.6.2. Here's
> what I see. 7.0.4 vs 7.6.2:
>
> --------------------------------------------------------------------------------
>         Program           Size    Allocs   Runtime   Elapsed  TotalMem
> --------------------------------------------------------------------------------
>      bernouilli          +3.3%     +0.2%      0.12      0.13     +0.0%
>          exp3_8          +1.1%    +53.7%      0.14      0.14   +300.0%
>     gen_regexps         +18.7%     +3.9%      0.00      0.00     +0.0%
>       integrate          -0.1%    +39.0%      0.21      0.23     +0.0%
>           kahan          +1.7%    +98.6%     +9.9%     +7.3%     +0.0%
>       paraffins          +1.3%     -1.2%      0.06      0.08     +0.0%
>          primes          +1.4%    +64.7%      0.04      0.05    +50.0%
>          queens          +0.8%     -0.5%      0.02      0.02     +0.0%
>            rfib          +1.7%    +42.8%      0.02      0.02     +0.0%
>             tak          +0.9%    +12.0%      0.01      0.01     +0.0%
>    wheel-sieve1          +0.8%    +66.6%     -4.6%     -5.8%    -12.5%
>    wheel-sieve2          +0.9%     +0.0%      0.12      0.13     +0.0%
>            x2n1         +10.3%    +87.3%      0.00      0.01   +200.0%
> --------------------------------------------------------------------------------
>             Min          -0.1%     -1.2%     -4.6%     -5.8%    -12.5%
>             Max         +18.7%    +98.6%     +9.9%     +7.3%   +300.0%
>  Geometric Mean          +3.2%    +31.7%     +2.4%     +0.5%    +23.6%
>
> -- Johan