nofib regressions in HEAD since 7.6.2 release
Nicolas Frisby
nicolas.frisby at gmail.com
Tue Feb 12 12:00:50 CET 2013
Thanks for such specific requests, and great idea to focus on imaginary!
Outline: SUMMARY, HARDWARE, NUMBERS
== THE SUMMARY ==
* I still get quite different numbers.
* For wheel-sieve1 and kahan, we're in the same ballpark.
* My kahan, though, shows about half the allocation increase yours does
(+41.8% versus +98.6%). Which hardware difference would explain this?
* For bernouilli, exp3_8, and integrate, my nofib-analyse shows percent
changes in Runtime, whereas yours shows absolute times. I've included my
absolute tables below; comparing those, we still get appreciable
differences.
* I've included my mode=slow numbers.
* We have some significant hardware differences.
* My machine claims 32 processors, though it smells like it has 8 chips
with 8 cores each. I'll ask SPJ or Mainland.
* My cache size is much smaller than yours: 512 KB versus 8 MB.
* My CPU frequency is 2GHz compared to your 3.4GHz.
* How do we want to handle hardware diversity like we're seeing in these
regular benchmark runs?
* Are the different behaviors we're seeing expected for our hardware
differences or bugs of some sort?
Thanks, Johan.
== THE HARDWARE ==
$ cat /proc/cpuinfo
processor : 0 # counts up to 31, with physical id and core id pairs duplicated once
vendor_id : AuthenticAMD
cpu family : 16
model : 9
model name : AMD Opteron(tm) Processor 6128
stepping : 1
microcode : 0x10000d4
cpu MHz : 1999.949
cache size : 512 KB
physical id : 0 # counts up to 3 for each core id, twice
siblings : 8
core id : 0 # counts up to 3, for each physical id, twice
cpu cores : 8
apicid : 0 # varies
initial apicid : 0 # varies
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb
rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid
amd_dcm pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy
abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr npt lbrv
svm_lock nrip_save pausefilter
bogomips : 3999.89
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
The (physical id, core id) pairs exhaust the combinations of {0,1,2,3} ×
{0,1,2,3}, each pair appearing twice for some reason, as processor counts
from 0 to 31. Given the siblings and cpu cores values, I would have expected
64 processors. Am I tripping on a common misconception?
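A quick way to check that arithmetic is to tally the fields directly. The Python sketch below is illustrative only: summarize_cpuinfo is a hypothetical helper, and it assumes the usual Linux /proc/cpuinfo layout (one blank-line-separated block per logical processor, "key : value" lines) and that siblings gives the number of logical processors in each physical package.

```python
def summarize_cpuinfo(text: str) -> dict:
    """Tally logical CPUs, physical packages and (physical id, core id) pairs.

    Hypothetical helper: assumes one blank-line-separated block per logical
    processor, as in Linux's /proc/cpuinfo.
    """
    logical = 0
    packages = set()
    core_pairs = set()
    siblings_per_package = {}
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue  # skip anything that is not a per-processor block
        logical += 1
        pkg = fields.get("physical id")
        packages.add(pkg)
        core_pairs.add((pkg, fields.get("core id")))
        # Assumption: "siblings" is the logical-CPU count of this package.
        siblings_per_package[pkg] = int(fields.get("siblings", "1"))
    return {
        "logical_cpus": logical,
        "packages": len(packages),
        "distinct_core_pairs": len(core_pairs),
        "packages_times_siblings": sum(siblings_per_package.values()),
    }
```

Fed the contents of /proc/cpuinfo for the machine described above, this should, going by that description, report 32 logical CPUs, 4 packages and 4 × 8 = 32 for packages times siblings, but only 16 distinct (physical id, core id) pairs, since each pair appears twice.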
I included the rest of the info because I still get different numbers than
you do.
$ uname -a
Linux cam-05-unx 3.2.0-35-generic #55-Ubuntu SMP Wed Dec 5 17:42:16 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
$ VERSION=7.0.4; HC=/home/t-nicof/installs/ghc-${VERSION}/bin/ghc-${VERSION}; $HC --info
[("Project name","The Glorious Glasgow Haskell Compilation System")
,("Project version","7.0.4")
,("Booter version","6.12.1")
,("Stage","2")
,("Build platform","x86_64-unknown-linux")
,("Host platform","x86_64-unknown-linux")
,("Target platform","x86_64-unknown-linux")
,("Have interpreter","YES")
,("Object splitting","YES")
,("Have native code generator","YES")
,("Have llvm code generator","YES")
,("Support SMP","YES")
,("Unregisterised","NO")
,("Tables next to code","YES")
,("RTS ways","l debug thr thr_debug thr_l thr_p dyn debug_dyn thr_dyn thr_debug_dyn")
,("Leading underscore","NO")
,("Debug on","False")
,("LibDir","/home/t-nicof/installs/ghc-7.0.4/lib/ghc-7.0.4")
,("Global Package DB","/home/t-nicof/installs/ghc-7.0.4/lib/ghc-7.0.4/package.conf.d")
,("C compiler flags","[\"-fno-stack-protector\"]")
,("Gcc Linker flags","[]")
,("Ld Linker flags","[]")
]
$ VERSION=7.6.2; HC=/home/t-nicof/installs/ghc-${VERSION}/bin/ghc-${VERSION}; $HC --info
[("Project name","The Glorious Glasgow Haskell Compilation System")
,("GCC extra via C opts"," -fwrapv")
,("C compiler command","/usr/bin/gcc")
,("C compiler flags"," -fno-stack-protector ")
,("ar command","/usr/bin/ar")
,("ar flags","q")
,("ar supports at file","@ArSupportsAtFile@")
,("touch command","touch")
,("dllwrap command","/bin/false")
,("windres command","/bin/false")
,("perl command","/usr/bin/perl")
,("target os","OSLinux")
,("target arch","ArchX86_64")
,("target word size","8")
,("target has GNU nonexec stack","True")
,("target has .ident directive","True")
,("target has subsections via symbols","False")
,("LLVM llc command","llc")
,("LLVM opt command","opt")
,("Project version","7.6.2")
,("Booter version","7.4.1")
,("Stage","2")
,("Build platform","x86_64-unknown-linux")
,("Host platform","x86_64-unknown-linux")
,("Target platform","x86_64-unknown-linux")
,("Have interpreter","YES")
,("Object splitting supported","YES")
,("Have native code generator","YES")
,("Support SMP","YES")
,("Unregisterised","NO")
,("Tables next to code","YES")
,("RTS ways","l debug thr thr_debug thr_l thr_p dyn debug_dyn thr_dyn thr_debug_dyn")
,("Leading underscore","NO")
,("Debug on","False")
,("LibDir","/home/t-nicof/installs/ghc-7.6/lib/ghc-7.6.2")
,("Global Package DB","/home/t-nicof/installs/ghc-7.6/lib/ghc-7.6.2/package.conf.d")
,("Gcc Linker flags","[\"-Wl,--hash-size=31\",\"-Wl,--reduce-memory-overheads\"]")
,("Ld Linker flags","[\"--hash-size=31\",\"--reduce-memory-overheads\"]")
]
== THE NUMBERS ==
I ran the following with VERSION=7.0.4 and with VERSION=7.6.2. The only
difference from your commands is that I don't rely on $PATH.
$ HC=/home/t-nicof/installs/ghc-${VERSION}/bin/ghc-${VERSION}; (make clean && make boot WithNofibHc=${HC} && make WithNofibHc=${HC}) >& log-${VERSION}
--------------------------------------------------------------------------------
Program Size Allocs Runtime Elapsed TotalMem
--------------------------------------------------------------------------------
bernouilli +3.3% +0.2% +7.2% +7.4% +0.0%
exp3_8 +1.1% +53.7% +55.4% +57.5% +300.0%
gen_regexps +18.6% +3.9% 0.00 0.00 +0.0%
integrate -0.1% +39.0% +110.2% +88.5% +0.0%
kahan +1.7% +41.8% +8.2% +8.0% +0.0%
paraffins +1.3% -1.2% -3.6% -0.8% +0.0%
primes +1.4% +64.7% 0.11 0.11 +50.0%
queens +0.8% -0.2% 0.02 0.02 +0.0%
rfib +1.7% +42.8% 0.03 0.03 +0.0%
tak +0.9% +12.0% 0.02 0.02 +0.0%
wheel-sieve1 +1.4% +66.6% -4.0% -4.3% -17.6%
wheel-sieve2 +1.4% +0.0% -0.2% -2.1% +0.0%
x2n1 +10.3% +41.7% 0.01 0.01 +200.0%
--------------------------------------------------------------------------------
Min -0.1% -1.2% -4.0% -4.3% -17.6%
Max +18.6% +66.6% +110.2% +88.5% +300.0%
Geometric Mean +3.3% +25.6% +19.6% +18.1% +23.0%
I ran it twice; here is the second run.
--------------------------------------------------------------------------------
Program Size Allocs Runtime Elapsed TotalMem
--------------------------------------------------------------------------------
bernouilli +3.3% +0.2% +7.0% +7.1% +0.0%
exp3_8 +1.1% +53.7% +56.6% +57.8% +300.0%
gen_regexps +18.6% +3.9% 0.00 0.00 +0.0%
integrate -0.1% +39.0% +102.1% +86.2% +0.0%
kahan +1.7% +41.8% +9.5% +8.9% +0.0%
paraffins +1.3% -1.2% -0.6% -4.8% +0.0%
primes +1.4% +64.7% 0.11 0.11 +50.0%
queens +0.8% -0.2% 0.02 0.02 +0.0%
rfib +1.7% +42.8% 0.03 0.03 +0.0%
tak +0.9% +12.0% 0.02 0.02 +0.0%
wheel-sieve1 +1.4% +66.6% -4.4% -4.3% -17.6%
wheel-sieve2 +1.4% +0.0% -1.1% -2.8% +0.0%
x2n1 +10.3% +41.7% 0.01 0.01 +200.0%
--------------------------------------------------------------------------------
Min -0.1% -1.2% -4.4% -4.8% -17.6%
Max +18.6% +66.6% +102.1% +86.2% +300.0%
Geometric Mean +3.3% +25.6% +19.5% +17.2% +23.0%
Maybe your machine is too fast for nofib-analyse to report a Runtime
percentage for exp3_8; your table only gives its absolute time.
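nofib-analyse appears to fall back to raw seconds whenever a run is too short to time reliably: in the tables here, runs of 0.21 s and above get a percentage while runs of 0.10 s and below do not. The Python sketch below models that rule. format_runtime_change is a hypothetical helper, and THRESHOLD_SECONDS = 0.2 is an assumed cut-off inferred from these tables, not read from nofib-analyse's source.

```python
# Assumed cut-off, inferred from the tables in this thread (0.21 s gets a
# percentage, 0.10 s does not); not taken from nofib-analyse itself.
THRESHOLD_SECONDS = 0.2


def format_runtime_change(baseline: float, new: float) -> str:
    """Hypothetical helper: percent change when the baseline run is long
    enough to trust, otherwise the raw new time in seconds (as in the
    summary table's Runtime column)."""
    if baseline < THRESHOLD_SECONDS:
        return f"{new:.2f}"
    change = (new - baseline) / baseline * 100.0
    return f"{change:+.1f}%"
```

Under this rule primes (0.10 s with 7.0.4) is printed as seconds, while exp3_8 (0.21 s) gets a percentage; since exp3_8 runs in 0.14 s on your machine, it would be reported in seconds there.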
Allocations
-------------------------------------------------------------------------------
Program log-7.0.4 log-7.6.2
-------------------------------------------------------------------------------
bernouilli 303890616 +0.2%
exp3_8 389023528 +53.7%
gen_regexps 304768 +3.9%
integrate 546206856 +39.0%
kahan 700842656 +41.8%
paraffins 56201680 -1.2%
primes 65899520 +64.7%
queens 17387888 -0.2%
rfib 81176 +42.8%
tak 94408 +12.0%
wheel-sieve1 14620056 +66.6%
wheel-sieve2 88734064 +0.0%
x2n1 2491928 +41.7%
-1 s.d. ----- +3.0%
+1 s.d. ----- +53.2%
Average ----- +25.6%
Run Time
-------------------------------------------------------------------------------
Program log-7.0.4 log-7.6.2
-------------------------------------------------------------------------------
bernouilli 0.28 +7.2%
exp3_8 0.21 +55.4%
gen_regexps 0.00 0.00
integrate 0.34 +110.2%
kahan 1.07 +8.2%
paraffins 0.22 -3.6%
primes 0.10 0.11
queens 0.02 0.02
rfib 0.03 0.03
tak 0.01 0.02
wheel-sieve1 0.68 -4.0%
wheel-sieve2 0.37 -0.2%
x2n1 0.00 0.01
-1 s.d. ----- -9.3%
+1 s.d. ----- +57.7%
Average ----- +19.6%
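The summary rows can be recomputed from the per-program percentages. The Python sketch below assumes nofib-analyse takes the geometric mean of the new/old ratios and a population standard deviation of their logarithms; that is our assumption, not something read from nofib-analyse's source. With the 13 Allocations changes above it gives +3.0%, +25.6% and +53.2%, and with the seven Run Time changes that got percentages it gives -9.3%, +19.6% and +57.7%, matching both tables. That also suggests programs shown only in seconds are left out of the Run Time summary.

```python
import math


def summarize_changes(percent_changes):
    """Return (-1 s.d., geometric mean, +1 s.d.) as percent changes.

    Assumption: mean and standard deviation are taken over the logarithms of
    the new/old ratios, using the population s.d. (divide by n).
    """
    logs = [math.log(1 + p / 100) for p in percent_changes]
    mean = sum(logs) / len(logs)
    sd = math.sqrt(sum((x - mean) ** 2 for x in logs) / len(logs))

    def to_percent(x):
        return (math.exp(x) - 1) * 100

    return to_percent(mean - sd), to_percent(mean), to_percent(mean + sd)


# Per-program Allocs changes from the 7.0.4 vs 7.6.2 Allocations table.
allocs = [0.2, 53.7, 3.9, 39.0, 41.8, -1.2, 64.7, -0.2, 42.8, 12.0, 66.6, 0.0, 41.7]
# Run Time changes, only for the programs that were given a percentage.
runtime = [7.2, 55.4, 110.2, 8.2, -3.6, -4.0, -0.2]

for name, changes in [("Allocations", allocs), ("Run Time", runtime)]:
    lo, gm, hi = summarize_changes(changes)
    print(f"{name}: -1 s.d. {lo:+.1f}%  Average {gm:+.1f}%  +1 s.d. {hi:+.1f}%")
```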
And here are the results using nofib's "mode=slow" option. Only
bernouilli and gen_regexps lack a SLOW_OPTS definition in their Makefiles.
It's odd, then, that gen_regexps changes so drastically between the two modes...
--------------------------------------------------------------------------------
Program Size Allocs Runtime Elapsed TotalMem
--------------------------------------------------------------------------------
bernouilli +3.3% +0.2% +6.7% +7.4% +0.0%
exp3_8 +1.1% +68.2% +20.2% +20.3% +100.0%
gen_regexps +18.6% -1.2% -6.3% -6.0% +0.0%
integrate -0.1% +39.0% +114.3% +104.9% +3.9%
kahan +1.7% +41.9% +7.5% +7.5% +0.0%
paraffins +1.3% -1.2% -3.3% -2.7% +0.3%
primes +1.4% +57.9% +0.8% +1.0% +0.0%
queens +0.8% -0.2% -1.7% -2.0% +0.0%
rfib +1.7% +42.8% +6.3% +6.0% +0.0%
tak +0.9% +12.0% -2.4% -2.5% +0.0%
wheel-sieve1 +1.4% +99.2% -3.4% -3.4% +58.8%
wheel-sieve2 +1.4% -0.1% -3.6% -3.8% +0.0%
x2n1 +10.3% +43.1% 0.13 0.13 +1300.0%
--------------------------------------------------------------------------------
Min -0.1% -1.2% -6.3% -6.0% +0.0%
Max +18.6% +99.2% +114.3% +104.9% +1300.0%
Geometric Mean +3.3% +27.4% +8.2% +7.8% +34.3%
Allocations
-------------------------------------------------------------------------------
Program log-slow-7.0.4 log-slow-7.6.2
-------------------------------------------------------------------------------
bernouilli 303890616 +0.2%
exp3_8 3500234960 +68.2%
gen_regexps 780759064 -1.2%
integrate 1092338624 +39.0%
kahan 2797648296 +41.9%
paraffins 363166288 -1.2%
primes 861820872 +57.9%
queens 569243336 -0.2%
rfib 81488 +42.8%
tak 94408 +12.0%
wheel-sieve1 24134568 +99.2%
wheel-sieve2 160800936 -0.1%
x2n1 19375720 +43.1%
-1 s.d. ----- +1.2%
+1 s.d. ----- +60.4%
Average ----- +27.4%
Run Time
-------------------------------------------------------------------------------
Program log-slow-7.0.4 log-slow-7.6.2
-------------------------------------------------------------------------------
bernouilli 0.28 +6.7%
exp3_8 3.39 +20.2%
gen_regexps 1.90 -6.3%
integrate 0.68 +114.3%
kahan 4.35 +7.5%
paraffins 1.93 -3.3%
primes 1.51 +0.8%
queens 0.72 -1.7%
rfib 0.31 +6.3%
tak 1.58 -2.4%
wheel-sieve1 2.28 -3.4%
wheel-sieve2 0.77 -3.6%
x2n1 0.04 0.13
-1 s.d. ----- -12.9%
+1 s.d. ----- +34.3%
Average ----- +8.2%
On Tue, Feb 12, 2013 at 3:17 AM, Johan Tibell <johan.tibell at gmail.com> wrote:
> Hi Nicolas!
>
> I tried to reproduce the difference between 7.0.4 and 7.6.2 on the exp3_8,
> wheel-sieve1, and primes and couldn't get the same percent difference as
> you. We need to reconcile these differences somehow. Let's start with more
> exact machine specs. I have a:
>
> $ cat /proc/cpuinfo
> processor : 7
> vendor_id : GenuineIntel
> cpu family : 6
> model : 58
> model name : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
> stepping : 9
> microcode : 0x12
> cpu MHz : 1600.000
> cache size : 8192 KB
> ...
>
> $ uname -a
> Linux johantibell.com 3.2.0-29-generic #46-Ubuntu SMP Fri Jul 27 17:03:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> And GHC versions:
>
> $ ghc-7.0.4 --info
> [("Project name","The Glorious Glasgow Haskell Compilation System")
> ,("Project version","7.0.4")
> ,("Booter version","6.12.1")
> ,("Stage","2")
> ,("Build platform","x86_64-unknown-linux")
> ,("Host platform","x86_64-unknown-linux")
> ,("Target platform","x86_64-unknown-linux")
> ,("Have interpreter","YES")
> ,("Object splitting","YES")
> ,("Have native code generator","YES")
> ,("Have llvm code generator","YES")
> ,("Support SMP","YES")
> ,("Unregisterised","NO")
> ,("Tables next to code","YES")
> ,("RTS ways","l debug thr thr_debug thr_l thr_p dyn debug_dyn thr_dyn thr_debug_dyn")
> ,("Leading underscore","NO")
> ,("Debug on","False")
> ,("LibDir","/usr/local/lib/ghc-7.0.4")
> ,("Global Package DB","/usr/local/lib/ghc-7.0.4/package.conf.d")
> ,("C compiler flags","[\"-fno-stack-protector\"]")
> ,("Gcc Linker flags","[]")
> ,("Ld Linker flags","[]")
> ]
>
> $ ghc-7.6.2 --info
> [("Project name","The Glorious Glasgow Haskell Compilation System")
> ,("GCC extra via C opts"," -fwrapv")
> ,("C compiler command","/usr/bin/gcc")
> ,("C compiler flags"," -fno-stack-protector ")
> ,("ar command","/usr/bin/ar")
> ,("ar flags","q")
> ,("ar supports at file","@ArSupportsAtFile@")
> ,("touch command","touch")
> ,("dllwrap command","/bin/false")
> ,("windres command","/bin/false")
> ,("perl command","/usr/bin/perl")
> ,("target os","OSLinux")
> ,("target arch","ArchX86_64")
> ,("target word size","8")
> ,("target has GNU nonexec stack","True")
> ,("target has .ident directive","True")
> ,("target has subsections via symbols","False")
> ,("LLVM llc command","llc")
> ,("LLVM opt command","opt")
> ,("Project version","7.6.2")
> ,("Booter version","7.4.1")
> ,("Stage","2")
> ,("Build platform","x86_64-unknown-linux")
> ,("Host platform","x86_64-unknown-linux")
> ,("Target platform","x86_64-unknown-linux")
> ,("Have interpreter","YES")
> ,("Object splitting supported","YES")
> ,("Have native code generator","YES")
> ,("Support SMP","YES")
> ,("Unregisterised","NO")
> ,("Tables next to code","YES")
> ,("RTS ways","l debug thr thr_debug thr_l thr_p dyn debug_dyn thr_dyn
> thr_debug_dyn")
> ,("Leading underscore","NO")
> ,("Debug on","False")
> ,("LibDir","/usr/local/lib/ghc-7.6.2")
> ,("Global Package DB","/usr/local/lib/ghc-7.6.2/package.conf.d")
> ,("Gcc Linker flags","[\"-Wl,--hash-size=31\",\"-Wl,--reduce-memory-overheads\"]")
> ,("Ld Linker flags","[\"--hash-size=31\",\"--reduce-memory-overheads\"]")
> ]
>
> I ran the benchmarks by running e.g.:
>
> $ cd nofib/imaginary/wheel-sieve1
> $ make clean && make boot WithNofibHc=ghc-${VERSION} && make WithNofibHc=ghc-${VERSION}
>
> Could you please try to run the "imaginary" benchmarks using exactly these
> commands and report the difference you see between 7.0.4 and 7.6.2. Here's
> what I see. 7.0.4 vs 7.6.2:
>
> --------------------------------------------------------------------------------
> Program Size Allocs Runtime Elapsed TotalMem
> --------------------------------------------------------------------------------
> bernouilli +3.3% +0.2% 0.12 0.13 +0.0%
> exp3_8 +1.1% +53.7% 0.14 0.14 +300.0%
> gen_regexps +18.7% +3.9% 0.00 0.00 +0.0%
> integrate -0.1% +39.0% 0.21 0.23 +0.0%
> kahan +1.7% +98.6% +9.9% +7.3% +0.0%
> paraffins +1.3% -1.2% 0.06 0.08 +0.0%
> primes +1.4% +64.7% 0.04 0.05 +50.0%
> queens +0.8% -0.5% 0.02 0.02 +0.0%
> rfib +1.7% +42.8% 0.02 0.02 +0.0%
> tak +0.9% +12.0% 0.01 0.01 +0.0%
> wheel-sieve1 +0.8% +66.6% -4.6% -5.8% -12.5%
> wheel-sieve2 +0.9% +0.0% 0.12 0.13 +0.0%
> x2n1 +10.3% +87.3% 0.00 0.01 +200.0%
> --------------------------------------------------------------------------------
> Min -0.1% -1.2% -4.6% -5.8% -12.5%
> Max +18.7% +98.6% +9.9% +7.3% +300.0%
> Geometric Mean +3.2% +31.7% +2.4% +0.5% +23.6%
>
> -- Johan
More information about the ghc-devs mailing list