[GHC] #7367: Optimiser / Linker Problem on amd64

Fri Aug 30 05:51:52 UTC 2013

#7367: Optimiser / Linker Problem on amd64
--------------------------------------------+------------------------------
        Reporter:  wurmli                   |            Owner:
            Type:  bug                      |           Status:  new
        Priority:  normal                   |        Milestone:  7.8.1
       Component:  Build System             |          Version:  7.6.1
      Resolution:                           |         Keywords:
Operating System:  Linux                    |     Architecture:  x86_64
 Type of failure:  Runtime performance bug  |  (amd64)
       Test Case:                           |       Difficulty:  Unknown
        Blocking:                           |       Blocked By:
                                            |  Related Tickets:
--------------------------------------------+------------------------------

Comment (by carter):

 Let me preface this by saying that I'm very likely not following this
 thread fully, so please read the remarks below as questions for
 clarification too.

 I'm trying to follow this thread:

 1. The issue initially was that there's overly aggressive let floating?
 I believe the way Manuel addresses this in his Repa code is by using the
 touch function to prevent let floating, right?

 2. Currently: it's now a question of having a more systematic, sound way
 of handling the cost model of let floating, and of deciding when to float?
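
 To make the let-floating question concrete, here is a minimal sketch of
 the kind of transformation under discussion. It is not taken from the
 ticket: the names sumScaled, bigList and sumScaledFloated are made up for
 illustration, and whether GHC actually floats in this exact case depends
 on the GHC version and flags. Floating to the top level is done by the
 full-laziness pass, which -fno-full-laziness switches off.

```haskell
-- As written: the list is syntactically inside the function body, so one
-- might expect it to be built and collected again on every call.
sumScaled :: Int -> Int
sumScaled k = sum (map (* k) [1 .. 1000000])

-- Roughly what full laziness MAY produce (illustrative, hand-written):
-- the list does not depend on k, so it can be floated to the top level.
-- It is then built once and shared, which saves work but keeps a
-- million-element list alive between calls.
bigList :: [Int]
bigList = [1 .. 1000000]

sumScaledFloated :: Int -> Int
sumScaledFloated k = sum (map (* k) bigList)
```

 Both versions compute the same result; the difference is in allocation
 and residency, which is why a cost model for floating matters.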

 @Hans / Wurmli
 As a Haskell programmer, you can absolutely write great, performant
 low-level code in Haskell (or cheaply FFI out if need be). It does require
 really understanding how GHC works and how it compiles code. Really
 performant Haskell does not treat the compiler as a black box, but rather
 as a partner in a conversation. I have some fun examples of bit-fiddling
 Haskell code that turns into exactly the straight-line register
 manipulations I'd hope any language would generate. But to write
 HPC-grade code, you have to really understand your tools!
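
 As a rough idea of what such bit-fiddling code can look like, here is a
 small sketch using Data.Bits from base. These are not the examples
 mentioned above; the function names are hypothetical. The claim that
 they compile to straight-line code is something to check for your own
 GHC version, for instance by compiling with -O2 -ddump-asm.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word64)

-- Isolate the lowest set bit: 0b1100 -> 0b0100.
-- Negation on Word64 wraps around (two's complement), so this is branch-free.
lowestSetBit :: Word64 -> Word64
lowestSetBit x = x .&. negate x

-- Clear the lowest set bit: 0b1100 -> 0b1000.
clearLowestSetBit :: Word64 -> Word64
clearLowestSetBit x = x .&. (x - 1)

-- A nonzero word is a power of two exactly when clearing its lowest set
-- bit leaves zero.
isPowerOfTwo :: Word64 -> Bool
isPowerOfTwo x = x /= 0 && clearLowestSetBit x == 0
```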

 The "standard" way to systematically write very performant code in
 Haskell is to first design a library with the "right" abstractions and,
 in tandem, have a "dialogue" where you figure out how to give the library
 internals a representation that GHC can aggressively fuse / simplify away
 to make things fast. The vector library does this with stream fusion, and
 the GPU library Accelerate and the CPU libraries Repa 3 / Repa 4 all have
 very nice views on some parts of this methodology (I have my own,
 different take in progress that adds some interesting twists too). In some
 respects, it's also an ongoing exploratory engineering and research effort
 to make it better and better.
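
 For readers who have not seen stream fusion in practice, a minimal sketch
 with the vector package (which must be installed; it is not part of
 base). The function names dot and sumSquares are made up for this
 example. With optimization on, the intent is that the intermediate
 vectors are fused away so that each function becomes a single loop;
 -ddump-simpl is the way to confirm that for a given GHC and vector
 version.

```haskell
import qualified Data.Vector.Unboxed as U

-- Dot product written as a pipeline: zipWith produces a conceptual
-- intermediate vector that stream fusion is meant to eliminate.
dot :: U.Vector Double -> U.Vector Double -> Double
dot xs ys = U.sum (U.zipWith (*) xs ys)

-- Sum of squares of 1..n: enumFromTo, map and sum are all expressed as
-- separate combinators, but ideally no vector is ever allocated.
sumSquares :: Int -> Int
sumSquares n = U.sum (U.map (\x -> x * x) (U.enumFromTo 1 n))
```

 The library author picks the internal stream representation; the user
 just composes combinators and relies on GHC's simplifier to do the rest.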

 Point being: there is no magical compiler, merely compilers that can
 "collaborate" with the library author. GHC does a great (and steadily
 improving) job of this. If you have specific optimizations you'd
 want, please illustrate what the "input" and "result" code for the
 optimization would be! Humans, given enough time, are often the best
 optimizers, so the best a compiler can do is support library authors in
 writing easy-to-optimize libraries!

 Importantly: GHC currently doesn't pass much aliasing information to its
 code generators, though for numerical / bit-fiddling code LLVM can do some
 tremendous optimizations. There will also be support for some basic SIMD
 code writing in 7.8.

 That said, after the 7.8 release and before 7.10 lands, I think it's
 pretty fair to say that a lot of great work will be happening to give GHC
 a better numerical story. If nothing else, it's something that I (time
 permitting) want to improve / help with.

 In the meantime, it's OK to have "fat primops" written in C that you FFI
 out to, while keeping all your application / numerical logic and memory
 management on the Haskell side. I'm actually doing that quite a bit in my
 own code, and while there's plenty of room for even better performance,
 even with that naive approach I'm able to get temptingly close to Ye Olde
 ancient-but-really-really-fast Fortran-grade performance with fairly
 little effort on my part.
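
 The FFI pattern can be sketched as follows. This is only an illustration
 of the mechanism, not the code referred to above: a real "fat primop"
 would typically take pointers to whole arrays and run its inner loop in
 C, whereas here libm's hypot stands in for the C kernel so that the
 example needs no extra C file. The wrapper names hypot' and norms are
 hypothetical.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Types (CDouble (..))

-- Import a pure C function. "unsafe" makes the call cheap; it is
-- appropriate only for short C calls that never call back into Haskell.
foreign import ccall unsafe "math.h hypot"
  c_hypot :: CDouble -> CDouble -> CDouble

-- Thin Haskell wrapper that hides the C types from callers.
hypot' :: Double -> Double -> Double
hypot' x y = realToFrac (c_hypot (realToFrac x) (realToFrac y))

-- Application logic stays on the Haskell side and calls the C kernel.
norms :: [(Double, Double)] -> [Double]
norms = map (uncurry hypot')
```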

 I hope I'm contributing to this thread with these questions and remarks :)

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/7367#comment:16>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler