[GHC] #7367: Optimiser / Linker Problem on amd64

Thu Aug 29 04:02:07 UTC 2013

#7367: Optimiser / Linker Problem on amd64
--------------------------------------------+------------------------------
        Reporter:  wurmli                   |            Owner:
            Type:  bug                      |           Status:  new
        Priority:  normal                   |        Milestone:  7.8.1
       Component:  Build System             |          Version:  7.6.1
      Resolution:                           |         Keywords:
Operating System:  Linux                    |     Architecture:  x86_64
 Type of failure:  Runtime performance bug  |  (amd64)
       Test Case:                           |       Difficulty:  Unknown
        Blocking:                           |       Blocked By:
                                            |  Related Tickets:
--------------------------------------------+------------------------------

Comment (by wurmli):

 Replying to [comment:12 rwbarton]:
 > wurmli, what's the matter with it?
 >
 > "800,100,272 bytes allocated in the heap" means that the total size of
 > all the allocations done over the course of the program is 800,100,272
 > bytes.  That's the expected size of 20 million (Int, Int) pairs which
 > share their second field (`n`), plus a small amount of other stuff.  It
 > doesn't have anything to do with the size of the heap at any given time.
 > The maximum heap size is shown separately: "50,520 bytes maximum
 > residency" which is quite reasonable.
 >
 > Similarly your original program does not ever occupy 10 GB of heap at a
 > time.  If you look at the process in top you will see a memory usage close
 > to "47,184 bytes maximum residency" (well probably more like a couple MB,
 > to hold the program image, but not anything near 10 GB).
 >
 > I have no idea why the original program timed out on the language
 > benchmark machines, but it wasn't due to it allocating 10 GB sequentially.
 > Allocation of short-lived objects is very cheap.  But it is not free, and
 > this discussion has been about why current GHC produces a program that
 > allocates a lot when GHC 7.4 did not.  Eliminating the large amount of
 > allocation might reduce the runtime by a few percent or so.
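
 The 800 MB figure in the quoted reply can be checked by hand. The sketch
 below assumes the usual 64-bit GHC heap layout (8-byte words, a (,) cell
 of three words, a boxed Int of two words) and that only the first field
 of each pair is freshly allocated. The names `pairsBytes`, `pairWords`
 and `boxedIntWords` are made up for this illustration.

```haskell
-- Back-of-the-envelope check of the allocation figure quoted above.
-- Assumption: 64-bit GHC, 8-byte words, one header word per heap object.

wordBytes :: Int
wordBytes = 8

-- A (,) cell: header word plus two pointer fields.
pairWords :: Int
pairWords = 3

-- A boxed Int (I#): header word plus one payload word.
boxedIntWords :: Int
boxedIntWords = 2

-- Bytes allocated for `count` pairs whose first field is a freshly boxed
-- Int and whose second field is a shared, already-allocated Int.
pairsBytes :: Int -> Int
pairsBytes count = count * (pairWords + boxedIntWords) * wordBytes

main :: IO ()
main = do
  let expected = pairsBytes 20000000
      reported = 800100272 :: Int
  putStrLn ("expected from pairs:  " ++ show expected)
  putStrLn ("left for other stuff: " ++ show (reported - expected))
```

 Under these assumptions the pairs account for 800,000,000 bytes, which
 leaves 100,272 bytes for everything else. This is a running total of
 allocation, not the amount of memory live at any one time.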

 Would you agree that it is reasonable to expect the optimiser to optimise
 these allocations away? My simple assumption about the fannkuch program is
 that it runs faster when its memory use stays local: the more it works
 only in registers and cache, the faster it runs. With the repeated
 allocation of an intermediate value, the cache might be exhausted and
 the processor might have to copy data in and out of cache, which could
 slow down the program.
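
 To illustrate the kind of allocation in question, here is a minimal
 sketch. It is not the fannkuch code from this ticket; `countPair` and
 `countArgs` are hypothetical functions. The first carries its loop state
 in a boxed pair, the second passes the counter as a strict argument.
 Whether the first version really allocates depends on the GHC version
 and optimisation flags.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Loop state carried as a boxed pair. Unless the optimiser unboxes it,
-- each iteration builds a new (,) cell and a new boxed Int, while the
-- second field `m` is shared between all the pairs.
countPair :: Int -> (Int, Int)
countPair n = go (0, n)
  where
    go (i, m)
      | i >= m    = (i, m)
      | otherwise = go (i + 1, m)

-- The same loop with the counter passed as a strict argument and the
-- bound taken from the enclosing scope. With -O, GHC typically compiles
-- this to a loop over unboxed integers that allocates only the result.
countArgs :: Int -> (Int, Int)
countArgs n = go 0
  where
    go !i
      | i >= n    = (i, n)
      | otherwise = go (i + 1)

main :: IO ()
main = do
  print (countPair 1000000)
  print (countArgs 1000000)
```

 Compiling with `-O -rtsopts` and running with `+RTS -s` prints the "bytes
 allocated in the heap" and "maximum residency" figures, so the two loops
 can be compared by testing each one in its own program.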

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/7367#comment:13>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler