[GHC] #5987: Too many symbols in ghc package DLL

GHC ghc-devs at haskell.org
Tue Jun 21 13:22:22 UTC 2016


#5987: Too many symbols in ghc package DLL
---------------------------------+----------------------------------------
        Reporter:  igloo         |                Owner:  Phyx-
            Type:  bug           |               Status:  new
        Priority:  normal        |            Milestone:
       Component:  Compiler      |              Version:  7.5
      Resolution:                |             Keywords:
Operating System:  Windows       |         Architecture:  Unknown/Multiple
 Type of failure:  None/Unknown  |            Test Case:
      Blocked By:                |             Blocking:  5355
 Related Tickets:                |  Differential Rev(s):
       Wiki Page:                |
---------------------------------+----------------------------------------
Changes (by Phyx-):

 * owner:   => Phyx-


Comment:

 I have recently taken a look at this and have an almost working version
 that should solve the problem once and for all.

 First, we don't actually have enough symbols to go over the limit, or at
 least it seems we don't. If I measure the number of symbols in the input
 object files going into the link and the number coming out, the
 difference is huge.

 Looking at it further, this is because of two things. We never explicitly
 use `__declspec`; we just change the names of the functions to match the
 conventions that `__declspec` would use. This is fine, but it means that
 binutils' default of `--export-all-symbols` is still enabled, which means
 we'll re-export any symbol we link from archives as well.

 Given that `-dynamic` on Windows always produces an import library
 `.dll.a` and the search order for `ld` is

 {{{
 libxxx.dll.a
 xxx.dll.a
 libxxx.a
 cygxxx.dll (*)
 libxxx.dll
 xxx.dll
 }}}

 Then we always end up picking the import lib. This is recursive: we link
 against `gmp`, `base` etc. By the time it gets to `GHC` the resulting
 import lib is huge and hence we blow past the limit on the number of
 symbols. Also `kernel` and `gdi32` and `mingwex` etc. are all import
 libraries for GCC, so we accumulate a ton of symbols from there as well.
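
 The effect of that search order can be sketched in a few lines of Python.
 This is only a model: it assumes a single directory listing, keeps the
 `cyg` prefix entry literal, and `resolve_library` is a hypothetical helper
 name, not anything in `ld` or the GHC build system.

```python
# Candidate file names tried for `-l<name>`, in the order listed above.
SEARCH_PATTERNS = [
    "lib{name}.dll.a",
    "{name}.dll.a",
    "lib{name}.a",
    "cyg{name}.dll",
    "lib{name}.dll",
    "{name}.dll",
]


def resolve_library(name, available_files):
    """Return the first candidate for `name` found in `available_files`, else None."""
    available = set(available_files)
    for pattern in SEARCH_PATTERNS:
        candidate = pattern.format(name=name)
        if candidate in available:
            return candidate
    return None
```

 With both `libgmp.dll.a` and `libgmp.a` present, the model picks
 `libgmp.dll.a`, which is the behaviour described above.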

 So the first thing my changes do is only export symbols defined in the
 input object files.
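
 As a rough illustration of that filtering (not the actual build change),
 the sketch below takes `nm -g` style text for the input object files and
 keeps only the symbols they define. The set of type letters treated as
 "defined" and the helper name `defined_symbols` are assumptions made for
 the example.

```python
# Assumption: text (T), data (D), BSS (B) and read-only data (R) symbols
# are the ones worth exporting; undefined (U) references are not.
DEFINED_TYPES = {"T", "D", "B", "R"}


def defined_symbols(nm_text):
    """Return names of global symbols that the listed objects define."""
    names = []
    for line in nm_text.splitlines():
        fields = line.split()
        # Defined entries look like "<address> <type> <name>".
        # Undefined ones ("U <name>") and "file.o:" headers have fewer fields.
        if len(fields) == 3 and fields[1] in DEFINED_TYPES:
            names.append(fields[2])
    return names
```

 Symbols that are merely referenced, such as those pulled in from `mingwex`
 or another package's import library, show up as `U` in the object files
 and are dropped by this filter.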

 This not only drastically reduces the size of the resulting DLLs and
 import libraries, it also pushes the number of symbols well below the
 limit.

 In fact I got rid of `dll-split` altogether and allowed all symbols to go
 into the same DLL, and we end up with

 {{{
 $ nm -g "R:\ghc\libHSghc-8.1-ghc8.1.20160617.dll" | wc -l
 49610
 }}}

 This is down from ~240,000 (mingwex and mingw32 are huge, for instance).

 The second thing my build changes do, to prevent this from happening
 again, is implement an automatic partitioning scheme which requires no
 special treatment of the split DLLs.

 In case we hit the limit again, the build script will automatically detect
 this and do the following:

 It will split the symbols up per input object file, so that all symbols of
 the same object file are in the same DLL.

 As @rassilon suggested before, I'm using import libraries to break the
 dependencies, so the specific grouping doesn't matter.
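
 One way such a per-object split could work is a simple greedy grouping,
 sketched below. The greedy strategy, the `limit` parameter and the helper
 names are assumptions for illustration; the ticket does not say how the
 build script actually chooses the groups, only that the grouping does not
 matter.

```python
def partition_objects(symbols_per_object, limit):
    """Group object files so that each group exports at most `limit` symbols.

    `symbols_per_object` maps an object file name to its exported symbol
    count. Objects are never split, so all symbols of one object file end
    up in the same group.
    """
    groups = []
    current, current_total = [], 0
    for obj, count in symbols_per_object.items():
        if count > limit:
            raise ValueError("{} alone exceeds the limit".format(obj))
        # Start a new group when adding this object would pass the limit.
        if current and current_total + count > limit:
            groups.append(current)
            current, current_total = [], 0
        current.append(obj)
        current_total += count
    if current:
        groups.append(current)
    return groups


def part_name(base, index):
    """Name a partition using the `-pt<num>.dll` suffix."""
    return "{}-pt{}.dll".format(base, index)
```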

 The import libraries point to the location of the dll which contains the
 symbol:

 {{{
 LIBRARY "libHSCabal-1.25.0.0-ghc8.1.20160617-pt2.dll"
 EXPORTS
     "__stginit_Cabalzm1zi25zi0zi0_DistributionziCompatziBinary"
     "__stginit_Cabalzm1zi25zi0zi0_DistributionziCompatziCopyFile"
 ...
 }}}

 And these are used to break the dependencies.

 We then end up with smaller dlls with the suffix `-pt<num>.dll` and their
 import libraries.

 The next step is to produce one large merged import library named after
 the DLL we were originally supposed to create,
 `libHSCabal-1.25.0.0-ghc8.1.20160617.dll.a`, which is just a merge of the
 different `-pt` import libraries.
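
 To make the idea concrete, the sketch below renders a definition file per
 partition in the `LIBRARY`/`EXPORTS` form shown earlier and models the
 merged import library as a plain symbol-to-DLL lookup. Both helpers are
 hypothetical, and the real merged library is an archive rather than a
 dictionary; turning a definition file into an import library is left to
 whatever tool the build uses.

```python
def render_def(dll_name, symbols):
    """Render a definition file naming the DLL that holds `symbols`."""
    lines = ['LIBRARY "{}"'.format(dll_name), "EXPORTS"]
    lines += ['    "{}"'.format(symbol) for symbol in symbols]
    return "\n".join(lines) + "\n"


def merged_index(parts):
    """Model the merged import library as a symbol -> DLL name mapping.

    `parts` maps each partition DLL name to the symbols it exports.
    """
    index = {}
    for dll_name, symbols in parts.items():
        for symbol in symbols:
            if symbol in index:
                raise ValueError("{} exported by two partitions".format(symbol))
            index[symbol] = dll_name
    return index
```

 Looking a symbol up in the merged index yields the partition DLL that
 should be referenced, which mirrors what the linker gets from the merged
 import library.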

 This has the effect that when `-lHSCabal-1.25.0.0-ghc8.1.20160617` is used
 as the link argument (which we do), the import lib is found and the linker
 emits references to the right DLLs. No extra or special handling is needed
 by any other tool.

 Using the import libraries essentially removes the limit, since each
 symbol is an object file in the archive.

 (Note that while I recently added support for import libraries to GHCi,
 this support only extends to single-DLL import libraries. It needs some
 minor modifications to support this too, but LD should work fine.)

 This works fine, and I can successfully compile a dynamic version of GHC
 and the program runs (but segfaults due to a piece of bit-rotted code I'm
 looking at).

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/5987#comment:54>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

