[GHC] #9577: String literals are wasting space
GHC
ghc-devs at haskell.org
Thu Sep 11 20:39:40 UTC 2014
#9577: String literals are wasting space
-------------------------------------+-------------------------------------
Reporter: xnyhps | Owner:
Type: bug | Status: new
Priority: low | Milestone:
Component: Compiler (NCG) | Version: 7.8.2
Keywords: | Operating System: Unknown/Multiple
Architecture: Unknown/Multiple | Type of failure: Runtime performance bug
Difficulty: Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Revisions:
-------------------------------------+-------------------------------------
For [https://phabricator.haskell.org/D199 D199] I looked into how string
literals are compiled down by GHC.
On 64-bit OS X, a simple string `"AAA"` turns into assembly:
{{{
.const
.align 3
.align 0
c38E_str:
.byte 65
.byte 65
.byte 65
.byte 0
}}}
(And also something that invokes `unpackCString#`, but that isn't relevant
here.)
(The relevant code path is `MkCore.mkStringExprFS` ->
`CmmUtils.mkByteStringCLit` ->
`compiler/nativeGen/X86/Ppr.pprSectionHeader`.)
Note how this:
* Is 8-byte aligned (`.align 3`, i.e. 2^3 bytes).
* Is a `.const` section.
I can't find any reason why string literals would need to be 8-byte
aligned on OS X. There might be a small benefit in performance to read
data starting 8-byte aligned, but I doubt doing that for string literals
would be a meaningful difference. Assembly from both clang and gcc does
not align string literals.
The trivial program:
{{{#!hs
main :: IO ()
main = return ()
}}}
wastes almost 5kB on padding between the strings the Prelude brings in,
when built with GHC HEAD.
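To make the padding cost concrete, here is a minimal sketch that estimates how much padding a list of literals needs when each one starts on an aligned boundary. The function names are hypothetical and are not part of GHC. It assumes one byte per character plus a terminating NUL, and it counts padding after every literal, so it is only an approximation of what the assembler actually emits.

```haskell
-- Hypothetical helpers for illustration only; none of this is GHC code.

-- | Size in bytes of a NUL-terminated literal (one byte per character assumed).
literalSize :: String -> Int
literalSize s = length s + 1

-- | Padding needed to round 'size' up to the next multiple of 'align'.
paddingFor :: Int -> Int -> Int
paddingFor align size = (align - size `mod` align) `mod` align

-- | Approximate total padding when every literal is aligned to 'align' bytes.
totalPadding :: Int -> [String] -> Int
totalPadding align = sum . map (paddingFor align . literalSize)
```

For the `"AAA"` literal above, `paddingFor 8 (literalSize "AAA")` is 4, so that literal needs as many bytes of padding as it has bytes of data. With an alignment of 1 the padding is always 0.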
The fact that it is a `.const` section, instead of `.cstring`
(https://developer.apple.com/library/mac/documentation/DeveloperTools/Reference/Assembler/040-Assembler_Directives/asm_directives.html#//apple_ref/doc/uid/TP30000823-TPXREF127)
means duplicate strings aren't shared by the assembler. GHC floats out
string literals to the top-level and uses CSE to eliminate duplicates, but
that only works within a single module. Strings shared between different
modules end up as duplicate strings in an executable.
The same program as above also has ~4kB of wasted space due to duplicate
Prelude strings (`"base"` occurs 16 times!). Compared to the total binary
size (4MB after stripping), removing this redundant data wouldn't be a big
improvement (0.2%), but I still think it can be a worthwhile optimization.
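The cost of duplicates can be estimated in the same spirit. The sketch below is a hypothetical helper, not GHC code. It assumes the literals have already been collected from all modules into one list, and it counts the bytes taken by every copy of a literal beyond the first, again with one byte per character plus a NUL.

```haskell
import Data.List (group, sort)

-- | Bytes occupied by redundant copies of literals, i.e. every copy
-- after the first. Hypothetical helper for illustration only.
duplicateBytes :: [String] -> Int
duplicateBytes lits =
  sum [ (length copies - 1) * (length s + 1)
      | copies@(s : _) <- group (sort lits) ]
```

For example, 16 copies of `"base"` give `duplicateBytes (replicate 16 "base") == 75`: 15 redundant copies of 5 bytes each.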
I think this can be solved quite easily by creating a new section header
for literal strings, which is unaligned and of type `.cstring`.
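As a rough illustration of the intended output, the sketch below renders a literal into a `.cstring` section with no `.align` directive, keeping the `.byte` style of the assembly shown earlier. `renderCString` is a hypothetical name and is not how GHC's pretty-printer is structured. It only shows the shape of the text that would be emitted, assuming one byte per character.

```haskell
-- | Render a literal as assembly text in a '.cstring' section, without
-- any alignment directive. Hypothetical sketch, not GHC's actual printer.
renderCString :: String -> String -> String
renderCString label chars = unlines $
     [ ".cstring", label ++ ":" ]
  ++ [ "\t.byte " ++ show (fromEnum c) | c <- chars ]
  ++ [ "\t.byte 0" ]  -- terminating NUL
```

Calling `putStr (renderCString "c38E_str" "AAA")` prints the same label and bytes as the example at the top of this ticket, but under `.cstring` and without the two `.align` lines.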
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9577>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler