[GHC] #9577: String literals are wasting space
GHC
ghc-devs at haskell.org
Tue Sep 16 07:37:41 UTC 2014
-----------------------------------------+-------------------------------------
Reporter: xnyhps                         | Owner: xnyhps
Type: bug                                | Status: new
Priority: low                            | Milestone:
Component: Compiler (NCG)                | Version: 7.8.2
Resolution:                              | Keywords:
Operating System: Unknown/Multiple       | Architecture: Unknown/Multiple
Type of failure: Runtime performance bug | Difficulty: Unknown
Test Case:                               | Blocked By:
Blocking:                                | Related Tickets:
Differential Revisions:                  |
-----------------------------------------+-------------------------------------
Comment (by xnyhps):
Replying to [comment:4 dfeuer]:
> Replying to [comment:3 xnyhps]:
> > The main argument in favor of alignment seems to be: code often
> > `memcpy`s string literals into buffers. By doing that with aligned
> > addresses (apparently) SSE instructions can be used. This is irrelevant
> > for GHC, because the strings are only parsed into `[Char]`s, never copied.
>
> Will that always be the case if a string literal represents something
> like `Text` or `ByteString`? If so, will that continue to hold in the
> future? Might a future optimization fuse `putStr` with the conversion to
> do a copy? It may be that these concerns are baseless, but it might make
> sense to consider what alternative optimizations yours could preclude.

For the record, this is the rewrite rule used by ByteString:

 * https://github.com/haskell/bytestring/blob/master/Data/ByteString/Internal.hs#L250,
   calling
   https://github.com/haskell/bytestring/blob/master/Data/ByteString/Internal.hs#L271.

This just wraps the `Addr#` directly; nothing is copied. However,
[https://github.com/haskell/bytestring/blob/master/Data/ByteString/Internal.hs#L522 append]
does call `memcpy` twice. I don't think GHC has the kind of
optimizations that can turn a `memcpy` call into SIMD instructions
directly, but `memcpy` itself may be more efficient when called with
aligned buffers. I'll try to test this.
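
As a rough illustration of the difference between wrapping and copying (a sketch, not the code behind the links above), the following uses `unsafePackAddressLen` from the `bytestring` package's `Data.ByteString.Unsafe` to wrap a literal's `Addr#` in place, and then `append` to force a copy. The names `wrapLiteral` and `greeting` are hypothetical, and the length is written by hand, which is an assumption of the example rather than what the rewrite rule does.

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.ByteString.Unsafe (unsafePackAddressLen)

-- Hypothetical helper: wrap the literal's Addr# in a ByteString.
-- This is O(1); the bytes stay where the compiler put them.
-- The length (5) is supplied by hand and must match the literal.
wrapLiteral :: IO B.ByteString
wrapLiteral = unsafePackAddressLen 5 "hello"#

-- Hypothetical helper: append allocates a fresh buffer and copies both
-- operands into it, so this is the point where the literal's bytes are
-- read by a bulk copy and where its alignment could start to matter.
greeting :: IO B.ByteString
greeting = do
  hello <- wrapLiteral
  pure (B.append hello (BC.pack ", world"))

main :: IO ()
main = greeting >>= BC.putStrLn
```

Only the `append` step touches the literal's bytes in bulk; the wrapping step is independent of how the literal is aligned.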

And these are the rewrite rules for text:

 * https://github.com/bos/text/blob/e33c89be4256fdd1c31f39d8a2a63e58e23b0182/Data/Text.hs#L409
   calling
   https://github.com/bos/text/blob/e33c89be4256fdd1c31f39d8a2a63e58e23b0182/Data/Text/Internal/Fusion/Common.hs#L144

This is a loop similar to `unpackCString#`, so alignment won't matter much.
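
To show why such a loop is insensitive to alignment, here is a minimal sketch of a byte-at-a-time unpacking loop in the style of `unpackCString#`. It is not the actual library code: `unpackLit` is a hypothetical name, and the sketch assumes a NUL-terminated literal containing only Latin-1 characters.

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

import GHC.Exts (Addr#, Char (C#), Int#, eqChar#, indexCharOffAddr#, isTrue#, (+#))

-- Hypothetical helper: walk the literal one byte at a time until the
-- terminating NUL, building a lazy [Char]. Each step is a single-byte
-- load, so the alignment of the starting address never enters into it.
unpackLit :: Addr# -> String
unpackLit addr = go 0#
  where
    go :: Int# -> String
    go i = case indexCharOffAddr# addr i of
      c | isTrue# (c `eqChar#` '\0'#) -> []
        | otherwise                    -> C# c : go (i +# 1#)

main :: IO ()
main = putStrLn (unpackLit "hello"#)
```

Because every access is one byte wide, there is no wide load or store that an aligned literal would speed up.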

It is true that we might later find optimizations that benefit from
aligned strings, but unaligning them now doesn't preclude that. Literals
only exist within a single module, so any optimization has control over
both the literal and the code that uses it. (`String`s can be exported,
but `Addr#`s can't.) Even if someone tried to mix object files
generated by different versions of GHC, it wouldn't be a problem.

> You mention that there are a lot of string literals in the Prelude. I
> would bet that the vast majority of those are error messages. Might it be
> possible to specifically target ''exceptional'' strings that should never
> be anywhere speed-critical, and pack them all together? Putting them all
> together, ideally starting or ending on a page boundary, would (hopefully)
> mean that they wouldn't even need to be swapped in unless an error
> occurred.

I'm not familiar enough with assembly or executable file formats to say
whether this is possible, but I'll keep it in mind.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9577#comment:6>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler