[GHC] #9577: String literals are wasting space

GHC ghc-devs at haskell.org
Tue Sep 16 07:37:41 UTC 2014


#9577: String literals are wasting space
-------------------------------------------------+-------------------------------------
              Reporter:  xnyhps                  |            Owner:  xnyhps
                  Type:  bug                     |           Status:  new
              Priority:  low                     |        Milestone:
             Component:  Compiler (NCG)          |          Version:  7.8.2
            Resolution:                          |         Keywords:
      Operating System:  Unknown/Multiple        |     Architecture:  Unknown/Multiple
       Type of failure:  Runtime performance bug |       Difficulty:  Unknown
             Test Case:                          |       Blocked By:
              Blocking:                          |  Related Tickets:
Differential Revisions:                          |
-------------------------------------------------+-------------------------------------

Comment (by xnyhps):

 Replying to [comment:4 dfeuer]:
 > Replying to [comment:3 xnyhps]:
 > > The main argument in favor of alignment seems to be: code often
 `memcpy`s string literals into buffers. By doing that with aligned
 addresses (apparently) SSE instructions can be used. This is irrelevant
 for GHC, because the strings are only parsed into `[Char]`s, never copied.
 >
 > Will that always be the case if a string literal represents something
 like `Text` or `ByteString`? If so, will that continue to hold in the
 future? Might a future optimization fuse `putStr` with the conversion to
 do a copy? It may be that these concerns are baseless, but it might make
 sense to consider what alternative optimizations yours could preclude.

 For the record, this is the rewrite rule used by ByteString:

 *
 https://github.com/haskell/bytestring/blob/master/Data/ByteString/Internal.hs#L250,
 calling
 https://github.com/haskell/bytestring/blob/master/Data/ByteString/Internal.hs#L271.

   This just wraps the `Addr#` directly, so nothing is copied. However,
 [https://github.com/haskell/bytestring/blob/master/Data/ByteString/Internal.hs#L522
 append] does call `memcpy` twice. I don't think GHC has the kind of
 optimizations that can turn a `memcpy` call into SIMD instructions
 directly, but `memcpy` itself may be more efficient when called with
 aligned buffers. I'll try to test this.
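
 A minimal sketch of the two cases, assuming the public
 `unsafePackAddressLen` from `Data.ByteString.Unsafe` as a stand-in for
 what the rewrite rule does internally (the names `greeting` and `subject`
 are made up for this example, and the lengths are supplied by hand):

```haskell
{-# LANGUAGE MagicHash #-}

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.ByteString.Unsafe (unsafePackAddressLen)

-- Wrap the bytes of an Addr# literal in place. No memcpy happens here;
-- the ByteString points straight at the literal in the object file.
greeting :: IO B.ByteString
greeting = unsafePackAddressLen 7 "hello, "#

subject :: IO B.ByteString
subject = unsafePackAddressLen 5 "world"#

main :: IO ()
main = do
  g <- greeting
  s <- subject
  -- append allocates a fresh buffer and copies both operands into it,
  -- so this is the point where the literals' alignment could matter.
  let both = B.append g s
  BC.putStrLn both
  print (B.length both)
```

 Only the `append` step reads the literal bytes in bulk, so that is the
 place to measure any effect of alignment.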

 And these are the rewrite rules for text:

 *
 https://github.com/bos/text/blob/e33c89be4256fdd1c31f39d8a2a63e58e23b0182/Data/Text.hs#L409
 calling
 https://github.com/bos/text/blob/e33c89be4256fdd1c31f39d8a2a63e58e23b0182/Data/Text/Internal/Fusion/Common.hs#L144

   This is a loop similar to `unpackCString#`, so alignment won't matter
 much.
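
 For illustration, a loop of that shape might look like the sketch below.
 This is not the code from base or text; `unpackLiteral` is a hypothetical
 name, and the sketch ignores UTF-8 decoding. It reads one byte per step
 until it reaches the terminating NUL:

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Addr#, Char (C#), Int#, eqChar#, indexCharOffAddr#, isTrue#, (+#))

-- Walk a NUL-terminated literal one byte at a time, building a String lazily.
unpackLiteral :: Addr# -> String
unpackLiteral addr = go 0#
  where
    go :: Int# -> String
    go i =
      case indexCharOffAddr# addr i of
        c | isTrue# (c `eqChar#` '\0'#) -> []
          | otherwise -> C# c : go (i +# 1#)

main :: IO ()
main = putStrLn (unpackLiteral "aligned or not"#)
```

 Every access is a single-byte load at `addr + i`, so the starting
 alignment of the literal should not change how the loop behaves.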


 It is true that we might later find optimizations that benefit from
 aligned strings. But unaligning them now doesn't preclude that. Literals
 only exist within a single module, so any optimization has control over
 both the literal and the code that uses it. (`String`s can be exported,
 but `Addr#`s can't.) Even if someone tried to mix object files generated
 by different versions of GHC, it wouldn't be a problem.

 > You mention that there are a lot of string literals in the Prelude. I
 would bet that the vast majority of those are error messages. Might it be
 possible to specifically target ''exceptional'' strings that should never
 be anywhere speed-critical, and pack them all together? Putting them all
 together, ideally starting or ending on a page boundary, would (hopefully)
 mean that they wouldn't even need to be swapped in unless an error
 occurred.

 I'm not familiar enough with assembly or executable file formats to say
 whether this is possible, but I'll keep it in mind.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9577#comment:6>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

