[GHC] #5218: Add unpackCStringLen# to create Strings from string literals

GHC ghc-devs at haskell.org
Thu Apr 6 17:59:10 UTC 2017


-------------------------------------+-------------------------------------
        Reporter:  tibbe             |                Owner:  thoughtpolice
            Type:  feature request   |               Status:  patch
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  7.0.3
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:  Unknown/Multiple
 Type of failure:  Runtime           |            Test Case:
                   performance bug   |
      Blocked By:                    |             Blocking:
 Related Tickets:  #5877 #10064      |  Differential Rev(s):  Phab:D2443
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by bgamari):

 > If I remember correctly, a `ByteArray#` would have an extra header and a
 length field, which in turn brings 2 words of overhead: one word more
 compared to the `(# Int#, Addr# #)` solution.

 Right, this would incur another word of overhead. However, on the majority
 of machines a word is 8 bytes, which is quite significant. Looking at GHC
 itself, just over a third of all string literals are 8 characters or less
 (6000 out of 17500). For these literals, adding another word would increase
 the fractional overhead from >50% to >67%.
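
 To make the arithmetic explicit, here is a rough model following the
 accounting in the quoted comment: let w be the word size in bytes and n the
 length of a literal's payload in bytes, and count only the per-literal
 overhead words (one length word for `(# Int#, Addr# #)`, a header plus a
 length word for `ByteArray#`), ignoring the NUL terminator and any other
 metadata. With w = 8:

$$
f_1(n) = \frac{w}{n + w}, \qquad f_2(n) = \frac{2w}{n + 2w}, \qquad
f_1(8) = \frac{8}{16} = 50\%, \qquad f_2(8) = \frac{16}{24} \approx 66.7\%
$$

 Both fractions shrink as n grows, so shorter literals fare worse, e.g.
 f_1(5) = 8/13 ≈ 61.5% and f_2(5) = 16/21 ≈ 76.2%.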

 I have spoken with GHC users targeting mobile platforms who already suffer
 from our code size; it's hard to justify such an increase without a very
 good reason.
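
 The `(# Int#, Addr# #)` representation only needs a pointer to the literal's
 bytes and a length. A minimal sketch of unpacking such a pair into a
 `String`, assuming one byte per `Char` (no UTF-8 decoding). The name
 `unpackCStringLen` is hypothetical and not a GHC API; ghc-prim's
 `GHC.CString.unpackNBytes#` appears to have a similar shape.

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

import GHC.Exts (Addr#, Char (C#), Int#, indexCharOffAddr#, isTrue#, (+#), (>=#))

-- Hypothetical helper (not a GHC API): decode exactly `len` bytes starting
-- at `addr`, one Char per byte. Because the length is explicit, the end of
-- the literal is not found by scanning for a NUL terminator.
unpackCStringLen :: Addr# -> Int# -> String
unpackCStringLen addr len = go 0#
  where
    go :: Int# -> String
    go i
      | isTrue# (i >=# len) = []
      | otherwise = C# (indexCharOffAddr# addr i) : go (i +# 1#)

main :: IO ()
main =
  -- "hello world"# is a primitive string literal of type Addr#;
  -- the explicit length 5# selects just its first five bytes.
  putStrLn (unpackCStringLen "hello world"# 5#)
```

 Running this prints `hello`. The length is carried alongside the pointer
 rather than in a heap header, which is the point of the unboxed-tuple design.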


 > But I would argue this overhead can bring a nice solution to GHC's
 long-lasting literal problem. For example, the vector and text packages
 could provide some Template Haskell to directly save `ByteArray#` literals
 using hexadecimal notation, which would save a lot of extra copying at
 runtime.

 What is stopping these libraries from providing this mechanism currently
 using `Addr#` and primitive strings directly?
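
 As a sketch of what a library could already do with primitive strings: a
 primitive string literal can carry arbitrary bytes via `\xNN` escapes, and
 an explicit length recovers them without any `ByteArray#` header. The
 hand-written escapes stand in for what a Template Haskell splice might
 generate; `readBytes` is a hypothetical name, and a real library would more
 likely wrap or copy the memory directly than build a `[Word8]`.

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

import Data.Char (ord)
import Data.Word (Word8)
import GHC.Exts (Addr#, Char (C#), Int (I#), indexCharOffAddr#)

-- Hypothetical helper: read `n` raw bytes starting at `addr`.
-- indexCharOffAddr# returns the byte at an offset as a Char# in 0..255.
readBytes :: Addr# -> Int -> [Word8]
readBytes addr n = map byteAt [0 .. n - 1]
  where
    byteAt :: Int -> Word8
    byteAt (I# i) = fromIntegral (ord (C# (indexCharOffAddr# addr i)))

main :: IO ()
main =
  -- Four raw bytes written with hexadecimal escapes; the length (4) is
  -- supplied separately rather than found by scanning for a terminator.
  print (readBytes "\xde\xad\xbe\xef"# 4)
```

 This prints `[222,173,190,239]`.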

 In general, primitive strings are, as the name would suggest, primitive.
 I'm not sure forcing a heap object representation here is necessary or
 prudent.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/5218#comment:59>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

