[GHC] #5218: Add unpackCStringLen# to create Strings from string literals

Fri Jul 28 01:17:59 UTC 2017

#5218: Add unpackCStringLen# to create Strings from string literals
-------------------------------------+-------------------------------------
        Reporter:  tibbe             |                Owner:  thoughtpolice
            Type:  feature request   |               Status:  patch
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  7.0.3
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #5877 #10064      |  Differential Rev(s):  Phab:D2443
  #11312                             |
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by jscholl):

 Thinking about the problem again I decided to try to add {{{ByteArray#}}}
 literals to GHC. The idea is the following:
  - Use {{{"foo"##}}} as syntax for {{{ByteArray#}}}s. This is in essence
 my try for a {{{String#}}} type.
  - Provide
 {{{#!haskell
 unpackStringLit# :: ByteArray# -> [Char]
 {-# INLINE[1] unpackStringLit# #-}
 unpackStringLit# ba# =
   unpackCStringWithLen# (byteArrayContents# ba#) (sizeofByteArray# ba#)
 }}}
  - Compile {{{"foo"}}} as {{{unpackStringLit# "foo"##}}}
  - Let rewrites fire in phase 2.
  - In phase 1, inline {{{unpackStringLit#}}} and let rules rewrite it to
 {{{unpackCStringWithLen# "foo"# 3#}}}
  - Thus most {{{ByteArray#}}}s should get eliminated and binary size
 should stay more or less the same.
  - If someone rewrites something like {{{ByteString.pack (unpackStringLit#
 lit)}}}, the literal is not eliminated and emitted to the binary. Thus a
 {{{ByteString}}} literal can increase binary size. However, I think this
 is what we want because we save making a copy of the data.
  - The downside is that turning optimization off causes the compiler to
 create a {{{ByteArray#}}} for every string literal instead of a c-string.
 GHCi will also allocate {{{ByteArray#}}}s instead of string literals
 directly.

 I currently implemented the new literal type, extended the parser, changed
 the desugaring, added the needed rules, taught GHCi to handle
 {{{ByteArray#}}} literals, and generated cmm Code for them. I still have
 to look at all parts involved for fusion and other string rules to work,
 check how the change affects bootstrapping with an older compiler, take a
 look at template haskell, and whether there are any typeable/generic
 things involved.

 I don't know how everyone feels about adding another literal type
 (especially because there are now two similar types, {{{Addr#}}} and
 {{{ByteArray#}}}, but if we want to keep binary sizes down, we need some
 form of {{{Addr#}}}, and it seems like having {{{ByteArray#}}} is
 beneficial too). Or whether it is reasonable to provide syntax for
 {{{ByteArray#}}}s (letting the compiler generate them would be enough for
 this ticket). But right now implementing it like this feels a lot better
 than my previous approach using unboxed tuples.

 And I am sorry for not saying or doing anything such a long time.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/5218#comment:74>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler