[GHC] #5218: Add unpackCStringLen# to create Strings from string literals

GHC ghc-devs at haskell.org
Wed Aug 16 03:35:03 UTC 2017


#5218: Add unpackCStringLen# to create Strings from string literals
-------------------------------------+-------------------------------------
        Reporter:  tibbe             |                Owner:  thoughtpolice
            Type:  feature request   |               Status:  patch
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  7.0.3
      Resolution:                    |             Keywords:  strings
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #5877 #10064      |  Differential Rev(s):  Phab:D2443
  #11312, #9719                      |
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by winter):

 Replying to [comment:74 jscholl]:
 > Thinking about the problem again I decided to try to add
 {{{ByteArray#}}} literals to GHC. The idea is the following:
 >  - Use {{{"foo"##}}} as syntax for {{{ByteArray#}}}s. This is in essence
 my try for a {{{String#}}} type.
 >  - Provide
 > {{{#!haskell
 > unpackStringLit# :: ByteArray# -> [Char]
 > {-# INLINE[1] unpackStringLit# #-}
 > unpackStringLit# ba# =
 >   unpackCStringWithLen# (byteArrayContents# ba#) (sizeofByteArray# ba#)
 > }}}
 >  - Compile {{{"foo"}}} as {{{unpackStringLit# "foo"##}}}
 >  - Let rewrites fire in phase 2.
 >  - In phase 1, inline {{{unpackStringLit#}}} and let rules rewrite it to
 {{{unpackCStringWithLen# "foo"# 3#}}}
 >  - Thus most {{{ByteArray#}}}s should get eliminated and binary size
 should stay more or less the same.
 >  - If someone rewrites something like {{{ByteString.pack
 (unpackStringLit# lit)}}}, the literal is not eliminated and emitted to
 the binary. Thus a {{{ByteString}}} literal can increase binary size.
 However, I think this is what we want because we save making a copy of the
 data.
 >  - The downside is that turning optimization off causes the compiler to
 create a {{{ByteArray#}}} for every string literal instead of a c-string.
 GHCi will also allocate {{{ByteArray#}}}s instead of string literals
 directly.
 >

 The problem is that the old `Addr#` unpacking and `ByteArray#` unpacking
 do not use the same encoding: they don't agree on how the `\NUL` character
 is encoded (at least, I'm expecting `ByteArray#` to be standard UTF-8
 encoded). So you can't convert one into the other with rewrite rules like
 that: the encoding pitfall has to be taken into account.
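
 To make the mismatch concrete, here is a minimal sketch in plain Haskell
 lists of bytes. It assumes that `Addr#` literals store `\NUL` as the
 overlong pair `0xC0 0x80` (so the C string stays NUL-terminated) and that
 a `ByteArray#` literal would hold standard UTF-8, which is only my
 expectation. All function names below are hypothetical and are not part of
 GHC or the patch.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Standard UTF-8 encoding of a single character.
encodeUtf8Char :: Char -> [Word8]
encodeUtf8Char c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading-byte payload: the bits above position s.
    hi :: Int -> Word8
    hi s = fromIntegral (n `shiftR` s)
    -- Continuation byte: 10xxxxxx carrying six bits starting at position s.
    cont :: Int -> Word8
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- NUL-avoiding variant (assumed here for Addr# literals): identical to
-- standard UTF-8 except that '\NUL' becomes the two bytes C0 80.
encodeModifiedUtf8Char :: Char -> [Word8]
encodeModifiedUtf8Char '\NUL' = [0xC0, 0x80]
encodeModifiedUtf8Char c      = encodeUtf8Char c

encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeUtf8Char

encodeModifiedUtf8 :: String -> [Word8]
encodeModifiedUtf8 = concatMap encodeModifiedUtf8Char
```

 For `"a\NULb"` the first encoder yields the three bytes `61 00 62`, while
 the second yields the four bytes `61 C0 80 62`. Both the byte contents and
 the length differ, so a rule that simply reinterprets one buffer as the
 other would decode the wrong string whenever a literal contains `\NUL`.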

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/5218#comment:81>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

