[GHC] #5218: Add unpackCStringLen# to create Strings from string literals

GHC ghc-devs at haskell.org
Wed Jun 15 10:48:46 UTC 2016


#5218: Add unpackCStringLen# to create Strings from string literals
-------------------------------------+-------------------------------------
        Reporter:  tibbe             |                Owner:  thoughtpolice
            Type:  feature request   |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  7.0.3
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #5877 #10064      |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by jscholl):

 How about instead of adding a new type {{{String#}}}, as suggested in
 #10064, or adding a function {{{unpackCStringLen#}}}, we add the ability
 to query the size of the payload of an {{{Addr#}}} at compile-time? We
 could provide a function which, given an {{{Addr#}}} constant, turns this
 into a {{{(# Int#, Addr# #)}}} pair without introducing a new type, thus
 keeping the overall changes low and the design flexible.

 How do we get the length at compile-time? We use a special builtin
 rewrite-rule, which writes the length to the appropriate places. For
 example:

 {{{
 {-# INLINE[0] viewCString# #-}
 viewCString# :: Addr# -> (# Int#, Addr# #)
 viewCString# addr# = (# -1#, addr# #)

 {-# RULES "viewCString#" forall addr .
       viewCString# addr = (# <length of addr's pointee>#, addr #) #-}
 }}}

 Library code could then use {{{viewCString#}}} to try to determine the
 length at compile-time, and, if optimizations are enabled, the call gets
 rewritten to the correct result. Otherwise, {{{viewCString#}}} inlines in
 phase 0, the resulting -1 is seen by the library code and the code is
 simplified to determine the length at runtime, like it does today.
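
 As a rough sketch of how library code might consume this, assuming the
 proposed {{{viewCString#}}} existed: the definition below is only the
 stand-in from the comment (the builtin rule is not implemented here), and
 {{{literalLength}}} is a hypothetical helper name, not part of any existing
 library.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts

-- Stand-in for the proposed function, copied from the comment above.
-- Without the builtin rule it always reports "length unknown" (-1#).
{-# INLINE[0] viewCString# #-}
viewCString# :: Addr# -> (# Int#, Addr# #)
viewCString# addr# = (# -1#, addr# #)

-- Hypothetical library helper: byte length of a NUL-terminated literal.
literalLength :: Addr# -> Int
literalLength a# =
  case viewCString# a# of
    (# n#, p# #)
      | isTrue# (n# >=# 0#) -> I# n#           -- length supplied by the rule
      | otherwise           -> I# (scan p# 0#) -- fall back to a runtime scan
  where
    -- Walk the literal until the terminating NUL byte.
    scan :: Addr# -> Int# -> Int#
    scan q# i# =
      case indexCharOffAddr# q# i# of
        '\0'# -> i#
        _     -> scan q# (i# +# 1#)
```

 With only the stand-in in place, {{{literalLength "hello"#}}} takes the
 runtime branch and scans the literal. If the rule fired, the first guard
 would be chosen instead and the scan could be dropped by the simplifier.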

 Why does {{{viewCString#}}} return the {{{Addr#}}} again? If it does not,
 the {{{Addr#}}} given to {{{viewCString#}}} will be used multiple times,
 thus, GHC will bind it in some let, complicating the design of the rule.
 If the function returns it, the library can continue to use the returned
 {{{Addr#}}}, and GHC is less likely to share it.

 One could go even further and extend {{{viewCString#}}} to handle two
 additional cases: Converting the encoding at compile-time as well as
 determining the number of characters of the string. So {{{viewCString#}}}
 would become:

 {{{viewCString# :: Int# -> Addr# -> (# Int#, Addr#, Int#, Int# #)}}}

 The first input determines the requested mode of operation (only count
 bytes and characters, convert to utf16le/be, utf32le/be). The first output
 {{{Int#}}} indicates the operation performed; it should always be either
 the input {{{Int#}}} or some "No operation performed" code. The other two
 {{{Int#}}} results are the number of bytes and the number of characters,
 and the {{{Addr#}}} contains the potentially converted literal. Of course,
 an interface passing around magic {{{Int#}}}s is not the nicest, but I
 think this is quite low-level code and only a few libraries like text and
 bytestring will have to deal with it.
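
 One way a library might hide the magic {{{Int#}}} codes behind a boxed
 wrapper is sketched below. Everything here is an assumption for
 illustration: the {{{Mode}}} constructors and their numbering, the use of
 {{{-1#}}} as the "No operation performed" code, and the names
 {{{LiteralView}}} and {{{viewLiteral}}}. The {{{viewCString#}}} definition is
 again only a stand-in without the builtin rule.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts

-- Hypothetical mode codes; the proposal leaves the concrete values open.
data Mode = CountOnly | Utf16LE | Utf16BE | Utf32LE | Utf32BE
  deriving (Eq, Show, Enum)

-- Stand-in for the extended function. Without the builtin rule it reports
-- "no operation performed" (assumed to be -1#) and returns the address as is.
{-# INLINE[0] viewCString# #-}
viewCString# :: Int# -> Addr# -> (# Int#, Addr#, Int#, Int# #)
viewCString# _ addr# = (# -1#, addr#, -1#, -1# #)

-- Boxed result so that callers need not inspect raw Int# codes.
data LiteralView
  = NotPerformed (Ptr ())       -- untouched literal; inspect it at runtime
  | Performed (Ptr ()) Int Int  -- (converted) literal, bytes, characters

viewLiteral :: Mode -> Addr# -> LiteralView
viewLiteral mode addr# =
  case fromEnum mode of
    I# m# ->
      case viewCString# m# addr# of
        (# op#, p#, bytes#, chars# #)
          -- The operation counts as performed only if the output code
          -- echoes the requested mode.
          | isTrue# (op# ==# m#) -> Performed (Ptr p#) (I# bytes#) (I# chars#)
          | otherwise            -> NotPerformed (Ptr p#)
```

 With the stand-in, every call yields {{{NotPerformed}}}, which corresponds
 to the unoptimized path in which the library does the work at runtime.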

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/5218#comment:36>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler