[GHC] #5218: Add unpackCStringLen# to create Strings from string literals
GHC
ghc-devs at haskell.org
Wed Jun 15 10:48:46 UTC 2016
#5218: Add unpackCStringLen# to create Strings from string literals
-------------------------------------+-------------------------------------
Reporter: tibbe | Owner: thoughtpolice
Type: feature request | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.0.3
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: Unknown/Multiple
Type of failure: Runtime performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: #5877 #10064 | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by jscholl):
How about instead of adding a new type {{{String#}}}, as suggested in
#10064, or adding a function {{{unpackCStringLen#}}}, we add the ability
to query the size of the payload of an {{{Addr#}}} at compile time? We
could provide a function which, given an {{{Addr#}}} constant, turns it
into a {{{(# Int#, Addr# #)}}} pair without introducing a new type, thus
keeping the overall changes small and the design flexible.
How do we get the length at compile time? We use a special builtin
rewrite rule, which writes the length to the appropriate places. For
example:
{{{
{-# INLINE[0] viewCString# #-}
viewCString# :: Addr# -> (# Int#, Addr# #)
viewCString# addr# = (# -1#, addr# #)
-- Pseudocode: a builtin rule would supply the right-hand side.
{-# RULES "viewCString#" forall addr . viewCString# addr =
      (# <length of addr's pointee>#, addr #) #-}
}}}
Library code could then call {{{viewCString#}}} to try to determine the
length at compile time. If optimizations are enabled and the argument is a
literal, the rule rewrites the call to the correct result. Otherwise,
{{{viewCString#}}} inlines in phase 0, the library code sees the resulting
-1, and the code simplifies to determining the length at runtime, as it
does today.
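The fallback path described above can be sketched as follows. This is only
an illustration under stated assumptions: {{{viewCString#}}} here is a plain
stand-in without the proposed builtin rule, so it always returns -1, and
{{{strlen#}}} and {{{literalLength}}} are hypothetical helper names that do
not appear in the proposal.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts (Addr#, Int#, Int (I#), (+#), (<#), isTrue#, indexCharOffAddr#)

-- Stand-in for the proposed function. Without the builtin rule it always
-- reports the sentinel -1# ("length unknown").
viewCString# :: Addr# -> (# Int#, Addr# #)
viewCString# addr# = (# -1#, addr# #)
{-# INLINE [0] viewCString# #-}

-- Hypothetical runtime fallback: count bytes up to the terminating NUL.
strlen# :: Addr# -> Int#
strlen# addr# = go 0#
  where
    go :: Int# -> Int#
    go i# = case indexCharOffAddr# addr# i# of
      '\0'# -> i#
      _     -> go (i# +# 1#)

-- Hypothetical library code: use the compile-time length if one was
-- provided, otherwise measure the literal at runtime.
literalLength :: Addr# -> Int
literalLength addr0# =
  case viewCString# addr0# of
    (# n#, addr# #)
      | isTrue# (n# <# 0#) -> I# (strlen# addr#) -- sentinel seen: fall back
      | otherwise          -> I# n#

main :: IO ()
main = print (literalLength "hello"#) -- prints 5
```

Note that the library code continues with the {{{addr#}}} returned by
{{{viewCString#}}} rather than with {{{addr0#}}}. If the rule fired, the
first guard would be statically false and only the constant branch would
remain.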
Why does {{{viewCString#}}} return the {{{Addr#}}} again? If it did not,
the {{{Addr#}}} passed to {{{viewCString#}}} would be used multiple times,
so GHC would bind it in a let, complicating the design of the rule. If the
function returns it, the library can continue with the returned
{{{Addr#}}}, and GHC is less likely to share it.
One could go even further and extend {{{viewCString#}}} to handle two
additional cases: converting the encoding at compile time and
determining the number of characters in the string. {{{viewCString#}}}
would then become:
{{{viewCString# :: Int# -> Addr# -> (# Int#, Addr#, Int#, Int# #)}}}
The input {{{Int#}}} selects the requested mode of operation (only count
bytes and characters, or convert to utf16le/be or utf32le/be). The first
output {{{Int#}}} reports the operation actually performed; it should
always be either the input {{{Int#}}} or some "No operation performed"
code. The other two {{{Int#}}} results are the number of bytes and the
number of characters, and the {{{Addr#}}} contains the potentially
converted literal. Of course, an interface that passes around magic
{{{Int#}}}s is not the nicest, but I think this is quite low-level code
that only a few libraries like text and bytestring will have to deal with.
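A library might hide the magic {{{Int#}}}s behind a small wrapper, roughly
as sketched below. Everything concrete here is an assumption made for
illustration: the {{{Mode}}} type, the numbering of the modes via
{{{fromEnum}}}, the use of -1 as the "No operation performed" code, and the
{{{requestView}}} helper are all hypothetical. The stand-in
{{{viewCString#}}} never performs an operation, as would be the case
without the builtin rule.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts (Addr#, Int#, Int (I#), (==#), isTrue#)

-- Hypothetical names for the modes; the codes are just their positions.
data Mode = CountOnly | ToUtf16LE | ToUtf16BE | ToUtf32LE | ToUtf32BE
  deriving (Show, Eq, Enum)

-- Stand-in for the extended function. It always answers with the assumed
-- "No operation performed" code -1# and returns the literal unchanged.
viewCString# :: Int# -> Addr# -> (# Int#, Addr#, Int#, Int# #)
viewCString# _mode# addr# = (# -1#, addr#, 0#, 0# #)
{-# INLINE [0] viewCString# #-}

-- Hypothetical wrapper: yields (bytes, characters) only if the performed
-- operation matches the requested one.
requestView :: Mode -> Addr# -> Maybe (Int, Int)
requestView mode addr# =
  case fromEnum mode of
    I# mode# ->
      case viewCString# mode# addr# of
        (# done#, _converted#, bytes#, chars# #)
          | isTrue# (done# ==# mode#) -> Just (I# bytes#, I# chars#)
          | otherwise                 -> Nothing -- caller falls back to runtime

main :: IO ()
main = print (requestView CountOnly "hello"#) -- prints Nothing
```

Comparing the first output against the requested code is what lets the
caller distinguish a rewritten call from one that merely inlined.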
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/5218#comment:36>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler