[GHC] #5218: Add unpackCStringLen# to create Strings from string literals
GHC
ghc-devs at haskell.org
Fri Jul 28 01:17:59 UTC 2017
#5218: Add unpackCStringLen# to create Strings from string literals
-------------------------------------+-------------------------------------
Reporter: tibbe | Owner: thoughtpolice
Type: feature request | Status: patch
Priority: normal | Milestone:
Component: Compiler | Version: 7.0.3
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: #5877 #10064 | Differential Rev(s): Phab:D2443
#11312 |
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by jscholl):
Thinking about the problem again I decided to try to add {{{ByteArray#}}}
literals to GHC. The idea is the following:
- Use {{{"foo"##}}} as syntax for {{{ByteArray#}}}s. This is in essence
my try for a {{{String#}}} type.
- Provide
{{{#!haskell
unpackStringLit# :: ByteArray# -> [Char]
{-# INLINE[1] unpackStringLit# #-}
unpackStringLit# ba# =
unpackCStringWithLen# (byteArrayContents# ba#) (sizeofByteArray# ba#)
}}}
- Compile {{{"foo"}}} as {{{unpackStringLit# "foo"##}}}
- Let rewrites fire in phase 2.
- In phase 1, inline {{{unpackStringLit#}}} and let rules rewrite it to
{{{unpackCStringWithLen# "foo"# 3#}}}
- Thus most {{{ByteArray#}}}s should get eliminated and binary size
should stay more or less the same.
- If someone rewrites something like {{{ByteString.pack (unpackStringLit#
lit)}}}, the literal is not eliminated and emitted to the binary. Thus a
{{{ByteString}}} literal can increase binary size. However, I think this
is what we want because we save making a copy of the data.
- The downside is that turning optimization off causes the compiler to
create a {{{ByteArray#}}} for every string literal instead of a c-string.
GHCi will also allocate {{{ByteArray#}}}s instead of string literals
directly.
I currently implemented the new literal type, extended the parser, changed
the desugaring, added the needed rules, taught GHCi to handle
{{{ByteArray#}}} literals, and generated cmm Code for them. I still have
to look at all parts involved for fusion and other string rules to work,
check how the change affects bootstrapping with an older compiler, take a
look at template haskell, and whether there are any typeable/generic
things involved.
I don't know how everyone feels about adding another literal type
(especially because there are now two similar types, {{{Addr#}}} and
{{{ByteArray#}}}, but if we want to keep binary sizes down, we need some
form of {{{Addr#}}}, and it seems like having {{{ByteArray#}}} is
beneficial too). Or whether it is reasonable to provide syntax for
{{{ByteArray#}}}s (letting the compiler generate them would be enough for
this ticket). But right now implementing it like this feels a lot better
than my previous approach using unboxed tuples.
And I am sorry for not saying or doing anything such a long time.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/5218#comment:74>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list