[GHC] #5218: Add unpackCStringLen# to create Strings from string literals
GHC
ghc-devs at haskell.org
Wed Aug 16 03:35:03 UTC 2017
#5218: Add unpackCStringLen# to create Strings from string literals
-------------------------------------+-------------------------------------
Reporter: tibbe | Owner: thoughtpolice
Type: feature request | Status: patch
Priority: normal | Milestone:
Component: Compiler | Version: 7.0.3
Resolution: | Keywords: strings
Operating System: Unknown/Multiple | Architecture: Unknown/Multiple
Type of failure: Runtime performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: #5877 #10064 #11312, #9719 | Differential Rev(s): Phab:D2443
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by winter):
Replying to [comment:74 jscholl]:
> Thinking about the problem again I decided to try to add
> {{{ByteArray#}}} literals to GHC. The idea is the following:
> - Use {{{"foo"##}}} as syntax for {{{ByteArray#}}}s. This is in essence
>   my try for a {{{String#}}} type.
> - Provide
> {{{#!haskell
> unpackStringLit# :: ByteArray# -> [Char]
> {-# INLINE[1] unpackStringLit# #-}
> unpackStringLit# ba# =
>     unpackCStringWithLen# (byteArrayContents# ba#) (sizeofByteArray# ba#)
> }}}
> - Compile {{{"foo"}}} as {{{unpackStringLit# "foo"##}}}
> - Let rewrites fire in phase 2.
> - In phase 1, inline {{{unpackStringLit#}}} and let rules rewrite it to
>   {{{unpackCStringWithLen# "foo"# 3#}}}
> - Thus most {{{ByteArray#}}}s should get eliminated and binary size
>   should stay more or less the same.
> - If someone rewrites something like {{{ByteString.pack
>   (unpackStringLit# lit)}}}, the literal is not eliminated and emitted to
>   the binary. Thus a {{{ByteString}}} literal can increase binary size.
>   However, I think this is what we want because we save making a copy of
>   the data.
> - The downside is that turning optimization off causes the compiler to
>   create a {{{ByteArray#}}} for every string literal instead of a c-string.
>   GHCi will also allocate {{{ByteArray#}}}s instead of string literals
>   directly.
>
The problem is that the old `Addr#` unpacking and the proposed `ByteArray#`
unpacking do not use the same encoding: they disagree on how the `\NUL`
character is encoded (at least, I expect a `ByteArray#` literal to be
standard UTF-8). So you can't simply cast one to the other with rewrite
rules like that; the encoding pitfall at least has to be mentioned.
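
To make the mismatch concrete, here is a minimal sketch. It assumes that `Addr#` string literals are NUL-terminated modified UTF-8, where `\NUL` is written as the two bytes 0xC0 0x80, and that a `ByteArray#` literal would hold standard UTF-8, as expected above. All function names are hypothetical and only model the two byte layouts; none of them is a GHC API.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- | Standard UTF-8 encoding of one character.
-- (Surrogate code points are not treated specially in this sketch.)
encodeCharUtf8 :: Char -> [Word8]
encodeCharUtf8 c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading bits of the code point, placed in the first byte.
    hi s = fromIntegral (n `shiftR` s)
    -- Continuation byte: 10xxxxxx with six payload bits.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- | Modified UTF-8 (assumed layout of Addr# literals): identical to
-- standard UTF-8 except that NUL becomes the overlong pair 0xC0 0x80,
-- so a 0x00 byte can still terminate the C string.
encodeCharModified :: Char -> [Word8]
encodeCharModified '\NUL' = [0xC0, 0x80]
encodeCharModified c      = encodeCharUtf8 c

-- | Hypothetical model of the bytes in a ByteArray# literal.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeCharUtf8

-- | Hypothetical model of the bytes in an Addr# literal
-- (without the trailing terminator).
encodeModifiedUtf8 :: String -> [Word8]
encodeModifiedUtf8 = concatMap encodeCharModified
```

In GHCi, `encodeUtf8 "a\NULb"` gives `[97,0,98]` while `encodeModifiedUtf8 "a\NULb"` gives `[97,192,128,98]`. Both the bytes and the length differ, so a rule that reuses one buffer as the other would only be safe for literals without `\NUL`, or would need a conversion step.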
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/5218#comment:81>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler