[GHC] #11312: GHC inlining primitive string literals can affect program output
GHC
ghc-devs at haskell.org
Wed Jan 6 20:25:45 UTC 2016
#11312: GHC inlining primitive string literals can affect program output
-------------------------------------+-------------------------------------
Reporter: RyanGlScott | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.10.3
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: Unknown/Multiple
Type of failure: Incorrect result at runtime | Test Case:
Blocked By: | Blocking:
Related Tickets: #11292 | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Changes (by simonpj):
* cc: ekmett, core-libraries-committee (added)
Comment:
Let's separate two things:
* Top-level unboxed string literals: #8472
* Not using `Addr#` for string literals: this ticket.
Here's my summary for this ticket, after talking to Simon M.
* It's plain wrong to use `Addr#` as the type of a string literal. If we
do so, there is no reliable way to compute equality for
{{{
data T = MkT Addr# deriving( Eq )
}}}
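Since the `Addr#` might come from `malloc` or something, the derived
instance must compare using equality on `Addr#`. But then there is no
guarantee that `MkT "foo"# == MkT "foo"#`, because the two literals may end
up at different addresses (for example after inlining).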
* So we need a new type for unlifted string literals, say `String#`. It
could be primitive, and that's what I'll assume for now.
* Of course the underlying representation will be the same as `Addr#`.
But there should be no operation `get :: String# -> Addr#` (except maybe
in the IO monad), else it would be possible for one `get "foo"#` to be
not-equal to another `get "foo"#`.
* What operations do we need on `String#`? Presumably at least
{{{
eqString# :: String# -> String# -> Int# -- Like eqChar#
cmpString# :: String# -> String# -> Int# -- 3-way compare
lenString# :: String# -> Int# -- Number of chars
indexString# :: String# -> Int# -> Char# -- Get the ith char
}}}
* NB: I'm deliberately not saying that the string is null-terminated.
That would be up to the implementation of `String#`, provided it offered the
above operations. A better representation might be a record of a length
and a blob of bytes.
* Could `String#` simply be a `ByteArray#`?
{{{
type String# = ByteArray#
}}}
After all, `ByteArray#` already has primops `sizeofByteArray#` and
`indexCharArray#`. We'd just need a way to have a statically-allocated
`ByteArray#`, but that would be an excellent thing anyway. e.g. I believe
that Happy mis-uses literal strings to allow it to build a statically-
allocated array. Avoiding yet another primitive type would be a relief.
See also #5218 and #9577.
* I'm not sure about how Unicode plays with all of this.
* This would be a potentially breaking change for any code using unboxed
string literals. I'm copying the Core Libraries Committee.
I'd love someone to take this up.
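To make the points above concrete, here is a rough sketch, not a proposed implementation. It contrasts address equality on `Addr#` with content equality on a `ByteArray#`-backed stand-in for `String#`. The names `LitString`, `fromLiteral`, `addrLength`, `lenString`, `indexString`, `eqString` and `cmpString` are hypothetical: boxed approximations of the operations listed above, not existing GHC primops. The sketch also assumes that today's `"foo"#` literals are NUL-terminated and hold 8-bit characters, and it copies the literal at runtime because there is currently no statically-allocated `ByteArray#`.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts
import GHC.ST (ST (..), runST)

-- The problematic type from the ticket, with equality written by hand.
-- It can only compare addresses, so the result of
-- MkT "foo"# == MkT "foo"# depends on where each literal ends up.
data T = MkT Addr#

instance Eq T where
  MkT a == MkT b = isTrue# (eqAddr# a b)

-- Hypothetical boxed stand-in for the proposed String#,
-- backed by a ByteArray# that carries its own length.
data LitString = LitString ByteArray#

-- Assumption: the literal is NUL-terminated, as Addr# literals are today.
addrLength :: Addr# -> Int
addrLength addr = go 0
  where
    go :: Int -> Int
    go i@(I# i#)
      | C# (indexCharOffAddr# addr i#) == '\0' = i
      | otherwise = go (i + 1)

-- Copy the bytes of a literal into a fresh ByteArray# (a runtime copy,
-- standing in for the statically-allocated array wished for above).
fromLiteral :: Addr# -> LitString
fromLiteral addr = case addrLength addr of
  I# n# -> runST (ST (\s0 ->
    case newByteArray# n# s0 of
      (# s1, mba #) -> case copyAddrToByteArray# addr mba 0# n# s1 of
        s2 -> case unsafeFreezeByteArray# mba s2 of
          (# s3, ba #) -> (# s3, LitString ba #)))

-- Number of 8-bit characters (compare lenString#).
lenString :: LitString -> Int
lenString (LitString ba) = I# (sizeofByteArray# ba)

-- Get the ith character, with no bounds check (compare indexString#).
indexString :: LitString -> Int -> Char
indexString (LitString ba) (I# i#) = C# (indexCharArray# ba i#)

toChars :: LitString -> [Char]
toChars s = [indexString s i | i <- [0 .. lenString s - 1]]

-- Equality and 3-way comparison on contents, not on addresses.
eqString :: LitString -> LitString -> Bool
eqString a b = toChars a == toChars b

cmpString :: LitString -> LitString -> Ordering
cmpString a b = compare (toChars a) (toChars b)
```

With these definitions, `eqString (fromLiteral "foo"#) (fromLiteral "foo"#)` compares bytes, so it does not depend on whether the two literals share an address. The sketch says nothing about Unicode: `indexCharArray#` reads one byte per character.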
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/11312#comment:3>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler