[GHC] #11312: GHC inlining primitive string literals can affect program output

GHC ghc-devs at haskell.org
Wed Jan 6 20:25:45 UTC 2016


#11312: GHC inlining primitive string literals can affect program output
-------------------------------------+-------------------------------------
        Reporter:  RyanGlScott       |                Owner:
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  7.10.3
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Incorrect result  |  Unknown/Multiple
  at runtime                         |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #11292            |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Changes (by simonpj):

 * cc: ekmett, core-libraries-committee (added)


Comment:

 Let's separate two things:

 * Top-level unboxed string literals: #8472
 * Not using `Addr#` for string literals: this ticket.

 Here's my summary for this ticket, after talking to Simon M.

 * It's plain wrong to use `Addr#` as the type of a string literal.  If we
 do so, there is no reliable way to compute equality for
 {{{
 data T = MkT Addr# deriving( Eq )
 }}}
   Since the `Addr#` might come from `malloc` or something, it must compare
 using equality on `Addr#`.  But then there is no guarantee that `MkT
 "foo"# == MkT "foo"#`.

 * So we need a new type for unlifted string literals, say `String#`.  It
 could be primitive, and that's what I'll assume for now.

 * Of course the underlying representation will be the same as `Addr#`.
 But there should be no operation `get :: String# -> Addr#` (except maybe
 in the IO monad), else it'd be possible that `get "foo"#` might be not-equal
 to `get "foo"#`.

 * What operations do we need on `String#`?  Presumably at least
 {{{
 eqString#    :: String# -> String# -> Int#   -- Like eqChar#
 cmpString#   :: String# -> String# -> Int#   -- 3-way compare
 lenString#   :: String# -> Int#              -- Number of chars
 indexString# :: String# -> Int# -> Char#   -- Get the ith char
 }}}

 * NB: I'm deliberately not saying that the string is null-terminated.
 That'd be up to the implementation of `String#`, provided it offered the
 above operations.  A better representation might be a record of a length
 and a blob of bytes.

 * Could `String#` simply be a `ByteArray#`?
 {{{
 type String# = ByteArray#
 }}}
   After all, `ByteArray#` already has primops `sizeofByteArray#` and
 `indexCharArray#`.  We'd just need a way to have a statically-allocated
 `ByteArray#`, but that would be an excellent thing anyway.  e.g. I believe
 that Happy mis-uses literal strings to allow it to build a statically-
 allocated array.   Avoiding yet another primitive type would be a relief.

   See also #5218 and #9577

 * I'm not sure about how Unicode plays with all of this.

 * This would be a potentially breaking change for any code using unboxed
 string literals.  I'm copying the Core Libraries Committee.
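
 To make the address-versus-contents distinction above concrete, here is a
 minimal sketch using today's `Addr#` literals.  The names `sameAddr`,
 `lenLit`, `indexLit` and `eqLit` are hypothetical stand-ins, not proposed
 primops; they return boxed results for readability, and they assume the
 literal is NUL-terminated, which the proposal above deliberately does not
 promise.  A three-way compare is omitted.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts

-- Address equality: what an Eq instance over Addr# has to fall back on.
-- Whether two occurrences of the same literal share an address is up to
-- the compiler, so the result on literals is not something to rely on.
sameAddr :: Addr# -> Addr# -> Bool
sameAddr a b = isTrue# (eqAddr# a b)

-- Hypothetical stand-in for lenString#, assuming NUL termination.
lenLit :: Addr# -> Int
lenLit a = go 0#
  where
    go i = case indexCharOffAddr# a i of
      '\0'# -> I# i
      _     -> go (i +# 1#)

-- Hypothetical stand-in for indexString# (no bounds check).
indexLit :: Addr# -> Int -> Char
indexLit a (I# i) = C# (indexCharOffAddr# a i)

-- Hypothetical stand-in for eqString#: compares contents, not addresses.
eqLit :: Addr# -> Addr# -> Bool
eqLit a b = go 0#
  where
    go i
      | x /= y    = False          -- first differing byte
      | x == '\0' = True           -- both ended at the same position
      | otherwise = go (i +# 1#)
      where
        x = C# (indexCharOffAddr# a i)
        y = C# (indexCharOffAddr# b i)
```

 Loaded into GHCi, `eqLit "foo"# "foo"#` depends only on the bytes, whereas
 `sameAddr "foo"# "foo"#` depends on whether the two literals happen to be
 shared, which is exactly the property a `String#` type would hide.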

 I'd love someone to take this up.
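
 For the `ByteArray#` alternative raised above, a rough sketch of the same
 operations on top of the existing primops might look as follows.  The
 wrapper `Bytes` and the functions `packBytes`, `lenBytes`, `indexBytes` and
 `eqBytes` are hypothetical names.  Note that `packBytes` allocates on the
 heap at run time; the statically-allocated `ByteArray#` that the proposal
 would need is not something this sketch provides.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts
import GHC.ST (ST (..), runST)

-- Boxed wrapper so the unlifted array can be passed around normally.
data Bytes = Bytes ByteArray#

-- Copy a String into a fresh byte array, one byte per Char
-- (code points above 255 are truncated to their low 8 bits).
packBytes :: String -> Bytes
packBytes str = case length str of
  I# len -> runST (ST (\s0 ->
    case newByteArray# len s0 of
      (# s1, mba #) ->
        case unsafeFreezeByteArray# mba (fill mba 0# str s1) of
          (# s2, ba #) -> (# s2, Bytes ba #)))
  where
    fill :: MutableByteArray# d -> Int# -> String -> State# d -> State# d
    fill _   _ []          s = s
    fill mba i (C# c : cs) s =
      fill mba (i +# 1#) cs (writeCharArray# mba i c s)

-- Length in bytes, straight from the array header.
lenBytes :: Bytes -> Int
lenBytes (Bytes ba) = I# (sizeofByteArray# ba)

-- Read the i-th byte as a Char (no bounds check).
indexBytes :: Bytes -> Int -> Char
indexBytes (Bytes ba) (I# i) = C# (indexCharArray# ba i)

-- Content equality: same length and same bytes.
eqBytes :: Bytes -> Bytes -> Bool
eqBytes x y =
  lenBytes x == lenBytes y
    && all (\i -> indexBytes x i == indexBytes y i) [0 .. lenBytes x - 1]
```

 Because the length is stored with the array, this version needs no
 terminator and can hold embedded NUL bytes.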

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/11312#comment:3>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler