[GHC] #5218: Add unpackCStringLen# to create Strings from string literals

GHC ghc-devs at haskell.org
Fri Aug 18 15:56:59 UTC 2017


#5218: Add unpackCStringLen# to create Strings from string literals
-------------------------------------+-------------------------------------
        Reporter:  tibbe             |                Owner:  thoughtpolice
            Type:  feature request   |               Status:  patch
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  7.0.3
      Resolution:                    |             Keywords:  strings
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #5877 #10064      |  Differential Rev(s):  Phab:D2443
  #11312, #9719                      |
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by simonpj):

 I'm struggling to grok this ticket, especially '''what problem are we
 trying to solve?'''  I'm also concerned about making things too
 complicated.

 ''jscholl in comment:74 sounds right on target to me''.  Here's my
 thinking, written out.  Let's see if we agree at least about the "Goals"
 and "Core" parts.

 == Goals ==

 I believe that one goal is

 * '''The ability to put a block of binary data in the program code,
 without heavy encoding.'''

 Is that a goal?  Can we focus solely on that for a while?

 == Core ==

 To meet that goal, in Core, we need

 * A primitive data type `B#` whose values are simply blobs of binary data.
 * Some operations over this type; e.g. `lenB :: B# -> Int`, or
 `unpackString :: B# -> [Char]` or whatever.
 * Literal values (in Core) for `B#` values.

 `B#` plays the role of the `(# Int#, Addr# #)` representation mentioned
 above (comment:38 ff), but without being so concrete.
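
 As a rough illustration of the three ingredients listed above, here is a
 minimal sketch that takes the `ByteArray#` option for `B#`.  Everything
 named here is an assumption made for illustration: `Blob` is a hypothetical
 boxed wrapper, `packBlob` merely stands in for a real Core literal (which
 does not exist today), and `unpackString` reads each byte as a Latin-1
 character.  Only the `GHC.Exts` primops are real.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts
import GHC.ST (ST (..), runST)

-- Hypothetical boxed wrapper playing the role of the placeholder B#.
data Blob = Blob ByteArray#

-- lenB: the number of bytes in the blob.  The length travels with the data.
lenB :: Blob -> Int
lenB (Blob ba) = I# (sizeofByteArray# ba)

-- unpackString: read every byte back as a Char in the range 0..255.
unpackString :: Blob -> [Char]
unpackString b@(Blob ba) = map byteAt [0 .. lenB b - 1]
  where
    byteAt (I# i) = C# (indexCharArray# ba i)

-- Stand-in for a literal: copy the low 8 bits of each Char into a fresh array.
packBlob :: [Char] -> Blob
packBlob cs =
  case length cs of
    I# n -> runST (ST (\s0 ->
      case newByteArray# n s0 of
        (# s1, mba #) ->
          case unsafeFreezeByteArray# mba (fill mba 0# cs s1) of
            (# s2, ba #) -> (# s2, Blob ba #)))
  where
    fill :: MutableByteArray# s -> Int# -> [Char] -> State# s -> State# s
    fill _ _ [] s = s
    fill mba i (C# c : rest) s =
      fill mba (i +# 1#) rest (writeCharArray# mba i c s)
```

 Loaded into GHCi, `lenB (packBlob "ab\0cd")` is 5: an embedded zero byte is
 ordinary data, because the length is stored with the array rather than
 inferred from a terminator.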

 I'm only using "`B#`" as a placeholder; we need a proper name for it!  So
 what is it, precisely?

 * `B#` could be a completely new primitive type.

 * Or `B#` could be `ByteArray#`.  That would have the major advantage of
 not adding a new type, and for sure we'd need to be able to turn it into a
 `ByteArray#`.  So I like that, and it's what jscholl suggests in
 comment:74.


 * But `B#` can't be `Addr#` (which is a memory address)!  Also look at
 #11312, which is highly relevant because it has the same conclusion.  In
 #11312, I call this new type `String#`, but that's too character-oriented.
 I think we should focus on binary data.  But adopting `B#` would fix the
 ghastly problems in #11312.
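
 One way to see why a bare address is not enough for binary data: a
 primitive string literal has type `Addr#`, and `unpackCString#` has nothing
 to go on except a NUL terminator.  The sketch below is my own illustration,
 not something taken from either ticket; the name `decoded` is hypothetical,
 and it assumes `GHC.CString` from `ghc-prim` is available, as it is in a
 standard GHC installation.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.CString (unpackCString#)

-- "ab\0cd"# is an Addr# literal: a pointer to the bytes, with no length.
-- unpackCString# walks the memory until it meets the first zero byte,
-- so the bytes after the embedded NUL are never seen.
decoded :: String
decoded = unpackCString# "ab\0cd"#
```

 Here `decoded` is `"ab"`, so two of the five bytes in the literal are
 silently lost.  A type that carries its own length avoids this.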

 == Haskell ==

 If we had this new primitive type, we'd soon want literals for it in
 Haskell source code.

 * We could have a new literal syntax (about whose details I am intensely
 relaxed).  After all, the literals of a language should be expressible, I
 suppose.
 * But we could instead say that you can only get one via a TH quasiquote,
 e.g. `[bytes| fec923ac |]`.  Is that so terrible?

 Note that everything in comment:84 belongs in this section.  By the time
 we get to Core all this typeclass stuff has gone away.
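
 To make the quasiquote option concrete, here is a minimal sketch of what a
 `bytes` quasiquoter could look like with today's Template Haskell.  It is
 an assumption-laden illustration, not a proposal: `parseHex` is a
 hypothetical helper, and because no `B#` literal exists yet the quote simply
 expands to a list of numeric literals, which the use site can type as
 `[Word8]`.

```haskell
import Data.Char (digitToInt, isHexDigit, isSpace)
import Data.Word (Word8)
import Language.Haskell.TH (Q, integerL, listE, litE)
import Language.Haskell.TH.Quote (QuasiQuoter (..))

-- Hypothetical helper: turn "fec923ac" (whitespace ignored) into bytes.
parseHex :: String -> Either String [Word8]
parseHex = go . filter (not . isSpace)
  where
    go [] = Right []
    go (hi : lo : rest)
      | isHexDigit hi && isHexDigit lo =
          (fromIntegral (16 * digitToInt hi + digitToInt lo) :) <$> go rest
    go _ = Left "expected pairs of hexadecimal digits"

-- Expression-only quasiquoter; malformed input is a compile-time error.
bytes :: QuasiQuoter
bytes = QuasiQuoter
  { quoteExp = \src -> case parseHex src of
      Left err -> fail ("bytes: " ++ err)
      Right ws -> listE [litE (integerL (toInteger w)) | w <- ws]
  , quotePat = unsupported "pattern"
  , quoteType = unsupported "type"
  , quoteDec = unsupported "declaration"
  }
  where
    unsupported :: String -> String -> Q a
    unsupported ctx _ = fail ("bytes: not usable in a " ++ ctx ++ " context")
```

 Because of the Template Haskell stage restriction, `bytes` has to live in a
 different module from its uses.  A client module would enable
 `QuasiQuotes` and write `blob = [bytes| fec923ac |] :: [Word8]`.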

 == Other goals ==

 I'm not clear on how `bytestring` would want to convert a `ByteArray#` to
 a `ByteString`.  That ought to be a constant-time operation.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/5218#comment:86>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

