[GHC] #15113: Do not make CAFs from literal strings

GHC ghc-devs at haskell.org
Fri Dec 21 11:55:27 UTC 2018


#15113: Do not make CAFs from literal strings
-------------------------------------+-------------------------------------
        Reporter:  simonpj           |                Owner:  (none)
            Type:  bug               |               Status:  patch
        Priority:  normal            |            Milestone:  8.10.1
       Component:  Compiler          |              Version:  8.2.2
      Resolution:                    |             Keywords:  CAFs
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #16014            |  Differential Rev(s):  Phab:D4717
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Old description:

> Currently (as I discovered in #15038), we get the following code for
> `Control.Exception.Base.patError`:
> {{{
> lvl2_r3y3 :: [Char]
> [GblId]
> lvl2_r3y3 = unpackCString# lvl1_r3y2
>
> -- RHS size: {terms: 7, types: 6, coercions: 2, joins: 0/0}
> patError :: forall a. Addr# -> a
> [GblId, Arity=1, Str=<B,U>x, Unf=OtherCon []]
> patError
>   = \ (@ a_a2kh) (s_a1Pi :: Addr#) ->
>       raise#
>         @ SomeException
>         @ 'LiftedRep
>         @ a_a2kh
>         (Control.Exception.Base.$fExceptionPatternMatchFail_$ctoException
>            ((untangle s_a1Pi lvl2_r3y3)
>             `cast` (Sym (Control.Exception.Base.N:PatternMatchFail[0])
>                     :: (String :: *) ~R# (PatternMatchFail :: *))))
> }}}
> That stupid `lvl2_r3y3 :: String` is a CAF, and hence `patError` has CAF-
> refs, and hence so does any function that calls `patError`, and any
> function that calls them.
>
> That's bad! Lots more CAF entries in SRTs, lots more work traversing
> those SRTs in the garbage collector.  And for what?  To share the work of
> unpacking a C string!  This is nuts.
>
> What to do?
>
> * Somehow refrain from floating `unpackCString# lit` to top level, even
> if you could otherwise do so. But that seems very ad-hoc, and it makes the
> function bigger and less inlinable.
>
> * Treat a top level definition
> {{{
> x :: [Char]
> x = unpackCString# y
> }}}
>   as NOT a CAF, and make it single-entry so that the thunk is not
> updated.  Then every use of `x` will unpack the string afresh, which is
> probably a good idea anyhow.
>
>   I like this more.  It would be implemented somewhere in the code
> generator.

New description:

 Currently (as I discovered in #15038), we get the following code for
 `Control.Exception.Base.patError`:
 {{{
 lvl2_r3y3 :: [Char]
 [GblId]
 lvl2_r3y3 = unpackCString# lvl1_r3y2

 -- RHS size: {terms: 7, types: 6, coercions: 2, joins: 0/0}
 patError :: forall a. Addr# -> a
 [GblId, Arity=1, Str=<B,U>x, Unf=OtherCon []]
 patError
   = \ (@ a_a2kh) (s_a1Pi :: Addr#) ->
       raise#
         @ SomeException
         @ 'LiftedRep
         @ a_a2kh
         (Control.Exception.Base.$fExceptionPatternMatchFail_$ctoException
            ((untangle s_a1Pi lvl2_r3y3)
             `cast` (Sym (Control.Exception.Base.N:PatternMatchFail[0])
                     :: (String :: *) ~R# (PatternMatchFail :: *))))
 }}}
 That stupid `lvl2_r3y3 :: String` is a CAF, and hence `patError` has CAF-
 refs, and hence so does any function that calls `patError`, and any
 function that calls them.
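
 The way CAF-refs spread through callers can be sketched as a fixed point
 over a call graph. This is a toy model only, not GHC's actual SRT
 analysis: the graph, the names `f`, `g`, `h`, and the helper `hasCafRefs`
 are all hypothetical, and `untangle` is simply assumed to be CAF-free for
 the sake of the illustration.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Name = String

-- | Hypothetical call graph: each top-level binding maps to the
-- top-level bindings its right-hand side mentions.
type CallGraph = Map.Map Name [Name]

-- | Given the set of bindings that are CAFs themselves, compute every
-- binding that has CAF refs, directly or transitively. Iterates until
-- no new binding is added.
hasCafRefs :: Set.Set Name -> CallGraph -> Set.Set Name
hasCafRefs cafs graph = go cafs
  where
    go acc
      | acc' == acc = acc
      | otherwise   = go acc'
      where
        acc' = Set.union acc (Set.fromList
                 [ f | (f, callees) <- Map.toList graph
                     , any (`Set.member` acc) callees ])

-- | Toy graph modelled on the ticket: lvl2 is the floated string CAF.
exampleGraph :: CallGraph
exampleGraph = Map.fromList
  [ ("lvl2",     [])
  , ("untangle", [])                    -- assumed CAF-free here
  , ("patError", ["lvl2", "untangle"])
  , ("f",        ["patError"])          -- hypothetical caller
  , ("g",        ["f"])                 -- hypothetical caller of a caller
  , ("h",        ["untangle"])          -- never reaches lvl2
  ]
```

 With `lvl2` as the only CAF, `hasCafRefs (Set.fromList ["lvl2"])
 exampleGraph` contains `lvl2`, `patError`, `f` and `g`, but not `h`. If
 `lvl2` were not a CAF, the result would be empty.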

 That's bad! Lots more CAF entries in SRTs, lots more work traversing those
 SRTs in the garbage collector.  And for what?  To share the work of
 unpacking a C string!  This is nuts.

 What to do?

 1. Somehow refrain from floating `unpackCString# lit` to top level, even
 if you could otherwise do so. But that seems very ad-hoc, and it makes the
 function bigger and less inlinable.

 2. Treat a top level definition
 {{{
 x :: [Char]
 x = unpackCString# y
 }}}
   as NOT a CAF, and make it single-entry so that the thunk is not updated.
 Then every use of `x` will unpack the string afresh, which is probably a
 good idea anyhow.

   I like this more.  It would be implemented somewhere in the code
 generator.
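
 The trade-off in alternative (2) can be illustrated with a small model
 of the two thunk behaviours. This is only a sketch using `IORef`s, not
 how the code generator represents thunks; `Updatable`, `forceUpdatable`,
 `forceSingleEntry` and `unpackCounting` are hypothetical names, and the
 counter stands in for the work of unpacking a C string.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef, modifyIORef')

-- | Toy model of an updatable thunk: either the code still to run,
-- or the value it was overwritten with after the first force.
type Updatable a = IORef (Either (IO a) a)

newUpdatable :: IO a -> IO (Updatable a)
newUpdatable = newIORef . Left

-- | Force an updatable thunk: run the code once, then cache the result.
forceUpdatable :: Updatable a -> IO a
forceUpdatable ref = do
  st <- readIORef ref
  case st of
    Right v  -> pure v
    Left run -> do
      v <- run
      writeIORef ref (Right v)
      pure v

-- | Model of a single-entry (non-updated) thunk: every force reruns the code.
forceSingleEntry :: IO a -> IO a
forceSingleEntry run = run

-- | Stand-in for unpacking a literal: bumps a work counter, returns the string.
unpackCounting :: IORef Int -> String -> IO String
unpackCounting counter s = do
  modifyIORef' counter (+ 1)
  pure s

-- | Force each kind of thunk twice and report how often the unpacking ran.
demo :: IO (Int, Int)
demo = do
  c1 <- newIORef 0
  t  <- newUpdatable (unpackCounting c1 "blah")
  _  <- forceUpdatable t
  _  <- forceUpdatable t
  c2 <- newIORef 0
  _  <- forceSingleEntry (unpackCounting c2 "blah")
  _  <- forceSingleEntry (unpackCounting c2 "blah")
  (,) <$> readIORef c1 <*> readIORef c2
```

 Here `demo` yields `(1, 2)`: the updatable thunk unpacks once and keeps
 the result alive, while the single-entry version repeats the (cheap)
 unpacking on each use and retains nothing.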

--

Comment (by simonpj):

 Looking at #16014, I like alternative (2) from the Description better and
 better.  If we spot
 {{{
 x = unpackCString# "blah"#
 }}}
 in the code generator, we could allocate a top-level closure with
 * Info-pointer: `rtsUnpackString_info`
 * One word of payload, a pointer to the literal string `"blah"#`.

 Now we can hand-write the single blob of code (plus info table)
 `rtsUnpackString_info` to unpack the string.  Easy!  And the overhead per
 string is only two words (for the closure) rather than all the stuff
 described in #16014.
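
 The closure layout proposed above can be modelled in a few lines. This is
 a sketch under stated assumptions, not RTS code: `InfoTable`, `Closure`,
 `mkStringClosure` and `enter` are hypothetical names, a list of bytes
 stands in for the pointer to the NUL-terminated literal, and only ASCII
 literals are considered.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- | Toy stand-in for an info table: here it carries only the entry code.
newtype InfoTable = InfoTable { entryCode :: [Word8] -> String }

-- | Two "words" per string: the info pointer and one word of payload.
data Closure = Closure
  { infoPtr :: InfoTable
  , payload :: [Word8]   -- models a pointer to NUL-terminated bytes
  }

-- | The single shared blob of code: read bytes up to the NUL terminator.
-- Named after the rtsUnpackString_info suggested in the comment.
rtsUnpackStringInfo :: InfoTable
rtsUnpackStringInfo =
  InfoTable (map (chr . fromIntegral) . takeWhile (/= 0))

-- | Build the closure for a literal (ASCII assumed in this sketch).
mkStringClosure :: String -> Closure
mkStringClosure s =
  Closure rtsUnpackStringInfo (map (fromIntegral . fromEnum) s ++ [0])

-- | Entering the closure runs the shared entry code on the payload.
-- Nothing is updated, so each entry unpacks the literal afresh.
enter :: Closure -> String
enter c = entryCode (infoPtr c) (payload c)
```

 For example, `enter (mkStringClosure "blah")` evaluates to `"blah"`, and
 every string closure shares the one `rtsUnpackStringInfo` value, which is
 where the two-words-per-string figure comes from.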

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15113#comment:11>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

