[GHC] #16014: Avoid unnecessary code duplication from String Literals.

GHC ghc-devs at haskell.org
Sat Dec 8 13:43:23 UTC 2018


#16014: Avoid unnecessary code duplication from String Literals.
-------------------------------------+-------------------------------------
           Reporter:  AndreasK       |             Owner:  (none)
               Type:  task           |            Status:  new
           Priority:  normal         |         Milestone:  8.6.3
          Component:  Compiler       |           Version:  8.6.2
           Keywords:                 |  Operating System:  Unknown/Multiple
       Architecture:                 |   Type of failure:  None/Unknown
  Unknown/Multiple                   |
          Test Case:                 |        Blocked By:
           Blocking:                 |   Related Tickets:
Differential Rev(s):                 |         Wiki Page:
-------------------------------------+-------------------------------------
 When we compile a function like this:

 {{{
 foo :: Int -> String
 foo 1 = "1_String"
 foo 2 = "2_String"
 foo 3 = "3_String"
 foo 4 = "4_String"
 foo 5 = "5_String"
 foo _ = "unknown_string"
 }}}

 We get a few things.

 Obviously there are the actual strings:
 {{{
 -- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
 Foo.foo10 :: GHC.Prim.Addr#
 Foo.foo10 = "1_String"#
 }}}
 There is no way around these so that's fine.

 Then we end up with a function to unpack the actual string.
 {{{
 -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
 Foo.foo9 :: [Char]
 [GblId,
  Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=True,
          WorkFree=False, Expandable=True, Guidance=IF_ARGS [] 20 0}]
 Foo.foo9 = GHC.CString.unpackCString# Foo.foo10
 }}}
 We want to have a closure for each of these. Otherwise we end up
 constructing a new string each time, which would be a waste. So on the
 surface this is fine.
 However we end up with one of these for each string.

 Looking at the Cmm/Asm the functions are not THAT small.
 {{{
 [Foo.foo9_entry() //  [R1]
          { info_tbls: [(c2hj,
                         label: Foo.foo9_info
                         rep: HeapRep static { Thunk }
                         srt: Nothing)]
            stack_info: arg_space: 8 updfr_space: Just 8
          }
      {offset
        c2hj: // global
            if ((Sp + -16) < SpLim) (likely: False) goto c2hk; else goto
 c2hl;
        c2hk: // global
            R1 = R1;
            call (stg_gc_enter_1)(R1) args: 8, res: 0, upd: 8;
        c2hl: // global
            (_c2hg::I64) = call "ccall" arg hints:  [PtrHint,
                                                     PtrHint]  result
 hints:  [PtrHint] newCAF(BaseReg, R1);
            if (_c2hg::I64 == 0) goto c2hi; else goto c2hh;
        c2hi: // global
            call (I64[R1])() args: 8, res: 0, upd: 8;
        c2hh: // global
            I64[Sp - 16] = stg_bh_upd_frame_info;
            I64[Sp - 8] = _c2hg::I64;
            R2 = Foo.foo10_bytes;
            Sp = Sp - 16;
            call GHC.CString.unpackCString#_info(R2) args: 24, res: 0, upd:
 24;
      }
  }
 }}}
 In actual code size (HEAD/-O2) the code + info table is about 100Byte.

 To give an idea in my current ghc-stage2 I have 12,409 calls to
 unpackCString, and I assume 99% of them are from actual strings. So that
 comes out to about one megabyte. Out of a binary size of 84MB, so
 surprisingly significant actually!

 == Now what can we do about this!

 I think the right way to go here is to define a `unpackString` function
 which does the actual work.
 Then we jump into that from a small per-string wrapper.

 So we would end up with code like:
 {{{
 [Foo.foo9_entry() //  [R1]
     c2hj:
         // R1 = Closure pointer
         R2 = Foo.foo10_bytes;
         call (unpackString)(R1,R2) args: 8, res: 0, upd: 8;
  }
 }}}

 Which should be well under 50 bytes. It does cause an extra indirection
 when unpacking the String, but if the performance of that matters using
 String is likely the wrong choice to begin with.

 How to actually get GHC to generate code like that I'm not sure yet.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/16014>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list