[GHC] #16014: Avoid unnecessary code duplication from String Literals.
GHC
ghc-devs at haskell.org
Sat Dec 8 13:43:23 UTC 2018
#16014: Avoid unnecessary code duplication from String Literals.
-------------------------------------+-------------------------------------
Reporter: AndreasK | Owner: (none)
Type: task | Status: new
Priority: normal | Milestone: 8.6.3
Component: Compiler | Version: 8.6.2
Keywords: | Operating System: Unknown/Multiple
Architecture: | Type of failure: None/Unknown
Unknown/Multiple |
Test Case: | Blocked By:
Blocking: | Related Tickets:
Differential Rev(s): | Wiki Page:
-------------------------------------+-------------------------------------
When we compile a function like this:
{{{
foo :: Int -> String
foo 1 = "1_String"
foo 2 = "2_String"
foo 3 = "3_String"
foo 4 = "4_String"
foo 5 = "5_String"
foo _ = "unknown_string"
}}}
We get a few things.
Obviously there are the actual strings:
{{{
-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
Foo.foo10 :: GHC.Prim.Addr#
Foo.foo10 = "1_String"#
}}}
There is no way around these so that's fine.
Then we end up with a function to unpack the actual string.
{{{
-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
Foo.foo9 :: [Char]
[GblId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=True,
WorkFree=False, Expandable=True, Guidance=IF_ARGS [] 20 0}]
Foo.foo9 = GHC.CString.unpackCString# Foo.foo10
}}}
We want to have a closure for each of these. Otherwise we end up
constructing a new string each time, which would be a waste. So on the
surface this is fine.
However we end up with one of these for each string.
Looking at the Cmm/Asm the functions are not THAT small.
{{{
[Foo.foo9_entry() // [R1]
{ info_tbls: [(c2hj,
label: Foo.foo9_info
rep: HeapRep static { Thunk }
srt: Nothing)]
stack_info: arg_space: 8 updfr_space: Just 8
}
{offset
c2hj: // global
if ((Sp + -16) < SpLim) (likely: False) goto c2hk; else goto
c2hl;
c2hk: // global
R1 = R1;
call (stg_gc_enter_1)(R1) args: 8, res: 0, upd: 8;
c2hl: // global
(_c2hg::I64) = call "ccall" arg hints: [PtrHint,
PtrHint] result
hints: [PtrHint] newCAF(BaseReg, R1);
if (_c2hg::I64 == 0) goto c2hi; else goto c2hh;
c2hi: // global
call (I64[R1])() args: 8, res: 0, upd: 8;
c2hh: // global
I64[Sp - 16] = stg_bh_upd_frame_info;
I64[Sp - 8] = _c2hg::I64;
R2 = Foo.foo10_bytes;
Sp = Sp - 16;
call GHC.CString.unpackCString#_info(R2) args: 24, res: 0, upd:
24;
}
}
}}}
In actual code size (HEAD/-O2) the code + info table is about 100Byte.
To give an idea in my current ghc-stage2 I have 12,409 calls to
unpackCString, and I assume 99% of them are from actual strings. So that
comes out to about one megabyte. Out of a binary size of 84MB, so
surprisingly significant actually!
== Now what can we do about this!
I think the right way to go here is to define a `unpackString` function
which does the actual work.
Then we jump into that from a small per-string wrapper.
So we would end up with code like:
{{{
[Foo.foo9_entry() // [R1]
c2hj:
// R1 = Closure pointer
R2 = Foo.foo10_bytes;
call (unpackString)(R1,R2) args: 8, res: 0, upd: 8;
}
}}}
Which should be well under 50 bytes. It does cause an extra indirection
when unpacking the String, but if the performance of that matters using
String is likely the wrong choice to begin with.
How to actually get GHC to generate code like that I'm not sure yet.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/16014>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list