Ways and Build Tags for Optimisation

Ashley Yakeley ashley@semantic.org
Thu, 29 May 2003 20:40:59 -0700


In article 
<9584A4A864BD8548932F2F88EB30D1C60D17DC7F@TVP-MSG-01.europe.corp.microso
ft.com>,
 "Simon Marlow" <simonmar@microsoft.com> wrote:

> > -rw-r--r--    1 ashley   ashley    2117554 May 28 04:04 HBase.hi
> > -rw-r--r--    1 ashley   ashley    2119865 May 28 08:15 HBase.p_hi
> > -rw-r--r--    1 ashley   ashley      72669 May 28 16:20 HBase.q_hi
> 
> Wow :-)

It looks like the problem is the very data-heavy Unicode property files. 
For instance, Org.Org.Semantic.HBase.Text.UnicodeNames exports just one 
value:

   getCharacterName :: Char -> String

Inside the module is an "Array Char String" created from a 
"[(Char,String)]" that is a long list of Unicode character names. The 
file is automatically generated from a downloaded data file. For 
instance:

> getCharacterName '\x189F'
"MONGOLIAN LETTER MANCHU ALI GALI DDHA"
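
To make the shape of the module concrete, here is a cut-down sketch. It 
is not the generated file itself: the names nameList and nameTable, the 
three-entry excerpt, and the empty-string default for unlisted 
characters are all assumptions made for illustration. The real module 
would export only getCharacterName.

```haskell
import Data.Array (Array, accumArray, bounds, inRange, (!))

-- Hypothetical three-entry excerpt; the generated list is far longer.
nameList :: [(Char, String)]
nameList =
  [ ('\x0041', "LATIN CAPITAL LETTER A")
  , ('\x0042', "LATIN CAPITAL LETTER B")
  , ('\x189F', "MONGOLIAN LETTER MANCHU ALI GALI DDHA")
  ]

-- The lookup table built from the association list. Code points inside
-- the bounds that have no entry get "" (an assumption of this sketch).
nameTable :: Array Char String
nameTable = accumArray (\_ name -> name) "" ('\x0041', '\x189F') nameList

-- The single value the real module exports.
getCharacterName :: Char -> String
getCharacterName c
  | inRange (bounds nameTable) c = nameTable ! c
  | otherwise                    = ""
```

Only getCharacterName is meant to be visible from outside; nameList and 
nameTable are internal, which is why I did not expect their contents to 
show up in the interface file.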

For some reason, even though only getCharacterName is exported, when 
optimisation is switched on, the interface file balloons a thousandfold:

$ ls -l UnicodeNames.*hi
-rw-r--r--    1 ashley   ashley    5854480 May 28 02:49 UnicodeNames.hi
-rw-r--r--    1 ashley   ashley    5854497 May 28 06:56 UnicodeNames.p_hi
-rw-r--r--    1 ashley   ashley       2385 May 28 15:59 UnicodeNames.q_hi

What's the best way to stop this? Is it reasonable to simply switch off 
optimisation just for these few files?

Also, I'd like to make all that data disappear when a binary program 
that doesn't use it is stripped; currently it doesn't. Any ideas?

-- 
Ashley Yakeley, Seattle WA