[jhc] Hotspots.

Lemmih lemmih at gmail.com
Tue Feb 19 11:38:00 EST 2008


On Feb 16, 2008 12:01 AM, John Meacham <john at repetae.net> wrote:
> On Fri, Feb 15, 2008 at 07:21:55PM +0100, Lemmih wrote:
> > Greetings,
> >
> > I've found a few hotspots that I'll be working on. I'd be very
> > interested in discussing solutions.
> >
> > Performance flaws:
> >  * IdMaps are used to generate new ids.
> >  * Ho files contain huge amounts of duplicate information.
> >  * Ho files aren't saved lazily.
> >  * C code is used for generating atoms.
> >
> > Repeatedly mapping variables to 'const Nothing' is very expensive. It
> > is currently the most expensive procedure in Jhc, taking ~20% CPU time
> > when compiling the base library.
>
> Hmm.. yeah, sometimes I used Maps, sometimes Sets, depending on what I
> already had. Sometimes it helped to switch between them, sometimes
> not; it wasn't always obvious.  Ideally all new id selection will be
> done in Name.Id as part of the general plan to turn Id into a newtype.
> It has the newIds routine; a couple of variants of that to work on
> IdMap and IdSet would be good. If I am just doing a map (const Nothing)
> before passing to the id selection routine, then that can probably just
> be dropped, since the id selection stuff doesn't care about the actual
> values in the map.
>
> The id selection can be finicky. Using Set.size to seed the iterations
> helped a bunch, but I wanted to try a hash function from Id -> Id at
> some point, as it should reduce the time spent linearly probing for an
> open Id.
>
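
For illustration, here is a minimal sketch of id selection that probes
the map or set directly, so no 'map (const Nothing)' pass is needed
first. It assumes Id is a plain Int and uses IntMap/IntSet in place of
IdMap/IdSet; freshBy, newIdSet and newIdMap are made-up names, not the
existing Name.Id API.

```haskell
import qualified Data.IntMap as IM
import qualified Data.IntSet as IS

-- Assumption: Id is a plain Int here; the real type lives in Name.Id.
type Id = Int

-- Probe linearly upwards from a seed until an Id is found that the
-- membership test reports as unused.
freshBy :: (Id -> Bool) -> Id -> Id
freshBy taken seed = until (not . taken) succ seed

-- Set variant: seed the probe with the size of the set.
newIdSet :: IS.IntSet -> Id
newIdSet used = freshBy (`IS.member` used) (IS.size used + 1)

-- Map variant: only the keys are inspected, so the values never need
-- to be rewritten to Nothing beforehand.
newIdMap :: IM.IntMap a -> Id
newIdMap m = freshBy (`IM.member` m) (IM.size m + 1)
```

Swapping the seed for a hash of the size (or of a neighbouring Id)
would only change the second argument to freshBy.
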
> > The base library contains 65 megabytes of uncompressed data. Most of
> > that is duplicate information that disappears when it is compressed.
> > However, parsing that amount of data takes considerable time.
>
> Which duplicate data in particular is concerning you? I am in the
> process of completely reorganizing the Ho file layout so it is probably
> best to hold off here.  Some of the redundancy is there on purpose, but
> most probably isn't.

Each atom is saved ~100 times. A TVr can contain 50k of data and each
TVr is saved ~24 times.
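
One way to avoid this kind of repetition, as a rough sketch only: write
each distinct atom once into a table and store indices everywhere else.
This is not how the Ho writer currently works; Atom is modelled as a
String here, and internAtoms/restoreAtoms are made-up names.

```haskell
import Data.List (mapAccumL)
import qualified Data.Map as M

-- Assumption: an atom is just a String for the purpose of this sketch.
type Atom = String

-- Replace every atom by an index into a table of distinct atoms.
-- The table lists atoms in order of first occurrence.
internAtoms :: [Atom] -> ([Atom], [Int])
internAtoms atoms = (reverse revTable, refs)
  where
    ((_, revTable), refs) = mapAccumL step (M.empty, []) atoms
    step (seen, tbl) a = case M.lookup a seen of
      Just i  -> ((seen, tbl), i)            -- already in the table
      Nothing -> let i = M.size seen         -- next free index
                 in ((M.insert a i seen, a : tbl), i)

-- Inverse of internAtoms: look every index up in the table again.
restoreAtoms :: ([Atom], [Int]) -> [Atom]
restoreAtoms (table, refs) = map (table !!) refs
```

The same idea would apply to TVrs, with the table holding each TVr once
and the rest of the file referring to it by index.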

-- 
Cheers,
  Lemmih

