[jhc] Re: Ids in Jhc.

John Meacham john at repetae.net
Wed Feb 13 22:36:43 EST 2008

On Thu, Feb 14, 2008 at 01:23:08AM +0100, Lemmih wrote:
> I'd like to fix it in another way, if you have no objects.
> By my count there are 120 unique atoms in base-1.0.hl. Those 120 atoms
> are saved ~2344 times each (there's a total of 281338 atoms).
> Compressing the files with gzip reduces the negative effect on file
> size but the overhead of parsing 281k atoms is still there.
> On my system, testing with HelloWorld.hs, parsing those atoms took 16%
> of the runtime and 27% of the memory usage.
> I propose scrapping the current system of random atom ids in favor for
> a per-module map of symbols.
> Using an ADT for the ids would make the system more transparent. I'll
> have to investigate the performance impact of this.

What do you mean? just for saving binary files? it already converts them
to an ADT on the way out and back in since atom numbers can be different
between different runs of the program. 

If you mean using an ADT internally for ids,
stopping doing that is what brought memory usage from gigabytes to
megabytes. The text of the atoms only needs to be stored in a single
place, also the fact they are int's means they can be unboxed in
datatypes, meanding the garbage collector does not even have to care
about them, another huge gain. 

Also, IntSet/IntMap are orders of magnitude faster than Set and Map and
sets/maps of ids are the bed and butter of optimization passes.

my only issue with them is that it is

type Id = Int
instead of
newtype Id = Id Int

I have been working on the binary format with my recent changes, it is
now much less of an issue than it used to be in terms of speed and
memory usage. I modified it to use bytestrings and split the file into
various 'chunks' that can be lazily loaded independently. so when it
only needs the dependency information that is all it needs to pull out.
and so forth. I also put a hard limit on the length of atoms to 256
bytes, which means I can chop 7 bytes off the storage of each one.

By far the biggest win was in Info/Info, which used to store the type of
each field alongside of it as a string, now it stores a hash of the type
as a number, this reduced the file size (after compression) by about 30%.


John Meacham - ⑆repetae.net⑆john⑈

More information about the jhc mailing list