[jhc] Re: Ids in Jhc.

Wed Feb 13 19:23:08 EST 2008

On Feb 14, 2008 12:50 AM, John Meacham <john at repetae.net> wrote:
> (I'm gonna forward this response to the jhc list too, if that's okay)
>
>
> On Thu, Feb 14, 2008 at 12:26:49AM +0100, Lemmih wrote:
> > I've been wondering about your use of Ids in Jhc.
> > As far as I can see, the rules for Ids goes as following:
> >  * named ids are odd
> >  * unnamed ids are even
> >  * id '0' has special significance.
> >
> > Can you tell me a bit about how unnamed ids are used, why id '0' has
> > special meaning, and how you generate new, unique unnamed ids.
>
> Yes. that is correct.
>
> named and unnamed ids behave the same way, except named ids get a pretty
> name when printed out as opposed to just a numebr and are generally
> derived from a user supplied name. Also, all 'top level' names are
> always 'named' to ensure they remain globally unique.
>
> named Ids are odd, these are looked up in a table that is implemented
> via a hash table written in c in the StringTable/ directory. due to an
> intentional quirk of its implementation, it only generates odd, positive
> values. Conversion between a number and its string form is constant
> time in both directions.
>
> zero is a 'special' id in that it can be treated normally except it can
> never appear in an expression, only in a binder. so \ v0 -> ... is okay,
> but \ x -> .. v0 ... is not. This means you can always use 0 as a binder
> when you know a value is not used without worrying about shadowing a
> real definition.
>
>
> unnamed ids are arbitrary positive even numbers, there is no particular
> need to allocate them in a special way, they are _not_ globally unique
> so you just need to ensure you don't shadow any existing variables when
> assigning new ones. in general, this consists of having a set of 'in
> scope' names (which is often something you need to keep track of anyway)
> and you choose some value that is not in the 'in scope' set and add it
> to the set. this ensures ids don't shadow each other, but not that they
> are unique.
>
>
> negative ids are used inside of a few specialized routines internally
> and should never make it out into the world. for instance, in the
> unification algorithm they are used to represent alpha-renamed values so
> we don't need to scan for used names before hand, however the terms
> returned have their original ids restored.
>
> hope this helps...
>
> It is quite bothersome to me that 'Id' is just an Int and not a
> newtype... that is something I have been slowing getting the code ready
> to fix for a while now..

I'd like to fix it in another way, if you have no objects.

By my count there are 120 unique atoms in base-1.0.hl. Those 120 atoms
are saved ~2344 times each (there's a total of 281338 atoms).
Compressing the files with gzip reduces the negative effect on file
size but the overhead of parsing 281k atoms is still there.
On my system, testing with HelloWorld.hs, parsing those atoms took 16%
of the runtime and 27% of the memory usage.

I propose scrapping the current system of random atom ids in favor for
a per-module map of symbols.
Using an ADT for the ids would make the system more transparent. I'll
have to investigate the performance impact of this.

-- 
Cheers,
  Lemmih