[jhc] Atoms, Infos and unique ids.

Thu Feb 21 16:03:08 EST 2008

On Thu, Feb 21, 2008 at 9:09 PM, John Meacham <john at repetae.net> wrote:
> On Thu, Feb 21, 2008 at 08:32:47PM +0100, Lemmih wrote:
>  > I tried to implement this but, unfortunately, atoms are frequently
>  > being relied on for unique ids. Some cases are easy to fix, others
>  > less so.
>  > John, do you have time to document the intended behavior in the difficult cases?
>
>  I am not sure what you mean by difficult cases or what you are trying to
>  fix, atoms are exactly an implementation of the standard atom
>  type in computing as used in prolog, lisp, and the X11 protocol among
>  other things. Uninterpreted strings with a very fast identity operation.

Atoms in general are fine but they're not the right tool for this
particular job. We rarely use atoms for keys (I have the profiling
information to back this up) and, when we do, using (hash,string) is
ridiculously fast. The costs of using atoms include: broken properties
(get . put = id), inefficiencies and obfuscated code. Given that I
have significantly improved the performance of Jhc, I hope it carries
some weight when I say that there are NO performance reasons for using
atoms, neither in CPU time nor in memory usage.
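
To make the (hash,string) idea concrete, here is a minimal sketch. The
names Key, hashString and mkKey are made up for illustration and are not
taken from the Jhc sources, and the hash function is only an FNV-1a-style
placeholder; any reasonable hash would do.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')

-- Hypothetical key type: a precomputed hash paired with the string.
-- The derived Eq and Ord instances compare the hash field first, so most
-- comparisons are decided without looking at the string at all.
data Key = Key !Int String
  deriving (Eq, Ord, Show)

-- FNV-1a-style hash over the characters (an assumption, not Jhc's hash).
hashString :: String -> Int
hashString = foldl' step 2166136261
  where
    step h c = (h `xor` ord c) * 16777619

-- Build a key once; the hash is never recomputed afterwards.
mkKey :: String -> Key
mkKey s = Key (hashString s) s

-- Recover the original string, so put followed by get is the identity.
keyString :: Key -> String
keyString (Key _ s) = s
```

Such a Key can be used directly as the key of an ordinary ordered map,
and keyString (mkKey s) gives back s unchanged.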

>  Things got a lot better in terms of space once I made them a custom
>  binary implementation, that saved 7 bytes an atom which is often as long
>  as the string itself.

That's like giving ice cream to a kid with tuberculosis: It's a nice
thing to do but curing the disease would be better.

Those 7 bytes would be completely irrelevant if we only saved unique
strings once. There are about 9000 unique strings. A 7 byte overhead
would be 63k. We currently waste 10,000k by saving ~100 copies of each
unique string.
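
As a rough sketch of what "saving unique strings once" could look like:
every distinct string goes into a table exactly once, and each occurrence
is replaced by a small index. The names Table, internAll and tableStrings
are hypothetical, and the code assumes the containers package; it is not
how Jhc currently serializes anything.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (mapAccumL, sortOn)

-- Hypothetical string table: each distinct string maps to a small index.
type Table = Map.Map String Int

-- Replace every string by its index, adding it to the table on first sight.
internAll :: [String] -> (Table, [Int])
internAll = mapAccumL intern Map.empty
  where
    intern tbl s = case Map.lookup s tbl of
      Just i  -> (tbl, i)
      Nothing -> let i = Map.size tbl
                 in (Map.insert s i tbl, i)

-- The strings in index order, i.e. what would be written out only once.
tableStrings :: Table -> [String]
tableStrings tbl = map fst (sortOn snd (Map.toList tbl))
```

With about 9000 unique strings the table holds 9000 entries no matter
how many times each string occurs; the occurrences only cost one index
each.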

>  In any case, I would want any solution to be completely independent of
>  the fact that atoms are being used as identifiers in a programming
>  intermediate language.

I couldn't agree more. I say, let's take it one step further and
assign unique ids to named variables in the same way we do with
unnamed variables. Each variable would have an 'Anonymous | Named Name'
tag that would be used for pretty-printing.
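
Something along these lines, as a sketch only. The type and field names
(VarTag, Var, varId, varTag, ppVar) are invented for illustration, and
Name is assumed to be a plain String here:

```haskell
-- Hypothetical types; none of these names are taken from the Jhc sources.
type Name = String

-- The tag only matters for pretty-printing.
data VarTag = Anonymous | Named Name
  deriving (Eq, Show)

data Var = Var { varId :: !Int, varTag :: VarTag }
  deriving Show

-- Identity is decided by the unique id alone; the tag is ignored.
instance Eq Var where
  a == b = varId a == varId b

instance Ord Var where
  compare a b = compare (varId a) (varId b)

-- Named variables print as their name, anonymous ones as "v" plus the id.
ppVar :: Var -> String
ppVar (Var i Anonymous) = "v" ++ show i
ppVar (Var _ (Named n)) = n
```

Two variables with the same name but different ids are then distinct,
which is the point: the name is no longer relied on for uniqueness.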

>  My Atom type is a generally useful library I use
>  other places. I would think this would involve a custom Binary monad
>  that distributed and collected an 'atom environment' of sorts that would
>  then be stored in a different chunk in the file, then whenever one
>  wanted to store/retrieve an atom, they would just add an index into the
>  table.

The only other usage I've found was in Stats.hs. And that's definitely
not justified.

I was referring to cases like DataConstructors.hs and LambdaLift.hs.
In DataConstructors.hs it is trivial to assign a new id since it only
has to be locally unique.
In LambdaLift.hs, on the other hand, I can't tell whether shadowing is
OK, whether reusing the old id is OK, or where to get a set of free
variables if a unique id is required. Some documentation about what
the code tries to do would be very helpful.

-- 
Cheers,
 Lemmih