Name allocation

Fri, 30 May 2003 12:16:43 +0100

> 
> I know next to nothing about this GUID concept, so please correct me
> if I'm wrong, but isn't this GUID just "yet another unique identifier
> convention", which may or may not clash, depending on how lucky you
> are? Would using a GUID really be different than using a hierarchy
> like Haskell does today?

A GUID is basically a random 128-bit number; it's "just another unique
identifer", but it is far less likely to clash than your compiler is
likely to be affected by a stray cosmic ray.  In the paper I cited, I
made the following calculation:

  Both MD5 (RFC1321, 128-bit) and SHA-1 (RFC3174, 160-bit) are
  sufficiently cheap, and may be considered random functions for this
  application~\cite{robshaw96}.  Let us consider the likelihood of
  collisions.  For $n$ abstract types and $N$ possible hash values,
  the probability of a collision is approximately $n^2/2N$.
  Pessimistically assuming $10^{10}$~programmers in the world, writing
  $300$~lines of code per day with one abstract type per $100$~loc,
  the probability of a collision in a century of abstract types (using
  MD5) would then be $(10^{15})^2/2^{129} \approx 10^{-9}$.  This is
  much less than the probability of a cosmic-ray-induced processor
  error in this period.

(substitute "module" for "abstract type", for the Haskell context; the
argument is about hashing module source, but applies to
randomly-generated GUIDs equally well, as long as a good random source
is used, such as Linux /dev/random, or
http://www.fourmilab.ch/hotbits/generate.html).

The upshot is that you really don't need to worry about collisions.

--KW 8-)