Name allocation
Keith Wansbrough
Keith.Wansbrough@cl.cam.ac.uk
Fri, 30 May 2003 12:16:43 +0100
>
> I know next to nothing about this GUID concept, so please correct me
> if I'm wrong, but isn't this GUID just "yet another unique identifier
> convention", which may or may not clash, depending on how lucky you
> are? Would using a GUID really be different than using a hierarchy
> like Haskell does today?
A GUID is basically a random 128-bit number; it's "just another unique
identifer", but it is far less likely to clash than your compiler is
likely to be affected by a stray cosmic ray. In the paper I cited, I
made the following calculation:
Both MD5 (RFC1321, 128-bit) and SHA-1 (RFC3174, 160-bit) are
sufficiently cheap, and may be considered random functions for this
application~\cite{robshaw96}. Let us consider the likelihood of
collisions. For $n$ abstract types and $N$ possible hash values,
the probability of a collision is approximately $n^2/2N$.
Pessimistically assuming $10^{10}$~programmers in the world, writing
$300$~lines of code per day with one abstract type per $100$~loc,
the probability of a collision in a century of abstract types (using
MD5) would then be $(10^{15})^2/2^{129} \approx 10^{-9}$. This is
much less than the probability of a cosmic-ray-induced processor
error in this period.
(substitute "module" for "abstract type", for the Haskell context; the
argument is about hashing module source, but applies to
randomly-generated GUIDs equally well, as long as a good random source
is used, such as Linux /dev/random, or
http://www.fourmilab.ch/hotbits/generate.html).
The upshot is that you really don't need to worry about collisions.
--KW 8-)