unique identifiers as a separate library

Tue Dec 23 16:56:50 EST 2008

Hi,
Sigh.  I think that Isaac is quite right.  Even though I think that it
would be quite rare for multiple threads to share the same name supply
in practise, I would really rather have safe code and not have to
think about it.  So... I have reverted to the non-dupable versions by
default.  I also added an "unsafeNewIntSupply" that uses that dupable
primitives, for those who like living on the edge :)   Thanks to
Sebastian for the bench marking program!   The performance numbers for
the current version on hackage (0.4) are as follows:

With safe primitives value-supply is about 7 times slower, with the
unsafe ones it is about 2 times slower (a bit less actually).  Even
though this seems like a big difference, the actual time it takes to
generate names is very small:  I have to generate about 10 million
names to get reliable measurements, so I am not sure if the difference
matters in practise.

If anyone has further ideas, please chime in.

-Iavor

On Tue, Dec 23, 2008 at 8:30 AM, Isaac Dupree <isaacdupree at charter.net> wrote:
> Iavor Diatchki wrote:
>>
>>  - It uses unsafeDupableInterleaveIO to avoid double locking,
>
> in particular,
>
>> gen r = unsafeDupableInterleaveIO
>>              $ do v <- unsafeDupableInterleaveIO (genSym r)
>>                   ls <- gen r
>>                   rs <- gen r
>>                   return (Node v ls rs)
>
> where is the double locking?  We want referential transparency...
>
> e.g. suppose you use newNumSupply to create a thunk for a Gen; when
> evaluated, it will run unsafeDupableInterleaveIO.  You send that thunk off
> to two different other threads. Then those threads decide to evaluate it
> (say, enough to get the first genSym'd value) and happen to make a race
> condition, so it becomes two separate IO computations. Therefore one of them
> runs atomicModifyIORef, and the other one runs atomicModifyIORef, and so
> they get two different values out.
>
> Node 0 (...) (...)
> Node 1 (...) (...)
>
> when it's suppose to be the very same Gen data structure.
>
> so, am I right or wrong about what the perils of unsafeDupableInterleaveIO
> are?
>
> I could see changing (unsafe[Dupable]InterleaveIO (genSym r)) to (genSym r),
> to halve the number of unsafeInterleaveIOs needed if we assume that most of
> the time a node is evaluated in order to get a value... but it's hard to see
> a good way to make *fewer* InterleaveIOs than there are genSym'd values.
>  (possible, but hard, and really depends on the relative expenses/risks of
> locking, of computing the next number, and of using up the "address space"
> of all possible Ints for example).  Maybe the outer InterleaveIO could
> "strictly" make a few levels of Nodes (with lazy genSym'd values this time)
> before "interleaving" again, to reduce the amount of interleaving from the
> non-semantics-changing side.
>
> -Isaac
>