unique identifiers as a separate library

Tue Dec 23 20:08:29 EST 2008

Hello,

Thanks for your comments!

On Tue, Dec 23, 2008 at 3:22 PM, Isaac Dupree <isaacdupree at charter.net> wrote:
> unsafeInterleaveIO is cheap until you need to evaluate its result.  So how
> about this, I think it makes there be 1/3 as many "structural"
> unsafeInterleaveIO's, so if it took "2" amount of time on
> unsafeInterleaveIO:ing before, it should take "1.33" time on it after this:
> and just a bit more time/memory to construct Nodes that might not be used.
>
> gen r = unsafeInterleaveIO $ do
>                  v <- unsafeInterleaveIO (genSym r)
>                  n1 <- gen r; n2 <- gen r; n3 <- gen r; n4 <- gen r
>                  return (Node v1 (Node v2 n1 n2) (Node v3 n3 n4))

I played around with that but, for some reason, it did not seem to
speed things up, at least for the benchmark that I run.  Will have to
experiment some more.

> version 0.4:
>>
>> genericNewSupply :: b -> (IORef b -> IO a) -> IO (Supply a)
>> genericNewSupply start genSym = gen =<< newIORef start
>>  where gen r = unsafeInterleaveIO
>>              $ do ls <- gen r
>>                   rs <- gen r
>>                   return (Node (unsafePerformIO (genSym r)) ls rs)
>
> Why unsafePerformIO, was it faster?(i'd guess slower actually, as
> unsafePerformIO is NOINLINE..)

I just tested to see what would happen and forgot to change it back
(now I have switched it back).  There does not seem to be a
significant difference between the two, speed-wise.

> Did it help specializing that to Int, i.e. why not
> "unsafeGenericNewSupply"? because I can imagine a simple data that's not an
> Int, where you'd still want to avoid the thread-safety overhead.

Nope, it was just the simplest thing to do.  I can add more general
"unsafe" versions.

>  Also, your
> implementation of it could be more efficient: it doesn't need to do locking,
> so I suggest modifyIORef rather than atomicModifyIORef (Actually you'll have
> to use readIORef >>= writeIORef >> return, instead, because modifyIORef has
> a different type than atomicModifyIORef).

I don't think that that's quite right.  I was thinking that it should
be OK to use different supplies that share a reference in different
threads.  So, for example, split the supply and pass each version to a
new thread.  This should be OK with the dupable primitives because the
thunks would be evaluated in different threads.  However, we still
need the synchronize access to the reference, or we'll get incorrect
values.

Possible refactor: All the
> functions ***GenSym r = atomicModifyIORef r (some expression that doesn't
> mention r); doing the "[atomic]ModifyIORef r" could be the caller's
> responsibility instead, and e.g. listGenSym (a:as) = (as,a).

This is a good idea, the atomicUpdates are duplicated everywhere.  I
did this, and for some reason it seems to have made things a little
slower, but I like the code better, so I think that I will keep it.  I
am surprised that it affected the speed of execution, perhaps it is
just my benchmark that is being unreliable.

> in fact, for lists (as you get a incomplete-pattern-match warning there, but
> you know the list is always infinite, because you made it with "iterate"),
> you could instead use an infinite-list type, Data.Stream from package
> "Stream"[*]; as Stream is not a sum type (it only has one possible
> constructor: Cons), it might even be a bit more efficient!

You are right but the benefit is small enough that I don't think it
warrants a dependency on another package.

bye,
-Iavor