unique identifiers as a separate library

Tue Dec 23 18:22:46 EST 2008

Hi again Iavor,

A couple performance ideas if you want to test them:

unsafeInterleaveIO is cheap until you need to evaluate its 
result.  So how about this, I think it makes there be 1/3 as 
many "structural" unsafeInterleaveIO's, so if it took "2" 
amount of time on unsafeInterleaveIO:ing before, it should 
take "1.33" time on it after this: and just a bit more 
time/memory to construct Nodes that might not be used.

gen r = unsafeInterleaveIO $ do
                   v <- unsafeInterleaveIO (genSym r)
                   n1 <- gen r; n2 <- gen r; n3 <- gen r; n4 
<- gen r
                   return (Node v1 (Node v2 n1 n2) (Node v3 
n3 n4))

I also feel tempted to apply the 
static-argument-transformation manually,
where
    gen r = gen'
      where
        gen' = unsafeInterleaveIO $ do
           v <- unsafeInterleaveIO (genSym r)
           n1 <- gen'; n2 <- gen' --etc.
           return (Node ...)

or similar

which I guess is safe because this is only 
unsafeInterleaveIO, not unsafePerformIO?  Dunno if it'd be 
speed-beneficial though.

version 0.4:
> genericNewSupply :: b -> (IORef b -> IO a) -> IO (Supply a)
> genericNewSupply start genSym = gen =<< newIORef start
>   where gen r = unsafeInterleaveIO
>               $ do ls <- gen r
>                    rs <- gen r
>                    return (Node (unsafePerformIO (genSym r)) ls rs)

Why unsafePerformIO, was it faster?(i'd guess slower 
actually, as unsafePerformIO is NOINLINE..)  It's 
considerably less safe than unsafeInterleaveIO!  For 
example, do the static-argument-transformation above, then 
float out the unsafePerformIO because it's the same 
expression each time through gen', and suddenly the all the 
"unique" values are all the same!

we can make this value-supply very good ultimately :-)

also, I might call "unsafeNewIntSupply" something more 
specific, like "unthreadsafeNew...", or the more obscure but 
conventional "dupable" description-word.  Did it help 
specializing that to Int, i.e. why not 
"unsafeGenericNewSupply"? because I can imagine a simple 
data that's not an Int, where you'd still want to avoid the 
thread-safety overhead.  Also, your implementation of it 
could be more efficient: it doesn't need to do locking, so I 
suggest modifyIORef rather than atomicModifyIORef (Actually 
you'll have to use readIORef >>= writeIORef >> return, 
instead, because modifyIORef has a different type than 
atomicModifyIORef).  Possible refactor: All the functions 
***GenSym r = atomicModifyIORef r (some expression that 
doesn't mention r); doing the "[atomic]ModifyIORef r" could 
be the caller's responsibility instead, and e.g. listGenSym 
(a:as) = (as,a).

in fact, for lists (as you get a incomplete-pattern-match 
warning there, but you know the list is always infinite, 
because you made it with "iterate"), you could instead use 
an infinite-list type, Data.Stream from package "Stream"[*]; 
as Stream is not a sum type (it only has one possible 
constructor: Cons), it might even be a bit more efficient!
[*] 
http://hackage.haskell.org/packages/archive/Stream/0.2.6/doc/html/Data-Stream.html

thanks for your effort! and especially for measuring the 
performance timing!
-Isaac