unique identifiers as a separate library

Wed Dec 24 10:46:56 EST 2008

Iavor Diatchki wrote:
>>  Also, your
>> implementation of it could be more efficient: it doesn't need to do locking,
>> so I suggest modifyIORef rather than atomicModifyIORef (Actually you'll have
>> to use readIORef >>= writeIORef >> return, instead, because modifyIORef has
>> a different type than atomicModifyIORef).
> 
> I don't think that that's quite right.  I was thinking that it should
> be OK to use different supplies that share a reference in different
> threads.  So, for example, split the supply and pass each version to a
> new thread.  This should be OK with the dupable primitives because the
> thunks would be evaluated in different threads.  However, we still
> need the synchronize access to the reference, or we'll get incorrect
> values.

That's true (if you're careful enough about which thread is 
evaluating the splitting -- the existing split* functions 
are actually *not* strict enough for this, and they probably 
shouldn't be either, so an additional warning that the user 
may have to add some `seq`s or `evaluate`s might be needed). 
  Anyway, if atomicModifyIORef isn't that big of an 
overhead, then no problem :-)

>> in fact, for lists (as you get a incomplete-pattern-match warning there, but
>> you know the list is always infinite, because you made it with "iterate"),
>> you could instead use an infinite-list type, Data.Stream from package
>> "Stream"[*]; as Stream is not a sum type (it only has one possible
>> constructor: Cons), it might even be a bit more efficient!
> 
> You are right but the benefit is small enough that I don't think it
> warrants a dependency on another package.

I'm going to try to argue to convince you that it's entirely 
appropriate to use Data.Stream :-)

 From my somewhat mathematical point of view, that is.

1. Data.Stream is a small module that implements a very 
well-known, practically standard (though under-used), and 
simple data type.  All you really need from it is import 
Data.Stream(Stream(Cons)), plus (iterate, streamToList) if 
you're keeping your current interface.  It's nothing you 
should be afraid to depend on, if you're using its concept 
(which you are).  It's much smaller than the 'containers' 
package, which similarly you don't use but if you needed a 
Map or something you obviously should.

2. The more compelling argument, that it took me a good 
night's sleep to think of:

Supply is an infinite binary tree.  Stream is an infinite 
unary tree (er, more commonly called infinite "list"). 
They're both codata.  They're both comonads.  (although they 
don't go so far as to depend on category-extras to provide 
an instance.)  The primary function/purpose of Supply is an 
*efficient* way to turn a Stream into a Supply.  I daresay 
it would, even, be more fundamental to provide interface
newSupply      :: Stream a -> IO (Supply a)
than
newSupply      :: a -> (a -> a) -> IO (Supply a)
(although it might be worth keeping both interfaces? mainly 
for compatibility, since one's just a Data.Stream.iterate 
away from the other and your haddocks could say so)
In fact, it's terribly annoying to turn a (Stream a) into a 
(a, a -> a) -- in fact I don't think it can be done in 
general (you can turn it into a (b, b -> (a, b)) though a-la 
unfold, with b = Stream a) -- so you should definitely 
provide the above (Stream a -> IO (Supply a)) interface. 
(Although if you're cowardly enough about the extra 
dependency, I guess you could make it the riskier ([a] -> IO 
(Supply a)), risk being if the user provides a finite 
list... Sorry for conditionally insulting you; it seems a 
horribly underhanded thing for me to do :-)

Likewise, it would be nice for
split          :: Supply a -> Stream (Supply a)
...and then you would not even depend on Data.List anymore!

(I don't happen to think the arguments that Data.List is 
better than Stream for definitely-infinite lists are very 
convincing; except possibly that List will be more 
up-to-date with respect to stream-fusion optimizations, and 
even then, value-supply doesn't actually rely on any of 
those optimizations; it actually does keep around the Stream 
(or List), or in the case of Num/Enum, it doesn't use one in 
the first place.)

On the other hand, I still want the Stream-based interfaces, 
but my initial argument isn't even that necessary: consider 
implementing the current signature:
newSupply      :: a -> (a -> a) -> IO (Supply a)
currently by:
newSupply x f   = genericNewSupply (iterate f x) listGenSym
but we don't need to use a list at all, it could be like:
newSupply x f   = genericNewSupply x (\a -> (f a, a))
(with atomicallyModifyIORef added as appropriate depending 
on your refactorings)

P.S. more code cleanup, if you didn't notice it:

0.4:
> -- XXX: Is the atomic necessary?
> import Data.IORef(IORef,newIORef,atomicModifyIORef)

yes the atomic is necessary, so you can update the comment 
to say why? (because multiple threads might be reading from 
a Supply-structure created from the same newSupply run, and 
we don't want memory corruption etc.)

> import System.IO.Unsafe(unsafePerformIO,unsafeInterleaveIO)
> 
> #if __GLASGOW_HASKELL__ >= 608
> import GHC.IOBase(unsafeDupableInterleaveIO,unsafeDupablePerformIO)
> #else
> ...
> unsafeDupablePerformIO :: IO a -> a
> unsafeDupablePerformIO = unsafePerformIO

various "PerformIO" variants to delete (I'm absurdly pleased 
that we implemented this without needing any)

-Isaac