unique identifiers as a separate library
Isaac Dupree
isaacdupree at charter.net
Wed Dec 24 10:46:56 EST 2008
Iavor Diatchki wrote:
>> Also, your
>> implementation of it could be more efficient: it doesn't need to do locking,
>> so I suggest modifyIORef rather than atomicModifyIORef (Actually you'll have
>> to use readIORef >>= writeIORef >> return, instead, because modifyIORef has
>> a different type than atomicModifyIORef).
>
> I don't think that that's quite right. I was thinking that it should
> be OK to use different supplies that share a reference in different
> threads. So, for example, split the supply and pass each version to a
> new thread. This should be OK with the dupable primitives because the
> thunks would be evaluated in different threads. However, we still
> need the synchronize access to the reference, or we'll get incorrect
> values.
That's true (if you're careful enough about which thread is
evaluating the splitting -- the existing split* functions
are actually *not* strict enough for this, and they probably
shouldn't be either, so an additional warning that the user
may have to add some `seq`s or `evaluate`s might be needed).
Anyway, if atomicModifyIORef isn't that big of an
overhead, then no problem :-)
>> in fact, for lists (as you get a incomplete-pattern-match warning there, but
>> you know the list is always infinite, because you made it with "iterate"),
>> you could instead use an infinite-list type, Data.Stream from package
>> "Stream"[*]; as Stream is not a sum type (it only has one possible
>> constructor: Cons), it might even be a bit more efficient!
>
> You are right but the benefit is small enough that I don't think it
> warrants a dependency on another package.
I'm going to try to argue to convince you that it's entirely
appropriate to use Data.Stream :-)
From my somewhat mathematical point of view, that is.
1. Data.Stream is a small module that implements a very
well-known, practically standard (though under-used), and
simple data type. All you really need from it is import
Data.Stream(Stream(Cons)), plus (iterate, streamToList) if
you're keeping your current interface. It's nothing you
should be afraid to depend on, if you're using its concept
(which you are). It's much smaller than the 'containers'
package, which similarly you don't use but if you needed a
Map or something you obviously should.
2. The more compelling argument, that it took me a good
night's sleep to think of:
Supply is an infinite binary tree. Stream is an infinite
unary tree (er, more commonly called infinite "list").
They're both codata. They're both comonads. (although they
don't go so far as to depend on category-extras to provide
an instance.) The primary function/purpose of Supply is an
*efficient* way to turn a Stream into a Supply. I daresay
it would, even, be more fundamental to provide interface
newSupply :: Stream a -> IO (Supply a)
than
newSupply :: a -> (a -> a) -> IO (Supply a)
(although it might be worth keeping both interfaces? mainly
for compatibility, since one's just a Data.Stream.iterate
away from the other and your haddocks could say so)
In fact, it's terribly annoying to turn a (Stream a) into a
(a, a -> a) -- in fact I don't think it can be done in
general (you can turn it into a (b, b -> (a, b)) though a-la
unfold, with b = Stream a) -- so you should definitely
provide the above (Stream a -> IO (Supply a)) interface.
(Although if you're cowardly enough about the extra
dependency, I guess you could make it the riskier ([a] -> IO
(Supply a)), risk being if the user provides a finite
list... Sorry for conditionally insulting you; it seems a
horribly underhanded thing for me to do :-)
Likewise, it would be nice for
split :: Supply a -> Stream (Supply a)
...and then you would not even depend on Data.List anymore!
(I don't happen to think the arguments that Data.List is
better than Stream for definitely-infinite lists are very
convincing; except possibly that List will be more
up-to-date with respect to stream-fusion optimizations, and
even then, value-supply doesn't actually rely on any of
those optimizations; it actually does keep around the Stream
(or List), or in the case of Num/Enum, it doesn't use one in
the first place.)
On the other hand, I still want the Stream-based interfaces,
but my initial argument isn't even that necessary: consider
implementing the current signature:
newSupply :: a -> (a -> a) -> IO (Supply a)
currently by:
newSupply x f = genericNewSupply (iterate f x) listGenSym
but we don't need to use a list at all, it could be like:
newSupply x f = genericNewSupply x (\a -> (f a, a))
(with atomicallyModifyIORef added as appropriate depending
on your refactorings)
P.S. more code cleanup, if you didn't notice it:
0.4:
> -- XXX: Is the atomic necessary?
> import Data.IORef(IORef,newIORef,atomicModifyIORef)
yes the atomic is necessary, so you can update the comment
to say why? (because multiple threads might be reading from
a Supply-structure created from the same newSupply run, and
we don't want memory corruption etc.)
> import System.IO.Unsafe(unsafePerformIO,unsafeInterleaveIO)
>
> #if __GLASGOW_HASKELL__ >= 608
> import GHC.IOBase(unsafeDupableInterleaveIO,unsafeDupablePerformIO)
> #else
> ...
> unsafeDupablePerformIO :: IO a -> a
> unsafeDupablePerformIO = unsafePerformIO
various "PerformIO" variants to delete (I'm absurdly pleased
that we implemented this without needing any)
-Isaac
More information about the Glasgow-haskell-users
mailing list