[Haskell-cafe] Re: Stream processors

Thu Oct 21 14:20:25 EDT 2004

Peter Simons wrote:

 >Ben Rudiak-Gould writes:
 >
 > >> >     start  :: IO ctx
 > >> >     feed   :: ctx -> Buffer -> IO ()
 > >> >     commit :: ctx -> IO a
 >
 > >> 'feed' cannot have this signature because it needs to
 > >> update the context.
 >
 > > Sure it can -- it's just like writeIORef :: IORef a -> a -> IO ().
 >
 >I guess it's moot to argue that point. I don't want a stream
 >processor to have a global state, so using an internally
 >encapsulated IORef is not an option for me.
 >
 >I am looking for a more _general_ API, not one that forces
 >implementation details on the stream processor. That's what
 >my StreamProc data type does already. :-)

I'm not arguing about generality; I simply don't understand how your
interface is supposed to be used. E.g.:

    do ctx <- start
       ctx1 <- feed ctx array1
       ctx2 <- feed ctx array2
       val1 <- commit ctx1
       val2 <- commit ctx2
       return (val1,val2)

Should this return (MD5 of array1, MD5 of array2), or
(MD5 of array1+array2, MD5 of array1+array2), or cause a runtime error?
Any of these three might be reasonable, but for your interface to be
well-defined you need to stipulate which one is correct. Once you've
decided which one is correct, there's no reason not to change the
interface so that no one can misinterpret it. My two interfaces are
only less general than yours in that they don't have multiple
interpretations -- which is a good thing.
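
To make the difference concrete, here is a rough sketch of both
interfaces side by side. It is only an illustration under assumptions:
Buffer is modelled as a plain list of bytes, a running byte sum stands
in for MD5, and every name ending in M or P (as well as sharedMutable
and forkedPure) is made up for this example.

```haskell
import Data.IORef (IORef, newIORef, modifyIORef', readIORef)
import Data.Word (Word8)

-- Hypothetical stand-in for the Buffer type discussed in the thread.
type Buffer = [Word8]

-- Interface 1: a single mutable context that feed updates in place.
newtype MCtx = MCtx (IORef Int)

startM :: IO MCtx
startM = MCtx <$> newIORef 0

feedM :: MCtx -> Buffer -> IO ()
feedM (MCtx ref) buf = modifyIORef' ref (+ sum (map fromIntegral buf))

commitM :: MCtx -> IO Int
commitM (MCtx ref) = readIORef ref

-- Interface 2: immutable contexts; feed returns a new context value.
newtype PCtx = PCtx Int

startP :: PCtx
startP = PCtx 0

feedP :: PCtx -> Buffer -> IO PCtx
feedP (PCtx n) buf = return (PCtx (n + sum (map fromIntegral buf)))

commitP :: PCtx -> Int
commitP (PCtx n) = n

-- With interface 1 both buffers can only go into the same context,
-- so both results cover array1 followed by array2.
sharedMutable :: Buffer -> Buffer -> IO (Int, Int)
sharedMutable a1 a2 = do
  ctx <- startM
  feedM ctx a1
  feedM ctx a2
  v1 <- commitM ctx
  v2 <- commitM ctx
  return (v1, v2)

-- With interface 2 feeding the same start value twice yields two
-- independent contexts, one per buffer.
forkedPure :: Buffer -> Buffer -> IO (Int, Int)
forkedPure a1 a2 = do
  ctx1 <- feedP startP a1
  ctx2 <- feedP startP a2
  return (commitP ctx1, commitP ctx2)
```

In this sketch sharedMutable [1,2] [4] gives (7,7) and forkedPure
[1,2] [4] gives (3,4). In each case the types leave only one reading
open.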

 > >> >     start  :: ctx
 > >> >     feed   :: ctx -> Buffer -> IO ctx
 > >> >     commit :: ctx -> a
 >
 > > In this interface contexts are supposed to be immutable
 > > Haskell values, so there's no meaning in creating new
 > > ones or finalizing old ones.
 >
 >I don't want to restrict the API to immutable contexts. A
 >context could be anything, _including_ an IORef or an MVar.
 >But the API shouldn't enforce that.

It doesn't. Even (length :: [a] -> Int) is likely to cause destructive
updating of thunks when it's called, but that's not a reason to change
the interface to [a] -> IO Int. The important thing is whether, from
the caller's perspective, the function is pure. If it's pure, it
shouldn't be in the IO monad, even if that forces some implementations
to use unsafePerformIO under the hood.
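
As a small illustration of a pure interface over a mutating
implementation, the sketch below uses runST rather than
unsafePerformIO, which is enough whenever the mutable state can be
kept local to the computation. The function name checksum and the
byte-sum "digest" are assumptions made for the example, not anything
from the stream processor under discussion.

```haskell
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, modifySTRef', readSTRef)
import Data.Word (Word8)

-- Pure from the caller's point of view: same input, same output,
-- no IO in the type. Internally it destructively updates an STRef.
checksum :: [Word8] -> Int
checksum bytes = runST $ do
  acc <- newSTRef 0
  -- Add each byte to the accumulator in place.
  mapM_ (\b -> modifySTRef' acc (+ fromIntegral b)) bytes
  readSTRef acc
```

The caller cannot observe the STRef, so nothing about the mutation
leaks into the signature.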

I think you're hoping to have it both ways, capturing destructive-
update semantics and value semantics in a single interface. That's not
going to work, unfortunately. You must decide whether to enforce
single-threading or not.

 > >> I would implement feedSTUArray and friends as wrappers
 > >> around the Ptr interface, not as primitive computations of
 > >> the stream processor.
 >
 > > I think it's impossible to do this safely, but it would be
 > > great if I were wrong.
 >
 >  wrap :: (Storable a, MArray arr a IO) => Ptr a -> Int
 >       -> IO (arr Int a)
 >  wrap ptr n = peekArray n ptr >>= newListArray (0,n-1)

Isn't this going in the wrong direction? I think what we want is
something like

  withArrayPtr :: (MArray arr Word8 IO, Ix i) =>
                     arr i Word8 -> (Ptr Word8 -> IO a) -> IO a

You're right, though, this can be written safely:

  withArrayPtr arr act = getElems arr >>= flip withArray act

It's terribly slow, though. Ideally one wants a pointer into the
original array together with a guarantee that it won't be moved by
the garbage collector during the execution of your IO action. I
think current versions of GHC will never move the array if your IO
action performs no heap allocation, but I can easily imagine that
changing in other/future implementations.
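
For reference, here is the copying version as a self-contained module,
with a small caller. This is only a sketch: sumBytes is a made-up
helper for demonstration, and the Ix constraint is there because
getElems needs it. Note that withArray hands the action a temporary
copy, so anything the action writes through the pointer is not copied
back into the array.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Data.Array.IO (IOUArray, newListArray)
import Data.Array.MArray (MArray, getElems)
import Data.Ix (Ix)
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray, withArray)
import Foreign.Ptr (Ptr)

-- Safe but slow: copies every element into a temporary C array,
-- runs the action on that copy, then frees the copy.
withArrayPtr :: (MArray arr Word8 IO, Ix i)
             => arr i Word8 -> (Ptr Word8 -> IO a) -> IO a
withArrayPtr arr act = getElems arr >>= flip withArray act

-- Hypothetical caller: build an unboxed IO array, then read the
-- bytes back through the pointer and add them up.
sumBytes :: [Word8] -> IO Int
sumBytes xs = do
  arr <- newListArray (0, length xs - 1) xs :: IO (IOUArray Int Word8)
  withArrayPtr arr $ \p -> do
    bytes <- peekArray (length xs) p
    return (sum (map fromIntegral bytes))
```

The cost is one full traversal to build the list and another to
marshal it, which is what makes this unattractive for large buffers.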

I suppose you could also have

  withArrayPtrM :: (MArray arr Word8 m, Ix i) =>
                      arr i Word8 -> (Ptr Word8 -> m b) -> m b

  withArrayPtrI :: (IArray arr Word8, Ix i) =>
                      arr i Word8 -> (Ptr Word8 -> IO b) -> IO b

though I'm not sure how much sense those types (or names) make.
The first one would force the use of unsafeIOToST if you wanted
to use it with ST arrays, but probably that's unavoidable.
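
The immutable variant at least has an obvious copying implementation.
The following is a sketch only; firstByte is a hypothetical caller
that assumes a non-empty array, and the same caveat about copying
applies as for the mutable case.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Data.Array.Unboxed (IArray, UArray, elems, listArray)
import Data.Ix (Ix)
import Data.Word (Word8)
import Foreign.Marshal.Array (withArray)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek)

-- Copy the elements of an immutable array into a temporary C array
-- and run the action on the copy.
withArrayPtrI :: (IArray arr Word8, Ix i)
              => arr i Word8 -> (Ptr Word8 -> IO b) -> IO b
withArrayPtrI arr act = withArray (elems arr) act

-- Hypothetical caller: read the first byte through the pointer.
-- Assumes the input list is non-empty.
firstByte :: [Word8] -> IO Word8
firstByte xs = withArrayPtrI arr peek
  where
    arr :: UArray Int Word8
    arr = listArray (0, length xs - 1) xs
```

Since the array is immutable, handing out a copy is harmless here as
long as the action only reads.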

-- Ben