[Haskell-cafe] What's the deal with Clean?

Fri Nov 6 02:45:02 EST 2009

Duncan Coutts <duncan.coutts at googlemail.com> writes:

>> The "build operation" part often ends up a bit gross, but I have a plan
>> for that which I hope to come back to later on.
>
> Yes, they're not good for construction atm. The Builder monoid from the

I've used that a bit, but that's not what I'm talking about.
Serializing and deserializing to/from bytestrings is sometimes
complicated, but usually straightforward enough.

The operation in between is what is giving me headaches.  Basically, I
have a bunch of command line options - filter on this, modify on that,
produce some kind of output to some file - that must be interpreted in
order to build a single combined filter/modify processing step.

The pipeline looks something like this:

  readFoo :: IO [Foo]
  process :: [Foo] -> IO [Foo]
  writeFoo :: [Foo] -> IO ()

The hard part is how to elegantly construct the "process" part.
If it were just filtering or modifications, it could be a pure
function.  The complexity comes from sometimes needing to split off
some output to some file.  

Currently, I'm opening handles in advance, and processing one Foo at a
time, writing it to the correct handles, and finally closing handles
when done.  This is a rather pedestrian approach.
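For concreteness, the handle-based approach might look roughly like the
sketch below. The Foo type, the keep and wantsSideOutput predicates and
the two output paths are hypothetical placeholders, not the real program.

```haskell
import Control.Monad (when)
import System.IO (IOMode (WriteMode), hPrint, withFile)

-- Hypothetical stand-in for the real record type.
newtype Foo = Foo Int deriving (Show, Eq)

-- Hypothetical filter: which Foos reach the main output at all.
keep :: Foo -> Bool
keep (Foo n) = n >= 0

-- Hypothetical selector: which of the kept Foos are also copied
-- to the side file.
wantsSideOutput :: Foo -> Bool
wantsSideOutput (Foo n) = even n

-- Open both handles up front, route each Foo as it is consumed, and
-- let withFile close the handles once the traversal is finished.
processOneAtATime :: FilePath -> FilePath -> [Foo] -> IO ()
processOneAtATime mainPath sidePath foos =
  withFile mainPath WriteMode $ \mainH ->
    withFile sidePath WriteMode $ \sideH ->
      mapM_ (route mainH sideH) foos
  where
    route mainH sideH foo
      | not (keep foo) = return ()
      | otherwise = do
          when (wantsSideOutput foo) $ hPrint sideH foo
          hPrint mainH foo
```

Because mapM_ consumes one element at a time and keeps no results, this
version runs in constant space as long as the input list is produced
lazily.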

I'm now considering defining

    branch :: ([Foo] -> IO ()) -> [Foo] -> IO [Foo]
    branch iop fs = do _ <- forkIO (iop fs)  -- ThreadId discarded for now
                       return fs

Which, if I understand correctly, would allow me to write

   -- (>=>) is Kleisli composition from Control.Monad
   process = filterM this >=> mapM that 
             >=> branch (writeBar . map foo2bar) >=> filterM theother

So - is this a good way to approach it?  I feel slightly queasy about
using concurrency for this, although I think it'll work well in
practice.  It is very important that this is lazy - the list of Foos can
be larger than available memory, so one risk is that one thread might
run off with the list of Foos with the other trailing far behind,
leading to increased memory use. Previous experience seems to indicate
that the 'head' thread will be slowed by disk/memory and allow the
trailing threads to keep up.
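A self-contained version of the branching pipeline, as a sketch only:
the stages are chained with Kleisli composition (>=>) from
Control.Monad so that each one has type [Foo] -> IO [Foo]. The Foo and
Bar types, the bodies of this, that, theother and foo2bar, and the
stderr-based writeBar are all made-up placeholders.

```haskell
import Control.Concurrent (forkIO)
import Control.Monad (filterM, (>=>))
import System.IO (hPutStrLn, stderr)

-- Hypothetical stand-ins for the real types.
newtype Foo = Foo Int deriving (Show, Eq)
newtype Bar = Bar String deriving (Show, Eq)

foo2bar :: Foo -> Bar
foo2bar (Foo n) = Bar (show n)

-- Hypothetical stages; the real ones would come from the command line.
this, theother :: Foo -> IO Bool
this (Foo n) = return (n > 0)
theother (Foo n) = return (n < 100)

that :: Foo -> IO Foo
that (Foo n) = return (Foo (n * 2))

-- Placeholder sink: writes each Bar to stderr instead of a file.
writeBar :: [Bar] -> IO ()
writeBar = mapM_ (\(Bar s) -> hPutStrLn stderr s)

-- Hand the list to a side consumer in a new thread and pass it on.
branch :: ([Foo] -> IO ()) -> [Foo] -> IO [Foo]
branch iop fs = do
  _ <- forkIO (iop fs)
  return fs

process :: [Foo] -> IO [Foo]
process =
  filterM this >=> mapM that
    >=> branch (writeBar . map foo2bar) >=> filterM theother
```

One caveat with this shape: mapM and filterM in IO run every action
before they return, so each monadic stage builds its whole result list
before the next stage starts. To keep the pipeline lazy, the filtering
and mapping stages would probably have to be pure (filter, map) or the
list would have to be produced with lazy IO.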

I do have a nagging feeling that this could be solved more elegantly
with arrows or a lazy writer monad, or something else that I don't know
enough about. I'd be happy to hear any suggestions.

-k

PS: I probably need to catch the threadID, and wait for all threads to
finish as well.  This is left as an exercise for the writer. :-)
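One possible way to do the waiting, sketched under the assumption that
a shared list of "done" MVars is acceptable: each branch registers an
MVar that its thread fills on exit, and the main thread blocks on all
of them at the end. The names Pending, branchTracked and waitAll are
invented for this sketch.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
  (MVar, modifyMVar_, newEmptyMVar, putMVar, readMVar)
import Control.Exception (finally)

-- Hypothetical registry of completion signals, one per forked branch.
type Pending = MVar [MVar ()]

-- Like branch, but records an MVar that is filled when the side
-- consumer finishes, whether normally or with an exception.
branchTracked :: Pending -> ([a] -> IO ()) -> [a] -> IO [a]
branchTracked pending iop fs = do
  done <- newEmptyMVar
  modifyMVar_ pending (return . (done :))
  _ <- forkIO (iop fs `finally` putMVar done ())
  return fs

-- Block until every branch registered so far has signalled completion.
waitAll :: Pending -> IO ()
waitAll pending = readMVar pending >>= mapM_ readMVar
```

The registry would be created once with newMVar [] before the pipeline
runs, and waitAll called after writeFoo returns.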
-- 
If I haven't seen further, it is by standing in the footprints of giants