[Haskell-cafe] memory needed for SAX parsing XML

Fri Apr 23 10:16:48 EDT 2010

John Lato wrote:
> 
>> Another (additional) approach would be to encapsulate unsafeInterleaveIO
>> within some routine and not let it go out into the wild.
>>
>> lazilyDoWithIO :: IO a -> (a -> b) -> IO b
>>
>> It would use unsafeInterleave internally but catch all IO errors within
>> itself.
>>
>> I wonder if this is a reasonable idea? Maybe it's already done?
>> So the topic is shifting...
> 
> doWithIO :: NFData b => IO a -> (a -> b) -> IO b
> doWithIO m f = liftM (\a -> let b = f a in b `deepseq` b) m
> 
> It works (just stick it in a "try" block for error handling), but you
> need to write a lot of NFData instances.  You also need to be careful
> that b is some sort of reduced structure, or you can end up forcing
> the whole file (or other data) into memory.

I meant something different. Your example contains no unsafeInterleaveIO
at all. I think you mean that the 'm' argument is supposed to be an
unsafeInterleaved IO action, like getContents, and that deepseq'ing
keeps its pending I/O from hanging around for a long time. OK.

But I meant a routine that lets us use ordinary IO actions lazily while
confining that 'hanging' within the bounds of the routine.

Having thought about it a little more, though, I realised that this is
impossible.

Lazy IO works now because unsafeInterleaveIO is built into getContents
itself and is called repeatedly (via recursion). At least that is the
implementation I imagine; I haven't looked into it.

I realised that calling unsafeInterleaveIO on an ordinary IO action
will not make it run lazily. It will still run all at once, just later
(when its result is first demanded) rather than now.
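
A minimal sketch of this point. The action readAll and its counter are
hypothetical stand-ins for an ordinary strict IO action, invented only
so the effect can be observed; nothing here is taken from a real
library beyond Data.IORef, Control.Exception and System.IO.Unsafe.

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical ordinary (strict) IO action: "reads" n items one by one,
-- bumping a counter per item so we can see how many reads have happened.
readAll :: IORef Int -> Int -> IO [Int]
readAll counter n = mapM step [1 .. n]
  where
    step i = do
      modifyIORef' counter (+ 1)
      return i

-- Returns the number of reads performed before the result is touched,
-- and the number performed after only the first cons cell is demanded.
deferredDemo :: IO (Int, Int)
deferredDemo = do
  counter <- newIORef 0
  xs <- unsafeInterleaveIO (readAll counter 5) -- deferred, not yet run
  before <- readIORef counter
  _ <- evaluate xs                             -- force xs to WHNF only
  after <- readIORef counter
  return (before, after)

main :: IO ()
main = deferredDemo >>= print
```

Here the counter is 0 before the list is inspected and 5 right after
its first cell is demanded: the whole action was postponed, but it was
not split into lazy steps.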

So, to cope with that, I can think of exposing a little of the structure
of an IO action. Normally it's completely opaque. But if we knew where
its recursion point lies, we could control its course of execution.

So, if we had something like
   type RecIO a = IO a -> IO a
and io actions were like
   getContents :: RecIO [Char]
   getContents rec = do
       c <- readOneChar
       rest <- rec
       return (c:rest)

then we could either run them
   normally :: RecIO a -> IO a
   normally r = r (normally r)
or
   lazily :: RecIO a -> IO a
   lazily r = unsafeInterleaveIO $ r (lazily r)

And
   lazilyDoWithIO :: RecIO a -> (a -> b) -> IO b
   lazilyDoWithIO m f = do
     a <- lazily m
     return $ f a
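
The schematic definitions above can be turned into something that runs.
This is only a sketch under assumptions: Source, newSource and
getContentsFrom are hypothetical names, an in-memory IORef stands in for
a real handle, and an end-of-input case is added so the recursion
terminates (the schematic getContents has none).

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

type RecIO a = IO a -> IO a

-- Hypothetical in-memory source standing in for a file handle.
-- readCount records how many characters have been consumed so far.
data Source = Source
  { remaining :: IORef String
  , readCount :: IORef Int
  }

newSource :: String -> IO Source
newSource s = Source <$> newIORef s <*> newIORef 0

-- One step of reading: take a character, then hand the rest to `rec`.
getContentsFrom :: Source -> RecIO String
getContentsFrom src rec = do
  s <- readIORef (remaining src)
  case s of
    [] -> return [] -- end of input: stop recursing
    (c : cs) -> do
      writeIORef (remaining src) cs
      modifyIORef' (readCount src) (+ 1)
      rest <- rec
      return (c : rest)

-- Tie the knot eagerly: everything is read before the result is returned.
normally :: RecIO a -> IO a
normally r = r (normally r)

-- Tie the knot through unsafeInterleaveIO: each step runs on demand.
lazily :: RecIO a -> IO a
lazily r = unsafeInterleaveIO (r (lazily r))

lazilyDoWithIO :: RecIO a -> (a -> b) -> IO b
lazilyDoWithIO m f = do
  a <- lazily m
  return (f a)

-- Result: the prefix, characters read lazily, characters read normally.
demo :: IO (String, Int, Int)
demo = do
  lazySrc <- newSource "hello world"
  prefix <- lazilyDoWithIO (getContentsFrom lazySrc) (take 5)
  _ <- evaluate (length prefix) -- force the five-element prefix
  lazyReads <- readIORef (readCount lazySrc)
  strictSrc <- newSource "hello world"
  _ <- normally (getContentsFrom strictSrc)
  strictReads <- readIORef (readCount strictSrc)
  return (prefix, lazyReads, strictReads)

main :: IO ()
main = demo >>= print
```

In this sketch the lazy run consumes only the 5 characters that
`take 5` demands, while the normal run consumes all 11. The rest of the
lazy list is still a pending read once lazilyDoWithIO returns.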

Hmm, but then we would have to take special care not to let the lazy
value escape this function anyway... So here we come back to the
deepseq'ing you proposed.

And anyway, instead of re-writing pure functions to become iteratees we 
will have to re-write io functions to adopt continuation passing style.

Initially it looked better to me :)

But with this approach we can run lazily any io action that has the form 
of RecIO. Also, we can interleave normally and lazily based on the time 
of day and other conditions :)

> It also doesn't help with
> other IO effects, e.g. writing output.  I consider this one of the
> nicest features of iteratee-based processing.

Can you clarify what the problem with writing is? I think I'm still
stuck on the topic of gluing code.

As far as gluing code goes, the type signature d -> IO () for writing is
perfectly sufficient.

-- 
Daniil Elovkov