[Haskell-cafe] memory needed for SAX parsing XML
Daniil Elovkov
daniil.elovkov at googlemail.com
Fri Apr 23 10:16:48 EDT 2010
John Lato wrote:
>
>> Another (additional) approach would be to encapsulate unsafeInterleaveIO
>> within some routine and not let it go out into the wild.
>>
>> lazilyDoWithIO :: IO a -> (a -> b) -> IO b
>>
>> It would use unsafeInterleave internally but catch all IO errors within
>> itself.
>>
>> I wonder if this is a reasonable idea? Maybe it's already done?
>> So the topic is shifting...
>
> doWithIO :: NFData b => IO a -> (a -> b) -> IO b
> doWithIO m f = liftM (\a -> let b = f a in b `deepseq` b) m
>
> It works (just stick it in a "try" block for error handling), but you
> need to write a lot of NFData instances. You also need to be careful
> that b is some sort of reduced structure, or you can end up forcing
> the whole file (or other data) into memory.
I meant a different thing. In your example there is no unsafeInterleaveIO
at all. I think you mean that the 'm' argument is supposed to be an
unsafeInterleaveIO'd action, like getContents, and deepseq'ing saves us
from it hanging around somewhere for a long time. OK.

But what I meant was a routine that lets us use ordinary IO actions
lazily, while confining that 'hanging' to the bounds of the routine.
Having thought about it a little more, though, I understood that this is
impossible.
Lazy IO works today because unsafeInterleaveIO is built into getContents
itself and is called repeatedly (via recursion). At least, that is the
implementation I can imagine; I haven't looked at the actual source.
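A minimal sketch of that recursive scheme, reading one character per step. The name lazyHGetContents is hypothetical, and this is not the actual base implementation (GHC's hGetContents works on whole buffers rather than single characters):

```haskell
import System.IO
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical lazy reader: reads one character only when its result is
-- demanded and defers the rest of the stream behind unsafeInterleaveIO.
lazyHGetContents :: Handle -> IO String
lazyHGetContents h = unsafeInterleaveIO $ do
  eof <- hIsEOF h
  if eof
    then return []
    else do
      c <- hGetChar h
      -- Recursive call: it is wrapped in unsafeInterleaveIO, so it returns
      -- a deferred value immediately instead of reading further now.
      rest <- lazyHGetContents h
      return (c : rest)
```

Demanding only the head of the result reads a single character; the handle has to stay open until the whole string has been consumed, which is where the 'hanging' comes from.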
I realised that applying unsafeInterleaveIO to an ordinary IO action
will not make it run lazily. It will still run all at once, just later
rather than now.
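A small demonstration, assuming a hypothetical readThree action whose effects are counted in an IORef: wrapping the whole action in unsafeInterleaveIO defers it, but demanding just the outermost constructor of its result runs all three effects at once.

```haskell
import Control.Exception (evaluate)
import Data.IORef
import System.IO.Unsafe (unsafeInterleaveIO)

-- An ordinary, non-lazy action: performs three effects (counted in the
-- IORef) and then returns a list.
readThree :: IORef Int -> IO [Int]
readThree counter = mapM step [1, 2, 3]
  where
    step x = do
      modifyIORef counter (+ 1) -- one "read" per element
      return x

-- Returns how many effects had run (before, after) demanding the result.
deferredAllAtOnce :: IO (Int, Int)
deferredAllAtOnce = do
  counter <- newIORef 0
  xs <- unsafeInterleaveIO (readThree counter)
  before <- readIORef counter -- nothing has run yet
  _ <- evaluate xs            -- demand only the outermost constructor
  after <- readIORef counter  -- ...yet all three effects have now run
  return (before, after)
```

deferredAllAtOnce returns (0, 3): no effects before the result is demanded, then all of them in one go.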
So, to cope with this, I can think of exposing a little of an IO
action's structure. Normally it is completely opaque, but if we knew
where its recursion point lies, we could control the course of its
execution.
So, if we had something like

type RecIO a = IO a -> IO a

and IO actions were written like

getContents :: RecIO [Char]
getContents rec = do
    c <- readOneChar    -- some primitive that reads a single character
    rest <- rec         -- the exposed recursion point
    return (c:rest)

then we could either run them

normally :: RecIO a -> IO a
normally r = r (normally r)

or

lazily :: RecIO a -> IO a
lazily r = unsafeInterleaveIO $ r (lazily r)

And

lazilyDoWithIO :: RecIO a -> (a -> b) -> IO b
lazilyDoWithIO m f = do
    a <- lazily m
    return $ f a
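The getContents example never stops (it has no end-of-input case) and leaves readOneChar abstract. A runnable variant, assuming a hypothetical popChars action that consumes characters from an IORef standing in for an input source, with normally and lazily as in the message:

```haskell
import Data.IORef
import System.IO.Unsafe (unsafeInterleaveIO)

-- A recursive IO action with its recursion point exposed as a parameter.
type RecIO a = IO a -> IO a

-- Hypothetical example action: pops one character per step from a
-- mutable "input" and stops recursing when the input is exhausted.
popChars :: IORef String -> RecIO String
popChars src rec = do
  s <- readIORef src
  case s of
    []       -> return []  -- end of input: do not recurse
    (c : cs) -> do
      writeIORef src cs    -- the per-step effect
      rest <- rec          -- the exposed recursion point
      return (c : rest)

-- Tie the knot strictly: every step runs immediately.
normally :: RecIO a -> IO a
normally r = r (normally r)

-- Tie the knot lazily: each step is deferred until its result is demanded.
lazily :: RecIO a -> IO a
lazily r = unsafeInterleaveIO $ r (lazily r)
```

With normally, popChars drains the whole source before returning; with lazily, demanding the first character of the result performs exactly one step, and the remaining steps run only as the rest of the string is consumed.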
Hmm, but then we would still have to take special care not to let the
lazy result escape this function... So here we come back to the
deepseq'ing you proposed. And in any case, instead of rewriting pure
functions as iteratees, we would have to rewrite IO functions in
continuation-passing style. Initially it looked better to me :)
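Combining the two ideas, a sketch of lazilyDoWithIO that fully evaluates its result with force from the deepseq package before returning. popChars is a hypothetical stand-in for an action in RecIO form:

```haskell
import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)
import Data.IORef
import System.IO.Unsafe (unsafeInterleaveIO)

type RecIO a = IO a -> IO a

lazily :: RecIO a -> IO a
lazily r = unsafeInterleaveIO $ r (lazily r)

-- Run the action lazily, then fully evaluate the result before returning,
-- so every read the result depends on happens inside this function.
lazilyDoWithIO :: NFData b => RecIO a -> (a -> b) -> IO b
lazilyDoWithIO m f = do
  a <- lazily m
  evaluate (force (f a))

-- Hypothetical example action: pops one character per step from an
-- IORef standing in for an input source.
popChars :: IORef String -> RecIO String
popChars src rec = do
  s <- readIORef src
  case s of
    []       -> return []
    (c : cs) -> do
      writeIORef src cs
      rest <- rec
      return (c : rest)
```

For example, lazilyDoWithIO (popChars src) (takeWhile (/= ' ')) on the input "ab cd" returns "ab" and leaves "cd" unread. evaluate is used rather than return because return does not evaluate its argument: a result built with liftM or return is only forced when the caller later demands it, so a try around the call would not catch errors raised by the deferred reads. As John notes, f should reduce the data (takeWhile stops at the first space here); otherwise forcing the result pulls the whole input into memory.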
But with this approach we can run any IO action of the form RecIO
lazily. We can also switch between running it normally and lazily
depending on the time of day or any other condition :)
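A sketch of such switching, assuming a hypothetical runWhen that asks an IO Bool condition before each step whether to defer that step. Any IO Bool works as the condition, for instance one that consults the clock or counts steps:

```haskell
import Data.IORef
import System.IO.Unsafe (unsafeInterleaveIO)

type RecIO a = IO a -> IO a

-- Hypothetical runner: before each step, ask the condition whether to
-- defer that step (as lazily does) or run it right away (as normally does).
runWhen :: IO Bool -> RecIO a -> IO a
runWhen deferNow r = do
  lazy <- deferNow
  let step = r (runWhen deferNow r)
  if lazy then unsafeInterleaveIO step else step

-- Hypothetical example action: pops one character per step from an IORef.
popChars :: IORef String -> RecIO String
popChars src rec = do
  s <- readIORef src
  case s of
    []       -> return []
    (c : cs) -> do
      writeIORef src cs
      rest <- rec
      return (c : rest)
```

For example, a condition that returns False for its first two calls and True afterwards makes the first two steps run immediately and defers the rest until the result is consumed.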
> It also doesn't help with
> other IO effects, e.g. writing output. I consider this one of the
> nicest features of iteratee-based processing.
Can you clarify what the problem with writing is? Perhaps I just haven't
switched away from the topic of gluing code: as far as gluing code is
concerned, a type signature of d -> IO () for an IO writer is perfectly
sufficient.
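To illustrate the point about gluing (a sketch, not from the thread): a lazily run producer can feed a plain writer of type d -> IO () through mapM_, and reads and writes then interleave. popLogged, writeLogged and glue are hypothetical names, and the effects are recorded in an IORef so that their order can be inspected:

```haskell
import Data.IORef
import System.IO.Unsafe (unsafeInterleaveIO)

type RecIO a = IO a -> IO a

lazily :: RecIO a -> IO a
lazily r = unsafeInterleaveIO $ r (lazily r)

-- Hypothetical producer: pops characters from an IORef, recording each read.
popLogged :: IORef [String] -> IORef String -> RecIO String
popLogged events src rec = do
  s <- readIORef src
  case s of
    []       -> return []
    (c : cs) -> do
      writeIORef src cs
      modifyIORef events (++ ["read " ++ [c]])
      rest <- rec
      return (c : rest)

-- A plain writer of type d -> IO (), here recording each write.
writeLogged :: IORef [String] -> Char -> IO ()
writeLogged events c = modifyIORef events (++ ["write " ++ [c]])

-- Glue: lazily produced input feeds the writer; returns the event order.
glue :: String -> IO [String]
glue input = do
  events <- newIORef []
  src <- newIORef input
  xs <- lazily (popLogged events src)
  mapM_ (writeLogged events) xs
  readIORef events
```

glue "ab" returns ["read a", "write a", "read b", "write b"]: each character is read only when the writer needs it.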
--
Daniil Elovkov