[Haskell-cafe] Hexpat: Lazy I/O problem with huge input files

Aleksandar Dimitrov aleks.dimitrov at googlemail.com
Thu Oct 14 09:52:21 EDT 2010


Hello Daniel,

> I don't know Hexpat at all, so I can only guess.
>
> Perhaps due to the laziness of let-bindings, mError keeps a reference to
> the entire tuple, thus preventing tree from being garbage collected as it
> is consumed by print.

Thanks for your input. I think you are right, the parse tree isn't
freed as the parse proceeds if mError is forced later on in the
program (anywhere.) I don't think it has something to do with the
tuple constructor or 'let' itself, but I'm also not very proficient at
figuring these kinds of things out, so I may be very wrong. I did do
the following test to support my hypothesis:

> import Text.XML.Expat.Tree
> import System.Environment (getArgs)
> import Control.Monad (liftM)
> import qualified Data.ByteString.Lazy as C
>·
> -- This is the recommended way to handle errors in lazy parses
> main = do
>     f <- liftM head getArgs >>= C.readFile
>     let (_, mError) = parse defaultParseOptions f :: (UNode String, Maybe XMLParseError)

>     case mError of
>          Just err -> putStrLn $ "It failed : "++show err
>          Nothing -> putStrLn "Success!"

I.e., keeping the parse tree is forced by the evaluation of mError.
There is not a single reference to the parse tree within the program
itself (unless I'm not noticing some sort of do-notation magic in the
whole thing here...) It is interesting (and rather unfortunate) that
just evaluating a potential error seems to block garbage collection.

If I'm correct (and I hope I'm not!) this seems to prevent using lazy
I/O in Hexpat if you want to know if there's a parse error (and if so,
what that would be.) I'll contact the author, maybe it's a genuine
bug?

Thanks again :-)
Aleks


More information about the Haskell-Cafe mailing list