[Haskell-beginners] xml light problem or pebkac

Manfred Lotz manfred.lotz at arcor.de
Tue Jun 7 15:22:38 CEST 2011

On Mon, 6 Jun 2011 21:55:23 +0200
Daniel Fischer <daniel.is.fischer at googlemail.com> wrote:

> Problem, part 2, the real problem.
> The point is that getXmlContent doesn't really parse the file yet, it 
> returns a thunk saying how to get the result from the file contents.
> Therefore, it doesn't need to read the entire file, just enough of it
> to find out whether parseXMLDoc returns a Just contents or a Nothing.
> Then you close the handle explicitly. That means it's closed
> immediately, and the part of the file that hasn't been read yet is
> never read. If the bla bla is long enough, the location tag is in
> the unread part, and when the search for that tag is finally forced
> by printing, the tag is not contained in the input.
> If you leave out the call to hClose, leaving the closing to
> hGetContents when it reaches the end of the file, the file contents
> are not truncated and the location is found.
> But then the file handle might remain half-closed longer than you
> wish (you could run out of file handles if you open a lot of files
> without explicitly closing them before the next [bunch] is opened).
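
A minimal sketch of the behaviour described above. It uses only `System.IO` and `Control.Exception` from base instead of the xml package, and the file name `lazy-demo.txt` and the functions `readLazyThenClose` and `readStrict` are hypothetical. Depending on the GHC version, forcing the string returned by the lazy variant may give a truncated result or raise an exception, so `main` catches `IOException` rather than assuming either. The strict variant forces the whole string with `evaluate (length s)` while the handle is still open.

```haskell
import Control.Exception (IOException, evaluate, try)
import System.IO

-- Hypothetical helper mirroring the problem: the handle is closed
-- before the lazily read string has been demanded.
readLazyThenClose :: FilePath -> IO String
readLazyThenClose path = do
  h <- openFile path ReadMode
  s <- hGetContents h  -- returns at once; nothing has been read yet
  hClose h             -- the unread rest of the file is now out of reach
  return s

-- Hypothetical helper with the fix: walking the whole string reads the
-- entire file while withFile still has the handle open.
readStrict :: FilePath -> IO String
readStrict path = withFile path ReadMode $ \h -> do
  s <- hGetContents h
  _ <- evaluate (length s)
  return s

main :: IO ()
main = do
  writeFile "lazy-demo.txt" (replicate 100000 'x')
  -- Old GHCs silently truncate here, newer ones may throw instead.
  r <- try (readLazyThenClose "lazy-demo.txt" >>= evaluate . length)
         :: IO (Either IOException Int)
  case r of
    Left e  -> putStrLn ("lazy read failed: " ++ show e)
    Right n -> putStrLn ("lazy read saw " ++ show n ++ " chars")
  strict <- readStrict "lazy-demo.txt"
  putStrLn ("strict read saw " ++ show (length strict) ++ " chars")
```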

I ran out of handles in the first place; that's why I changed the
code, only to get bitten by lazy IO. Thanks for explaining it to
me. Changing the sample code to use either bracket or withBinaryFile
(which uses bracket internally) does indeed make it work.
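
For comparison, a sketch of what `withBinaryFile` amounts to in terms of `bracket`. `withBinaryFile'` and `countBytes` are hypothetical names, and base's own definition may differ in detail (for example in how it annotates exceptions). `bracket` only guarantees that `hClose` runs when the inner action finishes, so anything read lazily still has to be forced inside that action, as `countBytes` does.

```haskell
import Control.Exception (bracket, evaluate)
import System.IO

-- Hypothetical equivalent of withBinaryFile: bracket runs hClose when
-- the inner action returns or throws.
withBinaryFile' :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
withBinaryFile' path mode = bracket (openBinaryFile path mode) hClose

-- The length is computed inside the bracket, i.e. before hClose runs,
-- so the whole file has been read by the time the handle is closed.
countBytes :: FilePath -> IO Int
countBytes path = withBinaryFile' path ReadMode $ \h -> do
  s <- hGetContents h
  evaluate (length s)

main :: IO ()
main = do
  writeFile "bracket-demo.txt" "hello, bracket"
  n <- countBytes "bracket-demo.txt"
  print n  -- prints 14
```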

When I go back to my original program, it seems I cannot easily get
rid of the laziness.

I process a bunch of xml files like this:

  import Control.Monad (foldM)
  import qualified Data.Map as M

  -- xmlfiles :: [FilePath]
  ht <- foldM insertXml M.empty xmlfiles
  mapM_ printEntry (M.toList ht)
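
A self-contained sketch of the same `foldM`/`Data.Map` pattern, with the XML parsing replaced by hypothetical `(name, payload)` pairs so it runs without any files. `insertEntry` is a made-up helper that, like `insertXml`, skips empty names and keeps the first entry for each name; `M.insertWith` calls its function as `f new old`, so `\_new old -> old` keeps the existing value.

```haskell
import Control.Monad (foldM)
import qualified Data.Map as M

-- Hypothetical stand-in for one parsed file: (name, payload).
type Entry = (String, String)

-- Runs in IO only to mirror foldM over files in the original program.
insertEntry :: M.Map String String -> Entry -> IO (M.Map String String)
insertEntry m (k, v)
  | null k    = return m                                  -- skip empty names
  | otherwise = return (M.insertWith (\_new old -> old) k v m)  -- keep first

main :: IO ()
main = do
  let entries = [("a", "first"), ("", "skipped"), ("b", "only"), ("a", "second")]
  ht <- foldM insertEntry M.empty entries
  mapM_ print (M.toList ht)  -- prints ("a","first") then ("b","only")
```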

and in insertXml I now have

  insertXml m xf =
    U.withBinaryFile xf ReadMode $ \handle -> do
      ct <- getXmlContent xf handle
      let k = ctName ct
          -- keep only the first entry for each non-empty name
          m' = if k /= "" && M.notMember k m
                 then M.insert k ct m
                 else m
      return m'

I'm not sure whether it is obvious where my mistake lies. If not, I
will try to turn this into a working minimal example.
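
Assuming the cause is the one from the quoted explanation (the parsed content is only demanded after `withBinaryFile` has closed the handle), one common remedy is to force everything that will be needed before the inner action returns. This sketch uses hypothetical stand-ins: a `Content` record with `ctName`/`ctBody`, a line-based `getContent` instead of the real `getXmlContent`, and plain `withFile` instead of `U.withBinaryFile`. With the real XML type, the forcing step would have to reach all the data `printEntry` uses later (e.g. via `Control.DeepSeq`, if an `NFData` instance is available).

```haskell
import Control.Exception (evaluate)
import Control.Monad (foldM)
import qualified Data.Map as M
import System.IO

-- Hypothetical stand-in for the parsed XML content.
data Content = Content { ctName :: String, ctBody :: String }

-- Hypothetical stand-in for getXmlContent: the first line is the name
-- and the rest is the body. Like the original, it only builds thunks.
getContent :: Handle -> IO Content
getContent h = do
  s <- hGetContents h
  let (name, rest) = break (== '\n') s
  return (Content name (drop 1 rest))

-- Force every field used later, so the whole file is read before
-- withFile closes the handle; then apply the keep-first-entry rule.
insertFile :: M.Map String Content -> FilePath -> IO (M.Map String Content)
insertFile m path = withFile path ReadMode $ \h -> do
  ct <- getContent h
  _ <- evaluate (length (ctName ct) + length (ctBody ct))
  let k = ctName ct
  return (if null k || M.member k m then m else M.insert k ct m)

main :: IO ()
main = do
  writeFile "entry1.txt" ("alpha\n" ++ replicate 50000 'x')
  writeFile "entry2.txt" "alpha\nduplicate"
  m <- foldM insertFile M.empty ["entry1.txt", "entry2.txt"]
  mapM_ (\(k, c) -> putStrLn (k ++ ": " ++ show (length (ctBody c)) ++ " chars"))
        (M.toList m)
```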

Thanks again,
