[Haskell-beginners] xml light problem or pebkac

Daniel Fischer daniel.is.fischer at googlemail.com
Tue Jun 7 16:09:41 CEST 2011


On Tuesday 07 June 2011, 15:22:38, Manfred Lotz wrote:
> On Mon, 6 Jun 2011 21:55:23 +0200
> 
> Daniel Fischer <daniel.is.fischer at googlemail.com> wrote:
> > Problem, part 2, the real problem.
> > 
> > The point is that getXmlContent doesn't really parse the file yet, it
> > returns a thunk saying how to get the result from the file contents.
> > Therefore, it doesn't need to read the entire file, just enough of it
> > to find out whether parseXMLDoc returns a Just contents or a Nothing.
> > 
> > Then you close the handle, explicitly. That means, it's closed
> > immediately, leaving the unread portion of the file unread. If the
> > bla bla is long enough, the location tag is in the unread part, and
> > when finally the search for that tag is forced by printing, the tag
> > is not contained in the input.
> > 
> > 
> > If you leave out the call to hClose, leaving the closing to
> > hGetContents when it reaches the end of the file, the file contents
> > are not truncated, and the location is found.
> > But then the file handle might remain semi-closed longer than you
> > wish (you could run out of file handles if you open a lot of files
> > without explicitly closing them before the next [bunch] is opened).
> 
> I ran out of handles in the first place; that's why I had changed the
> code, just to get bitten by the IO laziness. Thanks for explaining it to
> me. Changing the sample code to use either bracket or withBinaryFile
> (implicitly bracket) makes it work indeed.
> 
> 
> When I go back to my original program, it seems I cannot easily get rid
> of the laziness.
> 
> I process a bunch of xml files like this:
> 
>   import qualified Data.Map as M
>   ...
> 
>   -- xmlfiles is a list of [FilePath]
>   ht <- foldM insertXml M.empty xmlfiles
>   mapM_ printEntry (M.toList ht)
> 
> 
> and in insertXml I now have
> 
> 
> insertXml m xf = do
>   U.withBinaryFile xf ReadMode
>      (\handle -> do ct <- getXmlContent xf handle
>                     let k = ctName ct
>                     let m' = if k /= ""
>                                 then
>                                    if M.lookup k m == Nothing
>                                       then M.insert k ct m
>                                       else m
>                                 else m
>                     return m')
> 
> 
> 
> 
> I'm not sure if it is obvious where my mistake here lies. If not I have
> to try to make this a working minimal example.

I wrote:
> If you force the result before closing the handle, enough of the file is
> read to find the desired elements,

You're not forcing the result. You can force it in getXmlContent by seq'ing
on name, location and whatever other fields you have; then a strict return
(return $! m') in the withBinaryFile action is sufficient. Or you can do all
the forcing in the withBinaryFile action, for example

do ct@(CTest !nm !loc) <- getXmlContent xf handle
   ...
   return $! m'

with BangPatterns (use seq if you want your code to be portable to
implementations other than GHC).
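
A minimal sketch of the second variant, written without BangPatterns: the
fields are forced with evaluate before the handle is closed. CTest, ctName,
ctLocation and getContent are hypothetical stand-ins for your record and
getXmlContent. I'm assuming String fields and plain line-based input instead
of xml-light, so only the forcing pattern carries over to your program.

```haskell
module Main where

import Control.Exception (evaluate)
import Control.Monad (foldM)
import qualified Data.Map as M
import System.Environment (getArgs)
import System.IO

-- Hypothetical stand-in for the record built by getXmlContent.
data CTest = CTest { ctName :: String, ctLocation :: String }

-- Hypothetical stand-in for getXmlContent: the first line is the name,
-- the second the location. Nothing is read yet; the fields are thunks
-- over the lazily read contents of the handle.
getContent :: Handle -> IO CTest
getContent h = do
  s <- hGetContents h
  let ls = lines s
  return (CTest (concat (take 1 ls)) (concat (take 1 (drop 1 ls))))

insertFile :: M.Map String CTest -> FilePath -> IO (M.Map String CTest)
insertFile m path =
  withFile path ReadMode $ \h -> do
    ct <- getContent h
    -- length walks both strings to the end, so everything the fields
    -- need is read while the handle is still open.
    _ <- evaluate (length (ctName ct) + length (ctLocation ct))
    let k  = ctName ct
        m' | null k || M.member k m = m
           | otherwise              = M.insert k ct m
    -- Strict return: the map is built before withFile closes the handle.
    return $! m'

printEntry :: (String, CTest) -> IO ()
printEntry (k, ct) = putStrLn (k ++ " -> " ++ ctLocation ct)

main :: IO ()
main = do
  files <- getArgs
  ht <- foldM insertFile M.empty files
  mapM_ printEntry (M.toList ht)
```

Run it with a few file names as arguments. Without the evaluate line, the
fields would only be demanded by printEntry, after withFile has already
closed the handle.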


