[Haskell-beginners] xml light problem or pebkac

Daniel Fischer daniel.is.fischer at googlemail.com
Mon Jun 6 21:55:23 CEST 2011


On Monday 06 June 2011, 20:19:49, Manfred Lotz wrote:
> Hi all,
> I'm a bit lost. I'm having a problem with xml light but as a Haskell
> newbie it could be of course that the problem is sitting between my
> keyboard and chair.
> 

Not really. Yes, you're doing something wrong, but it's a bit delicate and 
nothing obvious. It's the unobvious behaviour of lazy IO.

> 
> I tried hard to create a minimal example.
> 
> Here is a minimal xmltest.hs:
> 
> <-----------------------------snip-------------------------------->
> module Main where
> 
> import System.Environment.UTF8
> import qualified System.IO.UTF8 as U



> import System.IO
> import Text.XML.Light
> 
> 
> data CTest = CTest {
>     ctName :: String
>   , ctLocation :: String
>   } deriving (Show,Read,Eq,Ord)
> 
> getXmlContent :: Handle -> IO CTest
> getXmlContent inh = do
>     xml <- U.hGetContents inh

Problem, part 1

>     let content = parseXMLDoc xml
>     case content of
>       Just c -> do let name = case findChild (unqual "name") c of
>                          Nothing -> "<unknown>"
>                          Just n' -> strContent n'
>                    let path = case findChild (unqual "location") c of
>                          Nothing -> "<unknown>"
>                          Just path' -> case findAttr (unqual "path")
> path' of Nothing -> "<unknown>"
>                                       Just p -> p
>                    return CTest { ctName=name, ctLocation=path}
>       _  -> fail "not expected"
> 
> 
> readXmlFile :: FilePath -> IO CTest
> readXmlFile f = do
>   inh <- U.openBinaryFile f ReadMode
>   xml <- getXmlContent inh
>   hClose inh

Problem, part 2, the real problem.

The point is that getXmlContent doesn't really parse the file yet, it 
returns a thunk saying how to get the result from the file contents.
Therefore, it doesn't need to read the entire file, just enough of it to 
find out whether parseXMLDoc returns a Just contents or a Nothing.

Then you close the handle, explicitly. That means, it's closed immediately, 
leaving the unread portion of the file unread. If the bla bla is long 
enough, the location tag is in the unread part, and when finally the search 
for that tag is forced by printing, the tag is not contained in the input.


If you leave out the call to hClose, leaving the closing to hGetContents 
when it reaches the end of the file, the file contents is not truncated, 
and the location found.
But then the file handle might remain half-closed longer than you wish (you 
could run out of file handles if you open a lot of files without explicitly 
closing them before the next [bunch] is opened).

If you force the result before closing the handle, enough of the file is 
read to find the desired elements, and you make sure that there's no 
leaking file handle -- unless there's an exception in getXmlContent, so 
that the hClose is never reached. To prevent that, use 
Control.Exception.bracket, or use withBinaryFile, which does that for you.

>   return xml
> 
> doSomething :: Show a => a -> IO ()
> doSomething xml =
>   print xml
> 
> main :: IO ()
> main = do
>   args <- getArgs
>   xml <- readXmlFile $ head args
>   doSomething xml
> <-----------------------------snap-------------------------------->
> 
 
> Question: What is my error?



More information about the Beginners mailing list