[Haskell-cafe] Fwd: [Haskell-beginners] HaXml.SAX successfully parses a malformed XML document

David Frey dpfrey at shaw.ca
Fri Oct 17 18:59:48 EDT 2008


I asked this question on the haskell-beginners list, but I didn't get a
response, so I am forwarding it on to the cafe list.

David


--- Original Message ---
Date: 10/16/2008
From: "David Frey" <dpfrey at shaw.ca>
Subject: [Haskell-beginners] HaXml.SAX successfully parses a malformed
XML document


It seems that saxParse from Text.XML.HaXml.SAX successfully parses a
malformed XML document without reporting an error.

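Notice that the SubElem opening and closing tags do not match in the
XML document below (the second ElemOne tag is also a start tag rather
than an end tag, so ElemOne is never closed either).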

--- input.xml ---
<RootNode>
    <ElemOne>
        <SubElem attr="foo">element data</SubElemBroken>
        <NoDataElem/>
    <ElemOne>
</RootNode>
--- end input.xml ---


The Haskell code below runs without error: saxParse reports no error,
and the program prints the kind of each element found during the parse.


--- Haskell Code ---
module Main where

import qualified Text.XML.HaXml.SAX as SAX

-- Parse input.xml with HaXml's SAX parser and print one line per event,
-- or the parser's error message if it reports one.
main :: IO ()
main = do
    let inputFilename = "input.xml"
    content <- readFile inputFilename
    -- saxParse returns the events together with an optional error message
    let (elements, parseError) = SAX.saxParse inputFilename content
    case parseError of
        Nothing  -> mapM_ putStrLn (summarizeElements elements)
        Just msg -> putStrLn ("ERROR:" ++ msg)

summarizeElements :: [SAX.SaxElement] -> [String]
summarizeElements = map summarizeElement

-- Describe the kind of each SAX event; names and contents are ignored.
summarizeElement :: SAX.SaxElement -> String
summarizeElement element = case element of
    SAX.SaxDocTypeDecl _           -> "DocType"
    SAX.SaxProcessingInstruction _ -> "Processing Instruction"
    SAX.SaxComment _               -> "Comment"
    SAX.SaxElementOpen _ _         -> "Element Open"
    SAX.SaxElementClose _          -> "Element Close"
    SAX.SaxElementTag _ _          -> "No Content Element"
    SAX.SaxCharData _              -> "Character Data"
    SAX.SaxReference _             -> "Reference"

--- End Haskell Code ---
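Whether or not this turns out to be a bug, one way to catch the mismatch
on the Haskell side is to check the event stream yourself. The sketch
below enforces the XML 1.0 "Element Type Match" constraint with a stack
of open element names. It assumes only that the events carry element
names, as SaxElementOpen, SaxElementClose and SaxElementTag do; the Event
type and checkBalanced are hypothetical names defined here so the example
compiles on its own, and in a real program you would map SAX.SaxElement
values onto them (or pattern-match on SaxElement directly).

```haskell
-- Hypothetical stand-in for the parts of HaXml's SaxElement that matter
-- for tag matching; defined here only to keep the sketch self-contained.
data Event
  = Open String      -- like SAX.SaxElementOpen name attrs
  | Close String     -- like SAX.SaxElementClose name
  | EmptyTag String  -- like SAX.SaxElementTag name attrs
  | Other            -- character data, comments, etc.
  deriving (Show, Eq)

-- Walk the events with a stack of currently open element names.
-- Returns Nothing if every end tag matches the innermost open start tag
-- and nothing is left open; otherwise a description of the first problem.
checkBalanced :: [Event] -> Maybe String
checkBalanced = go []
  where
    go [] [] = Nothing
    go (open : _) [] = Just ("unclosed element: " ++ open)
    go stack (Open name : rest) = go (name : stack) rest
    go (open : stack) (Close name : rest)
      | open == name = go stack rest
      | otherwise    = Just ("expected </" ++ open ++ "> but found </" ++ name ++ ">")
    go [] (Close name : _) = Just ("unexpected </" ++ name ++ ">")
    -- empty-element tags and non-tag events do not affect the stack
    go stack (_ : rest) = go stack rest
```

For the events of input.xml up to the bad end tag, [Open "RootNode",
Open "ElemOne", Open "SubElem", Other, Close "SubElemBroken"], it returns
Just "expected </SubElem> but found </SubElemBroken>".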


The Python code below, using the standard library's xml.sax module,
raises an exception (xml.sax.SAXParseException, "mismatched tag") when
parsing the same input document.


--- Python Code ---
from xml.sax import make_parser
from xml.sax.handler import ContentHandler


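def main():
    # The base ContentHandler ignores all events; the parser itself still
    # checks well-formedness and raises SAXParseException on errors.
    c = ContentHandler()
    p = make_parser()
    p.setContentHandler(c)
    with open('input.xml') as f:
        p.parse(f)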


if __name__ == '__main__':
    main()
--- End Python Code ---


Is this a bug in the HaXml SAX parser?

Thanks,
David
_______________________________________________
Beginners mailing list
Beginners at haskell.org
http://www.haskell.org/mailman/listinfo/beginners

