[Haskell-beginners] HaXml.SAX successfully parses a malformed XML document
David Frey
dpfrey at shaw.ca
Thu Oct 16 19:47:56 EDT 2008
It seems that saxParse from Text.XML.HaXml.SAX will successfully parse a
malformed XML document.
Note that the opening and closing tags of SubElem do not match in the
XML document below.
--- input.xml ---
<RootNode>
<ElemOne>
<SubElem attr="foo">element data</SubElemBroken>
<NoDataElem/>
<ElemOne>
</RootNode>
--- end input.xml ---
The Haskell code below runs without error. It prints the kind of each
element found during the parse.
--- Haskell Code ---
module Main where

import qualified Text.XML.HaXml.SAX as SAX

main :: IO ()
main = do
    let inputFilename = "input.xml"
    content <- readFile inputFilename
    -- saxParse returns the list of SAX events and an optional error message.
    let (elements, err) = SAX.saxParse inputFilename content
    case err of
        Nothing  -> mapM_ putStrLn (summarizeElements elements)
        Just msg -> putStrLn $ "ERROR:" ++ msg

-- Describe every SAX event in the order it was produced.
summarizeElements :: [SAX.SaxElement] -> [String]
summarizeElements = map summarizeElement

-- Map a single SAX event to a short description of its kind.
summarizeElement :: SAX.SaxElement -> String
summarizeElement element = case element of
    SAX.SaxDocTypeDecl _           -> "DocType"
    SAX.SaxProcessingInstruction _ -> "Processing Instruction"
    SAX.SaxComment _               -> "Comment"
    SAX.SaxElementOpen _ _         -> "Element Open"
    SAX.SaxElementClose _          -> "Element Close"
    SAX.SaxElementTag _ _          -> "No Content Element"
    SAX.SaxCharData _              -> "Character Data"
    SAX.SaxReference _             -> "Reference"
--- End Haskell Code ---
The Python code below throws an exception when parsing the same input
document.
--- Python Code ---
from xml.sax import make_parser
from xml.sax.handler import ContentHandler

def main():
    c = ContentHandler()
    p = make_parser()
    p.setContentHandler(c)
    p.parse(open('input.xml'))

if __name__ == '__main__':
    main()
--- End Python Code ---
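To show what the Python exception looks like, here is a minimal sketch that catches it and reports the line and message. It assumes the input document is inlined as a byte string instead of being read from input.xml. The helper name check_well_formed and the constant MALFORMED are hypothetical names chosen for illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# The input document from above, inlined so the sketch is self-contained.
MALFORMED = b"""<RootNode>
<ElemOne>
<SubElem attr="foo">element data</SubElemBroken>
<NoDataElem/>
<ElemOne>
</RootNode>
"""


def check_well_formed(data):
    """Return None if data parses cleanly, else a (line, message) tuple."""
    try:
        # The default error handler raises on fatal errors such as
        # mismatched tags.
        xml.sax.parseString(data, ContentHandler())
    except xml.sax.SAXParseException as exc:
        return (exc.getLineNumber(), exc.getMessage())
    return None


if __name__ == '__main__':
    print(check_well_formed(MALFORMED))
```

Running it on the document above should print a tuple rather than None, which is the behaviour I expected from the HaXml parser as well.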
Is this a bug in the HaXml SAX parser?
Thanks,
David