[Haskell-cafe] Fwd: [Haskell-beginners] HaXml.SAX successfully parses a malformed XML document
David Frey
dpfrey at shaw.ca
Fri Oct 17 18:59:48 EDT 2008
I asked this question on the haskell-beginners list, but I didn't get a
response, so I am forwarding it on to the cafe list.
David
--- Original Message ---
Date: 10/16/2008
From: "David Frey" <dpfrey at shaw.ca>
Subject: [Haskell-beginners] HaXml.SAX successfully parses a malformed XML document
It seems that saxParse from Text.XML.HaXml.SAX will parse a malformed XML
document without reporting an error.
Notice that the opening and closing tags of SubElem do not match in the
XML document below.
--- input.xml ---
<RootNode>
  <ElemOne>
    <SubElem attr="foo">element data</SubElemBroken>
    <NoDataElem/>
  <ElemOne>
</RootNode>
--- end input.xml ---
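For reference, the rule this document breaks is that every end tag must match the innermost start tag that is still open; a parser typically enforces it with a stack of open element names. Below is a minimal sketch of that check, written in Python like the comparison script later in this message. The (kind, name) event tuples and the function name check_tag_balance are hypothetical simplifications loosely modelled on HaXml's SaxElementOpen, SaxElementClose and SaxElementTag events; they are not HaXml's actual types.

```python
# Hypothetical, simplified event stream: ("open", name), ("close", name) and
# ("empty", name) loosely mirror HaXml's SaxElementOpen, SaxElementClose and
# SaxElementTag. Character data is omitted because it cannot unbalance tags.
def check_tag_balance(events):
    """Return None if every close tag matches the innermost open tag,
    otherwise a string describing the first problem found."""
    stack = []
    for kind, name in events:
        if kind == "open":
            stack.append(name)
        elif kind == "close":
            if not stack:
                return f"unexpected </{name}> with no open element"
            expected = stack.pop()
            if name != expected:
                return f"expected </{expected}> but found </{name}>"
        # "empty" elements such as <NoDataElem/> open and close in one step,
        # so they never touch the stack.
    if stack:
        return f"unclosed element(s): {', '.join(stack)}"
    return None


# Events corresponding to input.xml above, in document order.
events = [
    ("open", "RootNode"),
    ("open", "ElemOne"),
    ("open", "SubElem"),
    ("close", "SubElemBroken"),
    ("empty", "NoDataElem"),
    ("open", "ElemOne"),
    ("close", "RootNode"),
]
print(check_tag_balance(events))  # expected </SubElem> but found </SubElemBroken>
```

The check stops at the first mismatch, which is also how a conforming XML parser treats a well-formedness error: it is fatal, so nothing after it is reported.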
The Haskell code below runs without reporting an error. It prints the
kind of each SAX event found during the parse.
--- Haskell Code ---
module Main where

import qualified Text.XML.HaXml.SAX as SAX

main :: IO ()
main = do
  let inputFilename = "input.xml"
  content <- readFile inputFilename
  -- saxParse returns the SAX events together with an optional error message
  let (elements, err) = SAX.saxParse inputFilename content
  case err of
    Nothing  -> mapM_ putStrLn (summarizeElements elements)
    Just msg -> putStrLn ("ERROR:" ++ msg)

summarizeElements :: [SAX.SaxElement] -> [String]
summarizeElements = map summarizeElement

-- Describe the kind of each SAX event, ignoring its payload.
summarizeElement :: SAX.SaxElement -> String
summarizeElement element = case element of
  SAX.SaxDocTypeDecl _           -> "DocType"
  SAX.SaxProcessingInstruction _ -> "Processing Instruction"
  SAX.SaxComment _               -> "Comment"
  SAX.SaxElementOpen _ _         -> "Element Open"
  SAX.SaxElementClose _          -> "Element Close"
  SAX.SaxElementTag _ _          -> "No Content Element"
  SAX.SaxCharData _              -> "Character Data"
  SAX.SaxReference _             -> "Reference"
--- End Haskell Code ---
The Python code below raises an exception (an xml.sax.SAXParseException)
when parsing the same input document.
--- Python Code ---
from xml.sax import make_parser
from xml.sax.handler import ContentHandler

def main():
    c = ContentHandler()  # do-nothing handler: we only care whether parsing succeeds
    p = make_parser()
    p.setContentHandler(c)
    with open('input.xml') as f:
        p.parse(f)

if __name__ == '__main__':
    main()
--- End Python Code ---
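To see where the standard-library parser reports the problem, here is a small sketch (assuming Python 3 and its bundled expat-based xml.sax) that parses the same document from a string and returns the line number and message of the first fatal error instead of letting the exception propagate. first_parse_error and MALFORMED are illustrative names, not part of any library.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# The malformed document from input.xml, inlined so the sketch is self-contained.
MALFORMED = """<RootNode>
  <ElemOne>
    <SubElem attr="foo">element data</SubElemBroken>
    <NoDataElem/>
  <ElemOne>
</RootNode>
"""

def first_parse_error(xml_text):
    """Parse xml_text with the standard-library SAX parser and return
    (line, message) for the first fatal error, or None if it is well formed."""
    try:
        # With no error_handler argument, parseString uses the default
        # ErrorHandler, which raises SAXParseException on fatal errors.
        xml.sax.parseString(xml_text.encode("utf-8"), ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc.getLineNumber(), exc.getMessage()
    return None

print(first_parse_error(MALFORMED))
```

For this document the result should point at line 3, the </SubElemBroken> end tag. Expat stops at the first fatal error, so the unclosed second ElemOne is never reported separately.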
Is this a bug in the HaXml SAX parser?
Thanks,
David
_______________________________________________
Beginners mailing list
Beginners at haskell.org
http://www.haskell.org/mailman/listinfo/beginners