[Haskell-beginners] HaXml.SAX successfully parses a malformed XML document

David Frey dpfrey at shaw.ca
Thu Oct 16 19:47:56 EDT 2008

It seems that saxParse from Text.XML.HaXml.SAX will successfully parse a
malformed XML document.

Notice that the SubElem opening and closing tags do not match in the
XML document below.

--- input.xml ---
        <SubElem attr="foo">element data</SubElemBroken>
--- end input.xml ---

The Haskell code below runs without error.  It prints the type of each
element found during the parse.

--- Haskell Code ---
module Main where

import qualified Text.XML.HaXml.SAX as SAX

main :: IO ()
main = do
    let inputFilename = "input.xml"
    content <- readFile inputFilename
    -- saxParse returns the element stream plus an optional error message.
    let (elements, err) = SAX.saxParse inputFilename content
    case err of
        Nothing  -> mapM_ putStrLn (summarizeElements elements)
        Just msg -> putStrLn $ "ERROR:" ++ msg

summarizeElements :: [SAX.SaxElement] -> [String]
summarizeElements elements = map summarizeElement elements

summarizeElement :: SAX.SaxElement -> String
summarizeElement element = case element of
    SAX.SaxDocTypeDecl _           -> "DocType"
    SAX.SaxProcessingInstruction _ -> "Processing Instruction"
    SAX.SaxComment _               -> "Comment"
    SAX.SaxElementOpen _ _         -> "Element Open"
    SAX.SaxElementClose _          -> "Element Close"
    SAX.SaxElementTag _ _          -> "No Content Element"
    SAX.SaxCharData _              -> "Character Data"
    SAX.SaxReference _             -> "Reference"

--- End Haskell Code ---

The Python code below throws an exception when parsing the same input:

--- Python Code ---
from xml.sax import make_parser
from xml.sax.handler import ContentHandler

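def main():
    c = ContentHandler()
    p = make_parser()
    p.setContentHandler(c)
    # Raises xml.sax.SAXParseException on the mismatched tag.
    p.parse("input.xml")

if __name__ == '__main__':
    main()
--- End Python Code ---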
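
For comparison, here is a minimal self-contained sketch of the same Python
check. It assumes the document is held in memory rather than read from
input.xml. The names check_well_formed and MALFORMED are illustrative only and
are not part of any library.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Same mismatched tags as input.xml, kept in memory (an assumption made
# so the sketch does not depend on a file on disk).
MALFORMED = b'<SubElem attr="foo">element data</SubElemBroken>'


def check_well_formed(data):
    """Return None if data parses cleanly, else the parser's error message."""
    try:
        # A bare ContentHandler ignores every event; only errors matter here.
        xml.sax.parseString(data, ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc.getMessage()
    return None


if __name__ == "__main__":
    print(check_well_formed(MALFORMED))
```

With the closing tag corrected to </SubElem>, the same function returns None,
so the exception appears to come from the tag mismatch alone.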

Is this a bug in the HaXml SAX parser?

