[Haskell-cafe] Fwd: [Haskell-beginners] HaXml.SAX successfully parses a malformed XML document

David Frey dpfrey at shaw.ca
Fri Oct 17 18:59:48 EDT 2008

I asked this question on the haskell-beginners list, but I didn't get a
response, so I am forwarding it on to the cafe list.


--- Original Message ---
Date: 10/16/2008
From: "David Frey" <dpfrey at shaw.ca>
Subject: [Haskell-beginners] HaXml.SAX successfully parses a malformed
XML document

It seems that saxParse from Text.XML.HaXml.SAX will successfully parse a
malformed XML document.

Notice that the opening tag <SubElem> and the closing tag </SubElemBroken>
in the XML document below do not match.

--- input.xml ---
        <SubElem attr="foo">element data</SubElemBroken>
--- end input.xml ---

The Haskell code below runs without error.  It prints the kind of each
element found during the parse.

--- Haskell Code ---
module Main where

import qualified Text.XML.HaXml.SAX as SAX

main :: IO ()
main = do
    let inputFilename = "input.xml"
    content <- readFile inputFilename
    -- saxParse returns the events found plus an optional error message.
    let (elements, parseError) = SAX.saxParse inputFilename content
    case parseError of
        Nothing  -> mapM_ putStrLn (summarizeElements elements)
        Just msg -> putStrLn $ "ERROR:" ++ msg

summarizeElements :: [SAX.SaxElement] -> [String]
summarizeElements elements = map summarizeElement elements

summarizeElement :: SAX.SaxElement -> String
summarizeElement element = case element of
    (SAX.SaxDocTypeDecl d)           -> "DocType"
    (SAX.SaxProcessingInstruction p) -> "Processing Instruction"
    (SAX.SaxComment s)               -> "Comment"
    (SAX.SaxElementOpen name attrs)  -> "Element Open"
    (SAX.SaxElementClose name)       -> "Element Close"
    (SAX.SaxElementTag name attrs)   -> "No Content Element"
    (SAX.SaxCharData charData)       -> "Character Data"
    (SAX.SaxReference reference)     -> "Reference"

--- End Haskell Code ---

The Python code below raises an exception when parsing the same input.

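--- Python Code ---
from xml.sax import make_parser
from xml.sax.handler import ContentHandler

def main():
    c = ContentHandler()
    p = make_parser()
    p.setContentHandler(c)
    # Raises xml.sax.SAXParseException on the mismatched closing tag.
    p.parse('input.xml')

if __name__ == '__main__':
    main()
--- End Python Code ---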
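
For anyone who wants to reproduce the Python side without creating
input.xml, here is a minimal sketch that feeds the same line to
xml.sax.parseString and reports whether a SAXParseException was raised.
The helper name is_well_formed and the two constants are my own, not part
of xml.sax, and the well-formed variant is only there for contrast.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# The same malformed line as input.xml: <SubElem> is closed by </SubElemBroken>.
MALFORMED = b'<SubElem attr="foo">element data</SubElemBroken>'
# A corrected variant, added here only for comparison.
WELL_FORMED = b'<SubElem attr="foo">element data</SubElem>'


def is_well_formed(data):
    """Return True if xml.sax parses `data` without a SAXParseException.

    `is_well_formed` is a hypothetical helper name, not part of xml.sax.
    """
    try:
        # A bare ContentHandler ignores every event; only errors matter here.
        xml.sax.parseString(data, ContentHandler())
    except xml.sax.SAXParseException:
        return False
    return True


if __name__ == '__main__':
    print(is_well_formed(MALFORMED))
    print(is_well_formed(WELL_FORMED))
```

With the default error handler, the mismatched closing tag is a fatal
error, so the first call should report False and the second True.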

Is this a bug in the HaXml SAX parser?
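
Whatever the answer turns out to be, a consumer of a SAX-style event
stream can check tag balance itself with a stack. The sketch below is in
Python and says nothing about how HaXml works internally. The
("open", name) / ("close", name) tuples are a hypothetical stand-in for
the SaxElementOpen and SaxElementClose events, and first_mismatch is a
name I made up for illustration.

```python
def first_mismatch(events):
    """Return a description of the first tag mismatch, or None if balanced.

    `events` is a list of ("open", name) / ("close", name) tuples, a
    hypothetical stand-in for SaxElementOpen / SaxElementClose events.
    Any other kind of event is ignored.
    """
    stack = []  # names of elements opened but not yet closed
    for kind, name in events:
        if kind == "open":
            stack.append(name)
        elif kind == "close":
            if not stack:
                return "unexpected </%s>" % name
            expected = stack.pop()
            if expected != name:
                return "expected </%s>, found </%s>" % (expected, name)
    if stack:
        return "unclosed <%s>" % stack[-1]
    return None


if __name__ == '__main__':
    # Event sequence corresponding to input.xml above.
    events = [("open", "SubElem"), ("close", "SubElemBroken")]
    print(first_mismatch(events))
```

Run over the events for input.xml, this reports that </SubElem> was
expected where </SubElemBroken> was found.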
