[Haskell-cafe]

MR K P SCHUPKE k.schupke at imperial.ac.uk
Fri May 14 16:37:11 EDT 2004


>> correcting parser

>I would think this is a rather specialized requirement.  I certainly don't 
>want a "correcting" parser for my work.  But I can see that some 
>applications might...

Especially if you are aiming for a generally applicable library. What's really needed
is a 'Strictness' switch for the parser, so you can select validating or non-validating.

The parser I wrote uses heuristics that specify how to fix mistakes (like missing close
tags) - this enables the XML parser to parse HTML with the correct set of rules.
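
As a rough illustration of such a switch combined with one repair heuristic
(missing close tags), here is a minimal sketch. It is not the code of the
parser described above: the names Strictness, Validating, Correcting and
balance are invented for the example, XmlTagName is assumed to be String,
and the element type is cut down to three constructors without attributes.

```haskell
-- Hypothetical, reduced versions of the types discussed in this message.
type XmlTagName = String

data XmlElement
  = STag XmlTagName
  | ETag XmlTagName
  | CharData String
  deriving (Show, Eq)

-- Assumed name for the proposed switch.
data Strictness = Validating | Correcting
  deriving (Show, Eq)

-- Check or repair tag nesting on an already tokenised stream.
--   Validating: stop at the first nesting error.
--   Correcting: insert missing close tags and drop stray ones.
balance :: Strictness -> [XmlElement] -> Either String [XmlElement]
balance mode = go []
  where
    -- The first argument is the stack of open tags, innermost first.
    go [] [] = Right []
    go open@(o : _) []
      | mode == Validating = Left ("unclosed <" ++ o ++ ">")
      | otherwise          = Right (map ETag open)
    go open (STag n : rest) = (STag n :) <$> go (n : open) rest
    go open (ETag n : rest) =
      case open of
        (o : os) | o == n -> (ETag n :) <$> go os rest
        _ | mode == Validating -> Left ("unexpected </" ++ n ++ ">")
          -- Close the inner tags that were left open, then retry.
          | n `elem` open ->
              (map ETag missing ++) <$> go remaining (ETag n : rest)
          -- No matching open tag at all: drop the stray close tag.
          | otherwise -> go open rest
      where
        (missing, remaining) = break (== n) open
    go open (e : rest) = (e :) <$> go open rest
```

Note that returning Either forces the whole input before any output is
available, so this version does not stream; a lazy variant would have to
report errors in-band instead.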

>This seems reasonable, and I'd expect a reasonable implementation (of a 
>filter) to stream via lazy evaluation where that matches the final usage 
>pattern.  The outline I sketched (copied below) was intended to be built 
>upon something like HaXML's filter idea, so that streaming processing would 
>(in principle) be possible.

If you don't use a list-based representation, the only other way to get 'lazy' behaviour
is to use the 'event' model, which is a lot more complex but possible: one thread does
the parsing and passes the events through a Channel.
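
A minimal sketch of that event model, using forkIO and Chan from the
standard Control.Concurrent modules. The parser itself is replaced by a
stand-in that pushes an already tokenised list, the element type is reduced
to three constructors, and the names parseInThread and foldEvents are made
up for the example. The end of the document is signalled with Nothing.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)

-- Reduced, hypothetical element type (tag names assumed to be Strings).
data XmlElement
  = STag String
  | ETag String
  | CharData String
  deriving (Show, Eq)

-- Start the "parser" in its own thread and return the channel on which
-- it delivers events. Here the parser is only a stand-in that writes a
-- ready-made list; Nothing marks the end of the document.
parseInThread :: [XmlElement] -> IO (Chan (Maybe XmlElement))
parseInThread es = do
  chan <- newChan
  _ <- forkIO (mapM_ (writeChan chan . Just) es >> writeChan chan Nothing)
  pure chan

-- Consume events one at a time, threading an accumulator through the
-- handler, until the end marker arrives.
foldEvents :: (acc -> XmlElement -> acc) -> acc -> Chan (Maybe XmlElement) -> IO acc
foldEvents step = loop
  where
    loop acc chan = do
      ev <- readChan chan
      case ev of
        Nothing -> pure acc
        Just e  -> let acc' = step acc e
                   in acc' `seq` loop acc' chan
```

For example, foldEvents can count start tags by passing a step function
that adds one for every STag and leaves the count unchanged otherwise.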

I think you misunderstand what the parser and renderer do. The parser takes String input
and outputs a stream of elements based on the BNF specification for XML. The element
type looks like:

data XmlElement = XMLDecl [XmlAttribute] 
   | DocType XmlTagName XmlSystemLiteral XmlPubidLiteral
   | EmptyTag XmlTagName [XmlAttribute]
   | STag XmlTagName [XmlAttribute]
   | ETag XmlTagName
   | Text [XmlElement]
   | CharData String
   | CharRef Int
   | EntityRef XmlTagName
   | PERef XmlTagName
   | CDSect String 
   | PI XmlTagName String
   | Comment String
   | Flush
   | Undefined
   | Unparsed String deriving (Show,Eq)

The renderer takes a stream of these elements and converts to a String.
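
To make the renderer's job concrete, here is a small sketch covering only
five of the constructors. It is an illustration, not the actual renderer:
XmlTagName is assumed to be String, XmlAttribute is assumed to be a
(name, value) pair, and the function names are invented for the example.

```haskell
-- Assumed representations; the real definitions are not shown here.
type XmlTagName   = String
type XmlAttribute = (String, String)   -- (name, value)

-- A subset of the element type, for illustration only.
data XmlElement
  = EmptyTag XmlTagName [XmlAttribute]
  | STag XmlTagName [XmlAttribute]
  | ETag XmlTagName
  | CharData String
  | Comment String
  deriving (Show, Eq)

-- Escape the characters that are special in text and attribute values.
escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

renderAttr :: XmlAttribute -> String
renderAttr (k, v) = " " ++ k ++ "=\"" ++ escape v ++ "\""

renderElement :: XmlElement -> String
renderElement (EmptyTag n as) = "<" ++ n ++ concatMap renderAttr as ++ "/>"
renderElement (STag n as)     = "<" ++ n ++ concatMap renderAttr as ++ ">"
renderElement (ETag n)        = "</" ++ n ++ ">"
renderElement (CharData s)    = escape s
renderElement (Comment s)     = "<!--" ++ s ++ "-->"

-- concatMap is lazy, so output is produced as the stream is consumed.
render :: [XmlElement] -> String
render = concatMap renderElement
```

Because render is an ordinary lazy list function, it also works on an
unbounded element stream as long as only a prefix of the output is demanded.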

All the filtering/reading/writing is done on streams of these elements.

For example a filter could select only <Person ...> records from an XML
data source, the person specific reader would then convert to a specific
representation of a Person...

	myReadey :: [(XmlTreeDepth,XmlElement)] -> [Person]

A writer does the opposite.
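
A sketch of how such a filter, reader and writer might fit together. All
of it rests on assumptions: XmlTreeDepth is taken to be an Int, attributes
are (name, value) pairs, a Person is stored as an empty <Person .../> tag
with "name" and "email" attributes, and the function names are invented
for the example.

```haskell
-- Assumed representations; none of these are given in the message.
type XmlTagName   = String
type XmlAttribute = (String, String)
type XmlTreeDepth = Int

data XmlElement
  = EmptyTag XmlTagName [XmlAttribute]
  | STag XmlTagName [XmlAttribute]
  | ETag XmlTagName
  | CharData String
  deriving (Show, Eq)

-- Hypothetical application type.
data Person = Person { personName :: String, personEmail :: String }
  deriving (Show, Eq)

-- Annotate each element with its nesting depth (root start tag is 0).
withDepth :: [XmlElement] -> [(XmlTreeDepth, XmlElement)]
withDepth = go 0
  where
    go _ [] = []
    go d (e@(STag _ _) : rest) = (d, e) : go (d + 1) rest
    go d (e@(ETag _) : rest)   = (d - 1, e) : go (d - 1) rest
    go d (e : rest)            = (d, e) : go d rest

-- Filter: keep only empty tags with the given name at the given depth.
selectAt :: XmlTreeDepth -> XmlTagName
         -> [(XmlTreeDepth, XmlElement)] -> [(XmlTreeDepth, XmlElement)]
selectAt depth name = filter wanted
  where
    wanted (d, EmptyTag n _) = d == depth && n == name
    wanted _                 = False

-- Reader: convert Person tags to Person values; tags that lack a
-- required attribute are skipped.
readPersons :: [(XmlTreeDepth, XmlElement)] -> [Person]
readPersons xs =
  [ Person n e
  | (_, EmptyTag "Person" as) <- xs
  , Just n <- [lookup "name" as]
  , Just e <- [lookup "email" as]
  ]

-- Writer: the opposite direction, emitting elements at depth 1.
writePersons :: [Person] -> [(XmlTreeDepth, XmlElement)]
writePersons ps =
  [ (1, EmptyTag "Person" [("name", n), ("email", e)]) | Person n e <- ps ]
```

Everything here is a plain list function, so the filter and the reader
consume the element stream lazily.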

	Regards,
	Keean.


