MR K P SCHUPKE
k.schupke at imperial.ac.uk
Fri May 14 16:37:11 EDT 2004
>> correcting parser
>I would think this is a rather specialized requirement. I certainly don't
>want a "correcting" parser for my work. But I can see that some
Especially if you are aiming for a generally applicable library. What's really needed
is a 'strictness' switch for the parser, so you can select validating or non-validating.
The parser I wrote uses heuristics that specify how to fix mistakes (like missing close
tags) - this enables the XML parser to parse HTML with the correct set of rules.
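
A minimal sketch of what such a switch could look like, assuming a cut-down element type with String tag names. The names Elem, Strictness and balance are made up for illustration and are not the actual parser's API; the only repair rule shown is "close whatever is still open":

```haskell
-- Cut-down stand-in for XmlElement; tag names are plain Strings here.
data Elem = STag String | ETag String | CharData String
  deriving (Show, Eq)

-- Hypothetical switch: Strict rejects bad nesting, Lenient repairs it.
data Strictness = Strict | Lenient
  deriving (Show, Eq)

-- Walk the stream while keeping a stack of currently open tags.
balance :: Strictness -> [Elem] -> Either String [Elem]
balance mode = go []
  where
    lenient = mode == Lenient

    -- End of input: close whatever is still open, or complain.
    go open []
      | null open || lenient = Right (map ETag open)
      | otherwise            = Left ("unclosed: " ++ unwords open)
    go open (STag n : rest) = (STag n :) <$> go (n : open) rest
    go open (ETag n : rest) =
      case break (== n) open of
        -- Closes the innermost open tag: the well-formed case.
        ([], _ : outer) -> (ETag n :) <$> go outer rest
        -- No such tag is open: drop the stray close tag, or complain.
        (_, [])
          | lenient   -> go open rest
          | otherwise -> Left ("unexpected </" ++ n ++ ">")
        -- Closes an outer tag: first close the inner ones left open.
        (inner, _ : outer)
          | lenient   -> (map ETag (inner ++ [n]) ++) <$> go outer rest
          | otherwise -> Left ("missing close tags: " ++ unwords inner)
    go open (e : rest) = (e :) <$> go open rest
```

With Lenient, a stream like <ul><li>a</ul> comes back with the missing </li> inserted before </ul>; with Strict the same input is rejected. Because the result is wrapped in Either, this version has to see the whole input before it returns anything, so it is not a streaming repair.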
>This seems reasonable, and I'd expect a reasonable implementation (of a
>filter) to stream via lazy evaluation where that matches the final usage
>pattern. The outline I sketched (copied below) was intended to be built
>upon something like HaXML's filter idea, so that streaming processing would
>(in principle) be possible.
If you don't use a list based representation, the only other way to get 'lazy' behaviour
is to use the 'event' model - which is a lot more complex, but possible using one thread
to do the parsing, and a Channel to pass the events through.
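
A rough sketch of that thread-plus-channel arrangement, using Chan from Control.Concurrent. The Event type and the names produce, consume and countOpens are invented for the example, and a ready-made list of events stands in for the parser thread's output:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)

-- Hypothetical event type; a real parser would emit something richer.
data Event = Open String | Close String | Chars String
  deriving (Show, Eq)

-- Producer side: push every event, then Nothing to mark end of stream.
produce :: Chan (Maybe Event) -> [Event] -> IO ()
produce ch evs = mapM_ (writeChan ch . Just) evs >> writeChan ch Nothing

-- Consumer side: fold over events as they arrive, until Nothing.
consume :: Chan (Maybe Event) -> (a -> Event -> a) -> a -> IO a
consume ch step = loop
  where
    loop acc = do
      next <- readChan ch
      case next of
        Nothing -> return acc
        Just ev -> loop $! step acc ev  -- force the accumulator each step

-- Count start tags; the forked thread plays the role of the parser.
countOpens :: [Event] -> IO Int
countOpens evs = do
  ch <- newChan
  _  <- forkIO (produce ch evs)
  consume ch bump 0
  where
    bump n (Open _) = n + 1
    bump n _        = n
```

Note that Chan is unbounded, so a fast parser thread can run well ahead of a slow consumer; a bounded queue would be needed to limit memory use.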
I think you misunderstand what the parser and renderer do. The parser takes String input
and outputs a stream of elements based on the BNF specification for XML. The element type
looks like:
    data XmlElement = XMLDecl [XmlAttribute]
                    | DocType XmlTagName XmlSystemLiteral XmlPubidLiteral
                    | EmptyTag XmlTagName [XmlAttribute]
                    | STag XmlTagName [XmlAttribute]
                    | ETag XmlTagName
                    | Text [XmlElement]
                    | CharData String
                    | CharRef Int
                    | EntityRef XmlTagName
                    | PERef XmlTagName
                    | CDSect String
                    | PI XmlTagName String
                    | Comment String
                    | Unparsed String
                    deriving (Show,Eq)
The renderer takes a stream of these elements and converts to a String.
All the filtering/reading/writing is done on streams of these elements.
For example a filter could select only <Person ...> records from an XML
data source, the person specific reader would then convert to a specific
representation of a Person...
    myReadey :: [(XmlTreeDepth,XmlElement)] -> [Person]
A writer does the opposite.
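
To make the filter/reader/writer split concrete, here is a small sketch. It assumes XmlTreeDepth is an Int, XmlAttribute is a (name, value) pair, tag names are plain Strings, and a Person is stored as a "name" attribute on a <Person> tag; none of that is confirmed by the real library, and only four constructors of the element type are reproduced:

```haskell
import Data.Maybe (mapMaybe)

type XmlTreeDepth = Int                -- assumed representation
type XmlAttribute = (String, String)   -- assumed: (name, value)

-- Cut-down copy of the element type, with String tag names.
data XmlElement
  = EmptyTag String [XmlAttribute]
  | STag String [XmlAttribute]
  | ETag String
  | CharData String
  deriving (Show, Eq)

-- Hypothetical record type for the example.
newtype Person = Person { personName :: String }
  deriving (Show, Eq)

-- Filter: keep only <Person ...> tags from the element stream.
personFilter :: [(XmlTreeDepth, XmlElement)] -> [(XmlTreeDepth, XmlElement)]
personFilter = filter (isPerson . snd)
  where
    isPerson (EmptyTag "Person" _) = True
    isPerson (STag "Person" _)     = True
    isPerson _                     = False

-- Reader: turn Person tags into Person values, skipping anything else.
readPersons :: [(XmlTreeDepth, XmlElement)] -> [Person]
readPersons = mapMaybe (toPerson . snd)
  where
    toPerson (EmptyTag "Person" attrs) = Person <$> lookup "name" attrs
    toPerson (STag "Person" attrs)     = Person <$> lookup "name" attrs
    toPerson _                         = Nothing

-- Writer: the opposite direction (depth 1 is an arbitrary choice here).
writePersons :: [Person] -> [(XmlTreeDepth, XmlElement)]
writePersons = map (\p -> (1, EmptyTag "Person" [("name", personName p)]))
```

Since everything is an ordinary list function, the pipeline readPersons . personFilter is lazy: taking the first Person from a very long (or infinite) element stream only consumes as much input as is needed to find it.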