[Haskell-cafe] Programming style and XML processing in
gk at ninebynine.org
Fri May 14 15:00:43 EDT 2004
At 17:45 13/05/04 +0100, MR K P SCHUPKE wrote:
>Just sticking in my two pence worth...
>I am not sure what application you intend this for, but I find most XML
>parsers completely useless. With my application programmer's hat on, I do
>not want to validate against a DTD, I want to extract as much information
>as possible from bad XML... what I would like is a correcting parser - one
>which outputs XML in compliance, but will accept any old rubbish and make
>a best guess attempt to fix it up (based on a set of configurable
I would think this is a rather specialized requirement. I certainly don't
want a "correcting" parser for my work. But I can see that some
>Secondly I deal with very large documents, the tree form of which won't fit
>in memory, so I would see an XML parser doing the following...
> parser :: String -> [XmlElements]
> filter :: [XmlElements] -> [XmlElements]
> reader :: [XmlElements] -> ... output data types ...
> writer :: ... input data types ... -> [XmlElements]
> render :: [XmlElements] -> String
>In order to keep track of the tree structure the tree-depth of each element
>is encoded within the XmlElement type... thus allowing the data to be streamed
>through the filters/readers etc. This means the parser can output the
>first element as soon as it encounters the second element (lazy list ==
>stream in Haskell)
>rather than having to wait until the last element as would happen with a
>(it is a tree not a graph as XML elements can only contain sub-elements)...
This seems reasonable, and I'd expect a reasonable implementation (of a
filter) to stream via lazy evaluation where that matches the final usage
pattern. The outline I sketched (copied below) was intended to be built
upon something like HaXML's filter idea, so that streaming processing would
(in principle) be possible.
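
To make the streaming idea concrete, here is a minimal sketch of a depth-tagged element stream. The names `Item`, `XmlElement`, `dropSubtree`, `textOf` and `sample` are invented for illustration; they are not taken from HaXML or from the parser described above. The sketch assumes that an element's open and close items carry the same depth, and that its content sits one level deeper.

```haskell
-- Hypothetical stream item; every item records its depth in the tree.
data Item = Open String | Text String | Close String
  deriving (Eq, Show)

data XmlElement = XmlElement { depth :: Int, item :: Item }
  deriving (Eq, Show)

-- A 'filter': drop every subtree rooted at an element with the given name.
-- It works item by item, so output is produced lazily as input arrives.
dropSubtree :: String -> [XmlElement] -> [XmlElement]
dropSubtree name = go
  where
    go [] = []
    go (XmlElement d (Open n) : rest)
      | n == name = go (skip d rest)
    go (e : rest) = e : go rest

    -- Discard items up to and including the close item at depth d.
    skip d es = drop 1 (dropWhile (not . closes d) es)

    closes d (XmlElement d' (Close _)) = d' == d
    closes _ _                         = False

-- A 'reader': extract something that is not XML, here the text content.
textOf :: [XmlElement] -> [String]
textOf es = [t | XmlElement _ (Text t) <- es]

-- Hand-built stream standing in for parser output.
sample :: [XmlElement]
sample =
  [ XmlElement 0 (Open "doc")
  , XmlElement 1 (Open "keep")
  , XmlElement 2 (Text "a")
  , XmlElement 1 (Close "keep")
  , XmlElement 1 (Open "junk")
  , XmlElement 2 (Text "b")
  , XmlElement 1 (Close "junk")
  , XmlElement 0 (Close "doc")
  ]
```

Because both functions consume the list incrementally, `take 2 (dropSubtree "junk" stream)` only forces the first items of `stream`, which is the behaviour wanted for documents whose tree form does not fit in memory.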
My requirement is not to generate yet more XML, but to extract something
quite different from the XML, so I think I'd be looking for something like
your 'reader', which could be part of the lowest element in my diagram.
>As I said the above is just my opinion, and as it happens I have written a
>parser that does the above... I guess that is why there are several
>parsers for XML available (different requirements) and there will probably
>be many more ...
I agree about the different requirements, but I think it would be good if
this didn't mean different XML libraries; I'm fishing for an arrangement
that allows the different requirements to be satisfied from common (or
overlapping) components. I like your suggested
parser/filter/reader/writer/render model, and I'll consider how that fits
with the existing libraries (I really don't want to start from scratch
here). I guess a 'parser' could be a special case of 'writer', and
'render' a special case of 'reader'.
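
One way to read that last remark is in terms of types. The following is only a sketch: the synonyms `Writer`, `Reader`, `Filter`, `Parser` and `Render`, the placeholder `XmlElement` type and the toy functions are all assumptions made for illustration, not part of any existing library.

```haskell
-- Placeholder element type; a real library would carry tags, attributes, etc.
newtype XmlElement = TextItem String
  deriving (Eq, Show)

type Writer a = a -> [XmlElement]              -- application data to XML items
type Reader b = [XmlElement] -> b              -- XML items to application data
type Filter   = [XmlElement] -> [XmlElement]

-- The special cases: the application data type is just String.
type Parser = Writer String
type Render = Reader String

-- Components compose by ordinary function composition.
pipeline :: Writer a -> Filter -> Reader b -> a -> b
pipeline w f r = r . f . w

-- Toy instances, only to show that the types line up.
wordsParser :: Parser
wordsParser = map TextItem . words

spaceRender :: Render
spaceRender es = unwords [t | TextItem t <- es]
```

With these definitions, `pipeline wordsParser id spaceRender` has type `String -> String`, and any `Reader b` can be substituted for the renderer without touching the parser or the filters.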
(Reprise of last part of my previous message...)
What do I think an XML library for Haskell should look like? The
components I'd like to see would look something like this:

   XML parser  ::  String ---> (internal representation)
       |
       | -----> IO function to perform full validation
       |        and external DTD handling [optional]
       |
   XML filter combinators --+--> entity substitution logic
       |                    +--> namespace handling
       |                    +--> XSLT processing [optional, for now **]
       |
   DOM-like read-only interface for access to data at level comparable to
   XML infoset (used to avoid dependency between applications that use
   infoset data and details of the internal representation used.)
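
As a rough illustration of the read-only interface idea, a type class can stand between applications and the internal representation. The class `InfosetNode`, its methods, and the `Tree` type below are hypothetical names chosen for this sketch; they cover only a tiny part of what the XML infoset describes.

```haskell
-- Hypothetical read-only view of a node, independent of representation.
class InfosetNode n where
  nodeName :: n -> Maybe String   -- Nothing for text nodes
  nodeText :: n -> Maybe String   -- Nothing for element nodes
  children :: n -> [n]

-- One possible internal representation; applications never see it directly.
data Tree = Elem String [Tree] | Txt String

instance InfosetNode Tree where
  nodeName (Elem n _) = Just n
  nodeName (Txt _)    = Nothing
  nodeText (Txt t)    = Just t
  nodeText (Elem _ _) = Nothing
  children (Elem _ cs) = cs
  children (Txt _)     = []

-- Application code written only against the interface:
-- concatenate all text below a node, in document order.
allText :: InfosetNode n => n -> String
allText n = maybe (concatMap allText (children n)) id (nodeText n)
```

A different internal representation would only need its own `InfosetNode` instance; `allText` and similar application code would stay unchanged.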