[Haskell-cafe] Programming style and XML processing in Haskell
MR K P SCHUPKE
k.schupke at imperial.ac.uk
Thu May 13 18:45:25 EDT 2004
Just sticking in my two pence worth...
I am not sure what application you intend this for, but I find most XML
parsers completely useless. With my application programmer's hat on, I do
not want to validate against a DTD; I want to extract as much information
as possible from bad XML. What I would like is a correcting parser: one
which outputs compliant XML, but will accept any old rubbish and make
a best-guess attempt to fix it up (based on a set of configurable
heuristic rules)...
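As an illustration of the kind of heuristic a correcting parser might apply, here is a minimal sketch that works on a token stream rather than raw text. The Token type, the repair function and its rules (a close tag closes everything opened after its match, a close tag with no match is dropped, tags still open at the end are closed) are assumptions made up for this example, not the rules of any particular parser.

```haskell
module Main where

-- Hypothetical token type, made up for this sketch.
data Token
  = Open String    -- <name>
  | Close String   -- </name>
  | Text String    -- character data
  deriving (Eq, Show)

-- Repair a token stream so every Open gets a matching Close.
-- Heuristic rules assumed here (a real parser would make them configurable):
--   * a Close matching some open tag also closes every tag opened after it;
--   * a Close matching no open tag is dropped;
--   * tags still open at the end of input are closed.
-- The pass is lazy, so repaired tokens come out as the input streams in.
repair :: [Token] -> [Token]
repair = go []
  where
    -- stack: names of the currently open tags, innermost first
    go stack [] = map Close stack
    go stack (t : ts) = case t of
      Open n  -> Open n : go (n : stack) ts
      Text s  -> Text s : go stack ts
      Close n -> case break (== n) stack of
        -- matching open tag found: close the inner tags first, then this one
        (inner, _ : outer) -> map Close inner ++ Close n : go outer ts
        -- no matching open tag: drop the stray close tag
        (_, [])            -> go stack ts

main :: IO ()
main = mapM_ print (repair [Open "a", Open "b", Text "x", Close "a", Close "c", Open "d"])
```

Here the </a> first closes the unclosed <b>, the stray </c> is discarded, and the dangling <d> is closed at the end of input.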
Secondly, I deal with very large documents whose tree form won't fit
in memory, so I would see an XML parser doing the following:
parser :: String -> [XmlElement]
filter :: [XmlElement] -> [XmlElement]
reader :: [XmlElement] -> ... output data types ...
writer :: ... input data types ... -> [XmlElement]
render :: [XmlElement] -> String
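A minimal sketch of how those stages might fit together, assuming a hypothetical XmlElement type in which every event carries its depth. The toy parser only understands <name>, </name> and text (no attributes, comments, entities or self-closing tags), and dropBlankText and texts are illustrative stand-ins for the filter and reader stages; writer is omitted.

```haskell
module Main where

import Data.Char (isSpace)

-- Hypothetical event type: every event records its depth in the tree.
data XmlElement
  = StartTag Int String   -- depth, tag name
  | EndTag   Int String   -- depth, tag name (same depth as its StartTag)
  | CharData Int String   -- depth, text
  deriving (Eq, Show)

-- Toy lazy parser: understands only <name>, </name> and text.
-- No attributes, comments, entities or self-closing tags (assumption).
parser :: String -> [XmlElement]
parser = go 0
  where
    go _ [] = []
    go d ('<' : '/' : rest) =
      let (name, rest') = break (== '>') rest
      in EndTag (d - 1) name : go (d - 1) (drop 1 rest')
    go d ('<' : rest) =
      let (name, rest') = break (== '>') rest
      in StartTag d name : go (d + 1) (drop 1 rest')
    go d s =
      let (txt, rest) = break (== '<') s
      in CharData d txt : go d rest

-- Example filter stage: drop whitespace-only text (indentation, newlines).
dropBlankText :: [XmlElement] -> [XmlElement]
dropBlankText = filter (not . blank)
  where
    blank (CharData _ t) = all isSpace t
    blank _              = False

-- Example reader stage: collect the character data.
texts :: [XmlElement] -> [String]
texts es = [t | CharData _ t <- es]

-- Render the event stream back to XML text.
render :: [XmlElement] -> String
render = concatMap one
  where
    one (StartTag _ n) = "<" ++ n ++ ">"
    one (EndTag _ n)   = "</" ++ n ++ ">"
    one (CharData _ t) = t

main :: IO ()
main = do
  let doc = "<a>\n  <b>x</b>\n  <b>y</b>\n</a>"
  putStrLn (render (dropBlankText (parser doc)))
  print (texts (dropBlankText (parser doc)))
```

Each stage is an ordinary list function, so the pipeline is just function composition, and because every stage is lazy, output can start before the input has been fully read.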
In order to keep track of the tree structure, the tree depth of each element
is encoded within the XmlElement type, allowing the data to be streamed
through the filters, readers, etc. This means the parser can output the first element as
soon as it encounters the second element (in Haskell a lazy list is a stream),
rather than having to wait until the last element, as would happen with a DOM tree
(it is a tree, not a graph, as XML elements can only contain sub-elements)...
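To illustrate why the depth field is enough for streaming, the following sketch (again with a hypothetical XmlElement type and a hypothetical subtrees function) cuts a flat event stream into the events of each element with a given name, using only depths to find where each element ends. It never builds the document tree, so it works even on an endless input.

```haskell
module Main where

-- Hypothetical event type: every event records its depth in the tree.
data XmlElement
  = StartTag Int String   -- depth, tag name
  | EndTag   Int String   -- depth, tag name (same depth as its StartTag)
  | CharData Int String   -- depth, text
  deriving (Eq, Show)

-- Split a flat event stream into the events of every element named
-- `name`. The end of an element is the first EndTag at the same depth
-- as its StartTag, so no tree has to be built.
subtrees :: String -> [XmlElement] -> [[XmlElement]]
subtrees name = go
  where
    go [] = []
    go (e@(StartTag d n) : es)
      | n == name =
          let (body, rest) = break (isEndAt d) es
          in (e : body ++ take 1 rest) : go (drop 1 rest)
    -- any other event (or a StartTag with another name) is skipped
    go (_ : es) = go es

    isEndAt d (EndTag d' _) = d' == d
    isEndAt _ _             = False

-- An endless document: <log><entry>1</entry><entry>2</entry>...
endless :: [XmlElement]
endless = StartTag 0 "log" : concatMap entry [1 :: Int ..]
  where
    entry i = [StartTag 1 "entry", CharData 2 (show i), EndTag 1 "entry"]

main :: IO ()
main = mapM_ print (take 2 (subtrees "entry" endless))
```

Running main prints the first two entry subtrees even though endless never terminates, which is exactly what a DOM-style parser cannot do.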
As I said, the above is just my opinion, and as it happens I have written a
parser that does the above... I guess that is why there are several
parsers for XML available (different requirements), and there will probably
be many more...
Regards,
Keean.