[Haskell-cafe] Re: Lazy XML handling
Niklas Broberg
n_broberg at hotmail.com
Fri May 21 22:27:23 EDT 2004
MR K P Shupke wrote:
>Can you give me an example where processing HAS to be in a different order?
No, but I can give an example where a list based encoding is less
sufficient.
I'm coming from a completely different view here, namely that of using
Haskell not to query or transform but to create XML. I'm currently in a
project[1] trying to fix a working implementation of Haskell Server
Pages[2], i.e. using Haskell as a server-side scripting language for dynamic
webpages. Clearly the most used mode of operation in this setting is
constructing an XML tree rather than querying or transforming an existing
one.
With this in mind, the list based encoding would be less efficient than a
treelike datatype. Building the XML tree is likely to be done bottom-up, and
adding a parent to a subtree would be O(n) with a list, but O(1) with a tree
datatype. Building a whole tree from the bottom up is O(n^2) for a list, but
O(n) for the tree datatype.
MR K P Shupke wrote:
>The nested lists does not allow any extra flexibility in traversal, at the
>cost of
>greater complexity. The one case it is simpler is if you want to look up a
>single
>node by its 'tree address' - This is more complex with the list-rep, as you
>have to count nodes at a given level, skipping irrelevent nodes...
The fact that the list based representation is indeed more complex is in my
opinion a clear indication that it should not be the standard. Compare this
to Haskell strings where the standard is the simple but ineffective [Char],
but where there is another more efficient representation, PackedString, if
you really need it. Simplicity and ease of understanding is not to be taken
lightly.
By using a tree datatype representation you also get pattern matching
directly on the structure of the tree, a very useful feature to have.
MR K P Shupke wrote:
>The list rep is very good at data extraction, if you want to extract a
>sub-tree
>of a given type you just ignore all tags until you find one of the correct
>type, then output all tags until the depth returns to the level of the
>initial
>tag.
That might possibly be a programmatic advantage, though I don't see how it
would be more difficult with a tree datatype. In any case it is no more
efficient, both are O(n).
As far as I can tell there is no single area where the list representation
would outperform a tree datatype, other than the special case of lazy
streaming.
My suggestions are thus:
The standard representation should be a tree datatype, both because it is
more general and efficiently supports a wide range of operation modes
(construction, querying, transformation, rendering), and also because that
is what lies closest to the intuition of XML.
However, the ability to stream large XML structures lazily is clearly useful
and is difficult to achieve using a tree datatype. Why not add it as a
specialisation?
The standard representation would be a datatype XML living in Data.XML (or
similar), and the specialized would be a type XMLStream living in
Data.XML.Stream. There should of course also be functions to convert between
the types.
Comment away,
/Niklas
[1] http://www.dtek.chalmers.se/~d00nibro/hsp/
[2] http://www.research.microsoft.com/~emeijer/Papers/HSP.pdf
_________________________________________________________________
Protect your PC - get McAfee.com VirusScan Online
http://clinic.mcafee.com/clinic/ibuy/campaign.asp?cid=3963
More information about the Haskell-Cafe
mailing list