[Haskell-cafe] Re: Lazy XML handling
Graham Klyne
GK at ninebynine.org
Fri May 14 18:36:00 EDT 2004
At 15:37 14/05/04 +0100, MR K P SCHUPKE wrote:
>If you don't use a list based representation, the only other way to get
>'lazy' behaviour
>is to use the 'event' model - which is a lot more complex, but possible
>using one thread
>to do the parsing, and a Channel to pass the events through.
Surely a list isn't the *only* structure that will exhibit the required
streaming behaviour; e.g. HaXML uses a tree structure built around this:
[[
data Element = Elem Name [Attribute] [Content]
data ElemTag = ElemTag Name [Attribute] -- ^ intermediate for parsing
type Attribute = (Name, AttValue)
data Content = CElem Element
| CString Bool CharData
-- ^ bool is whether whitespace is significant
| CRef Reference
| CMisc Misc
]]
Provided the resulting tree of Element/Content values (ignoring the
attribute "fluff" here) is traversed depth-first, left-to-right, I'd expect
the details to be retrieved in the order that they are made accessible by a
left-to-right parser ... at each active (i.e. partially evaluated) node in
the tree, elements to the right of the most recently evaluated 'Content'
value would be unevaluated. A parser that can generate a simple list of
values in the style you describe could also progressively build a tree in
this way.
My view is that this allows an application designer to make the appropriate
space/convenience trade-offs: non-sequential access is possible, at a cost
of memory usage, by using different tree-traversal approach.
Hmmm... I guess what I describe above depends rather on the "processed"
part of the tree being garbage-collectable, which presumes that the
application isn't "hanging on" to a reference to an "Element" value once it
starts to process the next one. So I guess what I describe is possible,
but needs to be handled carefully if the memory usage is to be
contained. I think it should be possible to write a function that
traverses the tree and returns a list of the kind you describe (using a
technique like ShowS to build the list); as long as the calling program
uses the resulting list sequentially, one at a time, then I'd expect the
desired effect on memory usage. (Would this correspond to your "channel"
noted above?)
#g
------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact
More information about the Haskell-Cafe
mailing list