[Haskell-cafe] Re: Lazy XML handling

Graham Klyne GK at ninebynine.org
Fri May 14 18:36:00 EDT 2004


At 15:37 14/05/04 +0100, MR K P SCHUPKE wrote:
>If you don't use a list based representation, the only other way to get 
>'lazy' behaviour
>is to use the 'event' model - which is a lot more complex, but possible 
>using one thread
>to do the parsing, and a Channel to pass the events through.

Surely a list isn't the *only* structure that will exhibit the required 
streaming behaviour;  e.g. HaXML uses a tree structure built around this:

[[
data Element   = Elem Name [Attribute] [Content]
data ElemTag   = ElemTag Name [Attribute]       -- ^ intermediate for parsing
type Attribute = (Name, AttValue)
data Content   = CElem Element
                | CString Bool CharData
                         -- ^ bool is whether whitespace is significant
                | CRef Reference
                | CMisc Misc
]]

Provided the resulting tree of Element/Content values (ignoring the 
attribute "fluff" here) is traversed depth-first, left-to-right, I'd expect 
the details to be retrieved in the order that they are made accessible by a 
left-to-right parser ... at each active (i.e. partially evaluated) node in 
the tree, elements to the right of the most recently evaluated 'Content' 
value would be unevaluated.  A parser that can generate a simple list of 
values in the style you describe could also progressively build a tree in 
this way.

My view is that this allows an application designer to make the appropriate 
space/convenience trade-offs:  non-sequential access is possible, at a cost 
of memory usage, by using different tree-traversal approach.

Hmmm... I guess what I describe above depends rather on the "processed" 
part of the tree being garbage-collectable, which presumes that the 
application isn't "hanging on" to a reference to an "Element" value once it 
starts to process the next one.  So I guess what I describe is possible, 
but needs to be handled carefully if the memory usage is to be 
contained.  I think it should be possible to write a function that 
traverses the tree and returns a list of the kind you describe (using a 
technique like ShowS to build the list);  as long as the calling program 
uses the resulting list sequentially, one at a time, then I'd expect the 
desired effect on memory usage.   (Would this correspond to your "channel" 
noted above?)

#g


------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact



More information about the Haskell-Cafe mailing list