[Haskell-cafe] Re: hxt memory usage

Malcolm Wallace Malcolm.Wallace at cs.york.ac.uk
Fri Feb 1 06:25:53 EST 2008


"Rene de Visser" <Rene_de_Visser at hotmail.com> wrote:

> Even if you replace Parsec, HXT is itself not
> incremental. (It stores the whole XML document in memory as a tree,
> and the tree is not memory efficient.)

If the usage pattern of the tree is search-and-discard, then only enough
of the tree to satisfy the search needs to be stored in memory at once.
Everything from the root to the first node of interest can easily be
pruned by the garbage collector.
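Here is a minimal Haskell sketch of the search-and-discard pattern. It assumes a lazily built tree. The `Tree` type and the `mkDocument` generator are hypothetical stand-ins for a parsed XML document, not HXT or HaXml types. `findFirst` walks the tree in pre-order and stops at the first match, so only the part of the tree up to that match is ever constructed. Nodes the search has already passed are not needed again and may be reclaimed by the garbage collector.

```haskell
import Data.Maybe (listToMaybe)

-- Hypothetical stand-in for an XML element tree: a label plus children.
data Tree a = Node a [Tree a]

-- Pre-order traversal. Lists are lazy, so labels are produced one at a
-- time, only as the consumer demands them.
preorder :: Tree a -> [a]
preorder (Node x kids) = x : concatMap preorder kids

-- Search-and-discard: stop at the first label satisfying the predicate.
findFirst :: (a -> Bool) -> Tree a -> Maybe a
findFirst p = listToMaybe . filter p . preorder

-- Hypothetical "document": a root with n record elements, each holding
-- an id field and a payload. No node is built until it is demanded.
mkDocument :: Int -> Tree String
mkDocument n = Node "root" [record i | i <- [1 .. n]]
  where
    record i = Node ("record " ++ show i)
                    [Node ("id=" ++ show i) [], Node "payload" []]

main :: IO ()
main =
  -- Only records 1..42 are ever constructed, even though n is large.
  print (findFirst (== "id=42") (mkDocument 1000000))
```

A non-lazy representation would have to build all one million records before the search could begin. Here, the cost is proportional to how far into the document the first match lies.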

A paper describing the lazy parsing technique, and using XML-parsing as
its motivating example, is available at
    http://www.cs.york.ac.uk/~malcolm/partialparse.html

> >> HaXml offers the choice of non-incremental parsers and incremental
> >> parsers.

Indeed.  This lazy incremental parser for XML is available in the
development version of HaXml:
    http://www.cs.york.ac.uk/fp/HaXml-devel

The source code for partial parsing is available in a separate package:
    http://www.cs.york.ac.uk/fp/polyparse
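The following is a much-simplified, hand-rolled illustration of the partial-parsing idea. It does not use polyparse's API and does not reflect how that package is implemented. The names `Parser`, `selfClosingTag`, `manyLazy` and `manyStrict` are hypothetical. A strict repetition combinator cannot report success until it has seen the end of the sequence. The lazy variant instead emits each item as soon as it has been parsed, so a consumer can use a prefix of the result before the rest of the input has been read.

```haskell
-- A parser maps input to a result and the remaining input, or fails.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

-- Hypothetical toy element parser: a self-closing tag such as "<a/>".
selfClosingTag :: Parser String
selfClosingTag = Parser $ \s -> case s of
  '<' : rest -> case break (== '/') rest of
    (name, '/' : '>' : rest') | not (null name) -> Just (name, rest')
    _ -> Nothing
  _ -> Nothing

-- Strict repetition: the outer Just is only known once the whole
-- sequence has been consumed, so nothing is available before that.
manyStrict :: Parser a -> Parser [a]
manyStrict p = Parser go
  where
    go s = case runParser p s of
      Nothing -> Just ([], s)
      Just (x, s') -> fmap (\(xs, s'') -> (x : xs, s'')) (go s')

-- Lazy repetition: each item is returned as soon as it is parsed, and
-- the list simply ends where p first fails (no overall failure value).
manyLazy :: Parser a -> String -> [a]
manyLazy p s = case runParser p s of
  Just (x, s') -> x : manyLazy p s'
  Nothing -> []

main :: IO ()
main = do
  -- Infinite input: the lazy version still yields a usable prefix.
  print (take 3 (manyLazy selfClosingTag (cycle "<a/><b/>")))
  -- Finite input: the strict version works, but only at the end.
  print (fmap fst (runParser (manyStrict selfClosingTag) "<x/><y/>"))
```

Running `manyStrict` on the infinite input from the first `print` would never return. The trade-off is that the lazy version cannot report up front whether the whole input was well formed. In this sketch, it just stops at the first unparseable position.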

These lazy parser combinators are roughly 2x to 5x faster than Parsec
on large inputs (although the strict variant is about 2x slower than
Parsec).

Regards,
    Malcolm

