updates to HaXml (1.13.1 and 1.16)

Malcolm Wallace Malcolm.Wallace at cs.york.ac.uk
Mon Jul 10 09:15:00 EDT 2006


For those who use the current stable version of HaXml, I'd like to
announce a new patch-level release, 1.13.1, which contains the following
bugfixes:

  * permit percent character in attribute values
  * parse unquoted attribute values starting with '+' or '#' in HTML
  * keep the original DTD in the output of 'processXmlWith'

See http://www.haskell.org/HaXml/ for downloads.

For those living on the development edge, I'd like to report that the
current darcs version
    darcs get http://www.cs.york.ac.uk/fp/darcs/HaXml
contains a new set of parser combinators (with the same API as before)
that is lazier, whilst still allowing backtracking.  By lazy, I mean it
can start to return partial values as soon as it has consumed e.g. the
start tag of an element, without waiting to check that the close tag
matches.  This has two good effects:

  * your program will run faster
  * it will consume less memory

and two bad effects:

  * if there are errors in the document, they will throw an exception
    in the middle of your processing
  * the error message in the exception may be rather less accurate about
    the cause and location than previously.
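
To make the trade-off above concrete, here is a minimal sketch in plain
Haskell. It is not HaXml's implementation and uses none of its API: the
Token type and the functions parseStrict and parseLazy are hypothetical
names invented for illustration. The strict version reports errors in an
Either and so cannot return anything until it has seen the close tag; the
lazy version hands back items as soon as the start tag is consumed, and
signals a mismatched close tag with an exception raised only when the
consumer reaches that point.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Hypothetical token type standing in for a real XML token stream.
data Token = Open String | Text String | Close String
  deriving (Eq, Show)

-- Strict style: the result is Left or Right, so the whole element
-- (including its close tag) must be checked before any item is returned.
parseStrict :: [Token] -> Either String [String]
parseStrict (Open name : rest) = go rest
  where
    go (Text t : ts) = fmap (t :) (go ts)
    go (Close n : _)
      | n == name = Right []
      | otherwise = Left ("expected </" ++ name ++ "> but found </" ++ n ++ ">")
    go (Open n : _) = Left ("unexpected nested element <" ++ n ++ ">")
    go []           = Left ("missing </" ++ name ++ ">")
parseStrict _ = Left "expected a start tag"

-- Lazy style: items are produced as soon as the start tag is consumed.
-- A bad close tag becomes an exception raised when the list is forced
-- that far, which may be in the middle of the caller's processing.
parseLazy :: [Token] -> [String]
parseLazy (Open name : rest) = go rest
  where
    go (Text t : ts) = t : go ts
    go (Close n : _)
      | n == name = []
      | otherwise = error ("expected </" ++ name ++ "> but found </" ++ n ++ ">")
    go (Open n : _) = error ("unexpected nested element <" ++ n ++ ">")
    go []           = error ("missing </" ++ name ++ ">")
parseLazy _ = error "expected a start tag"

main :: IO ()
main = do
  let endless = Open "keys" : map (Text . show) [1 :: Int ..]
      broken  = [Open "keys", Text "a", Text "b", Close "oops"]
  -- Works even though the close tag never arrives.
  print (take 3 (parseLazy endless))
  -- The strict parser reports the error up front and returns no items.
  print (parseStrict broken)
  -- The lazy parser delivers the first items without noticing the error...
  print (take 2 (parseLazy broken))
  -- ...and fails only once the whole list is demanded.
  result <- try (evaluate (length (parseLazy broken)))
  case result of
    Left e  -> putStrLn ("late failure: " ++ show (e :: ErrorCall))
    Right n -> print n
```

Note that applying parseStrict to the endless stream would never return,
which is the toy analogue of a strict parser holding the whole document in
memory before producing output.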

The older XML parser has also been retained, since the lazy version is
still experimental.  To use the new one,
    import Text.XML.HaXml.ParseLazy

There are also lazy versions of the usual demo programs:
    CanonicaliseLazy
    XtractLazy
As an example of the improved speed, a query to extract all the <key>
tags from a 3.7MB XML document:
    Xtract "//key" file.xml
did not give any results after more than ten minutes on my machine, but
    XtractLazy "//key" file.xml
started producing results immediately, and completed the task in 25
seconds (returning 52584 tags).

The development version has a separate website and downloads at
    http://www.cs.york.ac.uk/fp/HaXml-devel

Regards,
    Malcolm

