updates to HaXml (1.13.1 and 1.16)
Malcolm Wallace
Malcolm.Wallace at cs.york.ac.uk
Mon Jul 10 09:15:00 EDT 2006
For those who use the current stable version of HaXml, I'd like to
announce a new patch-level release, 1.13.1, which contains the following
bugfixes:
* permit percent character in attribute values
* parse unquoted attribute values starting with '+' or '#' in HTML
* keep the original DTD in the output of 'processXmlWith'
See http://www.haskell.org/HaXml/ for downloads.
For those living on the development edge, I'd like to report that the
current darcs version
darcs get http://www.cs.york.ac.uk/fp/darcs/HaXml
contains a new set of parser combinators (with the same API as before)
that is lazier, whilst still allowing backtracking. By lazy, I mean it
can start to return partial values as soon as it has consumed e.g. the
start tag of an element, without waiting to check that the close tag
matches. This has two good effects:
* your program will run faster
* it will consume less memory
and two bad effects:
* if there are errors in the document, they will throw an exception
in the middle of your processing
* the error message in the exception may be rather less accurate about
the cause and location than previously.
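To make the trade-off concrete, here is a toy sketch of the idea. It is not HaXml's actual code. The `Token` and `Tree` types and all function names are hypothetical. A strict parser must see the matching close tag before it returns anything. A lazy one returns the element as soon as it has seen the start tag, and raises an error only when the consumer reaches a mismatched close tag.

```haskell
-- Toy model only: these token and tree types are hypothetical and are
-- not HaXml's own data types.
data Token = Open String | Close String | Text String
  deriving (Show, Eq)

data Tree = Elem String [Tree] | Leaf String
  deriving (Show, Eq)

children :: Tree -> [Tree]
children (Elem _ ks) = ks
children (Leaf _)    = []

-- Strict: nothing is returned until the matching close tag is checked.
strictElem :: [Token] -> Either String (Tree, [Token])
strictElem (Open n : ts) = go [] ts
  where
    go acc (Close m : rest)
      | m == n    = Right (Elem n (reverse acc), rest)
      | otherwise = Left ("expected </" ++ n ++ "> but found </" ++ m ++ ">")
    go acc (Text s : rest) = go (Leaf s : acc) rest
    go acc ts'@(Open _ : _) = do
      (child, rest) <- strictElem ts'
      go (child : acc) rest
    go _ [] = Left ("unexpected end of input inside <" ++ n ++ ">")
strictElem (t : _) = Left ("expected a start tag, found " ++ show t)
strictElem [] = Left "unexpected end of input"

-- Lazy: the element is returned right after its start tag; children are
-- produced on demand, and a mismatched close tag raises an exception only
-- when the consumer forces that part of the tree.
lazyElem :: [Token] -> (Tree, [Token])
lazyElem (Open n : ts) = (Elem n kids, rest)
  where (kids, rest) = lazyKids n ts
lazyElem (t : _) = error ("expected a start tag, found " ++ show t)
lazyElem [] = error "unexpected end of input"

lazyKids :: String -> [Token] -> ([Tree], [Token])
lazyKids n (Close m : rest)
  | m == n    = ([], rest)
  | otherwise = error ("expected </" ++ n ++ "> but found </" ++ m ++ ">")
lazyKids n (Text s : rest) = (Leaf s : more, rest')
  where (more, rest') = lazyKids n rest
lazyKids n ts@(Open _ : _) = (child : more, rest')
  where
    (child, afterChild) = lazyElem ts
    (more, rest') = lazyKids n afterChild
lazyKids n [] = error ("unexpected end of input inside <" ++ n ++ ">")

-- The final close tag is wrong, but the content before it is well formed.
badDoc :: [Token]
badDoc = [ Open "doc", Text "hello"
         , Open "key", Text "k1", Close "key"
         , Close "oops" ]
```

In GHCi, `strictElem badDoc` returns only the error. By contrast, `take 2 (children (fst (lazyElem badDoc)))` yields the first two children, even though the document ends with the wrong close tag. Forcing the whole lazy tree, for example with `length`, raises the exception part-way through. That is the first of the bad effects listed above.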
The older XML parser has also been retained, since the lazy version is
still experimental. To use the new one,
import Text.XML.HaXml.ParseLazy
There are also lazy versions of the usual demo programs:
CanonicaliseLazy
XtractLazy
As an example of the improved speed, a query to extract all the <key>
tags from a 3.7MB XML document:
Xtract "//key" file.xml
did not give any results after more than ten minutes on my machine, but
XtractLazy "//key" file.xml
started producing results immediately, and completed the task in 25
seconds (returning 52584 tags).
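The immediate output from XtractLazy comes from pairing a lazy parser with a query that consumes the tree incrementally. Below is a rough sketch of a `//key`-style query over a hypothetical toy tree. It is not HaXml's `Content` type or Xtract's implementation. It lists matching elements in document order, so the first match is available before the rest of the tree has been built.

```haskell
-- Hypothetical toy tree, not HaXml's Content type.
data Tree = Elem String [Tree] | Leaf String
  deriving (Show, Eq)

-- Every element named `tag`, at any depth, in document order: a rough
-- analogue of the Xtract query "//tag". Each match is emitted before its
-- own children and later siblings are examined.
descendantsNamed :: String -> Tree -> [Tree]
descendantsNamed tag t@(Elem n ks) =
  [t | n == tag] ++ concatMap (descendantsNamed tag) ks
descendantsNamed _ (Leaf _) = []

-- An unbounded document, standing in for one that a lazy parser is
-- still producing: <doc><entry><key>1</key></entry><entry>...
endless :: Tree
endless = Elem "doc" [ Elem "entry" [Elem "key" [Leaf (show i)]] | i <- [1 :: Int ..] ]
```

For example, `take 3 (descendantsNamed "key" endless)` returns the first three `key` elements of a document that never ends. Over a strictly parsed tree, the same query could only begin once parsing and well-formedness checking had finished.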
The development version has its own website and downloads at
http://www.cs.york.ac.uk/fp/HaXml-devel
Regards,
Malcolm