haskell xml parsing for larger files?

malcolm.wallace malcolm.wallace at me.com
Thu Feb 20 15:49:21 UTC 2014


Is your usage pattern over the constructed tree likely to be a lazy prefix traversal?  If so, then HaXml supports lazy construction of the parse tree.  Some plots appear at the end of this paper, showing how memory usage can be reduced to a constant, even for very large inputs (1 million tree nodes):

http://www.cs.york.ac.uk/plasma/publications/pdf/partialparse.pdf
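HaXml's actual API aside, the principle behind lazy partial parsing can be illustrated with a self-contained sketch (the `Tree` type and names below are illustrative, not HaXml's): if the tree is built lazily and the consumer only forces a prefix, the unvisited parts are never constructed at all.

```haskell
-- Sketch of lazy prefix traversal: the consumer forces only a
-- prefix of the tree, so the rest of it is never built.
data Tree = Node String [Tree]

-- A very large (here: infinite) tree, defined lazily.
bigTree :: Tree
bigTree = Node "root" [ Node ("child" ++ show i) [] | i <- [1 :: Int ..] ]

-- Take the first n labels in preorder; thanks to laziness only
-- those n nodes are ever constructed.
prefix :: Int -> Tree -> [String]
prefix n t = take n (preorder t)
  where preorder (Node l cs) = l : concatMap preorder cs

main :: IO ()
main = print (prefix 3 bigTree)
```

Run against a genuinely huge input, this pattern is what keeps memory usage constant in the plots at the end of the paper above.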
Regards,
    Malcolm

On 20 Feb 2014, at 11:30 AM, Christian Maeder <Christian.Maeder at dfki.de> wrote:

Hi,

I've run into difficulties parsing "large" XML files (> 100 MB).
A plain SAX parser, as provided by hexpat, is fine. However,
constructing a tree consumes too much memory on a 32-bit machine.

see http://trac.informatik.uni-bremen.de:8080/hets/ticket/1248

I suspect that sharing strings when constructing trees might greatly 
reduce memory requirements. What are suitable libraries for string pools?
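As a starting point, a string pool can be as simple as a map from strings to themselves: interning returns the pooled copy, so repeated tag and attribute names share one heap object instead of many. This is only a minimal sketch using `Data.Map` (the names `intern`/`internAll` are made up for illustration), not a recommendation of a particular library:

```haskell
import qualified Data.Map.Strict as M
import Data.List (mapAccumL)

-- A naive string pool: a map whose keys and values are the same
-- strings. Looking a string up and returning the stored copy makes
-- all later occurrences share one heap object.
type Pool = M.Map String String

intern :: String -> Pool -> (String, Pool)
intern s pool = case M.lookup s pool of
  Just shared -> (shared, pool)          -- reuse the pooled copy
  Nothing     -> (s, M.insert s s pool)  -- first occurrence: pool it

-- Thread the pool through a list of strings (e.g. element names
-- encountered during a SAX traversal).
internAll :: Pool -> [String] -> (Pool, [String])
internAll = mapAccumL step
  where step p s = let (s', p') = intern s p in (p', s')

main :: IO ()
main = print (M.size (fst (internAll M.empty ["tag", "tag", "attr"])))
```

Because XML element and attribute names repeat heavily, even this naive pooling can shrink a 100 MB document's tree considerably; a production version would likely use `ByteString` or `Text` keys rather than `String`.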

Before trying to implement something myself, I'd like to ask who else
has tried to process large XML files (and met similar memory problems)?

I have not yet investigated xml-conduit and hxt for our purpose. (These 
look scary.)

In fact, I've basically used the content trees from "The (simple) xml
package", and switching to another tree type is no fun, in particular
if it gains little.

Thanks Christian
_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users at haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users