haskell xml parsing for larger files?

Mateusz Kowalczyk fuuzetsu at fuuzetsu.co.uk
Thu Feb 20 16:42:05 UTC 2014


On 20/02/14 11:30, Christian Maeder wrote:
> Hi,
> 
> I've got some difficulties parsing "large" xml files (> 100MB).
> A plain SAX parser, as provided by hexpat, is fine. However, 
> constructing a tree consumes too much memory on a 32-bit machine.
> 
> see http://trac.informatik.uni-bremen.de:8080/hets/ticket/1248
> 
> I suspect that sharing strings when constructing trees might greatly 
> reduce memory requirements. What are suitable libraries for string pools?
> 
> Before trying to implement something myself, I'd like to ask who else 
> has tried to process large xml files (and met similar memory problems)?
> 
> I have not yet investigated xml-conduit and hxt for our purpose. (These 
> look scary.)
> 
> In fact, I've basically used the content trees from "The (simple) xml 
> package" and switching to another tree type is no fun, particularly if 
> it does not gain much.
> 
> Thanks Christian
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users at haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
> 

HXT will not work for you: you will run out of memory on files of around
30MB. I don't know about xml-conduit; I'd love to hear how it goes if you
try it.
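
On the string pool question in the quoted message: below is a minimal
sketch of what such a pool could look like, assuming element and
attribute names are plain Strings and that the containers package is
available. The names Pool, intern and internAll are made up for
illustration and do not come from hexpat, HXT or the xml package. The
idea is to keep one copy of each distinct string and hand that same copy
back every time an equal string shows up, so repeated tag names share
memory instead of being allocated again for each node. Whether this
saves enough on a 100MB file is something only profiling can tell.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (mapAccumL)

-- Hypothetical pool type: each distinct string maps to the one copy we keep.
type Pool = Map.Map String String

-- Return the pooled copy of a string, adding it to the pool on first sight.
intern :: Pool -> String -> (Pool, String)
intern pool s = case Map.lookup s pool of
  Just shared -> (pool, shared)          -- seen before: reuse the stored copy
  Nothing     -> (Map.insert s s pool, s) -- first occurrence: store it

-- Intern every string in a list, threading the pool from left to right.
internAll :: Pool -> [String] -> (Pool, [String])
internAll = mapAccumL intern
```

With a SAX parser the pool would be threaded through the event stream in
the same way, interning each tag and attribute name before it is stored
in the tree.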

-- 
Mateusz K.

