haskell xml parsing for larger files?
Mateusz Kowalczyk
fuuzetsu at fuuzetsu.co.uk
Thu Feb 20 16:42:05 UTC 2014
On 20/02/14 11:30, Christian Maeder wrote:
> Hi,
>
> I've got some difficulties parsing "large" xml files (> 100MB).
> A plain SAX parser, as provided by hexpat, is fine. However,
> constructing a tree consumes too much memory on a 32bit machine.
>
> see http://trac.informatik.uni-bremen.de:8080/hets/ticket/1248
>
> I suspect that sharing strings when constructing trees might greatly
> reduce memory requirements. What are suitable libraries for string pools?
>
> Before trying to implement something myself, I'd like to ask who else
> has tried to process large xml files (and met similar memory problems)?
>
> I have not yet investigated xml-conduit and hxt for our purpose. (These
> look scary.)
>
> In fact, I've basically used the content trees from "The (simple) xml
> package", and switching to another tree type is no fun, particularly if
> it does not gain much.
>
> Thanks Christian
HXT will not work for you: you will run out of memory on files of around
30MB. I don't know about xml-conduit; I'd love to hear how it goes if you
try it.
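
On the string-pool idea, here is a minimal sketch of what sharing could
look like, using only Data.Map from containers. The names Pool, intern
and internAll are made up for illustration and do not come from any
library. It assumes tag and attribute names arrive as plain Strings (for
instance from SAX events) and that the pool is threaded through the
tree-building code by hand. I have not measured whether this saves
enough memory on its own; a more compact string type may matter more
than sharing.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (mapAccumL)

-- Hypothetical pool type: maps each distinct string to its one
-- canonical copy, so equal names in the tree can point at the same heap
-- object instead of separate copies.
type Pool = Map.Map String String

-- Return the canonical copy of a string, adding it to the pool if it
-- has not been seen before.
intern :: Pool -> String -> (Pool, String)
intern pool s = case Map.lookup s pool of
  Just shared -> (pool, shared)            -- reuse the stored copy
  Nothing     -> (Map.insert s s pool, s)  -- first occurrence becomes canonical

-- Intern a whole list of names, threading the pool through.
internAll :: Pool -> [String] -> (Pool, [String])
internAll = mapAccumL intern

main :: IO ()
main = do
  let names          = ["node", "edge", "node", "node", "edge"]
      (pool, shared) = internAll Map.empty names
  print (Map.size pool)    -- number of distinct names: 2
  print (shared == names)  -- contents are unchanged: True
```

When building the tree, each element name and attribute key would go
through intern before being stored in a node, so repeated names cost one
pointer each rather than a fresh string.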
--
Mateusz K.