haskell xml parsing for larger files?

Mathieu Boespflug m at tweag.io
Thu Feb 20 15:06:10 UTC 2014


Hi Christian,

As regards your question about sharing strings: several libraries on
Hackage do this, typically for interning compiler symbols. To name only
a few: intern, stringtable-atom, simple-atom. I'm sure there are others.
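
To illustrate the idea independently of any of those packages, here is a
minimal sketch of a string pool that uses only Data.Map from containers.
The names Pool, intern and internAll are made up for this example and are
not the API of the libraries above. It assumes tag and attribute names
arrive as plain Strings and that the pool can be threaded through whatever
code builds the tree.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (mapAccumL)

-- | Hypothetical pool type: maps each distinct string to the one copy we keep.
type Pool = Map.Map String String

-- | Look a string up in the pool. If an equal string is already present,
-- return that shared copy so the caller can drop its own duplicate;
-- otherwise add the string to the pool and return it unchanged.
intern :: Pool -> String -> (Pool, String)
intern pool s = case Map.lookup s pool of
  Just shared -> (pool, shared)
  -- Force the whole string before storing it, so the pool does not
  -- retain an unevaluated thunk that points back into the parser input.
  Nothing     -> length s `seq` (Map.insert s s pool, s)

-- | Intern a list of names, threading the pool from left to right.
internAll :: Pool -> [String] -> (Pool, [String])
internAll = mapAccumL intern
```

After internAll Map.empty ["node", "edge", "node"], the pool holds two
entries and both occurrences of "node" in the result refer to the same
heap object. Whether this saves enough memory in practice depends on how
repetitive the element and attribute names in the document are.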

Best,
--
Mathieu Boespflug
Founder at http://tweag.io.


On Thu, Feb 20, 2014 at 12:30 PM, Christian Maeder
<Christian.Maeder at dfki.de> wrote:
> Hi,
>
> I've got some difficulties parsing "large" xml files (> 100MB).
> A plain SAX parser, as provided by hexpat, is fine. However, constructing a
> tree consumes too much memory on a 32bit machine.
>
> see http://trac.informatik.uni-bremen.de:8080/hets/ticket/1248
>
> I suspect that sharing strings when constructing trees might greatly reduce
> memory requirements. What are suitable libraries for string pools?
>
> Before trying to implement something myself, I'd like to ask who else has
> tried to process large xml files (and met similar memory problems)?
>
> I have not yet investigated xml-conduit and hxt for our purpose. (These look
> scary.)
>
> In fact, I've basically used the content trees from "The (simple) xml
> package" and switching to another tree type is no fun, in particular if
> it does not gain much.
>
> Thanks Christian

