haskell xml parsing for larger files?

Christian Maeder Christian.Maeder at dfki.de
Thu Feb 20 16:02:28 UTC 2014


I'm afraid our use case is not a lazy prefix traversal.
I'm more shocked that about 100 MB of XML content does not fit (as a tree)
into 3 GB of memory.

Christian
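A rough back-of-envelope estimate suggests why the tree is so large. It is an illustration added here, not a measurement from the thread. It assumes the character data ends up in Haskell `String`s, as in the xml package's content trees. It also assumes that roughly one input byte becomes one `Char`, and that this is a 32-bit GHC build, where each list cons cell takes three 4-byte words and ASCII `Char`s point to shared static closures:

```latex
\[
  \underbrace{100 \times 10^{6}}_{\text{chars}}
  \times \underbrace{3}_{\text{words per cons cell}}
  \times \underbrace{4\ \text{bytes}}_{\text{per word}}
  = 1.2 \times 10^{9}\ \text{bytes} \approx 1.2\ \text{GB}
\]
```

This counts only the character data. Element and attribute nodes come on top of it, and a copying garbage collector can transiently need about twice the live heap. That already approaches the 3 GB limit.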

On 20.02.2014 16:49, malcolm.wallace wrote:
> Is your usage pattern over the constructed tree likely to be a lazy
> prefix traversal?  If so, then HaXml supports lazy construction of the
> parse tree.  Some plots appear at the end of this paper, showing how
> memory usage can be reduced to a constant, even for very large inputs (1
> million tree nodes):
>
> http://www.cs.york.ac.uk/plasma/publications/pdf/partialparse.pdf
>
> Regards,
>      Malcolm
>
>
> On 20 Feb, 2014, at 11:30 AM, Christian Maeder <Christian.Maeder at dfki.de>
> wrote:
>
>> Hi,
>>
>> I'm having difficulty parsing "large" XML files (> 100 MB).
>> A plain SAX parser, as provided by hexpat, is fine. However,
>> constructing a tree consumes too much memory on a 32-bit machine.
>>
>> see http://trac.informatik.uni-bremen.de:8080/hets/ticket/1248
>>
>> I suspect that sharing strings when constructing trees might greatly
>> reduce memory requirements. What are suitable libraries for string pools?
>>
>> Before trying to implement something myself, I'd like to ask whether
>> anyone else has tried to process large XML files (and met similar
>> memory problems).
>>
>> I have not yet investigated xml-conduit and hxt for our purpose. (These
>> look scary.)
>>
>> In fact, I've basically used the content trees from "The (simple) xml
>> package", and switching to another tree type is no fun, particularly
>> if it gains little.
>>
>> Thanks Christian
>> _______________________________________________
>> Glasgow-haskell-users mailing list
>> Glasgow-haskell-users at haskell.org
>> <mailto:Glasgow-haskell-users at haskell.org>
>> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
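The string-pool idea from the quoted message can be sketched in plain Haskell. This is a minimal sketch, not code from the thread or from any particular library. The `Node` type is a hypothetical stand-in for a content tree such as the xml package's. The pool is simply a `Data.Map` from containers that keeps one canonical copy of each distinct string. When the tree is rebuilt through the pool, repeated element names, attribute keys and values, and short text nodes all point at a single heap object, so the duplicates can be garbage-collected.

```haskell
import Data.List (mapAccumL)
import qualified Data.Map.Strict as Map

-- Hypothetical, simplified XML tree (not the xml package's own types).
data Node = Element String [(String, String)] [Node]
          | TextNode String
  deriving (Show, Eq)

-- The pool maps each distinct string to one canonical heap copy.
type Pool = Map.Map String String

-- Return the pooled copy of a string, adding it if it is new.
intern :: Pool -> String -> (Pool, String)
intern pool s = case Map.lookup s pool of
  Just shared -> (pool, shared)
  Nothing     -> (Map.insert s s pool, s)

-- Rebuild a tree so that equal names, attribute values and texts share storage.
shareNode :: Pool -> Node -> (Pool, Node)
shareNode p (TextNode t) = let (p', t') = intern p t in (p', TextNode t')
shareNode p (Element n attrs kids) =
  let (p1, n')     = intern p n
      (p2, attrs') = mapAccumL shareAttr p1 attrs
      (p3, kids')  = mapAccumL shareNode p2 kids
  in (p3, Element n' attrs' kids')
  where
    shareAttr q (k, v) =
      let (q1, k') = intern q k
          (q2, v') = intern q1 v
      in (q2, (k', v'))

main :: IO ()
main = do
  let doc = Element "row" []
              [ Element "cell" [("type", "int")] [TextNode "0"]
              , Element "cell" [("type", "int")] [TextNode "0"] ]
      (pool, shared) = shareNode Map.empty doc
  print (shared == doc)   -- the tree's content is unchanged
  print (Map.size pool)   -- 5 distinct strings: row, cell, type, int, 0
```

Sharing only pays off when the same strings recur often, as element and attribute names typically do. The pool itself stays alive while the tree is built, and unique text content is not reduced at all.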


