haskell xml parsing for larger files?
Chris Smith
cdsmith at gmail.com
Thu Feb 20 15:01:50 UTC 2014
Ah, I'd misunderstood your question and thought you were looking for a
SAX-like alternative.
On Feb 20, 2014 6:57 AM, "Christian Maeder" <Christian.Maeder at dfki.de>
wrote:
> I've just tried:
>
> import Text.HTML.TagSoup
> import Text.HTML.TagSoup.Tree
>
> main :: IO ()
> main = getContents >>= putStr . renderTags . flattenTree . tagTree
>   . parseTags
>
> which also ends with the getMBlock error.
> Only "renderTags . parseTags" works fine (like the hexpat SAX parser).
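For the streaming case that does work, here is a minimal sketch of getting some structural information (tag count, maximum nesting depth) out of the tag stream without ever building a tree. `Stats`, `step` and `tagStats` are names made up for illustration; only `parseTags` and the `Tag` constructors come from tagsoup. It assumes the input is read lazily and that a strict left fold is enough for the job, which may well not hold for your use case.

```haskell
import Data.List (foldl')
import Text.HTML.TagSoup (Tag (..), parseTags)

-- | Running summary (hypothetical type, not part of tagsoup):
-- current depth, maximum depth seen, number of element opening tags.
data Stats = Stats !Int !Int !Int
  deriving (Eq, Show)

-- | Update the summary with a single tag.
step :: Stats -> Tag String -> Stats
step (Stats d m n) tag = case tag of
  -- Skip declarations such as "?xml" or "!DOCTYPE", which have no closing tag.
  TagOpen ('?' : _) _ -> Stats d m n
  TagOpen ('!' : _) _ -> Stats d m n
  TagOpen _ _         -> let d' = d + 1 in Stats d' (max m d') (n + 1)
  TagClose _          -> Stats (d - 1) m n
  -- Text, comments and so on do not affect the summary.
  _                   -> Stats d m n

-- | Strict left fold over the lazily produced tag list, so no tree
-- (and no long list of tags) has to be held in memory at once.
tagStats :: String -> Stats
tagStats = foldl' step (Stats 0 0 0) . parseTags
```

Something like `getContents >>= print . tagStats` should then run in roughly constant space, though I have not measured this on a 100MB file.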
>
> Why should tagsoup be better suited for building trees from large files?
>
> C.
>
> On 20.02.2014 15:30, Chris Smith wrote:
>
>> Have you looked at tagsoup?
>>
>> On Feb 20, 2014 3:30 AM, "Christian Maeder" <Christian.Maeder at dfki.de>
>> wrote:
>>
>> Hi,
>>
>> I've got some difficulties parsing "large" xml files (> 100MB).
>> A plain SAX parser, as provided by hexpat, is fine. However,
>> constructing a tree consumes too much memory on a 32bit machine.
>>
>> see http://trac.informatik.uni-bremen.de:8080/hets/ticket/1248
>>
>> I suspect that sharing strings when constructing trees might greatly
>> reduce memory requirements. What are suitable libraries for string
>> pools?
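As a rough sketch of the string-pool idea (not a specific library; `Pool`, `intern` and `internAll` are made-up names): keep a map from each string to the first copy seen, and hand out that copy for every later occurrence, so repeated element and attribute names in the tree point at one heap object. This assumes names repeat often, and it does not stop the parser from allocating the duplicate in the first place; the duplicate only becomes garbage sooner.

```haskell
import Data.List (mapAccumL)
import qualified Data.Map.Strict as Map

-- | A pool maps each string to the single shared copy that is handed out.
type Pool = Map.Map String String

-- | Return the pooled copy of a string, adding it to the pool on first sight.
intern :: Pool -> String -> (Pool, String)
intern pool s = case Map.lookup s pool of
  Just shared -> (pool, shared)            -- reuse the existing copy
  Nothing     -> (Map.insert s s pool, s)  -- first occurrence becomes the shared copy

-- | Intern a sequence of names, threading the pool through from left to right.
internAll :: Pool -> [String] -> (Pool, [String])
internAll = mapAccumL intern
```

The same threading would have to happen while the tree is constructed, for example inside the SAX event handler. Whether the saving is worth it depends on how much of the 100MB is repeated names rather than unique text content.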
>>
>> Before trying to implement something myself, I'd like to ask who
>> else has tried to process large xml files (and met similar memory
>> problems)?
>>
>> I have not yet investigated xml-conduit and hxt for our purpose.
>> (These look scary.)
>>
>> In fact, I've basically used the content trees from "The (simple)
>> xml package", and switching to another tree type is no fun,
>> particularly if it gains little.
>>
>> Thanks Christian
>> _______________________________________________
>> Glasgow-haskell-users mailing list
>> Glasgow-haskell-users at haskell.org
>> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>