haskell xml parsing for larger files?
Christian Maeder
Christian.Maeder at dfki.de
Thu Feb 20 14:56:59 UTC 2014
I've just tried:
import Text.HTML.TagSoup
import Text.HTML.TagSoup.Tree
main :: IO ()
main = getContents >>= putStr . renderTags . flattenTree . tagTree . parseTags
which also ends with the getMBlock (out-of-memory) error.
Only "renderTags . parseTags" works fine (like the hexpat SAX parser).
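For comparison, a pipeline that stays streaming can still do useful work as long as it never builds a tree; a sketch using tagsoup's lazy token list (extractText is a hypothetical helper, not from the original mail):

```haskell
import Text.HTML.TagSoup (fromTagText, isTagText, parseTags)

-- Lazily pull out all text content without ever building a tree;
-- parseTags produces its token list lazily, so memory stays bounded.
extractText :: String -> String
extractText = concatMap fromTagText . filter isTagText . parseTags

main :: IO ()
main = interact extractText
```

Anything expressible as a single pass over the flat tag list avoids the tree entirely, which is why "renderTags . parseTags" survives where tagTree does not.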
Why should tagsoup be better suited for building trees from large files?
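On the string-pool question quoted below, a minimal interning sketch (hypothetical names, using Data.Map.Strict from containers) that makes equal strings share one heap copy:

```haskell
import qualified Data.Map.Strict as Map

-- A simple string pool: looking up a string that is already in the
-- pool returns the pooled copy, so equal strings share one heap object.
type Pool = Map.Map String String

intern :: String -> Pool -> (String, Pool)
intern s pool = case Map.lookup s pool of
  Just shared -> (shared, pool)            -- reuse the pooled copy
  Nothing     -> (s, Map.insert s s pool)  -- first occurrence: add it
```

Threading the pool through tree construction (over tag and attribute names, say) would replace many identical copies with references to one shared string; a State monad could hide the plumbing.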
C.
Am 20.02.2014 15:30, schrieb Chris Smith:
> Have you looked at tagsoup?
>
> On Feb 20, 2014 3:30 AM, "Christian Maeder" <Christian.Maeder at dfki.de> wrote:
>
> Hi,
>
> I've got some difficulties parsing "large" xml files (> 100MB).
> A plain SAX parser, as provided by hexpat, is fine. However,
> constructing a tree consumes too much memory on a 32bit machine.
>
> see http://trac.informatik.uni-bremen.de:8080/hets/ticket/1248
>
> I suspect that sharing strings when constructing trees might greatly
> reduce memory requirements. What are suitable libraries for string
> pools?
>
> Before trying to implement something myself, I'd like to ask who
> else has tried to process large xml files (and met similar memory
> problems)?
>
> I have not yet investigated xml-conduit and hxt for our purpose.
> (These look scary.)
>
> In fact, I've basically used the content trees from "The (simple)
> xml package", and switching to another tree type is no fun, in
> particular if it doesn't gain much.
>
> Thanks Christian
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users at haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>
>