[Haskell-cafe] XML parser recommendation?
ketil.malde at bccs.uib.no
Tue Oct 23 15:42:12 EDT 2007
Ketil Malde <ketil+haskell at ii.uib.no> writes:
> HaXml on my list after TagSoup, which I'm about to get to work, I
> think (got distracted a bit ATM).
As it is, I managed to parse my document using TagSoup. One major
obstacle was the need to process a sizeable partition of the file.
Using 'partitions' from TagSoup (which is implemented using the
'groupBy (const (not . p))' trick) didn't work, as it requires space
proportional to the partition size.
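
For readers who have not seen the trick, here is a minimal sketch of a groupBy-based splitter. The name partitionsSketch is hypothetical, and the leading dropWhile is an assumption about how input before the first match is handled. It is an illustration, not a copy of the TagSoup source.

```haskell
import Data.List (groupBy)

-- Hypothetical stand-in for a groupBy-based 'partitions'.
-- 'groupBy eq' starts a group at some element x and extends it while
-- 'eq x y' holds; with eq = const (not . p) the group extends over
-- every following element that does NOT satisfy p.
partitionsSketch :: (a -> Bool) -> [a] -> [[a]]
partitionsSketch p = groupBy (const (not . p)) . dropWhile (not . p)

-- Small fixed input, using 'even' as the "start of a group" marker.
example :: [[Int]]
example = partitionsSketch even [1, 2, 3, 5, 4, 7]
```

Each group starts at an element satisfying the predicate, and anything before the first such element is dropped by this sketch.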
My solution (and please forgive me, it is getting late at night here)
was to replace it with (slightly different semantics alert):
    import Control.Parallel (par)

    breaks :: (a -> Bool) -> [a] -> [[a]]
    breaks _ []     = []   -- empty input: no groups
    breaks p (x:xs) = let first = x : takeWhile (not.p) xs
                          rest  = dropWhile (not.p) xs
                      in  rest `par` first : if null rest then [] else breaks p rest
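
To make the "slightly different semantics" concrete, here is a self-contained sketch of the same function on a small list. Two things are assumptions rather than part of the post: the empty-list case, and importing par from GHC.Conc (in base) rather than from Control.Parallel, so that no extra package is needed.

```haskell
import GHC.Conc (par)

-- Split a list into groups: the first element always opens a group,
-- and every later element satisfying p opens a new one.
breaks :: (a -> Bool) -> [a] -> [[a]]
breaks _ []     = []   -- assumption: empty input yields no groups
breaks p (x:xs) =
  let first = x : takeWhile (not . p) xs
      rest  = dropWhile (not . p) xs
  in  -- spark evaluation of 'rest' while 'first' is being consumed
      rest `par` (first : if null rest then [] else breaks p rest)

-- The leading 1 does not satisfy 'even', yet it is kept as its own group.
example :: [[Int]]
example = breaks even [1, 2, 3, 5, 4, 7]
```

Here example evaluates to [[1],[2,3,5],[4,7]]. The leading run that precedes the first match is kept as a group of its own, which, as far as I can tell, is where this differs from a splitter that discards everything before the first match.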
I have no idea how reliable this is, and I suspect it isn't very, but
on the plus side it does seem to work, as long as I compile with
-smp. Parsing 300 MB of XML and outputting the information in 305K
records takes approximately 5 minutes, and works with less than 1 GB of
heap. This is fast and small enough for my purposes.
Thanks for listening, and good night!
If I haven't seen further, it is by standing in the footprints of giants