[Haskell-cafe] XML parser recommendation?

Ketil Malde ketil.malde at bccs.uib.no
Tue Oct 23 15:42:12 EDT 2007


Ketil Malde <ketil+haskell at ii.uib.no> writes:

> HaXml on my list after TagSoup, which I'm about to get to work, I
> think (got distracted a bit ATM).

As it is, I managed to parse my document using TagSoup.  One major
obstacle was the need to process a sizeable partition of the file.
Using 'partitions' from TagSoup (which is implemented using the
'groupBy (const (not . p))' trick) didn't work, as it requires space
proportional to the partition size.
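
For reference, the groupBy trick mentioned above amounts to something like
the following. This is a sketch only: the name partitionsSketch is made up
for illustration, and it is not claimed to be TagSoup's exact source. A
likely reason for the space behaviour (an assumption, not something measured
here) is that the not-yet-evaluated remainder of the list still points at
the start of the current chunk while that chunk is being consumed.

```haskell
import Data.List (groupBy)

-- Hypothetical name; a sketch of the groupBy trick, not TagSoup's source.
-- Each chunk starts at an element satisfying p. Anything before the
-- first such element is dropped by the leading dropWhile.
partitionsSketch :: (a -> Bool) -> [a] -> [[a]]
partitionsSketch p = groupBy (const (not . p)) . dropWhile (not . p)
```

With this definition, partitionsSketch even [1,2,3,5,4,7] gives
[[2,3,5],[4,7]]: the leading 1 is discarded and every chunk begins with an
even number.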

My solution (and please forgive me, it is getting late at night here)
was to replace it with (slightly different semantics alert):

  import Control.Parallel (par)

  -- Chunks start at each element satisfying p; the first starts at the head.
  breaks :: (a -> Bool) -> [a] -> [[a]]
  breaks _ []     = []
  breaks p (x:xs) = let first = x : takeWhile (not.p) xs
                        rest  = dropWhile (not.p) xs
                    in  rest `par` first : breaks p rest

I have no idea how reliable this is, and I suspect it isn't very, but
on the plus side it does seem to work, as long as I compile with
-smp.  Parsing 300 MB of XML and outputting the information in 305K
records takes approximately 5 minutes, and works with less than 1 GB of
heap.  This is fast and small enough for my purposes.
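
To make the "slightly different semantics" concrete, here is a small sketch
of the chunking that breaks performs. The name breaksSeq is hypothetical: it
leaves out the `par` spark so that it depends only on base, and it adds an
empty-list clause as an assumption about the intended behaviour. It says
nothing about the memory behaviour of the real thing.

```haskell
-- Hypothetical name: same chunking as 'breaks', minus the 'par' spark.
-- The empty-list clause is an added assumption, not part of the original.
breaksSeq :: (a -> Bool) -> [a] -> [[a]]
breaksSeq _ []     = []
breaksSeq p (x:xs) = (x : takeWhile (not . p) xs)
                   : breaksSeq p (dropWhile (not . p) xs)
```

Here breaksSeq even [1,3,2,5,4] gives [[1,3],[2,5],[4]]. The head of the
input always opens the first chunk, whether or not it satisfies the
predicate, so no elements are lost. TagSoup's partitions, by contrast,
appears to drop everything before the first matching element.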

Thanks for listening, and good night!

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants

