[Haskell-beginners] Laziness and XML parsing

Michael Snoyman michael at snoyman.com
Wed Nov 9 08:15:30 CET 2011


The following should work. The basic idea is:

* Try to parse a <person>
* If it's not a <person>, recursively try again.

{-# LANGUAGE OverloadedStrings #-}
import Text.XML.Stream.Parse
import Data.Text (Text, unpack)
import Control.Monad (join)
import Data.Enumerator (Iteratee)
import Data.XML.Types (Event)

data Person = Person { age :: Int, name :: Text }
    deriving Show

parsePerson :: Monad m => Iteratee Event m (Maybe [Person])
parsePerson = tagName "person" (requireAttr "age") $ \age -> do
    name <- content
    return [Person (read $ unpack age) name]

parseWrapper :: Monad m => Iteratee Event m (Maybe [Person])
parseWrapper =
    parsePerson `orE`
    (fmap . fmap) concat (tagPredicate (const True) ignoreAttrs (const
$ many parseWrapper))

main = parseFile_ def "people.xml" $ force "people required" parseWrapper

Michael

On Tue, Nov 8, 2011 at 7:28 AM, Sean Hess <seanhess at gmail.com> wrote:
> Thanks all for your help so far. Using xml-enumerator, is there any way to
> parse the following xml, and ignore the people tag? In other words, can I
> parse it by only providing an Iteratee for Person, no matter where a
> <person> tag appears nested within a document?
>
>  <?xml version="1.0" encoding="utf-8"?>
>  <people>
>      <person age="25">Michael</person>
>      <person age="2">Eliezer</person>
>  </people>
>
>
>
>
> On Nov 8, 2011, at 7:33 AM, Michael Snoyman wrote:
>
> Thanks for the heads-up, it's just a few minor tweaks in the 0.3->0.4
> transition. I'll update later, and add a link to the blog post, and
> release a new version to Hackage.
>
> On Tue, Nov 8, 2011 at 6:03 AM, Sean Hess <seanhess at gmail.com> wrote:
>
> Thanks so much to both of you that sent that link.
>
> Sorry, my email totally wasn't clear. I meant that the example in the
>
> package description doesn't
>
> run: http://hackage.haskell.org/packages/archive/xml-enumerator/0.4.3.1/doc/html/Text-XML-Stream-Parse.html#t:ParseSettings
>
> I'll read through that article.
>
> On Nov 8, 2011, at 7:01 AM, Michael Snoyman wrote:
>
> Here's a blog post on the package:
>
> http://www.yesodweb.com/blog/2011/10/xml-enumerator . It doesn't cover
>
> the streaming interface, but it might give you a good overview of the
>
> package in general. I'm not sure what you mean by "it doesn't run,"
>
> but you'll need at least a basic understanding of enumerators to get
>
> off the ground.
>
> On Tue, Nov 8, 2011 at 5:38 AM, Sean Hess <seanhess at gmail.com> wrote:
>
> I cannot seem to find a working example of xml-enumerator. It doesn't run:
>
> the names seem to have changed for some things, and I'm too much of a
>
> beginner to figure it out easily.
>
> http://hackage.haskell.org/packages/archive/xml-enumerator/0.4.3.1/doc/html/Text-XML-Stream-Parse.html#t:ParseSettings
>
> On Nov 7, 2011, at 7:59 PM, Felipe Almeida Lessa wrote:
>
> On Tue, Nov 8, 2011 at 12:45 AM, Sean Hess <seanhess at gmail.com> wrote:
>
> I want to parse a large xml file (2GB), without putting the whole thing into
>
> memory. It's pretty simple with a sax parser in most languages, you just
>
> stream bytes to the sax parser, and wait for sax events.
>
> I recommend you taking a look at xml-enumerator [1] and
>
> libxml-enumerator [2].  They are the SAX parsers you know from the
>
> imperative world but much easier to write =).  In particular, you
>
> don't need to rely on lazyness.
>
> Cheers,
>
> [1] http://hackage.haskell.org/package/xml-enumerator
>
> [2] http://hackage.haskell.org/package/libxml-enumerator
>
> --
>
> Felipe.
>
>
> _______________________________________________
>
> Beginners mailing list
>
> Beginners at haskell.org
>
> http://www.haskell.org/mailman/listinfo/beginners
>
>
>
>
>
>



More information about the Beginners mailing list