[Haskell-cafe] Extracting structured data in XML into records

Daniel McAllansmith dm.maillists at gmail.com
Sat Feb 24 04:44:29 EST 2007


On Saturday 24 February 2007 21:22, Johan Tibell wrote:
> So my question is. How can I write the function
> 'extractElementsIntoRecords' below. Or, perhaps HXT is the wrong tool
> for the job and I should be trying to walk the DOM tree instead?
>
> > module HCard where
> >
> > import Text.XML.HXT.Arrow
> >
> > data HCard = HCard
> >     {
> >       familyName :: String,
> >       givenName :: String
> >       org :: Maybe String
> >       url :: Maybe String
> >     } deriving Show
> >
> > parseHCards xml = runX $ parseXml xml
> >
> > parseXml xml =
> >     readString [(a_parse_html, v_1)] xml >>>
> >     deep (hasClassName "vcard") >>>
> >     extractElementsIntoRecords
> >
> > extractElementsIntoRecords = undefined

Perhaps something like the following (which is likely to be wrong seen I'm 
adlibing):

extractElementsIntoRecords = findFName <+> findGName <+> findOrg <+> findURL
    where
        findX c = deep (hasName "span" >>> hasAttrValue "class" (== c)) >>> 
getChildren >>> getText
        findFName = findX "family-name" >>> arr Just
        findGName = findX "given-name" >>> arr Just
        findOrg   = (findX "org" >>> arr Just) `withDefault` Nothing
        findURL   = (deep (hasName "a" >>> hasAttrValue "class" (== "url)) >>> 
getAttrValue "href" >>> arr Just) `withDefault` Nothing

and use the following at an appropriate place:

composeHCard (Just fn:Just gn:morg:murl:xs) = (HCard fn gn morg murl):(compose 
xs)
composeHCard _ = []

There's several other possibilities for dealing with bad data and 
simplifications you could do of course.


Daniel


More information about the Haskell-Cafe mailing list