[Haskell-cafe] Extracting structured data in XML into records
Daniel McAllansmith
dm.maillists at gmail.com
Sat Feb 24 04:44:29 EST 2007
On Saturday 24 February 2007 21:22, Johan Tibell wrote:
> So my question is. How can I write the function
> 'extractElementsIntoRecords' below. Or, perhaps HXT is the wrong tool
> for the job and I should be trying to walk the DOM tree instead?
>
> > module HCard where
> >
> > import Text.XML.HXT.Arrow
> >
> > data HCard = HCard
> > {
> > familyName :: String,
> > givenName :: String
> > org :: Maybe String
> > url :: Maybe String
> > } deriving Show
> >
> > parseHCards xml = runX $ parseXml xml
> >
> > parseXml xml =
> > readString [(a_parse_html, v_1)] xml >>>
> > deep (hasClassName "vcard") >>>
> > extractElementsIntoRecords
> >
> > extractElementsIntoRecords = undefined
Perhaps something like the following (which is likely to be wrong seen I'm
adlibing):
extractElementsIntoRecords = findFName <+> findGName <+> findOrg <+> findURL
where
findX c = deep (hasName "span" >>> hasAttrValue "class" (== c)) >>>
getChildren >>> getText
findFName = findX "family-name" >>> arr Just
findGName = findX "given-name" >>> arr Just
findOrg = (findX "org" >>> arr Just) `withDefault` Nothing
findURL = (deep (hasName "a" >>> hasAttrValue "class" (== "url)) >>>
getAttrValue "href" >>> arr Just) `withDefault` Nothing
and use the following at an appropriate place:
composeHCard (Just fn:Just gn:morg:murl:xs) = (HCard fn gn morg murl):(compose
xs)
composeHCard _ = []
There's several other possibilities for dealing with bad data and
simplifications you could do of course.
Daniel
More information about the Haskell-Cafe
mailing list