[Haskell-cafe] HXT: how to get sibling element

Wilfried van Asten sniperrifle2004 at gmail.com
Thu Mar 15 13:38:33 CET 2012


I am not really familiar with XML parsing in Haskell, but I am
wondering why, if you have an XML file, you do not simply name each
element after the type of its contents:

<book>
      <title>Some Story</title>
      <description>This story about..</description>
      <author>Tom Smith</author>
</book>

Alternatively you could use the class or some type attribute to
indicate the type.
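
As a rough sketch of why that layout is easier to process: with one element per field, the element name and its text already form the pair you want. This assumes HXT's `xread` and `runLA` to parse an inline string rather than fetching a URL; `fields`, `bookFields` and `bookXml` are names made up for this example.

```haskell
import Text.XML.HXT.Core

-- Hypothetical helper: one (element name, text) pair per child element.
-- Whitespace text nodes between the children are dropped by isElem.
fields :: ArrowXml a => a XmlTree (String, String)
fields = getChildren >>> isElem >>> (getName &&& (getChildren >>> getText))

-- Parse a string purely and collect the fields of the <book> element.
bookFields :: String -> [(String, String)]
bookFields = runLA (xread >>> hasName "book" >>> fields)

-- The example document from this mail, inlined for self-containment.
bookXml :: String
bookXml = concat
  [ "<book>"
  , "<title>Some Story</title>"
  , "<description>This story about..</description>"
  , "<author>Tom Smith</author>"
  , "</book>"
  ]

main :: IO ()
main = print (bookFields bookXml)
```

For the document above this should give [("title","Some Story"),("description","This story about.."),("author","Tom Smith")], with no special case needed for the title.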

Regards,

Wilfried van Asten


2012/3/15 Никитин Лев <leon.v.nikitin at pravmail.ru>:
> Hello, haskellers.
>
> Suppose we have this XML document (perhaps a somewhat silly one):
>
> <div>
>  <span>Some story</span>
>  <span>Description</span>: This story about...
>  <span>Author</span>: Tom Smith
> </div>
>
> In the end I want to get the list: [("Title", "Some story"), ("Description","This story about..."), ("Author", "Tom Smith")],
> or maybe this: Book "Some story" [("Description","This story about..."), ("Author", "Tom Smith")] (where data Book = Book String [(String, String)]).
>
> The first span is a special case compared with the others, and I understand how to process it:
>
> ===============
>
> {-# LANGUAGE TupleSections #-}  -- needed for the ("Title",) section in getTitle
> import Text.XML.HXT.Core
> import Text.XML.HXT.Curl
> import Text.XML.HXT.HTTP
>
> pageURL = "http://localhost/test.xml"
>
> main = do
>    r <- runX (configSysVars [withCanonicalize no, withValidate no, withTrace 0, withParseHTML no] >>>
>              readDocument [withErrors no, withWarnings no, withHTTP []] pageURL >>>
>              getChildren >>> isElem >>> hasName "div" >>> listA (getChildren >>> hasName "span") >>> getTitle <+> getSections)
>    putStrLn "Articles:"
>    putStr "<"
>    mapM_ putStr $ map (\i -> (fst i) ++ ": " ++ (snd i) ++ "| ") r
>    putStrLn ">"
>
> getTitle = arr head >>> getChildren >>> getText >>> arr trim >>> arr ("Title",)
>
> getSections = arr tail >>> unlistA >>> ((getChildren >>> getText >>> arr trim) &&& (getChildren >>> getText >>> arr trim))
>
> ltrim [] = []
> ltrim (' ':x) = ltrim x
> ltrim ('\n':x) = ltrim x
> ltrim ('\r':x) = ltrim x
> ltrim ('\t':x) = ltrim x
> ltrim x = x
>
> rtrim = reverse . ltrim . reverse
>
> trim = ltrim . rtrim
>
> ===================
>
> And I get the list:  [("Title", "Some story"), ("Description","Description"), ("Author", "Author")]
>
> (Maybe there is a better way to get this list?)
>
> But I cannot find a way to get the text that follows a span.
>
> I suppose that I have to use a function from Data.Tree.NavigatableTree.XPathAxis, but I cannot puzzle out how to do it.
>
> Please help me.
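
If the document has to stay as it is, one possible approach (a sketch only, not necessarily the intended HXT idiom, and not using the XPathAxis module) is to collect all children of the div in document order, tag each as either a span label or a bare text node, and then pair each label with the text node that follows it. The names `classify`, `pairUp`, `clean` and `extract` are invented here, the string is parsed with `xread`/`runLA` instead of `readDocument`, and treating a span with no following text as the title is an assumption based on the example.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)
import Text.XML.HXT.Core

-- Tag a child of the div: Left for a <span> label, Right for a bare text node.
classify :: ArrowXml a => a XmlTree (Either String String)
classify = (isElem >>> hasName "span" >>> getChildren >>> getText >>> arr Left)
       <+> (isText >>> getText >>> arr Right)

-- Strip surrounding whitespace and the leading ':' separator.
clean :: String -> String
clean = trim . dropWhile (== ':') . trim
  where trim = dropWhileEnd isSpace . dropWhile isSpace

-- Pair each label with the text right after it. A label followed by
-- nothing but whitespace is assumed to be the title.
pairUp :: [Either String String] -> [(String, String)]
pairUp (Left label : Right txt : rest)
  | null value = ("Title", label) : pairUp rest
  | otherwise  = (label, value) : pairUp rest
  where value = clean txt
pairUp (Left label : rest) = ("Title", label) : pairUp rest
pairUp (_ : rest)          = pairUp rest
pairUp []                  = []

-- Parse a string purely and extract the pairs from the <div> element.
extract :: String -> [(String, String)]
extract = concat . runLA
  (xread >>> hasName "div" >>> listA (getChildren >>> classify) >>> arr pairUp)

main :: IO ()
main = print . extract $ unlines
  [ "<div>"
  , " <span>Some story</span>"
  , " <span>Description</span>: This story about..."
  , " <span>Author</span>: Tom Smith"
  , "</div>"
  ]
```

The point of `listA (getChildren >>> classify)` is that it keeps spans and text nodes together in one ordered list, so "the text after a span" becomes simply the next list element, and the pairing can be done in ordinary pure code.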
>