[Haskell-cafe] HXT: how to get sibling element
Никитин Лев
leon.v.nikitin at pravmail.ru
Thu Mar 15 11:28:08 CET 2012
Hello, Haskellers.
Suppose we have this XML document (perhaps a slightly silly one):
<div>
<span>Some story</span>
<span>Description</span>: This story about...
<span>Author</span>: Tom Smith
</div>
In the end I want to get the list: [("Title", "Some story"), ("Description", "This story about..."), ("Author", "Tom Smith")],
or maybe this: Book "Some story" [("Description", "This story about..."), ("Author", "Tom Smith")] (where Book = Book String [(String, String)]).
The first span is a special case compared with the others, and I understand how to process it:
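To make the second target shape concrete, here is a minimal sketch of the Book type and a conversion from the flat list. The function name toBook is hypothetical, and treating the "Title" entry as the book title is an assumption based on the example above.

```haskell
-- Book as described above: a title plus a list of (label, text) pairs.
data Book = Book String [(String, String)]
  deriving (Show, Eq)

-- Hypothetical helper: pull the "Title" entry out of the flat list and
-- keep every other entry as a field. Returns Nothing if there is no title.
toBook :: [(String, String)] -> Maybe Book
toBook fields = do
  title <- lookup "Title" fields
  return (Book title (filter ((/= "Title") . fst) fields))

main :: IO ()
main = print (toBook [ ("Title", "Some story")
                     , ("Description", "This story about...")
                     , ("Author", "Tom Smith") ])
```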
===============
{-# LANGUAGE TupleSections #-}
import Text.XML.HXT.Core
import Text.XML.HXT.Curl
import Text.XML.HXT.HTTP

pageURL = "http://localhost/test.xml"

main = do
  r <- runX (configSysVars [withCanonicalize no, withValidate no, withTrace 0, withParseHTML no] >>>
             readDocument [withErrors no, withWarnings no, withHTTP []] pageURL >>>
             getChildren >>> isElem >>> hasName "div" >>> listA (getChildren >>> hasName "span") >>> getTitle <+> getSections)
  putStrLn "Articles:"
  putStr "<"
  mapM_ putStr $ map (\i -> fst i ++ ": " ++ snd i ++ "| ") r
  putStrLn ">"

-- The first span holds the title.
getTitle = arr head >>> getChildren >>> getText >>> arr trim >>> arr ("Title",)
-- The remaining spans are section labels. Both halves of the pair read the
-- span's own text, which is why the label shows up twice in the output.
getSections = arr tail >>> unlistA >>> ((getChildren >>> getText >>> arr trim) &&& (getChildren >>> getText >>> arr trim))
ltrim [] = []
ltrim (' ':x) = ltrim x
ltrim ('\n':x) = ltrim x
ltrim ('\r':x) = ltrim x
ltrim ('\t':x) = ltrim x
ltrim x = x
rtrim = reverse . ltrim . reverse
trim = ltrim . rtrim
===================
And I get the list: [("Title", "Some story"), ("Description", "Description"), ("Author", "Author")]
(Maybe there is a better way to get this list?)
But I cannot find a way to get the text that follows a span.
I suppose I have to use a function from Data.Tree.NavigatableTree.XPathAxis, but I cannot puzzle out how to do it.
Please help me.
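One possible direction, sketched without HXT so that it stays self-contained: collect all children of the div in document order (spans and text nodes together) and pair each span with the text node directly after it. The Node type below is a hypothetical stand-in for the real tree nodes, and the names pairWithNext and clean are made up for illustration. With HXT the child list would presumably come from something like listA getChildren, without filtering on "span" first.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Simplified stand-in for the children of <div>; this is not HXT's XmlTree.
data Node = Span String | Text String
  deriving (Show, Eq)

-- Pair each span with the text node that immediately follows it.
-- A span with no following text node gets an empty string.
pairWithNext :: [Node] -> [(String, String)]
pairWithNext (Span label : Text t : rest) = (label, clean t) : pairWithNext rest
pairWithNext (Span label : rest)          = (label, "") : pairWithNext rest
pairWithNext (Text _ : rest)              = pairWithNext rest
pairWithNext []                           = []

-- Drop the leading ": " separator and any surrounding whitespace.
clean :: String -> String
clean = dropWhileEnd isSpace . dropWhile (\c -> isSpace c || c == ':')

main :: IO ()
main = do
  -- The children of the sample <div>, including whitespace-only text nodes.
  let children = [ Text "\n", Span "Some story", Text "\n"
                 , Span "Description", Text ": This story about...\n"
                 , Span "Author", Text ": Tom Smith\n" ]
  -- The first span is the title (special case); the rest are sections.
  case pairWithNext children of
    ((title, _) : sections) -> print (("Title", title) : sections)
    []                      -> putStrLn "no spans found"
```

The walk depends only on sibling order, so it does not matter whether the parser keeps whitespace-only text nodes between the spans.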