[Haskell-cafe] Parsing HTML tables with HXT

Dmitry Simonchik dima at simonchik.net
Mon Apr 11 13:27:04 CEST 2011


Thanks!

I was also able to extract the needed value with the code below:

testArrow :: IOSArrow XmlTree XmlTree
testArrow =
    deep (isElem >>> hasName "table" )
    >>>
    deep (isElem >>> hasName "tr")
    >>>
    (deep isText >>> hasText (=="a"))
    `guards`
    (getChildren >>> getChildren >>> isText)


2011/4/11 Albert Y. C. Lai <trebla at vex.net>

> On 11-04-08 06:29 AM, Dmitry Simonchik wrote:
>
>> Can someone please help me get the value of a table cell with
>> HXT in the following HTML:
>>
>> <table class="tblc">
>> <tr>
>> <td class="tdc">x</td>
>> <td>y</td>
>> </tr>
>> <tr>
>> <td class="tdc">a</td>
>> <td>b</td>
>> </tr>
>> </table>
>>
>> I need the value of the second cell in a row whose first cell has
>> some predefined value (in the example above it can be x or a). I need
>> an arrow of type (IOSArrow XmlTree String). How do I write it?
>>
>
> import Text.XML.HXT.Core
>
> main = do
>  rs <- runX (readDocument [] "example.xml" >>> example "x")
>  mapM_ putStrLn rs
>
> -- example "blah" reports those 2nd columns such that
> -- their 1st columns equal "blah"
> example :: String -> IOSArrow XmlTree String
> example s = deep (is "table" />
>                  is "tr" >>>
>                  listA (getChildren >>> is "td" /> getText) >>>
>                  arrL get2nd
>                 )
>  where get2nd (one:two:_) | one==s = [two]
>        get2nd _ = []
>
> is x = isElem >>> hasName x
>
> The important part is using listA at the right point to extract the list of
> cells (belonging to the same row) so that with a list in your hand you can
> test the 1st item and find the 2nd item.
>
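The listA idea in the quoted reply can be tried without reading a file. What follows is only a minimal sketch, assuming the hxt package is installed: it feeds the table from the original question, inlined as a string, through xread and runs the arrow purely with runLA. The names sampleHtml, cellAfter and pick are made up for this illustration, and the arrow is specialised to LA rather than IOSArrow so that no I/O is needed.

```haskell
import Text.XML.HXT.Core

-- The table from the original question, inlined without whitespace
-- between tags (hypothetical test input, not read from a file).
sampleHtml :: String
sampleHtml = concat
  [ "<table class=\"tblc\">"
  , "<tr><td class=\"tdc\">x</td><td>y</td></tr>"
  , "<tr><td class=\"tdc\">a</td><td>b</td></tr>"
  , "</table>"
  ]

-- Select elements with the given tag name.
is :: String -> LA XmlTree XmlTree
is name = isElem >>> hasName name

-- For every row whose first cell text equals key, emit the second cell text.
-- listA gathers all cell texts of one row into a single list, so that
-- pick can compare the first item and return the second.
cellAfter :: String -> LA XmlTree String
cellAfter key =
  deep ( is "table" /> is "tr"
         >>> listA (getChildren >>> is "td" /> getText)
         >>> arrL pick )
  where
    pick (c1 : c2 : _) | c1 == key = [c2]
    pick _                         = []

main :: IO ()
main = mapM_ putStrLn (runLA (xread >>> cellAfter "a") sampleHtml)
```

With the sample markup above, cellAfter "a" should yield the single value b, and a key that matches no first cell should yield nothing. Without listA, each cell text would flow through the arrow as a separate result and there would be no way to relate the first cell of a row to the second.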