[Haskell-beginners] calling impure functions from pure code
Emmanuel Touzery
etouzery at gmail.com
Fri Oct 12 15:54:23 CEST 2012
Hello,
Thanks for the tip!
I'm in fact using dom-selector:
http://hackage.haskell.org/package/dom-selector
which is based on xml-conduit and html-conduit. I chose it because it
offers CSS selectors and is generally much higher-level than what I
would write with parsec.
So I'm not sure whether what you wrote applies.
Actually, the parsing function in your example is not pure as such: it's
a do block, so its steps are sequenced. What I have done so far is let
dom-selector give me the DOM structure of the page (so the parsing
proper is done for me), and then pass that DOM structure to my own
function. That function examines the DOM without any do block: it's
unordered and pure. That way my "parsing" (really an examination of
the DOM tree) is completely separated from IO or any other monad.
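
A minimal sketch of that split, for illustration only. The Node type
below is a hypothetical stand-in for the DOM type a library such as
xml-conduit provides, and links and samplePage are made-up names, not
part of dom-selector. The point is only that the examination step needs
no monad at all:

```haskell
-- Hypothetical stand-in for a library DOM type (an assumption, not the
-- real xml-conduit or dom-selector API).
data Node
  = Element String [(String, String)] [Node] -- tag name, attributes, children
  | Text String
  deriving (Show, Eq)

-- Pure examination: collect the href of every <a> element, in document
-- order. No do block, no IO.
links :: Node -> [String]
links (Text _) = []
links (Element name attrs children) =
  own ++ concatMap links children
  where
    -- Non-matching attributes are skipped by the failed pattern match.
    own = [url | name == "a", ("href", url) <- attrs]

-- Made-up sample tree. In a real program the tree would come from the
-- IO side, which fetches and parses the page.
samplePage :: Node
samplePage =
  Element "ul" []
    [ Element "li" [] [Element "a" [("href", "/show/1")] [Text "News"]]
    , Element "li" [] [Text "Weather"]
    ]
```

Only the code that produces the tree has to live in IO; links can be
called from anywhere.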
I think that when you are inside a parsec parser, as in your example,
you are in the parsec monad (bear in mind I don't really understand all
of this yet), and to do IO from there you have to reach the IO monad,
which is what liftIO is for. If so, that is a different problem from the
one I'm having.
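
For reference, a small sketch of what "being in the parsec monad and
using liftIO" looks like once the parser is actually run over IO. It
assumes the parsec package is installed; tagAfterPrefix and runTag are
hypothetical names chosen for this example:

```haskell
import Control.Monad.IO.Class (liftIO)
import Text.Parsec (ParseError, ParsecT, anyChar, runParserT, string)

-- A parser whose underlying monad is IO, so liftIO can be used
-- between parsing steps.
tagAfterPrefix :: ParsecT String () IO Char
tagAfterPrefix = do
  _ <- string "tvshow:"
  c <- anyChar
  -- An IO action run in the middle of the parse.
  liftIO (putStrLn ("saw tag " ++ [c]))
  return c

-- runParserT returns its result in the underlying monad, here IO.
runTag :: String -> IO (Either ParseError Char)
runTag = runParserT tagAfterPrefix () "<input>"
```

Because the result type is IO (Either ParseError Char), every caller of
runTag is in IO as well, which is the trade-off compared with examining
an already-parsed DOM in a pure function.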
Emmanuel
On 12.10.2012 15:39, David McBride wrote:
> There's a better option in my opinion. Use the monad transformer
> capability of the parser you are using (I'm assuming you are using
> parsec for parsing).
>
> If you check the Hackage docs for parsec you'll see that ParsecT is
> an instance of MonadIO whenever its underlying monad is. That means
> that at any point during the parsing you can write
> liftIO $ <any IO action> and use the result in your parsing. Here's
> an example of what that might look like.
>
> import Control.Monad.IO.Class (MonadIO, liftIO)
> import Text.Parsec
>
> -- Parses "tvshow:" followed by one character, and runs an IO action
> -- in the middle of the parse when that character is 'x'.
> parseTvStuff :: (MonadIO m) => ParsecT String u m (Char, Maybe ())
> parseTvStuff = do
>   _ <- string "tvshow:"
>   c <- anyChar
>   morestuff <- if c == 'x'
>     -- putStrLn is a placeholder for the real IO work.
>     then fmap Just $ liftIO $ putStrLn $
>       "run an http request, parse the result, " ++
>       "and store the result in morestuff as a maybe"
>     else return Nothing
>   return (c, morestuff)
>
> So you will run an http request if you get back something that seems
> like it could be worth further parsing. Then you just parse that
> stuff with a separate parser and store it in your data structure and
> continue parsing the rest of the first page with the original parser
> if you wish.
>
> On Fri, Oct 12, 2012 at 9:28 AM, Emmanuel Touzery
> <etouzery at gmail.com> wrote:
>
> Hi,
>
>
> when parsing the string representing a page, you could
> save all the links you encounter.
>
> After the parsing you would load the linked pages and parse
> those in turn.
>
> You would repeat this until no more links are returned or a
> maximum depth is reached.
>
>
> Thanks for the tip. That sounds much more reasonable than what I
> mentioned. It seems a bit "spaghetti" to me though in a way (but
> maybe I just have to get used to the Haskell way).
>
> To be more specific about what I want to do: I want to parse TV
> programs. On the first page I have the daily listing for a
> channel: for each program, the start/end time, title, category,
> and optionally a link. To fully parse one TV program I can follow
> the link, if it's present, and get the extra info found there
> (summary, pictures, ...).
>
> So the first scheme that comes to mind is a method which takes the
> DOM tree of the daily page and returns the list of programs for
> that day.
>
> Instead, what I must then do, is to return the incomplete
> programs: the data object would have the link filled in, if it's
> available, but the summary, picture... would be empty.
> Then I have a "second pass" in the caller function, where for
> programs which have a link, I would fetch the extra page, and call
> a second function, which will fill in the extra data (thankfully
> if pictures are present I only store their URL so it would stop
> there, no need for a third pass for pictures).
>
> It annoys me that the first function returns "incomplete"
> objects... It somehow feels wrong.
>
> Now that I mentioned my problem with more details, maybe you can
> think of a better way of doing that?
>
> And otherwise I guess this is the policy when writing Haskell
> code: absolutely avoid spreading impure/IO-tainted code, even if
> that may negatively affect the general structure of the program?
>
> Thanks again for the tip though! That's definitely what I'll do if
> nothing better is suggested. It is actually probably the best way
> to do that if you want to separate IO from "pure" code.
>
> Emmanuel
>
>
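
The two-pass scheme described in the quoted message could be sketched
roughly as follows. Everything here is an assumption made for
illustration: the Program record and its fields, addDetails, complete,
completeAll, and the fetch parameter standing in for whatever HTTP
function the real program uses. The detail page is stored verbatim as
the summary, where a real second pass would examine its DOM.

```haskell
import Data.Functor.Identity (runIdentity)

type Url = String

-- Result of the first, pure pass over the daily listing. The summary
-- stays Nothing until the detail page has been read.
data Program = Program
  { title   :: String
  , link    :: Maybe Url
  , summary :: Maybe String
  } deriving (Show, Eq)

-- Second pass, pure part: merge the detail page into a program.
addDetails :: String -> Program -> Program
addDetails page p = p { summary = Just page }

-- Second pass, effectful part: fetch the detail page only when the
-- program has a link. The fetch function is a parameter, so this code
-- does not mention IO itself.
complete :: Monad m => (Url -> m String) -> Program -> m Program
complete fetch p = case link p of
  Nothing  -> return p
  Just url -> do
    page <- fetch url
    return (addDetails page p)

completeAll :: Monad m => (Url -> m String) -> [Program] -> m [Program]
completeAll fetch = mapM (complete fetch)

-- Demo with a fake fetcher in the Identity monad, so no network access
-- is needed.
demo :: [Program]
demo = runIdentity (completeAll fakeFetch listing)
  where
    fakeFetch url = return ("summary from " ++ url)
    listing =
      [ Program "News" (Just "/show/1") Nothing
      , Program "Weather" Nothing Nothing
      ]
```

In the real program the caller would pass an IO-based fetcher to
completeAll, so only that caller touches IO. If returning "incomplete"
values from the first pass still feels wrong, one option is to give the
first pass its own smaller type and let the second pass produce Program.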