[Haskell-cafe] Converting wiki pages into pdf

Conrad Parker conrad at metadecks.org
Fri Sep 9 05:40:20 CEST 2011


On Sep 9, 2011 7:33 AM, "mukesh tiwari" <mukeshtiwari.iiitm at gmail.com>
wrote:
>
> Thank you for the reply, Daniel. Given my limited knowledge of web
> programming and JavaScript: do you mean that I first need to simulate some
> sort of browser in my program, which will run the JavaScript and generate
> the PDF, so that I can then download it? Is Network.Browser of any help for
> this purpose? Is there a way to solve this problem?
> Sorry for the many questions, but this is my first web application program
> and I am trying hard to finish it.
>

Have you tried finding out if simple URLs exist for this, that don't require
Javascript? Does Wikipedia have a policy on this?
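
If such a URL exists, building it needs no browser at all, only string assembly with proper escaping. A minimal sketch, assuming a hypothetical endpoint: the base address http://example.org/export and the parameter names "page" and "format" are placeholders, not a documented Wikipedia interface, and percentEncode, buildUrl and exportUrl are illustrative helper names.

```haskell
import Data.Char (isAlphaNum, isAscii, ord, toUpper)
import Data.List (intercalate)
import Numeric (showHex)

-- UTF-8 bytes of a single character.
utf8 :: Char -> [Int]
utf8 c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 + n `div` 64, 0x80 + n `mod` 64]
  | n < 0x10000 = [0xE0 + n `div` 4096, 0x80 + (n `div` 64) `mod` 64, 0x80 + n `mod` 64]
  | otherwise   = [ 0xF0 + n `div` 262144, 0x80 + (n `div` 4096) `mod` 64
                  , 0x80 + (n `div` 64) `mod` 64, 0x80 + n `mod` 64 ]
  where n = ord c

-- Percent-encode everything outside the RFC 3986 unreserved set.
percentEncode :: String -> String
percentEncode = concatMap encodeChar
  where
    encodeChar c
      | isAscii c && (isAlphaNum c || c `elem` "-_.~") = [c]
      | otherwise = concatMap hexByte (utf8 c)
    hexByte b = '%' : pad (map toUpper (showHex b ""))
    pad [d] = ['0', d]
    pad ds  = ds

-- Append an escaped query string to a base address.
buildUrl :: String -> [(String, String)] -> String
buildUrl base []     = base
buildUrl base params =
  base ++ "?" ++ intercalate "&"
    [ percentEncode k ++ "=" ++ percentEncode v | (k, v) <- params ]

-- Hypothetical export URL; the endpoint and parameter names are assumptions.
exportUrl :: String -> String
exportUrl title =
  buildUrl "http://example.org/export" [("page", title), ("format", "pdf")]
```

A URL built this way could be passed straight to getRequest, provided the real endpoint and its parameters are confirmed first.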

Conrad.

>
> On Fri, Sep 9, 2011 at 4:17 AM, Daniel Patterson <lists.haskell at dbp.mm.st>
> wrote:
>>
>> It looks to me that the link is generated by JavaScript, so unless you
>> can script an actual browser into the loop, it may not be a viable approach.
>>
>> On Sep 8, 2011, at 3:57 PM, mukesh tiwari wrote:
>>
>> > I tried to use the PDF-generation facilities. I wrote a script which
>> > generates the rendering URL. When I paste the rendering URL into a
>> > browser, it generates the download file, but when I try to get the
>> > tags, the result is empty. Could someone please tell me what is wrong
>> > with the code?
>> > Thank You
>> > Mukesh Tiwari
>> >
>> > import Network.HTTP
>> > import Text.HTML.TagSoup
>> > import Data.Maybe
>> >
>> > parseHelp :: Tag String -> Maybe String
>> > parseHelp ( TagOpen _ y ) =
>> >   if ( filter ( \( a , b ) -> b == "Download a PDF version of this wiki page" ) y ) /= []
>> >     then Just $ "http://en.wikipedia.org" ++ ( snd $ y !! 0 )
>> >     else Nothing
>> >
>> >
>> > parse :: [ Tag String ] -> Maybe String
>> > parse [] = Nothing
>> > parse ( x : xs )
>> >   | isTagOpen x = case parseHelp x of
>> >                        Just s -> Just s
>> >                        Nothing -> parse xs
>> >   | otherwise = parse xs
>> >
>> >
>> > main = do
>> >       x <- getLine
>> >       tags_1 <-  fmap parseTags $ getResponseBody =<< simpleHTTP ( getRequest x ) --open url
>> >       let lst =  head . sections ( ~== "<div class=portal id=p-coll-print_export>" ) $ tags_1
>> >           url =  fromJust . parse $ lst  --rendering url
>> >       putStrLn url
>> >       tags_2 <-  fmap parseTags $ getResponseBody =<< simpleHTTP ( getRequest url )
>> >       print tags_2
>> >
>> >
>> >
>> >
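
In the quoted script, parseHelp takes the value of the first attribute (y !! 0) as the link, which only works if href happens to be listed first in the tag. A minimal sketch of looking the attribute up by name instead, using plain (name, value) lists as a stand-in for the attribute list that TagSoup's TagOpen carries; pdfHref and firstPdfHref are hypothetical helper names, and this may or may not be related to the empty result:

```haskell
import Data.Maybe (listToMaybe, mapMaybe)

-- Same shape as the attribute pairs inside TagSoup's TagOpen.
type Attribute = (String, String)

-- Return the href only when some attribute value equals the expected
-- tooltip text; the position of href in the list does not matter.
pdfHref :: String -> [Attribute] -> Maybe String
pdfHref tooltip attrs
  | any ((== tooltip) . snd) attrs = lookup "href" attrs
  | otherwise                      = Nothing

-- First matching link among the attribute lists of several tags.
firstPdfHref :: String -> [[Attribute]] -> Maybe String
firstPdfHref tooltip = listToMaybe . mapMaybe (pdfHref tooltip)
```

Returning Maybe all the way up also avoids the crash that head and fromJust would cause when the page layout changes and nothing matches.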
>> > _______________________________________________
>> > Haskell-Cafe mailing list
>> > Haskell-Cafe at haskell.org
>> > http://www.haskell.org/mailman/listinfo/haskell-cafe
>>