[Haskell-cafe] Converting wiki pages into pdf

Fri Sep 9 00:47:13 CEST 2011

It looks to me as though the link is generated by JavaScript, so unless you can put a real, scripted browser in the loop, this may not be a viable approach.
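
One rough way to check this without a browser is to test whether the link's title text occurs at all in the raw HTML that comes back. This is only a sketch: the helper name `mentionsPdfLink` and both sample strings are made up for illustration and are not taken from Wikipedia's actual markup. In practice the argument would be the body returned by getResponseBody.

```haskell
import Data.List (isInfixOf)

-- Title text the original script searches for.
pdfLinkTitle :: String
pdfLinkTitle = "Download a PDF version of this wiki page"

-- Hypothetical helper: True if the raw HTML, exactly as served, mentions the link.
mentionsPdfLink :: String -> Bool
mentionsPdfLink body = pdfLinkTitle `isInfixOf` body

-- Made-up sample: markup where a script is expected to add the link later.
servedSample :: String
servedSample = "<div id=\"p-coll-print_export\"><script src=\"/add-links.js\"></script></div>"

-- Made-up sample: the same region after a browser has run the script.
renderedSample :: String
renderedSample = "<div id=\"p-coll-print_export\"><a href=\"/render\" title=\"Download a PDF version of this wiki page\">PDF</a></div>"

main :: IO ()
main = do
  print (mentionsPdfLink servedSample)    -- False
  print (mentionsPdfLink renderedSample)  -- True
```

If the check comes back False on the body you actually fetched, no amount of tag parsing will find the link, since it is simply not in the text the server sent.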

On Sep 8, 2011, at 3:57 PM, mukesh tiwari wrote:

> I tried to use the PDF-generation facilities. I wrote a script which
> generates the rendering URL. When I paste the rendering URL into a
> browser, it generates the download file, but when I try to get
> the tags, the result is empty. Could someone please tell me what is
> wrong with the code?
> Thank You
> Mukesh Tiwari
> 
> import Network.HTTP
> import Text.HTML.TagSoup
> import Data.Maybe
> 
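> -- Given an opening tag, return the absolute link target if one of its
> -- attribute values is the PDF-download title. The href is looked up by
> -- name rather than assumed to be the first attribute.
> parseHelp :: Tag String -> Maybe String
> parseHelp ( TagOpen _ y )
>   | any ( \( _ , b ) -> b == "Download a PDF version of this wiki page" ) y
>       = fmap ( "http://en.wikipedia.org" ++ ) ( lookup "href" y )
>   | otherwise = Nothing
> parseHelp _ = Nothing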
> 
> 
> parse :: [ Tag String ] -> Maybe String
> parse [] = Nothing
> parse ( x : xs )
>   | isTagOpen x = case parseHelp x of
> 			 Just s -> Just s
> 			 Nothing -> parse xs
>   | otherwise = parse xs
> 
> 
> main :: IO ()
> main = do
>   x <- getLine
>   -- open url
>   tags_1 <- fmap parseTags $ getResponseBody =<< simpleHTTP ( getRequest x )
>   let lst = head . sections ( ~== "<div class=portal id=p-coll-print_export>" ) $ tags_1
>       url = fromJust . parse $ lst  -- rendering url
>   putStrLn url
>   tags_2 <- fmap parseTags $ getResponseBody =<< simpleHTTP ( getRequest url )
>   print tags_2