[Haskell-cafe] Converting wiki pages into pdf

mukesh tiwari mukeshtiwari.iiitm at gmail.com
Thu Sep 8 14:49:46 CEST 2011


Is it possible to automate this process using Haskell, rather than
clicking and downloading manually?

Thank You
Mukesh Tiwari

On Thu, Sep 8, 2011 at 6:11 PM, Max Rabkin <max.rabkin at gmail.com> wrote:

> This doesn't answer your Haskell question, but Wikipedia has
> PDF-generation facilities ("Books"). Take a look at
> http://en.wikipedia.org/wiki/Help:Book (for single articles, just use
> the "download PDF" option in the sidebar).
>
> --Max
>
> On Thu, Sep 8, 2011 at 14:34, mukesh tiwari
> <mukeshtiwari.iiitm at gmail.com> wrote:
> > Hello all
> > I am trying to write a Haskell program that downloads HTML pages from
> > Wikipedia, including images, and converts them into PDF. I wrote a
> > small script:
> >
> > import Network.HTTP
> > import Data.Maybe
> > import Data.List
> >
> > main = do
> >        x <- getLine
> >        -- open url
> >        htmlpage <- getResponseBody =<< simpleHTTP ( getRequest x )
> >        --print.words $ htmlpage
> >        let ind_1 = fromJust . ( \n -> findIndex ( n `isPrefixOf`) . tails $ htmlpage ) $ "<!-- content -->"
> >            ind_2 = fromJust . ( \n -> findIndex ( n `isPrefixOf`) . tails $ htmlpage ) $ "<!-- /content -->"
> >            tmphtml = drop ind_1 $ take ind_2  htmlpage
> >        writeFile "down.html" tmphtml
> >
> > and it's working fine, except that some symbols are not rendering as
> > they should. Could someone please suggest how to accomplish this
> > task?
> >
> > Thank you
> > Mukesh Tiwari
> >
> > _______________________________________________
> > Haskell-Cafe mailing list
> > Haskell-Cafe at haskell.org
> > http://www.haskell.org/mailman/listinfo/haskell-cafe
> >
>
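
The marker-based extraction in the quoted script could perhaps be written a little more defensively. The following is only a minimal sketch: the helper names indexOf and extractBetween are hypothetical (they are not part of Network.HTTP or the original script), and the sample page string is made up for illustration. It returns Nothing instead of crashing via fromJust when a marker is missing, and it does not address the symbol-rendering problem.

```haskell
import Data.List (findIndex, isPrefixOf, tails)

-- | Index of the first occurrence of a marker in a string, if any.
-- (Hypothetical helper, named here for illustration.)
indexOf :: String -> String -> Maybe Int
indexOf marker = findIndex (marker `isPrefixOf`) . tails

-- | Text from the start marker (inclusive) up to the end marker
-- (exclusive), mirroring `drop ind_1 $ take ind_2 htmlpage`.
-- Returns Nothing if either marker is missing or they are out of order.
extractBetween :: String -> String -> String -> Maybe String
extractBetween start end page = do
  i <- indexOf start page
  j <- indexOf end page
  if i <= j
    then Just (drop i (take j page))
    else Nothing

main :: IO ()
main = do
  -- Made-up sample input standing in for a downloaded page.
  let page = "<html><!-- content --><p>Hi</p><!-- /content --></html>"
  case extractBetween "<!-- content -->" "<!-- /content -->" page of
    Just body -> putStrLn body
    Nothing   -> putStrLn "content markers not found"
```

In the original script, the result of extractBetween would take the place of tmphtml, with the Nothing case reported to the user rather than written to down.html.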