[Haskell-cafe] Generating PDF from HTML with Pandoc

Patrick Chilton chpatrick at gmail.com
Tue Jun 9 08:29:17 UTC 2020


Could you call wkhtmltopdf directly with System.Process? Pandoc doesn't
seem to add much value here.
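
A minimal sketch of that approach, assuming `wkhtmltopdf` is on the PATH and
the HTML has already been written to a file as UTF-8. The names
`wkhtmltopdfArgs` and `htmlFileToPdf` are made up for illustration, and the
margin value is only an example. As far as I understand the wkhtmltopdf
usage line (global options, then page objects with their page options, then
the output file), `--encoding` is a page option, so it is placed after the
input file here:

```haskell
import System.Exit    (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- | Build the argument list in the order wkhtmltopdf expects:
-- global options, then the input page with its page-level options,
-- then the output file.  (Hypothetical helper, not part of any library.)
wkhtmltopdfArgs :: [String] -> [String] -> FilePath -> FilePath -> [String]
wkhtmltopdfArgs globalOpts pageOpts input output =
  globalOpts ++ [input] ++ pageOpts ++ [output]

-- | Convert an HTML file to PDF by calling wkhtmltopdf directly.
-- Returns the tool's stderr output if it exits with a failure code.
htmlFileToPdf :: FilePath -> FilePath -> IO (Either String ())
htmlFileToPdf input output = do
  let args = wkhtmltopdfArgs
               ["--quiet", "--margin-bottom", "1.2in"]  -- global options
               ["--encoding", "utf-8"]                  -- page options
               input
               output
  (code, _stdout, err) <- readProcessWithExitCode "wkhtmltopdf" args ""
  pure $ case code of
    ExitSuccess   -> Right ()
    ExitFailure n ->
      Left ("wkhtmltopdf exited with code " ++ show n ++ ": " ++ err)
```

With this, something like `htmlFileToPdf "page.html" "page.pdf"` would do
the conversion without going through Pandoc at all, and the argument order
stays under your control.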

On Tue, Jun 9, 2020, 09:15 Geraldus <heraldhoi at gmail.com> wrote:

> Hi dear Cafe!
>
> I'm trying to accomplish what should be a trivial task: generating a PDF
> from an HTML template using Pandoc.
>
> So far I've tried the `wkhtmltopdf` and `pdflatex` creators, both with no
> luck.
>
> First I want to say a few words about the `pdflatex` and `xelatex`
> creators for anyone who struggles with the same task in the future, since
> it's quite hard to find code examples on the web.
>
> Initially I wasn't able to render the document with the `pdflatex`
> creator.  I should mention that `pdflatex` required a lot of LaTeX
> packages to be installed, especially font packages.  I also spent several
> hours getting rendering to work at all because I hadn't specified a
> template in `WriterOptions`.  `pdflatex` was not able to handle Cyrillic
> Unicode characters, and I finally figured out that I had to use the
> `xelatex` creator.  I also found and used the default template:
>
> > pandoc <- readHtml def (toStrict $ renderHtml html)
> > tpl' <- getDefaultTemplate "latex"
> > makePDF "xelatex" [] writeLaTeX (def {writerTemplate = Just tpl'}) pandoc
>
> But in this case I got white space instead of the Cyrillic characters in
> the resulting PDF, and a bunch of warnings in the console about characters
> missing from the default font.  I assume the font itself is specified in
> the template.  I've looked into the default template and it's huge.  I
> guess I could prepare a simpler template for my own needs, but it would
> take a lot of time to get familiar with LaTeX document syntax.
>
> I've also tried `wkhtmltopdf`, which seems to be a lightweight and easy
> solution.  It seemed to work well except for encoding issues: the
> resulting PDF contains Cyrillic text that is rendered incorrectly.  I've
> tried to pass `["encoding utf-8"]` as arguments in the `makePDF` call, but
> this results in a runtime error:
>
> > --margin-bottom specified in incorrect location
>
> Googling this issue gave me a clue: when I pass the encoding argument to
> `wkhtmltopdf`, it breaks the expected argument order in the command that
> Pandoc generates.  This could likely be fixed easily, but Pandoc has a lot
> of open issues on GitHub, and a fix also requires some digging into the
> `wkhtmltopdf` command-line argument syntax.  I've looked into the Pandoc
> sources and it seems possible to provide a simple patch, but I need some
> guidance.  `wkhtmltopdf` distinguishes between global args, page args,
> cover args, and table of contents args.  The `encoding` argument is a
> page-level argument, but Pandoc puts the extra args specified in `makePDF`
> after the default page arguments (`pdfargs` in the following code sample):
>
> >  let args   = mathArgs ++ concatMap toArgs
> >                  [("page-size", getField "papersize" meta')
> >                  ,("title", getField "title" meta')
> >                  ,("margin-bottom", Just $ fromMaybe "1.2in"
> >                             (getField "margin-bottom" meta'))
> >                  ,("margin-top", Just $ fromMaybe "1.25in"
> >                             (getField "margin-top" meta'))
> >                  ,("margin-right", Just $ fromMaybe "1.25in"
> >                             (getField "margin-right" meta'))
> >                  ,("margin-left", Just $ fromMaybe "1.25in"
> >                             (getField "margin-left" meta'))
> >                  ,("footer-html", getField "footer-html" meta')
> >                  ,("header-html", getField "header-html" meta')
> >                  ] ++ pdfargs
>
> This is likely what breaks everything.  The quickest and dirtiest
> workaround I see is to check each argument and, if it is a page-level
> argument, pass it for each page object.  Another solution may be to
> specify the encoding for the Pandoc document some other way, but I can't
> figure out how to do that yet.
>
> Maybe someone has already faced a similar task and knows an easier way to
> render HTML to PDF with Haskell.  I would be very grateful for any help,
> advice, or other clues on how to achieve my goal.
>
> Arthur.