[Haskell-cafe] Generating PDF from HTML with Pandoc

Geraldus heraldhoi at gmail.com
Tue Jun 9 08:42:35 UTC 2020


Thank you, Patrick.  This is exactly the solution I came up with right
after I sent my message to the Cafe.  For some reason I didn't receive a
copy of my own message and wasn't able to respond until someone replied
first.  Many thanks.  Indeed, in most cases, if you can ask a question you
can find a solution.
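
For anyone who finds this thread later, here is a minimal sketch of that
approach: calling `wkhtmltopdf` directly via System.Process.  It assumes
`wkhtmltopdf` is on the PATH and that the HTML has already been written to
a file.  The names `wkhtmlArgs` and `htmlFileToPdf` are made up for
illustration, and the margin value is just Pandoc's default copied from
the snippet quoted below.  As far as I understand the `wkhtmltopdf` usage
text, global options come first and page-level options such as
`--encoding` follow the input file they apply to.

```haskell
import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- | Build the argument list for wkhtmltopdf (hypothetical helper).
-- Order: global options, then the page object (the input file followed
-- by its page-level options), then the output file.
wkhtmlArgs :: FilePath -> FilePath -> [String]
wkhtmlArgs input output =
  [ "--quiet"                   -- global option
  , "--margin-bottom", "1.2in"  -- global option (Pandoc's default value)
  , input                       -- page object: the HTML file
  , "--encoding", "utf-8"       -- page-level option for that page
  , output                      -- output PDF always comes last
  ]

-- | Run wkhtmltopdf on an HTML file (hypothetical helper).
-- Returns Left with the exit code and stderr on failure.  If the
-- executable is not installed, readProcessWithExitCode throws an
-- IOException instead of returning.
htmlFileToPdf :: FilePath -> FilePath -> IO (Either String ())
htmlFileToPdf input output = do
  (code, _stdout, err) <-
    readProcessWithExitCode "wkhtmltopdf" (wkhtmlArgs input output) ""
  pure $ case code of
    ExitSuccess   -> Right ()
    ExitFailure n ->
      Left ("wkhtmltopdf exited with code " ++ show n ++ ": " ++ err)
```

Calling `htmlFileToPdf "page.html" "page.pdf"` from IO should then write
the PDF next to the HTML file; keeping the argument list in a pure
function makes the ordering easy to inspect and adjust.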

On Tue, Jun 9, 2020 at 13:29, Patrick Chilton <chpatrick at gmail.com> wrote:

> Could you call wkhtmltopdf directly with System.Process? Pandoc doesn't
> seem to add much value here.
>
> On Tue, Jun 9, 2020, 09:15 Geraldus <heraldhoi at gmail.com> wrote:
>
>> Hi dear Cafe!
>>
>> I'm trying to accomplish a seemingly trivial task: generating a PDF from
>> an HTML template using Pandoc.
>>
>> So far I've tried the `wkhtmltopdf` and `pdflatex` PDF engines, both
>> with no luck.
>>
>> I want to say a few words about the `pdflatex` and `xelatex` engines
>> first, for anyone who struggles with the same task in the future, since
>> it's quite hard to find code examples on the web.
>>
>> Initially I wasn't able to render a document with the `pdflatex` engine.
>> I would like to mention that `pdflatex` requires a lot of LaTeX packages
>> to be installed, especially font packages.  I also spent several hours
>> getting rendering to work because I hadn't specified a template in
>> `WriterOptions`.  `pdflatex` is not capable of handling Cyrillic Unicode
>> characters, and I finally figured out that I have to use the `xelatex`
>> engine.  I also found and used the default template:
>>
>> > pandoc <- readHtml def (toStrict $ renderHtml html)
>> > tpl' <- getDefaultTemplate "latex"
>> > makePDF "xelatex" [] writeLaTeX (def {writerTemplate = Just tpl'}) pandoc
>>
>> But in this case I got white space instead of Cyrillic characters in the
>> resulting PDF, and a bunch of warnings in the console about characters
>> missing from the default font.  I assume the font itself is specified in
>> the template.  I've looked into the default template and it's huge.  I
>> guess I could prepare a simpler template for my own needs, but it would
>> take a lot of time to get familiar with LaTeX document syntax.
>>
>> I've also tried `wkhtmltopdf`, which seems to be a lightweight and easy
>> solution.  It seemed to work well except for encoding issues: the
>> resulting PDF contains Cyrillic text that is rendered incorrectly.  I've
>> tried to pass `["encoding utf-8"]` as arguments in the `makePDF` call,
>> but this results in a runtime error:
>>
>> > --margin-bottom specified in incorrect location
>>
>> Googling around this issue gave me a clue: when I pass the encoding
>> argument to `wkhtmltopdf`, it breaks the expected argument order in the
>> command that Pandoc generates.  This could likely be fixed easily, but
>> Pandoc has a lot of open issues on GitHub, and it also requires some
>> digging into the `wkhtmltopdf` command-line argument syntax.  I've
>> looked into the Pandoc sources and it seems possible to provide a simple
>> patch, but I need guidance.  `wkhtmltopdf` distinguishes global args,
>> page args, cover args, and table of contents args.  `encoding` is a
>> page-level argument, but Pandoc puts the extra args specified in
>> `makePDF` after the default page arguments (`pdfargs` in the following
>> code sample):
>>
>> >  let args   = mathArgs ++ concatMap toArgs
>> >                  [("page-size", getField "papersize" meta')
>> >                  ,("title", getField "title" meta')
>> >                  ,("margin-bottom", Just $ fromMaybe "1.2in"
>> >                             (getField "margin-bottom" meta'))
>> >                  ,("margin-top", Just $ fromMaybe "1.25in"
>> >                             (getField "margin-top" meta'))
>> >                  ,("margin-right", Just $ fromMaybe "1.25in"
>> >                             (getField "margin-right" meta'))
>> >                  ,("margin-left", Just $ fromMaybe "1.25in"
>> >                             (getField "margin-left" meta'))
>> >                  ,("footer-html", getField "footer-html" meta')
>> >                  ,("header-html", getField "header-html" meta')
>> >                  ] ++ pdfargs
>>
>> This is likely what breaks everything.  The quickest and dirtiest
>> workaround I see is to check each argument and, if it is a page-level
>> argument, emit it for each page object.  Another solution may be to
>> specify the encoding for the Pandoc document some other way, but I
>> can't figure out how to do that yet.
>>
>> Maybe someone has already faced a similar task and knows an easier way
>> to render HTML to PDF with Haskell.  I will be very grateful for any
>> help, advice, or other clues on how to achieve my goal.
>>
>> Arthur.
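
Regarding the blank Cyrillic glyphs with `xelatex` described above: one
thing that may help without writing a custom template is setting the
`mainfont` variable, which Pandoc's default LaTeX template passes to
fontspec when the engine is `xelatex`.  Below is a minimal sketch that
shells out to the `pandoc` executable instead of using the library API.
It assumes `pandoc` and `xelatex` are on the PATH and that a font with
Cyrillic coverage is installed ("DejaVu Serif" is only an example).  The
names `pandocXelatexArgs` and `htmlToPdfViaXelatex` are hypothetical.

```haskell
import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- | Arguments for the pandoc CLI (hypothetical helper): read HTML,
-- render through xelatex, and set the main font via a template variable.
pandocXelatexArgs :: String -> FilePath -> FilePath -> [String]
pandocXelatexArgs font input output =
  [ "--from=html"
  , "--pdf-engine=xelatex"
  , "--variable=mainfont=" ++ font  -- the font must contain Cyrillic glyphs
  , "--output=" ++ output
  , input
  ]

-- | Run pandoc with the arguments above (hypothetical helper).
-- Returns Left with the exit code and stderr on failure.  If pandoc is
-- not installed, readProcessWithExitCode throws an IOException.
htmlToPdfViaXelatex :: String -> FilePath -> FilePath -> IO (Either String ())
htmlToPdfViaXelatex font input output = do
  (code, _stdout, err) <-
    readProcessWithExitCode "pandoc" (pandocXelatexArgs font input output) ""
  pure $ case code of
    ExitSuccess   -> Right ()
    ExitFailure n ->
      Left ("pandoc exited with code " ++ show n ++ ": " ++ err)
```

For example, `htmlToPdfViaXelatex "DejaVu Serif" "page.html" "page.pdf"`.
Whether this removes the missing-character warnings depends on the fonts
actually installed on the machine.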