[Haskell-cafe] GSoC Proposal: Pandoc improvements including EPUB 3.0 reader

Matthew Pickering matthewtpickering at gmail.com
Tue Mar 18 01:16:14 UTC 2014


I'm looking to submit a proposal to (mainly) add an EPUB reader to pandoc.
I've spent the last few weeks getting to know the code base and have written
up a proposal over the last few days. I would really appreciate any comments
on the proposal and any further suggestions or things to look out for! I'm
looking forward to doing some hacking on pandoc regardless of this!

Full proposal: https://www.dropbox.com/s/tdiimqa8mj22vq3/gsoc.pdf

Has anyone looked into MathML -> LaTeX conversion? It would be nice to have
this in the EPUB reader to deal with embedded equations.

Below is a rough outline of the suggested implementation.

*Embedded Base64 images*

- Replace the Target field of the Image constructor with a new type which
can either be a Target as before or base64-encoded image data.
- Update HTML5 reader to read embedded images successfully.
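
To make the first point concrete, here is a minimal sketch of what such a type
might look like. Everything in it is hypothetical: ImageSource, Linked,
Embedded and parseSrc are names made up for illustration, and Target is a
simplified stand-in for pandoc's own (url, title) pair rather than the
library's actual definition.

```haskell
import Data.List (stripPrefix)

-- Stand-in for pandoc's Target: (URL, title). Assumed shape, for illustration.
type Target = (String, String)

-- Hypothetical replacement for the Target field of Image.
data ImageSource
  = Linked Target           -- external image, as before
  | Embedded String String  -- MIME type and base64 payload
  deriving (Eq, Show)

-- Classify the value of an img src attribute. Only data URIs of the form
-- "data:<mime>;base64,<payload>" are treated as embedded; anything else is
-- kept as an ordinary link with an empty title.
parseSrc :: String -> ImageSource
parseSrc src =
  case stripPrefix "data:" src of
    Just rest
      | (mime, ';' : enc) <- break (== ';') rest
      , Just payload <- stripPrefix "base64," enc
      -> Embedded mime payload
    _ -> Linked (src, "")
```

The payload is kept as an undecoded String here, so writers that can embed
data URIs could pass it through unchanged; whether to decode it eagerly is a
design choice this sketch leaves open.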

*EPUB 3.0 reader*

- Utilise the HTML parser with rawTags enabled.
- Extract additional information about structure by walking over the AST.
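
As an illustration of the second point, the sketch below collects headings
from a document tree, which is one kind of structural information a reader
might want. The Block type here is a deliberately tiny, hypothetical stand-in
for pandoc's AST (the real type has many more constructors and richer
fields), and headings is a made-up helper name.

```haskell
-- Simplified stand-in for pandoc's Block type, for illustration only.
data Block
  = Header Int String   -- heading level and (plain-text) title
  | Para String
  | BlockQuote [Block]  -- an example of a container with nested blocks
  deriving (Eq, Show)

-- Collect (level, title) for every header in document order,
-- descending into nested containers.
headings :: [Block] -> [(Int, String)]
headings = concatMap go
  where
    go (Header lvl title) = [(lvl, title)]
    go (BlockQuote bs)    = headings bs
    go (Para _)           = []
```

Against the real AST one would presumably use a generic traversal rather than
hand-written recursion, so that new constructors do not need to be handled
case by case.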