[Haskell-cafe] Re: Munging wiki articles with tagsoup

Gwern Branwen gwern0 at gmail.com
Sat Sep 20 18:11:57 EDT 2008


On 2008.09.09 19:49:49 +0100, Neil Mitchell <ndmitchell at gmail.com> scribbled 2.3K characters:
> Hi Gwern,
>
> Sorry for not noticing this sooner, my haskell-cafe@ reading is
> somewhat behind right now!

NP. I'm in no hurry; this TMR thing is a side project of mine, and I still haven't figured out how to get references/pandoc/citeproc-hs to work together, and I want to get them to work before I actually start uploading any converted articles.


> convertToHaskell (TagOpen "pre" atts) = TagOpen "haskell" atts
> convertToHaskell (TagClose "pre") = TagClose "haskell"
> convertToHaskell x = x
>
> Direct pattern matching is much easier and simpler.

That is very nice! Now the whole thing is like 5 lines of actual code. Once again, TagSoup wins.
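
For the archives, here is roughly what the whole filter looks like as a sketch. It assumes a newer tagsoup release in which the tag type is parameterised (Tag String); on tagsoup 0.6 the type is plain Tag, so the String argument would be dropped. The name convertPage is just a hypothetical helper of mine, not something from the library.

```haskell
import Text.HTML.TagSoup (Tag (..), parseTags, renderTags)

-- Rename <pre> elements to <haskell>, keeping any attributes as they are.
convertToHaskell :: Tag String -> Tag String
convertToHaskell (TagOpen "pre" atts) = TagOpen "haskell" atts
convertToHaskell (TagClose "pre")     = TagClose "haskell"
convertToHaskell x                    = x

-- Hypothetical helper: parse a page, rewrite each tag, render it back.
-- renderTags uses the default escaping here.
convertPage :: String -> String
convertPage = renderTags . map convertToHaskell . parseTags
```

Hooking it up to stdin/stdout should only need something like main = interact convertPage.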

> The escaping of ' is caused by renderTags, so instead call:
>
> renderTagsOptions (renderOptions{optEscape = (:[])})

Thanks.

> For no escaping of any characters, or more likely do something like
> <, > and & conversions. See the docs:
> http://hackage.haskell.org/packages/archive/tagsoup/0.6/doc/html/Text-HTML-TagSoup-Render.html

Well, I did look at that Haddock page, as well as the others. But honestly, just a bare line like 'renderTagsOptions :: RenderOptions -> [Tag] -> String' doesn't help me - it doesn't tell me that 'that's default behavior, but you can override it thusly'.
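
What would have helped me is an example of the override pattern next to the signature, something like the sketch below. It assumes a newer tagsoup release where optEscape works on whole strings (str -> str); in 0.6 it apparently works per character, which is why Neil's version uses (:[]). The names escapeMinimal, renderNoEscape and renderMinimal are hypothetical, made up for the illustration.

```haskell
import Text.HTML.TagSoup (Tag (..), optEscape, renderOptions, renderTagsOptions)

-- Hypothetical escaper: convert only <, > and &, leaving ' and " untouched.
escapeMinimal :: String -> String
escapeMinimal = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc c   = [c]

-- Start from the default options and override just the escaping function.
renderNoEscape :: [Tag String] -> String
renderNoEscape = renderTagsOptions renderOptions { optEscape = id }

renderMinimal :: [Tag String] -> String
renderMinimal = renderTagsOptions renderOptions { optEscape = escapeMinimal }
```

The point is that renderOptions is the default record and renderTags is just renderTagsOptions applied to it, so any field can be replaced with record-update syntax.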

> > Am I just barking up the wrong tree and should be writing a simple-minded search-and-replace sed script which replaces <pre> with <haskell>, </pre> with </haskell>...?
>
> Not necessarily. If you literally just want to replace "<haskell>"
> with "<pre>" then sed is probably the easy choice. However, it's quite
> likely you'll want to make more fixes, and tagsoup gives you the
> flexibility to extend in that direction.
>
> Thanks
>
> Neil

Hm hm. I see; the TagSoup way is more powerful in the long run.

--
gwern
blackjack NAVSVS Koancho Counter Merlin JICS 510 fuses JICC y