Neil Mitchell ndmitchell at gmail.com
Tue Sep 9 14:49:49 EDT 2008

Hi Gwern,

Sorry for not noticing this sooner, my haskell-cafe@ reading is
somewhat behind right now!

>  After an hour, I came up with a nice clean little script:
>  ----
>  import Text.HTML.TagSoup.Render
>  import Text.HTML.TagSoup
>  main :: IO ()
>  main = interact convertPre
>  convertPre :: String -> String
>  convertPre = renderTags . map convertToHaskell . canonicalizeTags . parseTags
>  convertToHaskell :: Tag -> Tag
>  convertToHaskell x
>                | isTagOpenName  "pre" x = TagOpen  "haskell" (extractAttribs x)
>                | isTagCloseName "pre" x = TagClose "haskell"
>                | otherwise              = x
>                              where
>                                extractAttribs :: Tag -> [Attribute]
>                                extractAttribs (TagOpen _ y) = y
>                                extractAttribs _             = error "The impossible happened."

convertToHaskell (TagOpen "pre" atts) = TagOpen "haskell" atts
convertToHaskell (TagClose "pre") = TagClose "haskell"
convertToHaskell x = x

Direct pattern matching on the tag constructors is simpler, and it
removes the need for extractAttribs and its "impossible" error case.
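
For reference, here is roughly how the whole script might look with the pattern-matching version dropped in. This is only a sketch: it assumes a recent tagsoup in which Tag takes the string type as a parameter (hence Tag String), and in which Text.HTML.TagSoup re-exports renderTags. With the version you have, the signatures may simply read Tag -> Tag.

```haskell
import Text.HTML.TagSoup

-- Rename <pre> elements to <haskell>; every other tag passes through unchanged.
-- Assumption: Tag is parameterised over the string type (newer tagsoup releases).
convertToHaskell :: Tag String -> Tag String
convertToHaskell (TagOpen "pre" atts) = TagOpen "haskell" atts   -- keep the attributes
convertToHaskell (TagClose "pre")     = TagClose "haskell"
convertToHaskell x                    = x

-- canonicalizeTags lower-cases tag names, so <PRE> is matched as well.
convertPre :: String -> String
convertPre = renderTags . map convertToHaskell . canonicalizeTags . parseTags

main :: IO ()
main = interact convertPre
```

Note that this still uses renderTags, so the escaping behaviour is unchanged.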

>  Anyway, so my script seems to work. I ran the wiki output through it and this is the diff: <http://haskell.org/haskellwiki/?title=User%3AGwern%2Fkenn&diff=22827&oldid=22811>.
>  Ok, good, it replaces all the tags... But wait, what's all this other stuff? It is replacing all my apostrophes with &apos;! No doubt this has something to do with XML/HTML/SGML or whatever, but it's not ideal. Even if it doesn't break the formatting (as I think it does), it's still cluttering up the source.

The escaping of ' is caused by renderTags, so instead call:

renderTagsOptions (renderOptions{optEscape = (:[])})

That disables escaping of all characters; more likely you will want
an optEscape that converts only <, > and &. See the docs:
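
As a rough illustration of the second option, something along these lines should escape only <, > and & while leaving apostrophes alone. The names escapeMinimal and renderMinimal are made up for this example. It also assumes a newer tagsoup in which optEscape works on whole strings (str -> str); with the Char -> String form used above, the local esc function would be passed directly instead.

```haskell
import Text.HTML.TagSoup

-- Hypothetical helper: escape only the three characters that are unsafe in
-- HTML text, leaving apostrophes and double quotes as they are.
escapeMinimal :: String -> String
escapeMinimal = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc c   = [c]

-- Assumption: optEscape :: str -> str, as in recent tagsoup releases.
renderMinimal :: [Tag String] -> String
renderMinimal = renderTagsOptions (renderOptions { optEscape = escapeMinimal })
```

One thing to watch: as far as I can tell the same escaping function is also applied to attribute values, so an attribute value containing a double quote would be written out unescaped with this version.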

> Am I just barking up the wrong tree and should be writing a simple-minded search-and-replace sed script which replaces <pre> with <haskell>, </pre> with </haskell>...?

Not necessarily. If you literally just want to replace "<pre>" with
"<haskell>" then sed is probably the easy choice. However, it's quite
likely you'll want to make more fixes, and tagsoup gives you the
flexibility to extend in that direction.
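
To make the trade-off concrete, a literal search-and-replace could be sketched in plain Haskell as below (replaceAll and preToHaskell are names invented for this example, and it uses only base, not tagsoup).

```haskell
import Data.List (isPrefixOf)

-- Replace every occurrence of `old` with `new`, scanning left to right.
-- `old` must be non-empty, otherwise this never terminates.
replaceAll :: String -> String -> String -> String
replaceAll old new = go
  where
    go [] = []
    go s@(c:cs)
      | old `isPrefixOf` s = new ++ go (drop (length old) s)
      | otherwise          = c : go cs

-- Purely textual rewrite of the exact strings "<pre>" and "</pre>".
preToHaskell :: String -> String
preToHaskell = replaceAll "</pre>" "</haskell>" . replaceAll "<pre>" "<haskell>"
```

Being purely textual, this leaves <PRE> or <pre class="..."> untouched, which is the kind of case where parsing the tags first pays off.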



