[Haskell] Non-trivial transformations from the Haskell markup

oleg at pobox.com
Wed Mar 8 20:37:43 EST 2006


The earlier message showed that Haskell as it is can represent
semi-structured data with reasonable syntax, an extensible set of `tags',
and statically enforced content restrictions. This follow-up
demonstrates non-trivial transformations of data marked up this way:
rendering it in HTML and RSS/XML. The resulting document has a
structure different from that of the original markup: hierarchies may
be flattened, and some pieces of data rearranged among elements. Rendering
of a particular markup element may be truly context sensitive, e.g.,
by pulling data from the parent element. Creating an RSS document
further requires `subordinate' HTML rendering. We also demonstrate
markup transformations by successive rewriting (a.k.a. `higher-order
tags') and the easy definition of new tags.


Our running example is rendering change log data in HTML and RSS/XML.
This project was inspired by Shae Matijs Erisson, who suggested
I provide an rss.xml feed for my site. You can see the ChangeLog in the
master format and its two renderings at

	http://pobox.com/~oleg/ftp/ChangeLog.hs
	http://pobox.com/~oleg/ftp/ChangeLog.html
	http://pobox.com/~oleg/ftp/rss.xml

The updated archive
	http://pobox.com/~oleg/ftp/Haskell/HSXML.tar.gz
has the complete code.

Here's a small example of marked-up semi-structured data/code in
Haskell:

> test_h = CLHead 
>          HeadAttrs {
>           ha_description = "list of updates to this whole site",
>           ha_DateRevision = (5,February,2006),
>           -- snipped
>          }
>          (updates
>          
>             [update (5,February, 2006)
>              [ui (FileURLA "Computation/lambda-calc.html" "switch")
>               [[a [[code "switch"]]]] "in lambda-calculus"]
>              [ui (FileURLA "Haskell/types.html" "dependently-typed-append") 
>               [[a "Dependently-typed" [[code "append"]]]]]
>              ]
>           )

Our document is made of a (heterogeneous!) sequence of update chunks;
each chunk is a sequence of update elements, which contain the URL and
additional markup. The beginning of the corresponding HTML document
looks like this:

<h2>February 5, 2006</h2>
<ul>
<li><a href="Computation/lambda-calc.html#switch"><code>switch</code></a> in
lambda-calculus</li>

Each 'ui' element is turned into an HTML 'li' element. Please note that
the value of the HREF attribute of the 'a' element comes from the URL
attached to the _parent_ element. That is, the rendering of the 'a'
element of the original markup indeed depends on the context.
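
To make the context dependence concrete, here is a minimal sketch that
passes the parent's URL down as an explicit argument. The types Inline
and Item and the functions renderInline and renderItem are hypothetical
stand-ins invented for this illustration; they are not the HSXML
representation, and no escaping of the URL is attempted.

```haskell
-- Hypothetical miniature markup, for illustration only (not HSXML).
data Inline
  = Text String
  | Code [Inline]
  | A [Inline]                     -- anchor that carries no URL of its own

-- An update item: the URL lives on the parent, followed by the body.
data Item = Item String [Inline]

-- The first argument is the context: the URL inherited from the parent.
renderInline :: String -> Inline -> String
renderInline _   (Text s)  = s
renderInline url (Code xs) =
  "<code>" ++ concatMap (renderInline url) xs ++ "</code>"
renderInline url (A xs)    =
  "<a href=\"" ++ url ++ "\">" ++ concatMap (renderInline url) xs ++ "</a>"

-- The parent supplies its URL to every child it renders.
renderItem :: Item -> String
renderItem (Item url body) =
  "<li>" ++ unwords (map (renderInline url) body) ++ "</li>"

example :: String
example = renderItem
  (Item "Computation/lambda-calc.html#switch"
        [A [Code [Text "switch"]], Text "in lambda-calculus"])
```

The 'a' node holds no URL at all; whatever HREF it gets is decided by
the element that contains it.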

The RSS code looks like the following:

<item>
<description>&lt;code&gt;switch&lt;/code&gt; in lambda-calculus</description>
<link>http://top/Computation/lambda-calc.html#switch</link>
<pubDate>5 Feb 2006 12:00:00 GMT</pubDate></item>
<item>
<description>Dependently-typed &lt;code&gt;append&lt;/code&gt;</description>
<link>http://top/Haskell/types.html#dependently-typed-append</link>
<pubDate>5 Feb 2006 12:00:00 GMT</pubDate></item>

The 'update' element of the original markup disappears from the output,
and its date is spliced into each 'item' element. The body of
the 'description' element contains HTML-rendered (and then encoded)
text.
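
The flattening and the encoding just described can be sketched as
follows. This is only an illustration under simplifying assumptions:
the types Date, Update and RssItem and the functions flatten and escape
are hypothetical, and the description is taken to be already rendered
to an HTML string.

```haskell
-- Hypothetical types, for illustration only (not HSXML).
type Date = (Int, String, Int)                 -- day, month name, year

-- An update chunk: its date, then (url, HTML description) pairs.
data Update = Update Date [(String, String)]

data RssItem = RssItem
  { itemDescr :: String    -- HTML text, encoded for inclusion in XML
  , itemLink  :: String
  , itemDate  :: Date
  } deriving (Eq, Show)

-- The update level disappears; its date is copied into every item.
flatten :: [Update] -> [RssItem]
flatten us =
  [ RssItem (escape html) url date
  | Update date uis <- us
  , (url, html)     <- uis ]

-- Encode the characters that are special in XML character data.
escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]
```

Each resulting item is self-contained, which is what lets the output
carry no trace of the 'update' grouping.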

The HTML transformation is done by the following code:

> toHTML (CLHead attrs updates) =
>        render (document
>                 (head
>                  [title (ha_title attrs)]
>                  [meta_tag [description (ha_description attrs)]]
>                  [author_address]
>                  [meta_tag [pub_date (ha_DateRevision attrs)]]
>                  [head_link LR_start [href (ha_top attrs)] 
>                   [title "All you can find here"]]
>                 )
>                 (body
>                  [h1 "Log of changes on" [[aref (ha_top attrs) "this site"]]]
>                  [p nbsp]
>                  [updates]
>                  [change_log_prev (ha_history_first attrs)]
>                  [change_log_prev (ha_history_last attrs)]
>                 )
>                )

We convert the original markup into another, intermediate markup
(which, in turn, may go through a couple more stages). It seems
that complex transformations are sometimes easier if represented as a
sequence of simple re-writings.
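
As a rough illustration of such staged rewriting, the sketch below
first expands a `higher-order' tag into plain elements and only then
serializes. The Node type and the functions expand, renderNode and
toHtml are hypothetical and far simpler than the actual HSXML code. In
particular, this sketch reports an unexpanded tag with a run-time
error, which is a shortcut taken here for brevity.

```haskell
-- Hypothetical two-level markup, for illustration only (not HSXML).
data Node
  = Elem String [Node]              -- plain element: name and children
  | Str String
  | UpdateChunk String [[Node]]     -- 'higher-order' tag: date and list items

-- Stage 1: rewrite higher-order tags into plain elements.
-- One node may expand into several, hence the list result.
expand :: Node -> [Node]
expand (UpdateChunk date items) =
  [ Elem "h2" [Str date]
  , Elem "ul" [ Elem "li" (concatMap expand item) | item <- items ] ]
expand (Elem name kids) = [Elem name (concatMap expand kids)]
expand (Str s)          = [Str s]

-- Stage 2: serialize; this stage knows only about plain elements.
renderNode :: Node -> String
renderNode (Str s)          = s
renderNode (Elem name kids) =
  "<" ++ name ++ ">" ++ concatMap renderNode kids ++ "</" ++ name ++ ">"
renderNode (UpdateChunk _ _) =
  error "renderNode: unexpanded higher-order tag"

-- The whole transformation is the composition of the two simple stages.
toHtml :: Node -> String
toHtml = concatMap renderNode . expand
```

Neither stage is interesting on its own; the point is that the
serializer never has to learn about the high-level tag.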

The RSS rendering is equally simple:

> toRSS (CLHead attrs updates) =
>     render (as_doc (HW (RSSChannel
>                (tdiv
>                   [title "okmij.org"]
>                   [GBE_description .= "okmij.org"]
>                   [GBE_language    .= "en-us"]
>                   [GBE_ttl         .= "21600"]  -- in minutes: 15 days
>                   [GBE_generator   .= "HSXML->RSS"]
>                   [pub_date (ha_DateRevision attrs)]
>                   [rss_link (ha_top attrs)]
>                   [HW . UpdatesForRSS $ updates]
>                  )
>                 )))

This code demonstrates easy extensibility via 'ad-hoc' tags like
GBE_ttl. These tags still have to be declared:

> data GBE_ttl         = GBE_ttl deriving Show

but that is the only thing the user has to do for the tag.
One could have introduced the notation
		["ttl"         .= "21600"]
However, strings as element names are error-prone: if the tag is
mentioned several times in the code, we have to make sure it is
spelled exactly the same way. Requiring a declaration at least
enforces the uniform spelling. Another advantage is that the tag
becomes apparent in the type of the element where it
appears. Therefore, we may do more extensive content model validation.
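
The sketch below shows one way a declared tag could yield its element
name. It is an assumption made for illustration: the (.=) defined here
produces a plain String and obtains the name by stripping the "GBE_"
prefix from the derived Show output, which need not be how the
operator in the HSXML archive works.

```haskell
-- Tags are ordinary one-constructor data types.
data GBE_ttl       = GBE_ttl       deriving Show
data GBE_generator = GBE_generator deriving Show

-- Assumed, simplified (.=): the element name is the constructor name
-- without its "GBE_" prefix. The body is inserted as is, unescaped.
(.=) :: Show tag => tag -> String -> String
tag .= body = "<" ++ name ++ ">" ++ body ++ "</" ++ name ++ ">"
  where name = drop 4 (show tag)

ttlElem :: String
ttlElem = GBE_ttl .= "21600"

-- A misspelling such as  GBE_tll .= "21600"  is rejected by the
-- compiler (constructor not in scope), whereas "tll" as a string
-- would pass unnoticed.
```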

As mentioned already, writing an RSS document requires `subordinate'
HTML rendering, for the content of the `description' element. In our
framework, that is quite easy to accomplish. The HTML rendering code
is polymorphic over the output monad, MonadOut. To render HTML into a
string, we merely need an appropriate instance of MonadOut:

> newtype ShowMonad a = ShowMonad (Writer [String] a)
>     deriving (Monad, MonadWriter [String])
> instance MonadOut ShowMonad where
>     emit_lit x = tell [x]
> runShowMonad (ShowMonad m) = let (_,x) = runWriter m in x

and so we can write

> render_rss_item date url body = 
>     emit_elem "item" [Hint_nl] Nothing
>        (Just . render . as_block $
>         (tdiv
>          [GBE_description .= 
>           (concat $ runShowMonad (runHTMLRender (render_inline False body)))]
>          [rss_link url]
>          [pub_date date]))

without any need for unsafePerformIO.
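
For readers without the archive at hand, here is a self-contained
sketch of the same idea. The class MonadOut below is a one-method
guess at the real class. To depend on nothing beyond base, ShowMonad
is a hand-rolled writer rather than the Control.Monad.Writer newtype
used above. The renderer renderCode is hypothetical.

```haskell
-- Assumed minimal form of the output class; the real one may have more methods.
class Monad m => MonadOut m where
  emit_lit :: String -> m ()

-- Rendering straight to standard output.
instance MonadOut IO where
  emit_lit = putStr

-- A hand-rolled writer: the emitted chunks paired with the result.
newtype ShowMonad a = ShowMonad ([String], a)

instance Functor ShowMonad where
  fmap f (ShowMonad (w, a)) = ShowMonad (w, f a)

instance Applicative ShowMonad where
  pure a = ShowMonad ([], a)
  ShowMonad (w1, f) <*> ShowMonad (w2, a) = ShowMonad (w1 ++ w2, f a)

instance Monad ShowMonad where
  ShowMonad (w1, a) >>= k =
    let ShowMonad (w2, b) = k a in ShowMonad (w1 ++ w2, b)

-- Rendering into a list of string chunks; no IO involved.
instance MonadOut ShowMonad where
  emit_lit x = ShowMonad ([x], ())

runShowMonad :: ShowMonad a -> [String]
runShowMonad (ShowMonad (out, _)) = out

-- A renderer written once, polymorphic over the output monad.
renderCode :: MonadOut m => String -> m ()
renderCode s = do
  emit_lit "<code>"
  emit_lit s
  emit_lit "</code>"

-- The same renderer used purely, as needed for the RSS description.
asString :: String
asString = concat (runShowMonad (renderCode "switch"))
```

Running renderCode "switch" in IO prints the markup instead; the
renderer itself does not change.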


