[Haskell-cafe] how to get started: a text application

Thu Jun 24 08:17:02 EDT 2004

Graham Klyne wrote:

> I think the first choice is whether to go for a separately identifiable 
> lexing phase, rather than working directly from the raw text.  Either 
> might work, I think.

The first option (tokenization) is more appealing to me.

> The HaXml XML parser has a separate lexer, but it 
> turns out that it's not always easy to get the tokenization right 
> without having contextual information (e.g. from the syntax analyzer).  
> (XML is rather messy in that way.)

Well, yes. In Markdown, as in most other "rich-text" formats, symbols
are overloaded a lot. After all, it has to constrain itself to "plain text".

I'm going to try a "two-stage tokenization" (not sure how to name this 
correctly). Basically, first I'd split the raw text into "symbols" (like 
space, char, digit, left-bracket) and then turn these symbols into 
tokens (like paragraph, reference, start bold text, end bold text, etc.)
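
To make the idea concrete, here is a minimal sketch of the two stages. All type and function names below (Symbol, MdToken, symbolize, tokenizeSymbols) are placeholders I made up for illustration, and the rules cover only a tiny, simplified subset of Markdown; it is not a worked-out design.

```haskell
import Data.Char (isDigit, isSpace)

-- Stage 1 output: one symbol per input character (hypothetical constructors).
data Symbol
  = SSpace
  | SNewline
  | SStar
  | SLeftBracket
  | SRightBracket
  | SDigit Char
  | SChar Char
  deriving (Eq, Show)

-- Stage 2 output: coarser tokens built from runs of symbols.
data MdToken
  = TText String
  | TEmphasisMark     -- a "*"; deciding start vs. end is left to a later pass
  | TParagraphBreak   -- two or more consecutive newlines
  | TOpenRef
  | TCloseRef
  deriving (Eq, Show)

-- Stage 1: classify every character of the raw text.
symbolize :: String -> [Symbol]
symbolize = map classify
  where
    classify '\n' = SNewline
    classify '*'  = SStar
    classify '['  = SLeftBracket
    classify ']'  = SRightBracket
    classify c
      | isSpace c = SSpace
      | isDigit c = SDigit c
      | otherwise = SChar c

-- Stage 2: group symbols into tokens.
tokenizeSymbols :: [Symbol] -> [MdToken]
tokenizeSymbols [] = []
tokenizeSymbols (SNewline : SNewline : rest) =
  TParagraphBreak : tokenizeSymbols (dropWhile (== SNewline) rest)
tokenizeSymbols (SStar : rest)         = TEmphasisMark : tokenizeSymbols rest
tokenizeSymbols (SLeftBracket : rest)  = TOpenRef : tokenizeSymbols rest
tokenizeSymbols (SRightBracket : rest) = TCloseRef : tokenizeSymbols rest
tokenizeSymbols syms = TText text : tokenizeSymbols rest
  where
    (text, rest) = textRun syms

-- Collect plain text up to the next special symbol or paragraph break.
-- A single newline inside a paragraph is treated as a space.
textRun :: [Symbol] -> (String, [Symbol])
textRun syms = case syms of
  SNewline : SNewline : _ -> ("", syms)
  SNewline : rest         -> extend ' ' rest
  SSpace : rest           -> extend ' ' rest
  SDigit c : rest         -> extend c rest
  SChar c : rest          -> extend c rest
  _                       -> ("", syms)
  where
    extend c rest = let (t, r) = textRun rest in (c : t, r)

-- Both stages composed, matching the String -> [Token] shape quoted below.
tokenize :: String -> [MdToken]
tokenize = tokenizeSymbols . symbolize
```

The first stage knows nothing about Markdown beyond single characters; all the context-dependent decisions (is this "*" opening or closing emphasis?) are pushed into the second stage or the parser.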

> In Haskell, it's often reasonably efficient to construct a program as a 
> composition of "filters", rather like a Unix command pipeline;  lazy 
> evaluation often means that data can "stream" through the filters and 
> never exist in its entirety in an intermediate form.  This immediately 
> allows the program structure to be resolved into a number of smaller, 
> independent pieces; e.g.
> 
>    tokenize    :: String -> [Token]
>    parse       :: [Token] -> DocModel
>    createXHTML :: DocModel -> Document  -- (cf. HaXml)

Yes, I have seen this pattern in tutorial materials and am inclined to 
use it.

> Then HaXml provides function that can generate textual XML.  Thus the 
> overall conversion function might look like:
> 
>    markdownToXHTML :: String -> String
>    markdownToXHTML = show . document . createXHTML . parse . tokenize

That would be a good start, thanks!
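
Just to convince myself that the pieces fit together, here is a toy version of that pipeline that runs without HaXml. Everything in it is a stand-in I invented: the Token and DocModel types are deliberately trivial, and render takes the place of "show . document . createXHTML" (it does no escaping, so it is not real XHTML output).

```haskell
-- Stand-in token type: words, plus a marker for blank lines.
data Token = Word String | Blank
  deriving (Eq, Show)

-- Stand-in document model: a list of paragraphs, each a list of words.
newtype DocModel = DocModel [[String]]
  deriving (Eq, Show)

-- Split the input into words; an empty line becomes a Blank token.
tokenize :: String -> [Token]
tokenize = concatMap lineTokens . lines
  where
    lineTokens l = case words l of
      [] -> [Blank]
      ws -> map Word ws

-- Group tokens into paragraphs separated by Blank tokens.
parse :: [Token] -> DocModel
parse = DocModel . filter (not . null) . splitOnBlank
  where
    splitOnBlank ts = case break (== Blank) ts of
      (para, [])       -> [unwrap para]
      (para, _ : rest) -> unwrap para : splitOnBlank rest
    unwrap ts = [w | Word w <- ts]

-- Placeholder for the HaXml-based output stage.
render :: DocModel -> String
render (DocModel paras) =
  concatMap (\p -> "<p>" ++ unwords p ++ "</p>\n") paras

-- The whole conversion as a composition of filters.
markdownToXHTML :: String -> String
markdownToXHTML = render . parse . tokenize
```

Because tokenize is lazy, something like "take 2 (tokenize (cycle "a b\n"))" returns a result even though the input is infinite, which is the "streaming" behaviour described above.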

> (where "document" is from the HaXml module Text.XML.HaXml.Pretty).
> 
> For parsing of any complexity, I recommend Parsec:  it has the advantage 
> of being very well documented, and it helps to show how monads can be 
> used to handle state information.

OK.

> The outline sketched above has at least one weakness, it doesn't provide 
> any way to handle errors.  This could be overcome by using Either as an 
> error monad (see Control.Monad and Control.Monad.Error in the standard 
> hierarchical libraries), and then using >>= in place of function 
> composition (noting the reversal of component order):

Uhm. Looks like error handling is very different from what I know from
imperative languages. I think I'll try to get the basic version working
without it first. On a related note, how can I debug my program along the
way? I suspect I can't even use a print inside a function.
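
As far as I can tell, the closest thing to a "print inside a function" is trace from the Debug.Trace module, which writes a message to stderr when the traced value is evaluated. A minimal sketch, where countWords is just a made-up example function:

```haskell
import Debug.Trace (trace)

-- Hypothetical example: count the words in a string, and report the
-- argument on stderr whenever the result is actually evaluated.
countWords :: String -> Int
countWords s =
  trace ("countWords called with: " ++ show s) (length (words s))
```

Because of lazy evaluation the messages show up when (and only if) the result is demanded, so their order may not match the order in which the calls appear in the source.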

[ error handling cut ]
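
For my own notes, here is how I currently understand the suggestion. This is not the example that was cut, only a minimal sketch with made-up stages that return Either String, where Left carries an error message. It assumes a current GHC, where the Monad instance for Either comes with the Prelude.

```haskell
-- Hypothetical stages; each one can fail with an error message.
tokenize :: String -> Either String [String]
tokenize s
  | null ws   = Left "tokenize: empty input"
  | otherwise = Right ws
  where
    ws = words s

parse :: [String] -> Either String Int
parse ws
  | length ws > 3 = Left "parse: too many words"
  | otherwise     = Right (length ws)

render :: Int -> Either String String
render n = Right ("<p>" ++ show n ++ " words</p>")

-- With >>= the stages read left to right, the reverse of
-- "render . parse . tokenize"; the first Left stops the chain.
convert :: String -> Either String String
convert s = tokenize s >>= parse >>= render

-- The same chain written in do-notation.
convertDo :: String -> Either String String
convertDo s = do
  ts <- tokenize s
  n  <- parse ts
  render n
```

Here "convert "a b"" gives Right "<p>2 words</p>", while "convert """ gives Left "tokenize: empty input" without ever running parse or render.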

> Just to check out my use of >>= and do-notation, I constructed a trivial 
> complete program using both.

Phew.
Not sure how much time it will take to comprehend this "triviality".
I can't yet grasp monads and their applications.

[ example skipped ]

Thank you for your time and explanations. They were certainly very helpful,
esp. considering the fact that I had exactly one reply to my post. ;-)