[Haskell-cafe] how to get started: a text application
max at ucmg.com.ua
Thu Jun 24 08:17:02 EDT 2004
Graham Klyne wrote:
> I think the first choice is whether to go for a separately identifiable
> lexing phase, rather than working directly from the raw text. Either
> might work, I think.
The first option (tokenization) is more appealing to me.
> The HaXml XML parser has a separate lexer, but it
> turns out that it's not always easy to get the tokenization right
> without having contextual information (e.g. from the syntax analyzer).
> (XML is rather messy in that way.)
Well, yes. In Markdown, as in most other "rich-text" formats, symbols
are heavily overloaded. After all, it has to confine itself to plain text.
I'm going to try a "two-stage tokenization" (not sure what the correct
name for this is). Basically, first I'd split the raw text into "symbols"
(like space, char, digit, left-bracket) and then turn these symbols into
tokens (like paragraph, reference, start bold text, end bold text, etc.)
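
A minimal sketch of the two-stage idea, under stated assumptions: the
Symbol and Token types, the function names classify, tokens and renderSym,
and the choice of "**" as the bold marker and a blank line as the paragraph
break are all hypothetical, picked only for illustration. Real Markdown
needs far more cases than this.

```haskell
import Data.Char (isDigit, isSpace)

-- Stage 1 output: one low-level symbol per input character.
data Symbol
  = SSpace
  | SNewline
  | SDigit Char
  | SStar
  | SLBracket
  | SRBracket
  | SChar Char
  deriving (Eq, Show)

-- Stage 2 output: higher-level tokens (hypothetical, far from complete).
data Token
  = TText String
  | TBoldStart
  | TBoldEnd
  | TParaBreak
  deriving (Eq, Show)

-- Stage 1: classify every character on its own, without context.
classify :: String -> [Symbol]
classify = map toSymbol
  where
    toSymbol '\n' = SNewline
    toSymbol '*'  = SStar
    toSymbol '['  = SLBracket
    toSymbol ']'  = SRBracket
    toSymbol c
      | isSpace c = SSpace
      | isDigit c = SDigit c
      | otherwise = SChar c

-- Turn a symbol back into a character (other whitespace becomes a space).
renderSym :: Symbol -> Char
renderSym SSpace     = ' '
renderSym SNewline   = '\n'
renderSym (SDigit c) = c
renderSym SStar      = '*'
renderSym SLBracket  = '['
renderSym SRBracket  = ']'
renderSym (SChar c)  = c

-- Stage 2: group symbols into tokens. The Bool tracks whether we are
-- currently inside bold text, so "**" alternates between start and end.
tokens :: [Symbol] -> [Token]
tokens = go False
  where
    go _ [] = []
    go inBold (SStar : SStar : rest)
      | inBold    = TBoldEnd   : go False rest
      | otherwise = TBoldStart : go True rest
    go inBold (SNewline : SNewline : rest) =
      TParaBreak : go inBold (dropWhile (== SNewline) rest)
    go inBold syms =
      let (run, rest) = spanText syms
      in TText (map renderSym run) : go inBold rest

    -- Take at least one symbol, then continue up to the next boundary.
    spanText (s : rest) = let (run, rest') = plain rest in (s : run, rest')
    spanText []         = ([], [])

    plain syms@(SStar : SStar : _)       = ([], syms)
    plain syms@(SNewline : SNewline : _) = ([], syms)
    plain (s : rest) = let (run, rest') = plain rest in (s : run, rest')
    plain []         = ([], [])

-- The two stages composed into the tokenizer from the quoted outline.
tokenize :: String -> [Token]
tokenize = tokens . classify
```

Keeping stage 1 context-free means all the overloading (is "*" text or
markup?) is decided in one place, stage 2. A lone "*" falls through to
plain text here because only the two-symbol pattern is treated as markup.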
> In Haskell, it's often reasonably efficient to construct a program as a
> composition of "filters", rather like a Unix command pipeline; lazy
> evaluation often means that data can "stream" through the filters and
> never exist in its entirety in an intermediate form. This immediately
> allows the program structure to be resolved into a number of smaller,
> independent pieces; e.g.
> tokenize :: String -> [Token]
> parse :: [Token] -> DocModel
> createXHTML :: DocModel -> Document -- (cf. HaXml)
Yes, I have seen this pattern in tutorial materials and am inclined to
> Then HaXml provides a function that can generate textual XML. Thus the
> overall conversion function might look like:
> markdownToXHTML :: String -> String
> markdownToXHTML = show . document . createXHTML . parse . tokenize
That would be a good start, thanks!
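
To see the shape of that pipeline without pulling in HaXml, here is a
rough self-contained sketch. Everything in it is a stand-in: the Token and
DocModel types are hypothetical, render is a hand-written replacement for
the createXHTML/document/show steps, and it does no escaping of "<" or
"&". Only the composition pattern is taken from the quoted outline.

```haskell
-- Stand-in token type: a word, or a blank line separating paragraphs.
data Token = Word String | Blank
  deriving (Eq, Show)

-- Stand-in document model: a list of paragraphs, each a list of words.
newtype DocModel = DocModel [[String]]
  deriving (Eq, Show)

-- Split the input into words, marking blank lines.
tokenize :: String -> [Token]
tokenize = concatMap lineTokens . lines
  where
    lineTokens l
      | null (words l) = [Blank]
      | otherwise      = map Word (words l)

-- Group the words between blank lines into paragraphs.
parse :: [Token] -> DocModel
parse = DocModel . filter (not . null) . go
  where
    go [] = []
    go ts =
      let (para, rest) = break (== Blank) ts
      in [w | Word w <- para] : go (drop 1 rest)

-- Hand-written stand-in for the HaXml rendering steps (no escaping).
render :: DocModel -> String
render (DocModel ps) = concatMap (\p -> "<p>" ++ unwords p ++ "</p>") ps

-- Same shape as the quoted definition: stages composed right to left.
markdownToXHTML :: String -> String
markdownToXHTML = render . parse . tokenize
```

Each stage can be tried on its own in an interpreter, which is most of the
appeal of splitting the program this way.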
> (where "document" is from the HaXml module Text.XML.HaXml.Pretty).
> For parsing of any complexity, I recommend Parsec: it has the advantage
> of being very well documented, and it helps to show how monads can be
> used to handle state information.
> The outline sketched above has at least one weakness: it doesn't provide
> any way to handle errors. This could be overcome by using Either as an
> error monad (see Control.Monad and Control.Monad.Error in the standard
> hierarchical libraries), and then using >>= in place of function
> composition (noting the reversal of component order):
Uhm. Looks like error handling is very different from that in imperative
languages. I think I'll try to get the basic version working without it first.
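
For later reference, the Either-based chaining described in the quote
might look roughly like the sketch below. This is not the code from the
original message (that part is cut here). The stage names tokenizeE,
parseE and renderE, the Result synonym and the error messages are all made
up for illustration, and it assumes a base library where Either String
already has a Monad instance, so no extra import is needed.

```haskell
-- Left carries an error message, Right carries a successful result.
type Result = Either String

-- Each stage may fail (hypothetical stages, for illustration only).
tokenizeE :: String -> Result [String]
tokenizeE s = case words s of
  [] -> Left "tokenize: empty input"
  ws -> Right ws

parseE :: [String] -> Result [String]
parseE ws
  | odd (length (filter (== "**") ws)) = Left "parse: unclosed ** marker"
  | otherwise                          = Right (filter (/= "**") ws)

renderE :: [String] -> Result String
renderE ws = Right ("<p>" ++ unwords ws ++ "</p>")

-- With >>= the stages read left to right, the reverse of (.) composition.
-- The first Left stops the chain and becomes the overall result.
convert :: String -> Result String
convert s = tokenizeE s >>= parseE >>= renderE

-- The same chain written in do-notation.
convertDo :: String -> Result String
convertDo s = do
  ts  <- tokenizeE s
  doc <- parseE ts
  renderE doc
```

No stage has to check whether the previous one failed; >>= does that
bookkeeping, which is roughly the role exceptions play elsewhere.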
On a related note, how can I debug my program along the way? I suspect I
can't even use a print inside a function.
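
One commonly used option for this kind of print-style debugging is the
trace function from Debug.Trace, which prints a message (to stderr) when
the value it wraps is evaluated. A small sketch, where countWords is just
a made-up example function:

```haskell
import Debug.Trace (trace)

-- A pure function with a debugging message attached to its result.
-- The message is printed when the result is actually demanded, so
-- with lazy evaluation the order of messages can be surprising.
countWords :: String -> Int
countWords s = trace ("countWords input: " ++ show s) (length (words s))
```

The type of countWords stays pure, so the trace call can be removed later
without touching any callers.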
[ error handling cut ]
> Just to check out my use of >>= and do-notation, I constructed a trivial
> complete program using both.
Not sure how much time it will take me to comprehend this "triviality".
I can't yet grasp monads and their applications.
[ example skipped ]
Thank you for your time and explanations. They were certainly very helpful,
esp. considering the fact that I had exactly one reply to my post. ;-)