[Haskell-cafe] Tokenizing and Parsec

Tue Jan 12 03:42:23 EST 2010

2010/1/12 Günther Schmidt <gue.schmidt at web.de>:
> [Snip...] I need to write my own parsec-token-parsers to parse this token
> stream in a context-sensitive way.
>
> Uhm, how do I that then?
>

Hi Günther

Get the Parsec manual from Daan Leijen's home page, then see the
section '2.11 Advanced: Seperate scanners'.

Though it is rarely mentioned, Parsec in its regular mode is a
scannerless parser. Unless you have complex formatting problems (e.g.
indentation sensitivity, as in Python's or Haskell's syntax),
scannerless parsers are often much more convenient than a parser plus
a separate lexer (see the grammar formalism SDF for many examples).
For Parsec, a separate scanner built with the technique in section
2.11 needs quite a lot of boilerplate. Usually I can get by with the
Token and Language modules, or with a few tricks using the 'symbol'
parser instead.
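
To give an idea of the boilerplate, here is a minimal sketch of a
separate scanner feeding a token-level parser. It assumes Parsec 3
(the Text.Parsec modules) and is not the manual's code; the names Tok,
PosTok, scanner, satisfyTok, num, sym, sumExpr and parseSum are made up
for illustration. Only 'tokenPrim', 'getPosition' and the usual
combinators come from Parsec itself.

```haskell
import Text.Parsec

-- Hypothetical token type, invented for this sketch.
data Tok = TNum Integer | TIdent String | TSym Char
  deriving (Eq, Show)

-- Each token carries the source position where it started.
type PosTok = (SourcePos, Tok)

-- Stage 1: the scanner, an ordinary character-level Parsec parser
-- that turns a String into a list of positioned tokens.
scanner :: Parsec String () [PosTok]
scanner = spaces *> many (lexeme <* spaces) <* eof
  where
    lexeme = (,) <$> getPosition <*> tok
    tok =  TNum . read <$> many1 digit
       <|> TIdent <$> many1 letter
       <|> TSym <$> oneOf "+-*/()"

-- Stage 2: parsers whose input stream is [PosTok] rather than String.
type TokParser a = Parsec [PosTok] () a

-- The one primitive everything else is built from: accept a token
-- if the given function returns Just.
satisfyTok :: (Tok -> Maybe a) -> TokParser a
satisfyTok f = tokenPrim (show . snd) nextPos (f . snd)
  where
    -- Report the position of the next token, if there is one.
    nextPos _ _ ((p, _) : _) = p
    nextPos p _ []           = p

num :: TokParser Integer
num = satisfyTok $ \t -> case t of
  TNum n -> Just n
  _      -> Nothing

sym :: Char -> TokParser ()
sym c = satisfyTok $ \t -> if t == TSym c then Just () else Nothing

-- A toy grammar over tokens: num ('+' num)*, evaluated to its sum.
sumExpr :: TokParser Integer
sumExpr = do
  n  <- num
  ns <- many (sym '+' *> num)
  return (sum (n : ns))

-- Run the scanner, then the token parser.
parseSum :: String -> Either ParseError Integer
parseSum input = do
  toks <- parse scanner "input" input
  parse (sumExpr <* eof) "input" toks
```

With these definitions, parseSum "1 + 2 + 39" should give Right 42.
Every token-level parser has to go through something like satisfyTok,
which is the boilerplate the scannerless style lets you skip.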

Parsec is monadic, so (>>=) allows you to write context-sensitive
parsers; see section '3.1. Parsec Prim' for a discussion and an example.
Again, writing a context-sensitive parser can often be more trouble
than studying the format of the input and working out a context-free
grammar (if there is one).
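
As a small illustration of what (>>=) buys you, here is a sketch of a
length-prefixed field, where the number parsed first decides how much
input the rest of the parser consumes. It assumes Parsec 3; the name
lengthPrefixed and the "N:chars" format are invented for this example
and are not taken from the manual.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Parse a decimal length, a colon, then exactly that many characters.
-- The do-block desugars to (>>=): the value n bound by the first
-- parser determines the parser that runs afterwards.
lengthPrefixed :: Parser String
lengthPrefixed = do
  n <- read <$> many1 digit
  _ <- char ':'
  count n anyChar
```

For instance, parse lengthPrefixed "" "5:hello world" should give
Right "hello", while "5:hi" fails because the input runs out before
five characters have been read.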

Best wishes

Stephen