[Haskell-cafe] Parsing indentation-based languages with Parsec

George Pollard porges at porg.es
Fri Apr 13 03:51:25 EDT 2007

Hi, first time list poster :)

I've searched around a bit but haven't been able to find any examples of
this. I want to parse a language (such as Haskell or Python) which uses
only EOL as the 'statement' separator and uses indentation levels to
indicate block structure. I'd like to do this with the nice Parsec
library.

The first thing I noticed was that Parsec's whiteSpace parser treats
EOL as ordinary whitespace and skips it, so I need to redefine that. Is
this the correct way to do it? I've only been using Haskell for a week
or so, so I'm not too sure about the record structures and updating
them...

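import Text.ParserCombinators.Parsec
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language (emptyDef)

lexer :: P.TokenParser ()
lexer = (P.makeTokenParser emptyDef
            { P.commentLine     = "#"
            , P.nestedComments  = True
            , P.identStart      = letter
            , P.identLetter     = letter
            , P.opStart         = oneOf "+*/-="
            , P.opLetter        = oneOf "+*/-="
            , P.reservedNames   = []
            , P.reservedOpNames = []
            , P.caseSensitive   = False
            })
        { -- update lexer fields
          P.whiteSpace = skipMany (char ' ')  -- just gobble spaces
        }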

(I got the basic code from the tutorial contained within the Parsec
documentation.)
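
One caveat, as far as I can tell from the Parsec source: makeTokenParser
builds lexeme, symbol, identifier and friends against its own internal
whiteSpace, so updating the P.whiteSpace field afterwards changes only
that field, and the other lexer parsers will still skip newlines. A
minimal sketch of an alternative, assuming hand-rolled helpers
(lineSpace, lexeme', identifier', eol and identLine are hypothetical
names, not part of Parsec) that skip spaces and tabs but leave EOL in
the input:

```haskell
import Text.ParserCombinators.Parsec

-- Hypothetical helper: skip spaces and tabs only, leaving '\n' alone.
lineSpace :: Parser ()
lineSpace = skipMany (oneOf " \t")

-- Hypothetical helper: run a parser, then skip trailing in-line space.
lexeme' :: Parser a -> Parser a
lexeme' p = do
  x <- p
  lineSpace
  return x

-- An identifier made of letters, as in the lexer definition above.
identifier' :: Parser String
identifier' = lexeme' (many1 letter)

-- EOL is an explicit token here, because it separates statements.
-- It is not wrapped in lexeme', so the indentation of the next line
-- is left in the input for the block parser to inspect.
eol :: Parser ()
eol = newline >> return ()

-- One line of identifiers terminated by EOL.
identLine :: Parser [String]
identLine = do
  ws <- many1 identifier'
  eol
  return ws
```

With these definitions, parse (many identLine) "" "foo bar\nbaz\n"
gives Right [["foo","bar"],["baz"]]: the spaces between words are
skipped, but each newline has to be matched explicitly.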

For handling the indented blocks, I thought I would use something to
hold the current indentation state, as Parsec has support for threading
user state through all the parsers.
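
A rough sketch of that idea, assuming a toy language invented purely
for illustration: each statement is a word on its own line, and a word
followed by ':' opens a block of more deeply indented statements. The
user state is just the indentation column of the current block. Stmt,
IParser, lineEnd, statement, indentedBlock, program and parseProgram are
hypothetical names; only the Parsec combinators themselves are real.

```haskell
import Text.ParserCombinators.Parsec

-- Toy syntax tree: a word plus the statements nested under it.
data Stmt = Stmt String [Stmt] deriving (Show, Eq)

-- Parser over Char whose user state is the current block's indentation.
type IParser a = GenParser Char Int a

-- A statement line ends at a newline or at the end of the input.
lineEnd :: IParser ()
lineEnd = (newline >> return ()) <|> eof

-- One statement at exactly the current indentation level.
statement :: IParser Stmt
statement = do
  cur  <- getState
  _    <- count cur (char ' ')   -- require exactly 'cur' leading spaces
  name <- many1 letter           -- fails on any extra leading space
  body <- (char ':' >> newline >> indentedBlock)
          <|> (lineEnd >> return [])
  return (Stmt name body)

-- A block indented more deeply than the enclosing one.
indentedBlock :: IParser [Stmt]
indentedBlock = do
  outer <- getState
  -- Peek at the first line's indentation without consuming it.
  new   <- lookAhead (fmap length (many (char ' ')))
  if new <= outer
    then fail "expected a more deeply indented block"
    else do
      setState new                    -- enter the nested block
      stmts <- many1 (try statement)  -- 'try' undoes a partial indent match
      setState outer                  -- leave the nested block
      return stmts

-- A whole program: statements at column 0, then end of input.
program :: IParser [Stmt]
program = do
  stmts <- many (try statement)
  eof
  return stmts

parseProgram :: String -> Either ParseError [Stmt]
parseProgram = runParser program 0 "(input)"
```

For example, parseProgram "if:\n  a\n  b\nc\n" gives
Right [Stmt "if" [Stmt "a" [],Stmt "b" []],Stmt "c" []], while an
unexpected indent such as "a\n  b\n" is rejected. Blank lines, tabs and
comments are deliberately left out to keep the sketch short.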

Is this the right way to go about this? Has anyone done the 'groundwork'
with parsing such languages so I don't need to reinvent this?

Thanks in advance,
- porges.