[Haskell-cafe] happy + alex parsing question
Mihai Maruseac
mihai.maruseac at gmail.com
Wed Feb 16 17:10:45 CET 2011
On Wed, Feb 16, 2011 at 5:31 PM, Roman Dzvinkovsky <romand.ne at gmail.com> wrote:
> Hi,
>
> using alex+happy, how could I parse lines like these?
>
>> "mr <username> says <message>\n"
>
> where both <username> and <message> may contain arbitrary characters (except
> eol)?
>
> If I make lexer tokens
>
>> "mr " { \_ -> T_Mr }
>> " says " { \_ -> T_Says }
>> \r?\n { \_ -> T_Eol }
>> . { \s -> T_Char (head s) }
>
> and parser
>
>> 'mr ' { T_Mr }
>> ' says ' { T_Says }
>> eol { T_Eol }
>> char { T_Char $$ }
>
> ...
>
>> line :: { (String, String) }
>> : 'mr ' string ' says ' string eol { ($2, $4) }
>
>> string :: { String }
>> : char { [ $1 ] }
>> | char string { $1 : $2 }
>
> then I get an error when <username> or <message> contains a "mr "
> substring, because the parser encounters a T_Mr token.
>
> A workaround is to mention all the small tokens in my <string> definition:
>
>> string :: { String }
>> : { [] }
>> | 'mr ' string { "mr " ++ $2 }
>> | ' says ' string { " says " ++ $2 }
>> | char string { $1 : $2 }
>
> but that is weird and I'm sure there is a better way.
>
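To see why the first grammar fails, here is a small plain-Haskell sketch (not alex itself) that mimics the quoted lexer rules: "mr " and " says " are recognised wherever they occur, so a message containing "mr " produces a second T_Mr token that the `string` rule cannot absorb. The names `lexAll`, `TMr`, `TChar` etc. are hypothetical stand-ins for the alex-generated lexer; alex's longest-match rule gives the same result here because the two keywords never overlap.

```haskell
import Data.List (stripPrefix)

data Token = TMr | TSays | TEol | TChar Char deriving (Eq, Show)

-- Context-free lexer mirroring the quoted alex rules: the keywords are
-- recognised at every position, before falling back to single characters.
lexAll :: String -> [Token]
lexAll [] = []
lexAll s
  | Just r <- stripPrefix "mr " s    = TMr   : lexAll r
  | Just r <- stripPrefix " says " s = TSays : lexAll r
  | Just r <- stripPrefix "\r\n" s   = TEol  : lexAll r
  | Just r <- stripPrefix "\n" s     = TEol  : lexAll r
-- No keyword matched: emit one ordinary character.
lexAll (c:r) = TChar c : lexAll r
```

For example, `lexAll "mr bob says hi mr x\n"` contains `TMr` twice; the second one, inside the message, is what triggers the happy parse error.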
I don't have an implementation right now, but you could try using lexer
states (start codes) or user data to record whether you have already
parsed the 'mr ' part (etc.). I guess you could use the monadUserState
wrapper (that's what I found after a night without sleep [1]; solved now).
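One way to read the suggestion above is to make the lexer context-sensitive: "mr " is a keyword only at the start of a line, " says " only while reading the user name, and after that everything up to the end of the line is plain characters. The sketch below emulates this by hand in plain Haskell, with a `Mode` type standing in for alex start codes (in a real alex spec these would be rules prefixed with start codes such as `<0>`, switched with `begin`/`andBegin` in the `monad` wrapper). It assumes the user name never contains " says " (the first occurrence ends it), since the original grammar is ambiguous otherwise; `tokenize`, `chars`, `parseLine` and `Mode` are hypothetical names.

```haskell
import Data.List (stripPrefix)

data Token = TMr | TSays | TEol | TChar Char deriving (Eq, Show)

-- Hand-rolled stand-in for alex start codes: the current mode decides
-- which keywords are recognised at this position.
data Mode = LineStart | InUser | InMessage deriving (Eq, Show)

tokenize :: String -> [Token]
tokenize = go LineStart
  where
    go _ [] = []
    -- End of line resets the mode, whatever it was.
    go _ s
      | Just r <- stripPrefix "\r\n" s = TEol : go LineStart r
      | Just r <- stripPrefix "\n" s   = TEol : go LineStart r
    -- "mr " is a keyword only at the start of a line.
    go LineStart s
      | Just r <- stripPrefix "mr " s = TMr : go InUser r
    -- " says " is a keyword only while reading the user name.
    go InUser s
      | Just r <- stripPrefix " says " s = TSays : go InMessage r
    -- A line not starting with "mr " is lexed as plain characters;
    -- the parser will reject it.
    go LineStart (c:r) = TChar c : go InMessage r
    -- Anything else is an ordinary character; the mode is unchanged.
    go mode (c:r) = TChar c : go mode r

-- Collect a maximal run of TChar tokens.
chars :: [Token] -> (String, [Token])
chars (TChar c : ts) = let (s, rest) = chars ts in (c : s, rest)
chars ts = ([], ts)

-- line : 'mr ' string ' says ' string eol, with both strings non-empty.
parseLine :: [Token] -> Maybe ((String, String), [Token])
parseLine (TMr : ts) =
  case chars ts of
    (user@(_:_), TSays : ts') ->
      case chars ts' of
        (msg@(_:_), TEol : rest) -> Just ((user, msg), rest)
        _ -> Nothing
    _ -> Nothing
parseLine _ = Nothing
```

With this lexer, `parseLine (tokenize "mr mr x says hi mr y says z\n")` gives `Just (("mr x", "hi mr y says z"), [])`: the inner "mr " and " says " come through as plain characters, so the `string` rule no longer needs to list every keyword.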
--
Mihai
[1]: http://www.haskell.org/pipermail/haskell-cafe/2011-February/089330.html