[Haskell-cafe] happy + alex parsing question

Mihai Maruseac mihai.maruseac at gmail.com
Wed Feb 16 17:10:45 CET 2011


On Wed, Feb 16, 2011 at 5:31 PM, Roman Dzvinkovsky <romand.ne at gmail.com> wrote:
> Hi,
>
> using alex+happy, how could I parse lines like these?
>
>> "mr <username> says <message>\n"
>
> where both <username> and <message> may contain arbitrary characters (except
> eol)?
>
> If I make lexer tokens
>
>> "mr "    { \_ -> T_Mr }
>> " says " { \_ -> T_Says }
>> \r?\n    { \_ -> T_Eol }
>> .        { \s -> T_Char (head s) }
>
> and parser
>
>> 'mr '    { T_Mr }
>> ' says ' { T_Says }
>> eol      { T_Eol }
>> char     { T_Char $$ }
>
> ...
>
>> line :: { (String, String) }
>>      : 'mr ' string ' says ' string eol { ($2, $4) }
>
>> string :: { String }
>>        : char        { [ $1 ] }
>>        | char string { $1 : $2 }
>
> then I get an error when <username> or <message> contains a "mr "
> substring, because the parser encounters a T_Mr token.
>
> A workaround is to mention all the small tokens in my <string> definition:
>
>> string :: { String }
>>        :                 { [] }
>>        | 'mr ' string    { "mr "    ++ $2 }
>>        | ' says ' string { " says " ++ $2 }
>>        | char string     { $1 : $2 }
>
> but that is weird and I'm sure there is a better way.
>

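I don't have an implementation at hand, but you could try using lexer
states or user data to record whether you have already consumed the
'mr ' part (and likewise the ' says ' part). Once the lexer knows where
it is in the line, it no longer has to emit T_Mr or T_Says for text
inside <username> or <message>. I guess you could use Alex's
monadUserState wrapper for this (as I found out after a night without
sleep [1]; that problem is solved now).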
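
To make the idea concrete, here is a minimal sketch in plain Haskell
rather than an actual Alex specification. It is a hand-written lexer that
carries its own state, so the names LexState, lexLine, breakOn and
parseLine are hypothetical, and T_Text replaces the per-character T_Char
token from the question. It also assumes that the first " says " on the
line ends the username, which is one possible way to resolve a case the
question leaves open.

```haskell
import Control.Monad (guard)
import Data.List (isPrefixOf, stripPrefix)

-- Tokens as in the question, except that runs of characters are
-- collected into one T_Text chunk (an assumption of this sketch).
data Token = T_Mr | T_Says | T_Text String | T_Eol
  deriving (Eq, Show)

-- The lexer state records which part of the line we are in.
data LexState = AtStart | InUser | InMessage

-- Split a string at the first occurrence of a pattern; the pattern
-- itself stays at the front of the second component.
breakOn :: String -> String -> (String, String)
breakOn pat = go
  where
    go s | pat `isPrefixOf` s = ([], s)
    go []       = ([], [])
    go (c : cs) = let (a, b) = go cs in (c : a, b)

-- Tokenise exactly one line, including its terminating newline.
lexLine :: String -> Maybe [Token]
lexLine = go AtStart
  where
    -- "mr " is a keyword only at the very start of the line.
    go AtStart s = do
      rest <- stripPrefix "mr " s
      toks <- go InUser rest
      pure (T_Mr : toks)
    -- Inside the username only " says " is special; "mr " is plain text.
    go InUser s = do
      let (user, rest) = breakOn " says " s
      rest' <- stripPrefix " says " rest
      guard ('\n' `notElem` user)
      toks <- go InMessage rest'
      pure (T_Text user : T_Says : toks)
    -- Inside the message nothing is special except the end of line.
    go InMessage s = case break (== '\n') s of
      (msg, "\n") -> Just [T_Text (dropCR msg), T_Eol]
      _           -> Nothing

    -- Treat "\r\n" like "\n" by dropping one trailing carriage return.
    dropCR msg = case reverse msg of
      ('\r' : rest) -> reverse rest
      _             -> msg

-- Stand-in for the happy rule: 'mr ' string ' says ' string eol
parseLine :: String -> Maybe (String, String)
parseLine s = case lexLine s of
  Just [T_Mr, T_Text user, T_Says, T_Text msg, T_Eol] -> Just (user, msg)
  _                                                   -> Nothing
```

With this sketch, parseLine "mr mr smith says he says hello\r\n" gives
Just ("mr smith","he says hello"), so neither the "mr " inside the
username nor the " says " inside the message is turned into a keyword
token. In a real Alex lexer the same bookkeeping would live in start
codes or in the user state instead of the LexState argument.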

-- 
Mihai

[1]: http://www.haskell.org/pipermail/haskell-cafe/2011-February/089330.html


