No subject

Thu Feb 24 17:58:36 CET 2011

recursive parser. The idee is to do exactly what you did but involving
a call to some other function which should do some kind of sub-work.

Actually, you can see the pattern already in the code you provided; for ins=
tance

    tokenize' (('/' : binder) : words) tokens
    =3D tokenize' words ((BinderToken binder) : tokens)

would become something like

    parse' ('/' : words) tokens
    =3D parse' rest (t : tokens)
    where (t, rest) =3D parseBinder words -- parseBinder should parse
the input that follows '/'

so for an array:

    parse' ('[' : words) tokens
    =3D parse' rest (t : tokens)
    where (t, rest) =3D parseArray words -- parseArray should parse the
input that follows '[', until and including ']'

Remarks:
- You see that the parseX that parses some X returns t, the 'token' we
are interested in, and also the rest of the input, i.e. the part not
consumed by parseX.
- tokenize is named parse: parseArray for instance doesn't return a
list of tokens, but a single thing that represent a complete array,
including its elements.
- because of the first point, you should define 'parse' as returning
([GmlToken], String). The return type has the same shape as your
sub-parser: (<return-type>, <unconsumed-input>). So do not pass the
'tokens' argument in the recursive call, but let the caller make the
(:) append. This way, instead of doing an append, you can make
something more useful before returning the <return-type>, like
building a nice representation of an array.

After you have managed to digest my no-so-well explained prose, have a
read at monadic parsing. I believe Philip Wadler has a nice
introductory paper about that. This is exactly what you need. Then,
once you understood that, have a look at Parsec, a standard monadic
parsing library for Haskell. (You don't need to read about monadic
parsing to do your project, but it happens to be exactly what my
answer is all about).

HTH,
Thu

2011/2/28 Hauschild, Klaus (EXT) <klaus.hauschild.ext at siemens.com>:
> Hi Thu,
>
> You read my mind. Ok, for the details. Here are my data structur for the =
differen tokens (currently not complete):
>
> data GmlToken =3D       IntToken Int
>    | RealToken Double
>    | BoolToken Bool
>    | StringToken String
>    | FunctionToken TokenSequence
>    | ArrayToken TokenSequence
>    | BinderToken String
>    | IdentifierToken String
>    | AddiToken
>
> type TokenSequence  =3D [GmlToken]
> type TokenStack     =3D [GmlToken]
>
> Now I'm working on the tokenize method. There a public and a private vers=
ion (indicated by ')
>
> tokenize :: String =E2=86=92  TokenSequence
> -- foreward to intern method
> tokenize program =3D tokenize' (words program) []
>
> tokenize' :: [String] =E2=86=92 TokenSequence =E2=86=92  TokenSequence
> -- last recursion step
> tokenize' [] tokens
>    =3D tokens
> -- "addi" is a reserved keyword for the AddiToken
> tokenize' ("addi" : words) tokens
>    =3D tokenize' words (AddiToken : tokens)
> -- binder tokens start with '/'
> tokenize' (('/' : binder) : words) tokens
>    =3D tokenize' words ((BinderToken binder) : tokens)
> -- if nothing matches it is an identifier
> tokenize' (name : words) tokens
>    =3D tokenize' words ((IdentifierToken name) : tokens)
>
> With this code I'm able to tokenize the different reserved keywords of th=
e GML language, binder and identifier (these two handle variable access).
> Now the problem: How to write the code to tokenize numbers (integer and d=
ouble) and arrays and functions. I hope it is clear what is meant with numb=
ers. Arrays and functions are embedded token sequences enclosed by "[" and =
"]" or "{" and "}".
> Some examples:
>
> tokenize' ["2", "2", "addi"] [] results in [IntToken 2, IntToken 2, AddiT=
oken]
> tokenize' ["2.0", "2.0", "addi"] [] results in [RealToken 2.0, RealToken =
2.0, AddiToken]
>
> tokenize' ["[", "foo", "/bar", "]"] results in [ArrayToken [IdentifierTok=
en "foo", BinderToken "bar"]]
>
> tokenize' ["{", "foo", "/bar", "}"] results in [FunctionToken [Identifier=
Token "foo", BinderToken "bar"]]
>
>
> -----Urspr=C3=BCngliche Nachricht-----
> Von: Vo Minh Thu [mailto:noteed at gmail.com]
> Gesendet: Freitag, 25. Februar 2011 17:44
> An: Hauschild, Klaus (EXT)
> Cc: haskell-cafe at haskell.org
> Betreff: Re: [Haskell-cafe] http://www.cs.cornell.edu/icfp/task.htm
>
> 2011/2/25 Hauschild, Klaus (EXT) <klaus.hauschild.ext at siemens.com>:
>> Hi,
>>
>> Currently I'm trying to lern Haskell by doing. After doing some examples=
 I
>> plan to solve an ICFP task (see subject). In short: build an interpreter=
 for
>> a stack based language thata describes a 3D scene. Ray trace this scene =
in
>> an image. My current source state can be found here:
>> http://code.google.com/p/hgmltracer/source/browse/
>> My first goal is to develop the interpreter of the GML language. My sour=
ce
>> contains a data GmlToken with various constructors for the different
>> operators, sequences, int, real, string an so on. Some code for the
>> evaluation is also there und working.
>> But the parser converting a string (contained in the program file provid=
ed
>> as command line argument) is hard stuff and I'm stuck.
>>
>> Can any one help me with ideas or concepts for this parser?
>
> Hi,
>
> You have chosen to develop a very interesting program and I'm sure a
> lot of people could help you if you face any problem. But it would be
> easier for those people to help you if you could be a bit more
> specific about what works (or what you understand) and where you have
> some problem.
>
> In this case, maybe you could copy in the mail some code (possibly
> cutting it down to what really matter).
>
> For instance, I think central to your mail is this data structure:
>
> data GmlToken =3D
>      IntToken Int
>    | RealToken Double
>    | BoolToken Bool
>    | StringToken String
>    | FunctionToken [GmlSequence]
>    | ArrayTokenToken [GmlSequence]
>    | BinderToken String
>    | IdentifierToken String
>    | AddiToken
>
> and the function
>
> parse :: [String] -> TokenSequence -> TokenSequence
> parse =3D ...
>
> At first view I would say that parse should be (and I call it tokenize)
>
> tokenize :: String -> [GmlToken]
>
> Now, maybe you can provide some sample inputs *with* their expected
> output and tell us what is really a problem for you.
>
> Alternatively I can assume too much (or not enough) things from your
> mail and you can enlighten me :)
>
> Cheers,
> Thu
>