[Haskell-cafe] parsing machine-generated natural text

Udo Stenzel u.stenzel at web.de
Sat May 20 13:15:12 EDT 2006


Evan Martin wrote:
> Unfortunately, the output is intended to be human-readable, and this
> makes parsing it a bit of a pain.  Here are some sample lines from its
> output:
> 
> France: Army Marseilles SUPPORT Army Paris -> Burgundy.
> Russia: Fleet St Petersburg (south coast) -> Gulf of Bothnia.
> England:     4 Supply centers,  3 Units:  Builds   1 unit.
> The next phase of 'dip' will be Movement for Fall of 1901.

What's the difficulty?  "SUPPORT" and "CONVOY" are simply keywords, as
are "Army" and "Fleet"; the only other words are identifiers for
locations.  Parsec supports this out of the box; have a look at the
Language and Token modules.  Note that CONVOY orders can get complex, so
a true parser is probably the right tool.
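
As a rough sketch of that setup, the two order lines quoted above could be
handled as follows.  The types Unit and Order, the function parseOrder, and
the decision to treat a location as a run of identifiers with an optional
parenthesised coast are all assumptions made for illustration; CONVOY is
reserved but not parsed, and hyphenated location names are not handled.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as Tok

-- Hypothetical result types, named here for illustration only.
data UnitType = Army | Fleet deriving (Eq, Show)
data Unit     = Unit UnitType String deriving (Eq, Show)
data Order
  = Move Unit String                 -- unit, destination
  | Support Unit Unit (Maybe String) -- supporter, supported unit, its destination
  deriving (Eq, Show)

-- Token parser whose keywords are never accepted as identifiers.
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef
  { Tok.identStart    = letter
  , Tok.identLetter   = alphaNum
  , Tok.reservedNames = ["Army", "Fleet", "SUPPORT", "CONVOY"]
  }

-- A location is one or more plain words plus an optional "(... coast)".
location :: Parser String
location = do
  ws    <- many1 (Tok.identifier lexer)
  coast <- optionMaybe (Tok.parens lexer (many1 (Tok.identifier lexer)))
  return (unwords ws ++ maybe "" (\c -> " (" ++ unwords c ++ ")") coast)

unit :: Parser Unit
unit = Unit <$> unitType <*> location
  where
    unitType = (Army  <$ Tok.reserved lexer "Army")
           <|> (Fleet <$ Tok.reserved lexer "Fleet")

-- "Power: Unit -> Dest." or "Power: Unit SUPPORT Unit [-> Dest]."
order :: Parser (String, Order)
order = do
  power <- Tok.identifier lexer
  _     <- Tok.colon lexer
  u     <- unit
  o     <- support u <|> move u
  _     <- Tok.dot lexer
  return (power, o)
  where
    arrow     = Tok.symbol lexer "->"
    move u    = Move u <$> (arrow *> location)
    support u = do
      Tok.reserved lexer "SUPPORT"
      target <- unit
      dest   <- optionMaybe (arrow *> location)
      return (Support u target dest)

parseOrder :: String -> Either ParseError (String, Order)
parseOrder = parse (Tok.whiteSpace lexer *> order <* eof) "order"
```

Because identifier refuses reserved names, many1 stops by itself at
"SUPPORT", so multi-word locations need no special lookahead.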


> And that "Supply centers" line ends up being
> code filled with stuff like "char ':'; skipMany space".

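do p       <- power
   colon
   centers <- integer
   reserved "Supply centers,"
   units   <- integer
   reserved "Units:"
   -- Builds count as positive, Disbands as negative
   sign    <- (reserved "Builds"   >> return id) <|>
              (reserved "Disbands" >> return negate)
   n       <- integer
   reserved "units." <|> reserved "unit."
   return (p, centers, units, sign n)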
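
For completeness, here is one way the fragment above might be turned into
a compilable module.  It is only a sketch: the Adjustment record, the
function parseAdjustment, and the use of a plain identifier for the power
name are assumptions, and the lexer is simply the default emptyDef one.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as Tok

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef

-- Hypothetical record for one summary line.
data Adjustment = Adjustment
  { adjPower   :: String
  , adjCenters :: Integer
  , adjUnits   :: Integer
  , adjDelta   :: Integer   -- positive for builds, negative for disbands
  } deriving (Eq, Show)

adjustment :: Parser Adjustment
adjustment = do
  p       <- Tok.identifier lexer
  _       <- Tok.colon lexer
  centers <- Tok.integer lexer
  reserved "Supply centers,"
  units   <- Tok.integer lexer
  reserved "Units:"
  sign    <- (id <$ reserved "Builds") <|> (negate <$ reserved "Disbands")
  n       <- Tok.integer lexer
  -- reserved backtracks on failure, so the longer alternative can go first
  reserved "units." <|> reserved "unit."
  return (Adjustment p centers units (sign n))
  where
    reserved = Tok.reserved lexer

parseAdjustment :: String -> Either ParseError Adjustment
parseAdjustment = parse (Tok.whiteSpace lexer *> adjustment <* eof) "adjustment"
```

Every token parser skips trailing whitespace on its own, which is why the
irregular spacing in the sample line needs no explicit "skipMany space".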


Come on, it isn't nearly as bad as you make it sound.  Use the
combinators, they are far more powerful than ugly never-quite-correct
regexes.
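
To illustrate with the remaining sample line, the "next phase" sentence can
be taken apart with nothing but the basic combinators.  This is a sketch
under assumptions: the Phase record, the helper word, and parsePhase are
invented names, and the sentence is assumed to have exactly the shape of
the one quoted example.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical record for the "next phase" announcement.
data Phase = Phase
  { game   :: String
  , kind   :: String
  , season :: String
  , year   :: Int
  } deriving (Eq, Show)

-- Match a fixed word and skip whatever whitespace follows it.
word :: String -> Parser ()
word s = try (string s) *> spaces

phaseLine :: Parser Phase
phaseLine = do
  mapM_ word ["The", "next", "phase", "of"]
  g <- between (char '\'') (char '\'') (many1 (noneOf "'")) <* spaces
  mapM_ word ["will", "be"]
  k <- many1 letter <* spaces
  word "for"
  s <- many1 letter <* spaces
  word "of"
  y <- read <$> many1 digit   -- safe: many1 digit yields only digits
  _ <- char '.'
  return (Phase g k s y)

parsePhase :: String -> Either ParseError Phase
parsePhase = parse (spaces *> phaseLine <* spaces <* eof) "phase"
```

On malformed input the parser reports the position and what it expected,
which a regex match would not tell you.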

Oh, and drop me a line when your Diplomacy bot is finished.


Udo.
-- 
Every genuine competition is ruinous. That is why every functioning
economy rests on rigging.