[Haskell-cafe] parsing machine-generated natural text
martine at danga.com
Fri May 19 21:35:15 EDT 2006
For a toy project I want to parse the output of a program. The
program runs on someone else's machine and mails me the results, so I
only have access to the output it generates.
Unfortunately, the output is intended to be human-readable, and this
makes parsing it a bit of a pain. Here are some sample lines from its
output:
France: Army Marseilles SUPPORT Army Paris -> Burgundy.
Russia: Fleet St Petersburg (south coast) -> Gulf of Bothnia.
England: 4 Supply centers, 3 Units: Builds 1 unit.
The next phase of 'dip' will be Movement for Fall of 1901.
I've been using Parsec and it's felt rather complicated. For example,
a "location" is a series of words, possibly with parentheses, where no
word may be SUPPORT. And that "Supply centers" line ends up being
code filled with stuff like "char ':'; skipMany space".
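
One way the Parsec side might be tamed is a small "lexeme" helper that
swallows trailing spaces, plus notFollowedBy for the SUPPORT exception.
What follows is only a sketch under assumptions: the types (UnitType,
Unit, Order) and every parser name are hypothetical, and it covers just
the sample lines above (the "Builds" form only), not the program's full
output.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical types, for illustration only.
data UnitType = Army | Fleet deriving (Show, Eq)
data Unit = Unit UnitType String deriving (Show, Eq)
data Order
  = Move Unit String          -- unit -> destination
  | Support Unit Unit String  -- unit SUPPORT other unit -> destination
  deriving (Show, Eq)

-- Run a parser, then swallow trailing spaces, so callers never
-- have to write "skipMany space" by hand.
lexeme :: Parser a -> Parser a
lexeme p = p <* skipMany (char ' ')

-- A fixed piece of text such as "->" or "Units:".
symbol :: String -> Parser String
symbol = lexeme . try . string

-- A whole-word keyword such as SUPPORT or Army.
keyword :: String -> Parser ()
keyword s = lexeme (try (string s *> notFollowedBy letter))

number :: Parser Int
number = lexeme (read <$> many1 digit)

-- The "France:" prefix at the start of a line.
power :: Parser String
power = lexeme (many1 letter <* char ':')

-- One word of a location: letters and parentheses, but never SUPPORT.
locWord :: Parser String
locWord =
  lexeme (notFollowedBy (keyword "SUPPORT") *> many1 (letter <|> oneOf "()"))

location :: Parser String
location = unwords <$> many1 locWord

unit :: Parser Unit
unit = Unit <$> unitType <*> location
  where
    unitType = (Army <$ keyword "Army") <|> (Fleet <$ keyword "Fleet")

-- "France: Army Marseilles SUPPORT Army Paris -> Burgundy."
order :: Parser (String, Order)
order = do
  p <- power
  u <- unit
  o <- support u <|> move u
  _ <- char '.' <* eof
  return (p, o)
  where
    support u =
      Support u <$> (keyword "SUPPORT" *> unit) <*> (symbol "->" *> location)
    move u = Move u <$> (symbol "->" *> location)

-- "England: 4 Supply centers, 3 Units: Builds 1 unit."
-- Only the "Builds" form from the sample is handled here.
adjustment :: Parser (String, Int, Int, Int)
adjustment = do
  p <- power
  centers <- number <* symbol "Supply centers,"
  units <- number <* symbol "Units:"
  builds <- symbol "Builds" *> number
  _ <- many1 letter *> char '.' <* eof
  return (p, centers, units, builds)
```

With this, parse order "" applied to the Russia line should give a Move
of a Fleet at "St Petersburg (south coast)" to "Gulf of Bothnia", and
the France line a Support. Whether this ends up shorter than a regexp
version is a separate question.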
regular expressions and it's far shorter than my Haskell one, which
makes sense as munging this sort of text feels to me more like a
regexp job than a careful parsing job.
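
For comparison, the same regexp-style munging can be approximated in
plain Haskell by splitting each line into words and breaking the list
at the "->" and SUPPORT tokens. This is a minimal sketch that assumes
the orders always look like the two sample order lines above; the names
mungeOrder, splitOn1 and dropTrailing are made up for illustration.

```haskell
-- Split a token list at the first occurrence of a separator token.
splitOn1 :: String -> [String] -> Maybe ([String], [String])
splitOn1 sep ws = case break (== sep) ws of
  (before, _ : after) -> Just (before, after)
  _                   -> Nothing

-- Remove a required trailing character, e.g. the ':' in "France:".
dropTrailing :: Char -> String -> Maybe String
dropTrailing c s
  | not (null s) && last s == c = Just (init s)
  | otherwise                   = Nothing

-- Returns (power, unit, supported unit if any, destination),
-- or Nothing for lines that are not orders of this shape.
mungeOrder :: String -> Maybe (String, String, Maybe String, String)
mungeOrder line = case words line of
  [] -> Nothing
  (powerTok : toks) -> do
    power       <- dropTrailing ':' powerTok
    (lhs, rhs)  <- splitOn1 "->" toks
    destination <- dropTrailing '.' (unwords rhs)
    let (unit, supported) = case splitOn1 "SUPPORT" lhs of
          Just (u, s) -> (unwords u, Just (unwords s))
          Nothing     -> (unwords lhs, Nothing)
    return (power, unit, supported, destination)
```

Lines without a "->", such as the "Supply centers" line, simply come
back as Nothing, so each line can be tried against a few such functions
in turn.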
I'm considering writing a preprocessing stage in Ruby or Perl that
munges those output lines into something a bit more
"machine-readable", but before I did that I thought I'd ask here if
anyone had any pointers, hints, or better ideas.