[Haskell-cafe] parsing machine-generated natural text
bulat.ziganshin at gmail.com
Sat May 20 01:22:49 EDT 2006
Saturday, May 20, 2006, 5:35:15 AM, you wrote:
> France: Army Marseilles SUPPORT Army Paris -> Burgundy.
> Russia: Fleet St Petersburg (south coast) -> Gulf of Bothnia.
> England: 4 Supply centers, 3 Units: Builds 1 unit.
> The next phase of 'dip' will be Movement for Fall of 1901.
> I've been using Parsec and it's felt rather complicated. For example,
I have experience parsing such human-readable, imprecise texts, and I
should say that regexps were developed for just such jobs. GHC and Hugs
already contain a regex library in the module Text.Regex.Posix (it's
available on all systems, including Windows). That lib has a rather dumb
interface, though; I recommend installing the JRegex lib by John Meacham,
which supports the familiar =~ operator. There is also Text.Regex.Lazy
(0.33), whose release Chris Kuklewicz announced:

This is an alternative to Text.Regex along
with some enhancements. GHC's Text.Regex marshals the data back
and forth to C arrays, to call libc. This is far too slow (and
strict). This module understands regular expression Strings via a
Parsec parser and creates an internal data structure
(Text.Regex.Lazy.Pattern). This is then transformed into a Parsec
parser to process the input String, or into a DFA table for
matching against the input String or FastPackedString. The input
string is consumed lazily, so it may be arbitrarily long, or even infinite.
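To make the suggestion concrete, here is a minimal sketch of parsing one
of the sample order lines with the =~ operator (the operator JRegex
provides; later regex-posix releases export it from Text.Regex.Posix
directly). The regex, the field layout, and the function name are my own
reading of the sample output, not part of any library:

```haskell
import Text.Regex.Posix ((=~))

-- POSIX ERE for a support order such as
--   "France: Army Marseilles SUPPORT Army Paris -> Burgundy."
-- Capture groups: nation, unit type, unit place,
-- supported unit type, supported place, destination.
supportRe :: String
supportRe = "^([A-Za-z]+): (Army|Fleet) ([A-Za-z ]+) SUPPORT "
         ++ "(Army|Fleet) ([A-Za-z ]+) -> ([A-Za-z ]+)\\.$"

-- Returns (nation, unit type, unit place, supported place, destination),
-- or Nothing if the line is not a support order.
parseSupport :: String -> Maybe (String, String, String, String, String)
parseSupport line =
  -- The (before, match, after, submatches) context: on failure the
  -- submatch list is empty, so the pattern match falls through.
  case line =~ supportRe :: (String, String, String, [String]) of
    (_, _, _, [nation, utype, place, _stype, sfrom, dest]) ->
      Just (nation, utype, place, sfrom, dest)
    _ -> Nothing
```

In GHCi, `parseSupport "France: Army Marseilles SUPPORT Army Paris -> Burgundy."` should yield the five captured fields wrapped in Just, and any non-matching line yields Nothing. Each order form (move, support, convoy, build) would get its own small regex in the same style.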
Bulat mailto:Bulat.Ziganshin at gmail.com