[Haskell-cafe] parsing machine-generated natural text

Bulat Ziganshin bulat.ziganshin at gmail.com
Sat May 20 01:22:49 EDT 2006


Hello Evan,

Saturday, May 20, 2006, 5:35:15 AM, you wrote:

> France: Army Marseilles SUPPORT Army Paris -> Burgundy.
> Russia: Fleet St Petersburg (south coast) -> Gulf of Bothnia.
> England:     4 Supply centers,  3 Units:  Builds   1 unit.
> The next phase of 'dip' will be Movement for Fall of 1901.

> I've been using Parsec and it's felt rather complicated.  For example,

I have some experience parsing such human-readable, imprecise texts,
and I should say that regexps were developed for exactly such jobs. GHC
and Hugs already include a regex library in the module Text.Regex.Posix
(it's available on all systems, including Windows). This library has a
rather dumb interface, so I recommend installing the JRegex library by
John Meacham, which supports the familiar =~ operators. There is also
the Text.Regex.Lazy module:

   Text.Regex.Lazy (0.33). Chris Kuklewicz [6] announced the release
   of [7] Text.Regex.Lazy. This is an alternative to Text.Regex along
   with some enhancements. GHC's Text.Regex marshals the data back
   and forth to C arrays, to call libc. This is far too slow (and
   strict). This module understands regular expression Strings via a
   Parsec parser and creates an internal data structure
   (Text.Regex.Lazy.Pattern). This is then transformed into a Parsec
   parser to process the input String, or into a DFA table for
   matching against the input String or FastPackedString. The input
   string is consumed lazily, so it may be an arbitrarily long or
   infinite source.

   6. http://article.gmane.org/gmane.comp.lang.haskell.libraries/4464
   7. http://sourceforge.net/projects/lazy-regex
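
As a rough illustration of the regex approach for lines like the ones
quoted above, here is a minimal sketch. It assumes the Text.Regex module
(mkRegex, matchRegex) from the regex-compat package is installed; the
names Adjustment, parseAdjustment and parsePhase are made up for this
example, and the patterns are guessed from the two sample lines only, so
real judge output may need looser patterns.

```haskell
import Text.Regex (Regex, mkRegex, matchRegex)

-- Hypothetical record for a line such as
-- "England:     4 Supply centers,  3 Units:  Builds   1 unit."
data Adjustment = Adjustment
  { power   :: String
  , centers :: Int
  , units   :: Int
  , action  :: String   -- e.g. "Builds"; other verbs are not shown in the sample
  , amount  :: Int
  } deriving (Eq, Show)

-- POSIX extended regex; " +" absorbs the variable runs of spaces.
adjustmentRe :: Regex
adjustmentRe = mkRegex
  "^([A-Za-z]+): +([0-9]+) Supply centers, +([0-9]+) Units: +([A-Za-z]+) +([0-9]+) units?\\.$"

parseAdjustment :: String -> Maybe Adjustment
parseAdjustment line =
  case matchRegex adjustmentRe line of
    -- the digit groups matched [0-9]+, so read cannot fail here
    Just [p, c, u, a, n] -> Just (Adjustment p (read c) (read u) a (read n))
    _                    -> Nothing

-- "The next phase of 'dip' will be Movement for Fall of 1901."
phaseRe :: Regex
phaseRe = mkRegex
  "^The next phase of '([^']+)' will be ([A-Za-z]+) for ([A-Za-z]+) of ([0-9]+)\\.$"

-- Returns (game name, phase, season, year).
parsePhase :: String -> Maybe (String, String, String, Int)
parsePhase line =
  case matchRegex phaseRe line of
    Just [game, phase, season, year] -> Just (game, phase, season, read year)
    _                                -> Nothing
```

Each parser returns Nothing for lines it does not recognise, so a report
can be processed by trying the parsers in turn on every line and keeping
whatever matches. The order lines ("Army Marseilles SUPPORT ...") would
need their own pattern, which is not attempted here.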

-- 
Best regards,
 Bulat                            mailto:Bulat.Ziganshin at gmail.com


