[Haskell-cafe] Parsing unstructured data
olivier.boudry at gmail.com
Wed Nov 28 17:25:54 EST 2007
On 11/28/07, Grzegorz Chrupala <grzegorz.chrupala at computing.dcu.ie> wrote:
> You may have better luck checking out methods used in parsing natural
> language. In order to use statistical parsing techniques such as
> Probabilistic Context Free Grammars ([1], [2]), the standard approach is to
> extract rule probabilities from an annotated corpus, that is, a collection
> of strings with associated parse trees. Maybe you could use the 2/3 of
> addresses that you know are correctly parsed as your training material.
> A PCFG parser can output all (or the n-best) parses ordered by
> probability, so that would seem to fit your requirements.
> [1] http://en.wikipedia.org/wiki/Stochastic_context-free_grammar
> [2] http://www.cs.colorado.edu/~martin/slp2.html#Chapter14
Wow, Natural Language Processing looks quite complex! But it also seems to
be closely related to my problem. If someone finds an "NLP for dummies"
article or book, I'd be interested. ;-)
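
The rule-probability extraction step described in the quoted message can be
sketched in Haskell roughly as follows. This is only a minimal illustration
of relative-frequency estimation, P(A -> rhs) = count(A -> rhs) / count(A);
it is not a parser and is not taken from either reference. The Tree type,
the labels ADDR, NUM and STREET, and the three sample addresses are all
hypothetical, made up to stand in for the correctly parsed addresses.

```haskell
import qualified Data.Map.Strict as Map

-- A parse tree: an internal node carries a label and its children,
-- a leaf carries a single token. (Hypothetical type for illustration.)
data Tree = Node String [Tree] | Leaf String

-- A grammar rule: left-hand side symbol and right-hand side symbols.
type Rule = (String, [String])

-- The symbol at the root of a subtree (label or token).
rootLabel :: Tree -> String
rootLabel (Node l _) = l
rootLabel (Leaf w)   = w

-- Every rule used in a tree, one per internal node.
rules :: Tree -> [Rule]
rules (Leaf _)    = []
rules (Node l cs) = (l, map rootLabel cs) : concatMap rules cs

-- Relative-frequency estimate over an annotated corpus:
-- each rule count is divided by the count of its left-hand side.
ruleProbs :: [Tree] -> Map.Map Rule Double
ruleProbs trees = Map.mapWithKey prob ruleCounts
  where
    allRules   = concatMap rules trees
    ruleCounts = Map.fromListWith (+) [(r, 1) | r <- allRules]
    lhsCounts  = Map.fromListWith (+) [(lhs, 1) | (lhs, _) <- allRules]
    prob (lhs, _) c = c / (lhsCounts Map.! lhs)

-- Made-up training data standing in for already-parsed addresses.
sampleTrees :: [Tree]
sampleTrees =
  [ Node "ADDR" [Node "NUM" [Leaf "12"], Node "STREET" [Leaf "Main", Leaf "St"]]
  , Node "ADDR" [Node "STREET" [Leaf "Rue", Leaf "Neuve"], Node "NUM" [Leaf "5"]]
  , Node "ADDR" [Node "NUM" [Leaf "7"], Node "STREET" [Leaf "High", Leaf "St"]]
  ]
```

With these three sample trees, ruleProbs sampleTrees assigns 2/3 to
ADDR -> NUM STREET and 1/3 to ADDR -> STREET NUM. A real PCFG parser would
then use such probabilities to rank the candidate parses of a new string,
and would normally need smoothing for words it has never seen.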
Thanks for your help,