[Haskell-cafe] Parsing unstructured data

Olivier Boudry olivier.boudry at gmail.com
Mon Dec 3 13:18:58 EST 2007


On 12/2/07, Steven Fodstad <flarelocke at hotpop.com> wrote:
>
> Sorry for not responding earlier.  The haskell-cafe list is hard to keep
> up with.
>
> The process of finding geographic (lat/long) coordinates from a text
> address is called geocoding.  Obviously extracting the parts of an
> address is part of that, so you might find better results looking for
> geocoding, rather than the more general and more difficult topic of
> extracting structure from unstructured data.  Unfortunately, I don't
> have any references at hand on that part of geocoding.
>

Hi Steven,

The idea of using the geocoding approach seems appealing. I already thought
of using geocoding for address validation (after the parsing) but not of
looking at how geocoding tools parse addresses. But I'm not sure geocoding
tools would be suitable to handle my addresses. I used a few geocoding tools
and usually you have to provide the address in a very specific format if you
want it to be recognized. Also, most of the time it works quite well for US
addresses but not for addresses in other countries.

I need to recognize very specific parts of an address, more than a
geocoding tool would require: dock #, doors, suite #, contact person,
etc.
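
To make the kind of components listed above concrete, here is a minimal
Haskell sketch of tagging address fragments. Everything in it is an
assumption made for illustration: the Part type, the keyword table and the
function names are hypothetical, and it assumes the input is already split
into comma-separated fragments that each start with a recognizable keyword.
Badly scrambled addresses would need far more than this.

```haskell
import Data.Char (isSpace, toLower)
import Data.List (dropWhileEnd)

-- Hypothetical address components; the constructor names are illustrative.
data Part
  = Dock String
  | Door String
  | Suite String
  | Contact String
  | Other String      -- anything that no keyword matched
  deriving (Eq, Show)

-- Strip leading and trailing whitespace.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Split a line on commas, keeping empty fragments.
splitOnComma :: String -> [String]
splitOnComma s = case break (== ',') s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnComma rest

-- Assumed keyword table (lower case); not taken from any standard.
keywords :: [(String, String -> Part)]
keywords =
  [ ("dock",  Dock)
  , ("door",  Door)
  , ("doors", Door)
  , ("suite", Suite)
  , ("ste",   Suite)
  , ("attn",  Contact)
  , ("c/o",   Contact)
  ]

-- Classify one fragment by its leading keyword, compared case-insensitively.
-- The keyword ends at the first space, '#', ':' or '.'.
classify :: String -> Part
classify raw =
  case lookup (map toLower key) keywords of
    Just con -> con (clean rest)
    Nothing  -> Other frag
  where
    frag        = trim raw
    isSep c     = isSpace c || c `elem` "#:."
    (key, rest) = break isSep frag
    clean       = trim . dropWhile isSep

-- Tag every comma-separated fragment of an address line.
tagAddress :: String -> [Part]
tagAddress = map classify . splitOnComma
```

Loaded in GHCi, tagAddress "123 Main St, Dock #4, Suite 200, Attn: Jane Doe"
evaluates to [Other "123 Main St",Dock "4",Suite "200",Contact "Jane Doe"].
Fragments with no known keyword fall through to Other, so nothing is
silently dropped and they can be handed to a later, smarter pass.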

I'm currently using the ZipFourCE web service from BCCSoftware for
validating my addresses against the USPS address database. This tool is
built for parsing and correcting addresses, but I just use it for validation
as it's not "smart enough" to parse them, or maybe they are just too
scrambled for the parsing to be automated using an out-of-the-box tool. ;-)

Thanks for your input,

Olivier.