[Haskell-cafe] NLP libraries and tools?

wren ng thornton wren at freegeek.org
Wed Jul 6 18:32:27 CEST 2011


On 7/6/11 9:27 AM, Dmitri O.Kondratiev wrote:
> Hi,
> Continuing my search of Haskell NLP tools and libs, I wonder if the
> following Haskell libraries exist (googling them does not help):
> 1) End of Sentence (EOS) Detection. Break text into a collection of
> meaningful sentences.

Depending on what you mean, this is either fairly trivial (for English) or
an ill-defined problem. Determining whether a "." character is intended as
a full stop or as part of an abbreviation is trivial.
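
As a rough illustration of the abbreviation case only, here is a minimal
sketch. It is not an existing library: the abbreviation list and the names
endsSentence and splitSentences are invented for this example, and it
assumes whitespace-separated English text.

```haskell
import Data.Char (toLower)

-- Hypothetical, deliberately tiny abbreviation list (lower-cased).
-- A usable list would be much larger and domain-specific.
abbreviations :: [String]
abbreviations = ["mr.", "mrs.", "dr.", "prof.", "e.g.", "i.e.", "etc.", "vs."]

-- A token closes a sentence if it ends in '.', '!' or '?'
-- and is not one of the known abbreviations.
endsSentence :: String -> Bool
endsSentence tok =
  not (null tok)
    && last tok `elem` ".!?"
    && map toLower tok `notElem` abbreviations

-- Split on whitespace, then cut the token stream after every
-- sentence-closing token. Original spacing is not preserved.
splitSentences :: String -> [String]
splitSentences = map unwords . go . words
  where
    go [] = []
    go ts = case break endsSentence ts of
      (pre, [])         -> [pre]
      (pre, end : rest) -> (pre ++ [end]) : go rest
```

With these definitions, splitSentences "Dr. Smith arrived. He sat down."
yields ["Dr. Smith arrived.", "He sat down."]. The sketch never breaks after
a listed abbreviation, even one that really does end a sentence, and it does
not recognize a full stop followed by a closing quote mark.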

But for general sentence breaking, how do you intend to deal with
quotations? What about when news articles quote someone uttering a few
sentences before the end-quote marker? So far as I'm aware, there's no
satisfactory definition of what the solution should be in all reasonable
cases. A "sentence" isn't really very well-defined in practice.

> 2) Part-of-Speech (POS) Tagging. Assign part-of-speech information to each
> token.

There are numerous approaches to this problem; do you care which approach
is used, or will any one of them suffice?

I've been working over the last year+ on an optimized HMM-based POS
tagger/supertagger with online tagging and anytime n-best tagging. I'm
planning to release it this summer (i.e., by the end of August), though
there are a few things I'd like to polish up before doing so. In
particular, I want to make the package less monolithic. When I release it
I'll make announcements here and on the nlp@ list.
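
For readers unfamiliar with HMM tagging, the core idea can be sketched with
plain Viterbi decoding. This is not the tagger described above: it is
offline and 1-best only, it works on raw probabilities rather than in log
space (so it would underflow on long inputs), and the HMM record, the toy
model, and all numbers in it are made up for illustration.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

type Tag = String

-- Hypothetical model interface; a real tagger would estimate
-- these distributions from an annotated corpus.
data HMM = HMM
  { tags  :: [Tag]
  , start :: Tag -> Double            -- P(tag) at the first position
  , trans :: Tag -> Tag -> Double     -- P(next tag | previous tag)
  , emit  :: Tag -> String -> Double  -- P(word | tag)
  }

-- Viterbi decoding: for every tag keep the best-scoring path ending
-- in that tag (stored in reverse), extending it one word at a time.
viterbi :: HMM -> [String] -> [Tag]
viterbi _ [] = []
viterbi hmm (w : ws) =
  reverse . snd . maximumBy (comparing fst) $ foldl step initial ws
  where
    initial = [ (start hmm t * emit hmm t w, [t]) | t <- tags hmm ]
    step prev word =
      [ (p * emit hmm t word, t : path)
      | t <- tags hmm
      , let (p, path) =
              maximumBy (comparing fst)
                [ (score * trans hmm lastTag t, pth)
                | (score, pth@(lastTag : _)) <- prev ]
      ]

-- Toy two-tag model with invented probabilities.
toy :: HMM
toy = HMM
  { tags  = ["N", "V"]
  , start = \t -> if t == "N" then 0.8 else 0.2
  , trans = \a b -> if a == b then 0.3 else 0.7
  , emit  = \t word -> case (t, word) of
      ("N", "dogs") -> 0.6
      ("N", "bark") -> 0.1
      ("V", "dogs") -> 0.1
      ("V", "bark") -> 0.7
      _             -> 0.01
  }
```

Under this toy model, viterbi toy ["dogs", "bark"] gives ["N", "V"].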


> 3) Chunking. Analyze each tagged token within a sentence and assemble
> compound tokens that express logical concepts. Define a custom grammar.
>
> 4) Extraction. Analyze each chunk and further tag the chunks as named
> entities, such as people, organizations, locations, etc.
>
> Any ideas where to look for similar Haskell libraries?

I don't know of any work in these areas in Haskell (though I'd love to
hear about it). You should try asking on the nlp@ list, where the other
linguists and NLPers are more likely to see it.

-- 
Live well,
~wren



