[Haskell-cafe] NLP libraries and tools?

Richard O'Keefe ok at cs.otago.ac.nz
Thu Jul 7 02:46:38 CEST 2011


On 7/07/2011, at 7:04 AM, Dmitri O.Kondratiev wrote:
> I am looking for a Haskell implementation of a sentence tokenizer such as the one described by Tibor Kiss and Jan Strunk in “Unsupervised Multilingual Sentence Boundary Detection”, which is implemented in NLTK:

That method is multilingual but relies on the text being written using
fairly modern Western conventions, and tackles the problem of "too many
dots" and not knowing which are abbreviation points and which full stops.
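
To make the "too many dots" problem concrete, here is a minimal sketch in
Haskell. It is NOT the Kiss and Strunk method: their approach learns
abbreviations from unannotated text, whereas this sketch assumes a small
hand-written abbreviation list and a crude "next word is capitalised" test.
The names abbreviations, endsSentence and splitSentences are made up for
illustration.

```haskell
import Data.Char (isUpper, toLower)

-- Hypothetical, hand-written abbreviation list. The Kiss and Strunk method
-- would learn such a list from the corpus instead of hard-coding it.
abbreviations :: [String]
abbreviations = ["dr.", "mr.", "mrs.", "prof.", "e.g.", "i.e.", "etc."]

-- A token ends a sentence if it ends in '.', is not a known abbreviation,
-- and the following token (if there is one) starts with an upper-case letter.
endsSentence :: String -> Maybe String -> Bool
endsSentence tok next =
  not (null tok)
    && last tok == '.'
    && map toLower tok `notElem` abbreviations
    && maybe True startsUpper next
  where
    startsUpper (c:_) = isUpper c
    startsUpper []    = False

-- Split whitespace-separated text into sentences.
splitSentences :: String -> [String]
splitSentences = map unwords . go [] . words
  where
    go acc []       = [reverse acc | not (null acc)]
    go acc (t:rest)
      | endsSentence t (headMay rest) = reverse (t : acc) : go [] rest
      | otherwise                     = go (t : acc) rest
    headMay (x:_) = Just x
    headMay []    = Nothing
```

With these definitions, splitSentences "Dr. Smith arrived at noon. He left early."
gives ["Dr. Smith arrived at noon.","He left early."]. The dot after "Dr" is
treated as an abbreviation point and the other two as full stops. A fixed list
like this misses an abbreviation that really does end a sentence, which is
part of what makes the problem hard.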

I don't suppose anyone knows of something that might help with the problem
of too few dots?  Run-on sentences are one example.
> 
> I've been working over the last year+ on an optimized HMM-based POS
> tagger/supertagger with online tagging and anytime n-best tagging. I'm
> planning to release it this summer (i.e., by the end of August), though
> there are a few things I'd like to polish up before doing so. In
> particular, I want to make the package less monolithic. When I release it
> I'll make announcements here and on the nlp@ list.

One of the issues I've had with a POS tagger I've been using is that it
makes some really stupid decisions which can be patched up with a few
simple rules, but since it's distributed as a .jar file I cannot add
those rules.
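
The kind of patching I have in mind could be sketched in Haskell as a pass
over the tagger's output. Everything here is an assumption for illustration:
the Tagged and Rule types, the Penn-Treebank-style tag names, and the single
example rule are not taken from any particular tagger.

```haskell
-- A tagged token: (word, tag).
type Tagged = (String, String)

-- A rule looks at the previous tagged token (if any) and the current one,
-- and may propose a replacement tag for the current token.
type Rule = Maybe Tagged -> Tagged -> Maybe String

-- Hypothetical rule: a token tagged as a noun directly after "to"
-- is retagged as a base-form verb.
toThenVerb :: Rule
toThenVerb (Just ("to", _)) (_, "NN") = Just "VB"
toThenVerb _ _                        = Nothing

-- Apply the first rule that fires to each token. Rules see the tagger's
-- original tag for the previous token, not a patched one.
patch :: [Rule] -> [Tagged] -> [Tagged]
patch rules toks = zipWith fix (Nothing : map Just toks) toks
  where
    fix prev cur@(w, _) =
      case [t' | Just t' <- map (\r -> r prev cur) rules] of
        (t':_) -> (w, t')
        []     -> cur
```

For example, patch [toThenVerb] [("I","PRP"),("want","VBP"),("to","TO"),("run","NN")]
changes only the last pair, to ("run","VB"). A tagger that exposed a hook of
this shape would let users add such rules without touching its internals.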




