[Haskell-cafe] NLP libraries and tools?
Rogan Creswick
creswick at gmail.com
Sat Jul 2 00:03:36 CEST 2011
On Fri, Jul 1, 2011 at 2:52 PM, Dmitri O.Kondratiev <dokondr at gmail.com> wrote:
> Is there any Haskell word tokenizer other than 'toktok' that compiles and works?
> I need something like:
> http://nltk.googlecode.com/svn/trunk/doc/api/nltk.tokenize.regexp.WordPunctTokenizer-class.html
>
I don't think this exists out of the box, but since it appears to be a
basic regex tokenizer, you could build one with Data.List.Split (or
one of the regex libraries may be able to do this more simply).
If you go the Data.List.Split route, I suspect you'll want to build
your Splitter on top of whenElt:
http://hackage.haskell.org/packages/archive/split/0.1.1/doc/html/Data-List-Split.html#v:whenElt
whenElt takes a predicate, i.e. a function from an element to a Bool.
You can implement that predicate however you wish, possibly with a
regular expression, although it has to be pure.
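
For illustration, here is a minimal sketch of that idea. It assumes the
split package is installed, and the names splitPunct and
wordPunctTokenize are made up for this example. It only approximates
the NLTK pattern \w+|[^\w\s]+ by treating alphanumerics and '_' as word
characters, so I would not expect it to match NLTK on every input:

```haskell
import Data.Char (isAlphaNum)
import Data.List.Split (condense, dropBlanks, split, whenElt)

-- Hypothetical helper: split one whitespace-free chunk into runs of
-- word characters and runs of punctuation. Every non-word character is
-- a delimiter; 'condense' merges adjacent delimiters into one token
-- and 'dropBlanks' removes the empty chunks around them.
splitPunct :: String -> [String]
splitPunct = split (dropBlanks . condense $ whenElt isPunct)
  where
    -- Assumption: "word character" means alphanumeric or underscore.
    isPunct c = not (isAlphaNum c || c == '_')

-- Hypothetical top-level tokenizer: 'words' discards the whitespace,
-- then each remaining chunk is split at its punctuation runs.
wordPunctTokenize :: String -> [String]
wordPunctTokenize = concatMap splitPunct . words
```

In GHCi, wordPunctTokenize "Good muffins cost $3.88" should give
["Good","muffins","cost","$","3",".","88"]. Whitespace is handled by
words rather than by the Splitter, which keeps the predicate simple.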
If you want something like a maxent tokenizer, then you're currently
out of luck :( (as far as I know).
--Rogan