[Haskell-cafe] NLP libraries and tools?

wren ng thornton wren at freegeek.org
Fri Jul 8 04:37:29 CEST 2011


On 7/7/11 3:38 AM, Aleksandar Dimitrov wrote:
> On Wed, Jul 06, 2011 at 07:27:10PM -0700, wren ng thornton wrote:
>> I definitely agree with the iteratees comment, but I'm curious about the
>> leaks you mention. I haven't run into leakiness issues (that I'm aware of)
>> in my use of ByteStrings for NLP.
>
> The issue is this: strict ByteStrings retain pointers to the original chunk. The
> chunk is probably bigger than you'd want to keep in memory, if you, say, wanted
> to just keep one or two words. In my case, the chunk was some 65K (that was my
> Iteratee chunk size).
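A minimal sketch of the retention issue described above, assuming the `Data.ByteString.Char8` interface from the bytestring package; the helper names `wordsRetaining` and `wordsTrimmed` are hypothetical. Strict `B.words` returns O(1) slices that point into the chunk's buffer, so keeping any one word keeps the whole chunk alive, while `B.copy` allocates a fresh buffer sized to the word.

```haskell
import qualified Data.ByteString.Char8 as B

-- | Hypothetical helper: each word is a slice sharing the chunk's buffer,
-- so holding on to even one word keeps the entire chunk reachable.
wordsRetaining :: B.ByteString -> [B.ByteString]
wordsRetaining = B.words

-- | Hypothetical helper: B.copy gives each word its own buffer of exactly
-- the word's length, so the chunk can be collected once the slices are dropped.
wordsTrimmed :: B.ByteString -> [B.ByteString]
wordsTrimmed = map B.copy . B.words
```

Both functions return words with the same contents; they differ only in which buffer the results keep alive. (Load in GHCi to try them.)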

Oh, that issue. Yeah, I maintain an intern table and make sure that the
copy in the table is a trimmed copy instead of keeping the whole string
alive. I guess I should factor that part of my tagger out into a separate
package :)
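One way to build a trimmed intern table like the one described above, as a sketch only and not the actual code from the tagger: a hypothetical `InternTable` (built on `Data.Map.Strict` from the containers package) maps each distinct word to a single copied `ByteString`. Repeated tokens then share one small buffer, and none of them pin the input chunk.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.Map.Strict as M
import Data.List (mapAccumL)

-- | Hypothetical intern table: each distinct word maps to one trimmed copy.
type InternTable = M.Map B.ByteString B.ByteString

-- | Return the shared copy of a word, inserting a trimmed copy if it is new.
-- The lookup compares by content, so a slice of the chunk finds its entry.
intern :: InternTable -> B.ByteString -> (InternTable, B.ByteString)
intern tbl w = case M.lookup w tbl of
  Just shared -> (tbl, shared)
  Nothing     ->
    let w' = B.copy w          -- fresh buffer, independent of the chunk
    in (M.insert w' w' tbl, w')

-- | Intern every word of a chunk, threading the table through.
internAll :: InternTable -> [B.ByteString] -> (InternTable, [B.ByteString])
internAll = mapAccumL intern
```

Storing the copy as both key and value matters: if the slice itself were used as the key, the table would keep the chunk alive.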

I didn't know if you meant there was a technical issue, e.g. something
about the fact that ByteStrings use pinned memory (whereas Text doesn't,
IIRC).

-- 
Live well,
~wren
