utf8 strings: memory optimization and case-ignoring comparison

Fri Dec 16 04:38:19 EST 2005

Hello Simon,

Thursday, December 15, 2005, 12:44:35 PM, you wrote:

>> 2. what is the most memory-efficient representation for such strings?
>>
>> data UArray i e = UArray !i !i ByteArray#

SM> I don't know why an extra 8/16 bytes per string is that worrying - if
SM> you have so many small strings perhaps you should be sharing them via a
SM> hash table?

I also use a hash table in another part of the program, but in this
list (the basenames of files on disk) 70% of the strings are unique.
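
To make the sharing idea concrete, here is a minimal sketch of what
such a table does. It is not my real code: the names InternTable,
intern, internAll and uniqueRatio are made up for illustration, and
Data.Map stands in for a real hash table. Here uniqueRatio counts
distinct strings over all strings, which is one possible reading of
"unique".

```haskell
import qualified Data.Map.Strict as Map
import Data.List (foldl')

-- Hypothetical intern table: maps each distinct string to its one shared copy.
type InternTable = Map.Map String String

-- Return the shared copy of a name, adding it to the table on first sight.
intern :: InternTable -> String -> (InternTable, String)
intern table s =
  case Map.lookup s table of
    Just shared -> (table, shared)
    Nothing     -> (Map.insert s s table, s)

-- Intern a whole list, keeping the original order of the names.
internAll :: [String] -> (InternTable, [String])
internAll names = (table, reverse acc)
  where
    (table, acc) = foldl' step (Map.empty, []) names
    step (t, out) s = let (t', s') = intern t s in (t', s' : out)

-- Fraction of distinct strings among all names (0 for an empty list).
uniqueRatio :: [String] -> Double
uniqueRatio [] = 0
uniqueRatio names =
  fromIntegral (Map.size (fst (internAll names))) / fromIntegral (length names)
```

When the ratio is high, as in my list, the table holds almost as many
strings as the list itself, so sharing saves little and the per-string
overhead is what remains to optimize.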

SM> ForeignPtr and mallocForeignPtr are the way to go these days.  In GHC
SM> 6.6 these will be much faster than before.

Can you please say what the representation of ForeignPtr is in 6.6?
GHC 6.4.1 uses PinnedByteArray. As far as I understand, there are
only two alternatives: either use pinned arrays and get access to C
functions that need the address of a memory area, or use an unpinned
ByteArray and do all the processing in Haskell?
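
A small sketch of the first alternative, as I understand it: the
buffer comes from mallocForeignPtrBytes, so it is managed by the
garbage collector but its address can still be given to C. The helper
names newCBuffer and bufferLength are hypothetical, and strlen is used
only as an example of a C function that needs an address.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Data.Word (Word8)
import Foreign.C.Types (CChar, CSize (..))
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (pokeByteOff)

-- C's strlen, used here only as an example of a function that needs an address.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: Ptr CChar -> IO CSize

-- Copy the bytes into a GC-managed buffer and NUL-terminate it.
-- (Hypothetical helper; assumes the bytes contain no 0 byte.)
newCBuffer :: [Word8] -> IO (ForeignPtr Word8)
newCBuffer bytes = do
  fp <- mallocForeignPtrBytes (length bytes + 1)
  withForeignPtr fp $ \p -> do
    mapM_ (\(i, b) -> pokeByteOff p i b) (zip [0 ..] bytes)
    pokeByteOff p (length bytes) (0 :: Word8)
  return fp

-- Hand the buffer's address to C. The raw pointer must only be used
-- inside withForeignPtr, which keeps the buffer alive for the call.
bufferLength :: ForeignPtr Word8 -> IO Int
bufferLength fp =
  withForeignPtr fp $ \p -> fromIntegral <$> c_strlen (castPtr p)
```

With an unpinned ByteArray there is no stable address to pass this
way, which is why I see it as a choice between the two.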

How do pinned byte arrays work with the garbage collector? Are they
allocated in special memory blocks so that they are not interleaved
with movable data? Can the memory used by these arrays be deallocated
and then reused?

-- 
Best regards,
 Bulat                            mailto:bulatz at HotPOP.com