[Haskell-cafe] Re: Constructing Data.Map from ByteString

Ketil Malde ketil at malde.org
Wed Mar 12 15:55:06 EDT 2008


Duncan Coutts <duncan.coutts at worc.ox.ac.uk> writes:

> To get something really compact we could use an index composed of three
> unboxed Int arrays. 

To get something *really* compact, we could build a (kind of) suffix
array.  That is, we do a lexical sort of the lines, and store the
sorted offsets of the lines in an array.  You can then look up any
index by binary search using this array. That's 10-15 characters plus
one offset (4 bytes) per line, ~1.3 times the original file.

(There are also compressed suffix array structures, but I don't think
those will gain you anything if you don't store all the positions.)

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants


More information about the Haskell-Cafe mailing list