[Haskell-cafe] Re: Constructing Data.Map from ByteString
Ketil Malde
ketil at malde.org
Wed Mar 12 15:55:06 EDT 2008
Duncan Coutts <duncan.coutts at worc.ox.ac.uk> writes:
> To get something really compact we could use an index composed of three
> unboxed Int arrays.
To get something *really* compact, we could build a (kind of) suffix
array. That is, we do a lexical sort of the lines, and store the
sorted offsets of the lines in an array. You can then look up any
index by binary search using this array. That's 10-15 characters plus
one offset (4 bytes) per line, ~1.3 times the original file.
(There are also compressed suffix array structures, but I don't think
those will gain you anything if you don't store all the positions.)
-k
--
If I haven't seen further, it is by standing in the footprints of giants
More information about the Haskell-Cafe
mailing list