Data.ByteString candidate 3

Thu Apr 27 07:52:09 EDT 2006

I hope you'll forgive me for re-advertising my FPS modifications.
I've started over from Don's sources (please don't use my old fps
repo), refactored, and reworked my changes into that.  

The refactored repo (all functionality and performance identical to
the original): 

           http://www.ii.uib.no/~ketil/src/fps-wrapped

Repo with added Latin1 and ASCII support:

           http://www.ii.uib.no/~ketil/src/fps-i18n

The Latin1 functions are equivalent to the Char8 ones, except that
packing chars > 255 raises an error.  ASCII does the same, but stores
characters > 127 out of harm's way.
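
To make the difference concrete, here is a minimal sketch of the two
conversions, not the code from the repo.  The names c2wChar8 and
c2wLatin1 are made up for illustration, and I am assuming the Char8
behaviour is plain truncation to the low 8 bits.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Char8-style conversion (assumed behaviour): silently keeps only
-- the low 8 bits of the code point.
c2wChar8 :: Char -> Word8
c2wChar8 = fromIntegral . ord

-- Latin1-style conversion (hypothetical name): refuses anything
-- that does not fit in one byte instead of truncating it.
c2wLatin1 :: Char -> Word8
c2wLatin1 c
  | ord c <= 255 = fromIntegral (ord c)
  | otherwise    = error ("c2wLatin1: not a Latin1 character: " ++ show c)
```

With these definitions, c2wChar8 '\x3bb' quietly yields 0xBB, while
c2wLatin1 '\x3bb' fails with an error.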

Adding support for new character sets requires defining four functions
and three constants, and #include'ing a common file.

In addition, some nice properties hold, for instance:

        s1 > s2 => pack s1 > pack s2
        w2c . c2w == id   -- provided no error
        c2w . w2c == id   -- total function

Only the last of these holds for Char8.
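
The properties above can be checked with a small model.  This is only
a sketch under stated assumptions: a packed string is stood in for by
a plain list of bytes, Char8 is assumed to truncate to the low 8 bits,
and every name except w2c and c2w is hypothetical.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Stand-in for a packed string: a list of bytes, compared
-- lexicographically byte by byte.
type Packed = [Word8]

-- Byte to character; total, since every byte is a valid code point.
w2c :: Word8 -> Char
w2c = chr . fromIntegral

-- Char8-style conversion (assumed): truncates to the low 8 bits.
c2wChar8 :: Char -> Word8
c2wChar8 = fromIntegral . ord

-- Latin1-style conversion (hypothetical): errors above 255.
c2wLatin1 :: Char -> Word8
c2wLatin1 c
  | ord c <= 255 = fromIntegral (ord c)
  | otherwise    = error "c2wLatin1: character > 255"

packChar8, packLatin1 :: String -> Packed
packChar8  = map c2wChar8
packLatin1 = map c2wLatin1

-- c2w . w2c == id, checked over all 256 bytes.
byteRoundTrip :: (Char -> Word8) -> Bool
byteRoundTrip c2w = all (\w -> c2w (w2c w) == w) [minBound .. maxBound]

-- w2c . c2w == id, checked for a single character.
charRoundTrip :: (Char -> Word8) -> Char -> Bool
charRoundTrip c2w c = w2c (c2w c) == c

-- s1 > s2 => pack s1 > pack s2, checked for a single pair.
orderPreserved :: (String -> Packed) -> String -> String -> Bool
orderPreserved pack s1 s2 = not (s1 > s2) || pack s1 > pack s2
```

In this model byteRoundTrip holds for both conversions, but
charRoundTrip c2wChar8 '\x100' and orderPreserved packChar8 "\x100" "a"
are both False, because '\x100' is truncated to byte 0.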

Latin1 has been tested with the Char8 QuickCheck tests, and all the
variants have been run through the benchmark suite; results are at

           http://www.ii.uib.no/~ketil/src/bench.txt

(This is using /usr/share/dict/words)

Packing and unpacking aren't part of the benchmark, but are expected
to be around 10% slower than for Char8.  I have no explanation for why
'map' and 'split' are faster.

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants