[GHC] #710: library reorganisation

Fri Apr 28 07:29:40 EDT 2006

Bulat Ziganshin <bulat.ziganshin at gmail.com> writes:

> IMHO, because PackedString is anyway abstract and DOESN'T support any way
> to see its internal representation, any implementation that supports
> the full unicode range would be enough.

Perhaps I'm misrepresenting FPS here, but from my POV, the
representation is very much the issue.  I see the typical use for FPS
to be a case where you have some data (files, network buffers,
whatever), which is, essentially, a string of bytes.

The Char interface(s) to FPS is, in theory, encoding agnostic, but in
practice, it will be limited by the underlying encoding.  IMO, that is
okay, as long as the interesting encodings are supported.

Note that ByteString.UTF8 is *not* going to be a replacement for the
other encoding-specific modules, since that would mean you would have
to do an (expensive) conversion of non-UTF8 data.  The current scheme
allows you to work with a universal Unicode interface (based on Char),
while keeping the data in its 'native' representation.
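
As a rough sketch of what I mean by a Char interface over native
bytes, here is a Latin-1 view on a plain ByteString.  The function
names (latin1Decode, headL1, anyL1, unpackL1) are made up for
illustration and are not the actual FPS API; the point is only that
the bytes are never converted, just interpreted on access.

```haskell
import qualified Data.ByteString as B
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Latin-1 maps each byte directly to the code point of the same value.
latin1Decode :: Word8 -> Char
latin1Decode = chr . fromIntegral

-- Encoding is partial: only code points below 256 fit in one byte.
latin1Encode :: Char -> Maybe Word8
latin1Encode c
  | ord c < 256 = Just (fromIntegral (ord c))
  | otherwise   = Nothing

-- Char-based operations (hypothetical names) that leave the
-- underlying bytes untouched and decode only what they look at.
headL1 :: B.ByteString -> Maybe Char
headL1 bs = fmap (latin1Decode . fst) (B.uncons bs)

anyL1 :: (Char -> Bool) -> B.ByteString -> Bool
anyL1 p = B.any (p . latin1Decode)

unpackL1 :: B.ByteString -> String
unpackL1 = map latin1Decode . B.unpack
```

A module for another encoding would export the same names with a
different decode step, so client code sees Char either way.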

The question is how to extend this to multi-byte fixed-width encodings
(UCS-2 and UCS-4/UTF-32), and variable-width encodings (UTF-8, UTF-16,
Shift-JIS, and why not Quoted-Printable?).  I feel confident it can be
done, but it is likely to involve some policy decisions and
trade-offs.
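
To illustrate the fixed versus variable distinction, here is a
minimal sketch, assuming an uncons-style decoder as the common
interface.  The Decoder type and all function names are hypothetical,
big-endian byte order is assumed, and error handling is one of the
policy decisions glossed over: a malformed surrogate pair simply stops
decoding, and a stray low surrogate is passed through as-is.

```haskell
import qualified Data.ByteString as B
import Data.Bits (shiftL, (.|.))
import Data.Char (chr)
import Data.List (unfoldr)

-- Hypothetical common interface: take one Char off the front.
type Decoder = B.ByteString -> Maybe (Char, B.ByteString)

-- Read one 16-bit big-endian code unit.
unit16 :: B.ByteString -> Maybe (Int, B.ByteString)
unit16 bs
  | B.length bs >= 2 =
      Just ( (fromIntegral (B.index bs 0) `shiftL` 8)
               .|. fromIntegral (B.index bs 1)
           , B.drop 2 bs )
  | otherwise = Nothing

-- Fixed width: every Char is exactly one code unit.
unconsUCS2 :: Decoder
unconsUCS2 bs = do
  (u, rest) <- unit16 bs
  return (chr u, rest)

-- Variable width: a high surrogate must be followed by a low one.
unconsUTF16 :: Decoder
unconsUTF16 bs = do
  (hi, rest) <- unit16 bs
  if hi >= 0xD800 && hi <= 0xDBFF
    then do
      (lo, rest') <- unit16 rest
      if lo >= 0xDC00 && lo <= 0xDFFF
        then return ( chr (0x10000 + ((hi - 0xD800) `shiftL` 10)
                                   + (lo - 0xDC00))
                    , rest' )
        else Nothing
    else return (chr hi, rest)

-- Decode until the input runs out or the decoder gives up.
decodeAll :: Decoder -> B.ByteString -> String
decodeAll = unfoldr
```

With a fixed-width decoder, indexing the n-th Char is a constant-time
offset calculation; with the variable-width one it needs a scan from
the start, which is where the trade-offs come in.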

-k

PS: I implemented another single-byte encoding, Windows-1252. This
builds everything from a charset table, and while currently not too
efficient, should make it very easy to add other single-byte
encodings.  As usual: darcs get http://www.ii.uib.no/~ketil/src/fps-i18n
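
For the curious, the table-driven idea looks roughly like the sketch
below.  This is not the code in the repository; the names (Charset,
mkCharset, cp1252Excerpt) are invented for illustration, and only a
handful of the Windows-1252 entries that differ from Latin-1 are
listed, so it is an excerpt rather than the full charset.

```haskell
import Data.Array (Array, assocs, listArray, (!), (//))
import Data.Char (chr)
import qualified Data.Map as M
import Data.Word (Word8)

-- A single-byte charset is just a 256-entry byte-to-Char table.
type Charset = Array Word8 Char

-- Start from the identity (Latin-1) mapping and override entries.
mkCharset :: [(Word8, Char)] -> Charset
mkCharset overrides =
  listArray (0, 255) (map chr [0 .. 255]) // overrides

-- Excerpt of Windows-1252: a few bytes in 0x80-0x9F only.
cp1252Excerpt :: Charset
cp1252Excerpt = mkCharset
  [ (0x80, '\x20AC')  -- euro sign
  , (0x85, '\x2026')  -- horizontal ellipsis
  , (0x91, '\x2018')  -- left single quotation mark
  , (0x92, '\x2019')  -- right single quotation mark
  , (0x93, '\x201C')  -- left double quotation mark
  , (0x94, '\x201D')  -- right double quotation mark
  , (0x96, '\x2013')  -- en dash
  , (0x97, '\x2014')  -- em dash
  , (0x99, '\x2122')  -- trade mark sign
  ]

decodeByte :: Charset -> Word8 -> Char
decodeByte = (!)

-- The reverse table is derived from the same data.
encodeTable :: Charset -> M.Map Char Word8
encodeTable cs = M.fromList [ (c, b) | (b, c) <- assocs cs ]

-- Rebuilds the map on each call; fine for a sketch, not for speed.
encodeChar :: Charset -> Char -> Maybe Word8
encodeChar cs c = M.lookup c (encodeTable cs)
```

Adding another single-byte encoding then amounts to supplying a
different list of overrides.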
-- 
If I haven't seen further, it is by standing in the footprints of giants