Text I/O library proposal, first draft
Ashley Yakeley
ashley@semantic.org
Sun, 3 Aug 2003 21:16:17 -0700
In article <Pine.LNX.4.21.0307311506290.31823-100000@dark.darkweb.com>,
Ben Rudiak-Gould <benrg@dark.darkweb.com> wrote:
> [Crossposted to Haskell and Libraries. Replies to Libraries.]
There's a Haskell Internationalisation mailing list too. Also check out
the project on SF:
<http://sourceforge.net/projects/haskell-i18n/>
There's a bunch of my code for Unicode properties there, plus a couple of
UTF-8 implementations.
> module System.TextIOFirstDraft (...) where
This could be put in the Text.* hierarchy.
> type BlockRecoder from to =
> Ptr from -> BlockLength -> Ptr to -> BlockLength
> -> IO (BlockLength,BlockLength)
UArray and MArray would be slightly cleaner if you're doing the IO
thing. But actually my biggest problem is that this is in the IO monad.
Given your code, I should be able to write these without resorting to
unsafePerformIO:
  encodeUTF8 :: String -> [Word8]
  decodeUTF8 :: [Word8] -> Maybe String -- Nothing if not valid
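
As a rough sketch of what pure versions with these signatures might look
like: the code below is hand-written for illustration and is not derived
from the proposal's BlockRecoder code. The helper names (enc, cont, multi,
isCont) are made up, and the encoder assumes its input contains no
surrogate code points.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Encode each Char as 1 to 4 octets.
-- Assumption: the input holds no surrogates; they are not rejected here.
encodeUTF8 :: String -> [Word8]
encodeUTF8 = concatMap (map fromIntegral . enc . ord)
  where
    enc c
      | c < 0x80    = [c]
      | c < 0x800   = [0xC0 .|. (c `shiftR` 6), cont c]
      | c < 0x10000 = [0xE0 .|. (c `shiftR` 12), cont (c `shiftR` 6), cont c]
      | otherwise   = [ 0xF0 .|. (c `shiftR` 18), cont (c `shiftR` 12)
                      , cont (c `shiftR` 6), cont c ]
    -- Continuation octet carrying the low six bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)

-- Decode, giving Nothing for truncated, overlong, surrogate or
-- out-of-range sequences.
decodeUTF8 :: [Word8] -> Maybe String
decodeUTF8 [] = Just []
decodeUTF8 (b : bs)
  | b < 0x80              = (chr (fromIntegral b) :) <$> decodeUTF8 bs
  | b >= 0xC2 && b < 0xE0 = multi 1 (fromIntegral b .&. 0x1F) 0x80 bs
  | b >= 0xE0 && b < 0xF0 = multi 2 (fromIntegral b .&. 0x0F) 0x800 bs
  | b >= 0xF0 && b < 0xF5 = multi 3 (fromIntegral b .&. 0x07) 0x10000 bs
  | otherwise             = Nothing
  where
    -- n continuation octets expected; lo is the smallest code point
    -- that legitimately needs a sequence of this length.
    multi :: Int -> Int -> Int -> [Word8] -> Maybe String
    multi n acc lo rest = do
      let (cs, rest') = splitAt n rest
      if length cs == n && all isCont cs then Just () else Nothing
      let c = foldl (\a x -> (a `shiftL` 6) .|. (fromIntegral x .&. 0x3F)) acc cs
      if c >= lo && c <= 0x10FFFF && not (c >= 0xD800 && c <= 0xDFFF)
        then Just ()
        else Nothing
      (chr c :) <$> decodeUTF8 rest'
    isCont x = x .&. 0xC0 == 0x80
```

Nothing here needs IO or unsafePerformIO; whether a list-based version
like this is fast enough is a separate question.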
Actually, if you make certain assumptions about encodings, you could
get away with something like this:
  type Encoder base t = t -> [base]
  type Decoder base t = forall m. (Monad m) => m base -> m t
Is this any less efficient? Probably not if you're writing your
BlockRecoders in Haskell.
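
To make the idea concrete, here is a minimal sketch of those two types in
use. It assumes a fixed-width big-endian 32-bit encoding, chosen only
because every four-octet sequence is valid, so the decoder has no failure
case to report. The Source monad, nextOctet, decodeOne and the BE32 names
are all hypothetical, invented for this example.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word8, Word32)

type Encoder base t = t -> [base]
type Decoder base t = forall m. (Monad m) => m base -> m t

-- Big-endian 32-bit encoding: four octets, most significant first.
encodeBE32 :: Encoder Word8 Word32
encodeBE32 w = [fromIntegral (w `shiftR` s) | s <- [24, 16, 8, 0]]

-- The decoder only knows how to ask its monad for the next octet.
decodeBE32 :: Decoder Word8 Word32
decodeBE32 next = do
  a <- next
  b <- next
  c <- next
  d <- next
  return (foldl (\acc x -> (acc `shiftL` 8) .|. fromIntegral x) 0 [a, b, c, d])

-- Hypothetical pure octet source: threads the remaining input along
-- and yields Nothing once the input runs out.
newtype Source a = Source { runSource :: [Word8] -> Maybe (a, [Word8]) }

instance Functor Source where
  fmap f (Source g) = Source $ \s -> fmap (\(a, s') -> (f a, s')) (g s)

instance Applicative Source where
  pure a = Source $ \s -> Just (a, s)
  Source f <*> Source g = Source $ \s -> case f s of
    Nothing      -> Nothing
    Just (h, s') -> fmap (\(a, s'') -> (h a, s'')) (g s')

instance Monad Source where
  Source g >>= k = Source $ \s -> case g s of
    Nothing      -> Nothing
    Just (a, s') -> runSource (k a) s'

nextOctet :: Source Word8
nextOctet = Source $ \s -> case s of
  []       -> Nothing
  (x : xs) -> Just (x, xs)

-- Run the decoder purely over a list of octets.
decodeOne :: [Word8] -> Maybe Word32
decodeOne = fmap fst . runSource (decodeBE32 nextOctet)
```

Because the decoder is polymorphic in the monad, the same decodeBE32 could
also be handed an IO action that reads one octet from a handle. End of
input and invalid sequences have to be reported by the monad itself, which
is one of the assumptions this style relies on.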
> type TextEncoder = BlockRecoder Word32 Octet
> type TextDecoder = BlockRecoder Octet Word32
On GHC, Char has exactly the range 0 to 0x10FFFF, as per Unicode
codepoints. If this becomes standardised as part of an
internationalisation effort, you might want to use Char rather than
Word32.
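
A small sketch of what that difference means in practice, assuming GHC's
Char: going from Char to Word32 is total, while the other direction needs
a range check. The names charToWord32 and word32ToChar are made up for
this example.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word32)

-- Every Char fits in a Word32, so this direction cannot fail.
charToWord32 :: Char -> Word32
charToWord32 = fromIntegral . ord

-- Most Word32 values are not code points, so this direction is partial.
-- Note that surrogate values (0xD800 to 0xDFFF) are still accepted here.
word32ToChar :: Word32 -> Maybe Char
word32ToChar w
  | w <= 0x10FFFF = Just (chr (fromIntegral w))
  | otherwise     = Nothing
```

With Char in the recoder types, that range check would live in the decoder
once instead of in every caller.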
--
Ashley Yakeley, Seattle WA