Text I/O library proposal, first draft

Ashley Yakeley ashley@semantic.org
Sun, 3 Aug 2003 21:16:17 -0700


In article <Pine.LNX.4.21.0307311506290.31823-100000@dark.darkweb.com>,
 Ben Rudiak-Gould <benrg@dark.darkweb.com> wrote:

> [Crossposted to Haskell and Libraries. Replies to Libraries.]

There's a Haskell Internationalisation mailing list too. Also check out 
the project on SourceForge:
<http://sourceforge.net/projects/haskell-i18n/>
It has a bunch of my code for Unicode properties, plus a couple of UTF-8 
implementations.

> module System.TextIOFirstDraft (...) where

This could be put in the Text.* hierarchy.

> type BlockRecoder from to =
>   Ptr from -> BlockLength -> Ptr to -> BlockLength
>    -> IO (BlockLength,BlockLength)

UArray and MArray would be slightly cleaner than raw Ptrs if you're 
doing the IO thing. But actually my biggest problem is that this is in 
the IO monad. Given your code, I should be able to write these without 
resorting to unsafePerformIO:

  encodeUTF8 :: String -> [Word8]
  decodeUTF8 :: [Word8] -> Maybe String -- Nothing if not valid
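
A minimal sketch of how those two signatures might be implemented as
pure functions, with no IO at all. The helper `multi` and the exact
validity rules (rejecting overlong forms, surrogates, and values above
0x10FFFF) are assumptions made for illustration, not part of the
proposal under discussion.

```haskell
import Control.Monad (guard)
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Encode each Char as 1 to 4 UTF-8 bytes.
-- Assumption: surrogate code points are passed through unchecked.
encodeUTF8 :: String -> [Word8]
encodeUTF8 = concatMap (map fromIntegral . enc . ord)
  where
    enc :: Int -> [Int]
    enc c
      | c < 0x80    = [c]
      | c < 0x800   = [0xC0 .|. (c `shiftR` 6), cont c]
      | c < 0x10000 = [0xE0 .|. (c `shiftR` 12), cont (c `shiftR` 6), cont c]
      | otherwise   = [ 0xF0 .|. (c `shiftR` 18), cont (c `shiftR` 12)
                      , cont (c `shiftR` 6), cont c ]
    -- Continuation byte: 10xxxxxx carrying the low six bits.
    cont x = 0x80 .|. (x .&. 0x3F)

-- Decode a complete byte list; Nothing if any sequence is invalid.
decodeUTF8 :: [Word8] -> Maybe String
decodeUTF8 [] = Just []
decodeUTF8 (b : bs)
  | b < 0x80              = (chr (fromIntegral b) :) <$> decodeUTF8 bs
  | b >= 0xC2 && b < 0xE0 = multi 1 0x80    (fromIntegral b .&. 0x1F) bs
  | b >= 0xE0 && b < 0xF0 = multi 2 0x800   (fromIntegral b .&. 0x0F) bs
  | b >= 0xF0 && b < 0xF5 = multi 3 0x10000 (fromIntegral b .&. 0x07) bs
  | otherwise             = Nothing

-- Hypothetical helper: read n continuation bytes, accumulate the code
-- point, and reject it if it is below the minimum `lo` for this
-- sequence length (overlong), a surrogate, or out of range.
multi :: Int -> Int -> Int -> [Word8] -> Maybe String
multi n lo acc bs = do
  let (cs, rest) = splitAt n bs
  guard (length cs == n && all isCont cs)
  let cp = foldl (\a c -> (a `shiftL` 6) .|. (fromIntegral c .&. 0x3F)) acc cs
  guard (cp >= lo && cp <= 0x10FFFF && not (cp >= 0xD800 && cp <= 0xDFFF))
  (chr cp :) <$> decodeUTF8 rest
  where
    isCont c = c .&. 0xC0 == 0x80
```

Because the result is a Maybe String, the whole input has to be
validated before any character is returned; a lazier interface would
need a different result type.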

Actually, if you make certain assumptions about encodings, you could 
get away with something like this:

  type Encoder base t = t -> [base]
  type Decoder base t = forall m. (Monad m) => m base -> m t

Is this any less efficient? Probably not if you're writing your 
BlockRecoders in Haskell.
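
To make the two type synonyms concrete, here is a rough sketch using a
toy encoding (big-endian 16-bit units over bytes). The names
`encodeBE16`, `decodeBE16`, `Source`, `nextItem` and `runDecoder` are
all hypothetical, and treating end of input as failure is an assumption.
The rank-2 type needs the RankNTypes extension in current GHC.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word8, Word16)

type Encoder base t = t -> [base]
type Decoder base t = forall m. (Monad m) => m base -> m t

-- Toy encoder: one 16-bit unit becomes two bytes, high byte first.
encodeBE16 :: Encoder Word8 Word16
encodeBE16 w = [fromIntegral (w `shiftR` 8), fromIntegral w]

-- Toy decoder: pulls exactly two items from whatever monad supplies them.
decodeBE16 :: Decoder Word8 Word16
decodeBE16 next = do
  hi <- next
  lo <- next
  return ((fromIntegral hi `shiftL` 8) .|. fromIntegral lo)

-- Hypothetical minimal monad that consumes a list and fails at the end,
-- so that a Decoder can be run purely, without unsafePerformIO.
newtype Source b a = Source { runSource :: [b] -> Maybe (a, [b]) }

instance Functor (Source b) where
  fmap f (Source g) = Source $ \s -> case g s of
    Nothing      -> Nothing
    Just (a, s') -> Just (f a, s')

instance Applicative (Source b) where
  pure a = Source $ \s -> Just (a, s)
  Source f <*> Source g = Source $ \s -> case f s of
    Nothing      -> Nothing
    Just (h, s') -> case g s' of
      Nothing       -> Nothing
      Just (a, s'') -> Just (h a, s'')

instance Monad (Source b) where
  Source g >>= k = Source $ \s -> case g s of
    Nothing      -> Nothing
    Just (a, s') -> runSource (k a) s'

-- Take the next item from the list, or fail if the list is empty.
nextItem :: Source b b
nextItem = Source $ \s -> case s of
  []       -> Nothing
  (x : xs) -> Just (x, xs)

-- Run a decoder over a list, returning the value and the unread rest.
runDecoder :: Decoder base t -> [base] -> Maybe (t, [base])
runDecoder dec = runSource (dec nextItem)
```

The same `decodeBE16` could equally be handed an IO action that reads
from a Handle, which is the point of quantifying over the monad.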

> type TextEncoder = BlockRecoder Word32 Octet
> type TextDecoder = BlockRecoder Octet Word32

On GHC, Char has exactly the range 0 to 0x10FFFF, as per Unicode 
codepoints. If this becomes standardised as part of an 
internationalisation effort, you might want to use Char rather than 
Word32.
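
As a small illustration of the difference, converting between the two
representations might look like the sketch below. The function names
are hypothetical; the upper bound is the GHC behaviour described above.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word32)

-- Every Char fits in a Word32, so this direction cannot fail.
charToWord32 :: Char -> Word32
charToWord32 = fromIntegral . ord

-- Most Word32 values are not code points, so this direction can fail.
word32ToChar :: Word32 -> Maybe Char
word32ToChar w
  | w <= 0x10FFFF = Just (chr (fromIntegral w))
  | otherwise     = Nothing
```

With Char in the recoder types, the range check would be carried by the
type rather than repeated by every consumer.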

-- 
Ashley Yakeley, Seattle WA