Data encoding library
apfelmus
apfelmus at quantentunnel.de
Sun Oct 14 14:11:14 EDT 2007
Magnus Therning wrote:
>>> 2. Codecs, i.e. encoder/decoder pairs such as charset converters
>>> data Codec base derived = MkCodec
>>> {
>>> encode :: derived -> base,
>>> decode :: base -> Maybe derived -- or other Monad
>>> }
>>> utf8 :: Codec [Word8] String
>>> xml :: Codec String XML
>>
>> type ASCII = String
>> base16 :: Codec ASCII [Word8]
>> ...
>>
>> encode base16 [0xde,0xad,0xbe,0xef] :: ASCII
>
> A similar result could be gotten by using phantom types, right?
Most likely, although I'm not sure whether the choice from your blog is
the right one. I mean, the only-a-little-bit-phantom type
newtype Base16 a = Base16 { unBase16 :: a } deriving (Eq,Show)
will do the job too
instance DataEncoding Base16 where
    encode      = Base16 . b16Encode
    decode      = b16Decode . unBase16
    chop n      = Base16 . b16chop n . unBase16
    unchop      = Base16 . b16unchop . unBase16
    liberate    = unBase16
    incarcerate = Base16
Usually, the "normal" phantom type approach would be to make the
encoding a phantom argument of a string type, not the other way round:
newtype EncodedString enc = ES String
data Base16 -- empty type, no constructors
instance DataEncoding (EncodedString Base16) where
    ...
But your idea of fixing the encoding in the type for more type safety is
good. Another way to do that would be to have an abstract data type
-- this is not a String, this is base16-encoded data!
newtype Base16 = Base16 String
with functions
encode :: [Word8] -> Base16
decode :: Base16 -> [Word8]
and functions
encode :: Base16 -> String
decode :: String -> Maybe Base16
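A minimal sketch of the abstract data type variant, to make the two pairs of functions concrete. The names encodeBytes, decodeBytes, toString and fromString are hypothetical (the text calls both pairs encode/decode), and lowercase hex output is an assumption. In a real library the module's export list would hide the Base16 constructor, so that fromString is the only way in from a plain String.

```haskell
import Data.Char (digitToInt, intToDigit, isHexDigit)
import Data.Word (Word8)

-- This is not a String, this is base16-encoded data!
-- (A real module would not export the constructor.)
newtype Base16 = Base16 String deriving (Eq, Show)

-- Bytes to encoded data: cannot fail.
encodeBytes :: [Word8] -> Base16
encodeBytes = Base16 . concatMap hexByte
  where
    hexByte w = map (intToDigit . fromIntegral) [w `div` 16, w `mod` 16]

-- Encoded data to bytes: cannot fail either, because fromString
-- already checked that the string is well-formed.
decodeBytes :: Base16 -> [Word8]
decodeBytes (Base16 s) = go s
  where
    go (h:l:rest) = fromIntegral (16 * digitToInt h + digitToInt l) : go rest
    go _          = []

-- Forgetting the invariant is always possible.
toString :: Base16 -> String
toString (Base16 s) = s

-- Establishing the invariant is where failure shows up.
fromString :: String -> Maybe Base16
fromString s
    | even (length s) && all isHexDigit s = Just (Base16 s)
    | otherwise                           = Nothing
```

With this layout, the only function returning Maybe is the one that accepts an arbitrary String.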
The "normal" phantom type approach has the advantage of making the last
functions polymorphic
encode :: EncodedString enc -> String
decode :: String -> EncodedString enc
encode (ES s) = s
decode s = ES s
at the expense of shifting the possible failure to
decode :: EncodedString Base16 -> Maybe [Word8]
Of course, you can use both phantom types and the codec approach,
eliminating the need for a type class
base16 :: Codec [Word8] (EncodedString Base16)
string :: Codec (EncodedString a) String
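A rough sketch of how the two codecs above might fit together. It assumes the type parameters are read as "Codec plain encoded", which is what the two signatures suggest (the quoted record lists base first, so this ordering is an assumption). The combinator andThen and the lowercase hex output are hypothetical additions for illustration.

```haskell
import Data.Char (digitToInt, intToDigit, isHexDigit)
import Data.Word (Word8)

-- Assumption: first parameter is the plain side, second the encoded side.
data Codec plain encoded = MkCodec
    { encode :: plain -> encoded
    , decode :: encoded -> Maybe plain
    }

-- The encoding is a phantom argument of the string type.
newtype EncodedString enc = ES String deriving (Eq, Show)
data Base16  -- empty type, no constructors

-- Failure lives in decode: an EncodedString Base16 may hold garbage.
base16 :: Codec [Word8] (EncodedString Base16)
base16 = MkCodec enc dec
  where
    enc = ES . concatMap hexByte
    hexByte w = map (intToDigit . fromIntegral) [w `div` 16, w `mod` 16]
    dec (ES s) = unhex s
    unhex (h:l:rest)
        | isHexDigit h && isHexDigit l =
            fmap (fromIntegral (16 * digitToInt h + digitToInt l) :) (unhex rest)
    unhex [] = Just []
    unhex _  = Nothing

-- Polymorphic in the encoding; wrapping and unwrapping never fail.
string :: Codec (EncodedString enc) String
string = MkCodec (\(ES s) -> s) (Just . ES)

-- Hypothetical helper: run one codec after the other.
andThen :: Codec a b -> Codec b c -> Codec a c
andThen f g = MkCodec (encode g . encode f) (\c -> decode g c >>= decode f)
```

For example, encode (base16 `andThen` string) takes bytes straight to a String, and the matching decode returns Nothing for input that is not valid hex.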
> But then there must be some way of liberating the result.
> I'm not sure yet whether they are worth it.
>
> AFAIU the example from above then changes to
>
> encode [0xde,0xad,0xbe,0xef] :: Base16 ASCII
Concerning the choice between encoding the encoding (... ;-) in the
types (like Base16) or as values (like base16 :: Codec ...), the
observation is that you have to specify the encoding anyway :) either as
type annotation ("type argument")
encode [0xde,0xad,0xbe,0xef] :: EncodedString Base16
encode' (undefined :: Base16) [0xde,0xad,0xbe,0xef]
or as value argument
encode base16 [0xde,0xad,0xbe,0xef]
In this case, I would prefer the value argument approach for its brevity
and mnemonics ("encode in base16 the following data"). However, possible
strong type guarantees usually are a good argument for the typed approach.
To be honest, I'm not really sure whether strong types would gain us
anything here.
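To compare the three call styles side by side, here is a small sketch. The class method encodeT, the Encoder record and the helper hex are hypothetical names chosen so that everything fits in one module; the class is only a stand-in for a DataEncoding-style class, not the one from the blog.

```haskell
import Data.Char (intToDigit)
import Data.Word (Word8)

newtype EncodedString enc = ES String deriving (Eq, Show)
data Base16  -- empty type, used only as a tag

-- Shared worker: lowercase hex digits (an assumption).
hex :: [Word8] -> String
hex = concatMap (\w -> [digit (w `div` 16), digit (w `mod` 16)])
  where
    digit = intToDigit . fromIntegral

-- Typed approach: the encoding is selected by the result type.
class DataEncoding enc where
    encodeT :: [Word8] -> EncodedString enc

instance DataEncoding Base16 where
    encodeT = ES . hex

-- "Type argument" passed as a dummy value that is never evaluated.
encode' :: DataEncoding enc => enc -> [Word8] -> EncodedString enc
encode' _ = encodeT

-- Value approach: the encoding is an ordinary argument.
newtype Encoder = Encoder { runEncoder :: [Word8] -> String }

base16 :: Encoder
base16 = Encoder hex

encode :: Encoder -> [Word8] -> String
encode = runEncoder
```

The three calls then read encodeT bytes :: EncodedString Base16, encode' (undefined :: Base16) bytes, and encode base16 bytes.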
>> Also, I don't have a clue about what chop and unchop are supposed
>> to do.
>
> For some encodings there are standard ways of splitting an encoded
> string over several lines. Unfortunately it's not always as simple as
> just splitting a string at a particular length. Uuencode is the most
> complicated I've come across so far. That's what chop/unchop is for.
Ah, that's what they are for. An idea would be to build the line length
into the encoding, like
base16 :: Int -> Codec [Word8] [String]
with the intention that
encode (base16 70) x
will encode x with a line length of 70 characters. Hm, should
decode (base16 70) s
fail when the lines are not 70 characters in length, or should it accept
any line length? Maybe it should be
base16 :: Maybe Int -> Codec [Word8] [String]
since the programmer may choose to not wrap lines anyway. But perhaps
the line length is best paired with the data
base16 :: Codec ([Word8], Maybe Int) [String]
so that
encode base16 (..., Just 70)
will encode with a line length of 70 characters and
let (_, ll) = decode base16 s in ...
will return the parsed line length in ll.
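A small sketch of the Maybe Int variant, to see how the line length might be built into the codec. It again reads the parameters as "Codec plain encoded" (an assumption), and it answers the open question one particular way: decode here accepts lines of any length. Whether that is the right choice is exactly the design question above. The helper chunksOf is defined locally.

```haskell
import Data.Char (digitToInt, intToDigit, isHexDigit)
import Data.Word (Word8)

-- Assumption: first parameter is the plain side, second the encoded side.
data Codec plain encoded = MkCodec
    { encode :: plain -> encoded
    , decode :: encoded -> Maybe plain
    }

-- Nothing means "do not wrap lines"; Just n wraps after n characters.
base16 :: Maybe Int -> Codec [Word8] [String]
base16 lineLength = MkCodec enc dec
  where
    enc = wrap . concatMap hexByte
    hexByte w = map (intToDigit . fromIntegral) [w `div` 16, w `mod` 16]
    wrap = case lineLength of
        Just n | n > 0 -> chunksOf n
        _              -> \s -> [s]
    -- Lenient choice: ignore how the input happens to be split.
    dec = unhex . concat
    unhex (h:l:rest)
        | isHexDigit h && isHexDigit l =
            fmap (fromIntegral (16 * digitToInt h + digitToInt l) :) (unhex rest)
    unhex [] = Just []
    unhex _  = Nothing

-- Split a string into pieces of at most n characters (n > 0).
chunksOf :: Int -> String -> [String]
chunksOf _ [] = []
chunksOf n s  = take n s : chunksOf n (drop n s)
```

A strict decoder would additionally check that every line except possibly the last has exactly the requested length.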
Oh my lambda, it's wondrous how Haskell gives so many possibilities to
ponder for such a seemingly innocent API design problem :)
Regards,
apfelmus