[Haskell-beginners] Type polymorphism with size

Brent Yorgey byorgey at seas.upenn.edu
Tue Nov 18 14:18:22 EST 2008


On Tue, Nov 18, 2008 at 10:02:20AM -0800, Michael Snoyman wrote:
> I am trying to write some code to read flat files from a mainframe system.
> This includes some character fields. This is a fixed width file, so each
> field will have a consistent length between records, but there are fields of
> different length within a record. For example, I might have a "name" field
> length 20 and an eye color field length 5.
> 
> I am trying to use the binary library to read in this file. I've written a
> binary type, MFChar2, for reading in a 2-length character field. It is
> defined as such (you can safely ignore the ebcdicToAscii piece, it is just
> doing character conversion):
> 
> data MFChar2 = MFChar2 [Word8]
> instance Binary MFChar2 where
>     put = undefined
>     get = do ebcdic <- replicateM 2 getWord8
>              return $ MFChar2 $ map ebcdicToAscii ebcdic
> 
> What I would like to do is have some kind of generic "MFChar" data type
> which could take any character length, but I can't figure out how to do it.
> Any help would be appreciated.

Hm, interesting!  The problem is that 'get' does not take any
arguments, so must determine what to do from the type at which it is
called.  So the number of words to be read needs to be in the type.
We can't put actual Int values in a type -- but there is actually a
way to do what you want, by encoding natural numbers at the type
level!  I don't know whether this really belongs on a 'beginners' list
but I couldn't resist. =)


data Z      -- the type representing zero
data S n    -- the type representing the successor of another natural

-- for example,  Z,  S Z,  and S (S Z)  are types representing
--   zero, one, and two.

-- the n is for a type-level natural representing the length of the list.
data MFChar n = MFChar [Word8]

-- add a Word8 to the beginning of an MFChar, resulting in an MFChar
--   one word longer
mfCons :: Word8 -> MFChar n -> MFChar (S n)
mfCons w (MFChar ws) = MFChar (w:ws)

instance Binary (MFChar Z) where
  get = return $ MFChar []

instance (Binary (MFChar n)) => Binary (MFChar (S n)) where
  get = do ebcdic <- getWord8
           rest   <- get    -- the correct type of get is 
	                    -- inferred due to the use of mfCons below
           return $ mfCons (ebcdicToAscii ebcdic) rest


Now if you wanted to read a field with 20 chars, you can use

  get :: Get (MFChar (S (S (S ... 20 S's ... Z))))

Ugly, I know. You could make it slightly more bearable by defining
some type synonyms at the top of your program like

  type Five = S (S (S (S (S Z))))
  type Ten = S (S (S (S (S Five))))

and so on.  Then you can just say   get :: Get (MFChar Ten) or whatever.

This is untested but it (or something close to it) ought to work.  Of
course, you may well ask yourself whether this contortion is really
worth it.  Maybe it is, maybe it isn't, but I can't think of a better
way to do it in Haskell.  In a dependently typed language such as
Agda, we could just put regular old natural numbers in the types,
instead of going through contortions to encode natural numbers as
types as we have to do here.  So I guess the real answer to your
question is "use a dependently typed language". =)

If you have problems getting this to work or more questions, feel free
to ask!

-Brent


More information about the Beginners mailing list