binary files in haskell

John Meacham john@foo.net
Thu, 8 Feb 2001 01:53:15 -0800


Since there seems to be general support for the idea of some sort of
Portable Byte IO package, I will work on an implementation of my
proposal for GHC, that being the platform I am most familiar with. I
really like the simplicity of the hPut and hGet idea and will probably
use it as my implementation mechanism on GHC. However, for a portable
API, which should be implementable across a wide variety of Haskell 98
implementations and machine architectures, it is too low-level to allow
certain portable applications to be written. I pitched my API at pretty
much the exact level of the Haskell 98 IO API, since that seems to
strike a good balance between portability and expressiveness/power.

A nice advantage of using my mid-level routines is that very few
requirements are placed on 'Byte' as a type. As long as you only read
8-bit values in from the outside world and write 8-bit values out to
it, you can represent Byte internally however you want.

For example, you might have a machine where a 16-bit word is the
smallest addressable entity. If you relied on hPut with Word8, your
program would not work, since Word8 cannot exist on that platform.
However, if Byte were 16 bits wide and only the bottom half of each
word were used, your program would run unchanged even on architectures
such as this.

My requirements for Byte were going to basically mirror the C
requirements for char: the smallest individually addressable integral
type at least 8 bits in width. The trick that makes programs using
Byte portable is that ByteIO.read and ByteIO.write only use the lower
8 bits of that datatype at a time. One can therefore write portable
Haskell applications which work on network sockets and file streams in
a machine-independent fashion.
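
A minimal sketch of the idea above, not the actual ByteIO API (which is
not shown in this message). It assumes a hypothetical Byte backed by
Word16, as on the 16-bit-word machine described earlier, and
hypothetical helpers toByte, fromByte, writeByte and readByte. The
handle is assumed to have been put in binary mode with hSetBinaryMode,
so that each Char maps to exactly one octet.

```haskell
import Data.Bits ((.&.))
import Data.Char (chr, ord)
import Data.Word (Word16)
import System.IO (Handle, hGetChar, hPutChar)

-- Hypothetical Byte for a machine whose smallest addressable unit is
-- 16 bits wide. Only the low 8 bits are ever meaningful.
newtype Byte = Byte Word16
  deriving Show

-- Build a Byte from an Int, discarding everything above the low 8 bits.
toByte :: Int -> Byte
toByte n = Byte (fromIntegral (n .&. 0xFF))

-- Extract the 8-bit value (0..255), ignoring any high bits in storage.
fromByte :: Byte -> Int
fromByte (Byte w) = fromIntegral (w .&. 0xFF)

-- Write the low 8 bits of a Byte as a single octet.
-- Assumption: the handle is in binary mode (hSetBinaryMode h True).
writeByte :: Handle -> Byte -> IO ()
writeByte h b = hPutChar h (chr (fromByte b))

-- Read a single octet into a Byte.
-- Assumption: the handle is in binary mode (hSetBinaryMode h True).
readByte :: Handle -> IO Byte
readByte h = fmap (toByte . ord) (hGetChar h)
```

Because only the low 8 bits ever cross the I/O boundary, the same
program would behave identically if Byte were backed by Word8 instead.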

Anyone who is concerned about using a little more memory than
necessary on certain architectures will have to know how those
architectures store values in memory anyway, in order to pack values
into the architecture's primitives properly, so they can use hPut and
hGet with explicit word widths...


I guess what would be nice would be a portable ByteIO as the standard
mid-level interface, with the hPut/hGet idea available on those
platforms which support Storable, since they seem to make sense as the
primitives for Haskell implementations which allow such fine-grained
access to the hardware representations. (But such access should not be
required of a Haskell implementation in order to write portable
programs which can communicate in externally defined formats.)
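
For contrast, the Storable-based primitives might look roughly like the
sketch below. The names hPutStorable, hGetStorable and sizeOfPointee
are hypothetical stand-ins for the proposed hPut and hGet, whose exact
signatures are not given here. The sketch simply copies a value's
in-memory representation through GHC's hPutBuf and hGetBuf.

```haskell
import Foreign.Marshal.Alloc (alloca)
import Foreign.Marshal.Utils (with)
import Foreign.Ptr (Ptr)
import Foreign.Storable (Storable, peek, sizeOf)
import System.IO

-- Hypothetical stand-in for the proposed hPut: write the value's raw
-- in-memory representation (sizeOf x bytes) to the handle.
hPutStorable :: Storable a => Handle -> a -> IO ()
hPutStorable h x = with x $ \p -> hPutBuf h p (sizeOf x)

-- Hypothetical stand-in for the proposed hGet: read one value's worth
-- of raw bytes. Returns Nothing if the handle runs out of data first.
hGetStorable :: Storable a => Handle -> IO (Maybe a)
hGetStorable h = alloca $ \p -> do
  let size = sizeOfPointee p
  n <- hGetBuf h p size
  if n == size then fmap Just (peek p) else return Nothing

-- Size in bytes of the type a pointer refers to. The undefined value
-- is never evaluated; sizeOf only looks at its type.
sizeOfPointee :: Storable a => Ptr a -> Int
sizeOfPointee p = sizeOf (pointee p)
  where
    pointee :: Ptr b -> b
    pointee _ = undefined
```

The bytes produced depend on the host's word size and byte order, which
is why this level suits a primitive layer rather than a portable
external format.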

	John


-- 
--------------------------------------------------------------
John Meacham   http://www.ugcs.caltech.edu/~john/
California Institute of Technology, Alum.  john@foo.net
--------------------------------------------------------------