[Haskell-cafe] Re: [Haskell] reading binary files

Benjamin Franksen benjamin.franksen at bessy.de
Wed Apr 5 16:15:20 EDT 2006


[questions such as this one should go to cafe]

On Wednesday 05 April 2006 20:41, minh thu wrote:
> 1/ i want to read some binary file (e.g. targa file format : *.tga).
>
> -- first way : via IOUArray
> showInfoHeader1 handle = do
>     a <- newArray_ (1,8) :: IO (IOUArray Int Word8)
>     hGetArray handle a 8
>     idLength <- readArray a 1 -- or getElems...
>     putStrLn ("id length : " ++ show idLength)
>     return ()
>
> -- second way : via c-like array
> showInfoHeader2 handle = do
>     b <- mallocArray 8 :: IO (Ptr Word8)
>     hGetBuf handle b 8
>     [idLength] <- peekArray 1 b -- or peakArray 8 b

Note that the first argument of peekArray is the number of elements to 
read, not an index, so this does give you the first byte. If you are 
only interested in the first byte, you could simply

    idLength <- peek b

or if it is not the first byte, then

    idLength <- peekByteOff b i

However, it is better to use arrays than pointers.

>     putStrLn ("id length : " ++ show idLength)
>     free b
>     return ()
>
> so, briefly, i have to read some content into some kind of buffer
> (IOUArray Int Word8 or Ptr Word8), then get one (or more) elements
> from the buffor into a standard haskell variable (is it the correct
> word ?) (or list).
>
> in the second case, i also have to free the buffer.

Or use alloca or allocaBytes, which are both a lot faster than malloc 
and free.
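
As a rough sketch of the allocaBytes variant (the function name 
readHeaderBytes and the choice of offsets 0 and 2 are only for 
illustration, they are not from the original code), something like 
this should work. The buffer is released automatically when the 
callback returns, so there is nothing to free:

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek, peekByteOff)
import System.IO (IOMode (ReadMode), hGetBuf, withBinaryFile)

-- | Read the first 8 bytes of a file into a temporary buffer and return
-- the bytes at offsets 0 and 2. The name and the offsets are illustrative.
readHeaderBytes :: FilePath -> IO (Word8, Word8)
readHeaderBytes path =
  withBinaryFile path ReadMode $ \h ->
    -- The 8-byte buffer only lives for the duration of this callback.
    allocaBytes 8 $ \buf -> do
      n <- hGetBuf h (buf :: Ptr Word8) 8
      if n < 8
        then ioError (userError "header shorter than 8 bytes")
        else do
          first <- peek buf            -- byte at offset 0
          third <- peekByteOff buf 2   -- byte at offset 2
          return (first, third)
```

Checking the count returned by hGetBuf matters here, because peek 
itself will happily read whatever happens to be in the buffer.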

> in some case, when the data is more than one Word8 long, i have to
> 'reconstruct' it, i.e.:
>
> [x1,x2] <- getElems a

This will give you a run-time error (a pattern match failure), because 
your array is 8 elements long, not 2. You can do

    x1:x2:_ <- ...

or better still

    x1 <- readArray a 1
    x2 <- readArray a 2
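
Put together, the array-based version might look roughly like the 
following. This is only a sketch: the name firstTwoBytes is made up, 
and the bounds (1,8) are taken from the code quoted above.

```haskell
import Data.Array.IO (IOUArray, hGetArray, newArray_, readArray)
import Data.Word (Word8)
import System.IO (IOMode (ReadMode), withBinaryFile)

-- | Read up to 8 bytes into an unboxed array indexed from 1 and return
-- the first two of them. The function name is illustrative.
firstTwoBytes :: FilePath -> IO (Word8, Word8)
firstTwoBytes path =
  withBinaryFile path ReadMode $ \h -> do
    arr <- newArray_ (1, 8) :: IO (IOUArray Int Word8)
    n <- hGetArray h arr 8       -- number of bytes actually read
    if n < 2
      then ioError (userError "fewer than 2 bytes available")
      else do
        x1 <- readArray arr 1    -- bounds-checked access
        x2 <- readArray arr 2
        return (x1, x2)
```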

Still better: Use one of the available binary serialisation libraries. 
They are already tuned for efficiency and give you a much nicer 
high-level API.
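
To give an idea of what such an API looks like, here is a small 
sketch using Data.Binary.Get from the binary package, which is one 
possible choice and not necessarily the library meant above. The 
layout parsed here (one length byte followed by a 16-bit 
little-endian value) is a made-up example, not the real TGA header.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Get (Get, getWord16le, getWord8, runGet)
import Data.Word (Word16, Word8)

-- | Parser for a hypothetical 3-byte header: a length byte followed by
-- a 16-bit little-endian value.
headerParser :: Get (Word8, Word16)
headerParser = do
  len <- getWord8
  val <- getWord16le   -- the library does the byte-order arithmetic
  return (len, val)

-- | Run the parser on a lazy ByteString. Note that runGet calls error
-- if the input is too short.
parseHeader :: BL.ByteString -> (Word8, Word16)
parseHeader = runGet headerParser
```

For instance, parseHeader (BL.pack [3, 0x34, 0x12]) yields 
(3, 0x1234). The input would normally come from BL.readFile.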

> let x = fromIntegral x1 + fromIntegral x2 * 256 :: Int
>
> is it the correct way to read binary files ?

Depends on the byte order that is used in your file format. Your 
formula treats the first byte as the least significant one, so it is 
correct if the format is little-endian and wrong if it is big-endian.
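
To make the byte-order question concrete: the formula in the quoted 
code takes the first byte as the low-order one, which is the 
little-endian reading. A sketch of both variants, with function names 
that are only illustrative:

```haskell
import Data.Word (Word8)

-- | First byte is the least significant one (little-endian).
fromLittleEndian :: Word8 -> Word8 -> Int
fromLittleEndian x1 x2 = fromIntegral x1 + fromIntegral x2 * 256

-- | First byte is the most significant one (big-endian).
fromBigEndian :: Word8 -> Word8 -> Int
fromBigEndian x1 x2 = fromIntegral x1 * 256 + fromIntegral x2
```

With the bytes 0x34 and 0x12 in file order, fromLittleEndian gives 
0x1234 (4660) and fromBigEndian gives 0x3412 (13330).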

> 2/ haskell is (i heard that once ... :-) a high level language, so it
> has (must have) good support for abstraction...

Sure. See the above-mentioned libraries.

> but in 1/, i have to choose between different kind of array
> representation (and i dont know which one is better) and it seems to
> me that the resulting code (compiled) would have to be the same.

I strongly recommend using some Array type (IOUArray or whatever). Ptr 
is really just a raw pointer into memory: no protection from 
out-of-bounds access, etc., much like in C. Ptr was invented for 
interfacing with C routines, not for regular Haskell programming.

HTH,
Ben

