[Haskell-cafe] Re: [Haskell] reading binary files

Dmitry V'yal akamaus at gmail.com
Fri Apr 7 05:44:23 EDT 2006


Hello, Bulat

I'm currently working on some kind of program for analysing FAT partitions.
Don't ask why did I chose to implement it in Haskell :)  Just for fun.
My program needs to read scattered chunks of binary data from a huge file and to
do a good amount of deserialisation.

I implemented basic functionality using Handles and Ptr's and now I'm starting
to regret for it. I have some pieces of code like:

type PtrAdvancer a b = StateT (Ptr a) IO b

peek_one :: Storable b => PtrAdvancer a b
peek_one = do
  p <- get
  res <- lift $ peek $ castPtr p
  put $ plusPtr p $ sizeOf res
  return res

peek_many :: Storable b => Int -> PtrAdvancer a [b]
peek_many 0 = return []
peek_many n = do
  first <- peek_one
  rest <- peek_many $ n-1
  return $ first:rest


data DirEntry = DirEntry
    { name :: String,
      attr :: Word8,
      crt_time_tenth :: Word8,
      crt_time :: Word16,
      crt_data :: Word16,
      lst_acc_data :: Word16,
      wrt_time :: Word16,
      wrt_date :: Word16,
      fst_cluster :: Word32,
      file_size :: Word32
    } deriving Show

instance Storable DirEntry where
    sizeOf _ = 32
    alignment _ = 32
    peek = evalStateT peek_dir_entry

peek_dir_entry = do
  n <- peek_many 11 :: PtrAdvancer a [Word8]
  at <- peek_one
  peek_one :: PtrAdvancer a Word8
  ctt <- peek_one
  ct <- peek_one
  cd <- peek_one
  lad <- peek_one
  fch <- peek_one :: PtrAdvancer a Word16
  wt <- peek_one
  wd <- peek_one
  fcl <- peek_one :: PtrAdvancer a Word16
  fs <- peek_one
  return $ DirEntry (words_to_str n) at ctt ct cd lad wt wd
             ((fromIntegral fch `shiftL` 16) + fromIntegral fcl) fs

or:

read_cluster_chain32 :: Handle -> FatAddress -> Cluster -> IO [Cluster]
read_cluster_chain32 h start cluster = do
  allocaBytes 4 $ \p -> chain32' p cluster True
  where
    chain32' p c need_seek = do
      when need_seek $ hSeek h AbsoluteSeek (fromIntegral $ start + c * 4)
      hGetBuf h p 4
      val <- peek p :: IO Word32
      let val28 = val .&. 0x0fffffff
      case val28 of
        0x0 -> return []
        0x0fffffff -> return [c]
        otherwise -> do rest <- if c+1 == val28 then chain32' p val28 False
                                 else  chain32' p val28 True
                        return $ c:rest

It works with a mediocre speed (about 10Mb/s when extracting files), but design
is ugly IMO. For example I need to write twice as much number of lines of
marshalling code compared to C. For data type declaration and then for Storable
instance. Is there a way to avoid it?

>
> with my lib, you can either read data directly from file (through
> implicit 512-byte buffer) or read whole file into the automatically
> allocated buffer with `readFromFile`. what is better - depends on what
> you plan to do with rest of file
>

Now I'm going to rewrite my code to make use of io library. So my question is
whether your library is well suited for such application (frequent positioning
and reading small pieces of data).



More information about the Haskell-Cafe mailing list