[Haskell] A opportunity to lern (parsing huge binary file)

S. Doaitse Swierstra doaitse at swierstra.net
Sat Mar 19 21:53:52 CET 2011

The uu-parsing library support every ata type that is an instance of  Data.Listlike (http://hackage.haskell.org/packages/archive/ListLike/3.0.1/doc/html/Data-ListLike.html#t:ListLike) and thus input from Data.Bytestring.Lazy.

A very small starting program can be found below. Note that here we ask for the error correction during parsin at the end of the processing; that is probably something you do not want to do, unless you only keep a very small part of the input in the result. The parsers are online, do not hang on to the input and thus you essentially only access and keep the part of the result you are interested in.

We find it a great help to have the error correction at hand since it makes it a lot easier to debug your parser. Here we just recognise any list of Word8's.


{-# LANGUAGE MultiParamTypeClasses #-}
module ReadLargeBinaryFile where

import Text.ParserCombinators.UU
import Text.ParserCombinators.UU.BasicInstances
import Data.Word
import Data.ByteString.Lazy (ByteString,readFile)
import Prelude hiding (readFile)

type BS_Parser a = P (Str Word8 ByteString Integer) a

instance IsLocationUpdatedBy Integer Word8 where
   advance pos _ = pos + 1

p:: BS_Parser [Word8]
p =  pList (pSatisfy (const True) (Insertion "" 0 0) )
main filename = do    inp <- readFile filename
                      let r@(a, errors) =  parse ( (,) <$> p <*> pEnd) (createStr 0 inp)
                      putStrLn ("--  Result: " ++ show a)
                      if null errors then  return ()
                                     else  do putStr ("--  Correcting steps: \n")
                                              show_errors errors
                      putStrLn "-- "
                      where show_errors :: (Show a) => [a] -> IO ()
                            show_errors = sequence_ . (map (putStrLn . show))

interface and that exists for Data. 
On 10 mrt 2011, at 16:36, Skeptic . wrote:

> Hi,
> I finally have an opportunity to learn Haskell (I'm a day-to-day Java programmer, but I'm also at ease with Scheme), parsing a huge (i.e. up to 50 go) binary file. The encoding is very stable, but it's not a flat struct array (i.e. it uses flags). 
> Different outputs (i.e. text files) will be needed, some unknown at this time. 
> Sounds to me a perfect "real-world" task to see what Haskell can offer.
> Any suggestions at how to structure the code or on which packages to look at is welcome.
> Thanks. 		 	   		  
> _______________________________________________
> Haskell mailing list
> Haskell at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell

More information about the Haskell mailing list