[Haskell-beginners] please critique my first stab at systems programming

Thu Apr 14 14:50:54 CEST 2011

On Thursday 14 April 2011 14:30:59, Felipe Almeida Lessa wrote:
> On Thu, Apr 14, 2011 at 8:50 AM, Daniel Fischer
> 
> <daniel.is.fischer at googlemail.com> wrote:
> > $ cabal install stringsearch
> > -- faster searches for a pattern than provided by bytestring
> > 
> > import qualified Data.ByteString.Char8 as BC
> > import qualified Data.ByteString.Lazy as L
> > import qualified Data.ByteString.Lazy.Search as L
> > 
> > blockSize = 512
> > 
> > main = do
> >   stuff <- L.readFile "blocks"
> >   case L.indices (BC.pack "PART") stuff of
> >     [] -> putStrLn "Not Found."
> >     (i:_) -> putStrLn $ "Found at " ++ show (i `quot` blockSize) ++
> > "." putStrLn "Done."
> 
> This solves a different problem.  The original looked for PART only on
> the beginning of each block,

Well, I would have expected that, but

> searchForPattern' index handle pat = do
> 
>     eof <- hIsEOF handle
>     if eof then return Nothing
>     
>       else do
>       
>         bytes <- B.hGet handle chunkSize
>         case BC.breakSubstring pat bytes of
>         
>           (x, y) | BC.null y -> searchForPattern' (index + 1) handle pat
>           
>                  | otherwise -> return (Just index)

searches the entire block and checks whether the second component is empty 
to see whether there's any match at all. So I thought that'd be the desired 
behaviour.

If one wants to only look at the start of the blocks, isPrefixOf  would be 
a much more efficient way than breakSubstring. It would probably be still 
more efficient to read larger chunks than 512 bytes and step through them 
with 512-byte steps to check for a "PART" there (unless the "PART" appears 
in the first few blocks).

> while your solution will find everything
> that has PART in it.  Because this is I/O bound, probably the real
> time taken won't be very different, but the program will surely use
> more CPU time and report more false positives.
> 
> Cheers,