[Haskell-beginners] please critique my first stab at systems programming
MAN
elviotoccalino at gmail.com
Thu Apr 14 15:02:44 CEST 2011
searchForPattern' (and therefore searchForPattern) will read the entire
file, indeed. To check only the first 'n' chunks in a file you'd have to
1) receive a lazy bytestring, check first n*chuckSize bytes.
2) receive the handle, loop over n chunks.
El jue, 14-04-2011 a las 14:50 +0200, Daniel Fischer escribió:
> On Thursday 14 April 2011 14:30:59, Felipe Almeida Lessa wrote:
> > On Thu, Apr 14, 2011 at 8:50 AM, Daniel Fischer
> >
> > <daniel.is.fischer at googlemail.com> wrote:
> > > $ cabal install stringsearch
> > > -- faster searches for a pattern than provided by bytestring
> > >
> > > import qualified Data.ByteString.Char8 as BC
> > > import qualified Data.ByteString.Lazy as L
> > > import qualified Data.ByteString.Lazy.Search as L
> > >
> > > blockSize = 512
> > >
> > > main = do
> > > stuff <- L.readFile "blocks"
> > > case L.indices (BC.pack "PART") stuff of
> > > [] -> putStrLn "Not Found."
> > > (i:_) -> putStrLn $ "Found at " ++ show (i `quot` blockSize) ++
> > > "." putStrLn "Done."
> >
> > This solves a different problem. The original looked for PART only on
> > the beginning of each block,
>
> Well, I would have expected that, but
>
> > searchForPattern' index handle pat = do
> >
> > eof <- hIsEOF handle
> > if eof then return Nothing
> >
> > else do
> >
> > bytes <- B.hGet handle chunkSize
> > case BC.breakSubstring pat bytes of
> >
> > (x, y) | BC.null y -> searchForPattern' (index + 1) handle pat
> >
> > | otherwise -> return (Just index)
>
> searches the entire block and checks whether the second component is empty
> to see whether there's any match at all. So I thought that'd be the desired
> behaviour.
>
> If one wants to only look at the start of the blocks, isPrefixOf would be
> a much more efficient way than breakSubstring. It would probably be still
> more efficient to read larger chunks than 512 bytes and step through them
> with 512-byte steps to check for a "PART" there (unless the "PART" appears
> in the first few blocks).
>
> > while your solution will find everything
> > that has PART in it. Because this is I/O bound, probably the real
> > time taken won't be very different, but the program will surely use
> > more CPU time and report more false positives.
> >
> > Cheers,
>
> _______________________________________________
> Beginners mailing list
> Beginners at haskell.org
> http://www.haskell.org/mailman/listinfo/beginners
More information about the Beginners
mailing list