[Haskell-cafe] How to optimize a directory scanning?

Viktor Dukhovni ietf-dane at dukhovni.org
Fri May 10 18:29:03 UTC 2019


Why is the process id re-computed every second?  Do you
expect it to change during the process lifetime?

>    isMyPid fp me = do
>      let areDigit = fp >= "0" && fp <= "9"
>      isDir <- doesDirectoryExist $ "/proc/" </> fp
>      owner <- fileOwner <$> getFileStatus ("/proc" </> fp)
>      return $ areDigit && isDir && (owner == me)

The code should also skip the directory and ownership checks
for non-numeric directory entries, avoiding unnecessary stat(2)
calls.
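
A minimal sketch of that idea follows.  The helper name isNumeric is
my own invention, and I assume the intent is "every character is a
digit".  The quoted string comparison is not quite that test: it
rejects "90" (which sorts after "9") and accepts "1abc".  The sketch
also takes both the directory bit and the owner from a single
getFileStatus call, so each numeric entry costs one stat(2) and each
non-numeric entry costs none.

```haskell
import Data.Char (isDigit)
import System.FilePath ((</>))
import System.Posix.Files (fileOwner, getFileStatus, isDirectory)
import System.Posix.Types (UserID)

-- Hypothetical helper: True when the name is non-empty and all digits.
isNumeric :: String -> Bool
isNumeric s = not (null s) && all isDigit s

-- Reworked isMyPid (a sketch, not the original poster's code):
-- non-numeric names return False without touching the filesystem,
-- numeric names are stat'ed exactly once.
isMyPid :: FilePath -> UserID -> IO Bool
isMyPid fp me
  | not (isNumeric fp) = return False
  | otherwise = do
      st <- getFileStatus ("/proc" </> fp)
      return (isDirectory st && fileOwner st == me)
```

A process can exit between the directory read and the stat, in which
case getFileStatus throws.  The sketch does not handle that, so a
real scanner would probably want to catch the exception and treat it
as False.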

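   import Control.Exception (bracket)
   import Control.Monad (unless)
   import qualified System.Posix.Directory as D

   perEntry_ :: FilePath -> (FilePath -> IO ()) -> IO ()
   perEntry_ dirPath entryAction =
       bracket (D.openDirStream dirPath) D.closeDirStream loop
     where
       -- readDirStream returns "" once the directory is exhausted
       loop ds = do
           entry <- D.readDirStream ds
           unless (null entry) $ entryAction entry >> loop ds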

Or with Conduits:

   import Control.Monad.IO.Class (liftIO)
   import Data.Conduit ((.|))
   import qualified Data.Conduit as C
   import qualified Data.Conduit.Combinators as C

   C.runConduitRes $ C.sourceDirectory dirPath .|
       C.awaitForever (liftIO . entryAction)

But now you have more choices about when and what to return
from the loop, whether to scan the whole directory, ...
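
As one illustration of those choices, here is a rough sketch of a
directory-stream loop that stops at the first entry a predicate
accepts and returns it.  The name findEntry and its signature are
hypothetical, chosen only for this example.  It relies on
readDirStream from the unix package returning "" at the end of the
directory.

```haskell
import Control.Exception (bracket)
import qualified System.Posix.Directory as D

-- Hypothetical helper: return the first entry the predicate accepts,
-- leaving the rest of the directory unread.  Nothing means the whole
-- directory was scanned without a match.
findEntry :: FilePath -> (FilePath -> IO Bool) -> IO (Maybe FilePath)
findEntry dirPath accept =
    bracket (D.openDirStream dirPath) D.closeDirStream loop
  where
    loop ds = do
        entry <- D.readDirStream ds   -- "" marks the end of the directory
        if null entry
            then return Nothing
            else do
                ok <- accept entry
                if ok then return (Just entry) else loop ds
```

The bracket still closes the stream on an early return, so stopping
part-way through the scan does not leak the directory handle.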

Note that the conduit version prepends the directory name to
the entry names.  I would not have done that, but you can just
copy the handful of lines of source and stream the bare entry names:

  http://hackage.haskell.org/package/conduit-1.3.1.1/docs/src/Data.Conduit.Combinators.html#sourceDirectory
                                 
-- 
	Viktor.


