[Haskell-cafe] How to optimize a directory scanning?

Iustin Pop iustin at k1024.org
Fri May 10 07:46:46 UTC 2019


On 2019-05-10 10:00:45, Magicloud Magiclouds wrote:
> Hi,
> I have asked this in Stackoverflow without getting an answer.
> Wondering if people here could have some thoughts.
> 
> I have a function reading the content of /proc every second.
> Surprisingly, its CPU usage in top is around 5%, peak at 8%. But same
> logic in C or Rust just takes like 1% or 2%. Wondering if this can be
> improved. /proc is virtual filesystem, so this is not related to HDD
> performance. And I noticed this difference because my CPU is too old
> (Core Gen2). On modern CPU, as tested by others, the difference is
> barely noticeable.
> 
> watch u limit0s limit0h = do
>   listDirectory "/proc/" >>= mapM_ (\fp -> do
>     isMyPid' <- maybe False id <$> wrap2Maybe (isMyPid fp u)
>     wrap2Maybe (Strict.readFile ("/proc/" </> fp </> "stat")))
>   threadDelay 1000000
>   watch u limit0s limit0h
>   where
>     wrap2Maybe :: IO a -> IO (Maybe a)
>     wrap2Maybe f = catch ((<$>) Just $! f) (\(_ :: IOException) -> return Nothing)
>     isMyPid :: FilePath -> UserID -> IO Bool
>     isMyPid fp me = do
>       let areDigit = fp >= "0" && fp <= "9"
>       isDir <- doesDirectoryExist $ "/proc/" </> fp
>       owner <- fileOwner <$> getFileStatus ("/proc" </> fp)
>       return $ areDigit && isDir && (owner == me)

Interesting, I can see a few potential issues. But first, have you
measured how many syscalls this makes in Haskell vs. C vs. Rust? That
would let you separate overhead internal to Haskell (e.g. String)
from a difference in the algorithm itself.

For example, one issue that could lead to unneeded syscalls is your
"isMyPid" function. AFAIK there's no caching done by getFileStatus, so
you're stat'ing (and making a syscall for) each path twice: once to get
the file type (is it a directory?), and then a second time to get the
owner.

In isMyPid you also build the `"/proc" </> fp` path twice (and thus
evaluate it twice).
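
A minimal sketch of what a single-stat version could look like, assuming
the `unix` and `filepath` packages. The names `isUserPidDir` and
`allDigits` are hypothetical, not from the original code, and the digit
check here is a stricter "every character is a digit" test rather than
the string comparison in the quoted code. The path is built once, and one
getFileStatus call supplies both the directory test and the owner:

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Exception (IOException, try)
import Data.Char (isDigit)
import System.FilePath ((</>))
import System.Posix.Files (fileOwner, getFileStatus, isDirectory)
import System.Posix.Types (UserID)

-- | True when the name is a non-empty string of decimal digits.
-- (Hypothetical helper; stricter than the comparison in the quoted code.)
allDigits :: String -> Bool
allDigits name = not (null name) && all isDigit name

-- | Check whether /proc/<name> is a directory owned by the given user.
-- The cheap, pure name check runs first, so non-PID entries never
-- trigger a syscall at all.
isUserPidDir :: UserID -> FilePath -> IO Bool
isUserPidDir me name
  | not (allDigits name) = return False
  | otherwise = do
      -- Build the path once and reuse it.
      let path = "/proc" </> name
      -- One stat call; the process may vanish between listing and stat,
      -- so an IOException is treated as "not mine".
      result <- try (getFileStatus path)
      return $ case result of
        Left (_ :: IOException) -> False
        Right st -> isDirectory st && fileOwner st == me
```

Whether this closes the gap to C or Rust is something only measurement
can tell; it just removes one stat per entry and skips the stat entirely
for non-numeric names.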

But without understanding "how" Haskell is slower, it's not clear where
the problem lies (in syscalls, in GC, or …).

regards,
iustin

