[Haskell-cafe] How to optimize a directory scanning?

Magicloud Magiclouds magicloud.magiclouds at gmail.com
Fri May 10 07:49:27 UTC 2019


Good point. Let me see what strace can tell me.

On Fri, May 10, 2019 at 3:46 PM Iustin Pop <iustin at k1024.org> wrote:
>
> On 2019-05-10 10:00:45, Magicloud Magiclouds wrote:
> > Hi,
> > I asked this on Stack Overflow without getting an answer.
> > I am wondering whether people here have some thoughts.
> >
> > I have a function that reads the contents of /proc every second.
> > Surprisingly, its CPU usage in top is around 5%, peaking at 8%, while the
> > same logic in C or Rust takes only about 1% or 2%. I am wondering whether
> > this can be improved. /proc is a virtual filesystem, so this is not related
> > to HDD performance. I noticed the difference because my CPU is quite old
> > (Core Gen2). On a modern CPU, as tested by others, the difference is
> > barely noticeable.
> >
> > watch u limit0s limit0h = do
> >   listDirectory "/proc/" >>= mapM_ (\fp -> do
> >     isMyPid' <- maybe False id <$> wrap2Maybe (isMyPid fp u)
> >     wrap2Maybe (Strict.readFile ("/proc/" </> fp </> "stat")))
> >   threadDelay 1000000
> >   watch u limit0s limit0h
> >   where
> >     wrap2Maybe :: IO a -> IO (Maybe a)
> >     wrap2Maybe f = catch ((<$>) Just $! f) (\(_ :: IOException) -> return Nothing)
> >     isMyPid :: FilePath -> UserID -> IO Bool
> >     isMyPid fp me = do
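> >       -- all isDigit (from Data.Char) also accepts PIDs such as "91",
> >       -- which the string comparison fp <= "9" rejected
> >       let areDigit = not (null fp) && all isDigit fp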
> >       isDir <- doesDirectoryExist $ "/proc/" </> fp
> >       owner <- fileOwner <$> getFileStatus ("/proc" </> fp)
> >       return $ areDigit && isDir && (owner == me)
>
> Interesting, I can see a few potential issues. But first, have you
> measured how many syscalls this makes in Haskell vs. C vs. Rust? That
> would allow you to separate the problem into internal Haskell
> problems (e.g. String) vs. a different algorithm in Haskell.
>
> For example, one issue that could lead to unneeded syscalls is your
> "isMyPid" function. AFAIK there's no caching done by getFileStatus, so
> you're stat'ing (and making a syscall for) each path twice: once to get
> the file type (is it a directory?), and then a second time to get the
> owner information.
>
> You also build `"/proc/" </> fp` twice (and thus evaluate it twice).
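
A minimal sketch of what the two suggestions above could look like, assuming the unix, directory and filepath packages. It keeps the name isMyPid and its argument order from the original post; myPids is a hypothetical helper added only for illustration. The path is built once and a single getFileStatus result answers both questions (directory? owner?), so each entry costs one stat call instead of two. It has not been measured against the original.

```haskell
import Control.Exception (IOException, try)
import Control.Monad (filterM)
import Data.Char (isDigit)
import System.Directory (listDirectory)
import System.FilePath ((</>))
import System.Posix.Files (FileStatus, fileOwner, getFileStatus, isDirectory)
import System.Posix.Types (UserID)
import System.Posix.User (getRealUserID)

-- | True when the /proc entry @fp@ is an all-digit directory owned by @me@.
isMyPid :: FilePath -> UserID -> IO Bool
isMyPid fp me
  -- Reject non-PID names before touching the filesystem at all.
  | null fp || not (all isDigit fp) = return False
  | otherwise = do
      let path = "/proc" </> fp          -- built once, used once
      -- One stat call; its result carries both the file type and the owner.
      result <- try (getFileStatus path) :: IO (Either IOException FileStatus)
      return $ case result of
        Left _   -> False                -- e.g. the process exited meanwhile
        Right st -> isDirectory st && fileOwner st == me

-- | Hypothetical helper: names of /proc entries owned by the current user.
myPids :: IO [FilePath]
myPids = do
  me <- getRealUserID
  names <- listDirectory "/proc"
  filterM (\fp -> isMyPid fp me) names
```

Whether this closes the gap to C or Rust is an open question until the syscall counts are compared; it only removes the duplicated stat and path construction.
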
>
> But without understanding "how" Haskell is slower, it's not clear where
> the problem lies (in syscalls or in GC or …).
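
To get a rough idea of the GC side of that question from inside the program, one option is to read the runtime's own counters via GHC.Stats. The sketch below is only an illustration: CpuSplit, sampleCpuSplit and gcFraction are hypothetical names, and the counters are available only when the program is started with the RTS flag -T (for example by linking with -with-rtsopts=-T). Syscall counts still have to come from an external tool such as strace.

```haskell
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- | CPU time since program start, split into GC time and total time.
data CpuSplit = CpuSplit
  { gcCpuNs    :: Integer  -- ^ nanoseconds of CPU time spent in the GC
  , totalCpuNs :: Integer  -- ^ nanoseconds of CPU time overall
  } deriving Show

-- | Read the counters; Nothing when the RTS was not started with -T.
sampleCpuSplit :: IO (Maybe CpuSplit)
sampleCpuSplit = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then return Nothing
    else do
      s <- getRTSStats
      return (Just (CpuSplit (toInteger (gc_cpu_ns s)) (toInteger (cpu_ns s))))

-- | Fraction of CPU time spent in GC, between 0 and 1 (0 if no time recorded).
gcFraction :: CpuSplit -> Double
gcFraction c
  | totalCpuNs c <= 0 = 0
  | otherwise         = fromIntegral (gcCpuNs c) / fromIntegral (totalCpuNs c)
```

Printing gcFraction once per iteration of the watch loop would show whether a noticeable share of the CPU time goes to garbage collection rather than to the scan itself.
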
>
> regards,
> iustin



-- 
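Dense bamboo cannot keep the stream from flowing through;
high mountains cannot stop the wild clouds from flying past.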

And for G+, please use magiclouds#gmail.com.

