[Haskell-cafe] How to optimize a directory scanning?

Brandon Allbery allbery.b at gmail.com
Fri May 10 15:35:34 UTC 2019


The ioctl is standard, including in C unless you are using open() directly:
it checks to see if the opened file is a terminal, to determine whether to
set block or line buffering.
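
A minimal sketch of how to observe that choice from Haskell, assuming GHC's standard System.IO handles (the helper name bufferingOf and the path /proc/self/stat are only for illustration): open a file and ask which buffering mode the runtime picked. A non-terminal file should report block buffering.

```haskell
import System.IO

-- Hypothetical helper: open a file read-only and report the buffering
-- mode the runtime chose for the new handle.
bufferingOf :: FilePath -> IO BufferMode
bufferingOf path = withFile path ReadMode hGetBuffering

main :: IO ()
main = do
  -- /proc/self/stat is not a terminal, so block buffering is expected.
  bufferingOf "/proc/self/stat" >>= print
  -- For comparison: is stdin attached to a terminal in this run?
  hIsTerminalDevice stdin >>= print
```

The ENOTTY in the trace is just the "not a terminal" answer to that check, not a failed read.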

On Fri, May 10, 2019 at 11:09 AM Magicloud Magiclouds <
magicloud.magiclouds at gmail.com> wrote:

> So this is what I got. It seems both implementations make two stat calls
> (stat/newfstatat), one for the directory check and one for the uid check.
> But when opening the file for reading, the Haskell version makes an ioctl
> call (maybe from System.IO.Strict) which seems to fail. I want to test the
> case without System.IO.Strict, but I have no idea how to get exception
> catching to work with lazy readFile.
>
> For the Haskell implementation,
> ```
> stat("/proc/230", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
> stat("/proc/230", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
> openat(AT_FDCWD, "/proc/230/stat", O_RDONLY|O_NOCTTY|O_NONBLOCK) = 23
> fstat(23, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
> ioctl(23, TCGETS, 0x7ffe88c18090)       = -1 ENOTTY (Inappropriate ioctl for device)
> read(23, "230 (scsi_eh_5) S 2 0 0 0 -1 212"..., 8192) = 155
> read(23, "", 8192)                      = 0
> close(23)
> ```
> For the Rust implementation,
> ```
> newfstatat(3, "1121", {st_mode=S_IFDIR|0555, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
> stat("/proc/1121", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
> open("/proc/1121/stat", O_RDONLY|O_CLOEXEC) = 4
> fcntl(4, F_SETFD, FD_CLOEXEC)           = 0
> fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
> read(4, "1121 (ibus-engine-sim) S 1077 10", 32) = 32
> read(4, "77 1077 0 -1 4194304 425 0 1 0 5", 32) = 32
> read(4, "866 1689 0 0 20 0 3 0 3490 16454"..., 64) = 64
> read(4, "4885264596992 94885264603013 140"..., 128) = 128
> read(4, "0724521155542 140724521155575 14"..., 256) = 64
> read(4, "", 192)                        = 0
> close(4)
> ```
>
> On Fri, May 10, 2019 at 3:49 PM Magicloud Magiclouds
> <magicloud.magiclouds at gmail.com> wrote:
> >
> > Good point. Let me see what strace can tell me.
> >
> > On Fri, May 10, 2019 at 3:46 PM Iustin Pop <iustin at k1024.org> wrote:
> > >
> > > On 2019-05-10 10:00:45, Magicloud Magiclouds wrote:
> > > > Hi,
> > > > I have asked this on Stack Overflow without getting an answer.
> > > > Wondering if people here could have some thoughts.
> > > >
> > > > I have a function reading the content of /proc every second.
> > > > Surprisingly, its CPU usage in top is around 5%, peaking at 8%. But the
> > > > same logic in C or Rust takes only 1% or 2%. Wondering if this can be
> > > > improved. /proc is a virtual filesystem, so this is not related to HDD
> > > > performance. I noticed this difference because my CPU is very old
> > > > (Core Gen2). On modern CPUs, as tested by others, the difference is
> > > > barely noticeable.
> > > >
> > > > watch u limit0s limit0h = do
> > > >   listDirectory "/proc/" >>= mapM_ (\fp -> do
> > > >     isMyPid' <- maybe False id <$> wrap2Maybe (isMyPid fp u)
> > > >     wrap2Maybe (Strict.readFile ("/proc/" </> fp </> "stat")))
> > > >   threadDelay 1000000
> > > >   watch u limit0s limit0h
> > > >   where
> > > >     wrap2Maybe :: IO a -> IO (Maybe a)
> > > >     wrap2Maybe f = catch ((<$>) Just $! f) (\(_ :: IOException) -> return Nothing)
> > > >     isMyPid :: FilePath -> UserID -> IO Bool
> > > >     isMyPid fp me = do
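> > > >       -- all-digit check (isDigit from Data.Char); the string comparison
> > > >       -- fp >= "0" && fp <= "9" rejects names such as "95"
> > > >       let areDigit = not (null fp) && all isDigit fp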
> > > >       isDir <- doesDirectoryExist $ "/proc/" </> fp
> > > >       owner <- fileOwner <$> getFileStatus ("/proc" </> fp)
> > > >       return $ areDigit && isDir && (owner == me)
> > >
> > > Interesting, I can see a few potential issues. But first, have you
> > > measured how many syscalls this makes in Haskell vs. C vs. Rust? That
> > > would let you separate internal Haskell problems (e.g. String) from a
> > > different algorithm in the Haskell version.
> > >
> > > For example, one issue that could lead to unneeded syscalls is your
> > > "isMyPid" function. AFAIK there's no caching done by getFileStatus, so
> > > you're stat'ing (and making a syscall for) each path twice: once to get
> > > file type (is it a directory) information, and then a second time to get
> > > owner information.
> > >
> > > You also build `"/proc/" <> fp` twice (and thus evaluate it twice).
> > >
> > > But without understanding "how" Haskell is slower, it's not clear where
> > > the problem lies (in syscalls or in GC or …).
> > >
> > > regards,
> > > iustin
> >
> >
> >
> > --
> > 竹密岂妨流水过
> > 山高哪阻野云飞
> >
> > And for G+, please use magiclouds#gmail.com.
>
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.
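
On the double stat pointed out in the quoted reply: a rough sketch, assuming the unix, directory and filepath packages, of doing the directory check and the owner check from a single getFileStatus result. The names isPidName and the reordered isMyPid are illustrative, not the original poster's code.

```haskell
import Control.Exception (IOException, try)
import Control.Monad (filterM)
import Data.Char (isDigit)
import System.Directory (listDirectory)
import System.FilePath ((</>))
import System.Posix.Files (FileStatus, fileOwner, getFileStatus, isDirectory)
import System.Posix.Types (UserID)
import System.Posix.User (getRealUserID)

-- Pure check, no syscall: is the directory entry a non-empty run of digits?
isPidName :: String -> Bool
isPidName name = not (null name) && all isDigit name

-- One stat per candidate entry. If the process has already exited,
-- getFileStatus throws and the entry is treated as "not mine".
isMyPid :: UserID -> FilePath -> IO Bool
isMyPid me name
  | not (isPidName name) = pure False
  | otherwise = do
      let path = "/proc" </> name  -- built once, reused below
      r <- try (getFileStatus path) :: IO (Either IOException FileStatus)
      pure $ case r of
        Left _   -> False
        Right st -> isDirectory st && fileOwner st == me

main :: IO ()
main = do
  me   <- getRealUserID
  mine <- filterM (isMyPid me) =<< listDirectory "/proc"
  print (length mine)
```

Testing the name first also skips the stat entirely for non-numeric entries such as "self" or "meminfo".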

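On catching exceptions without System.IO.Strict: one option, sketched here under the assumption that the bytestring package is acceptable, is the strict Data.ByteString.Char8.readFile, which reads the whole file before returning, so any IOException surfaces inside try. The name readStat is made up for the example. This still goes through a Handle, so the terminal check remains.

```haskell
import Control.Exception (IOException, try)
import qualified Data.ByteString.Char8 as BC
import System.FilePath ((</>))

-- Strictly read /proc/<pid>/stat; Nothing if the file cannot be opened
-- or read (for example because the process exited in the meantime).
readStat :: FilePath -> IO (Maybe BC.ByteString)
readStat pid = do
  r <- try (BC.readFile ("/proc" </> pid </> "stat"))
         :: IO (Either IOException BC.ByteString)
  pure (either (const Nothing) Just r)

main :: IO ()
main = do
  -- First field of our own stat line (the pid), if /proc is available.
  readStat "self" >>= print . fmap (BC.takeWhile (/= ' '))
  -- A name that cannot exist under /proc yields Nothing.
  readStat "no-such-pid" >>= print
```
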


-- 
brandon s allbery kf8nh
allbery.b at gmail.com