[Haskell-cafe] How to optimize a directory scanning?

Magicloud Magiclouds magicloud.magiclouds at gmail.com
Fri May 10 15:07:50 UTC 2019


So this is what I got. It seems both implementations make two stat
calls (stat/newfstatat), one for the directory check and one for the
uid check. But when the Haskell version opens a file for reading,
there is an ioctl call (maybe from System.IO.Strict) that fails. I
want to test the case without System.IO.Strict, but I have no idea how
to get exception catching to work with lazy readFile.
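
One possible way to catch exceptions with the lazy Prelude readFile is
to force the whole string inside the handler, so that errors raised
while reading surface there and not later in pure code. This is only a
sketch: readFileMaybe is a hypothetical name, and it assumes that read
errors from lazy I/O are raised when the string is forced.

```haskell
import Control.Exception (IOException, evaluate, try)

-- Hypothetical helper (not from the original code): read a file with the
-- lazy Prelude readFile, but force the full content inside `try`.
-- Errors from opening the file are thrown by readFile itself; errors from
-- the lazy read are raised when `length s` walks the string.
readFileMaybe :: FilePath -> IO (Maybe String)
readFileMaybe path = do
  r <- try (readFile path >>= \s -> evaluate (length s) >> return s)
         :: IO (Either IOException String)
  return (either (const Nothing) Just r)
```

The TCGETS ioctl looks like an isatty-style terminal check. GHC's I/O
library probably does this when it creates a Handle, so it may well
show up without System.IO.Strict too. That is an assumption worth
confirming with strace.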

For the Haskell implementation:
```
stat("/proc/230", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
stat("/proc/230", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
openat(AT_FDCWD, "/proc/230/stat", O_RDONLY|O_NOCTTY|O_NONBLOCK) = 23
fstat(23, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
ioctl(23, TCGETS, 0x7ffe88c18090)       = -1 ENOTTY (Inappropriate ioctl for device)
read(23, "230 (scsi_eh_5) S 2 0 0 0 -1 212"..., 8192) = 155
read(23, "", 8192)                      = 0
close(23)
```
For the Rust implementation:
```
newfstatat(3, "1121", {st_mode=S_IFDIR|0555, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
stat("/proc/1121", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/proc/1121/stat", O_RDONLY|O_CLOEXEC) = 4
fcntl(4, F_SETFD, FD_CLOEXEC)           = 0
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(4, "1121 (ibus-engine-sim) S 1077 10", 32) = 32
read(4, "77 1077 0 -1 4194304 425 0 1 0 5", 32) = 32
read(4, "866 1689 0 0 20 0 3 0 3490 16454"..., 64) = 64
read(4, "4885264596992 94885264603013 140"..., 128) = 128
read(4, "0724521155542 140724521155575 14"..., 256) = 64
read(4, "", 192)                        = 0
close(4)
```
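
Both traces show two stat-family calls per /proc entry. On the Haskell
side, the directory check and the owner check could share one
getFileStatus result. A minimal sketch, assuming the unix package:
isMyPidOnce is a hypothetical name, and the digit test uses all isDigit,
which is stricter than the string comparison in the original code.

```haskell
import Data.Char (isDigit)
import System.Posix.Files (fileOwner, getFileStatus, isDirectory)
import System.Posix.Types (UserID)

-- Hypothetical variant of isMyPid: a single stat serves both checks.
isMyPidOnce :: FilePath -> UserID -> IO Bool
isMyPidOnce fp me
  -- Reject non-numeric entries (e.g. "self", "cpuinfo") without any syscall.
  | null fp || not (all isDigit fp) = return False
  | otherwise = do
      -- One stat call; may throw if the process exits in the meantime,
      -- so the caller still needs a handler such as wrap2Maybe.
      st <- getFileStatus ("/proc/" ++ fp)
      return (isDirectory st && fileOwner st == me)
```

This would not remove the ioctl or change how the file is read; it only
drops one of the two stat calls per entry.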

On Fri, May 10, 2019 at 3:49 PM Magicloud Magiclouds
<magicloud.magiclouds at gmail.com> wrote:
>
> Good point. Let me see what strace can tell me.
>
> On Fri, May 10, 2019 at 3:46 PM Iustin Pop <iustin at k1024.org> wrote:
> >
> > On 2019-05-10 10:00:45, Magicloud Magiclouds wrote:
> > > Hi,
> > > I have asked this on Stack Overflow without getting an answer.
> > > Wondering if people here could have some thoughts.
> > >
> > > I have a function reading the content of /proc every second.
> > > Surprisingly, its CPU usage in top is around 5%, peaking at 8%. But the
> > > same logic in C or Rust takes just 1% or 2%. Wondering if this can be
> > > improved. /proc is a virtual filesystem, so this is not related to HDD
> > > performance. And I noticed this difference because my CPU is too old
> > > (Core Gen2). On a modern CPU, as tested by others, the difference is
> > > barely noticeable.
> > >
> > > watch u limit0s limit0h = do
> > >   listDirectory "/proc/" >>= mapM_ (\fp -> do
> > >     isMyPid' <- maybe False id <$> wrap2Maybe (isMyPid fp u)
> > >     wrap2Maybe (Strict.readFile ("/proc/" </> fp </> "stat")))
> > >   threadDelay 1000000
> > >   watch u limit0s limit0h
> > >   where
> > >     wrap2Maybe :: IO a -> IO (Maybe a)
> > >     wrap2Maybe f = catch ((<$>) Just $! f) (\(_ :: IOException) -> return Nothing)
> > >     isMyPid :: FilePath -> UserID -> IO Bool
> > >     isMyPid fp me = do
> > >       let areDigit = fp >= "0" && fp <= "9"
> > >       isDir <- doesDirectoryExist $ "/proc/" </> fp
> > >       owner <- fileOwner <$> getFileStatus ("/proc" </> fp)
> > >       return $ areDigit && isDir && (owner == me)
> >
> > Interesting, I can see a few potential issues. But first, have you
> > measured how many syscalls this does in Haskell vs. C vs. Rust? That
> > would allow you to separate internal Haskell problems (e.g. String)
> > from a different algorithm in Haskell.
> >
> > For example, one issue that could lead to unneeded syscalls is your
> > "isMyPid" function. AFAIK there's no caching done by getFileStatus, so
> > you're stat'ing (and making a syscall for) each path twice, once to get
> > file type (is it a directory) information, and then a second time to
> > get owner information.
> >
> > You also build `"/proc/" <> fp` twice (and thus evaluate it twice).
> >
> > But without understanding "how" Haskell is slower, it's not clear where
> > the problem lies (in syscalls or in GC or …).
> >
> > regards,
> > iustin



-- 
竹密岂妨流水过
山高哪阻野云飞

And for G+, please use magiclouds#gmail.com.

