[Haskell-cafe] Discussion: The CLOEXEC problem
Erik Hesselink
hesselink at gmail.com
Mon Jul 20 13:31:40 UTC 2015
I've run into this, but with sockets instead of files. For example, if
you run a kind of launcher that spawns processes with a double fork,
and it listens on its own socket, restarting it will fail to rebind
the socket, since the spawned processes inherited it. We set
FD_CLOEXEC on the socket now, but, at least on Linux, you could pass
SOCK_CLOEXEC to 'socket' in a similar way as with 'open'. Mac support
is trickier: it does seem to support the flag on 'open', but not on
'socket', as far as I can tell. I have no idea if this discussion
applies to Windows at all.
Personally I agree with you that we should probably set this by
default, and expose a flag to change it.
Erik
On Mon, Jul 20, 2015 at 3:07 PM, Niklas Hambüchen <mail at nh2.me> wrote:
> Hello Cafe,
>
> I would like to point out a problem common to all programming languages,
> one that other languages have addressed but Haskell has not yet.
>
> It is about what happens to file descriptors when the `exec()` syscall
> is used (whenever you `readProcess`, `createProcess`, `system`, use any
> form of `popen()`, Shake's `cmd` etc.).
>
> (A Markdown-rendered version containing most of this email can be found
> at https://github.com/ndmitchell/shake/issues/253.)
>
> Take the following function
>
> f :: IO ()
> f = do
>   inSomeTemporaryDirectory $ do
>     BS.writeFile "mybinary" binaryContents
>     setPermissions "mybinary" (setOwnerExecutable True emptyPermissions)
>     _ <- readProcess "./mybinary" [] ""
>     return ()
>
> If this happens in parallel, e.g. using
>
> forkIO f >> forkIO f >> forkIO f >> threadDelay 5000000
>
> then on Linux the `readProcess` might often fail with the error message
>
> mybinary: Text file busy
>
> This error means "Cannot execute the program 'mybinary' because it is
> open for writing by some process".
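For anyone who wants to try this, here is a self-contained sketch of the experiment above; it is not the code from the original report. It assumes Linux (or another Unix), `/bin/sh`, and a temporary directory that allows execution. `withFreshTempDir` is a hypothetical stand-in for `inSomeTemporaryDirectory` that passes an absolute path instead of changing the working directory (which would affect all threads), and `binaryContents` is replaced by a tiny shell script. Whether `reproduce` actually hits `Text file busy` depends on timing and on the GHC, `process` and kernel versions in use.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Exception (IOException, finally, try)
import qualified Data.ByteString.Char8 as BS
import System.Directory
import System.FilePath ((</>))
import System.Posix.Process (getProcessID)
import System.Process (readProcess)

-- Assumption: a tiny shell script replaces the email's `binaryContents`,
-- so no compiler is needed to produce something executable.
binaryContents :: BS.ByteString
binaryContents = BS.pack "#!/bin/sh\necho ok\n"

-- Hypothetical stand-in for `inSomeTemporaryDirectory`: creates a directory
-- unique to this process and tag, passes its absolute path to the action,
-- and removes it afterwards. It does not change the working directory,
-- because that setting is shared by all threads.
withFreshTempDir :: String -> (FilePath -> IO a) -> IO a
withFreshTempDir tag act = do
  base <- getTemporaryDirectory
  pid <- getProcessID
  let dir = base </> ("cloexec-demo-" ++ show pid ++ "-" ++ tag)
  createDirectoryIfMissing True dir
  act dir `finally` removeDirectoryRecursive dir

-- Write an executable, mark it runnable and run it; returns its stdout.
f :: String -> IO String
f tag = withFreshTempDir tag $ \dir -> do
  let exe = dir </> "mybinary"
  BS.writeFile exe binaryContents -- the Fd is closed before this returns
  -- /bin/sh has to read the script, so grant read as well as execute.
  setPermissions exe (setOwnerReadable True (setOwnerExecutable True emptyPermissions))
  readProcess exe [] ""

-- Run n copies of f concurrently and report any failure, such as
-- "Text file busy", together with the thread that hit it.
reproduce :: Int -> IO ()
reproduce n = do
  mapM_ (forkIO . worker) [1 .. n]
  threadDelay 5000000
  where
    worker i = do
      r <- try (f (show i)) :: IO (Either IOException String)
      case r of
        Left e -> putStrLn ("thread " ++ show i ++ ": " ++ show e)
        Right _ -> pure ()
```

Calling `reproduce 3` from `main` corresponds to the `forkIO f >> forkIO f >> forkIO f` line; each failing thread prints its exception.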
>
> How can this happen, given that we're writing all `mybinary` files in
> completely separate temporary directories, and given that `BS.writeFile`
> guarantees to close the file handle / file descriptor (`Fd`) before it
> returns?
>
> The answer is that by default, child processes on Unix (`fork()+exec()`)
> inherit all open file descriptors of the parent process. An ordering
> that leads to the problematic case could be:
>
> * Thread 1 writes its file completely (opens and closes Fd 1)
> * Thread 2 starts writing its file (Fd 2 open for writing)
> * Thread 1 executes "mybinary" (which calls `fork()` and `exec()`). Fd 2
> is inherited by the child process
> * Thread 2 finishes writing (closes its Fd 2)
> * Thread 2 executes "mybinary", which fails with `Text file busy`
> because an Fd is still open to it in the child process of Thread 1
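The ordering above can also be replayed deterministically in a single thread, which makes the mechanism easier to observe than waiting for the race. In this sketch (assuming Linux, `sleep` on the `PATH`, and an executable temporary directory), a `sleep` child plays the part of Thread 1's child, and the file is reopened for appending so that there is a writable Fd to inherit. The explicit `setFdOption fd CloseOnExec False` keeps the Fd inheritable regardless of how a given GHC version opens files. On kernels that enforce `ETXTBSY` for executables, the first `readProcess` is expected to fail and the second one, after the child has exited, to succeed.

```haskell
import Control.Exception (IOException, try)
import qualified Data.ByteString.Char8 as BS
import System.Directory (emptyPermissions, getTemporaryDirectory, removeFile,
                         setOwnerExecutable, setOwnerReadable,
                         setOwnerWritable, setPermissions)
import System.FilePath ((</>))
import System.IO (IOMode (AppendMode), openFile)
import System.Posix.IO (FdOption (CloseOnExec), closeFd, handleToFd, setFdOption)
import System.Posix.Process (getProcessID)
import System.Process (readProcess, spawnProcess, waitForProcess)

-- Replays the bullet-point ordering in one thread.
-- Returns (did the first exec fail?, output of the exec after the child exited).
replay :: IO (Bool, String)
replay = do
  tmp <- getTemporaryDirectory
  pid <- getProcessID
  let exe = tmp </> ("cloexec-replay-" ++ show pid)
  BS.writeFile exe (BS.pack "#!/bin/sh\necho ok\n")
  setPermissions exe
    (setOwnerReadable True (setOwnerWritable True (setOwnerExecutable True emptyPermissions)))
  -- "Thread 2" has the file open for writing; keep the raw Fd inheritable.
  fd <- openFile exe AppendMode >>= handleToFd
  setFdOption fd CloseOnExec False
  -- "Thread 1" spawns a child, which inherits a copy of fd.
  child <- spawnProcess "sleep" ["2"]
  -- "Thread 2" finishes writing and closes its own copy of the Fd ...
  closeFd fd
  -- ... then tries to execute the file while the child still holds it open.
  firstAttempt <- try (readProcess exe [] "") :: IO (Either IOException String)
  _ <- waitForProcess child -- the child exits and drops the last writable Fd
  secondOutput <- readProcess exe [] ""
  removeFile exe
  pure (either (const True) (const False) firstAttempt, secondOutput)
```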
>
> The scope of this problem is unfortunately quite general: it can happen
> for any program that uses parallel threads and runs two or more
> external processes at some point. It cannot be fixed by the part that
> starts the external process (i.e. you can't write a reliable
> `readProcess` function that doesn't have this problem, since the problem
> is rooted in the Fds, and there is no version of `exec()` that doesn't
> inherit parent Fds).
>
> This is a general problem in C on Unix, and it was discovered quite
> late.
>
> Naive solutions to this use `fcntl`, e.g. `fcntl(fd, F_SETFD, FD_CLOEXEC)`:
>
>
> http://stackoverflow.com/questions/6125068/what-does-the-fd-cloexec-fcntl-flag-do
>
> which is the equivalent of Haskell's `setFdOption` (from `System.Posix.IO`),
> used to set the `CLOEXEC` flag on all Fds before `exec()`ing. Fds with this
> flag are not inherited by `exec()`ed child processes. However, these
> solutions are racy in multi-threaded programs (such as typical Haskell
> programs), where an `exec()` made by some thread can fall just in between
> the `int fd = open(...); fcntl(fd, F_SETFD, FD_CLOEXEC);` of some other
> thread.
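A minimal sketch of the `setFdOption` approach with the `unix` package. A pipe from `createPipe` stands in for any freshly opened Fd, and a child shell reports whether it can see the descriptor by checking `/proc/self/fd` (a Linux-specific assumption). The comment marks the window in which a `fork()` from another thread would still inherit the Fd, which is the race described above.

```haskell
import System.Posix.IO (FdOption (CloseOnExec), closeFd, createPipe, setFdOption)
import System.Posix.Types (Fd (..))
import System.Process (readProcess)

-- Ask a child shell whether descriptor n is open in it.
-- Assumption: Linux, where /proc/self/fd lists a process's open Fds.
childSeesFd :: Fd -> IO Bool
childSeesFd (Fd n) = do
  let script = "if [ -e /proc/self/fd/" ++ show n ++ " ]; then echo yes; else echo no; fi"
  out <- readProcess "sh" ["-c", script] ""
  pure (out == "yes\n")

-- Returns (visible in child before setting CLOEXEC, visible after).
cloexecDemo :: IO (Bool, Bool)
cloexecDemo = do
  (r, w) <- createPipe
  setFdOption r CloseOnExec False -- a plain, inheritable Fd
  inheritedBefore <- childSeesFd r
  -- Race window: a fork() in another thread between createPipe and the
  -- next line would still hand r to that thread's child.
  setFdOption r CloseOnExec True
  inheritedAfter <- childSeesFd r
  mapM_ closeFd [r, w]
  pure (inheritedBefore, inheritedAfter)
```

On Linux, `cloexecDemo` should return `(True, False)`: the child sees the descriptor before the flag is set and not after.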
>
> For this reason, the `O_CLOEXEC` flag was added to the `open()` syscall
> in Linux 2.6.23 to atomically open a file and set the Fd to CLOEXEC in a
> single step; see e.g. `man 2 open`:
>
> http://man7.org/linux/man-pages/man2/open.2.html
>
> This flag is not the default in Haskell - but maybe it should be. Other
> languages set it by default, for example Python. See
>
> PEP-433: https://www.python.org/dev/peps/pep-0433/
> and the newer
> PEP-446: https://www.python.org/dev/peps/pep-0446/
>
> for a very good description of the situation.
>
> Python >= 3.2 closes inherited Fds in the child, after the `fork()` and
> before the `exec()`, when spawning processes with its `subprocess` module
> (`close_fds=True` became the default in 3.2).
> Python 3.4 uses O_CLOEXEC by default on all Fds opened by Python.
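On the Haskell side, the closest analogue to Python's `subprocess` behaviour is probably the `close_fds` field of `CreateProcess` in the `process` package, which asks for all descriptors except stdin, stdout and stderr to be closed in the child. The sketch below (again assuming Linux and `/proc/self/fd`) shows the difference for a deliberately inheritable Fd. Where the closing is done by the child itself after `fork()`, this narrows the window rather than removing it, so on its own it does not eliminate the `Text file busy` race.

```haskell
import System.Posix.IO (FdOption (CloseOnExec), closeFd, createPipe, setFdOption)
import System.Posix.Types (Fd (..))
import System.Process (CreateProcess (..), proc, readCreateProcess)

-- Ask a child shell whether descriptor n is open in it, optionally asking
-- the process package to close all Fds above 2 in the child first.
-- Assumption: Linux, where /proc/self/fd lists a process's open Fds.
childSeesFd :: Bool -> Fd -> IO Bool
childSeesFd closeAll (Fd n) = do
  let script = "if [ -e /proc/self/fd/" ++ show n ++ " ]; then echo yes; else echo no; fi"
      cp = (proc "sh" ["-c", script]) { close_fds = closeAll }
  out <- readCreateProcess cp ""
  pure (out == "yes\n")

-- Returns (visible with the default settings, visible with close_fds = True).
closeFdsDemo :: IO (Bool, Bool)
closeFdsDemo = do
  (r, w) <- createPipe
  setFdOption r CloseOnExec False -- deliberately inheritable
  withDefault <- childSeesFd False r -- close_fds = False is what `proc` sets
  withCloseFds <- childSeesFd True r
  mapM_ closeFd [r, w]
  pure (withDefault, withCloseFds)
```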
>
> It is also noted that "The programming languages Go, Perl and Ruby make
> newly created file descriptors non-inheritable by default: since Go 1.0
> (2009), Perl 1.0 (1987) and Ruby 2.0 (2013)":
>
> https://www.python.org/dev/peps/pep-0446/#related-work
>
> A work-around for Haskell is to use `O_CLOEXEC` explicitly, as in this
> example module `System/Posix/IO/ExecSafe.hsc`:
>
> https://gist.github.com/nh2/4932ecf5ca919659ae51
>
> Then we can implement a safe version of `BS.writeFile`:
>
> https://gist.github.com/nh2/4932ecf5ca919659ae51
>
> Using `writeFileExecSafe` helps only when your program is very small,
> i.e. when you can change all the code and you don't use any libraries
> that open files. However, this is a very rare case, so it is not a
> real solution.
>
> All multi-threaded Haskell programs that write and execute files are
> inherently susceptible to the `Text file busy` problem.
>
> We need to discuss what to do about this.
>
> Let us run this discussion on haskell-cafe and move to the libraries@
> mailing list once we've got some ideas and opinions.
>
> My personal stance is that we should follow Python's example, and all
> functions in our standard libraries should open files with the O_CLOEXEC
> flag set.
>
> Niklas
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe