[GHC] #14143: forkProcess leaks file descriptors
GHC
ghc-devs at haskell.org
Mon Aug 21 21:40:20 UTC 2017
----------------------------------------+---------------------------
Reporter: danharaj | Owner: (none)
Type: feature request | Status: new
Priority: normal | Milestone:
Component: libraries/base | Version: 8.2.1
Keywords: | Operating System: POSIX
Architecture: Unknown/Multiple | Type of failure: Other
Test Case: | Blocked By:
Blocking: | Related Tickets:
Differential Rev(s): | Wiki Page:
----------------------------------------+---------------------------
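This is normal behavior: on POSIX systems `fork()` copies every open file
descriptor into the child. Marking a descriptor O_CLOEXEC does not prevent
this; the close-on-exec flag only takes effect when the child calls
`exec`, which `forkProcess` does not do. But in Haskell it's quite
difficult to figure out which FDs need to be manually closed.
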
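The close-on-exec flag only acts at `exec`, so a descriptor carrying it is
still inherited by a `forkProcess` child. A minimal sketch that checks
this, assuming GHC with the `unix` package on a POSIX system;
`fdIsOpen` and `childInheritsCloExecFd` are hypothetical helper names for
this example, not library functions:

```haskell
import Control.Exception (IOException, try)
import System.Exit (ExitCode (..))
import System.Posix.IO
  (FdOption (CloseOnExec), closeFd, createPipe, queryFdOption, setFdOption)
import System.Posix.Process
  (ProcessStatus (..), exitImmediately, forkProcess, getProcessStatus)
import System.Posix.Types (Fd)

-- | True if the descriptor is open in the calling process. On a closed
-- descriptor fcntl(F_GETFD) fails with EBADF, which 'queryFdOption'
-- raises as an IOException.
fdIsOpen :: Fd -> IO Bool
fdIsOpen fd = do
  r <- try (queryFdOption fd CloseOnExec) :: IO (Either IOException Bool)
  pure (either (const False) (const True) r)

-- | Open a pipe, mark its read end close-on-exec, then fork WITHOUT exec.
-- Returns True if the child still saw the read end as open, i.e. the
-- close-on-exec flag did not stop the descriptor from being inherited.
childInheritsCloExecFd :: IO Bool
childInheritsCloExecFd = do
  (readEnd, writeEnd) <- createPipe
  setFdOption readEnd CloseOnExec True
  pid <- forkProcess $ do
    open <- fdIsOpen readEnd
    -- Report the result through the exit status; no exec happens here.
    exitImmediately (if open then ExitSuccess else ExitFailure 1)
  status <- getProcessStatus True False pid
  mapM_ closeFd [readEnd, writeEnd]
  pure $ case status of
    Just (Exited ExitSuccess) -> True
    _ -> False
```

On a POSIX system `childInheritsCloExecFd` should return `True`: the child
sees the pipe even though it was marked close-on-exec.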
For example, if a `Handle` to a file is opened in the parent process and
isn't referenced in the code passed to `forkProcess`, its FD still leaks
into the child. To fork safely, a user has to know about every `Handle`
and every other FD-backed structure currently active in the program, as
well as which of them the child process will keep alive by referencing
them.

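One way to do that bookkeeping by hand is to close, in the child, every
`Handle` the caller knows the child should not keep. The sketch below
assumes GHC's `unix` package; `forkProcessClosing`, `fdIsOpen` and
`demoCloseInChild` are hypothetical helpers, not library functions, and
the approach only covers Handles the caller actually knows about.

```haskell
import Control.Exception (IOException, try)
import System.Exit (ExitCode (..))
import System.IO (Handle, hClose, hIsOpen)
import System.Posix.IO
  (FdOption (CloseOnExec), closeFd, createPipe, fdToHandle, queryFdOption)
import System.Posix.Process
  (ProcessStatus (..), exitImmediately, forkProcess, getProcessStatus)
import System.Posix.Types (Fd, ProcessID)

-- | Hypothetical helper: fork, and in the child close the given Handles
-- before running the action. Closing them in the child does not affect
-- the parent's copies.
forkProcessClosing :: [Handle] -> IO () -> IO ProcessID
forkProcessClosing hs action = forkProcess (mapM_ hClose hs >> action)

-- | True if the descriptor is open (fcntl(F_GETFD) fails with EBADF otherwise).
fdIsOpen :: Fd -> IO Bool
fdIsOpen fd = do
  r <- try (queryFdOption fd CloseOnExec) :: IO (Either IOException Bool)
  pure (either (const False) (const True) r)

-- | Wrap a pipe's read end in a Handle, fork with that Handle in the close
-- list, and return (child saw the fd closed, parent's Handle still open).
demoCloseInChild :: IO (Bool, Bool)
demoCloseInChild = do
  (readEnd, writeEnd) <- createPipe
  h <- fdToHandle readEnd
  pid <- forkProcessClosing [h] $ do
    open <- fdIsOpen readEnd
    exitImmediately (if open then ExitFailure 1 else ExitSuccess)
  status <- getProcessStatus True False pid
  parentOpen <- hIsOpen h
  hClose h
  closeFd writeEnd
  let childClosed = case status of
        Just (Exited ExitSuccess) -> True
        _ -> False
  pure (childClosed, parentOpen)
```

Any `Handle` missing from the list passed to `forkProcessClosing` is still
inherited, which is exactly the bookkeeping burden described above.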
A simpler problem is wanting to close most FDs (perhaps all except the
std* descriptors) when forking. When you don't know where the file
descriptors in the current process come from but you want them closed, a
fairly common approach is to iterate over all possible file descriptors
and close each one; the `process` library does this. This doesn't work
for `forkProcess` if a Haskell program is built against the threaded
runtime, because the IO event manager holds on to file descriptors it
uses for control. Blindly iterating over all FDs and closing them kills
the IO manager when `-threaded` is used. As far as I understand, all of
these FDs are held by the `Control` structure associated with an
`EventManager`:
[https://hackage.haskell.org/package/base-4.10.0.0/docs/src/GHC.Event.Control.html#Control].

The `base` library does not expose these modules, so user code has no way
to find out which descriptors they are.

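The close-everything approach can be sketched as follows, assuming GHC's
`unix` package. It plans to close every descriptor from 3 upward except
an explicit keep list, and refuses to run when `rtsSupportsBoundThreads`
reports the threaded RTS, since the IO manager's descriptors cannot be
put on the keep list from user code. The cap `maxFdCap` and the names
`fdsToClose` and `closeAllFdsExcept` are assumptions for this example.
The guard only covers the threaded case; it does not show that the
non-threaded RTS holds no descriptors of its own.

```haskell
import Control.Concurrent (rtsSupportsBoundThreads)
import Control.Exception (IOException, try)
import System.Posix.IO (closeFd)
import System.Posix.Resource
  (Resource (ResourceOpenFiles), ResourceLimit (..), ResourceLimits (..), getResourceLimit)
import System.Posix.Types (Fd)

-- | Assumed cap on how many descriptor numbers to try: the soft
-- RLIMIT_NOFILE can be very large or infinite, and looping to it is slow.
maxFdCap :: Integer
maxFdCap = 65536

-- | Pure plan: every descriptor from 3 (keeping stdin/stdout/stderr)
-- below min(limit, maxFdCap), minus an explicit keep list.
fdsToClose :: Integer -> [Fd] -> [Fd]
fdsToClose limit keep =
  [fd | fd <- [3 .. fromIntegral (min limit maxFdCap) - 1], fd `notElem` keep]

-- | The sledgehammer, meant to run first thing in a forked child.
-- Returns Left under the threaded RTS instead of closing the IO manager's
-- control descriptors along with everything else.
closeAllFdsExcept :: [Fd] -> IO (Either String ())
closeAllFdsExcept keep
  | rtsSupportsBoundThreads =
      pure (Left "threaded RTS: IO manager descriptors are unknown to user code")
  | otherwise = do
      limits <- getResourceLimit ResourceOpenFiles
      let limit = case softLimit limits of
            ResourceLimit n -> n
            _ -> maxFdCap -- infinite or unknown: fall back to the cap
      mapM_ closeQuietly (fdsToClose limit keep)
      pure (Right ())
  where
    -- Most numbers in the range were never open; ignore the EBADF errors.
    closeQuietly fd = do
      _ <- try (closeFd fd) :: IO (Either IOException ())
      pure ()
```

Splitting the pure plan (`fdsToClose`) from the effectful loop makes it
possible to inspect which descriptors would be closed before doing
anything irreversible; on `Left`, a caller would have to fall back to
explicit `Handle` bookkeeping.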
In one's own application, these issues are tricky but ultimately
surmountable, since in principle one can track down every file descriptor
being opened. However, when using `forkProcess` in a library, one might
need a sledgehammer. For example, the `hdaemonize` documentation notes
that the library can leak file descriptors because there is no way to
deal with this issue:
[https://hackage.haskell.org/package/hdaemonize-0.5.4/docs/System-Posix-Daemonize.html#v:daemonize]

I am writing a library in the same design space as `hdaemonize`, and I
would like it to handle file descriptors sensibly. In general the problem
looks intractable (for example, arbitrary C libraries could open their
own internal FDs), but if I knew which file descriptors the IO manager
uses, I could at least support the use case where no FDs should be shared
between parent and child.

Would it be sensible to expose more of the guts of the IO Manager in
`base`? Are there other parts of the RTS that use file descriptors that
need to be preserved?
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14143>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets mailing list