I/O manager: relying solely upon kqueue is not a safe way to go

Andreas Voellmy andreas.voellmy at gmail.com
Sat Mar 16 22:07:33 CET 2013


I started to look into fixing this issue, but HEAD no longer compiles for
me. Here is the build error I get (on OS X 10.8.2):

$ "inplace/bin/ghc-stage1" -static  -H32m -O    -package-name
ghc-prim-0.3.1.0 -hide-all-packages -i -ilibraries/ghc-prim/.
-ilibraries/ghc-prim/dist-install/build
-ilibraries/ghc-prim/dist-install/build/autogen
-Ilibraries/ghc-prim/dist-install/build
-Ilibraries/ghc-prim/dist-install/build/autogen -Ilibraries/ghc-prim/.
 -optP-include
-optPlibraries/ghc-prim/dist-install/build/autogen/cabal_macros.h -package
rts-1.0 -split-objs -package-name ghc-prim -XHaskell98 -XCPP -XMagicHash
-XForeignFunctionInterface -XUnliftedFFITypes -XUnboxedTuples
-XEmptyDataDecls -XNoImplicitPrelude -O2  -no-user-package-db -rtsopts
 -dynamic-too -odir libraries/ghc-prim/dist-install/build -hidir
libraries/ghc-prim/dist-install/build -stubdir
libraries/ghc-prim/dist-install/build -hisuf hi -osuf  o -hcsuf hc -c
libraries/ghc-prim/./GHC/IntWord64.hs -o
libraries/ghc-prim/dist-install/build/GHC/IntWord64.o -dyno
libraries/ghc-prim/dist-install/build/GHC/IntWord64.dyn_o
/var/folders/_c/4n2x0zfx7mx5gk_46pdxn3pm0000gn/T/ghc66530_0/ghc66530_1.split__2.s:unknown:missing
indirect symbols for section (__DATA,__la_sym_ptr2)


On Sat, Mar 16, 2013 at 11:08 AM, Andreas Voellmy <andreas.voellmy at gmail.com
> wrote:

>
>
>
> On Fri, Mar 15, 2013 at 3:54 PM, PHO <pho at cielonegro.org> wrote:
>
>> I found that HEAD stopped working on Mac OS X 10.5.8 once the parallel
>> I/O manager was merged. The stage-2 compiler builds successfully
>> (including Language.Haskell.TH.Syntax, contrary to the report by Kazu
>> Yamamoto), but the resulting binary is very unstable, especially for
>> ghci:
>>
>>   % inplace/bin/ghc-stage2  --interactive
>>   GHCi, version 7.7.20130313: http://www.haskell.org/ghc/  :? for help
>>   Loading package ghc-prim ... linking ... done.
>>   Loading package integer-gmp ... linking ... done.
>>   Loading package base ... linking ... done.
>>   Prelude>
>>   <stdin>: hGetChar: failed (Operation not supported)
>>
>> So I took a dtruss log and found that it was kevent(2) that returned
>> ENOTSUP. GHC.Event.KQueue was just registering stdin for EVFILT_READ;
>> stdin was of course a tty, and kevent(2) said "tty is not
>> supported". Didn't the old I/O manager do the same thing?
>> Why was it working then?
>>
>> After a lengthy investigation, I concluded that the old I/O manager was
>> not really working. It looked fine but in fact wasn't. Here's the
>> explanation: if an fd to be registered is unsupported by kqueue,
>> kevent(2) returns -1 iff no incoming event buffer is passed along
>> with the change list. Otherwise it returns successfully with an
>> incoming kevent whose "flags" has EV_ERROR set and whose "data"
>> contains an errno. The I/O manager always passed a non-empty event
>> buffer until commit e5f5cfcd, but it didn't (and still doesn't) check
>> whether a received event in fact represents an error. That is, the
>> KQueue backend asks the kernel to monitor stdin's readability. The
>> kernel then immediately delivers an event saying ENOTSUP. The KQueue
>> backend thinks "Hey, stdin is now readable!" so it invokes the callback
>> associated with the fd. The thread which called "threadWaitRead" is
>> now awakened and performs a supposedly non-blocking read on the fd,
>> which in fact blocks but works anyway.
>>
>> However, the situation has changed since commit e5f5cfcd. The I/O
>> manager now registers fds without passing an incoming event buffer, so
>> kevent(2) no longer delivers an error event; instead it
>> directly returns -1 with errno set to ENOTSUP, hence the "Operation
>> not supported" exception.
>>
>
> One thing we can easily do is have the new IO manager pass in an incoming
> event buffer so we can distinguish this case and treat it exactly as the
> old IO manager did. Then this exception would not occur and the waiting
> thread would just continue to retry the read until it succeeded. This is
> inefficient, but is no worse than the old IO manager.
>
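
To make the error-event case concrete, here is a rough sketch of the check
the backend would need once an event buffer is passed again. It is written
in Python rather than Haskell only to keep it short, and it is not the
I/O manager's code. split_error_events and FakeEvent are made-up names, and
the 0x4000 fallback for EV_ERROR is an assumption taken from the BSD/OS X
<sys/event.h> headers, used only where Python has no kqueue support.

```python
import errno
import select
from collections import namedtuple

# select.KQ_EV_ERROR only exists where Python exposes kqueue (BSD, OS X).
# The 0x4000 fallback is an assumption based on <sys/event.h>.
EV_ERROR = getattr(select, "KQ_EV_ERROR", 0x4000)

# Hypothetical stand-in for select.kevent objects, which carry the same
# ident/flags/data attributes.
FakeEvent = namedtuple("FakeEvent", "ident flags data")


def split_error_events(events):
    """Separate genuine readiness events from registration failures.

    Returns (ready_idents, failures), where failures is a list of
    (ident, errno) pairs reported through EV_ERROR.
    """
    ready, failures = [], []
    for ev in events:
        if ev.flags & EV_ERROR:
            # data holds the errno; 0 means "no error" and is skipped.
            if ev.data != 0:
                failures.append((ev.ident, ev.data))
        else:
            ready.append(ev.ident)
    return ready, failures


# An fd 0 that kqueue refused, next to an fd 5 that really is readable.
events = [FakeEvent(0, EV_ERROR, errno.ENOTSUP), FakeEvent(5, 0, 12)]
ready, failures = split_error_events(events)
```

Only the idents in ready should have their callbacks invoked as "readable";
the failures are the fds that need some other treatment. With a real kqueue
in Python, the event list would come from select.kqueue().control(changes,
n, 0) with n > 0.
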
> Note that there is nothing about the IO manager that would cause the
> awakened thread to make a blocking read call - that is determined entirely
> by how the thread performs the read.  For example, if you take a look at
> the code in the network package, you will see that whenever a socket is
> created, the socket is put in non-blocking mode. Then the code to receive
> from a socket does a recv() which is now non-blocking and calls
> threadWaitRead if that would block.
>
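
The pattern described above (non-blocking read first, wait only if it would
block) can be sketched roughly as follows. This is Python, not the network
package's code: read_when_ready is a made-up name, and select() stands in
for threadWaitRead purely for illustration.

```python
import os
import select


def read_when_ready(fd, nbytes, timeout=5.0):
    """Read from a non-blocking fd, waiting for readability when needed."""
    while True:
        try:
            # On a non-blocking fd this returns at once or raises.
            return os.read(fd, nbytes)
        except BlockingIOError:
            # EAGAIN/EWOULDBLOCK: wait until the fd is readable, then retry.
            readable, _, _ = select.select([fd], [], [], timeout)
            if not readable:
                raise TimeoutError("fd did not become readable in time")


r, w = os.pipe()
os.set_blocking(r, False)   # the caller picks the mode, not the waiter
os.write(w, b"ping")
data = read_when_ready(r, 16)
```

If the fd were left in blocking mode, os.read would simply block and the
wait would never be reached, which is the point: the mode of the fd decides
whether the read blocks, not whatever woke the thread up.
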
> Going beyond this immediate fix, we can try to really tackle the problem.
> The simplest and arguably safest approach is probably to just use select
> for everything (on OS X). That would have the downside of limiting the
> number of files that programs can wait on to 1024 per capability.
>
> A better approach would be to try to register with kqueue and then if it
> doesn't work, register it with an IO manager thread that is using select
> for the backend. We can probably reuse the IO manager thread that is
> watching timers for this purpose. With the parallel IO manager, we no
> longer use it to wait on files, but we certainly could do that. That would
> save us from adding more threads.  By only failing over to the
> manager-thread-using-select-backend if kqueue fails, we don't need to
> maintain a list of file types that kqueue works for, which might be a pain
> to maintain reliably.
>
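
A minimal sketch of the fail-over idea, again in Python and only as an
illustration. FailoverPoller and its methods are hypothetical names. It uses
kqueue where Python exposes it and epoll otherwise, and it checks both
backends from one poll call instead of running select in a separate manager
thread, which is a simplification of the design described above.

```python
import os
import select


class FailoverPoller:
    """Prefer the native poller; fall back to select() per fd on refusal."""

    def __init__(self):
        self._is_kqueue = hasattr(select, "kqueue")
        if self._is_kqueue:
            self._native = select.kqueue()
        elif hasattr(select, "epoll"):
            self._native = select.epoll()
        else:
            self._native = None
        self.native_fds = set()
        self.select_fds = set()   # fds the native backend refused

    def _register_native(self, fd):
        if self._native is None:
            raise OSError("no native backend available")
        if self._is_kqueue:
            ev = select.kevent(fd, select.KQ_FILTER_READ, select.KQ_EV_ADD)
            # max_events=0: no event buffer, so a refusal raises OSError.
            self._native.control([ev], 0, 0)
        else:
            self._native.register(fd, select.EPOLLIN)

    def register_read(self, fd):
        try:
            self._register_native(fd)
            self.native_fds.add(fd)
        except OSError:
            # No list of supported file types: just fail over.
            self.select_fds.add(fd)

    def poll(self, timeout=0.0):
        """Return the set of fds currently readable on either backend."""
        ready = set()
        if self.select_fds:
            r, _, _ = select.select(list(self.select_fds), [], [], timeout)
            ready.update(r)
        if self.native_fds:
            if self._is_kqueue:
                ready.update(e.ident for e in self._native.control(None, 64, 0))
            else:
                ready.update(fd for fd, _ in self._native.poll(0))
        return ready


poller = FailoverPoller()
r, w = os.pipe()
poller.register_read(r)
before = poller.poll(0.0)
os.write(w, b"x")
after = poller.poll(0.1)
```

Which set an fd lands in depends on the platform and the file type; the
caller never has to know in advance, which is the property the fail-over
approach is after.
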
> -Andi
>