I/O manager: relying solely upon kqueue is not a safe way to go

Andreas Voellmy andreas.voellmy at gmail.com
Sat Mar 16 22:24:02 CET 2013


I created a ticket

http://hackage.haskell.org/trac/ghc/attachment/ticket/7773/

for the problem reported by PHO.


On Sat, Mar 16, 2013 at 5:07 PM, Andreas Voellmy
<andreas.voellmy at gmail.com> wrote:

> I started to look into fixing this issue, but HEAD no longer compiles for
> me. Here is the build error I get (on OS X 10.8.2):
>
> $ "inplace/bin/ghc-stage1" -static  -H32m -O    -package-name
> ghc-prim-0.3.1.0 -hide-all-packages -i -ilibraries/ghc-prim/.
> -ilibraries/ghc-prim/dist-install/build
> -ilibraries/ghc-prim/dist-install/build/autogen
> -Ilibraries/ghc-prim/dist-install/build
> -Ilibraries/ghc-prim/dist-install/build/autogen -Ilibraries/ghc-prim/.
>  -optP-include
> -optPlibraries/ghc-prim/dist-install/build/autogen/cabal_macros.h -package
> rts-1.0 -split-objs -package-name ghc-prim -XHaskell98 -XCPP -XMagicHash
> -XForeignFunctionInterface -XUnliftedFFITypes -XUnboxedTuples
> -XEmptyDataDecls -XNoImplicitPrelude -O2  -no-user-package-db -rtsopts
>  -dynamic-too -odir libraries/ghc-prim/dist-install/build -hidir
> libraries/ghc-prim/dist-install/build -stubdir
> libraries/ghc-prim/dist-install/build -hisuf hi -osuf  o -hcsuf hc -c
> libraries/ghc-prim/./GHC/IntWord64.hs -o
> libraries/ghc-prim/dist-install/build/GHC/IntWord64.o -dyno
> libraries/ghc-prim/dist-install/build/GHC/IntWord64.dyn_o
> /var/folders/_c/4n2x0zfx7mx5gk_46pdxn3pm0000gn/T/ghc66530_0/ghc66530_1.split__2.s:unknown:missing
> indirect symbols for section (__DATA,__la_sym_ptr2)
>
>
> On Sat, Mar 16, 2013 at 11:08 AM, Andreas Voellmy <
> andreas.voellmy at gmail.com> wrote:
>
>>
>>
>>
>> On Fri, Mar 15, 2013 at 3:54 PM, PHO <pho at cielonegro.org> wrote:
>>
>>> I found that HEAD stopped working on Mac OS X 10.5.8 once the parallel
>>> I/O manager was merged. The stage-2 compiler builds successfully
>>> (including Language.Haskell.TH.Syntax, contrary to the report by Kazu
>>> Yamamoto), but the resulting binary is very unstable, especially for
>>> ghci:
>>>
>>>   % inplace/bin/ghc-stage2  --interactive
>>>   GHCi, version 7.7.20130313: http://www.haskell.org/ghc/  :? for help
>>>   Loading package ghc-prim ... linking ... done.
>>>   Loading package integer-gmp ... linking ... done.
>>>   Loading package base ... linking ... done.
>>>   Prelude>
>>>   <stdin>: hGetChar: failed (Operation not supported)
>>>
>>> So I took a dtruss log and found it was kevent(2) that returned
>>> ENOTSUP. GHC.Event.KQueue was simply registering stdin, which is of
>>> course a tty, for EVFILT_READ, and kevent(2) answered "tty is not
>>> supported". Didn't the old I/O manager do the same thing? Why was it
>>> working then?
>>>
>>> After a lengthy investigation, I concluded that the old I/O manager was
>>> not really working. It just looked fine but in fact wasn't. Here's an
>>> explanation: if an fd to be registered is unsupported by kqueue,
>>> kevent(2) returns -1 iff no incoming event buffer is passed in the
>>> same call. Otherwise it returns successfully with an incoming kevent
>>> whose "flags" has EV_ERROR set and whose "data" contains an errno. The
>>> I/O manager had always been passing a non-empty event buffer until
>>> commit e5f5cfcd, while it wasn't (and still isn't) checking whether a
>>> received event in fact represents an error. That is, the KQueue
>>> backend asks the kernel to monitor stdin's readability. The kernel
>>> then immediately delivers an event saying ENOTSUP. The KQueue backend
>>> thinks "Hey, stdin is now readable!" so it invokes the callback
>>> associated with the fd. The thread which called "threadWaitRead" is
>>> now awakened and performs a supposedly non-blocking read on the fd,
>>> which in fact blocks but works anyway.
>>>
>>> However, the situation has changed since commit e5f5cfcd. The I/O
>>> manager now registers fds without passing an incoming event buffer, so
>>> kevent(2) no longer delivers an error event; instead it directly
>>> returns -1 with errno set to ENOTSUP, hence the "Operation not
>>> supported" exception.
>>>
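As an illustration of the missing check described above, here is a minimal
sketch in Python. It is a model of the idea, not the GHC.Event.KQueue code:
the Event tuple and the split_events helper are hypothetical names, and the
fallback value 0x4000 for EV_ERROR is the one found in BSD-derived
<sys/event.h> headers, used only where Python's select module does not
provide KQ_EV_ERROR.

```python
import errno
import select
from collections import namedtuple

# EV_ERROR as exposed by Python on kqueue platforms. 0x4000 is the value in
# BSD-derived <sys/event.h>; it is an assumed fallback for other platforms.
EV_ERROR = getattr(select, "KQ_EV_ERROR", 0x4000)

# Hypothetical stand-in for struct kevent: only the fields this check needs.
Event = namedtuple("Event", ["ident", "flags", "data"])


def split_events(events):
    """Separate genuine readiness events from per-fd error reports."""
    ready, failed = [], []
    for ev in events:
        if ev.flags & EV_ERROR:
            # With EV_ERROR set, `data` carries an errno, not a byte count,
            # so the fd must not be treated as readable.
            failed.append((ev.ident, ev.data))
        else:
            ready.append(ev.ident)
    return ready, failed


# Made-up input: fd 0 is reported as unsupported, fd 5 as readable.
events = [Event(0, EV_ERROR, errno.ENOTSUP), Event(5, 0, 12)]
ready, failed = split_events(events)
print(ready, failed)
```

Only the fds in `ready` would have their callbacks invoked; the entries in
`failed` are the ones a backend could report or hand to another mechanism.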
>>
>> One thing we can easily do is have the new IO manager pass in an incoming
>> event buffer so we can distinguish this case and treat it exactly as the
>> old IO manager did. Then this exception would not occur and the waiting
>> thread would just continue to retry the read until it succeeded. This is
>> inefficient, but is no worse than the old IO manager.
>>
>> Note that there is nothing about the IO manager that would cause the
>> awakened thread to make a blocking read call - that is determined entirely
>> by how the thread performs the read.  For example, if you take a look at
>> the code in the network package, you will see that whenever a socket is
>> created, the socket is put in non-blocking mode. Then the code to receive
>> from a socket does a recv() which is now non-blocking and calls
>> threadWaitRead if that would block.
>>
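The pattern described above (put the descriptor in non-blocking mode,
attempt the read, and wait for readiness only if the read would block) can
be sketched as follows. This is a Python analogy, not the network package's
code: read_when_ready is a hypothetical helper, and select.select stands in
for the role threadWaitRead plays in Haskell.

```python
import os
import select


def read_when_ready(fd, nbytes):
    """Non-blocking read that waits for readiness only when it would block."""
    os.set_blocking(fd, False)
    while True:
        try:
            return os.read(fd, nbytes)
        except BlockingIOError:
            # Nothing available yet: wait until the fd is readable, then
            # retry. The read itself never blocks.
            select.select([fd], [], [])


# Demo on a pipe: the data is already buffered, so the first read succeeds.
r, w = os.pipe()
os.write(w, b"hello")
data = read_when_ready(r, 5)
os.close(r)
os.close(w)
print(data)
```

Whether the read blocks is decided by the descriptor's mode, not by
whatever mechanism woke the thread up.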
>> Going beyond this immediate fix, we can try to really tackle the problem.
>> The simplest and arguably safest approach is probably to just use select
>> for everything (on OS X). That would have the downside of limiting the
>> number of files that programs can wait on to 1024 per capability.
>>
>> A better approach would be to try to register with kqueue and then if it
>> doesn't work, register it with an IO manager thread that is using select
>> for the backend. We can probably reuse the IO manager thread that is
>> watching timers for this purpose. With the parallel IO manager, we no
>> longer use it to wait on files, but we certainly could do that. That would
>> save us from adding more threads. By only failing over to the
>> manager-thread-using-select-backend if kqueue fails, we don't need to
>> maintain a list of file types that kqueue works for, which might be a
>> pain to maintain reliably.
>>
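The fail-over idea can be sketched in a few lines of Python. Everything
here is hypothetical and only models the control flow: FallbackRegistry and
fake_kqueue_register are invented names, and the fake registration function
simply pretends that fd 0 is refused the way a tty is in the report above.

```python
import errno


class FallbackRegistry:
    """Try the fast backend first; fail over to a select-style set."""

    def __init__(self, primary_register):
        # primary_register is a hypothetical callable standing in for the
        # kqueue registration; it raises OSError when the fd is refused.
        self._primary_register = primary_register
        self.primary_fds = set()
        self.fallback_fds = set()  # to be watched by the select-based thread

    def register(self, fd):
        try:
            self._primary_register(fd)
        except OSError:
            # No list of supported file types is consulted: the failure
            # itself is the signal to fail over.
            self.fallback_fds.add(fd)
            return "fallback"
        self.primary_fds.add(fd)
        return "primary"


def fake_kqueue_register(fd):
    # Demo behaviour only: pretend fd 0 (a tty) is not supported.
    if fd == 0:
        raise OSError(errno.ENOTSUP, "Operation not supported")


reg = FallbackRegistry(fake_kqueue_register)
results = [reg.register(0), reg.register(7)]
print(results)
```

A real implementation would probably want to fail over only on specific
errno values, so that errors such as a bad descriptor are still reported.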
>> -Andi
>>
>
>