New Windows I/O manager in GHC 8.12

Simon Peyton Jones simonpj at microsoft.com
Mon Jul 20 14:28:24 UTC 2020


Tamar, I salute you!  This is a big piece of work – thank you!

Simon

From: ghc-devs <ghc-devs-bounces at haskell.org> On Behalf Of Phyx
Sent: 17 July 2020 16:04
To: ghc-devs at haskell.org Devs <ghc-devs at haskell.org>
Subject: New Windows I/O manager in GHC 8.12

Hi All,

In case you've missed it, about 150 commits landed on master yesterday.
These commits add WinIO (Windows I/O) to GHC.  This is a new I/O manager
that is designed for the native Windows I/O subsystem instead of relying
on the broken POSIX-ish compatibility layer that MIO used.

This is one of 3 big patches I have been working on for years now.

So before I continue on why WinIO was made, I'll add a TL;DR:

WinIO adds an internal API break compared to previous GHC releases.  That is,
the internal code was modified to support a completely asynchronous I/O system.

What this means is that we have to keep track of the file pointer offset which
previously was done by the C runtime.  This is because in async I/O you cannot
assume the offset to be at any given location.
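
A minimal sketch of what "keeping track of the offset ourselves" means. The
Cursor type and the advance function are hypothetical names made up for this
illustration; they are not GHC's actual Buffer fields. The idea is only that
every request carries an explicit offset, and the offset is advanced by the
number of bytes each request completed with.

```haskell
import Data.Word (Word64)

-- Hypothetical stand-in for the bookkeeping described above; this is
-- NOT the real GHC.IO.Buffer type.
newtype Cursor = Cursor { curOffset :: Word64 }
  deriving (Eq, Show)

-- | Return the offset at which the next request should be issued,
-- together with the cursor advanced by the number of bytes that the
-- request completed with (assumed to be non-negative).
advance :: Cursor -> Int -> (Word64, Cursor)
advance (Cursor off) n = (off, Cursor (off + fromIntegral n))

main :: IO ()
main = do
  let c0       = Cursor 0
      (o1, c1) = advance c0 4096   -- first request completes with 4096 bytes
      (o2, c2) = advance c1 100    -- second request completes with 100 bytes
  print (o1, o2, curOffset c2)     -- prints (0,4096,4196)
```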

What does this mean for you? Very little, as long as you did not use internal GHC I/O
code, in particular Buffer, BufferedIO and RawIO. If you have, you will need to
explicitly add support for GHC 8.12+.

Because FDs are a Unix concept and don't behave as you would expect on Windows, the
new I/O manager also uses HANDLE instead of FD. This means that any library that has
used the internal GHC Fd type won't work with WinIO. Luckily the number of libraries
that have seems quite low. If you can, please stick to the external Handle interface
for I/O functions.
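
As an illustration of sticking to the external Handle interface, here is a
small sketch that only uses System.IO and never touches an FD. The function
name roundTrip and the file name are made up for the example.

```haskell
import System.IO

-- | Write one line to a file and read it back, using only the public
-- Handle API from System.IO. No FD or other internal GHC.IO type is used.
-- (roundTrip is a hypothetical helper for this example.)
roundTrip :: FilePath -> String -> IO String
roundTrip path msg = do
  withFile path WriteMode $ \h -> do
    hSetEncoding h utf8
    hPutStrLn h msg
  withFile path ReadMode $ \h -> do
    hSetEncoding h utf8
    hGetLine h

main :: IO ()
main = do
  line <- roundTrip "winio-example.txt" "hello from the Handle API"
  putStrLn line
```

Code written this way should not need to care which I/O manager is active.
Assuming the program is linked with -rtsopts, it can be run under either
manager using the flag from this announcement, e.g.
prog +RTS --io-manager=native -RTS.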

The boot libraries have been updated, and in particular process *requires* the version
that is shipped with GHC.  Please respect the version bounds here!  I will be writing
a migration guide for those that need to migrate code.  The amount of work is usually
trivial, as base provides shims to do most of the common things you would have used Fd for.

Also, if I may make a plea to GHC developers: do not add non-trivial implementations
in the externally exposed modules (e.g. System.xxx, Data.xxx), but rather add them to internal
modules (GHC.xxx) and re-export them from the external modules.  This allows us to avoid
import cycles inside the internal modules :)

--

So why WinIO? Over the years a number of hard-to-fix issues popped up on Windows, including
proper Unicode console I/O, cooked input, and the ability to cancel I/O requests. WinIO also allows libraries like Brick to work on Windows without re-inventing the wheel or having to hide their I/O from the I/O manager.

In attempting to address some of these with MIO, layer upon layer of hacks was added.  This meant that things sometimes worked, but when they didn't, the failures were rather unpredictable.  Some of the issues were simply unfixable with MIO.  I will be making some posts about how WinIO works (and also archiving them on the wiki, don't worry :)) but for now some highlights:

WinIO is 3 years of work, first started by Joey Hess, then picked up by Mikhail Glushenkov before landing at my feet.  While the majority has been rewritten, their work did provide a great jumping-off point, so thanks!  Also thanks to Ben and AndreasK for helping me get it over the line. As you can imagine, I was exhausted by this point :).

Some stats: ~8000 new lines and ~1100 removed ones spread over 130+ commits (sorry this was the smallest we could get it while not losing some historical context) and with over 153 files changed not counting the changes to boot libraries.

It fixes #18307, #17035, #16917, #15366, #14530, #13516, #13396, #13359, #12873, #12869, #11394, #10542, #10484, #10477, #9940, #7593, #7353, #5797, #5305, #4471, #3937, #3081, #12117, #2408, #10956, #2189
(but only on native Windows consoles, so no MSYS shells) and #806, which is 14 years old!

WinIO is a dynamic choice, so you can switch between I/O managers using the RTS flag --io-manager=[native|posix].

On non-Windows platforms, native is the same as posix.

This implementation uses I/O Completion Ports as its asynchronous interface.

The I/O manager uses a new interface added in Windows Vista called GetQueuedCompletionStatusEx which allows us to service multiple request interrupts in one go.

Some highlights:

* Drops Windows Vista support
  Vista is out of extended support as of 2017. The new minimum is Windows 7.  This allows us to use much more efficient OS provided abstractions.

* Replace Events and Monitor locks with much faster and more efficient Condition Variables and Slim Reader/Writer (SRW) Locks.
* Change GHC's Buffer and I/O structs to support asynchronous operation by not relying on the OS managing File Offset.
* Implement a new command line flag +RTS --io-manager=[native|posix] to control which I/O manager is used.
* Implement a new Console I/O interface supporting much faster reads/writes and correct Unicode output.  It also supports things like cooked input.
* In the new I/O manager, if the user still has their code page set to OEM, then we use UTF-8 by default. This allows Unicode to work correctly out of the box.
* Add Atomic Exchange PrimOp and implement Atomic Ptr exchanges.
* Flush event logs eagerly so as not to rely on finalizers running.
* A lot of refactoring and more use of hsc2hs to share constants.
* Aborting with Ctrl+C should be a bit more reliable.
* Add a new IOPort primitive that should only be used for these I/O operations. Essentially an IOPort is based on an MVar with the following major
  differences:
  - Does not allow multiple pending writes. If the port is full, a second write is just discarded.
  - There is no deadlock avoidance guarantee. If you block on an IOPort and your Haskell application does not have any work left to do, the whole application is
    stalled.  In the threaded RTS we just continue idling; in the non-threaded RTS the scheduler is blocked.

* Support various optimizations in the Windows I/O manager such as skipping I/O Completion if the request finished synchronously etc.
* The I/O manager is now agnostic to the handle type, i.e. there is no socket-specific code in the manager.  This is now all pushed to the network library, completely de-coupling the two.
* Unified threaded and non-threaded I/O code. The only major difference is where the event loop is driven from and that the non-threaded RTS will always use a single OS thread to service requests. We cannot use more as there are no RTS locks to make concurrent modifications safe.
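
The IOPort write semantics from the list above can be modelled with an
ordinary MVar. This is only a sketch: the Port type and its functions are
hypothetical names for this illustration, not the real primitive, and the
model does not reproduce the deadlock behaviour (a blocked MVar read can be
detected by the RTS, whereas the text above says an IOPort gives no such
guarantee).

```haskell
import Control.Concurrent.MVar

-- Hypothetical model of the write semantics described above,
-- built on a plain MVar. This is NOT the real IOPort primitive.
newtype Port a = Port (MVar a)

newPort :: IO (Port a)
newPort = Port <$> newEmptyMVar

-- | Write to the port. If the port is already full, the value is
-- discarded and False is returned instead of blocking the writer.
writePort :: Port a -> a -> IO Bool
writePort (Port m) x = tryPutMVar m x

-- | Block until a value is available, then take it.
readPort :: Port a -> IO a
readPort (Port m) = takeMVar m

main :: IO ()
main = do
  p   <- newPort
  ok1 <- writePort p "first"
  ok2 <- writePort p "second"  -- port is full, so this write is dropped
  v   <- readPort p
  print (ok1, ok2, v)          -- prints (True,False,"first")
```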

Cheers,
Tamar