[Git][ghc/ghc][wip/T25377] 2 commits: rts: Add assertions surrounding tracking of StgAsyncIOResult

Ben Gamari (@bgamari) gitlab at gitlab.haskell.org
Wed Oct 16 18:20:47 UTC 2024



Ben Gamari pushed to branch wip/T25377 at Glasgow Haskell Compiler / GHC


Commits:
13c20249 by Ben Gamari at 2024-10-16T12:10:25-04:00
rts: Add assertions surrounding tracking of StgAsyncIOResult

We were previously pretty cavalier in handling StgAsyncIOResults which
has proven in #25377 to be problematic. We now initialize the
AsyncResult field of the `stg_block_async_void` frame to NULL when
pushing, assert that it is NULL in `awaitRequest` and that it is not
NULL in `stg_block_async_void`.

- - - - -
659ab83c by Ben Gamari at 2024-10-16T14:17:51-04:00
rts: Fix resumption of interrupted async IO operations

Issue #25377 revealed a somewhat subtle consequence of thunk resumption
after asynchronous exception suspension: The mutator may enter the
asynchronous I/O completion stack frames (e.g. `stg_block_async`) despite
the associated I/O not having been completed. We now handle this case by
declaring the operation has having failed.

This may result in `threadDelay` delaying less than the requested
duration but I don't consider this to be problematic since it is
impossible to observe this issue without rather dodgy use of
`unsafePerformIO`.

- - - - -


2 changed files:

- rts/HeapStackCheck.cmm
- rts/win32/AsyncMIO.c


Changes:

=====================================
rts/HeapStackCheck.cmm
=====================================
@@ -706,14 +706,75 @@ stg_block_throwto (P_ tso, P_ exception)
 }
 
 #if defined(mingw32_HOST_OS)
+#define ERROR_OPERATION_ABORTED 0x3e3
+
+/*
+ * Note [Resuming of interrupted asynchronous IO operations]
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ * The stg_block_async stack frames must account for the possibility
+ * that they will be entered without their associated I/O having
+ * been completed. This can happen due to evaluation suspension due to
+ * exception unwinding (see Note []) and manifests as the `ares` field
+ * not being initialized.
+ *
+ * One case where this was noticed is #25377, which involved the testsuite
+ * test `T25300`:
+ *
+ *     x :: Int
+ *     x = unsafePerformIO $ do
+ *       putStrLn "entered x"
+ *       threadDelay 1000000000
+ *       putStrLn "leaving x"
+ *       return 42
+ *
+ *     main :: IO ()
+ *     main = do
+ *       t <- forkIO $ evaluate x >> return ()
+ *       threadDelay 1000 -- ensure that the forked thread hits the `threadDelay` in `x`
+ *       killThread t
+ *       evaluate x
+ *
+ * The failure develops as follows:
+
+ *   1. TSO1 enter main and forks `evaluate x` (TSO2)
+ *   2. TSO2 enters `x`, pushing an update frame
+ *   3. TSO2 blocks due to threadDelay, which pushes a `stg_block_async_void` frame
+ *      to its stack and yields to the scheduler
+ *   4. TSO1 continues, `killThread` throws an asynchronous `KillThread`
+ *      exception to TSO2
+ *   5. TSO2 unwinds its stack up to the update frame, suspending the unwound
+ *      stack in an `AP_STACK` and updating `x` to resume from the suspended stack
+ *   6. TSO2 dies
+ *   7. TSO1 continues execution with `evaluate x`
+ *   8. TSO1 enters `x`
+ *   9. TSO1 resumes at the head of the suspended stack (the
+ *      `stg_block_async_void` frame) despite the fact that the `StgAsyncIOResult`
+ *      field was never initialized.
+ *   10. stg_block_async_void attempted to `free()` an invalid `StgAsyncIOResult*`
+ *
+ * Fixing this in stg_block_async_void is quite straightforward as it is only used by
+ * threadDelay: simply don't attempt to `free()` a NULL `StgAsyncIOResult*`.
+ * However, we need to be more careful in the case of `stg_block_async` which is used
+ * for asynchronous IO; we don't want to mislead the caller into believing that their IO
+ * operation succeeded. For this reason we take care to return an error
+ * (`ERROR_OPERATION_ABORTED`) in this case.
+ *
+ */
+
 INFO_TABLE_RET ( stg_block_async, RET_SMALL, W_ info_ptr, W_ ares )
     return ()
 {
     W_ len, errC;
 
-    len = TO_W_(StgAsyncIOResult_len(ares));
-    errC = TO_W_(StgAsyncIOResult_errCode(ares));
-    ccall free(ares "ptr");
+    if (ares != 0) {
+        len = TO_W_(StgAsyncIOResult_len(ares));
+        errC = TO_W_(StgAsyncIOResult_errCode(ares));
+        ccall free(ares "ptr");
+    } else {
+        // See Note [Resuming interrupted asynchronous IO operations]
+        len = 0;
+        errC = ERROR_OPERATION_ABORTED;
+    }
     return (len, errC);
 }
 
@@ -730,7 +791,10 @@ stg_block_async
 INFO_TABLE_RET ( stg_block_async_void, RET_SMALL, W_ info_ptr, W_ ares )
     return ()
 {
-    ccall free(ares "ptr");
+    if (ares != 0) {
+        // See Note [Resuming interrupted asynchronous IO operations]
+        ccall free(ares "ptr");
+    }
     return ();
 }
 
@@ -738,6 +802,7 @@ stg_block_async_void
 {
     Sp_adj(-2);
     Sp(0) = stg_block_async_void_info;
+    Sp(1) = 0;  // this is the StgAsyncIOResult, which will be filled in by awaitRequests.
     BLOCK_GENERIC;
 }
 


=====================================
rts/win32/AsyncMIO.c
=====================================
@@ -203,6 +203,22 @@ shutdownAsyncIO(bool wait_threads)
     OS_CLOSE_LOCK(&queue_lock);
 }
 
+static void
+assertValidBlockAsyncFrame(StgPtr sp) {
+#if defined(DEBUG)
+    StgPtr info = sp[0];
+    if (info != &stg_block_async_void_info &&
+        info != &stg_block_async_info) {
+        barf("assertValidAsyncFrame: invalid frame type");
+    }
+    if (sp[1] != 0) {
+        barf("assertValidAsyncFrame: non-null StgAsyncIOResult");
+    }
+#else
+    (void) sp;
+#endif
+}
+
 /*
  * Function: awaitRequests(wait)
  *
@@ -326,6 +342,7 @@ start:
                         // stg_block_async_info stack frame, because
                         // the block_info field will be overwritten by
                         // pushOnRunQueue().
+                        assertValidBlockAsyncFrame(tso->stackobj->sp);
                         tso->stackobj->sp[1] = (W_)tso->block_info.async_result;
                         pushOnRunQueue(&MainCapability, tso);
                         break;



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/916fd6755ccd01cb4c85db0a12f00c789100c5dd...659ab83c0c8697708820919974cad458558fcadd

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/916fd6755ccd01cb4c85db0a12f00c789100c5dd...659ab83c0c8697708820919974cad458558fcadd
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20241016/08a6d5c0/attachment-0001.html>


More information about the ghc-commits mailing list