[commit: ghc] master: Add delete retry loop. [ci skip] (1f366b8)

git at git.haskell.org git at git.haskell.org
Sat Jan 28 04:23:51 UTC 2017


Repository : ssh://git@git.haskell.org/ghc

On branch  : master
Link       : http://ghc.haskell.org/trac/ghc/changeset/1f366b8d15feaa05931bd2d81d8b0c5bae92f3b8/ghc

>---------------------------------------------------------------

commit 1f366b8d15feaa05931bd2d81d8b0c5bae92f3b8
Author: Tamar Christina <tamar at zhox.com>
Date:   Sat Jan 28 04:19:02 2017 +0000

    Add delete retry loop. [ci skip]
    
    Summary:
    On Windows we have to retry the delete a couple of times.
    The reason for this is that a `DeleteFile` call just marks a
    file for deletion. The file is really only removed when the last
    handle to the file is closed. Unfortunately there are a lot of
    system services that can have a file temporarily opened using a shared
    read-only lock, such as the built-in antivirus and search indexer.
    
    We can't really guarantee that these are all off, so whenever the folder
    still exists after an `rmtree`, we wait a bit and try again.
    
    Based on what I've seen from the tests on the CI server, this is relatively rare,
    so overall we won't be retrying a lot. If the folder is still locked after a
    reasonable amount of time, we abort the current test by throwing an exception so
    that it won't fail with an even more cryptic error.
    
    The issue is that these services often open a file using `FILE_SHARE_DELETE` permissions.
    So the files can seemingly be removed, and for most intended purposes they are, but
    recreating a file with the same name will fail as the FS will prevent data loss.
    
    The MSDN docs for `DeleteFile` say:
    
    ```
    The DeleteFile function marks a file for deletion on close.
    Therefore, the file deletion does not occur until the last handle
    to the file is closed. Subsequent calls to CreateFile to open the
    file fail with ERROR_ACCESS_DENIED.
    ```
    
    Retrying seems to be a common pattern; SQLite has it in its driver:
    http://www.sqlite.org/src/info/89f1848d7f
    
    The only way to avoid this is to run each way of a test in its own folder.
    This would also have the added bonus of increased parallelism.
    
    Reviewers: austin, bgamari
    
    Reviewed By: bgamari
    
    Subscribers: thomie, #ghc_windows_task_force
    
    Differential Revision: https://phabricator.haskell.org/D2936
    
    GHC Trac Issues: #12661, #13162
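
The retry strategy described in the summary can be sketched as a standalone helper. This is only an illustration under stated assumptions, not the code from the patch: the name `remove_tree_with_retry` and the parameters `max_attempts`, `delay_step` and `sleep` are hypothetical, and unlike the patch it passes `ignore_errors=True` and omits the read-only `chmod` handler, relying solely on the existence check after each attempt.

```python
import os
import shutil
import time


def remove_tree_with_retry(path, max_attempts=5, delay_step=6, sleep=time.sleep):
    """Try to remove `path`, waiting longer before each new attempt.

    Returns the number of attempts made. Raises RuntimeError if the
    folder still exists after `max_attempts` attempts.
    """
    attempts = 0
    while attempts < max_attempts and os.path.exists(path):
        # No wait before the first attempt, then delay_step, 2*delay_step, ...
        sleep(attempts * delay_step)
        # Assumption: errors are swallowed here and detected through the
        # os.path.exists check instead (the patch uses ignore_errors=False).
        shutil.rmtree(path, ignore_errors=True)
        attempts += 1

    if os.path.exists(path):
        raise RuntimeError("Unable to remove folder '" + path + "'.")
    return attempts
```

Counting attempts upward and decrementing nothing keeps the loop bound and the delay computation in one variable; the injectable `sleep` is there only so the helper can be exercised without real waiting.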


>---------------------------------------------------------------

1f366b8d15feaa05931bd2d81d8b0c5bae92f3b8
 testsuite/driver/testlib.py | 43 +++++++++++++++++++++++++++++--------------
 1 file changed, 29 insertions(+), 14 deletions(-)

diff --git a/testsuite/driver/testlib.py b/testsuite/driver/testlib.py
index c0135f0..78e2c6f 100644
--- a/testsuite/driver/testlib.py
+++ b/testsuite/driver/testlib.py
@@ -1893,26 +1893,41 @@ def find_expected_file(name, suff):
 
 if config.msys:
     import stat
+    import time
     def cleanup():
         testdir = getTestOpts().testdir
-
+        max_attemps = 5
+        retries = max_attemps
         def on_error(function, path, excinfo):
             # At least one test (T11489) removes the write bit from a file it
             # produces. Windows refuses to delete read-only files with a
             # permission error. Try setting the write bit and try again.
-            if excinfo[1].errno == 13:
-                os.chmod(path, stat.S_IWRITE)
-                function(path)
-
-        shutil.rmtree(testdir, ignore_errors=False, onerror=on_error)
-
-        if os.path.exists(testdir):
-            # And now we try to cleanup the folder again, since the above
-            # Would have removed the problematic file(s), but not the folder.
-            # The onerror doesn't seem to be raised during the tree walk, only
-            # afterwards to report the failures.
-            # See https://bugs.python.org/issue8523 and https://bugs.python.org/issue19643
-            shutil.rmtree(testdir, ignore_errors=False)
+            os.chmod(path, stat.S_IWRITE)
+            function(path)
+
+        # On Windows we have to retry the delete a couple of times.
+        # The reason for this is that a DeleteFile call just marks a
+        # file for deletion. The file is really only removed when the last
+        # handle to the file is closed. Unfortunately there are a lot of
+        # system services that can have a file temporarily opened using a shared
+        # read-only lock, such as the built-in antivirus and search indexer.
+        #
+        # We can't really guarantee that these are all off, so whenever the folder
+        # still exists after an rmtree, we wait a bit and try again.
+        #
+        # Based on what I've seen from the tests on the CI server, this is relatively rare,
+        # so overall we won't be retrying a lot. If the folder is still locked after a
+        # reasonable amount of time, abort the current test by throwing an exception so
+        # that it won't fail with an even more cryptic error.
+        #
+        # See Trac #13162
+        while retries > 0 and os.path.exists(testdir):
+            time.sleep((max_attemps-retries)*6)
+            shutil.rmtree(testdir, onerror=on_error, ignore_errors=False)
+            retries -= 1
+
+        if retries == 0 and os.path.exists(testdir):
+            raise Exception("Unable to remove folder '" + testdir + "'. Unable to start current test.")
 else:
     def cleanup():
         testdir = getTestOpts().testdir


