[GHC] #9494: Probable data corruption with GHCi 7.8.* and Zlib

GHC ghc-devs at haskell.org
Thu Aug 21 14:46:32 UTC 2014


#9494: Probable data corruption with GHCi 7.8.* and Zlib
-------------------------------------+-------------------------------------
       Reporter:  nominolo           |                   Owner:
           Type:  bug                |                  Status:  new
       Priority:  normal             |               Milestone:
      Component:  GHCi               |                 Version:  7.8.3
       Keywords:                     |        Operating System:
   Architecture:  Unknown/Multiple   |  Unknown/Multiple
     Difficulty:  Unknown            |         Type of failure:  Incorrect
     Blocked By:                     |  result at runtime
Related Tickets:                     |               Test Case:
                                     |                Blocking:
                                     |  Differential Revisions:
-------------------------------------+-------------------------------------
 The following program causes Zlib data corruption errors when run from
 inside GHCi. It launches two threads which then concurrently read a file,
 compress it, and immediately decompress it. You need libraries `zlib`,
 `SHA`, and `async`.

 {{{
 module Main where

 import qualified Codec.Compression.Zlib   as Zlib
 import qualified Data.ByteString.Lazy     as BL
 import qualified Data.ByteString.Internal as BI

 import Control.Exception        (bracket)
 import Control.Concurrent
 import Control.Monad
 import Control.Exception        ( evaluate)
 import Data.Digest.Pure.SHA     ( sha1)  -- from the 'SHA' package
 import Control.Concurrent.Async ( mapConcurrently)
 import System.Mem               ( performGC )

 import Debug.Trace

 test :: Int -> IO String
 test _ = do
     tid <- myThreadId

     -- testdata is: dd if=/dev/urandom of=/tmp/testdata bs=100k count=100
     -- Could also be replaced by: (BL.take (10^7) "/dev/urandom")
     dat <- BL.readFile "/tmp/testdata"

     let cbuf = Zlib.compress $ traceChunks tid $ dat
     s <- evaluate $ sha1 $ Zlib.decompress $ cbuf
     return $ show s
   where
     -- We used this to check whether buffers were reused by different
 threads, but that
     -- doesn't seem to be the case. Removing the call to traceChunks,
 however, makes it
     -- harder to reproduce possibly because of scheduler effects. In a
 much larger program
     -- it could be reproduced more easily without the trace, but in this
 small example
     -- tracing seems to cause the right amount of nondeterminism.
     traceChunks tid bs =
         BL.fromChunks
       $ zipWith (\n x -> trace (show tid ++ ":" ++ showBS x) x) [1..]
       $ BL.toChunks bs

     showBS (BI.PS ptr off len) = show ptr


 main = do
     r <- withGCThread $ mapConcurrently (test) ([1..2] :: [Int])
     putStrLn $ show $ r
   where
     -- Regularly forcing the GC makes the test-case more reproducible.
     withGCThread io =
         bracket (forkIO $ forever $ performGC >> threadDelay 1000)
                 killThread
                 (const io)
 }}}

 The output should be something like:
 {{{
 ...
 ThreadId 51:0x00000001091ee010
 ThreadId 49:0x00000001091a7010
 ...
 ThreadId 49:0x000000010986f010
 zlib-test-case.hs: user error (Codec.Compression.Zlib: incorrect data
 check)
 }}}

 You'll get different Zlib errors, depending on where it detects the
 inconsistency.  Sometimes Zlib doesn't throw an error, but the checksums
 are different.

 So far we've only been able to reproduce this using GHCi 7.8.3 on both
 Linux (NixOS) and Mac. We haven't been able to trigger it with a compiled
 executable, nor with GHCi 7.6.3. It '''was''' reproducable with HEAD from
 Jan 30 (I had that lying around somewhere).

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9494>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list