Subtle bug in Data.ByteString and/or GHC (perhaps)

Nils Anders Danielsson nad at cs.chalmers.se
Mon Jun 4 14:36:15 EDT 2007


Hi,

The following program prints different outputs depending on whether
optimisations are turned on or not (using GHC 6.6.1 and binary 0.3):

  module Bug (main) where

  import Data.Binary
  import Data.Binary.Put
  import Data.Binary.Get
  import Data.Binary.Builder
  import qualified Data.ByteString.Lazy as BL
  import qualified Data.ByteString.Base as BB

  append' :: BL.ByteString -> BL.ByteString -> BL.ByteString
  append' = BL.append
  -- append' (BB.LPS xs) (BB.LPS ys) = BB.LPS (xs ++ ys)

  encode' :: Char -> BL.ByteString
  encode' x = encode (st `seq` 'a') `append'` toLazyByteString builder
    where (st, builder) = unPut (put x)

  decode' :: BL.ByteString -> Char
  decode' s = decode s'
    where (_, s', _) = runGetState (get :: Get Char) s 0

  main :: IO ()
  main = mapM_ (print . test) "abc"
    where test x = decode' (encode' x) == x

  $ ghc --make -O Bug.hs -main-is Bug.main -o bug
  [...]
  $ ./bug
  True
  False
  False
  $ rm Bug.o; ghc --make Bug.hs -main-is Bug.main -o bug
  [...]
  $ ./bug
  True
  True
  True

If the commented-out version of append' (which is a bit lazier than
the other one) is used instead, then this problem disappears. The
commented-out version is the one used in the darcs version of
Data.ByteString, so the problem above has, in a sense, already been
fixed.

However, it is disconcerting that the result of a program can depend
on optimisation flags, and changing the strictness of a function
should only make a (pure) program more or less defined, not change a
result from True to False. Hence I wonder if anyone knows the real
cause of this bug, and what the risk of encountering similar bugs in
the future is. It took lots of time to find and fix the problem above
(which of course came up in a larger piece of code), so I hope that I
won't encounter similar problems again.

My guess is that the problem has something to do with unsound rewrite
rules in Data.ByteString, by the way. Or maybe the problem lies in
GHC's optimiser.

-- 
/NAD



More information about the Libraries mailing list