[GHC] #15300: Unboxed Sums Crash

GHC ghc-devs at haskell.org
Tue Jul 3 15:21:57 UTC 2018


#15300: Unboxed Sums Crash
-------------------------------------+-------------------------------------
        Reporter:  andrewthad        |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  high              |            Milestone:  8.6.1
       Component:  Compiler          |              Version:  8.5
      Resolution:                    |             Keywords:  UnboxedSums
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Description changed by bgamari:

Old description:

> I've made it a little further in my experiments with unboxed tuples in
> the `packed` library. However, I've run into another issue that I
> strongly suspect is the result of bad behavior of unboxed tuples. To
> replicate this issue (with GHC 8.4.3), do the following:
>
> {{{
> git clone https://github.com/andrewthad/packed
> cd packed
> cabal new-build
> }}}
>
> We use `cabal new-build` for its side effect of creating a
> `.ghc.environment.xyz` file. Now, create a minimal example in the
> directory called `eol.hs` with the following contents:
>
> {{{
> import Packed.Bytes.Parser (Parser)
> import Data.Word
> import Packed.Bytes (Bytes)
> import GHC.Exts (RealWorld)
> import Packed.Bytes.Stream.IO (ByteStream)
> import qualified Packed.Bytes as B
> import qualified Data.Char
> import qualified Packed.Bytes.Parser as P
> import qualified Packed.Bytes.Stream.IO as Stream
>
> main :: IO ()
> main = do
>   r <- runExampleParser
>     ( do P.takeBytesUntilEndOfLineConsume
>          P.takeBytesUntilEndOfLineConsume
>          P.takeBytesUntilEndOfLineConsume
>     ) (foldMap Stream.singleton (map charToWord8
> "the\nemporium\rhas\narrived"))
>   print r
>
> runExampleParser :: Parser e () a -> ByteStream RealWorld -> IO (Maybe a,
> Maybe String)
> runExampleParser parser stream = do
>   P.Result mleftovers r _ <- P.parseStreamIO stream () parser
>   mextra <- case mleftovers of
>     Nothing -> return Nothing
>     Just (P.Leftovers chunk remainingStream) -> do
>       bs <- Stream.unpack remainingStream
>       return (Just (map word8ToChar (B.unpack chunk ++ bs)))
>   return (either (const Nothing) Just r,mextra)
>
> word8ToChar :: Word8 -> Char
> word8ToChar = Data.Char.chr . fromIntegral
>
> charToWord8 :: Char -> Word8
> charToWord8 = fromIntegral . Data.Char.ord
>
> s2b :: String -> Bytes
> s2b = B.pack . map charToWord8
>
> c2w :: Char -> Word8
> c2w = charToWord8
> }}}
>
> Finally, build this with `ghc -O2 eol.hs`, and then run the executable
> this produces to get the following:
>
> {{{
> (Nothing,Just "\rhas\narrived")
> eol: internal error: stg_ap_n_ret
>     (GHC version 8.4.3 for x86_64_unknown_linux)
>     Please report this as a GHC bug:
> http://www.haskell.org/ghc/reportabug
> Aborted (core dumped)
> }}}
>
> Things worth noting:
>
> 1. I think the program fails in the final GC that runs right before the
> program terminates. We can see that it produces a correct result of
> `(Nothing,Just "\rhas\narrived")`, but something on the heap has
> definitely been corrupted.
> 2. This only happens with `-O2` turned on.
> 3. This only happens when the parser does not successfully parse its
> input.
>
> Here's some more context around this. I've been working on a parser that
> uses unboxed sums instead of continuations. After #15038 was fixed,
> everything had been going well. Then, I took the parser type and added
> two things to it: (1) context and (2) typed errors. Context is basically
> like throwing `StateT` on top and errors are like throwing `ExceptT` on
> top. After this, everything in my test suite kept working except for a
> single test, which now consistently crashes my test suite. So, I
> originally had this:
>
> {{{
> type Bytes# = (# ByteArray#, Int#, Int# #)
> type Maybe# (a :: TYPE r) = (# (# #) | a #)
> type Leftovers# s = (# Bytes# , ByteStream s #)
> type Result# s (r :: RuntimeRep) (a :: TYPE r) =
>   (# Maybe# (Leftovers# s), Maybe# a #)
> newtype ParserLevity (r :: RuntimeRep) (a :: TYPE r) = ParserLevity
>   { getParserLevity :: forall s.
>        Maybe# (Leftovers# s)
>     -> State# s
>     -> (# State# s, Result# s r a #)
>   }
> }}}
>
> But I changed it to this:
>
> {{{
> type Bytes# = (# ByteArray#, Int#, Int# #)
> type Maybe# (a :: TYPE r) = (# (# #) | a #)
> type Either# a (b :: TYPE r) = (# a | b #)
> type Leftovers# s = (# Bytes# , ByteStream s #)
> type Result# e c s (r :: RuntimeRep) (a :: TYPE r) =
>   (# Maybe# (Leftovers# s), Either# (Maybe e) a, c #)
>
> newtype ParserLevity e c (r :: RuntimeRep) (a :: TYPE r) = ParserLevity
>   { getParserLevity :: forall s.
>        c
>     -> Maybe# (Leftovers# s)
>     -> State# s
>     -> (# State# s, Result# e c s r a #)
>   }
> }}}
>
> Specifically, the function causing trouble is (as currently defined):
>
> {{{
> {-# NOINLINE takeBytesUntilEndOfLineConsumeUnboxed #-}
> takeBytesUntilEndOfLineConsumeUnboxed :: ParserLevity e c BytesRep Bytes#
> takeBytesUntilEndOfLineConsumeUnboxed = ParserLevity (go (# (# #) | #))
> where
>   go :: Maybe# Bytes# -> c -> Maybe# (Leftovers# s) -> State# s -> (#
> State# s, Result# e c s BytesRep Bytes# #)
>   go !_ c (# (# #) | #) s0 = (# s0, (# (# (# #) | #), (# Nothing | #), c
> #) #)
>   go !mbytes c (# | (# bytes0@(# arr0, off0, len0 #),
> !stream0@(ByteStream streamFunc) #) #) s0 = case BAW.findAnyByte2 (I#
> off0) (I# len0) 10 13 (ByteArray arr0) of
>     Nothing -> case streamFunc s0 of
>       (# s1, r #) -> go (# | appendMaybeBytes mbytes bytes0 #) c r s1
>     Just (I# ix, W8# theByte) -> case theByte of
>       10## -> (# s0, (# (# | (# unsafeDrop# ((ix -# off0) +# 1# ) bytes0,
> stream0 #) #), (# | appendMaybeBytes mbytes (# arr0, off0, ix -# off0 #)
> #), c #) #)
>       -- second case means it was 13
>       _ -> case ix <# (off0 +# len0 -# 1#) of
>         1# -> case indexWord8Array# arr0 (ix +# 1# ) of
>           10## -> (# s0, (# (# | (# unsafeDrop# ((ix -# off0) +# 2# )
> bytes0, stream0 #) #), (# | appendMaybeBytes mbytes (# arr0, off0, ix -#
> off0 #) #), c #) #)
>           _ -> (# s0, (# (# | (# unsafeDrop# (ix -# off0) bytes0, stream0
> #) #), (# Nothing | #), c #) #)
>         _ -> case nextNonEmpty stream0 s0 of
>           (# s1, m #) -> case m of
>             (# (# #) | #) -> (# s1, (# (# | (# unboxBytes (B.singleton
> 13), Stream.empty #) #), (# Nothing | #), c #) #)
>             (# | (# bytes1@(# arr1, _, _ #), stream1 #) #) -> case
> indexWord8Array# arr1 0# of
>               10## -> (# s1, (# (# | (# unsafeDrop# 1# bytes1, stream1 #)
> #), (# | appendMaybeBytes mbytes (# arr0, off0, ix -# off0 #) #), c #) #)
>               _ -> (# s1, (# (# | (# unboxBytes (B.cons 13 (boxBytes
> bytes1)), stream1 #) #), (# Nothing | #), c #) #)
> }}}
>
> That's all I've got for now. If no one's able to make headway, I'll
> probably come back to this and try to make a more minimal example at some
> point. I won't have time to do this soon though.

New description:

 I've made it a little further in my experiments with unboxed tuples in the
 `packed` library. However, I've run into another issue that I strongly
 suspect is the result of bad behavior of unboxed tuples. To replicate this
 issue (with GHC 8.4.3), do the following:

 {{{
 git clone https://github.com/andrewthad/packed
 cd packed
 cabal new-build
 }}}

 We use `cabal new-build` for its side effect of creating a
 `.ghc.environment.xyz` file. Now, create a minimal example in the
 directory called `eol.hs` with the following contents:

 {{{#!hs
 import Packed.Bytes.Parser (Parser)
 import Data.Word
 import Packed.Bytes (Bytes)
 import GHC.Exts (RealWorld)
 import Packed.Bytes.Stream.IO (ByteStream)
 import qualified Packed.Bytes as B
 import qualified Data.Char
 import qualified Packed.Bytes.Parser as P
 import qualified Packed.Bytes.Stream.IO as Stream

 main :: IO ()
 main = do
   r <- runExampleParser
     ( do P.takeBytesUntilEndOfLineConsume
          P.takeBytesUntilEndOfLineConsume
          P.takeBytesUntilEndOfLineConsume
     ) (foldMap Stream.singleton (map charToWord8
 "the\nemporium\rhas\narrived"))
   print r

 runExampleParser :: Parser e () a -> ByteStream RealWorld -> IO (Maybe a,
 Maybe String)
 runExampleParser parser stream = do
   P.Result mleftovers r _ <- P.parseStreamIO stream () parser
   mextra <- case mleftovers of
     Nothing -> return Nothing
     Just (P.Leftovers chunk remainingStream) -> do
       bs <- Stream.unpack remainingStream
       return (Just (map word8ToChar (B.unpack chunk ++ bs)))
   return (either (const Nothing) Just r,mextra)

 word8ToChar :: Word8 -> Char
 word8ToChar = Data.Char.chr . fromIntegral

 charToWord8 :: Char -> Word8
 charToWord8 = fromIntegral . Data.Char.ord

 s2b :: String -> Bytes
 s2b = B.pack . map charToWord8

 c2w :: Char -> Word8
 c2w = charToWord8
 }}}

 Finally, build this with `ghc -O2 eol.hs`, and then run the executable
 this produces to get the following:

 {{{
 (Nothing,Just "\rhas\narrived")
 eol: internal error: stg_ap_n_ret
     (GHC version 8.4.3 for x86_64_unknown_linux)
     Please report this as a GHC bug:
 http://www.haskell.org/ghc/reportabug
 Aborted (core dumped)
 }}}

 Things worth noting:

 1. I think the program fails in the final GC that runs right before the
 program terminates. We can see that it produces a correct result of
 `(Nothing,Just "\rhas\narrived")`, but something on the heap has
 definitely been corrupted.
 2. This only happens with `-O2` turned on.
 3. This only happens when the parser does not successfully parse its
 input.

 Here's some more context around this. I've been working on a parser that
 uses unboxed sums instead of continuations. After #15038 was fixed,
 everything had been going well. Then, I took the parser type and added two
 things to it: (1) context and (2) typed errors. Context is basically like
 throwing `StateT` on top and errors are like throwing `ExceptT` on top.
 After this, everything in my test suite kept working except for a single
 test, which now consistently crashes my test suite. So, I originally had
 this:

 {{{#!hs
 type Bytes# = (# ByteArray#, Int#, Int# #)
 type Maybe# (a :: TYPE r) = (# (# #) | a #)
 type Leftovers# s = (# Bytes# , ByteStream s #)
 type Result# s (r :: RuntimeRep) (a :: TYPE r) =
   (# Maybe# (Leftovers# s), Maybe# a #)
 newtype ParserLevity (r :: RuntimeRep) (a :: TYPE r) = ParserLevity
   { getParserLevity :: forall s.
        Maybe# (Leftovers# s)
     -> State# s
     -> (# State# s, Result# s r a #)
   }
 }}}

 But I changed it to this:

 {{{#!hs
 type Bytes# = (# ByteArray#, Int#, Int# #)
 type Maybe# (a :: TYPE r) = (# (# #) | a #)
 type Either# a (b :: TYPE r) = (# a | b #)
 type Leftovers# s = (# Bytes# , ByteStream s #)
 type Result# e c s (r :: RuntimeRep) (a :: TYPE r) =
   (# Maybe# (Leftovers# s), Either# (Maybe e) a, c #)

 newtype ParserLevity e c (r :: RuntimeRep) (a :: TYPE r) = ParserLevity
   { getParserLevity :: forall s.
        c
     -> Maybe# (Leftovers# s)
     -> State# s
     -> (# State# s, Result# e c s r a #)
   }
 }}}

 Specifically, the function causing trouble is (as currently defined):

 {{{#!hs
 {-# NOINLINE takeBytesUntilEndOfLineConsumeUnboxed #-}
 takeBytesUntilEndOfLineConsumeUnboxed :: ParserLevity e c BytesRep Bytes#
 takeBytesUntilEndOfLineConsumeUnboxed = ParserLevity (go (# (# #) | #))
 where
   go :: Maybe# Bytes# -> c -> Maybe# (Leftovers# s) -> State# s -> (#
 State# s, Result# e c s BytesRep Bytes# #)
   go !_ c (# (# #) | #) s0 = (# s0, (# (# (# #) | #), (# Nothing | #), c
 #) #)
   go !mbytes c (# | (# bytes0@(# arr0, off0, len0 #), !stream0@(ByteStream
 streamFunc) #) #) s0 = case BAW.findAnyByte2 (I# off0) (I# len0) 10 13
 (ByteArray arr0) of
     Nothing -> case streamFunc s0 of
       (# s1, r #) -> go (# | appendMaybeBytes mbytes bytes0 #) c r s1
     Just (I# ix, W8# theByte) -> case theByte of
       10## -> (# s0, (# (# | (# unsafeDrop# ((ix -# off0) +# 1# ) bytes0,
 stream0 #) #), (# | appendMaybeBytes mbytes (# arr0, off0, ix -# off0 #)
 #), c #) #)
       -- second case means it was 13
       _ -> case ix <# (off0 +# len0 -# 1#) of
         1# -> case indexWord8Array# arr0 (ix +# 1# ) of
           10## -> (# s0, (# (# | (# unsafeDrop# ((ix -# off0) +# 2# )
 bytes0, stream0 #) #), (# | appendMaybeBytes mbytes (# arr0, off0, ix -#
 off0 #) #), c #) #)
           _ -> (# s0, (# (# | (# unsafeDrop# (ix -# off0) bytes0, stream0
 #) #), (# Nothing | #), c #) #)
         _ -> case nextNonEmpty stream0 s0 of
           (# s1, m #) -> case m of
             (# (# #) | #) -> (# s1, (# (# | (# unboxBytes (B.singleton
 13), Stream.empty #) #), (# Nothing | #), c #) #)
             (# | (# bytes1@(# arr1, _, _ #), stream1 #) #) -> case
 indexWord8Array# arr1 0# of
               10## -> (# s1, (# (# | (# unsafeDrop# 1# bytes1, stream1 #)
 #), (# | appendMaybeBytes mbytes (# arr0, off0, ix -# off0 #) #), c #) #)
               _ -> (# s1, (# (# | (# unboxBytes (B.cons 13 (boxBytes
 bytes1)), stream1 #) #), (# Nothing | #), c #) #)
 }}}

 That's all I've got for now. If no one's able to make headway, I'll
 probably come back to this and try to make a more minimal example at some
 point. I won't have time to do this soon though.

--

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15300#comment:6>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list