[Haskell-cafe] GHC worker/wrapper on multiple return values

Tyson Whitehead twhitehead at gmail.com
Thu Feb 18 03:59:57 UTC 2016


Recently discovered GHC generates sub-optimal code when you have

- two levels of loops in the IO monad
- the inner loop doesn't do that much work
- the outer loop does a lot of work

 From dumping core I believe the issue is the worker/wrapper transformation doesn't produce return value unboxing (from the inner loop to the outer in this case) when there are multiple return values (as you windup with the IO monad due to passing around the State# RealWorld behind the scenes).

Here is some simple code that demonstrates the issue.  With a single return value

{-# LANGUAGE BangPatterns #-}

module Main (main) where

main :: IO ()
main = do
   let a = l1 0 100000000
   print a

l1 :: Int -> Int -> Int
l1 !a !n =
   case n>0 of
    True  -> let m = l2 0 n 3
             in l1 (a+m) (n-1)
    False -> a

l2 :: Int -> Int -> Int -> Int
l2 !a !n !m =
   case m>0 of
    True  -> l2 (a+n) (n-1) (m-1)
    False -> a

you get nice clean core that performs no allocations and runs beautifully.  Doing the same thing within the IO monad (effectively lifting the return value above into a unboxed tupple)

{-# LANGUAGE BangPatterns, MagicHash, UnboxedTuples #-}

module Main (main) where

main :: IO ()
main = do
   a <- l1 0 100000000
   print a

l1 :: Int -> Int -> IO Int
l1 !a !n = do
   case n>0 of
    True  -> do m <- l2 0 n 3
                l1 (a+m) (n-1)
    False -> return a

l2 :: Int -> Int -> Int -> IO Int
l2 !a !n !m = do
   case m>0 of
    True  -> l2 (a+n) (n-1) (m-1)
    False -> return a

you don't get boxing and unboxing the return value between l1 and l2.  This has quite a dramatic effect when l2 doesn't do the that much work (massive amounts of additional memory allocation and a 2x slow down).

$ time ./Simple-pure
14999999850000000
           51,912 bytes allocated in the heap
...
real	0m0.322s
...

$ time ./Simple-io
14999999850000000
    1,600,051,936 bytes allocated in the heap
...
real	0m0.634s

Should I open a ticket against GHC about this issue?  Possibly it wouldn't be that hard to get the work/wrapper stuff to work its unboxing magic on multiple return values too?

Cheers!  -Tyson

[*] https://gist.github.com/twhitehead/5744eee28bcde85b9fa4


More information about the Haskell-Cafe mailing list