Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Ryan Newton rrnewton at gmail.com
Sat Aug 3 09:45:18 CEST 2013


>  > Re: expectations.  You don't get a funny feeling when monadic values are
> > used as first class rather than second class ;-)?  Whether in the
> > accumulator of a fold, or in cases like this:
> >
> >   do act <- f x
> >        act
>
> I don't. I don't even believe that the compiler can spot the difference
> between the two. (But maybe it's just my ignorance.)
>
> Do you have a specific example where this is a problem?
>

Well, I just generally assumed that anywhere where inlining is foiled
would, any kind of allocation-eliminating optimization is unlikely to
happen.

As one example, in the monad-par scheduler(s)
<https://github.com/simonmar/monad-par/blob/0e0401ea2cfbab787c67c2ed826c123c5d83112f/monad-par/Control/Monad/Par/Scheds/TraceInternal.hs#L144>we
do inscrutable computations to retrieve a monadic-value, and then run it.
 This works fine, but it does mean that we pay the full cost of an abstract
monadic action at that point (indirect jump, etc), rather than the highly
discounted rate we are used to paying when we chain together monadic
actions in a syntactically adjacent way within a function.

Personally, I don't understand all the GHC optimization steps well.  But I
have verified certain things that it *doesn't *do.  For example, if I have
an allocating action in the tail (only exiting case) of a recursive monadic
function, it cannot do "loop peeling" to eliminate that allocation against
whatever context is calling the recursive function.  For example the STG
for the following leaves the allocation of the tuple and two Ints:

*foo :: Int -> IO (Int,Int)*
*foo x | x < 10 = return (x, 2*x)*
*foo x = foo (x-1)*
*
*
*main = do*
*  (x,y) <- foo 1000*
*  print x*
*  print y*

Which is normally fine if the trip count is high.  But consider something
like compare-and-swap where there's a loop, but the expected trip count is
very low.  Perhaps this is just one example of how GHC tends not to
explicitly represent and optimize loops?

The Data.Map.Base.foldRWithKey function discussed in this thread is another
example.  That's a place where even after it inlines the provided function
into the fold, we end up with the below STG with an allocation of an IO ()
function inside the inner loop:

*go10_r3TD :: IO () -> Map Int Int -> IO ()*
*[GblId, Arity=2, Str=DmdType LS, Unf=OtherCon []] =*
*    sat-only \r srt:(0,*bitmap*) [z'_s3Yk ds_s3Y7]*
*        case ds_s3Y7 of _ {*
*          Bin rb_s43z kx_s3Ye x_s43A l_s3Ys r_s3Yl ->*
*              case kx_s3Ye of _ {*
*                I# x1_s3Yi ->*
*                    let {*
*                      sat_s43x :: IO ()*
*                      [LclId] =*
*                          \r srt:(0,*bitmap*) [eta_s3Ym]*
*                              case x1_s3Yi of _ {*
*                                __DEFAULT -> go10_r3TD z'_s3Yk r_s3Yl
eta_s3Ym;*
*                                500000 ->*
*                                    case hPutStr2 stdout lvl_r3Ty True
eta_s3Ym of _ {*
*                                      (#,#) ipv_s3Yq _ -> go10_r3TD
z'_s3Yk r_s3Yl ipv_s3Yq;*
*                                    };*
*                              };*
*                    } in  go10_r3TD sat_s43x l_s3Ys;*
*              };*
*          Tip -> z'_s3Yk;*
*        };*
*SRT(go10_r3TD): [hPutStr2, stdout, lvl_r3Ty, go10_r3TD]*

The specific problem in this example seems to be that -- based on a literal
reading of the above -- it's creating a closure that closes over the Int#, *
x1_s3Yi*.  Shachaf's version that uses a newtype seems to avoid this
trouble by not allowing IO's (>>=) into it at all.  The traverseWithKey_
version is attached below [1], and it manages to get rid of the IO newtype
in the loop and resolves to a State#.

Perhaps there is a missing optimization tweak that would help GHC get rid
of the IO type in the above STG?

Cheers,
  -Ryan

[1] P.S.  Here's the non-allocating loop produced by traverseWithKey_:

*a_r3X7*
*  :: Map Int Int -> State# RealWorld -> (# State# RealWorld, () #)*
*[GblId, Arity=2, Str=DmdType SL, Unf=OtherCon []] =*
*    sat-only \r srt:(0,*bitmap*) [ds_s41Z eta_s42c]*
*        case ds_s41Z of _ {*
*          Bin rb_s47H k_s426 v_s47I l_s42b r_s42g ->*
*              case k_s426 of _ {*
*                I# x_s429 ->*
*                    case x_s429 of _ {*
*                      __DEFAULT ->*
*                          case a_r3X7 l_s42b eta_s42c of _ {*
*                            (#,#) ipv_s42h _ -> a_r3X7 r_s42g ipv_s42h;*
*                          };*
*                      500000 ->*
*                          case hPutStr2 stdout lvl_r3X1 True eta_s42c of _
{*
*                            (#,#) ipv_s42l _ ->*
*                                case a_r3X7 l_s42b ipv_s42l of _ {*
*                                  (#,#) ipv2_s42p _ -> a_r3X7 r_s42g
ipv2_s42p;*
*                                };*
*                          };*
*                    };*
*              };*
*          Tip -> (#,#) [eta_s42c ()];*
*        };*
*SRT(a_r3X7): [hPutStr2, stdout, lvl_r3X1, a_r3X7]*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/libraries/attachments/20130803/51638d8c/attachment.htm>


More information about the Libraries mailing list