deeqSeq proposal

Mon Apr 10 18:43:49 EDT 2006

On Mon, Apr 10, 2006 at 02:40:44PM -0700, Andy Gill wrote:
> >it is unlikely it will even be possible to implement in jhc without
> >radical changes to its internals. there is just no where to attach
> >a bit
> >to, and even if there were, there is no generic way to evaluate
> >something to WHNF, or even a concept of WHNF in final grin. (grin code
> >can look inside unevaluated closures, hopefully making the thunk
> >non-updatable)
>
> I do not understand.
>
> - (A) I'm calling for a recursive descent function that does seq. I
> could
> write it in Haskell, for any specific type.  How is seq implemented jhs?

it is true, if you can express it in core, then jhc can implement it.
but there is no way to express 'the bit you want to steal' in core.
indeed there is no where to steal a bit from. there isn't a thunk type
at runtime in jhc. so implementing deepSeq in the traditinal way is
fine, optimizing it in the way you describe is not.

> - (B) Once we have this recursive function, I'm advocating for an
> optimization
> which will make it cheap. Why can't we just steal a bit in the (GHC)
> info table,
> rather than mess with LSB of pointers, or have two info tables?
>
> Yes, in grin this information would need to be used at compile time
>  but the resulting code would be considerably faster. A deepSeq is
> a gift to the compiler from the programmer.

actually, it may be slower in jhc. reducing to normal form is not
necessarily a win in jhc, and if you do do it, you want to evaluate a
function as _early_ as possible, as in, as close to its definition
point. 'evals' are inlined to have a branch for every possible way a
value could come about, since you could deepseq pretty much any type of
object, you would end up with huge expanded evals, but worse, it would
cause all these thunks to be updatable when they didn't need to be.

imagine a use of a thunk, rather than evaluate it to WHNF, jhc will just
pull the components right out of the unevaluated thunk, if at some point
in the past it might have been deepsequed, you suddenly need a case
statement to determine whether it has been evaluated or not. knowing
something has not been evaluated is just as valuable as knowing it has
definitely been.

for a quick example, imagine you have this function and it is not
optimized away in core for some reason or another.

> mktup x y = (y,x)

> foo x = case x of (x,y) -> bar x y
> main = let ... in foo (mktup a b)

now we convert to grin

> fmktup x y = do
>         return (CTuple y x)
>
> ffoo x = do
>         y <- eval x
>         update x y
>         (CTuple a b) <- y
>         bar a b
>
> main = do
>         ...
>         x <- Store (Fmktup a b)
>         ffoo x

things starting with capital letters are tags, parenthesized things are
nodes on the heap.

now, we do eval-expansion in ffoo, points to analysis determines a
suspended Fmktup may be passed into ffoo.

> ffoo x = do
>         x' <- fetch x
>         y <- case x' of
>                 (Fmktup a b) ->
>                         fmktup a b
>         update x y
>         (CTuple a b) <- y
>         bar a b

now, the case-of-case code motion (the y scrutinization is treated as a
simple case pulls the code into ffoo

> ffoo x = do
>         x' <- fetch x
>         case x' of
>                 (Fmktup a b) ->
>                         y <- fmktup a b
>                         update x y
>                         (CTuple a b) <- y
>                         bar a b

now, points-to analysis showed that nothing scrutinized this memory
location looking for a CTuple, therefor the update is uneeded.

also, fmktup is trivial so it is inlined

> ffoo x = do
>         x' <- fetch x
>         case x' of
>                 (Fmktup a b) ->
>                         y <- Return (CTuple b a)  -- inlined fmktup
>                         (CTuple a b) <- y
>                         bar a b

wrich trivially simplifise too

> ffoo x = do
>         x' <- fetch x
>         case x' of
>                 (Fmktup a b) ->
>                         bar b a
>
> ffoo x = do
>         (Fmktup a b) <- fetch x
>         bar b a

notice, there is no longer a concept of WHNF, Fmktup effectivly has
become the head normal form of that call due to standard optimization,
this type of transformation might happen to some suspended versions of
mktup, but not others, there is no way to tell from the heap location
itself whether it should be evaluated into WHNF or if uses are going to
pull its arguments right out of its closure or not.

deepseqing this case would only hurt performance as it would mean we couldn't
get rid of the 'update' or 'check if it is already a tuple' case. of
course, if mktup were some expensive call, then the opposite might be
true. in any case, deepseq is not always a win.

the above ffoo actually has another optimization waiting, arity raising,
since it knows x is always a pointer to a Fmktup, it pulls the arguments
out and passes 'a and b' to ffoo directly. since all function calls are
explicit in grin, we just go through and modify each call site
accordingly.

        John

--
John Meacham - ⑆repetae.net⑆john⑈