# Fusing loops by specializing on functions with SpecConstr?

Sebastian Graf sgraf1337 at gmail.com
Tue Mar 31 13:08:15 UTC 2020

We can formulate SF as a classic Stream that needs an `a` to produce its
next element of type `b` like this (SF2 below):

{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE GADTs #-}

module Lib where

newtype SF a b = SF { runSF :: a -> (b, SF a b) }

inc1 :: SF Int Int
inc1 = SF $ \a -> let !b = a+1 in (b, inc1)

data Step s a = Yield !s a

data SF2 a b where
  SF2 :: !(a -> s -> Step s b) -> !s -> SF2 a b

inc2 :: SF2 Int Int
inc2 = SF2 go ()
  where
    go a _ = let !b = a+1 in Yield () b

runSF2 :: SF2 a b -> a -> (b, SF2 a b)
runSF2 (SF2 f s) a = case f a s of
  Yield s' b -> (b, SF2 f s')

Note the absence of recursion in inc2. This resolves the tension around
having to specialise for a function argument that is recursive and having
to do the unrolling. I bet that similar to stream fusion, we can arrange
that only the consumer has to be explicitly recursive. Indeed, I think this
will help you inline mapping combinators such as `second`, because it won't
be recursive itself anymore.
Now we "only" have to solve the same problems as with good old stream
fusion.
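
As a minimal sketch of that arrangement (not a tested library design): `secondSF2` and `runList` below are hypothetical names chosen for illustration. The combinator only wraps the step function and is therefore not recursive, while the list-driven consumer is the single explicitly recursive function.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE GADTs #-}

-- One step of a signal function: the next state plus the produced output.
data Step s a = Yield !s a

-- A signal function as a non-recursive step function plus a hidden state.
data SF2 a b where
  SF2 :: !(a -> s -> Step s b) -> !s -> SF2 a b

inc2 :: SF2 Int Int
inc2 = SF2 go ()
  where
    go a _ = let !b = a + 1 in Yield () b

-- Hypothetical SF2 analogue of `second`. It only wraps the step function,
-- so it is not recursive and is a candidate for ordinary inlining.
secondSF2 :: SF2 b c -> SF2 (a, b) (a, c)
secondSF2 (SF2 f s0) = SF2 go s0
  where
    go (a, b) s = case f b s of
      Yield s' c -> Yield s' (a, c)

-- Hypothetical consumer: the only explicitly recursive function here.
-- It threads the hidden state through a list of inputs.
runList :: SF2 a b -> [a] -> [b]
runList (SF2 f s0) = go s0
  where
    go _ []         = []
    go s (a : rest) = case f a s of
      Yield s' b -> b : go s' rest
```

For example, `runList (secondSF2 inc2) [("x", 1), ("y", 41)]` increments the second component of each pair and leaves the first untouched.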

The tricky case (after realising that we need to add `Skip` to `Step` for
`filterSF2`) is when we want to optimise a signal of signals, e.g.
something like `concatMapSF2 :: (b -> SF2 a c) -> SF2 a b -> SF2 a c` or
some such. And here we are again in #855/#915.
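
A rough sketch of the `Skip` extension, under the assumption that `Skip` means "the input was consumed but no output is produced for it". The type given to `filterSF2` and the consumer `runList` are guesses made for illustration and are not taken from an existing library.

```haskell
{-# LANGUAGE GADTs #-}

-- Step extended with Skip: advance the state without producing an output.
data Step s a = Yield !s a | Skip !s

data SF2 a b where
  SF2 :: !(a -> s -> Step s b) -> !s -> SF2 a b

inc2 :: SF2 Int Int
inc2 = SF2 go ()
  where
    go a _ = Yield () (a + 1)

-- Assumed type for filterSF2: keep only the outputs satisfying the predicate.
-- Like the other combinators, it is not recursive.
filterSF2 :: (b -> Bool) -> SF2 a b -> SF2 a b
filterSF2 p (SF2 f s0) = SF2 go s0
  where
    go a s = case f a s of
      Yield s' b
        | p b       -> Yield s' b
        | otherwise -> Skip s'
      Skip s'       -> Skip s'

-- Hypothetical recursive consumer that drops skipped steps.
runList :: SF2 a b -> [a] -> [b]
runList (SF2 f s0) = go s0
  where
    go _ []         = []
    go s (a : rest) = case f a s of
      Yield s' b -> b : go s' rest
      Skip s'    -> go s' rest
```

With these definitions, `runList (filterSF2 even inc2) [1, 2, 3, 4]` keeps only the even results of the increment.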

Also if you need convincing that we can embed any SF into SF2, look at this:

embed :: SF Int Int -> SF2 Int Int
embed origSF = SF2 go origSF
  where
    go a sf = case runSF sf a of
      (b, sf') -> Yield sf' b
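
To check that the embedding preserves behaviour even for a stateful signal function, here is a small self-contained sketch. It assumes `embed` can be given the more general type `SF a b -> SF2 a b`; `sumFrom`, `feedSF` and `feedSF2` are hypothetical helpers introduced only for this comparison.

```haskell
{-# LANGUAGE GADTs #-}

newtype SF a b = SF { runSF :: a -> (b, SF a b) }

data Step s a = Yield !s a

data SF2 a b where
  SF2 :: !(a -> s -> Step s b) -> !s -> SF2 a b

-- Same definition as above, with an (assumed) more general type:
-- the original SF itself becomes the hidden state.
embed :: SF a b -> SF2 a b
embed origSF = SF2 go origSF
  where
    go a sf = case runSF sf a of
      (b, sf') -> Yield sf' b

-- Hypothetical stateful signal function: a running sum of its inputs.
sumFrom :: Int -> SF Int Int
sumFrom acc = SF $ \a -> let acc' = acc + a in (acc', sumFrom acc')

-- Hypothetical drivers that feed a list of inputs to either representation.
feedSF :: SF a b -> [a] -> [b]
feedSF _ [] = []
feedSF sf (a : rest) = case runSF sf a of
  (b, sf') -> b : feedSF sf' rest

feedSF2 :: SF2 a b -> [a] -> [b]
feedSF2 (SF2 f s0) = go s0
  where
    go _ []         = []
    go s (a : rest) = case f a s of
      Yield s' b -> b : go s' rest
```

Both `feedSF (sumFrom 0) xs` and `feedSF2 (embed (sumFrom 0)) xs` should produce the same running sums for any input list `xs`.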

Cheers,
Sebastian

On Tue, 31 Mar 2020 at 13:12, Simon Peyton Jones <
simonpj at microsoft.com> wrote:

> Wow – tricky stuff!   I would never have thought of trying to optimise
> that program, but it’s fascinating that you get lots and lots of them from
> FRP.
>
>
>
>    - Don’t lose this thread!  Make a ticket, or a wiki page. If the
>    former, put the main payload (including Alexis’s examples) into the
>    Descriptions, not deep in the discussion.
>    - I wonder whether it’d be possible to adjust the FRP library to
>    generate easier-to-optimise code. Probably not, but worth asking.
>    - Alexis’s proposed solution relies on
>       - Specialising on a function argument.  Clearly this must be
>       possible, and it’d be very beneficial.
>       - Unrolling one layer of a recursive function.  That seems harder:
>       how do we know to **stop** unrolling as we successively simplify?  One
>       idea: do one layer of unrolling by hand, perhaps even in FRP source code:
>
> add1rec = SF (\a -> let !b = a+1 in (b,add1rec))
>
> add1 = SF (\a -> let !b = a+1 in (b,add1rec))
>
>
>
> Simon
>
>
>
> *From:* ghc-devs <ghc-devs-bounces at haskell.org> *On Behalf Of *Sebastian
> Graf
> *Sent:* 29 March 2020 15:34
> *To:* Alexis King <lexi.lambda at gmail.com>
> *Cc:* ghc-devs <ghc-devs at haskell.org>
> *Subject:* Re: Fusing loops by specializing on functions with SpecConstr?
>
>
>
> Hi Alexis,
>
>
>
> I've been wondering the same things and have worked on it on and off. See
> .
>
>
>
> The big problem with solving the higher-order specialisation problem
> through SpecConstr (which is what I did in my reports in #855) is indeed
> that it's hard to
>
>    1. Anticipate what the rewritten program looks like without doing a
>    Simplifier pass after each specialisation, so that we can see and exploit
>    new specialisation opportunities. SpecConstr does use the simple Core
>    optimiser, but that is often not enough IIRC (think of ArgOccs from
>    recursive calls). In particular, it will not do RULE rewrites. Interleaving
>    SpecConstr with the Simplifier, apart from nigh impossible conceptually, is
>    computationally intractable and would quickly drift off into Partial
>    Evaluation swamp.
>    2. Make the RULE engine match and rewrite call sites in all call
>    patterns they can apply.
>    I.e., `f (\x -> Just (x +1))` calls its argument with one argument and
>    scrutinises the resulting Maybe (that's what is described by the argument's
>    `ArgOcc`), so that we want to specialise to a call pattern `f (\x -> Just
>    <some expression using x>)`, giving rise to the specialisation `$sf ctx`,
>    where `ctx x` describes the `<some expression using x>` part. In an ideal
>    world, we want a (higher-order pattern unification) RULE for `forall f ctx.
>    f (\x -> Just (ctx x)) ==> $sf ctx`. But from what I remember, GHC's RULE
>    engine works quite differently from that and isn't even concerned with
>    finding unifiers (rather than just matching concrete call sites without
>    meta variables against RULEs with meta variables) at all.
>
> Note that matching on specific Ids binding functions is just an
> approximation using representational equality (on the Id's Unique) rather
> than some sort of more semantic equality. My latest endeavour into the
> matter in #915 from December was using types as the representational entity
> and type class specialisation. I think I got ultimately blocked on thttps://
> but apparently I didn't document the problematic program.
>
>
>
> Maybe my failure so far is that I want it to apply and optimise all cases
> and for more complex stream pipelines, rather than just doing a better best
> effort job.
>
>
>
> Hope that helps. Anyway, I'm also really keen on nailing this! It's one of
> my high-risk, high-reward research topics. So if you need someone to
> collaborate/exchange ideas with, I'm happy to help!
>
>
>
> All the best,
>
> Sebastian
>
>
>
> On Sun, 29 Mar 2020 at 10:39, Alexis King <
> lexi.lambda at gmail.com> wrote:
>
> Hi all,
>
> I have recently been toying with FRP, and I’ve noticed that
> traditional formulations generate a lot of tiny loops that GHC does
> a very poor job optimizing. Here’s a simplified example:
>
>     newtype SF a b = SF { runSF :: a -> (b, SF a b) }
>
>     add1_snd :: SF (String, Int) (String, Int)
>     add1_snd = second add1 where
>       add1 = SF $ \a -> let !b = a + 1 in (b, add1)
>       second f = SF $ \(a, b) ->
>         let !(c, f') = runSF f b
>         in ((a, c), second f')
>
> Here, `add1_snd` is defined in terms of two recursive bindings,
> `add1` and `second`. Because they’re both recursive, GHC doesn’t
> know what to do with them, and the optimized program still has two
> separate recursive knots. But this is a missed optimization, as
> `add1_snd` is equivalent to the following definition, which fuses
> the two loops together and consequently has just one recursive knot:
>
>     add1_snd_fused :: SF (String, Int) (String, Int)
>     add1_snd_fused = SF $ \(a, b) ->
>       let !c = b + 1
>       in ((a, c), add1_snd_fused)
>
> SpecConstr could do it! Suppose we specialize `second` at the call
> pattern `second add1`:
>
>
>     second_add1 = SF $ \(a, b) ->
>       let !(c, f') = runSF add1 b
>       in ((a, c), second f')
>
> This doesn’t immediately look like an improvement, but we’re
> actually almost there. If we unroll `add1` once on the RHS of
> `second_add1`, the simplifier will get us the rest of the way. We’ll
> end up with
>
>     let !b1 = b + 1
>         !(c, f') = (b1, add1)
>     in ((a, c), second f')
>
> and after substituting f' to get `second add1`, the RULE will tie
> the knot for us.
>
> This may look like small potatoes in isolation, but real programs
> can generate hundreds of these tiny, tiny loops, and fusing them
> together would be a big win. The only problem is SpecConstr doesn’t
> currently specialize on functions! The original paper, “Call-pattern
> Specialisation for Haskell Programs,” mentions this as a possibility
> in Section 6.2, but it points out that actually doing this in
> practice would be pretty tricky:
>
> > Specialising for function arguments is more slippery than for
> > constructor arguments. In the example above the argument was a
> > simple variable, but what if it was instead a lambda term? [...]
> >
> > The trouble is that lambda abstractions are much more fragile than
> > constructor applications, in the sense that simple transformations
> > may make two abstractions look different although they have the
> > same value.
>
> Still, the difference this could make in a program of mine is so
> large that I am interested in exploring it anyway. I am wondering if
> anyone has investigated this possibility any further since the paper
> was published, or if anyone knows of other use cases that would
> benefit from this capability.
>
> Thanks,
> Alexis