Fusing loops by specializing on functions with SpecConstr?

Sebastian Graf sgraf1337 at gmail.com
Tue Mar 31 22:05:16 UTC 2020


>
> This is a neat trick, though I’ve had trouble getting it to work reliably
> in my experiments (even though I was using GHC.Types.SPEC). That said, I
> also feel like I don’t understand the subtleties of SpecConstr very well,
> so it could have been my fault.
>

Yeah, SPEC is quite unreliable, because IIRC at some point the SPEC
argument is either consumed by a specialisation or becomes irrelevant to
SpecConstr. But none of the combinators you mentioned should rely on
SpecConstr! They are all non-recursive, so the Simplifier will take care
of the "specialisation". And it works just fine; I just tried it:

https://gist.github.com/sgraf812/d15cd3ee9cc9bd2e72704f90567ef35b

`test` there is optimised reasonably well. The problem is that we don't
have the concrete a..f, so we can't cancel away all allocations. If you
can give me a closed program where we fail to optimise away every bit of
allocation (and it isn't due to size concerns), I would be surprised.
There might be a bug in how I encoded the streams, though; maybe we can
be a bit stricter here or there if need be.
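
For reference, here is a minimal sketch of the kind of stream encoding I
mean and of its non-recursive composition. The gist has the real code;
the names and details below are assumptions, not necessarily what the
gist does:

    {-# LANGUAGE ExistentialQuantification #-}

    -- One step of the stream: the next state plus an output.
    data Step s b = Yield !s b

    -- A stream that needs an `a` to produce its next element of
    -- type `b`, with existential state `s`.
    data SF2 a b = forall s. SF2 (s -> a -> Step s b) !s

    -- Composition is non-recursive: the two step functions are just
    -- paired up, so the Simplifier can inline and specialise it
    -- without any help from SpecConstr.
    compSF2 :: SF2 a b -> SF2 b c -> SF2 a c
    compSF2 (SF2 f s0) (SF2 g t0) = SF2 step (s0, t0)
      where
        step (s, t) a = case f s a of
          Yield s' b -> case g t b of
            Yield t' c -> Yield (s', t') c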

`test2 = (double &&& inc) >>> arr (uncurry (+)) :: SF Int Int` is such a
function that we optimise down to (the equivalent of) `arr (\n -> 3*n+1)`.
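
(The thread doesn't show `double` and `inc`; presumably they are
something like the definitions below. Instantiated at the function arrow
for clarity, the fusion is just arithmetic:)

    import Control.Arrow (arr, (&&&), (>>>))

    -- hypothetical definitions, at the function arrow:
    double, inc :: Int -> Int
    double = (* 2)
    inc    = (+ 1)

    test2Fn :: Int -> Int
    test2Fn = (double &&& inc) >>> arr (uncurry (+))
    -- test2Fn n = (2*n) + (n + 1) = 3*n + 1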

Maybe you can give a medium-sized program where you think GHC does a poor
job at optimising?

On Tue, Mar 31, 2020 at 23:18, Alexis King <lexi.lambda at gmail.com>
wrote:

> Sebastian and Simon,
>
> Thank you both for your responses—they are all quite helpful! I agree with
> both of you that figuring out how to do this kind of specialization without
> any guidance from the programmer seems rather intractable. It’s too hard to
> divine where it would actually be beneficial, and even if you could, it
> seems likely that other optimizations would get in the way of it actually
> working out.
>
> I’ve been trying to figure out if it would be possible to help the
> optimizer out by annotating the program with special combinators like the
> existing ones provided by GHC.Magic. However, I haven’t been able to come
> up with anything yet that seems like it would actually work.
>
> On Mar 31, 2020, at 06:12, Simon Peyton Jones <simonpj at microsoft.com>
> wrote:
>
> Wow – tricky stuff!   I would never have thought of trying to optimise
> that program, but it’s fascinating that you get lots and lots of them from
> FRP.
>
>
> For context, the reason you get all these tiny loops is that arrowized FRP
> uses the Arrow and ArrowChoice interfaces to build its programs, and those
> interfaces use tiny combinator functions like these:
>
>     first :: Arrow a => a b c -> a (b, d) (c, d)
>     (***) :: Arrow a => a b d -> a c e -> a (b, c) (d, e)
>     (|||) :: ArrowChoice a => a b d -> a c d -> a (Either b c) d
>
> This means you end up with programs built out of dozens or hundreds of
> uses of these tiny combinators. You get code that looks like
>
>     first (left (arr f >>> g ||| right h) *** second i)
>
> and this is a textbook situation where you want to specialize and inline
> all the combinators! For arrows without this tricky recursion, doing that
> works as intended, and GHC’s simplifier will do what it’s supposed to, and
> you get fast code.
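>
> For ordinary functions, for instance, these combinators are morally the
> tiny, non-recursive definitions below (essentially the Arrow and
> ArrowChoice instances for (->) from Control.Arrow, with fresh names to
> avoid clashes), which is why plain inlining suffices:
>
>     firstFn :: (b -> c) -> (b, d) -> (c, d)
>     firstFn f (b, d) = (f b, d)
>
>     prodFn :: (b -> d) -> (c -> e) -> (b, c) -> (d, e)
>     prodFn f g (b, c) = (f b, g c)
>
>     eitherFn :: (b -> d) -> (c -> d) -> Either b c -> d
>     eitherFn f _ (Left b)  = f b
>     eitherFn _ g (Right c) = g c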
>
> But with FRP, each of these combinators is recursive. This means you often
> get really awful code that looks like this:
>
>     arr (\case { Nothing -> Left (); Just x -> Right x }) >>> (f ||| g)
>
> This converts a Maybe to an Either, then branches on it. It’s analogous to
> writing something like this in direct-style code:
>
>     let y = case x of { Nothing -> Left (); Just x -> Right x }
>     in case y of { Left () -> f; Right x -> g x }
>
> We really want the optimizer to eliminate the intermediate Either and just
> branch on it directly, and if GHC could fuse these tiny recursive loops, it
> could! But without that, all this pointless shuffling of values around
> remains in the optimized program.
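>
> Concretely, the code we want after fusion just branches on the Maybe
> directly, with no intermediate Either:
>
>     case x of { Nothing -> f; Just x -> g x }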
>
>
>    - I wonder whether it’d be possible to adjust the FRP library to
>    generate easier-to-optimise code. Probably not, but worth asking.
>
>
> I think it’s entirely possible to somehow annotate these combinators to
> communicate this information to the optimizer, but I don’t know what the
> annotations ought to look like. (That’s the research part!)
>
> But I’m not very optimistic about getting the library to generate
> easier-to-optimize code with the tools available today. Sebastian’s example
> of SF2 and stream fusion sort of works, but in my experience, something
> like that doesn’t handle enough cases well enough to work on real arrow
> programs.
>
>
>    - Unrolling one layer of a recursive function.  That seems harder: how
>      do we know when to **stop** unrolling as we successively simplify?  One
>      idea: do one layer of unrolling by hand, perhaps even in FRP source code:
>
> add1rec = SF (\a -> let !b = a+1 in (b,add1rec))
> add1 = SF (\a -> let !b = a+1 in (b,add1rec))
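>
> (For reference, this assumes a signal-function type along these lines;
> the exact definition isn't shown in the thread:)
>
>     newtype SF a b = SF (a -> (b, SF a b))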
>
>
> Yes, I was playing with the idea at one point of some kind of RULE that
> inserts GHC.Magic.inline on the specialized RHS. That way the programmer
> could ask for the unrolling explicitly, as otherwise it seems unreasonable
> to ask the compiler to figure it out.
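>
> A sketch of the manual version of that idea, using GHC.Magic.inline to
> ask GHC for one unfolding of the recursive definition at a given call
> site; all of this is assumed, not from the thread:
>
>     {-# LANGUAGE BangPatterns #-}
>     import GHC.Magic (inline)
>
>     newtype SF a b = SF (a -> (b, SF a b))  -- as above (assumed)
>
>     add1rec :: SF Int Int
>     add1rec = SF (\a -> let !b = a + 1 in (b, add1rec))
>     -- INLINABLE keeps the unfolding available for `inline`
>     {-# INLINABLE add1rec #-}
>
>     -- unroll one layer by hand; `inline` asks GHC to unfold
>     -- add1rec here even though it is recursive
>     add1 :: SF Int Int
>     add1 = SF (\a -> let !b = a + 1 in (b, inline add1rec))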
>
> On Mar 31, 2020, at 08:08, Sebastian Graf <sgraf1337 at gmail.com> wrote:
>
> We can formulate SF as a classic Stream that needs an `a` to produce its
> next element of type `b` like this (SF2 below)
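>
> (The definition itself was elided in quoting; presumably it is along
> these lines:)
>
>     {-# LANGUAGE ExistentialQuantification #-}
>     data Step s b = Yield !s b
>     data SF2 a b = forall s. SF2 (s -> a -> Step s b) !s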
>
>
> This is a neat trick, though I’ve had trouble getting it to work reliably
> in my experiments (even though I was using GHC.Types.SPEC). That said, I
> also feel like I don’t understand the subtleties of SpecConstr very well,
> so it could have been my fault.
>
> The more fundamental problem I’ve found with that approach is that it
> doesn’t do very well for arrow combinators like (***) and (|||), which come
> up very often in arrow programs but rarely in streaming. Fusing long chains
> of first/second/left/right is actually pretty easy with ordinary RULEs, but
> (***) and (|||) are much harder, since they have multiple continuations.
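>
> The RULEs that handle those easy chains are presumably the usual fusion
> laws, written against the library's concrete combinators (rules on
> overloaded class methods are fragile), something like:
>
>     {-# RULES
>     "first/first"   forall f g. first f  >>> first g  = first (f >>> g)
>     "second/second" forall f g. second f >>> second g = second (f >>> g)
>     "left/left"     forall f g. left f   >>> left g   = left (f >>> g)
>       #-}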
>
> It seems at first appealing to rewrite `f *** g` into `first f >>> second
> g`, which solves the immediate problem, but this is actually a lot less
> efficient after repeated rewritings. You end up rewriting `(f ||| g) *** h`
> into `first (left f) >>> first (right g) >>> second h`, turning two
> distinct branches into four, and larger programs have much worse
> exponential blowups.
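>
> Spelled out, the repeated rewriting goes something like this (eliding
> the final untagging step that (|||) needs on top of left/right):
>
>     (f ||| g) *** h
>       = first (f ||| g) >>> second h
>           -- f *** g  =>  first f >>> second g
>       = first (left f >>> right g) >>> second h
>           -- f ||| g  =>  left f >>> right g, modulo untagging
>       = first (left f) >>> first (right g) >>> second h
>           -- first distributes over (>>>)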
>
> So that’s where I’ve gotten stuck! I’ve been toying with the idea of
> thinking about expression “shells”, so if you have something like
>
>     first (a ||| b) >>> c *** second (d ||| e) >>> f
>
> then you have a “shell” of the shape
>
>     first (● ||| ●) >>> ● *** second (● ||| ●) >>> ●
>
> which theoretically serves as a key for the specialization. You can then
> generate a specialization and a rule:
>
>     $s a b c d e f = ...
>     {-# RULES "shell" forall a b c d e f.
>             first (a ||| b) >>> c *** second (d ||| e) >>> f
>               = $s a b c d e f
>       #-}
>
> The question then becomes: how do you specify what these shells are, and
> how do you specify how to transform the shell into a specialized function?
> I don’t know, but it’s something a Core plugin could theoretically do.
> Maybe it makes sense for this domain-specific optimization to be a Core
> pass that runs before the simplifier, like the typeclass specializer
> currently is, but I haven’t explored that yet.
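>
> For concreteness, a minimal skeleton of such a plugin (module and pass
> names made up; this is just the standard plugin scaffolding for GHC 8.x,
> with the interesting work left as a stub) might be:
>
>     module ShellSpec (plugin) where
>
>     import GhcPlugins
>
>     plugin :: Plugin
>     plugin = defaultPlugin { installCoreToDos = install }
>
>     -- prepend the pass so it runs before the simplifier
>     install :: [CommandLineOption] -> [CoreToDo] -> CoreM [CoreToDo]
>     install _ todos = pure (CoreDoPluginPass "shell-spec" pass : todos)
>
>     -- the real pass would search for "shells", generate the $s
>     -- specialisations, and add the corresponding rewrite rules
>     pass :: ModGuts -> CoreM ModGuts
>     pass guts = pure guts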
>
> Alexis
>