Alternatives for representing a reverse postorder numbering

Mon Dec 6 21:50:57 UTC 2021

Reverse postorder numbering is a superpower for control-flow analysis
and other back-end things.  For example,

  - In a reducible flow graph, a node Q is a loop header if and only if
    it is targeted by an edge P -> Q where Q's reverse postorder
    number is not greater than P's.

  - If a loop has multiple exits, the reverse postorder numbering of
    the exit nodes tells exactly the order in which the nodes must
    appear so they can be reached by multilevel `exit`/`break`
    statements, as are found in WebAssembly.

  - Reverse postorder numbers enable efficient computations of set
    intersection for dominator analysis.

One could go on.

In a perfect world, our representation of control-flow graphs would
provide a place to store a reverse postorder number for each
(reachable) basic block.  Alas, we live in an imperfect world, and I
am struggling to figure out how best to store reverse postorder
numbers for the blocks in a `GenCmmGraph`.

 1. One could apply ideas from "trees that grow" to the `Block` type
    from `GHC.Cmm.Dataflow.Block`.  But this type is already one of
    the more complicated types I have ever encountered in the wild,
    and the thought of further complexity fills me with horror.

 2. One could generalize quite a few types in `Cmm`.  In particular,
    one could create an analog of the `GenCmmGraph` type.  The analog,
    instead of taking a node type `n` as its parameter, would take a
    block type as its parameter.  It would use `Graph'` as defined in
    `GHC.Cmm.Dataflow.Graph`.  This change would ripple into
    `GHC.Cmm.Dataflow` without doing a whole lot of violence to the
    code that it there.  It would then become possible to do dataflow
    analysis (and perhaps other operations) over graphs with annotated
    blocks.

    It's worth noting that the `Graph'` representation already exists,
    but it doesn't seem to be used anywhere.  I'm not sure how it
    survived earlier rounds of culling.

 3. One could simple build an auxiliary `LabelMap` that includes the
    reverse postorder number of every node.  This idea bugs me a bit.
    I don't love spending O(log N) steps twice every time I look to
    see the direction of a control-flow edge.  But what I *really*
    don't love is what happens to the interfaces.  I can compute the
    reverse postorder map twice, or I can pollute my interfaces by
    saying "here is the dominator relation, and by the way, here also
    are the reverse postorder numbers, which I had to compute."

I'm currently using alternative 3: when I need reverse postorder
numbers, I call `revPostorderFrom` (defined in `GHC.Cmm.Dataflow.Graph`), 
then zip with `[1..]`.  But I'm really tempted by alternative 2,
which would allow me to, for example, define a graph annotated with
reverse postorder numbers, then do both the dominator analysis and my
translation on that graph.

How deep into the weeds should I go?  Make dataflow analysis even more polymorphic?
or learn to love `LabelMap`?

Norman