waffle::backend

Module reducify

Source
Expand description

Reducification: turning a potentially irreducible CFG into a reducible CFG. We perform context-sensitive code duplication to “peel off” the parts of loops that are reached by side-entrances, branching back to the main loop as soon as control passes through the loop header again.

§Limitations

WARNING EXPONENTIAL BLOWUP POTENTIAL WARNING

This pass is designed on the assumption that irreducible control flow is rare, and needs to be handled somehow but it’s OK to, e.g., duplicate most of a loop body to do so. The tradeoff that we’re aiming for is that we want zero runtime overhead – we do not want a performance cliff if someone accidentally introduces an irreducible edge – and we’re hoping that this remains rare. If you feed this pass a state machine, or a fully-connected clique, for example, or even a deep nest of loops, one can get much worse than 2x code-size increase. You have been warned!

In the future we may consider a hybrid approach where we start with this algorithm, keep track of block-count increase, and abort and move to a Relooper-style (dynamic label variable-based) approach with no code duplication if a threshold is reached.

WARNING EXPONENTIAL BLOWUP POTENTIAL WARNING

§Finding Loop Headers

The basic idea is that we compute RPO and treat all backedges in RPO (i.e., edges from rpo-index i to rpo-index j, where j <= i) as loop backedges, with all blocks “under the edge” (with RPO indices i..=j) in the loop. We then “properly nest” loops, so if we have, e.g.:

    block0
    block1  |
    block2  | loop     |
    block3  |          |
    block4             | loop

we “fix the nesting” by pushing down the lower extent of the first loop to block4. We do so in a single post-pass fixup scan that keeps a stack, pushes when meeting a loop header, pops while the innermost is no longer in the initial header-set, then ensures that all header-blockson the stack are inserted into every header-set it passes over.

The effect of this is to compute a loop nest as if irreducible edges (side loop entrances) did not exist. We’ll fix them up later with the code duplication.

§Finding Irreducible Loop Headers

After computing header-sets, find edges from B1 to B2 such that headers(B2) - headers(B1) - {B2} is non-empty – that is, we add a header block (enter a new loop) going from B1 to B2, and that new header block is not B2 itself. This is a “side-entrance” into a loop, and is irreducible.

§Duplicating Code

We create blocks under contexts defined by “skipped headers”, where the context is computed at at an edge (From, To) as (where U is set union, - is set subtraction, & is set intersection, !S is the set complement):

    Gen = (headers(To) - headers(From)) - {To}
        = headers(To) & !headers(From) & !{To}
    Kill = (headers(From) - headers(To)) U {To}
         = (headers(From) & !headers(To)) U {To}

let ToContext = (FromContext - Kill) U Gen
              = (FromContext & !Kill) U Gen
              = (FromContext & !((headers(From) & !headers(To)) U {To})) U
                (headers(To) & !headers(From) & !{To})
              = (FromContext & !((headers(From) U {To}) & (!headers(To) U {To}))) U
                (headers(To) & !headers(From) & !{To})
              = (FromContext & (!(headers(From) U {To}) U !(!headers(To) U {To}))) U
                (headers(To) & !headers(From) & !{To})
              = (FromContext & ((!headers(From) & !{To}) U (headers(To) & !{To}))) U
                (headers(To) & !headers(From) & !{To})
              = (FromContext & !headers(From) & !{To}) U
                (FromContext & headers(To) & !{To}) U
                (headers(To) & !headers(From) & !{To})

invariant: for every B, we only ever have a context C where C c headers(B)

then the first term goes away (FromContext & !headers(From) = 0) and we can simplify to:

let ToContext = headers(To) & !{To} & (FromContext U !headers(From))

in other words: when entering a loop except through its header, add to context; stay in that context inside the loop; leave the context when we leave the loop.

Note that in the case with no irreducible edges, this becomes the special case where every context is {} and no blocks are actually duplicated (but we returned early above to avoid this no-op transform).

Patching up use-def links is somewhat tricky. Consider the CFG:

        1
       / \
      /   \
     2 --> 3
     2 <-- 3
          /
         4

Which is irreducible (it contains the canonical irreducible graph 1->2, 2->3, 3->2) and has an exit-path with block 4 that is dominated by block 3. Block 4 can thus use values defined in block 3, but if we perform elaboration as:

    1
  /  \__
 2<.<--3'
 v ^   |
 3-/  _|
  \ /
   4

that is, we have two copies of the block 3,and each has an exit to the one copy of 4.

Any values defined in 3 and used in 4 in the original CFG will need to pass through blockparams to merge the two versions in the elaborated CFG.

To fix this problem, we perform a max-SSA cut at all blocks that have an in-edge from a block with a larger header-set (i.e., a loop exit edge) if the exited loop has a side-entrance; this is the only way in which we can have a merge-point between different copies of the same subgraph.

Structs§