cranelift_codegen::ir

Module pcc

Source
Expand description

Proof-carrying code. We attach “facts” to values and then check that they remain true after compilation.

A few key design principle of this approach are:

  • The producer of the IR provides the axioms. All “ground truth”, such as what memory is accessible – is meant to come by way of facts on the function arguments and global values. In some sense, all we are doing here is validating the “internal consistency” of the facts that are provided on values, and the actions performed on those values.

  • We do not derive and forward-propagate facts eagerly. Rather, the producer needs to provide breadcrumbs – a “proof witness” of sorts – to allow the checking to complete. That means that as an address is computed, or pointer chains are dereferenced, each intermediate value will likely have some fact attached.

    This does create more verbose IR, but a significant positive benefit is that it avoids unnecessary work: we do not build up a knowledge base that effectively encodes the integer ranges of many or most values in the program. Rather, we only check specifically the memory-access sequences. In practice, each such sequence is likely to be a carefully-controlled sequence of IR operations from, e.g., a sandboxing compiler (such as Wasmtime) so adding annotations here to communicate intent (ranges, bounds-checks, and the like) is no problem.

Facts are attached to SSA values in CLIF, and are maintained through optimizations and through lowering. They are thus also present on VRegs in the VCode. In theory, facts could be checked at either level, though in practice it is most useful to check them at the VCode level if the goal is an end-to-end verification of certain properties (e.g., memory sandboxing).

Checking facts entails visiting each instruction that defines a value with a fact, and checking the result’s fact against the facts on arguments and the operand. For VCode, this is fundamentally a question of the target ISA’s semantics, so we call into the LowerBackend for this. Note that during checking there is also limited forward propagation / inference, but only within an instruction: for example, an addressing mode commonly can include an addition, multiplication/shift, or extend operation, and there is no way to attach facts to the intermediate values “inside” the instruction, so instead the backend can use FactContext::add() and friends to forward-propagate facts.

TODO:

Deployment:

  • Add to fuzzing
  • Turn on during wasm spec-tests

More checks:

  • Check that facts on vmctx GVs are subsumed by the actual facts on the vmctx arg in block0 (function arg).

Generality:

  • facts on outputs (in func signature)?
  • Implement checking at the CLIF level as well.
  • Check instructions that can trap as well?

Nicer errors:

  • attach instruction index or some other identifier to errors

Text format cleanup:

  • make the bitwidth on max facts optional in the CLIF text format?
  • make offset in mem fact optional in the text format?

Bikeshed colors (syntax):

  • Put fact bang-annotations after types? v0: i64 ! fact(..) vs. v0 ! fact(..): i64

Structs§

  • A bound expression.
  • A “context” in which we can evaluate and derive facts. This context carries environment/global properties, such as the machine pointer width.

Enums§

  • The base part of a bound expression.
  • A fact on a value.
  • The two kinds of inequalities: “strict” (<, >) and “loose” (<=, >=), the latter of which admit equality.
  • An error or inconsistency discovered when checking proof-carrying code.

Functions§

  • Top-level entry point after compilation: this checks the facts in VCode.

Type Aliases§

  • The result of checking proof-carrying-code facts.