Enum cranelift_isle::trie::TrieSymbol

pub enum TrieSymbol {
    Match {
        op: PatternInst,
    },
    EndOfMatch,
}

One “input symbol” for the decision tree that handles matching on a term. Each symbol represents one step: we either run a match op, or we finish the match.

Note that in the original Peepmatic scheme, the input-symbol to the FSM was specified slightly differently. The automaton responded to alphabet symbols that corresponded only to match results, and the “extra state” was used at each automaton node to represent the op to run next. This extra state differentiated nodes that would otherwise be merged together by deduplication. That scheme works well enough, but the “extra state” is slightly confusing and diverges slightly from a pure automaton.

Instead, here, we imagine that the user of the automaton/trie can query the possible transition edges out of the current state. Each of these edges corresponds to one possible match op to run. After running a match op, we reach a new state corresponding to successful matches up to that point.

However, it’s a bit more subtle than this. Consider the prioritization problem. We want to give the DSL user the ability to change the order in which rules apply, for example to have a tier of “fallback rules” that apply only if more custom rules do not match.

A somewhat simplistic answer to this problem is “more specific rule wins”. However, this implies the existence of a total ordering of linearized match sequences that may not fully capture the intuitive meaning of “more specific”. Consider three left-hand sides:

(A _ _)
(A (B _) _)
(A _ (B _))

Intuitively, the first is the least specific. Given the input (A (B 1) (B 2)), we can say for sure that the first should not be chosen, because either the second or third would match “more” of the input tree. But which of the second and third should be chosen? A “lexicographic ordering” rule would say that we sort left-hand sides such that the (B _) sub-pattern comes before the wildcard _, so the second rule wins. But that is arbitrarily privileging one over the other based on the order of the arguments.
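As a rough illustration of the “lexicographic ordering” rule just described, the sketch below sorts the three left-hand sides using a derived `Ord`. The `Pat` type and both function names are hypothetical and exist only for this example. It also compares pattern trees directly, not linearized match sequences, which is a simplification.

```rust
/// A left-hand-side pattern (hypothetical type for illustration).
/// Variant order matters: the derived `Ord` sorts `Term` before `Wild`,
/// i.e. a sub-pattern sorts before a wildcard.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Pat {
    Term(&'static str, Vec<Pat>),
    Wild,
}

/// The three left-hand sides from the text, in the order they are listed.
fn example_rules() -> Vec<Pat> {
    use Pat::{Term, Wild};
    vec![
        Term("A", vec![Wild, Wild]),
        Term("A", vec![Term("B", vec![Wild]), Wild]),
        Term("A", vec![Wild, Term("B", vec![Wild])]),
    ]
}

/// Sort so that the lexicographically "most specific" pattern comes first.
/// Arguments are compared left to right, which is exactly the arbitrary
/// bias toward earlier arguments that the text objects to.
fn by_lexicographic_specificity(mut rules: Vec<Pat>) -> Vec<Pat> {
    rules.sort();
    rules
}
```

Under this ordering `(A (B _) _)` sorts ahead of `(A _ (B _))` only because its `(B _)` sits in the first argument position.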

Instead, we can accept explicit priorities from the user to allow either choice. So we need a data structure that can associate matching inputs with priorities to outputs.

Next, we build a decision tree rather than an FSM. Why? Because we are compiling to a structured language, Rust, states become program points rather than data, so we cannot easily support a DAG structure. In other words, we are not producing an FSM that we can interpret at runtime; rather, we are compiling code in which each state corresponds to a sequence of statements and control flow that branches to a next state. We naturally need nesting; we cannot codegen arbitrary state transitions in an efficient manner.

We could support a limited form of DAG that reifies “diamonds” (two alternate paths that reconverge), but supporting this in a way that lets the output refer to values from either side is very complex (we would need to invent phi-nodes), and the cases where we want to do this rather than invoke a sub-term (which is compiled to a separate function) are rare.

Finally, note that one reason to deduplicate nodes and turn a tree back into a DAG, the “output-suffix sharing” that some other instruction-rewriter engines such as Peepmatic perform, does not apply here, because all “output” occurs at leaf nodes; this is necessary because we do not want to start invoking external constructors until we are sure of the match. Some of the code-sharing advantages of the “suffix sharing” scheme can be obtained in a more flexible and user-controllable way (with less understanding of internal compiler logic needed) by factoring logic into different internal terms, which become different compiled functions. This is likely to happen anyway as part of good software engineering practice.
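To make the “states become program points” argument concrete, here is a hand-written sketch of the kind of nested control flow a decision tree turns into. It is not actual generated output; the `Expr` type, the `lower` function, and the returned strings are invented for illustration, and the order of the inner checks stands in for whatever priorities the user assigned.

```rust
/// Hypothetical input tree for the three example left-hand sides.
#[derive(Debug)]
enum Expr {
    A(Box<Expr>, Box<Expr>),
    B(i32),
    Lit(i32),
}

/// Each decision-tree state is a nested block; values bound by an outer
/// match op (`lhs`, `rhs`) stay in scope for every state nested below it.
fn lower(e: &Expr) -> Option<&'static str> {
    // Root state: run the first match op (is the root an `A`?).
    if let Expr::A(lhs, rhs) = e {
        // Child state, reachable only through the successful match above.
        if let Expr::B(_) = &**lhs {
            return Some("rule: (A (B _) _)");
        }
        // Sibling edge out of the same state.
        if let Expr::B(_) = &**rhs {
            return Some("rule: (A _ (B _))");
        }
        // Fallback rule, tried only after the more specific ones fail.
        return Some("rule: (A _ _)");
    }
    // No rule matched.
    None
}
```

Because each state is a lexical scope, two paths cannot jump into a shared successor state without some way to merge their bindings, which is the phi-node problem mentioned above.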

We prepare for codegen by building a “prioritized trie”, where the trie associates input strings with priorities to output values. Each input string is a sequence of match operators followed by an “end of match” token, and each output is a sequence of ops that build the output expression. Each input-output mapping is associated with a priority. The goal of the trie is to generate a decision-tree procedure that lets us execute match ops in a deterministic way, eventually landing at a state that corresponds to the highest-priority matching rule and can produce the output.

To build this trie, we construct nodes with edges to child nodes; each edge consists of (i) one input token (a PatternInst or EOM), and (ii) the minimum and maximum priorities of rules along this edge. In a way this resembles an interval tree, though the intervals of children need not be disjoint.
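One possible shape for the nodes and edges just described is sketched below. Only `TrieSymbol` mirrors the enum documented on this page; `PatternInst` is replaced by an opaque stand-in, and `PrioRange`, `Edge`, `Node`, and their methods are hypothetical names chosen for this example, not the crate's definitions.

```rust
/// Stand-in for the crate's `PatternInst` (assumption: an opaque op id).
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct PatternInst(u32);

/// Same shape as the enum documented on this page.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum TrieSymbol {
    Match { op: PatternInst },
    EndOfMatch,
}

/// Inclusive priority interval `[min, max]` (hypothetical helper type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct PrioRange {
    min: i64,
    max: i64,
}

impl PrioRange {
    /// The single-priority range `[p, p]`.
    fn unit(p: i64) -> Self {
        PrioRange { min: p, max: p }
    }
    /// Whether `p` lies inside the interval.
    fn contains(&self, p: i64) -> bool {
        self.min <= p && p <= self.max
    }
    /// Whether two intervals share at least one priority.
    fn overlaps(&self, other: &PrioRange) -> bool {
        self.min <= other.max && other.min <= self.max
    }
    /// Grow the interval just enough to include `p`.
    fn widen(&mut self, p: i64) {
        self.min = self.min.min(p);
        self.max = self.max.max(p);
    }
}

/// An edge: (i) one input token and (ii) the priority range of the rules below it.
#[derive(Clone, Debug)]
struct Edge {
    symbol: TrieSymbol,
    range: PrioRange,
    child: Node,
}

/// A trie node: either a choice among edges or a finished rule.
#[derive(Clone, Debug)]
enum Node {
    Decision { edges: Vec<Edge> },
    Leaf { prio: i64, output: String },
}
```

Unlike an interval tree, nothing in these types prevents sibling edges from carrying overlapping ranges; that has to be enforced by the insertion logic.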

To add a rule to this trie, we perform the usual trie-insertion logic, creating edges and subnodes where necessary, and updating the priority-range of each edge that we traverse to include the priority of the inserted rule.

However, we need to be a little bit careful, because with only priority ranges in place and the potential for overlap, we have something that resembles an NFA. For example, consider the case where we reach a node in the trie and have two edges with two match ops, one corresponding to a rule with priority 10, and the other corresponding to two rules, with priorities 20 and 0. The final match could lie along either path, so we have to traverse both.
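The overlap in that example can be reproduced with a deliberately naive insertion routine that widens ranges but never splits. This is a simplified sketch: symbols are plain strings, the `"EOM"` string stands for the end-of-match token, and all type and function names are hypothetical.

```rust
#[derive(Debug)]
enum Node {
    Decision(Vec<Edge>),
    Leaf { prio: i64, output: &'static str },
}

#[derive(Debug)]
struct Edge {
    symbol: &'static str,
    range: (i64, i64), // inclusive (min, max) priorities
    child: Node,
}

/// Usual trie insertion: reuse or create one edge per input symbol and
/// widen each traversed edge's range. No priority-splitting is done.
fn insert_naive(node: &mut Node, prio: i64, input: &[&'static str], output: &'static str) {
    let edges = match node {
        Node::Decision(edges) => edges,
        Node::Leaf { .. } => panic!("input continues past a leaf"),
    };
    let (&symbol, rest) = input.split_first().expect("input must end with EOM");
    // The last symbol (end of match) always leads to a fresh leaf.
    if rest.is_empty() {
        edges.push(Edge { symbol, range: (prio, prio), child: Node::Leaf { prio, output } });
        return;
    }
    let idx = match edges.iter().position(|e| e.symbol == symbol) {
        Some(i) => i,
        None => {
            edges.push(Edge { symbol, range: (prio, prio), child: Node::Decision(vec![]) });
            edges.len() - 1
        }
    };
    let edge = &mut edges[idx];
    edge.range = (edge.range.0.min(prio), edge.range.1.max(prio));
    insert_naive(&mut edge.child, prio, rest, output);
}

/// Whether two inclusive ranges share at least one priority.
fn ranges_overlap(a: (i64, i64), b: (i64, i64)) -> bool {
    a.0 <= b.1 && b.0 <= a.1
}

/// One rule at priority 10 behind `op_a`; two rules (20 and 0) behind `op_b`.
fn ambiguous_example() -> Node {
    let mut root = Node::Decision(vec![]);
    insert_naive(&mut root, 10, &["op_a", "EOM"], "rule_10");
    insert_naive(&mut root, 20, &["op_b", "op_c", "EOM"], "rule_20");
    insert_naive(&mut root, 0, &["op_b", "op_d", "EOM"], "rule_0");
    root
}
```

After these three insertions the root has edges `op_a` with range `(10, 10)` and `op_b` with range `(0, 20)`. The ranges overlap, so the ranges alone cannot tell a matcher which edge to try first.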

So, to avoid this, we perform a sort of moral equivalent to the NFA-to-DFA conversion “on the fly” as we insert nodes by duplicating subtrees. At any node, when inserting with a priority P and when outgoing edges lie in a range [P_lo, P_hi] such that P >= P_lo and P <= P_hi, we “priority-split the edges” at priority P.

To priority-split the edges in a node at priority P:

For each out-edge with priority range [P_lo, P_hi] such that P ∈ [P_lo, P_hi], and token T:
- Trim the subnode at P, yielding children C_lo and C_hi.
- Both children must be non-empty (have at least one leaf) because the original node must have had a leaf at P_lo and a leaf at P_hi.
- Replace the one edge with two edges, one for each child, with the original match op, and with ranges calculated according to the trimmed children.

To trim a node into range [P_lo, P_hi]:

For a decision node:
- If any edges have a range outside the bounds of the trimming range, trim the bounds of the edge, and trim the subtree under the edge into the trimmed edge’s range. If the subtree is trimmed to None, remove the edge.
- If all edges are removed, the decision node becomes None.
For a leaf node:
- If the priority is outside the range, the node becomes None.
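The trimming procedure above can be sketched as a recursive function that returns `None` when nothing survives. The types and names below are hypothetical and simplified (string symbols, no outputs). Edge ranges are clamped to the trimming range as the text describes; they are not recomputed from the surviving leaves.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Node {
    Decision(Vec<Edge>),
    Leaf { prio: i64 },
}

#[derive(Clone, Debug, PartialEq)]
struct Edge {
    symbol: &'static str,
    range: (i64, i64), // inclusive (min, max) priorities
    child: Node,
}

/// Trim `node` into the inclusive range `[lo, hi]`.
fn trim(node: Node, lo: i64, hi: i64) -> Option<Node> {
    match node {
        // A leaf survives only if its priority is inside the range.
        Node::Leaf { prio } => {
            if lo <= prio && prio <= hi {
                Some(Node::Leaf { prio })
            } else {
                None
            }
        }
        Node::Decision(edges) => {
            let mut kept = Vec::new();
            for edge in edges {
                // Clamp the edge's bounds to the trimming range.
                let new_lo = edge.range.0.max(lo);
                let new_hi = edge.range.1.min(hi);
                if new_lo > new_hi {
                    continue; // the edge lies entirely outside the range
                }
                // Trim the subtree into the clamped range; drop the edge
                // if the subtree trims away to nothing.
                if let Some(child) = trim(edge.child, new_lo, new_hi) {
                    kept.push(Edge { symbol: edge.symbol, range: (new_lo, new_hi), child });
                }
            }
            // A decision node with no remaining edges becomes `None`.
            if kept.is_empty() { None } else { Some(Node::Decision(kept)) }
        }
    }
}

/// A small decision node with one leaf at priority 20 and one at priority 0.
fn sample() -> Node {
    Node::Decision(vec![
        Edge { symbol: "op_c", range: (20, 20), child: Node::Leaf { prio: 20 } },
        Edge { symbol: "op_d", range: (0, 0), child: Node::Leaf { prio: 0 } },
    ])
}
```

Trimming `sample()` into `[0, 9]` keeps only the `op_d` edge, trimming into `[10, 20]` keeps only `op_c`, and trimming into `[5, 9]` yields `None`.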

As we descend a path to insert a leaf node, we (i) priority-split if any edges’ priority ranges overlap the insertion priority range, and (ii) expand priority ranges on edges to include the new leaf node’s priority.

As long as we do this, we ensure the two key priority-trie invariants:

- At a given node, no two edges exist with priority ranges R_1, R_2 such that R_1 ∩ R_2 ≠ ∅, unless R_1 and R_2 are unit ranges ([x, x]) and are on edges with different match-ops.
- Along the path from the root to any leaf node with priority P, each edge has a priority range R such that P ∈ R.

Note that this means that multiple edges with a single match-op may exist, with different priorities.
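A debugging check for the two invariants might look like the sketch below. It is an illustration under simplified, hypothetical types (string symbols, tuple ranges), not a routine taken from the crate.

```rust
enum Node {
    Decision(Vec<Edge>),
    Leaf { prio: i64 },
}

struct Edge {
    symbol: &'static str,
    range: (i64, i64), // inclusive (min, max) priorities
    child: Node,
}

fn is_unit(r: (i64, i64)) -> bool {
    r.0 == r.1
}

fn overlaps(a: (i64, i64), b: (i64, i64)) -> bool {
    a.0 <= b.1 && b.0 <= a.1
}

/// First invariant at one node: overlapping ranges are allowed only for
/// unit ranges on edges with different match ops.
fn edges_are_disjoint(edges: &[Edge]) -> bool {
    for (i, a) in edges.iter().enumerate() {
        for b in &edges[i + 1..] {
            let allowed = is_unit(a.range) && is_unit(b.range) && a.symbol != b.symbol;
            if overlaps(a.range, b.range) && !allowed {
                return false;
            }
        }
    }
    true
}

/// Walk the tree; `path` holds the ranges of the edges taken from the root.
fn check(node: &Node, path: &mut Vec<(i64, i64)>) -> bool {
    match node {
        // Second invariant: every edge range on the path contains the leaf priority.
        Node::Leaf { prio } => path.iter().all(|r| r.0 <= *prio && *prio <= r.1),
        Node::Decision(edges) => {
            if !edges_are_disjoint(edges) {
                return false;
            }
            for e in edges {
                path.push(e.range);
                let ok = check(&e.child, path);
                path.pop();
                if !ok {
                    return false;
                }
            }
            true
        }
    }
}

fn invariants_hold(root: &Node) -> bool {
    check(root, &mut Vec::new())
}

/// Helper: an edge for a one-op rule `symbol, EOM` at priority `prio`.
fn rule(symbol: &'static str, prio: i64) -> Edge {
    let eom = Edge { symbol: "EOM", range: (prio, prio), child: Node::Leaf { prio } };
    Edge { symbol, range: (prio, prio), child: Node::Decision(vec![eom]) }
}
```

For instance, a root with `rule("op_a", 10)` and `rule("op_b", 10)` passes, since both ranges are unit ranges on different ops, while widening the second edge's range to `(0, 20)` makes the check fail.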

Variants

`Match`

Fields

op: PatternInst

The match operation to run.

Run a match operation to continue matching a LHS.

`EndOfMatch`

We successfully matched a LHS.

Trait Implementations

impl Clone for TrieSymbol

fn clone(&self) -> TrieSymbol

fn clone_from(&mut self, source: &Self)

impl Debug for TrieSymbol

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl Ord for TrieSymbol

fn cmp(&self, other: &TrieSymbol) -> Ordering

fn max(self, other: Self) -> Self

fn min(self, other: Self) -> Self

fn clamp(self, min: Self, max: Self) -> Self

impl PartialEq<TrieSymbol> for TrieSymbol

fn eq(&self, other: &TrieSymbol) -> bool

fn ne(&self, other: &TrieSymbol) -> bool

impl PartialOrd<TrieSymbol> for TrieSymbol

fn partial_cmp(&self, other: &TrieSymbol) -> Option<Ordering>

fn lt(&self, other: &Rhs) -> bool

fn le(&self, other: &Rhs) -> bool

fn gt(&self, other: &Rhs) -> bool

fn ge(&self, other: &Rhs) -> bool

impl Eq for TrieSymbol

impl StructuralEq for TrieSymbol

impl StructuralPartialEq for TrieSymbol

Auto Trait Implementations

impl RefUnwindSafe for TrieSymbol

impl Send for TrieSymbol

impl Sync for TrieSymbol

impl Unpin for TrieSymbol

impl UnwindSafe for TrieSymbol

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

pub fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

pub fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

pub fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

pub fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

pub fn into(self) -> U

impl<T> ToOwned for T where T: Clone,

type Owned = T

pub fn to_owned(&self) -> T

pub fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

pub fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

pub fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
