pub enum TrieSymbol {
    Match {
        op: PatternInst,
    },
    EndOfMatch,
}
One “input symbol” for the decision tree that handles matching on a term. Each symbol represents one step: we either run a match op, or we finish the match.

Note that in the original Peepmatic scheme, the input symbol to the FSM was specified slightly differently. The automaton responded to alphabet symbols that corresponded only to match results, and the “extra state” was used at each automaton node to represent the op to run next. This extra state differentiated nodes that would otherwise be merged together by deduplication. That scheme works well enough, but the “extra state” is somewhat confusing and diverges from a pure automaton.

Instead, here, we imagine that the user of the automaton/trie can query the possible transition edges out of the current state. Each of these edges corresponds to one possible match op to run. After running a match op, we reach a new state corresponding to successful matches up to that point.

However, it’s a bit more subtle than this. Consider the prioritization problem. We want to give the DSL user the ability to change the order in which rules apply, for example to have a tier of “fallback rules” that apply only if more custom rules do not match.

A somewhat simplistic answer to this problem is “more specific rule wins”. However, this implies the existence of a total ordering of linearized match sequences that may not fully capture the intuitive meaning of “more specific”. Consider three left-hand sides:

  • (A _ _)
  • (A (B _) _)
  • (A _ (B _))

Intuitively, the first is the least specific. Given the input (A (B 1) (B 2)), we can say for sure that the first should not be chosen, because either the second or third would match “more” of the input tree. But which of the second and third should be chosen? A “lexicographic ordering” rule would say that we sort left-hand sides such that the (B _) sub-pattern comes before the wildcard _, so the second rule wins. But that is arbitrarily privileging one over the other based on the order of the arguments.
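The ambiguity above can be made concrete with a small sketch. The `Pat` type and the `at_least_as_specific` function below are hypothetical illustrations, not part of ISLE; they model “more specific” as a partial order on patterns and show that the second and third left-hand sides are incomparable under it.

```rust
/// Hypothetical, simplified pattern type (not ISLE's own representation):
/// either a wildcard or a constructor applied to sub-patterns.
#[derive(Debug, Clone, PartialEq)]
enum Pat {
    Wild,
    Ctor(&'static str, Vec<Pat>),
}

/// Returns true if `a` is at least as specific as `b`: wherever `b`
/// demands a constructor, `a` demands the same constructor.
fn at_least_as_specific(a: &Pat, b: &Pat) -> bool {
    match (a, b) {
        // Anything is at least as specific as a wildcard.
        (_, Pat::Wild) => true,
        // A wildcard is never as specific as a constructor pattern.
        (Pat::Wild, Pat::Ctor(..)) => false,
        (Pat::Ctor(na, xs), Pat::Ctor(nb, ys)) => {
            na == nb
                && xs.len() == ys.len()
                && xs.iter().zip(ys).all(|(x, y)| at_least_as_specific(x, y))
        }
    }
}

/// The three left-hand sides from the text, in order.
fn example_patterns() -> [Pat; 3] {
    use Pat::{Ctor, Wild};
    [
        Ctor("A", vec![Wild, Wild]),
        Ctor("A", vec![Ctor("B", vec![Wild]), Wild]),
        Ctor("A", vec![Wild, Ctor("B", vec![Wild])]),
    ]
}
```

Under this relation, the second and third patterns are each at least as specific as the first, but neither is at least as specific as the other, so specificity alone cannot pick between them.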

Instead, we can accept explicit priorities from the user to allow either choice. So we need a data structure that can associate matching inputs with priorities to outputs.

Next, we build a decision tree rather than an FSM. Why? Because we are compiling to a structured language, Rust, states become program points rather than data, so we cannot easily support a DAG structure. In other words, we are not producing an FSM that we can interpret at runtime. Rather, we are compiling code in which each state corresponds to a sequence of statements and control flow that branches to a next state, so we naturally need nesting; we cannot codegen arbitrary state transitions in an efficient manner.

We could support a limited form of DAG that reifies “diamonds” (two alternate paths that reconverge), but supporting this in a way that lets the output refer to values from either side is very complex (we would need to invent phi-nodes), and the cases where we want to do this rather than invoke a sub-term (which is compiled to a separate function) are rare.

Finally, note that we do not perform “output-suffix sharing”, which is one reason to deduplicate nodes and turn a tree back into a DAG, and which some other instruction-rewriter engines, such as Peepmatic, do. All “output” occurs at leaf nodes; this is necessary because we do not want to start invoking external constructors until we are sure of the match. Some of the code-sharing advantages of the “suffix sharing” scheme can be obtained in a more flexible and user-controllable way (with less understanding of internal compiler logic needed) by factoring logic into different internal terms, which become different compiled functions. This is likely to happen anyway as part of good software engineering practice.
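To illustrate why states become nested program points, here is a hand-written sketch of the general shape such code could take. It is not actual ISLE output: the `Term` type, the `lower` function, and the returned strings are hypothetical, and the order of the two inner checks stands in for priorities assumed for this example.

```rust
/// Hypothetical input term type, for illustration only.
enum Term {
    A(Box<Term>, Box<Term>),
    B(i64),
    Leaf(i64),
}

/// Each nested block plays the role of one decision-tree state. Control
/// only moves inward (a match op succeeded) or falls through to the next
/// alternative; there are no jumps between arbitrary states.
fn lower(t: &Term) -> Option<&'static str> {
    // Root state: run the match op "is this an A?".
    if let Term::A(x, y) = t {
        // Child state, reached only if the op above succeeded.
        // Assumed priorities: (A (B _) _) is tried before (A _ (B _)).
        if let Term::B(_) = &**x {
            return Some("rule (A (B _) _)");
        }
        if let Term::B(_) = &**y {
            return Some("rule (A _ (B _))");
        }
        // Fallback: the least specific rule.
        return Some("rule (A _ _)");
    }
    None
}
```

Because every state is a position in the code, two paths that reconverge would need to agree on which bound values are in scope, which is the phi-node problem mentioned above.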

We prepare for codegen by building a “prioritized trie”, where the trie associates input strings with priorities to output values. Each input string is a sequence of match operators followed by an “end of match” token, and each output is a sequence of ops that build the output expression. Each input-output mapping is associated with a priority. The goal of the trie is to generate a decision-tree procedure that lets us execute match ops in a deterministic way, eventually landing at a state that corresponds to the highest-priority matching rule and can produce the output.
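As a minimal sketch of what an “input string” looks like, the following turns a rule's match ops into a symbol sequence terminated by the end-of-match token. The `PatternInst` here is a stand-in string wrapper (the real type is not reproduced), and `input_string` is a hypothetical helper name.

```rust
/// Stand-in for the real `PatternInst`; here an op is just a label.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct PatternInst(&'static str);

/// Mirrors the shape of the enum documented on this page.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum TrieSymbol {
    Match { op: PatternInst },
    EndOfMatch,
}

/// Builds the trie input string for one rule: every match op in order,
/// followed by a single `EndOfMatch`.
fn input_string(ops: Vec<PatternInst>) -> Vec<TrieSymbol> {
    ops.into_iter()
        .map(|op| TrieSymbol::Match { op })
        .chain(std::iter::once(TrieSymbol::EndOfMatch))
        .collect()
}
```

A rule with two match ops therefore yields three symbols, the last of which is always `EndOfMatch`.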

To build this trie, we construct nodes with edges to child nodes; each edge consists of (i) one input token (a PatternInst or EOM), and (ii) the priority of rules along this edge. We do not merge rules of different priorities, because the logic to do so is complex and error-prone, necessitating “splits” when we merge together a set of rules over a priority range but later introduce a new possible match op in the “middle” of the range. (E.g., match op A at prio 10, B at prio 5, A at prio 0.) In fact, a previous version of the ISLE compiler worked this way, but in practice the complexity was unneeded.

To add a rule to this trie, we perform the usual trie-insertion logic, creating edges and subnodes where necessary. A new edge is necessary whenever an edge does not exist for the (priority, symbol) tuple.

Note that this means that multiple edges with a single match-op may exist, with different priorities.
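The insertion logic described above can be sketched as follows. This is a simplified model under stated assumptions, not the ISLE compiler's implementation: `Op`, `Symbol`, `Node`, and `insert` are hypothetical names, match ops are plain strings, outputs are plain labels, and a `BTreeMap` keyed by `(priority, symbol)` stands in for whatever edge storage the real trie uses.

```rust
use std::collections::BTreeMap;

/// Stand-in for a match op; the real `PatternInst` is richer.
type Op = &'static str;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Symbol {
    Match { op: Op },
    EndOfMatch,
}

#[derive(Debug, Default)]
struct Node {
    /// One edge per distinct (priority, symbol) pair.
    edges: BTreeMap<(i64, Symbol), Node>,
    /// Set on the node reached via an `EndOfMatch` edge.
    output: Option<&'static str>,
}

impl Node {
    /// Usual trie insertion: follow an existing edge for each
    /// (priority, symbol) pair, creating it if it does not exist.
    fn insert(&mut self, prio: i64, ops: &[Op], output: &'static str) {
        let mut node = self;
        for &op in ops {
            node = node.edges.entry((prio, Symbol::Match { op })).or_default();
        }
        let leaf = node.edges.entry((prio, Symbol::EndOfMatch)).or_default();
        leaf.output = Some(output);
    }
}

/// The example from the text: op A at prio 10, B at prio 5, A at prio 0.
fn example() -> Node {
    let mut root = Node::default();
    root.insert(10, &["A"], "high");
    root.insert(5, &["B"], "mid");
    root.insert(0, &["A"], "low");
    root
}
```

In this model the root of `example()` has three outgoing edges, two of which carry the same match op `A` at different priorities, while rules that share both a priority and a prefix of ops share edges.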

Variants

Match

Fields

op: PatternInst

The match operation to run.

Run a match operation to continue matching a LHS.

EndOfMatch

We successfully matched a LHS.

Trait Implementations

Returns a copy of the value.

Performs copy-assignment from source.

Formats the value using the given formatter.

This method returns an Ordering between self and other.

Compares and returns the maximum of two values.

Compares and returns the minimum of two values.

Restrict a value to a certain interval.

This method tests for self and other values to be equal, and is used by ==.

This method tests for !=.

This method returns an ordering between self and other values if one exists.

This method tests less than (for self and other) and is used by the < operator.

This method tests less than or equal to (for self and other) and is used by the <= operator.

This method tests greater than (for self and other) and is used by the > operator.

This method tests greater than or equal to (for self and other) and is used by the >= operator.
