Enum cranelift_isle::trie::TrieSymbol
pub enum TrieSymbol {
    Match {
        op: PatternInst,
    },
    EndOfMatch,
}
One “input symbol” for the decision tree that handles matching on a term. Each symbol represents one step: we either run a match op, or we finish the match.
Note that in the original Peepmatic scheme, the input symbol to the FSM was specified slightly differently. The automaton responded to alphabet symbols that corresponded only to match results, and “extra state” at each automaton node represented the op to run next. This extra state differentiated nodes that would otherwise be merged together by deduplication. That scheme works well enough, but the “extra state” is somewhat confusing and diverges from a pure automaton.
Instead, here, we imagine that the user of the automaton/trie can query the possible transition edges out of the current state. Each of these edges corresponds to one possible match op to run. After running a match op, we reach a new state corresponding to successful matches up to that point.
However, it’s a bit more subtle than this. Consider the prioritization problem. We want to give the DSL user the ability to change the order in which rules apply, for example to have a tier of “fallback rules” that apply only if more custom rules do not match.
A somewhat simplistic answer to this problem is “more specific rule wins”. However, this implies the existence of a total ordering of linearized match sequences that may not fully capture the intuitive meaning of “more specific”. Consider three left-hand sides:
- (A _ _)
- (A (B _) _)
- (A _ (B _))
Intuitively, the first is the least specific. Given the input (A (B 1) (B 2)), we can say for sure that the first should not be chosen, because either the second or third would match “more” of the input tree. But which of the second and third should be chosen? A “lexicographic ordering” rule would say that we sort left-hand sides such that the (B _) sub-pattern comes before the wildcard _, so the second rule wins. But that is arbitrarily privileging one over the other based on the order of the arguments.
Instead, we can accept explicit priorities from the user to allow either choice. So we need a data structure that can associate matching inputs with priorities to outputs.
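As a minimal sketch of the idea above, the following toy matcher picks among the three left-hand sides by explicit priority instead of by argument order. The `Pat`, `Val`, `matches`, `best_rule`, and `example_*` names are hypothetical and are not part of cranelift_isle; the real compiler does not interpret patterns this way.

```rust
/// Toy pattern: a wildcard or a named term with sub-patterns (hypothetical type).
#[derive(Debug)]
enum Pat {
    Wild,
    Term(&'static str, Vec<Pat>),
}

/// Toy input value: an integer or a named term with arguments (hypothetical type).
#[derive(Debug)]
enum Val {
    Int(i64),
    Term(&'static str, Vec<Val>),
}

/// Structural match: wildcards match anything, terms must agree on name and arity.
fn matches(pat: &Pat, val: &Val) -> bool {
    match (pat, val) {
        (Pat::Wild, _) => true,
        (Pat::Term(pn, ps), Val::Term(vn, vs)) => {
            pn == vn && ps.len() == vs.len() && ps.iter().zip(vs).all(|(p, v)| matches(p, v))
        }
        (Pat::Term(..), Val::Int(_)) => false,
    }
}

/// Index of the matching rule with the highest user-assigned priority, if any.
fn best_rule(rules: &[(i64, Pat)], input: &Val) -> Option<usize> {
    rules
        .iter()
        .enumerate()
        .filter(|(_, (_, pat))| matches(pat, input))
        .max_by_key(|(_, (prio, _))| *prio)
        .map(|(idx, _)| idx)
}

/// The three left-hand sides from the text; the first gets priority 0 and the
/// caller chooses priorities for the second and third.
fn example_rules(prio_second: i64, prio_third: i64) -> Vec<(i64, Pat)> {
    let b = || Pat::Term("B", vec![Pat::Wild]);
    vec![
        (0, Pat::Term("A", vec![Pat::Wild, Pat::Wild])),
        (prio_second, Pat::Term("A", vec![b(), Pat::Wild])),
        (prio_third, Pat::Term("A", vec![Pat::Wild, b()])),
    ]
}

/// The input (A (B 1) (B 2)).
fn example_input() -> Val {
    Val::Term(
        "A",
        vec![
            Val::Term("B", vec![Val::Int(1)]),
            Val::Term("B", vec![Val::Int(2)]),
        ],
    )
}
```

All three rules match the example input, so the outcome depends only on the priorities passed to `example_rules`: whichever of the second or third rule is given the larger value is selected.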
Next, we build a decision tree rather than an FSM. Why? Because we are compiling to a structured language, Rust, states become program points rather than data, so we cannot easily support a DAG structure. In other words, we are not producing an FSM that we can interpret at runtime. Rather, we are compiling code in which each state corresponds to a sequence of statements and control flow that branches to a next state, so we naturally need nesting; we cannot codegen arbitrary state transitions in an efficient manner. We could support a limited form of DAG that reifies “diamonds” (two alternate paths that reconverge), but supporting this in a way that lets the output refer to values from either side is very complex (we need to invent phi-nodes), and the cases where we want to do this rather than invoke a sub-term (that is compiled to a separate function) are rare.
Finally, note that one common reason to deduplicate nodes and turn a tree back into a DAG does not apply here: the “output-suffix sharing” that some other instruction-rewriter engines, such as Peepmatic, perform is not done, because all “output” occurs at leaf nodes. This is necessary because we do not want to start invoking external constructors until we are sure of the match. Some of the code-sharing advantages of the “suffix sharing” scheme can be obtained in a more flexible and user-controllable way (with less understanding of internal compiler logic needed) by factoring logic into different internal terms, which become different compiled functions. This is likely to happen anyway as part of good software engineering practice.
We prepare for codegen by building a “prioritized trie”, where the trie associates input strings with priorities to output values. Each input string is a sequence of match operators followed by an “end of match” token, and each output is a sequence of ops that build the output expression. Each input-output mapping is associated with a priority. The goal of the trie is to generate a decision-tree procedure that lets us execute match ops in a deterministic way, eventually landing at a state that corresponds to the highest-priority matching rule and can produce the output.
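The decision procedure described above can be sketched as an interpreter over a small trie. This is only an illustration: ISLE emits nested Rust code instead of walking a trie at runtime, and the toy `Op` below (a predicate over a single integer) stands in for `PatternInst`. `Node`, `insert`, `decide`, and `example` are hypothetical names, and each edge is assumed to be keyed by a (priority, symbol) pair.

```rust
use std::collections::BTreeMap;

/// Toy match op evaluated against one integer input (an assumption; real
/// `PatternInst`s destructure terms and bind values).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Op {
    IsEven,
    AtLeast(i64),
}

impl Op {
    fn run(self, input: i64) -> bool {
        match self {
            Op::IsEven => input % 2 == 0,
            Op::AtLeast(n) => input >= n,
        }
    }
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Symbol {
    Match { op: Op },
    EndOfMatch,
}

/// Hypothetical trie node: edges keyed by (priority, symbol); the node reached
/// through an `EndOfMatch` edge carries the rule's output.
#[derive(Debug, Default)]
struct Node {
    edges: BTreeMap<(i64, Symbol), Node>,
    output: Option<&'static str>,
}

impl Node {
    /// Insert the input string `ops` followed by `EndOfMatch`, at priority `prio`.
    fn insert(&mut self, prio: i64, ops: &[Op], output: &'static str) {
        match ops.split_first() {
            Some((&op, rest)) => self
                .edges
                .entry((prio, Symbol::Match { op }))
                .or_default()
                .insert(prio, rest, output),
            None => {
                self.edges
                    .entry((prio, Symbol::EndOfMatch))
                    .or_default()
                    .output = Some(output)
            }
        }
    }

    /// Try edges from highest to lowest priority; fall through to the next
    /// edge when a match op fails or its subtree yields nothing.
    fn decide(&self, input: i64) -> Option<&'static str> {
        // BTreeMap iterates in ascending key order, so reverse it.
        for ((_prio, symbol), child) in self.edges.iter().rev() {
            let found = match symbol {
                Symbol::Match { op } if op.run(input) => child.decide(input),
                Symbol::Match { .. } => None,
                Symbol::EndOfMatch => child.output,
            };
            if found.is_some() {
                return found;
            }
        }
        None
    }
}

/// Three rules: a priority-0 fallback, "even" at priority 5, and
/// "big even" (even and at least 100) at priority 10.
fn example() -> Node {
    let mut root = Node::default();
    root.insert(0, &[], "fallback");
    root.insert(5, &[Op::IsEven], "even");
    root.insert(10, &[Op::IsEven, Op::AtLeast(100)], "big even");
    root
}
```

Because a rule keeps one priority along its whole path in this sketch, the first leaf reached while scanning edges in descending priority order belongs to the highest-priority matching rule.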
To build this trie, we construct nodes with edges to child nodes; each edge consists of (i) one input token (a PatternInst or EOM), and (ii) the priority of rules along this edge. We do not merge rules of different priorities, because the logic to do so is complex and error-prone, necessitating “splits” when we merge together a set of rules over a priority range but later introduce a new possible match op in the “middle” of the range. (E.g., match op A at prio 10, B at prio 5, A at prio 0.) In fact, a previous version of the ISLE compiler worked this way, but in practice the complexity was unneeded.
To add a rule to this trie, we perform the usual trie-insertion logic, creating edges and subnodes where necessary. A new edge is necessary whenever an edge does not exist for the (priority, symbol) tuple.
Note that this means that multiple edges with a single match-op may exist, with different priorities.
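The insertion rule can be sketched as follows. This is a simplified model, not the crate's actual data structure: `Op` is a string label standing in for `PatternInst`, and `Node`, `insert`, and `example_trie` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Stand-in for `PatternInst` (an assumption; the real type carries much more detail).
type Op = &'static str;

#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Symbol {
    Match { op: Op },
    EndOfMatch,
}

/// Hypothetical trie node: edges keyed by the (priority, symbol) tuple;
/// `rules` records the ids of rules whose input string ends at this node.
#[derive(Debug, Default)]
struct Node {
    edges: BTreeMap<(i64, Symbol), Node>,
    rules: Vec<usize>,
}

impl Node {
    /// Usual trie insertion: follow the edge for the exact (priority, symbol)
    /// pair if it exists, otherwise create it, then recurse on the rest.
    fn insert(&mut self, prio: i64, symbols: &[Symbol], rule: usize) {
        match symbols.split_first() {
            None => self.rules.push(rule),
            Some((first, rest)) => self
                .edges
                .entry((prio, first.clone()))
                .or_default()
                .insert(prio, rest, rule),
        }
    }
}

/// The example from the text: match op A at prio 10, B at prio 5, A at prio 0.
fn example_trie() -> Node {
    let mut root = Node::default();
    let a = [Symbol::Match { op: "A" }, Symbol::EndOfMatch];
    let b = [Symbol::Match { op: "B" }, Symbol::EndOfMatch];
    root.insert(10, &a, 0);
    root.insert(5, &b, 1);
    root.insert(0, &a, 2);
    root
}
```

In `example_trie`, the root ends up with three outgoing edges, two of which carry the same match op A at different priorities. Inserting another rule that starts with A at priority 10 would reuse the existing edge instead of adding a fourth.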
Variants
Match
Run a match operation to continue matching an LHS.
Fields
op: PatternInst
The match operation to run.
EndOfMatch
We successfully matched an LHS.
Trait Implementations
impl Clone for TrieSymbol
fn clone(&self) -> TrieSymbol
1.0.0 · fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for TrieSymbol
impl Ord for TrieSymbol
fn cmp(&self, other: &TrieSymbol) -> Ordering
1.21.0 · fn max(self, other: Self) -> Self where Self: Sized,
impl PartialEq<TrieSymbol> for TrieSymbol
fn eq(&self, other: &TrieSymbol) -> bool
This method tests for self and other values to be equal, and is used by ==.
impl PartialOrd<TrieSymbol> for TrieSymbol
fn partial_cmp(&self, other: &TrieSymbol) -> Option<Ordering>
1.0.0 · fn le(&self, other: &Rhs) -> bool
This method tests less than or equal to (for self and other) and is used by the <= operator.