Enum cranelift_isle::trie::TrieSymbol


pub enum TrieSymbol {
    Match {
        op: PatternInst,
    },
    EndOfMatch,
}


One “input symbol” for the decision tree that handles matching on a term. Each symbol represents one step: we either run a match op, or we finish the match.

Note that in the original Peepmatic scheme, the input symbol to the FSM was specified differently. The automaton responded to alphabet symbols that corresponded only to match results, and the “extra state” was used at each automaton node to represent the op to run next. This extra state differentiated nodes that would otherwise be merged together by deduplication. That scheme works well enough, but the “extra state” is somewhat confusing and diverges from a pure automaton.

Instead, here, we imagine that the user of the automaton/trie can query the possible transition edges out of the current state. Each of these edges corresponds to one possible match op to run. After running a match op, we reach a new state corresponding to successful matches up to that point.

However, it’s a bit more subtle than this. Consider the prioritization problem. We want to give the DSL user the ability to change the order in which rules apply, for example to have a tier of “fallback rules” that apply only if more custom rules do not match.

A somewhat simplistic answer to this problem is “more specific rule wins”. However, this implies the existence of a total ordering of linearized match sequences that may not fully capture the intuitive meaning of “more specific”. Consider three left-hand sides:

(A _ _)
(A (B _) _)
(A _ (B _))

Intuitively, the first is the least specific. Given the input (A (B 1) (B 2)), we can say for sure that the first should not be chosen, because either the second or third would match “more” of the input tree. But which of the second and third should be chosen? A “lexicographic ordering” rule would say that we sort left-hand sides such that the (B _) sub-pattern comes before the wildcard _, so the second rule wins. But that is arbitrarily privileging one over the other based on the order of the arguments.

Instead, we can accept explicit priorities from the user to allow either choice. So we need a data structure that can associate matching inputs with priorities to outputs.
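As a minimal sketch of what explicit priorities buy us, the three left-hand sides above can be modeled with toy types and resolved by choosing the matching rule with the highest priority. Everything here (`Tree`, `Pat`, `Rule`, `matches`, `select`, the rule names and the priority values) is hypothetical and exists only for illustration; it is not the ISLE compiler's data model, and the linear scan is not how the trie works.

```rust
// Hypothetical toy input tree; not a type from cranelift_isle.
#[allow(dead_code)]
enum Tree {
    A(Box<Tree>, Box<Tree>),
    B(Box<Tree>),
    Leaf(i64),
}

// Hypothetical toy pattern language: `Wild` plays the role of `_`.
enum Pat {
    Wild,
    A(Box<Pat>, Box<Pat>),
    B(Box<Pat>),
}

// A rule is a left-hand side plus a user-supplied priority.
struct Rule {
    prio: i64,
    name: &'static str,
    lhs: Pat,
}

// Structural match of a pattern against an input tree.
fn matches(pat: &Pat, tree: &Tree) -> bool {
    match (pat, tree) {
        (Pat::Wild, _) => true,
        (Pat::A(p1, p2), Tree::A(t1, t2)) => matches(p1, t1) && matches(p2, t2),
        (Pat::B(p), Tree::B(t)) => matches(p, t),
        _ => false,
    }
}

// Among all rules whose LHS matches, return the name of the one with
// the highest priority (assumes priorities are distinct).
fn select(rules: &[Rule], input: &Tree) -> Option<&'static str> {
    rules
        .iter()
        .filter(|r| matches(&r.lhs, input))
        .max_by_key(|r| r.prio)
        .map(|r| r.name)
}
```

With priorities 0, 1 and 2 assigned to `(A _ _)`, `(A (B _) _)` and `(A _ (B _))` respectively, the input `(A (B 1) (B 2))` selects the third rule; swapping the last two priorities selects the second instead, so the choice is the user's rather than an artifact of argument order.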

Next, we build a decision tree rather than an FSM. Why? Because we are compiling to a structured language, Rust, in which states become program points rather than data, we cannot easily support a DAG structure. In other words, we are not producing an FSM that we can interpret at runtime; rather, we are compiling code in which each state corresponds to a sequence of statements and control flow that branches to a next state. We therefore naturally need nesting; we cannot codegen arbitrary state transitions in an efficient manner. We could support a limited form of DAG that reifies “diamonds” (two alternate paths that reconverge), but supporting this in a way that lets the output refer to values from either side is very complex (we need to invent phi-nodes), and the cases where we want to do this rather than invoke a sub-term (that is compiled to a separate function) are rare.

Finally, note that one reason to deduplicate nodes and turn a tree back into a DAG (“output-suffix sharing”, as some other instruction-rewriter engines, such as Peepmatic, do) does not apply here, because all “output” occurs at leaf nodes; this is necessary because we do not want to start invoking external constructors until we are sure of the match. Some of the code-sharing advantages of the “suffix sharing” scheme can be obtained in a more flexible and user-controllable way (with less understanding of internal compiler logic needed) by factoring logic into different internal terms, which become different compiled functions. This is likely to happen anyway as part of good software engineering practice.
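To make the “states become program points” idea concrete, here is a hand-written illustration of the nested shape such code might take for three rules over a toy tree type. It is a sketch only: the `Tree` type, the `lower` function, the returned rule names and the priorities in the comment are all hypothetical, and this is not actual output of the ISLE compiler.

```rust
// Hypothetical toy input tree; not a type from cranelift_isle.
#[allow(dead_code)]
enum Tree {
    A(Box<Tree>, Box<Tree>),
    B(Box<Tree>),
    Leaf(i64),
}

// Hand-written decision tree for three assumed rules:
//   prio 2: (A _ (B _)) => "right"
//   prio 1: (A (B _) _) => "left"
//   prio 0: (A _ _)     => "fallback"
// Each nested block is one "state": its position in the code records
// which match ops have already succeeded.
fn lower(input: &Tree) -> Option<&'static str> {
    // Match op shared by all three rules: is the root an `A`?
    if let Tree::A(lhs, rhs) = input {
        // Highest-priority continuation is tried first.
        if let Tree::B(_) = &**rhs {
            return Some("right");
        }
        // If it fails, fall through to the next priority.
        if let Tree::B(_) = &**lhs {
            return Some("left");
        }
        // End of match for the lowest-priority rule.
        return Some("fallback");
    }
    // No rule matched.
    None
}
```

Note that output is produced only at the innermost points, after every match op on the path has succeeded, which mirrors the requirement that all “output” occurs at leaf nodes.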

We prepare for codegen by building a “prioritized trie”, where the trie associates input strings with priorities to output values. Each input string is a sequence of match operators followed by an “end of match” token, and each output is a sequence of ops that build the output expression. Each input-output mapping is associated with a priority. The goal of the trie is to generate a decision-tree procedure that lets us execute match ops in a deterministic way, eventually landing at a state that corresponds to the highest-priority matching rule and can produce the output.

To build this trie, we construct nodes with edges to child nodes; each edge consists of (i) one input token (a PatternInst or EOM), and (ii) the priority of rules along this edge. We do not merge rules of different priorities, because the logic to do so is complex and error-prone, necessitating “splits” when we merge together a set of rules over a priority range but later introduce a new possible match op in the “middle” of the range. (E.g., match op A at prio 10, B at prio 5, A at prio 0.) In fact, a previous version of the ISLE compiler worked this way, but in practice the complexity was unneeded.

To add a rule to this trie, we perform the usual trie-insertion logic, creating edges and subnodes where necessary. A new edge is necessary whenever an edge does not exist for the (priority, symbol) tuple.

Note that this means that multiple edges with a single match-op may exist, with different priorities.
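The insertion logic described above can be sketched as follows. This is a simplified model under stated assumptions, not the crate's implementation: `Symbol` is a hypothetical stand-in for `TrieSymbol` whose `Match` carries a string rather than a `PatternInst`, and `Node`, its fields and `insert` are invented for illustration.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for `TrieSymbol`; the real `Match` variant
// carries a `PatternInst`, replaced here by a string name.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Symbol {
    Match(&'static str),
    EndOfMatch,
}

// Hypothetical trie node.
#[derive(Default, Debug)]
struct Node {
    // Outgoing edges keyed by the (priority, symbol) tuple, so the same
    // match op at two priorities yields two distinct edges.
    edges: BTreeMap<(i64, Symbol), Node>,
    // Outputs of rules ending here (only reached via `EndOfMatch`).
    outputs: Vec<&'static str>,
}

impl Node {
    // Usual trie insertion: follow an existing edge for each
    // (priority, symbol) pair, creating it when it does not exist.
    fn insert(&mut self, prio: i64, ops: &[&'static str], output: &'static str) {
        let mut node = self;
        for op in ops {
            node = node.edges.entry((prio, Symbol::Match(*op))).or_default();
        }
        // Terminate the input string with the end-of-match token.
        let leaf = node.edges.entry((prio, Symbol::EndOfMatch)).or_default();
        leaf.outputs.push(output);
    }
}
```

Inserting op `A` at priority 10, `B` at priority 5 and `A` at priority 0 into an empty root produces three root edges, two of which carry the same match op. A later rule at priority 10 that also begins with `A` reuses the existing priority-10 edge rather than adding a fourth.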

Variants

Match

Fields

op: PatternInst

The match operation to run.

Run a match operation to continue matching a LHS.


EndOfMatch

We successfully matched a LHS.


Trait Implementations

impl Clone for TrieSymbol

fn clone(&self) -> TrieSymbol

fn clone_from(&mut self, source: &Self)

impl Debug for TrieSymbol

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl Ord for TrieSymbol

fn cmp(&self, other: &TrieSymbol) -> Ordering

fn max(self, other: Self) -> Self where Self: Sized,

fn min(self, other: Self) -> Self where Self: Sized,

fn clamp(self, min: Self, max: Self) -> Self where Self: Sized + PartialOrd<Self>,

impl PartialEq<TrieSymbol> for TrieSymbol

fn eq(&self, other: &TrieSymbol) -> bool

fn ne(&self, other: &Rhs) -> bool

impl PartialOrd<TrieSymbol> for TrieSymbol

fn partial_cmp(&self, other: &TrieSymbol) -> Option<Ordering>

fn lt(&self, other: &Rhs) -> bool

fn le(&self, other: &Rhs) -> bool

fn gt(&self, other: &Rhs) -> bool

fn ge(&self, other: &Rhs) -> bool

impl Eq for TrieSymbol

impl StructuralEq for TrieSymbol

impl StructuralPartialEq for TrieSymbol

Auto Trait Implementations

impl RefUnwindSafe for TrieSymbol

impl Send for TrieSymbol

impl Sync for TrieSymbol

impl Unpin for TrieSymbol

impl UnwindSafe for TrieSymbol

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> ToOwned for T where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>