Trait regex_automata::dfa::Automaton


pub unsafe trait Automaton {
    fn next_state(&self, current: StateID, input: u8) -> StateID;
    unsafe fn next_state_unchecked(
        &self, 
        current: StateID, 
        input: u8
    ) -> StateID;
    fn next_eoi_state(&self, current: StateID) -> StateID;
    fn start_state_forward(
        &self, 
        pattern_id: Option<PatternID>, 
        bytes: &[u8], 
        start: usize, 
        end: usize
    ) -> StateID;
    fn start_state_reverse(
        &self, 
        pattern_id: Option<PatternID>, 
        bytes: &[u8], 
        start: usize, 
        end: usize
    ) -> StateID;
    fn is_special_state(&self, id: StateID) -> bool;
    fn is_dead_state(&self, id: StateID) -> bool;
    fn is_quit_state(&self, id: StateID) -> bool;
    fn is_match_state(&self, id: StateID) -> bool;
    fn is_start_state(&self, id: StateID) -> bool;
    fn is_accel_state(&self, id: StateID) -> bool;
    fn pattern_count(&self) -> usize;
    fn match_count(&self, id: StateID) -> usize;
    fn match_pattern(&self, id: StateID, index: usize) -> PatternID;

    fn accelerator(&self, _id: StateID) -> &[u8] { ... }
    fn find_earliest_fwd(
        &self, 
        bytes: &[u8]
    ) -> Result<Option<HalfMatch>, MatchError> { ... }
    fn find_earliest_rev(
        &self, 
        bytes: &[u8]
    ) -> Result<Option<HalfMatch>, MatchError> { ... }
    fn find_leftmost_fwd(
        &self, 
        bytes: &[u8]
    ) -> Result<Option<HalfMatch>, MatchError> { ... }
    fn find_leftmost_rev(
        &self, 
        bytes: &[u8]
    ) -> Result<Option<HalfMatch>, MatchError> { ... }
    fn find_overlapping_fwd(
        &self, 
        bytes: &[u8], 
        state: &mut OverlappingState
    ) -> Result<Option<HalfMatch>, MatchError> { ... }
    fn find_earliest_fwd_at(
        &self, 
        pre: Option<&mut Scanner<'_>>, 
        pattern_id: Option<PatternID>, 
        bytes: &[u8], 
        start: usize, 
        end: usize
    ) -> Result<Option<HalfMatch>, MatchError> { ... }
    fn find_earliest_rev_at(
        &self, 
        pattern_id: Option<PatternID>, 
        bytes: &[u8], 
        start: usize, 
        end: usize
    ) -> Result<Option<HalfMatch>, MatchError> { ... }
    fn find_leftmost_fwd_at(
        &self, 
        pre: Option<&mut Scanner<'_>>, 
        pattern_id: Option<PatternID>, 
        bytes: &[u8], 
        start: usize, 
        end: usize
    ) -> Result<Option<HalfMatch>, MatchError> { ... }
    fn find_leftmost_rev_at(
        &self, 
        pattern_id: Option<PatternID>, 
        bytes: &[u8], 
        start: usize, 
        end: usize
    ) -> Result<Option<HalfMatch>, MatchError> { ... }
    fn find_overlapping_fwd_at(
        &self, 
        pre: Option<&mut Scanner<'_>>, 
        pattern_id: Option<PatternID>, 
        bytes: &[u8], 
        start: usize, 
        end: usize, 
        state: &mut OverlappingState
    ) -> Result<Option<HalfMatch>, MatchError> { ... }
}


A trait describing the interface of a deterministic finite automaton (DFA).

Given its complexity, this trait is unlikely to be implemented outside this crate. Its primary purpose is to abstract over different types of DFAs, which in this crate means dense DFAs and sparse DFAs. (Dense DFAs are fast but memory hungry, whereas sparse DFAs are slower but have a smaller memory footprint. Otherwise, they provide exactly equivalent expressive power.) For example, a dfa::regex::Regex is generic over this trait.
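To illustrate what abstracting over dense and sparse representations can look like, here is a minimal sketch. It does not use this crate's API: `ToyAutomaton`, `Dense`, `Sparse`, `accepts` and `build_a_plus` are hypothetical names invented for this example, state IDs are plain `usize` values, and the toy trait has only two methods. Both representations encode the pattern `a+` with state 0 as the dead state, state 1 as the start state and state 2 as the match state.

```rust
/// Hypothetical, heavily simplified stand-in for a DFA trait.
trait ToyAutomaton {
    fn next_state(&self, current: usize, input: u8) -> usize;
    fn is_match(&self, state: usize) -> bool;
}

/// Dense: one table entry per (state, byte) pair. Fast lookup, large table.
struct Dense {
    table: Vec<usize>, // length is state_count * 256
    matches: Vec<bool>,
}

/// Sparse: per state, a short list of inclusive byte ranges and targets.
/// Bytes not covered by any range go to the dead state (0).
struct Sparse {
    transitions: Vec<Vec<(u8, u8, usize)>>,
    matches: Vec<bool>,
}

impl ToyAutomaton for Dense {
    fn next_state(&self, current: usize, input: u8) -> usize {
        self.table[current * 256 + input as usize]
    }
    fn is_match(&self, state: usize) -> bool {
        self.matches[state]
    }
}

impl ToyAutomaton for Sparse {
    fn next_state(&self, current: usize, input: u8) -> usize {
        // Linear scan over the ranges: slower, but far less memory.
        self.transitions[current]
            .iter()
            .find(|&&(lo, hi, _)| lo <= input && input <= hi)
            .map_or(0, |&(_, _, next)| next)
    }
    fn is_match(&self, state: usize) -> bool {
        self.matches[state]
    }
}

/// Generic over the trait: works with either representation.
/// Reports whether the whole input is accepted.
fn accepts<A: ToyAutomaton>(dfa: &A, start: usize, input: &[u8]) -> bool {
    let mut state = start;
    for &b in input {
        state = dfa.next_state(state, b);
    }
    dfa.is_match(state)
}

/// Builds both representations of `a+`.
fn build_a_plus() -> (Dense, Sparse) {
    let mut table = vec![0usize; 3 * 256];
    table[1 * 256 + b'a' as usize] = 2; // start --a--> match
    table[2 * 256 + b'a' as usize] = 2; // match --a--> match
    let matches = vec![false, false, true];
    let dense = Dense { table, matches: matches.clone() };
    let sparse = Sparse {
        transitions: vec![vec![], vec![(b'a', b'a', 2)], vec![(b'a', b'a', 2)]],
        matches,
    };
    (dense, sparse)
}
```

With `let (dense, sparse) = build_a_plus();`, both `accepts(&dense, 1, b"aaa")` and `accepts(&sparse, 1, b"aaa")` give the same answer, while the dense table holds 768 entries and the sparse form holds two ranges.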

Normally, a DFA’s execution model is very simple. You might have a single start state, zero or more final or “match” states, and a function that transitions from one state to the next given the next byte of input. Unfortunately, the interface described by this trait is significantly more complicated than this. The complexity has several sources, mostly motivated by performance, functionality or space savings:

- A DFA can search for multiple patterns simultaneously. This means extra information is returned when a match occurs. Namely, a match is not just an offset, but an offset plus a pattern ID. Automaton::pattern_count returns the number of patterns compiled into the DFA, Automaton::match_count returns the total number of patterns that match in a particular state and Automaton::match_pattern permits iterating over the patterns that match in a particular state.
- A DFA can have multiple start states, and the choice of which start state to use depends on the content of the string being searched and position of the search, as well as whether the search is an anchored search for a specific pattern in the DFA. Moreover, computing the start state also depends on whether you’re doing a forward or a reverse search. Automaton::start_state_forward and Automaton::start_state_reverse are used to compute the start state for forward and reverse searches, respectively.
- All matches are delayed by one byte to support things like $ and \b at the end of a pattern. Therefore, every use of a DFA is required to use Automaton::next_eoi_state at the end of the search to compute the final transition.
- For optimization reasons, some states are treated specially. Every state is either special or not, which can be determined via the Automaton::is_special_state method. If it’s special, then the state must be at least one of a few possible types of states. (Note that some types can overlap. For example, a match state can also be an accel state. But some types can’t. If a state is a dead state, then it can never be any other type of state.) Those types are:
  - A dead state. A dead state means the DFA will never enter a match state. This can be queried via the Automaton::is_dead_state method.
  - A quit state. A quit state occurs if the DFA had to stop the search prematurely for some reason. This can be queried via the Automaton::is_quit_state method.
  - A match state. A match state occurs when a match is found. When a DFA enters a match state, the search may stop immediately (when looking for the earliest match), or it may continue to find the leftmost-first match. This can be queried via the Automaton::is_match_state method.
  - A start state. A start state is where a search begins. For every search, exactly one start state is used, although a DFA may contain many start states. When the search is in a start state, it may use a prefilter to quickly skip to candidate matches without executing the DFA on every byte. This can be queried via the Automaton::is_start_state method.
  - An accel state. An accel state is a state that is accelerated. That is, it is a state where most of its transitions loop back to itself and only a small number of transitions lead to other states. This kind of state is said to be accelerated because a search routine can quickly look for the bytes leading out of the state instead of continuing to execute the DFA on each byte. This can be queried via the Automaton::is_accel_state method. The bytes that lead out of the state can be queried via the Automaton::accelerator method.
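The special-state scheme above can be sketched as follows. This is an illustration under stated assumptions, not this crate's implementation: the `Special` struct, its fields, `example_layout` and `classify` are hypothetical, and the layout simply assumes that state IDs are ordered so every special state has an ID no greater than `max_special`. Under that assumption, the common "is this state special at all?" question costs a single comparison, and the finer checks only run for the rare special states.

```rust
/// Hypothetical layout: all special states occupy the lowest state IDs.
struct Special {
    max_special: u32, // highest ID of any special state
    quit: u32,
    min_match: u32,
    max_match: u32,
    min_accel: u32,
    max_accel: u32,
    min_start: u32,
    max_start: u32,
}

/// Assumption for this sketch: the dead state always has ID 0.
const DEAD: u32 = 0;

impl Special {
    /// One comparison decides whether any further checks are needed.
    fn is_special_state(&self, id: u32) -> bool {
        id <= self.max_special
    }
    fn is_dead_state(&self, id: u32) -> bool {
        id == DEAD
    }
    fn is_quit_state(&self, id: u32) -> bool {
        id == self.quit
    }
    fn is_match_state(&self, id: u32) -> bool {
        self.min_match <= id && id <= self.max_match
    }
    fn is_accel_state(&self, id: u32) -> bool {
        self.min_accel <= id && id <= self.max_accel
    }
    fn is_start_state(&self, id: u32) -> bool {
        self.min_start <= id && id <= self.max_start
    }
}

/// Example layout: dead = 0, quit = 1, match = 2..=4, accel = 4..=5,
/// start = 6. State 4 is both a match state and an accel state.
fn example_layout() -> Special {
    Special {
        max_special: 6,
        quit: 1,
        min_match: 2,
        max_match: 4,
        min_accel: 4,
        max_accel: 5,
        min_start: 6,
        max_start: 6,
    }
}

/// Returns the names of every special type that applies to `id`.
fn classify(sp: &Special, id: u32) -> Vec<&'static str> {
    let mut kinds = Vec::new();
    if !sp.is_special_state(id) {
        return kinds; // fast path: ordinary state
    }
    if sp.is_dead_state(id) {
        kinds.push("dead");
    }
    if sp.is_quit_state(id) {
        kinds.push("quit");
    }
    if sp.is_match_state(id) {
        kinds.push("match");
    }
    if sp.is_accel_state(id) {
        kinds.push("accel");
    }
    if sp.is_start_state(id) {
        kinds.push("start");
    }
    kinds
}
```

In this layout, `classify(&example_layout(), 4)` reports both `"match"` and `"accel"`, showing how types can overlap, while the dead state (ID 0) never shares an ID with another type.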

There are a number of provided methods on this trait that implement efficient searching (both forward and reverse) with a DFA using all of the above features of this trait. In particular, given the complexity of all these features, implementing a search routine with this trait is not straightforward. If you need to do this for specialized reasons, then it’s recommended to look at the source of this crate. It is intentionally well commented to help with this. With that said, it is possible to somewhat simplify the search routine. For example, handling accelerated states is strictly optional, since it is always correct to assume that Automaton::is_accel_state returns false. However, one complex part of writing a search routine using this trait is handling the 1-byte delay of a match. That is not optional.
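The 1-byte match delay can be sketched with a hand-written toy DFA for the pattern `ab`. Nothing here is this crate's API: the state constants and the functions `toy_next_state`, `toy_next_eoi_state` and `toy_find_earliest` are hypothetical, and the search is a simplified unanchored "earliest match" loop that ignores prefilters, acceleration and quit states. The point is that the match state is only entered on the transition after the last byte of the match, so the reported end offset is the index of the byte that triggered the transition, and a match ending at the end of the input is only visible after the end-of-input transition.

```rust
type StateID = u8;

const DEAD: StateID = 0;
const START: StateID = 1;
const SEEN_A: StateID = 2;
const PENDING: StateID = 3; // "ab" fully consumed, match not yet reported
const MATCH: StateID = 4;

/// Toy transition function for an unanchored search for `ab`.
fn toy_next_state(current: StateID, input: u8) -> StateID {
    match (current, input) {
        (START, b'a') | (SEEN_A, b'a') => SEEN_A,
        (SEEN_A, b'b') => PENDING,
        (START, _) | (SEEN_A, _) => START,
        // The delay: any byte after "ab" moves into the match state.
        (PENDING, _) => MATCH,
        _ => DEAD,
    }
}

/// Toy end-of-input transition. Only a pending match becomes a match.
fn toy_next_eoi_state(current: StateID) -> StateID {
    if current == PENDING {
        MATCH
    } else {
        DEAD
    }
}

/// Returns the end offset of the earliest match, if any.
fn toy_find_earliest(haystack: &[u8]) -> Option<usize> {
    let mut state = START;
    for (i, &byte) in haystack.iter().enumerate() {
        state = toy_next_state(state, byte);
        if state == MATCH {
            // The match ended just before the byte at index `i`.
            return Some(i);
        }
        if state == DEAD {
            return None;
        }
    }
    // Not optional: without this step, a match that ends exactly at
    // the end of the haystack would be missed.
    if toy_next_eoi_state(state) == MATCH {
        Some(haystack.len())
    } else {
        None
    }
}
```

For example, `toy_find_earliest(b"xaby")` returns `Some(3)` because the byte `y` at index 3 triggers the delayed match, and `toy_find_earliest(b"xxab")` returns `Some(4)` only because of the end-of-input transition.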

Safety

This trait is unsafe to implement because DFA searching may rely on the correctness of the implementation for memory safety. For example, DFA searching may use explicit bounds check elision, which will in turn rely on the correctness of every function that returns a state ID.
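As a rough illustration of why a wrong state ID can become a memory safety problem, consider the following sketch. It is not this crate's code: `ToyDense`, its methods and `two_state_table` are hypothetical. The constructor validates once that every entry in the transition table is a valid state ID, and the unchecked transition function then skips the per-byte bounds check on the strength of that invariant. If any function handed back an out-of-range state ID, the unchecked read would be undefined behavior.

```rust
/// Hypothetical dense DFA with 256 transitions per state.
struct ToyDense {
    table: Vec<u32>, // length is state_count * 256
}

impl ToyDense {
    /// Validates the table so that every stored state ID is in bounds.
    fn new(table: Vec<u32>) -> Option<ToyDense> {
        if table.is_empty() || table.len() % 256 != 0 {
            return None;
        }
        let state_count = table.len() / 256;
        if table.iter().any(|&next| next as usize >= state_count) {
            return None;
        }
        Some(ToyDense { table })
    }

    /// Checked transition: panics on an invalid state ID.
    fn next_state(&self, current: u32, input: u8) -> u32 {
        self.table[current as usize * 256 + input as usize]
    }

    /// Unchecked transition.
    ///
    /// Safety: `current` must be the start state (0) or a state ID
    /// previously returned by this same DFA.
    unsafe fn next_state_unchecked(&self, current: u32, input: u8) -> u32 {
        let index = current as usize * 256 + input as usize;
        // SAFETY: the caller guarantees `current < state_count`, and
        // `input < 256`, so `index < table.len()`.
        unsafe { *self.table.get_unchecked(index) }
    }

    /// Runs the DFA from state 0 over `input` without bounds checks.
    fn run(&self, input: &[u8]) -> u32 {
        let mut state = 0;
        for &byte in input {
            // SAFETY: `state` is 0 or came out of the validated table.
            state = unsafe { self.next_state_unchecked(state, byte) };
        }
        state
    }
}

/// Two states: state 0 moves to state 1 on `a`; state 1 loops forever.
fn two_state_table() -> Vec<u32> {
    let mut table = vec![0u32; 2 * 256];
    table[b'a' as usize] = 1;
    for entry in &mut table[256..] {
        *entry = 1;
    }
    table
}
```

Here `ToyDense::new(two_state_table())` succeeds, whereas a table containing the entry `2` is rejected up front, since following it in `run` would read past the end of the table.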

When implementing this trait, one must uphold the documented correctness guarantees. Otherwise, undefined behavior may occur.

Required methods

fn next_state(&self, current: StateID, input: u8) -> StateID

unsafe fn next_state_unchecked(&self, current: StateID, input: u8) -> StateID

fn next_eoi_state(&self, current: StateID) -> StateID

fn start_state_forward(&self, pattern_id: Option<PatternID>, bytes: &[u8], start: usize, end: usize) -> StateID

fn start_state_reverse(&self, pattern_id: Option<PatternID>, bytes: &[u8], start: usize, end: usize) -> StateID

fn is_special_state(&self, id: StateID) -> bool

fn is_dead_state(&self, id: StateID) -> bool

fn is_quit_state(&self, id: StateID) -> bool

fn is_match_state(&self, id: StateID) -> bool

fn is_start_state(&self, id: StateID) -> bool

fn is_accel_state(&self, id: StateID) -> bool

fn pattern_count(&self) -> usize

fn match_count(&self, id: StateID) -> usize

fn match_pattern(&self, id: StateID, index: usize) -> PatternID

Provided methods

fn accelerator(&self, _id: StateID) -> &[u8]

fn find_earliest_fwd(&self, bytes: &[u8]) -> Result<Option<HalfMatch>, MatchError>

fn find_earliest_rev(&self, bytes: &[u8]) -> Result<Option<HalfMatch>, MatchError>

fn find_leftmost_fwd(&self, bytes: &[u8]) -> Result<Option<HalfMatch>, MatchError>

fn find_leftmost_rev(&self, bytes: &[u8]) -> Result<Option<HalfMatch>, MatchError>

fn find_overlapping_fwd(&self, bytes: &[u8], state: &mut OverlappingState) -> Result<Option<HalfMatch>, MatchError>

fn find_earliest_fwd_at(&self, pre: Option<&mut Scanner<'_>>, pattern_id: Option<PatternID>, bytes: &[u8], start: usize, end: usize) -> Result<Option<HalfMatch>, MatchError>

fn find_earliest_rev_at(&self, pattern_id: Option<PatternID>, bytes: &[u8], start: usize, end: usize) -> Result<Option<HalfMatch>, MatchError>

fn find_leftmost_fwd_at(&self, pre: Option<&mut Scanner<'_>>, pattern_id: Option<PatternID>, bytes: &[u8], start: usize, end: usize) -> Result<Option<HalfMatch>, MatchError>

fn find_leftmost_rev_at(&self, pattern_id: Option<PatternID>, bytes: &[u8], start: usize, end: usize) -> Result<Option<HalfMatch>, MatchError>

fn find_overlapping_fwd_at(&self, pre: Option<&mut Scanner<'_>>, pattern_id: Option<PatternID>, bytes: &[u8], start: usize, end: usize, state: &mut OverlappingState) -> Result<Option<HalfMatch>, MatchError>

Implementations on Foreign Types

impl<'a, T: Automaton> Automaton for &'a T

fn next_state(&self, current: StateID, input: u8) -> StateID

unsafe fn next_state_unchecked(&self, current: StateID, input: u8) -> StateID

fn next_eoi_state(&self, current: StateID) -> StateID

fn start_state_forward(&self, pattern_id: Option<PatternID>, bytes: &[u8], start: usize, end: usize) -> StateID

fn start_state_reverse(&self, pattern_id: Option<PatternID>, bytes: &[u8], start: usize, end: usize) -> StateID

fn is_special_state(&self, id: StateID) -> bool

fn is_dead_state(&self, id: StateID) -> bool

fn is_quit_state(&self, id: StateID) -> bool

fn is_match_state(&self, id: StateID) -> bool

fn is_start_state(&self, id: StateID) -> bool

fn is_accel_state(&self, id: StateID) -> bool

fn pattern_count(&self) -> usize

fn match_count(&self, id: StateID) -> usize

fn match_pattern(&self, id: StateID, index: usize) -> PatternID

fn accelerator(&self, id: StateID) -> &[u8]

fn find_earliest_fwd(&self, bytes: &[u8]) -> Result<Option<HalfMatch>, MatchError>

fn find_earliest_rev(&self, bytes: &[u8]) -> Result<Option<HalfMatch>, MatchError>

fn find_leftmost_fwd(&self, bytes: &[u8]) -> Result<Option<HalfMatch>, MatchError>

fn find_leftmost_rev(&self, bytes: &[u8]) -> Result<Option<HalfMatch>, MatchError>

fn find_overlapping_fwd(&self, bytes: &[u8], state: &mut OverlappingState) -> Result<Option<HalfMatch>, MatchError>

fn find_earliest_fwd_at(&self, pre: Option<&mut Scanner<'_>>, pattern_id: Option<PatternID>, bytes: &[u8], start: usize, end: usize) -> Result<Option<HalfMatch>, MatchError>

fn find_earliest_rev_at(&self, pattern_id: Option<PatternID>, bytes: &[u8], start: usize, end: usize) -> Result<Option<HalfMatch>, MatchError>

fn find_leftmost_fwd_at(&self, pre: Option<&mut Scanner<'_>>, pattern_id: Option<PatternID>, bytes: &[u8], start: usize, end: usize) -> Result<Option<HalfMatch>, MatchError>

fn find_leftmost_rev_at(&self, pattern_id: Option<PatternID>, bytes: &[u8], start: usize, end: usize) -> Result<Option<HalfMatch>, MatchError>

fn find_overlapping_fwd_at(&self, pre: Option<&mut Scanner<'_>>, pattern_id: Option<PatternID>, bytes: &[u8], start: usize, end: usize, state: &mut OverlappingState) -> Result<Option<HalfMatch>, MatchError>

Implementors

impl<T: AsRef<[u8]>> Automaton for regex_automata::dfa::sparse::DFA<T>

impl<T: AsRef<[u32]>> Automaton for regex_automata::dfa::dense::DFA<T>
