Struct regex_automata::hybrid::LazyStateID
pub struct LazyStateID(_);
A state identifier especially tailored for lazy DFAs.
A lazy state ID logically represents a pointer to a DFA state. In practice, by limiting the number of DFA states it can address, it reserves some bits of its representation to encode some additional information. That additional information is called a “tag.” That tag is used to record whether the state it points to is an unknown, dead, quit, start or match state.
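To make the tagging idea concrete, here is a minimal sketch of a state ID that gives up its top bits to carry tags. The type `ToyStateID`, the constants `TAG_MATCH`, `TAG_DEAD` and `TAG_MASK`, and the choice of a `u32` with two tag bits are all assumptions made for illustration. The real `LazyStateID` keeps its representation private and may be laid out differently.

```rust
// Hypothetical layout, for illustration only. These names and bit positions
// are NOT taken from regex-automata.
const TAG_MATCH: u32 = 1 << 31;
const TAG_DEAD: u32 = 1 << 30;
const TAG_MASK: u32 = TAG_MATCH | TAG_DEAD;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ToyStateID(u32);

impl ToyStateID {
    /// Largest state index that still leaves the tag bits free.
    const MAX_INDEX: u32 = !TAG_MASK;

    /// Builds an untagged ID, or returns `None` if the index would
    /// collide with the bits reserved for tags.
    fn new(index: u32) -> Option<ToyStateID> {
        if index <= Self::MAX_INDEX {
            Some(ToyStateID(index))
        } else {
            None
        }
    }

    /// Returns a copy of this ID with the given tag bits set.
    fn with_tag(self, tag: u32) -> ToyStateID {
        ToyStateID(self.0 | (tag & TAG_MASK))
    }

    /// The state index with all tag bits stripped.
    fn index(self) -> u32 {
        self.0 & !TAG_MASK
    }

    /// A single mask test answers "is any tag set?".
    fn is_tagged(self) -> bool {
        (self.0 & TAG_MASK) != 0
    }

    fn is_match(self) -> bool {
        (self.0 & TAG_MATCH) != 0
    }

    fn is_dead(self) -> bool {
        (self.0 & TAG_DEAD) != 0
    }
}
```

The trade-off shown here is the one described above: every bit spent on a tag halves the number of states the ID can address.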
When implementing a low level search routine with a lazy DFA, it is necessary to query the type of the current state to know what to do:
- Unknown - The state has not yet been computed. The parameters used to get this state ID must be re-passed to DFA::next_state, which will never return an unknown state ID.
- Dead - A dead state only has transitions to itself. It indicates that the search cannot do anything else and should stop with whatever result it has.
- Quit - A quit state indicates that the automaton could not answer whether a match exists or not. Correct search implementations must return a MatchError::Quit.
- Start - A start state indicates that the automaton will begin searching at a starting state. Branching on this isn’t required for correctness, but a common optimization is to use this to more quickly look for a prefix.
- Match - A match state indicates that a match has been found. Depending on the semantics of your search implementation, it may either continue until the end of the haystack or a dead state, or it might quit and return the match immediately.
As an optimization, the is_tagged predicate can be used to determine if a tag exists at all. This is useful to avoid branching on all of the above types for every byte searched.
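The branching pattern this enables can be sketched without the crate. In the block below, `ToyID`, its one-bit-per-tag encoding, the `Kind` enum and the `classify` function are hypothetical stand-ins, not part of regex-automata. The point is only the shape of the control flow: one cheap test on the hot path, and the detailed checks only when that test succeeds.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Kind {
    Untagged,
    Unknown,
    Dead,
    Quit,
    Start,
    Match,
}

// Hypothetical stand-in for a lazy state ID: one bit per tag.
#[derive(Clone, Copy, Debug)]
struct ToyID(u8);

impl ToyID {
    const UNKNOWN: u8 = 1 << 0;
    const DEAD: u8 = 1 << 1;
    const QUIT: u8 = 1 << 2;
    const START: u8 = 1 << 3;
    const MATCH: u8 = 1 << 4;

    /// True if any tag bit is set.
    fn is_tagged(self) -> bool {
        self.0 != 0
    }

    /// True if the given tag bit is set.
    fn has(self, tag: u8) -> bool {
        (self.0 & tag) != 0
    }
}

fn classify(id: ToyID) -> Kind {
    // Hot path: most states carry no tag, so a single test suffices.
    if !id.is_tagged() {
        return Kind::Untagged;
    }
    // Cold path: only now pay for the individual checks.
    if id.has(ToyID::MATCH) {
        Kind::Match
    } else if id.has(ToyID::DEAD) {
        Kind::Dead
    } else if id.has(ToyID::QUIT) {
        Kind::Quit
    } else if id.has(ToyID::START) {
        Kind::Start
    } else {
        // In this toy encoding, the only tag left is UNKNOWN.
        Kind::Unknown
    }
}
```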
Example
This example shows how LazyStateID can be used to implement a correct search routine with minimal branching. In particular, this search routine implements “leftmost” matching, which means that it doesn’t immediately stop once a match is found. Instead, it continues until it reaches a dead state.
Notice also how a correct search implementation deals with CacheErrors returned by some of the lazy DFA routines. When a CacheError occurs, it returns MatchError::GaveUp.
use regex_automata::{
    hybrid::dfa::{Cache, DFA},
    HalfMatch, MatchError, PatternID,
};

fn find_leftmost_first(
    dfa: &DFA,
    cache: &mut Cache,
    haystack: &[u8],
) -> Result<Option<HalfMatch>, MatchError> {
    // The start state is determined by inspecting the position and the
    // initial bytes of the haystack. Note that start states can never
    // be match states (since DFAs in this crate delay matches by 1
    // byte), so we don't need to check if the start state is a match.
    let mut sid = dfa.start_state_forward(
        cache, None, haystack, 0, haystack.len(),
    ).map_err(|_| MatchError::GaveUp { offset: 0 })?;
    let mut last_match = None;
    // Walk all the bytes in the haystack. We can quit early if we see
    // a dead or a quit state. The former means the automaton will
    // never transition to any other state. The latter means that the
    // automaton entered a condition in which its search failed.
    for (i, &b) in haystack.iter().enumerate() {
        sid = dfa
            .next_state(cache, sid, b)
            .map_err(|_| MatchError::GaveUp { offset: i })?;
        if sid.is_tagged() {
            if sid.is_match() {
                last_match = Some(HalfMatch::new(
                    dfa.match_pattern(cache, sid, 0),
                    i,
                ));
            } else if sid.is_dead() {
                return Ok(last_match);
            } else if sid.is_quit() {
                // It is possible to enter into a quit state after
                // observing a match has occurred. In that case, we
                // should return the match instead of an error.
                if last_match.is_some() {
                    return Ok(last_match);
                }
                return Err(MatchError::Quit { byte: b, offset: i });
            }
            // Implementors may also want to check for start states and
            // handle them differently for performance reasons. But it is
            // not necessary for correctness.
        }
    }
    // Matches are always delayed by 1 byte, so we must explicitly walk
    // the special "EOI" transition at the end of the search.
    sid = dfa
        .next_eoi_state(cache, sid)
        .map_err(|_| MatchError::GaveUp { offset: haystack.len() })?;
    if sid.is_match() {
        last_match = Some(HalfMatch::new(
            dfa.match_pattern(cache, sid, 0),
            haystack.len(),
        ));
    }
    Ok(last_match)
}

// We use a greedy '+' operator to show how the search doesn't just stop
// once a match is detected. It continues extending the match. Using
// '[a-z]+?' would also work as expected and stop the search early.
// Greediness is built into the automaton.
let dfa = DFA::new(r"[a-z]+")?;
let mut cache = dfa.create_cache();
let haystack = "123 foobar 4567".as_bytes();
let mat = find_leftmost_first(&dfa, &mut cache, haystack)?.unwrap();
assert_eq!(mat.pattern().as_usize(), 0);
assert_eq!(mat.offset(), 10);

// Here's another example that tests our handling of the special
// EOI transition. This will fail to find a match if we don't call
// 'next_eoi_state' at the end of the search since the match isn't found
// until the final byte in the haystack.
let dfa = DFA::new(r"[0-9]{4}")?;
let mut cache = dfa.create_cache();
let haystack = "123 foobar 4567".as_bytes();
let mat = find_leftmost_first(&dfa, &mut cache, haystack)?.unwrap();
assert_eq!(mat.pattern().as_usize(), 0);
assert_eq!(mat.offset(), 15);

// And note that our search implementation above automatically works
// with multi-DFAs. Namely, `dfa.match_pattern(cache, sid, 0)` selects
// the appropriate pattern ID for us.
let dfa = DFA::new_many(&[r"[a-z]+", r"[0-9]+"])?;
let mut cache = dfa.create_cache();
let haystack = "123 foobar 4567".as_bytes();
let mat = find_leftmost_first(&dfa, &mut cache, haystack)?.unwrap();
assert_eq!(mat.pattern().as_usize(), 1);
assert_eq!(mat.offset(), 3);
let mat = find_leftmost_first(&dfa, &mut cache, &haystack[3..])?.unwrap();
assert_eq!(mat.pattern().as_usize(), 0);
assert_eq!(mat.offset(), 7);
let mat = find_leftmost_first(&dfa, &mut cache, &haystack[10..])?.unwrap();
assert_eq!(mat.pattern().as_usize(), 1);
assert_eq!(mat.offset(), 5);
Implementations
impl LazyStateID
pub const fn is_tagged(&self) -> bool
Return true if and only if this lazy state ID is tagged.
When a lazy state ID is tagged, then one can conclude that it is one of a match, start, dead, quit or unknown state.
pub const fn is_unknown(&self) -> bool
Return true if and only if this represents a lazy state ID that is “unknown.” That is, the state has not yet been created. When a caller sees this state ID, it generally means that a state has to be computed in order to proceed.
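The "compute on demand" behavior behind the unknown state can be pictured with a toy transition table that fills itself in lazily. Everything in this sketch is hypothetical: `LazyTable`, its `lookup` and `next_state` methods, and the arithmetic used as a placeholder for state construction are illustrative only and do not mirror the crate's internals.

```rust
use std::collections::HashMap;

/// Hypothetical lazy transition table: entries are computed on first use
/// and cached afterwards.
struct LazyTable {
    cache: HashMap<(u32, u8), u32>,
    /// How many transitions had to be computed (slow path taken).
    computed: usize,
}

impl LazyTable {
    fn new() -> LazyTable {
        LazyTable { cache: HashMap::new(), computed: 0 }
    }

    /// Fast path: `None` plays the role of the "unknown" state.
    fn lookup(&self, state: u32, byte: u8) -> Option<u32> {
        self.cache.get(&(state, byte)).copied()
    }

    /// Full path: never reports "unknown", because it computes and
    /// caches the transition whenever the fast path misses.
    fn next_state(&mut self, state: u32, byte: u8) -> u32 {
        if let Some(next) = self.lookup(state, byte) {
            return next;
        }
        // Placeholder for the real (expensive) state construction.
        let next = state.wrapping_mul(31).wrapping_add(u32::from(byte)) % 16;
        self.computed += 1;
        self.cache.insert((state, byte), next);
        next
    }
}
```

A caller that sees the "unknown" answer from the fast path retries with the same state and byte on the full path, which is the same re-passing of parameters described for DFA::next_state above.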
pub const fn is_dead(&self) -> bool
Return true if and only if this represents a dead state. A dead state is a state that can never transition to any other state except the dead state. When a dead state is seen, it generally indicates that a search should stop.
pub const fn is_quit(&self) -> bool
Return true if and only if this represents a quit state. A quit state is a state that is representationally equivalent to a dead state, except it indicates the automaton has reached a point at which it can no longer determine whether a match exists or not. In general, this indicates an error during search and the caller must either pass this error up or use a different search technique.
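The "use a different search technique" option might look like the following fallback pattern. This is a sketch under stated assumptions: `ToyError`, `fast_find`, `slow_find` and `find` are hypothetical names, and the rule that the fast path quits on any non-ASCII byte is chosen only to have something that triggers the quit condition.

```rust
#[derive(Debug, PartialEq, Eq)]
enum ToyError {
    /// The fast search gave up at this offset without an answer.
    Quit { offset: usize },
}

/// Hypothetical fast path: finds `needle`, but quits as soon as it
/// sees a non-ASCII byte instead of guessing.
fn fast_find(haystack: &[u8], needle: u8) -> Result<Option<usize>, ToyError> {
    for (i, &b) in haystack.iter().enumerate() {
        if b >= 0x80 {
            return Err(ToyError::Quit { offset: i });
        }
        if b == needle {
            return Ok(Some(i));
        }
    }
    Ok(None)
}

/// Hypothetical slow path: handles every input.
fn slow_find(haystack: &[u8], needle: u8) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

/// Tries the fast path first and falls back when it quits.
fn find(haystack: &[u8], needle: u8) -> Option<usize> {
    match fast_find(haystack, needle) {
        Ok(result) => result,
        // A quit is "no answer", not "no match", so retry elsewhere.
        Err(ToyError::Quit { .. }) => slow_find(haystack, needle),
    }
}
```

The key distinction is that a quit must never be reported as "no match". It either propagates as an error or triggers the fallback.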
Trait Implementations
impl Clone for LazyStateID
fn clone(&self) -> LazyStateID
Returns a copy of the value.
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for LazyStateID
impl Default for LazyStateID
fn default() -> LazyStateID
Returns the “default value” for a type.
impl Hash for LazyStateID
impl Ord for LazyStateID
impl PartialEq<LazyStateID> for LazyStateID
fn eq(&self, other: &LazyStateID) -> bool
This method tests for self and other values to be equal, and is used by ==.
fn ne(&self, other: &LazyStateID) -> bool
This method tests for !=.
impl PartialOrd<LazyStateID> for LazyStateID
fn partial_cmp(&self, other: &LazyStateID) -> Option<Ordering>
This method returns an ordering between self and other values if one exists.
fn lt(&self, other: &Rhs) -> bool
This method tests less than (for self and other) and is used by the < operator.
fn le(&self, other: &Rhs) -> bool
This method tests less than or equal to (for self and other) and is used by the <= operator.
impl Copy for LazyStateID
impl Eq for LazyStateID
impl StructuralEq for LazyStateID
impl StructuralPartialEq for LazyStateID
Auto Trait Implementations
impl RefUnwindSafe for LazyStateID
impl Send for LazyStateID
impl Sync for LazyStateID
impl Unpin for LazyStateID
impl UnwindSafe for LazyStateID
Blanket Implementations
impl<T> BorrowMut<T> for T where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> ToOwned for T where
T: Clone,
type Owned = T
The resulting type after obtaining ownership.
fn clone_into(&self, target: &mut T)
Nightly-only experimental API (toowned_clone_into). Uses borrowed data to replace owned data, usually by cloning.