Struct regex_automata::hybrid::regex::Regex


pub struct Regex { /* private fields */ }


A regular expression that uses hybrid NFA/DFAs (also called “lazy DFAs”) for searching.
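The “lazy” part means DFA states and transitions are built on demand during a search and kept in a cache, not compiled ahead of time. The sketch below models only that transition cache. It is a simplified, std-only illustration, not how regex_automata represents its states. `LazyCache`, `next_state` and the `compute` closure are hypothetical names. The `compute` closure stands in for the NFA simulation a real lazy DFA would run on a cache miss.

```rust
use std::collections::HashMap;

/// Hypothetical model of a lazy DFA's transition cache. States are plain
/// `usize` ids; nothing here comes from regex_automata.
pub struct LazyCache {
    transitions: HashMap<(usize, u8), usize>,
    capacity: usize,
    clear_count: usize,
}

impl LazyCache {
    pub fn new(capacity: usize) -> LazyCache {
        LazyCache { transitions: HashMap::new(), capacity, clear_count: 0 }
    }

    /// Returns the next state for `(from, byte)`. On a miss, the transition
    /// is computed by `compute` and memoized. If the cache is already full,
    /// it is cleared wholesale first, which is a crude model of a lazy DFA
    /// discarding its states when it runs out of room.
    pub fn next_state<F>(&mut self, from: usize, byte: u8, compute: F) -> usize
    where
        F: FnOnce(usize, u8) -> usize,
    {
        if let Some(&to) = self.transitions.get(&(from, byte)) {
            return to;
        }
        if self.transitions.len() >= self.capacity {
            self.transitions.clear();
            self.clear_count += 1;
        }
        let to = compute(from, byte);
        self.transitions.insert((from, byte), to);
        to
    }

    /// Number of transitions currently cached.
    pub fn len(&self) -> usize {
        self.transitions.len()
    }

    /// How many times the cache has been cleared so far.
    pub fn clear_count(&self) -> usize {
        self.clear_count
    }
}
```

Tracking the number of clears is what would let a search decide it is thrashing and stop early, rather than rebuilding the same states over and over.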

A regular expression consists of two lazy DFAs, a “forward” DFA and a “reverse” DFA. The forward DFA is responsible for detecting the end of a match, while the reverse DFA is responsible for detecting the start of a match. Thus, to find the bounds of any given match, a forward search must first be run, followed by a reverse search. A match found by the forward DFA guarantees that the reverse DFA will also find a match.
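The two-phase idea can be shown with a hand-written automaton for the single pattern [a-z]+. The forward phase tracks only whether it is inside a match and records where the leftmost match ends. The reverse phase then walks backwards from that end to recover the start. This std-only sketch is hypothetical: `in_class`, `forward_end`, `reverse_start` and `find_leftmost` are illustrative names. It is not the crate's implementation, which runs real lazy DFAs in both directions.

```rust
/// Byte class for the toy pattern `[a-z]+`.
fn in_class(b: u8) -> bool {
    b.is_ascii_lowercase()
}

/// Forward phase: scan left to right, remembering only the most recent
/// match end. It never records where the match started.
fn forward_end(haystack: &[u8]) -> Option<usize> {
    let mut matched_end = None;
    for (i, &b) in haystack.iter().enumerate() {
        if in_class(b) {
            // Still inside a match, so extend it greedily.
            matched_end = Some(i + 1);
        } else if matched_end.is_some() {
            // The leftmost match cannot be extended any further.
            break;
        }
    }
    matched_end
}

/// Reverse phase: starting at `end`, walk backwards while the reversed
/// pattern can still advance; the last position reached is the start.
fn reverse_start(haystack: &[u8], end: usize) -> usize {
    let mut start = end;
    while start > 0 && in_class(haystack[start - 1]) {
        start -= 1;
    }
    start
}

/// Combine both phases into `(start, end)` bounds.
fn find_leftmost(haystack: &[u8]) -> Option<(usize, usize)> {
    let end = forward_end(haystack)?;
    Some((reverse_start(haystack, end), end))
}
```

For example, on `b"12abc de"` the forward phase stops at offset 5, and the reverse phase walks back to offset 2.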

A Regex can also have a prefilter set via the set_prefilter method. By default, no prefilter is enabled.
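A prefilter generally works as a fast first pass that proposes candidate offsets, which the full automaton then confirms or rejects. The methods of the Prefilter trait accepted by set_prefilter are not listed on this page, so the sketch below does not implement that trait. It shows only the candidate-then-confirm idea for a hypothetical pattern foo[0-9]+, using std-only functions (`next_candidate`, `confirm`, `find_with_prefilter`) whose names are invented for illustration.

```rust
/// Hypothetical prefilter: the next offset at or after `at` where the
/// literal `needle` occurs. A real prefilter would use a much faster
/// substring search; `windows` keeps this sketch dependency-free.
fn next_candidate(haystack: &[u8], needle: &[u8], at: usize) -> Option<usize> {
    if needle.is_empty() {
        return None; // `windows(0)` would panic
    }
    haystack
        .get(at..)?
        .windows(needle.len())
        .position(|w| w == needle)
        .map(|i| at + i)
}

/// Hypothetical confirmation step for the toy pattern `foo[0-9]+`, run only
/// at offsets proposed by the prefilter. Returns the match end, if any.
fn confirm(haystack: &[u8], start: usize) -> Option<usize> {
    let rest = haystack.get(start + 3..)?;
    let digits = rest.iter().take_while(|b| b.is_ascii_digit()).count();
    if digits > 0 {
        Some(start + 3 + digits)
    } else {
        None
    }
}

/// Skip between candidates and confirm each one until a real match is found.
fn find_with_prefilter(haystack: &[u8]) -> Option<(usize, usize)> {
    let mut at = 0;
    while let Some(cand) = next_candidate(haystack, b"foo", at) {
        if let Some(end) = confirm(haystack, cand) {
            return Some((cand, end));
        }
        // False positive: resume the scan just past this candidate.
        at = cand + 1;
    }
    None
}
```

The benefit depends on how often candidates turn out to be false positives. A prefilter that fires on nearly every byte mostly adds overhead.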

Earliest vs Leftmost vs Overlapping

The search routines exposed on a Regex reflect three different ways of searching:

“earliest” means to stop as soon as a match has been detected.
“leftmost” means to continue matching until the underlying automaton cannot advance. This reflects the “standard” searching you might be used to in other regex engines; for example, it permits greedy and non-greedy searching to work as you would expect.
“overlapping” means to find all possible matches, even if they overlap.

Generally speaking, when doing an overlapping search, you’ll want to build your regex’s lazy DFAs with MatchKind::All semantics. Using MatchKind::LeftmostFirst semantics with overlapping searches is likely to lead to odd behavior, since LeftmostFirst specifically omits some matches that can never be reported under its semantics.
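For the single pattern [a-z]+, the three search styles can be mimicked by hand. The std-only sketch below is hypothetical: none of these functions belong to regex_automata, and each one hard-codes the pattern. The overlapping variant assumes MatchKind::All-like behavior. It reports one match per end offset, paired with the leftmost start of a match ending there.

```rust
/// Byte class for the toy pattern `[a-z]+`.
fn in_class(b: u8) -> bool {
    b.is_ascii_lowercase()
}

/// "earliest": stop as soon as a match is detected, then resume at its end.
/// For a single-class pattern, the first matching byte already completes a
/// match, so every lowercase byte becomes its own one-byte match.
fn earliest_iter(h: &[u8]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    for at in 0..h.len() {
        if in_class(h[at]) {
            out.push((at, at + 1));
        }
    }
    out
}

/// "leftmost": keep consuming while the automaton can advance, then report.
fn leftmost_iter(h: &[u8]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    let mut at = 0;
    while at < h.len() {
        if in_class(h[at]) {
            let mut end = at + 1;
            while end < h.len() && in_class(h[end]) {
                end += 1;
            }
            out.push((at, end));
            at = end;
        } else {
            at += 1;
        }
    }
    out
}

/// "overlapping": report a match at every end offset where one exists,
/// paired with the leftmost start of the run containing it.
fn overlapping_iter(h: &[u8]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    let mut start = None;
    for (i, &b) in h.iter().enumerate() {
        if in_class(b) {
            let s = *start.get_or_insert(i);
            out.push((s, i + 1));
        } else {
            start = None;
        }
    }
    out
}
```

On `b"abc"`, these return `[(0, 1), (1, 2), (2, 3)]`, `[(0, 3)]` and `[(0, 1), (0, 2), (0, 3)]` respectively.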

The following example shows how these three kinds of search differ when looking for matches of [a-z]+ in the haystack abc.

use regex_automata::{hybrid::{dfa, regex}, MatchKind, MultiMatch};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let pattern = r"[a-z]+";
    let haystack = "abc".as_bytes();

    // With leftmost-first semantics, we test "earliest" and "leftmost".
    let re = regex::Builder::new()
        .dfa(dfa::Config::new().match_kind(MatchKind::LeftmostFirst))
        .build(pattern)?;
    let mut cache = re.create_cache();

    // "earliest" searching isn't impacted by greediness
    let mut it = re.find_earliest_iter(&mut cache, haystack);
    assert_eq!(Some(MultiMatch::must(0, 0, 1)), it.next());
    assert_eq!(Some(MultiMatch::must(0, 1, 2)), it.next());
    assert_eq!(Some(MultiMatch::must(0, 2, 3)), it.next());
    assert_eq!(None, it.next());

    // "leftmost" searching supports greediness (and non-greediness)
    let mut it = re.find_leftmost_iter(&mut cache, haystack);
    assert_eq!(Some(MultiMatch::must(0, 0, 3)), it.next());
    assert_eq!(None, it.next());

    // For overlapping, we want "all" match kind semantics.
    let re = regex::Builder::new()
        .dfa(dfa::Config::new().match_kind(MatchKind::All))
        .build(pattern)?;
    let mut cache = re.create_cache();

    // In the overlapping search, we find all three possible matches
    // starting at the beginning of the haystack.
    let mut it = re.find_overlapping_iter(&mut cache, haystack);
    assert_eq!(Some(MultiMatch::must(0, 0, 1)), it.next());
    assert_eq!(Some(MultiMatch::must(0, 0, 2)), it.next());
    assert_eq!(Some(MultiMatch::must(0, 0, 3)), it.next());
    assert_eq!(None, it.next());

    Ok(())
}

Fallibility

In non-default configurations, the lazy DFAs generated in this module may return an error during a search. (Currently, this only happens if quit bytes are added, if Unicode word boundaries are heuristically enabled, or if the cache is configured to “give up” on a search after it has been cleared too many times. All of these are disabled by default, so a search can never fail in the default configuration.) For convenience, the main search routines, like find_leftmost, panic if an error occurs. If you need to use DFAs that may produce an error at search time, use the fallible equivalents of the search routines instead. For example, the fallible analog of find_leftmost is try_find_leftmost. The try_ find routines return Result<Option<MultiMatch>, MatchError>, whereas their infallible counterparts simply return Option<MultiMatch>.
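The split between panicking routines and try_ routines follows a common Rust pattern. A fallible core returns a Result, and a thin infallible wrapper panics on error. The std-only sketch below shows that shape for the toy pattern [a-z]+ with a caller-supplied set of quit bytes. SearchError, try_find_leftmost and find_leftmost here are hypothetical stand-ins, not the crate's MatchError or Regex methods. This toy reports an error whenever it scans a quit byte before the match is complete.

```rust
/// Hypothetical error type, loosely modeled on the quit case described
/// above. It is not `regex_automata::MatchError`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SearchError {
    Quit { byte: u8, offset: usize },
}

/// Fallible leftmost search for the toy pattern `[a-z]+`. It refuses to
/// look past any byte in `quit` and reports the byte and its offset.
pub fn try_find_leftmost(
    haystack: &[u8],
    quit: &[u8],
) -> Result<Option<(usize, usize)>, SearchError> {
    let mut start = None;
    for (i, &b) in haystack.iter().enumerate() {
        if quit.contains(&b) {
            return Err(SearchError::Quit { byte: b, offset: i });
        }
        if b.is_ascii_lowercase() {
            if start.is_none() {
                start = Some(i);
            }
        } else if let Some(s) = start {
            // The match ended at the first non-matching byte.
            return Ok(Some((s, i)));
        }
    }
    // A match may run all the way to the end of the haystack.
    Ok(start.map(|s| (s, haystack.len())))
}

/// Infallible convenience wrapper: panics if the fallible search fails.
pub fn find_leftmost(haystack: &[u8], quit: &[u8]) -> Option<(usize, usize)> {
    try_find_leftmost(haystack, quit).expect("search failed")
}
```

Calling `try_find_leftmost(b"ab\ncd", b"\n")` yields `Err(SearchError::Quit { byte: b'\n', offset: 2 })`, while the wrapper would panic on the same input.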

Example

This example shows how to cause a search to terminate if it sees a \n byte, and how to handle the error returned. This could be useful if, for example, you wanted to prevent a user-supplied pattern from matching across a line boundary.

use regex_automata::{hybrid::{dfa, regex::Regex}, MatchError};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let re = Regex::builder()
        .dfa(dfa::Config::new().quit(b'\n', true))
        .build(r"foo\p{any}+bar")?;
    let mut cache = re.create_cache();

    let haystack = "foo\nbar".as_bytes();
    // Normally this would produce a match, since \p{any} contains '\n'.
    // But since we instructed the automaton to enter a quit state if a
    // '\n' is observed, this produces a match error instead.
    let expected = MatchError::Quit { byte: 0x0A, offset: 3 };
    let got = re.try_find_leftmost(&mut cache, haystack).unwrap_err();
    assert_eq!(expected, got);

    Ok(())
}


Implementations

impl Regex

pub fn new(pattern: &str) -> Result<Regex, BuildError>

pub fn new_many<P: AsRef<str>>(patterns: &[P]) -> Result<Regex, BuildError>

pub fn config() -> Config

pub fn builder() -> Builder

pub fn create_cache(&self) -> Cache

pub fn reset_cache(&self, cache: &mut Cache)

impl Regex

pub fn is_match(&self, cache: &mut Cache, haystack: &[u8]) -> bool

pub fn find_earliest(&self, cache: &mut Cache, haystack: &[u8]) -> Option<MultiMatch>

pub fn find_leftmost(&self, cache: &mut Cache, haystack: &[u8]) -> Option<MultiMatch>

pub fn find_overlapping(&self, cache: &mut Cache, haystack: &[u8], state: &mut OverlappingState) -> Option<MultiMatch>

pub fn find_earliest_iter<'r, 'c, 't>(&'r self, cache: &'c mut Cache, haystack: &'t [u8]) -> FindEarliestMatches<'r, 'c, 't>
FindEarliestMatches<'r, 'c, 't> implements Iterator<Item = MultiMatch>.

pub fn find_leftmost_iter<'r, 'c, 't>(&'r self, cache: &'c mut Cache, haystack: &'t [u8]) -> FindLeftmostMatches<'r, 'c, 't>
FindLeftmostMatches<'r, 'c, 't> implements Iterator<Item = MultiMatch>.

pub fn find_overlapping_iter<'r, 'c, 't>(&'r self, cache: &'c mut Cache, haystack: &'t [u8]) -> FindOverlappingMatches<'r, 'c, 't>
FindOverlappingMatches<'r, 'c, 't> implements Iterator<Item = MultiMatch>.

impl Regex

pub fn is_match_at(&self, cache: &mut Cache, haystack: &[u8], start: usize, end: usize) -> bool

pub fn find_earliest_at(&self, cache: &mut Cache, haystack: &[u8], start: usize, end: usize) -> Option<MultiMatch>

pub fn find_leftmost_at(&self, cache: &mut Cache, haystack: &[u8], start: usize, end: usize) -> Option<MultiMatch>

pub fn find_overlapping_at(&self, cache: &mut Cache, haystack: &[u8], start: usize, end: usize, state: &mut OverlappingState) -> Option<MultiMatch>

impl Regex

pub fn try_is_match(&self, cache: &mut Cache, haystack: &[u8]) -> Result<bool, MatchError>

pub fn try_find_earliest(&self, cache: &mut Cache, haystack: &[u8]) -> Result<Option<MultiMatch>, MatchError>

pub fn try_find_leftmost(&self, cache: &mut Cache, haystack: &[u8]) -> Result<Option<MultiMatch>, MatchError>

pub fn try_find_overlapping(&self, cache: &mut Cache, haystack: &[u8], state: &mut OverlappingState) -> Result<Option<MultiMatch>, MatchError>

pub fn try_find_earliest_iter<'r, 'c, 't>(&'r self, cache: &'c mut Cache, haystack: &'t [u8]) -> TryFindEarliestMatches<'r, 'c, 't>
TryFindEarliestMatches<'r, 'c, 't> implements Iterator<Item = Result<MultiMatch, MatchError>>.

pub fn try_find_leftmost_iter<'r, 'c, 't>(&'r self, cache: &'c mut Cache, haystack: &'t [u8]) -> TryFindLeftmostMatches<'r, 'c, 't>
TryFindLeftmostMatches<'r, 'c, 't> implements Iterator<Item = Result<MultiMatch, MatchError>>.

impl Regex

pub fn try_is_match_at(&self, cache: &mut Cache, haystack: &[u8], start: usize, end: usize) -> Result<bool, MatchError>

pub fn try_find_earliest_at(&self, cache: &mut Cache, haystack: &[u8], start: usize, end: usize) -> Result<Option<MultiMatch>, MatchError>

pub fn try_find_leftmost_at(&self, cache: &mut Cache, haystack: &[u8], start: usize, end: usize) -> Result<Option<MultiMatch>, MatchError>

pub fn try_find_overlapping_at(&self, cache: &mut Cache, haystack: &[u8], start: usize, end: usize, state: &mut OverlappingState) -> Result<Option<MultiMatch>, MatchError>

impl Regex

pub fn forward(&self) -> &DFA

pub fn reverse(&self) -> &DFA

pub fn pattern_count(&self) -> usize

pub fn prefilter(&self) -> Option<&dyn Prefilter>

pub fn set_prefilter(&mut self, pre: Option<Box<dyn Prefilter>>)

Trait Implementations

impl Debug for Regex

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Auto Trait Implementations

impl !RefUnwindSafe for Regex

impl !Send for Regex

impl !Sync for Regex

impl Unpin for Regex

impl !UnwindSafe for Regex

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
