Struct regex_automata::dfa::regex::Regex

pub struct Regex<A = DFA<Vec<u32>>, P = None> { /* private fields */ }

A regular expression that uses deterministic finite automata for fast searching.

A regular expression consists of two DFAs: a “forward” DFA and a “reverse” DFA. The forward DFA is responsible for detecting the end of a match, while the reverse DFA is responsible for detecting the start of a match. Thus, in order to find the bounds of any given match, a forward search must be run first, followed by a reverse search. A match found by the forward DFA guarantees that the reverse DFA will also find a match.
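The two-pass idea can be illustrated with a hand-written toy for the fixed pattern a+b. This is only a sketch of the forward-then-reverse technique using the standard library; the function names forward_end, reverse_start and find_span are made up for illustration and are not part of this crate, and the crate's real DFAs are table-driven rather than hard-coded.

```rust
/// Forward pass: scan left to right and return the END offset of the
/// first match of `a+b`, if any.
fn forward_end(haystack: &[u8]) -> Option<usize> {
    // State 0: no progress yet. State 1: inside a run of one or more 'a'.
    let mut state = 0u8;
    for (i, &byte) in haystack.iter().enumerate() {
        state = match (state, byte) {
            (_, b'a') => 1,
            (1, b'b') => return Some(i + 1),
            _ => 0,
        };
    }
    None
}

/// Reverse pass: anchored at `end`, scan right to left over the reversed
/// pattern `ba+` and return the smallest START offset that still matches.
fn reverse_start(haystack: &[u8], end: usize) -> Option<usize> {
    // State 0: expect 'b'. State 1: expect the first 'a'.
    // State 2: match state, may consume more 'a' bytes.
    let mut state = 0u8;
    let mut start = None;
    for i in (0..end).rev() {
        state = match (state, haystack[i]) {
            (0, b'b') => 1,
            (1, b'a') | (2, b'a') => 2,
            _ => break,
        };
        if state == 2 {
            start = Some(i);
        }
    }
    start
}

/// Combine both passes to recover the full (start, end) span.
fn find_span(haystack: &[u8]) -> Option<(usize, usize)> {
    let end = forward_end(haystack)?;
    // A forward match guarantees that the reverse pass also matches.
    let start = reverse_start(haystack, end)
        .expect("forward match implies reverse match");
    Some((start, end))
}
```

For the haystack xaab, the forward pass reports the end offset 4 and the reverse pass then walks back to the start offset 1.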

The type of the DFA used by a Regex corresponds to the A type parameter, which must satisfy the Automaton trait. Typically, A is either a dense::DFA or a sparse::DFA, where dense DFAs use more memory but search faster, while sparse DFAs use less memory but search more slowly.
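The memory versus speed trade-off can be pictured with a conceptual toy. The sketch below is not this crate's actual memory layout; the types DenseState and SparseState, the DEAD constant and the helper functions are hypothetical names chosen only to contrast a full per-byte table with a list of byte ranges.

```rust
/// Identifier of the state that can never lead to a match.
const DEAD: u32 = 0;

/// Dense row: one slot per possible byte value, so a lookup is one index.
struct DenseState {
    next: [u32; 256],
}

/// Sparse row: only the byte ranges that lead somewhere other than DEAD.
struct SparseState {
    ranges: Vec<(u8, u8, u32)>, // (low byte, high byte, target state)
}

impl DenseState {
    fn next_state(&self, byte: u8) -> u32 {
        self.next[byte as usize]
    }
}

impl SparseState {
    fn next_state(&self, byte: u8) -> u32 {
        // A lookup has to search the ranges instead of indexing directly.
        self.ranges
            .iter()
            .find(|&&(lo, hi, _)| lo <= byte && byte <= hi)
            .map(|&(_, _, to)| to)
            .unwrap_or(DEAD)
    }
}

/// Convert a dense row into a sparse row by merging adjacent bytes that
/// share the same non-dead target.
fn to_sparse_row(dense: &DenseState) -> SparseState {
    let mut ranges: Vec<(u8, u8, u32)> = Vec::new();
    for byte in 0..=255u8 {
        let to = dense.next[byte as usize];
        if to == DEAD {
            continue;
        }
        if let Some(last) = ranges.last_mut() {
            if last.2 == to && last.1 as u16 + 1 == byte as u16 {
                last.1 = byte;
                continue;
            }
        }
        ranges.push((byte, byte, to));
    }
    SparseState { ranges }
}

/// Build a dense row in which ASCII digits lead to state 1.
fn digit_row() -> DenseState {
    let mut next = [DEAD; 256];
    for b in b'0'..=b'9' {
        next[b as usize] = 1;
    }
    DenseState { next }
}
```

In this toy, the dense row for digits stores 256 entries while the sparse row stores a single range, at the cost of a search on every lookup.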

By default, a regex’s automaton type parameter is set to dense::DFA<Vec<u32>> when the alloc feature is enabled. For most in-memory workloads, this is the most convenient type that gives the best search performance. When the alloc feature is disabled, no default type is used.

A Regex also has a P type parameter, which selects the prefilter used during search. By default, no prefilter is enabled: P defaults to prefilter::None. A prefilter can be enabled by using the Regex::with_prefilter method.
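As a rough picture of what a prefilter buys, the sketch below selects a skipping strategy through a type parameter, in the same spirit as P. It is a toy under stated assumptions: the SkipAhead trait, the NoSkip and FirstByte types, and the functions match_at and find_with are hypothetical and do not reproduce this crate's Prefilter trait. The matcher is hard-coded for foo[0-9]+.

```rust
/// Hypothetical stand-in for a prefilter: proposes the next offset at
/// which a match could possibly start.
trait SkipAhead {
    fn next_candidate(&self, haystack: &[u8], at: usize) -> Option<usize>;
}

/// No prefilter: every offset is a candidate.
struct NoSkip;

impl SkipAhead for NoSkip {
    fn next_candidate(&self, _haystack: &[u8], at: usize) -> Option<usize> {
        Some(at)
    }
}

/// Skip directly to the next occurrence of a byte every match starts with.
struct FirstByte(u8);

impl SkipAhead for FirstByte {
    fn next_candidate(&self, haystack: &[u8], at: usize) -> Option<usize> {
        haystack[at..]
            .iter()
            .position(|&b| b == self.0)
            .map(|i| at + i)
    }
}

/// Anchored matcher for `foo[0-9]+`: returns the end offset if a match
/// starts exactly at `at`.
fn match_at(haystack: &[u8], at: usize) -> Option<usize> {
    let rest = haystack.get(at..)?;
    if !rest.starts_with(b"foo") {
        return None;
    }
    let digits = rest[3..].iter().take_while(|b| b.is_ascii_digit()).count();
    if digits == 0 {
        None
    } else {
        Some(at + 3 + digits)
    }
}

/// Search driver, generic over the skipping strategy.
fn find_with<P: SkipAhead>(pre: &P, haystack: &[u8]) -> Option<(usize, usize)> {
    let mut at = 0;
    while at < haystack.len() {
        // Let the prefilter jump over positions that cannot start a match.
        at = pre.next_candidate(haystack, at)?;
        if let Some(end) = match_at(haystack, at) {
            return Some((at, end));
        }
        at += 1;
    }
    None
}
```

Both strategies report the same span; FirstByte(b'f') simply reaches it without running the matcher at every offset.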

When should I use this?

Generally speaking, if you can afford the overhead of building a full DFA for your regex, and you don’t need things like capturing groups, then this is a good choice if you’re looking to optimize for matching speed. Note however that its speed may be worse than a general purpose regex engine if you don’t select a good prefilter.

Earliest vs Leftmost vs Overlapping

The search routines exposed on a Regex reflect three different ways of searching:

“earliest” means to stop as soon as a match has been detected.
“leftmost” means to continue matching until the underlying automaton cannot advance. This reflects “standard” searching you might be used to in other regex engines. For example, this permits non-greedy and greedy searching to work as you would expect.
“overlapping” means to find all possible matches, even if they overlap.

Generally speaking, when doing an overlapping search, you’ll want to build your regex DFAs with MatchKind::All semantics. Using MatchKind::LeftmostFirst semantics with overlapping searches is likely to lead to odd behavior since LeftmostFirst specifically omits some matches that can never be reported due to its semantics.

The following example shows the differences between how these different types of searches impact looking for matches of [a-z]+ in the haystack abc.

use regex_automata::{dfa::{self, dense}, MatchKind, MultiMatch};

let pattern = r"[a-z]+";
let haystack = "abc".as_bytes();

// With leftmost-first semantics, we test "earliest" and "leftmost".
let re = dfa::regex::Builder::new()
    .dense(dense::Config::new().match_kind(MatchKind::LeftmostFirst))
    .build(pattern)?;

// "earliest" searching isn't impacted by greediness
let mut it = re.find_earliest_iter(haystack);
assert_eq!(Some(MultiMatch::must(0, 0, 1)), it.next());
assert_eq!(Some(MultiMatch::must(0, 1, 2)), it.next());
assert_eq!(Some(MultiMatch::must(0, 2, 3)), it.next());
assert_eq!(None, it.next());

// "leftmost" searching supports greediness (and non-greediness)
let mut it = re.find_leftmost_iter(haystack);
assert_eq!(Some(MultiMatch::must(0, 0, 3)), it.next());
assert_eq!(None, it.next());

// For overlapping, we want "all" match kind semantics.
let re = dfa::regex::Builder::new()
    .dense(dense::Config::new().match_kind(MatchKind::All))
    .build(pattern)?;

// In the overlapping search, we find all three possible matches
// starting at the beginning of the haystack.
let mut it = re.find_overlapping_iter(haystack);
assert_eq!(Some(MultiMatch::must(0, 0, 1)), it.next());
assert_eq!(Some(MultiMatch::must(0, 0, 2)), it.next());
assert_eq!(Some(MultiMatch::must(0, 0, 3)), it.next());
assert_eq!(None, it.next());

Sparse DFAs

Since a Regex is generic over the Automaton trait, it can be used with any kind of DFA. While this crate constructs dense DFAs by default, it is easy enough to build corresponding sparse DFAs, and then build a regex from them:

use regex_automata::dfa::regex::Regex;

// First, build a regex that uses dense DFAs.
let dense_re = Regex::new("foo[0-9]+")?;

// Second, build sparse DFAs from the forward and reverse dense DFAs.
let fwd = dense_re.forward().to_sparse()?;
let rev = dense_re.reverse().to_sparse()?;

// Third, build a new regex from the constituent sparse DFAs.
let sparse_re = Regex::builder().build_from_dfas(fwd, rev);

// A regex that uses sparse DFAs can be used just like with dense DFAs.
assert!(sparse_re.is_match(b"foo123"));

Alternatively, one can use a Builder to construct a sparse DFA more succinctly. (Note though that dense DFAs are still constructed first internally, and then converted to sparse DFAs, as in the example above.)

use regex_automata::dfa::regex::Regex;

let sparse_re = Regex::builder().build_sparse(r"foo[0-9]+")?;
// A regex that uses sparse DFAs can be used just like with dense DFAs.
assert!(sparse_re.is_match(b"foo123"));

Fallibility

In non-default configurations, the DFAs generated in this module may return an error during a search. (Currently, the only way this happens is if quit bytes are added or Unicode word boundaries are heuristically enabled, both of which are turned off by default.) For convenience, the main search routines, like find_leftmost, will panic if an error occurs. However, if you need to use DFAs which may produce an error at search time, then there are fallible equivalents of all search routines. For example, the fallible analog of find_leftmost is try_find_leftmost. The routines prefixed with try_ return Result<Option<MultiMatch>, MatchError>, whereas the infallible routines simply return Option<MultiMatch>.

Example

This example shows how to cause a search to terminate if it sees a \n byte, and handle the error returned. This could be useful if, for example, you wanted to prevent a user supplied pattern from matching across a line boundary.

use regex_automata::{dfa::{self, regex::Regex}, MatchError};

let re = Regex::builder()
    .dense(dfa::dense::Config::new().quit(b'\n', true))
    .build(r"foo\p{any}+bar")?;

let haystack = "foo\nbar".as_bytes();
// Normally this would produce a match, since \p{any} contains '\n'.
// But since we instructed the automaton to enter a quit state if a
// '\n' is observed, this produces a match error instead.
let expected = MatchError::Quit { byte: 0x0A, offset: 3 };
let got = re.try_find_leftmost(haystack).unwrap_err();
assert_eq!(expected, got);

Implementations

impl Regex

pub fn new(pattern: &str) -> Result<Regex, Error>

pub fn new_many<P: AsRef<str>>(patterns: &[P]) -> Result<Regex, Error>

impl Regex<DFA<Vec<u8>>>

pub fn new_sparse(pattern: &str) -> Result<Regex<DFA<Vec<u8>>>, Error>

pub fn new_many_sparse<P: AsRef<str>>(patterns: &[P]) -> Result<Regex<DFA<Vec<u8>>>, Error>

impl Regex

pub fn config() -> Config

pub fn builder() -> Builder

impl<A: Automaton, P: Prefilter> Regex<A, P>

pub fn is_match(&self, haystack: &[u8]) -> bool

pub fn find_earliest(&self, haystack: &[u8]) -> Option<MultiMatch>

pub fn find_leftmost(&self, haystack: &[u8]) -> Option<MultiMatch>

pub fn find_overlapping(&self, haystack: &[u8], state: &mut OverlappingState) -> Option<MultiMatch>

pub fn find_earliest_iter<'r, 't>(&'r self, haystack: &'t [u8]) -> FindEarliestMatches<'r, 't, A, P>

pub fn find_leftmost_iter<'r, 't>(&'r self, haystack: &'t [u8]) -> FindLeftmostMatches<'r, 't, A, P>

pub fn find_overlapping_iter<'r, 't>(&'r self, haystack: &'t [u8]) -> FindOverlappingMatches<'r, 't, A, P>

impl<A: Automaton, P: Prefilter> Regex<A, P>

pub fn is_match_at(&self, haystack: &[u8], start: usize, end: usize) -> bool

pub fn find_earliest_at(&self, haystack: &[u8], start: usize, end: usize) -> Option<MultiMatch>

pub fn find_leftmost_at(&self, haystack: &[u8], start: usize, end: usize) -> Option<MultiMatch>

pub fn find_overlapping_at(&self, haystack: &[u8], start: usize, end: usize, state: &mut OverlappingState) -> Option<MultiMatch>

impl<A: Automaton, P: Prefilter> Regex<A, P>

pub fn try_is_match(&self, haystack: &[u8]) -> Result<bool, MatchError>

pub fn try_find_earliest(&self, haystack: &[u8]) -> Result<Option<MultiMatch>, MatchError>

pub fn try_find_leftmost(&self, haystack: &[u8]) -> Result<Option<MultiMatch>, MatchError>

pub fn try_find_overlapping(&self, haystack: &[u8], state: &mut OverlappingState) -> Result<Option<MultiMatch>, MatchError>

impl<A: Automaton, P: Prefilter> Regex<A, P>

pub fn try_is_match_at(&self, haystack: &[u8], start: usize, end: usize) -> Result<bool, MatchError>

pub fn try_find_earliest_at(&self, haystack: &[u8], start: usize, end: usize) -> Result<Option<MultiMatch>, MatchError>

pub fn try_find_leftmost_at(&self, haystack: &[u8], start: usize, end: usize) -> Result<Option<MultiMatch>, MatchError>

pub fn try_find_overlapping_at(&self, haystack: &[u8], start: usize, end: usize, state: &mut OverlappingState) -> Result<Option<MultiMatch>, MatchError>

impl<A: Automaton, P: Prefilter> Regex<A, P>

pub fn with_prefilter<Q: Prefilter>(self, prefilter: Q) -> Regex<A, Q>

pub fn without_prefilter(self) -> Regex<A>

pub fn forward(&self) -> &A

pub fn reverse(&self) -> &A

pub fn pattern_count(&self) -> usize

pub fn prefilter(&self) -> Option<&dyn Prefilter>

Trait Implementations

impl<A: Clone, P: Clone> Clone for Regex<A, P>

fn clone(&self) -> Regex<A, P>

fn clone_from(&mut self, source: &Self)

impl<A: Debug, P: Debug> Debug for Regex<A, P>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Auto Trait Implementations

impl<A, P> RefUnwindSafe for Regex<A, P> where A: RefUnwindSafe, P: RefUnwindSafe,

impl<A, P> Send for Regex<A, P> where A: Send, P: Send,

impl<A, P> Sync for Regex<A, P> where A: Sync, P: Sync,

impl<A, P> Unpin for Regex<A, P> where A: Unpin, P: Unpin,

impl<A, P> UnwindSafe for Regex<A, P> where A: UnwindSafe, P: UnwindSafe,

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> ToOwned for T where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
