A regular expression that uses deterministic finite automata for fast searching.
A regular expression is comprised of two DFAs, a “forward” DFA and a “reverse” DFA. The forward DFA is responsible for detecting the end of a match while the reverse DFA is responsible for detecting the start of a match. Thus, in order to find the bounds of any given match, a forward search must first be run followed by a reverse search. A match found by the forward DFA guarantees that the reverse DFA will also find a match.
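The forward-then-reverse procedure described above can be sketched with two hand-written scanners. This is only a toy model for the fixed pattern ab+, using the standard library alone. The names forward_end, reverse_start and find are hypothetical and are not part of regex_automata, whose DFAs are generated from the pattern rather than written by hand.

```rust
/// Forward scan for the toy pattern `ab+`.
/// Returns the end offset of the leftmost match, extended greedily.
fn forward_end(haystack: &[u8]) -> Option<usize> {
    // States: 0 = nothing useful seen, 1 = saw `a`, 2 = saw `ab+` (match).
    let mut state = 0;
    let mut end = None;
    for (i, &b) in haystack.iter().enumerate() {
        state = match (state, b) {
            (0, b'a') | (1, b'a') => 1,
            (1, b'b') | (2, b'b') => 2,
            // The leftmost match cannot be extended any further.
            (2, _) => break,
            _ => 0,
        };
        if state == 2 {
            end = Some(i + 1);
        }
    }
    end
}

/// Reverse scan for the reversed pattern `b+a`, anchored at `end`.
/// Returns the start offset of the match that ends at `end`.
fn reverse_start(haystack: &[u8], end: usize) -> Option<usize> {
    let mut seen_b = false;
    for i in (0..end).rev() {
        match (seen_b, haystack[i]) {
            (_, b'b') => seen_b = true,
            (true, b'a') => return Some(i),
            _ => return None,
        }
    }
    None
}

/// Combine both scans: the forward pass finds the end of the match,
/// then the reverse pass finds its start.
fn find(haystack: &[u8]) -> Option<(usize, usize)> {
    let end = forward_end(haystack)?;
    let start = reverse_start(haystack, end)
        .expect("a forward match implies a reverse match");
    Some((start, end))
}
```

With this sketch, find(b"zzabbbz") yields Some((2, 6)): the forward scan stops at offset 6 and the reverse scan walks back to offset 2.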
The type of the DFA used by a Regex corresponds to the A type parameter, which must satisfy the Automaton trait. Typically, A is either a dense::DFA or a sparse::DFA, where dense DFAs use more memory but search faster, while sparse DFAs use less memory but search more slowly.
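The memory versus speed trade-off between dense and sparse representations can be illustrated with a toy transition table for the pattern [0-9]+. This is a rough sketch under stated assumptions: the Dense and Sparse types, the DEAD constant and the digits_dfa helper are all hypothetical, and the actual in-memory layouts of dense::DFA and sparse::DFA differ from this model.

```rust
/// State 0 is a dead state in this toy model.
const DEAD: usize = 0;

/// Dense: one slot for every (state, byte) pair, so lookup is one index.
struct Dense {
    table: Vec<[usize; 256]>,
}

/// Sparse: per state, a sorted list of (lo, hi, next) byte ranges.
/// Bytes not covered by any range lead to the dead state.
struct Sparse {
    states: Vec<Vec<(u8, u8, usize)>>,
}

impl Dense {
    fn next(&self, state: usize, byte: u8) -> usize {
        self.table[state][byte as usize]
    }

    fn slots(&self) -> usize {
        self.table.len() * 256
    }

    /// Compress each row into runs of consecutive bytes sharing a target.
    fn to_sparse(&self) -> Sparse {
        let states: Vec<Vec<(u8, u8, usize)>> = self
            .table
            .iter()
            .map(|row| {
                let mut ranges: Vec<(u8, u8, usize)> = Vec::new();
                for (byte, &next) in row.iter().enumerate() {
                    if next == DEAD {
                        continue;
                    }
                    let extended = match ranges.last_mut() {
                        Some((_, hi, n)) if *n == next && *hi as usize + 1 == byte => {
                            *hi = byte as u8;
                            true
                        }
                        _ => false,
                    };
                    if !extended {
                        ranges.push((byte as u8, byte as u8, next));
                    }
                }
                ranges
            })
            .collect();
        Sparse { states }
    }
}

impl Sparse {
    /// Lookup needs a scan over the ranges, which is slower than indexing.
    fn next(&self, state: usize, byte: u8) -> usize {
        self.states[state]
            .iter()
            .find(|&&(lo, hi, _)| lo <= byte && byte <= hi)
            .map_or(DEAD, |&(_, _, next)| next)
    }

    fn transitions(&self) -> usize {
        self.states.iter().map(Vec::len).sum()
    }
}

/// Toy DFA for `[0-9]+`: state 1 is the start state, state 2 is matching.
fn digits_dfa() -> Dense {
    let mut table = vec![[DEAD; 256]; 3];
    for b in b'0'..=b'9' {
        table[1][b as usize] = 2;
        table[2][b as usize] = 2;
    }
    Dense { table }
}
```

In this model the dense table holds 768 slots while the sparse form stores only two ranges, but every sparse lookup has to search the ranges of the current state.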
By default, a regex’s automaton type parameter is set to dense::DFA<Vec<u32>> when the alloc feature is enabled. For most in-memory workloads, this is the most convenient type that gives the best search performance. When the alloc feature is disabled, no default type is used.
A Regex also has a P type parameter, which is used to select the prefilter used during search. By default, no prefilter is enabled because the type defaults to prefilter::None. A prefilter can be enabled by using the Regex::prefilter method.
When should I use this?
Generally speaking, if you can afford the overhead of building a full DFA for your regex, and you don’t need things like capturing groups, then this is a good choice if you’re looking to optimize for matching speed. Note, however, that its speed may be worse than a general purpose regex engine if you don’t select a good prefilter.
Earliest vs Leftmost vs Overlapping
The search routines exposed on a Regex reflect three different ways of searching:
- “earliest” means to stop as soon as a match has been detected.
- “leftmost” means to continue matching until the underlying automaton cannot advance. This reflects the “standard” searching you might be used to in other regex engines; for example, it permits non-greedy and greedy searching to work as you would expect.
- “overlapping” means to find all possible matches, even if they overlap.
Generally speaking, when doing an overlapping search, you’ll want to build your regex DFAs with MatchKind::All semantics. Using MatchKind::LeftmostFirst semantics with overlapping searches is likely to lead to odd behavior since LeftmostFirst specifically omits some matches that can never be reported due to its semantics.
The following example shows how these different types of searches affect looking for matches of [a-z]+ in the haystack abc.
use regex_automata::{dfa::{self, dense}, MatchKind, MultiMatch};
let pattern = r"[a-z]+";
let haystack = "abc".as_bytes();
// With leftmost-first semantics, we test "earliest" and "leftmost".
let re = dfa::regex::Builder::new()
    .dense(dense::Config::new().match_kind(MatchKind::LeftmostFirst))
    .build(pattern)?;
// "earliest" searching isn't impacted by greediness
let mut it = re.find_earliest_iter(haystack);
assert_eq!(Some(MultiMatch::must(0, 0, 1)), it.next());
assert_eq!(Some(MultiMatch::must(0, 1, 2)), it.next());
assert_eq!(Some(MultiMatch::must(0, 2, 3)), it.next());
assert_eq!(None, it.next());
// "leftmost" searching supports greediness (and non-greediness)
let mut it = re.find_leftmost_iter(haystack);
assert_eq!(Some(MultiMatch::must(0, 0, 3)), it.next());
assert_eq!(None, it.next());
// For overlapping, we want "all" match kind semantics.
let re = dfa::regex::Builder::new()
    .dense(dense::Config::new().match_kind(MatchKind::All))
    .build(pattern)?;
// In the overlapping search, we find all three possible matches
// starting at the beginning of the haystack.
let mut it = re.find_overlapping_iter(haystack);
assert_eq!(Some(MultiMatch::must(0, 0, 1)), it.next());
assert_eq!(Some(MultiMatch::must(0, 0, 2)), it.next());
assert_eq!(Some(MultiMatch::must(0, 0, 3)), it.next());
assert_eq!(None, it.next());
Sparse DFAs
Since a Regex is generic over the Automaton trait, it can be used with any kind of DFA. While this crate constructs dense DFAs by default, it is easy enough to build corresponding sparse DFAs, and then build a regex from them:
use regex_automata::dfa::regex::Regex;
// First, build a regex that uses dense DFAs.
let dense_re = Regex::new("foo[0-9]+")?;
// Second, build sparse DFAs from the forward and reverse dense DFAs.
let fwd = dense_re.forward().to_sparse()?;
let rev = dense_re.reverse().to_sparse()?;
// Third, build a new regex from the constituent sparse DFAs.
let sparse_re = Regex::builder().build_from_dfas(fwd, rev);
// A regex that uses sparse DFAs can be used just like with dense DFAs.
assert_eq!(true, sparse_re.is_match(b"foo123"));
Alternatively, one can use a Builder to construct a sparse DFA more succinctly. (Note though that dense DFAs are still constructed first internally, and then converted to sparse DFAs, as in the example above.)
use regex_automata::dfa::regex::Regex;
let sparse_re = Regex::builder().build_sparse(r"foo[0-9]+")?;
// A regex that uses sparse DFAs can be used just like with dense DFAs.
assert!(sparse_re.is_match(b"foo123"));
Fallibility
In non-default configurations, the DFAs generated in this module may return an error during a search. (Currently, the only way this happens is if quit bytes are added or Unicode word boundaries are heuristically enabled, both of which are turned off by default.) For convenience, the main search routines, like find_leftmost, will panic if an error occurs. However, if you need to use DFAs which may produce an error at search time, then there are fallible equivalents of all search routines. For example, for find_leftmost, its fallible analog is try_find_leftmost. The routines prefixed with try_ return Result<Option<MultiMatch>, MatchError>, whereas the infallible routines simply return Option<MultiMatch>.
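The relationship between a fallible routine and its panicking counterpart can be sketched as follows. This is a toy model, not the crate's implementation: it searches for a single byte rather than a regex, and ToyMatchError, try_find_byte and find_byte are hypothetical names chosen only to mirror the Result<Option<...>, ...> versus Option<...> shapes described above.

```rust
/// Hypothetical error type modelled loosely on a "quit byte" failure.
#[derive(Debug, PartialEq)]
enum ToyMatchError {
    Quit { byte: u8, offset: usize },
}

/// Fallible search: stops with an error as soon as the quit byte is seen,
/// because the search can no longer say whether a match exists.
fn try_find_byte(
    haystack: &[u8],
    needle: u8,
    quit: Option<u8>,
) -> Result<Option<usize>, ToyMatchError> {
    for (offset, &byte) in haystack.iter().enumerate() {
        if Some(byte) == quit {
            return Err(ToyMatchError::Quit { byte, offset });
        }
        if byte == needle {
            return Ok(Some(offset));
        }
    }
    Ok(None)
}

/// Infallible convenience wrapper: same search, but panics on error.
fn find_byte(haystack: &[u8], needle: u8, quit: Option<u8>) -> Option<usize> {
    try_find_byte(haystack, needle, quit).expect("search could not complete")
}
```

Callers that never configure a quit byte can use the wrapper safely, while callers that do configure one would match on the Result and decide how to handle the incomplete search.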
Example
This example shows how to cause a search to terminate if it sees a \n byte, and handle the error returned. This could be useful if, for example, you wanted to prevent a user supplied pattern from matching across a line boundary.
use regex_automata::{dfa::{self, regex::Regex}, MatchError};
let re = Regex::builder()
    .dense(dfa::dense::Config::new().quit(b'\n', true))
    .build(r"foo\p{any}+bar")?;
let haystack = "foo\nbar".as_bytes();
// Normally this would produce a match, since \p{any} contains '\n'.
// But since we instructed the automaton to enter a quit state if a
// '\n' is observed, this produces a match error instead.
let expected = MatchError::Quit { byte: 0x0A, offset: 3 };
let got = re.try_find_leftmost(haystack).unwrap_err();
assert_eq!(expected, got);
Implementations
impl Regex
pub fn new(pattern: &str) -> Result<Regex, Error>
Parse the given regular expression using the default configuration and return the corresponding regex.
If you want a non-default configuration, then use the Builder to set your own configuration.
Example
use regex_automata::{MultiMatch, dfa::regex::Regex};
let re = Regex::new("foo[0-9]+bar")?;
assert_eq!(
    Some(MultiMatch::must(0, 3, 14)),
    re.find_leftmost(b"zzzfoo12345barzzz"),
);
pub fn new_many<P: AsRef<str>>(patterns: &[P]) -> Result<Regex, Error>
Like new, but parses multiple patterns into a single “regex set.” This similarly uses the default regex configuration.
Example
use regex_automata::{MultiMatch, dfa::regex::Regex};
let re = Regex::new_many(&["[a-z]+", "[0-9]+"])?;
let mut it = re.find_leftmost_iter(b"abc 1 foo 4567 0 quux");
assert_eq!(Some(MultiMatch::must(0, 0, 3)), it.next());
assert_eq!(Some(MultiMatch::must(1, 4, 5)), it.next());
assert_eq!(Some(MultiMatch::must(0, 6, 9)), it.next());
assert_eq!(Some(MultiMatch::must(1, 10, 14)), it.next());
assert_eq!(Some(MultiMatch::must(1, 15, 16)), it.next());
assert_eq!(Some(MultiMatch::must(0, 17, 21)), it.next());
assert_eq!(None, it.next());
impl Regex<DFA<Vec<u8>>>
pub fn new_sparse(pattern: &str) -> Result<Regex<DFA<Vec<u8>>>, Error>
Parse the given regular expression using the default configuration, except using sparse DFAs, and return the corresponding regex.
If you want a non-default configuration, then use the Builder to set your own configuration.
Example
use regex_automata::{MultiMatch, dfa::regex::Regex};
let re = Regex::new_sparse("foo[0-9]+bar")?;
assert_eq!(
    Some(MultiMatch::must(0, 3, 14)),
    re.find_leftmost(b"zzzfoo12345barzzz"),
);
pub fn new_many_sparse<P: AsRef<str>>(
    patterns: &[P]
) -> Result<Regex<DFA<Vec<u8>>>, Error>
Like new, but parses multiple patterns into a single “regex set” using sparse DFAs. This otherwise similarly uses the default regex configuration.
Example
use regex_automata::{MultiMatch, dfa::regex::Regex};
let re = Regex::new_many_sparse(&["[a-z]+", "[0-9]+"])?;
let mut it = re.find_leftmost_iter(b"abc 1 foo 4567 0 quux");
assert_eq!(Some(MultiMatch::must(0, 0, 3)), it.next());
assert_eq!(Some(MultiMatch::must(1, 4, 5)), it.next());
assert_eq!(Some(MultiMatch::must(0, 6, 9)), it.next());
assert_eq!(Some(MultiMatch::must(1, 10, 14)), it.next());
assert_eq!(Some(MultiMatch::must(1, 15, 16)), it.next());
assert_eq!(Some(MultiMatch::must(0, 17, 21)), it.next());
assert_eq!(None, it.next());
impl Regex
Convenience routines for regex construction.
pub fn config() -> Config
Return a default configuration for a Regex.
This is a convenience routine to avoid needing to import the Config type when customizing the construction of a regex.
Example
This example shows how to disable UTF-8 mode for Regex iteration. When UTF-8 mode is disabled, the position immediately following an empty match is where the next search begins, instead of the next position of a UTF-8 encoded codepoint.
use regex_automata::{dfa::regex::Regex, MultiMatch};
let re = Regex::builder()
    .configure(Regex::config().utf8(false))
    .build(r"")?;
let haystack = "a☃z".as_bytes();
let mut it = re.find_leftmost_iter(haystack);
assert_eq!(Some(MultiMatch::must(0, 0, 0)), it.next());
assert_eq!(Some(MultiMatch::must(0, 1, 1)), it.next());
assert_eq!(Some(MultiMatch::must(0, 2, 2)), it.next());
assert_eq!(Some(MultiMatch::must(0, 3, 3)), it.next());
assert_eq!(Some(MultiMatch::must(0, 4, 4)), it.next());
assert_eq!(Some(MultiMatch::must(0, 5, 5)), it.next());
assert_eq!(None, it.next());
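The effect of UTF-8 mode on where empty matches are reported can be modelled with the standard library alone. This is a sketch under the assumption that, for the empty pattern, UTF-8 mode amounts to skipping offsets that fall inside an encoded codepoint; empty_match_offsets is a hypothetical helper and not part of regex_automata.

```rust
/// Offsets at which the empty pattern would be reported in this toy model.
/// With `utf8` disabled, every byte offset qualifies. With it enabled,
/// only offsets on a codepoint boundary qualify.
fn empty_match_offsets(haystack: &str, utf8: bool) -> Vec<usize> {
    (0..=haystack.len())
        .filter(|&i| !utf8 || haystack.is_char_boundary(i))
        .collect()
}
```

For the haystack a☃z, the snowman occupies three bytes, so the byte-oriented mode visits offsets 0 through 5 while the codepoint-oriented mode skips offsets 2 and 3.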
pub fn builder() -> Builder
Return a builder for configuring the construction of a Regex.
This is a convenience routine to avoid needing to import the Builder type in common cases.
Example
This example shows how to use the builder to disable UTF-8 mode everywhere.
use regex_automata::{
    dfa::regex::Regex,
    nfa::thompson,
    MultiMatch, SyntaxConfig,
};
let re = Regex::builder()
    .configure(Regex::config().utf8(false))
    .syntax(SyntaxConfig::new().utf8(false))
    .thompson(thompson::Config::new().utf8(false))
    .build(r"foo(?-u:[^b])ar.*")?;
let haystack = b"\xFEfoo\xFFarzz\xE2\x98\xFF\n";
let expected = Some(MultiMatch::must(0, 1, 9));
let got = re.find_leftmost(haystack);
assert_eq!(expected, got);
impl<A: Automaton, P: Prefilter> Regex<A, P>
Standard search routines for finding and iterating over matches.
pub fn is_match(&self, haystack: &[u8]) -> bool
Returns true if and only if this regex matches the given haystack.
This routine may short circuit if it knows that scanning future input will never lead to a different result. In particular, if the underlying DFA enters a match state or a dead state, then this routine will return true or false, respectively, without inspecting any future input.
Panics
If the underlying DFAs return an error, then this routine panics. This only occurs in non-default configurations where quit bytes are used or Unicode word boundaries are heuristically enabled.
The fallible version of this routine is try_is_match.
Example
use regex_automata::dfa::regex::Regex;
let re = Regex::new("foo[0-9]+bar")?;
assert_eq!(true, re.is_match(b"foo12345bar"));
assert_eq!(false, re.is_match(b"foobar"));
pub fn find_earliest(&self, haystack: &[u8]) -> Option<MultiMatch>
Returns the first position at which a match is found.
This routine stops scanning input in precisely the same circumstances as is_match. The key difference is that this routine returns the position at which it stopped scanning input if and only if a match was found. If no match is found, then None is returned.
Panics
If the underlying DFAs return an error, then this routine panics. This only occurs in non-default configurations where quit bytes are used or Unicode word boundaries are heuristically enabled.
The fallible version of this routine is try_find_earliest.
Example
use regex_automata::{MultiMatch, dfa::regex::Regex};
// Normally, the leftmost first match would greedily consume as many
// decimal digits as it could. But a match is detected as soon as one
// digit is seen.
let re = Regex::new("foo[0-9]+")?;
assert_eq!(
    Some(MultiMatch::must(0, 0, 4)),
    re.find_earliest(b"foo12345"),
);
// Normally, the end of the leftmost first match here would be 3,
// but the "earliest" match semantics detect a match earlier.
let re = Regex::new("abc|a")?;
assert_eq!(Some(MultiMatch::must(0, 0, 1)), re.find_earliest(b"abc"));
pub fn find_leftmost(&self, haystack: &[u8]) -> Option<MultiMatch>
Returns the start and end offset of the leftmost match. If no match exists, then None is returned.
Panics
If the underlying DFAs return an error, then this routine panics. This only occurs in non-default configurations where quit bytes are used or Unicode word boundaries are heuristically enabled.
The fallible version of this routine is try_find_leftmost.
Example
use regex_automata::{MultiMatch, dfa::regex::Regex};
// Greediness is applied appropriately when compared to find_earliest.
let re = Regex::new("foo[0-9]+")?;
assert_eq!(
    Some(MultiMatch::must(0, 3, 11)),
    re.find_leftmost(b"zzzfoo12345zzz"),
);
// Even though a match is found after reading the first byte (`a`),
// the default leftmost-first match semantics demand that we find the
// earliest match that prefers earlier parts of the pattern over later
// parts.
let re = Regex::new("abc|a")?;
assert_eq!(Some(MultiMatch::must(0, 0, 3)), re.find_leftmost(b"abc"));
pub fn find_overlapping(
    &self,
    haystack: &[u8],
    state: &mut OverlappingState
) -> Option<MultiMatch>
Search for the first overlapping match in haystack.
This routine is principally useful when searching for multiple patterns on inputs where multiple patterns may match the same regions of text. In particular, callers must preserve the automaton’s search state from prior calls so that the implementation knows where the last match occurred and which pattern was reported.
Panics
If the underlying DFAs return an error, then this routine panics. This only occurs in non-default configurations where quit bytes are used or Unicode word boundaries are heuristically enabled.
The fallible version of this routine is try_find_overlapping.
Example
This example shows how to run an overlapping search with multiple regexes.
use regex_automata::{dfa::{self, regex::Regex}, MatchKind, MultiMatch};
let re = Regex::builder()
    .dense(dfa::dense::Config::new().match_kind(MatchKind::All))
    .build_many(&[r"\w+$", r"\S+$"])?;
let haystack = "@foo".as_bytes();
let mut state = dfa::OverlappingState::start();
let expected = Some(MultiMatch::must(1, 0, 4));
let got = re.find_overlapping(haystack, &mut state);
assert_eq!(expected, got);
// The first pattern also matches at the same position, so re-running
// the search will yield another match. Notice also that the first
// pattern is returned after the second. This is because the second
// pattern begins its match before the first, is therefore an earlier
// match and is thus reported first.
let expected = Some(MultiMatch::must(0, 1, 4));
let got = re.find_overlapping(haystack, &mut state);
assert_eq!(expected, got);
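The reason callers must thread a state value through repeated overlapping searches can be seen in a small model. This sketch matches literal patterns only and uses the standard library alone; ToyOverlappingState and this find_overlapping function are hypothetical stand-ins, and the order in which this toy reports matches (by end offset, then by pattern index) is an arbitrary choice that need not agree with the crate.

```rust
/// Remembers where the previous call stopped and which pattern to try next.
#[derive(Default)]
struct ToyOverlappingState {
    at: usize,
    next_pattern: usize,
}

/// Returns the next overlapping match as (pattern index, start, end),
/// resuming from the position recorded in `state`.
fn find_overlapping(
    patterns: &[&str],
    haystack: &str,
    state: &mut ToyOverlappingState,
) -> Option<(usize, usize, usize)> {
    while state.at <= haystack.len() {
        while state.next_pattern < patterns.len() {
            let pid = state.next_pattern;
            // Advance before returning so the next call resumes after
            // the pattern that was just reported.
            state.next_pattern += 1;
            let pat = patterns[pid].as_bytes();
            if haystack.as_bytes()[..state.at].ends_with(pat) {
                return Some((pid, state.at - pat.len(), state.at));
            }
        }
        state.at += 1;
        state.next_pattern = 0;
    }
    None
}
```

Without the state value, every call would start over and report the same first match again; with it, three patterns that all end at the same offset are reported one call at a time.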
pub fn find_earliest_iter<'r, 't>(
    &'r self,
    haystack: &'t [u8]
) -> FindEarliestMatches<'r, 't, A, P>
Returns an iterator over all non-overlapping “earliest” matches.
Match positions are reported as soon as a match is known to occur, even if the standard leftmost match would be longer.
Panics
If the underlying DFAs return an error during iteration, then iteration panics. This only occurs in non-default configurations where quit bytes are used or Unicode word boundaries are heuristically enabled.
The fallible version of this routine is try_find_earliest_iter.
Example
This example shows how to run an “earliest” iterator.
use regex_automata::{dfa::regex::Regex, MultiMatch};
let re = Regex::new("[0-9]+")?;
let haystack = "123".as_bytes();
// Normally, a standard leftmost iterator would return a single
// match, but since "earliest" detects matches earlier, we get
// three matches.
let mut it = re.find_earliest_iter(haystack);
assert_eq!(Some(MultiMatch::must(0, 0, 1)), it.next());
assert_eq!(Some(MultiMatch::must(0, 1, 2)), it.next());
assert_eq!(Some(MultiMatch::must(0, 2, 3)), it.next());
assert_eq!(None, it.next());
pub fn find_leftmost_iter<'r, 't>(
    &'r self,
    haystack: &'t [u8]
) -> FindLeftmostMatches<'r, 't, A, P>
Returns an iterator over all non-overlapping leftmost matches in the given bytes. If no match exists, then the iterator yields no elements.
This corresponds to the “standard” regex search iterator.
Panics
If the underlying DFAs return an error during iteration, then iteration panics. This only occurs in non-default configurations where quit bytes are used or Unicode word boundaries are heuristically enabled.
The fallible version of this routine is try_find_leftmost_iter.
Example
use regex_automata::{MultiMatch, dfa::regex::Regex};
let re = Regex::new("foo[0-9]+")?;
let text = b"foo1 foo12 foo123";
let matches: Vec<MultiMatch> = re.find_leftmost_iter(text).collect();
assert_eq!(matches, vec![
    MultiMatch::must(0, 0, 4),
    MultiMatch::must(0, 5, 10),
    MultiMatch::must(0, 11, 17),
]);
pub fn find_overlapping_iter<'r, 't>(
    &'r self,
    haystack: &'t [u8]
) -> FindOverlappingMatches<'r, 't, A, P>
Returns an iterator over all overlapping matches in the given haystack.
This routine is principally useful when searching for multiple patterns on inputs where multiple patterns may match the same regions of text. The iterator takes care of handling the overlapping state that must be threaded through every search.
Panics
If the underlying DFAs return an error during iteration, then iteration panics. This only occurs in non-default configurations where quit bytes are used or Unicode word boundaries are heuristically enabled.
The fallible version of this routine is try_find_overlapping_iter.
Example
This example shows how to run an overlapping search with multiple regexes.
use regex_automata::{dfa::{self, regex::Regex}, MatchKind, MultiMatch};
let re = Regex::builder()
    .dense(dfa::dense::Config::new().match_kind(MatchKind::All))
    .build_many(&[r"\w+$", r"\S+$"])?;
let haystack = "@foo".as_bytes();
let mut it = re.find_overlapping_iter(haystack);
assert_eq!(Some(MultiMatch::must(1, 0, 4)), it.next());
assert_eq!(Some(MultiMatch::must(0, 1, 4)), it.next());
assert_eq!(None, it.next());
impl<A: Automaton, P: Prefilter> Regex<A, P>
Lower level infallible search routines that permit controlling where the search starts and ends in a particular sequence. This is useful for executing searches that need to take surrounding context into account. This is required for correctly implementing iteration because of look-around operators (^, $, \b).
pub fn is_match_at(&self, haystack: &[u8], start: usize, end: usize) -> bool
Returns true if and only if this regex matches the given haystack.
This routine may short circuit if it knows that scanning future input will never lead to a different result. In particular, if the underlying DFA enters a match state or a dead state, then this routine will return true or false, respectively, without inspecting any future input.
Searching a substring of the haystack
Being an “at” search routine, this permits callers to search a substring of haystack by specifying a range in haystack.
Why expose this as an API instead of just asking callers to use &haystack[start..end]? The reason is that regex matching often wants to take the surrounding context into account in order to handle look-around (^, $ and \b).
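To see why slicing loses information, consider a toy search for the pattern \bbar written against the standard library only. This is a sketch, not the crate's implementation: is_word_byte, is_word_boundary and find_word_bar_at are hypothetical helpers, the word boundary is ASCII-only, and the caller is assumed to pass end <= haystack.len().

```rust
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// ASCII `\b` at offset `at`, consulting the bytes on both sides
/// in the full haystack.
fn is_word_boundary(haystack: &[u8], at: usize) -> bool {
    let before = at > 0 && is_word_byte(haystack[at - 1]);
    let after = at < haystack.len() && is_word_byte(haystack[at]);
    before != after
}

/// Toy "at" search for `\bbar` within `haystack[start..end]`.
/// The match itself must lie inside the range, but the boundary
/// check may look at bytes outside of it.
fn find_word_bar_at(haystack: &[u8], start: usize, end: usize) -> Option<usize> {
    (start..end).find(|&i| {
        i + 3 <= end && &haystack[i..i + 3] == b"bar" && is_word_boundary(haystack, i)
    })
}
```

Searching foobar with the range 3..6 correctly reports no match, because the byte before offset 3 is a word byte. Searching the slice &haystack[3..] instead hides that byte, so offset 0 of the slice looks like a word boundary and a spurious match is reported.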
Panics
If the underlying DFAs return an error, then this routine panics. This only occurs in non-default configurations where quit bytes are used or Unicode word boundaries are heuristically enabled.
The fallible version of this routine is try_is_match_at.
pub fn find_earliest_at(
    &self,
    haystack: &[u8],
    start: usize,
    end: usize
) -> Option<MultiMatch>
Returns the first position at which a match is found.
This routine stops scanning input in precisely the same circumstances as is_match. The key difference is that this routine returns the position at which it stopped scanning input if and only if a match was found. If no match is found, then None is returned.
Searching a substring of the haystack
Being an “at” search routine, this permits callers to search a substring of haystack by specifying a range in haystack.
Why expose this as an API instead of just asking callers to use &haystack[start..end]? The reason is that regex matching often wants to take the surrounding context into account in order to handle look-around (^, $ and \b).
This is useful when implementing an iterator over matches within the same haystack, which cannot be done correctly by simply providing a subslice of haystack.
Panics
If the underlying DFAs return an error, then this routine panics. This only occurs in non-default configurations where quit bytes are used or Unicode word boundaries are heuristically enabled.
The fallible version of this routine is try_find_earliest_at.
pub fn find_leftmost_at(
    &self,
    haystack: &[u8],
    start: usize,
    end: usize
) -> Option<MultiMatch>
Returns the same as find_leftmost, but starts the search at the given offset.
The significance of the starting point is that it takes the surrounding context into consideration. For example, if the DFA is anchored, then a match can only occur when start == 0.
Searching a substring of the haystack
Being an “at” search routine, this permits callers to search a substring of haystack by specifying a range in haystack.
Why expose this as an API instead of just asking callers to use &haystack[start..end]? The reason is that regex matching often wants to take the surrounding context into account in order to handle look-around (^, $ and \b).
This is useful when implementing an iterator over matches within the same haystack, which cannot be done correctly by simply providing a subslice of haystack.
Panics
If the underlying DFAs return an error, then this routine panics. This only occurs in non-default configurations where quit bytes are used or Unicode word boundaries are heuristically enabled.
The fallible version of this routine is try_find_leftmost_at.
pub fn find_overlapping_at(
    &self,
    haystack: &[u8],
    start: usize,
    end: usize,
    state: &mut OverlappingState
) -> Option<MultiMatch>
Search for the first overlapping match within a given range of haystack.
This routine is principally useful when searching for multiple patterns on inputs where multiple patterns may match the same regions of text. In particular, callers must preserve the automaton’s search state from prior calls so that the implementation knows where the last match occurred and which pattern was reported.
Searching a substring of the haystack
Being an “at” search routine, this permits callers to search a substring of haystack by specifying a range in haystack.
Why expose this as an API instead of just asking callers to use &haystack[start..end]? The reason is that regex matching often wants to take the surrounding context into account in order to handle look-around (^, $ and \b).
This is useful when implementing an iterator over matches within the same haystack, which cannot be done correctly by simply providing a subslice of haystack.
Panics
If the underlying DFAs return an error, then this routine panics. This only occurs in non-default configurations where quit bytes are used or Unicode word boundaries are heuristically enabled.
The fallible version of this routine is try_find_overlapping_at.
impl<A: Automaton, P: Prefilter> Regex<A, P>
Fallible search routines. These may return an error when the underlying DFAs have been configured in a way that permits them to fail during a search.
Errors during search only occur when the DFA has been explicitly configured to do so, usually by specifying one or more “quit” bytes or by heuristically enabling Unicode word boundaries.
Errors will never be returned using the default configuration. So these fallible routines are only needed for particular configurations.
pub fn try_is_match(&self, haystack: &[u8]) -> Result<bool, MatchError>
Returns true if and only if this regex matches the given haystack.
This routine may short circuit if it knows that scanning future input will never lead to a different result. In particular, if the underlying DFA enters a match state or a dead state, then this routine will return true or false, respectively, without inspecting any future input.
Errors
This routine only errors if the search could not complete. For DFA-based regexes, this only occurs in a non-default configuration where quit bytes are used or Unicode word boundaries are heuristically enabled.
When a search cannot complete, callers cannot know whether a match exists or not.
The infallible (panics on error) version of this routine is is_match.
pub fn try_find_earliest(
    &self,
    haystack: &[u8]
) -> Result<Option<MultiMatch>, MatchError>
Returns the first position at which a match is found.
This routine stops scanning input in precisely the same circumstances as is_match. The key difference is that this routine returns the position at which it stopped scanning input if and only if a match was found. If no match is found, then None is returned.
Errors
This routine only errors if the search could not complete. For DFA-based regexes, this only occurs in a non-default configuration where quit bytes are used or Unicode word boundaries are heuristically enabled.
When a search cannot complete, callers cannot know whether a match exists or not.
The infallible (panics on error) version of this routine is find_earliest.
pub fn try_find_leftmost(
    &self,
    haystack: &[u8]
) -> Result<Option<MultiMatch>, MatchError>
Returns the start and end offset of the leftmost match. If no match exists, then None is returned.
Errors
This routine only errors if the search could not complete. For DFA-based regexes, this only occurs in a non-default configuration where quit bytes are used or Unicode word boundaries are heuristically enabled.
When a search cannot complete, callers cannot know whether a match exists or not.
The infallible (panics on error) version of this routine is find_leftmost.
pub fn try_find_overlapping(
    &self,
    haystack: &[u8],
    state: &mut OverlappingState
) -> Result<Option<MultiMatch>, MatchError>
Search for the first overlapping match in haystack.
This routine is principally useful when searching for multiple patterns on inputs where multiple patterns may match the same regions of text. In particular, callers must preserve the automaton’s search state from prior calls so that the implementation knows where the last match occurred and which pattern was reported.
Errors
This routine only errors if the search could not complete. For DFA-based regexes, this only occurs in a non-default configuration where quit bytes are used or Unicode word boundaries are heuristically enabled.
When a search cannot complete, callers cannot know whether a match exists or not.
The infallible (panics on error) version of this routine is find_overlapping.
pub fn try_find_earliest_iter<'r, 't>(
    &'r self,
    haystack: &'t [u8]
) -> TryFindEarliestMatches<'r, 't, A, P>
Returns an iterator over all non-overlapping “earliest” matches.
Match positions are reported as soon as a match is known to occur, even if the standard leftmost match would be longer.
Errors
This iterator only yields errors if the search could not complete. For DFA-based regexes, this only occurs in a non-default configuration where quit bytes are used or Unicode word boundaries are heuristically enabled.
When a search cannot complete, callers cannot know whether a match exists or not.
The infallible (panics on error) version of this routine is find_earliest_iter.
pub fn try_find_leftmost_iter<'r, 't>(
    &'r self,
    haystack: &'t [u8]
) -> TryFindLeftmostMatches<'r, 't, A, P>
Returns an iterator over all non-overlapping leftmost matches in the given bytes. If no match exists, then the iterator yields no elements.
This corresponds to the “standard” regex search iterator.
Errors
This iterator only yields errors if the search could not complete. For DFA-based regexes, this only occurs in a non-default configuration where quit bytes are used or Unicode word boundaries are heuristically enabled.
When a search cannot complete, callers cannot know whether a match exists or not.
The infallible (panics on error) version of this routine is find_leftmost_iter.
pub fn try_find_overlapping_iter<'r, 't>(
    &'r self,
    haystack: &'t [u8]
) -> TryFindOverlappingMatches<'r, 't, A, P>
Returns an iterator over all overlapping matches in the given haystack.
This routine is principally useful when searching for multiple patterns on inputs where multiple patterns may match the same regions of text. The iterator takes care of handling the overlapping state that must be threaded through every search.
Errors
This iterator only yields errors if the search could not complete. For DFA-based regexes, this only occurs in a non-default configuration where quit bytes are used or Unicode word boundaries are heuristically enabled.
When a search cannot complete, callers cannot know whether a match exists or not.
The infallible (panics on error) version of this routine is find_overlapping_iter.
impl<A: Automaton, P: Prefilter> Regex<A, P>
Lower-level fallible search routines that permit controlling where the search starts and ends within a haystack.
pub fn try_is_match_at(
    &self,
    haystack: &[u8],
    start: usize,
    end: usize
) -> Result<bool, MatchError>
Returns true if and only if this regex matches the given haystack.
This routine may short circuit if it knows that scanning future input will never lead to a different result. In particular, if the underlying DFA enters a match state or a dead state, then this routine will return true or false, respectively, without inspecting any future input.
Searching a substring of the haystack
Being an “at” search routine, this permits callers to search a substring of haystack by specifying a range in haystack.
Why expose this as an API instead of just asking callers to use &haystack[start..end]? The reason is that regex matching often wants to take the surrounding context into account in order to handle look-around (^, $ and \b).
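To make the point about surrounding context concrete, here is a minimal sketch that does not use this crate at all. The helpers is_word_byte, is_word_boundary and matches_bbar_at are hypothetical names invented for illustration. matches_bbar_at is a toy stand-in for an “at” search for the pattern \bbar, assuming ASCII word characters only.

```rust
/// Returns true if `b` is an ASCII word byte (`[0-9A-Za-z_]`).
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Hypothetical helper: reports whether an ASCII word boundary (`\b`) holds
/// at offset `at`, looking at the bytes on both sides within `haystack`.
fn is_word_boundary(haystack: &[u8], at: usize) -> bool {
    let before = at > 0 && is_word_byte(haystack[at - 1]);
    let after = at < haystack.len() && is_word_byte(haystack[at]);
    before != after
}

/// Hypothetical stand-in for an "at" search for `\bbar`. The range
/// `start..end` restricts where the match may occur, but the `\b` check
/// still consults the full haystack for context.
fn matches_bbar_at(haystack: &[u8], start: usize, end: usize) -> bool {
    (start..end).any(|i| {
        haystack[i..end].starts_with(b"bar") && is_word_boundary(haystack, i)
    })
}
```

With the haystack foobar, calling matches_bbar_at(b"foobar", 3, 6) reports no match because the byte before offset 3 is a word byte. Searching the subslice &b"foobar"[3..6] over the range 0 to 3 does report a match, since slicing has discarded that context.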
Errors
This routine only errors if the search could not complete. For DFA-based regexes, this only occurs in a non-default configuration where quit bytes are used or Unicode word boundaries are heuristically enabled.
When a search cannot complete, callers cannot know whether a match exists or not.
The infallible (panics on error) version of this routine is is_match_at.
pub fn try_find_earliest_at(
    &self,
    haystack: &[u8],
    start: usize,
    end: usize
) -> Result<Option<MultiMatch>, MatchError>
Returns the first position at which a match is found.
This routine stops scanning input in precisely the same circumstances as is_match. The key difference is that this routine returns the position at which it stopped scanning input if and only if a match was found. If no match is found, then None is returned.
Searching a substring of the haystack
Being an “at” search routine, this permits callers to search a substring of haystack by specifying a range in haystack.
Why expose this as an API instead of just asking callers to use &haystack[start..end]? The reason is that regex matching often wants to take the surrounding context into account in order to handle look-around (^, $ and \b).
This is useful when implementing an iterator over matches within the same haystack, which cannot be done correctly by simply providing a subslice of haystack.
Errors
This routine only errors if the search could not complete. For DFA-based regexes, this only occurs in a non-default configuration where quit bytes are used or Unicode word boundaries are heuristically enabled.
When a search cannot complete, callers cannot know whether a match exists or not.
The infallible (panics on error) version of this routine is find_earliest_at.
pub fn try_find_leftmost_at(
    &self,
    haystack: &[u8],
    start: usize,
    end: usize
) -> Result<Option<MultiMatch>, MatchError>
Returns the start and end offset of the leftmost match. If no match exists, then None is returned.
Searching a substring of the haystack
Being an “at” search routine, this permits callers to search a substring of haystack by specifying a range in haystack.
Why expose this as an API instead of just asking callers to use &haystack[start..end]? The reason is that regex matching often wants to take the surrounding context into account in order to handle look-around (^, $ and \b).
This is useful when implementing an iterator over matches within the same haystack, which cannot be done correctly by simply providing a subslice of haystack.
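As a rough illustration of the iterator use case, the following sketch drives a generic “at” search in a loop. It does not use this crate: find_all and toy_search are hypothetical names, and toy_search is a toy stand-in for a routine shaped like find_leftmost_at that matches a single a only at the start of the haystack or after a space. Real iterators handle empty matches more carefully than the simplified step shown here.

```rust
/// Hypothetical toy searcher: finds the first `a` in `start..end` that is
/// at offset 0 or preceded by a space, and returns its `(start, end)` span.
/// The look-behind reads `haystack[i - 1]`, which may lie before `start`.
fn toy_search(haystack: &[u8], start: usize, end: usize) -> Option<(usize, usize)> {
    (start..end)
        .find(|&i| haystack[i] == b'a' && (i == 0 || haystack[i - 1] == b' '))
        .map(|i| (i, i + 1))
}

/// Collects all non-overlapping matches by repeatedly calling `search_at`
/// with an advancing start offset.
fn find_all<F>(haystack: &[u8], search_at: F) -> Vec<(usize, usize)>
where
    F: Fn(&[u8], usize, usize) -> Option<(usize, usize)>,
{
    let mut matches = Vec::new();
    let mut at = 0;
    while at <= haystack.len() {
        // Always pass the full haystack so look-around sees its context.
        let (s, e) = match search_at(haystack, at, haystack.len()) {
            Some(m) => m,
            None => break,
        };
        matches.push((s, e));
        // Step past empty matches so the loop always makes progress.
        at = if e == s { e + 1 } else { e };
    }
    matches
}
```

For the haystack a aa a, find_all(b"a aa a", toy_search) skips the a at offset 3 because the byte before it is still visible to the searcher. Re-slicing the haystack at offset 3 would instead make that a look like it sits at the start of the input.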
Errors
This routine only errors if the search could not complete. For DFA-based regexes, this only occurs in a non-default configuration where quit bytes are used or Unicode word boundaries are heuristically enabled.
When a search cannot complete, callers cannot know whether a match exists or not.
The infallible (panics on error) version of this routine is find_leftmost_at.
pub fn try_find_overlapping_at(
    &self,
    haystack: &[u8],
    start: usize,
    end: usize,
    state: &mut OverlappingState
) -> Result<Option<MultiMatch>, MatchError>
Search for the first overlapping match within a given range of haystack.
This routine is principally useful when searching for multiple patterns on inputs where multiple patterns may match the same regions of text. In particular, callers must preserve the automaton’s search state from prior calls so that the implementation knows where the last match occurred and which pattern was reported.
Searching a substring of the haystack
Being an “at” search routine, this permits callers to search a substring of haystack by specifying a range in haystack.
Why expose this as an API instead of just asking callers to use &haystack[start..end]? The reason is that regex matching often wants to take the surrounding context into account in order to handle look-around (^, $ and \b).
This is useful when implementing an iterator over matches within the same haystack, which cannot be done correctly by simply providing a subslice of haystack.
Errors
This routine only errors if the search could not complete. For DFA-based regexes, this only occurs in a non-default configuration where quit bytes are used or Unicode word boundaries are heuristically enabled.
When a search cannot complete, callers cannot know whether a match exists or not.
The infallible (panics on error) version of this routine is find_overlapping_at.
impl<A: Automaton, P: Prefilter> Regex<A, P>
Non-search APIs for querying information about the regex and setting a prefilter.
pub fn with_prefilter<Q: Prefilter>(self, prefilter: Q) -> Regex<A, Q>
Attach the given prefilter to this regex.
pub fn without_prefilter(self) -> Regex<A>
Remove any prefilter from this regex.
pub fn forward(&self) -> &A
Return the underlying DFA responsible for forward matching.
This is useful for accessing the underlying DFA and converting it to some other format or size. See the Builder::build_from_dfas docs for an example of where this might be useful.
pub fn reverse(&self) -> &A
Return the underlying DFA responsible for reverse matching.
This is useful for accessing the underlying DFA and converting it to some other format or size. See the Builder::build_from_dfas docs for an example of where this might be useful.
pub fn pattern_count(&self) -> usize
Returns the total number of patterns matched by this regex.
Example
use regex_automata::dfa::regex::Regex;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let re = Regex::new_many(&[r"[a-z]+", r"[0-9]+", r"\w+"])?;
    assert_eq!(3, re.pattern_count());
    Ok(())
}
Trait Implementations
Auto Trait Implementations
impl<A, P> RefUnwindSafe for Regex<A, P> where
A: RefUnwindSafe,
P: RefUnwindSafe,
impl<A, P> Send for Regex<A, P> where
A: Send,
P: Send,
impl<A, P> Sync for Regex<A, P> where
A: Sync,
P: Sync,
impl<A, P> Unpin for Regex<A, P> where
A: Unpin,
P: Unpin,
impl<A, P> UnwindSafe for Regex<A, P> where
A: UnwindSafe,
P: UnwindSafe,
Blanket Implementations
impl<T> BorrowMut<T> for T where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> ToOwned for T where
T: Clone,
type Owned = T
The resulting type after obtaining ownership.
fn clone_into(&self, target: &mut T)
This is a nightly-only experimental API. (toowned_clone_into)
Uses borrowed data to replace owned data, usually by cloning.