pub struct Builder { /* private fields */ }
Available on (crate features dfa-search or dfa-onepass) and crate feature dfa-search only.
A builder for a regex based on deterministic finite automatons.
This builder permits configuring options for the syntax of a pattern, the NFA construction, the DFA construction and finally the regex searching itself. It differs from a general purpose regex builder in that it permits fine-grained configuration of the construction process. The trade-off is added complexity, and the possibility of setting a configuration that does not make sense. For example, there are two different UTF-8 modes:
- syntax::Config::utf8 controls whether the pattern itself can contain sub-expressions that match invalid UTF-8.
- thompson::Config::utf8 controls how the regex iterators themselves advance the starting position of the next search when a match with zero length is found.
Generally speaking, callers will want to either enable all of these or disable all of these.
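To make the effect of thompson::Config::utf8 on zero-length matches more concrete, here is a conceptual sketch using only the standard library. It is not the crate's implementation; the helper name next_start_after_empty is hypothetical, and the haystack is assumed to be valid UTF-8 when the utf8 flag is set.

```rust
/// Hypothetical helper (not part of regex-automata): given an empty match at
/// byte offset `at`, pick the offset where the next search should begin.
///
/// With `utf8 == true` the haystack is assumed to be valid UTF-8, and the
/// result always lands on a codepoint boundary. With `utf8 == false` the
/// search simply moves forward by one byte.
fn next_start_after_empty(haystack: &[u8], at: usize, utf8: bool) -> usize {
    let mut next = at + 1;
    if utf8 {
        // UTF-8 continuation bytes have the bit pattern 0b10xx_xxxx. Skip
        // them so that no empty match is reported inside a codepoint.
        while next < haystack.len() && (haystack[next] & 0b1100_0000) == 0b1000_0000 {
            next += 1;
        }
    }
    next
}
```

For the haystack "a☃b" (where '☃' occupies bytes 1..4), an empty match at offset 1 would resume at offset 4 in UTF-8 mode and at offset 2 otherwise.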
Internally, building a regex requires building two DFAs, where one is responsible for finding the end of a match and the other is responsible for finding the start of a match. If you only need to detect whether something matched, or only the end of a match, then you should use a dense::Builder to construct a single DFA, which is cheaper than building two DFAs.
§Build methods
This builder has a few “build” methods. In general, they are the result of combining the following parameters:
- Building one or many regexes.
- Building a regex with dense or sparse DFAs.
The simplest “build” method is Builder::build. It accepts a single pattern and builds a dense DFA using usize for the state identifier representation.
The most general “build” method is Builder::build_many, which permits building a regex that searches for multiple patterns simultaneously while using a specific state identifier representation.
The most flexible “build” method, but hardest to use, is Builder::build_from_dfas. This exposes the fact that a Regex is just a pair of DFAs, and this method allows you to specify those DFAs exactly.
§Example
This example shows how to disable UTF-8 mode in the syntax and the regex itself. This is generally what you want for matching on arbitrary bytes.
use regex_automata::{
    dfa::regex::Regex, nfa::thompson, util::syntax, Match,
};

let re = Regex::builder()
    .syntax(syntax::Config::new().utf8(false))
    .thompson(thompson::Config::new().utf8(false))
    .build(r"foo(?-u:[^b])ar.*")?;
let haystack = b"\xFEfoo\xFFarzz\xE2\x98\xFF\n";
let expected = Some(Match::must(0, 1..9));
let got = re.find(haystack);
assert_eq!(expected, got);
// Notice that `(?-u:[^b])` matches invalid UTF-8,
// but the subsequent `.*` does not! Disabling UTF-8
// on the syntax permits this.
assert_eq!(b"foo\xFFarzz", &haystack[got.unwrap().range()]);
Implementations§
impl Builder
pub fn build(&self, pattern: &str) -> Result<Regex, BuildError>
Available on crate features syntax and dfa-build only.
Build a regex from the given pattern.
If there was a problem parsing or compiling the pattern, then an error is returned.
pub fn build_sparse(&self, pattern: &str) -> Result<Regex<DFA<Vec<u8>>>, BuildError>
Available on crate features syntax and dfa-build only.
Build a regex from the given pattern using sparse DFAs.
If there was a problem parsing or compiling the pattern, then an error is returned.
pub fn build_many<P: AsRef<str>>(&self, patterns: &[P]) -> Result<Regex, BuildError>
Available on crate features syntax and dfa-build only.
Build a regex from the given patterns.
pub fn build_many_sparse<P: AsRef<str>>(&self, patterns: &[P]) -> Result<Regex<DFA<Vec<u8>>>, BuildError>
Available on crate features syntax and dfa-build only.
Build a sparse regex from the given patterns.
pub fn build_from_dfas<A: Automaton>(&self, forward: A, reverse: A) -> Regex<A>
Build a regex from its component forward and reverse DFAs.
This is useful when deserializing a regex from some arbitrary memory region. This is also useful for building regexes from other types of DFAs.
If you’re building the DFAs from scratch instead of building new DFAs from other DFAs, then you’ll need to make sure that the reverse DFA is configured correctly to match the intended semantics. Namely:
- It should be anchored.
- It should use MatchKind::All semantics.
- It should match in reverse.
- Otherwise, its configuration should match the forward DFA.
If these conditions aren’t satisfied, then the behavior of searches is unspecified.
Note that when using this constructor, no configuration is applied. Since this routine provides the DFAs to the builder, there is no opportunity to apply other configuration options.
§Example
This example is a bit contrived. The usual use of these methods would involve serializing initial_re somewhere and then deserializing it later to build a regex. But in this case, we do everything in memory.
use regex_automata::dfa::regex::Regex;
let initial_re = Regex::new("foo[0-9]+")?;
assert_eq!(true, initial_re.is_match(b"foo123"));
let (fwd, rev) = (initial_re.forward(), initial_re.reverse());
let re = Regex::builder().build_from_dfas(fwd, rev);
assert_eq!(true, re.is_match(b"foo123"));
This example shows how to build a Regex that uses sparse DFAs instead of dense DFAs without using one of the convenience build_sparse routines:
use regex_automata::dfa::regex::Regex;
let initial_re = Regex::new("foo[0-9]+")?;
assert_eq!(true, initial_re.is_match(b"foo123"));
let fwd = initial_re.forward().to_sparse()?;
let rev = initial_re.reverse().to_sparse()?;
let re = Regex::builder().build_from_dfas(fwd, rev);
assert_eq!(true, re.is_match(b"foo123"));
pub fn syntax(&mut self, config: Config) -> &mut Builder
Available on crate features syntax and dfa-build only.
Set the syntax configuration for this builder using syntax::Config.
This permits setting things like case insensitivity, Unicode and multi line mode.
pub fn thompson(&mut self, config: Config) -> &mut Builder
Available on crate features syntax and dfa-build only.
Set the Thompson NFA configuration for this builder using nfa::thompson::Config.
This permits setting things like whether additional time should be spent shrinking the size of the NFA.
pub fn dense(&mut self, config: Config) -> &mut Builder
Available on crate feature dfa-build only.
Set the dense DFA compilation configuration for this builder using dense::Config.
This permits setting things like whether the underlying DFAs should be minimized.
Trait Implementations§
Auto Trait Implementations§
impl !Freeze for Builder
impl !RefUnwindSafe for Builder
impl Send for Builder
impl !Sync for Builder
impl Unpin for Builder
impl UnwindSafe for Builder
Blanket Implementations§
impl<T> BorrowMut<T> for T where T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T where T: Clone,
unsafe fn clone_to_uninit(&self, dst: *mut T)
This is a nightly-only experimental API. (clone_to_uninit)