pub struct Builder { /* private fields */ }
A builder for constructing a deterministic finite automaton from regular expressions.
This builder provides two main things:
- It provides a few different `build` routines for actually constructing a DFA from different kinds of inputs. The most convenient is `Builder::build`, which builds a DFA directly from a pattern string. The most flexible is `Builder::build_from_nfa`, which builds a DFA straight from an NFA.
- The builder permits configuring a number of things. `Builder::configure` is used with `Config` to configure aspects of the DFA and the construction process itself. `Builder::syntax` and `Builder::thompson` permit configuring the regex parser and Thompson NFA construction, respectively. The syntax and thompson configurations only apply when building from a pattern string.
This builder always constructs a single DFA. As such, this builder can only be used to construct regexes that either detect the presence of a match or find the end location of a match. A single DFA cannot produce both the start and end of a match. For that information, use a `Regex`, which can be similarly configured using `regex::Builder`. The main reason to use a DFA directly is if the end location of a match is enough for your use case. Namely, a `Regex` will construct two DFAs instead of one, since a second reverse DFA is needed to find the start of a match.
Note that to build a sparse DFA, you must first build a dense DFA and then convert it to a sparse DFA. There is no way to build a sparse DFA directly.
Example
This example shows how to build a DFA, with minimization explicitly disabled, that completely disables Unicode. That is:
- Things such as `\w`, `.` and `\b` are no longer Unicode-aware. `\w` and `\b` are ASCII-only while `.` matches any byte except for `\n` (instead of any UTF-8 encoding of a Unicode scalar value except for `\n`). Things that are Unicode only, such as `\pL`, are not allowed.
- The pattern itself is permitted to match invalid UTF-8. For example, things like `[^a]` that match any byte except for `a` are permitted.
- Unanchored patterns can search through invalid UTF-8. That is, for unanchored patterns, the implicit prefix is `(?s-u:.)*?` instead of `(?s:.)*?`.
use regex_automata::{
    dfa::{Automaton, dense},
    nfa::thompson,
    HalfMatch, SyntaxConfig,
};

let dfa = dense::Builder::new()
    .configure(dense::Config::new().minimize(false))
    .syntax(SyntaxConfig::new().unicode(false).utf8(false))
    .thompson(thompson::Config::new().utf8(false))
    .build(r"foo[^b]ar.*")?;

let haystack = b"\xFEfoo\xFFar\xE2\x98\xFF\n";
let expected = Some(HalfMatch::must(0, 10));
let got = dfa.find_leftmost_fwd(haystack)?;
assert_eq!(expected, got);
Implementations
impl Builder
pub fn build(&self, pattern: &str) -> Result<DFA<Vec<u32>>, Error>
Build a DFA from the given pattern.
If there was a problem parsing or compiling the pattern, then an error is returned.
pub fn build_many<P: AsRef<str>>(
    &self,
    patterns: &[P]
) -> Result<DFA<Vec<u32>>, Error>
Build a DFA from the given patterns.
When matches are returned, the pattern ID corresponds to the index of the pattern in the slice given.
pub fn build_from_nfa(&self, nfa: &NFA) -> Result<DFA<Vec<u32>>, Error>
Build a DFA from the given NFA.
Example
This example shows how to build a DFA if you already have an NFA in hand.
use regex_automata::{
    dfa::{Automaton, dense},
    nfa::thompson,
    HalfMatch,
};

let haystack = "foo123bar".as_bytes();

// This shows how to set non-default options for building an NFA.
let nfa = thompson::Builder::new()
    .configure(thompson::Config::new().shrink(false))
    .build(r"[0-9]+")?;
let dfa = dense::Builder::new().build_from_nfa(&nfa)?;
let expected = Some(HalfMatch::must(0, 6));
let got = dfa.find_leftmost_fwd(haystack)?;
assert_eq!(expected, got);
pub fn configure(&mut self, config: Config) -> &mut Builder
Apply the given dense DFA configuration options to this builder.
pub fn syntax(&mut self, config: SyntaxConfig) -> &mut Builder
Set the syntax configuration for this builder using `SyntaxConfig`.
This permits setting things like case insensitivity, Unicode and multi line mode.
These settings only apply when constructing a DFA directly from a pattern.
pub fn thompson(&mut self, config: thompson::Config) -> &mut Builder
Set the Thompson NFA configuration for this builder using `nfa::thompson::Config`.
This permits setting things like whether the DFA should match the regex in reverse or if additional time should be spent shrinking the size of the NFA.
These settings only apply when constructing a DFA directly from a pattern.
Trait Implementations
Auto Trait Implementations
impl RefUnwindSafe for Builder
impl Send for Builder
impl Sync for Builder
impl Unpin for Builder
impl UnwindSafe for Builder
Blanket Implementations
impl<T> BorrowMut<T> for T where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> ToOwned for T where
T: Clone,
type Owned = T
The resulting type after obtaining ownership.
fn clone_into(&self, target: &mut T)
This is a nightly-only experimental API. (toowned_clone_into)
Uses borrowed data to replace owned data, usually by cloning.