Struct regex_automata::dfa::regex::Builder

pub struct Builder { /* private fields */ }

A builder for a regex based on deterministic finite automata.

This builder permits configuring options for the syntax of a pattern, the NFA construction, the DFA construction and finally the regex searching itself. It differs from a general-purpose regex builder in that it permits fine-grained configuration of the construction process. The trade-off is complexity, and the possibility of setting a configuration that does not make sense. For example, there are three different UTF-8 modes:

SyntaxConfig::utf8 controls whether the pattern itself can contain sub-expressions that match invalid UTF-8.
nfa::thompson::Config::utf8 controls whether the implicit unanchored prefix added to the NFA can match through invalid UTF-8 or not.
Config::utf8 controls how the regex iterators themselves advance the starting position of the next search when a match with zero length is found.

Generally speaking, callers will want to either enable all of these or disable all of these.
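
The third mode is the least obvious of the three, so here is a minimal, standard-library-only sketch of what "advancing the starting position after a zero-length match" can mean. It is not the crate's implementation: the helper names `next_start_after_empty_match` and `empty_match_positions` are hypothetical, and the sketch assumes a pattern that matches the empty string at every position it is tried.

```rust
/// Returns true if `byte` is a UTF-8 continuation byte (bit pattern 10xxxxxx).
fn is_continuation(byte: u8) -> bool {
    (byte & 0b1100_0000) == 0b1000_0000
}

/// Hypothetical helper (not part of regex_automata): given an empty match
/// at `at`, returns the position where the next search should begin.
///
/// With `utf8 == false` the search advances by exactly one byte. With
/// `utf8 == true` it skips any continuation bytes, so the next search never
/// starts in the middle of an encoded codepoint.
fn next_start_after_empty_match(haystack: &[u8], at: usize, utf8: bool) -> usize {
    let mut next = at + 1;
    if utf8 {
        while next < haystack.len() && is_continuation(haystack[next]) {
            next += 1;
        }
    }
    next
}

/// Simulates iterating over the matches of a pattern that matches the empty
/// string wherever it is tried, and collects the offsets of those matches.
fn empty_match_positions(haystack: &[u8], utf8: bool) -> Vec<usize> {
    let mut positions = Vec::new();
    let mut at = 0;
    // An empty match is also possible at the very end of the haystack.
    while at <= haystack.len() {
        positions.push(at);
        at = next_start_after_empty_match(haystack, at, utf8);
    }
    positions
}
```

For the haystack `"a\u{2603}b"` (five bytes, since U+2603 encodes to three), this sketch reports empty matches at offsets 0, 1, 4 and 5 with `utf8` enabled, and at every offset from 0 through 5 with it disabled. The second result includes offsets that split a codepoint, which is why mixing the three modes tends to give surprising results.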

Internally, building a regex requires building two DFAs, where one is responsible for finding the end of a match and the other is responsible for finding the start of a match. If you only need to detect whether something matched, or only the end of a match, then you should use a dense::Builder to construct a single DFA, which is cheaper than building two DFAs.
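
To make the two-DFA arrangement concrete, the following is a small hand-written sketch for the fixed pattern `a+b`, using only the standard library. It is an illustration under stated assumptions rather than how the crate builds or runs its automata: the functions `find_end`, `find_start` and `find` are hypothetical, and each automaton is encoded directly as a `match` on `(state, byte)` instead of a transition table.

```rust
/// Forward automaton: an unanchored scan for `a+b`.
/// Returns the END offset of the first match, if any.
fn find_end(haystack: &[u8]) -> Option<usize> {
    // State 0: no progress. State 1: one or more 'a' just seen.
    let mut state = 0u8;
    for (i, &byte) in haystack.iter().enumerate() {
        state = match (state, byte) {
            (_, b'a') => 1,
            (1, b'b') => return Some(i + 1),
            _ => 0,
        };
    }
    None
}

/// Reverse automaton: anchored at `end`, it reads the haystack backwards
/// and matches the reversed pattern `ba+`.
/// Returns the START offset of the match that ends at `end`.
fn find_start(haystack: &[u8], end: usize) -> Option<usize> {
    // State 0: expect 'b'. State 1: expect the first 'a'.
    // State 2: inside the run of 'a' (a match state).
    let mut state = 0u8;
    let mut start = None;
    for i in (0..end).rev() {
        state = match (state, haystack[i]) {
            (0, b'b') => 1,
            (1, b'a') | (2, b'a') => {
                // Keep extending: the longest reverse match gives the
                // leftmost start.
                start = Some(i);
                2
            }
            _ => break,
        };
    }
    start
}

/// Combines both scans into a full match span `(start, end)`.
fn find(haystack: &[u8]) -> Option<(usize, usize)> {
    let end = find_end(haystack)?;
    let start = find_start(haystack, end)?;
    Some((start, end))
}
```

On the haystack `b"xxaab"`, `find_end` alone returns `Some(5)`, which is all that an "is there a match, and where does it end" query needs. Only `find` pays for the second, backwards scan, yielding `Some((2, 5))`.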

Build methods

This builder has a few “build” methods. They result from combining the following parameters:

Building one or many regexes.
Building a regex with dense or sparse DFAs.

The simplest “build” method is Builder::build. It accepts a single pattern and builds a dense DFA using usize for the state identifier representation.

The most general “build” method is Builder::build_many, which permits building a regex that searches for multiple patterns simultaneously while using a specific state identifier representation.

The most flexible “build” method, but hardest to use, is Builder::build_from_dfas. This exposes the fact that a Regex is just a pair of DFAs, and this method allows you to specify those DFAs exactly.
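
The dense versus sparse choice listed above is a space/time trade-off in how each DFA state stores its transitions. The sketch below shows one way to picture it with the standard library only. It is an assumption-laden illustration, not the crate's actual memory layout: `DenseState`, `SparseState`, `to_dense` and the `DEAD` state ID are hypothetical names.

```rust
/// Hypothetical ID of a "dead" state that has no way to reach a match.
const DEAD: usize = 0;

/// Dense representation: one slot per possible byte value.
/// Lookup is a single index operation, but every state costs 256 entries.
struct DenseState {
    next: [usize; 256],
}

/// Sparse representation: only the byte ranges that lead somewhere useful.
/// Each entry is (first byte, last byte, target state), both ends inclusive.
struct SparseState {
    ranges: Vec<(u8, u8, usize)>,
}

impl DenseState {
    fn next_state(&self, byte: u8) -> usize {
        self.next[byte as usize]
    }
}

impl SparseState {
    fn next_state(&self, byte: u8) -> usize {
        // A linear scan over the ranges: slower than indexing, but a state
        // with few distinct transitions is much smaller.
        self.ranges
            .iter()
            .find(|&&(lo, hi, _)| lo <= byte && byte <= hi)
            .map(|&(_, _, target)| target)
            .unwrap_or(DEAD)
    }
}

/// Expands a sparse state into the equivalent dense state.
fn to_dense(sparse: &SparseState) -> DenseState {
    let mut next = [DEAD; 256];
    for &(lo, hi, target) in &sparse.ranges {
        for byte in lo..=hi {
            next[byte as usize] = target;
        }
    }
    DenseState { next }
}
```

A state with transitions for `a-z` and `0-9` needs two range entries in the sparse form and 256 slots in the dense form, yet both answer every lookup identically. That is the intuition behind offering both flavors of each "build" method.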

Example

This example shows how to disable UTF-8 mode in the syntax, the NFA and the regex itself. This is generally what you want for matching on arbitrary bytes.

use regex_automata::{
    dfa::regex::Regex, nfa::thompson, MultiMatch, SyntaxConfig
};

let re = Regex::builder()
    .configure(Regex::config().utf8(false))
    .syntax(SyntaxConfig::new().utf8(false))
    .thompson(thompson::Config::new().utf8(false))
    .build(r"foo(?-u:[^b])ar.*")?;
let haystack = b"\xFEfoo\xFFarzz\xE2\x98\xFF\n";
let expected = Some(MultiMatch::must(0, 1, 9));
let got = re.find_leftmost(haystack);
assert_eq!(expected, got);
// Notice that `(?-u:[^b])` matches invalid UTF-8,
// but the subsequent `.*` does not! Disabling UTF-8
// on the syntax permits this. Notice also that the
// search was unanchored and skipped over invalid UTF-8.
// Disabling UTF-8 on the Thompson NFA permits this.
//
// N.B. This example does not show the impact of
// disabling UTF-8 mode on Config, since that
// only impacts regexes that can produce matches of
// length 0.
assert_eq!(b"foo\xFFarzz", &haystack[got.unwrap().range()]);

Implementations

impl Builder

pub fn new() -> Builder

pub fn build(&self, pattern: &str) -> Result<Regex, Error>

pub fn build_sparse(&self, pattern: &str) -> Result<Regex<sparse::DFA<Vec<u8>>>, Error>

pub fn build_many<P: AsRef<str>>(&self, patterns: &[P]) -> Result<Regex, Error>

pub fn build_many_sparse<P: AsRef<str>>(&self, patterns: &[P]) -> Result<Regex<sparse::DFA<Vec<u8>>>, Error>

pub fn build_from_dfas<A: Automaton>(&self, forward: A, reverse: A) -> Regex<A>

pub fn configure(&mut self, config: Config) -> &mut Builder

pub fn syntax(&mut self, config: SyntaxConfig) -> &mut Builder

pub fn thompson(&mut self, config: thompson::Config) -> &mut Builder

pub fn dense(&mut self, config: dense::Config) -> &mut Builder

Trait Implementations

impl Clone for Builder

fn clone(&self) -> Builder

fn clone_from(&mut self, source: &Self)

impl Debug for Builder

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl Default for Builder

fn default() -> Builder

Auto Trait Implementations

impl RefUnwindSafe for Builder

impl Send for Builder

impl Sync for Builder

impl Unpin for Builder

impl UnwindSafe for Builder

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> ToOwned for T where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
