pub struct Config { /* private fields */ }

The configuration used for compiling a hybrid NFA/DFA regex.

A regex configuration is a simple data object that is typically used with Builder::configure.

Implementations

Return a new default regex compiler configuration.

Whether to enable UTF-8 mode or not.

When UTF-8 mode is enabled (the default) and an empty match is seen, the iterators on Regex will always start the next search at the next UTF-8 encoded codepoint when searching valid UTF-8. When UTF-8 mode is disabled, such searches are begun at the next byte offset.

If this mode is enabled and invalid UTF-8 is given to a search, then the behavior is unspecified.

Generally speaking, one should enable this when SyntaxConfig::utf8 and thompson::Config::utf8 are enabled, and disable it otherwise.
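Because searching invalid UTF-8 in UTF-8 mode has unspecified behavior, a caller holding arbitrary bytes may want to validate them first. The following is a minimal sketch using only the standard library; `utf8_haystack` is a hypothetical helper name and is not part of regex-automata.

```rust
/// Hypothetical guard (not part of regex-automata): returns the haystack as
/// `&str` only when the bytes are valid UTF-8, and `None` otherwise.
fn utf8_haystack(bytes: &[u8]) -> Option<&str> {
    // `std::str::from_utf8` checks the whole slice and fails on the first
    // invalid or truncated sequence.
    std::str::from_utf8(bytes).ok()
}
```

When the helper returns `None`, the caller could fall back to a regex built with UTF-8 mode disabled, or reject the input outright.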

Example

This example demonstrates the differences between when this option is enabled and disabled. The differences only arise when the regex can return matches of length zero.

In this first snippet, we show the results when UTF-8 mode is disabled.

use regex_automata::{hybrid::regex::Regex, MultiMatch};

let re = Regex::builder()
    .configure(Regex::config().utf8(false))
    .build(r"")?;
let mut cache = re.create_cache();

let haystack = "a☃z".as_bytes();
let mut it = re.find_leftmost_iter(&mut cache, haystack);
assert_eq!(Some(MultiMatch::must(0, 0, 0)), it.next());
assert_eq!(Some(MultiMatch::must(0, 1, 1)), it.next());
assert_eq!(Some(MultiMatch::must(0, 2, 2)), it.next());
assert_eq!(Some(MultiMatch::must(0, 3, 3)), it.next());
assert_eq!(Some(MultiMatch::must(0, 4, 4)), it.next());
assert_eq!(Some(MultiMatch::must(0, 5, 5)), it.next());
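assert_eq!(None, it.next());

// The `?` above needs an enclosing function that returns a `Result`.
Ok::<(), Box<dyn std::error::Error>>(())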

And in this snippet, we execute the same search on the same haystack, but with UTF-8 mode enabled. Notice that byte offsets that would otherwise split the encoding of ☃ are not returned.

use regex_automata::{hybrid::regex::Regex, MultiMatch};

let re = Regex::builder()
    .configure(Regex::config().utf8(true))
    .build(r"")?;
let mut cache = re.create_cache();

let haystack = "a☃z".as_bytes();
let mut it = re.find_leftmost_iter(&mut cache, haystack);
assert_eq!(Some(MultiMatch::must(0, 0, 0)), it.next());
assert_eq!(Some(MultiMatch::must(0, 1, 1)), it.next());
assert_eq!(Some(MultiMatch::must(0, 4, 4)), it.next());
assert_eq!(Some(MultiMatch::must(0, 5, 5)), it.next());
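assert_eq!(None, it.next());

// The `?` above needs an enclosing function that returns a `Result`.
Ok::<(), Box<dyn std::error::Error>>(())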

Returns true if and only if this configuration has UTF-8 mode enabled.

When UTF-8 mode is enabled and an empty match is seen, the iterators on Regex will always start the next search at the next UTF-8 encoded codepoint. When UTF-8 mode is disabled, such searches are begun at the next byte offset.
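The stepping rule described above can be sketched without the library. This is an illustration only, under the assumption that the regex matches the empty string at every position. `empty_match_offsets` is a hypothetical helper and does not reflect how regex-automata implements its iterators.

```rust
/// Hypothetical helper (not part of regex-automata): lists the offsets at
/// which an always-empty match would be reported over `haystack`.
fn empty_match_offsets(haystack: &str, utf8: bool) -> Vec<usize> {
    let mut offsets = Vec::new();
    let mut at = 0;
    while at <= haystack.len() {
        offsets.push(at);
        // After an empty match, always advance by at least one byte.
        at += 1;
        if utf8 {
            // In UTF-8 mode, keep advancing until the offset no longer
            // falls inside the encoding of a codepoint.
            while at < haystack.len() && !haystack.is_char_boundary(at) {
                at += 1;
            }
        }
    }
    offsets
}
```

For the haystack `"a☃z"`, where `☃` occupies three bytes, calling the helper with `utf8` set to `false` yields offsets 0 through 5, while `true` yields 0, 1, 4 and 5. These are the same offsets asserted in the examples for `utf8`.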

Trait Implementations

Returns a copy of the value.

Performs copy-assignment from source.

Formats the value using the given formatter.

Returns the “default value” for a type.

Auto Trait Implementations

Blanket Implementations

Gets the TypeId of self.

Immutably borrows from an owned value.

Mutably borrows from an owned value.

Returns the argument unchanged.

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

The resulting type after obtaining ownership.

Creates owned data from borrowed data, usually by cloning.

🔬 This is a nightly-only experimental API. (toowned_clone_into)

Uses borrowed data to replace owned data, usually by cloning.

The type returned in the event of a conversion error.

Performs the conversion.

The type returned in the event of a conversion error.

Performs the conversion.