pub struct Config { /* private fields */ }
The configuration used for compiling a hybrid NFA/DFA regex.
A regex configuration is a simple data object that is typically used with Builder::configure.
Implementations
impl Config
pub fn utf8(self, yes: bool) -> Config
Whether to enable UTF-8 mode or not.
When UTF-8 mode is enabled (the default) and an empty match is seen, the iterators on Regex will always start the next search at the next UTF-8 encoded codepoint when searching valid UTF-8. When UTF-8 mode is disabled, such searches begin at the next byte offset.
If this mode is enabled and invalid UTF-8 is given to search, then behavior is unspecified.
Generally speaking, one should enable this when SyntaxConfig::utf8 and thompson::Config::utf8 are enabled, and disable it otherwise.
Example
This example demonstrates the differences between when this option is enabled and disabled. The differences only arise when the regex can return matches of length zero.
In this first snippet, we show the results when UTF-8 mode is disabled.
use regex_automata::{hybrid::regex::Regex, MultiMatch};

// The empty pattern matches at every position, so every match has length zero.
let re = Regex::builder()
    .configure(Regex::config().utf8(false))
    .build(r"")?;
let mut cache = re.create_cache();

// "☃" occupies bytes 1..4, so offsets 2 and 3 fall inside its encoding.
let haystack = "a☃z".as_bytes();
let mut it = re.find_leftmost_iter(&mut cache, haystack);
assert_eq!(Some(MultiMatch::must(0, 0, 0)), it.next());
assert_eq!(Some(MultiMatch::must(0, 1, 1)), it.next());
assert_eq!(Some(MultiMatch::must(0, 2, 2)), it.next());
assert_eq!(Some(MultiMatch::must(0, 3, 3)), it.next());
assert_eq!(Some(MultiMatch::must(0, 4, 4)), it.next());
assert_eq!(Some(MultiMatch::must(0, 5, 5)), it.next());
assert_eq!(None, it.next());
And in this snippet, we execute the same search on the same haystack, but with UTF-8 mode enabled. Notice that byte offsets that would otherwise split the encoding of ☃ are not returned.
use regex_automata::{hybrid::regex::Regex, MultiMatch};

let re = Regex::builder()
    .configure(Regex::config().utf8(true))
    .build(r"")?;
let mut cache = re.create_cache();

// Offsets 2 and 3 lie inside the encoding of "☃" and are skipped.
let haystack = "a☃z".as_bytes();
let mut it = re.find_leftmost_iter(&mut cache, haystack);
assert_eq!(Some(MultiMatch::must(0, 0, 0)), it.next());
assert_eq!(Some(MultiMatch::must(0, 1, 1)), it.next());
assert_eq!(Some(MultiMatch::must(0, 4, 4)), it.next());
assert_eq!(Some(MultiMatch::must(0, 5, 5)), it.next());
assert_eq!(None, it.next());
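The offsets in the two snippets above can be reproduced with a small standard-library-only model of how the next search position is chosen after an empty match. This is a sketch for illustration, not the crate's implementation: the function name `empty_match_offsets` is hypothetical, and it assumes a pattern that matches the empty string at every position of a valid UTF-8 haystack.

```rust
/// Hypothetical model (not part of regex-automata): lists the offsets at
/// which an always-empty pattern would be reported over `haystack`.
///
/// With `utf8 == true`, the next search starts at the next codepoint
/// boundary; with `utf8 == false`, it starts at the next byte offset.
pub fn empty_match_offsets(haystack: &str, utf8: bool) -> Vec<usize> {
    let len = haystack.len();
    let mut offsets = Vec::new();
    let mut at = 0;
    while at <= len {
        // An empty match is reported at the current search position.
        offsets.push(at);
        at += if utf8 {
            // Step over the whole codepoint starting at `at`. At the end of
            // the haystack there is no codepoint, so step by one to finish.
            haystack[at..].chars().next().map_or(1, |c| c.len_utf8())
        } else {
            // Byte-oriented mode: always step by exactly one byte.
            1
        };
    }
    offsets
}
```

Under these assumptions, calling `empty_match_offsets("a☃z", false)` visits every byte offset, while passing `true` skips the offsets that fall inside the three-byte encoding of ☃.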
pub fn get_utf8(&self) -> bool
Returns true if and only if this configuration has UTF-8 mode enabled.
When UTF-8 mode is enabled and an empty match is seen, the iterators on Regex will always start the next search at the next UTF-8 encoded codepoint. When UTF-8 mode is disabled, such searches begin at the next byte offset.
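The setter and getter signatures above follow a consuming-builder pattern: `utf8` takes `self` by value and returns the updated configuration, while `get_utf8` only borrows it. The sketch below mirrors those two signatures on a hypothetical `SketchConfig` type. Storing the flag as an `Option<bool>` that falls back to `true` is an assumption, chosen only to model the documented default; it is not taken from the crate's source.

```rust
/// Hypothetical stand-in for `Config`, for illustration only.
#[derive(Clone, Copy, Debug, Default)]
pub struct SketchConfig {
    // Assumption: `None` means "not set by the caller".
    utf8: Option<bool>,
}

impl SketchConfig {
    /// Creates a configuration with nothing set explicitly.
    pub fn new() -> SketchConfig {
        SketchConfig::default()
    }

    /// Consumes the configuration and returns it with UTF-8 mode set,
    /// which allows calls to be chained.
    pub fn utf8(mut self, yes: bool) -> SketchConfig {
        self.utf8 = Some(yes);
        self
    }

    /// Reports whether UTF-8 mode is enabled, treating an unset flag as
    /// enabled to match the documented default.
    pub fn get_utf8(&self) -> bool {
        self.utf8.unwrap_or(true)
    }
}
```

With this shape, `SketchConfig::new().utf8(false).get_utf8()` evaluates to `false`, and a later call in the same chain overrides an earlier one.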
Trait Implementations
impl Copy for Config
Auto Trait Implementations
impl RefUnwindSafe for Config
impl Send for Config
impl Sync for Config
impl Unpin for Config
impl UnwindSafe for Config
Blanket Implementations
impl<T> BorrowMut<T> for T where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> ToOwned for T where
T: Clone,
type Owned = T
The resulting type after obtaining ownership.
fn clone_into(&self, target: &mut T)
Nightly-only experimental API (toowned_clone_into).
Uses borrowed data to replace owned data, usually by cloning.