Struct grep_pcre2::RegexMatcherBuilder
pub struct RegexMatcherBuilder { /* private fields */ }
A builder for configuring the compilation of a PCRE2 regex.
Implementations
impl RegexMatcherBuilder
pub fn new() -> RegexMatcherBuilder
Create a new matcher builder with a default configuration.
pub fn build(&self, pattern: &str) -> Result<RegexMatcher, Error>
Compile the given pattern into a PCRE matcher using the current configuration.
If there was a problem compiling the pattern, then an error is returned.
pub fn build_many<P: AsRef<str>>(&self, patterns: &[P]) -> Result<RegexMatcher, Error>
Compile all of the given patterns into a single regex that matches when at least one of the patterns matches.
If there was a problem building the regex, then an error is returned.
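One way to picture the combined regex is as an alternation of the input patterns. The sketch below uses only the standard library. The helper name `join_patterns` and the choice to wrap each pattern in a non-capturing group are assumptions made for illustration, not a description of what the crate does internally.

```rust
/// Joins patterns into a single alternation. Each pattern is wrapped in a
/// non-capturing group so that a `|` inside one pattern cannot leak into
/// its neighbors. Hypothetical helper, not part of grep_pcre2.
fn join_patterns<P: AsRef<str>>(patterns: &[P]) -> String {
    patterns
        .iter()
        .map(|p| format!("(?:{})", p.as_ref()))
        .collect::<Vec<String>>()
        .join("|")
}
```

With this sketch, `join_patterns(&["foo", "a|b"])` yields `(?:foo)|(?:a|b)`, which matches when at least one of the original patterns matches.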
pub fn caseless(&mut self, yes: bool) -> &mut RegexMatcherBuilder
Enables case insensitive matching.
If the `utf` option is also set, then Unicode case folding is used to determine case insensitivity. When the `utf` option is not set, then only standard ASCII case insensitivity is considered.
This option corresponds to the `i` flag.
pub fn case_smart(&mut self, yes: bool) -> &mut RegexMatcherBuilder
Whether to enable “smart case” or not.
When smart case is enabled, the builder will automatically enable case insensitive matching based on how the pattern is written. Namely, case insensitive mode is enabled when both of the following things are believed to be true:
- The pattern contains at least one literal character. For example, `a\w` contains a literal (`a`) but `\w` does not.
- Of the literals in the pattern, none of them are considered to be uppercase according to Unicode. For example, `foo\pL` has no uppercase literals but `Foo\pL` does.
Note that the implementation of this is not perfect. Namely, `\p{Ll}` will prevent case insensitive matching even though it is part of a meta sequence. This bug will probably never be fixed.
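The two rules above can be sketched with the standard library alone. This is an idealized illustration under stated assumptions, not the crate's implementation: the function name `smart_case_insensitive` is hypothetical, only alphanumeric characters are counted as literals, and `\pX` and `\p{...}` are skipped as meta sequences (so it does not reproduce the `\p{Ll}` caveat).

```rust
/// Returns true when the idealized smart case rules would pick case
/// insensitive matching: at least one literal, and no uppercase literal.
/// Hypothetical helper, not part of grep_pcre2.
fn smart_case_insensitive(pattern: &str) -> bool {
    let mut chars = pattern.chars().peekable();
    let mut saw_literal = false;
    while let Some(c) = chars.next() {
        if c == '\\' {
            // The character after a backslash belongs to a meta sequence.
            match chars.next() {
                Some('p') | Some('P') => {
                    if chars.peek() == Some(&'{') {
                        // Skip a braced property name such as `{Ll}`.
                        for d in chars.by_ref() {
                            if d == '}' {
                                break;
                            }
                        }
                    } else {
                        // Skip a single-letter property name such as `L`.
                        chars.next();
                    }
                }
                _ => {}
            }
            continue;
        }
        // Simplification: only alphanumeric characters count as literals.
        if c.is_alphanumeric() {
            saw_literal = true;
            if c.is_uppercase() {
                return false;
            }
        }
    }
    saw_literal
}
```

Under these assumptions, `a\w` and `foo\pL` select case insensitive matching, while `\w` (no literal) and `Foo\pL` (an uppercase literal) do not.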
pub fn dotall(&mut self, yes: bool) -> &mut RegexMatcherBuilder
Enables “dot all” matching.
When enabled, the `.` metacharacter in the pattern matches any character, including `\n`. When disabled (the default), `.` will match any character except for `\n`.
This option corresponds to the `s` flag.
pub fn extended(&mut self, yes: bool) -> &mut RegexMatcherBuilder
Enable “extended” mode in the pattern, where whitespace is ignored.
This option corresponds to the `x` flag.
pub fn multi_line(&mut self, yes: bool) -> &mut RegexMatcherBuilder
Enable multiline matching mode.
When enabled, the `^` and `$` anchors will match both at the beginning and end of a subject string, in addition to matching at the start of a line and the end of a line. When disabled, the `^` and `$` anchors will only match at the beginning and end of a subject string.
This option corresponds to the `m` flag.
pub fn crlf(&mut self, yes: bool) -> &mut RegexMatcherBuilder
Enable matching of CRLF as a line terminator.
When enabled, anchors such as `^` and `$` will match any of the following as a line terminator: `\r`, `\n` or `\r\n`.
This is disabled by default, in which case, only `\n` is recognized as a line terminator.
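To make the three terminators concrete, here is a minimal standard-library sketch that splits text on `\r`, `\n` or `\r\n`, treating `\r\n` as a single terminator. The function name `split_lines_anycrlf` is hypothetical, and the code only illustrates where line boundaries fall; it says nothing about how PCRE2 implements this.

```rust
/// Splits `text` into lines, accepting `\r`, `\n` or `\r\n` as terminators.
/// Hypothetical helper, not part of grep_pcre2.
fn split_lines_anycrlf(text: &str) -> Vec<&str> {
    let bytes = text.as_bytes();
    let mut lines = Vec::new();
    let mut start = 0;
    let mut i = 0;
    while i < bytes.len() {
        // Length of the terminator beginning at `i`, or 0 if there is none.
        let term_len = match bytes[i] {
            b'\r' if bytes.get(i + 1) == Some(&b'\n') => 2,
            b'\r' | b'\n' => 1,
            _ => 0,
        };
        if term_len > 0 {
            lines.push(&text[start..i]);
            i += term_len;
            start = i;
        } else {
            i += 1;
        }
    }
    // Keep a final line that has no trailing terminator.
    if start < bytes.len() {
        lines.push(&text[start..]);
    }
    lines
}
```

For example, `split_lines_anycrlf("a\r\nb\rc\nd")` produces the four lines `a`, `b`, `c` and `d`. With only `\n` recognized, the same text would have two lines, and the first would contain the embedded `\r` characters.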
pub fn word(&mut self, yes: bool) -> &mut RegexMatcherBuilder
Require that all matches occur on word boundaries.
Enabling this option is subtly different from putting `\b` assertions on both sides of your pattern. In particular, a `\b` assertion requires that one side of it match a word character while the other match a non-word character. This option, in contrast, merely requires that one side match a non-word character.
For example, `\b-2\b` will not match `foo -2 bar` since `-` is not a word character. However, `-2` with this `word` option enabled will match the `-2` in `foo -2 bar`.
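The difference between the two checks can be sketched over ASCII bytes with the standard library. The helper names below are hypothetical, word characters are assumed to be ASCII alphanumerics plus `_`, and the start and end of the haystack are assumed to count as acceptable neighbors. This is an illustration of the rule as described, not the crate's implementation.

```rust
/// ASCII-only notion of a word character (an assumption for this sketch).
fn is_word(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// True if a `\b` assertion would hold at byte offset `at`: exactly one of
/// the two neighboring bytes is a word character.
fn is_word_boundary(hay: &[u8], at: usize) -> bool {
    let before = at > 0 && is_word(hay[at - 1]);
    let after = at < hay.len() && is_word(hay[at]);
    before != after
}

/// True if the match `start..end` satisfies the looser rule: the byte just
/// outside the match on each side is a non-word byte or an edge.
fn is_word_option_match(hay: &[u8], start: usize, end: usize) -> bool {
    let before_ok = start == 0 || !is_word(hay[start - 1]);
    let after_ok = end == hay.len() || !is_word(hay[end]);
    before_ok && after_ok
}
```

In `foo -2 bar`, the `-2` spans bytes 4 to 6. `is_word_boundary` fails at offset 4 because both the space and the `-` are non-word bytes, so `\b-2\b` cannot match there, while `is_word_option_match` accepts the span.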
pub fn fixed_strings(&mut self, yes: bool) -> &mut RegexMatcherBuilder
Whether the patterns should be treated as literal strings or not. When this is active, all characters, including ones that would normally be special regex meta characters, are matched literally.
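Literal matching can be pictured as escaping every metacharacter before compilation. The sketch below is a standard-library illustration with a hypothetical helper name, `escape_literal`. It relies on the PCRE rule that a backslash before a non-alphanumeric character removes any special meaning, and it is not a claim about how the crate implements this option.

```rust
/// Escapes ASCII punctuation so that the result matches `pattern` literally.
/// Hypothetical helper, not part of grep_pcre2.
fn escape_literal(pattern: &str) -> String {
    let mut escaped = String::with_capacity(pattern.len());
    for c in pattern.chars() {
        // A backslash before a non-alphanumeric character makes it literal.
        if c.is_ascii_punctuation() {
            escaped.push('\\');
        }
        escaped.push(c);
    }
    escaped
}
```

For instance, `escape_literal("a.b*c")` gives `a\.b\*c`. The sketch does not account for extended mode, where unescaped whitespace in the pattern is ignored.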
pub fn whole_line(&mut self, yes: bool) -> &mut RegexMatcherBuilder
Whether each pattern should match the entire line or not. This is equivalent to surrounding the pattern with `(?m:^)` and `(?m:$)`.
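The equivalence can be written out as a small string transformation. The helper name `wrap_whole_line` is hypothetical, and the extra non-capturing group around the pattern is an assumption added so that alternations such as `foo|bar` stay anchored on both sides.

```rust
/// Surrounds `pattern` with multiline start and end anchors.
/// Hypothetical helper, not part of grep_pcre2.
fn wrap_whole_line(pattern: &str) -> String {
    // The inner `(?:...)` keeps a top-level `|` from escaping the anchors.
    format!("(?m:^)(?:{})(?m:$)", pattern)
}
```

Here `wrap_whole_line("foo|bar")` returns `(?m:^)(?:foo|bar)(?m:$)`, which matches only lines consisting entirely of `foo` or `bar`.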
pub fn ucp(&mut self, yes: bool) -> &mut RegexMatcherBuilder
Enable Unicode matching mode.
When enabled, the following patterns become Unicode aware: `\b`, `\B`, `\d`, `\D`, `\s`, `\S`, `\w`, `\W`.
When set, this implies UTF matching mode. It is not possible to enable Unicode matching mode without enabling UTF matching mode.
This is disabled by default.
pub fn utf(&mut self, yes: bool) -> &mut RegexMatcherBuilder
Enable UTF matching mode.
When enabled, characters are treated as sequences of code units that make up a single codepoint instead of as single bytes. For example, this will cause `.` to match any single UTF-8 encoded codepoint, whereas when this is disabled, `.` will match any single byte (except for `\n` in both cases, unless “dot all” mode is enabled).
Note that when UTF matching mode is enabled, every search performed will do a UTF-8 validation check, which can impact performance. The UTF-8 check can be disabled via the `disable_utf_check` option, but it is undefined behavior to enable UTF matching mode and search invalid UTF-8.
This is disabled by default.
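The codepoint versus byte distinction can be seen with the standard library alone. The two helper names below are hypothetical. They count how many single `.` matches would be needed to consume a string in each mode, assuming the text contains no `\n`.

```rust
/// Steps needed when `.` consumes one UTF-8 encoded codepoint at a time.
/// Hypothetical helper, not part of grep_pcre2.
fn dot_steps_utf(text: &str) -> usize {
    text.chars().count()
}

/// Steps needed when `.` consumes one byte at a time.
/// Hypothetical helper, not part of grep_pcre2.
fn dot_steps_bytes(text: &str) -> usize {
    text.len()
}
```

For the string `"n\u{e9}"` (the letter n followed by é), the first function returns 2 and the second returns 3, because é is encoded as two bytes in UTF-8.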
pub fn disable_utf_check(&mut self) -> &mut RegexMatcherBuilder
👎Deprecated since 0.2.4: now a no-op due to new PCRE2 features
This is now deprecated and is a no-op.
Previously, this option permitted disabling PCRE2’s UTF-8 validity check, which could result in undefined behavior if the haystack was not valid UTF-8. But in 10.34, PCRE2 introduced a new option, `PCRE2_MATCH_INVALID_UTF`, which this crate always sets. When this option is enabled, PCRE2 claims to not have undefined behavior when the haystack is invalid UTF-8.
Therefore, disabling the UTF-8 check is not something that is exposed by this crate.
pub fn jit(&mut self, yes: bool) -> &mut RegexMatcherBuilder
Enable PCRE2’s JIT and return an error if it’s not available.
This generally speeds up matching quite a bit. The downside is that it can increase the time it takes to compile a pattern.
If the JIT isn’t available or if JIT compilation returns an error, then regex compilation will fail with the corresponding error.
This is disabled by default, and always overrides `jit_if_available`.
pub fn jit_if_available(&mut self, yes: bool) -> &mut RegexMatcherBuilder
Enable PCRE2’s JIT if it’s available.
This generally speeds up matching quite a bit. The downside is that it can increase the time it takes to compile a pattern.
If the JIT isn’t available or if JIT compilation returns an error, then a debug message with the error will be emitted and the regex will otherwise silently fall back to non-JIT matching.
This is disabled by default, and always overrides `jit`.
pub fn max_jit_stack_size(&mut self, bytes: Option<usize>) -> &mut RegexMatcherBuilder
Set the maximum size of PCRE2’s JIT stack, in bytes. If the JIT is not enabled, then this has no effect.
When `None` is given, no custom JIT stack will be created, and instead, the default JIT stack is used. When the default is used, its maximum size is 32 KB.
When this is set, then a new JIT stack will be created with the given maximum size as its limit.
Increasing the stack size can be useful for larger regular expressions.
By default, this is set to `None`.
Trait Implementations
impl Clone for RegexMatcherBuilder
fn clone(&self) -> RegexMatcherBuilder
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from `source`.
Auto Trait Implementations
impl Freeze for RegexMatcherBuilder
impl RefUnwindSafe for RegexMatcherBuilder
impl Send for RegexMatcherBuilder
impl Sync for RegexMatcherBuilder
impl Unpin for RegexMatcherBuilder
impl UnwindSafe for RegexMatcherBuilder
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T where T: Clone
unsafe fn clone_to_uninit(&self, dst: *mut T)
This is a nightly-only experimental API. (`clone_to_uninit`)