Trait grep_matcher::Matcher
pub trait Matcher {
type Captures: Captures;
type Error: Display;
// Required methods
fn find_at(
&self,
haystack: &[u8],
at: usize
) -> Result<Option<Match>, Self::Error>;
fn new_captures(&self) -> Result<Self::Captures, Self::Error>;
// Provided methods
fn capture_count(&self) -> usize { ... }
fn capture_index(&self, _name: &str) -> Option<usize> { ... }
fn find(&self, haystack: &[u8]) -> Result<Option<Match>, Self::Error> { ... }
fn find_iter<F>(
&self,
haystack: &[u8],
matched: F
) -> Result<(), Self::Error>
where F: FnMut(Match) -> bool { ... }
fn find_iter_at<F>(
&self,
haystack: &[u8],
at: usize,
matched: F
) -> Result<(), Self::Error>
where F: FnMut(Match) -> bool { ... }
fn try_find_iter<F, E>(
&self,
haystack: &[u8],
matched: F
) -> Result<Result<(), E>, Self::Error>
where F: FnMut(Match) -> Result<bool, E> { ... }
fn try_find_iter_at<F, E>(
&self,
haystack: &[u8],
at: usize,
matched: F
) -> Result<Result<(), E>, Self::Error>
where F: FnMut(Match) -> Result<bool, E> { ... }
fn captures(
&self,
haystack: &[u8],
caps: &mut Self::Captures
) -> Result<bool, Self::Error> { ... }
fn captures_iter<F>(
&self,
haystack: &[u8],
caps: &mut Self::Captures,
matched: F
) -> Result<(), Self::Error>
where F: FnMut(&Self::Captures) -> bool { ... }
fn captures_iter_at<F>(
&self,
haystack: &[u8],
at: usize,
caps: &mut Self::Captures,
matched: F
) -> Result<(), Self::Error>
where F: FnMut(&Self::Captures) -> bool { ... }
fn try_captures_iter<F, E>(
&self,
haystack: &[u8],
caps: &mut Self::Captures,
matched: F
) -> Result<Result<(), E>, Self::Error>
where F: FnMut(&Self::Captures) -> Result<bool, E> { ... }
fn try_captures_iter_at<F, E>(
&self,
haystack: &[u8],
at: usize,
caps: &mut Self::Captures,
matched: F
) -> Result<Result<(), E>, Self::Error>
where F: FnMut(&Self::Captures) -> Result<bool, E> { ... }
fn captures_at(
&self,
_haystack: &[u8],
_at: usize,
_caps: &mut Self::Captures
) -> Result<bool, Self::Error> { ... }
fn replace<F>(
&self,
haystack: &[u8],
dst: &mut Vec<u8>,
append: F
) -> Result<(), Self::Error>
where F: FnMut(Match, &mut Vec<u8>) -> bool { ... }
fn replace_with_captures<F>(
&self,
haystack: &[u8],
caps: &mut Self::Captures,
dst: &mut Vec<u8>,
append: F
) -> Result<(), Self::Error>
where F: FnMut(&Self::Captures, &mut Vec<u8>) -> bool { ... }
fn replace_with_captures_at<F>(
&self,
haystack: &[u8],
at: usize,
caps: &mut Self::Captures,
dst: &mut Vec<u8>,
append: F
) -> Result<(), Self::Error>
where F: FnMut(&Self::Captures, &mut Vec<u8>) -> bool { ... }
fn is_match(&self, haystack: &[u8]) -> Result<bool, Self::Error> { ... }
fn is_match_at(
&self,
haystack: &[u8],
at: usize
) -> Result<bool, Self::Error> { ... }
fn shortest_match(
&self,
haystack: &[u8]
) -> Result<Option<usize>, Self::Error> { ... }
fn shortest_match_at(
&self,
haystack: &[u8],
at: usize
) -> Result<Option<usize>, Self::Error> { ... }
fn non_matching_bytes(&self) -> Option<&ByteSet> { ... }
fn line_terminator(&self) -> Option<LineTerminator> { ... }
fn find_candidate_line(
&self,
haystack: &[u8]
) -> Result<Option<LineMatchKind>, Self::Error> { ... }
}
A matcher defines an interface for regular expression implementations.
While this trait is large, there are only two required methods that implementors must provide: find_at and new_captures. If captures aren’t supported by your implementation, then new_captures can be implemented with NoCaptures. If your implementation does support capture groups, then you should also implement the other capture related methods, as dictated by the documentation. Crucially, this includes captures_at.
The rest of the methods on this trait provide default implementations on top of find_at and new_captures. It is not uncommon for implementations to be able to provide faster variants of some methods; in those cases, simply override the default implementation.
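The layering described above can be sketched without the crate itself. The block below is a minimal, self-contained illustration: MiniMatcher, Span and Literal are hypothetical stand-ins invented for this example (they are not grep_matcher types), and errors and captures are left out for brevity. It only shows how provided methods can be built on a single required find_at.

```rust
/// Hypothetical stand-in for a match: a half-open byte range [start, end).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Hypothetical, cut-down trait: one required method, two provided ones.
pub trait MiniMatcher {
    /// Required: first match at or after `at`, offsets relative to `haystack`.
    fn find_at(&self, haystack: &[u8], at: usize) -> Option<Span>;

    /// Provided: layered on `find_at`.
    fn find(&self, haystack: &[u8]) -> Option<Span> {
        self.find_at(haystack, 0)
    }

    /// Provided: layered on `find`. An implementor could override this
    /// with something faster.
    fn is_match(&self, haystack: &[u8]) -> bool {
        self.find(haystack).is_some()
    }
}

/// A literal substring matcher that supplies only the required method.
pub struct Literal(pub Vec<u8>);

impl MiniMatcher for Literal {
    fn find_at(&self, haystack: &[u8], at: usize) -> Option<Span> {
        let needle = &self.0;
        if needle.is_empty() || at > haystack.len() {
            return None;
        }
        haystack[at..]
            .windows(needle.len())
            .position(|w| w == needle.as_slice())
            .map(|i| Span { start: at + i, end: at + i + needle.len() })
    }
}
```

With only find_at written, Literal gets find and is_match for free, which is the same shape the trait's default methods take.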
Required Associated Types
type Captures: Captures
type Error: Display
Required Methods
fn find_at(&self, haystack: &[u8], at: usize) -> Result<Option<Match>, Self::Error>
Returns the start and end byte range of the first match in haystack after at, where the byte offsets are relative to the start of haystack (and not at). If no match exists, then None is returned.
The text encoding of haystack is not strictly specified. Matchers are advised to assume UTF-8, or at worst, some ASCII compatible encoding.
The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when at == 0.
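To make the role of at concrete, here is a small self-contained sketch. Both functions are hand-written, hypothetical stand-ins (one for the anchored pattern \Afoo, one for the literal foo) and return plain (start, end) tuples instead of the crate's Match type. The point is the difference between searching from an offset and searching a sub-slice.

```rust
/// Hypothetical stand-in for a matcher of the anchored pattern `\Afoo`.
fn find_anchored_at(haystack: &[u8], at: usize) -> Option<(usize, usize)> {
    // `\A` can only match at the true start of the haystack.
    if at == 0 && haystack.starts_with(b"foo") {
        Some((0, 3))
    } else {
        None
    }
}

/// Hypothetical stand-in for a matcher of the unanchored literal `foo`.
fn find_literal_at(haystack: &[u8], at: usize) -> Option<(usize, usize)> {
    if at > haystack.len() {
        return None;
    }
    haystack[at..]
        .windows(3)
        .position(|w| w == b"foo")
        // Offsets are reported relative to `haystack`, not to `at`.
        .map(|i| (at + i, at + i + 3))
}
```

Searching b"bar foo" from at = 4 with the literal matcher reports (4, 7), not (0, 3). The anchored matcher finds a match in the sub-slice &haystack[4..] but none in the full haystack with at = 4, because slicing discards the surrounding context.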
fn new_captures(&self) -> Result<Self::Captures, Self::Error>
Creates an empty group of captures suitable for use with the capturing APIs of this trait.
Implementations that don’t support capturing groups should use the NoCaptures type and implement this method by calling NoCaptures::new().
Provided Methods
fn capture_count(&self) -> usize
Returns the total number of capturing groups in this matcher.
If a matcher supports capturing groups, then this value must always be at least 1, where the first capturing group always corresponds to the overall match.
If a matcher does not support capturing groups, then this should always return 0.
By default, capturing groups are not supported, so this always returns 0.
fn capture_index(&self, _name: &str) -> Option<usize>
Maps the given capture group name to its corresponding capture group index, if one exists. If one does not exist, then None is returned.
If the given capture group name maps to multiple indices, then it is not specified which one is returned. However, it is guaranteed that one of them is returned.
By default, capturing groups are not supported, so this always returns None.
fn find(&self, haystack: &[u8]) -> Result<Option<Match>, Self::Error>
Returns the start and end byte range of the first match in haystack. If no match exists, then None is returned.
The text encoding of haystack is not strictly specified. Matchers are advised to assume UTF-8, or at worst, some ASCII compatible encoding.
fn find_iter<F>(&self, haystack: &[u8], matched: F) -> Result<(), Self::Error>
where F: FnMut(Match) -> bool
Executes the given function over successive non-overlapping matches in haystack. If no match exists, then the given function is never called. If the function returns false, then iteration stops.
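A callback-driven iteration of this kind can be sketched on top of a find_at-style search. The code below is a simplified illustration, not the crate's implementation: find_at here is a hypothetical literal search over a non-empty needle, matches are plain (start, end) tuples, and error handling and empty matches are ignored.

```rust
/// Hypothetical literal search standing in for a matcher's `find_at`.
fn find_at(needle: &[u8], haystack: &[u8], at: usize) -> Option<(usize, usize)> {
    if needle.is_empty() || at > haystack.len() {
        return None;
    }
    haystack[at..]
        .windows(needle.len())
        .position(|w| w == needle)
        .map(|i| (at + i, at + i + needle.len()))
}

/// Calls `matched` for each non-overlapping match and stops as soon as
/// the callback returns false.
fn find_iter<F>(needle: &[u8], haystack: &[u8], mut matched: F)
where
    F: FnMut((usize, usize)) -> bool,
{
    let mut at = 0;
    while let Some((start, end)) = find_at(needle, haystack, at) {
        if !matched((start, end)) {
            return;
        }
        // Resume after the previous match so that matches never overlap.
        at = end;
    }
}
```

Searching for aa in aaaaa visits (0, 2) and (2, 4): the overlapping candidates at offsets 1 and 3 are skipped because each search resumes at the end of the previous match.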
fn find_iter_at<F>(&self, haystack: &[u8], at: usize, matched: F) -> Result<(), Self::Error>
where F: FnMut(Match) -> bool
Executes the given function over successive non-overlapping matches in haystack. If no match exists, then the given function is never called. If the function returns false, then iteration stops.
The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when at == 0.
fn try_find_iter<F, E>(&self, haystack: &[u8], matched: F) -> Result<Result<(), E>, Self::Error>
where F: FnMut(Match) -> Result<bool, E>
Executes the given function over successive non-overlapping matches in haystack. If no match exists, then the given function is never called. If the function returns false, then iteration stops.
Similarly, if the function returns an error then iteration stops and the error is yielded as the inner Err(E). If an error occurs while executing the search itself, then it is returned as the outer Self::Error.
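The nested return type separates two failure sources. Going by the signature, the outer layer carries errors from the search and the inner layer carries errors from the callback. The sketch below illustrates that reading with hypothetical stand-ins: SearchError plays the role of Self::Error, and find_at is a toy search for the byte x that is assumed, purely for illustration, to fail on haystacks containing a NUL byte.

```rust
/// Hypothetical stand-in for a matcher's `Self::Error`.
#[derive(Debug, PartialEq)]
struct SearchError;

/// Toy search for the byte `x`; assumed to fail on NUL bytes so that
/// there is a search error to demonstrate.
fn find_at(haystack: &[u8], at: usize) -> Result<Option<(usize, usize)>, SearchError> {
    if haystack.contains(&0) {
        return Err(SearchError);
    }
    Ok(haystack
        .iter()
        .skip(at)
        .position(|&b| b == b'x')
        .map(|i| (at + i, at + i + 1)))
}

fn try_find_iter<F, E>(haystack: &[u8], mut matched: F) -> Result<Result<(), E>, SearchError>
where
    F: FnMut((usize, usize)) -> Result<bool, E>,
{
    let mut at = 0;
    // `?` propagates a search failure through the outer Result.
    while let Some((start, end)) = find_at(haystack, at)? {
        match matched((start, end)) {
            Ok(true) => at = end,
            Ok(false) => return Ok(Ok(())),
            // A callback failure is handed back through the inner Result.
            Err(err) => return Ok(Err(err)),
        }
    }
    Ok(Ok(()))
}
```

A caller can therefore tell "the matcher broke" (outer Err) apart from "my callback gave up" (inner Err) without the two error types needing any conversion between them.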
fn try_find_iter_at<F, E>(&self, haystack: &[u8], at: usize, matched: F) -> Result<Result<(), E>, Self::Error>
where F: FnMut(Match) -> Result<bool, E>
Executes the given function over successive non-overlapping matches in haystack. If no match exists, then the given function is never called. If the function returns false, then iteration stops.
Similarly, if the function returns an error then iteration stops and the error is yielded as the inner Err(E). If an error occurs while executing the search itself, then it is returned as the outer Self::Error.
The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when at == 0.
fn captures(&self, haystack: &[u8], caps: &mut Self::Captures) -> Result<bool, Self::Error>
Populates the first set of capture group matches from haystack into caps. If no match exists, then false is returned.
The text encoding of haystack is not strictly specified. Matchers are advised to assume UTF-8, or at worst, some ASCII compatible encoding.
fn captures_iter<F>(&self, haystack: &[u8], caps: &mut Self::Captures, matched: F) -> Result<(), Self::Error>
where F: FnMut(&Self::Captures) -> bool
Executes the given function over successive non-overlapping matches in haystack with capture groups extracted from each match. If no match exists, then the given function is never called. If the function returns false, then iteration stops.
fn captures_iter_at<F>(&self, haystack: &[u8], at: usize, caps: &mut Self::Captures, matched: F) -> Result<(), Self::Error>
where F: FnMut(&Self::Captures) -> bool
Executes the given function over successive non-overlapping matches in haystack with capture groups extracted from each match. If no match exists, then the given function is never called. If the function returns false, then iteration stops.
The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when at == 0.
fn try_captures_iter<F, E>(&self, haystack: &[u8], caps: &mut Self::Captures, matched: F) -> Result<Result<(), E>, Self::Error>
where F: FnMut(&Self::Captures) -> Result<bool, E>
Executes the given function over successive non-overlapping matches in haystack with capture groups extracted from each match. If no match exists, then the given function is never called. If the function returns false, then iteration stops. Similarly, if the function returns an error then iteration stops and the error is yielded as the inner Err(E). If an error occurs while executing the search itself, then it is returned as the outer Self::Error.
fn try_captures_iter_at<F, E>(&self, haystack: &[u8], at: usize, caps: &mut Self::Captures, matched: F) -> Result<Result<(), E>, Self::Error>
where F: FnMut(&Self::Captures) -> Result<bool, E>
Executes the given function over successive non-overlapping matches in haystack with capture groups extracted from each match. If no match exists, then the given function is never called. If the function returns false, then iteration stops. Similarly, if the function returns an error then iteration stops and the error is yielded as the inner Err(E). If an error occurs while executing the search itself, then it is returned as the outer Self::Error.
The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when at == 0.
fn captures_at(&self, _haystack: &[u8], _at: usize, _caps: &mut Self::Captures) -> Result<bool, Self::Error>
Populates the first set of capture group matches from haystack into caps after at, where the byte offsets in each capturing group are relative to the start of haystack (and not at). If no match exists, then false is returned and the contents of the given capturing groups are unspecified.
The text encoding of haystack is not strictly specified. Matchers are advised to assume UTF-8, or at worst, some ASCII compatible encoding.
The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when at == 0.
By default, capturing groups aren’t supported, and this implementation will always behave as if a match were impossible.
Implementors that provide support for capturing groups must guarantee that when a match occurs, the first capture match (at index 0) is always set to the overall match offsets.
Note that if implementors seek to support capturing groups, then they should implement this method. Other methods that match based on captures will then work automatically.
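As a rough illustration of the index-0 guarantee, the sketch below hand-codes a matcher for the pattern (\w+)=(\w+) using ASCII word bytes. Everything here is hypothetical: Caps is a fixed array of optional (start, end) offsets standing in for a Captures implementation, and errors are omitted.

```rust
/// Hypothetical capture storage: index 0 is the overall match,
/// indices 1 and 2 are the two groups.
type Caps = [Option<(usize, usize)>; 3];

fn is_word(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Hand-written stand-in for `captures_at` with the pattern `(\w+)=(\w+)`.
fn captures_at(haystack: &[u8], at: usize, caps: &mut Caps) -> bool {
    for eq in at..haystack.len() {
        if haystack[eq] != b'=' {
            continue;
        }
        // Extend left over word bytes, but never before `at`.
        let mut start = eq;
        while start > at && is_word(haystack[start - 1]) {
            start -= 1;
        }
        // Extend right over word bytes.
        let mut end = eq + 1;
        while end < haystack.len() && is_word(haystack[end]) {
            end += 1;
        }
        if start < eq && end > eq + 1 {
            // Offsets are relative to `haystack`; group 0 spans the whole match.
            *caps = [Some((start, end)), Some((start, eq)), Some((eq + 1, end))];
            return true;
        }
    }
    false
}
```

On the haystack x: key=val; with at = 0, group 0 is (3, 10), group 1 is (3, 6) and group 2 is (7, 10). Starting at at = 4 shortens the match to (4, 10), since the search may not begin before at.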
fn replace<F>(&self, haystack: &[u8], dst: &mut Vec<u8>, append: F) -> Result<(), Self::Error>
where F: FnMut(Match, &mut Vec<u8>) -> bool
Replaces every match in the given haystack with the result of calling append. append is given the start and end of a match, along with a handle to the dst buffer provided.
If the given append function returns false, then replacement stops.
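One way such a replacement loop can work is sketched below: copy the unmatched bytes before each match into dst, let append write the replacement, and copy the tail at the end. This is an illustration under simplifying assumptions (a hypothetical literal find_at with a non-empty needle, tuples instead of Match, no error handling), not a statement of how the crate implements it.

```rust
/// Hypothetical literal search standing in for a matcher's `find_at`.
fn find_at(needle: &[u8], haystack: &[u8], at: usize) -> Option<(usize, usize)> {
    if needle.is_empty() || at > haystack.len() {
        return None;
    }
    haystack[at..]
        .windows(needle.len())
        .position(|w| w == needle)
        .map(|i| (at + i, at + i + needle.len()))
}

fn replace<F>(needle: &[u8], haystack: &[u8], dst: &mut Vec<u8>, mut append: F)
where
    F: FnMut((usize, usize), &mut Vec<u8>) -> bool,
{
    // End of the previous match; everything before it is already in `dst`.
    let mut last = 0;
    while let Some((start, end)) = find_at(needle, haystack, last) {
        // Copy the bytes between the previous match and this one.
        dst.extend_from_slice(&haystack[last..start]);
        last = end;
        // The callback writes the replacement and may stop the loop.
        if !append((start, end), dst) {
            break;
        }
    }
    // Copy whatever follows the last replaced match.
    dst.extend_from_slice(&haystack[last..]);
}
```

In this sketch, stopping early still copies the remainder of the haystack, so replacing only the first foo in a foo b foo yields a X b foo.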
fn replace_with_captures<F>(&self, haystack: &[u8], caps: &mut Self::Captures, dst: &mut Vec<u8>, append: F) -> Result<(), Self::Error>
where F: FnMut(&Self::Captures, &mut Vec<u8>) -> bool
Replaces every match in the given haystack with the result of calling append with the matching capture groups.
If the given append function returns false, then replacement stops.
fn replace_with_captures_at<F>(&self, haystack: &[u8], at: usize, caps: &mut Self::Captures, dst: &mut Vec<u8>, append: F) -> Result<(), Self::Error>
where F: FnMut(&Self::Captures, &mut Vec<u8>) -> bool
Replaces every match in the given haystack with the result of calling append with the matching capture groups.
If the given append function returns false, then replacement stops.
The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when at == 0.
fn is_match(&self, haystack: &[u8]) -> Result<bool, Self::Error>
Returns true if and only if the matcher matches the given haystack.
By default, this method is implemented by calling shortest_match.
fn is_match_at(&self, haystack: &[u8], at: usize) -> Result<bool, Self::Error>
Returns true if and only if the matcher matches the given haystack starting at the given position.
By default, this method is implemented by calling shortest_match_at.
The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when at == 0.
fn shortest_match(&self, haystack: &[u8]) -> Result<Option<usize>, Self::Error>
Returns an end location of the first match in haystack. If no match exists, then None is returned.
Note that the end location reported by this method may be less than the end location reported by find. For example, running find with the pattern a+ on the haystack aaa should report a range of [0, 3), but shortest_match may report 1 as the ending location since that is the place at which a match is guaranteed to occur.
This method should never report false positives or false negatives. The point of this method is that some implementors may be able to provide a faster implementation of this than what find does.
By default, this method is implemented by calling find.
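The a+ example can be written out by hand. The two functions below are hypothetical stand-ins for a matcher compiled from a+; they return plain offsets and omit errors. They only illustrate why the two end locations may differ while still agreeing on whether a match exists.

```rust
/// Hand-written stand-in for `find` with the pattern `a+`:
/// reports the full leftmost match.
fn find(haystack: &[u8]) -> Option<(usize, usize)> {
    let start = haystack.iter().position(|&b| b == b'a')?;
    let len = haystack[start..].iter().take_while(|&&b| b == b'a').count();
    Some((start, start + len))
}

/// Hand-written stand-in for `shortest_match` with the pattern `a+`:
/// stops as soon as a match is known to exist.
fn shortest_match(haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == b'a').map(|i| i + 1)
}
```

On aaa, find reports (0, 3) while shortest_match reports 1. Both return None on a haystack without any a, so neither produces a false positive or a false negative.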
fn shortest_match_at(&self, haystack: &[u8], at: usize) -> Result<Option<usize>, Self::Error>
Returns an end location of the first match in haystack starting at the given position. If no match exists, then None is returned.
Note that the end location reported by this method may be less than the end location reported by find. For example, running find with the pattern a+ on the haystack aaa should report a range of [0, 3), but shortest_match may report 1 as the ending location since that is the place at which a match is guaranteed to occur.
This method should never report false positives or false negatives. The point of this method is that some implementors may be able to provide a faster implementation of this than what find does.
By default, this method is implemented by calling find_at.
The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when at == 0.
fn non_matching_bytes(&self) -> Option<&ByteSet>
If available, return a set of bytes that will never appear in a match produced by an implementation.
Specifically, if such a set can be determined, then it’s possible for callers to perform additional operations on the basis that certain bytes may never match.
For example, if a search is configured to possibly produce results that span multiple lines but a caller provided pattern can never match across multiple lines, then it may make sense to divert to more optimized line oriented routines that don’t need to handle the multi-line match case.
Implementations that produce this set must never report false positives, but may produce false negatives. That is, if a byte is in this set then it must be guaranteed that it is never in a match. But, if a byte is not in this set, then callers cannot assume that a match exists with that byte.
By default, this returns None.
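A caller-side use of such a set might look like the sketch below. The ByteSet here is a local, hypothetical stand-in built from a 256-entry table (it is not the crate's ByteSet), and the pattern is assumed to be a plain literal so that the set is easy to compute: any byte absent from the literal can never appear in a match.

```rust
/// Local stand-in for a set of bytes; not the crate's `ByteSet`.
struct ByteSet([bool; 256]);

impl ByteSet {
    fn contains(&self, b: u8) -> bool {
        self.0[b as usize]
    }
}

/// For a literal pattern, every byte that does not occur in the literal
/// is guaranteed never to appear in a match.
fn non_matching_bytes(literal: &[u8]) -> ByteSet {
    let mut set = [true; 256];
    for &b in literal {
        set[b as usize] = false;
    }
    ByteSet(set)
}

/// Caller-side decision: if `\n` can never be part of a match, no match
/// can span lines, so a line oriented search routine is safe to use.
fn can_search_line_by_line(set: &ByteSet) -> bool {
    set.contains(b'\n')
}
```

The one-sided guarantee matters here: a byte being in the set licenses the optimization, while a byte being absent from the set says nothing about whether a match containing it exists.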
fn line_terminator(&self) -> Option<LineTerminator>
If this matcher was compiled as a line oriented matcher, then this method returns the line terminator if and only if the line terminator never appears in any match produced by this matcher. If this wasn’t compiled as a line oriented matcher, or if the aforementioned guarantee cannot be made, then this must return None, which is the default. It is never wrong to return None, but returning a line terminator when it can appear in a match results in unspecified behavior.
The line terminator is typically b'\n', but can be any single byte or CRLF.
By default, this returns None.
fn find_candidate_line(&self, haystack: &[u8]) -> Result<Option<LineMatchKind>, Self::Error>
Return one of the following: a confirmed line match, a candidate line match (which may be a false positive) or no match at all (which must not be a false negative). When reporting a confirmed or candidate match, the position returned can be any position in the line.
By default, this never returns a candidate match, and always either returns a confirmed match or no match at all.
When a matcher can match spans over multiple lines, then the behavior of this method is unspecified. Namely, use of this method only makes sense in a context where the caller is looking for the next matching line. That is, callers should only use this method when line_terminator does not return None.
Design rationale
A line matcher is, fundamentally, a normal matcher with the addition of one optional method: finding a line. By default, this routine is implemented via the matcher’s shortest_match method, which always yields either no match or a LineMatchKind::Confirmed. However, implementors may provide a routine for this that can return candidate lines that need subsequent verification to be confirmed as a match. This can be useful in cases where it may be quicker to find candidate lines via some other means instead of relying on the more general implementations for find and shortest_match.
For example, consider the regex \w+foo\s+. Both find and shortest_match must consider the entire regex, including the \w+ and \s+, while searching. However, this method could look for lines containing foo and return them as candidates. Finding foo might be implemented as a highly optimized substring search routine (like memmem), which is likely to be faster than whatever more generalized routine is required for resolving \w+foo\s+. The caller is then responsible for confirming whether a match exists or not.
Note that while this method may report false positives, it must never report false negatives. That is, it can never skip over lines that contain a match.
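The candidate-then-confirm split for \w+foo\s+ can be sketched as follows. This is an illustration only: LineKind is a local stand-in for LineMatchKind, the substring scan is a naive window search rather than an optimized memmem, and confirm is a hand-written ASCII approximation of the full regex applied to a single line.

```rust
/// Local stand-in for the two kinds of line match.
#[derive(Debug, PartialEq)]
enum LineKind {
    Confirmed(usize),
    Candidate(usize),
}

fn is_word(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Cheap pre-filter: any occurrence of `foo` marks a candidate.
/// This may report false positives but never skips a real match,
/// because every match of `\w+foo\s+` must contain `foo`.
fn find_candidate_line(haystack: &[u8]) -> Option<LineKind> {
    haystack
        .windows(3)
        .position(|w| w == b"foo")
        .map(LineKind::Candidate)
}

/// Caller-side verification: `foo` must be preceded by a word byte and
/// followed by a whitespace byte somewhere in the line.
fn confirm(line: &[u8]) -> bool {
    (1..line.len().saturating_sub(3)).any(|i| {
        &line[i..i + 3] == b"foo"
            && is_word(line[i - 1])
            && line[i + 3].is_ascii_whitespace()
    })
}
```

The line foo on its own is reported as a candidate but fails confirmation, which is the false-positive case the documentation allows. A line without foo is never reported at all.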