Struct sequoia_openpgp::regex::RegexSet
pub struct RegexSet { /* private fields */ }
A set of regular expressions.
A RegexSet encapsulates a set of regular expressions. The regular expressions are compiled according to the rules defined in Section 8 of RFC 4880, modulo two differences. First, the compiler only works on UTF-8 strings (not bytes). Second, ranges in character classes are between UTF-8 characters, not just ASCII characters. Further, by default, strings that don’t pass a sanity check (in particular, strings that include Unicode control characters) never match. This behavior can be customized using RegexSet::disable_sanitizations.
RegexSet implements the semantics of regular expressions used in Trust Signatures. In particular, a RegexSet makes it easier to deal with trust signatures that:
- Contain multiple Regular Expression subpackets,
- Have no Regular Expression subpackets, and/or
- Include one or more Regular Expression subpackets that are invalid.
RegexSet compiles each regular expression individually. If there are no regular expressions, the RegexSet matches everything. If a regular expression is invalid, RegexSet treats it as if it doesn’t match anything. Thus, if all regular expressions are invalid, the RegexSet matches nothing (not everything!).
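The matching rules above can be illustrated without a regex engine. The following std-only Rust sketch is a model, not Sequoia’s implementation: the names Compiled, set_matches, and ends_with_example_org are hypothetical, each compiled regular expression is represented by a plain predicate, None stands in for an expression that failed to compile, and the sanity check is ignored.

```rust
/// Hypothetical model of a compiled regular expression:
/// `Some(pred)` for a valid expression, `None` for one that failed to compile.
type Compiled = Option<fn(&str) -> bool>;

/// Models the set semantics described above (sanity checks are ignored).
fn set_matches(set: &[Compiled], s: &str) -> bool {
    if set.is_empty() {
        // No regular expressions at all: the set matches everything.
        return true;
    }
    // Otherwise at least one valid expression must match.
    // Invalid expressions (`None`) never match.
    set.iter().any(|re| match re {
        Some(pred) => pred(s),
        None => false,
    })
}

/// Hypothetical stand-in for a compiled "<...@example.org>" expression.
fn ends_with_example_org(s: &str) -> bool {
    s.ends_with("@example.org>")
}
```

Modelling an invalid expression as an entry that never matches, rather than dropping it, is what makes a set whose expressions are all invalid match nothing instead of everything.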
See the module-level documentation for more details.
§A note on equality
We define equality on RegexSet as the equality of the uncompiled regular expressions given to the constructor and of whether sanitizations are enabled.
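One way to read the equality rule is that two sets compare equal when a key made of their uncompiled sources and their sanitization setting compares equal. The std-only sketch below models that key; EqualityKey is a hypothetical name, and the fact that this model compares the expressions in order is an assumption of the sketch, not something stated above.

```rust
/// Hypothetical key capturing what the equality rule compares:
/// the uncompiled regular expressions and the sanitization setting.
/// Compiled state is deliberately left out of the comparison.
#[derive(Debug, Clone, PartialEq, Eq)]
struct EqualityKey {
    /// The regular expressions exactly as given to the constructor.
    sources: Vec<Vec<u8>>,
    /// Whether sanitizations were disabled.
    sanitizations_disabled: bool,
}
```

A consequence of comparing sources rather than behavior is that sets built from different sources, such as "" and ".?", are unequal even though both match every string.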
Implementations§
impl RegexSet
pub fn new<'a, RE, I>(res: I) -> Result<Self>
Parses and compiles the regular expressions.
Invalid regular expressions do not cause this to fail. See RegexSet’s top-level documentation for details.
By default, strings that don’t pass a sanity check (in particular, strings that include Unicode control characters) never match. This behavior can be customized using RegexSet::disable_sanitizations.
§Examples
use sequoia_openpgp as openpgp;
use openpgp::regex::RegexSet;
// One valid and one invalid regular expression. The invalid one
// never matches, but it doesn't stop the valid one from matching.
let res = &[
"<[^>]+[@.]example\\.org>$",
// Invalid.
"[..",
];
let res = RegexSet::new(res)?;
assert!(res.is_match("Alice <alice@example.org>"));
assert!(! res.is_match("Bob <bob@example.com>"));
pub fn from_bytes<'a, I, RE>(res: I) -> Result<Self>
Parses and compiles the regular expressions.
The regular expressions are first converted to UTF-8 strings. Byte sequences that are not valid UTF-8 strings are considered to be invalid regular expressions. Invalid regular expressions do not cause this to fail. See RegexSet’s top-level documentation for details.
By default, strings that don’t pass a sanity check (in particular, strings that include Unicode control characters) never match. This behavior can be customized using RegexSet::disable_sanitizations.
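The conversion step described above can be sketched with std::str::from_utf8. The helper to_utf8_sources is hypothetical and not Sequoia’s code; it only illustrates the rule that a byte sequence which is not valid UTF-8 becomes an invalid regular expression (None here) rather than an error for the whole set.

```rust
/// Hypothetical helper: converts raw regular expressions to UTF-8.
/// Invalid byte sequences map to `None`, i.e. they are treated as
/// invalid regular expressions instead of failing the whole conversion.
fn to_utf8_sources<'a>(res: &[&'a [u8]]) -> Vec<Option<&'a str>> {
    res.iter()
        .map(|&re| std::str::from_utf8(re).ok())
        .collect()
}
```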
§Examples
use sequoia_openpgp as openpgp;
use openpgp::regex::RegexSet;
// A valid and an invalid UTF-8 byte sequence. The invalid
// sequence doesn't match anything. But, that doesn't impact
// the other regular expressions.
let res: &[ &[u8] ] = &[
&b"<[^>]+[@.]example\\.org>$"[..],
// Invalid UTF-8.
&b"\xC3\x28"[..],
];
assert!(std::str::from_utf8(res[0]).is_ok());
assert!(std::str::from_utf8(res[1]).is_err());
let re_set = RegexSet::from_bytes(res.into_iter())?;
assert!(re_set.is_match("Alice <alice@example.org>"));
assert!(! re_set.is_match("Bob <bob@example.com>"));
// If we only have invalid UTF-8 strings, then nothing
// matches.
let res: &[ &[u8] ] = &[
// Invalid UTF-8.
&b"\xC3\x28"[..],
];
assert!(std::str::from_utf8(res[0]).is_err());
let re_set = RegexSet::from_bytes(res.into_iter())?;
assert!(! re_set.is_match("Alice <alice@example.org>"));
assert!(! re_set.is_match("Bob <bob@example.com>"));
// But, if we have no regular expressions, everything matches.
let res: &[ &[u8] ] = &[];
let re_set = RegexSet::from_bytes(res.into_iter())?;
assert!(re_set.is_match("Alice <alice@example.org>"));
assert!(re_set.is_match("Bob <bob@example.com>"));
pub fn as_bytes(&self) -> &[Vec<u8>]
Returns the bytes-representation of the regular expressions.
pub fn from_signature(sig: &Signature) -> Result<Self>
Creates a RegexSet from the regular expressions stored in a trust signature.
This method is a convenience function, which extracts any regular expressions from a Trust Signature and wraps them in a RegexSet.
A signature is a valid trust signature if its type is GenericCertification, PersonaCertification, CasualCertification, or PositiveCertification, and the Trust Signature subpacket is present. If the signature is not a valid trust signature, this returns an error.
By default, strings that don’t pass a sanity check (in particular, strings that include Unicode control characters) never match. This behavior can be customized using RegexSet::disable_sanitizations.
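The validity condition above can be restated as a small predicate. This std-only sketch uses a hypothetical SigType enum in place of Sequoia’s openpgp::types::SignatureType (which has more variants), and a boolean in place of inspecting the signature’s subpackets.

```rust
/// Hypothetical stand-in for the signature types named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SigType {
    GenericCertification,
    PersonaCertification,
    CasualCertification,
    PositiveCertification,
    /// Any other signature type.
    Other,
}

/// Returns whether a signature of type `typ` counts as a valid trust
/// signature: it must be one of the four certification types AND carry
/// a Trust Signature subpacket.
fn is_valid_trust_signature(typ: SigType, has_trust_signature_subpacket: bool) -> bool {
    let is_certification = matches!(
        typ,
        SigType::GenericCertification
            | SigType::PersonaCertification
            | SigType::CasualCertification
            | SigType::PositiveCertification
    );
    is_certification && has_trust_signature_subpacket
}
```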
§Examples
use sequoia_openpgp as openpgp;
use openpgp::packet::Signature;
use openpgp::regex::RegexSet;
// certification is a trust signature, which contains two regular
// expressions: one that matches all mail addresses for 'example.org'
// and another that matches all mail addresses for 'example.com'.
// Obtaining it (e.g., from a certificate) is elided here.
let certification: &Signature = todo!();
// Extract the regex and compile it.
let res = RegexSet::from_signature(certification)?;
// Some positive examples.
assert!(res.is_match("Alice <alice@example.org>"));
assert!(res.is_match("Bob <bob@example.com>"));
// Wrong domain.
assert!(! res.is_match("Carol <carol@acme.com>"));
// The standard regex, "<[^>]+[@.]example\\.org>$" only matches
// email addresses wrapped in <>.
assert!(! res.is_match("dave@example.com"));
// And, it is case sensitive.
assert!(res.is_match("Ellen <ellen@example.com>"));
assert!(! res.is_match("Ellen <ellen@EXAMPLE.COM>"));
pub fn everything() -> Result<Self>
Returns a RegexSet that matches everything.
Note: sanitizations are still enabled. So, to really match everything, you still need to call RegexSet::disable_sanitizations.
This can be used to optimize the evaluation of scoping rules along a path: if a RegexSet matches everything, then it doesn’t further constrain the path.
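The optimization above can be sketched as follows, under two stated assumptions: a User ID must be matched by every scope along a path, and the User ID has already passed the sanity check. Scope, path_allows, anything, and example_org are hypothetical names used only for this std-only model; with Sequoia, the flag would presumably come from RegexSet::matches_everything and the predicate from RegexSet::is_match.

```rust
/// Hypothetical model of one scope along a certification path.
struct Scope {
    /// Whether the scope is known to match everything.
    matches_everything: bool,
    /// Stand-in for the scope's matching function.
    is_match: fn(&str) -> bool,
}

/// Returns whether `user_id` falls within every scope along `path`.
/// Scopes known to match everything are skipped without evaluating
/// their predicate, since they cannot constrain the path.
fn path_allows(path: &[Scope], user_id: &str) -> bool {
    path.iter()
        .filter(|scope| !scope.matches_everything)
        .all(|scope| (scope.is_match)(user_id))
}

fn anything(_: &str) -> bool {
    true
}

fn example_org(s: &str) -> bool {
    s.ends_with("@example.org>")
}
```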
pub fn matches_everything(&self) -> bool
Returns whether a RegexSet matches everything.
Normally, this only returns true if the RegexSet was created using RegexSet::everything. RegexSet::new, RegexSet::from_bytes, and RegexSet::from_signature do detect some regular expressions that match everything (e.g., if no regular expressions are supplied). But, they do not guarantee that a RegexSet containing a regular expression like .?, which does in fact match everything, is detected as matching everything.
§Examples
use sequoia_openpgp as openpgp;
use openpgp::regex::RegexSet;
assert!(RegexSet::everything()?.matches_everything());
let empty: &[ &str ] = &[];
assert!(RegexSet::new(empty)?.matches_everything());
// A regular expression that matches everything. But
// `RegexSet` returns false, because it can't detect it.
let res: &[ &str ] = &[
&".?"[..],
];
let re_set = RegexSet::new(res.into_iter())?;
assert!(! re_set.matches_everything());
pub fn disable_sanitizations(&mut self, allowed: bool)
Controls whether strings with control characters are allowed.
If allowed is false (the default), sanity checks are enabled: a string that doesn’t pass the sanity check (in particular, one that contains a Unicode control character according to char::is_control, including newlines and an embedded NUL byte) never matches, i.e., RegexSet::is_match and RegexSet::matches_userid return false for it. Passing true disables the sanity checks.
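The sanity check and the effect of the flag can be modelled with char::is_control from the standard library. This std-only sketch is an illustration, not Sequoia’s code: passes_sanity_check and gated_match are hypothetical, and matcher stands in for the regular expression set.

```rust
/// Models the sanity check: a string passes if it contains no character
/// for which `char::is_control` is true (this includes '\n' and '\0').
fn passes_sanity_check(s: &str) -> bool {
    !s.chars().any(char::is_control)
}

/// Models how the flag gates matching. `disable_sanitizations` mirrors
/// the `allowed` argument; `matcher` stands in for the regex set.
fn gated_match(disable_sanitizations: bool, s: &str, matcher: impl Fn(&str) -> bool) -> bool {
    if !disable_sanitizations && !passes_sanity_check(s) {
        // Sanity checks are enabled and the string failed them.
        return false;
    }
    matcher(s)
}
```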
pub fn is_match(&self, s: &str) -> bool
Returns whether the regular expression set matches the string.
If sanity checks are enabled (the default) and the string doesn’t pass the sanity check (in particular, it contains a Unicode control character according to char::is_control, including newlines and an embedded NUL byte), this returns false.
If the RegexSet contains one or more regular expressions, this method returns whether at least one of the regular expressions matches. Invalid regular expressions never match.
If the RegexSet does not contain any regular expressions (valid or otherwise), this method returns true.
§Examples
use sequoia_openpgp as openpgp;
use openpgp::regex::RegexSet;
// A regular expression that matches anything. (Note: this is
// equivalent to providing no regular expressions.)
let res: &[ &str ] = &[
&""[..],
];
let re_set = RegexSet::new(res.into_iter())?;
assert!(re_set.is_match("Alice Lovelace <alice@example.org>"));
// If a User ID has an embedded control character, it doesn't
// match.
assert!(! re_set.is_match("Alice <alice@example.org>\0"));
pub fn matches_userid(&self, u: &UserID) -> bool
Returns whether the regular expression set matches the User ID.
If the User ID is not a valid UTF-8 string, this returns false.
If sanity checks are enabled (the default) and the string doesn’t pass the sanity check (in particular, it contains a Unicode control character according to char::is_control, including newlines and an embedded NUL byte), this returns false.
If the RegexSet contains one or more regular expressions, this method returns whether at least one of the regular expressions matches. Invalid regular expressions never match.
If the RegexSet does not contain any regular expressions (valid or otherwise), this method returns true.
§Examples
use sequoia_openpgp as openpgp;
use openpgp::packet::UserID;
use openpgp::regex::RegexSet;
// A regular expression that matches anything. (Note: this is
// equivalent to providing no regular expressions.)
let res: &[ &str ] = &[
"",
];
let re_set = RegexSet::new(res.into_iter())?;
assert!(re_set.matches_userid(
&UserID::from(&b"Alice Lovelace <alice@example.org>"[..])));
// If a User ID is not valid UTF-8, it never matches.
assert!(! re_set.matches_userid(
&UserID::from(&b"Alice \xC3\x28 Lovelace <alice@example.org>"[..])));
// If a User ID has an embedded control character, it doesn't
// match.
assert!(! re_set.matches_userid(
&UserID::from(&b"Alice <alice@example.org>\0"[..])));
Trait Implementations§
impl PartialEq for RegexSet
impl Eq for RegexSet
Auto Trait Implementations§
impl Freeze for RegexSet
impl RefUnwindSafe for RegexSet
impl Send for RegexSet
impl Sync for RegexSet
impl Unpin for RegexSet
impl UnwindSafe for RegexSet
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
default unsafe fn clone_to_uninit(&self, dst: *mut T)