Struct sequoia_openpgp::regex::RegexSet

pub struct RegexSet { /* private fields */ }

A set of regular expressions.

A RegexSet encapsulates a set of regular expressions. The regular expressions are compiled according to the rules defined in Section 8 of RFC 4880 modulo two differences. First, the compiler only works on UTF-8 strings (not bytes). Second, ranges in character classes are between UTF-8 characters, not just ASCII characters. Further, by default, strings that don’t pass a sanity check (in particular, include Unicode control characters) never match. This behavior can be customized using RegexSet::disable_sanitizations.

RegexSet implements the semantics of the regular expressions used in Trust Signatures. In particular, a RegexSet makes it easier to deal with trust signatures that:

  • Contain multiple Regular Expression subpackets,
  • Have no Regular Expression subpackets, and/or
  • Include one or more Regular Expression subpackets that are invalid.

RegexSet compiles each regular expression individually. If there are no regular expressions, the RegexSet matches everything. If a regular expression is invalid, RegexSet treats it as if it doesn’t match anything. Thus, if all regular expressions are invalid, the RegexSet matches nothing (not everything!).
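The matching rules above can be modelled in a few lines. What follows is a minimal sketch using only the standard library, not the crate's implementation: `ToySet` and its `patterns` field are hypothetical names, and a plain suffix test stands in for a compiled regular expression.

```rust
/// Toy model of the set semantics. `Some(suffix)` models a regular
/// expression that compiled (here: "the string ends with `suffix`");
/// `None` models a regular expression that failed to compile.
struct ToySet {
    patterns: Vec<Option<&'static str>>,
}

impl ToySet {
    fn is_match(&self, s: &str) -> bool {
        // No regular expressions at all: the set matches everything.
        if self.patterns.is_empty() {
            return true;
        }
        // Otherwise at least one valid pattern has to match. Invalid
        // patterns (`None`) are skipped, so they never match.
        self.patterns
            .iter()
            .flatten()
            .any(|suffix| s.ends_with(*suffix))
    }
}
```

With `patterns` set to `vec![Some("@example.org>"), None]`, the invalid entry is ignored and only strings ending in `@example.org>` match. With `vec![None]` nothing matches, and with an empty vector everything does.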

See the module-level documentation for more details.

§A note on equality

We define equality on RegexSet as the equality of the uncompiled regular expressions given to the constructor and whether sanitizations are enabled.
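One way to picture this definition is a struct whose derived equality looks only at the raw patterns and the sanitization flag. This is a sketch under assumptions: `ToySet`, `raw`, and `sanitizations_disabled` are hypothetical names, and the private fields of the real RegexSet may differ.

```rust
/// Hypothetical model of the state that takes part in equality.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ToySet {
    /// The uncompiled regular expressions, as given to the constructor.
    raw: Vec<Vec<u8>>,
    /// Whether the sanity checks have been turned off.
    sanitizations_disabled: bool,
}

impl ToySet {
    /// Stores the patterns verbatim; sanity checks start out enabled.
    fn new(raw: &[&str]) -> Self {
        ToySet {
            raw: raw.iter().map(|re| re.as_bytes().to_vec()).collect(),
            sanitizations_disabled: false,
        }
    }
}
```

Under this model, two sets built from the same pattern text compare equal, while two patterns that are spelled differently compare unequal even if they happen to accept the same strings.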

Implementations§

impl RegexSet

pub fn new<'a, RE, I>(res: I) -> Result<Self>
where RE: Borrow<&'a str>, I: IntoIterator<Item = RE>,

Parses and compiles the regular expressions.

Invalid regular expressions do not cause this to fail. See RegexSet’s top-level documentation for details.

By default, strings that don’t pass a sanity check (in particular, include Unicode control characters) never match. This behavior can be customized using RegexSet::disable_sanitizations.

§Examples
use sequoia_openpgp as openpgp;
use openpgp::regex::RegexSet;

// Extract the regex and compile it.
let res = &[
    "<[^>]+[@.]example\\.org>$",
    // Invalid.
    "[..",
];

let res = RegexSet::new(res)?;

assert!(res.is_match("Alice <alice@example.org>"));
assert!(! res.is_match("Bob <bob@example.com>"));

pub fn from_bytes<'a, I, RE>(res: I) -> Result<Self>
where I: IntoIterator<Item = RE>, RE: Borrow<&'a [u8]>,

Parses and compiles the regular expressions.

The regular expressions are first converted to UTF-8 strings. Byte sequences that are not valid UTF-8 strings are considered to be invalid regular expressions. Invalid regular expressions do not cause this to fail. See RegexSet’s top-level documentation for details.
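The conversion step can be sketched with the standard library alone. `decode_patterns` is a hypothetical helper, not part of the crate: it maps each byte sequence to `Some(&str)` when it is valid UTF-8 and to `None` otherwise, where `None` plays the role of an invalid regular expression.

```rust
/// Hypothetical helper: decodes each raw pattern as UTF-8. A byte
/// sequence that is not valid UTF-8 becomes `None`, modelling a
/// regular expression that is treated as invalid.
fn decode_patterns<'a>(raw: &[&'a [u8]]) -> Vec<Option<&'a str>> {
    raw.iter()
        .map(|bytes| std::str::from_utf8(*bytes).ok())
        .collect()
}
```

The result always has one entry per input, so a failed conversion does not cause the whole operation to fail and does not affect the other patterns.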

By default, strings that don’t pass a sanity check (in particular, include Unicode control characters) never match. This behavior can be customized using RegexSet::disable_sanitizations.

§Examples
use sequoia_openpgp as openpgp;
use openpgp::regex::RegexSet;

// A valid and an invalid UTF-8 byte sequence.  The invalid
// sequence doesn't match anything.  But, that doesn't impact
// the other regular expressions.
let res: &[ &[u8] ] = &[
    &b"<[^>]+[@.]example\\.org>$"[..],
    // Invalid UTF-8.
    &b"\xC3\x28"[..],
];
assert!(std::str::from_utf8(res[0]).is_ok());
assert!(std::str::from_utf8(res[1]).is_err());

let re_set = RegexSet::from_bytes(res.into_iter())?;

assert!(re_set.is_match("Alice <alice@example.org>"));
assert!(! re_set.is_match("Bob <bob@example.com>"));

// If we only have invalid UTF-8 strings, then nothing
// matches.
let res: &[ &[u8] ] = &[
    // Invalid UTF-8.
    &b"\xC3\x28"[..],
];
assert!(std::str::from_utf8(res[0]).is_err());

let re_set = RegexSet::from_bytes(res.into_iter())?;

assert!(! re_set.is_match("Alice <alice@example.org>"));
assert!(! re_set.is_match("Bob <bob@example.com>"));


// But, if we have no regular expressions, everything matches.
let res: &[ &[u8] ] = &[];
let re_set = RegexSet::from_bytes(res.into_iter())?;

assert!(re_set.is_match("Alice <alice@example.org>"));
assert!(re_set.is_match("Bob <bob@example.com>"));

pub fn as_bytes(&self) -> &[Vec<u8>]

Returns the bytes-representation of the regular expressions.

pub fn from_signature(sig: &Signature) -> Result<Self>

Creates a RegexSet from the regular expressions stored in a trust signature.

This method is a convenience function, which extracts any regular expressions from a Trust Signature and wraps them in a RegexSet.

A valid trust signature has the type GenericCertification, PersonaCertification, CasualCertification, or PositiveCertification, and includes a Trust Signature subpacket. If the signature is not a valid trust signature, this returns an error.

By default, strings that don’t pass a sanity check (in particular, include Unicode control characters) never match. This behavior can be customized using RegexSet::disable_sanitizations.

§Examples
use sequoia_openpgp as openpgp;
use openpgp::packet::Signature;
use openpgp::regex::RegexSet;

// certification is a trust signature, which contains two regular
// expressions: one that matches all mail addresses for 'example.org'
// and another that matches all mail addresses for 'example.com'.
let certification: &Signature = // ...;

// Extract the regex and compile it.
let res = RegexSet::from_signature(certification)?;

// Some positive examples.
assert!(res.is_match("Alice <alice@example.org>"));
assert!(res.is_match("Bob <bob@example.com>"));

// Wrong domain.
assert!(! res.is_match("Carol <carol@acme.com>"));

// The standard regex, "<[^>]+[@.]example\\.org>$" only matches
// email addresses wrapped in <>.
assert!(! res.is_match("dave@example.com"));

// And, it is case sensitive.
assert!(res.is_match("Ellen <ellen@example.com>"));
assert!(! res.is_match("Ellen <ellen@EXAMPLE.COM>"));

pub fn everything() -> Result<Self>

Returns a RegexSet that matches everything.

Note: sanitizations are still enabled. So, to really match everything, you still need to call RegexSet::disable_sanitizations.

This can be used to optimize the evaluation of scoping rules along a path: if a RegexSet matches everything, then it doesn’t further constrain the path.
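The optimization described here might look as follows. This is a sketch under assumptions, not the crate's path evaluation: `ToyScope` and `path_admits` are hypothetical names, a suffix test stands in for the compiled regular expressions, and sanity checks are left out.

```rust
/// Hypothetical stand-in for one scoping rule on a certification path.
struct ToyScope {
    /// Set when the rule is known not to constrain anything.
    matches_everything: bool,
    /// A suffix test stands in for the compiled regular expressions.
    suffix: &'static str,
}

impl ToyScope {
    fn is_match(&self, s: &str) -> bool {
        self.matches_everything || s.ends_with(self.suffix)
    }
}

/// Returns whether every scoping rule along `path` admits `userid`.
fn path_admits(path: &[ToyScope], userid: &str) -> bool {
    path.iter()
        // A rule that matches everything cannot reject the User ID,
        // so it is skipped without being evaluated.
        .filter(|scope| !scope.matches_everything)
        .all(|scope| scope.is_match(userid))
}
```

Only the rules that actually constrain the path are evaluated. A path made up entirely of unconstrained rules admits every User ID.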

pub fn matches_everything(&self) -> bool

Returns whether a RegexSet matches everything.

Normally, this only returns true if the RegexSet was created using RegexSet::everything. RegexSet::new, RegexSet::from_bytes, and RegexSet::from_signature do detect some sets that match everything (e.g., if no regular expressions are supplied). But they do not guarantee that a RegexSet containing a regular expression like .?, which does in fact match everything, is detected as matching everything.

§Examples
use sequoia_openpgp as openpgp;
use openpgp::regex::RegexSet;

assert!(RegexSet::everything()?.matches_everything());
let empty: &[ &str ] = &[];
assert!(RegexSet::new(empty)?.matches_everything());

// A regular expression that matches everything.  But
// `RegexSet` returns false, because it can't detect it.
let res: &[ &str ] = &[
    &".?"[..],
];
let re_set = RegexSet::new(res.into_iter())?;
assert!(! re_set.matches_everything());

pub fn disable_sanitizations(&mut self, allowed: bool)

Controls whether strings with control characters are allowed.

If allowed is false (the default), sanity checks are enabled: a string that doesn’t pass the sanity check (in particular, one that contains a Unicode control character according to char::is_control, including newlines and an embedded NUL byte) never matches. If allowed is true, the sanity checks are skipped.
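The effect of the flag can be approximated with `char::is_control` from the standard library. This is a sketch, not the crate's code: `passes_sanity_check` and `guarded_match` are hypothetical helpers, `matcher` stands in for the regular expression set, and the real sanity check may reject more than control characters.

```rust
/// Approximates the documented sanity check: a string passes if it
/// contains no Unicode control character.
fn passes_sanity_check(s: &str) -> bool {
    !s.chars().any(char::is_control)
}

/// Hypothetical helper: applies `matcher` to `s`, running the sanity
/// check first unless sanitizations have been disabled.
fn guarded_match(
    s: &str,
    sanitizations_disabled: bool,
    matcher: impl Fn(&str) -> bool,
) -> bool {
    if !sanitizations_disabled && !passes_sanity_check(s) {
        return false;
    }
    matcher(s)
}
```

With a matcher that accepts anything, a string ending in `\0` is rejected while the check is enabled and accepted once it is disabled.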

pub fn is_match(&self, s: &str) -> bool

Returns whether the regular expression set matches the string.

If sanity checks are enabled (the default) and the string doesn’t pass the sanity check (in particular, it contains a Unicode control character according to char::is_control, including newlines and an embedded NUL byte), this returns false.

If the RegexSet contains one or more regular expressions, this method returns whether at least one of the regular expressions matches. Invalid regular expressions never match.

If the RegexSet does not contain any regular expressions (valid or otherwise), this method returns true.

§Examples
use sequoia_openpgp as openpgp;
use openpgp::regex::RegexSet;

// A regular expression that matches anything.  (Note: this is
// equivalent to providing no regular expressions.)
let res: &[ &str ] = &[
    &""[..],
];
let re_set = RegexSet::new(res.into_iter())?;

assert!(re_set.is_match("Alice Lovelace <alice@example.org>"));

// If a User ID has an embedded control character, it doesn't
// match.
assert!(! re_set.is_match("Alice <alice@example.org>\0"));

pub fn matches_userid(&self, u: &UserID) -> bool

Returns whether the regular expression set matches the User ID.

If the User ID is not a valid UTF-8 string, this returns false.

If sanity checks are enabled (the default) and the string doesn’t pass the sanity check (in particular, it contains a Unicode control character according to char::is_control, including newlines and an embedded NUL byte), this returns false.

If the RegexSet contains one or more regular expressions, this method returns whether at least one of the regular expressions matches. Invalid regular expressions never match.

If the RegexSet does not contain any regular expressions (valid or otherwise), this method returns true.
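Taken together, the checks described above form a short pipeline over the raw bytes of a User ID. The following sketch uses only the standard library: `userid_matches` is a hypothetical helper, `matcher` stands in for the regular expression set, and sanity checks are assumed to be enabled.

```rust
/// Hypothetical helper mirroring the documented checks for a raw
/// User ID; `matcher` stands in for the regular expression set.
fn userid_matches(raw: &[u8], matcher: impl Fn(&str) -> bool) -> bool {
    // A User ID that is not valid UTF-8 never matches.
    let s = match std::str::from_utf8(raw) {
        Ok(s) => s,
        Err(_) => return false,
    };
    // With sanity checks enabled, a control character rules out a match.
    if s.chars().any(char::is_control) {
        return false;
    }
    // Only now is the set itself consulted.
    matcher(s)
}
```

Even a matcher that accepts every string returns false for a User ID that contains invalid UTF-8 or an embedded NUL byte, because both are rejected before the matcher runs.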

§Examples
use sequoia_openpgp as openpgp;
use openpgp::packet::UserID;
use openpgp::regex::RegexSet;

// A regular expression that matches anything.  (Note: this is
// equivalent to providing no regular expressions.)
let res: &[ &str ] = &[
    "",
];
let re_set = RegexSet::new(res.into_iter())?;

assert!(re_set.matches_userid(
    &UserID::from(&b"Alice Lovelace <alice@example.org>"[..])));

// If a User ID is not valid UTF-8, it never matches.
assert!(! re_set.matches_userid(
    &UserID::from(&b"Alice \xC3\x28 Lovelace <alice@example.org>"[..])));

// If a User ID has an embedded control character, it doesn't
// match.
assert!(! re_set.matches_userid(
    &UserID::from(&b"Alice <alice@example.org>\0"[..])));

Trait Implementations§

impl Clone for RegexSet

fn clone(&self) -> RegexSet

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for RegexSet

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl PartialEq for RegexSet

fn eq(&self, other: &Self) -> bool

This method tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

This method tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Eq for RegexSet
