Struct regex_syntax::hir::literal::Extractor


pub struct Extractor { /* private fields */ }


Extracts prefix or suffix literal sequences from Hir expressions.

Literal extraction is based on the following observations:

1. Many regexes start with one or a small number of literals.
2. Substring search for literals is often much faster (sometimes by an order of magnitude) than a regex search.

Thus, in many cases, one can search for literals to find candidate starting locations of a match, and then only run the full regex engine at each such location instead of over the full haystack.
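As a rough illustration of this candidate-then-confirm strategy, here is a minimal sketch that uses only the standard library. The names find_with_prefilter and verify_foo_digits are hypothetical and are not part of regex_syntax; the verifier is a hand-written stand-in for a regex engine matching something like foo[0-9]+, and a real implementation would call into an actual engine instead.

```rust
/// Hypothetical helper (not part of regex_syntax): finds candidate start
/// positions with a plain substring search and calls `verify` only at those
/// positions. `verify` stands in for the full regex engine and returns the
/// end offset of a match that starts at the given position.
fn find_with_prefilter<F>(haystack: &str, literal: &str, verify: F) -> Option<(usize, usize)>
where
    F: Fn(&str, usize) -> Option<usize>,
{
    let mut start = 0;
    while start <= haystack.len() {
        // Fast path: skip directly to the next occurrence of the literal.
        let candidate = start + haystack[start..].find(literal)?;
        // Slow path: confirm the candidate with the "engine".
        if let Some(end) = verify(haystack, candidate) {
            return Some((candidate, end));
        }
        // False positive: resume one character past the candidate.
        let step = haystack[candidate..].chars().next().map_or(1, char::len_utf8);
        start = candidate + step;
    }
    None
}

/// Hand-written stand-in for an engine matching `foo[0-9]+` anchored at `at`.
fn verify_foo_digits(haystack: &str, at: usize) -> Option<usize> {
    let rest = haystack[at..].strip_prefix("foo")?;
    let digits = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
    if digits == 0 {
        None
    } else {
        Some(at + "foo".len() + digits)
    }
}
```

Calling find_with_prefilter("foo bar foox foo42 baz", "foo", verify_foo_digits) visits three candidates and confirms only the last one. Every rejected candidate costs one round trip between the substring search and the verifier.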

The main downside of literal extraction is that it can wind up causing a search to be slower overall. For example, if there are many matches or if there are many candidates that don’t ultimately lead to a match, then a lot of overhead will be spent shuffling back and forth between substring search and the regex engine. This is the fundamental reason why literal optimizations for regex patterns are sometimes considered a “black art.”

Look-around assertions

Literal extraction treats all look-around assertions as if they match every empty string. So for example, the regex \bquux\b will yield a sequence containing a single exact literal quux. However, not all occurrences of quux correspond to a match of the regex. For example, \bquux\b does not match ZquuxZ anywhere because quux does not fall on a word boundary.

In effect, if your regex contains look-around assertions, then a match of an exact literal does not necessarily mean the regex overall matches. So you may still need to run the regex engine in such cases to confirm the match.
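The \bquux\b case can be sketched with the standard library alone. The helper names below are hypothetical, and the word-boundary test is a simplified ASCII-only assumption (a regex engine's \b may be Unicode-aware). The sketch shows only why a literal hit still needs a confirmation step when look-around assertions are present.

```rust
/// ASCII-only notion of a word byte; an assumption made for this sketch.
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// True if offset `at` sits between a word byte and a non-word byte
/// (the start and end of the haystack count as non-word).
fn at_word_boundary(haystack: &[u8], at: usize) -> bool {
    let before = at > 0 && is_word_byte(haystack[at - 1]);
    let after = at < haystack.len() && is_word_byte(haystack[at]);
    before != after
}

/// Hypothetical stand-in for searching with `\bWORD\b`: the literal search
/// proposes candidates and the boundary checks confirm or reject them.
fn find_whole_word(haystack: &str, word: &str) -> Option<usize> {
    let bytes = haystack.as_bytes();
    let mut start = 0;
    while start <= haystack.len() {
        let at = start + haystack[start..].find(word)?;
        let end = at + word.len();
        if at_word_boundary(bytes, at) && at_word_boundary(bytes, end) {
            return Some(at);
        }
        // The literal matched but the assertions did not; keep looking.
        let step = haystack[at..].chars().next().map_or(1, char::len_utf8);
        start = at + step;
    }
    None
}
```

With these definitions, find_whole_word("ZquuxZ", "quux") finds the literal at offset 1 but rejects it, while find_whole_word("a quux b", "quux") confirms the candidate at offset 2.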

The precise guarantee you get from a literal sequence is: if every literal in the sequence is exact and the original regex contains zero look-around assertions, then a preference-order multi-substring search of those literals will precisely match a preference-order search of the original regex.
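To make "preference-order multi-substring search" concrete, here is a naive sketch, assuming the term means leftmost-first semantics: the earliest match position wins, and among literals matching at that position, the one listed first wins. The function find_leftmost_first is hypothetical and deliberately simple; a production search would use a dedicated multi-pattern algorithm rather than this quadratic loop.

```rust
/// Naive leftmost-first search over an ordered list of literals.
/// Returns the match position and the literal that matched there.
fn find_leftmost_first<'a>(haystack: &str, literals: &[&'a str]) -> Option<(usize, &'a str)> {
    for (pos, _) in haystack.char_indices() {
        // Earlier literals take priority over later ones at the same position.
        for &lit in literals {
            if haystack[pos..].starts_with(lit) {
                return Some((pos, lit));
            }
        }
    }
    None
}
```

For the haystack "xfoobar", the literals ["foo", "foobar"] report foo at offset 1, while the reordered list ["foobar", "foo"] reports foobar at the same offset. This mirrors how the order of alternation branches affects which match a preference-order regex search reports.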

Example

This shows how to extract prefixes:

use regex_syntax::{hir::literal::{Extractor, Literal, Seq}, parse};

// The `?` operator needs an enclosing function that returns a `Result`.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let hir = parse(r"(a|b|c)(x|y|z)[A-Z]+foo")?;
    let got = Extractor::new().extract(&hir);
    // All literals returned are "inexact" because none of them reach the
    // match state.
    let expected = Seq::from_iter([
        Literal::inexact("ax"),
        Literal::inexact("ay"),
        Literal::inexact("az"),
        Literal::inexact("bx"),
        Literal::inexact("by"),
        Literal::inexact("bz"),
        Literal::inexact("cx"),
        Literal::inexact("cy"),
        Literal::inexact("cz"),
    ]);
    assert_eq!(expected, got);
    Ok(())
}

This shows how to extract suffixes:

use regex_syntax::{
    hir::literal::{Extractor, ExtractKind, Literal, Seq},
    parse,
};

// The `?` operator needs an enclosing function that returns a `Result`.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let hir = parse(r"foo|[A-Z]+bar")?;
    let got = Extractor::new().kind(ExtractKind::Suffix).extract(&hir);
    // Since 'foo' gets to a match state, it is considered exact. But 'bar'
    // does not because of the '[A-Z]+', and thus is marked inexact.
    let expected = Seq::from_iter([
        Literal::exact("foo"),
        Literal::inexact("bar"),
    ]);
    assert_eq!(expected, got);
    Ok(())
}

Implementations

impl Extractor

pub fn new() -> Extractor

pub fn extract(&self, hir: &Hir) -> Seq

pub fn kind(&mut self, kind: ExtractKind) -> &mut Extractor

pub fn limit_class(&mut self, limit: usize) -> &mut Extractor

pub fn limit_repeat(&mut self, limit: usize) -> &mut Extractor

pub fn limit_literal_len(&mut self, limit: usize) -> &mut Extractor

pub fn limit_total(&mut self, limit: usize) -> &mut Extractor

Trait Implementations

impl Clone for Extractor

fn clone(&self) -> Extractor

fn clone_from(&mut self, source: &Self)

impl Debug for Extractor

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl Default for Extractor

fn default() -> Extractor

Auto Trait Implementations

impl RefUnwindSafe for Extractor

impl Send for Extractor

impl Sync for Extractor

impl Unpin for Extractor

impl UnwindSafe for Extractor

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> ToOwned for T where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>