Struct tantivy_fst::Regex

pub struct Regex { /* private fields */ }

A regular expression for searching FSTs with Unicode support.

Regular expressions are compiled down to a deterministic finite automaton that can efficiently search any finite state transducer. Notably, most regular expressions only need to explore a small portion of a finite state transducer without loading all of it into memory.
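The pruning idea can be sketched with the standard library alone. The sketch below is not this crate's implementation: `Trie` is a toy stand-in for a finite state transducer, `PrefixDfa` is a hand-built automaton accepting every key that begins with a given prefix, and `search`, `build`, and `node_count` are hypothetical helpers. It only illustrates how an automaton that reports a dead state lets a traversal skip whole subtrees.

```rust
use std::collections::BTreeMap;

/// Toy trie standing in for a finite state transducer (keys only, no outputs).
#[derive(Default)]
struct Trie {
    children: BTreeMap<u8, Trie>,
    is_key: bool,
}

impl Trie {
    fn insert(&mut self, key: &[u8]) {
        let mut node = self;
        for &b in key {
            node = node.children.entry(b).or_default();
        }
        node.is_key = true;
    }
}

/// Hand-built DFA accepting every key that begins with `prefix`.
/// State = number of prefix bytes matched so far; `None` = dead state
/// (a convention of this sketch).
struct PrefixDfa {
    prefix: Vec<u8>,
}

impl PrefixDfa {
    fn start(&self) -> Option<usize> {
        Some(0)
    }
    fn is_match(&self, s: &Option<usize>) -> bool {
        *s == Some(self.prefix.len())
    }
    fn can_match(&self, s: &Option<usize>) -> bool {
        s.is_some()
    }
    fn accept(&self, s: &Option<usize>, byte: u8) -> Option<usize> {
        let i = (*s)?;
        if i == self.prefix.len() {
            Some(i) // prefix already seen: any further byte is fine
        } else if self.prefix[i] == byte {
            Some(i + 1)
        } else {
            None
        }
    }
}

/// Walks the trie and the DFA in lockstep, skipping every subtree whose
/// DFA state can no longer lead to a match.
/// Returns the matching keys (sorted) and the number of trie nodes visited.
fn search(trie: &Trie, dfa: &PrefixDfa) -> (Vec<String>, usize) {
    let mut matches = Vec::new();
    let mut visited = 0;
    let mut stack = vec![(trie, dfa.start(), Vec::<u8>::new())];
    while let Some((node, state, key)) = stack.pop() {
        visited += 1;
        if node.is_key && dfa.is_match(&state) {
            matches.push(String::from_utf8_lossy(&key).into_owned());
        }
        for (&b, child) in &node.children {
            let next = dfa.accept(&state, b);
            if dfa.can_match(&next) {
                let mut k = key.clone();
                k.push(b);
                stack.push((child, next, k));
            }
        }
    }
    matches.sort();
    (matches, visited)
}

/// Builds a trie from a list of keys.
fn build(keys: &[&str]) -> Trie {
    let mut t = Trie::default();
    for k in keys {
        t.insert(k.as_bytes());
    }
    t
}

/// Counts every node in the trie, including the root.
fn node_count(trie: &Trie) -> usize {
    1 + trie.children.values().map(node_count).sum::<usize>()
}
```

Calling `search` on a trie built from a handful of words with a short prefix visits only the nodes along that prefix and below it, which can be compared against `node_count` for the whole trie.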

Syntax

Regex supports fully featured regular expressions. Namely, it supports all of the same constructs as the standard regex crate except for the following things:

  1. Lazy quantifiers, since a regular expression automaton only reports whether a key matches at all, and not its location. Namely, lazy quantifiers such as +? only modify the location of a match, but never change a non-match into a match or a match into a non-match.
  2. Word boundaries (i.e., \b), because they are hard, though not impossible, to implement in a deterministic finite automaton. They may be allowed some day.
  3. Other zero width assertions like ^ and $. These are easier to support than word boundaries, but are still tricky and usually aren’t as useful when searching dictionaries.

Otherwise, the full syntax of the regex crate is supported. This includes all Unicode support and relevant flags. (The U and m flags are no-ops because of (1) and (3) above, respectively.)

Matching semantics

A regular expression matches a key in a finite state transducer if and only if it matches from the start of the key all the way to its end. Stated differently, every regular expression (re) is matched as if it were ^(re)$. This means that if you want to do a substring match, then you must use .*substring.*.
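The whole-key rule can be illustrated with a toy matcher. The sketch below is an assumption-laden stand-in, not this crate's engine: `full_match` and `matches_key` are hypothetical names, the matcher works on bytes, and it understands only literal bytes, `.`, and a trailing `*` on a single byte or `.`. Like the semantics described above, it succeeds only when the pattern consumes the entire key.

```rust
/// Toy whole-key matcher: literals, `.` (any byte), and `x*` (zero or more `x`).
/// Returns true only if `pattern` consumes all of `key`.
fn full_match(pattern: &[u8], key: &[u8]) -> bool {
    match pattern {
        // An empty pattern matches only an empty remainder.
        [] => key.is_empty(),
        // `p*`: either skip it, or consume one matching byte and try again.
        [p, b'*', rest @ ..] => {
            full_match(rest, key)
                || (!key.is_empty()
                    && (*p == b'.' || *p == key[0])
                    && full_match(pattern, &key[1..]))
        }
        // A single literal byte or `.` must consume exactly one byte.
        [p, rest @ ..] => {
            !key.is_empty()
                && (*p == b'.' || *p == key[0])
                && full_match(rest, &key[1..])
        }
    }
}

/// Convenience wrapper over string slices.
fn matches_key(pattern: &str, key: &str) -> bool {
    full_match(pattern.as_bytes(), key.as_bytes())
}
```

With this matcher, `matches_key("foo", "seafood")` fails because the pattern does not cover the whole key, while `matches_key(".*foo.*", "seafood")` succeeds.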

Caution: Starting a regular expression with .* means that it could potentially match any key in a finite state transducer. This implies that all keys could be visited, which could be slow. It is possible that this crate will grow facilities for detecting regular expressions that will scan a large portion of a transducer and optionally disallow them.

Implementations

impl Regex

pub fn new(re: &str) -> Result<Regex, Error>

Create a new regular expression query.

The query finds all terms matching the regular expression.

If the regular expression is malformed or if it results in an automaton that is too big, then an error is returned.

A Regex value satisfies the Automaton trait, which means it can be used with the search method of any finite state transducer.

Trait Implementations

impl Automaton for Regex

type State = Option<usize>

The type of the state used in the automaton.

fn start(&self) -> Option<usize>

Returns a single start state for this automaton.

fn is_match(&self, state: &Option<usize>) -> bool

Returns true if and only if state is a match state.

fn can_match(&self, state: &Option<usize>) -> bool

Returns true if and only if state can lead to a match in zero or more steps.

fn accept(&self, state: &Option<usize>, byte: u8) -> Option<usize>

Return the next state given state and an input.

fn will_always_match(&self, _state: &Self::State) -> bool

Returns true if and only if state matches and must match no matter what steps are taken.
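How these five methods cooperate can be sketched with a local trait. Everything below is illustrative: `ToyAutomaton` only mirrors the method shapes listed above and is not the crate's `Automaton` trait, `StartsWithAb` is a hand-built automaton, `run` is a hypothetical driver, and treating `None` as a dead state is a convention of this sketch.

```rust
/// Local stand-in mirroring the method shapes listed above.
trait ToyAutomaton {
    type State;
    fn start(&self) -> Self::State;
    fn is_match(&self, state: &Self::State) -> bool;
    fn can_match(&self, state: &Self::State) -> bool;
    fn will_always_match(&self, state: &Self::State) -> bool;
    fn accept(&self, state: &Self::State, byte: u8) -> Self::State;
}

/// Hand-built DFA for keys that begin with `ab`.
/// Some(0): nothing seen, Some(1): saw `a`, Some(2): saw `ab`, None: dead.
struct StartsWithAb;

impl ToyAutomaton for StartsWithAb {
    type State = Option<usize>;

    fn start(&self) -> Option<usize> {
        Some(0)
    }
    fn is_match(&self, s: &Option<usize>) -> bool {
        *s == Some(2)
    }
    fn can_match(&self, s: &Option<usize>) -> bool {
        s.is_some()
    }
    fn will_always_match(&self, s: &Option<usize>) -> bool {
        // Once `ab` has been seen, no later byte can undo the match.
        *s == Some(2)
    }
    fn accept(&self, s: &Option<usize>, byte: u8) -> Option<usize> {
        match (*s, byte) {
            (Some(0), b'a') => Some(1),
            (Some(1), b'b') => Some(2),
            (Some(2), _) => Some(2),
            _ => None,
        }
    }
}

/// Feeds `key` to the automaton one byte at a time, stopping early when the
/// outcome is already decided. Returns (matched, bytes consumed).
fn run<A: ToyAutomaton>(aut: &A, key: &[u8]) -> (bool, usize) {
    let mut state = aut.start();
    let mut consumed = 0;
    for &b in key {
        if aut.will_always_match(&state) {
            return (true, consumed);
        }
        if !aut.can_match(&state) {
            return (false, consumed);
        }
        state = aut.accept(&state, b);
        consumed += 1;
    }
    (aut.is_match(&state), consumed)
}
```

In this sketch `can_match` lets the driver give up as soon as a key is hopeless, and `will_always_match` lets it stop reading as soon as a match is guaranteed.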

fn starts_with(self) -> StartsWith<Self> where Self: Sized,

Returns an automaton that matches the strings that start with something this automaton matches.

fn union<Rhs: Automaton>(self, rhs: Rhs) -> Union<Self, Rhs> where Self: Sized,

Returns an automaton that matches the strings matched by either this or the other automaton.

fn intersection<Rhs: Automaton>(self, rhs: Rhs) -> Intersection<Self, Rhs> where Self: Sized,

Returns an automaton that matches the strings matched by both this and the other automaton.

fn complement(self) -> Complement<Self> where Self: Sized,

Returns an automaton that matches the strings not matched by this automaton.
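One common way to build such combinators is to run the underlying automata side by side and combine their verdicts. The sketch below shows that idea with local types only; whether this crate's `Union`, `Intersection`, and `Complement` work exactly this way is an assumption. `ToyAutomaton`, `Prefix`, and `matches` are hypothetical names, and the local combinator structs merely share names with the return types listed above.

```rust
/// Minimal local stand-in for an automaton over bytes.
trait ToyAutomaton {
    type State;
    fn start(&self) -> Self::State;
    fn is_match(&self, state: &Self::State) -> bool;
    fn accept(&self, state: &Self::State, byte: u8) -> Self::State;
}

/// Accepts every key that begins with the given bytes.
/// State = bytes of the prefix matched so far; None = dead.
struct Prefix(&'static [u8]);

impl ToyAutomaton for Prefix {
    type State = Option<usize>;
    fn start(&self) -> Option<usize> {
        Some(0)
    }
    fn is_match(&self, s: &Option<usize>) -> bool {
        *s == Some(self.0.len())
    }
    fn accept(&self, s: &Option<usize>, byte: u8) -> Option<usize> {
        let i = (*s)?;
        if i == self.0.len() || self.0[i] == byte {
            Some((i + 1).min(self.0.len()))
        } else {
            None
        }
    }
}

/// Matches when either side matches: both automata advance in lockstep.
struct Union<A, B>(A, B);

impl<A: ToyAutomaton, B: ToyAutomaton> ToyAutomaton for Union<A, B> {
    type State = (A::State, B::State);
    fn start(&self) -> Self::State {
        (self.0.start(), self.1.start())
    }
    fn is_match(&self, s: &Self::State) -> bool {
        self.0.is_match(&s.0) || self.1.is_match(&s.1)
    }
    fn accept(&self, s: &Self::State, byte: u8) -> Self::State {
        (self.0.accept(&s.0, byte), self.1.accept(&s.1, byte))
    }
}

/// Matches only when both sides match.
struct Intersection<A, B>(A, B);

impl<A: ToyAutomaton, B: ToyAutomaton> ToyAutomaton for Intersection<A, B> {
    type State = (A::State, B::State);
    fn start(&self) -> Self::State {
        (self.0.start(), self.1.start())
    }
    fn is_match(&self, s: &Self::State) -> bool {
        self.0.is_match(&s.0) && self.1.is_match(&s.1)
    }
    fn accept(&self, s: &Self::State, byte: u8) -> Self::State {
        (self.0.accept(&s.0, byte), self.1.accept(&s.1, byte))
    }
}

/// Matches exactly the keys the inner automaton rejects.
struct Complement<A>(A);

impl<A: ToyAutomaton> ToyAutomaton for Complement<A> {
    type State = A::State;
    fn start(&self) -> A::State {
        self.0.start()
    }
    fn is_match(&self, s: &A::State) -> bool {
        !self.0.is_match(s)
    }
    fn accept(&self, s: &A::State, byte: u8) -> A::State {
        self.0.accept(s, byte)
    }
}

/// Runs an automaton over a whole key and reports the final verdict.
fn matches<A: ToyAutomaton>(aut: &A, key: &str) -> bool {
    let mut state = aut.start();
    for &b in key.as_bytes() {
        state = aut.accept(&state, b);
    }
    aut.is_match(&state)
}
```

For example, `Union(Prefix(b"ab"), Prefix(b"cd"))` accepts keys starting with either prefix, and `Complement(Prefix(b"ab"))` accepts every key that does not start with `ab`.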

impl Debug for Regex

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations

impl RefUnwindSafe for Regex

impl Send for Regex

impl Sync for Regex

impl Unpin for Regex

impl UnwindSafe for Regex

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
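These conversion impls come from the standard library and apply to every sized type, so they can be illustrated without this crate. In the sketch below, `Handle` is a hypothetical stand-in type and the three helper functions are illustrative names; each one simply returns its argument through a different blanket impl.

```rust
use std::convert::{Infallible, TryFrom};

/// Hypothetical stand-in for any sized type `T`.
#[derive(Debug, PartialEq)]
struct Handle(u32);

/// `From<T> for T`: returns the argument unchanged.
fn identity_via_from(h: Handle) -> Handle {
    Handle::from(h)
}

/// `Into<U> for T where U: From<T>`: here `U` is `Handle` itself.
fn identity_via_into(h: Handle) -> Handle {
    h.into()
}

/// `TryFrom<U> for T where U: Into<T>`: the error type is `Infallible`,
/// so the conversion cannot fail.
fn identity_via_try_from(h: Handle) -> Result<Handle, Infallible> {
    Handle::try_from(h)
}
```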