regex_automata::util

Module prefilter

Source
Expand description

Defines a prefilter for accelerating regex searches.

A prefilter can be created by building a Prefilter value.

A prefilter represents one of the most important optimizations available for accelerating regex searches. The idea of a prefilter is to very quickly find candidate locations in a haystack where a regex could match. Once a candidate is found, it is then intended for the regex engine to run at that position to determine whether the candidate is a match or a false positive.

In the aforementioned description of the prefilter optimization also lay its demise. Namely, if a prefilter has a high false positive rate and it produces lots of candidates, then a prefilter can overall make a regex search slower. It can run more slowly because more time is spent ping-ponging between the prefilter search and the regex engine attempting to confirm each candidate as a match. This ping-ponging has overhead that adds up, and is exacerbated by a high false positive rate.

Nevertheless, the optimization is still generally worth performing in most cases. Particularly given just how much throughput can be improved. (It is not uncommon for prefilter optimizations to improve throughput by one or two orders of magnitude.)

Typically a prefilter is used to find occurrences of literal prefixes from a regex pattern, but this isn’t required. A prefilter can be used to look for suffixes or even inner literals.

Note that as of now, prefilters throw away information about which pattern each literal comes from. In other words, when a prefilter finds a match, there’s no way to know which pattern (or patterns) it came from. Therefore, in order to confirm a match, you’ll have to check all of the patterns by running the full regex engine.

Structs§

  • A prefilter for accelerating regex searches.