The README has a quick introduction to the performance of this crate.
This will look at some examples and compare them to the Oniguruma engine.
## Catastrophic backtracking
Backtracking engines can have worst-case performance when the regular
expression forces the engine to consider an exponentially increasing
number of sub-cases.
For a good explanation of that, read
[Runaway Regular Expressions: Catastrophic Backtracking][].
Let's look at the regex from the README again:
(a|b|ab)*bc
And the input text:
ababababababababababababababababababababababababababababac
Python's engine has exponential runtime. The regex crate and fancy-regex
however have no problem with it.
## Oniguruma
[Oniguruma][] implements a backtracking
engine. So we'd expect it to have a problem with the above regex too.
However, in the above case, it quickly finds that there's no match. How
is that possible? The answer is that it has optimizations which
sometimes help it avoid having to do any matching at all:
In the pattern `(a|b|ab)*bc`, you might notice that if the input doesn't
contain `bc`, the pattern will never match. Oniguruma detects that and,
before it tries to do any matching, tries to find `bc` in the input.
But what happens if we add `bc` at the end of the input, like this:
ababababababababababababababababababababababababababababacbc
Now the optimization doesn't help anymore, and Oniguruma is slow too.
## fancy-regex
For `(a|b|ab)*bc` fancy-regex is fast in all cases because it can
delegate to the regex crate which matches it in linear runtime.
Let's look at another regex, one that makes use of a "fancy" look-ahead:
(a|b|ab)*(?=c)
When fancy-regex matches it against this input:
abababababababababababababababababababababababababababab
It's slow! The reason is that `(?=c)` is not supported by the regex
crate, so we need to handle it with backtracking. And because
`(a|b|ab)*` is before it, we need to do it with backtracking as well.
Oniguruma doesn't have a problem with this particular case because its
optimization saves it again: It checks if there's a `c` in the input
before doing any matching.
There's nothing preventing fancy-regex from adding similar optimizations
in the future, but it's not done yet.
Note that how much fancy-regex can do without backtracking depends on
the structure of the regex. For example, with `(?=(a|b|ab)*bc)`, the
inner part of the look-ahead can be delegated to regex entirely.
### Summary
* If the regex doesn't use fancy features, fancy-regex should have
linear runtime compared to Oniguruma's exponential worst-case.
* Even if the regex doesn't use any fancy features, Oniguruma can be
faster because it is a mature and highly optimized engine.
* With fancy features, Oniguruma can be faster because of optimizations.
[Runaway Regular Expressions: Catastrophic Backtracking]: https://www.regular-expressions.info/catastrophic.html
[Oniguruma]: https://github.com/kkos/oniguruma