* Consider refactoring the NFA representation such that it can be instantly
loaded from a `&[u8]`, just like a sparse DFA. The main downside is that this
could negatively impact the runtime performance of using the NFA, in exchange
for eliminating deserialization costs. Before doing this, we should write
PikeVM and backtracking implementations so that they can be benchmarked.
(A hypothetical layout sketch follows this list.)
* Add support for capture groups to the NFA.
* Once we're happy, re-organize the public API such that NFAs are exported
and usable on their own.
* Investigate why NFA shrinking seems to produce bigger DFAs after
determinization, even though it makes determinization substantially faster.
This might be because shrinking makes use of sparse NFA states, which have a
lower constant overhead associated with them. (A dense-versus-sparse state
sketch also follows this list.)