pub struct DFA<T> { /* private fields */ }
Expand description
A dense table-based deterministic finite automaton (DFA).
All dense DFAs have one or more start states, zero or more match states and a transition table that maps the current state and the current byte of input to the next state. A DFA can use this information to implement fast searching. In particular, the use of a dense DFA generally makes the trade off that match speed is the most valuable characteristic, even if building the DFA may take significant time and space. (More concretely, building a DFA takes time and space that is exponential in the size of the pattern in the worst case.) As such, the processing of every byte of input is done with a small constant number of operations that does not vary with the pattern, its size or the size of the alphabet. If your needs don’t line up with this trade off, then a dense DFA may not be an adequate solution to your problem.
In contrast, a sparse::DFA
makes the opposite
trade off: it uses less space but will execute a variable number of
instructions per byte at match time, which makes it slower for matching.
(Note that space usage is still exponential in the size of the pattern in
the worst case.)
A DFA can be built using the default configuration via the
DFA::new
constructor. Otherwise, one can
configure various aspects via dense::Builder
.
A single DFA fundamentally supports the following operations:
- Detection of a match.
- Location of the end of a match.
- In the case of a DFA with multiple patterns, which pattern matched is reported as well.
A notable absence from the above list of capabilities is the location of
the start of a match. In order to provide both the start and end of
a match, two DFAs are required. This functionality is provided by a
Regex
.
Type parameters
A DFA
has one type parameter, T
, which is used to represent state IDs,
pattern IDs and accelerators. T
is typically a Vec<u32>
or a &[u32]
.
The Automaton
trait
This type implements the Automaton
trait, which means it can be used
for searching. For example:
use regex_automata::{dfa::{Automaton, dense::DFA}, HalfMatch};
let dfa = DFA::new("foo[0-9]+")?;
let expected = HalfMatch::must(0, 8);
assert_eq!(Some(expected), dfa.find_leftmost_fwd(b"foo12345")?);
Implementations
sourceimpl DFA<Vec<u32>>
impl DFA<Vec<u32>>
sourcepub fn new(pattern: &str) -> Result<DFA<Vec<u32>>, Error>
pub fn new(pattern: &str) -> Result<DFA<Vec<u32>>, Error>
Parse the given regular expression using a default configuration and return the corresponding DFA.
If you want a non-default configuration, then use the
dense::Builder
to set your own configuration.
Example
use regex_automata::{dfa::{Automaton, dense}, HalfMatch};
let dfa = dense::DFA::new("foo[0-9]+bar")?;
let expected = HalfMatch::must(0, 11);
assert_eq!(Some(expected), dfa.find_leftmost_fwd(b"foo12345bar")?);
sourcepub fn new_many<P: AsRef<str>>(patterns: &[P]) -> Result<DFA<Vec<u32>>, Error>
pub fn new_many<P: AsRef<str>>(patterns: &[P]) -> Result<DFA<Vec<u32>>, Error>
Parse the given regular expressions using a default configuration and return the corresponding multi-DFA.
If you want a non-default configuration, then use the
dense::Builder
to set your own configuration.
Example
use regex_automata::{dfa::{Automaton, dense}, HalfMatch};
let dfa = dense::DFA::new_many(&["[0-9]+", "[a-z]+"])?;
let expected = HalfMatch::must(1, 3);
assert_eq!(Some(expected), dfa.find_leftmost_fwd(b"foo12345bar")?);
sourceimpl DFA<Vec<u32>>
impl DFA<Vec<u32>>
sourcepub fn always_match() -> Result<DFA<Vec<u32>>, Error>
pub fn always_match() -> Result<DFA<Vec<u32>>, Error>
Create a new DFA that matches every input.
Example
use regex_automata::{dfa::{Automaton, dense}, HalfMatch};
let dfa = dense::DFA::always_match()?;
let expected = HalfMatch::must(0, 0);
assert_eq!(Some(expected), dfa.find_leftmost_fwd(b"")?);
assert_eq!(Some(expected), dfa.find_leftmost_fwd(b"foo")?);
sourceimpl<T: AsRef<[u32]>> DFA<T>
impl<T: AsRef<[u32]>> DFA<T>
sourcepub fn as_ref(&self) -> DFA<&[u32]>
pub fn as_ref(&self) -> DFA<&[u32]>
Cheaply return a borrowed version of this dense DFA. Specifically,
the DFA returned always uses &[u32]
for its transition table.
sourcepub fn to_owned(&self) -> DFA<Vec<u32>>
pub fn to_owned(&self) -> DFA<Vec<u32>>
Return an owned version of this sparse DFA. Specifically, the DFA
returned always uses Vec<u32>
for its transition table.
Effectively, this returns a dense DFA whose transition table lives on the heap.
sourcepub fn has_starts_for_each_pattern(&self) -> bool
pub fn has_starts_for_each_pattern(&self) -> bool
Returns true only if this DFA has starting states for each pattern.
When a DFA has starting states for each pattern, then a search with the
DFA can be configured to only look for anchored matches of a specific
pattern. Specifically, APIs like Automaton::find_earliest_fwd_at
can accept a non-None pattern_id
if and only if this method returns
true. Otherwise, calling find_earliest_fwd_at
will panic.
Note that if the DFA has no patterns, this always returns false.
sourcepub fn alphabet_len(&self) -> usize
pub fn alphabet_len(&self) -> usize
Returns the total number of elements in the alphabet for this DFA.
That is, this returns the total number of transitions that each state in this DFA must have. Typically, a normal byte oriented DFA would always have an alphabet size of 256, corresponding to the number of unique values in a single byte. However, this implementation has two peculiarities that impact the alphabet length:
- Every state has a special “EOI” transition that is only followed
after the end of some haystack is reached. This EOI transition is
necessary to account for one byte of look-ahead when implementing
things like
\b
and$
. - Bytes are grouped into equivalence classes such that no two bytes in
the same class can distinguish a match from a non-match. For example,
in the regex
^[a-z]+$
, the ASCII bytesa-z
could all be in the same equivalence class. This leads to a massive space savings.
Note though that the alphabet length does not necessarily equal the
total stride space taken up by a single DFA state in the transition
table. Namely, for performance reasons, the stride is always the
smallest power of two that is greater than or equal to the alphabet
length. For this reason, DFA::stride
or DFA::stride2
are
often more useful. The alphabet length is typically useful only for
informational purposes.
sourcepub fn stride2(&self) -> usize
pub fn stride2(&self) -> usize
Returns the total stride for every state in this DFA, expressed as the exponent of a power of 2. The stride is the amount of space each state takes up in the transition table, expressed as a number of transitions. (Unused transitions map to dead states.)
The stride of a DFA is always equivalent to the smallest power of 2 that is greater than or equal to the DFA’s alphabet length. This definition uses extra space, but permits faster translation between premultiplied state identifiers and contiguous indices (by using shifts instead of relying on integer division).
For example, if the DFA’s stride is 16 transitions, then its stride2
is 4
since 2^4 = 16
.
The minimum stride2
value is 1
(corresponding to a stride of 2
)
while the maximum stride2
value is 9
(corresponding to a stride of
512
). The maximum is not 8
since the maximum alphabet size is 257
when accounting for the special EOI transition. However, an alphabet
length of that size is exceptionally rare since the alphabet is shrunk
into equivalence classes.
sourcepub fn stride(&self) -> usize
pub fn stride(&self) -> usize
Returns the total stride for every state in this DFA. This corresponds to the total number of transitions used by each state in this DFA’s transition table.
Please see DFA::stride2
for more information. In particular, this
returns the stride as the number of transitions, where as stride2
returns it as the exponent of a power of 2.
sourcepub fn memory_usage(&self) -> usize
pub fn memory_usage(&self) -> usize
Returns the memory usage, in bytes, of this DFA.
The memory usage is computed based on the number of bytes used to represent this DFA.
This does not include the stack size used up by this DFA. To
compute that, use std::mem::size_of::<dense::DFA>()
.
sourceimpl<T: AsRef<[u32]>> DFA<T>
impl<T: AsRef<[u32]>> DFA<T>
Routines for converting a dense DFA to other representations, such as sparse DFAs or raw bytes suitable for persistent storage.
sourcepub fn to_sparse(&self) -> Result<DFA<Vec<u8>>, Error>
pub fn to_sparse(&self) -> Result<DFA<Vec<u8>>, Error>
Convert this dense DFA to a sparse DFA.
If a StateID
is too small to represent all states in the sparse
DFA, then this returns an error. In most cases, if a dense DFA is
constructable with StateID
then a sparse DFA will be as well.
However, it is not guaranteed.
Example
use regex_automata::{dfa::{Automaton, dense}, HalfMatch};
let dense = dense::DFA::new("foo[0-9]+")?;
let sparse = dense.to_sparse()?;
let expected = HalfMatch::must(0, 8);
assert_eq!(Some(expected), sparse.find_leftmost_fwd(b"foo12345")?);
sourcepub fn to_bytes_little_endian(&self) -> (Vec<u8>, usize)
pub fn to_bytes_little_endian(&self) -> (Vec<u8>, usize)
Serialize this DFA as raw bytes to a Vec<u8>
in little endian
format. Upon success, the Vec<u8>
and the initial padding length are
returned.
The written bytes are guaranteed to be deserialized correctly and
without errors in a semver compatible release of this crate by a
DFA
’s deserialization APIs (assuming all other criteria for the
deserialization APIs has been satisfied):
The padding returned is non-zero if the returned Vec<u8>
starts at
an address that does not have the same alignment as u32
. The padding
corresponds to the number of leading bytes written to the returned
Vec<u8>
.
Example
This example shows how to serialize and deserialize a DFA:
use regex_automata::{dfa::{Automaton, dense::DFA}, HalfMatch};
// Compile our original DFA.
let original_dfa = DFA::new("foo[0-9]+")?;
// N.B. We use native endianness here to make the example work, but
// using to_bytes_little_endian would work on a little endian target.
let (buf, _) = original_dfa.to_bytes_native_endian();
// Even if buf has initial padding, DFA::from_bytes will automatically
// ignore it.
let dfa: DFA<&[u32]> = DFA::from_bytes(&buf)?.0;
let expected = HalfMatch::must(0, 8);
assert_eq!(Some(expected), dfa.find_leftmost_fwd(b"foo12345")?);
sourcepub fn to_bytes_big_endian(&self) -> (Vec<u8>, usize)
pub fn to_bytes_big_endian(&self) -> (Vec<u8>, usize)
Serialize this DFA as raw bytes to a Vec<u8>
in big endian
format. Upon success, the Vec<u8>
and the initial padding length are
returned.
The written bytes are guaranteed to be deserialized correctly and
without errors in a semver compatible release of this crate by a
DFA
’s deserialization APIs (assuming all other criteria for the
deserialization APIs has been satisfied):
The padding returned is non-zero if the returned Vec<u8>
starts at
an address that does not have the same alignment as u32
. The padding
corresponds to the number of leading bytes written to the returned
Vec<u8>
.
Example
This example shows how to serialize and deserialize a DFA:
use regex_automata::{dfa::{Automaton, dense::DFA}, HalfMatch};
// Compile our original DFA.
let original_dfa = DFA::new("foo[0-9]+")?;
// N.B. We use native endianness here to make the example work, but
// using to_bytes_big_endian would work on a big endian target.
let (buf, _) = original_dfa.to_bytes_native_endian();
// Even if buf has initial padding, DFA::from_bytes will automatically
// ignore it.
let dfa: DFA<&[u32]> = DFA::from_bytes(&buf)?.0;
let expected = HalfMatch::must(0, 8);
assert_eq!(Some(expected), dfa.find_leftmost_fwd(b"foo12345")?);
sourcepub fn to_bytes_native_endian(&self) -> (Vec<u8>, usize)
pub fn to_bytes_native_endian(&self) -> (Vec<u8>, usize)
Serialize this DFA as raw bytes to a Vec<u8>
in native endian
format. Upon success, the Vec<u8>
and the initial padding length are
returned.
The written bytes are guaranteed to be deserialized correctly and
without errors in a semver compatible release of this crate by a
DFA
’s deserialization APIs (assuming all other criteria for the
deserialization APIs has been satisfied):
The padding returned is non-zero if the returned Vec<u8>
starts at
an address that does not have the same alignment as u32
. The padding
corresponds to the number of leading bytes written to the returned
Vec<u8>
.
Generally speaking, native endian format should only be used when you know that the target you’re compiling the DFA for matches the endianness of the target on which you’re compiling DFA. For example, if serialization and deserialization happen in the same process or on the same machine. Otherwise, when serializing a DFA for use in a portable environment, you’ll almost certainly want to serialize both a little endian and a big endian version and then load the correct one based on the target’s configuration.
Example
This example shows how to serialize and deserialize a DFA:
use regex_automata::{dfa::{Automaton, dense::DFA}, HalfMatch};
// Compile our original DFA.
let original_dfa = DFA::new("foo[0-9]+")?;
let (buf, _) = original_dfa.to_bytes_native_endian();
// Even if buf has initial padding, DFA::from_bytes will automatically
// ignore it.
let dfa: DFA<&[u32]> = DFA::from_bytes(&buf)?.0;
let expected = HalfMatch::must(0, 8);
assert_eq!(Some(expected), dfa.find_leftmost_fwd(b"foo12345")?);
sourcepub fn write_to_little_endian(
&self,
dst: &mut [u8]
) -> Result<usize, SerializeError>
pub fn write_to_little_endian(
&self,
dst: &mut [u8]
) -> Result<usize, SerializeError>
Serialize this DFA as raw bytes to the given slice, in little endian
format. Upon success, the total number of bytes written to dst
is
returned.
The written bytes are guaranteed to be deserialized correctly and
without errors in a semver compatible release of this crate by a
DFA
’s deserialization APIs (assuming all other criteria for the
deserialization APIs has been satisfied):
Note that unlike the various to_byte_*
routines, this does not write
any padding. Callers are responsible for handling alignment correctly.
Errors
This returns an error if the given destination slice is not big enough
to contain the full serialized DFA. If an error occurs, then nothing
is written to dst
.
Example
This example shows how to serialize and deserialize a DFA without dynamic memory allocation.
use regex_automata::{dfa::{Automaton, dense::DFA}, HalfMatch};
// Compile our original DFA.
let original_dfa = DFA::new("foo[0-9]+")?;
// Create a 4KB buffer on the stack to store our serialized DFA.
let mut buf = [0u8; 4 * (1<<10)];
// N.B. We use native endianness here to make the example work, but
// using write_to_little_endian would work on a little endian target.
let written = original_dfa.write_to_native_endian(&mut buf)?;
let dfa: DFA<&[u32]> = DFA::from_bytes(&buf[..written])?.0;
let expected = HalfMatch::must(0, 8);
assert_eq!(Some(expected), dfa.find_leftmost_fwd(b"foo12345")?);
sourcepub fn write_to_big_endian(
&self,
dst: &mut [u8]
) -> Result<usize, SerializeError>
pub fn write_to_big_endian(
&self,
dst: &mut [u8]
) -> Result<usize, SerializeError>
Serialize this DFA as raw bytes to the given slice, in big endian
format. Upon success, the total number of bytes written to dst
is
returned.
The written bytes are guaranteed to be deserialized correctly and
without errors in a semver compatible release of this crate by a
DFA
’s deserialization APIs (assuming all other criteria for the
deserialization APIs has been satisfied):
Note that unlike the various to_byte_*
routines, this does not write
any padding. Callers are responsible for handling alignment correctly.
Errors
This returns an error if the given destination slice is not big enough
to contain the full serialized DFA. If an error occurs, then nothing
is written to dst
.
Example
This example shows how to serialize and deserialize a DFA without dynamic memory allocation.
use regex_automata::{dfa::{Automaton, dense::DFA}, HalfMatch};
// Compile our original DFA.
let original_dfa = DFA::new("foo[0-9]+")?;
// Create a 4KB buffer on the stack to store our serialized DFA.
let mut buf = [0u8; 4 * (1<<10)];
// N.B. We use native endianness here to make the example work, but
// using write_to_big_endian would work on a big endian target.
let written = original_dfa.write_to_native_endian(&mut buf)?;
let dfa: DFA<&[u32]> = DFA::from_bytes(&buf[..written])?.0;
let expected = HalfMatch::must(0, 8);
assert_eq!(Some(expected), dfa.find_leftmost_fwd(b"foo12345")?);
sourcepub fn write_to_native_endian(
&self,
dst: &mut [u8]
) -> Result<usize, SerializeError>
pub fn write_to_native_endian(
&self,
dst: &mut [u8]
) -> Result<usize, SerializeError>
Serialize this DFA as raw bytes to the given slice, in native endian
format. Upon success, the total number of bytes written to dst
is
returned.
The written bytes are guaranteed to be deserialized correctly and
without errors in a semver compatible release of this crate by a
DFA
’s deserialization APIs (assuming all other criteria for the
deserialization APIs has been satisfied):
Generally speaking, native endian format should only be used when you know that the target you’re compiling the DFA for matches the endianness of the target on which you’re compiling DFA. For example, if serialization and deserialization happen in the same process or on the same machine. Otherwise, when serializing a DFA for use in a portable environment, you’ll almost certainly want to serialize both a little endian and a big endian version and then load the correct one based on the target’s configuration.
Note that unlike the various to_byte_*
routines, this does not write
any padding. Callers are responsible for handling alignment correctly.
Errors
This returns an error if the given destination slice is not big enough
to contain the full serialized DFA. If an error occurs, then nothing
is written to dst
.
Example
This example shows how to serialize and deserialize a DFA without dynamic memory allocation.
use regex_automata::{dfa::{Automaton, dense::DFA}, HalfMatch};
// Compile our original DFA.
let original_dfa = DFA::new("foo[0-9]+")?;
// Create a 4KB buffer on the stack to store our serialized DFA.
let mut buf = [0u8; 4 * (1<<10)];
let written = original_dfa.write_to_native_endian(&mut buf)?;
let dfa: DFA<&[u32]> = DFA::from_bytes(&buf[..written])?.0;
let expected = HalfMatch::must(0, 8);
assert_eq!(Some(expected), dfa.find_leftmost_fwd(b"foo12345")?);
sourcepub fn write_to_len(&self) -> usize
pub fn write_to_len(&self) -> usize
Return the total number of bytes required to serialize this DFA.
This is useful for determining the size of the buffer required to pass to one of the serialization routines:
Passing a buffer smaller than the size returned by this method will result in a serialization error. Serialization routines are guaranteed to succeed when the buffer is big enough.
Example
This example shows how to dynamically allocate enough room to serialize a DFA.
use regex_automata::{dfa::{Automaton, dense::DFA}, HalfMatch};
// Compile our original DFA.
let original_dfa = DFA::new("foo[0-9]+")?;
let mut buf = vec![0; original_dfa.write_to_len()];
let written = original_dfa.write_to_native_endian(&mut buf)?;
let dfa: DFA<&[u32]> = DFA::from_bytes(&buf[..written])?.0;
let expected = HalfMatch::must(0, 8);
assert_eq!(Some(expected), dfa.find_leftmost_fwd(b"foo12345")?);
Note that this example isn’t actually guaranteed to work! In
particular, if buf
is not aligned to a 4-byte boundary, then the
DFA::from_bytes
call will fail. If you need this to work, then you
either need to deal with adding some initial padding yourself, or use
one of the to_bytes
methods, which will do it for you.
sourceimpl<'a> DFA<&'a [u32]>
impl<'a> DFA<&'a [u32]>
sourcepub fn from_bytes(
slice: &'a [u8]
) -> Result<(DFA<&'a [u32]>, usize), DeserializeError>
pub fn from_bytes(
slice: &'a [u8]
) -> Result<(DFA<&'a [u32]>, usize), DeserializeError>
Safely deserialize a DFA with a specific state identifier representation. Upon success, this returns both the deserialized DFA and the number of bytes read from the given slice. Namely, the contents of the slice beyond the DFA are not read.
Deserializing a DFA using this routine will never allocate heap memory.
For safety purposes, the DFA’s transition table will be verified such
that every transition points to a valid state. If this verification is
too costly, then a DFA::from_bytes_unchecked
API is provided, which
will always execute in constant time.
The bytes given must be generated by one of the serialization APIs
of a DFA
using a semver compatible release of this crate. Those
include:
DFA::to_bytes_little_endian
DFA::to_bytes_big_endian
DFA::to_bytes_native_endian
DFA::write_to_little_endian
DFA::write_to_big_endian
DFA::write_to_native_endian
The to_bytes
methods allocate and return a Vec<u8>
for you, along
with handling alignment correctly. The write_to
methods do not
allocate and write to an existing slice (which may be on the stack).
Since deserialization always uses the native endianness of the target
platform, the serialization API you use should match the endianness of
the target platform. (It’s often a good idea to generate serialized
DFAs for both forms of endianness and then load the correct one based
on endianness.)
Errors
Generally speaking, it’s easier to state the conditions in which an error is not returned. All of the following must be true:
- The bytes given must be produced by one of the serialization APIs on this DFA, as mentioned above.
- The endianness of the target platform matches the endianness used to serialized the provided DFA.
- The slice given must have the same alignment as
u32
.
If any of the above are not true, then an error will be returned.
Panics
This routine will never panic for any input.
Example
This example shows how to serialize a DFA to raw bytes, deserialize it and then use it for searching.
use regex_automata::{dfa::{Automaton, dense::DFA}, HalfMatch};
let initial = DFA::new("foo[0-9]+")?;
let (bytes, _) = initial.to_bytes_native_endian();
let dfa: DFA<&[u32]> = DFA::from_bytes(&bytes)?.0;
let expected = HalfMatch::must(0, 8);
assert_eq!(Some(expected), dfa.find_leftmost_fwd(b"foo12345")?);
Example: dealing with alignment and padding
In the above example, we used the to_bytes_native_endian
method to
serialize a DFA, but we ignored part of its return value corresponding
to padding added to the beginning of the serialized DFA. This is OK
because deserialization will skip this initial padding. What matters
is that the address immediately following the padding has an alignment
that matches u32
. That is, the following is an equivalent but
alternative way to write the above example:
use regex_automata::{dfa::{Automaton, dense::DFA}, HalfMatch};
let initial = DFA::new("foo[0-9]+")?;
// Serialization returns the number of leading padding bytes added to
// the returned Vec<u8>.
let (bytes, pad) = initial.to_bytes_native_endian();
let dfa: DFA<&[u32]> = DFA::from_bytes(&bytes[pad..])?.0;
let expected = HalfMatch::must(0, 8);
assert_eq!(Some(expected), dfa.find_leftmost_fwd(b"foo12345")?);
This padding is necessary because Rust’s standard library does
not expose any safe and robust way of creating a Vec<u8>
with a
guaranteed alignment other than 1. Now, in practice, the underlying
allocator is likely to provide a Vec<u8>
that meets our alignment
requirements, which means pad
is zero in practice most of the time.
The purpose of exposing the padding like this is flexibility for the
caller. For example, if one wants to embed a serialized DFA into a
compiled program, then it’s important to guarantee that it starts at a
u32
-aligned address. The simplest way to do this is to discard the
padding bytes and set it up so that the serialized DFA itself begins at
a properly aligned address. We can show this in two parts. The first
part is serializing the DFA to a file:
use regex_automata::dfa::{Automaton, dense::DFA};
let dfa = DFA::new("foo[0-9]+")?;
let (bytes, pad) = dfa.to_bytes_big_endian();
// Write the contents of the DFA *without* the initial padding.
std::fs::write("foo.bigendian.dfa", &bytes[pad..])?;
// Do it again, but this time for little endian.
let (bytes, pad) = dfa.to_bytes_little_endian();
std::fs::write("foo.littleendian.dfa", &bytes[pad..])?;
And now the second part is embedding the DFA into the compiled program and deserializing it at runtime on first use. We use conditional compilation to choose the correct endianness.
use regex_automata::{dfa::{Automaton, dense}, HalfMatch};
type S = u32;
type DFA = dense::DFA<&'static [S]>;
fn get_foo() -> &'static DFA {
use std::cell::Cell;
use std::mem::MaybeUninit;
use std::sync::Once;
// This struct with a generic B is used to permit unsizing
// coercions, specifically, where B winds up being a [u8]. We also
// need repr(C) to guarantee that _align comes first, which forces
// a correct alignment.
#[repr(C)]
struct Aligned<B: ?Sized> {
_align: [S; 0],
bytes: B,
}
// This assignment is made possible (implicitly) via the
// CoerceUnsized trait.
static ALIGNED: &Aligned<[u8]> = &Aligned {
_align: [],
#[cfg(target_endian = "big")]
bytes: *include_bytes!("foo.bigendian.dfa"),
#[cfg(target_endian = "little")]
bytes: *include_bytes!("foo.littleendian.dfa"),
};
struct Lazy(Cell<MaybeUninit<DFA>>);
// SAFETY: This is safe because DFA impls Sync.
unsafe impl Sync for Lazy {}
static INIT: Once = Once::new();
static DFA: Lazy = Lazy(Cell::new(MaybeUninit::uninit()));
INIT.call_once(|| {
let (dfa, _) = DFA::from_bytes(&ALIGNED.bytes)
.expect("serialized DFA should be valid");
// SAFETY: This is guaranteed to only execute once, and all
// we do with the pointer is write the DFA to it.
unsafe {
(*DFA.0.as_ptr()).as_mut_ptr().write(dfa);
}
});
// SAFETY: DFA is guaranteed to by initialized via INIT and is
// stored in static memory.
unsafe {
let dfa = (*DFA.0.as_ptr()).as_ptr();
std::mem::transmute::<*const DFA, &'static DFA>(dfa)
}
}
let dfa = get_foo();
let expected = HalfMatch::must(0, 8);
assert_eq!(Ok(Some(expected)), dfa.find_leftmost_fwd(b"foo12345"));
Alternatively, consider using
lazy_static
or
once_cell
,
which will guarantee safety for you. You will still need to use the
Aligned
trick above to force correct alignment, but this is safe to
do and from_bytes
will return an error if you get it wrong.
sourcepub unsafe fn from_bytes_unchecked(
slice: &'a [u8]
) -> Result<(DFA<&'a [u32]>, usize), DeserializeError>
pub unsafe fn from_bytes_unchecked(
slice: &'a [u8]
) -> Result<(DFA<&'a [u32]>, usize), DeserializeError>
Deserialize a DFA with a specific state identifier representation in constant time by omitting the verification of the validity of the transition table and other data inside the DFA.
This is just like DFA::from_bytes
, except it can potentially return
a DFA that exhibits undefined behavior if its transition table contains
invalid state identifiers.
This routine is useful if you need to deserialize a DFA cheaply
and cannot afford the transition table validation performed by
from_bytes
.
Example
use regex_automata::{dfa::{Automaton, dense::DFA}, HalfMatch};
let initial = DFA::new("foo[0-9]+")?;
let (bytes, _) = initial.to_bytes_native_endian();
// SAFETY: This is guaranteed to be safe since the bytes given come
// directly from a compatible serialization routine.
let dfa: DFA<&[u32]> = unsafe { DFA::from_bytes_unchecked(&bytes)?.0 };
let expected = HalfMatch::must(0, 8);
assert_eq!(Some(expected), dfa.find_leftmost_fwd(b"foo12345")?);
Trait Implementations
sourceimpl<T: AsRef<[u32]>> Automaton for DFA<T>
impl<T: AsRef<[u32]>> Automaton for DFA<T>
sourcefn is_special_state(&self, id: StateID) -> bool
fn is_special_state(&self, id: StateID) -> bool
Returns true if and only if the given identifier corresponds to a “special” state. A special state is one or more of the following: a dead state, a quit state, a match state, a start state or an accelerated state. Read more
sourcefn is_dead_state(&self, id: StateID) -> bool
fn is_dead_state(&self, id: StateID) -> bool
Returns true if and only if the given identifier corresponds to a dead state. When a DFA enters a dead state, it is impossible to leave. That is, every transition on a dead state by definition leads back to the same dead state. Read more
sourcefn is_quit_state(&self, id: StateID) -> bool
fn is_quit_state(&self, id: StateID) -> bool
Returns true if and only if the given identifier corresponds to a quit state. A quit state is like a dead state (it has no transitions other than to itself), except it indicates that the DFA failed to complete the search. When this occurs, callers can neither accept or reject that a match occurred. Read more
sourcefn is_match_state(&self, id: StateID) -> bool
fn is_match_state(&self, id: StateID) -> bool
Returns true if and only if the given identifier corresponds to a match state. A match state is also referred to as a “final” state and indicates that a match has been found. Read more
sourcefn is_start_state(&self, id: StateID) -> bool
fn is_start_state(&self, id: StateID) -> bool
Returns true if and only if the given identifier corresponds to a start state. A start state is a state in which a DFA begins a search. All searches begin in a start state. Moreover, since all matches are delayed by one byte, a start state can never be a match state. Read more
sourcefn is_accel_state(&self, id: StateID) -> bool
fn is_accel_state(&self, id: StateID) -> bool
Returns true if and only if the given identifier corresponds to an accelerated state. Read more
sourcefn next_state(&self, current: StateID, input: u8) -> StateID
fn next_state(&self, current: StateID, input: u8) -> StateID
Transitions from the current state to the next state, given the next byte of input. Read more
sourceunsafe fn next_state_unchecked(&self, current: StateID, input: u8) -> StateID
unsafe fn next_state_unchecked(&self, current: StateID, input: u8) -> StateID
Transitions from the current state to the next state, given the next byte of input. Read more
sourcefn next_eoi_state(&self, current: StateID) -> StateID
fn next_eoi_state(&self, current: StateID) -> StateID
Transitions from the current state to the next state for the special EOI symbol. Read more
sourcefn pattern_count(&self) -> usize
fn pattern_count(&self) -> usize
Returns the total number of patterns compiled into this DFA. Read more
sourcefn match_count(&self, id: StateID) -> usize
fn match_count(&self, id: StateID) -> usize
Returns the total number of patterns that match in this state. Read more
sourcefn match_pattern(&self, id: StateID, match_index: usize) -> PatternID
fn match_pattern(&self, id: StateID, match_index: usize) -> PatternID
Returns the pattern ID corresponding to the given match index in the given state. Read more
sourcefn start_state_forward(
&self,
pattern_id: Option<PatternID>,
bytes: &[u8],
start: usize,
end: usize
) -> StateID
fn start_state_forward(
&self,
pattern_id: Option<PatternID>,
bytes: &[u8],
start: usize,
end: usize
) -> StateID
Return the ID of the start state for this DFA when executing a forward search. Read more
sourcefn start_state_reverse(
&self,
pattern_id: Option<PatternID>,
bytes: &[u8],
start: usize,
end: usize
) -> StateID
fn start_state_reverse(
&self,
pattern_id: Option<PatternID>,
bytes: &[u8],
start: usize,
end: usize
) -> StateID
Return the ID of the start state for this DFA when executing a reverse search. Read more
sourcefn accelerator(&self, id: StateID) -> &[u8]ⓘNotable traits for &'_ [u8]impl<'_> Read for &'_ [u8]impl<'_> Write for &'_ mut [u8]
fn accelerator(&self, id: StateID) -> &[u8]ⓘNotable traits for &'_ [u8]impl<'_> Read for &'_ [u8]impl<'_> Write for &'_ mut [u8]
Return a slice of bytes to accelerate for the given state, if possible. Read more
sourcefn find_earliest_fwd(
&self,
bytes: &[u8]
) -> Result<Option<HalfMatch>, MatchError>
fn find_earliest_fwd(
&self,
bytes: &[u8]
) -> Result<Option<HalfMatch>, MatchError>
Executes a forward search and returns the end position of the first
match that is found as early as possible. If no match exists, then
None
is returned. Read more
sourcefn find_earliest_rev(
&self,
bytes: &[u8]
) -> Result<Option<HalfMatch>, MatchError>
fn find_earliest_rev(
&self,
bytes: &[u8]
) -> Result<Option<HalfMatch>, MatchError>
Executes a reverse search and returns the start position of the first
match that is found as early as possible. If no match exists, then
None
is returned. Read more
sourcefn find_leftmost_fwd(
&self,
bytes: &[u8]
) -> Result<Option<HalfMatch>, MatchError>
fn find_leftmost_fwd(
&self,
bytes: &[u8]
) -> Result<Option<HalfMatch>, MatchError>
Executes a forward search and returns the end position of the leftmost
match that is found. If no match exists, then None
is returned. Read more
sourcefn find_leftmost_rev(
&self,
bytes: &[u8]
) -> Result<Option<HalfMatch>, MatchError>
fn find_leftmost_rev(
&self,
bytes: &[u8]
) -> Result<Option<HalfMatch>, MatchError>
Executes a reverse search and returns the start of the position of the
leftmost match that is found. If no match exists, then None
is
returned. Read more
sourcefn find_overlapping_fwd(
&self,
bytes: &[u8],
state: &mut OverlappingState
) -> Result<Option<HalfMatch>, MatchError>
fn find_overlapping_fwd(
&self,
bytes: &[u8],
state: &mut OverlappingState
) -> Result<Option<HalfMatch>, MatchError>
Executes an overlapping forward search and returns the end position of
matches as they are found. If no match exists, then None
is returned. Read more
sourcefn find_earliest_fwd_at(
&self,
pre: Option<&mut Scanner<'_>>,
pattern_id: Option<PatternID>,
bytes: &[u8],
start: usize,
end: usize
) -> Result<Option<HalfMatch>, MatchError>
fn find_earliest_fwd_at(
&self,
pre: Option<&mut Scanner<'_>>,
pattern_id: Option<PatternID>,
bytes: &[u8],
start: usize,
end: usize
) -> Result<Option<HalfMatch>, MatchError>
Executes a forward search and returns the end position of the first
match that is found as early as possible. If no match exists, then
None
is returned. Read more
sourcefn find_earliest_rev_at(
&self,
pattern_id: Option<PatternID>,
bytes: &[u8],
start: usize,
end: usize
) -> Result<Option<HalfMatch>, MatchError>
fn find_earliest_rev_at(
&self,
pattern_id: Option<PatternID>,
bytes: &[u8],
start: usize,
end: usize
) -> Result<Option<HalfMatch>, MatchError>
Executes a reverse search and returns the start position of the first
match that is found as early as possible. If no match exists, then
None
is returned. Read more
sourcefn find_leftmost_fwd_at(
&self,
pre: Option<&mut Scanner<'_>>,
pattern_id: Option<PatternID>,
bytes: &[u8],
start: usize,
end: usize
) -> Result<Option<HalfMatch>, MatchError>
fn find_leftmost_fwd_at(
&self,
pre: Option<&mut Scanner<'_>>,
pattern_id: Option<PatternID>,
bytes: &[u8],
start: usize,
end: usize
) -> Result<Option<HalfMatch>, MatchError>
Executes a forward search and returns the end position of the leftmost
match that is found. If no match exists, then None
is returned. Read more
sourcefn find_leftmost_rev_at(
&self,
pattern_id: Option<PatternID>,
bytes: &[u8],
start: usize,
end: usize
) -> Result<Option<HalfMatch>, MatchError>
fn find_leftmost_rev_at(
&self,
pattern_id: Option<PatternID>,
bytes: &[u8],
start: usize,
end: usize
) -> Result<Option<HalfMatch>, MatchError>
Executes a reverse search and returns the start of the position of the
leftmost match that is found. If no match exists, then None
is
returned. Read more
sourcefn find_overlapping_fwd_at(
&self,
pre: Option<&mut Scanner<'_>>,
pattern_id: Option<PatternID>,
bytes: &[u8],
start: usize,
end: usize,
state: &mut OverlappingState
) -> Result<Option<HalfMatch>, MatchError>
fn find_overlapping_fwd_at(
&self,
pre: Option<&mut Scanner<'_>>,
pattern_id: Option<PatternID>,
bytes: &[u8],
start: usize,
end: usize,
state: &mut OverlappingState
) -> Result<Option<HalfMatch>, MatchError>
Executes an overlapping forward search and returns the end position of
matches as they are found. If no match exists, then None
is returned. Read more
Auto Trait Implementations
impl<T> RefUnwindSafe for DFA<T> where
T: RefUnwindSafe,
impl<T> Send for DFA<T> where
T: Send,
impl<T> Sync for DFA<T> where
T: Sync,
impl<T> Unpin for DFA<T> where
T: Unpin,
impl<T> UnwindSafe for DFA<T> where
T: UnwindSafe,
Blanket Implementations
sourceimpl<T> BorrowMut<T> for T where
T: ?Sized,
impl<T> BorrowMut<T> for T where
T: ?Sized,
const: unstable · sourcefn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value. Read more
sourceimpl<T> ToOwned for T where
T: Clone,
impl<T> ToOwned for T where
T: Clone,
type Owned = T
type Owned = T
The resulting type after obtaining ownership.
sourcefn clone_into(&self, target: &mut T)
fn clone_into(&self, target: &mut T)
toowned_clone_into
)Uses borrowed data to replace owned data, usually by cloning. Read more