regex_automata/util/
iter.rs

1/*!
2Generic helpers for iteration of matches from a regex engine in a haystack.
3
4The principle type in this module is a [`Searcher`]. A `Searcher` provides
5its own lower level iterator-like API in addition to methods for constructing
6types that implement `Iterator`. The documentation for `Searcher` explains a
7bit more about why these different APIs exist.
8
9Currently, this module supports iteration over any regex engine that works
10with the [`HalfMatch`], [`Match`] or [`Captures`] types.
11*/
12
13#[cfg(feature = "alloc")]
14use crate::util::captures::Captures;
15use crate::util::search::{HalfMatch, Input, Match, MatchError};
16
17/// A searcher for creating iterators and performing lower level iteration.
18///
19/// This searcher encapsulates the logic required for finding all successive
20/// non-overlapping matches in a haystack. In theory, iteration would look
21/// something like this:
22///
23/// 1. Setting the start position to `0`.
24/// 2. Execute a regex search. If no match, end iteration.
25/// 3. Report the match and set the start position to the end of the match.
26/// 4. Go back to (2).
27///
28/// And if this were indeed the case, it's likely that `Searcher` wouldn't
29/// exist. Unfortunately, because a regex may match the empty string, the above
30/// logic won't work for all possible regexes. Namely, if an empty match is
31/// found, then step (3) would set the start position of the search to the
32/// position it was at. Thus, iteration would never end.
33///
34/// Instead, a `Searcher` knows how to detect these cases and forcefully
35/// advance iteration in the case of an empty match that overlaps with a
36/// previous match.
37///
38/// If you know that your regex cannot match any empty string, then the simple
39/// algorithm described above will work correctly.
40///
41/// When possible, prefer the iterators defined on the regex engine you're
42/// using. This tries to abstract over the regex engine and is thus a bit more
43/// unwieldy to use.
44///
45/// In particular, a `Searcher` is not itself an iterator. Instead, it provides
46/// `advance` routines that permit moving the search along explicitly. It also
47/// provides various routines, like [`Searcher::into_matches_iter`], that
48/// accept a closure (representing how a regex engine executes a search) and
49/// returns a conventional iterator.
50///
51/// The lifetime parameters come from the [`Input`] type passed to
52/// [`Searcher::new`]:
53///
54/// * `'h` is the lifetime of the underlying haystack.
55///
56/// # Searcher vs Iterator
57///
58/// Why does a search type with "advance" APIs exist at all when we also have
59/// iterators? Unfortunately, the reasoning behind this split is a complex
60/// combination of the following things:
61///
62/// 1. While many of the regex engines expose their own iterators, it is also
63/// nice to expose this lower level iteration helper because it permits callers
64/// to provide their own `Input` configuration. Moreover, a `Searcher` can work
65/// with _any_ regex engine instead of only the ones defined in this crate.
66/// This way, everyone benefits from a shared iteration implementation.
67/// 2. There are many different regex engines that, while they have the same
68/// match semantics, they have slightly different APIs. Iteration is just
69/// complex enough to want to share code, and so we need a way of abstracting
70/// over those different regex engines. While we could define a new trait that
71/// describes any regex engine search API, it would wind up looking very close
72/// to a closure. While there may still be reasons for the more generic trait
73/// to exist, for now and for the purposes of iteration, we use a closure.
74/// Closures also provide a lot of easy flexibility at the call site, in that
75/// they permit the caller to borrow any kind of state they want for use during
76/// each search call.
77/// 3. As a result of using closures, and because closures are anonymous types
78/// that cannot be named, it is difficult to encapsulate them without both
79/// costs to speed and added complexity to the public API. For example, in
80/// defining an iterator type like
81/// [`dfa::regex::FindMatches`](crate::dfa::regex::FindMatches),
82/// if we use a closure internally, it's not possible to name this type in the
83/// return type of the iterator constructor. Thus, the only way around it is
84/// to erase the type by boxing it and turning it into a `Box<dyn FnMut ...>`.
85/// This boxed closure is unlikely to be inlined _and_ it infects the public
86/// API in subtle ways. Namely, unless you declare the closure as implementing
87/// `Send` and `Sync`, then the resulting iterator type won't implement it
88/// either. But there are practical issues with requiring the closure to
89/// implement `Send` and `Sync` that result in other API complexities that
90/// are beyond the scope of this already long exposition.
91/// 4. Some regex engines expose more complex match information than just
92/// "which pattern matched" and "at what offsets." For example, the PikeVM
93/// exposes match spans for each capturing group that participated in the
94/// match. In such cases, it can be quite beneficial to reuse the capturing
95/// group allocation on subsequent searches. A proper iterator doesn't permit
96/// this API due to its interface, so it's useful to have something a bit lower
97/// level that permits callers to amortize allocations while also reusing a
98/// shared implementation of iteration. (See the documentation for
99/// [`Searcher::advance`] for an example of using the "advance" API with the
100/// PikeVM.)
101///
102/// What this boils down to is that there are "advance" APIs which require
103/// handing a closure to it for every call, and there are also APIs to create
104/// iterators from a closure. The former are useful for _implementing_
105/// iterators or when you need more flexibility, while the latter are useful
106/// for conveniently writing custom iterators on-the-fly.
107///
108/// # Example: iterating with captures
109///
110/// Several regex engines in this crate over convenient iterator APIs over
111/// [`Captures`] values. To do so, this requires allocating a new `Captures`
112/// value for each iteration step. This can perhaps be more costly than you
113/// might want. Instead of implementing your own iterator to avoid that
114/// cost (which can be a little subtle if you want to handle empty matches
115/// correctly), you can use this `Searcher` to do it for you:
116///
117/// ```
118/// use regex_automata::{
119///     nfa::thompson::pikevm::PikeVM,
120///     util::iter::Searcher,
121///     Input, Span,
122/// };
123///
124/// let re = PikeVM::new("foo(?P<numbers>[0-9]+)")?;
125/// let haystack = "foo1 foo12 foo123";
126///
127/// let mut caps = re.create_captures();
128/// let mut cache = re.create_cache();
129/// let mut matches = vec![];
130/// let mut searcher = Searcher::new(Input::new(haystack));
131/// while let Some(_) = searcher.advance(|input| {
132///     re.search(&mut cache, input, &mut caps);
133///     Ok(caps.get_match())
134/// }) {
135///     // The unwrap is OK since 'numbers' matches if the pattern matches.
136///     matches.push(caps.get_group_by_name("numbers").unwrap());
137/// }
138/// assert_eq!(matches, vec![
139///     Span::from(3..4),
140///     Span::from(8..10),
141///     Span::from(14..17),
142/// ]);
143///
144/// # Ok::<(), Box<dyn std::error::Error>>(())
145/// ```
146#[derive(Clone, Debug)]
147pub struct Searcher<'h> {
148    /// The input parameters to give to each regex engine call.
149    ///
150    /// The start position of the search is mutated during iteration.
151    input: Input<'h>,
152    /// Records the end offset of the most recent match. This is necessary to
153    /// handle a corner case for preventing empty matches from overlapping with
154    /// the ending bounds of a prior match.
155    last_match_end: Option<usize>,
156}
157
158impl<'h> Searcher<'h> {
159    /// Create a new fallible non-overlapping matches iterator.
160    ///
161    /// The given `input` provides the parameters (including the haystack),
162    /// while the `finder` represents a closure that calls the underlying regex
163    /// engine. The closure may borrow any additional state that is needed,
164    /// such as a prefilter scanner.
165    pub fn new(input: Input<'h>) -> Searcher<'h> {
166        Searcher { input, last_match_end: None }
167    }
168
169    /// Returns the current `Input` used by this searcher.
170    ///
171    /// The `Input` returned is generally equivalent to the one given to
172    /// [`Searcher::new`], but its start position may be different to reflect
173    /// the start of the next search to be executed.
174    pub fn input<'s>(&'s self) -> &'s Input<'h> {
175        &self.input
176    }
177
178    /// Return the next half match for an infallible search if one exists, and
179    /// advance to the next position.
180    ///
181    /// This is like `try_advance_half`, except errors are converted into
182    /// panics.
183    ///
184    /// # Panics
185    ///
186    /// If the given closure returns an error, then this panics. This is useful
187    /// when you know your underlying regex engine has been configured to not
188    /// return an error.
189    ///
190    /// # Example
191    ///
192    /// This example shows how to use a `Searcher` to iterate over all matches
193    /// when using a DFA, which only provides "half" matches.
194    ///
195    /// ```
196    /// use regex_automata::{
197    ///     hybrid::dfa::DFA,
198    ///     util::iter::Searcher,
199    ///     HalfMatch, Input,
200    /// };
201    ///
202    /// let re = DFA::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")?;
203    /// let mut cache = re.create_cache();
204    ///
205    /// let input = Input::new("2010-03-14 2016-10-08 2020-10-22");
206    /// let mut it = Searcher::new(input);
207    ///
208    /// let expected = Some(HalfMatch::must(0, 10));
209    /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
210    /// assert_eq!(expected, got);
211    ///
212    /// let expected = Some(HalfMatch::must(0, 21));
213    /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
214    /// assert_eq!(expected, got);
215    ///
216    /// let expected = Some(HalfMatch::must(0, 32));
217    /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
218    /// assert_eq!(expected, got);
219    ///
220    /// let expected = None;
221    /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
222    /// assert_eq!(expected, got);
223    ///
224    /// # Ok::<(), Box<dyn std::error::Error>>(())
225    /// ```
226    ///
227    /// This correctly moves iteration forward even when an empty match occurs:
228    ///
229    /// ```
230    /// use regex_automata::{
231    ///     hybrid::dfa::DFA,
232    ///     util::iter::Searcher,
233    ///     HalfMatch, Input,
234    /// };
235    ///
236    /// let re = DFA::new(r"a|")?;
237    /// let mut cache = re.create_cache();
238    ///
239    /// let input = Input::new("abba");
240    /// let mut it = Searcher::new(input);
241    ///
242    /// let expected = Some(HalfMatch::must(0, 1));
243    /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
244    /// assert_eq!(expected, got);
245    ///
246    /// let expected = Some(HalfMatch::must(0, 2));
247    /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
248    /// assert_eq!(expected, got);
249    ///
250    /// let expected = Some(HalfMatch::must(0, 4));
251    /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
252    /// assert_eq!(expected, got);
253    ///
254    /// let expected = None;
255    /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
256    /// assert_eq!(expected, got);
257    ///
258    /// # Ok::<(), Box<dyn std::error::Error>>(())
259    /// ```
260    #[inline]
261    pub fn advance_half<F>(&mut self, finder: F) -> Option<HalfMatch>
262    where
263        F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>,
264    {
265        match self.try_advance_half(finder) {
266            Ok(m) => m,
267            Err(err) => panic!(
268                "unexpected regex half find error: {}\n\
269                 to handle find errors, use 'try' or 'search' methods",
270                err,
271            ),
272        }
273    }
274
275    /// Return the next match for an infallible search if one exists, and
276    /// advance to the next position.
277    ///
278    /// The search is advanced even in the presence of empty matches by
279    /// forbidding empty matches from overlapping with any other match.
280    ///
281    /// This is like `try_advance`, except errors are converted into panics.
282    ///
283    /// # Panics
284    ///
285    /// If the given closure returns an error, then this panics. This is useful
286    /// when you know your underlying regex engine has been configured to not
287    /// return an error.
288    ///
289    /// # Example
290    ///
291    /// This example shows how to use a `Searcher` to iterate over all matches
292    /// when using a regex based on lazy DFAs:
293    ///
294    /// ```
295    /// use regex_automata::{
296    ///     hybrid::regex::Regex,
297    ///     util::iter::Searcher,
298    ///     Match, Input,
299    /// };
300    ///
301    /// let re = Regex::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")?;
302    /// let mut cache = re.create_cache();
303    ///
304    /// let input = Input::new("2010-03-14 2016-10-08 2020-10-22");
305    /// let mut it = Searcher::new(input);
306    ///
307    /// let expected = Some(Match::must(0, 0..10));
308    /// let got = it.advance(|input| re.try_search(&mut cache, input));
309    /// assert_eq!(expected, got);
310    ///
311    /// let expected = Some(Match::must(0, 11..21));
312    /// let got = it.advance(|input| re.try_search(&mut cache, input));
313    /// assert_eq!(expected, got);
314    ///
315    /// let expected = Some(Match::must(0, 22..32));
316    /// let got = it.advance(|input| re.try_search(&mut cache, input));
317    /// assert_eq!(expected, got);
318    ///
319    /// let expected = None;
320    /// let got = it.advance(|input| re.try_search(&mut cache, input));
321    /// assert_eq!(expected, got);
322    ///
323    /// # Ok::<(), Box<dyn std::error::Error>>(())
324    /// ```
325    ///
326    /// This example shows the same as above, but with the PikeVM. This example
327    /// is useful because it shows how to use this API even when the regex
328    /// engine doesn't directly return a `Match`.
329    ///
330    /// ```
331    /// use regex_automata::{
332    ///     nfa::thompson::pikevm::PikeVM,
333    ///     util::iter::Searcher,
334    ///     Match, Input,
335    /// };
336    ///
337    /// let re = PikeVM::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")?;
338    /// let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
339    ///
340    /// let input = Input::new("2010-03-14 2016-10-08 2020-10-22");
341    /// let mut it = Searcher::new(input);
342    ///
343    /// let expected = Some(Match::must(0, 0..10));
344    /// let got = it.advance(|input| {
345    ///     re.search(&mut cache, input, &mut caps);
346    ///     Ok(caps.get_match())
347    /// });
348    /// // Note that if we wanted to extract capturing group spans, we could
349    /// // do that here with 'caps'.
350    /// assert_eq!(expected, got);
351    ///
352    /// let expected = Some(Match::must(0, 11..21));
353    /// let got = it.advance(|input| {
354    ///     re.search(&mut cache, input, &mut caps);
355    ///     Ok(caps.get_match())
356    /// });
357    /// assert_eq!(expected, got);
358    ///
359    /// let expected = Some(Match::must(0, 22..32));
360    /// let got = it.advance(|input| {
361    ///     re.search(&mut cache, input, &mut caps);
362    ///     Ok(caps.get_match())
363    /// });
364    /// assert_eq!(expected, got);
365    ///
366    /// let expected = None;
367    /// let got = it.advance(|input| {
368    ///     re.search(&mut cache, input, &mut caps);
369    ///     Ok(caps.get_match())
370    /// });
371    /// assert_eq!(expected, got);
372    ///
373    /// # Ok::<(), Box<dyn std::error::Error>>(())
374    /// ```
375    #[inline]
376    pub fn advance<F>(&mut self, finder: F) -> Option<Match>
377    where
378        F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>,
379    {
380        match self.try_advance(finder) {
381            Ok(m) => m,
382            Err(err) => panic!(
383                "unexpected regex find error: {}\n\
384                 to handle find errors, use 'try' or 'search' methods",
385                err,
386            ),
387        }
388    }
389
390    /// Return the next half match for a fallible search if one exists, and
391    /// advance to the next position.
392    ///
393    /// This is like `advance_half`, except it permits callers to handle errors
394    /// during iteration.
395    #[inline]
396    pub fn try_advance_half<F>(
397        &mut self,
398        mut finder: F,
399    ) -> Result<Option<HalfMatch>, MatchError>
400    where
401        F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>,
402    {
403        let mut m = match finder(&self.input)? {
404            None => return Ok(None),
405            Some(m) => m,
406        };
407        if Some(m.offset()) == self.last_match_end {
408            m = match self.handle_overlapping_empty_half_match(m, finder)? {
409                None => return Ok(None),
410                Some(m) => m,
411            };
412        }
413        self.input.set_start(m.offset());
414        self.last_match_end = Some(m.offset());
415        Ok(Some(m))
416    }
417
418    /// Return the next match for a fallible search if one exists, and advance
419    /// to the next position.
420    ///
421    /// This is like `advance`, except it permits callers to handle errors
422    /// during iteration.
423    #[inline]
424    pub fn try_advance<F>(
425        &mut self,
426        mut finder: F,
427    ) -> Result<Option<Match>, MatchError>
428    where
429        F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>,
430    {
431        let mut m = match finder(&self.input)? {
432            None => return Ok(None),
433            Some(m) => m,
434        };
435        if m.is_empty() && Some(m.end()) == self.last_match_end {
436            m = match self.handle_overlapping_empty_match(m, finder)? {
437                None => return Ok(None),
438                Some(m) => m,
439            };
440        }
441        self.input.set_start(m.end());
442        self.last_match_end = Some(m.end());
443        Ok(Some(m))
444    }
445
446    /// Given a closure that executes a single search, return an iterator over
447    /// all successive non-overlapping half matches.
448    ///
449    /// The iterator returned yields result values. If the underlying regex
450    /// engine is configured to never return an error, consider calling
451    /// [`TryHalfMatchesIter::infallible`] to convert errors into panics.
452    ///
453    /// # Example
454    ///
455    /// This example shows how to use a `Searcher` to create a proper
456    /// iterator over half matches.
457    ///
458    /// ```
459    /// use regex_automata::{
460    ///     hybrid::dfa::DFA,
461    ///     util::iter::Searcher,
462    ///     HalfMatch, Input,
463    /// };
464    ///
465    /// let re = DFA::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")?;
466    /// let mut cache = re.create_cache();
467    ///
468    /// let input = Input::new("2010-03-14 2016-10-08 2020-10-22");
469    /// let mut it = Searcher::new(input).into_half_matches_iter(|input| {
470    ///     re.try_search_fwd(&mut cache, input)
471    /// });
472    ///
473    /// let expected = Some(Ok(HalfMatch::must(0, 10)));
474    /// assert_eq!(expected, it.next());
475    ///
476    /// let expected = Some(Ok(HalfMatch::must(0, 21)));
477    /// assert_eq!(expected, it.next());
478    ///
479    /// let expected = Some(Ok(HalfMatch::must(0, 32)));
480    /// assert_eq!(expected, it.next());
481    ///
482    /// let expected = None;
483    /// assert_eq!(expected, it.next());
484    ///
485    /// # Ok::<(), Box<dyn std::error::Error>>(())
486    /// ```
487    #[inline]
488    pub fn into_half_matches_iter<F>(
489        self,
490        finder: F,
491    ) -> TryHalfMatchesIter<'h, F>
492    where
493        F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>,
494    {
495        TryHalfMatchesIter { it: self, finder }
496    }
497
498    /// Given a closure that executes a single search, return an iterator over
499    /// all successive non-overlapping matches.
500    ///
501    /// The iterator returned yields result values. If the underlying regex
502    /// engine is configured to never return an error, consider calling
503    /// [`TryMatchesIter::infallible`] to convert errors into panics.
504    ///
505    /// # Example
506    ///
507    /// This example shows how to use a `Searcher` to create a proper
508    /// iterator over matches.
509    ///
510    /// ```
511    /// use regex_automata::{
512    ///     hybrid::regex::Regex,
513    ///     util::iter::Searcher,
514    ///     Match, Input,
515    /// };
516    ///
517    /// let re = Regex::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")?;
518    /// let mut cache = re.create_cache();
519    ///
520    /// let input = Input::new("2010-03-14 2016-10-08 2020-10-22");
521    /// let mut it = Searcher::new(input).into_matches_iter(|input| {
522    ///     re.try_search(&mut cache, input)
523    /// });
524    ///
525    /// let expected = Some(Ok(Match::must(0, 0..10)));
526    /// assert_eq!(expected, it.next());
527    ///
528    /// let expected = Some(Ok(Match::must(0, 11..21)));
529    /// assert_eq!(expected, it.next());
530    ///
531    /// let expected = Some(Ok(Match::must(0, 22..32)));
532    /// assert_eq!(expected, it.next());
533    ///
534    /// let expected = None;
535    /// assert_eq!(expected, it.next());
536    ///
537    /// # Ok::<(), Box<dyn std::error::Error>>(())
538    /// ```
539    #[inline]
540    pub fn into_matches_iter<F>(self, finder: F) -> TryMatchesIter<'h, F>
541    where
542        F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>,
543    {
544        TryMatchesIter { it: self, finder }
545    }
546
547    /// Given a closure that executes a single search, return an iterator over
548    /// all successive non-overlapping `Captures` values.
549    ///
550    /// The iterator returned yields result values. If the underlying regex
551    /// engine is configured to never return an error, consider calling
552    /// [`TryCapturesIter::infallible`] to convert errors into panics.
553    ///
554    /// Unlike the other iterator constructors, this accepts an initial
555    /// `Captures` value. This `Captures` value is reused for each search, and
556    /// the iterator implementation clones it before returning it. The caller
557    /// must provide this value because the iterator is purposely ignorant
558    /// of the underlying regex engine and thus doesn't know how to create
559    /// one itself. More to the point, a `Captures` value itself has a few
560    /// different constructors, which change which kind of information is
561    /// available to query in exchange for search performance.
562    ///
563    /// # Example
564    ///
565    /// This example shows how to use a `Searcher` to create a proper iterator
566    /// over `Captures` values, which provides access to all capturing group
567    /// spans for each match.
568    ///
569    /// ```
570    /// use regex_automata::{
571    ///     nfa::thompson::pikevm::PikeVM,
572    ///     util::iter::Searcher,
573    ///     Input,
574    /// };
575    ///
576    /// let re = PikeVM::new(
577    ///     r"(?P<y>[0-9]{4})-(?P<m>[0-9]{2})-(?P<d>[0-9]{2})",
578    /// )?;
579    /// let (mut cache, caps) = (re.create_cache(), re.create_captures());
580    ///
581    /// let haystack = "2010-03-14 2016-10-08 2020-10-22";
582    /// let input = Input::new(haystack);
583    /// let mut it = Searcher::new(input)
584    ///     .into_captures_iter(caps, |input, caps| {
585    ///         re.search(&mut cache, input, caps);
586    ///         Ok(())
587    ///     });
588    ///
589    /// let got = it.next().expect("first date")?;
590    /// let year = got.get_group_by_name("y").expect("must match");
591    /// assert_eq!("2010", &haystack[year]);
592    ///
593    /// let got = it.next().expect("second date")?;
594    /// let month = got.get_group_by_name("m").expect("must match");
595    /// assert_eq!("10", &haystack[month]);
596    ///
597    /// let got = it.next().expect("third date")?;
598    /// let day = got.get_group_by_name("d").expect("must match");
599    /// assert_eq!("22", &haystack[day]);
600    ///
601    /// assert!(it.next().is_none());
602    ///
603    /// # Ok::<(), Box<dyn std::error::Error>>(())
604    /// ```
605    #[cfg(feature = "alloc")]
606    #[inline]
607    pub fn into_captures_iter<F>(
608        self,
609        caps: Captures,
610        finder: F,
611    ) -> TryCapturesIter<'h, F>
612    where
613        F: FnMut(&Input<'_>, &mut Captures) -> Result<(), MatchError>,
614    {
615        TryCapturesIter { it: self, caps, finder }
616    }
617
618    /// Handles the special case of a match that begins where the previous
619    /// match ended. Without this special handling, it'd be possible to get
620    /// stuck where an empty match never results in forward progress. This
621    /// also makes it more consistent with how presiding general purpose regex
622    /// engines work.
623    #[cold]
624    #[inline(never)]
625    fn handle_overlapping_empty_half_match<F>(
626        &mut self,
627        _: HalfMatch,
628        mut finder: F,
629    ) -> Result<Option<HalfMatch>, MatchError>
630    where
631        F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>,
632    {
633        // Since we are only here when 'm.offset()' matches the offset of the
634        // last match, it follows that this must have been an empty match.
635        // Since we both need to make progress *and* prevent overlapping
636        // matches, we discard this match and advance the search by 1.
637        //
638        // Note that this may start a search in the middle of a codepoint. The
639        // regex engines themselves are expected to deal with that and not
640        // report any matches within a codepoint if they are configured in
641        // UTF-8 mode.
642        self.input.set_start(self.input.start().checked_add(1).unwrap());
643        finder(&self.input)
644    }
645
646    /// Handles the special case of an empty match by ensuring that 1) the
647    /// iterator always advances and 2) empty matches never overlap with other
648    /// matches.
649    ///
650    /// (1) is necessary because we principally make progress by setting the
651    /// starting location of the next search to the ending location of the last
652    /// match. But if a match is empty, then this results in a search that does
653    /// not advance and thus does not terminate.
654    ///
655    /// (2) is not strictly necessary, but makes intuitive sense and matches
656    /// the presiding behavior of most general purpose regex engines. The
657    /// "intuitive sense" here is that we want to report NON-overlapping
658    /// matches. So for example, given the regex 'a|(?:)' against the haystack
659    /// 'a', without the special handling, you'd get the matches [0, 1) and [1,
660    /// 1), where the latter overlaps with the end bounds of the former.
661    ///
662    /// Note that we mark this cold and forcefully prevent inlining because
663    /// handling empty matches like this is extremely rare and does require
664    /// quite a bit of code, comparatively. Keeping this code out of the main
665    /// iterator function keeps it smaller and more amenable to inlining
666    /// itself.
667    #[cold]
668    #[inline(never)]
669    fn handle_overlapping_empty_match<F>(
670        &mut self,
671        m: Match,
672        mut finder: F,
673    ) -> Result<Option<Match>, MatchError>
674    where
675        F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>,
676    {
677        assert!(m.is_empty());
678        self.input.set_start(self.input.start().checked_add(1).unwrap());
679        finder(&self.input)
680    }
681}
682
683/// An iterator over all non-overlapping half matches for a fallible search.
684///
685/// The iterator yields a `Result<HalfMatch, MatchError>` value until no more
686/// matches could be found.
687///
688/// The type parameters are as follows:
689///
690/// * `F` represents the type of a closure that executes the search.
691///
692/// The lifetime parameters come from the [`Input`] type:
693///
694/// * `'h` is the lifetime of the underlying haystack.
695///
696/// When possible, prefer the iterators defined on the regex engine you're
697/// using. This tries to abstract over the regex engine and is thus a bit more
698/// unwieldy to use.
699///
700/// This iterator is created by [`Searcher::into_half_matches_iter`].
701pub struct TryHalfMatchesIter<'h, F> {
702    it: Searcher<'h>,
703    finder: F,
704}
705
706impl<'h, F> TryHalfMatchesIter<'h, F> {
707    /// Return an infallible version of this iterator.
708    ///
709    /// Any item yielded that corresponds to an error results in a panic. This
710    /// is useful if your underlying regex engine is configured in a way that
711    /// it is guaranteed to never return an error.
712    pub fn infallible(self) -> HalfMatchesIter<'h, F> {
713        HalfMatchesIter(self)
714    }
715
716    /// Returns the current `Input` used by this iterator.
717    ///
718    /// The `Input` returned is generally equivalent to the one used to
719    /// construct this iterator, but its start position may be different to
720    /// reflect the start of the next search to be executed.
721    pub fn input<'i>(&'i self) -> &'i Input<'h> {
722        self.it.input()
723    }
724}
725
726impl<'h, F> Iterator for TryHalfMatchesIter<'h, F>
727where
728    F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>,
729{
730    type Item = Result<HalfMatch, MatchError>;
731
732    #[inline]
733    fn next(&mut self) -> Option<Result<HalfMatch, MatchError>> {
734        self.it.try_advance_half(&mut self.finder).transpose()
735    }
736}
737
738impl<'h, F> core::fmt::Debug for TryHalfMatchesIter<'h, F> {
739    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
740        f.debug_struct("TryHalfMatchesIter")
741            .field("it", &self.it)
742            .field("finder", &"<closure>")
743            .finish()
744    }
745}
746
747/// An iterator over all non-overlapping half matches for an infallible search.
748///
749/// The iterator yields a [`HalfMatch`] value until no more matches could be
750/// found.
751///
752/// The type parameters are as follows:
753///
754/// * `F` represents the type of a closure that executes the search.
755///
756/// The lifetime parameters come from the [`Input`] type:
757///
758/// * `'h` is the lifetime of the underlying haystack.
759///
760/// When possible, prefer the iterators defined on the regex engine you're
761/// using. This tries to abstract over the regex engine and is thus a bit more
762/// unwieldy to use.
763///
764/// This iterator is created by [`Searcher::into_half_matches_iter`] and
765/// then calling [`TryHalfMatchesIter::infallible`].
766#[derive(Debug)]
767pub struct HalfMatchesIter<'h, F>(TryHalfMatchesIter<'h, F>);
768
769impl<'h, F> HalfMatchesIter<'h, F> {
770    /// Returns the current `Input` used by this iterator.
771    ///
772    /// The `Input` returned is generally equivalent to the one used to
773    /// construct this iterator, but its start position may be different to
774    /// reflect the start of the next search to be executed.
775    pub fn input<'i>(&'i self) -> &'i Input<'h> {
776        self.0.it.input()
777    }
778}
779
780impl<'h, F> Iterator for HalfMatchesIter<'h, F>
781where
782    F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>,
783{
784    type Item = HalfMatch;
785
786    #[inline]
787    fn next(&mut self) -> Option<HalfMatch> {
788        match self.0.next()? {
789            Ok(m) => Some(m),
790            Err(err) => panic!(
791                "unexpected regex half find error: {}\n\
792                 to handle find errors, use 'try' or 'search' methods",
793                err,
794            ),
795        }
796    }
797}
798
799/// An iterator over all non-overlapping matches for a fallible search.
800///
801/// The iterator yields a `Result<Match, MatchError>` value until no more
802/// matches could be found.
803///
804/// The type parameters are as follows:
805///
806/// * `F` represents the type of a closure that executes the search.
807///
808/// The lifetime parameters come from the [`Input`] type:
809///
810/// * `'h` is the lifetime of the underlying haystack.
811///
812/// When possible, prefer the iterators defined on the regex engine you're
813/// using. This tries to abstract over the regex engine and is thus a bit more
814/// unwieldy to use.
815///
816/// This iterator is created by [`Searcher::into_matches_iter`].
817pub struct TryMatchesIter<'h, F> {
818    it: Searcher<'h>,
819    finder: F,
820}
821
822impl<'h, F> TryMatchesIter<'h, F> {
823    /// Return an infallible version of this iterator.
824    ///
825    /// Any item yielded that corresponds to an error results in a panic. This
826    /// is useful if your underlying regex engine is configured in a way that
827    /// it is guaranteed to never return an error.
828    pub fn infallible(self) -> MatchesIter<'h, F> {
829        MatchesIter(self)
830    }
831
832    /// Returns the current `Input` used by this iterator.
833    ///
834    /// The `Input` returned is generally equivalent to the one used to
835    /// construct this iterator, but its start position may be different to
836    /// reflect the start of the next search to be executed.
837    pub fn input<'i>(&'i self) -> &'i Input<'h> {
838        self.it.input()
839    }
840}
841
842impl<'h, F> Iterator for TryMatchesIter<'h, F>
843where
844    F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>,
845{
846    type Item = Result<Match, MatchError>;
847
848    #[inline]
849    fn next(&mut self) -> Option<Result<Match, MatchError>> {
850        self.it.try_advance(&mut self.finder).transpose()
851    }
852}
853
854impl<'h, F> core::fmt::Debug for TryMatchesIter<'h, F> {
855    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
856        f.debug_struct("TryMatchesIter")
857            .field("it", &self.it)
858            .field("finder", &"<closure>")
859            .finish()
860    }
861}
862
863/// An iterator over all non-overlapping matches for an infallible search.
864///
865/// The iterator yields a [`Match`] value until no more matches could be found.
866///
867/// The type parameters are as follows:
868///
869/// * `F` represents the type of a closure that executes the search.
870///
871/// The lifetime parameters come from the [`Input`] type:
872///
873/// * `'h` is the lifetime of the underlying haystack.
874///
875/// When possible, prefer the iterators defined on the regex engine you're
876/// using. This tries to abstract over the regex engine and is thus a bit more
877/// unwieldy to use.
878///
879/// This iterator is created by [`Searcher::into_matches_iter`] and
880/// then calling [`TryMatchesIter::infallible`].
881#[derive(Debug)]
882pub struct MatchesIter<'h, F>(TryMatchesIter<'h, F>);
883
884impl<'h, F> MatchesIter<'h, F> {
885    /// Returns the current `Input` used by this iterator.
886    ///
887    /// The `Input` returned is generally equivalent to the one used to
888    /// construct this iterator, but its start position may be different to
889    /// reflect the start of the next search to be executed.
890    pub fn input<'i>(&'i self) -> &'i Input<'h> {
891        self.0.it.input()
892    }
893}
894
895impl<'h, F> Iterator for MatchesIter<'h, F>
896where
897    F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>,
898{
899    type Item = Match;
900
901    #[inline]
902    fn next(&mut self) -> Option<Match> {
903        match self.0.next()? {
904            Ok(m) => Some(m),
905            Err(err) => panic!(
906                "unexpected regex find error: {}\n\
907                 to handle find errors, use 'try' or 'search' methods",
908                err,
909            ),
910        }
911    }
912}
913
914/// An iterator over all non-overlapping captures for a fallible search.
915///
916/// The iterator yields a `Result<Captures, MatchError>` value until no more
917/// matches could be found.
918///
919/// The type parameters are as follows:
920///
921/// * `F` represents the type of a closure that executes the search.
922///
923/// The lifetime parameters come from the [`Input`] type:
924///
925/// * `'h` is the lifetime of the underlying haystack.
926///
927/// When possible, prefer the iterators defined on the regex engine you're
928/// using. This tries to abstract over the regex engine and is thus a bit more
929/// unwieldy to use.
930///
931/// This iterator is created by [`Searcher::into_captures_iter`].
932#[cfg(feature = "alloc")]
933pub struct TryCapturesIter<'h, F> {
934    it: Searcher<'h>,
935    caps: Captures,
936    finder: F,
937}
938
939#[cfg(feature = "alloc")]
940impl<'h, F> TryCapturesIter<'h, F> {
941    /// Return an infallible version of this iterator.
942    ///
943    /// Any item yielded that corresponds to an error results in a panic. This
944    /// is useful if your underlying regex engine is configured in a way that
945    /// it is guaranteed to never return an error.
946    pub fn infallible(self) -> CapturesIter<'h, F> {
947        CapturesIter(self)
948    }
949}
950
951#[cfg(feature = "alloc")]
952impl<'h, F> Iterator for TryCapturesIter<'h, F>
953where
954    F: FnMut(&Input<'_>, &mut Captures) -> Result<(), MatchError>,
955{
956    type Item = Result<Captures, MatchError>;
957
958    #[inline]
959    fn next(&mut self) -> Option<Result<Captures, MatchError>> {
960        let TryCapturesIter { ref mut it, ref mut caps, ref mut finder } =
961            *self;
962        let result = it
963            .try_advance(|input| {
964                (finder)(input, caps)?;
965                Ok(caps.get_match())
966            })
967            .transpose()?;
968        match result {
969            Ok(_) => Some(Ok(caps.clone())),
970            Err(err) => Some(Err(err)),
971        }
972    }
973}
974
975#[cfg(feature = "alloc")]
976impl<'h, F> core::fmt::Debug for TryCapturesIter<'h, F> {
977    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
978        f.debug_struct("TryCapturesIter")
979            .field("it", &self.it)
980            .field("caps", &self.caps)
981            .field("finder", &"<closure>")
982            .finish()
983    }
984}
985
986/// An iterator over all non-overlapping captures for an infallible search.
987///
988/// The iterator yields a [`Captures`] value until no more matches could be
989/// found.
990///
991/// The type parameters are as follows:
992///
993/// * `F` represents the type of a closure that executes the search.
994///
995/// The lifetime parameters come from the [`Input`] type:
996///
997/// * `'h` is the lifetime of the underlying haystack.
998///
999/// When possible, prefer the iterators defined on the regex engine you're
1000/// using. This tries to abstract over the regex engine and is thus a bit more
1001/// unwieldy to use.
1002///
1003/// This iterator is created by [`Searcher::into_captures_iter`] and then
1004/// calling [`TryCapturesIter::infallible`].
1005#[cfg(feature = "alloc")]
1006#[derive(Debug)]
1007pub struct CapturesIter<'h, F>(TryCapturesIter<'h, F>);
1008
1009#[cfg(feature = "alloc")]
1010impl<'h, F> Iterator for CapturesIter<'h, F>
1011where
1012    F: FnMut(&Input<'_>, &mut Captures) -> Result<(), MatchError>,
1013{
1014    type Item = Captures;
1015
1016    #[inline]
1017    fn next(&mut self) -> Option<Captures> {
1018        match self.0.next()? {
1019            Ok(m) => Some(m),
1020            Err(err) => panic!(
1021                "unexpected regex captures error: {}\n\
1022                 to handle find errors, use 'try' or 'search' methods",
1023                err,
1024            ),
1025        }
1026    }
1027}