pub struct Toc { /* private fields */ }
Expand description
Trie Occurrence Counter is frequency dictionary that uses any impl Iterator<Item = char>
type as occurrent.
OOB English letters A–Za–z support only.
use t_oc::Toc;
use std::panic::catch_unwind;
let mut toc = Toc::new();
let occurrent = "true";
toc.ins(occurrent.chars(), None);
toc.ins(true.to_string().chars(), None);
assert_eq!(2, toc.acq(occurrent.chars()).unwrap());
toc.put(occurrent.chars(), 15);
assert_eq!(15, toc.acq(occurrent.chars()).unwrap());
let catch = catch_unwind(move|| toc.ins("#&%".chars(), None));
assert!(catch.is_err());
When asymptotic computational complexity is not explicitly specified , it is:
- TC: Θ(s) where s is count of
char
s iterated over. - SC: Θ(0)
Implementations§
Source§impl Toc
impl Toc
Sourcepub fn new() -> Self
pub fn new() -> Self
Constructs default version of Toc
, i.e. via
fn new_with()
with english_letters::ab
and english_letters::ix
.
Sourcepub fn new_with(ix: Ix, ab: Ab) -> Self
pub fn new_with(ix: Ix, ab: Ab) -> Self
Allows to use custom alphabet different from default alphabet.
use t_oc::{ab as ab_fn, Alphabet, Toc};
fn ab() -> Alphabet {
ab_fn(2)
}
fn ix(c: char) -> usize {
match c {
'&' => 0,
'|' => 1,
_ => panic!(),
}
}
let mut toc = Toc::new_with(ix, ab);
let a = "&";
let b = "|";
let aba = "&|&";
toc.ins(a.chars(), None);
toc.ins(a.chars(), None);
toc.ins(b.chars(), None);
toc.ins(aba.chars(), None);
assert_eq!(2, toc.acq(a.chars()).unwrap());
assert_eq!(1, toc.acq(aba.chars()).unwrap());
Sourcepub fn put_trace_cap(&mut self, approx_cap: usize) -> usize
pub fn put_trace_cap(&mut self, approx_cap: usize) -> usize
Toc
uses internal buffer, to avoid excessive allocations and copying, which grows
over time due backtracing in rem
method which backtraces whole path from entry
node to root node.
Use this method to shrink or extend it to fit actual program needs. Neither shrinking nor extending
is guaranteed to be exact. See Vec::with_capacity()
and Vec::reserve()
. For optimal rem
performance, set approx_cap
to, at least, occurrent.count()
.
Some high value is sufficient anyway. Since buffer continuous usage, its capacity will likely expand at some point in time to size sufficient to all occurrents.
Return value is actual buffer capacity.
Note: While String
is UTF8 encoded, its byte length does not have to equal its char
count
which is either equal or lesser.
let sights = "🤩";
assert_eq!(4, sights.len());
assert_eq!(1, sights.chars().count());
let yes = "sí";
assert_eq!(3, yes.len());
assert_eq!(2, yes.chars().nth(1).unwrap().len_utf8());
let abc = "abc";
assert_eq!(3, abc.len());
Sourcepub fn acq_trace_cap(&self) -> usize
pub fn acq_trace_cap(&self) -> usize
Return value is internal backtracing buffer capacity.
Check with fn put_trace_cap
for details.
Sourcepub fn ins(
&mut self,
occurrent: impl Iterator<Item = char>,
val: Option<usize>,
) -> InsRes
pub fn ins( &mut self, occurrent: impl Iterator<Item = char>, val: Option<usize>, ) -> InsRes
Counter is of word size. Add overflow is wrapped using wrapping_add
.
Optional val
parameter can be used to insert exact value.
Return value is InsRes::Ok(Option<usize>)
for non-zero occurrent
and holds previous value, if there was some.
- SC: Θ(q) where q is number of unique nodes, i.e.
char
s in respective branches.
Sourcepub fn acq(&self, occurrent: impl Iterator<Item = char>) -> VerRes
pub fn acq(&self, occurrent: impl Iterator<Item = char>) -> VerRes
Used to acquire value for occurrent
.
If VerRes::Ok(usize)
, usize
is occurrent
occurrences count.
Sourcepub fn put(
&mut self,
occurrent: impl Iterator<Item = char>,
val: usize,
) -> VerRes
pub fn put( &mut self, occurrent: impl Iterator<Item = char>, val: usize, ) -> VerRes
Used to put new value for occurrent
occurrences.
If VerRes::Ok(usize)
, usize
is previous value.
Sourcepub fn rem(&mut self, occurrent: impl Iterator<Item = char>) -> VerRes
pub fn rem(&mut self, occurrent: impl Iterator<Item = char>) -> VerRes
Used to remove occurrent
from tree.
If VerRes::Ok(usize)
, usize
is occurrent
occurrences count.
- s is count of
char
s iterated over. - TC: Ω(s) or ϴ(s) (backtracing buffer capacity dependent complexity)
- SC: ϴ(s)
Check with put_trace_cap
for details on backtracing.