Struct fid_rs::fid::Fid

source ·
pub struct Fid { /* private fields */ }
Expand description

FID (Fully Indexable Dictionary).

This class can handle bit sequence of virtually arbitrary length.

In fact, N (FID’s length) is designed to be limited to: N <= 2^64.
It should be enough for almost all usecases since a binary data of length of 2^64 consumes 2^21 = 2,097,152 TB (terabyte), which is hard to handle by state-of-the-art computer architecture.

§Implementation detail

Index<u64>’s implementation is trivial.

select() just uses binary search of rank() results.

rank()’s implementation is standard but non-trivial. So here explains implementation of rank().

§rank()’s implementation

Say you have the following bit vector.

00001000 01000001 00000100 11000000 00100000 00000101 10100000 00010000 001 ; (N=67)

Answer rank(48) in O(1) time-complexity and o(N) space-complexity.

Naively, you can count the number of ‘1’ from left to right. You will find rank(48) == 10 but it took O(N) time-complexity.

To reduce time-complexity to O(1), you can use memonization technique.
Of course, you can memonize results of rank(i) for every i ([0, N-1]).

Bit vector;   0  0  0  0  1  0  0  0  0  1  0  0  0  0  0  1  0  0  0  0  0  1  0  0  1  1  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  1  0  1  [1]  0  1  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  1 ; (N=67)
Memo rank(i); 0  0  0  0  1  1  1  1  1  2  2  2  2  2  2  3  3  3  3  3  3  4  4  4  5  6  6  6  6  6  6  6  6  6  7  7  7  7  7  7  7  7  7  7  7  8  8  9  10  10 11 11 11 11 11 11 11 11 11 12 12 12 12 12 12 12 13

From this memo, you can answer rank(48) == 10 in constant time, although space-complexity for this memo is O(N) > o(N).

To reduce space-complexity using memonization, we divide the bit vector into Chunk and Block.

Bit vector; 00001000 01000001 00000100 11000000 00100000 00000101 [1]0100000 00010000 001  ; (N=67)
Chunk;     |                  7                    |                13                  |  ; (size = (log N)^2 = 36)
Block;     |0 |1 |1  |2 |2 |3  |3 |4 |6  |6 |6  |7 |0 |0  |0 |2 |4    |4 |4  |5 |5 |5  |6| ; (size = (log N) / 2 = 3)
  • A Chunk has size of (log N)^2. Its value is rank(index of the last bit of the chunk).
  • A Block has size of (log N) / 2. A chunk has many blocks. Block’s value is the number of ’1’s in [index of the first bit of the chunk the block belongs to, index of the last bit of the block] (note that the value is reset to 0 at the first bit of a chunk).

Now you want to answer rank(48). 48-th bit is in the 2nd chunk, and in the 5th block in the chunk.
So the rank(48) is at least:

7 (value of 1st chunk) + 2 (value of 4th block in the 2nd chunk)

Then, focus on 3 bits in 5th block in the 2nd chunk; [1]01.
As you can see, only 1 ‘1’ is included up to 48-th bit (101 has 2 ’1’s but 2nd ‘1’ is 50-th bit, irrelevant to rank(48)).

Therefore, the rank(48) is calculated as:

7 (value of 1st chunk) + 2 (value of 4th block in the 2nd chunk) + 1 (’1’s in 5th block up to 48-th bit)

OK. That’s all… Wait!
rank() must be in O(1) time-complexity.

  • 7 (value of 1st chunk): O(1) if you store chunk value in array structure.
  • 2 (value of 4th block in the 2nd chunk): Same as above.
  • 1 (’1’s in 5th block up to 48-th bit): O(length of block) = O(log N) !

Counting ’1’s in a block must also be O(1), while using o(N) space.
We use Table for this purpose.

Block contentNumber of ’1’s in block
0000
0011
0101
0112
1001
1012
1102
1113

This table is constructed in build(). So we can find the number of ’1’s in block in O(1) time.
Note that this table has O(log N) = o(N) length.

In summary:

rank() = (value of left chunk) + (value of left block) + (value of table keyed by inner block bits).

Implementations§

source§

impl Fid

source

pub fn rank(&self, i: u64) -> u64

Returns the number of 1 in [0, i] elements of the Fid.

§Panics

When i >= length of the Fid.

§Implementation detail
 00001000 01000001 00000100 11000000 00100000 00000101 00100000 00010000 001  Raw data (N=67)
                                                          ^
                                                          i = 51
|                  7                    |                13                |  Chunk (size = (log N)^2 = 36)
                                        ^
               chunk_left            i_chunk = 1      chunk_right

|0 |1 |1  |2 |2 |3  |3 |4 |6  |6 |6  |7 |0 |0  |0 |2 |3 |3 |4  |4 |4 |5  |5|  Block (size = log N / 2 = 3)
                                                        ^
                                                     i_block = 17
                                             block_left | block_right
  1. Find i_chunk. i_chunk = i / chunk_size.
  2. Get chunk_left = Chunks[i_chunk - 1] only if i_chunk > 0.
  3. Get rank from chunk_left if chunk_left exists.
  4. Get chunk_right = Chunks[i_chunk].
  5. Find i_block. i_block = (i - i_chunk * chunk_size) / block size.
  6. Get block_left = chunk_right.blocks[ i_block - 1]_ only if _i_block` > 0.
  7. Get rank from block_left if block_left exists.
  8. Get inner-block data _block_bits. block_bits must be of block size length, fulfilled with 0 in right bits.
  9. Calculate rank of block_bits in O(1) using a table memonizing block size bit’s popcount.
source

pub fn rank0(&self, i: u64) -> u64

Returns the number of 0 in [0, i] elements of the Fid.

§Panics

When i >= length of the Fid.

source

pub fn select(&self, num: u64) -> Option<u64>

Returns the minimum position (0-origin) i where rank(i) == num of num-th 1 if exists. Else returns None.

§Panics

When num > length of the Fid.

§Implementation detail

Binary search using rank().

source

pub fn select0(&self, num: u64) -> Option<u64>

Returns the minimum position (0-origin) i where rank(i) == num of num-th 0 if exists. Else returns None.

§Panics

When num > length of the Fid.

source

pub fn len(&self) -> u64

Returns bit length of this FID.

source

pub fn is_empty(&self) -> bool

Returns whether the FID is empty.

source§

impl<'iter> Fid

source

pub fn iter(&'iter self) -> FidIter<'iter>

Creates an iterator over FID’s bit vector.

§Examples
use fid_rs::Fid;

let fid = Fid::from("1010_1010");
for (i, bit) in fid.iter().enumerate() {
    assert_eq!(bit, fid[i as u64]);
}

Trait Implementations§

source§

impl Clone for Fid

source§

fn clone(&self) -> Fid

Returns a copy of the value. Read more
1.0.0 · source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
source§

impl Debug for Fid

source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
source§

impl From<&[bool]> for Fid

source§

fn from(bits: &[bool]) -> Self

Constructor from slice of boolean.

§Examples
use fid_rs::Fid;

let bits = [false, true, true, true];
let fid = Fid::from(&bits[..]);
assert_eq!(fid[0], false);
assert_eq!(fid[1], true);
assert_eq!(fid[2], true);
assert_eq!(fid[3], true);
§Panics

When:

  • bits is empty.
source§

impl From<&str> for Fid

source§

fn from(s: &str) -> Self

Constructor from string representation of bit sequence.

  • ‘0’ is interpreted as 0.
  • ‘1’ is interpreted as 1.
  • ‘_’ is just ignored.
§Examples
use fid_rs::Fid;

let fid = Fid::from("01_11");
assert_eq!(fid[0], false);
assert_eq!(fid[1], true);
assert_eq!(fid[2], true);
assert_eq!(fid[3], true);
§Panics

When:

  • s contains any character other than ‘0’, ‘1’, and ‘_’.
  • s does not contain any ‘0’ or ‘1’
source§

impl Index<u64> for Fid

source§

fn index(&self, index: u64) -> &Self::Output

Returns i-th element of the Fid.

§Panics

When i >= length of the Fid.

§

type Output = bool

The returned type after indexing.

Auto Trait Implementations§

§

impl Freeze for Fid

§

impl RefUnwindSafe for Fid

§

impl Send for Fid

§

impl Sync for Fid

§

impl Unpin for Fid

§

impl UnwindSafe for Fid

Blanket Implementations§

source§

impl<T> Any for T
where T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
source§

impl<T> Borrow<T> for T
where T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T, U> Into<U> for T
where U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

source§

impl<T> IntoEither for T

source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
source§

impl<T> Pointable for T

source§

const ALIGN: usize = _

The alignment of pointer.
§

type Init = T

The type for initializers.
source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer. Read more
source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer. Read more
source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer. Read more
source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer. Read more
source§

impl<T> ToOwned for T
where T: Clone,

§

type Owned = T

The resulting type after obtaining ownership.
source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.