pub struct Fid { /* private fields */ }
Expand description
FID (Fully Indexable Dictionary).
This class can handle bit sequence of virtually arbitrary length.
In fact, N (FID’s length) is designed to be limited to: N <= 2^64.
It should be enough for almost all usecases since a binary data of length of 2^64 consumes 2^21 = 2,097,152 TB (terabyte), which is hard to handle by state-of-the-art computer architecture.
§Implementation detail
Index<u64>’s implementation is trivial.
select() just uses binary search of rank()
results.
rank()’s implementation is standard but non-trivial. So here explains implementation of rank().
§rank()’s implementation
Say you have the following bit vector.
00001000 01000001 00000100 11000000 00100000 00000101 10100000 00010000 001 ; (N=67)
Answer rank(48) in O(1) time-complexity and o(N) space-complexity.
Naively, you can count the number of ‘1’ from left to right. You will find rank(48) == 10 but it took O(N) time-complexity.
To reduce time-complexity to O(1), you can use memonization technique.
Of course, you can memonize results of rank(i) for every i ([0, N-1]).
Bit vector; 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 [1] 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 ; (N=67)
Memo rank(i); 0 0 0 0 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 5 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 8 8 9 10 10 11 11 11 11 11 11 11 11 11 12 12 12 12 12 12 12 13
From this memo, you can answer rank(48) == 10 in constant time, although space-complexity for this memo is O(N) > o(N).
To reduce space-complexity using memonization, we divide the bit vector into Chunk and Block.
Bit vector; 00001000 01000001 00000100 11000000 00100000 00000101 [1]0100000 00010000 001 ; (N=67)
Chunk; | 7 | 13 | ; (size = (log N)^2 = 36)
Block; |0 |1 |1 |2 |2 |3 |3 |4 |6 |6 |6 |7 |0 |0 |0 |2 |4 |4 |4 |5 |5 |5 |6| ; (size = (log N) / 2 = 3)
- A Chunk has size of (log N)^2. Its value is rank(index of the last bit of the chunk).
- A Block has size of (log N) / 2. A chunk has many blocks. Block’s value is the number of ’1’s in [index of the first bit of the chunk the block belongs to, index of the last bit of the block] (note that the value is reset to 0 at the first bit of a chunk).
Now you want to answer rank(48). 48-th bit is in the 2nd chunk, and in the 5th block in the chunk.
So the rank(48) is at least:
7 (value of 1st chunk) + 2 (value of 4th block in the 2nd chunk)
Then, focus on 3 bits in 5th block in the 2nd chunk; [1]01
.
As you can see, only 1 ‘1’ is included up to 48-th bit (101
has 2 ’1’s but 2nd ‘1’ is 50-th bit, irrelevant to rank(48)).
Therefore, the rank(48) is calculated as:
7 (value of 1st chunk) + 2 (value of 4th block in the 2nd chunk) + 1 (’1’s in 5th block up to 48-th bit)
OK. That’s all… Wait!
rank() must be in O(1) time-complexity.
- 7 (value of 1st chunk): O(1) if you store chunk value in array structure.
- 2 (value of 4th block in the 2nd chunk): Same as above.
- 1 (’1’s in 5th block up to 48-th bit): O(length of block) = O(log N) !
Counting ’1’s in a block must also be O(1), while using o(N) space.
We use Table for this purpose.
Block content | Number of ’1’s in block |
---|---|
000 | 0 |
001 | 1 |
010 | 1 |
011 | 2 |
100 | 1 |
101 | 2 |
110 | 2 |
111 | 3 |
This table is constructed in build()
. So we can find the number of ’1’s in block in O(1) time.
Note that this table has O(log N) = o(N) length.
In summary:
rank() = (value of left chunk) + (value of left block) + (value of table keyed by inner block bits).
Implementations§
source§impl Fid
impl Fid
sourcepub fn rank(&self, i: u64) -> u64
pub fn rank(&self, i: u64) -> u64
Returns the number of 1 in [0, i
] elements of the Fid
.
§Panics
When i
>= length of the Fid
.
§Implementation detail
00001000 01000001 00000100 11000000 00100000 00000101 00100000 00010000 001 Raw data (N=67)
^
i = 51
| 7 | 13 | Chunk (size = (log N)^2 = 36)
^
chunk_left i_chunk = 1 chunk_right
|0 |1 |1 |2 |2 |3 |3 |4 |6 |6 |6 |7 |0 |0 |0 |2 |3 |3 |4 |4 |4 |5 |5| Block (size = log N / 2 = 3)
^
i_block = 17
block_left | block_right
- Find
i_chunk
.i_chunk
=i
/chunk_size
. - Get
chunk_left
= Chunks[i_chunk
- 1] only ifi_chunk
> 0. - Get rank from chunk_left if
chunk_left
exists. - Get
chunk_right
= Chunks[i_chunk
]. - Find
i_block
.i_block
= (i
-i_chunk
*chunk_size
) / block size. - Get
block_left
=chunk_right.blocks
[i_block
- 1]_ only if _
i_block` > 0. - Get rank from block_left if
block_left
exists. - Get inner-block data _
block_bits
.block_bits
must be of block size length, fulfilled with 0 in right bits. - Calculate rank of
block_bits
in O(1) using a table memonizing block size bit’s popcount.
Trait Implementations§
source§impl From<&str> for Fid
impl From<&str> for Fid
source§fn from(s: &str) -> Self
fn from(s: &str) -> Self
Constructor from string representation of bit sequence.
- ‘0’ is interpreted as 0.
- ‘1’ is interpreted as 1.
- ‘_’ is just ignored.
§Examples
use fid_rs::Fid;
let fid = Fid::from("01_11");
assert_eq!(fid[0], false);
assert_eq!(fid[1], true);
assert_eq!(fid[2], true);
assert_eq!(fid[3], true);
§Panics
When:
s
contains any character other than ‘0’, ‘1’, and ‘_’.s
does not contain any ‘0’ or ‘1’
Auto Trait Implementations§
impl Freeze for Fid
impl RefUnwindSafe for Fid
impl Send for Fid
impl Sync for Fid
impl Unpin for Fid
impl UnwindSafe for Fid
Blanket Implementations§
source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
source§impl<T> IntoEither for T
impl<T> IntoEither for T
source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self
into a Left
variant of Either<Self, Self>
if into_left
is true
.
Converts self
into a Right
variant of Either<Self, Self>
otherwise. Read moresource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self
into a Left
variant of Either<Self, Self>
if into_left(&self)
returns true
.
Converts self
into a Right
variant of Either<Self, Self>
otherwise. Read more