datafusion_physical_plan::joins::utils

Struct JoinHashMap

pub struct JoinHashMap { /* private fields */ }

Maps a u64 hash value, computed from the build-side “on” values, to the list of build-side row indices whose key produces that hash.

The HashMap is allocated with capacity for at least the number of rows on the build side, so it never has to grow and re-hash its entries. Re-hashing would need access to the key value, which in this case is the hash itself.

E.g. 1 -> [3, 6, 8] indicates that the column values in rows 3, 6 and 8 produce hash value 1. Because the key is a hash value, possible hash collisions must be checked in the probe stage: a probe row may share a hash value with a build-side entry even though the underlying values don’t match. Those cases are checked in the equal_rows_arr method.
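The collision check described above can be sketched as follows. This is a minimal illustration and not DataFusion's equal_rows_arr: the function name filter_collisions, the i64 key slices, and the (build row, probe row) pair layout are all assumptions made for the example.

```rust
/// Keep only the candidate pairs whose actual join keys are equal.
/// `candidates` holds (build row, probe row) pairs that matched by hash only.
/// All names here are hypothetical and chosen for illustration.
fn filter_collisions(
    build_keys: &[i64],
    probe_keys: &[i64],
    candidates: &[(u64, u32)],
) -> Vec<(u64, u32)> {
    candidates
        .iter()
        .copied()
        // A shared hash is only a hint; compare the real key values.
        .filter(|&(b, p)| build_keys[b as usize] == probe_keys[p as usize])
        .collect()
}

fn main() {
    let build_keys: [i64; 3] = [7, 42, 7];
    let probe_keys: [i64; 2] = [7, 99];
    // Assume a weak hash placed 7, 42 and 99 under the same hash value,
    // so every probe row is paired with every build row.
    let candidates: [(u64, u32); 6] = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)];

    let matches = filter_collisions(&build_keys, &probe_keys, &candidates);
    // Only the build rows whose key really is 7 survive for probe row 0.
    assert_eq!(matches, vec![(0, 0), (2, 0)]);
}
```

The point of the sketch is only that the hash lookup yields candidates, and a second pass over the real key columns decides which of them are true matches.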

The indices (values) are stored as a separate chained list in a Vec<u64>.

The hashmap stores the first value of the chain as row index + 1, whereas each following value is stored in the array at the position of the current row index.

The chain is followed until the value 0 is reached, which marks the end of the list. Also see chapter 5.3 of “Balancing vectorized query execution with bandwidth-optimized storage”.

§Example

See the example below:

Insert (10,1)            <-- insert hash value 10 with row index 1
map:
----------
| 10 | 2 |
----------
next:
---------------------
| 0 | 0 | 0 | 0 | 0 |
---------------------
Insert (20,2)
map:
----------
| 10 | 2 |
| 20 | 3 |
----------
next:
---------------------
| 0 | 0 | 0 | 0 | 0 |
---------------------
Insert (10,3)           <-- collision! row index 3 has a hash value of 10 as well
map:
----------
| 10 | 4 |
| 20 | 3 |
----------
next:
---------------------
| 0 | 0 | 0 | 2 | 0 |  <--- hash value 10 maps to 4,2 (which means row indices 3,1)
---------------------
Insert (10,4)          <-- another collision! row index 4 ALSO has a hash value of 10
map:
---------
| 10 | 5 |
| 20 | 3 |
---------
next:
---------------------
| 0 | 0 | 0 | 2 | 4 | <--- hash value 10 maps to 5,4,2 (which means row indices 4,3,1)
---------------------
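The four inserts above can be replayed with a small sketch. It uses std::collections::HashMap instead of the RawTable the real type holds, and the type name ChainedMap and its methods are hypothetical; it only aims to reproduce the map and next states shown in the example.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the layout described above (not the real type).
struct ChainedMap {
    /// hash value -> (most recently inserted row index + 1)
    map: HashMap<u64, u64>,
    /// next[row] = (previous row index in the chain + 1), 0 = end of chain
    next: Vec<u64>,
}

impl ChainedMap {
    fn with_capacity(rows: usize) -> Self {
        Self { map: HashMap::with_capacity(rows), next: vec![0; rows] }
    }

    fn insert(&mut self, hash: u64, row: usize) {
        // Store row + 1 so that 0 stays free as the end-of-chain marker.
        if let Some(prev) = self.map.insert(hash, row as u64 + 1) {
            // The hash was already present: link the new head to the old one.
            self.next[row] = prev;
        }
    }

    /// Follow the chain for `hash` and collect the row indices.
    fn rows_for(&self, hash: u64) -> Vec<usize> {
        let mut rows = Vec::new();
        let mut cur = self.map.get(&hash).copied().unwrap_or(0);
        while cur != 0 {
            let row = (cur - 1) as usize;
            rows.push(row);
            cur = self.next[row];
        }
        rows
    }
}

fn main() {
    let mut m = ChainedMap::with_capacity(5);
    m.insert(10, 1);
    m.insert(20, 2);
    m.insert(10, 3);
    m.insert(10, 4);

    // Same final state as the last diagram in the example.
    assert_eq!(m.map[&10], 5);
    assert_eq!(m.map[&20], 3);
    assert_eq!(m.next, vec![0, 0, 0, 2, 4]);
    assert_eq!(m.rows_for(10), vec![4, 3, 1]);
}
```

Rows come back newest first because each insert replaces the head stored in the map and pushes the previous head into next.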

Trait Implementations§

impl Debug for JoinHashMap

fn fmt(&self, _f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl JoinHashMapType for JoinHashMap

Implementation of JoinHashMapType for JoinHashMap.

fn get_mut(&mut self) -> (&mut RawTable<(u64, u64)>, &mut Self::NextType)

Get mutable references to the hash map and the next list.

fn get_map(&self) -> &RawTable<(u64, u64)>

Get a reference to the hash map.

fn get_list(&self) -> &Self::NextType

Get a reference to the next list.

type NextType = Vec<u64>

The type of list used to store the next list.

fn extend_zero(&mut self, _: usize)

Extend with zero

fn update_from_iter<'a>(&mut self, iter: impl Iterator<Item = (usize, &'a u64)>, deleted_offset: usize)

Updates the hashmap from an iterator of (row index, row hash) pairs.

fn get_matched_indices<'a>(&self, iter: impl Iterator<Item = (usize, &'a u64)>, deleted_offset: Option<usize>) -> (Vec<u32>, Vec<u64>)

Returns all pairs of row indices matched by hash.

fn get_matched_indices_with_limit_offset(&self, hash_values: &[u64], deleted_offset: Option<usize>, limit: usize, offset: (usize, Option<u64>)) -> (Vec<u32>, Vec<u64>, Option<(usize, Option<u64>)>)

Matches hashes while taking limit and offset into account. Returns pairs of matched indices along with the starting point for the next matching iteration (None if the limit has not been reached).
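One way to picture limit-and-offset matching is the sketch below. It is not the library's implementation: the free function matches_with_limit, the use of std::collections::HashMap in place of RawTable, and the reading of the offset as (probe position, chain value to continue from) are assumptions. It ignores deleted_offset, assumes limit is greater than zero, and returns None once all matches have been emitted.

```rust
use std::collections::HashMap;

/// Assumed resume point: (position in `hash_values`, chain value to continue from).
type Offset = (usize, Option<u64>);

/// Hypothetical sketch: emit at most `limit` (probe row, build row) matches.
fn matches_with_limit(
    map: &HashMap<u64, u64>,
    next: &[u64],
    hash_values: &[u64],
    limit: usize,
    offset: Offset,
) -> (Vec<u32>, Vec<u64>, Option<Offset>) {
    let mut probe_rows = Vec::new();
    let mut build_rows = Vec::new();
    let (start, mut resume) = offset;

    for (probe_row, hash) in hash_values.iter().enumerate().skip(start) {
        // Continue mid-chain for the first row if asked to, else start at the head.
        let mut cur = match resume.take() {
            Some(value) => value,
            None => map.get(hash).copied().unwrap_or(0),
        };
        while cur != 0 {
            if build_rows.len() == limit {
                // Output is full: report where the next call should pick up.
                return (probe_rows, build_rows, Some((probe_row, Some(cur))));
            }
            probe_rows.push(probe_row as u32);
            build_rows.push(cur - 1); // stored values are row index + 1
            cur = next[(cur - 1) as usize];
        }
    }
    (probe_rows, build_rows, None)
}

fn main() {
    // Final state of the example: hash 10 -> rows 4, 3, 1 and hash 20 -> row 2.
    let map: HashMap<u64, u64> = HashMap::from([(10, 5), (20, 3)]);
    let next: Vec<u64> = vec![0, 0, 0, 2, 4];
    let probe_hashes: [u64; 2] = [10, 20];

    let (probe, build, resume) = matches_with_limit(&map, &next, &probe_hashes, 2, (0, None));
    assert_eq!((probe, build), (vec![0, 0], vec![4, 3]));
    assert_eq!(resume, Some((0, Some(2))));

    let (probe, build, resume) =
        matches_with_limit(&map, &next, &probe_hashes, 2, resume.unwrap());
    assert_eq!((probe, build), (vec![0, 1], vec![1, 2]));
    assert_eq!(resume, None);
}
```

Carrying the chain value in the offset lets a caller stop in the middle of a long chain and continue later without re-emitting earlier matches.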

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,

impl<T> ErasedDestructor for T
where T: 'static,

impl<T> MaybeSendSync for T