Function datafusion_common::utils::memory::estimate_memory_size

pub fn estimate_memory_size<T>(
    num_elements: usize,
    fixed_size: usize,
) -> Result<usize>

Estimates the memory size required for a hash table prior to allocation.

§Parameters

  • num_elements: The number of elements expected in the hash table.
  • fixed_size: A fixed overhead size associated with the collection (e.g., HashSet or HashTable).
  • T: The type of elements stored in the hash table.

§Details

This function calculates the estimated memory size by considering:

  • The number of buckets is overestimated so that approximately 1/8 of them stay empty, i.e. the element count is scaled by about 8/7.
  • The total memory size is computed as:
    • The size of each entry (T) multiplied by the estimated number of buckets.
    • One byte overhead for each bucket.
    • The fixed size overhead of the collection.
  • If the estimation overflows usize, the function returns a DataFusionError.
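
The calculation listed above can be sketched in plain Rust as follows. This is a minimal illustration, not the library's source: the name estimate_sketch is hypothetical, overflow is reported as None rather than as a DataFusionError, and rounding the bucket count up to a power of two is an assumption about how hashbrown-style tables size themselves, not something stated above.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for the estimate described above.
/// Returns `None` on overflow instead of a `DataFusionError`.
fn estimate_sketch<T>(num_elements: usize, fixed_size: usize) -> Option<usize> {
    // Scale the element count by 8/7 so that roughly 1/8 of the buckets stay empty.
    let overestimate = num_elements.checked_mul(8)? / 7;
    // Assumption: the bucket count is rounded up to the next power of two.
    let buckets = overestimate.checked_next_power_of_two()?;
    size_of::<T>()
        .checked_mul(buckets)? // size of each entry * number of buckets
        .checked_add(buckets)? // + one byte of overhead per bucket
        .checked_add(fixed_size) // + fixed overhead of the collection
}
```

Under these assumptions, 100 elements of type (u64, u64) with no fixed overhead scale to 114, round up to 128 buckets, and give 16 * 128 + 128 = 2176 bytes.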

§Examples


§From within a struct


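use datafusion_common::utils::memory::estimate_memory_size;
use datafusion_common::Result;

struct MyStruct<T> {
    values: Vec<T>,
    other_data: usize,
}

impl<T> MyStruct<T> {
    fn size(&self) -> Result<usize> {
        // Number of entries the estimate should account for.
        let num_elements = self.values.len();
        // Fixed overhead: the struct itself plus the collection handle.
        let fixed_size = std::mem::size_of_val(self) +
          std::mem::size_of_val(&self.values);

        estimate_memory_size::<T>(num_elements, fixed_size)
    }
}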

§With a simple collection


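use std::collections::HashMap;
use datafusion_common::utils::memory::estimate_memory_size;

let num_rows = 100;
// Fixed overhead of the collection handle itself.
let fixed_size = std::mem::size_of::<HashMap<u64, u64>>();
let estimated_hashtable_size =
  estimate_memory_size::<(u64, u64)>(num_rows, fixed_size)
    .expect("Size estimation failed");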