datafusion_physical_plan::aggregates::group_values

Trait GroupValues

pub trait GroupValues: Send {
    // Required methods
    fn intern(
        &mut self,
        cols: &[ArrayRef],
        groups: &mut Vec<usize>,
    ) -> Result<()>;
    fn size(&self) -> usize;
    fn is_empty(&self) -> bool;
    fn len(&self) -> usize;
    fn emit(&mut self, emit_to: EmitTo) -> Result<Vec<ArrayRef>>;
    fn clear_shrink(&mut self, batch: &RecordBatch);
}

Stores the group values during hash aggregation.

§Background

In a query such as SELECT a, b, count(*) FROM t GROUP BY a, b, the group values identify each group, and correspond to all the distinct values of (a,b).

-- Input has 4 rows with 3 distinct combinations of (a,b) ("groups")
create table t(a int, b varchar)
as values (1, 'a'), (2, 'b'), (1, 'a'), (3, 'c');

select a, b, count(*) from t group by a, b;
----
1 a 2
2 b 1
3 c 1

§Design

Managing group values is a performance-critical operation in hash aggregation. The major operations are:

  1. Intern: Quickly finding existing group values and adding new ones
  2. Emit: Returning the group values as an array

There are multiple implementations of this trait, each specialized for particular data types and numbers of group columns so that these operations are fast. See new_group_values for details.

§Group Ids

Each distinct group in a hash aggregation is identified by a unique group id (usize), which is assigned by instances of this trait. Group ids are contiguous (no gaps), starting from 0.

Required Methods§

fn intern(&mut self, cols: &[ArrayRef], groups: &mut Vec<usize>) -> Result<()>

Calculates the group id for each input row of cols, assigning new group ids as necessary.

When the function returns, groups must contain the group id for each row in cols.

If a row has the same value as a previous row, the same group id is assigned. If a row has a new value, the next available group id is assigned.
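
The id assignment described above can be sketched with the standard library alone. This is a simplified illustration, not the DataFusion implementation: ToyGroupValues and example_rows are hypothetical names, rows are plain (i32, String) tuples rather than Arrow arrays, and the sketch assumes groups is cleared at the start of each call.

```rust
use std::collections::HashMap;

/// Toy stand-in for a `GroupValues` implementation (hypothetical, std-only).
/// Each row is an `(i32, String)` pair instead of a slice of Arrow arrays.
#[derive(Default)]
struct ToyGroupValues {
    /// Maps a group value to its group id.
    map: HashMap<(i32, String), usize>,
    /// Group values in group-id order: `values[id]` is the value of group `id`.
    values: Vec<(i32, String)>,
}

impl ToyGroupValues {
    /// Fills `groups` with one group id per input row, assigning new ids as needed.
    fn intern(&mut self, rows: &[(i32, String)], groups: &mut Vec<usize>) {
        // Assumption of this sketch: `groups` is overwritten on every call.
        groups.clear();
        for row in rows {
            let id = match self.map.get(row) {
                // Seen before: reuse the existing group id.
                Some(&id) => id,
                // New value: the next free id is the current number of groups.
                None => {
                    let id = self.values.len();
                    self.map.insert(row.clone(), id);
                    self.values.push(row.clone());
                    id
                }
            };
            groups.push(id);
        }
    }

    /// Number of distinct group values stored so far.
    fn len(&self) -> usize {
        self.values.len()
    }
}

/// The four rows of table `t` from the Background section.
fn example_rows() -> Vec<(i32, String)> {
    vec![
        (1, "a".to_string()),
        (2, "b".to_string()),
        (1, "a".to_string()),
        (3, "c".to_string()),
    ]
}
```

Interning example_rows() into a fresh ToyGroupValues leaves groups as [0, 1, 0, 2] and len() as 3. Because the next id is always the current group count, the ids stay contiguous from 0.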

fn size(&self) -> usize

Returns the number of bytes of memory used by this GroupValues.

fn is_empty(&self) -> bool

Returns true if this GroupValues is empty.

fn len(&self) -> usize

Returns the number of distinct group values stored in this GroupValues.

fn emit(&mut self, emit_to: EmitTo) -> Result<Vec<ArrayRef>>

Emits the group values selected by emit_to, returned as one array per group column.
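
A rough, std-only sketch of emitting may help picture the result shape. ToyEmitTo and ToyGroups are hypothetical stand-ins, not the real EmitTo or any DataFusion type. The sketch assumes that emitted groups are removed from the store and that any remaining groups are renumbered from 0.

```rust
/// Hypothetical stand-in for `EmitTo`: emit every group, or only the first `n`.
enum ToyEmitTo {
    All,
    First(usize),
}

/// Toy group store: `values[id]` is the `(a, b)` value of group `id`.
struct ToyGroups {
    values: Vec<(i32, String)>,
}

impl ToyGroups {
    /// Removes the selected groups and returns them column by column,
    /// mirroring the "one array per group column" shape of the result.
    fn emit(&mut self, emit_to: ToyEmitTo) -> (Vec<i32>, Vec<String>) {
        let n = match emit_to {
            ToyEmitTo::All => self.values.len(),
            // Clamp so that asking for more groups than exist cannot panic.
            ToyEmitTo::First(n) => n.min(self.values.len()),
        };
        // `drain` removes the first `n` groups; the groups left behind move
        // to the front of the vector, so their ids shift down by `n`.
        self.values.drain(..n).unzip()
    }
}
```

With groups (1, a), (2, b), (3, c) stored, ToyEmitTo::First(2) returns the columns [1, 2] and ["a", "b"], and (3, c) becomes group 0.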

fn clear_shrink(&mut self, batch: &RecordBatch)

Clears the contents and shrinks the capacity to the size of batch, freeing memory that is no longer needed.
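
The clear-then-shrink pattern can be illustrated on a plain Vec. This is only a sketch under assumptions: ToyStore is a hypothetical type, and the batch_rows parameter stands in for the row count of the RecordBatch argument.

```rust
/// Toy store (hypothetical): group values kept in a plain `Vec`.
struct ToyStore {
    values: Vec<u64>,
}

impl ToyStore {
    /// Drops all group values and releases capacity beyond `batch_rows`.
    fn clear_shrink(&mut self, batch_rows: usize) {
        // `clear` removes the elements but keeps the allocation.
        self.values.clear();
        // `shrink_to` releases spare capacity down to a lower bound of
        // `batch_rows`; it never grows the allocation.
        self.values.shrink_to(batch_rows);
    }
}
```

Keeping room for about one batch, rather than shrinking to zero, avoids an immediate reallocation when the next batch is interned.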

Implementors§