pub trait GroupValues: Send {
    // Required methods
    fn intern(
        &mut self,
        cols: &[ArrayRef],
        groups: &mut Vec<usize>,
    ) -> Result<()>;
    fn size(&self) -> usize;
    fn is_empty(&self) -> bool;
    fn len(&self) -> usize;
    fn emit(&mut self, emit_to: EmitTo) -> Result<Vec<ArrayRef>>;
    fn clear_shrink(&mut self, batch: &RecordBatch);
}
Stores the group values during hash aggregation.
§Background
In a query such as SELECT a, b, count(*) FROM t GROUP BY a, b, the group values identify each group, and correspond to all the distinct values of (a,b).
-- Input has 4 rows with 3 distinct combinations of (a,b) ("groups")
create table t(a int, b varchar)
as values (1, 'a'), (2, 'b'), (1, 'a'), (3, 'c');
select a, b, count(*) from t group by a, b;
----
1 a 2
2 b 1
3 c 1
§Design
Managing group values is a performance-critical operation in hash aggregation. The major operations are:
- Intern: Quickly finding existing and adding new group values
- Emit: Returning the group values as an array
There are multiple specialized implementations of this trait, each optimizing these operations for particular data types and numbers of columns. See new_group_values for details.
§Group Ids
Each distinct group in a hash aggregation is identified by a unique group id (usize), which is assigned by instances of this trait. Group ids are contiguous, with no gaps, starting from 0.
Required Methods§
fn intern(&mut self, cols: &[ArrayRef], groups: &mut Vec<usize>) -> Result<()>
Calculates the group id for each input row of cols, assigning new group ids as necessary.
When the function returns, groups must contain the group id for each row in cols.
If a row has the same value as a previous row, the same group id is assigned. If a row has a new value, the next available group id is assigned.
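The assignment rule above can be illustrated with a minimal sketch. It is not one of the real implementations: `ToyGroupValues` is a hypothetical type that handles a single column of string values using a standard `HashMap`, whereas actual implementations work on Arrow arrays (`ArrayRef`) and return a `Result`. Clearing `groups` at the start of each call is an assumption made here so that the vector ends up holding exactly one id per input row.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a `GroupValues` implementation
/// over a single column of string values.
struct ToyGroupValues {
    /// Maps each distinct value to its group id.
    map: HashMap<String, usize>,
    /// Distinct values in group-id order: `values[id]` is the value of group `id`.
    values: Vec<String>,
}

impl ToyGroupValues {
    fn new() -> Self {
        Self { map: HashMap::new(), values: Vec::new() }
    }

    /// Fills `groups` with one group id per row of `rows`.
    fn intern(&mut self, rows: &[&str], groups: &mut Vec<usize>) {
        // Assumption: previous contents of `groups` are discarded.
        groups.clear();
        for &row in rows {
            let id = match self.map.get(row) {
                // Value seen before: reuse its group id.
                Some(&id) => id,
                // New value: the next available id is the current group count,
                // which keeps ids contiguous and starting from 0.
                None => {
                    let id = self.values.len();
                    self.map.insert(row.to_string(), id);
                    self.values.push(row.to_string());
                    id
                }
            };
            groups.push(id);
        }
    }

    /// Number of distinct group values stored.
    fn len(&self) -> usize {
        self.values.len()
    }
}
```

Group ids persist across calls: interning a second batch reuses the ids of values already seen and continues numbering from the current group count.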
fn size(&self) -> usize
Returns the number of bytes of memory used by this GroupValues.
fn is_empty(&self) -> bool
Returns true if this GroupValues is empty.
fn len(&self) -> usize
Returns the number of distinct group values stored in this GroupValues.
fn clear_shrink(&mut self, batch: &RecordBatch)
Clears the contents and shrinks the capacity to the size of the batch, freeing memory.
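As a rough illustration of how clearing and shrinking interact with the reported memory size, the sketch below uses a hypothetical `ToyStore` backed by a plain `Vec<u64>`. It assumes "the size of the batch" means the batch's row count, passed here as `batch_rows` instead of a `RecordBatch`; real implementations may hold several buffers and hash tables.

```rust
/// Hypothetical store of group values, backed by a single Vec.
struct ToyStore {
    values: Vec<u64>,
}

impl ToyStore {
    /// Bytes of heap memory reserved for the stored values.
    fn size(&self) -> usize {
        self.values.capacity() * std::mem::size_of::<u64>()
    }

    /// Drops all stored values, then releases capacity beyond `batch_rows`
    /// so the next batch can be interned without holding on to old memory.
    fn clear_shrink(&mut self, batch_rows: usize) {
        self.values.clear();
        // `shrink_to` keeps at least `batch_rows` slots; the allocator may
        // retain somewhat more than requested.
        self.values.shrink_to(batch_rows);
    }
}
```

Calling `clear()` alone would leave the capacity, and therefore `size()`, unchanged; the `shrink_to` call is what returns memory.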