pub trait GroupsAccumulator: Send {
// Required methods
fn update_batch(
&mut self,
values: &[ArrayRef],
group_indices: &[usize],
opt_filter: Option<&BooleanArray>,
total_num_groups: usize
) -> Result<()>;
fn evaluate(&mut self, emit_to: EmitTo) -> Result<ArrayRef>;
fn state(&mut self, emit_to: EmitTo) -> Result<Vec<ArrayRef>>;
fn merge_batch(
&mut self,
values: &[ArrayRef],
group_indices: &[usize],
opt_filter: Option<&BooleanArray>,
total_num_groups: usize
) -> Result<()>;
fn size(&self) -> usize;
}
Expand description
GroupAccumulator
implements a single aggregate (e.g. AVG) and
stores the state for all groups internally.
Each group is assigned a group_index
by the hash table and each
accumulator manages the specific state, one per group_index.
group_indexes are contiguous (there aren’t gaps), and thus it is
expected that each GroupAccumulator will use something like Vec<..>
to store the group states.
Required Methods§
sourcefn update_batch(
&mut self,
values: &[ArrayRef],
group_indices: &[usize],
opt_filter: Option<&BooleanArray>,
total_num_groups: usize
) -> Result<()>
fn update_batch( &mut self, values: &[ArrayRef], group_indices: &[usize], opt_filter: Option<&BooleanArray>, total_num_groups: usize ) -> Result<()>
Updates the accumulator’s state from its arguments, encoded as
a vector of ArrayRef
s.
-
values
: the input arguments to the accumulator -
group_indices
: To which groups do the rows invalues
belong, group id) -
opt_filter
: if present, only update aggregate state usingvalues[i]
ifopt_filter[i]
is true -
total_num_groups
: the number of groups (the largest group_index is thustotal_num_groups - 1
).
Note that subsequent calls to update_batch may have larger total_num_groups as new groups are seen.
sourcefn evaluate(&mut self, emit_to: EmitTo) -> Result<ArrayRef>
fn evaluate(&mut self, emit_to: EmitTo) -> Result<ArrayRef>
Returns the final aggregate value for each group as a single
RecordBatch
, resetting the internal state.
The rows returned must be in group_index order: The value for group_index 0, followed by 1, etc. Any group_index that did not have values, should be null.
For example, a SUM
accumulator maintains a running sum for
each group, and evaluate
will produce that running sum as
its output for all groups, in group_index order
If emit_to`` is [
EmitTo::All`], the accumulator should
return all groups and release / reset its internal state
equivalent to when it was first created.
If emit_to
is EmitTo::First
, only the first n
groups
should be emitted and the state for those first groups
removed. State for the remaining groups must be retained for
future use. The group_indices on subsequent calls to
update_batch
or merge_batch
will be shifted down by
n
. See EmitTo::First
for more details.
sourcefn state(&mut self, emit_to: EmitTo) -> Result<Vec<ArrayRef>>
fn state(&mut self, emit_to: EmitTo) -> Result<Vec<ArrayRef>>
Returns the intermediate aggregate state for this accumulator, used for multi-phase grouping, resetting its internal state.
For example, AVG
might return two arrays: SUM
and COUNT
but the MIN
aggregate would just return a single array.
Note more sophisticated internal state can be passed as
single StructArray
rather than multiple arrays.
See Self::evaluate
for details on the required output
order and emit_to
.
sourcefn merge_batch(
&mut self,
values: &[ArrayRef],
group_indices: &[usize],
opt_filter: Option<&BooleanArray>,
total_num_groups: usize
) -> Result<()>
fn merge_batch( &mut self, values: &[ArrayRef], group_indices: &[usize], opt_filter: Option<&BooleanArray>, total_num_groups: usize ) -> Result<()>
Merges intermediate state (the output from Self::state
)
into this accumulator’s values.
For some aggregates (such as SUM
), merge_batch
is the same
as update_batch
, but for some aggregates (such as COUNT
,
where the partial counts must be summed) the operations
differ. See Self::state
for more details on how state is
used and merged.
values
: arrays produced from callingstate
previously to the accumulator
Other arguments are the same as for Self::update_batch
;