Trait datafusion_expr::AggregateUDFImpl
source · pub trait AggregateUDFImpl:
Debug
+ Send
+ Sync {
Show 18 methods
// Required methods
fn as_any(&self) -> &dyn Any;
fn name(&self) -> &str;
fn signature(&self) -> &Signature;
fn return_type(&self, arg_types: &[DataType]) -> Result<DataType>;
fn accumulator(
&self,
acc_args: AccumulatorArgs<'_>,
) -> Result<Box<dyn Accumulator>>;
// Provided methods
fn state_fields(&self, args: StateFieldsArgs<'_>) -> Result<Vec<Field>> { ... }
fn groups_accumulator_supported(&self, _args: AccumulatorArgs<'_>) -> bool { ... }
fn create_groups_accumulator(
&self,
_args: AccumulatorArgs<'_>,
) -> Result<Box<dyn GroupsAccumulator>> { ... }
fn aliases(&self) -> &[String] { ... }
fn create_sliding_accumulator(
&self,
args: AccumulatorArgs<'_>,
) -> Result<Box<dyn Accumulator>> { ... }
fn with_beneficial_ordering(
self: Arc<Self>,
_beneficial_ordering: bool,
) -> Result<Option<Arc<dyn AggregateUDFImpl>>> { ... }
fn order_sensitivity(&self) -> AggregateOrderSensitivity { ... }
fn simplify(&self) -> Option<AggregateFunctionSimplification> { ... }
fn reverse_expr(&self) -> ReversedUDAF { ... }
fn coerce_types(&self, _arg_types: &[DataType]) -> Result<Vec<DataType>> { ... }
fn equals(&self, other: &dyn AggregateUDFImpl) -> bool { ... }
fn hash_value(&self) -> u64 { ... }
fn is_descending(&self) -> Option<bool> { ... }
}
Expand description
Trait for implementing AggregateUDF
.
This trait exposes the full API for implementing user defined aggregate functions and can be used to implement any function.
See advanced_udaf.rs
for a full example with complete implementation and
AggregateUDF
for other available options.
§Basic Example
#[derive(Debug, Clone)]
struct GeoMeanUdf {
signature: Signature
}
impl GeoMeanUdf {
fn new() -> Self {
Self {
signature: Signature::uniform(1, vec![DataType::Float64], Volatility::Immutable)
}
}
}
/// Implement the AggregateUDFImpl trait for GeoMeanUdf
impl AggregateUDFImpl for GeoMeanUdf {
fn as_any(&self) -> &dyn Any { self }
fn name(&self) -> &str { "geo_mean" }
fn signature(&self) -> &Signature { &self.signature }
fn return_type(&self, args: &[DataType]) -> Result<DataType> {
if !matches!(args.get(0), Some(&DataType::Float64)) {
return plan_err!("add_one only accepts Float64 arguments");
}
Ok(DataType::Float64)
}
// This is the accumulator factory; DataFusion uses it to create new accumulators.
fn accumulator(&self, _acc_args: AccumulatorArgs) -> Result<Box<dyn Accumulator>> { unimplemented!() }
fn state_fields(&self, args: StateFieldsArgs) -> Result<Vec<Field>> {
Ok(vec![
Field::new("value", args.return_type.clone(), true),
Field::new("ordering", DataType::UInt32, true)
])
}
}
// Create a new AggregateUDF from the implementation
let geometric_mean = AggregateUDF::from(GeoMeanUdf::new());
// Call the function `geo_mean(col)`
let expr = geometric_mean.call(vec![col("a")]);
Required Methods§
sourcefn signature(&self) -> &Signature
fn signature(&self) -> &Signature
Returns the function’s Signature
for information about what input
types are accepted and the function’s Volatility.
sourcefn return_type(&self, arg_types: &[DataType]) -> Result<DataType>
fn return_type(&self, arg_types: &[DataType]) -> Result<DataType>
What DataType
will be returned by this function, given the types of
the arguments
sourcefn accumulator(
&self,
acc_args: AccumulatorArgs<'_>,
) -> Result<Box<dyn Accumulator>>
fn accumulator( &self, acc_args: AccumulatorArgs<'_>, ) -> Result<Box<dyn Accumulator>>
Return a new Accumulator
that aggregates values for a specific
group during query execution.
acc_args: AccumulatorArgs
contains information about how the
aggregate function was called.
Provided Methods§
sourcefn state_fields(&self, args: StateFieldsArgs<'_>) -> Result<Vec<Field>>
fn state_fields(&self, args: StateFieldsArgs<'_>) -> Result<Vec<Field>>
Return the fields used to store the intermediate state of this accumulator.
See Accumulator::state
for background information.
args: StateFieldsArgs
contains arguments passed to the
aggregate function’s accumulator.
§Notes:
The default implementation returns a single state field named name
with the same type as value_type
. This is suitable for aggregates such
as SUM
or MIN
where partial state can be combined by applying the
same aggregate.
For aggregates such as AVG
where the partial state is more complex
(e.g. a COUNT and a SUM), this method is used to define the additional
fields.
The name of the fields must be unique within the query and thus should
be derived from name
. See format_state_name
for a utility function
to generate a unique name.
sourcefn groups_accumulator_supported(&self, _args: AccumulatorArgs<'_>) -> bool
fn groups_accumulator_supported(&self, _args: AccumulatorArgs<'_>) -> bool
If the aggregate expression has a specialized
GroupsAccumulator
implementation. If this returns true,
[Self::create_groups_accumulator]
will be called.
§Notes
Even if this function returns true, DataFusion will still use
Self::accumulator
for certain queries, such as when this aggregate is
used as a window function or when there no GROUP BY columns in the
query.
sourcefn create_groups_accumulator(
&self,
_args: AccumulatorArgs<'_>,
) -> Result<Box<dyn GroupsAccumulator>>
fn create_groups_accumulator( &self, _args: AccumulatorArgs<'_>, ) -> Result<Box<dyn GroupsAccumulator>>
Return a specialized GroupsAccumulator
that manages state
for all groups.
For maximum performance, a GroupsAccumulator
should be
implemented in addition to Accumulator
.
sourcefn aliases(&self) -> &[String]
fn aliases(&self) -> &[String]
Returns any aliases (alternate names) for this function.
Note: aliases
should only include names other than Self::name
.
Defaults to []
(no aliases)
sourcefn create_sliding_accumulator(
&self,
args: AccumulatorArgs<'_>,
) -> Result<Box<dyn Accumulator>>
fn create_sliding_accumulator( &self, args: AccumulatorArgs<'_>, ) -> Result<Box<dyn Accumulator>>
Sliding accumulator is an alternative accumulator that can be used for window functions. It has retract method to revert the previous update.
See retract_batch for more details.
sourcefn with_beneficial_ordering(
self: Arc<Self>,
_beneficial_ordering: bool,
) -> Result<Option<Arc<dyn AggregateUDFImpl>>>
fn with_beneficial_ordering( self: Arc<Self>, _beneficial_ordering: bool, ) -> Result<Option<Arc<dyn AggregateUDFImpl>>>
Sets the indicator whether ordering requirements of the AggregateUDFImpl is
satisfied by its input. If this is not the case, UDFs with order
sensitivity AggregateOrderSensitivity::Beneficial
can still produce
the correct result with possibly more work internally.
§Returns
Returns Ok(Some(updated_udf))
if the process completes successfully.
If the expression can benefit from existing input ordering, but does
not implement the method, returns an error. Order insensitive and hard
requirement aggregators return Ok(None)
.
sourcefn order_sensitivity(&self) -> AggregateOrderSensitivity
fn order_sensitivity(&self) -> AggregateOrderSensitivity
Gets the order sensitivity of the UDF. See AggregateOrderSensitivity
for possible options.
sourcefn simplify(&self) -> Option<AggregateFunctionSimplification>
fn simplify(&self) -> Option<AggregateFunctionSimplification>
Optionally apply per-UDaF simplification / rewrite rules.
This can be used to apply function specific simplification rules during
optimization (e.g. arrow_cast
–> Expr::Cast
). The default
implementation does nothing.
Note that DataFusion handles simplifying arguments and “constant
folding” (replacing a function call with constant arguments such as
my_add(1,2) --> 3
). Thus, there is no need to implement such
optimizations manually for specific UDFs.
§Returns
None if simplify is not defined or,
Or, a closure with two arguments:
- ‘aggregate_function’: crate::expr::AggregateFunction for which simplified has been invoked
- ‘info’: crate::simplify::SimplifyInfo
closure returns simplified Expr or an error.
sourcefn reverse_expr(&self) -> ReversedUDAF
fn reverse_expr(&self) -> ReversedUDAF
Returns the reverse expression of the aggregate function.
sourcefn coerce_types(&self, _arg_types: &[DataType]) -> Result<Vec<DataType>>
fn coerce_types(&self, _arg_types: &[DataType]) -> Result<Vec<DataType>>
Coerce arguments of a function call to types that the function can evaluate.
This function is only called if AggregateUDFImpl::signature
returns crate::TypeSignature::UserDefined
. Most
UDAFs should return one of the other variants of TypeSignature
which handle common
cases
See the type coercion module documentation for more details on type coercion
For example, if your function requires a floating point arguments, but the user calls
it like my_func(1::int)
(aka with 1
as an integer), coerce_types could return [DataType::Float64]
to ensure the argument was cast to 1::double
§Parameters
arg_types
: The argument types of the arguments this function with
§Return value
A Vec the same length as arg_types
. DataFusion will CAST
the function call
arguments to these specific types.
sourcefn equals(&self, other: &dyn AggregateUDFImpl) -> bool
fn equals(&self, other: &dyn AggregateUDFImpl) -> bool
Return true if this aggregate UDF is equal to the other.
Allows customizing the equality of aggregate UDFs.
Must be consistent with Self::hash_value
and follow the same rules as Eq
:
- reflexive:
a.equals(a)
; - symmetric:
a.equals(b)
impliesb.equals(a)
; - transitive:
a.equals(b)
andb.equals(c)
impliesa.equals(c)
.
By default, compares Self::name
and Self::signature
.
sourcefn hash_value(&self) -> u64
fn hash_value(&self) -> u64
Returns a hash value for this aggregate UDF.
Allows customizing the hash code of aggregate UDFs. Similarly to Hash
and Eq
,
if Self::equals
returns true for two UDFs, their hash_value
s must be the same.
By default, hashes Self::name
and Self::signature
.
sourcefn is_descending(&self) -> Option<bool>
fn is_descending(&self) -> Option<bool>
If this function is max, return true if the function is min, return false otherwise return None (the default)
Note: this is used to use special aggregate implementations in certain conditions