pub trait AggregateUDFImpl:
Debug
+ Send
+ Sync {
Show 22 methods
// Required methods
fn as_any(&self) -> &dyn Any;
fn name(&self) -> &str;
fn signature(&self) -> &Signature;
fn return_type(&self, arg_types: &[DataType]) -> Result<DataType>;
fn accumulator(
&self,
acc_args: AccumulatorArgs<'_>,
) -> Result<Box<dyn Accumulator>>;
// Provided methods
fn is_nullable(&self) -> bool { ... }
fn state_fields(&self, args: StateFieldsArgs<'_>) -> Result<Vec<Field>> { ... }
fn groups_accumulator_supported(&self, _args: AccumulatorArgs<'_>) -> bool { ... }
fn create_groups_accumulator(
&self,
_args: AccumulatorArgs<'_>,
) -> Result<Box<dyn GroupsAccumulator>> { ... }
fn aliases(&self) -> &[String] { ... }
fn create_sliding_accumulator(
&self,
args: AccumulatorArgs<'_>,
) -> Result<Box<dyn Accumulator>> { ... }
fn with_beneficial_ordering(
self: Arc<Self>,
_beneficial_ordering: bool,
) -> Result<Option<Arc<dyn AggregateUDFImpl>>> { ... }
fn order_sensitivity(&self) -> AggregateOrderSensitivity { ... }
fn simplify(&self) -> Option<AggregateFunctionSimplification> { ... }
fn reverse_expr(&self) -> ReversedUDAF { ... }
fn coerce_types(&self, _arg_types: &[DataType]) -> Result<Vec<DataType>> { ... }
fn equals(&self, other: &dyn AggregateUDFImpl) -> bool { ... }
fn hash_value(&self) -> u64 { ... }
fn is_descending(&self) -> Option<bool> { ... }
fn value_from_stats(
&self,
_statistics_args: &StatisticsArgs<'_>,
) -> Option<ScalarValue> { ... }
fn default_value(&self, data_type: &DataType) -> Result<ScalarValue> { ... }
fn documentation(&self) -> Option<&Documentation> { ... }
}
Expand description
Trait for implementing AggregateUDF
.
This trait exposes the full API for implementing user defined aggregate functions and can be used to implement any function.
See advanced_udaf.rs
for a full example with complete implementation and
AggregateUDF
for other available options.
§Basic Example
#[derive(Debug, Clone)]
struct GeoMeanUdf {
signature: Signature,
}
impl GeoMeanUdf {
fn new() -> Self {
Self {
signature: Signature::uniform(1, vec![DataType::Float64], Volatility::Immutable),
}
}
}
static DOCUMENTATION: OnceLock<Documentation> = OnceLock::new();
fn get_doc() -> &'static Documentation {
DOCUMENTATION.get_or_init(|| {
Documentation::builder()
.with_doc_section(DOC_SECTION_AGGREGATE)
.with_description("calculates a geometric mean")
.with_syntax_example("geo_mean(2.0)")
.with_argument("arg1", "The Float64 number for the geometric mean")
.build()
.unwrap()
})
}
/// Implement the AggregateUDFImpl trait for GeoMeanUdf
impl AggregateUDFImpl for GeoMeanUdf {
fn as_any(&self) -> &dyn Any { self }
fn name(&self) -> &str { "geo_mean" }
fn signature(&self) -> &Signature { &self.signature }
fn return_type(&self, args: &[DataType]) -> Result<DataType> {
if !matches!(args.get(0), Some(&DataType::Float64)) {
return plan_err!("geo_mean only accepts Float64 arguments");
}
Ok(DataType::Float64)
}
// This is the accumulator factory; DataFusion uses it to create new accumulators.
fn accumulator(&self, _acc_args: AccumulatorArgs) -> Result<Box<dyn Accumulator>> { unimplemented!() }
fn state_fields(&self, args: StateFieldsArgs) -> Result<Vec<Field>> {
Ok(vec![
Field::new("value", args.return_type.clone(), true),
Field::new("ordering", DataType::UInt32, true)
])
}
fn documentation(&self) -> Option<&Documentation> {
Some(get_doc())
}
}
// Create a new AggregateUDF from the implementation
let geometric_mean = AggregateUDF::from(GeoMeanUdf::new());
// Call the function `geo_mean(col)`
let expr = geometric_mean.call(vec![col("a")]);
Required Methods§
Sourcefn signature(&self) -> &Signature
fn signature(&self) -> &Signature
Returns the function’s Signature
for information about what input
types are accepted and the function’s Volatility.
Sourcefn return_type(&self, arg_types: &[DataType]) -> Result<DataType>
fn return_type(&self, arg_types: &[DataType]) -> Result<DataType>
What DataType
will be returned by this function, given the types of
the arguments
Sourcefn accumulator(
&self,
acc_args: AccumulatorArgs<'_>,
) -> Result<Box<dyn Accumulator>>
fn accumulator( &self, acc_args: AccumulatorArgs<'_>, ) -> Result<Box<dyn Accumulator>>
Return a new Accumulator
that aggregates values for a specific
group during query execution.
acc_args: AccumulatorArgs
contains information about how the
aggregate function was called.
Provided Methods§
Sourcefn is_nullable(&self) -> bool
fn is_nullable(&self) -> bool
Whether the aggregate function is nullable.
Nullable means that that the function could return null
for any inputs.
For example, aggregate functions like COUNT
always return a non null value
but others like MIN
will return NULL
if there is nullable input.
Note that if the function is declared as not nullable, make sure the AggregateUDFImpl::default_value
is non-null
Sourcefn state_fields(&self, args: StateFieldsArgs<'_>) -> Result<Vec<Field>>
fn state_fields(&self, args: StateFieldsArgs<'_>) -> Result<Vec<Field>>
Return the fields used to store the intermediate state of this accumulator.
See Accumulator::state
for background information.
args: StateFieldsArgs
contains arguments passed to the
aggregate function’s accumulator.
§Notes:
The default implementation returns a single state field named name
with the same type as value_type
. This is suitable for aggregates such
as SUM
or MIN
where partial state can be combined by applying the
same aggregate.
For aggregates such as AVG
where the partial state is more complex
(e.g. a COUNT and a SUM), this method is used to define the additional
fields.
The name of the fields must be unique within the query and thus should
be derived from name
. See format_state_name
for a utility function
to generate a unique name.
Sourcefn groups_accumulator_supported(&self, _args: AccumulatorArgs<'_>) -> bool
fn groups_accumulator_supported(&self, _args: AccumulatorArgs<'_>) -> bool
If the aggregate expression has a specialized
GroupsAccumulator
implementation. If this returns true,
[Self::create_groups_accumulator]
will be called.
§Notes
Even if this function returns true, DataFusion will still use
Self::accumulator
for certain queries, such as when this aggregate is
used as a window function or when there no GROUP BY columns in the
query.
Sourcefn create_groups_accumulator(
&self,
_args: AccumulatorArgs<'_>,
) -> Result<Box<dyn GroupsAccumulator>>
fn create_groups_accumulator( &self, _args: AccumulatorArgs<'_>, ) -> Result<Box<dyn GroupsAccumulator>>
Return a specialized GroupsAccumulator
that manages state
for all groups.
For maximum performance, a GroupsAccumulator
should be
implemented in addition to Accumulator
.
Sourcefn aliases(&self) -> &[String]
fn aliases(&self) -> &[String]
Returns any aliases (alternate names) for this function.
Note: aliases
should only include names other than Self::name
.
Defaults to []
(no aliases)
Sourcefn create_sliding_accumulator(
&self,
args: AccumulatorArgs<'_>,
) -> Result<Box<dyn Accumulator>>
fn create_sliding_accumulator( &self, args: AccumulatorArgs<'_>, ) -> Result<Box<dyn Accumulator>>
Sliding accumulator is an alternative accumulator that can be used for window functions. It has retract method to revert the previous update.
See retract_batch for more details.
Sourcefn with_beneficial_ordering(
self: Arc<Self>,
_beneficial_ordering: bool,
) -> Result<Option<Arc<dyn AggregateUDFImpl>>>
fn with_beneficial_ordering( self: Arc<Self>, _beneficial_ordering: bool, ) -> Result<Option<Arc<dyn AggregateUDFImpl>>>
Sets the indicator whether ordering requirements of the AggregateUDFImpl is
satisfied by its input. If this is not the case, UDFs with order
sensitivity AggregateOrderSensitivity::Beneficial
can still produce
the correct result with possibly more work internally.
§Returns
Returns Ok(Some(updated_udf))
if the process completes successfully.
If the expression can benefit from existing input ordering, but does
not implement the method, returns an error. Order insensitive and hard
requirement aggregators return Ok(None)
.
Sourcefn order_sensitivity(&self) -> AggregateOrderSensitivity
fn order_sensitivity(&self) -> AggregateOrderSensitivity
Gets the order sensitivity of the UDF. See AggregateOrderSensitivity
for possible options.
Sourcefn simplify(&self) -> Option<AggregateFunctionSimplification>
fn simplify(&self) -> Option<AggregateFunctionSimplification>
Optionally apply per-UDaF simplification / rewrite rules.
This can be used to apply function specific simplification rules during
optimization (e.g. arrow_cast
–> Expr::Cast
). The default
implementation does nothing.
Note that DataFusion handles simplifying arguments and “constant
folding” (replacing a function call with constant arguments such as
my_add(1,2) --> 3
). Thus, there is no need to implement such
optimizations manually for specific UDFs.
§Returns
None if simplify is not defined or,
Or, a closure with two arguments:
- ‘aggregate_function’: crate::expr::AggregateFunction for which simplified has been invoked
- ‘info’: crate::simplify::SimplifyInfo
closure returns simplified Expr or an error.
Sourcefn reverse_expr(&self) -> ReversedUDAF
fn reverse_expr(&self) -> ReversedUDAF
Returns the reverse expression of the aggregate function.
Sourcefn coerce_types(&self, _arg_types: &[DataType]) -> Result<Vec<DataType>>
fn coerce_types(&self, _arg_types: &[DataType]) -> Result<Vec<DataType>>
Coerce arguments of a function call to types that the function can evaluate.
This function is only called if AggregateUDFImpl::signature
returns crate::TypeSignature::UserDefined
. Most
UDAFs should return one of the other variants of TypeSignature
which handle common
cases
See the type coercion module documentation for more details on type coercion
For example, if your function requires a floating point arguments, but the user calls
it like my_func(1::int)
(aka with 1
as an integer), coerce_types could return [DataType::Float64]
to ensure the argument was cast to 1::double
§Parameters
arg_types
: The argument types of the arguments this function with
§Return value
A Vec the same length as arg_types
. DataFusion will CAST
the function call
arguments to these specific types.
Sourcefn equals(&self, other: &dyn AggregateUDFImpl) -> bool
fn equals(&self, other: &dyn AggregateUDFImpl) -> bool
Return true if this aggregate UDF is equal to the other.
Allows customizing the equality of aggregate UDFs.
Must be consistent with Self::hash_value
and follow the same rules as Eq
:
- reflexive:
a.equals(a)
; - symmetric:
a.equals(b)
impliesb.equals(a)
; - transitive:
a.equals(b)
andb.equals(c)
impliesa.equals(c)
.
By default, compares Self::name
and Self::signature
.
Sourcefn hash_value(&self) -> u64
fn hash_value(&self) -> u64
Returns a hash value for this aggregate UDF.
Allows customizing the hash code of aggregate UDFs. Similarly to Hash
and Eq
,
if Self::equals
returns true for two UDFs, their hash_value
s must be the same.
By default, hashes Self::name
and Self::signature
.
Sourcefn is_descending(&self) -> Option<bool>
fn is_descending(&self) -> Option<bool>
If this function is max, return true If the function is min, return false Otherwise return None (the default)
Note: this is used to use special aggregate implementations in certain conditions
Sourcefn value_from_stats(
&self,
_statistics_args: &StatisticsArgs<'_>,
) -> Option<ScalarValue>
fn value_from_stats( &self, _statistics_args: &StatisticsArgs<'_>, ) -> Option<ScalarValue>
Return the value of this aggregate function if it can be determined entirely from statistics and arguments.
Using a ScalarValue
rather than a runtime computation can significantly
improving query performance.
For example, if the minimum value of column x
is known to be 42
from
statistics, then the aggregate MIN(x)
should return Some(ScalarValue(42))
Sourcefn default_value(&self, data_type: &DataType) -> Result<ScalarValue>
fn default_value(&self, data_type: &DataType) -> Result<ScalarValue>
Returns default value of the function given the input is all null
.
Most of the aggregate function return Null if input is Null,
while count
returns 0 if input is Null
Sourcefn documentation(&self) -> Option<&Documentation>
fn documentation(&self) -> Option<&Documentation>
Returns the documentation for this Aggregate UDF.
Documentation can be accessed programmatically as well as generating publicly facing documentation.