Trait datafusion_expr::AggregateUDFImpl

pub trait AggregateUDFImpl:
    Debug
    + Send
    + Sync {
    // Required methods
    fn as_any(&self) -> &dyn Any;
    fn name(&self) -> &str;
    fn signature(&self) -> &Signature;
    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType>;
    fn accumulator(
        &self,
        acc_args: AccumulatorArgs<'_>,
    ) -> Result<Box<dyn Accumulator>>;

    // Provided methods
    fn state_fields(&self, args: StateFieldsArgs<'_>) -> Result<Vec<Field>> { ... }
    fn groups_accumulator_supported(&self, _args: AccumulatorArgs<'_>) -> bool { ... }
    fn create_groups_accumulator(
        &self,
        _args: AccumulatorArgs<'_>,
    ) -> Result<Box<dyn GroupsAccumulator>> { ... }
    fn aliases(&self) -> &[String] { ... }
    fn create_sliding_accumulator(
        &self,
        args: AccumulatorArgs<'_>,
    ) -> Result<Box<dyn Accumulator>> { ... }
    fn with_beneficial_ordering(
        self: Arc<Self>,
        _beneficial_ordering: bool,
    ) -> Result<Option<Arc<dyn AggregateUDFImpl>>> { ... }
    fn order_sensitivity(&self) -> AggregateOrderSensitivity { ... }
    fn simplify(&self) -> Option<AggregateFunctionSimplification> { ... }
    fn reverse_expr(&self) -> ReversedUDAF { ... }
    fn coerce_types(&self, _arg_types: &[DataType]) -> Result<Vec<DataType>> { ... }
    fn equals(&self, other: &dyn AggregateUDFImpl) -> bool { ... }
    fn hash_value(&self) -> u64 { ... }
    fn is_descending(&self) -> Option<bool> { ... }
}

Trait for implementing AggregateUDF.

This trait exposes the full API for implementing user-defined aggregate functions and can be used to implement any aggregate function.

See advanced_udaf.rs for a full example with complete implementation and AggregateUDF for other available options.

§Basic Example

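// Imports assumed for this example; exact paths may differ between DataFusion versions.
use std::any::Any;
use arrow::datatypes::{DataType, Field};
use datafusion_common::{plan_err, Result};
use datafusion_expr::function::{AccumulatorArgs, StateFieldsArgs};
use datafusion_expr::{col, Accumulator, AggregateUDF, AggregateUDFImpl, Signature, Volatility};

#[derive(Debug, Clone)]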
struct GeoMeanUdf {
  signature: Signature
}

impl GeoMeanUdf {
  fn new() -> Self {
    Self {
      signature: Signature::uniform(1, vec![DataType::Float64], Volatility::Immutable)
     }
  }
}

/// Implement the AggregateUDFImpl trait for GeoMeanUdf
impl AggregateUDFImpl for GeoMeanUdf {
   fn as_any(&self) -> &dyn Any { self }
   fn name(&self) -> &str { "geo_mean" }
   fn signature(&self) -> &Signature { &self.signature }
   fn return_type(&self, args: &[DataType]) -> Result<DataType> {
     if !matches!(args.first(), Some(&DataType::Float64)) {
       return plan_err!("geo_mean only accepts Float64 arguments");
     }
     Ok(DataType::Float64)
   }
   // This is the accumulator factory; DataFusion uses it to create new accumulators.
   fn accumulator(&self, _acc_args: AccumulatorArgs) -> Result<Box<dyn Accumulator>> { unimplemented!() }
   fn state_fields(&self, args: StateFieldsArgs) -> Result<Vec<Field>> {
       Ok(vec![
            Field::new("value", args.return_type.clone(), true),
            Field::new("ordering", DataType::UInt32, true)
       ])
   }
}

// Create a new AggregateUDF from the implementation
let geometric_mean = AggregateUDF::from(GeoMeanUdf::new());

// Call the function `geo_mean(col)`
let expr = geometric_mean.call(vec![col("a")]);

Required Methods§

fn as_any(&self) -> &dyn Any

Returns this object as an Any trait object

fn name(&self) -> &str

Returns this function’s name

fn signature(&self) -> &Signature

Returns the function’s Signature for information about what input types are accepted and the function’s Volatility.

fn return_type(&self, arg_types: &[DataType]) -> Result<DataType>

Returns the DataType this function produces, given the types of its arguments.

fn accumulator( &self, acc_args: AccumulatorArgs<'_>, ) -> Result<Box<dyn Accumulator>>

Return a new Accumulator that aggregates values for a specific group during query execution.

acc_args: AccumulatorArgs contains information about how the aggregate function was called.
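
To make the per-group role of an accumulator concrete, here is a minimal std-only sketch of the state a geometric-mean accumulator might keep. `GeoMeanState` and its methods are hypothetical stand-ins for illustration; they do not implement DataFusion's `Accumulator` trait, and they use `Option<f64>` slices in place of Arrow arrays. The sketch assumes all non-null inputs are strictly positive.

```rust
/// Hypothetical stand-in for a per-group accumulator; it does not implement
/// DataFusion's `Accumulator` trait.
#[derive(Debug, Default, Clone, PartialEq)]
struct GeoMeanState {
    /// Sum of the natural logs of the values seen so far.
    log_sum: f64,
    /// Number of non-null values seen so far.
    count: u64,
}

impl GeoMeanState {
    /// Fold a batch of input values into the state, skipping nulls (`None`).
    fn update_batch(&mut self, values: &[Option<f64>]) {
        for v in values.iter().flatten() {
            self.log_sum += v.ln();
            self.count += 1;
        }
    }

    /// Combine a partial state produced elsewhere, e.g. on another partition.
    fn merge(&mut self, other: &GeoMeanState) {
        self.log_sum += other.log_sum;
        self.count += other.count;
    }

    /// Produce the final value; `None` stands in for SQL NULL on empty input.
    fn evaluate(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some((self.log_sum / self.count as f64).exp())
        }
    }
}
```

Summing logarithms rather than multiplying the raw values keeps the intermediate state from overflowing and makes merging two partial states a simple addition.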

Provided Methods§

fn state_fields(&self, args: StateFieldsArgs<'_>) -> Result<Vec<Field>>

Return the fields used to store the intermediate state of this accumulator.

See Accumulator::state for background information.

args: StateFieldsArgs contains arguments passed to the aggregate function’s accumulator.

§Notes:

The default implementation returns a single state field named name with the same type as value_type. This is suitable for aggregates such as SUM or MIN where partial state can be combined by applying the same aggregate.

For aggregates such as AVG where the partial state is more complex (e.g. a COUNT and a SUM), this method is used to define the additional fields.

The names of the fields must be unique within the query and thus should be derived from name. See format_state_name for a utility function to generate a unique name.
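
The following std-only sketch illustrates why an AVG-like aggregate needs more than one state field, and how field names can be derived from the function name. `AvgPartial`, `finish_avg`, and `state_field_name` are hypothetical names introduced here; the exact string produced by DataFusion's `format_state_name` may differ from the one shown.

```rust
/// Hypothetical partial state for an AVG-like aggregate: two fields, not one.
#[derive(Debug, Clone, Copy, PartialEq)]
struct AvgPartial {
    count: u64,
    sum: f64,
}

/// Merge partial states from several partitions and compute the final average.
fn finish_avg(partials: &[AvgPartial]) -> Option<f64> {
    let count: u64 = partials.iter().map(|p| p.count).sum();
    let sum: f64 = partials.iter().map(|p| p.sum).sum();
    if count == 0 {
        None
    } else {
        Some(sum / count as f64)
    }
}

/// Hypothetical naming helper: qualify each state field with the function name
/// so that two aggregates in one query do not produce colliding field names.
fn state_field_name(function_name: &str, state_name: &str) -> String {
    format!("{function_name}[{state_name}]")
}
```

Averaging the per-partition averages would give the wrong answer whenever partitions differ in size, which is why the count and the sum travel together as separate fields.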

fn groups_accumulator_supported(&self, _args: AccumulatorArgs<'_>) -> bool

Returns true if the aggregate expression has a specialized GroupsAccumulator implementation. If this returns true, Self::create_groups_accumulator will be called.

§Notes

Even if this function returns true, DataFusion will still use Self::accumulator for certain queries, such as when this aggregate is used as a window function or when there are no GROUP BY columns in the query.

fn create_groups_accumulator( &self, _args: AccumulatorArgs<'_>, ) -> Result<Box<dyn GroupsAccumulator>>

Return a specialized GroupsAccumulator that manages state for all groups.

For maximum performance, a GroupsAccumulator should be implemented in addition to Accumulator.
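
As a rough illustration of "state for all groups", the std-only sketch below keeps one running sum per group in a single vector indexed by group number. `GroupedSum` is a hypothetical type; it does not implement DataFusion's `GroupsAccumulator` trait, and it takes plain slices where the real trait works with Arrow arrays.

```rust
/// Hypothetical stand-in for a groups accumulator computing SUM; it does not
/// implement DataFusion's `GroupsAccumulator` trait.
#[derive(Debug, Default)]
struct GroupedSum {
    /// One running sum per group, indexed by group index.
    sums: Vec<f64>,
}

impl GroupedSum {
    /// `group_indices[i]` names the group that `values[i]` belongs to.
    fn update_batch(&mut self, values: &[f64], group_indices: &[usize], total_num_groups: usize) {
        assert_eq!(values.len(), group_indices.len());
        // Grow the state so that every group seen so far has a slot.
        if self.sums.len() < total_num_groups {
            self.sums.resize(total_num_groups, 0.0);
        }
        for (&value, &group) in values.iter().zip(group_indices) {
            self.sums[group] += value;
        }
    }

    /// Emit the final value for every group in one pass.
    fn evaluate(&self) -> Vec<f64> {
        self.sums.clone()
    }
}
```

Compared with allocating one boxed accumulator per group, a single contiguous vector avoids per-group allocation and dynamic dispatch, which is the kind of saving a specialized implementation is meant to provide.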

fn aliases(&self) -> &[String]

Returns any aliases (alternate names) for this function.

Note: aliases should only include names other than Self::name. Defaults to [] (no aliases)

fn create_sliding_accumulator( &self, args: AccumulatorArgs<'_>, ) -> Result<Box<dyn Accumulator>>

A sliding accumulator is an alternative accumulator that can be used for window functions. It has a retract method that reverts a previous update.

See retract_batch for more details.
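
The idea of retracting can be sketched without DataFusion: as a window slides, rows that leave the frame are subtracted and rows that enter are added, so the aggregate is never recomputed from scratch. `SlidingSum` below is a hypothetical std-only type, not an implementation of the `Accumulator` trait.

```rust
/// Hypothetical sliding SUM; it does not implement DataFusion's `Accumulator` trait.
#[derive(Debug, Default)]
struct SlidingSum {
    sum: f64,
    count: u64,
}

impl SlidingSum {
    /// Add the rows that entered the window frame.
    fn update_batch(&mut self, entering: &[f64]) {
        for v in entering {
            self.sum += v;
            self.count += 1;
        }
    }

    /// Remove the rows that left the window frame, undoing their earlier update.
    fn retract_batch(&mut self, leaving: &[f64]) {
        for v in leaving {
            self.sum -= v;
            self.count -= 1;
        }
    }

    /// Current value of the aggregate; `None` stands in for NULL on an empty frame.
    fn evaluate(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum)
        }
    }
}
```

Not every aggregate can be retracted this cheaply: MIN and MAX, for example, cannot recover the previous extreme from a single stored value, so they need extra state to support a sliding frame.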

fn with_beneficial_ordering( self: Arc<Self>, _beneficial_ordering: bool, ) -> Result<Option<Arc<dyn AggregateUDFImpl>>>

Sets the flag indicating whether the ordering requirements of this AggregateUDFImpl are satisfied by its input. If they are not, UDFs with order sensitivity AggregateOrderSensitivity::Beneficial can still produce the correct result, possibly with more work internally.

§Returns

Returns Ok(Some(updated_udf)) if the process completes successfully. If the expression can benefit from existing input ordering, but does not implement the method, returns an error. Order insensitive and hard requirement aggregators return Ok(None).

fn order_sensitivity(&self) -> AggregateOrderSensitivity

Gets the order sensitivity of the UDF. See AggregateOrderSensitivity for possible options.

fn simplify(&self) -> Option<AggregateFunctionSimplification>

Optionally apply per-UDAF simplification / rewrite rules.

This can be used to apply function-specific simplification rules during optimization (e.g. arrow_cast --> Expr::Cast). The default implementation does nothing.

Note that DataFusion handles simplifying arguments and “constant folding” (replacing a function call with constant arguments such as my_add(1,2) --> 3 ). Thus, there is no need to implement such optimizations manually for specific UDFs.

§Returns

None if simplify is not defined, or a closure with two arguments that returns the simplified Expr or an error.

fn reverse_expr(&self) -> ReversedUDAF

Returns the reverse expression of the aggregate function.

fn coerce_types(&self, _arg_types: &[DataType]) -> Result<Vec<DataType>>

Coerce arguments of a function call to types that the function can evaluate.

This function is only called if AggregateUDFImpl::signature returns crate::TypeSignature::UserDefined. Most UDAFs should return one of the other variants of TypeSignature, which handle common cases.

See the type coercion module documentation for more details on type coercion.

For example, if your function requires a floating point argument, but the user calls it like my_func(1::int) (that is, with 1 as an integer), coerce_types could return [DataType::Float64] to ensure the argument is cast to 1::double.

§Parameters
  • arg_types: The types of the arguments this function was called with
§Return value

A Vec the same length as arg_types. DataFusion will CAST the function call arguments to these specific types.
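
The coercion rule described above can be sketched in plain Rust. `SimpleType` is a hypothetical, cut-down stand-in for Arrow's `DataType`, and `coerce_to_float64` is an illustrative free function rather than the trait method itself; it returns one target type per argument and rejects non-numeric input.

```rust
/// Hypothetical, simplified stand-in for Arrow's `DataType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SimpleType {
    Int32,
    Int64,
    Float32,
    Float64,
    Utf8,
}

/// Coerce every numeric argument type to `Float64`; reject anything else.
/// The result has the same length as `arg_types`.
fn coerce_to_float64(arg_types: &[SimpleType]) -> Result<Vec<SimpleType>, String> {
    arg_types
        .iter()
        .map(|t| match t {
            SimpleType::Int32
            | SimpleType::Int64
            | SimpleType::Float32
            | SimpleType::Float64 => Ok(SimpleType::Float64),
            other => Err(format!("my_func does not accept {other:?} arguments")),
        })
        .collect()
}
```

Collecting an iterator of `Result` values into `Result<Vec<_>, _>` stops at the first error, so a single unsupported argument fails the whole call.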

fn equals(&self, other: &dyn AggregateUDFImpl) -> bool

Return true if this aggregate UDF is equal to the other.

Allows customizing the equality of aggregate UDFs. Must be consistent with Self::hash_value and follow the same rules as Eq:

  • reflexive: a.equals(a);
  • symmetric: a.equals(b) implies b.equals(a);
  • transitive: a.equals(b) and b.equals(c) implies a.equals(c).

By default, compares Self::name and Self::signature.

fn hash_value(&self) -> u64

Returns a hash value for this aggregate UDF.

Allows customizing the hash code of aggregate UDFs. Similarly to Hash and Eq, if Self::equals returns true for two UDFs, their hash_values must be the same.

By default, hashes Self::name and Self::signature.
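
When a UDAF carries configuration beyond its name and signature, equality and hashing should both take that configuration into account. The std-only sketch below shows one way to keep the two consistent; `PercentileUdf` and its fields are hypothetical, and the methods are inherent methods rather than overrides of the trait's `equals` and `hash_value`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical UDAF-like struct whose configuration takes part in equality.
#[derive(Debug)]
struct PercentileUdf {
    name: String,
    /// Percentile stored as raw bits so that it can be compared and hashed exactly.
    percentile_bits: u64,
}

impl PercentileUdf {
    fn new(name: &str, percentile: f64) -> Self {
        Self {
            name: name.to_string(),
            percentile_bits: percentile.to_bits(),
        }
    }

    /// Two instances are equal only if both name and configuration match.
    fn equals(&self, other: &PercentileUdf) -> bool {
        self.name == other.name && self.percentile_bits == other.percentile_bits
    }

    /// Hash exactly the fields that `equals` compares, so equal values hash equally.
    fn hash_value(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        self.name.hash(&mut hasher);
        self.percentile_bits.hash(&mut hasher);
        hasher.finish()
    }
}
```

Hashing a field that `equals` ignores would break the contract: two instances could compare equal yet land in different hash buckets.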

fn is_descending(&self) -> Option<bool>

If this function is max, return Some(true); if it is min, return Some(false); otherwise return None (the default).

Note: this is used to select special aggregate implementations under certain conditions.

Implementors§