Crate datafusion_functions_aggregate


Aggregate Function packages for DataFusion.

This crate contains a collection of aggregate function packages for DataFusion, implemented using the extension API. Users may wish to choose which functions are available, both to control the binary size of their application and to use dialect-specific implementations of functions (e.g. Spark vs Postgres).

Each package is implemented as a separate module, activated by a feature flag.

§Available Packages

See the list of modules in this crate for available packages.

§Using A Package

You can register all functions in all packages using the register_all function.
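
The registration flow can be illustrated with a minimal, self-contained sketch. It does not use the real DataFusion types: `AggregateUdf`, `MapRegistry`, the three function names, and the return type of `register_all` are all hypothetical stand-ins chosen for illustration, and the trait here only mirrors the general shape of a function registry.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Simplified stand-in for an aggregate UDF (hypothetical; the real type is richer).
#[derive(Debug)]
pub struct AggregateUdf {
    pub name: &'static str,
}

/// Simplified stand-in for a function registry (hypothetical trait).
pub trait FunctionRegistry {
    /// Stores `udaf` and returns any function previously registered under the same name.
    fn register_udaf(&mut self, udaf: Arc<AggregateUdf>) -> Option<Arc<AggregateUdf>>;
}

/// A registry backed by a hash map keyed on function name.
#[derive(Default)]
pub struct MapRegistry {
    pub udafs: HashMap<&'static str, Arc<AggregateUdf>>,
}

impl FunctionRegistry for MapRegistry {
    fn register_udaf(&mut self, udaf: Arc<AggregateUdf>) -> Option<Arc<AggregateUdf>> {
        self.udafs.insert(udaf.name, udaf)
    }
}

/// Returns every aggregate function this modelled crate ships by default.
pub fn all_default_aggregate_functions() -> Vec<Arc<AggregateUdf>> {
    ["sum", "count", "avg"]
        .iter()
        .map(|&name| Arc::new(AggregateUdf { name }))
        .collect()
}

/// Registers every default function and returns the names that were overwritten.
pub fn register_all(registry: &mut dyn FunctionRegistry) -> Vec<&'static str> {
    let mut overwritten = Vec::new();
    for udaf in all_default_aggregate_functions() {
        if let Some(previous) = registry.register_udaf(udaf) {
            overwritten.push(previous.name);
        }
    }
    overwritten
}
```

In this sketch, calling `register_all(&mut MapRegistry::default())` fills the registry with the default functions; a second call on the same registry replaces each entry and reports the replaced names.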

Each package also exports an expr_fn submodule to help create Exprs that invoke functions using a fluent style.
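
As a rough illustration of the fluent style, the sketch below models the idea with a tiny hypothetical `Expr` enum and a local `expr_fn` module. None of these definitions are the crate's own; the real `Expr` type and function signatures may differ between versions.

```rust
use std::fmt;

/// Simplified stand-in for a logical expression (hypothetical; not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// A reference to a named column.
    Column(String),
    /// A call to a named aggregate function with its arguments.
    AggregateFunction { name: &'static str, args: Vec<Expr> },
}

/// Creates a column reference expression.
pub fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

/// Fluent-style constructors: one free function per aggregate.
pub mod expr_fn {
    use super::Expr;

    /// Builds an expression representing `sum(arg)`.
    pub fn sum(arg: Expr) -> Expr {
        Expr::AggregateFunction { name: "sum", args: vec![arg] }
    }

    /// Builds an expression representing `count(arg)`.
    pub fn count(arg: Expr) -> Expr {
        Expr::AggregateFunction { name: "count", args: vec![arg] }
    }
}

impl fmt::Display for Expr {
    /// Renders the expression in a SQL-like form, e.g. `sum(a)`.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Column(name) => write!(f, "{}", name),
            Expr::AggregateFunction { name, args } => {
                let rendered: Vec<String> = args.iter().map(|a| a.to_string()).collect();
                write!(f, "{}({})", name, rendered.join(", "))
            }
        }
    }
}
```

With these definitions, `expr_fn::sum(col("a"))` builds an expression tree directly from Rust code instead of parsing SQL text, which is the convenience the fluent style is meant to provide.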

§Implementing A New Package

To add a new package to this crate, follow the model of the existing packages. The high-level steps are:

  1. Create a new module with the appropriate AggregateUDF implementations.

  2. Use the macros in the macros module to create standard entry points.

  3. Add a new feature to Cargo.toml, with any optional dependencies.

  4. Use the make_package! macro to expose the module when the feature is enabled.
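
The feature-gating idea behind the last step can be sketched as follows. This is not the crate's actual `make_package!` macro: the macro name `make_package_sketch!`, the `my_stats` package, its feature name, and the listed function names are all hypothetical, and functions are represented by plain strings to keep the example self-contained.

```rust
/// Hypothetical macro: exposes a module's functions only when `$feature` is enabled.
/// When the feature is off, a stub module with the same API is generated instead,
/// so callers do not need their own `cfg` checks.
macro_rules! make_package_sketch {
    ($name:ident, $feature:literal, [$($func:literal),* $(,)?]) => {
        #[cfg(feature = $feature)]
        pub mod $name {
            /// Functions provided by this package (feature enabled).
            pub fn functions() -> Vec<&'static str> {
                vec![$($func),*]
            }
        }

        #[cfg(not(feature = $feature))]
        pub mod $name {
            /// Stub used when the feature is disabled: provides no functions.
            pub fn functions() -> Vec<&'static str> {
                Vec::new()
            }
        }
    };
}

// Hypothetical package named `my_stats`, gated on a feature of the same name.
make_package_sketch!(my_stats, "my_stats", ["my_mean", "my_mode"]);

/// Collects the functions of every package that is enabled in this build.
pub fn enabled_functions() -> Vec<&'static str> {
    my_stats::functions()
}
```

One reason to generate a stub for the disabled case is that code collecting functions from all packages, such as `enabled_functions` here, compiles unchanged whichever features are selected.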

Modules§

approx_distinct
Defines physical expressions that can be evaluated at runtime during query execution
approx_median
Defines physical expressions for APPROX_MEDIAN that can be evaluated at runtime during query execution
approx_percentile_cont
approx_percentile_cont_with_weight
array_agg
ARRAY_AGG aggregate implementation: ArrayAgg
average
Defines Avg & Mean aggregate & accumulators
bit_and_or_xor
Defines BitAnd, BitOr, BitXor and BitXor DISTINCT aggregate accumulators
bool_and_or
Defines physical expressions that can be evaluated at runtime during query execution
correlation
Correlation: correlation sample aggregations.
count
covariance
CovarianceSample: covariance sample aggregations.
expr_fn
Fluent-style API for creating Exprs
first_last
Defines the FIRST_VALUE/LAST_VALUE aggregations.
grouping
Defines physical expressions that can be evaluated at runtime during query execution
hyperloglog
HyperLogLog
macros
median
min_max
Max and MaxAccumulator accumulator for the max function; Min and MinAccumulator accumulator for the min function
nth_value
Defines the NTH_VALUE aggregate expression, which may specify an ordering requirement and can be evaluated at runtime during query execution
regr
Defines physical expressions that can be evaluated at runtime during query execution
stddev
Defines physical expressions that can be evaluated at runtime during query execution
string_agg
StringAgg accumulator for the string_agg function
sum
Defines SUM and SUM DISTINCT aggregate accumulators
variance
VarianceSample: variance sample aggregations. VariancePopulation: variance population aggregations.

Functions§

all_default_aggregate_functions
Returns all default aggregate functions
register_all
Registers all enabled packages with a FunctionRegistry