Aggregate Function packages for DataFusion.
This crate contains a collection of aggregate function packages for DataFusion, implemented using the extension API. Users may wish to control which functions are available, both to limit the binary size of their application and to use dialect-specific implementations of functions (e.g. Spark vs. Postgres).
Each package is implemented as a separate module, activated by a feature flag.
§Available Packages
See the list of modules in this crate for available packages.
§Using A Package
You can register all functions in all enabled packages using the register_all function.
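As a rough, std-only sketch of what registering "all functions in all packages" involves: each package contributes a list of functions, and register_all hands each one to a registry keyed by name. AggregateUdf, FunctionRegistry, SimpleRegistry, and the package functions below are hypothetical simplifications, not DataFusion's real types or signatures; in the real crate the registry is DataFusion's FunctionRegistry and the functions are AggregateUDF values.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an aggregate UDF: only a name is kept here.
#[derive(Debug, Clone)]
pub struct AggregateUdf {
    pub name: String,
}

impl AggregateUdf {
    pub fn new(name: &str) -> Self {
        Self { name: name.to_string() }
    }
}

/// Hypothetical stand-in for a function registry that stores UDAFs by name.
pub trait FunctionRegistry {
    /// Registers `udaf`, returning any previously registered function with the same name.
    fn register_udaf(&mut self, udaf: AggregateUdf) -> Option<AggregateUdf>;
}

/// Minimal HashMap-backed registry (hypothetical).
#[derive(Default)]
pub struct SimpleRegistry {
    pub udafs: HashMap<String, AggregateUdf>,
}

impl FunctionRegistry for SimpleRegistry {
    fn register_udaf(&mut self, udaf: AggregateUdf) -> Option<AggregateUdf> {
        self.udafs.insert(udaf.name.clone(), udaf)
    }
}

/// Hypothetical "packages": each returns the functions it provides.
fn sum_package() -> Vec<AggregateUdf> {
    vec![AggregateUdf::new("sum")]
}

fn variance_package() -> Vec<AggregateUdf> {
    vec![AggregateUdf::new("var_samp"), AggregateUdf::new("var_pop")]
}

/// Collects the functions from every package.
pub fn all_default_aggregate_functions() -> Vec<AggregateUdf> {
    let mut funcs = sum_package();
    funcs.extend(variance_package());
    funcs
}

/// Registers every function from every package with `registry`.
pub fn register_all(registry: &mut dyn FunctionRegistry) {
    for udaf in all_default_aggregate_functions() {
        // In this sketch, a later registration with the same name replaces the earlier one.
        let _previous = registry.register_udaf(udaf);
    }
}
```

Calling register_all(&mut registry) on a SimpleRegistry::default() leaves three entries keyed by name. How the real registry treats duplicate names is not covered by this sketch.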
Each package also exports an expr_fn submodule to help create Exprs that invoke its functions using a fluent style.
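For example, the std-only sketch below imitates the fluent style with a hypothetical, simplified Expr enum: helper functions such as sum and avg wrap an argument expression in an aggregate call, and a method such as alias chains onto the result. These names and the Display format are assumptions for illustration, not DataFusion's actual Expr or expr_fn API.

```rust
use std::fmt;

/// Hypothetical, heavily simplified expression tree (not DataFusion's Expr).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    AggregateFunction { name: String, args: Vec<Expr> },
    Alias(Box<Expr>, String),
}

/// Refers to a column by name.
pub fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

/// Hypothetical expr_fn-style helper: builds a `sum(<expr>)` call.
pub fn sum(expr: Expr) -> Expr {
    Expr::AggregateFunction { name: "sum".to_string(), args: vec![expr] }
}

/// Hypothetical expr_fn-style helper: builds an `avg(<expr>)` call.
pub fn avg(expr: Expr) -> Expr {
    Expr::AggregateFunction { name: "avg".to_string(), args: vec![expr] }
}

impl Expr {
    /// Fluent method: gives the expression an output name.
    pub fn alias(self, name: &str) -> Expr {
        Expr::Alias(Box::new(self), name.to_string())
    }
}

impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Column(name) => write!(f, "{name}"),
            Expr::AggregateFunction { name, args } => {
                // Render arguments as a comma-separated list.
                let args: Vec<String> = args.iter().map(|a| a.to_string()).collect();
                write!(f, "{}({})", name, args.join(", "))
            }
            Expr::Alias(inner, name) => write!(f, "{inner} AS {name}"),
        }
    }
}
```

With these helpers, avg(col("b")).alias("mean_b") reads left to right as "average of column b, named mean_b", which is the point of the fluent style: expressions are built by nesting and chaining ordinary function calls instead of writing SQL strings.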
§Implementing A New Package
To add a new package to this crate, follow the model of the existing packages. The high-level steps are:
- Create a new module with the appropriate AggregateUDF implementations.
- Use the macros in the macros module to create standard entry points.
- Add a new feature to Cargo.toml, with any optional dependencies.
- Use the make_package! macro to expose the module when the feature is enabled.
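The sketch below illustrates the first two steps for a hypothetical product package in plain Rust: a module containing an accumulator, plus a single entry point that constructs it. The Accumulator trait here is a simplified, hypothetical stand-in for DataFusion's accumulator interface (the real one operates on Arrow arrays and returns Results), and product_package and product_accumulator are illustrative names, not something the macros generate.

```rust
/// Hypothetical, simplified stand-in for an aggregate accumulator interface.
pub trait Accumulator {
    /// Folds a batch of input values into the running state.
    fn update_batch(&mut self, values: &[f64]);
    /// Returns the intermediate state to send to another accumulator.
    fn state(&self) -> Option<f64>;
    /// Combines intermediate states produced by other accumulators.
    fn merge_batch(&mut self, states: &[Option<f64>]);
    /// Produces the final result; None stands in for SQL NULL.
    fn evaluate(&self) -> Option<f64>;
}

/// Step 1 (hypothetical): a new module holding the package's implementation.
pub mod product_package {
    use super::Accumulator;

    /// Running product; None until at least one value has been seen.
    #[derive(Debug, Default)]
    pub struct ProductAccumulator {
        product: Option<f64>,
    }

    impl ProductAccumulator {
        fn fold(&mut self, v: f64) {
            self.product = Some(self.product.unwrap_or(1.0) * v);
        }
    }

    impl Accumulator for ProductAccumulator {
        fn update_batch(&mut self, values: &[f64]) {
            for &v in values {
                self.fold(v);
            }
        }

        fn state(&self) -> Option<f64> {
            self.product
        }

        fn merge_batch(&mut self, states: &[Option<f64>]) {
            // Empty partial states (None) contribute nothing.
            for s in states.iter().flatten() {
                self.fold(*s);
            }
        }

        fn evaluate(&self) -> Option<f64> {
            self.product
        }
    }

    /// Step 2 (hypothetical): one entry point for constructing the accumulator.
    /// In the real crate, standard entry points come from the macros module.
    pub fn product_accumulator() -> Box<dyn Accumulator> {
        Box::new(ProductAccumulator::default())
    }
}
```

Keeping update_batch separate from merge_batch mirrors how partial aggregates computed on separate partitions are combined later; for a product, merging partial results is just another multiplication.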
Modules§
- Defines physical expressions that can be evaluated at runtime during query execution.
- Defines physical expressions for APPROX_MEDIAN that can be evaluated at runtime during query execution.
- ARRAY_AGG aggregate implementation: ArrayAgg.
- Defines the Avg and Mean aggregates and accumulators.
- Defines the BitAnd, BitOr, BitXor and BitXor DISTINCT aggregate accumulators.
- Defines physical expressions that can be evaluated at runtime during query execution.
- Correlation: correlation sample aggregations.
- CovarianceSample: covariance sample aggregations.
- Fluent-style API for creating Exprs.
- Defines the FIRST_VALUE/LAST_VALUE aggregations.
- Defines physical expressions that can be evaluated at runtime during query execution.
- HyperLogLog
- Max and MaxAccumulator accumulator for the max function; Min and MinAccumulator accumulator for the min function.
- Defines the NTH_VALUE aggregate expression, which may specify an ordering requirement and can be evaluated at runtime during query execution.
- Defines physical expressions that can be evaluated at runtime during query execution.
- Defines physical expressions that can be evaluated at runtime during query execution.
- StringAgg accumulator for the string_agg function.
- Defines the SUM and SUM DISTINCT aggregate accumulators.
- VarianceSample: variance sample aggregations; VariancePopulation: variance population aggregations.
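The variance entries above distinguish sample from population variance. One common way to compute both, sketched below in plain Rust as an assumption rather than a description of this crate's code, is Welford's online update plus a pairwise merge (Chan et al.) for combining partial states from different partitions; the two estimators differ only in the final divisor (n versus n - 1). VarianceState and its methods are hypothetical names.

```rust
/// Hypothetical running state for a numerically stable variance (Welford's algorithm).
#[derive(Debug, Clone, Copy, Default)]
pub struct VarianceState {
    count: u64,
    mean: f64,
    m2: f64, // sum of squared deviations from the current mean
}

impl VarianceState {
    /// Folds one input value into the running state.
    pub fn update(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;
        self.mean += delta / self.count as f64;
        self.m2 += delta * (x - self.mean);
    }

    /// Combines a partial state from another partition (pairwise update).
    pub fn merge(&mut self, other: &VarianceState) {
        if other.count == 0 {
            return;
        }
        if self.count == 0 {
            *self = *other;
            return;
        }
        let (na, nb) = (self.count as f64, other.count as f64);
        let n = na + nb;
        let delta = other.mean - self.mean;
        self.mean += delta * nb / n;
        self.m2 += other.m2 + delta * delta * na * nb / n;
        self.count += other.count;
    }

    /// Population variance m2 / n; this sketch returns None for empty input.
    pub fn var_pop(&self) -> Option<f64> {
        (self.count > 0).then(|| self.m2 / self.count as f64)
    }

    /// Sample variance m2 / (n - 1); this sketch returns None for fewer than two values.
    pub fn var_samp(&self) -> Option<f64> {
        (self.count > 1).then(|| self.m2 / (self.count - 1) as f64)
    }
}
```

For the values 2, 4, 4, 4 and 5, 5, 7, 9 accumulated on two "partitions" and then merged, the population variance is 4 and the sample variance is 32/7. Keeping (count, mean, m2) rather than raw sums of x and x² avoids the cancellation error the naive formula can suffer when values are large relative to their spread.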
Functions§
- Returns all default aggregate functions.
- Registers all enabled packages with a FunctionRegistry.