Aggregate Function packages for DataFusion.
This crate contains a collection of aggregate function packages for DataFusion, implemented using the extension API. Users may wish to control which functions are available, both to reduce the binary size of their application and to use dialect-specific implementations of functions (e.g. Spark vs Postgres).
Each package is implemented as a separate module, activated by a feature flag.
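As a rough illustration of how a feature flag can gate a package, the sketch below uses Rust's cfg attributes. The module names and the feature name hypothetical_stats are invented for this example; they are not this crate's actual modules or Cargo features. When the code is built without that feature, the gated module is left out of the build entirely, which is what keeps unused functions out of the binary.

```rust
// Always-compiled package (hypothetical stand-in for a default package).
mod basic_pkg {
    /// Names of the aggregate functions this package provides.
    pub fn function_names() -> Vec<&'static str> {
        vec!["my_count", "my_sum"]
    }
}

// Optional package: only compiled when the hypothetical Cargo feature
// "hypothetical_stats" is enabled (e.g. `cargo build --features hypothetical_stats`).
#[cfg(feature = "hypothetical_stats")]
mod stats_pkg {
    pub fn function_names() -> Vec<&'static str> {
        vec!["my_stddev"]
    }
}

/// Collects the function names from every package compiled into this build.
pub fn available_functions() -> Vec<&'static str> {
    #[allow(unused_mut)] // `mut` is only needed when the optional feature is on
    let mut names = basic_pkg::function_names();
    #[cfg(feature = "hypothetical_stats")]
    names.extend(stats_pkg::function_names());
    names
}
```

Built without any features, available_functions() returns only the two names from basic_pkg.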
Available Packages
See the list of modules in this crate for available packages.
Using A Package
You can register all functions in all packages using the register_all function.
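To make the registration step concrete, here is a self-contained sketch of a name-keyed registry. ToyAggregateUdf, ToyRegistry, default_functions and register_all_toy are hypothetical stand-ins, loosely modeled on the roles that all_default_aggregate_functions, FunctionRegistry and register_all play; they are not the crate's actual types or signatures.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for an aggregate UDF: a name plus aliases.
#[derive(Debug)]
pub struct ToyAggregateUdf {
    pub name: &'static str,
    pub aliases: Vec<&'static str>,
}

/// Hypothetical registry mapping every name and alias to a shared UDF.
#[derive(Default)]
pub struct ToyRegistry {
    udafs: HashMap<String, Arc<ToyAggregateUdf>>,
}

impl ToyRegistry {
    /// Registers `udaf` under its aliases and its name.
    /// Returns the function previously stored under the primary name, if any.
    pub fn register_udaf(&mut self, udaf: Arc<ToyAggregateUdf>) -> Option<Arc<ToyAggregateUdf>> {
        for alias in &udaf.aliases {
            self.udafs.insert(alias.to_string(), Arc::clone(&udaf));
        }
        self.udafs.insert(udaf.name.to_string(), udaf)
    }

    /// Looks up a function by name or alias.
    pub fn udaf(&self, name: &str) -> Option<&Arc<ToyAggregateUdf>> {
        self.udafs.get(name)
    }
}

/// Hypothetical list of the functions enabled in this build.
pub fn default_functions() -> Vec<Arc<ToyAggregateUdf>> {
    vec![
        Arc::new(ToyAggregateUdf { name: "avg", aliases: vec!["mean"] }),
        Arc::new(ToyAggregateUdf { name: "sum", aliases: vec![] }),
    ]
}

/// Registers every default function into `registry`.
pub fn register_all_toy(registry: &mut ToyRegistry) {
    for udaf in default_functions() {
        // This sketch ignores any function that gets replaced.
        registry.register_udaf(udaf);
    }
}
```

Storing an Arc per name means an alias such as mean resolves to the very same function object as avg, rather than to a copy.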
Each package also exports an expr_fn submodule to help create Exprs that invoke functions using a fluent style. For example:
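The following is a self-contained sketch of the fluent style, using a toy expression type instead of DataFusion's Expr. ToyExpr, col, lit, the alias method and the helpers in this expr_fn module are all hypothetical and only imitate the general shape: each helper takes expressions and returns a new expression, so calls can be nested and chained.

```rust
use std::fmt;

/// Hypothetical, heavily simplified expression tree (not DataFusion's Expr).
#[derive(Debug, Clone, PartialEq)]
pub enum ToyExpr {
    Column(String),
    Literal(i64),
    AggregateFunction {
        name: &'static str,
        args: Vec<ToyExpr>,
        distinct: bool,
    },
    Alias(Box<ToyExpr>, String),
}

/// Builds a column reference.
pub fn col(name: &str) -> ToyExpr {
    ToyExpr::Column(name.to_string())
}

/// Builds an integer literal.
pub fn lit(value: i64) -> ToyExpr {
    ToyExpr::Literal(value)
}

impl ToyExpr {
    /// Names the output of this expression; returns a new expression so calls chain.
    pub fn alias(self, name: &str) -> ToyExpr {
        ToyExpr::Alias(Box::new(self), name.to_string())
    }
}

/// Hypothetical expr_fn-style module: one helper per aggregate function.
pub mod expr_fn {
    use super::ToyExpr;

    pub fn sum(arg: ToyExpr) -> ToyExpr {
        ToyExpr::AggregateFunction { name: "sum", args: vec![arg], distinct: false }
    }

    pub fn count_distinct(arg: ToyExpr) -> ToyExpr {
        ToyExpr::AggregateFunction { name: "count", args: vec![arg], distinct: true }
    }
}

/// Renders expressions in a SQL-like form.
impl fmt::Display for ToyExpr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ToyExpr::Column(name) => write!(f, "{name}"),
            ToyExpr::Literal(value) => write!(f, "{value}"),
            ToyExpr::AggregateFunction { name, args, distinct } => {
                let rendered: Vec<String> = args.iter().map(|a| a.to_string()).collect();
                let prefix = if *distinct { "DISTINCT " } else { "" };
                write!(f, "{name}({prefix}{})", rendered.join(", "))
            }
            ToyExpr::Alias(expr, name) => write!(f, "{expr} AS {name}"),
        }
    }
}
```

With this sketch, expr_fn::count_distinct(col("b")).alias("n") renders as count(DISTINCT b) AS n.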
Implementing A New Package
To add a new package to this crate, you should follow the model of existing packages. The high level steps are:
- Create a new module with the appropriate AggregateUDF implementations.
- Use the macros in the macros module to create standard entry points.
- Add a new feature to Cargo.toml, with any optional dependencies.
- Use the make_package! macro to expose the module when the feature is enabled.
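The steps above mention macros that create standard entry points. The sketch below shows one plausible shape for such an entry point: a macro that generates a function returning a lazily created, shared instance of a UDF. The macro make_toy_udaf!, the ToyAggregateUdf type and the generated functions are hypothetical; they are not the contents of this crate's macros module.

```rust
use std::sync::{Arc, OnceLock};

/// Hypothetical stand-in for an AggregateUDF implementation.
#[derive(Debug)]
pub struct ToyAggregateUdf {
    pub name: &'static str,
}

/// Generates `pub fn $fn_name() -> Arc<ToyAggregateUdf>`. The UDF is built on the
/// first call and every later call returns a clone of the same Arc.
macro_rules! make_toy_udaf {
    ($fn_name:ident, $udaf_name:literal) => {
        pub fn $fn_name() -> Arc<ToyAggregateUdf> {
            static INSTANCE: OnceLock<Arc<ToyAggregateUdf>> = OnceLock::new();
            Arc::clone(INSTANCE.get_or_init(|| Arc::new(ToyAggregateUdf { name: $udaf_name })))
        }
    };
}

// Two entry points from the same macro; each function has its own static instance.
make_toy_udaf!(my_sum_udaf, "my_sum");
make_toy_udaf!(my_avg_udaf, "my_avg");
```

Handing out a shared Arc means code that builds expressions and code that registers functions can both call the entry point without creating a new UDF object each time.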
Modules
- approx_distinct - Defines physical expressions that can be evaluated at runtime during query execution
- approx_median - Defines physical expressions for APPROX_MEDIAN that can be evaluated at runtime during query execution
- approx_percentile_cont
- approx_percentile_cont_with_weight
- array_agg - ARRAY_AGG aggregate implementation: ArrayAgg
- average - Defines Avg & Mean aggregate & accumulators
- bit_and_or_xor - Defines BitAnd, BitOr, BitXor and BitXor DISTINCT aggregate accumulators
- bool_and_or - Defines physical expressions that can be evaluated at runtime during query execution
- correlation - Correlation: correlation sample aggregations.
- count
- covariance - CovarianceSample: covariance sample aggregations.
- expr_fn - Fluent-style API for creating Exprs
- first_last - Defines the FIRST_VALUE/LAST_VALUE aggregations.
- grouping - Defines physical expressions that can be evaluated at runtime during query execution
- hyperloglog - HyperLogLog
- macros
- median
- min_max - Max and MaxAccumulator accumulator for the max function; Min and MinAccumulator accumulator for the min function
- nth_value - Defines the NTH_VALUE aggregate expression, which may specify an ordering requirement and can be evaluated at runtime during query execution
- regr - Defines physical expressions that can be evaluated at runtime during query execution
- stddev - Defines physical expressions that can be evaluated at runtime during query execution
- string_agg - StringAgg accumulator for the string_agg function
- sum - Defines SUM and SUM DISTINCT aggregate accumulators
- variance - VarianceSample: variance sample aggregations. VariancePopulation: variance population aggregations.
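Many of the modules above define accumulators. As a simplified illustration of the idea, the sketch below implements an average accumulator that keeps a running sum and count, exposes that pair as an intermediate state, merges states coming from other partitions, and produces the final value. The trait ToyAccumulator and the struct ToyAvgAccumulator are hypothetical; DataFusion's own Accumulator trait works on Arrow arrays rather than plain f64 slices.

```rust
/// Hypothetical, simplified accumulator interface over plain f64 slices.
pub trait ToyAccumulator {
    /// Folds a batch of input values into the running state.
    fn update_batch(&mut self, values: &[f64]);
    /// Returns the intermediate state so another accumulator can merge it.
    fn state(&self) -> Vec<f64>;
    /// Combines an intermediate state produced by `state` elsewhere.
    fn merge_state(&mut self, state: &[f64]);
    /// Produces the final aggregate value, or None if no rows were seen.
    fn evaluate(&self) -> Option<f64>;
}

/// Average accumulator: stores (sum, count) instead of a running mean,
/// so partial states from different partitions combine exactly.
#[derive(Default)]
pub struct ToyAvgAccumulator {
    sum: f64,
    count: u64,
}

impl ToyAccumulator for ToyAvgAccumulator {
    fn update_batch(&mut self, values: &[f64]) {
        self.sum += values.iter().sum::<f64>();
        self.count += values.len() as u64;
    }

    fn state(&self) -> Vec<f64> {
        vec![self.sum, self.count as f64]
    }

    fn merge_state(&mut self, state: &[f64]) {
        // State layout is [sum, count], matching `state` above.
        self.sum += state[0];
        self.count += state[1] as u64;
    }

    fn evaluate(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}
```

For example, one accumulator fed [1.0, 2.0] and another fed [3.0, 6.0] merge to a sum of 12 over 4 rows, giving 3.0. Averaging the two partial means (1.5 and 4.5) would happen to give the same answer here, but only because both partitions have the same row count; carrying the count is what keeps the merge correct in general.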
Functions
- all_default_aggregate_functions - Returns all default aggregate functions
- register_all - Registers all enabled packages with a FunctionRegistry