Struct datafusion::common::config::OptimizerOptions
pub struct OptimizerOptions {
pub enable_distinct_aggregation_soft_limit: bool,
pub enable_round_robin_repartition: bool,
pub enable_topk_aggregation: bool,
pub filter_null_join_keys: bool,
pub repartition_aggregations: bool,
pub repartition_file_min_size: usize,
pub repartition_joins: bool,
pub allow_symmetric_joins_without_pruning: bool,
pub repartition_file_scans: bool,
pub repartition_windows: bool,
pub repartition_sorts: bool,
pub prefer_existing_sort: bool,
pub skip_failed_rules: bool,
pub max_passes: usize,
pub top_down_join_key_reordering: bool,
pub prefer_hash_join: bool,
pub hash_join_single_partition_threshold: usize,
pub hash_join_single_partition_threshold_rows: usize,
pub default_filter_selectivity: u8,
pub prefer_existing_union: bool,
}
Options related to query optimization
See also: SessionConfig
Fields
enable_distinct_aggregation_soft_limit: bool
When set to true, the optimizer will push a limit operation into grouped aggregations which have no aggregate expressions, as a soft limit, emitting groups once the limit is reached, before all rows in the group are read.
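The soft limit can be pictured with a small standalone sketch. This is hypothetical code, not DataFusion's operator. For a query shaped like SELECT DISTINCT k FROM t LIMIT n, distinct keys are emitted as soon as n of them have been seen, so the rest of the input never needs to be read.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical sketch of a soft limit on a grouping without aggregate
/// expressions. Distinct keys are emitted in first-seen order, and input
/// stops being read as soon as `limit` of them have been found.
/// Returns (emitted keys, number of input rows read).
fn distinct_with_soft_limit<T, I>(rows: I, limit: usize) -> (Vec<T>, usize)
where
    T: Eq + Hash + Clone,
    I: IntoIterator<Item = T>,
{
    let mut seen = HashSet::new();
    let mut groups = Vec::new();
    let mut rows_read = 0;
    if limit == 0 {
        return (groups, rows_read);
    }
    for key in rows {
        rows_read += 1;
        // `insert` returns true only for a key that has not been seen before.
        if seen.insert(key.clone()) {
            groups.push(key);
            if groups.len() == limit {
                break; // limit reached: the remaining input is never read
            }
        }
    }
    (groups, rows_read)
}

fn main() {
    let (groups, rows_read) = distinct_with_soft_limit(vec![1, 1, 2, 3, 3, 4], 2);
    println!("groups = {:?}, rows read = {}", groups, rows_read);
}
```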
enable_round_robin_repartition: bool
When set to true, the physical plan optimizer will try to add round robin repartitioning to increase parallelism and leverage more CPU cores.
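Round robin repartitioning distributes input batches across output partitions in rotation, regardless of their contents. The sketch below is a hypothetical, self-contained illustration that uses plain vectors in place of record batches.

```rust
/// Hypothetical sketch of round robin repartitioning: the i-th input batch is
/// sent to output partition i % n, regardless of the batch's contents.
fn round_robin<T>(batches: Vec<T>, n: usize) -> Vec<Vec<T>> {
    assert!(n > 0, "need at least one output partition");
    let mut partitions: Vec<Vec<T>> = (0..n).map(|_| Vec::new()).collect();
    for (i, batch) in batches.into_iter().enumerate() {
        partitions[i % n].push(batch);
    }
    partitions
}

fn main() {
    // Five "batches" spread over two partitions.
    let partitions = round_robin(vec!["b0", "b1", "b2", "b3", "b4"], 2);
    println!("{:?}", partitions);
}
```

Because assignment ignores the data, round robin balances work well but does not co-locate equal keys; that is what the hash-based settings below are for.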
enable_topk_aggregation: bool
When set to true, the optimizer will attempt to perform limit operations during aggregations, if possible.
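One way to apply a limit during aggregation is sketched below for a query shaped like SELECT key, MAX(v) FROM t GROUP BY key ORDER BY MAX(v) DESC LIMIT k. This is a hypothetical illustration, not DataFusion's operator: it keeps at most k groups in memory and lets an untracked group displace the weakest tracked one only when its value is larger. Because a running MAX can only grow, this stays exact (up to ties).

```rust
use std::collections::HashMap;

/// Hypothetical top-k aggregation: at most k groups are held in memory.
fn topk_max(rows: &[(&str, i64)], k: usize) -> Vec<(String, i64)> {
    if k == 0 {
        return Vec::new();
    }
    let mut best: HashMap<String, i64> = HashMap::new();
    for &(key, v) in rows {
        if let Some(current) = best.get_mut(key) {
            // Already tracked: update its running maximum.
            *current = (*current).max(v);
        } else if best.len() < k {
            best.insert(key.to_string(), v);
        } else {
            // Full: find the weakest tracked group (a linear scan is fine for small k).
            let (min_key, min_val) = best
                .iter()
                .min_by_key(|&(_, &val)| val)
                .map(|(name, &val)| (name.clone(), val))
                .expect("map is non-empty because k > 0");
            if v > min_val {
                best.remove(&min_key);
                best.insert(key.to_string(), v);
            }
        }
    }
    // Emit groups ordered by MAX(v) descending, breaking ties by key.
    let mut out: Vec<(String, i64)> = best.into_iter().collect();
    out.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    out
}

fn main() {
    let rows = [("a", 1), ("b", 5), ("c", 3), ("a", 7), ("d", 2)];
    println!("{:?}", topk_max(&rows, 2));
}
```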
filter_null_join_keys: bool
When set to true, the optimizer will insert filters before a join between a nullable and non-nullable column to filter out nulls on the nullable side. This filter can add additional overhead when the file format does not fully support predicate push down.
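The reason such a filter is safe is that, in SQL, NULL never compares equal to anything, so a NULL join key can never produce an inner-join match. The hypothetical sketch below (a plain nested-loop join, not DataFusion code) checks that dropping NULL keys from the nullable side leaves the inner-join result unchanged while shrinking the join's input.

```rust
/// SQL equality on a nullable key: NULL = x is never true.
fn keys_match(left_key: Option<i32>, right_key: i32) -> bool {
    matches!(left_key, Some(k) if k == right_key)
}

/// Hypothetical nested-loop inner equi-join of a nullable key column (left)
/// with a non-nullable key column (right).
fn inner_join<'a>(
    left: &[(Option<i32>, &'a str)],
    right: &[(i32, &'a str)],
) -> Vec<(&'a str, &'a str)> {
    let mut out = Vec::new();
    for &(lk, lv) in left {
        for &(rk, rv) in right {
            if keys_match(lk, rk) {
                out.push((lv, rv));
            }
        }
    }
    out
}

/// The kind of pre-join filter described above: keep rows where key IS NOT NULL.
fn drop_null_keys<'a>(left: &[(Option<i32>, &'a str)]) -> Vec<(Option<i32>, &'a str)> {
    left.iter().copied().filter(|(key, _)| key.is_some()).collect()
}

fn main() {
    let left = [(Some(1), "a"), (None, "b"), (Some(2), "c")];
    let right = [(1, "x"), (2, "y")];
    let filtered = drop_null_keys(&left);
    // Same join result, but the join sees fewer input rows.
    assert_eq!(inner_join(&left, &right), inner_join(&filtered, &right));
    println!("{:?}", inner_join(&filtered, &right));
}
```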
repartition_aggregations: bool
Whether DataFusion should repartition data using the aggregate keys to execute aggregates in parallel, using the provided target_partitions level.
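Repartitioning on the aggregate keys works because hashing sends every row with the same key to the same partition, so each partition can be aggregated independently and the partial results simply concatenated. The sketch below is a simplified, hypothetical illustration of SUM(v) GROUP BY key; the helper names are invented, and DataFusion's real plan also includes a partial aggregation step before repartitioning.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: choose an output partition by hashing the key.
fn partition_for<K: Hash + ?Sized>(key: &K, n: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % n as u64) as usize
}

/// Hypothetical repartitioned aggregate: SUM(v) GROUP BY key.
fn parallel_sum_by_key(rows: &[(&str, i64)], target_partitions: usize) -> HashMap<String, i64> {
    assert!(target_partitions > 0, "need at least one partition");
    // Step 1: hash-repartition rows on the group key.
    let mut partitions: Vec<Vec<(&str, i64)>> = vec![Vec::new(); target_partitions];
    for &(key, value) in rows {
        partitions[partition_for(key, target_partitions)].push((key, value));
    }
    let mut result = HashMap::new();
    for partition in partitions {
        // Step 2: aggregate each partition on its own (in parallel in a real engine).
        let mut local: HashMap<String, i64> = HashMap::new();
        for (key, value) in partition {
            *local.entry(key.to_string()).or_insert(0) += value;
        }
        // Step 3: concatenate; a key never appears in two partitions.
        result.extend(local);
    }
    result
}

fn main() {
    let rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4)];
    println!("{:?}", parallel_sum_by_key(&rows, 3));
}
```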
repartition_file_min_size: usize
Minimum total size of the files, in bytes, required to perform file scan repartitioning.
repartition_joins: bool
Whether DataFusion should repartition data using the join keys to execute joins in parallel, using the provided target_partitions level.
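A partitioned join relies on both inputs being hash-partitioned on the join key with the same function and partition count, so matching rows always meet in partitions with the same index. The hypothetical sketch below (invented helper names, nested-loop join per partition) checks that joining partition pairs independently gives the same rows as one unpartitioned join.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: choose an output partition by hashing the join key.
fn partition_for(key: i32, n: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % n as u64) as usize
}

/// Hash-partition (join key, payload) rows into n partitions.
fn hash_partition<'a>(rows: &[(i32, &'a str)], n: usize) -> Vec<Vec<(i32, &'a str)>> {
    let mut parts = vec![Vec::new(); n];
    for &(key, payload) in rows {
        parts[partition_for(key, n)].push((key, payload));
    }
    parts
}

/// Plain inner equi-join, used per partition and as a reference result.
fn join_rows<'a>(left: &[(i32, &'a str)], right: &[(i32, &'a str)]) -> Vec<(i32, &'a str, &'a str)> {
    let mut out = Vec::new();
    for &(lk, lv) in left {
        for &(rk, rv) in right {
            if lk == rk {
                out.push((lk, lv, rv));
            }
        }
    }
    out
}

/// Join each partition pair independently (in parallel in a real engine).
fn partitioned_join<'a>(left: &[(i32, &'a str)], right: &[(i32, &'a str)], n: usize) -> Vec<(i32, &'a str, &'a str)> {
    assert!(n > 0, "need at least one partition");
    let left_parts = hash_partition(left, n);
    let right_parts = hash_partition(right, n);
    let mut out: Vec<_> = left_parts
        .iter()
        .zip(right_parts.iter())
        .flat_map(|(l, r)| join_rows(l, r))
        .collect();
    out.sort(); // partition order is arbitrary; sort for a stable result
    out
}

fn main() {
    let left = [(1, "l1"), (2, "l2"), (3, "l3")];
    let right = [(2, "r2"), (3, "r3a"), (3, "r3b"), (4, "r4")];
    let mut expected = join_rows(&left, &right);
    expected.sort();
    assert_eq!(partitioned_join(&left, &right, 4), expected);
    println!("{:?}", expected);
}
```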
allow_symmetric_joins_without_pruning: bool
Whether DataFusion should allow symmetric hash joins for unbounded data sources even when their inputs do not have any ordering or filtering. Without such ordering or filtering, the SymmetricHashJoin operator will be unable to prune its internal buffers, resulting in certain join types (such as Full, Left, LeftAnti, LeftSemi, Right, RightAnti, and RightSemi) being produced only at the end of the execution. This is not typical in stream processing. Additionally, without proper design for long-running execution, all types of joins may encounter out-of-memory errors.
repartition_file_scans: bool
When set to true, file groups will be repartitioned to achieve maximum parallelism. Currently Parquet and CSV formats are supported.
If set to true, all files will be repartitioned evenly (i.e., a single large file might be partitioned into smaller chunks) for parallel scanning. If set to false, different files will be read in parallel, but repartitioning won't happen within a single file.
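The following hypothetical sketch combines repartition_file_scans with repartition_file_min_size. When the total size of the files reaches the minimum, the files are cut into byte ranges so that each target partition gets a roughly equal share, which may split one large file across several partitions. Otherwise each file stays whole in its own group. The FileRange type and the grouping rule are assumptions for illustration; DataFusion's own file-group logic may differ.

```rust
/// A byte range [start, end) of one file assigned to a scan partition (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct FileRange {
    file: String,
    start: u64,
    end: u64,
}

/// Hypothetical even split of files into byte-range groups.
fn repartition_files(files: &[(&str, u64)], target_partitions: usize, min_size: u64) -> Vec<Vec<FileRange>> {
    let total: u64 = files.iter().map(|&(_, size)| size).sum();
    if target_partitions == 0 || total == 0 || total < min_size {
        // No repartitioning: different files can still be read in parallel.
        return files
            .iter()
            .map(|&(name, size)| vec![FileRange { file: name.to_string(), start: 0, end: size }])
            .collect();
    }
    // Ceiling division so that at most `target_partitions` groups are produced.
    let chunk = (total + target_partitions as u64 - 1) / target_partitions as u64;
    let mut groups = Vec::new();
    let mut current: Vec<FileRange> = Vec::new();
    let mut current_size = 0u64;
    for &(name, size) in files {
        let mut offset = 0u64;
        while offset < size {
            // Take as much of this file as still fits into the current group.
            let take = (chunk - current_size).min(size - offset);
            current.push(FileRange { file: name.to_string(), start: offset, end: offset + take });
            offset += take;
            current_size += take;
            if current_size == chunk {
                groups.push(std::mem::take(&mut current));
                current_size = 0;
            }
        }
    }
    if !current.is_empty() {
        groups.push(current);
    }
    groups
}

fn main() {
    let files = [("a.parquet", 100), ("b.parquet", 50)];
    for (i, group) in repartition_files(&files, 3, 10).iter().enumerate() {
        println!("partition {}: {:?}", i, group);
    }
}
```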
repartition_windows: bool
Whether DataFusion should repartition data using the partition keys to execute window functions in parallel, using the provided target_partitions level.
repartition_sorts: bool
Whether DataFusion should execute sorts in a per-partition fashion and merge afterwards, instead of coalescing first and sorting globally. When this flag is enabled, plans in the form below
SortExec: [a@0 ASC]
  CoalescePartitionsExec
    RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1
would turn into the plan below, which performs better in multithreaded environments
SortPreservingMergeExec: [a@0 ASC]
  SortExec: [a@0 ASC]
    RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1
prefer_existing_sort: bool
When true, DataFusion will opportunistically remove sorts when the data is already sorted (i.e. setting preserve_order to true on RepartitionExec and using SortPreservingMergeExec).
When false, DataFusion will maximize plan parallelism using RepartitionExec even if this requires subsequently resorting data using a SortExec.
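Both repartition_sorts and prefer_existing_sort can produce plans that use SortPreservingMergeExec, which combines several individually sorted partitions into one sorted output. The hypothetical sketch below shows the idea on plain integer vectors: sort every partition on its own, then perform a k-way merge with a binary min-heap. It illustrates the technique and is not DataFusion's streaming implementation.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical "sort each partition, then merge" sketch.
fn sort_then_merge(mut partitions: Vec<Vec<i64>>) -> Vec<i64> {
    for partition in partitions.iter_mut() {
        partition.sort(); // per-partition sort (parallelizable in a real engine)
    }
    // Min-heap of (head value, partition index, position within partition).
    let mut heap = BinaryHeap::new();
    for (p, partition) in partitions.iter().enumerate() {
        if let Some(&first) = partition.first() {
            heap.push(Reverse((first, p, 0usize)));
        }
    }
    let mut merged = Vec::with_capacity(partitions.iter().map(Vec::len).sum());
    while let Some(Reverse((value, p, pos))) = heap.pop() {
        merged.push(value);
        // Refill the heap with the next element of the partition just consumed.
        if let Some(&next) = partitions[p].get(pos + 1) {
            heap.push(Reverse((next, p, pos + 1)));
        }
    }
    merged
}

fn main() {
    println!("{:?}", sort_then_merge(vec![vec![3, 1], vec![2, 5, 4], vec![]]));
}
```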
skip_failed_rules: bool
When set to true, the logical plan optimizer will produce warning messages if any optimization rule produces an error and then proceed to the next rule. When set to false, any rule that produces an error will cause the query to fail.
max_passes: usize
Number of times that the optimizer will attempt to optimize the plan
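The interaction between skip_failed_rules and max_passes can be sketched with a toy rule-based optimizer. Everything here is hypothetical: a plan is just an integer and a rule is a plain function that rewrites it or returns an error. The loop also assumes the optimizer stops early once a full pass leaves the plan unchanged, so max_passes acts as an upper bound.

```rust
/// Hypothetical stand-ins, not DataFusion types.
type Plan = i64;
type Rule = fn(&Plan) -> Result<Plan, String>;

/// Apply every rule in order, for up to `max_passes` passes.
fn optimize(plan: Plan, rules: &[Rule], max_passes: usize, skip_failed_rules: bool) -> Result<Plan, String> {
    let mut current = plan;
    for pass in 0..max_passes {
        let before = current;
        for (i, rule) in rules.iter().enumerate() {
            match rule(&current) {
                Ok(new_plan) => current = new_plan,
                // skip_failed_rules = true: warn and move on to the next rule.
                Err(e) if skip_failed_rules => {
                    eprintln!("warning: pass {pass}, rule {i} failed and was skipped: {e}");
                }
                // skip_failed_rules = false: the error fails the whole query.
                Err(e) => return Err(e),
            }
        }
        if current == before {
            break; // fixed point: further passes would not change the plan
        }
    }
    Ok(current)
}

/// Toy rule: shrink the "plan" by 10 while it is larger than 10.
fn shrink(p: &Plan) -> Result<Plan, String> {
    Ok(if *p > 10 { p - 10 } else { *p })
}

/// Toy rule that always errors.
fn always_fails(_: &Plan) -> Result<Plan, String> {
    Err("rule not applicable".to_string())
}

fn main() {
    let rules: [Rule; 2] = [shrink, always_fails];
    println!("{:?}", optimize(35, &rules, 3, true));
    println!("{:?}", optimize(35, &rules, 3, false));
}
```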
top_down_join_key_reordering: bool
When set to true, the physical plan optimizer will run a top-down process to reorder the join keys.
prefer_hash_join: bool
When set to true, the physical plan optimizer will prefer HashJoin over SortMergeJoin. HashJoin can work more efficiently than SortMergeJoin but consumes more memory.
hash_join_single_partition_threshold: usize
The maximum estimated size, in bytes, for which one input side of a HashJoin will be collected into a single partition.
hash_join_single_partition_threshold_rows: usize
The maximum estimated size, in rows, for which one input side of a HashJoin will be collected into a single partition.
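A HashJoin whose build side is collected into a single partition runs in DataFusion's CollectLeft mode, while the alternative hash-partitions both sides. The sketch below is hypothetical: a local enum stands in for the real mode type, and it assumes the byte estimate is used when available, with the row estimate only as a fallback. DataFusion's exact rule may differ, and the thresholds in main are illustrative, not defaults.

```rust
/// Local stand-in for the two hash join modes (hypothetical type).
#[derive(Debug, PartialEq)]
enum JoinMode {
    /// Build side collected into one partition and shared by all probe partitions.
    CollectLeft,
    /// Both sides hash-partitioned on the join keys.
    Partitioned,
}

/// Hypothetical decision rule driven by the two single-partition thresholds.
fn choose_mode(
    est_bytes: Option<usize>,
    est_rows: Option<usize>,
    threshold_bytes: usize,
    threshold_rows: usize,
) -> JoinMode {
    let small_enough = match (est_bytes, est_rows) {
        (Some(bytes), _) => bytes < threshold_bytes,     // prefer the byte estimate
        (None, Some(rows)) => rows < threshold_rows,     // fall back to row count
        (None, None) => false,                           // no statistics: stay partitioned
    };
    if small_enough {
        JoinMode::CollectLeft
    } else {
        JoinMode::Partitioned
    }
}

fn main() {
    // Illustrative thresholds only: 1 MiB and 100,000 rows.
    println!("{:?}", choose_mode(Some(64 * 1024), None, 1024 * 1024, 100_000));
}
```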
default_filter_selectivity: u8
The default filter selectivity used by Filter Statistics when an exact selectivity cannot be determined. Valid values are between 0 (no rows are selected) and 100 (all rows are selected).
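As a worked example of default_filter_selectivity: with a selectivity of s percent, a filter over N input rows is estimated to produce about N * s / 100 rows, so 1,000 rows at a selectivity of 20 give an estimate of 200. The hypothetical helper below uses integer arithmetic that rounds down; DataFusion's own estimate may be computed and rounded differently.

```rust
/// Hypothetical helper: estimated output rows of a filter whose exact
/// selectivity is unknown, given a default selectivity in percent (0..=100).
fn estimate_filtered_rows(input_rows: u64, default_filter_selectivity: u8) -> u64 {
    assert!(default_filter_selectivity <= 100, "selectivity must be between 0 and 100");
    // Integer arithmetic: the estimate is rounded down.
    input_rows * u64::from(default_filter_selectivity) / 100
}

fn main() {
    println!("{}", estimate_filtered_rows(1_000, 20));
}
```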
prefer_existing_union: bool
When set to true, the optimizer will not attempt to convert Union to Interleave.
Trait Implementations
impl Clone for OptimizerOptions
fn clone(&self) -> OptimizerOptions
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl ConfigField for OptimizerOptions
impl Debug for OptimizerOptions
impl Default for OptimizerOptions
fn default() -> OptimizerOptions
impl PartialEq for OptimizerOptions
impl StructuralPartialEq for OptimizerOptions
Auto Trait Implementations
impl Freeze for OptimizerOptions
impl RefUnwindSafe for OptimizerOptions
impl Send for OptimizerOptions
impl Sync for OptimizerOptions
impl Unpin for OptimizerOptions
impl UnwindSafe for OptimizerOptions
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T where T: Clone
default unsafe fn clone_to_uninit(&self, dst: *mut T)
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.