Struct datafusion_common::config::ExecutionOptions
pub struct ExecutionOptions {
pub batch_size: usize,
pub coalesce_batches: bool,
pub collect_statistics: bool,
pub target_partitions: usize,
pub time_zone: Option<String>,
pub parquet: ParquetOptions,
pub aggregate: AggregateOptions,
pub planning_concurrency: usize,
pub sort_spill_reservation_bytes: usize,
pub sort_in_place_threshold_bytes: usize,
pub meta_fetch_concurrency: usize,
pub minimum_parallel_output_files: usize,
pub soft_max_rows_per_output_file: usize,
pub max_buffered_batches_per_output_file: usize,
pub listing_table_ignore_subdirectory: bool,
pub enable_recursive_ctes: bool,
pub split_file_groups_by_statistics: bool,
pub keep_partition_by_columns: bool,
pub skip_partial_aggregation_probe_ratio_threshold: f64,
pub skip_partial_aggregation_probe_rows_threshold: usize,
}
Options related to query execution
See also: SessionConfig
Fields§
batch_size: usize
Default batch size when creating new batches. This is especially useful for batches buffered in memory, since creating tiny batches would result in too much metadata memory consumption
coalesce_batches: bool
When set to true, record batches will be examined between each operator and small batches will be coalesced into larger batches. This is helpful when there are highly selective filters or joins that could produce tiny output batches. The target batch size is determined by the configuration setting
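The coalescing idea can be sketched in plain Rust. This is a hypothetical model and not DataFusion's implementation: a "batch" here is just a `Vec<i32>` of row values, and the function name `coalesce` and the parameter `target_rows` are made up for illustration.

```rust
/// Hypothetical model: merge small batches until a target row count is reached.
/// Whole input batches are appended, so an output batch may exceed the target.
fn coalesce(batches: Vec<Vec<i32>>, target_rows: usize) -> Vec<Vec<i32>> {
    let mut out = Vec::new();
    let mut buffer: Vec<i32> = Vec::new();
    for batch in batches {
        buffer.extend(batch);
        // Emit the buffered rows once the target size is reached.
        if buffer.len() >= target_rows {
            out.push(std::mem::take(&mut buffer));
        }
    }
    // Flush whatever is left at the end of the input.
    if !buffer.is_empty() {
        out.push(buffer);
    }
    out
}
```

With a target of 3 rows, the inputs `[1]`, `[2, 3]`, `[4]`, `[5, 6, 7]`, `[8]` would come out of this sketch as three batches, the last one being the leftover single row.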
collect_statistics: bool
Should DataFusion collect statistics after listing files
target_partitions: usize
Number of partitions for query execution. Increasing partitions can increase concurrency.
Defaults to the number of CPU cores on the system
time_zone: Option<String>
The default time zone
Some functions, e.g. EXTRACT(HOUR from SOME_TIME), shift the underlying datetime according to this time zone, and then extract the hour
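The "shift, then extract" behavior can be illustrated with a small std-only sketch. It assumes a fixed UTC offset in seconds; real time zones with daylight saving rules need a time zone database, which this sketch does not model. The function name `extract_hour` is hypothetical.

```rust
/// Hypothetical model: shift a UTC timestamp by a fixed offset,
/// then take the hour of day of the shifted value.
fn extract_hour(utc_epoch_secs: i64, utc_offset_secs: i64) -> i64 {
    let local = utc_epoch_secs + utc_offset_secs;
    // rem_euclid keeps the seconds-of-day non-negative for negative values.
    local.rem_euclid(86_400) / 3_600
}
```

For the timestamp 0 (midnight UTC), an offset of +02:00 gives hour 2 and an offset of -05:00 gives hour 19 on the previous day.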
parquet: ParquetOptions
Parquet options
aggregate: AggregateOptions
Aggregate options
planning_concurrency: usize
Fan-out during initial physical planning.
This is mostly used to plan UNION children in parallel.
Defaults to the number of CPU cores on the system
sort_spill_reservation_bytes: usize
Specifies the reserved memory for each spillable sort operation to facilitate an in-memory merge.
When a sort operation spills to disk, the in-memory data must be sorted and merged before being written to a file. This setting reserves a specific amount of memory for that in-memory sort/merge process.
Note: This setting is irrelevant if the sort operation cannot spill (i.e., if there’s no DiskManager configured).
sort_in_place_threshold_bytes: usize
When sorting, the size in bytes below which data is concatenated and sorted in a single RecordBatch rather than sorted in batches and merged.
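The two sorting strategies can be sketched as follows. This is a simplified model under stated assumptions, not DataFusion's sort operator: batches are `Vec<i64>`, the byte size is estimated from the element count, and `sort_batches` and `threshold_bytes` are hypothetical names.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical model: pick a sort strategy based on total input size.
fn sort_batches(batches: Vec<Vec<i64>>, threshold_bytes: usize) -> Vec<i64> {
    let total_rows: usize = batches.iter().map(|b| b.len()).sum();
    let total_bytes = total_rows * std::mem::size_of::<i64>();

    if total_bytes < threshold_bytes {
        // Small input: concatenate everything and sort once.
        let mut all: Vec<i64> = batches.into_iter().flatten().collect();
        all.sort_unstable();
        return all;
    }

    // Larger input: sort each batch, then k-way merge the sorted runs.
    let sorted: Vec<Vec<i64>> = batches
        .into_iter()
        .map(|mut b| {
            b.sort_unstable();
            b
        })
        .collect();

    // Min-heap of (value, run index, position within run).
    let mut heap = BinaryHeap::new();
    for (run, b) in sorted.iter().enumerate() {
        if let Some(&v) = b.first() {
            heap.push(Reverse((v, run, 0usize)));
        }
    }
    let mut out = Vec::with_capacity(total_rows);
    while let Some(Reverse((v, run, pos))) = heap.pop() {
        out.push(v);
        if let Some(&next) = sorted[run].get(pos + 1) {
            heap.push(Reverse((next, run, pos + 1)));
        }
    }
    out
}
```

Both paths produce the same sorted output; the threshold only decides which path is taken.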
meta_fetch_concurrency: usize
Number of files to read in parallel when inferring schema and statistics
minimum_parallel_output_files: usize
Guarantees a minimum level of output files running in parallel. RecordBatches will be distributed in round robin fashion to each parallel writer. Each writer is closed and a new file opened once soft_max_rows_per_output_file is reached.
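Round-robin distribution of batches to parallel writers can be sketched like this. The function `round_robin` is a hypothetical illustration that only tracks batch indices; it does not model file rollover or any DataFusion type.

```rust
/// Hypothetical model: assign batch indices to writers in round-robin order.
fn round_robin(num_batches: usize, num_writers: usize) -> Vec<Vec<usize>> {
    assert!(num_writers > 0, "need at least one writer");
    let mut assignments = vec![Vec::new(); num_writers];
    for batch_idx in 0..num_batches {
        // Batch i goes to writer i mod num_writers.
        assignments[batch_idx % num_writers].push(batch_idx);
    }
    assignments
}
```

With five batches and two writers, the first writer in this sketch receives batches 0, 2 and 4, and the second receives batches 1 and 3.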
soft_max_rows_per_output_file: usize
Target number of rows per output file when writing multiple files. This is a soft max, so it can be exceeded slightly. There will also be one file smaller than the limit if the total number of rows written is not roughly divisible by the soft max
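Why the limit is "soft" can be seen in a small model. It assumes, as a simplification, that a writer appends whole batches and only checks the limit after each batch; `plan_files` is a hypothetical name and the function returns the row count of each resulting file.

```rust
/// Hypothetical model: close a file once its row count reaches the soft max.
/// Whole batches are appended, so a file can overshoot the limit.
fn plan_files(batch_rows: &[usize], soft_max_rows: usize) -> Vec<usize> {
    let mut files = Vec::new();
    let mut current = 0usize;
    for &rows in batch_rows {
        current += rows;
        if current >= soft_max_rows {
            files.push(current);
            current = 0;
        }
    }
    // The final file holds the remainder and may be smaller than the limit.
    if current > 0 {
        files.push(current);
    }
    files
}
```

With batches of 400, 400, 400, 400 and 100 rows and a soft max of 1000, this sketch yields one file of 1200 rows (slightly over the limit) and one of 500 rows (under it).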
max_buffered_batches_per_output_file: usize
The maximum number of RecordBatches buffered for each output file being written. Higher values can potentially give faster write performance at the cost of higher peak memory consumption
listing_table_ignore_subdirectory: bool
Should subdirectories be ignored when scanning directories for data files. Defaults to true (ignores subdirectories), consistent with Hive. Note that this setting does not affect reading partitioned tables (e.g. /table/year=2021/month=01/data.parquet).
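The described filtering can be modeled with plain string handling. This sketch is an assumption about the observable behavior, not DataFusion's listing code: it treats any directory named `key=value` as a Hive-style partition directory, and `keep_file` is a hypothetical helper.

```rust
/// Hypothetical model: decide whether a file under `table_root` is scanned.
fn keep_file(table_root: &str, file_path: &str, ignore_subdirectory: bool) -> bool {
    let root = table_root.trim_end_matches('/');
    // The file must live under the table root.
    let relative = match file_path.strip_prefix(root).and_then(|p| p.strip_prefix('/')) {
        Some(rel) => rel,
        None => return false,
    };
    let mut components: Vec<&str> = relative.split('/').collect();
    components.pop(); // drop the file name, keeping only directories
    if !ignore_subdirectory {
        return true;
    }
    // Assumption: `key=value` directories are partition directories and stay visible.
    components.iter().all(|dir| dir.contains('='))
}
```

In this model, `/table/tmp/b.parquet` is skipped when the flag is true, while `/table/year=2021/month=01/data.parquet` is kept either way.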
enable_recursive_ctes: bool
Should DataFusion support recursive CTEs
split_file_groups_by_statistics: bool
Attempt to eliminate sorts by packing & sorting files with non-overlapping statistics into the same file groups. Currently experimental
keep_partition_by_columns: bool
Should DataFusion keep the columns used for partition_by in the output RecordBatches
skip_partial_aggregation_probe_ratio_threshold: f64
Aggregation ratio (number of distinct groups / number of input rows) threshold for skipping partial aggregation. If the observed ratio is greater than this threshold, partial aggregation will skip aggregation for further input
skip_partial_aggregation_probe_rows_threshold: usize
Number of input rows a partial aggregation partition should process before checking the aggregation ratio and trying to switch to skipping aggregation mode
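How the two probe thresholds interact can be sketched as a single check. This is a reading of the field descriptions above, not DataFusion's code; the function name and parameters are hypothetical, and the numbers in the usage note are illustrative values rather than documented defaults.

```rust
/// Hypothetical model of the probe: only after enough input rows have been seen,
/// compare the ratio of distinct groups to input rows against the threshold.
fn should_skip_partial_aggregation(
    num_groups: usize,
    input_rows: usize,
    probe_rows_threshold: usize,
    probe_ratio_threshold: f64,
) -> bool {
    // Not enough rows processed yet: keep aggregating.
    if input_rows == 0 || input_rows < probe_rows_threshold {
        return false;
    }
    let ratio = num_groups as f64 / input_rows as f64;
    ratio > probe_ratio_threshold
}
```

For example, with a rows threshold of 100,000 and a ratio threshold of 0.8, seeing 95,000 distinct groups in 100,000 rows would trigger skipping, while 10,000 groups in the same number of rows would not.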
Trait Implementations§
impl Clone for ExecutionOptions
fn clone(&self) -> ExecutionOptions
fn clone_from(&mut self, source: &Self)
impl ConfigField for ExecutionOptions
impl Debug for ExecutionOptions
impl Default for ExecutionOptions
impl PartialEq for ExecutionOptions
impl StructuralPartialEq for ExecutionOptions
Auto Trait Implementations§
impl Freeze for ExecutionOptions
impl RefUnwindSafe for ExecutionOptions
impl Send for ExecutionOptions
impl Sync for ExecutionOptions
impl Unpin for ExecutionOptions
impl UnwindSafe for ExecutionOptions
Blanket Implementations§
impl<T> BorrowMut<T> for T where T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T where T: Clone,
default unsafe fn clone_to_uninit(&self, dst: *mut T)
This is a nightly-only experimental API. (clone_to_uninit)