Struct datafusion_common::config::ExecutionOptions

pub struct ExecutionOptions {
    pub batch_size: usize,
    pub coalesce_batches: bool,
    pub collect_statistics: bool,
    pub target_partitions: usize,
    pub time_zone: Option<String>,
    pub parquet: ParquetOptions,
    pub aggregate: AggregateOptions,
    pub planning_concurrency: usize,
    pub sort_spill_reservation_bytes: usize,
    pub sort_in_place_threshold_bytes: usize,
    pub meta_fetch_concurrency: usize,
    pub minimum_parallel_output_files: usize,
    pub soft_max_rows_per_output_file: usize,
    pub max_buffered_batches_per_output_file: usize,
    pub listing_table_ignore_subdirectory: bool,
    pub enable_recursive_ctes: bool,
    pub split_file_groups_by_statistics: bool,
    pub keep_partition_by_columns: bool,
    pub skip_partial_aggregation_probe_ratio_threshold: f64,
    pub skip_partial_aggregation_probe_rows_threshold: usize,
}

Options related to query execution

See also: SessionConfig

Fields§

§batch_size: usize

Default batch size when creating new batches. This is especially useful for batches buffered in memory, since creating tiny batches would result in too much metadata memory consumption.

§coalesce_batches: bool

When set to true, record batches will be examined between each operator and small batches will be coalesced into larger batches. This is helpful when there are highly selective filters or joins that could produce tiny output batches. The target batch size is determined by the batch_size configuration setting.
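To make the idea of coalescing concrete, here is a minimal sketch using only the standard library. It is a simplified model, not DataFusion's implementation: a "batch" is just a `Vec<i32>` of rows, and the function name `coalesce_batches` and the parameter `target_batch_size` are hypothetical names chosen for illustration.

```rust
/// Simplified model of batch coalescing (an assumption for illustration,
/// not DataFusion's code). Small input batches are buffered until at least
/// `target_batch_size` rows are available, then emitted as one larger batch.
fn coalesce_batches(input: Vec<Vec<i32>>, target_batch_size: usize) -> Vec<Vec<i32>> {
    let mut output = Vec::new();
    let mut buffer: Vec<i32> = Vec::new();
    for batch in input {
        buffer.extend(batch);
        // Emit as soon as the buffer has reached the target size.
        if buffer.len() >= target_batch_size {
            output.push(std::mem::take(&mut buffer));
        }
    }
    // Flush whatever is left at the end of the input.
    if !buffer.is_empty() {
        output.push(buffer);
    }
    output
}
```

In this model, five tiny batches of 1, 2, 1, 3 and 1 rows with a target of 4 become two batches of 4 rows each, which is the effect the setting aims for after a highly selective filter.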

§collect_statistics: bool

Whether DataFusion should collect statistics after listing files.

§target_partitions: usize

Number of partitions for query execution. Increasing partitions can increase concurrency.

Defaults to the number of CPU cores on the system
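As a rough illustration of a "number of CPU cores" default, the sketch below queries the standard library. The helper name `default_target_partitions` is hypothetical, and `std::thread::available_parallelism` is an assumption about how such a value could be obtained; it reports the parallelism available to the process, which may differ from the physical core count (for example under container CPU limits).

```rust
use std::thread;

/// Hypothetical helper: the partition count to use when none is configured.
/// Falls back to 1 if the available parallelism cannot be determined.
fn default_target_partitions() -> usize {
    thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1)
}
```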

§time_zone: Option<String>

The default time zone

Some functions, e.g. EXTRACT(HOUR from SOME_TIME), shift the underlying datetime according to this time zone, and then extract the hour
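The shift-then-extract behaviour can be sketched with plain integer arithmetic. This is an illustrative model only: `extract_hour` and `utc_offset_secs` are hypothetical names, and a fixed UTC offset stands in for a resolved time zone (real time zones with daylight saving rules need a time zone database).

```rust
/// Extracts the hour (0-23) from a Unix timestamp after shifting it by a
/// fixed UTC offset. `utc_offset_secs` is a hypothetical stand-in for a
/// resolved time zone such as "+02:00" (7200 seconds).
fn extract_hour(unix_secs: i64, utc_offset_secs: i64) -> i64 {
    // Shift the underlying instant into local time first.
    let local = unix_secs + utc_offset_secs;
    // rem_euclid keeps the seconds-of-day non-negative for pre-1970 values.
    local.rem_euclid(86_400) / 3_600
}
```

For the timestamp 1_700_000_000 (22:13:20 UTC), this yields hour 22 with no offset and hour 0 with a +02:00 offset, which shows why the configured time zone changes the result of the same query.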

§parquet: ParquetOptions

Parquet options

§aggregate: AggregateOptions

Aggregate options

§planning_concurrency: usize

Fan-out during initial physical planning.

This is mostly used to plan UNION children in parallel.

Defaults to the number of CPU cores on the system

§sort_spill_reservation_bytes: usize

Specifies the reserved memory for each spillable sort operation to facilitate an in-memory merge.

When a sort operation spills to disk, the in-memory data must be sorted and merged before being written to a file. This setting reserves a specific amount of memory for that in-memory sort/merge process.

Note: This setting is irrelevant if the sort operation cannot spill (i.e., if there’s no DiskManager configured).

§sort_in_place_threshold_bytes: usize

When sorting, the size below which data is concatenated and sorted in a single RecordBatch rather than sorted in batches and merged.
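The two strategies can be contrasted in a small standard-library sketch. It is a simplified model under stated assumptions, not DataFusion's sort operator: batches are `Vec<i64>`, size is estimated as rows times `size_of::<i64>()`, and the names `sort_batches` and `merge_sorted` are hypothetical.

```rust
/// Sorts the rows of several batches. If the estimated total size is below
/// `sort_in_place_threshold_bytes`, all rows are concatenated and sorted
/// once; otherwise each batch is sorted on its own and the runs are merged.
fn sort_batches(batches: Vec<Vec<i64>>, sort_in_place_threshold_bytes: usize) -> Vec<i64> {
    let total_bytes: usize = batches
        .iter()
        .map(|b| b.len() * std::mem::size_of::<i64>())
        .sum();
    if total_bytes < sort_in_place_threshold_bytes {
        // Small input: concatenate into a single buffer and sort it once.
        let mut all: Vec<i64> = batches.into_iter().flatten().collect();
        all.sort_unstable();
        return all;
    }
    // Large input: sort each batch, then merge the sorted runs one by one.
    batches
        .into_iter()
        .map(|mut b| {
            b.sort_unstable();
            b
        })
        .fold(Vec::new(), merge_sorted)
}

/// Merges two already sorted vectors into one sorted vector.
fn merge_sorted(a: Vec<i64>, b: Vec<i64>) -> Vec<i64> {
    let mut out = Vec::with_capacity(a.len() + b.len());
    let (mut i, mut j) = (0, 0);
    while i < a.len() && j < b.len() {
        if a[i] <= b[j] {
            out.push(a[i]);
            i += 1;
        } else {
            out.push(b[j]);
            j += 1;
        }
    }
    out.extend_from_slice(&a[i..]);
    out.extend_from_slice(&b[j..]);
    out
}
```

Both paths produce the same sorted output; the threshold only selects which path is taken, trading one large sort for several small sorts plus a merge.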

§meta_fetch_concurrency: usize

Number of files to read in parallel when inferring schema and statistics

§minimum_parallel_output_files: usize

Guarantees a minimum number of output files being written in parallel. RecordBatches will be distributed in round robin fashion to each parallel writer. Each writer is closed and a new file opened once soft_max_rows_per_output_file is reached.
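Round robin distribution can be pictured with the following sketch. It is an illustrative model only: batches are represented by integer ids, and `distribute_round_robin` and `writers` are hypothetical names rather than anything from the DataFusion API.

```rust
/// Assigns batches to writers in round robin order: batch i goes to
/// writer i % writers. A writer count of 0 is treated as 1.
fn distribute_round_robin(batch_ids: &[u32], writers: usize) -> Vec<Vec<u32>> {
    let writers = writers.max(1);
    let mut assigned = vec![Vec::new(); writers];
    for (i, &id) in batch_ids.iter().enumerate() {
        assigned[i % writers].push(id);
    }
    assigned
}
```

With six batches and four writers, the first two writers each receive two batches and the remaining two receive one each.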

§soft_max_rows_per_output_file: usize

Target number of rows per output file when writing multiple files. This is a soft maximum, so it can be exceeded slightly. One file will also be smaller than the limit if the total number of rows written is not a multiple of the soft maximum.
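One way to see why the limit is soft is to model a writer that appends whole batches and only checks the row count at batch boundaries. That check-per-batch behaviour is an assumption made for this sketch, and `rows_per_file` is a hypothetical name.

```rust
/// Returns the number of rows that end up in each output file when whole
/// batches are appended and a file is rolled over once it has reached
/// `soft_max_rows` rows (checked only after each batch is written).
fn rows_per_file(batch_row_counts: &[usize], soft_max_rows: usize) -> Vec<usize> {
    let mut files = Vec::new();
    let mut current = 0;
    for &rows in batch_row_counts {
        current += rows;
        if current >= soft_max_rows {
            // The file may overshoot the limit by up to one batch.
            files.push(current);
            current = 0;
        }
    }
    // The final file holds the remainder and may be smaller than the limit.
    if current > 0 {
        files.push(current);
    }
    files
}
```

Seven batches of 40 rows with a soft maximum of 100 give files of 120, 120 and 40 rows in this model: two files slightly over the limit and one smaller trailing file.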

§max_buffered_batches_per_output_file: usize

The maximum number of RecordBatches buffered for each output file being written. Higher values can potentially give faster write performance at the cost of higher peak memory consumption.

§listing_table_ignore_subdirectory: bool

Whether subdirectories should be ignored when scanning directories for data files. Defaults to true (ignores subdirectories), consistent with Hive. Note that this setting does not affect reading partitioned tables (e.g. /table/year=2021/month=01/data.parquet).

§enable_recursive_ctes: bool

Whether DataFusion should support recursive CTEs.

§split_file_groups_by_statistics: bool

Attempt to eliminate sorts by packing & sorting files with non-overlapping statistics into the same file groups. Currently experimental

§keep_partition_by_columns: bool

Whether DataFusion should keep the columns used for partition_by in the output RecordBatches.

§skip_partial_aggregation_probe_ratio_threshold: f64

Aggregation ratio (number of distinct groups / number of input rows) threshold for skipping partial aggregation. If the observed ratio is greater than this threshold, partial aggregation will skip aggregation for further input.

§skip_partial_aggregation_probe_rows_threshold: usize

Number of input rows a partial aggregation partition should process before checking the aggregation ratio and trying to switch to skipping aggregation mode.
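The interaction between the two probe thresholds can be sketched as a single decision function. This is a simplified model under stated assumptions, not DataFusion's aggregation code: group keys are plain `u64` values, and `should_skip_partial_aggregation`, `probe_rows_threshold` and `probe_ratio_threshold` are hypothetical names mirroring the two fields.

```rust
use std::collections::HashSet;

/// Decides whether partial aggregation should be skipped, based on the
/// group keys of the rows seen so far. Returns None while fewer than
/// `probe_rows_threshold` rows have been processed (no decision yet).
fn should_skip_partial_aggregation(
    group_keys_seen: &[u64],
    probe_rows_threshold: usize,
    probe_ratio_threshold: f64,
) -> Option<bool> {
    let rows = group_keys_seen.len();
    if rows == 0 || rows < probe_rows_threshold {
        return None;
    }
    // Aggregation ratio = number of distinct groups / number of input rows.
    let distinct: HashSet<&u64> = group_keys_seen.iter().collect();
    let ratio = distinct.len() as f64 / rows as f64;
    // A high ratio means grouping barely reduces the data, so skip.
    Some(ratio > probe_ratio_threshold)
}
```

When nearly every row forms its own group, the ratio approaches 1.0 and partial aggregation does little useful work, which is the case the ratio threshold is meant to detect.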

Trait Implementations§

impl Clone for ExecutionOptions

fn clone(&self) -> ExecutionOptions

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl ConfigField for ExecutionOptions

fn set(&mut self, key: &str, value: &str) -> Result<()>

fn visit<V: Visit>(&self, v: &mut V, key_prefix: &str, _description: &'static str)

impl Debug for ExecutionOptions

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for ExecutionOptions

fn default() -> Self

Returns the “default value” for a type.

impl PartialEq for ExecutionOptions

fn eq(&self, other: &ExecutionOptions) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl StructuralPartialEq for ExecutionOptions

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

default unsafe fn clone_to_uninit(&self, dst: *mut T)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dst.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,