Struct ExecutionOptions

pub struct ExecutionOptions {
    pub batch_size: usize,
    pub coalesce_batches: bool,
    pub collect_statistics: bool,
    pub target_partitions: usize,
    pub time_zone: Option<String>,
    pub parquet: ParquetOptions,
    pub planning_concurrency: usize,
    pub skip_physical_aggregate_schema_check: bool,
    pub sort_spill_reservation_bytes: usize,
    pub sort_in_place_threshold_bytes: usize,
    pub meta_fetch_concurrency: usize,
    pub minimum_parallel_output_files: usize,
    pub soft_max_rows_per_output_file: usize,
    pub max_buffered_batches_per_output_file: usize,
    pub listing_table_ignore_subdirectory: bool,
    pub enable_recursive_ctes: bool,
    pub split_file_groups_by_statistics: bool,
    pub keep_partition_by_columns: bool,
    pub skip_partial_aggregation_probe_ratio_threshold: f64,
    pub skip_partial_aggregation_probe_rows_threshold: usize,
    pub use_row_number_estimates_to_optimize_partitioning: bool,
    pub enforce_batch_size_in_joins: bool,
}

Options related to query execution

Fields§

§batch_size: usize

Default batch size when creating new batches. This is especially useful for batches buffered in memory, since creating tiny batches would result in too much metadata memory consumption.

§coalesce_batches: bool

When set to true, record batches will be examined between each operator and small batches will be coalesced into larger batches. This is helpful when there are highly selective filters or joins that could produce tiny output batches. The target batch size is determined by the batch_size setting.
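The coalescing step can be pictured with a minimal sketch. This is not DataFusion's implementation: batches are modelled as plain `Vec<i32>` values rather than Arrow RecordBatches, and `coalesce` and `target_batch_size` are hypothetical names chosen for illustration.

```rust
/// Concatenate consecutive small batches until each output batch holds at
/// least `target_batch_size` rows. The final batch may be smaller.
/// Hypothetical helper: batches are plain vectors, not Arrow RecordBatches.
fn coalesce(batches: Vec<Vec<i32>>, target_batch_size: usize) -> Vec<Vec<i32>> {
    let mut output = Vec::new();
    let mut buffer: Vec<i32> = Vec::new();
    for batch in batches {
        buffer.extend(batch);
        if buffer.len() >= target_batch_size {
            // Emit the buffered rows and start a fresh, empty buffer.
            output.push(std::mem::take(&mut buffer));
        }
    }
    // Flush whatever is left once the input is exhausted.
    if !buffer.is_empty() {
        output.push(buffer);
    }
    output
}
```

With a target of 3 rows, five input batches of 1, 2, 1, and 1 rows would leave this sketch with two output batches instead of four.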

§collect_statistics: bool

Should DataFusion collect statistics after listing files

§target_partitions: usize

Number of partitions for query execution. Increasing partitions can increase concurrency.

Defaults to the number of CPU cores on the system
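A core-count-based default can be obtained with the standard library alone, as in the sketch below. Whether DataFusion derives its default exactly this way is an assumption, and `default_target_partitions` is a hypothetical name.

```rust
use std::thread;

/// Hypothetical helper: a core-count-based default for `target_partitions`.
/// Falls back to 1 when the available parallelism cannot be determined.
fn default_target_partitions() -> usize {
    thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1)
}
```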

§time_zone: Option<String>

The default time zone

Some functions, e.g. EXTRACT(HOUR from SOME_TIME), shift the underlying datetime according to this time zone, and then extract the hour
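The shift-then-extract behaviour can be illustrated with a small sketch. It assumes a fixed UTC offset in seconds, which is a simplification: named time zones need a time zone database to handle daylight saving rules. `extract_hour` is a hypothetical helper, not a DataFusion function.

```rust
/// Hour of day (0..=23) for a UTC timestamp viewed at a fixed UTC offset.
/// Hypothetical helper: real time zones also need DST rules.
fn extract_hour(epoch_seconds: i64, utc_offset_seconds: i64) -> i64 {
    const SECONDS_PER_DAY: i64 = 86_400;
    const SECONDS_PER_HOUR: i64 = 3_600;
    // Shift to local time first, then take the time-of-day component.
    let local = epoch_seconds + utc_offset_seconds;
    // rem_euclid keeps the result non-negative for timestamps before 1970.
    local.rem_euclid(SECONDS_PER_DAY) / SECONDS_PER_HOUR
}
```

For the timestamp 1970-01-01T00:00:00Z, this sketch yields hour 0 in UTC, hour 2 at an offset of +02:00, and hour 19 at an offset of -05:00.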

§parquet: ParquetOptions

Parquet options

§planning_concurrency: usize

Fan-out during initial physical planning.

This is mostly used to plan UNION children in parallel.

Defaults to the number of CPU cores on the system

§skip_physical_aggregate_schema_check: bool

When set to true, skips verifying that the schema produced by planning the input of LogicalPlan::Aggregate exactly matches the schema of the input plan.

When set to false, if the schema does not match exactly (including nullability and metadata), a planning error will be raised.

This is used to work around bugs in the planner that are now caught by the new schema verification step.

§sort_spill_reservation_bytes: usize

Specifies the reserved memory for each spillable sort operation to facilitate an in-memory merge.

When a sort operation spills to disk, the in-memory data must be sorted and merged before being written to a file. This setting reserves a specific amount of memory for that in-memory sort/merge process.

Note: This setting is irrelevant if the sort operation cannot spill (i.e., if there’s no DiskManager configured).

§sort_in_place_threshold_bytes: usize

When sorting, below what size should data be concatenated and sorted in a single RecordBatch rather than sorted in batches and merged.
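The two strategies that this threshold chooses between can be sketched as follows. This is an illustration under simplifying assumptions, not DataFusion's sort operator: batches are `Vec<i32>` values, size is estimated from the element count, and `sort_batches` and `merge_sorted_runs` are hypothetical names.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// K-way merge of already sorted runs using a min-heap.
fn merge_sorted_runs(runs: Vec<Vec<i32>>) -> Vec<i32> {
    let mut heap = BinaryHeap::new();
    for (run_idx, run) in runs.iter().enumerate() {
        if let Some(&first) = run.first() {
            heap.push(Reverse((first, run_idx, 0usize)));
        }
    }
    let mut merged = Vec::with_capacity(runs.iter().map(Vec::len).sum());
    while let Some(Reverse((value, run_idx, pos))) = heap.pop() {
        merged.push(value);
        // Refill the heap from the run that just produced a value.
        if let Some(&next) = runs[run_idx].get(pos + 1) {
            heap.push(Reverse((next, run_idx, pos + 1)));
        }
    }
    merged
}

/// Hypothetical helper: pick a strategy based on the total input size.
fn sort_batches(batches: Vec<Vec<i32>>, sort_in_place_threshold_bytes: usize) -> Vec<i32> {
    let total_bytes: usize = batches
        .iter()
        .map(|b| b.len() * std::mem::size_of::<i32>())
        .sum();
    if total_bytes < sort_in_place_threshold_bytes {
        // Small input: concatenate everything and sort once.
        let mut all: Vec<i32> = batches.into_iter().flatten().collect();
        all.sort_unstable();
        all
    } else {
        // Large input: sort each batch on its own, then merge the runs.
        let mut runs = batches;
        for run in &mut runs {
            run.sort_unstable();
        }
        merge_sorted_runs(runs)
    }
}
```

Both branches produce the same ordering; the threshold only changes how the work is done.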

§meta_fetch_concurrency: usize

Number of files to read in parallel when inferring schema and statistics

§minimum_parallel_output_files: usize

Guarantees a minimum level of output files running in parallel. RecordBatches will be distributed in round robin fashion to each parallel writer. Each writer is closed and a new file opened once soft_max_rows_per_output_file is reached.
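Round-robin distribution across parallel writers can be sketched as below. The sketch tracks only batch indices and ignores file rollover; `round_robin` is a hypothetical name, not part of DataFusion.

```rust
/// Assign each batch (identified by its index) to one of `writers` parallel
/// writers in round-robin order. Hypothetical helper for illustration only.
fn round_robin(num_batches: usize, writers: usize) -> Vec<Vec<usize>> {
    assert!(writers > 0, "at least one writer is required");
    let mut assignment: Vec<Vec<usize>> = vec![Vec::new(); writers];
    for batch in 0..num_batches {
        // Batch 0 goes to writer 0, batch 1 to writer 1, and so on, wrapping around.
        assignment[batch % writers].push(batch);
    }
    assignment
}
```

With five batches and two writers, the first writer in this sketch receives batches 0, 2, and 4, and the second receives batches 1 and 3.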

§soft_max_rows_per_output_file: usize

Target number of rows in each output file when writing multiple files. This is a soft maximum, so it can be exceeded slightly. One file will also be smaller than the limit if the total number of rows written is not roughly divisible by the soft maximum.
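Why the limit is soft can be seen in a small sketch. It assumes that whole batches are written and that a file is closed once its row count reaches the limit; both are assumptions made for illustration, and `rows_per_file` is a hypothetical name.

```rust
/// Row count of each output file when whole batches are appended and a file
/// is closed as soon as it reaches `soft_max_rows`. Hypothetical helper.
fn rows_per_file(batch_rows: &[usize], soft_max_rows: usize) -> Vec<usize> {
    let mut files = Vec::new();
    let mut current = 0usize;
    for &rows in batch_rows {
        current += rows;
        if current >= soft_max_rows {
            // Batches are not split, so a file may overshoot the limit.
            files.push(current);
            current = 0;
        }
    }
    // The last file holds the remainder and may be smaller than the limit.
    if current > 0 {
        files.push(current);
    }
    files
}
```

Five batches of 40 rows with a soft maximum of 100 give this sketch one file of 120 rows (over the limit) and one of 80 rows (under it).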

§max_buffered_batches_per_output_file: usize

The maximum number of RecordBatches buffered for each output file being written. Higher values can potentially give faster write performance at the cost of higher peak memory consumption.

§listing_table_ignore_subdirectory: bool

Should subdirectories be ignored when scanning directories for data files. Defaults to true (ignores subdirectories), consistent with Hive. Note that this setting does not affect reading partitioned tables (e.g. /table/year=2021/month=01/data.parquet).
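The rule described above, including the exception for partitioned tables, can be sketched as a path filter. This is a simplified illustration on plain strings, not DataFusion's listing code; `is_table_file` is a hypothetical name, and the table prefix is assumed to end with a `/`.

```rust
/// Decide whether `file_path` belongs to the table rooted at `table_prefix`.
/// Hypothetical helper: `table_prefix` is assumed to end with '/'.
fn is_table_file(table_prefix: &str, file_path: &str, ignore_subdirectory: bool) -> bool {
    let Some(relative) = file_path.strip_prefix(table_prefix) else {
        // The file is not under the table directory at all.
        return false;
    };
    if !ignore_subdirectory {
        return true;
    }
    // Hive-style partition directories (`key=value`) do not count as subdirectories.
    let non_partition_segments = relative
        .split('/')
        .filter(|segment| !segment.is_empty() && !segment.contains('='))
        .count();
    // Only the file name itself may remain.
    non_partition_segments <= 1
}
```

In this sketch, `/table/year=2021/month=01/data.parquet` is accepted even when subdirectories are ignored, while `/table/backup/data.parquet` is rejected.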

§enable_recursive_ctes: bool

Should DataFusion support recursive CTEs

§split_file_groups_by_statistics: bool

Attempt to eliminate sorts by packing & sorting files with non-overlapping statistics into the same file groups. Currently experimental
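One way to pack files with non-overlapping statistics is a greedy first-fit pass, sketched below. This is an assumed approach shown for illustration, not necessarily the algorithm DataFusion uses; `FileRange` and `split_groups` are hypothetical names, and each file is reduced to the min and max of a single sort column.

```rust
/// Min/max of the sort column for one file (hypothetical stand-in for
/// per-file statistics).
#[derive(Debug, Clone, PartialEq)]
struct FileRange {
    name: &'static str,
    min: i64,
    max: i64,
}

/// Greedily pack files into groups so that, within each group, files are
/// ordered by `min` and their ranges do not overlap.
fn split_groups(mut files: Vec<FileRange>) -> Vec<Vec<FileRange>> {
    files.sort_by_key(|f| f.min);
    let mut groups: Vec<Vec<FileRange>> = Vec::new();
    for file in files {
        // Reuse the first group whose last file ends before this one starts.
        let slot = groups
            .iter()
            .position(|g| g.last().map_or(true, |last| last.max < file.min));
        match slot {
            Some(i) => groups[i].push(file),
            None => groups.push(vec![file]),
        }
    }
    groups
}
```

Because every group is already ordered and non-overlapping, reading its files one after another yields sorted output without an extra sort.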

§keep_partition_by_columns: bool

Should DataFusion keep the columns used for partition_by in the output RecordBatches

§skip_partial_aggregation_probe_ratio_threshold: f64

Aggregation ratio (number of distinct groups / number of input rows) threshold for skipping partial aggregation. If the observed ratio is greater than this threshold, partial aggregation will skip aggregation for further input.

§skip_partial_aggregation_probe_rows_threshold: usize

Number of input rows a partial aggregation partition should process before checking the aggregation ratio and trying to switch to skipping aggregation mode.
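The two thresholds work together: the row threshold decides when to check, and the ratio threshold decides the outcome. A minimal sketch of that check follows; the function name is hypothetical, and the strict comparison follows the wording "greater" above, so the exact boundary handling is an assumption.

```rust
/// Hypothetical probe: returns true when partial aggregation should be
/// skipped for further input.
fn should_skip_partial_aggregation(
    input_rows: usize,
    distinct_groups: usize,
    probe_rows_threshold: usize,
    probe_ratio_threshold: f64,
) -> bool {
    // Do not decide until enough rows have been observed.
    if input_rows < probe_rows_threshold {
        return false;
    }
    // A ratio close to 1.0 means almost every row forms its own group,
    // so partial aggregation reduces very little.
    let ratio = distinct_groups as f64 / input_rows as f64;
    ratio > probe_ratio_threshold
}
```

With a row threshold of 100,000 and a ratio threshold of 0.8, an input of 100,000 rows with 95,000 distinct groups would switch to skipping in this sketch, while one with 1,000 distinct groups would not.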

§use_row_number_estimates_to_optimize_partitioning: bool

Should DataFusion use row number estimates at the input to decide whether increasing parallelism is beneficial or not. By default, only exact row numbers (not estimates) are used for this decision. Setting this flag to true will likely produce better plans if the source of statistics is accurate. We plan to make this the default in the future.

§enforce_batch_size_in_joins: bool

Should DataFusion enforce batch size in joins or not. By default, DataFusion will not enforce batch size in joins. Enforcing batch size in joins can reduce memory usage when joining large tables with a highly-selective join filter, but is also slightly slower.
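Enforcing a batch size on join output amounts to emitting the result in bounded chunks instead of one large batch. The sketch below shows the idea on a plain vector; `split_output` is a hypothetical name, and DataFusion's join operators work on Arrow RecordBatches rather than `Vec<i32>`.

```rust
/// Split join output rows into batches of at most `batch_size` rows.
/// Hypothetical helper for illustration only.
fn split_output(rows: Vec<i32>, batch_size: usize) -> Vec<Vec<i32>> {
    assert!(batch_size > 0, "batch_size must be positive");
    rows.chunks(batch_size)
        .map(|chunk| chunk.to_vec())
        .collect()
}
```

Smaller output batches bound how much memory downstream operators hold at once, at the cost of the extra copying and bookkeeping that makes this mode slightly slower.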

Trait Implementations§

impl Clone for ExecutionOptions

fn clone(&self) -> ExecutionOptions

fn clone_from(&mut self, source: &Self)

impl ConfigField for ExecutionOptions

fn set(&mut self, key: &str, value: &str) -> Result<()>

fn visit<V: Visit>( &self, v: &mut V, key_prefix: &str, _description: &'static str, )

impl Debug for ExecutionOptions

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl Default for ExecutionOptions

fn default() -> Self

impl PartialEq for ExecutionOptions

fn eq(&self, other: &ExecutionOptions) -> bool

fn ne(&self, other: &Rhs) -> bool

impl StructuralPartialEq for ExecutionOptions

Auto Trait Implementations§

impl Freeze for ExecutionOptions

impl RefUnwindSafe for ExecutionOptions

impl Send for ExecutionOptions

impl Sync for ExecutionOptions

impl Unpin for ExecutionOptions

impl UnwindSafe for ExecutionOptions

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T where T: Clone,

unsafe fn clone_to_uninit(&self, dst: *mut T)

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> ToOwned for T where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

impl<T> Allocation for T where T: RefUnwindSafe + Send + Sync,
