pub struct CoalesceBatchesExec { /* private fields */ }
Expand description

CoalesceBatchesExec combines small batches into larger batches for more efficient use of vectorized processing by later operators.

The operator buffers batches until it collects target_batch_size rows and then emits a single concatenated batch. When only a limited number of rows are necessary (specified by the fetch parameter), the operator will stop buffering and returns the final batch once the number of collected rows reaches the fetch value.

§Background

Generally speaking, larger RecordBatches are more efficient to process than smaller record batches (until the CPU cache is exceeded) because there is fixed processing overhead per batch. This code concatenates multiple small record batches into larger ones to amortize this overhead.

┌────────────────────┐
│    RecordBatch     │
│   num_rows = 23    │
└────────────────────┘                 ┌────────────────────┐
                                       │                    │
┌────────────────────┐     Coalesce    │                    │
│                    │      Batches    │                    │
│    RecordBatch     │                 │                    │
│   num_rows = 50    │  ─ ─ ─ ─ ─ ─ ▶  │                    │
│                    │                 │    RecordBatch     │
│                    │                 │   num_rows = 106   │
└────────────────────┘                 │                    │
                                       │                    │
┌────────────────────┐                 │                    │
│                    │                 │                    │
│    RecordBatch     │                 │                    │
│   num_rows = 33    │                 └────────────────────┘
│                    │
└────────────────────┘

Implementations§

source§

impl CoalesceBatchesExec

source

pub fn new(input: Arc<dyn ExecutionPlan>, target_batch_size: usize) -> Self

Create a new CoalesceBatchesExec

source

pub fn with_fetch(self, fetch: Option<usize>) -> Self

Update fetch with the argument

source

pub fn input(&self) -> &Arc<dyn ExecutionPlan>

The input plan

source

pub fn target_batch_size(&self) -> usize

Minimum number of rows for coalesces batches

Trait Implementations§

source§

impl Debug for CoalesceBatchesExec

source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
source§

impl DisplayAs for CoalesceBatchesExec

source§

fn fmt_as(&self, t: DisplayFormatType, f: &mut Formatter<'_>) -> Result

Format according to DisplayFormatType, used when verbose representation looks different from the default one Read more
source§

impl ExecutionPlan for CoalesceBatchesExec

source§

fn as_any(&self) -> &dyn Any

Return a reference to Any that can be used for downcasting

source§

fn name(&self) -> &'static str

Short name for the ExecutionPlan, such as ‘ParquetExec’. Read more
source§

fn properties(&self) -> &PlanProperties

Return properties of the output of the ExecutionPlan, such as output ordering(s), partitioning information etc. Read more
source§

fn children(&self) -> Vec<&Arc<dyn ExecutionPlan>>

Get a list of children ExecutionPlans that act as inputs to this plan. The returned list will be empty for leaf nodes such as scans, will contain a single value for unary nodes, or two values for binary nodes (such as joins).
source§

fn maintains_input_order(&self) -> Vec<bool>

Returns false if this ExecutionPlan’s implementation may reorder rows within or between partitions. Read more
source§

fn benefits_from_input_partitioning(&self) -> Vec<bool>

Specifies whether the ExecutionPlan benefits from increased parallelization at its input for each child. Read more
source§

fn with_new_children( self: Arc<Self>, children: Vec<Arc<dyn ExecutionPlan>>, ) -> Result<Arc<dyn ExecutionPlan>>

Returns a new ExecutionPlan where all existing children were replaced by the children, in order
source§

fn execute( &self, partition: usize, context: Arc<TaskContext>, ) -> Result<SendableRecordBatchStream>

Begin execution of partition, returning a Stream of RecordBatches. Read more
source§

fn metrics(&self) -> Option<MetricsSet>

Return a snapshot of the set of Metrics for this ExecutionPlan. If no Metrics are available, return None. Read more
source§

fn statistics(&self) -> Result<Statistics>

Returns statistics for this ExecutionPlan node. If statistics are not available, should return Statistics::new_unknown (the default), not an error.
source§

fn with_fetch(&self, limit: Option<usize>) -> Option<Arc<dyn ExecutionPlan>>

Returns a fetching variant of this ExecutionPlan node, if it supports fetch limits. Returns None otherwise.
source§

fn static_name() -> &'static str
where Self: Sized,

Short name for the ExecutionPlan, such as ‘ParquetExec’. Like name but can be called without an instance.
source§

fn schema(&self) -> SchemaRef

Get the schema for this execution plan
source§

fn required_input_distribution(&self) -> Vec<Distribution>

Specifies the data distribution requirements for all the children for this ExecutionPlan, By default it’s [Distribution::UnspecifiedDistribution] for each child,
source§

fn required_input_ordering(&self) -> Vec<Option<Vec<PhysicalSortRequirement>>>

Specifies the ordering required for all of the children of this ExecutionPlan. Read more
source§

fn repartitioned( &self, _target_partitions: usize, _config: &ConfigOptions, ) -> Result<Option<Arc<dyn ExecutionPlan>>>

If supported, attempt to increase the partitioning of this ExecutionPlan to produce target_partitions partitions. Read more
source§

fn supports_limit_pushdown(&self) -> bool

Returns true if a limit can be safely pushed down through this ExecutionPlan node. Read more

Auto Trait Implementations§

Blanket Implementations§

source§

impl<T> Any for T
where T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
source§

impl<T> Borrow<T> for T
where T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T, U> Into<U> for T
where U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

source§

impl<T> IntoEither for T

source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

source§

fn vzip(self) -> V