pub struct CoalesceBatchesExec { /* private fields */ }
Expand description
CoalesceBatchesExec
combines small batches into larger batches for more
efficient use of vectorized processing by later operators.
The operator buffers batches until it collects target_batch_size
rows and
then emits a single concatenated batch. When only a limited number of rows
are necessary (specified by the fetch
parameter), the operator will stop
buffering and returns the final batch once the number of collected rows
reaches the fetch
value.
§Background
Generally speaking, larger RecordBatches are more efficient to process than smaller record batches (until the CPU cache is exceeded) because there is fixed processing overhead per batch. This code concatenates multiple small record batches into larger ones to amortize this overhead.
┌────────────────────┐
│ RecordBatch │
│ num_rows = 23 │
└────────────────────┘ ┌────────────────────┐
│ │
┌────────────────────┐ Coalesce │ │
│ │ Batches │ │
│ RecordBatch │ │ │
│ num_rows = 50 │ ─ ─ ─ ─ ─ ─ ▶ │ │
│ │ │ RecordBatch │
│ │ │ num_rows = 106 │
└────────────────────┘ │ │
│ │
┌────────────────────┐ │ │
│ │ │ │
│ RecordBatch │ │ │
│ num_rows = 33 │ └────────────────────┘
│ │
└────────────────────┘
Implementations§
source§impl CoalesceBatchesExec
impl CoalesceBatchesExec
sourcepub fn new(input: Arc<dyn ExecutionPlan>, target_batch_size: usize) -> Self
pub fn new(input: Arc<dyn ExecutionPlan>, target_batch_size: usize) -> Self
Create a new CoalesceBatchesExec
sourcepub fn with_fetch(self, fetch: Option<usize>) -> Self
pub fn with_fetch(self, fetch: Option<usize>) -> Self
Update fetch with the argument
sourcepub fn input(&self) -> &Arc<dyn ExecutionPlan>
pub fn input(&self) -> &Arc<dyn ExecutionPlan>
The input plan
sourcepub fn target_batch_size(&self) -> usize
pub fn target_batch_size(&self) -> usize
Minimum number of rows for coalesces batches
Trait Implementations§
source§impl Debug for CoalesceBatchesExec
impl Debug for CoalesceBatchesExec
source§impl DisplayAs for CoalesceBatchesExec
impl DisplayAs for CoalesceBatchesExec
source§impl ExecutionPlan for CoalesceBatchesExec
impl ExecutionPlan for CoalesceBatchesExec
source§fn name(&self) -> &'static str
fn name(&self) -> &'static str
source§fn properties(&self) -> &PlanProperties
fn properties(&self) -> &PlanProperties
ExecutionPlan
, such as output
ordering(s), partitioning information etc. Read moresource§fn children(&self) -> Vec<&Arc<dyn ExecutionPlan>>
fn children(&self) -> Vec<&Arc<dyn ExecutionPlan>>
ExecutionPlan
s that act as inputs to this plan.
The returned list will be empty for leaf nodes such as scans, will contain
a single value for unary nodes, or two values for binary nodes (such as
joins).source§fn maintains_input_order(&self) -> Vec<bool>
fn maintains_input_order(&self) -> Vec<bool>
false
if this ExecutionPlan
’s implementation may reorder
rows within or between partitions. Read moresource§fn benefits_from_input_partitioning(&self) -> Vec<bool>
fn benefits_from_input_partitioning(&self) -> Vec<bool>
ExecutionPlan
benefits from increased
parallelization at its input for each child. Read moresource§fn with_new_children(
self: Arc<Self>,
children: Vec<Arc<dyn ExecutionPlan>>,
) -> Result<Arc<dyn ExecutionPlan>>
fn with_new_children( self: Arc<Self>, children: Vec<Arc<dyn ExecutionPlan>>, ) -> Result<Arc<dyn ExecutionPlan>>
ExecutionPlan
where all existing children were replaced
by the children
, in ordersource§fn execute(
&self,
partition: usize,
context: Arc<TaskContext>,
) -> Result<SendableRecordBatchStream>
fn execute( &self, partition: usize, context: Arc<TaskContext>, ) -> Result<SendableRecordBatchStream>
source§fn metrics(&self) -> Option<MetricsSet>
fn metrics(&self) -> Option<MetricsSet>
Metric
s for this
ExecutionPlan
. If no Metric
s are available, return None. Read moresource§fn statistics(&self) -> Result<Statistics>
fn statistics(&self) -> Result<Statistics>
ExecutionPlan
node. If statistics are not
available, should return Statistics::new_unknown
(the default), not
an error.source§fn with_fetch(&self, limit: Option<usize>) -> Option<Arc<dyn ExecutionPlan>>
fn with_fetch(&self, limit: Option<usize>) -> Option<Arc<dyn ExecutionPlan>>
ExecutionPlan
node, if it supports
fetch limits. Returns None
otherwise.source§fn static_name() -> &'static strwhere
Self: Sized,
fn static_name() -> &'static strwhere
Self: Sized,
name
but can be called without an instance.source§fn required_input_distribution(&self) -> Vec<Distribution>
fn required_input_distribution(&self) -> Vec<Distribution>
ExecutionPlan
, By default it’s [Distribution::UnspecifiedDistribution] for each child,source§fn required_input_ordering(&self) -> Vec<Option<Vec<PhysicalSortRequirement>>>
fn required_input_ordering(&self) -> Vec<Option<Vec<PhysicalSortRequirement>>>
ExecutionPlan
. Read moresource§fn repartitioned(
&self,
_target_partitions: usize,
_config: &ConfigOptions,
) -> Result<Option<Arc<dyn ExecutionPlan>>>
fn repartitioned( &self, _target_partitions: usize, _config: &ConfigOptions, ) -> Result<Option<Arc<dyn ExecutionPlan>>>
ExecutionPlan
to
produce target_partitions
partitions. Read moresource§fn supports_limit_pushdown(&self) -> bool
fn supports_limit_pushdown(&self) -> bool
Auto Trait Implementations§
impl Freeze for CoalesceBatchesExec
impl !RefUnwindSafe for CoalesceBatchesExec
impl Send for CoalesceBatchesExec
impl Sync for CoalesceBatchesExec
impl Unpin for CoalesceBatchesExec
impl !UnwindSafe for CoalesceBatchesExec
Blanket Implementations§
source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
source§impl<T> IntoEither for T
impl<T> IntoEither for T
source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self
into a Left
variant of Either<Self, Self>
if into_left
is true
.
Converts self
into a Right
variant of Either<Self, Self>
otherwise. Read moresource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self
into a Left
variant of Either<Self, Self>
if into_left(&self)
returns true
.
Converts self
into a Right
variant of Either<Self, Self>
otherwise. Read more