pub struct BatchCoalescer { /* private fields */ }
Expand description
Concatenate multiple RecordBatch
es
BatchCoalescer
concatenates multiple small RecordBatch
es, produced by
operations such as FilterExec
and RepartitionExec
, into larger ones for
more efficient processing by subsequent operations.
§Background
Generally speaking, larger RecordBatch
es are more efficient to process
than smaller record batches (until the CPU cache is exceeded) because there
is fixed processing overhead per batch. DataFusion tries to operate on
batches of target_batch_size
rows to amortize this overhead
┌────────────────────┐
│ RecordBatch │
│ num_rows = 23 │
└────────────────────┘ ┌────────────────────┐
│ │
┌────────────────────┐ Coalesce │ │
│ │ Batches │ │
│ RecordBatch │ │ │
│ num_rows = 50 │ ─ ─ ─ ─ ─ ─ ▶ │ │
│ │ │ RecordBatch │
│ │ │ num_rows = 106 │
└────────────────────┘ │ │
│ │
┌────────────────────┐ │ │
│ │ │ │
│ RecordBatch │ │ │
│ num_rows = 33 │ └────────────────────┘
│ │
└────────────────────┘
§Notes:
-
Output rows are produced in the same order as the input rows
-
The output is a sequence of batches, with all but the last being at least
target_batch_size
rows. -
Eventually this may also be able to handle other optimizations such as a combined filter/coalesce operation.
Implementations§
Source§impl BatchCoalescer
impl BatchCoalescer
Sourcepub fn new(
schema: SchemaRef,
target_batch_size: usize,
fetch: Option<usize>,
) -> Self
pub fn new( schema: SchemaRef, target_batch_size: usize, fetch: Option<usize>, ) -> Self
Create a new BatchCoalescer
§Arguments
schema
- the schema of the output batchestarget_batch_size
- the minimum number of rows for each output batch (until limit reached)fetch
- the maximum number of rows to fetch,None
means fetch all rows
Sourcepub fn push_batch(&mut self, batch: RecordBatch) -> CoalescerState
pub fn push_batch(&mut self, batch: RecordBatch) -> CoalescerState
Push next batch, and returns CoalescerState
indicating the current
state of the buffer.
Sourcepub fn finish_batch(&mut self) -> Result<RecordBatch>
pub fn finish_batch(&mut self) -> Result<RecordBatch>
Concatenates and returns all buffered batches, and clears the buffer.
Trait Implementations§
Auto Trait Implementations§
impl Freeze for BatchCoalescer
impl !RefUnwindSafe for BatchCoalescer
impl Send for BatchCoalescer
impl Sync for BatchCoalescer
impl Unpin for BatchCoalescer
impl !UnwindSafe for BatchCoalescer
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Source§impl<T> IntoEither for T
impl<T> IntoEither for T
Source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self
into a Left
variant of Either<Self, Self>
if into_left
is true
.
Converts self
into a Right
variant of Either<Self, Self>
otherwise. Read moreSource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self
into a Left
variant of Either<Self, Self>
if into_left(&self)
returns true
.
Converts self
into a Right
variant of Either<Self, Self>
otherwise. Read more